comment=79e4599ba775bc4e9dc245177f750886dd52b5ca

UpSetPlot-0.8.0/.gitignore
#########################################
# Editor temporary/working/backup files #
.#*
*\#*\#
[#]*#
*~
*$
*.bak
*flymake*
*.kdev4
*.log
*.swp
*.pdb
.project
.pydevproject
.settings
.idea
.vagrant
.noseids
.ipynb_checkpoints
.tags
.cache/

# Compiled source #
###################
*.a
*.com
*.class
*.dll
*.exe
*.pxi
*.o
*.py[ocd]
*.so
.build_cache_dir
MANIFEST

# Python files #
################
# setup.py working directory
build
# sphinx build directory
doc/_*
# setup.py dist directory
dist
# Egg metadata
*.egg-info
.eggs
.pypirc

# tox testing tool
.tox
# rope
.ropeproject
# wheel files
*.whl
**/wheelhouse/*
# coverage
.coverage
coverage.xml
coverage_html_report

# OS generated files #
######################
.directory
.gdb_history
.DS_Store
ehthumbs.db
Icon?
Thumbs.db

# Data files #
##############
*.dta
*.xpt
*.h5

# Generated Sources #
#####################
!skts.c
!np_datetime.c
!np_datetime_strings.c
*.c
*.cpp
.pytest_cache

UpSetPlot-0.8.0/.readthedocs.yaml
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-20.04
  tools:
    python: "3.9"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: doc/conf.py

# Optionally declare the Python requirements required to build your docs
python:
  install:
    - requirements: doc/requirements.txt

UpSetPlot-0.8.0/.travis.yml
language: generic
install:
  - test -x $HOME/miniconda/bin/conda || (sudo apt-get update; wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh; bash miniconda.sh -u -b -p $HOME/miniconda)
  - export PATH="$HOME/miniconda/bin:$PATH"
  - hash -r
  - conda config --set always_yes yes --set changeps1 no
  - conda update -q conda
  # Useful for debugging any issues with conda
  - conda info -a
  - conda create -c conda-forge -q -n test-environment $CONDA_DEPS $FIXED_CONDA_DEPS
  - source activate test-environment
  - python setup.py install
  - pip list
  - cp ci/matplotlibrc matplotlibrc

script:
  - pytest
  - pip install flake8  # fixes issue with missing typing import
  - flake8

after_success:
  - coveralls

env:
  global:
    - FIXED_CONDA_DEPS="flake8 pytest pytest-cov<2.6 coveralls"

matrix:
  include:
    - env: CONDA_DEPS="python=3.6 pandas=1.0 matplotlib=2.1.2 numpy=1.17"
    - env: CONDA_DEPS="python=3.10 pandas matplotlib seaborn"
    - env: CONDA_DEPS="python pandas matplotlib seaborn scikit-learn"  # extra deps to run examples

branches:
  only:
    - master
    - /^[0-9]+\.[0-9]+.*$/
UpSetPlot-0.8.0/CHANGELOG.rst
What's new in version 0.8
-------------------------

- Allowed ``show_percentages`` to be provided with a custom formatting string, for example to show more decimal places. (:issue:`194`)
- Added `include_empty_subsets` to `UpSet` and `query` to allow the display of all possible subsets. (:issue:`185`)
- `sort_by` and `sort_categories_by` now accept '-' prefix to their values to sort in reverse. 'input' and '-input' are also supported. (:issue:`180`)
- Added `subsets` attribute to QueryResult. (:issue:`198`)
- Fixed a bug where more than 64 categories could result in an error. (:issue:`193`)

What's new in version 0.7
-------------------------

- Added `query` function to support analysing set-based data.
- Fixed support for matplotlib >3.5.2 (:issue:`191`. Thanks :user:`GuyTeichman`)

What's new in version 0.6
-------------------------

- Added `add_stacked_bars`, similar to `add_catplot` but to add stacked bar charts to show discrete variable distributions within each subset. (:issue:`137`)
- Improved ability to control colors, and added a new example of same. Parameters ``other_dots_color`` and ``shading_color`` were added. ``facecolor`` will now default to white if ``matplotlib.rcParams['axes.facecolor']`` is dark. (:issue:`138`)
- Added `style_subsets` to colour intersection size bars and matrix dots in the plot according to a specified query. (:issue:`152`)
- Added `from_indicators` to allow yet another data input format. This allows category membership to be easily derived from a DataFrame, such as when plotting missing values in the columns of a DataFrame. (:issue:`143`)

What's new in version 0.5
-------------------------

- Support using input intersection order with ``sort_by=None`` (:issue:`133` with thanks to :user:`Brandon B `).
- Add parameters for filtering by subset size (with thanks to :user:`Sichong Peng `) and degree. (:issue:`134`)
- Fixed an issue where tick labels were not given enough space and overlapped category totals. (:issue:`132`)
- Fixed an issue where our implementation of ``sort_by='degree'`` apparently gave incorrect results for some inputs and versions of Pandas. (:issue:`134`)

What's new in version 0.4.4
---------------------------

- Fixed a regression which caused the first column to be hidden (:issue:`125`)

What's new in version 0.4.3
---------------------------

- Fixed issue with the order of catplots being reversed for vertical plots (:issue:`122` with thanks to :user:`Enrique Fernandez-Blanco `)
- Fixed issue with the x limits of vertical plots (:issue:`121`).

What's new in version 0.4.2
---------------------------

- Fixed large x-axis plot margins with high number of unique intersections (:issue:`106` with thanks to :user:`Yidi Huang `)

What's new in version 0.4.1
---------------------------

- Fixed the calculation of percentage which was broken in 0.4.0. (:issue:`101`)

What's new in version 0.4
-------------------------

- Added option to display both the absolute frequency and the percentage of the total for each intersection and category. (:issue:`89` with thanks to :user:`Carlos Melus ` and :user:`Aaron Rosenfeld `)
- Improved efficiency where there are many categories, but valid combinations are sparse, if `sort_by='degree'`. (:issue:`82`)
- Permit truthy (not necessarily bool) values in index. (:issue:`74` with thanks to :user:`ZaxR`)
- `intersection_plot_elements` can now be set to 0 to hide the intersection size plot when `add_catplot` is used. (:issue:`80`)

What's new in version 0.3
-------------------------

- Added `from_contents` to provide an alternative, intuitive way of specifying category membership of elements.
- To improve code legibility and intuitiveness, `sum_over=False` was deprecated and a `subset_size` parameter was added. It will have better default handling of DataFrames after a short deprecation period.
- `generate_data` has been replaced with `generate_counts` and `generate_samples`.
- Fixed the display of the "intersection size" label on plots, which had been missing.
- Trying to improve nomenclature, upsetplot now avoids "set" to refer to the top-level sets, which are now to be known as "categories". This matches the intuition that categories are named, logical groupings, as opposed to "subsets". To this end:

  - `generate_counts` (formerly `generate_data`) now names its categories "cat1", "cat2" etc. rather than "set1", "set2", etc.
  - the `sort_sets_by` parameter has been renamed to `sort_categories_by` and will be removed in version 0.4.

What's new in version 0.2.1
---------------------------

- Return a Series (not a DataFrame) from `from_memberships` if data is 1-dimensional.

What's new in version 0.2
-------------------------

- Added `from_memberships` to allow a more convenient data input format.
- `plot` and `UpSet` now accept a `pandas.DataFrame` as input, if the `sum_over` parameter is also given.
- Added an `add_catplot` method to `UpSet` which adds Seaborn plots of set intersection data to show more than just set size or total.
- Shading of subset matrix is continued through to totals.
- Added a `show_counts` option to show counts at the ends of bar plots. (:issue:`5`)
- Defined `_repr_html_` so that an `UpSet` object will render in Jupyter notebooks. (:issue:`36`)
- Fix a bug where an error was raised if an input set was empty.

UpSetPlot-0.8.0/LICENSE
New BSD License

Copyright (c) 2018 Joel Nothman.
All rights reserved.


Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

  a. Redistributions of source code must retain the above copyright notice,
     this list of conditions and the following disclaimer.
  b. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the distribution.
  c. The names of the contributors may not be used to endorse or promote
     products derived from this software without specific prior written
     permission.


THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.

UpSetPlot-0.8.0/README.rst
UpSetPlot documentation
============================

|version| |licence| |py-versions| |issues| |build| |docs| |coverage|

This is another Python implementation of UpSet plots by Lex et al. [Lex2014]_.
UpSet plots are used to visualise set overlaps, like Venn diagrams but
more readable. Documentation is at https://upsetplot.readthedocs.io.

This ``upsetplot`` library tries to provide a simple interface backed by an
extensible, object-oriented design.

There are many ways to represent the categorisation of data, as covered in
our `Data Format Guide `_.

Our internal input format uses a `pandas.Series` containing counts
corresponding to subset sizes, where each subset is an intersection of named
categories.
The index of the Series indicates which rows pertain to which
categories, by having multiple boolean indices, like ``example`` in the
following::

    >>> from upsetplot import generate_counts
    >>> example = generate_counts()
    >>> example
    cat0   cat1   cat2
    False  False  False      56
                  True      283
           True   False    1279
                  True     5882
    True   False  False      24
                  True       90
           True   False     429
                  True     1957
    Name: value, dtype: int64

Then::

    >>> from upsetplot import plot
    >>> plot(example)  # doctest: +SKIP
    >>> from matplotlib import pyplot
    >>> pyplot.show()  # doctest: +SKIP

makes:

.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_generated_001.png
   :target: ../auto_examples/plot_generated.html

And you can save the image in various formats::

    >>> pyplot.savefig("/path/to/myplot.pdf")  # doctest: +SKIP
    >>> pyplot.savefig("/path/to/myplot.png")  # doctest: +SKIP

This plot shows the cardinality of every category combination seen in our data.
The leftmost column counts items absent from any category. The next three
columns count items only in ``cat0``, ``cat1`` and ``cat2`` respectively, with
following columns showing cardinalities for items in each combination of
exactly two categories. The rightmost column counts items in all three
categories.

Rotation
........

We call the above plot style "horizontal" because the category intersections
are presented from left to right. `Vertical plots `__
are also supported!

.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_vertical_001.png
   :target: http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_vertical.html

Distributions
.............

Providing a DataFrame rather than a Series as input allows us to expressively
`plot the distribution of variables `__
in each subset.

.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_boston_001.png
   :target: http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_boston.html

Loading datasets
................
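The internal format shown above is an ordinary pandas Series, so it can also be
assembled by hand. The following is a minimal sketch, assuming only pandas
(none of the ``from_*`` helpers): it rebuilds the counts printed by
``generate_counts()`` using ``pandas.MultiIndex.from_product`` with one boolean
level per category, then reads off a single subset size and the per-category
totals with plain boolean masking.

```python
import pandas as pd

# One boolean level per category; from_product yields every combination,
# with the last level (cat2) varying fastest, matching the order above.
index = pd.MultiIndex.from_product(
    [[False, True]] * 3, names=["cat0", "cat1", "cat2"]
)

# Subset sizes, listed in the same order as the index combinations.
example = pd.Series(
    [56, 283, 1279, 5882, 24, 90, 429, 1957], index=index, name="value"
)

# Size of a single subset: items in cat1 and cat2 but not in cat0.
print(example.loc[(False, True, True)])  # 5882

# Category totals: sum the subsets whose level for that category is True.
totals = {
    name: int(
        example[example.index.get_level_values(name).to_numpy(dtype=bool)].sum()
    )
    for name in example.index.names
}
print(totals)
```

Hand-built input like this is handy for quick experiments, but the helper
functions are the more stable route, since the internal format may change in
future versions.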
While the dataset above is randomly generated, you can prepare your own dataset
for input to upsetplot. A helpful tool is `from_memberships`, which allows
us to reconstruct the example above by indicating each data point's category
membership::

    >>> from upsetplot import from_memberships
    >>> example = from_memberships(
    ...     [[],
    ...      ['cat2'],
    ...      ['cat1'],
    ...      ['cat1', 'cat2'],
    ...      ['cat0'],
    ...      ['cat0', 'cat2'],
    ...      ['cat0', 'cat1'],
    ...      ['cat0', 'cat1', 'cat2'],
    ...      ],
    ...      data=[56, 283, 1279, 5882, 24, 90, 429, 1957]
    ...      )
    >>> example
    cat0   cat1   cat2
    False  False  False      56
                  True      283
           True   False    1279
                  True     5882
    True   False  False      24
                  True       90
           True   False     429
                  True     1957
    dtype: int64

See also `from_contents`, another way to describe categorised data, and
`from_indicators` which allows each category to be indicated by a column in
the data frame (or a function of the column's data such as whether it is a
missing value).

Installation
------------

To install the library, you can use `pip`::

    $ pip install upsetplot

Installation requires:

* pandas
* matplotlib >= 2.0
* seaborn to use `UpSet.add_catplot`

It should then be possible to::

    >>> import upsetplot

in Python.

Why an alternative to py-upset?
-------------------------------

Probably for petty reasons. It appeared `py-upset `_
was not being maintained.  Its input format was undocumented, inefficient
and, IMO, inappropriate.  It did not facilitate showing plots of each subset's
distribution as in Lex et al.'s work introducing UpSet plots. Nor did it
include the horizontal bar plots illustrated there. It did not support Python 2.
I decided it would be easier to construct a cleaner version than to fix it.

References
----------

.. [Lex2014] Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister,
   *UpSet: Visualization of Intersecting Sets*,
   IEEE Transactions on Visualization and Computer Graphics (InfoVis '14), vol. 20, no. 12, pp. 1983–1992, 2014.
   doi: `doi.org/10.1109/TVCG.2014.2346248 `_

.. |py-versions| image:: https://img.shields.io/pypi/pyversions/upsetplot.svg
   :alt: Python versions supported

.. |version| image:: https://badge.fury.io/py/UpSetPlot.svg
   :alt: Latest version on PyPi
   :target: https://badge.fury.io/py/UpSetPlot

.. |build| image:: https://travis-ci.org/jnothman/UpSetPlot.svg?branch=master
   :alt: Travis CI build status
   :scale: 100%
   :target: https://travis-ci.org/jnothman/UpSetPlot

.. |issues| image:: https://img.shields.io/github/issues/jnothman/UpSetPlot.svg
   :alt: Issue tracker
   :target: https://github.com/jnothman/UpSetPlot

.. |coverage| image:: https://coveralls.io/repos/github/jnothman/UpSetPlot/badge.svg
   :alt: Test coverage
   :target: https://coveralls.io/github/jnothman/UpSetPlot

.. |docs| image:: https://readthedocs.org/projects/upsetplot/badge/?version=latest
   :alt: Documentation Status
   :scale: 100%
   :target: https://upsetplot.readthedocs.io/en/latest/?badge=latest

.. |licence| image:: https://img.shields.io/badge/Licence-BSD-blue.svg
   :target: https://opensource.org/licenses/BSD-3-Clause

UpSetPlot-0.8.0/ci/matplotlibrc
backend : Agg

UpSetPlot-0.8.0/conftest.py
import sys

import pytest
from _pytest.doctest import DoctestItem


def pytest_runtest_setup(item):
    if isinstance(item, DoctestItem):
        if sys.version_info.major < 3 or (sys.version_info.major == 3 and
                                          sys.version_info.minor < 6):
            pytest.skip('Doctests are disabled in Python < 3.6')

UpSetPlot-0.8.0/doc/Makefile
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
PAPER         =
BUILDDIR      = _build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make ' where is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	-rm -rf $(BUILDDIR)/*
	-rm -rf auto_examples/
	-rm -rf _modules/*

html:
	# These two lines make the build a bit more lengthy, and the
	# embedding of images more robust
	rm -rf $(BUILDDIR)/html/_images
	#rm -rf _build/doctrees/
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/project-template.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/project-template.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/project-template"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/project-template"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."

UpSetPlot-0.8.0/doc/api.rst

API Reference
.............

.. currentmodule:: upsetplot

Plotting
--------

.. autofunction:: plot

.. autoclass:: UpSet
   :members:

Dataset loading and generation
------------------------------

.. autofunction:: from_contents

.. autofunction:: from_indicators

.. autofunction:: from_memberships

.. autofunction:: generate_counts

.. autofunction:: generate_samples

Data querying and transformation
--------------------------------

.. autofunction:: query

UpSetPlot-0.8.0/doc/changelog.rst

Changelog
.........

.. include:: ../CHANGELOG.rst

UpSetPlot-0.8.0/doc/conf.py
# -*- coding: utf-8 -*-
#
# project-template documentation build configuration file, created by
# sphinx-quickstart on Mon Jan 18 14:44:12 2016.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import sys
import os
import re

# project root
sys.path.insert(0, os.path.abspath('..'))

import matplotlib  # noqa
matplotlib.use('agg')

import sphinx_rtd_theme  # noqa
from upsetplot import __version__ as release  # noqa

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.

# -- General configuration ---------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx_gallery.gen_gallery',
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.doctest',
    'sphinx.ext.intersphinx',
    'sphinx.ext.todo',
    'numpydoc',
    'sphinx.ext.ifconfig',
    'sphinx.ext.viewcode',
    'sphinx_issues',
    'nbsphinx',
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
# source_encoding = 'utf-8-sig'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'upsetplot'
copyright = u'2018-2022, Joel Nothman'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = re.match(r'^\d+(\.\d+)*', release).group()
# version = upsetplot.__version__
# The full version, including alpha/beta/rc tags.
# release = version

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']

# The reST default role (used for this markup: `text`) to use for all
# documents.
default_role = 'any'

# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []

# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False


# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
html_theme = 'sphinx_rtd_theme'

# Theme options are theme-specific and customize the look and feel of a theme
# further.  For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# The name for this set of Sphinx documents.  If None, it defaults to
# " v documentation".
# html_title = None

# A shorter title for the navigation bar.  Default is the same as html_title.
# html_short_title = None

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None

# The name of an image file (within the static path) to use as favicon of the
# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
# html_extra_path = []

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# html_last_updated_fmt = '%b %d, %Y'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True

# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}

# If false, no module index is generated.
# html_domain_indices = True

# If false, no index is generated.
# html_use_index = True

# If true, the index is split into individual pages for each letter.
# html_split_index = False

# If true, links to the reST sources are added to the pages.
# html_show_sourcelink = True

# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True

# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True

# If true, an OpenSearch description file will be output, and all pages will
# contain a tag referring to it.  The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''

# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None

# Output file base name for HTML help builder.
htmlhelp_basename = 'project-templatedoc'


# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    # 'papersize': 'letterpaper',

    # The font size ('10pt', '11pt' or '12pt').
    # 'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    # 'preamble': '',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
#  author, documentclass [howto, manual, or own class]).
latex_documents = [
    ('index', 'upsetplot.tex', u'upsetplot Documentation',
     u'Joel Nothman', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None

# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False

# If true, show page references after internal links.
# latex_show_pagerefs = False

# If true, show URL addresses after external links.
# latex_show_urls = False

# Documents to append as an appendix to all manuals.
# latex_appendices = []

# If false, no module index is generated.
# latex_domain_indices = True

# Documents to append as an appendix to all manuals.
# texinfo_appendices = []

# If false, no module index is generated.
# texinfo_domain_indices = True

# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'

# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False


# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
    'python': ('http://docs.python.org/', None),
    'numpy': ('https://docs.scipy.org/doc/numpy/', None),
    'matplotlib': ('https://matplotlib.org/', None),
    'pandas': ('https://pandas.pydata.org/pandas-docs/stable/', None),
}

# Config for sphinx_issues
issues_uri = 'https://github.com/jnothman/upsetplot/issues/{issue}'
issues_github_path = 'jnothman/upsetplot'
issues_user_uri = 'https://github.com/{user}'

sphinx_gallery_conf = {
    # path to your examples scripts
    'examples_dirs': '../examples',
    # path where to save gallery generated examples
    'gallery_dirs': 'auto_examples',
    'backreferences_dir': '_modules',
}
UpSetPlot-0.8.0/doc/formats.ipynb000066400000000000000000023350301435554746600167320ustar00rootroot00000000000000{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Format Guide" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "UpSetPlot is fundamentally about visualizing data points (or data aggregates) that are each assigned to one or more categories. Curiously, there are many ways to represent categories as data structures. If object 1 belongs to categories `A` and `B` and object 2 belongs to category `B` only, this information can be represented by:\n", "\n", "* listing the memberships for each object, i.e.\n", " ```\n", " [[\"A\", \"B\"], # object 1\n", " [\"B\"]] # object 2\n", " ```\n", "* listing the contents of each category, i.e.\n", " ```\n", " {\"A\": [1], \"B\": [1, 2]}\n", " ```\n", "* using a boolean-valued indicator matrix (perhaps columns in a larger DataFrame), i.e.\n", " ```\n", " # A B\n", " [[ True, True ], # object 1\n", " [ False, True ]] # object 2\n", " ```\n", "\n", "Moreover, UpSetPlot aims to handle both of the following cases:\n", "\n", "* where only aggregates (e.g.
counts) of the values in each category subset are given; and\n", "* where there are data points with several attributes in each category subset, and these attributes can be visualized as well as aggregates.\n", "\n", "This guide reviews the internal data format and alternative representations, but we recommend using the helper functions [`from_memberships`](api.html#upsetplot.from_memberships), [`from_contents`](api.html#upsetplot.from_contents) or [`from_indicators`](api.html#upsetplot.from_indicators), depending on which is most convenient for expressing your data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Internal data format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "UpSetPlot internally works with data based on [Pandas](https://pandas.pydata.org/) data structures: a Series when all you care about is counts, or a DataFrame when you're interested in visualising additional properties of the data, such as with the `UpSet.add_catplot` method.\n", "\n", "UpSetPlot expects the Series or DataFrame to have a MultiIndex as input, where this index serves as an indicator matrix. Specifically, each category is a level in the `pandas.MultiIndex` with boolean values.\n", "\n", "Note: This internal data format may change in a future version since it is not efficient. Using the `from_*` functions will provide more stable compatibility with future releases."
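] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of this internal format, the following cell builds the two-object example from the top of this guide by hand. It assumes only `pandas` and `upsetplot.UpSet`; in practice the `from_*` helpers construct such a boolean MultiIndex for you.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from upsetplot import UpSet\n", "\n", "# One boolean index level per category (A, B); each row is one combination of memberships.\n", "index = pd.MultiIndex.from_tuples(\n", "    [(True, True),    # object 1: in A and B\n", "     (False, True)],  # object 2: in B only\n", "    names=['A', 'B'])\n", "\n", "# The values are the subset sizes: here, one object in each subset.\n", "counts = pd.Series([1, 1], index=index)\n", "ax_dict = UpSet(counts).plot()"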
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use `Series` as input" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a minimal example using `Series` as input: \n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:32.057148Z", "start_time": "2021-07-31T14:30:31.618795Z" } }, "outputs": [ { "data": { "text/plain": [ "cat0 cat1 cat2 \n", "False False False 56\n", " True 283\n", " True False 1279\n", " True 5882\n", "True False False 24\n", " True 90\n", " True False 429\n", " True 1957\n", "Name: value, dtype: int64" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from upsetplot import generate_counts\n", "example_counts = generate_counts()\n", "example_counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a `pandas.Series` with a 3-level MultiIndex. Each level is a category: `cat0`, `cat1`, and `cat2`. Each row is a unique subset, and its boolean index values indicate membership in each category. The value in each row is the number of observations in that subset. `upsetplot` will simply plot these numbers when supplied with a `Series`: \n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:32.241823Z", "start_time": "2021-07-31T14:30:32.059689Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from upsetplot import UpSet\n", "ax_dict = UpSet(example_counts).plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can supply a `Series` with each observation in a row: \n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:32.255597Z", "start_time": "2021-07-31T14:30:32.244146Z" } }, "outputs": [ { "data": { "text/plain": [ "cat0 cat1 cat2\n", "False True True 1.652317\n", " True 1.510447\n", " False True 1.584646\n", " True 1.279395\n", " True True 2.338243\n", " ... \n", " True 1.701618\n", " True 1.577837\n", "True True True 1.757554\n", "False True True 1.407799\n", "True True True 1.709067\n", "Name: value, Length: 10000, dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from upsetplot import generate_samples\n", "example_values = generate_samples().value\n", "example_values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, we can use `subset_size='count'` to have `upsetplot` count the number of observations in each unique subset and plot them: \n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:32.471669Z", "start_time": "2021-07-31T14:30:32.258639Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from upsetplot import UpSet\n", "ax_dict = UpSet(example_values, subset_size='count').plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, we can weight each subset's size by the series value:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:32.677424Z", "start_time": "2021-07-31T14:30:32.474115Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
LwWGWOxakSaN26tdkKrWXY/UyNRMbJTcBmHz8CilT1KwAReQE4DvhSRNqo6kbv83+TF78YaB+VPh+nUij29iuH74OqPgI8As5coVm7r13Y/cwcpiLIPr4ACkWkoTfqPwD4DJgFjPLijAJmevuzgHNFpL6IdMINZi301AjbRKTQy2dkVBrDMNKA9WCzDFX9QESmAx8BJcBiXO+yMfC8iFyCE8JnefGXicjzwHIv/uWqutfL7jLgSeAA4DVvMwwjTZhHAyMpzKNBzcfMFVY75tHAMAyjujEBaxiGkSFMwBqGYWQIE7CGYRgZwgSsYRhGhjABaxiGkSFMwBqGYWQIE7CGYRgZwgRsyIhIBxH5kbd/gIg0CbtOhmGkBxOwISIiPwemA3/2gvKBF8OrkWEY6cQEbLhcDhwPbAVQ1VWU23E1DCPLMQEbLrtUdXfkwPNQYAvDDaOWYAI2XOaJyG9x7l8GAn8FXgq5ToZhpAkTsOFyPfAV8AnwC+BVVb0h3CoZhpEuzB5suJyPc0j4l0iAiAxR1ZdDrJNhGGnCerDh8gDwroh0iwq7OazKGIaRXkzAhksRcDEwXUTO8sKSs5ZsGEaNxVQE4aKq+pGInAQ8IyLHADlhV8owjPRgPdhw2Qigql8Dg3BTtI4MtUaGYaQNE7Ahoqo/idovVdVrVdXuiWHUEkxFEAIicq+q/kpEXiLGwgJVHRpCtQzDSDMmYMNhqvd7Z6i1MAwjo9jnaAio6ofe77zIBiwFvvX2EyIizURkuoj8S0Q+E5FjRaSFiLwlIqu83+ZR8X8jIqtFZIWIDIoK7yMin3jn7pdk/T0bhpEQE7AhIiJzRSRPRFoAHwNPiMjdPpLeB7yuql2BnsBnuFVhc1S1CzDHO0ZEugPnAgXAYOBhEYnMVJgEjAW6eNvgtDXOMAwTsCHTVFW3AmcAT6hqH+BHiRKISB7wQ+AxAFXdraqbgWHAFC/aFOB0b38YbrXYLlUtAlYD/USkDZCnqgtUVYGnotIYhpEGTMCGS64n6M4G/C6P7YyzX/CEiCwWkUdFpBHQWlUj0742Um72sB2wLip9sRfWztuvHG4YRpqwQa5wuRl4A5ivqv8Ukc7AqirS5AK9gStV9QMRuQ9PHRCHWHpVTRC+bwYiY3GqBFq3bs3cuXOrqKKRTdj9TI3+/fvHPWcCNkRU9a84E4WR4zXA8CqSFQPFqvqBdzwdJ2C/FJE2qrrR6xVviorfPip9PrDBC8+PER6rno8AjwD07dtXE/2hjOzD7mfmMBVBlqGq/wHWicgRXtAAYDkwCxjlhY0CZnr7s4BzRaS+iHTCDWYt9NQI20Sk0Js9MDIqjWEYacB6sNnJlcA0EakHrAFG416Wz4vIJcAXwFkAqrpMRJ7HCeES4HJV3evlcxnwJHAA8Jq3GYaRJsQNIBuGP/r27auLFi0KuxpGApKdzmwyIGXiXnDrwYaIiNTH6Vw7EnUvVNVswhpGLcAEbLjMBLYAHwK7Qq6LYRhpxgRsuOSrqq2eMoxais0iCJf3RKRH2JUwDCMzWA82XE4ALhKRIpyKQHBeDo4Kt1qGYaQDE7DhclrYFTAMI3OYiiBEVPVzoBnwU29r5oUZhlELMAEbIiIyDpiGM8zSCvg/Ebky3FoZhpEuTEUQLpcAx6jqDgARuR1YADwQaq0Mw0gL1oMNFwH2Rh3vJcGqEMMwsgvrwYbLE8AHIvI37/h0PEPahmFkPyZgQ0RV7xaRubjpWgKMVtXF4dbKMIx0YQI2BEQkT1W3er641npb5FwLVf0mrLoZhpE+TMCGw9PAEJwNgmhTRuIddw6jUoZhpBcTsCGgqkO8305h18UwjMxhswhCRETm+AkzDCM7MQEbAiLSwNO/HiQizUWkhbd1BNqGWzvDqF18//339OvXj549e1JQUMDEiRMBWLJkCYWFhfTq1Yu+ffuycOFCAPbs2cOoUaPo0aMH3bp149Zbby3L68MPP6RHjx4cdthhXHXVVVUbK1dV26
p5A8YBEQMva7z9IuBj4Iqw65do69Onjxo1G5we3/dW2yktLdVt27apquru3bu1X79+umDBAh04cKC++uqrqqr6yiuv6EknnaSqqtOmTdNzzjlHVVV37NihHTp00KKiIlVV/cEPfqDvvfeelpaW6uDBgyPp4z4v1oMNAVW9T53+dbyqdlbVTt7WU1UfDLt+hlGbEBEaN24MuN7pnj17EBFEhK1btwKwZcsW2rZtWxZ/x44dlJSUsHPnTurVq0deXh4bN25k69atHHvssYgII0eO5MUXX0xYtgnYcCkVkWaRA09d8MswK2QYtZG9e/fSq1cvWrVqxcCBAznmmGO49957ufbaa2nfvj3jx48vUwWceeaZNGrUiDZt2nDIIYcwfvx4WrRowfr168nPL/d0n5+fz/r16xOWawI2XH6uqpsjB6r6LfDzEOtjGLWSnJwclixZQnFxMQsXLuTTTz9l0qRJ3HPPPaxbt4577rmHSy65BICFCxeSk5PDhg0bKCoq4q677mLNmjUR9V4FqnIwaQI2XOpI1B0SkRygXoj1MYxaTbNmzejfvz+vv/46U6ZM4YwzzgDgrLPOKhvkevrppxk8eDB169alVatWHH/88SxatIj8/HyKi4vL8iouLi5TK8TDBGy4vAE8LyIDROQU4BngdT8JRSRHRBaLyMvecQsReUtEVnm/zaPi/kZEVovIChEZFBXeR0Q+8c7dL1W9jg0jC/nqq6/YvNl9KO7cuZPZs2fTtWtX2rZty7x58wB4++236dKlCwCHHHIIb7/9NqrKjh07eP/99+natStt2rShSZMmvP/++6gqTz31FMOGDUtYti00CJfrgF8Al+FWcb0JPOoz7TjgMyDPO74emKOqt4nI9d7xdSLSHTgXKMBNAZstIoer6l5gEjAWeB94FRgMvJaOhhlGTWHjxo2MGjWKvXv3Ulpaytlnn82QIUNo1qwZ48aNo6SkhAYNGvDII48AcPnllzN69GiOPPJIVJXRo0dz1FHOi9OkSZO46KKL2LlzJ6eddhqnnZbYKYnE0isY1YeIHAAcoqorkkiTD0wBbgGuUdUhIrIC6K+qG0WkDTBXVY8Qkd8AqOqtXto3gJtw9g/+rqpdvfARXvpfJCq7b9++umjRomSbaVQjyX6ImAxImbgX3HqwISIiQ4E/4fSunUSkF3Czqg6tIum9wASgSVRYa1XdCOAJ2VZeeDtcDzVCsRe2x9uvHB6rnmNxPV1at27N3Llzq26ckTXY/UyN/v37xz1nAjZcJgL9gLkAqrrEW80VFxEZAmxS1Q9FpL+PMmK9XTVB+L6Bqo8Aj4DrwSb6QxnZh93PzGECNlxKVHVLkp90xwNDReTHQAMgT0T+D/hSRNpEqQg2efGLgfZR6fOBDV54foxww9gvqA5Vis0iCJdPReQ8IEdEuojIA8B7iRKo6m9UNV9VO+IGr95W1QuAWcAoL9ooYKa3Pws4V0Tqi0gnoAuw0FMnbBORQm/2wMioNIZhpAETsOFyJW50fxduitZW4FcB87oNGCgiq4CB3jGqugx4HliOmwJ2uTeDANzshUeB1cC/sRkEhpFWbBZBDcFbZNBIVbeGXZdE2CyCmo/NIvBHGq9T3IysBxsiIvK0iOSJSCNgGbBCRK4Nu16GYaQHE7Dh0t3rsZ6Om+h/CHBhuFUyDCNdmIANl7oiUhcnYGeq6h7iTJUyDCP7MAEbLpNxK6oaAe+ISAfcQJdhGLUAmwcbEiJSB/hSVdtFhX0BnBxerQzDSCfWgw0JVS0FrqgUpqpaElKVDMNIMyZgw+UtERkvIu2jHB+2CLtShmGkB1MRhMvF3u/lUWEKdA6hLoZhpBkTsCHiOT40DKOWYiqCEBGRhiJyo4g84h138axlGYZRCzABGy5PALuB47zjYuAP4VXHMIx0YgI2XA5V1Ttwxq9R1Z0kWNdsGEZ2YQI2XHZ7LmMUQEQOxVnWMgyjFmCDXOFyE86EYHsRmYYzpj061BoZhpE2TM
CGiKq+KSIfAoU41cA4Vf065GoZhpEmTEUQIiIyR1X/q6qvqOrLqvq1iMwJu16GYaQH68GGgIg0ABoCB4lIc8oHtvKAtqFVzDCMtGICNhx+gXMN0xb4kHIBuxV4KKxKGYaRXkzAhoCq3gfcJyJXquoDYdfHMIzMYAI2RFT1ARE5DuhI1L1Q1adCq5RhGGnDBGyIiMhU4FBgCRDx9KqACVjDqAWYgA2Xvji/XOYmxjBqITZNK1w+BQ5OJoFnO/bvIvKZiCwTkXFeeAsReUtEVnm/zaPS/EZEVovIChEZFBXeR0Q+8c7dL8n6MTYMIyEmYMPlIGC5iLwhIrMiWxVpSoBfq2o33AKFy0WkO3A9MEdVuwBzvGO8c+cCBcBg4GERyfHymgSMBbp42+D0Ns8w9m9MRRAuNyWbQFU3Ahu9/W0i8hnQDhgG9PeiTQHmAtd54c+q6i6gSERWA/1EZC2Qp6oLAETkKZx329eCN8cwjGhMwIaIqs5LJb2IdASOBj4AWnvCF1XdKCKtvGjtgPejkhV7YXu8/crhscoZi+vp0rp1a+bOnZtKtY0aht1Pf8S7Tv3794+fSFVtq+YN2IZbVFB52wZs9ZlHY9wihTO8482Vzn/r/T4EXBAV/hgwHPgBMDsq/ETgparK7dOnj2aa0aNHa8uWLbWgoKAsbPz48XrEEUdojx499PTTT9dvv/22QprPP/9cGzVqpH/6059UVXXHjh364x//WI844gjt3r27XnfddRmvd00BNxPF97a/ksbrFPd5MR1sCKhqE1XNi7E1UdW8qtKLSF1gBjBNVV/wgr8UkTbe+TbAJi+8GGgflTwf2OCF58cID52LLrqI119/vULYwIED+fTTT1m6dCmHH344t956a4XzV199NaeddlqFsPHjx/Ovf/2LxYsX849//IPXXjPth1G9mIDNMryR/seAz1T17qhTs4BR3v4oYGZU+LkiUl9EOuEGsxaqUydsE5FCL8+RUWlC5Yc//CEtWlR0rnvqqaeSm+s0WoWFhRQXl2s3XnzxRTp37kxBQUFZWMOGDTn55JMBqFevHr17966QxjCqAxOw2cfxwIXAKSKyxNt+DNwGDBSRVcBA7xhVXQY8DyzH2Z69XFUjixouAx4FVgP/JksGuB5//PGy3uqOHTu4/fbbmThxYtz4mzdv5qWXXmLAgAHVVUXDAGyQK+tQ1fnEdysTU4Ko6i3ALTHCFwFHpq92meeWW24hNzeX888/H4CJEydy9dVX07hx45jxS0pKGDFiBFdddRWdO5s3dKN6MQFrZA1Tpkzh5ZdfZs6cOUTWRHzwwQdMnz6dCRMmsHnzZurUqUODBg244oorABg7dixdunThV7/6VZhVN/ZTTMAaWcHrr7/O7bffzrx582jYsGFZ+Lvvvlu2f9NNN9G4ceMy4XrjjTeyZcsWHn300Wqvr2GA6WCNGsiIESM49thjWbFiBfn5+Tz22GNcccUVbNu2jYEDB9KrVy8uvfTShHkUFxdzyy23sHz5cnr37k2vXr1M0BrVjrjpYIbhj759++qiRYvCroaRgGRNSuyvMiCN1yluRtaDNQzDN5s3b+bMM8+ka9eudOvWjQULFvC73/2Oo446il69enHqqaeyYYObTr127VoOOOAAevXq5eurozZiPVgjKawHW/PJZA921KhRnHjiiYwZM4bdu3fz3XffUadOHfLy3PqY+++/n+XLlzN58mTWrl3LkCFD+PTTT5OqT3VRHT1YG+Qysgb79A2XrVu38s477/Dkk08CbgFHvXr1KsTZsWNH0vepNmMqAsMwfLFmzRpatmzJ6NGjOfrooxkzZgw7duwA4IYbbqB9+/ZMmzaNm2++uSxNUVERRx99NCeddFKFGR/7CyZgDcPwRUlJCR999BGXXXYZixcvplGjRtx2222AWwCybt06zj//fB588EEA2rRpwxdffMHixYu5++67Oe+889i6dWuYTah2TMAahuGL/Px88vPzOeaYYwA488
wz+eijjyrEOe+885gxYwYA9evX58ADDwSgT58+HHrooaxcubJ6Kx0yJmCN/Z5169Zx8skn061bNwoKCrjvvvsAOOecc8pGwDt27EivXr3K0ixdupRjjz2WgoICevTowffffx9W9auNgw8+mPbt27NixQoA5syZQ/fu3Vm1alVZnFmzZtG1a1cAvvrqK/budWYv1qxZw6pVq/a75co2yGXs9+Tm5nLXXXfRu3dvtm3bRp8+fRg4cCDPPfdcWZxf//rXNG3aFHCfyhdccAFTp06lZ8+e/Pe//6Vu3bphVb9aeeCBBzj//PPZvXs3nTt35oknnmDMmDGsWLGCOnXq0KFDByZPngzAO++8w+9//3tyc3PJyclh8uTJ+1hJq/UkMhZrm22Vt+owuB0PqsmQ9NChQ/XNN98sOy4tLdX8/HxduXKlqqq+8sorev7556fcnkxRXdcpE8Qytn7jjTdqjx49tGfPnjpw4EBdv359hTSVja37JY3XKe7zYioCw4hi7dq1LF68uEzPCM7eQevWrenSpQsAK1euREQYNGgQvXv35o477girurWOWMbWr732WpYuXcqSJUsYMmRIhVkKENvYek3BBKyRNuLpMpcsWUJhYSG9evWib9++LFy4MOSaxmb79u0MHz6ce++9t2ziPMAzzzzDiBEjyo5LSkqYP38+06ZNY/78+fztb39jzpw5vsuJd52uvfZaunbtylFHHcXPfvYzNm/enL7GZQmxjK1H34vK82xjGVuvUSTq3tpmW+UtkYpgw4YN+uGHH6qq6tatW7VLly66bNkyHThwoL766quq6j6vTzrppLh5JIIMfvru3r1bTz31VL3rrrsqhO/Zs0dbtWql69atKwt75plndNSoUWXHN998s95xxx2+y4p3nd544w3ds2ePqqpOmDBBJ0yYkFQbImTyOgXJP9kyioqKKqgIVFV/+9vfan5+vhYUFOimTZtUVXX79u1aWFio27Zt04kTJ5qKwKjdtGnTht69ewPQpEkTunXrxvr16xGRsvmPW7ZsoW3btmFWcx9UlUsuuYRu3bpxzTXXVDg3e/ZsunbtSn5+ufuyQYMGsXTpUr777jtKSkqYN28e3bt3911evOuUyC3O/k6sebZVGVuvESSSvrbZVnnzO8hVVFSk7du31y1btujy5cu1ffv2mp+fr23bttW1a9f6yqMyZKjX9O677ypQNpDSs2dPfeWVV1RVddSoUTpp0qR90kydOlW7d++uBQUFeu211wZqj2rF6xTNkCFDdOrUqYHyzNR1Cpp/smXE6sFGWLt2bdm5E044QTt06KAdOnTQpk2bavPmzfWBBx7IWDsSZRVvC/2BtS27Nj8Cdtu2bdq7d2+dMWOGqqpeeeWVOn36dFVVfe6553TAgAFV5hGLTAuO6qbydYrwhz/8QU8//XQtLS0NlG9tE7CR2Ruqqvfff78OHz58nzQ1VUUQ+gNrW3ZtVQnYWLrMvLy8MmFRWlqqTZo0SZhHPGqTgI2n833yySe1sLBQd+zYETjvbBaw5557rh588MGam5ur7dq100cffVTPOOMMLSgo0B49euiQIUO0uLh4n3QmYG2rFVsiAVtaWqoXXnihjhs3rkJ4165d9e9//7uqqs6ePVt79+4dN49E1BYBG+86vfbaa9qtW7eyQZygZLOArU6qQ8CaPVgjKRLZg50/fz4nnngiPXr0oE4dN376xz/+kby8PMaNG0dJSQkNGjTg4Ycfpk+fPkmXXR3mCqujjHjX6aqrrmLXrl1l6/cLCwvLVkUlQ6bbEMQcYU2UM9VhD9YE7H6OiAwG7gNygEdV9bZE8cM0uF1bBGymMQHrDzO4bWQUEckBHgIGAsXAP0VklqouD7dmxv5ObXjRga3k2t/pB6xW1TWquht4FhgWcp0Mo9ZgPdj9m3bAuqjjYuCYypFEZCww1jtcBhyZ+artS3X0UmpqTygZMt2G2nIfqqMME7D7N7G+w/b516nqI8Ajma+OYdQuTE
Wwf1MMtI86zgc2hFQXw6h1mIDdv/kn0EVEOolIPeBcYFbIdTKMWoOpCPZjVLVERK4A3sBN03pcVZeFXC3DqDXYPFjDMIwMYSoCwzCMDGEC1jAMI0OYgDUMw8gQJmANwzAyhAlYwzCMDGEC1jAMI0OYgDUMw8gQtVbAzpw58/Ww62AYxv5NrRWwwEFhV8AwjP2b2ixgDcMwQqXG2yIQkSZB0s2YMaNO0LSGYRh+UdVt8c5ZD9YwDCND1PgerJG15AJNcVa6SoCt3m86aQQ0xBkO/94rI53UwbWhLlAKbAN2pbmMw4EhQGOcfd4XgM1pLqMJ0MDb3wlsT3P+BwLDgYNx92AWsCbNZfQBBgD1vbxfwLUlXdTDuUs6Avc/nQcsSDVTE7CJSeWBzUtbLbKLOjhXNJXbfzCwBWfQO1UTbg29MupVCi8BviQ9AqolbqA0+iuvNfAdThDuSTH/tsBkoH+l8FuBR4Hf4YR6KuThrnvdSuG7gY2kLmjrAbcDF1LxXvwPzgTmZcB/UyyjK+469a4UfhtwF3BvivkDjAFuwL0oIvwW5x7pcuCjoBnXeHOFVehR6wDv4B7as72wXwBjDz744Pb/+c9/JgO/986Ni0p3JHACsBp4CugM7AVeAyZGxTMBmxyCu5YNEsT5DihKoYyGQEcSuErGCY9vUiijDdAiwfkSXC8qqJBtCczBtSMefwUuCZg/uJ53foLzivPHFld/WAV1gGeA0xLEWYHzWBz0hdcFeIvE9+IunEAPyq+AmxOc3w78GFgSL0Jt1sH+EncTI5wI/AQ49qGHHvoMuN8Lfx443tt+DnwOfOKdux/3+XE8UIj7Q+wPnEBFB4f1gCdxf6S3gUMC5NmCxMIVnIBsHiDvCG1JLFzB9TSD/rcPIPEDDe7Lr3XA/AGuJ7FwBTgLODmFMtpUcV58xEnEUBILV3Cf21elUMYfqfpeXAMcFjD/1rgOWCIaA3cEzD+rBWxbYBAwJSpsDHA37hMI4OsY6c4Cpnv7O4F3vf09OOHSNu01rZmcSEUBOxLX0+gFPETit3o8qnoYko1XmYY4HVxV1AGaBSzDb93ycPrlZGmEc83jhzEB8gfXdj91q4vTzwbBb91GEkwVeQj+OjsCXBwgf4CL8Fe3QgJ6Us5mAXs7++qpDgOOA96eMGFCF/bV2wCcgfv8qkxT3Bt5XprrWd2MwCnn38N5gj0N1yOdjxt8aIn7816C0y/9AzgW1/N/2svjRfbVDVZFDvvqRONRVS83Hg0zFDdIOsH1dpOlG/6FWr8A+UNy9QrSBvBft1ZApwD598W/fAp6nZJJ94MgBWSrgB0MfMW+epFc3Nv7lJEjR66nYu8W3E3bCXxWKTwHeBynTF+b7spWI12B8ThheRxwHU7YnoJTCUzH6Zy+AB7D9VSP9+K0xQ3egNNHbyG5nmZVn+2pxq8tJPPMBX0+k7m2Qe9DMnULUkZ1XKeMl5GtswgKcYrnU3G9oSbAX4D1eF5RjzzyyO9wPaoDKR/JHE7s3uv9wL+BhzNa68xzEjCT8gGeb4HuON1qZDT58zhpUxV4JTg1S+UR61jsJthMgu8zFDeanfjriWvAMlZ66fz04pcGyB+q5zotxXVYqmIL7oUeJH+/fFJ1lLjpBmSyjGztwUm3X/0AABPPSURBVN6E660didOjzMMNXr2MEzJ8/vnn9XEPSkS4CvAzYEalvH6HUw9cl+E6VwfCvoLrTpyqoBA3kyLeg72e8lHnHNw1SXYk/luf8YJO3dlOuX49EZpEXSrjN912gs3r3Yybw+mHxwLkD64NfqZ4leAEYBAe9xnvaYK/iP7hM27Q6/Q4/l70nwALgxSQrQI2HlNxo7Mf3H333Z1wU7YiHI8TImujwtoC1+KE9XzcDR1ZHRXNEHNxL5HIp31z3GDMBu/4vKi423EjpBFejTp/OsF00f+lagG4i+DCD9w816r4CqfmCMIOqp6eV+qzHvG4napfXnNx9yQIir/6pdKG56la6G
wA7kmhjN9RtXCeRvCe/lqq/motwc2RDUS2z4ONy4wZM+YNHz78pBSLz8Z5sOfheqp7gY+Bl3CTsjfiHog+OPXKYbgXUilOb/sRTs1yFE4AjiaYPjoXN4gWa/BkB27uZVDhF6EpbopR5ZFyxQnXr1LMH9xChlgzEXbjdNWpriLqjrv+XSqFK07NcyluznAqtCD2lLVS4D+k9qIDdx8ew6nqKrMMuACnekuFE7wyKk8p2ws8gesgpfJ/Etw82svZV731NW4qaELTp4nmwZqANTLFAbgHMBfXC9hMcH1fLAQnACOj/t/j/9PYL3VxXwF1cYJvG8En5sdjAOVLZdfjemSr0ph/ZMpa5IW3E3ed0vngd8fNXoks5vkJ5dMf00EO7hpN9Y5v8fY3xE2RPC1xK9Ju8o7HAH/Dx2ISE7CGYVQHkS++TH29ZTr/QGXU5pVchmEYNZYaP00r0dshETNnziwNmtYwjOQRcTP9MvXcZTr/TJRhPVjDMIwMYQLWMAwjQ5iANQzDyBAmYA3DMDKECVjDMIwMYQLWMAwjQ9T4aVpG9iFurktTyg0/R1ZybdU0rWwRkbq4paCNcKu6dgLfqGraVouJyEk4Y86dcMtWXwGmqGpanCuKSGSVVVNcZ2cPrg1pc0ooIl1wNjn64FZvfQD8WVXXprGMPKKWFYtIc2BzGu91C9zS7cjxY8BfVPX9dOTv5Xk0MDbq+LfAo6q6KaWMVTXpDWeFZhPwaVTYn4B/4Qwv/A1o5oXXw60Z/gS3Nr5/VJpzvPjLgDuiwi+i3N7rEmBM1Dn1s915552+4mVqC3Jda8OGs9bVDWfToPJ2BFA/DWW0BHrEKSMfb4ViCvk3wxkpj3VvtwI/TUMbGgMFcdpwGJCbYv6CM7RSGqMNe4Gb09CGujivuJF6R/I/Crd8tmEayhiBe7nFuhcvA41TzL8+zrdYrPx3Ab9IJf+gKoIncUavo3kLOFJVj8KZGvuNF/5zAFXtgXMBcZeI1BGRA3FCeYCqFgCtRSTaNuNzqtrL2x4NWE8jDiLSX0SOizq+RkSWi8hSEZkjIh0C5FkX5/Qwnk3Y+kBnEQn85eT1ZtoQ335tC1Jw++P1Kl8ivj+sJsAMETkhhTIOwFl9i+fWpSHQSSKz3oNxC864eqw86gC/E5EJQTP3rlMiB5e5uDb4cfETr4zTgP8jvteFn+CseqXCE8R34VMPmCwi5wTNPJCAVdV3qGRuTVXfVNWIfcz3Kbct2h3nQRN13e3NOEO9nYGVqhqxfDQbZxDbqB7647weRFgM9PVekNMJ5uitJVWrnepS0T2ybzyBc7CPqAeKiF/3NZX5Cc6CUyLqkpon01ZU/ewdQEC/YiJyEM4ZYFXcICKNgpSBe5FVJTxzcP+JoPyBqq/TaSJyYpDMRaQHrodcZT2CvuwyNch1Mc4FNji1wDARyRWRTjhdUHucy+yuItLR69Gc7oVHGO71pqaLSHS4kQARGeldt49FZKqI/FREPhCRxSIyW0Rai0hHnDm8q0VkiYicqKp/V9WIebzoF6TfcgX/3mKDOj3Mw/+4QdAyxlYdBYBTRCRpb6bef92vIZGgbRiFP+eQefh3wFgZv3Vr5vV2k0JE+hDbp14s/N6zoOkOw7ldSpq0D3KJyA24QY1pXtDjOJ3cIpy7kveAElX9VkQuA57D6Ynew/VqwX2iPaOqu0TkUpxvrUAN3J8QkQKcceDjVfVr73NagUJVVREZA0xQ1V+LyGRgu6reGSOrSyh/QfolF/9eVuuKSB1VTda0YDKfm0E/TY9IMu7qJPOvh3/3PNXRhq4By/Bbtzq4Nic7+FgdbUi2jDnJFpBWASsio3B2Gweop0H21AZXR8V5D8/epaq+hBOmiMhYPMO5qhrtUuQvOAvwRtWcAkxX1a8BVPUb7zPoORFpg/ujFyXKQEQuwKlwkjX1mOyIcZAR5mTSBB3BrtL+ZxR+3NekQk1ug+L/RRGkHdXRhoyXkTYVgYgMxvm1Ghr1qY
mINIzoeURkIK73utw7buX9NsdZDn/UO462Xj6Ufb3AGrGJ5ZPrAeBBb5DxFyRwticiP8L1gIeq6q5kCvZepH57KTsiL+AkSWb6UtCpTn57KdtxU56SZSf+fXllug3Jxo3Gb912J/tf8ngH/wIwaBv8plPcrJKkCSRgReQZnKvnI0SkWEQuAR7EjbC+5en1JnvRWwEfichnOAF8YVRW94nIcpwvrNtUdaUXfpWILBORj4GrcNO2jKqZA5ztzdCIjLg3xVnKB6ebi7ANd7/w4h4N/BknXIPO/fPrzDCQ00NV3Yk/Nyp7cYOpQfDrWXiaBpgP671Y/DqTDOocciblLtgT8S9VDSQ48F+3ZB1nAqCqX7Kvg9JY7MU59QzCE/hz/fOmqgZyfVPjPRpURkR8VfjOO+9k/Pjxma5OXFQ1VTfYgfDUNBE/RYtxc5LvwQnZ94EfqGp/ETkcN1ugFLgSmIibW7rRy+oLVR2aZNmCm36UyAvFZlUN4sY5UkYD4FDi63sV+DyI8Isq4wbcCHY8VuD03IEEoDfocyjxpx8BfKWqGxOcr6qMU3BOE+PpSnfgVHlBeuGRMvKpONj1sffbM6qMNQG/VhCRtrixmURTBq9U1QeD5O+VcSFujCfe8/olcJyqrgmSv63kqmWo6hTcHyaamTHircRNCI/wozSUrSKyFjeVqgUVheBeXK8nFU+mqOr3IvJv3FzXxpVO7wQ2aooroVT1FhH5EriRig/3HuCvwLigwtXLv1RE1uDa0IyKD/cenHD9Omj+XhlveyqfO4FjKp1+F7haVT9MsYxiEdkNHERFWVKK8/u1Mahw9fLf4M3VfgAYRsX/02pgoqo+HTR/r4ypIrIFuBU3pTRCKW6g96qgwhWysAfrl5kzZy4aNmxY37Drsb/i9dLyKF8quy3ArIGqyqiPWyoL8H207j9N+dfBvXg64oT3m96nazrLyMH1+HNwwnVbKkIpThm9gYgwPUpVP0lz/oJrwxYvKFdVU/UcXLmMfJyDyPo44fr3DFynE3G6X4DOqppwQNgP1oM1MoInTIPqQf2WsQu3nDFT+ZcCb2Yqf6+MVPTFfsv4SMpdoaRVuHp5KrA1qoy0Clcvz2L2/TJLdxnvRrUhZeEKZk3LMAwjY5iANQzDyBAmYA3DMDKECVjDMIwMYQLWMAwjQ9T4WQQikmjSelxmzJhRJ2haw6itVMczURueu2TaoKrb4p2zHqxhGEaGMAFrGIaRIWq8igDnAylpioqKAqdNE36NKtdG6uG8FjSlfCXXFtxS2WRMxMVDgJ8CY3BeGerg/Lo9jvOvlA7Hhzk4A+LNce0pxRnI+QZ/Bmf8cABuSXEe5U4Pv/U2vxa3ElEXOBtnAD/CGzirdS/gmQdNkXycW6how90P4gwHpWtRQyPc/yni4HIP7j58i7sv6cj/AqIcK+IMzfwFeD2VjKvswYrI4yKySUQ+jXFuvIio56IiOvwQEdkuIuOjwuqJyCMislJE/iUiw73wS0XkE88C13wR6V65HCOraIqzAB9tiyAX94AcRmJDMH7Ixflp+j+c25t6XlhP4D6cb7igngAi1McZY2nt5Q/uWWmK8zDbKsX8wV2PzjhbBJHnsK6X92EkNgTjhyY4L7iTgB9EhR8LPIYTIHFNV/rkBJzJxqtxftIijMTZO7goxfzB2bXoiGtPHZyAreeFH0p8/29+aQPMw/kHjJY9A3H+vibj3+7tPvhRETzJvg4O8dy4DARiWUa6h30t4t8AbFLVw3ENmeeFP62qPVS1F84P1N3+qm6kyAlUNAJyHO6h+BZnWCMIBwDtiP+HrINzCxTUXxbA/+J6r/HoiRO+QRGcgZdED25LAvrL8mhCYt9iOV4d/HqIiMWfgcIE50/BPadBaYfzRhLvhVkHuBf4YQpltCCx/7Z6OOGbCs/jPOPG4zzg+qCZVylgYzk49LgHmEAlA88icjqwBvfJFs3FOIs1qGpplNX96M/4RpXzMzLGiVQUsMU4P1
2peOk8kKrf9kJAp4e4HuTFVcZyL4+ghn6a469XlIozPz9pcwguxLvgPItUxTlU7Hkmw8+p+mukDjAuYP7grHRVRT3c/yIIAyg3rZiIhIbqExHU4PZQYL2qflwpvBHOqPb/VAqP/FH+V0Q+EpG/ikjrqPOXeybo7sAZ2DaCMwJnDP09nCHi03DW2OcDs3AP9yE4v1uX44ydH4v7EllGcJ2W4F/vHFRwnI7/T+egzvz8Pqz1CPbQ1cV/G/w6kayM37bnAmdluIwf4U9QVqYR/j//gwpYv21oAQwKUkAQb48NcZ/7v49x+n+Ae2LY48zFKcP/oaq9cQKgzNmeqj6kqofihPONydbJKKMrMB7nevo43PVcgPscPAFnYPtXOGH6GPAQcLwXJ1Vy8K+riujSkqV11VHK8OPeOxbJDPwG0f8lkyboIHQy1ymZuEHSCcF01slcp6B62Iz/n4LcwENxiv6PPdNe+TiXMP1wn5xnisgduF5KqYh8j3uQv8NZ1wdntPiSGHk/i1PKG8E4CWdcO6LS+Ran734S9wepi/PsmwmS6fkqwVRBycwK2VJ1lJgk044go/DJpAk6yp9M24Nepy34H0wMUkZ1XKeM/5+S7sGq6ieq2kpVO6pqR5zurreq/kdVT4wKvxf4o6o+6NmLfAk36gtO9xFxfNglKvuf4HmcNQIRy+nhnThVQSFOH5bqyHE8SnEuQvwQdPrcS/h/mF4MWIbfB2kPwaZr7cK/h9Kg12kfDxYJCHqd/JaxiHKfcMmwHf/3Ouh18tv2nbjpbUnjZ5pWLAeHQbgOuElEluIcH/7aC7/Cc3C4BLiGio75jOSYC/yM8p5Fc5xedIN3fF5U3O3s63IlVfy6OQnkCA/3oPp5sFfhpmsFwe/cyqBtAH8OA5NxjliZhTjBVhVvAyurjBWbP+NPAAb9IlXcvaiKvT7jxWIm/oT/c0HLqPEuY/w6OaxM2E4PCW+hwXm4nupenBO6l4DbcM4MFwJ9gB/j5lpOxQmT8bie1dM41c73wCagX4DyW5JY57aR1IRTM9z8zh5xzn+J+xIKKjjAvXgOIb6eeCuwLoX8wU1zijfYp7gHP+jnO7jpcK/h2hGLVbgB0KAehMF1lB4gfkdtEq5jlQodiN8RKMWpvFJZ+NETJ2jjqTvexw2uxi0jkS0CE7CZY39eydUEN3LcMCpsB66Hm5JDQo/GwBW4iextvbCtuFVc9xLsk7QyDXBtyKNc0O7CvRxSeUFE0ww3ZS2itlHcarGv8edOuipa4l62F1AuQL4CngLuJ3jPL5rjvTJOpVzQ/hMnXKenIX8onw8bmT+tuJfP16THZVAn3ODv2ZT7eIsMBE+iipWB2S5gg1rTmjd8+PCT0l0fIylyKV8qm3Y/TV7eHXEPdjHpEUqVqYMbHCwlPct8YxG5TntIz9LPytTH9WQV1+PLRDta4L5ctlKukko3dSlfUpyJ69QI92VRAhThcyA2kYDNBlsERvZSQnrW1MdjL/DvDOYP7kHOmGNFj0xfp11kfvA4nT37eGTqBRdhB6mplvbBrGkZhmFkiBrfg03U/U7EzJkzS4OmNQzDSAfWgzUMw8gQJmANwzAyRDbMInidYMYiDsL/xPdM0EBVjwyxfMMwQqbGC9igiMgiVQ1qsi7ryzcMI3xMRWAYhpEhTMAahmFkiNosYB/Zz8s3DCNkaq0O1jAMI2xqcw/WMAwjVGq0gI3lMlxEWojIWyKyyvttHnXuNyKyWkRWiMigqPA+nmvw1SJyv3iuGESkvog854V/ICId01DnwV75q0UksDdKwzCynxotYIntMvx6YI6qdgHmeMeISHecE7MCL83DIhJxezwJGIvzttklKs9LgG9V9TCcl9zbU6msV95DODub3YERXr0Mw9gPqdECNo7L8GHAFG9/Cs4YbiT8WVXdpapFwGqgn4i0AfJUdYHnuuapSmkieU0HBkR6twHpB6xW1TWquhvnY2xYCvkZhpHF1GgBG4fWqroRwP
uNWM9vR0Ur88VeWDtvv3J4hTSqWoIz4ntgCnWLVwfDMPZDslHAxiNWz1MThCdKk+46GIaxH5KNAvZL77Mf7zfiU6gY54coQj7Osnqxt185vEIaEckFmpKa0eB4dTAMYz8kGwXsLMo9z46i3MvoLOBcb2ZAJ9xg1kJPjbBNRAo9/erISmkieZ0JvK2pTQz+J9BFRDqJSD3coNusFPIzDCOLqdEGtz2X4f2Bg0SkGJiI85D6vOc+/AvgLABVXSYizwPLce43LlfViB+oy3AzEg7Aedp8zQt/DJgqIqtxPddzU6mvqpaIyBU4H+o5wOOquiyVPA3DyF5sJZdhGEaGyEYVgWEYRlZgAtYwDCNDmIA1DMPIECZgDcMwMoQJWMMwjAxhAtYwDCNDmIA1DMPIECZgDcMwMsT/A8o3yGW5fGh6AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from upsetplot import UpSet\n", "ax_dict = UpSet(example_values, subset_size='sum', show_counts=True).plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use `DataFrame` as input" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `DataFrame` can also be used as input to carry additional information. \n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:32.691208Z", "start_time": "2021-07-31T14:30:32.678924Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexvalue
cat0cat1cat2
FalseTrueTrue01.652317
True11.510447
FalseTrue21.584646
True31.279395
TrueTrue42.338243
\n", "
" ], "text/plain": [ " index value\n", "cat0 cat1 cat2 \n", "False True True 0 1.652317\n", " True 1 1.510447\n", " False True 2 1.584646\n", " True 3 1.279395\n", " True True 4 2.338243" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from upsetplot import generate_samples\n", "example_samples_df = generate_samples()\n", "example_samples_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this data frame, each observation has two variables: `index` and `value`. If we simply want to count the number of observations in each unique subset, we can use `subset_size='count'`: \n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:32.872276Z", "start_time": "2021-07-31T14:30:32.693257Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from upsetplot import UpSet\n", "ax_dict = UpSet(example_samples_df, subset_size='count').plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we instead want to plot the sum of a variable in each subset (e.g. `index`), we can use `sum_over='index'` together with `subset_size='sum'`. This makes `upsetplot` take the sum of the given variable within each unique subset and plot that number: \n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:33.058637Z", "start_time": "2021-07-31T14:30:32.876980Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from upsetplot import UpSet\n", "ax_dict = UpSet(example_samples_df, sum_over='index', subset_size='sum').plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convert Data to UpSet-compatible format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can convert data from common formats to be compatible with `upsetplot`.\n", "\n", "Suppose we have three categories (the data is not scientifically true!): \n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:33.065219Z", "start_time": "2021-07-31T14:30:33.061354Z" } }, "outputs": [ { "data": { "text/plain": [ "(['Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose'],\n", " ['Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros'],\n", " ['Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck'])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mammals = ['Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose']\n", "herbivores = ['Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros']\n", "domesticated = ['Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck']\n", "(mammals, herbivores, domesticated)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since this format lists the entries in each category, we can use `from_contents` to construct a data frame ready for plotting.\n", "\n", "`from_contents` takes a [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) as input. 
The input dictionary should have category names as keys and a [list](https://docs.python.org/3/tutorial/datastructures.html) or set of category members as values: \n", "\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:33.077758Z", "start_time": "2021-07-31T14:30:33.066609Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id
mammalherbivoredomesticated
TrueFalseFalseCat
TrueDog
TrueTrueHorse
TrueSheep
FalseTruePig
TrueTrueCattle
FalseRhinoceros
FalseMoose
FalseFalseTrueChicken
TrueDuck
\n", "
" ], "text/plain": [ " id\n", "mammal herbivore domesticated \n", "True False False Cat\n", " True Dog\n", " True True Horse\n", " True Sheep\n", " False True Pig\n", " True True Cattle\n", " False Rhinoceros\n", " False Moose\n", "False False True Chicken\n", " True Duck" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from upsetplot import from_contents\n", "animals = from_contents({'mammal': mammals, 'herbivore': herbivores, 'domesticated': domesticated})\n", "animals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can plot: \n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:33.254138Z", "start_time": "2021-07-31T14:30:33.079234Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from upsetplot import UpSet\n", "ax_dict = UpSet(animals, subset_size='count').plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, our input data may have been structured by species, allowing us to use `from_memberships`:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:33.265478Z", "start_time": "2021-07-31T14:30:33.256034Z" } }, "outputs": [ { "data": { "text/plain": [ "Domesticated Herbivore Mammal\n", "False False True 1\n", "True False True 1\n", " True True 1\n", " True 1\n", " False True 1\n", " True True 1\n", "False True True 1\n", " True 1\n", "True False False 1\n", " False 1\n", "Name: ones, dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from upsetplot import from_memberships\n", "\n", "animal_memberships = {\n", " \"Cat\": \"Mammal\",\n", " \"Dog\": \"Mammal,Domesticated\",\n", " \"Horse\": \"Mammal,Herbivore,Domesticated\",\n", " \"Sheep\": \"Mammal,Herbivore,Domesticated\",\n", " \"Pig\": \"Mammal,Domesticated\",\n", " \"Cattle\": \"Mammal,Herbivore,Domesticated\",\n", " \"Rhinoceros\": \"Mammal,Herbivore\",\n", " \"Moose\": \"Mammal,Herbivore\",\n", " \"Chicken\": \"Domesticated\",\n", " \"Duck\": \"Domesticated\",\n", "}\n", "\n", "# Turn this into a list of lists:\n", "animal_membership_lists = [categories.split(\",\") for categories in animal_memberships.values()]\n", "\n", "animals = from_memberships(animal_membership_lists)\n", "animals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This should produce the same plot:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:33.444009Z", "start_time": "2021-07-31T14:30:33.266781Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from upsetplot import UpSet\n", "ax_dict = UpSet(animals, subset_size='count').plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## When category membership is indicated in DataFrame columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at a `movies` dataset like that used in the [original publication by Alexander Lex et al.](https://caleydo.org/publications/2014_infovis_upset/).\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:33.622764Z", "start_time": "2021-07-31T14:30:33.445565Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Rank Title Genre \\\n", "0 1 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", "1 2 Prometheus Adventure,Mystery,Sci-Fi \n", "2 3 Split Horror,Thriller \n", "3 4 Sing Animation,Comedy,Family \n", "4 5 Suicide Squad Action,Adventure,Fantasy \n", "\n", " Description Director \\\n", "0 A group of intergalactic criminals are forced ... James Gunn \n", "1 Following clues to the origin of mankind, a te... Ridley Scott \n", "2 Three girls are kidnapped by a man with a diag... M. Night Shyamalan \n", "3 In a city of humanoid animals, a hustling thea... Christophe Lourdelet \n", "4 A secret government agency recruits some of th... David Ayer \n", "\n", " Actors Year Runtime (Minutes) \\\n", "0 Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... 2014 121 \n", "1 Noomi Rapace, Logan Marshall-Green, Michael Fa... 2012 124 \n", "2 James McAvoy, Anya Taylor-Joy, Haley Lu Richar... 2016 117 \n", "3 Matthew McConaughey,Reese Witherspoon, Seth Ma... 2016 108 \n", "4 Will Smith, Jared Leto, Margot Robbie, Viola D... 
2016 123 \n", "\n", " Rating Votes Revenue (Millions) Metascore \n", "0 8.1 757074 333.13 76 \n", "1 7.0 485820 126.46 65 \n", "2 7.3 157606 138.12 62 \n", "3 7.2 60545 270.32 59 \n", "4 6.2 393727 325.02 40 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "movies = pd.read_csv(\"https://raw.githubusercontent.com/peetck/IMDB-Top1000-Movies/master/IMDB-Movie-Data.csv\")\n", "movies.head()" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2021-07-29T12:58:20.800787Z", "start_time": "2021-07-29T12:58:20.797460Z" } }, "source": [ "Here, genre membership is encoded in the comma-separated `Genre` column.\n", "\n", "Splitting that column on commas gives a list of genres for each movie, which is the input `from_memberships` expects:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:33.721513Z", "start_time": "2021-07-31T14:30:33.625121Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Rank \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False 1 \n", "False True False False False False False False False False False False False True False True False False False False 2 \n", " False False False False False False False False False True False False False False False False True False False 3 \n", " True False True False False True False False False False False False False False False False False False 4 \n", "True True False False False False False False True False False False False False False False False False False False 5 \n", "... ... \n", "False False False False False True True False False False False False False True False False False False False False 996 \n", " False False False False False True False False False False False False False False False 997 \n", " True False False False False True False False True False False False False False 998 \n", " True False False True False False False False False False False False False False False False False False False 999 \n", " False False False True False False True True False False False False False False False False False False False 1000 \n", "\n", " Title \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False Guardians of the Galaxy \n", "False True False False False False False False False False False False False True False True False False False False Prometheus \n", " False False False False False False False False False True False False False False False False True False False Split \n", " True False True False False True 
False False False False False False False False False False False False Sing \n", "True True False False False False False False True False False False False False False False False False False False Suicide Squad \n", "... ... \n", "False False False False False True True False False False False False False True False False False False False False Secret in Their Eyes \n", " False False False False False True False False False False False False False False False Hostel: Part II \n", " True False False False False True False False True False False False False False Step Up 2: The Streets \n", " True False False True False False False False False False False False False False False False False False False Search Party \n", " False False False True False False True True False False False False False False False False False False False Nine Lives \n", "\n", " Genre \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False Action,Adventure,Sci-Fi \n", "False True False False False False False False False False False False False True False True False False False False Adventure,Mystery,Sci-Fi \n", " False False False False False False False False False True False False False False False False True False False Horror,Thriller \n", " True False True False False True False False False False False False False False False False False False Animation,Comedy,Family \n", "True True False False False False False False True False False False False False False False False False False False Action,Adventure,Fantasy \n", "... ... 
\n", "False False False False False True True False False False False False False True False False False False False False Crime,Drama,Mystery \n", " False False False False False True False False False False False False False False False Horror \n", " True False False False False True False False True False False False False False Drama,Music,Romance \n", " True False False True False False False False False False False False False False False False False False False Adventure,Comedy \n", " False False False True False False True True False False False False False False False False False False False Comedy,Family,Fantasy \n", "\n", " Description \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False A group of intergalactic criminals are forced ... \n", "False True False False False False False False False False False False False True False True False False False False Following clues to the origin of mankind, a te... \n", " False False False False False False False False False True False False False False False False True False False Three girls are kidnapped by a man with a diag... \n", " True False True False False True False False False False False False False False False False False False In a city of humanoid animals, a hustling thea... \n", "True True False False False False False False True False False False False False False False False False False False A secret government agency recruits some of th... \n", "... ... \n", "False False False False False True True False False False False False False True False False False False False False A tight-knit team of rising investigators, alo... \n", " False False False False False True False False False False False False False False False Three American college students studying abroa... 
\n", " True False False False False True False False True False False False False False Romantic sparks occur between two dance studen... \n", " True False False True False False False False False False False False False False False False False False False A pair of friends embark on a mission to reuni... \n", " False False False True False False True True False False False False False False False False False False False A stuffy businessman finds himself trapped ins... \n", "\n", " Director \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False James Gunn \n", "False True False False False False False False False False False False False True False True False False False False Ridley Scott \n", " False False False False False False False False False True False False False False False False True False False M. Night Shyamalan \n", " True False True False False True False False False False False False False False False False False False Christophe Lourdelet \n", "True True False False False False False False True False False False False False False False False False False False David Ayer \n", "... ... \n", "False False False False False True True False False False False False False True False False False False False False Billy Ray \n", " False False False False False True False False False False False False False False False Eli Roth \n", " True False False False False True False False True False False False False False Jon M. 
Chu \n", " True False False True False False False False False False False False False False False False False False False Scot Armstrong \n", " False False False True False False True True False False False False False False False False False False False Barry Sonnenfeld \n", "\n", " Actors \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... \n", "False True False False False False False False False False False False False True False True False False False False Noomi Rapace, Logan Marshall-Green, Michael Fa... \n", " False False False False False False False False False True False False False False False False True False False James McAvoy, Anya Taylor-Joy, Haley Lu Richar... \n", " True False True False False True False False False False False False False False False False False False Matthew McConaughey,Reese Witherspoon, Seth Ma... \n", "True True False False False False False False True False False False False False False False False False False False Will Smith, Jared Leto, Margot Robbie, Viola D... \n", "... ... \n", "False False False False False True True False False False False False False True False False False False False False Chiwetel Ejiofor, Nicole Kidman, Julia Roberts... \n", " False False False False False True False False False False False False False False False Lauren German, Heather Matarazzo, Bijou Philli... \n", " True False False False False True False False True False False False False False Robert Hoffman, Briana Evigan, Cassie Ventura,... \n", " True False False True False False False False False False False False False False False False False False False Adam Pally, T.J. Miller, Thomas Middleditch,Sh... 
\n", " False False False True False False True True False False False False False False False False False False False Kevin Spacey, Jennifer Garner, Robbie Amell,Ch... \n", "\n", " Year \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False 2014 \n", "False True False False False False False False False False False False False True False True False False False False 2012 \n", " False False False False False False False False False True False False False False False False True False False 2016 \n", " True False True False False True False False False False False False False False False False False False 2016 \n", "True True False False False False False False True False False False False False False False False False False False 2016 \n", "... ... \n", "False False False False False True True False False False False False False True False False False False False False 2015 \n", " False False False False False True False False False False False False False False False 2007 \n", " True False False False False True False False True False False False False False 2008 \n", " True False False True False False False False False False False False False False False False False False False 2014 \n", " False False False True False False True True False False False False False False False False False False False 2016 \n", "\n", " Runtime (Minutes) \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False 121 \n", "False True False False False False False False False False False False False True False True False False False False 124 \n", 
" False False False False False False False False False True False False False False False False True False False 117 \n", " True False True False False True False False False False False False False False False False False False 108 \n", "True True False False False False False False True False False False False False False False False False False False 123 \n", "... ... \n", "False False False False False True True False False False False False False True False False False False False False 111 \n", " False False False False False True False False False False False False False False False 94 \n", " True False False False False True False False True False False False False False 98 \n", " True False False True False False False False False False False False False False False False False False False 93 \n", " False False False True False False True True False False False False False False False False False False False 87 \n", "\n", " Rating \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False 8.1 \n", "False True False False False False False False False False False False False True False True False False False False 7.0 \n", " False False False False False False False False False True False False False False False False True False False 7.3 \n", " True False True False False True False False False False False False False False False False False False 7.2 \n", "True True False False False False False False True False False False False False False False False False False False 6.2 \n", "... ... 
\n", "False False False False False True True False False False False False False True False False False False False False 6.2 \n", " False False False False False True False False False False False False False False False 5.5 \n", " True False False False False True False False True False False False False False 6.2 \n", " True False False True False False False False False False False False False False False False False False False 5.6 \n", " False False False True False False True True False False False False False False False False False False False 5.3 \n", "\n", " Votes \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False 757074 \n", "False True False False False False False False False False False False False True False True False False False False 485820 \n", " False False False False False False False False False True False False False False False False True False False 157606 \n", " True False True False False True False False False False False False False False False False False False 60545 \n", "True True False False False False False False True False False False False False False False False False False False 393727 \n", "... ... 
\n", "False False False False False True True False False False False False False True False False False False False False 27585 \n", " False False False False False True False False False False False False False False False 73152 \n", " True False False False False True False False True False False False False False 70699 \n", " True False False True False False False False False False False False False False False False False False False 4881 \n", " False False False True False False True True False False False False False False False False False False False 12435 \n", "\n", " Revenue (Millions) \\\n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False 333.13 \n", "False True False False False False False False False False False False False True False True False False False False 126.46 \n", " False False False False False False False False False True False False False False False False True False False 138.12 \n", " True False True False False True False False False False False False False False False False False False 270.32 \n", "True True False False False False False False True False False False False False False False False False False False 325.02 \n", "... ... 
\n", "False False False False False True True False False False False False False True False False False False False False 0.00 \n", " False False False False False True False False False False False False False False False 17.54 \n", " True False False False False True False False True False False False False False 58.01 \n", " True False False True False False False False False False False False False False False False False False False 0.00 \n", " False False False True False False True True False False False False False False False False False False False 19.64 \n", "\n", " Metascore \n", "Action Adventure Animation Biography Comedy Crime Drama Family Fantasy History Horror Music Musical Mystery Romance Sci-Fi Sport Thriller War Western \n", "True True False False False False False False False False False False False False False True False False False False 76 \n", "False True False False False False False False False False False False False True False True False False False False 65 \n", " False False False False False False False False False True False False False False False False True False False 62 \n", " True False True False False True False False False False False False False False False False False False 59 \n", "True True False False False False False False True False False False False False False False False False False False 40 \n", "... ... 
\n", "False False False False False True True False False False False False False True False False False False False False 45 \n", " False False False False False True False False False False False False False False False 46 \n", " True False False False False True False False True False False False False False 50 \n", " True False False True False False False False False False False False False False False False False False False 22 \n", " False False False True False False True True False False False False False False False False False False False 11 \n", "\n", "[1000 rows x 12 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies_by_genre = from_memberships(movies.Genre.str.split(','), data=movies)\n", "movies_by_genre" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:34.400949Z", "start_time": "2021-07-31T14:30:33.722972Z" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "UpSet(movies_by_genre)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the size of this plot, we show only genre combinations shared by at least 15 movies:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:34.819839Z", "start_time": "2021-07-31T14:30:34.402594Z" } }, "outputs": [ { "data": { "text/plain": [ "{'matrix': ,\n", " 'shading': ,\n", " 'totals': ,\n", " 'intersections': }" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "UpSet(movies_by_genre, min_subset_size=15, show_counts=True).plot()" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2021-07-29T12:58:35.379788Z", "start_time": "2021-07-29T12:58:35.374869Z" } }, "source": [ "If the genres were instead represented as one boolean column per genre, we could use `from_indicators`." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:34.846966Z", "start_time": "2021-07-31T14:30:34.821393Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Action Adventure Sci-Fi Mystery Horror Thriller Animation Comedy \\\n", "0 True True True False False False False False \n", "1 False True True True False False False False \n", "2 False False False False True True False False \n", "3 False False False False False False True True \n", "4 True True False False False False False False \n", ".. ... ... ... ... ... ... ... ... \n", "995 False False False True False False False False \n", "996 False False False False True False False False \n", "997 False False False False False False False False \n", "998 False True False False False False False True \n", "999 False False False False False False False True \n", "\n", " Family Fantasy Drama Music Biography Romance History Crime \\\n", "0 False False False False False False False False \n", "1 False False False False False False False False \n", "2 False False False False False False False False \n", "3 True False False False False False False False \n", "4 False True False False False False False False \n", ".. ... ... ... ... ... ... ... ... \n", "995 False False True False False False False True \n", "996 False False False False False False False False \n", "997 False False True True False True False False \n", "998 False False False False False False False False \n", "999 True True False False False False False False \n", "\n", " Western War Musical Sport \n", "0 False False False False \n", "1 False False False False \n", "2 False False False False \n", "3 False False False False \n", "4 False False False False \n", ".. ... ... ... ... 
\n", "995 False False False False \n", "996 False False False False \n", "997 False False False False \n", "998 False False False False \n", "999 False False False False \n", "\n", "[1000 rows x 20 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "genre_indicators = pd.DataFrame([{cat: True\n", " for cat in cats}\n", " for cats in movies.Genre.str.split(',').values]).fillna(False).astype(bool)\n", "genre_indicators" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:34.863338Z", "start_time": "2021-07-31T14:30:34.848832Z" } }, "outputs": [], "source": [ "from upsetplot import from_indicators\n", "# this produces the same result as from_memberships above\n", "movies_by_genre = from_indicators(genre_indicators, data=movies)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These columns could also be part of the original data frame. In this case, `from_indicators` allows `indicators` to be specified as a list of column names, or as a function that is applied to the data frame to obtain the indicator columns." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:34.887205Z", "start_time": "2021-07-31T14:30:34.864873Z" }, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Rank Title Genre \\\n", "0 1 Guardians of the Galaxy Action,Adventure,Sci-Fi \n", "1 2 Prometheus Adventure,Mystery,Sci-Fi \n", "2 3 Split Horror,Thriller \n", "3 4 Sing Animation,Comedy,Family \n", "4 5 Suicide Squad Action,Adventure,Fantasy \n", ".. ... ... ... \n", "995 996 Secret in Their Eyes Crime,Drama,Mystery \n", "996 997 Hostel: Part II Horror \n", "997 998 Step Up 2: The Streets Drama,Music,Romance \n", "998 999 Search Party Adventure,Comedy \n", "999 1000 Nine Lives Comedy,Family,Fantasy \n", "\n", " Description Director \\\n", "0 A group of intergalactic criminals are forced ... James Gunn \n", "1 Following clues to the origin of mankind, a te... Ridley Scott \n", "2 Three girls are kidnapped by a man with a diag... M. Night Shyamalan \n", "3 In a city of humanoid animals, a hustling thea... Christophe Lourdelet \n", "4 A secret government agency recruits some of th... David Ayer \n", ".. ... ... \n", "995 A tight-knit team of rising investigators, alo... Billy Ray \n", "996 Three American college students studying abroa... Eli Roth \n", "997 Romantic sparks occur between two dance studen... Jon M. Chu \n", "998 A pair of friends embark on a mission to reuni... Scot Armstrong \n", "999 A stuffy businessman finds himself trapped ins... Barry Sonnenfeld \n", "\n", " Actors Year \\\n", "0 Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... 2014 \n", "1 Noomi Rapace, Logan Marshall-Green, Michael Fa... 2012 \n", "2 James McAvoy, Anya Taylor-Joy, Haley Lu Richar... 2016 \n", "3 Matthew McConaughey,Reese Witherspoon, Seth Ma... 2016 \n", "4 Will Smith, Jared Leto, Margot Robbie, Viola D... 2016 \n", ".. ... ... \n", "995 Chiwetel Ejiofor, Nicole Kidman, Julia Roberts... 2015 \n", "996 Lauren German, Heather Matarazzo, Bijou Philli... 2007 \n", "997 Robert Hoffman, Briana Evigan, Cassie Ventura,... 2008 \n", "998 Adam Pally, T.J. Miller, Thomas Middleditch,Sh... 2014 \n", "999 Kevin Spacey, Jennifer Garner, Robbie Amell,Ch... 
2016 \n", "\n", " Runtime (Minutes) Rating Votes ... Drama Music Biography Romance \\\n", "0 121 8.1 757074 ... False False False False \n", "1 124 7.0 485820 ... False False False False \n", "2 117 7.3 157606 ... False False False False \n", "3 108 7.2 60545 ... False False False False \n", "4 123 6.2 393727 ... False False False False \n", ".. ... ... ... ... ... ... ... ... \n", "995 111 6.2 27585 ... True False False False \n", "996 94 5.5 73152 ... False False False False \n", "997 98 6.2 70699 ... True True False True \n", "998 93 5.6 4881 ... False False False False \n", "999 87 5.3 12435 ... False False False False \n", "\n", " History Crime Western War Musical Sport \n", "0 False False False False False False \n", "1 False False False False False False \n", "2 False False False False False False \n", "3 False False False False False False \n", "4 False False False False False False \n", ".. ... ... ... ... ... ... \n", "995 False True False False False False \n", "996 False False False False False False \n", "997 False False False False False False \n", "998 False False False False False False \n", "999 False False False False False False \n", "\n", "[1000 rows x 32 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies_with_indicators = pd.concat([movies, genre_indicators], axis=1)\n", "movies_with_indicators" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2021-07-29T13:00:57.617771Z", "start_time": "2021-07-29T13:00:57.594629Z" } }, "source": [ "We can now specify some or all category column names instead of passing a separate indicator matrix:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:35.089744Z", "start_time": "2021-07-31T14:30:34.888825Z" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "UpSet(from_indicators([\"Drama\", \"Action\", \"Comedy\", \"Adventure\"],\n", " data=movies_with_indicators))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we can use `DataFrame.select_dtypes` to extract all boolean columns:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2021-07-31T14:30:35.625514Z", "start_time": "2021-07-31T14:30:35.092223Z" }, "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "UpSet(from_indicators(lambda df: df.select_dtypes(bool),\n", " data=movies_with_indicators),\n", " min_subset_size=15, show_counts=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "toc-autonumbering": false, "toc-showcode": false, "toc-showmarkdowntxt": false, "toc-showtags": true, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 } UpSetPlot-0.8.0/doc/index.rst000066400000000000000000000001521435554746600160450ustar00rootroot00000000000000.. include:: ../README.rst .. toctree:: auto_examples/index formats.ipynb api changelog UpSetPlot-0.8.0/doc/make.bat000066400000000000000000000151011435554746600156110ustar00rootroot00000000000000@ECHO OFF REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) set BUILDDIR=_build set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . set I18NSPHINXOPTS=%SPHINXOPTS% . if NOT "%PAPER%" == "" ( set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% ) if "%1" == "" goto help if "%1" == "help" ( :help echo.Please use `make ^` where ^ is one of echo. html to make standalone HTML files echo. dirhtml to make HTML files named index.html in directories echo. singlehtml to make a single large HTML file echo. pickle to make pickle files echo. json to make JSON files echo. htmlhelp to make HTML files and a HTML help project echo. qthelp to make HTML files and a qthelp project echo. devhelp to make HTML files and a Devhelp project echo. 
epub to make an epub echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter echo. text to make text files echo. man to make manual pages echo. texinfo to make Texinfo files echo. gettext to make PO message catalogs echo. changes to make an overview over all changed/added/deprecated items echo. xml to make Docutils-native XML files echo. pseudoxml to make pseudoxml-XML files for display purposes echo. linkcheck to check all external links for integrity echo. doctest to run all doctests embedded in the documentation if enabled goto end ) if "%1" == "clean" ( for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i del /q /s %BUILDDIR%\* goto end ) %SPHINXBUILD% 2> nul if errorlevel 9009 ( echo. echo.The 'sphinx-build' command was not found. Make sure you have Sphinx echo.installed, then set the SPHINXBUILD environment variable to point echo.to the full path of the 'sphinx-build' executable. Alternatively you echo.may add the Sphinx directory to PATH. echo. echo.If you don't have Sphinx installed, grab it from echo.http://sphinx-doc.org/ exit /b 1 ) if "%1" == "html" ( %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/html. goto end ) if "%1" == "dirhtml" ( %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. goto end ) if "%1" == "singlehtml" ( %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. goto end ) if "%1" == "pickle" ( %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the pickle files. goto end ) if "%1" == "json" ( %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the JSON files. 
goto end ) if "%1" == "htmlhelp" ( %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can run HTML Help Workshop with the ^ .hhp project file in %BUILDDIR%/htmlhelp. goto end ) if "%1" == "qthelp" ( %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can run "qcollectiongenerator" with the ^ .qhcp project file in %BUILDDIR%/qthelp, like this: echo.^> qcollectiongenerator %BUILDDIR%\qthelp\project-template.qhcp echo.To view the help file: echo.^> assistant -collectionFile %BUILDDIR%\qthelp\project-template.ghc goto end ) if "%1" == "devhelp" ( %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp if errorlevel 1 exit /b 1 echo. echo.Build finished. goto end ) if "%1" == "epub" ( %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub if errorlevel 1 exit /b 1 echo. echo.Build finished. The epub file is in %BUILDDIR%/epub. goto end ) if "%1" == "latex" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex if errorlevel 1 exit /b 1 echo. echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. goto end ) if "%1" == "latexpdf" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex cd %BUILDDIR%/latex make all-pdf cd %BUILDDIR%/.. echo. echo.Build finished; the PDF files are in %BUILDDIR%/latex. goto end ) if "%1" == "latexpdfja" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex cd %BUILDDIR%/latex make all-pdf-ja cd %BUILDDIR%/.. echo. echo.Build finished; the PDF files are in %BUILDDIR%/latex. goto end ) if "%1" == "text" ( %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text if errorlevel 1 exit /b 1 echo. echo.Build finished. The text files are in %BUILDDIR%/text. goto end ) if "%1" == "man" ( %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man if errorlevel 1 exit /b 1 echo. echo.Build finished. The manual pages are in %BUILDDIR%/man. 
goto end ) if "%1" == "texinfo" ( %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo if errorlevel 1 exit /b 1 echo. echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. goto end ) if "%1" == "gettext" ( %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale if errorlevel 1 exit /b 1 echo. echo.Build finished. The message catalogs are in %BUILDDIR%/locale. goto end ) if "%1" == "changes" ( %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes if errorlevel 1 exit /b 1 echo. echo.The overview file is in %BUILDDIR%/changes. goto end ) if "%1" == "linkcheck" ( %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck if errorlevel 1 exit /b 1 echo. echo.Link check complete; look for any errors in the above output ^ or in %BUILDDIR%/linkcheck/output.txt. goto end ) if "%1" == "doctest" ( %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest if errorlevel 1 exit /b 1 echo. echo.Testing of doctests in the sources finished, look at the ^ results in %BUILDDIR%/doctest/output.txt. goto end ) if "%1" == "xml" ( %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml if errorlevel 1 exit /b 1 echo. echo.Build finished. The XML files are in %BUILDDIR%/xml. goto end ) if "%1" == "pseudoxml" ( %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml if errorlevel 1 exit /b 1 echo. echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. goto end ) :end UpSetPlot-0.8.0/doc/requirements.txt000066400000000000000000000001741435554746600174740ustar00rootroot00000000000000numpy scipy pandas matplotlib numpydoc sphinx-gallery sphinx-issues seaborn scikit-learn nbsphinx sphinx<2 sphinx-rtd-theme UpSetPlot-0.8.0/examples/000077500000000000000000000000001435554746600152575ustar00rootroot00000000000000UpSetPlot-0.8.0/examples/README.txt000066400000000000000000000001171435554746600167540ustar00rootroot00000000000000.. _general_examples: Examples ======== Introductory examples for upsetplot. 
UpSetPlot-0.8.0/examples/plot_diabetes.py000066400000000000000000000051111435554746600204450ustar00rootroot00000000000000""" ================================== Above-average features in Diabetes ================================== Explore above-average attributes in the Diabetes dataset (Efron et al., 2004). Here we take some features correlated with disease progression, and look at the distribution of that disease progression value when each of these features is above average. The most correlated features are: - bmi body mass index - bp average blood pressure - s4 tch, total cholesterol / HDL - s5 ltg, possibly log of serum triglycerides level - s6 glu, blood sugar level This kind of dataset analysis may not be a practical use of UpSet, but helps to illustrate the :meth:`UpSet.add_catplot` feature. """ import pandas as pd from sklearn.datasets import load_diabetes from matplotlib import pyplot as plt from upsetplot import UpSet # Load the dataset into a DataFrame diabetes = load_diabetes() diabetes_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names) # Get five features most correlated with disease progression correls = diabetes_df.corrwith(pd.Series(diabetes.target), method='spearman').sort_values() top_features = correls.index[-5:] # Get a binary indicator of whether each top feature is above its median diabetes_above_avg = diabetes_df > diabetes_df.median(axis=0) diabetes_above_avg = diabetes_above_avg[top_features] diabetes_above_avg = diabetes_above_avg.rename(columns=lambda x: x + '>') # Make this indicator mask an index of diabetes_df diabetes_df = pd.concat([diabetes_df, diabetes_above_avg], axis=1) diabetes_df = diabetes_df.set_index(list(diabetes_above_avg.columns)) # Also give us access to the target (disease progression) diabetes_df = diabetes_df.assign(progression=diabetes.target) ########################################################################## # UpSet plot it!
upset = UpSet(diabetes_df, subset_size='count', intersection_plot_elements=3) upset.add_catplot(value='progression', kind='strip', color='blue') upset.add_catplot(value='bmi', kind='strip', color='black') upset.plot() plt.title("UpSet with catplots, for orientation='horizontal'") plt.show() ########################################################################## # And again in vertical orientation upset = UpSet(diabetes_df, subset_size='count', intersection_plot_elements=3, orientation='vertical') upset.add_catplot(value='progression', kind='strip', color='blue') upset.add_catplot(value='bmi', kind='strip', color='black') upset.plot() plt.title("UpSet with catplots, for orientation='vertical'") plt.show() UpSetPlot-0.8.0/examples/plot_discrete.py000066400000000000000000000022311435554746600204670ustar00rootroot00000000000000""" ================================================= Plotting discrete variables as stacked bar charts ================================================= Currently, a somewhat contrived example of `add_stacked_bars`.
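To illustrate what the stacked bars summarise, the following standard-library sketch counts, within each intersection of the boolean categories, how many records take each value of a discrete variable such as ``Sex``. It is not UpSetPlot's implementation. The helper ``stacked_counts`` and the toy records are hypothetical stand-ins for the Titanic data.

```python
from collections import Counter, defaultdict


def stacked_counts(records, categories, by):
    # Hypothetical helper: group records by the tuple of their boolean
    # category flags, then count the values of ``by`` within each group.
    result = defaultdict(Counter)
    for record in records:
        key = tuple(bool(record[cat]) for cat in categories)
        result[key][record[by]] += 1
    return {key: dict(counter) for key, counter in result.items()}


# Hypothetical toy records standing in for the Titanic data
records = [
    {"survived": True, "first_class": True, "sex": "female"},
    {"survived": True, "first_class": False, "sex": "female"},
    {"survived": False, "first_class": False, "sex": "male"},
    {"survived": False, "first_class": False, "sex": "male"},
    {"survived": True, "first_class": True, "sex": "male"},
]
counts = stacked_counts(records, ["survived", "first_class"], by="sex")
print(counts[(True, True)])    # {'female': 1, 'male': 1}
print(counts[(False, False)])  # {'male': 2}
```

Each inner dict corresponds to one stacked bar, and each key in it corresponds to one colour segment.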
""" import pandas as pd from upsetplot import UpSet from matplotlib import pyplot as plt from matplotlib import cm TITANIC_URL = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv' # noqa df = pd.read_csv(TITANIC_URL) # Show UpSet on survival and first class df = df.set_index(df.Survived == 1).set_index(df.Pclass == 1, append=True) upset = UpSet(df, intersection_plot_elements=0) # disable the default bar chart upset.add_stacked_bars(by="Sex", colors=cm.Pastel1, title="Count by gender", elements=10) upset.plot() plt.suptitle("Gender for first class and survival on Titanic") plt.show() upset = UpSet(df, show_counts=True, orientation="vertical", intersection_plot_elements=0) upset.add_stacked_bars(by="Sex", colors=cm.Pastel1, title="Count by gender", elements=10) upset.plot() plt.suptitle("Same, but vertical, with counts shown") plt.show() UpSetPlot-0.8.0/examples/plot_generated.py000066400000000000000000000021371435554746600206300ustar00rootroot00000000000000""" ============================ Plotting with generated data ============================ This example illustrates basic plotting functionality using generated data.
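``generate_counts`` works roughly as follows. For each category, it draws a random score per sample and marks the sample as a member when the score exceeds a random per-category threshold. It then counts the samples that share each membership pattern. The sketch below mirrors that idea with the standard-library ``random`` module instead of NumPy, so it will not reproduce the library's exact numbers. ``sketch_generate_counts`` is a hypothetical name.

```python
import random
from collections import Counter


def sketch_generate_counts(seed=0, n_samples=10000, n_categories=3):
    # Hypothetical stdlib mirror of upsetplot.generate_counts; the draw
    # order and RNG differ from the NumPy-based original.
    rng = random.Random(seed)
    memberships = [[] for _ in range(n_samples)]
    for _ in range(n_categories):
        threshold = rng.random()  # one random threshold per category
        for flags in memberships:
            flags.append(rng.random() > threshold)
    # Count samples sharing each boolean membership pattern
    return Counter(tuple(flags) for flags in memberships)


counts = sketch_generate_counts(n_samples=1000)
print(sum(counts.values()))  # 1000
```

The keys play the same role as the boolean index levels in the Series that ``generate_counts`` returns.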
""" from matplotlib import pyplot as plt from upsetplot import generate_counts, plot example = generate_counts() print(example) ########################################################################## plot(example) plt.suptitle('Ordered by degree') plt.show() ########################################################################## plot(example, sort_by='cardinality') plt.suptitle('Ordered by cardinality') plt.show() ########################################################################## plot(example, show_counts='{:d}') plt.suptitle('With counts shown') plt.show() ########################################################################## plot(example, show_counts='%d', show_percentages=True) plt.suptitle('With counts and % shown') plt.show() ########################################################################## plot(example, show_percentages="{:.2%}") plt.suptitle('With fraction shown in custom format') plt.show() UpSetPlot-0.8.0/examples/plot_hide.py000066400000000000000000000021221435554746600175750ustar00rootroot00000000000000""" ====================================== Hiding subsets based on size or degree ====================================== This illustrates the use of ``min_subset_size``, ``max_subset_size``, ``min_degree`` or ``max_degree``. 
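The filtering these options describe can be pictured with the plain-Python sketch below. It is not the library's code. It assumes that the size bounds and degree bounds are inclusive, and it treats a subset's degree as the number of categories it belongs to. The helper ``filter_subsets`` and the counts are hypothetical. Only the keyword names follow the options above.

```python
def filter_subsets(counts, min_subset_size=None, max_subset_size=None,
                   min_degree=None, max_degree=None):
    # counts maps a tuple of boolean membership flags to a subset size;
    # degree = number of True flags. Bounds are treated as inclusive.
    kept = {}
    for flags, size in counts.items():
        degree = sum(flags)
        if min_subset_size is not None and size < min_subset_size:
            continue
        if max_subset_size is not None and size > max_subset_size:
            continue
        if min_degree is not None and degree < min_degree:
            continue
        if max_degree is not None and degree > max_degree:
            continue
        kept[flags] = size
    return kept


counts = {
    (False, False, False): 50,
    (True, False, False): 700,
    (True, True, False): 120,
    (True, True, True): 30,
}
print(filter_subsets(counts, min_subset_size=100))
# {(True, False, False): 700, (True, True, False): 120}
print(filter_subsets(counts, min_degree=2))
# {(True, True, False): 120, (True, True, True): 30}
```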
""" from matplotlib import pyplot as plt from upsetplot import generate_counts, plot example = generate_counts() plot(example, show_counts=True) plt.suptitle('Nothing hidden') plt.show() ########################################################################## plot(example, show_counts=True, min_subset_size=100) plt.suptitle('Small subsets hidden') plt.show() ########################################################################## plot(example, show_counts=True, max_subset_size=500) plt.suptitle('Large subsets hidden') plt.show() ########################################################################## plot(example, show_counts=True, min_degree=2) plt.suptitle('Degree <2 hidden') plt.show() ########################################################################## plot(example, show_counts=True, max_degree=2) plt.suptitle('Degree >2 hidden') plt.show() UpSetPlot-0.8.0/examples/plot_highlight.py000066400000000000000000000037331435554746600206440ustar00rootroot00000000000000""" ============================= Highlighting selected subsets ============================= Demonstrates use of the `style_subsets` method to mark some subsets as different. """ from matplotlib import pyplot as plt from upsetplot import generate_counts, UpSet example = generate_counts() ########################################################################## # Subsets can be styled by the categories present in them, and a legend # can be optionally generated. upset = UpSet(example) upset.style_subsets(present=["cat1", "cat2"], facecolor="blue", label="special") upset.plot() plt.suptitle("Paint blue subsets including both cat1 and cat2; show a legend") plt.show() ########################################################################## # ... or styling can be applied by the categories absent in a subset. 
upset = UpSet(example, orientation="vertical") upset.style_subsets(present="cat2", absent="cat1", edgecolor="red", linewidth=2) upset.plot() plt.suptitle("Border for subsets including cat2 but not cat1") plt.show() ########################################################################## # ... or their size or degree. upset = UpSet(example) upset.style_subsets(min_subset_size=1000, facecolor="lightblue", hatch="xx", label="big") upset.plot() plt.suptitle("Hatch subsets with size >1000") plt.show() ########################################################################## # Multiple stylings can be applied with different criteria in the same # plot. upset = UpSet(example, facecolor="gray") upset.style_subsets(present="cat0", label="Contains cat0", facecolor="blue") upset.style_subsets(present="cat1", label="Contains cat1", hatch="xx") upset.style_subsets(present="cat2", label="Contains cat2", edgecolor="red") # reduce legend size: params = {'legend.fontsize': 8} with plt.rc_context(params): upset.plot() plt.suptitle("Styles for every category!") plt.show() UpSetPlot-0.8.0/examples/plot_missingness.py000066400000000000000000000012371435554746600212340ustar00rootroot00000000000000""" ======================================= Plot the distribution of missing values ======================================= UpSet plots are often used to show which variables are missing together. Passing a callable ``indicators=pd.isna`` to :func:`from_indicators` is an easy way to categorise a record by the variables that are missing in it. 
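Passing ``pd.isna`` to ``from_indicators`` groups records by their missingness pattern. The standard-library sketch below illustrates that grouping. It treats ``None`` as missing, an assumption standing in for ``pd.isna``, and counts the records that share each pattern. The helper ``missingness_patterns`` and the Titanic-like field names are hypothetical.

```python
from collections import Counter


def missingness_patterns(records, fields):
    # For each record, build a tuple saying which fields are missing (None),
    # then count how many records share each pattern.
    return Counter(
        tuple(record.get(field) is None for field in fields)
        for record in records
    )


records = [
    {"age": None, "cabin": None, "embarked": "S"},
    {"age": 38.0, "cabin": "C85", "embarked": "C"},
    {"age": None, "cabin": None, "embarked": "Q"},
    {"age": 35.0, "cabin": None, "embarked": "S"},
]
patterns = missingness_patterns(records, ["age", "cabin", "embarked"])
print(patterns[(True, True, False)])   # 2
print(patterns[(False, True, False)])  # 1
```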
""" from matplotlib import pyplot as plt import pandas as pd from upsetplot import plot, from_indicators TITANIC_URL = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv' # noqa data = pd.read_csv(TITANIC_URL) plot(from_indicators(indicators=pd.isna, data=data), show_counts=True) plt.show() UpSetPlot-0.8.0/examples/plot_sizing.py000066400000000000000000000026721435554746600202010ustar00rootroot00000000000000""" ======================================== Customising element size and figure size ======================================== This example illustrates controlling sizing within an UpSet plot. """ from matplotlib import pyplot as plt from upsetplot import generate_counts, plot example = generate_counts() print(example) plot(example) plt.suptitle('Defaults') plt.show() ########################################################################## # upsetplot uses a grid of square "elements" to display. Controlling the # size of these elements affects all components of the plot. plot(example, element_size=40) plt.suptitle('Increased element_size') plt.show() ########################################################################## # When setting ``figsize`` explicitly, you then need to pass the figure to # ``plot``, and use ``element_size=None`` for optimal sizing. fig = plt.figure(figsize=(10, 3)) plot(example, fig=fig, element_size=None) plt.suptitle('Setting figsize explicitly') plt.show() ########################################################################## # Components in the plot can be resized by indicating how many elements # they should equate to. 
plot(example, intersection_plot_elements=3) plt.suptitle('Decreased intersection_plot_elements') plt.show() ########################################################################## plot(example, totals_plot_elements=5) plt.suptitle('Increased totals_plot_elements') plt.show() UpSetPlot-0.8.0/examples/plot_theming.py000066400000000000000000000050601435554746600203230ustar00rootroot00000000000000""" ==================== Changing Plot Colors ==================== This example illustrates use of matplotlib and upsetplot color settings, aside from matplotlib style sheets, which can control colors as well as grid lines, fonts and tick display. Upsetplot provides some color settings: * ``facecolor``: sets the color for intersection size bars, and for active matrix dots. Defaults to white on a dark background, otherwise black. * ``other_dots_color``: sets the color for other (inactive) dots. Specify as a color, or a float specifying opacity relative to facecolor. * ``shading_color``: sets the color of odd rows. Specify as a color, or a float specifying opacity relative to facecolor.
For an introduction to matplotlib theming see: * `Tutorial `__ * `Reference `__ """ from matplotlib import pyplot as plt from upsetplot import generate_counts, plot example = generate_counts() plot(example, facecolor="darkblue") plt.suptitle('facecolor="darkblue"') plt.show() ########################################################################## plot(example, facecolor="darkblue", shading_color="lightgray") plt.suptitle('facecolor="darkblue", shading_color="lightgray"') plt.show() ########################################################################## with plt.style.context('Solarize_Light2'): plot(example) plt.suptitle('matplotlib Solarize_Light2 stylesheet') plt.show() ########################################################################## with plt.style.context('dark_background'): plot(example, show_counts=True) plt.suptitle('matplotlib dark_background stylesheet') plt.show() ########################################################################## with plt.style.context('dark_background'): plot(example, show_counts=True, shading_color=.15) plt.suptitle('matplotlib dark_background stylesheet, shading_color=.15') plt.show() ########################################################################## with plt.style.context('dark_background'): plot(example, show_counts=True, facecolor="red") plt.suptitle('matplotlib dark_background, facecolor="red"') plt.show() ########################################################################## with plt.style.context('dark_background'): plot(example, show_counts=True, facecolor="red", other_dots_color=.4, shading_color=.2) plt.suptitle('dark_background, red face, stronger other colors') plt.show() UpSetPlot-0.8.0/examples/plot_vertical.py000066400000000000000000000013431435554746600205010ustar00rootroot00000000000000""" ==================== Vertical orientation ==================== This illustrates the effect of orientation='vertical'.
""" from matplotlib import pyplot as plt from upsetplot import generate_counts, plot example = generate_counts() plot(example, orientation='vertical') plt.suptitle('A vertical plot') plt.show() ########################################################################## plot(example, orientation='vertical', show_counts='{:d}') plt.suptitle('A vertical plot with counts shown') plt.show() ########################################################################## plot(example, orientation='vertical', show_counts='{:d}', show_percentages=True) plt.suptitle('With counts and percentages shown') plt.show() UpSetPlot-0.8.0/setup.cfg000066400000000000000000000013731435554746600152660ustar00rootroot00000000000000[metadata] description = Draw Lex et al.'s UpSet plots with Pandas and Matplotlib long_description = file: README.rst author = Joel Nothman author_email = joel.nothman@gmail.com url = https://upsetplot.readthedocs.io license = BSD 3-Clause License classifiers = License :: OSI Approved :: BSD License Programming Language :: Python :: 3 Programming Language :: Python :: 3.6 Programming Language :: Python :: 3.10 Topic :: Scientific/Engineering :: Visualization Intended Audience :: Science/Research [aliases] test = pytest [tool:pytest] addopts = --doctest-modules --verbose --cov=upsetplot --showlocals # --cov=upsetplot testpaths = upsetplot README.rst doctest_optionflags = ALLOW_UNICODE NORMALIZE_WHITESPACE ELLIPSIS [flake8] ignore = W503,W504 UpSetPlot-0.8.0/setup.py000066400000000000000000000015661435554746600151630ustar00rootroot00000000000000#!/usr/bin/env python import os import sys from setuptools import setup def setup_package(): src_path = os.path.dirname(os.path.abspath(sys.argv[0])) old_path = os.getcwd() os.chdir(src_path) sys.path.insert(0, src_path) try: os.environ['__in-setup'] = '1' # ensures only version is imported from upsetplot import __version__ as version # See also setup.cfg setup(name='UpSetPlot', version=version, packages=["upsetplot"], 
license='BSD-3-Clause', setup_requires=['pytest-runner'], tests_require=['pytest>=2.7', 'pytest-cov<2.6'], # TODO: check versions install_requires=['pandas>=0.23', 'matplotlib>=2.0']) finally: del sys.path[0] os.chdir(old_path) return if __name__ == '__main__': setup_package() UpSetPlot-0.8.0/upsetplot/000077500000000000000000000000001435554746600155005ustar00rootroot00000000000000UpSetPlot-0.8.0/upsetplot/__init__.py000066400000000000000000000007571435554746600176220ustar00rootroot00000000000000__version__ = '0.8.0' import os if os.environ.get('__in-setup', None) != '1': from .plotting import UpSet, plot from .data import (generate_counts, generate_data, generate_samples, from_memberships, from_contents, from_indicators) from .reformat import query __all__ = ['UpSet', 'generate_data', 'generate_counts', 'generate_samples', 'plot', 'from_memberships', 'from_contents', 'from_indicators', 'query'] UpSetPlot-0.8.0/upsetplot/data.py000066400000000000000000000336011435554746600167660ustar00rootroot00000000000000from __future__ import print_function, division, absolute_import from numbers import Number import functools from distutils.version import LooseVersion import warnings import pandas as pd import numpy as np def generate_samples(seed=0, n_samples=10000, n_categories=3): """Generate artificial samples assigned to set intersections Parameters ---------- seed : int A seed for randomisation n_samples : int Number of samples to generate n_categories : int Number of categories (named "cat0", "cat1", ...) to generate Returns ------- DataFrame Field 'value' is a weight or score for each element. Field 'index' is a unique id for each element. Index includes a boolean indicator mask for each category. Note: Further fields may be added in future versions. See Also -------- generate_counts : Generates the counts for each subset of categories corresponding to these samples. 
""" rng = np.random.RandomState(seed) df = pd.DataFrame({'value': np.zeros(n_samples)}) for i in range(n_categories): r = rng.rand(n_samples) df['cat%d' % i] = r > rng.rand() df['value'] += r df.reset_index(inplace=True) df.set_index(['cat%d' % i for i in range(n_categories)], inplace=True) return df def generate_counts(seed=0, n_samples=10000, n_categories=3): """Generate artificial counts corresponding to set intersections Parameters ---------- seed : int A seed for randomisation n_samples : int Number of samples to generate statistics over n_categories : int Number of categories (named "cat0", "cat1", ...) to generate Returns ------- Series Counts indexed by boolean indicator mask for each category. See Also -------- generate_samples : Generates a DataFrame of samples that these counts are derived from. """ df = generate_samples(seed=seed, n_samples=n_samples, n_categories=n_categories) return df.value.groupby(level=list(range(n_categories))).count() def generate_data(seed=0, n_samples=10000, n_sets=3, aggregated=False): warnings.warn('generate_data was replaced by generate_counts in version ' '0.3 and will be removed in version 0.4.', DeprecationWarning) if aggregated: return generate_counts(seed=seed, n_samples=n_samples, n_categories=n_sets) else: return generate_samples(seed=seed, n_samples=n_samples, n_categories=n_sets)['value'] def from_indicators(indicators, data=None): """Load category membership indicated by a boolean indicator matrix This loader also supports the case where the indicator columns can be derived from `data`. .. versionadded:: 0.6 Parameters ---------- indicators : DataFrame-like of booleans, Sequence of str, or callable Specifies the category indicators (boolean mask arrays) within ``data``, i.e. which records in ``data`` belong to which categories. If a list of strings, these should be column names found in ``data`` whose values are boolean mask arrays. 
If a DataFrame, its columns should correspond to categories, its index should be a subset of the index of ``data``, and its values should be True where a data record is in that category, and False or NA otherwise. If callable, it will be applied to ``data`` after the latter is converted to a Series or DataFrame. data : Series-like or DataFrame-like, optional If given, the index of category membership is attached to this data. It must have the same length as `indicators`. If not given, the series will contain the value 1. Returns ------- DataFrame or Series `data` is returned with its index indicating category membership. It will be a Series if `data` is a Series or 1d numeric array or None. Notes ----- Categories with indicators that are all False will be removed. Examples -------- >>> import pandas as pd >>> from upsetplot import from_indicators >>> >>> # Just indicators: >>> indicators = {"cat1": [True, False, True, False], ... "cat2": [False, True, False, False], ... "cat3": [True, True, False, False]} >>> from_indicators(indicators) cat1 cat2 cat3 True False True 1.0 False True True 1.0 True False False 1.0 False False False 1.0 Name: ones, dtype: float64 >>> >>> # Where indicators are included within data, specifying >>> # columns by name: >>> data = pd.DataFrame({"value": [5, 4, 6, 4], **indicators}) >>> from_indicators(["cat1", "cat3"], data=data) value cat1 cat2 cat3 cat1 cat3 True True 5 True False True False True 4 False True True True False 6 True False False False False 4 False False False >>> >>> # Making indicators out of all boolean columns: >>> from_indicators(lambda data: data.select_dtypes(bool), data=data) value cat1 cat2 cat3 cat1 cat2 cat3 True False True 5 True False True False True True 4 False True True True False False 6 True False False False False 4 False False False >>> >>> # Using a dataset with missing data, we can use missingness as >>> # an indicator: >>> data = pd.DataFrame({"val1": [pd.NA, .7, pd.NA, .9], ...
"val2": ["male", pd.NA, "female", "female"], ... "val3": [pd.NA, pd.NA, 23000, 78000]}) >>> from_indicators(pd.isna, data=data) val1 val2 val3 val1 val2 val3 True False True male False True True 0.7 True False False female 23000 False False False 0.9 female 78000 """ if data is not None: data = _convert_to_pandas(data) if callable(indicators): if data is None: raise ValueError("data must be provided when indicators is " "callable") indicators = indicators(data) try: indicators[0] except Exception: pass else: if isinstance(indicators[0], (str, int)): if data is None: raise ValueError("data must be provided when indicators are " "specified as a list of columns") if isinstance(indicators, tuple): raise ValueError("indicators as tuple is not supported") # column array indicators = data[indicators] indicators = pd.DataFrame(indicators).fillna(False).infer_objects() # drop all-False (should we be dropping all-True also? making an option?) indicators = indicators.loc[:, indicators.any(axis=0)] if not all(dtype.kind == 'b' for dtype in indicators.dtypes): raise ValueError('The indicators must all be boolean') if data is not None: if not (isinstance(indicators.index, pd.RangeIndex) and indicators.index[0] == 0 and indicators.index[-1] == len(data) - 1): # index is specified on indicators. 
Need to align it to data if not indicators.index.isin(data.index).all(): raise ValueError("If indicators.index is not the default, " "all its values must be present in " "data.index") indicators = indicators.reindex(index=data.index, fill_value=False) else: data = pd.Series(np.ones(len(indicators)), name="ones") indicators.set_index(list(indicators.columns), inplace=True) data.index = indicators.index return data def _convert_to_pandas(data, copy=True): is_series = False if hasattr(data, 'loc'): if copy: data = data.copy(deep=False) is_series = data.ndim == 1 elif len(data): try: is_series = isinstance(data[0], Number) except KeyError: is_series = False if is_series: data = pd.Series(data) else: data = pd.DataFrame(data) return data def from_memberships(memberships, data=None): """Load data where each sample has a collection of category names The output should be suitable for passing to `UpSet` or `plot`. Parameters ---------- memberships : sequence of collections of strings Each element corresponds to a data point, indicating the sets it is a member of. Each category is named by a string. data : Series-like or DataFrame-like, optional If given, the index of category memberships is attached to this data. It must have the same length as `memberships`. If not given, the series will contain the value 1. Returns ------- DataFrame or Series `data` is returned with its index indicating category membership. It will be a Series if `data` is a Series or 1d numeric array. The index will have levels ordered by category names. Examples -------- >>> from upsetplot import from_memberships >>> from_memberships([ ... ['cat1', 'cat3'], ... ['cat2', 'cat3'], ... ['cat1'], ... [] ... ]) cat1 cat2 cat3 True False True 1 False True True 1 True False False 1 False False False 1 Name: ones, dtype: ... >>> # now with data: >>> import numpy as np >>> from_memberships([ ... ['cat1', 'cat3'], ... ['cat2', 'cat3'], ... ['cat1'], ... [] ... 
], data=np.arange(12).reshape(4, 3)) 0 1 2 cat1 cat2 cat3 True False True 0 1 2 False True True 3 4 5 True False False 6 7 8 False False False 9 10 11 """ df = pd.DataFrame([{name: True for name in names} for names in memberships]) for set_name in df.columns: if not hasattr(set_name, 'lower'): raise ValueError('Category names should be strings') if df.shape[1] == 0: raise ValueError('Require at least one category. None were found.') df.sort_index(axis=1, inplace=True) df.fillna(False, inplace=True) df = df.astype(bool) df.set_index(list(df.columns), inplace=True) if data is None: return df.assign(ones=1)['ones'] data = _convert_to_pandas(data) if len(data) != len(df): raise ValueError('memberships and data must have the same length. ' 'Got len(memberships) == %d, len(data) == %d' % (len(memberships), len(data))) data.index = df.index return data def from_contents(contents, data=None, id_column='id'): """Build data from category listings Parameters ---------- contents : Mapping (or iterable over pairs) of strings to sets Keys are category names, values are sets of identifiers (int or string). data : DataFrame, optional If provided, this should be indexed by the identifiers used in `contents`. id_column : str, default='id' The column name to use for the identifiers in the output. Returns ------- DataFrame `data` is returned with its index indicating category membership, including a column named according to id_column. If data is not given, the order of rows is not assured. Notes ----- The order of categories in the output DataFrame is determined from `contents`, which may have non-deterministic iteration order. Examples -------- >>> from upsetplot import from_contents >>> contents = {'cat1': ['a', 'b', 'c'], ... 'cat2': ['b', 'd'], ... 'cat3': ['e']} >>> from_contents(contents) id cat1 cat2 cat3 True False False a True False b False False c False True False d False True e >>> import pandas as pd >>> contents = {'cat1': [0, 1, 2], ... 'cat2': [1, 3], ... 
'cat3': [4]} >>> data = pd.DataFrame({'favourite': ['green', 'red', 'red', ... 'yellow', 'blue']}) >>> from_contents(contents, data=data) id favourite cat1 cat2 cat3 True False False 0 green True False 1 red False False 2 red False True False 3 yellow False True 4 blue """ cat_series = [pd.Series(True, index=list(elements), name=name) for name, elements in contents.items()] if not all(s.index.is_unique for s in cat_series): raise ValueError('Got duplicate ids in a category') concat = pd.concat if LooseVersion(pd.__version__) >= '0.23.0': # silence the warning concat = functools.partial(concat, sort=False) df = concat(cat_series, axis=1) if id_column in df.columns: raise ValueError('A category cannot be named %r' % id_column) df.fillna(False, inplace=True) cat_names = list(df.columns) if data is not None: if set(df.columns).intersection(data.columns): raise ValueError('Data columns overlap with category names') if id_column in data.columns: raise ValueError('data cannot contain a column named %r' % id_column) not_in_data = df.drop(data.index, axis=0, errors='ignore') if len(not_in_data): raise ValueError('Found identifiers in contents that are not in ' 'data: %r' % not_in_data.index.values) df = df.reindex(index=data.index).fillna(False) df = concat([data, df], axis=1) df.index.name = id_column return df.reset_index().set_index(cat_names) UpSetPlot-0.8.0/upsetplot/plotting.py000066400000000000000000001047661435554746600177300ustar00rootroot00000000000000from __future__ import print_function, division, absolute_import try: import typing except ImportError: import collections as typing import numpy as np import pandas as pd import matplotlib from matplotlib import pyplot as plt from matplotlib import colors from matplotlib import patches from .reformat import query, _get_subset_mask from . 
import util # prevents ImportError on matplotlib versions >3.5.2 try: from matplotlib.tight_layout import get_renderer RENDERER_IMPORTED = True except ImportError: RENDERER_IMPORTED = False def _process_data(df, *, sort_by, sort_categories_by, subset_size, sum_over, min_subset_size=None, max_subset_size=None, min_degree=None, max_degree=None, reverse=False, include_empty_subsets=False): results = query(df, sort_by=sort_by, sort_categories_by=sort_categories_by, subset_size=subset_size, sum_over=sum_over, min_subset_size=min_subset_size, max_subset_size=max_subset_size, min_degree=min_degree, max_degree=max_degree, include_empty_subsets=include_empty_subsets) df = results.data agg = results.subset_sizes totals = results.category_totals total = agg.sum() # add '_bin' to df indicating index in agg # XXX: ugly! def _pack_binary(X): X = pd.DataFrame(X) # use objects if arbitrary precision integers are needed dtype = np.object_ if X.shape[1] > 62 else np.uint64 out = pd.Series(0, index=X.index, dtype=dtype) for i, (_, col) in enumerate(X.items()): out *= 2 out += col return out df_packed = _pack_binary(df.index.to_frame()) data_packed = _pack_binary(agg.index.to_frame()) df['_bin'] = pd.Series(df_packed).map( pd.Series(np.arange(len(data_packed))[::-1 if reverse else 1], index=data_packed)) if reverse: agg = agg[::-1] return total, df, agg, totals def _multiply_alpha(c, mult): r, g, b, a = colors.to_rgba(c) a *= mult return colors.to_hex((r, g, b, a), keep_alpha=True) class _Transposed: """Wrap an object in order to transpose some plotting operations Attributes of obj will be mapped. Keyword arguments when calling obj will be mapped. The mapping is not recursive: callable attributes need to be _Transposed again. 
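
    For example, with a toy stand-in object (``_LabelRecorder`` is
    hypothetical and defined only for this illustration), a call to
    ``set_xlabel`` on the wrapper is redirected to ``set_ylabel``:

    >>> class _LabelRecorder:
    ...     def set_ylabel(self, label):
    ...         return 'ylabel=' + label
    >>> _Transposed(_LabelRecorder()).set_xlabel('count')
    'ylabel=count'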
""" def __init__(self, obj): self.__obj = obj def __getattr__(self, key): return getattr(self.__obj, self._NAME_TRANSPOSE.get(key, key)) def __call__(self, *args, **kwargs): return self.__obj(*args, **{self._NAME_TRANSPOSE.get(k, k): v for k, v in kwargs.items()}) _NAME_TRANSPOSE = { 'width': 'height', 'height': 'width', 'hspace': 'wspace', 'wspace': 'hspace', 'hlines': 'vlines', 'vlines': 'hlines', 'bar': 'barh', 'barh': 'bar', 'xaxis': 'yaxis', 'yaxis': 'xaxis', 'left': 'bottom', 'right': 'top', 'top': 'right', 'bottom': 'left', 'sharex': 'sharey', 'sharey': 'sharex', 'get_figwidth': 'get_figheight', 'get_figheight': 'get_figwidth', 'set_figwidth': 'set_figheight', 'set_figheight': 'set_figwidth', 'set_xlabel': 'set_ylabel', 'set_ylabel': 'set_xlabel', 'set_xlim': 'set_ylim', 'set_ylim': 'set_xlim', 'get_xlim': 'get_ylim', 'get_ylim': 'get_xlim', 'set_autoscalex_on': 'set_autoscaley_on', 'set_autoscaley_on': 'set_autoscalex_on', } def _transpose(obj): if isinstance(obj, str): return _Transposed._NAME_TRANSPOSE.get(obj, obj) return _Transposed(obj) def _identity(obj): return obj class UpSet: """Manage the data and drawing for a basic UpSet plot Primary public method is :meth:`plot`. Parameters ---------- data : pandas.Series or pandas.DataFrame Elements associated with categories (a DataFrame), or the size of each subset of categories (a Series). Should have MultiIndex where each level is binary, corresponding to category membership. If a DataFrame, `sum_over` must be a column name or None. orientation : {'horizontal' (default), 'vertical'} If horizontal, intersections are listed from left to right. sort_by : {'cardinality', 'degree', '-cardinality', '-degree', 'input', '-input'} If 'cardinality', subsets are listed from largest to smallest. If 'degree', they are listed in order of the number of categories intersected. If 'input', the order they appear in the data input is used. Prefix with '-' to reverse the ordering.
Note this affects ``subset_sizes`` but not ``data``. sort_categories_by : {'cardinality', '-cardinality', 'input', '-input'} Whether to sort the categories by total cardinality, or leave them in the input data's provided order (order of index levels). Prefix with '-' to reverse the ordering. subset_size : {'auto', 'count', 'sum'} Configures how to calculate the size of a subset. Choices are: 'auto' (default) If `data` is a DataFrame, count the number of rows in each group, unless `sum_over` is specified. If `data` is a Series with at most one row for each group, use the value of the Series. If `data` is a Series with more than one row per group, raise a ValueError. 'count' Count the number of rows in each group. 'sum' Sum the value of the `data` Series, or the DataFrame field specified by `sum_over`. sum_over : str or None If `subset_size='sum'` or `'auto'`, then the intersection size is the sum of the specified field in the `data` DataFrame. If a Series, only None is supported and its value is summed. min_subset_size : int, optional Minimum size of a subset to be shown in the plot. All subsets with a size smaller than this threshold will be omitted from plotting. Size may be a sum of values, see `subset_size`. .. versionadded:: 0.5 max_subset_size : int, optional Maximum size of a subset to be shown in the plot. All subsets with a size greater than this threshold will be omitted from plotting. .. versionadded:: 0.5 min_degree : int, optional Minimum degree of a subset to be shown in the plot. .. versionadded:: 0.5 max_degree : int, optional Maximum degree of a subset to be shown in the plot. .. versionadded:: 0.5 facecolor : 'auto' or matplotlib color or float Color for bar charts and active dots. Defaults to black if axes.facecolor is a light color, otherwise white. .. versionchanged:: 0.6 Before 0.6, the default was 'black' other_dots_color : matplotlib color or float Color for shading of inactive dots, or opacity (between 0 and 1) applied to facecolor. .. 
versionadded:: 0.6 shading_color : matplotlib color or float Color for shading of odd rows in matrix and totals, or opacity (between 0 and 1) applied to facecolor. .. versionadded:: 0.6 with_lines : bool Whether to show lines joining dots in the matrix, to mark multiple categories being intersected. element_size : float or None Side length in pt. If None, size is estimated to fit figure. intersection_plot_elements : int The intersections plot should be large enough to fit this many matrix elements. Set to 0 to disable intersection size bars. .. versionchanged:: 0.4 Setting to 0 is handled. totals_plot_elements : int The totals plot should be large enough to fit this many matrix elements. show_counts : bool or str, default=False Whether to label the intersection size bars with the cardinality of the intersection. When a string, this formats the number. For example, '{:.0f}' is equivalent to True. Note that, for legacy reasons, if the string does not contain '{', it will be interpreted as a C-style format string, such as '%d'. show_percentages : bool or str, default=False Whether to label the intersection size bars with the percentage of the intersection relative to the total dataset. When a string, this formats the number representing a fraction of samples. For example, '{:.1%}' is the default, formatting .123 as 12.3%. This may be applied with or without show_counts. .. versionadded:: 0.4 include_empty_subsets : bool (default=False) If True, all possible category combinations will be shown as subsets, even when some are not present in data.
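
    Examples
    --------
    A minimal sketch on a toy dataset; the category names, color and
    legend label below are illustrative only. ``min_degree=1`` drops the
    empty membership, and subsets containing ``cat2`` are highlighted:

    >>> from upsetplot import UpSet, from_memberships
    >>> data = from_memberships([['cat1'], ['cat1', 'cat2'], ['cat2'], []])
    >>> upset = UpSet(data, show_counts=True, min_degree=1)
    >>> upset.style_subsets(present='cat2', facecolor='tab:red',
    ...                     label='has cat2')
    >>> axes = upset.plot()
    >>> sorted(axes)
    ['intersections', 'matrix', 'shading', 'totals']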
""" _default_figsize = (10, 6) def __init__(self, data, orientation='horizontal', sort_by='degree', sort_categories_by='cardinality', subset_size='auto', sum_over=None, min_subset_size=None, max_subset_size=None, min_degree=None, max_degree=None, facecolor='auto', other_dots_color=.18, shading_color=.05, with_lines=True, element_size=32, intersection_plot_elements=6, totals_plot_elements=2, show_counts='', show_percentages=False, include_empty_subsets=False): self._horizontal = orientation == 'horizontal' self._reorient = _identity if self._horizontal else _transpose if facecolor == 'auto': bgcolor = matplotlib.rcParams.get('axes.facecolor', 'white') r, g, b, a = colors.to_rgba(bgcolor) lightness = colors.rgb_to_hsv((r, g, b))[-1] * a facecolor = 'black' if lightness >= .5 else 'white' self._facecolor = facecolor self._shading_color = (_multiply_alpha(facecolor, shading_color) if isinstance(shading_color, float) else shading_color) self._other_dots_color = (_multiply_alpha(facecolor, other_dots_color) if isinstance(other_dots_color, float) else other_dots_color) self._with_lines = with_lines self._element_size = element_size self._totals_plot_elements = totals_plot_elements self._subset_plots = [{'type': 'default', 'id': 'intersections', 'elements': intersection_plot_elements}] if not intersection_plot_elements: self._subset_plots.pop() self._show_counts = show_counts self._show_percentages = show_percentages (self.total, self._df, self.intersections, self.totals) = _process_data( data, sort_by=sort_by, sort_categories_by=sort_categories_by, subset_size=subset_size, sum_over=sum_over, min_subset_size=min_subset_size, max_subset_size=max_subset_size, min_degree=min_degree, max_degree=max_degree, reverse=not self._horizontal, include_empty_subsets=include_empty_subsets) self.subset_styles = [{"facecolor": facecolor} for i in range(len(self.intersections))] self.subset_legend = [] # pairs of (style, label) def _swapaxes(self, x, y): if self._horizontal: return x, y 
return y, x def style_subsets(self, present=None, absent=None, min_subset_size=None, max_subset_size=None, min_degree=None, max_degree=None, facecolor=None, edgecolor=None, hatch=None, linewidth=None, linestyle=None, label=None): """Updates the style of selected subsets' bars and matrix dots Parameters are either used to select subsets, or to style them with attributes of :class:`matplotlib.patches.Patch`, apart from label, which adds a legend entry. Parameters ---------- present : str or list of str, optional Category or categories that must be present in subsets for styling. absent : str or list of str, optional Category or categories that must not be present in subsets for styling. min_subset_size : int, optional Minimum size of a subset to be styled. max_subset_size : int, optional Maximum size of a subset to be styled. min_degree : int, optional Minimum degree of a subset to be styled. max_degree : int, optional Maximum degree of a subset to be styled. facecolor : str or matplotlib color, optional Override the default UpSet facecolor for selected subsets. edgecolor : str or matplotlib color, optional Set the edgecolor for bars, dots, and the line between dots. hatch : str, optional Set the hatch. This will apply to intersection size bars, but not to matrix dots. linewidth : int, optional Line width in points for edges. linestyle : str, optional Line style for edges. 
label : str, optional If provided, a legend entry with this label will be added. """ style = {"facecolor": facecolor, "edgecolor": edgecolor, "hatch": hatch, "linewidth": linewidth, "linestyle": linestyle} style = {k: v for k, v in style.items() if v is not None} mask = _get_subset_mask(self.intersections, present=present, absent=absent, min_subset_size=min_subset_size, max_subset_size=max_subset_size, min_degree=min_degree, max_degree=max_degree) for idx in np.flatnonzero(mask): self.subset_styles[idx].update(style) if label is not None: if "facecolor" not in style: style["facecolor"] = self._facecolor for i, (other_style, other_label) in enumerate(self.subset_legend): if other_style == style: if other_label != label: self.subset_legend[i] = (style, other_label + '; ' + label) break else: self.subset_legend.append((style, label)) def _plot_bars(self, ax, data, title, colors=None, use_labels=False): ax = self._reorient(ax) ax.set_autoscalex_on(False) data_df = pd.DataFrame(data) if self._horizontal: data_df = data_df.loc[:, ::-1] # reverse: top row is top of stack # TODO: colors should be broadcastable to data_df shape if callable(colors): colors = colors(range(data_df.shape[1])) elif isinstance(colors, (str, type(None))): colors = [colors] * data_df.shape[1] # one color per stacked column if self._horizontal: colors = reversed(colors) x = np.arange(len(data_df)) cum_y = None all_rects = [] for (name, y), color in zip(data_df.items(), colors): rects = ax.bar(x, y, .5, cum_y, color=color, zorder=10, label=name if use_labels else None, align='center') cum_y = y if cum_y is None else cum_y + y all_rects.extend(rects) self._label_sizes(ax, rects, 'top' if self._horizontal else 'right') ax.xaxis.set_visible(False) for x in ['top', 'bottom', 'right']: ax.spines[self._reorient(x)].set_visible(False) tick_axis = ax.yaxis tick_axis.grid(True) ax.set_ylabel(title) return all_rects def _plot_stacked_bars(self, ax, by, sum_over, colors, title): df = self._df.set_index("_bin").set_index(by, append=True, drop=False) gb =
df.groupby(level=list(range(df.index.nlevels)), sort=True) if sum_over is None and "_value" in df.columns: data = gb["_value"].sum() elif sum_over is None: data = gb.size() else: data = gb[sum_over].sum() data = data.unstack(by).fillna(0) if isinstance(colors, str): colors = matplotlib.cm.get_cmap(colors) elif isinstance(colors, typing.Mapping): colors = data.columns.map(colors).values if pd.isna(colors).any(): raise KeyError("Some labels not mapped by colors: %r" % data.columns[pd.isna(colors)].tolist()) self._plot_bars(ax, data=data, colors=colors, title=title, use_labels=True) handles, labels = ax.get_legend_handles_labels() if self._horizontal: # Make legend order match visual stack order ax.legend(reversed(handles), reversed(labels)) else: ax.legend() def add_stacked_bars(self, by, sum_over=None, colors=None, elements=3, title=None): """Add a stacked bar chart over subsets when :func:`plot` is called. Used to plot categorical variable distributions within each subset. .. versionadded:: 0.6 Parameters ---------- by : str Column name within the dataframe for color coding the stacked bars, containing discrete or categorical values. sum_over : str, optional Ordinarily the bars will chart the size of each group. sum_over may specify a column which will be summed to determine the size of each bar. colors : Mapping, list-like, str or callable, optional The facecolors to use for bars corresponding to each discrete label, specified as one of: Mapping Maps from label to matplotlib-compatible color specification. list-like A list of matplotlib colors to apply to labels in order. str The name of a matplotlib colormap. callable When called with ``range(n_labels)``, this should return a list-like of that many colors. Matplotlib colormaps satisfy this callable API. None Uses the matplotlib default color cycle. elements : int, default=3 Size of the axes counted in number of matrix elements. title : str, optional The axis title labelling bar length.
Returns ------- None """ # TODO: allow sort_by = {"lexical", "sum_squares", "rev_sum_squares", # list of labels} self._subset_plots.append({'type': 'stacked_bars', 'by': by, 'sum_over': sum_over, 'colors': colors, 'title': title, 'id': 'extra%d' % len(self._subset_plots), 'elements': elements}) def add_catplot(self, kind, value=None, elements=3, **kw): """Add a seaborn catplot over subsets when :func:`plot` is called. Parameters ---------- kind : str One of {"point", "bar", "strip", "swarm", "box", "violin", "boxen"} value : str, optional Column name for the value to plot (i.e. y if orientation='horizontal'), required if `data` is a DataFrame. elements : int, default=3 Size of the axes counted in number of matrix elements. **kw : dict Additional keywords to pass to the seaborn function selected by `kind` (e.g. :func:`seaborn.boxplot` when ``kind='box'``). Our implementation automatically determines 'ax', 'data', 'x', 'y' and 'orient', so these are prohibited keys in `kw`. Returns ------- None """ assert not set(kw.keys()) & {'ax', 'data', 'x', 'y', 'orient'} if value is None: if '_value' not in self._df.columns: raise ValueError('value must be set if data is a DataFrame.
' 'Got %r' % value) else: if value not in self._df.columns: raise ValueError('value %r is not a column in data' % value) self._subset_plots.append({'type': 'catplot', 'value': value, 'kind': kind, 'id': 'extra%d' % len(self._subset_plots), 'elements': elements, 'kw': kw}) def _check_value(self, value): if value is None and '_value' in self._df.columns: value = '_value' elif value is None: raise ValueError('value can only be None when data is a Series') return value def _plot_catplot(self, ax, value, kind, kw): df = self._df value = self._check_value(value) kw = kw.copy() if self._horizontal: kw['orient'] = 'v' kw['x'] = '_bin' kw['y'] = value else: kw['orient'] = 'h' kw['x'] = value kw['y'] = '_bin' import seaborn kw['ax'] = ax getattr(seaborn, kind + 'plot')(data=df, **kw) ax = self._reorient(ax) if value == '_value': ax.set_ylabel('') ax.xaxis.set_visible(False) for x in ['top', 'bottom', 'right']: ax.spines[self._reorient(x)].set_visible(False) tick_axis = ax.yaxis tick_axis.grid(True) def make_grid(self, fig=None): """Get a SubplotSpec for each Axes, accounting for label text width """ n_cats = len(self.totals) n_inters = len(self.intersections) if fig is None: fig = plt.gcf() # Determine text size to determine figure size / spacing text_kw = {"size": matplotlib.rcParams['xtick.labelsize']} # adding "x" ensures a margin t = fig.text(0, 0, '\n'.join(str(label) + "x" for label in self.totals.index.values), **text_kw) window_extent_args = {} if RENDERER_IMPORTED: window_extent_args["renderer"] = get_renderer(fig) textw = t.get_window_extent(**window_extent_args).width t.remove() window_extent_args = {} if RENDERER_IMPORTED: window_extent_args["renderer"] = get_renderer(fig) figw = self._reorient( fig.get_window_extent(**window_extent_args)).width sizes = np.asarray([p['elements'] for p in self._subset_plots]) fig = self._reorient(fig) non_text_nelems = len(self.intersections) + self._totals_plot_elements if self._element_size is None: colw = (figw - textw) / 
non_text_nelems else: render_ratio = figw / fig.get_figwidth() colw = self._element_size / 72 * render_ratio figw = colw * (non_text_nelems + np.ceil(textw / colw) + 1) fig.set_figwidth(figw / render_ratio) fig.set_figheight((colw * (n_cats + sizes.sum())) / render_ratio) text_nelems = int(np.ceil(figw / colw - non_text_nelems)) # print('textw', textw, 'figw', figw, 'colw', colw, # 'ncols', figw/colw, 'text_nelems', text_nelems) GS = self._reorient(matplotlib.gridspec.GridSpec) gridspec = GS(*self._swapaxes(n_cats + (sizes.sum() or 0), n_inters + text_nelems + self._totals_plot_elements), hspace=1) if self._horizontal: out = {'matrix': gridspec[-n_cats:, -n_inters:], 'shading': gridspec[-n_cats:, :], 'totals': gridspec[-n_cats:, :self._totals_plot_elements], 'gs': gridspec} cumsizes = np.cumsum(sizes[::-1]) for start, stop, plot in zip(np.hstack([[0], cumsizes]), cumsizes, self._subset_plots[::-1]): out[plot['id']] = gridspec[start:stop, -n_inters:] else: out = {'matrix': gridspec[-n_inters:, :n_cats], 'shading': gridspec[:, :n_cats], 'totals': gridspec[:self._totals_plot_elements, :n_cats], 'gs': gridspec} cumsizes = np.cumsum(sizes) for start, stop, plot in zip(np.hstack([[0], cumsizes]), cumsizes, self._subset_plots): out[plot['id']] = \ gridspec[-n_inters:, start + n_cats:stop + n_cats] return out def plot_matrix(self, ax): """Plot the matrix of intersection indicators onto ax """ ax = self._reorient(ax) data = self.intersections n_cats = data.index.nlevels inclusion = data.index.to_frame().values # Prepare styling styles = [ [ self.subset_styles[i] if inclusion[i, j] else {"facecolor": self._other_dots_color, "linewidth": 0} for j in range(n_cats) ] for i in range(len(data)) ] styles = sum(styles, []) # flatten nested list style_columns = {"facecolor": "facecolors", "edgecolor": "edgecolors", "linewidth": "linewidths", "linestyle": "linestyles", "hatch": "hatch"} styles = pd.DataFrame(styles).reindex(columns=style_columns.keys()) styles["linewidth"].fillna(1, 
inplace=True) styles["facecolor"].fillna(self._facecolor, inplace=True) styles["edgecolor"].fillna(styles["facecolor"], inplace=True) styles["linestyle"].fillna("solid", inplace=True) del styles["hatch"] # not supported in matrix (currently) x = np.repeat(np.arange(len(data)), n_cats) y = np.tile(np.arange(n_cats), len(data)) # Plot dots if self._element_size is not None: s = (self._element_size * .35) ** 2 else: # TODO: make s relative to colw s = 200 ax.scatter(*self._swapaxes(x, y), s=s, zorder=10, **styles.rename(columns=style_columns)) # Plot lines if self._with_lines: idx = np.flatnonzero(inclusion) line_data = (pd.Series(y[idx], index=x[idx]) .groupby(level=0) .aggregate(['min', 'max'])) colors = pd.Series([ style.get("edgecolor", style.get("facecolor", self._facecolor)) for style in self.subset_styles], name="color") line_data = line_data.join(colors) ax.vlines(line_data.index.values, line_data['min'], line_data['max'], lw=2, colors=line_data["color"], zorder=5) # Ticks and axes tick_axis = ax.yaxis tick_axis.set_ticks(np.arange(n_cats)) tick_axis.set_ticklabels(data.index.names, rotation=0 if self._horizontal else -90) ax.xaxis.set_visible(False) ax.tick_params(axis='both', which='both', length=0) if not self._horizontal: ax.yaxis.set_ticks_position('top') ax.set_frame_on(False) ax.set_xlim(-.5, x[-1] + .5, auto=False) ax.grid(False) def plot_intersections(self, ax): """Plot bars indicating intersection size """ rects = self._plot_bars(ax, self.intersections, title='Intersection size', colors=self._facecolor) for style, rect in zip(self.subset_styles, rects): style = style.copy() style.setdefault("edgecolor", style.get("facecolor", self._facecolor)) for attr, val in style.items(): getattr(rect, "set_" + attr)(val) if self.subset_legend: styles, labels = zip(*self.subset_legend) styles = [patches.Patch(**patch_style) for patch_style in styles] ax.legend(styles, labels) def _label_sizes(self, ax, rects, where): if not self._show_counts and not 
self._show_percentages: return if self._show_counts is True: count_fmt = "{:.0f}" else: count_fmt = self._show_counts if '{' not in count_fmt: count_fmt = util.to_new_pos_format(count_fmt) if self._show_percentages is True: pct_fmt = "{:.1%}" else: pct_fmt = self._show_percentages if count_fmt and pct_fmt: if where == 'top': fmt = '%s\n(%s)' % (count_fmt, pct_fmt) else: fmt = '%s (%s)' % (count_fmt, pct_fmt) def make_args(val): return val, val / self.total elif count_fmt: fmt = count_fmt def make_args(val): return val, else: fmt = pct_fmt def make_args(val): return val / self.total, if where == 'right': margin = 0.01 * abs(np.diff(ax.get_xlim())) for rect in rects: width = rect.get_width() + rect.get_x() ax.text(width + margin, rect.get_y() + rect.get_height() * .5, fmt.format(*make_args(width)), ha='left', va='center') elif where == 'left': margin = 0.01 * abs(np.diff(ax.get_xlim())) for rect in rects: width = rect.get_width() + rect.get_x() ax.text(width + margin, rect.get_y() + rect.get_height() * .5, fmt.format(*make_args(width)), ha='right', va='center') elif where == 'top': margin = 0.01 * abs(np.diff(ax.get_ylim())) for rect in rects: height = rect.get_height() + rect.get_y() ax.text(rect.get_x() + rect.get_width() * .5, height + margin, fmt.format(*make_args(height)), ha='center', va='bottom') else: raise NotImplementedError('unhandled where: %r' % where) def plot_totals(self, ax): """Plot bars indicating total set size """ orig_ax = ax ax = self._reorient(ax) rects = ax.barh(np.arange(len(self.totals.index.values)), self.totals, .5, color=self._facecolor, align='center') self._label_sizes(ax, rects, 'left' if self._horizontal else 'top') max_total = self.totals.max() if self._horizontal: orig_ax.set_xlim(max_total, 0) for x in ['top', 'left', 'right']: ax.spines[self._reorient(x)].set_visible(False) ax.yaxis.set_visible(False) ax.xaxis.grid(True) ax.yaxis.grid(False) ax.patch.set_visible(False) def plot_shading(self, ax): # alternating row shading (XXX: 
use add_patch(Rectangle)?) for i in range(0, len(self.totals), 2): rect = plt.Rectangle(self._swapaxes(0, i - .4), *self._swapaxes(*(1, .8)), facecolor=self._shading_color, lw=0, zorder=0) ax.add_patch(rect) ax.set_frame_on(False) ax.tick_params( axis='both', which='both', left=False, right=False, bottom=False, top=False, labelbottom=False, labelleft=False) ax.grid(False) ax.set_xticks([]) ax.set_yticks([]) ax.set_xticklabels([]) ax.set_yticklabels([]) def plot(self, fig=None): """Draw all parts of the plot onto fig or a new figure Parameters ---------- fig : matplotlib.figure.Figure, optional Defaults to a new figure. Returns ------- subplots : dict of matplotlib.axes.Axes Keys are 'matrix', 'intersections', 'totals', 'shading' """ if fig is None: fig = plt.figure(figsize=self._default_figsize) specs = self.make_grid(fig) shading_ax = fig.add_subplot(specs['shading']) self.plot_shading(shading_ax) matrix_ax = self._reorient(fig.add_subplot)(specs['matrix'], sharey=shading_ax) self.plot_matrix(matrix_ax) totals_ax = self._reorient(fig.add_subplot)(specs['totals'], sharey=matrix_ax) self.plot_totals(totals_ax) out = {'matrix': matrix_ax, 'shading': shading_ax, 'totals': totals_ax} for plot in self._subset_plots: ax = self._reorient(fig.add_subplot)(specs[plot['id']], sharex=matrix_ax) if plot['type'] == 'default': self.plot_intersections(ax) elif plot['type'] in self.PLOT_TYPES: kw = plot.copy() del kw['type'] del kw['elements'] del kw['id'] self.PLOT_TYPES[plot['type']](self, ax, **kw) else: raise ValueError('Unknown subset plot type: %r' % plot['type']) out[plot['id']] = ax return out PLOT_TYPES = { 'catplot': _plot_catplot, 'stacked_bars': _plot_stacked_bars, } def _repr_html_(self): fig = plt.figure(figsize=self._default_figsize) self.plot(fig=fig) return fig._repr_html_() def plot(data, fig=None, **kwargs): """Make an UpSet plot of data on fig Parameters ---------- data : pandas.Series or pandas.DataFrame Values for each set to plot. 
Should have multi-index where each level is binary, corresponding to set membership. If a DataFrame, `sum_over` must be a column name or None. fig : matplotlib.figure.Figure, optional Defaults to a new figure. kwargs Other arguments for :class:`UpSet`. Returns ------- subplots : dict of matplotlib.axes.Axes Keys are 'matrix', 'intersections', 'totals', 'shading' """ return UpSet(data, **kwargs).plot(fig) UpSetPlot-0.8.0/upsetplot/reformat.py000066400000000000000000000325721435554746600177020ustar00rootroot00000000000000from __future__ import print_function, division, absolute_import try: import typing except ImportError: import collections as typing import numpy as np import pandas as pd def _aggregate_data(df, subset_size, sum_over): """ Returns ------- df : DataFrame full data frame aggregated : Series aggregates """ _SUBSET_SIZE_VALUES = ['auto', 'count', 'sum'] if subset_size not in _SUBSET_SIZE_VALUES: raise ValueError('subset_size should be one of %s. Got %r' % (_SUBSET_SIZE_VALUES, subset_size)) if df.ndim == 1: # Series input_name = df.name df = pd.DataFrame({'_value': df}) if subset_size == 'auto' and not df.index.is_unique: raise ValueError('subset_size="auto" cannot be used for a ' 'Series with non-unique groups.') if sum_over is not None: raise ValueError('sum_over is not applicable when the input is a ' 'Series') if subset_size == 'count': sum_over = False else: sum_over = '_value' else: # DataFrame if sum_over is False: raise ValueError('Unsupported value for sum_over: False') elif subset_size == 'auto' and sum_over is None: sum_over = False elif subset_size == 'count': if sum_over is not None: raise ValueError('sum_over cannot be set if subset_size=%r' % subset_size) sum_over = False elif subset_size == 'sum': if sum_over is None: raise ValueError('sum_over should be a field name if ' 'subset_size="sum" and a DataFrame is ' 'provided.') gb = df.groupby(level=list(range(df.index.nlevels)), sort=False) if sum_over is False: aggregated = gb.size()
aggregated.name = 'size' elif hasattr(sum_over, 'lower'): aggregated = gb[sum_over].sum() else: raise ValueError('Unsupported value for sum_over: %r' % sum_over) if aggregated.name == '_value': aggregated.name = input_name return df, aggregated def _check_index(df): # check all indices are boolean if not all(set([True, False]) >= set(level) for level in df.index.levels): raise ValueError('The DataFrame has values in its index that are not ' 'boolean') df = df.copy(deep=False) # XXX: this may break if input is not MultiIndex kw = {'levels': [x.astype(bool) for x in df.index.levels], 'names': df.index.names, } if hasattr(df.index, 'codes'): # compat for pandas <= 0.20 kw['codes'] = df.index.codes else: kw['labels'] = df.index.labels df.index = pd.MultiIndex(**kw) return df def _scalar_to_list(val): if not isinstance(val, (typing.Sequence, set)) or isinstance(val, str): val = [val] return val def _get_subset_mask(agg, min_subset_size, max_subset_size, min_degree, max_degree, present, absent): """Get a mask over subsets based on size, degree or category presence""" subset_mask = True if min_subset_size is not None: subset_mask = np.logical_and(subset_mask, agg >= min_subset_size) if max_subset_size is not None: subset_mask = np.logical_and(subset_mask, agg <= max_subset_size) if (min_degree is not None and min_degree >= 0) or max_degree is not None: degree = agg.index.to_frame().sum(axis=1) if min_degree is not None: subset_mask = np.logical_and(subset_mask, degree >= min_degree) if max_degree is not None: subset_mask = np.logical_and(subset_mask, degree <= max_degree) if present is not None: for col in _scalar_to_list(present): subset_mask = np.logical_and( subset_mask, agg.index.get_level_values(col).values) if absent is not None: for col in _scalar_to_list(absent): exclude_mask = np.logical_not( agg.index.get_level_values(col).values) subset_mask = np.logical_and(subset_mask, exclude_mask) return subset_mask def _filter_subsets(df, agg, min_subset_size, 
                    max_subset_size, min_degree, max_degree,
                    present, absent):
    subset_mask = _get_subset_mask(agg, min_subset_size=min_subset_size,
                                   max_subset_size=max_subset_size,
                                   min_degree=min_degree,
                                   max_degree=max_degree,
                                   present=present, absent=absent)

    if subset_mask is True:
        return df, agg

    agg = agg[subset_mask]
    df = df[df.index.isin(agg.index)]
    return df, agg


class QueryResult:
    """Container for reformatted data and aggregates

    Attributes
    ----------
    data : DataFrame
        Selected samples. The index is a MultiIndex with one boolean level for
        each category.
    subsets : dict[frozenset, DataFrame]
        Dataframes for each intersection of categories, keyed by the
        frozenset of categories present in that intersection.
    subset_sizes : Series
        Total size of each selected subset as a series. The index is as for
        `data`.
    category_totals : Series
        Total size of each category, regardless of selection.
    """
    def __init__(self, data, subset_sizes, category_totals):
        self.data = data
        self.subset_sizes = subset_sizes
        self.category_totals = category_totals

    def __repr__(self):
        return ("QueryResult(data={data}, subset_sizes={subset_sizes}, "
                "category_totals={category_totals})".format(**vars(self)))

    @property
    def subsets(self):
        categories = np.asarray(self.data.index.names)
        # Each group key holds one boolean per category level; the subset is
        # keyed by the names of the categories whose flag is True.
        return {
            frozenset(categories.compress(np.ravel(mask))): subset_data
            for mask, subset_data
            in self.data.groupby(level=list(range(len(categories))),
                                 sort=False)
        }


def query(data, present=None, absent=None,
          min_subset_size=None, max_subset_size=None,
          min_degree=None, max_degree=None,
          sort_by='degree', sort_categories_by='cardinality',
          subset_size='auto', sum_over=None, include_empty_subsets=False):
    """Transform and filter a categorised dataset

    Retrieve the set of items and totals corresponding to subsets of interest.

    Parameters
    ----------
    data : pandas.Series or pandas.DataFrame
        Elements associated with categories (a DataFrame), or the size of each
        subset of categories (a Series).
        Should have MultiIndex where each level is binary,
        corresponding to category membership.
        If a DataFrame, `sum_over` must be a string or None.
    present : str or list of str, optional
        Category or categories that must be present in subsets to be reported.
    absent : str or list of str, optional
        Category or categories that must not be present in subsets to be
        reported.
    min_subset_size : int, optional
        Minimum size of a subset to be reported. All subsets with
        a size smaller than this threshold will be omitted from
        ``subset_sizes`` and ``data``. Size may be a sum of values, see
        `subset_size`.
    max_subset_size : int, optional
        Maximum size of a subset to be reported.
    min_degree : int, optional
        Minimum degree of a subset to be reported.
    max_degree : int, optional
        Maximum degree of a subset to be reported.
    sort_by : {'cardinality', 'degree', '-cardinality', '-degree',
               'input', '-input'}
        If 'cardinality', subsets are listed from largest to smallest.
        If 'degree', they are listed in order of the number of categories
        intersected. If 'input', the order they appear in the data input is
        used.
        Prefix with '-' to reverse the ordering.

        Note this affects ``subset_sizes`` but not ``data``.
    sort_categories_by : {'cardinality', '-cardinality', 'input', '-input'}
        Whether to sort the categories by total cardinality, or leave them
        in the input data's provided order (order of index levels).
        Prefix with '-' to reverse the ordering.
    subset_size : {'auto', 'count', 'sum'}
        Configures how to calculate the size of a subset. Choices are:

        'auto' (default)
            If `data` is a DataFrame, count the number of rows in each group,
            unless `sum_over` is specified.
            If `data` is a Series with at most one row for each group, use
            the value of the Series. If `data` is a Series with more than one
            row per group, raise a ValueError.
        'count'
            Count the number of rows in each group.
        'sum'
            Sum the value of the `data` Series, or the DataFrame field
            specified by `sum_over`.
    sum_over : str or None
        If `subset_size='sum'` or `'auto'`, then the intersection size is the
        sum of the specified field in the `data` DataFrame. If a Series, only
        None is supported and its value is summed.
    include_empty_subsets : bool (default=False)
        If True, all possible category combinations will be returned in
        subset_sizes, even when some are not present in data.

    Returns
    -------
    QueryResult
        Including filtered ``data``, filtered and sorted ``subset_sizes`` and
        overall ``category_totals``.

    Examples
    --------
    >>> from upsetplot import query, generate_samples
    >>> data = generate_samples(n_samples=20)
    >>> result = query(data, present="cat1", max_subset_size=4)
    >>> result.category_totals
    cat1    14
    cat2     4
    cat0     0
    dtype: int64
    >>> result.subset_sizes
    cat1  cat2  cat0
    True  True  False    3
    Name: size, dtype: int64
    >>> result.data
                     index     value
    cat1 cat2 cat0
    True True False      0  2.04...
              False      2  2.05...
              False     10  2.55...
    >>>
    >>> # Sorting:
    >>> query(data, min_degree=1, sort_by="degree").subset_sizes
    cat1   cat2   cat0
    True   False  False    11
    False  True   False     1
    True   True   False     3
    Name: size, dtype: int64
    >>> query(data, min_degree=1, sort_by="cardinality").subset_sizes
    cat1   cat2   cat0
    True   False  False    11
           True   False     3
    False  True   False     1
    Name: size, dtype: int64
    >>>
    >>> # Getting each subset's data
    >>> result = query(data)
    >>> result.subsets[frozenset({"cat1", "cat2"})]
                     index     value
    cat1 cat2 cat0
    True True False      0  2.04...
              False      2  2.05...
              False     10  2.55...
    >>> result.subsets[frozenset({"cat2"})]
                     index     value
    cat1  cat2 cat0
    False True False      3  1.333795
    """

    data, agg = _aggregate_data(data, subset_size, sum_over)
    data = _check_index(data)
    totals = [agg[agg.index.get_level_values(name).values.astype(bool)].sum()
              for name in agg.index.names]
    totals = pd.Series(totals, index=agg.index.names)

    if include_empty_subsets:
        nlevels = len(agg.index.levels)
        if nlevels > 10:
            raise ValueError(
                "include_empty_subsets is supported for at most 10 categories")
        new_agg = pd.Series(0,
                            index=pd.MultiIndex.from_product(
                                [[False, True]] * nlevels,
                                names=agg.index.names),
                            dtype=agg.dtype,
                            name=agg.name)
        new_agg.update(agg)
        agg = new_agg

    data, agg = _filter_subsets(data, agg,
                                min_subset_size=min_subset_size,
                                max_subset_size=max_subset_size,
                                min_degree=min_degree, max_degree=max_degree,
                                present=present, absent=absent)

    # sort:
    if sort_categories_by in ('cardinality', '-cardinality'):
        totals.sort_values(ascending=sort_categories_by[:1] == '-',
                           inplace=True)
    elif sort_categories_by == '-input':
        totals = totals[::-1]
    elif sort_categories_by in (None, 'input'):
        pass
    else:
        raise ValueError('Unknown sort_categories_by: %r' % sort_categories_by)
    data = data.reorder_levels(totals.index.values)
    agg = agg.reorder_levels(totals.index.values)

    if sort_by in ('cardinality', '-cardinality'):
        agg = agg.sort_values(ascending=sort_by[:1] == '-')
    elif sort_by in ('degree', '-degree'):
        index_tuples = sorted(agg.index,
                              key=lambda x: (sum(x),) + tuple(reversed(x)),
                              reverse=sort_by[:1] == '-')
        agg = agg.reindex(pd.MultiIndex.from_tuples(index_tuples,
                                                    names=agg.index.names))
    elif sort_by == '-input':
        agg = agg[::-1]
    elif sort_by in (None, 'input'):
        pass
    else:
        raise ValueError('Unknown sort_by: %r' % sort_by)

    return QueryResult(data=data, subset_sizes=agg, category_totals=totals)
UpSetPlot-0.8.0/upsetplot/tests/__init__.py
UpSetPlot-0.8.0/upsetplot/tests/test_data.py
from collections import OrderedDict
import pytest
import pandas as pd
import numpy as np
from distutils.version import LooseVersion
from pandas.util.testing import (assert_series_equal, assert_frame_equal,
                                 assert_index_equal)
from upsetplot import (from_memberships, from_contents, from_indicators,
                       generate_data)


@pytest.mark.parametrize('typ', [set, list, tuple, iter])
def test_from_memberships_no_data(typ):
    with pytest.raises(ValueError, match='at least one category'):
        from_memberships([])
    with pytest.raises(ValueError, match='at least one category'):
        from_memberships([[], []])
    with pytest.raises(ValueError, match='strings'):
        from_memberships([[1]])
    with pytest.raises(ValueError, match='strings'):
        from_memberships([[1, 'str']])
    with pytest.raises(TypeError):
        from_memberships([1])

    out = from_memberships([typ([]),
                            typ(['hello']),
                            typ(['world']),
                            typ(['hello', 'world']),
                            ])
    exp = pd.DataFrame([[False, False, 1],
                        [True, False, 1],
                        [False, True, 1],
                        [True, True, 1]],
                       columns=['hello', 'world', 'ones']
                       ).set_index(['hello', 'world'])['ones']
    assert isinstance(exp.index, pd.MultiIndex)
    assert_series_equal(exp, out)

    # test sorting by name
    out = from_memberships([typ(['hello']),
                            typ(['world'])])
    exp = pd.DataFrame([[True, False, 1],
                        [False, True, 1]],
                       columns=['hello', 'world', 'ones']
                       ).set_index(['hello', 'world'])['ones']
    assert_series_equal(exp, out)
    out = from_memberships([typ(['world']),
                            typ(['hello'])])
    exp = pd.DataFrame([[False, True, 1],
                        [True, False, 1]],
                       columns=['hello', 'world', 'ones']
                       ).set_index(['hello', 'world'])['ones']
    assert_series_equal(exp, out)


@pytest.mark.parametrize('data,ndim', [
    ([1, 2, 3, 4], 1),
    (np.array([1, 2, 3, 4]), 1),
    (pd.Series([1, 2, 3, 4], name='foo'), 1),
    ([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']], 2),
    (pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']],
                  columns=['foo', 'bar'],
                  index=['q', 'r', 's', 't']), 2),
])
def test_from_memberships_with_data(data, ndim):
    memberships = [[],
                   ['hello'],
                   ['world'],
                   ['hello', 'world']]
    out = from_memberships(memberships, data=data)
    assert out is not data  # make sure frame is copied
    if hasattr(data, 'loc') and np.asarray(data).dtype.kind in 'ifb':
        # but not deepcopied when possible
        if LooseVersion(pd.__version__) > LooseVersion('0.35'):
            assert out.values.base is np.asarray(data).base
    if ndim == 1:
        assert isinstance(out, pd.Series)
    else:
        assert isinstance(out, pd.DataFrame)
    assert_frame_equal(pd.DataFrame(out).reset_index(drop=True),
                       pd.DataFrame(data).reset_index(drop=True))
    no_data = from_memberships(memberships=memberships)
    assert_index_equal(out.index, no_data.index)

    with pytest.raises(ValueError, match='length'):
        from_memberships(memberships[:-1], data=data)


@pytest.mark.parametrize('data', [None,
                                  {'attr1': [3, 4, 5, 6, 7, 8],
                                   'attr2': list('qrstuv')}])
@pytest.mark.parametrize('typ', [set, list, tuple, iter])
@pytest.mark.parametrize('id_column', ['id', 'blah'])
def test_from_contents_vs_memberships(data, typ, id_column):
    contents = OrderedDict([('cat1', typ(['aa', 'bb', 'cc'])),
                            ('cat2', typ(['cc', 'dd'])),
                            ('cat3', typ(['ee']))])
    # Note that ff is not present in contents
    data_df = pd.DataFrame(data,
                           index=['aa', 'bb', 'cc', 'dd', 'ee', 'ff'])
    baseline = from_contents(contents, data=data_df,
                             id_column=id_column)
    # compare from_contents to from_memberships
    expected = from_memberships(memberships=[{'cat1'},
                                             {'cat1'},
                                             {'cat1', 'cat2'},
                                             {'cat2'},
                                             {'cat3'},
                                             []],
                                data=data_df)
    assert_series_equal(baseline[id_column].reset_index(drop=True),
                        pd.Series(['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
                                  name=id_column))
    assert_frame_equal(baseline.drop([id_column], axis=1), expected)


def test_from_contents(typ=set, id_column='id'):
    contents = OrderedDict([('cat1', {'aa', 'bb', 'cc'}),
                            ('cat2', {'cc', 'dd'}),
                            ('cat3', {'ee'})])
    empty_data = pd.DataFrame(index=['aa', 'bb', 'cc', 'dd', 'ee'])
    baseline = from_contents(contents, data=empty_data,
                             id_column=id_column)
    # data=None
    out = from_contents(contents, id_column=id_column)
    assert_frame_equal(out.sort_values(id_column), baseline)

    # unordered contents dict
    out = from_contents({'cat3': contents['cat3'],
                         'cat2': contents['cat2'],
                         'cat1': contents['cat1']},
                        data=empty_data, id_column=id_column)
    assert_frame_equal(out.reorder_levels(['cat1', 'cat2', 'cat3']),
                       baseline)

    # empty category
    out = from_contents({'cat1': contents['cat1'],
                         'cat2': contents['cat2'],
                         'cat3': contents['cat3'],
                         'cat4': []},
                        data=empty_data,
                        id_column=id_column)
    assert not out.index.to_frame()['cat4'].any()  # cat4 should be all-false
    assert len(out.index.names) == 4
    out.index = out.index.to_frame().set_index(['cat1', 'cat2', 'cat3']).index
    assert_frame_equal(out, baseline)


@pytest.mark.parametrize('id_column', ['id', 'blah'])
def test_from_contents_invalid(id_column):
    contents = OrderedDict([('cat1', {'aa', 'bb', 'cc'}),
                            ('cat2', {'cc', 'dd'}),
                            ('cat3', {'ee'})])
    with pytest.raises(ValueError, match='columns overlap'):
        from_contents(contents, data=pd.DataFrame({'cat1': [1, 2, 3, 4, 5]}),
                      id_column=id_column)
    with pytest.raises(ValueError, match='duplicate ids'):
        from_contents({'cat1': ['aa', 'bb'], 'cat2': ['dd', 'dd']},
                      id_column=id_column)
    # category named id
    with pytest.raises(ValueError, match='cannot be named'):
        from_contents({id_column: {'aa', 'bb', 'cc'},
                       'cat2': {'cc', 'dd'},
                       }, id_column=id_column)
    # data containing a column named id
    with pytest.raises(ValueError, match='cannot contain'):
        from_contents(contents,
                      data=pd.DataFrame({id_column: [1, 2, 3, 4, 5]},
                                        index=['aa', 'bb', 'cc', 'dd', 'ee']),
                      id_column=id_column)
    with pytest.raises(ValueError, match='identifiers in contents'):
        from_contents({'cat1': ['aa']},
                      data=pd.DataFrame([[1]]),
                      id_column=id_column)


@pytest.mark.parametrize('indicators,data,exc_type,match', [
    (["a", "b"], None, ValueError, "data must be provided"),
    (lambda df: [True, False, True], None, ValueError,
     "data must be provided"),
    (["a", "unknown_col"], {"a": [1, 2, 3]}, KeyError, "unknown_col"),
    (("a",), {"a": [1, 2, 3]}, ValueError, "tuple"),
    ({"cat1": [0, 1, 1]}, {"a": [1, 2, 3]}, ValueError,
     "must all be boolean"),
    (pd.DataFrame({"cat1": [True, False, True]}, index=["a", "b", "c"]),
     {"A": [1, 2, 3]}, ValueError, "all its values must be present"),
])
def test_from_indicators_invalid(indicators, data, exc_type, match):
    with pytest.raises(exc_type, match=match):
        from_indicators(indicators=indicators, data=data)


@pytest.mark.parametrize('indicators', [
    pd.DataFrame({"cat1": [False, True, False]}),
    pd.DataFrame({"cat1": [False, True, False]}, dtype="O"),
    {"cat1": [False, True, False]},
    lambda data: {"cat1": {pd.DataFrame(data).index.values[1]: True}},
])
@pytest.mark.parametrize('data', [
    pd.DataFrame({"val1": [3, 4, 5]}),
    pd.DataFrame({"val1": [3, 4, 5]}, index=["a", "b", "c"]),
    {"val1": [3, 4, 5]},
])
def test_from_indicators_equivalence(indicators, data):
    assert_frame_equal(from_indicators(indicators, data),
                       from_memberships([[], ["cat1"], []], data))


def test_generate_data_warning():
    with pytest.warns(DeprecationWarning):
        generate_data()
UpSetPlot-0.8.0/upsetplot/tests/test_examples.py
import glob
import os
import subprocess
import sys

import pytest

exa_glob = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                        '..', '..', 'examples', '*.py')


@pytest.mark.parametrize('path', glob.glob(exa_glob))
def test_example(path):
    pytest.importorskip('sklearn')
    pytest.importorskip('seaborn')
    env = os.environ.copy()
    # os.pathsep is ':' on POSIX and ';' on Windows
    env["PYTHONPATH"] = os.getcwd() + os.pathsep + env.get("PYTHONPATH", "")
    subprocess.check_output([sys.executable, path], env=env)
UpSetPlot-0.8.0/upsetplot/tests/test_reformat.py
import pytest
import pandas as pd
from pandas.util.testing import assert_series_equal, assert_frame_equal

from upsetplot import generate_counts, generate_samples
from upsetplot import query

# `query` is mostly tested through plotting tests, especially tests of
# `_process_data` which cover sort_by, sort_categories_by, subset_size
# and sum_over.


@pytest.mark.parametrize('data', [
    generate_counts(),
    generate_samples(),
])
@pytest.mark.parametrize('param_set', [
    [{"present": "cat1"}, {"absent": "cat1"}],
    [{"max_degree": 0},
     {"min_degree": 1, "max_degree": 2},
     {"min_degree": 3}],
    [{"max_subset_size": 30}, {"min_subset_size": 31}],
    [{"present": "cat1", "max_subset_size": 30},
     {"absent": "cat1", "max_subset_size": 30},
     {"present": "cat1", "min_subset_size": 31},
     {"absent": "cat1", "min_subset_size": 31},
     ],
])
def test_mece_queries(data, param_set):
    unfiltered_results = query(data)
    all_results = [query(data, **params) for params in param_set]

    # category_totals is unaffected by filter
    for results in all_results:
        assert_series_equal(unfiltered_results.category_totals,
                            results.category_totals)

    combined_data = pd.concat([results.data for results in all_results])
    combined_data.sort_index(inplace=True)
    assert_frame_equal(unfiltered_results.data.sort_index(), combined_data)

    combined_sizes = pd.concat([results.subset_sizes
                                for results in all_results])
    combined_sizes.sort_index(inplace=True)
    assert_series_equal(unfiltered_results.subset_sizes.sort_index(),
                        combined_sizes)
UpSetPlot-0.8.0/upsetplot/tests/test_upsetplot.py
import io
import itertools

import pytest
from pandas.util.testing import (
    assert_series_equal, assert_frame_equal, assert_index_equal)
from numpy.testing import assert_array_equal
import pandas as pd
import numpy as np
import matplotlib.figure
import matplotlib.pyplot as plt
from matplotlib.text import Text
from matplotlib.colors import to_hex
from matplotlib import cm

from upsetplot import plot
from upsetplot import UpSet
from upsetplot import generate_counts, generate_samples
from upsetplot.plotting import _process_data

# TODO: warnings should raise errors


def is_ascending(seq):
    # return np.all(np.diff(seq) >= 0)
    return sorted(seq) == list(seq)


def get_all_texts(mpl_artist):
    out = [text.get_text() for text in
           mpl_artist.findobj(Text)]
    return [text for text in out if text]


@pytest.mark.parametrize('x', [
    generate_counts(),
    generate_counts().iloc[1:-2],
])
@pytest.mark.parametrize(
    'sort_by',
    ['cardinality', 'degree', '-cardinality', '-degree', None,
     'input', '-input'])
@pytest.mark.parametrize(
    'sort_categories_by',
    [None, 'input', '-input', 'cardinality', '-cardinality'])
def test_process_data_series(x, sort_by, sort_categories_by):
    assert x.name == 'value'
    for subset_size in ['auto', 'sum', 'count']:
        for sum_over in ['abc', False]:
            with pytest.raises(ValueError, match='sum_over is not applicable'):
                _process_data(x, sort_by=sort_by,
                              sort_categories_by=sort_categories_by,
                              subset_size=subset_size, sum_over=sum_over)

    # shuffle input to test sorting
    x = x.sample(frac=1., replace=False, random_state=0)

    total, df, intersections, totals = _process_data(
        x, subset_size='auto', sort_by=sort_by,
        sort_categories_by=sort_categories_by, sum_over=None)

    assert total == x.sum()

    assert intersections.name == 'value'
    x_reordered_levels = (x
                          .reorder_levels(intersections.index.names))
    x_reordered = (x_reordered_levels
                   .reindex(index=intersections.index))
    assert len(x) == len(x_reordered)
    assert x_reordered.index.is_unique
    assert_series_equal(x_reordered, intersections,
                        check_dtype=False)

    if sort_by == 'cardinality':
        assert is_ascending(intersections.values[::-1])
    elif sort_by == '-cardinality':
        assert is_ascending(intersections.values)
    elif sort_by == 'degree':
        # check degree order
        assert is_ascending(intersections.index.to_frame().sum(axis=1))
        # TODO: within a same-degree group, the tuple of active names should
        #       be in sort-order
    elif sort_by == '-degree':
        # check degree order
        assert is_ascending(intersections.index.to_frame().sum(axis=1)[::-1])
    else:
        find_first_in_orig = x_reordered_levels.index.tolist().index
        orig_order = [find_first_in_orig(key)
                      for key in intersections.index.tolist()]
        assert orig_order == sorted(
            orig_order,
            reverse=sort_by is not None and sort_by.startswith('-'))

    if sort_categories_by == 'cardinality':
        assert is_ascending(totals.values[::-1])
    elif sort_categories_by == '-cardinality':
        assert is_ascending(totals.values)

    assert np.all(totals.index.values == intersections.index.names)

    assert np.all(df.index.names == intersections.index.names)
    assert set(df.columns) == {'_value', '_bin'}
    assert_index_equal(df['_value'].reorder_levels(x.index.names).index,
                       x.index)
    assert_array_equal(df['_value'], x)
    assert_index_equal(intersections.iloc[df['_bin']].index, df.index)
    assert len(df) == len(x)


@pytest.mark.parametrize('x', [
    generate_samples()['value'],
    generate_counts(),
])
def test_subset_size_series(x):
    kw = {'sort_by': 'cardinality',
          'sort_categories_by': 'cardinality',
          'sum_over': None}
    total, df_sum, intersections_sum, totals_sum = _process_data(
        x, subset_size='sum', **kw)
    assert total == intersections_sum.sum()

    if x.index.is_unique:
        total, df, intersections, totals = _process_data(
            x, subset_size='auto', **kw)
        assert total == intersections.sum()
        assert_frame_equal(df, df_sum)
        assert_series_equal(intersections, intersections_sum)
        assert_series_equal(totals, totals_sum)
    else:
        with pytest.raises(ValueError):
            _process_data(
                x, subset_size='auto', **kw)

    total, df_count, intersections_count, totals_count = _process_data(
        x, subset_size='count', **kw)
    assert total == intersections_count.sum()
    total, df, intersections, totals = _process_data(
        x.groupby(level=list(range(len(x.index.levels)))).count(),
        subset_size='sum', **kw)
    assert total == intersections.sum()
    assert_series_equal(intersections, intersections_count, check_names=False)
    assert_series_equal(totals, totals_count)


@pytest.mark.parametrize('x', [
    generate_samples()['value'],
])
@pytest.mark.parametrize('sort_by', ['cardinality', 'degree', None])
@pytest.mark.parametrize('sort_categories_by', [None, 'cardinality'])
def test_process_data_frame(x, sort_by, sort_categories_by):
    # shuffle input to test sorting
    x = x.sample(frac=1., replace=False, random_state=0)

    X = pd.DataFrame({'a': x})

    with pytest.warns(None):
        total, df, intersections, totals = _process_data(
            X, sort_by=sort_by, sort_categories_by=sort_categories_by,
            sum_over='a', subset_size='auto')
    assert df is not X
    assert total == pytest.approx(intersections.sum())

    # check equivalence to Series
    total1, df1, intersections1, totals1 = _process_data(
        x, sort_by=sort_by, sort_categories_by=sort_categories_by,
        subset_size='sum', sum_over=None)

    assert intersections.name == 'a'
    assert_frame_equal(df, df1.rename(columns={'_value': 'a'}))
    assert_series_equal(intersections, intersections1, check_names=False)
    assert_series_equal(totals, totals1)

    # check effect of extra column
    X = pd.DataFrame({'a': x, 'b': np.arange(len(x))})
    total2, df2, intersections2, totals2 = _process_data(
        X, sort_by=sort_by, sort_categories_by=sort_categories_by,
        sum_over='a', subset_size='auto')
    assert total2 == pytest.approx(intersections2.sum())
    assert_series_equal(intersections, intersections2)
    assert_series_equal(totals, totals2)
    assert_frame_equal(df, df2.drop('b', axis=1))
    assert_array_equal(df2['b'], X['b'])  # disregard levels, tested above

    # check effect not dependent on order/name
    X = pd.DataFrame({'b': np.arange(len(x)), 'c': x})
    total3, df3, intersections3, totals3 = _process_data(
        X, sort_by=sort_by, sort_categories_by=sort_categories_by,
        sum_over='c', subset_size='auto')
    assert total3 == pytest.approx(intersections3.sum())
    assert_series_equal(intersections, intersections3, check_names=False)
    assert intersections.name == 'a'
    assert intersections3.name == 'c'
    assert_series_equal(totals, totals3)
    assert_frame_equal(df.rename(columns={'a': 'c'}), df3.drop('b', axis=1))
    assert_array_equal(df3['b'], X['b'])

    # check subset_size='count'
    X = pd.DataFrame({'b': np.ones(len(x), dtype='int64'), 'c': x})

    total4, df4, intersections4, totals4 = _process_data(
        X, sort_by=sort_by, sort_categories_by=sort_categories_by,
        sum_over='b', subset_size='auto')
    total5, df5, intersections5, totals5 = _process_data(
        X,
        sort_by=sort_by, sort_categories_by=sort_categories_by,
        subset_size='count', sum_over=None)
    assert total5 == pytest.approx(intersections5.sum())
    assert_series_equal(intersections4, intersections5, check_names=False)
    assert intersections4.name == 'b'
    assert intersections5.name == 'size'
    assert_series_equal(totals4, totals5)
    assert_frame_equal(df4, df5)


@pytest.mark.parametrize('x', [
    generate_samples()['value'],
    generate_counts(),
])
def test_subset_size_frame(x):
    kw = {'sort_by': 'cardinality',
          'sort_categories_by': 'cardinality'}
    X = pd.DataFrame({'x': x})

    total_sum, df_sum, intersections_sum, totals_sum = _process_data(
        X, subset_size='sum', sum_over='x', **kw)
    total_count, df_count, intersections_count, totals_count = _process_data(
        X, subset_size='count', sum_over=None, **kw)

    # error cases: sum_over=False
    for subset_size in ['auto', 'sum', 'count']:
        with pytest.raises(ValueError, match='sum_over'):
            _process_data(
                X, subset_size=subset_size, sum_over=False, **kw)

    # error cases: sum_over incompatible with subset_size
    with pytest.raises(ValueError, match='sum_over should be a field'):
        _process_data(
            X, subset_size='sum', sum_over=None, **kw)
    with pytest.raises(ValueError, match='sum_over cannot be set'):
        _process_data(
            X, subset_size='count', sum_over='x', **kw)

    # check subset_size='auto' with sum_over=str => sum
    total, df, intersections, totals = _process_data(
        X, subset_size='auto', sum_over='x', **kw)
    assert total == intersections.sum()
    assert_frame_equal(df, df_sum)
    assert_series_equal(intersections, intersections_sum)
    assert_series_equal(totals, totals_sum)

    # check subset_size='auto' with sum_over=None => count
    total, df, intersections, totals = _process_data(
        X, subset_size='auto', sum_over=None, **kw)
    assert total == intersections.sum()
    assert_frame_equal(df, df_count)
    assert_series_equal(intersections, intersections_count)
    assert_series_equal(totals, totals_count)


@pytest.mark.parametrize('sort_by', ['cardinality', 'degree'])
@pytest.mark.parametrize('sort_categories_by', [None, 'cardinality'])
def test_not_unique(sort_by, sort_categories_by):
    kw = {'sort_by': sort_by,
          'sort_categories_by': sort_categories_by,
          'subset_size': 'sum',
          'sum_over': None}
    Xagg = generate_counts()
    total1, df1, intersections1, totals1 = _process_data(Xagg, **kw)
    Xunagg = generate_samples()['value']
    Xunagg.loc[:] = 1
    total2, df2, intersections2, totals2 = _process_data(Xunagg, **kw)
    assert_series_equal(intersections1, intersections2,
                        check_dtype=False)
    assert total2 == intersections2.sum()
    assert_series_equal(totals1, totals2, check_dtype=False)
    assert set(df1.columns) == {'_value', '_bin'}
    assert set(df2.columns) == {'_value', '_bin'}
    assert len(df2) == len(Xunagg)
    assert df2['_bin'].nunique() == len(intersections2)


def test_include_empty_subsets():
    X = generate_counts(n_samples=2, n_categories=3)

    no_empty_upset = UpSet(X, include_empty_subsets=False)
    assert len(no_empty_upset.intersections) <= 2

    include_empty_upset = UpSet(X, include_empty_subsets=True)
    assert len(include_empty_upset.intersections) == 2 ** 3
    common_intersections = include_empty_upset.intersections.loc[
        no_empty_upset.intersections.index]
    assert_series_equal(no_empty_upset.intersections,
                        common_intersections)
    include_empty_upset.plot()  # smoke test


@pytest.mark.parametrize('kw', [{'sort_by': 'blah'},
                                {'sort_by': True},
                                {'sort_categories_by': 'blah'},
                                {'sort_categories_by': True}])
def test_param_validation(kw):
    X = generate_counts(n_samples=100)
    with pytest.raises(ValueError):
        UpSet(X, **kw)


@pytest.mark.parametrize('kw', [{},
                                {'element_size': None},
                                {'orientation': 'vertical'},
                                {'intersection_plot_elements': 0},
                                {'facecolor': 'red'},
                                {'shading_color': 'lightgrey',
                                 'other_dots_color': 'pink'}])
def test_plot_smoke_test(kw):
    fig = matplotlib.figure.Figure()
    X = generate_counts(n_samples=100)
    axes = plot(X, fig, **kw)
    fig.savefig(io.BytesIO(),
                format='png')

    attr = ('get_xlim'
            if kw.get('orientation', 'horizontal') == 'horizontal'
            else 'get_ylim')
    lim = getattr(axes['matrix'], attr)()
    expected_width = len(X)
    assert expected_width == lim[1] - lim[0]

    # Also check fig is optional
    n_nums = len(plt.get_fignums())
    plot(X, **kw)
    assert len(plt.get_fignums()) - n_nums == 1
    assert plt.gcf().axes


@pytest.mark.parametrize('set1',
                         itertools.product([False, True], repeat=2))
@pytest.mark.parametrize('set2',
                         itertools.product([False, True], repeat=2))
def test_two_sets(set1, set2):
    # we had a bug where processing failed if no items were in some set
    fig = matplotlib.figure.Figure()
    plot(pd.DataFrame({'val': [5, 7],
                       'set1': set1,
                       'set2': set2}).set_index(['set1', 'set2'])['val'],
         fig, subset_size='sum')


def test_vertical():
    X = generate_counts(n_samples=100)

    fig = matplotlib.figure.Figure()
    UpSet(X, orientation='horizontal').make_grid(fig)
    horz_height = fig.get_figheight()
    horz_width = fig.get_figwidth()
    assert horz_height < horz_width

    fig = matplotlib.figure.Figure()
    UpSet(X, orientation='vertical').make_grid(fig)
    vert_height = fig.get_figheight()
    vert_width = fig.get_figwidth()
    assert horz_width / horz_height > vert_width / vert_height

    # TODO: test axes positions, plot order, bar orientation
    pass


def test_element_size():
    X = generate_counts(n_samples=100)
    figsizes = []
    for element_size in range(10, 50, 5):
        fig = matplotlib.figure.Figure()
        UpSet(X, element_size=element_size).make_grid(fig)
        figsizes.append((fig.get_figwidth(), fig.get_figheight()))

    figwidths, figheights = zip(*figsizes)
    # Absolute width increases
    assert np.all(np.diff(figwidths) > 0)
    aspect = np.divide(figwidths, figheights)
    # Font size stays constant, so aspect ratio decreases
    assert np.all(np.diff(aspect) <= 1e-8)  # allow for near-equality
    assert np.any(np.diff(aspect) < 1e-4)  # require some significant decrease
    # But doesn't decrease by much
    assert np.all(aspect[:-1] / aspect[1:] < 1.1)

    fig = matplotlib.figure.Figure()
    figsize_before = fig.get_figwidth(), fig.get_figheight()
    UpSet(X, element_size=None).make_grid(fig)
    figsize_after = fig.get_figwidth(), fig.get_figheight()
    assert figsize_before == figsize_after

    # TODO: make sure axes are all within figure
    # TODO: make sure text does not overlap axes, even with element_size=None


def _walk_artists(el):
    children = el.get_children()
    yield el, children
    for ch in children:
        for x in _walk_artists(ch):
            yield x


def _count_descendants(el):
    return sum(len(children) for x, children in _walk_artists(el))


@pytest.mark.parametrize('orientation', ['horizontal', 'vertical'])
def test_show_counts(orientation):
    fig = matplotlib.figure.Figure()
    X = generate_counts(n_samples=10000)
    plot(X, fig, orientation=orientation)
    n_artists_no_sizes = _count_descendants(fig)

    fig = matplotlib.figure.Figure()
    plot(X, fig, orientation=orientation, show_counts=True)
    n_artists_yes_sizes = _count_descendants(fig)
    assert n_artists_yes_sizes - n_artists_no_sizes > 6
    assert '9547' in get_all_texts(fig)  # set size
    assert '283' in get_all_texts(fig)  # intersection size

    fig = matplotlib.figure.Figure()
    plot(X, fig, orientation=orientation, show_counts='%0.2g')
    assert n_artists_yes_sizes == _count_descendants(fig)
    assert '9.5e+03' in get_all_texts(fig)
    assert '2.8e+02' in get_all_texts(fig)

    fig = matplotlib.figure.Figure()
    plot(X, fig, orientation=orientation, show_counts='{:0.2g}')
    assert n_artists_yes_sizes == _count_descendants(fig)
    assert '9.5e+03' in get_all_texts(fig)
    assert '2.8e+02' in get_all_texts(fig)

    fig = matplotlib.figure.Figure()
    plot(X, fig, orientation=orientation, show_percentages=True)
    assert n_artists_yes_sizes == _count_descendants(fig)
    assert '95.5%' in get_all_texts(fig)
    assert '2.8%' in get_all_texts(fig)

    fig = matplotlib.figure.Figure()
    plot(X, fig, orientation=orientation, show_percentages='!{:0.2f}!')
    assert n_artists_yes_sizes == _count_descendants(fig)
    assert '!0.95!' in get_all_texts(fig)
    assert '!0.03!' in get_all_texts(fig)

    fig = matplotlib.figure.Figure()
    plot(X, fig, orientation=orientation, show_counts=True,
         show_percentages=True)
    assert n_artists_yes_sizes == _count_descendants(fig)
    if orientation == 'vertical':
        assert '9547\n(95.5%)' in get_all_texts(fig)
        assert '283 (2.8%)' in get_all_texts(fig)
    else:
        assert '9547 (95.5%)' in get_all_texts(fig)
        assert '283\n(2.8%)' in get_all_texts(fig)

    with pytest.raises(ValueError):
        fig = matplotlib.figure.Figure()
        plot(X, fig, orientation=orientation, show_counts='%0.2h')


def test_add_catplot():
    pytest.importorskip('seaborn')
    X = generate_counts(n_samples=100)
    upset = UpSet(X)
    # smoke test
    upset.add_catplot('violin')
    fig = matplotlib.figure.Figure()
    upset.plot(fig)

    # can't provide value with Series
    with pytest.raises(ValueError):
        upset.add_catplot('violin', value='foo')

    # check the above add_catplot did not break the state
    upset.plot(fig)

    X = generate_counts(n_samples=100)
    X.name = 'foo'
    X = X.to_frame()
    upset = UpSet(X, subset_size='count')
    # must provide value with DataFrame
    with pytest.raises(ValueError):
        upset.add_catplot('violin')
    upset.add_catplot('violin', value='foo')
    with pytest.raises(ValueError):
        # not a known column
        upset.add_catplot('violin', value='bar')
    upset.plot(fig)

    # invalid plot kind raises error when plotting
    upset.add_catplot('foobar', value='foo')
    with pytest.raises(AttributeError):
        upset.plot(fig)


def _get_patch_data(axes, is_vertical):
    out = [{"y": patch.get_y(), "x": patch.get_x(),
            "h": patch.get_height(), "w": patch.get_width(),
            "fc": patch.get_facecolor(),
            "ec": patch.get_edgecolor(),
            "lw": patch.get_linewidth(),
            "ls": patch.get_linestyle(),
            "hatch": patch.get_hatch(),
            }
           for patch in axes.patches]
    if is_vertical:
        out = [{"y": patch["x"], "x": 6.5 - patch["y"],
                "h": patch["w"], "w": patch["h"],
                "fc": patch["fc"],
                "ec": patch["ec"],
                "lw": patch["lw"],
                "ls": patch["ls"],
                "hatch": patch["hatch"],
                }
               for patch in out]
    return pd.DataFrame(out).sort_values("x").reset_index(drop=True)


def _get_color_to_label_from_legend(ax):
    handles, labels = ax.get_legend_handles_labels()
    color_to_label = {
        patches[0].get_facecolor(): label
        for patches, label in zip(handles, labels)
    }
    return color_to_label


@pytest.mark.parametrize('orientation', ['horizontal', 'vertical'])
@pytest.mark.parametrize('show_counts', [False, True])
def test_add_stacked_bars(orientation, show_counts):
    df = generate_samples()
    df["label"] = (pd.cut(generate_samples().value + np.random.rand() / 2, 3)
                   .cat.codes
                   .map({0: "foo", 1: "bar", 2: "baz"}))

    upset = UpSet(df, show_counts=show_counts, orientation=orientation)
    upset.add_stacked_bars(by="label")
    upset_axes = upset.plot()

    int_axes = upset_axes["intersections"]
    stacked_axes = upset_axes["extra1"]

    is_vertical = orientation == 'vertical'
    int_rects = _get_patch_data(int_axes, is_vertical)
    stacked_rects = _get_patch_data(stacked_axes, is_vertical)

    # check bar heights match between int_rects and stacked_rects
    assert_series_equal(int_rects.groupby("x")["h"].sum(),
                        stacked_rects.groupby("x")["h"].sum(),
                        check_dtype=False)
    # check count labels match (TODO: check coordinate)
    assert ([elem.get_text() for elem in int_axes.texts] ==
            [elem.get_text() for elem in stacked_axes.texts])

    color_to_label = _get_color_to_label_from_legend(stacked_axes)
    stacked_rects["label"] = stacked_rects["fc"].map(color_to_label)
    # check totals for each label
    assert_series_equal(stacked_rects.groupby("label")["h"].sum(),
                        df.groupby("label").size(),
                        check_dtype=False, check_names=False)

    label_order = [text_obj.get_text()
                   for text_obj in stacked_axes.get_legend().get_texts()]
    # label order should be lexicographic
    assert label_order == sorted(label_order)

    if orientation == "horizontal":
        # order of labels in legend should match stack, top to bottom
        for prev, curr in zip(label_order, label_order[1:]):
            assert (stacked_rects.query("label == @prev")
                    .sort_values("x")["y"].values >=
                    stacked_rects.query("label == @curr")
                    .sort_values("x")["y"].values).all()
    else:
        # order of labels in legend should match stack, left to right
        for prev, curr in zip(label_order, label_order[1:]):
            assert (stacked_rects.query("label == @prev")
                    .sort_values("x")["y"].values <=
                    stacked_rects.query("label == @curr")
                    .sort_values("x")["y"].values).all()


@pytest.mark.parametrize("colors, expected", [
    (["blue", "red", "green"], ["blue", "red", "green"]),
    ({"bar": "blue", "baz": "red", "foo": "green"}, ["blue", "red", "green"]),
    ("Pastel1", ["#fbb4ae", "#b3cde3", "#ccebc5"]),
    (cm.viridis, ["#440154", "#440256", "#450457"]),
    (lambda x: cm.Pastel1(x), ["#fbb4ae", "#b3cde3", "#ccebc5"]),
])
def test_add_stacked_bars_colors(colors, expected):
    df = generate_samples()
    df["label"] = (pd.cut(generate_samples().value + np.random.rand() / 2, 3)
                   .cat.codes
                   .map({0: "foo", 1: "bar", 2: "baz"}))

    upset = UpSet(df)
    upset.add_stacked_bars(by="label", colors=colors,
                           title="Count by gender")
    upset_axes = upset.plot()
    stacked_axes = upset_axes["extra1"]
    color_to_label = _get_color_to_label_from_legend(stacked_axes)
    label_to_color = {v: k for k, v in color_to_label.items()}
    actual = [to_hex(label_to_color[label]) for label in ["bar", "baz", "foo"]]
    expected = [to_hex(color) for color in expected]
    assert actual == expected


@pytest.mark.parametrize('int_sum_over', [False, True])
@pytest.mark.parametrize('stack_sum_over', [False, True])
@pytest.mark.parametrize('show_counts', [False, True])
def test_add_stacked_bars_sum_over(int_sum_over, stack_sum_over, show_counts):
    # A rough test of sum_over
    df = generate_samples()
    df["label"] = (pd.cut(generate_samples().value + np.random.rand() / 2, 3)
                   .cat.codes
                   .map({0: "foo", 1: "bar", 2: "baz"}))

    upset = UpSet(df, sum_over="value" if int_sum_over else None,
                  show_counts=show_counts)
    upset.add_stacked_bars(by="label",
                           sum_over="value" if stack_sum_over else None,
                           colors='Pastel1')
    upset_axes = upset.plot()

    int_axes = upset_axes["intersections"]
    stacked_axes = upset_axes["extra1"]

    int_rects = _get_patch_data(int_axes, is_vertical=False)
    stacked_rects =
_get_patch_data(stacked_axes, is_vertical=False) if int_sum_over == stack_sum_over: # check bar heights match between int_rects and stacked_rects assert_series_equal(int_rects.groupby("x")["h"].sum(), stacked_rects.groupby("x")["h"].sum(), check_dtype=False) # and check labels match with show_counts assert ([elem.get_text() for elem in int_axes.texts] == [elem.get_text() for elem in stacked_axes.texts]) else: assert (int_rects.groupby("x")["h"].sum() != stacked_rects.groupby("x")["h"].sum()).all() if show_counts: assert ([elem.get_text() for elem in int_axes.texts] != [elem.get_text() for elem in stacked_axes.texts]) @pytest.mark.parametrize('x', [ generate_counts(), ]) def test_index_must_be_bool(x): # Truthy ints are okay x = x.reset_index() x[['cat0', 'cat2', 'cat2']] = x[['cat0', 'cat1', 'cat2']].astype(int) x = x.set_index(['cat0', 'cat1', 'cat2']).iloc[:, 0] UpSet(x) # other ints are not x = x.reset_index() x[['cat0', 'cat2', 'cat2']] = x[['cat0', 'cat1', 'cat2']] + 1 x = x.set_index(['cat0', 'cat1', 'cat2']).iloc[:, 0] with pytest.raises(ValueError, match='not boolean'): UpSet(x) @pytest.mark.parametrize( "filter_params, expected", [ ({"min_subset_size": 623}, {(True, False, False): 884, (True, True, False): 1547, (True, False, True): 623, (True, True, True): 990, }), ({"min_subset_size": 800, "max_subset_size": 990}, {(True, False, False): 884, (True, True, True): 990, }), ({"min_degree": 2}, {(True, True, False): 1547, (True, False, True): 623, (False, True, True): 258, (True, True, True): 990, }), ({"min_degree": 2, "max_degree": 2}, {(True, True, False): 1547, (True, False, True): 623, (False, True, True): 258, }), ({"max_subset_size": 500, "max_degree": 2}, {(False, False, False): 220, (False, True, False): 335, (False, False, True): 143, (False, True, True): 258, }), ] ) @pytest.mark.parametrize('sort_by', ['cardinality', 'degree']) def test_filter_subsets(filter_params, expected, sort_by): data = generate_samples(seed=0, n_samples=5000, 
n_categories=3) # data = # cat1 cat0 cat2 # False False False 220 # True False False 884 # False True False 335 # False True 143 # True True False 1547 # False True 623 # False True True 258 # True True True 990 upset_full = UpSet(data, subset_size='auto', sort_by=sort_by) upset_filtered = UpSet(data, subset_size='auto', sort_by=sort_by, **filter_params) intersections = upset_full.intersections df = upset_full._df # check integrity of expected, just to be sure for key, value in expected.items(): assert intersections.loc[key] == value subset_intersections = intersections[ intersections.index.isin(list(expected.keys()))] subset_df = df[df.index.isin(list(expected.keys()))] assert len(subset_intersections) < len(intersections) assert_series_equal(upset_filtered.intersections, subset_intersections) assert_frame_equal(upset_filtered._df.drop("_bin", axis=1), subset_df.drop("_bin", axis=1)) # category totals should not be affected assert_series_equal(upset_full.totals, upset_filtered.totals) @pytest.mark.parametrize('x', [ generate_counts(n_categories=3), generate_counts(n_categories=8), generate_counts(n_categories=15), ]) @pytest.mark.parametrize('orientation', [ 'horizontal', 'vertical', ]) def test_matrix_plot_margins(x, orientation): """Non-regression test addressing a bug where there is are large whitespace margins around the matrix when the number of intersections is large""" axes = plot(x, orientation=orientation) # Expected behavior is that each matrix column takes up one unit on x-axis expected_width = len(x) attr = 'get_xlim' if orientation == 'horizontal' else 'get_ylim' lim = getattr(axes['matrix'], attr)() assert expected_width == lim[1] - lim[0] def _make_facecolor_list(colors): return [{"facecolor": c} for c in colors] CAT1_2_RED_STYLES = _make_facecolor_list(["blue", "blue", "blue", "blue", "red", "blue", "blue", "red"]) CAT1_RED_STYLES = _make_facecolor_list(["blue", "red", "blue", "blue", "red", "red", "blue", "red"]) CAT_NOT1_RED_STYLES = 
_make_facecolor_list(["red", "blue", "red", "red", "blue", "blue", "red", "blue"]) CAT1_NOT2_RED_STYLES = _make_facecolor_list(["blue", "red", "blue", "blue", "blue", "red", "blue", "blue"]) CAT_NOT1_2_RED_STYLES = _make_facecolor_list(["red", "blue", "blue", "red", "blue", "blue", "blue", "blue"]) @pytest.mark.parametrize( "kwarg_list,expected_subset_styles,expected_legend", [ # Different forms of including two categories ([{"present": ["cat1", "cat2"], "facecolor": "red"}], CAT1_2_RED_STYLES, []), ([{"present": {"cat1", "cat2"}, "facecolor": "red"}], CAT1_2_RED_STYLES, []), ([{"present": ("cat1", "cat2"), "facecolor": "red"}], CAT1_2_RED_STYLES, []), # with legend ([{"present": ("cat1", "cat2"), "facecolor": "red", "label": "foo"}], CAT1_2_RED_STYLES, [({"facecolor": "red"}, "foo")]), # present only cat1 ([{"present": ("cat1",), "facecolor": "red"}], CAT1_RED_STYLES, []), ([{"present": "cat1", "facecolor": "red"}], CAT1_RED_STYLES, []), # Some uses of absent ([{"absent": "cat1", "facecolor": "red"}], CAT_NOT1_RED_STYLES, []), ([{"present": "cat1", "absent": ["cat2"], "facecolor": "red"}], CAT1_NOT2_RED_STYLES, []), ([{"absent": ["cat2", "cat1"], "facecolor": "red"}], CAT_NOT1_2_RED_STYLES, []), # min/max args ([{"present": ["cat1", "cat2"], "min_degree": 3, "facecolor": "red"}], _make_facecolor_list(["blue"] * 7 + ["red"]), []), ([{"present": ["cat1", "cat2"], "max_subset_size": 3000, "facecolor": "red"}], _make_facecolor_list(["blue"] * 7 + ["red"]), []), ([{"present": ["cat1", "cat2"], "max_degree": 2, "facecolor": "red"}], _make_facecolor_list(["blue"] * 4 + ["red"] + ["blue"] * 3), []), ([{"present": ["cat1", "cat2"], "min_subset_size": 3000, "facecolor": "red"}], _make_facecolor_list(["blue"] * 4 + ["red"] + ["blue"] * 3), []), # cat1 _or_ cat2 ([{"present": "cat1", "facecolor": "red"}, {"present": "cat2", "facecolor": "red"}], _make_facecolor_list(["blue", "red", "red", "blue", "red", "red", "red", "red"]), []), # With multiple uses of label ([{"present": 
"cat1", "facecolor": "red", "label": "foo"}, {"present": "cat2", "facecolor": "red", "label": "bar"}], _make_facecolor_list(["blue", "red", "red", "blue", "red", "red", "red", "red"]), [({"facecolor": "red"}, "foo; bar")]), ([{"present": "cat1", "facecolor": "red", "label": "foo"}, {"present": "cat2", "facecolor": "red", "label": "foo"}], _make_facecolor_list(["blue", "red", "red", "blue", "red", "red", "red", "red"]), [({"facecolor": "red"}, "foo")]), # With multiple colours, the latest overrides ([{"present": "cat1", "facecolor": "red", "label": "foo"}, {"present": "cat2", "facecolor": "green", "label": "bar"}], _make_facecolor_list(["blue", "red", "green", "blue", "green", "red", "green", "green"]), [({"facecolor": "red"}, "foo"), ({"facecolor": "green"}, "bar")]), # Combining multiple style properties ([{"present": "cat1", "facecolor": "red", "hatch": "//"}, {"present": "cat2", "edgecolor": "green", "linestyle": "dotted"}], [{"facecolor": "blue"}, {"facecolor": "red", "hatch": "//"}, {"facecolor": "blue", "edgecolor": "green", "linestyle": "dotted"}, {"facecolor": "blue"}, {"facecolor": "red", "hatch": "//", "edgecolor": "green", "linestyle": "dotted"}, {"facecolor": "red", "hatch": "//"}, {"facecolor": "blue", "edgecolor": "green", "linestyle": "dotted"}, {"facecolor": "red", "hatch": "//", "edgecolor": "green", "linestyle": "dotted"}, ], []), ]) def test_style_subsets(kwarg_list, expected_subset_styles, expected_legend): data = generate_counts() upset = UpSet(data, facecolor="blue") for kw in kwarg_list: upset.style_subsets(**kw) actual_subset_styles = upset.subset_styles assert actual_subset_styles == expected_subset_styles assert upset.subset_legend == expected_legend def _dots_to_dataframe(ax, is_vertical): matrix_path_collection = ax.collections[0] matrix_dots = pd.DataFrame( matrix_path_collection.get_offsets(), columns=["x", "y"] ).join( pd.DataFrame(matrix_path_collection.get_facecolors(), columns=["fc_r", "fc_g", "fc_b", "fc_a"]), ).join( 
pd.DataFrame(matrix_path_collection.get_edgecolors(), columns=["ec_r", "ec_g", "ec_b", "ec_a"]), ).assign( lw=matrix_path_collection.get_linewidths(), ls=matrix_path_collection.get_linestyles(), hatch=matrix_path_collection.get_hatch(), ) matrix_dots["ls_offset"] = matrix_dots["ls"].map( lambda tup: tup[0]).astype(float) matrix_dots["ls_seq"] = matrix_dots["ls"].map( lambda tup: None if tup[1] is None else tuple(tup[1])) del matrix_dots["ls"] if is_vertical: matrix_dots[["x", "y"]] = matrix_dots[["y", "x"]] matrix_dots["x"] = 7 - matrix_dots["x"] return matrix_dots @pytest.mark.parametrize('orientation', ['horizontal', 'vertical']) def test_style_subsets_artists(orientation): # Check that subset_styles are all appropriately reflected in matplotlib # artists. # This may be a bit overkill, and too coupled with implementation details. is_vertical = orientation == 'vertical' data = generate_counts() upset = UpSet(data, orientation=orientation) subset_styles = [ {"facecolor": "black"}, {"facecolor": "red"}, {"edgecolor": "red"}, {"edgecolor": "red", "linewidth": 4}, {"linestyle": "dotted"}, {"edgecolor": "red", "facecolor": "blue", "hatch": "//"}, {"facecolor": "blue"}, {}, ] if is_vertical: upset.subset_styles = subset_styles[::-1] else: upset.subset_styles = subset_styles upset_axes = upset.plot() int_rects = _get_patch_data(upset_axes["intersections"], is_vertical) int_rects[["fc_r", "fc_g", "fc_b", "fc_a"]] = ( int_rects.pop("fc").apply(lambda x: pd.Series(x))) int_rects[["ec_r", "ec_g", "ec_b", "ec_a"]] = ( int_rects.pop("ec").apply(lambda x: pd.Series(x))) int_rects["ls_is_solid"] = int_rects.pop("ls").map( lambda x: x == "solid" or pd.isna(x)) expected = pd.DataFrame({ "fc_r": [0, 1, 0, 0, 0, 0, 0, 0], "fc_g": [0, 0, 0, 0, 0, 0, 0, 0], "fc_b": [0, 0, 0, 0, 0, 1, 1, 0], "ec_r": [0, 1, 1, 1, 0, 1, 0, 0], "ec_g": [0, 0, 0, 0, 0, 0, 0, 0], "ec_b": [0, 0, 0, 0, 0, 0, 1, 0], "lw": [1, 1, 1, 4, 1, 1, 1, 1], "ls_is_solid": [True, True, True, True, False, True, True, 
True], }) assert_frame_equal(expected, int_rects[expected.columns], check_dtype=False) styled_dots = _dots_to_dataframe(upset_axes["matrix"], is_vertical) baseline_dots = _dots_to_dataframe( UpSet(data, orientation=orientation).plot()["matrix"], is_vertical ) inactive_dot_mask = (baseline_dots[["fc_a"]] < 1).values.ravel() assert_frame_equal(baseline_dots.loc[inactive_dot_mask], styled_dots.loc[inactive_dot_mask]) styled_dots = styled_dots.loc[~inactive_dot_mask] styled_dots = styled_dots.drop(columns="y").groupby("x").apply( lambda df: df.drop_duplicates()) styled_dots["ls_is_solid"] = styled_dots.pop("ls_seq").isna() assert_frame_equal(expected.iloc[1:].reset_index(drop=True), styled_dots[expected.columns].reset_index(drop=True), check_dtype=False) # TODO: check lines between dots # matrix_line_collection = upset_axes["matrix"].collections[1] def test_many_categories(): # Tests regressions against GH#193 n_cats = 250 index1 = [True, False] + [False] * (n_cats - 2) index2 = [False, True] + [False] * (n_cats - 2) columns = [chr(i + 33) for i in range(n_cats)] data = pd.DataFrame([index1, index2], columns=columns) data["value"] = 1 data = data.set_index(columns)["value"] UpSet(data) UpSetPlot-0.8.0/upsetplot/util.py000066400000000000000000000050301435554746600170250ustar00rootroot00000000000000"""Generic utilities""" import re # The below is adapted from an answer to # https://stackoverflow.com/questions/66822945 # by Andrius at https://stackoverflow.com/a/66869159/1017546 # Reproduced under the CC-BY-SA 3.0 licence. ODD_REPEAT_PATTERN = r'((? str: """Convert old style named formatting to new style formatting. For example: '%(x)s - %%%(y)s' -> '{x} - %{y}' Args: fmt: old style formatting to convert. Returns: new style formatting. """ return __to_new_format(fmt, named=True) def to_new_pos_format(fmt: str) -> str: """Convert old style positional formatting to new style formatting. For example: '%s - %%%s' -> '{} - %{}' Args: fmt: old style formatting to convert. 
Returns: new style formatting. """ return __to_new_format(fmt, named=False)