pax_global_header00006660000000000000000000000064130732206530014513gustar00rootroot0000000000000052 comment=e3c4f4a9b6120433e5cc3383464c7a79e9b2b86e PubChemPy-1.0.4/000077500000000000000000000000001307322065300133515ustar00rootroot00000000000000PubChemPy-1.0.4/.bumpversion.cfg000066400000000000000000000003301307322065300164550ustar00rootroot00000000000000[bumpversion] current_version = 1.0.4 commit = True tag = True [bumpversion:file:setup.py] [bumpversion:file:pubchempy.py] [bumpversion:file:docs/source/guide/install.rst] [bumpversion:file:docs/source/conf.py] PubChemPy-1.0.4/.travis.yml000066400000000000000000000005101307322065300154560ustar00rootroot00000000000000language: python sudo: false python: - "2.7" - "3.5" - "3.6" env: matrix: - OPTIONAL_DEPS=true - OPTIONAL_DEPS=false install: - pip install -U coveralls pytest - if [ "$OPTIONAL_DEPS" = true ]; then pip install -U pandas; fi script: - coverage run --source=pubchempy -m pytest after_success: - coveralls PubChemPy-1.0.4/CHANGELOG.md000066400000000000000000000051401307322065300151620ustar00rootroot00000000000000# Change Log ## [v1.0.4](https://github.com/mcs07/PubChemPy/releases/tag/v1.0.4) (2017-04-11) [Full Changelog](https://github.com/mcs07/PubChemPy/compare/v1.0.3...v1.0.4) **Implemented enhancements:** - Discrepancy between the CACTVS fingerprint spec and Compound.fingerprint [\#15](https://github.com/mcs07/PubChemPy/issues/15) - Using pubchempy / urllib behind proxy [\#11](https://github.com/mcs07/PubChemPy/issues/11) **Fixed bugs:** - Two substance tests out-of-date [\#20](https://github.com/mcs07/PubChemPy/issues/20) - Xref-queries always 404 [\#18](https://github.com/mcs07/PubChemPy/issues/18) - TypeError when trying to download a compund with None as cid [\#13](https://github.com/mcs07/PubChemPy/issues/13) - On certain Compounds, the atoms\(\) function fails [\#5](https://github.com/mcs07/PubChemPy/issues/5) **Merged pull requests:** - Switch to using pytest for tests 
[\#23](https://github.com/mcs07/PubChemPy/pull/23) ([mcs07](https://github.com/mcs07)) - Decode CACTVS fingerprint to binary string [\#22](https://github.com/mcs07/PubChemPy/pull/22) ([mcs07](https://github.com/mcs07)) - Allow requests with xref input [\#19](https://github.com/mcs07/PubChemPy/pull/19) ([RickardSjogren](https://github.com/RickardSjogren)) - add get\_sdf function [\#17](https://github.com/mcs07/PubChemPy/pull/17) ([hsiaoyi0504](https://github.com/hsiaoyi0504)) - fix \#13 check if identifier is None and raise an exception to let the user know that the identifier is invalid [\#14](https://github.com/mcs07/PubChemPy/pull/14) ([llazzaro](https://github.com/llazzaro)) - Add syntax highlighting and output to README example. [\#7](https://github.com/mcs07/PubChemPy/pull/7) ([bjodah](https://github.com/bjodah)) ## [v1.0.3](https://github.com/mcs07/PubChemPy/releases/tag/v1.0.3) (2015-03-07) [Full Changelog](https://github.com/mcs07/PubChemPy/compare/v1.0.2...v1.0.3) ## [v1.0.2](https://github.com/mcs07/PubChemPy/releases/tag/v1.0.2) (2014-04-02) [Full Changelog](https://github.com/mcs07/PubChemPy/compare/v1.0.1...v1.0.2) **Merged pull requests:** - Add backwards-compatible Python 3 support [\#4](https://github.com/mcs07/PubChemPy/pull/4) ([mcs07](https://github.com/mcs07)) - Pandas Series [\#2](https://github.com/mcs07/PubChemPy/pull/2) ([zachcp](https://github.com/zachcp)) - fix ccordinate type recognition [\#1](https://github.com/mcs07/PubChemPy/pull/1) ([zachcp](https://github.com/zachcp)) ## [v1.0.1](https://github.com/mcs07/PubChemPy/releases/tag/v1.0.1) (2014-01-06) [Full Changelog](https://github.com/mcs07/PubChemPy/compare/v1.0...v1.0.1) ## [v1.0](https://github.com/mcs07/PubChemPy/releases/tag/v1.0) (2013-05-01) PubChemPy-1.0.4/LICENSE000066400000000000000000000020531307322065300143560ustar00rootroot00000000000000The MIT License Copyright 2014 Matt Swain Permission is hereby granted, free of charge, to any person obtaining a copy of this software 
and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. PubChemPy-1.0.4/MANIFEST.in000066400000000000000000000002141307322065300151040ustar00rootroot00000000000000include README.rst include LICENSE include pubchempy_test.py recursive-include docs * recursive-include requirements *.txt prune docs/build PubChemPy-1.0.4/README.rst000066400000000000000000000034371307322065300150470ustar00rootroot00000000000000PubChemPy ========= .. image:: http://img.shields.io/pypi/v/PubChemPy.svg?style=flat :target: https://pypi.python.org/pypi/PubChemPy .. image:: http://img.shields.io/pypi/l/PubChemPy.svg?style=flat :target: https://github.com/mcs07/PubChemPy/blob/master/LICENSE .. image:: http://img.shields.io/travis/mcs07/PubChemPy/master.svg?style=flat :target: https://travis-ci.org/mcs07/PubChemPy .. image:: http://img.shields.io/coveralls/mcs07/PubChemPy/master.svg?style=flat :target: https://coveralls.io/r/mcs07/PubChemPy?branch=master PubChemPy provides a way to interact with PubChem in Python. 
It allows chemical searches by name, substructure and similarity, chemical standardization, conversion between chemical file formats, depiction and retrieval of chemical properties. .. code:: python >>> from pubchempy import get_compounds, Compound >>> comp = Compound.from_cid(1423) >>> print(comp.isomeric_smiles) CCCCCCCNC1CCCC1CCCCCCC(=O)O >>> comps = get_compounds('Aspirin', 'name') >>> print(comps[0].xlogp) 1.2 Installation ------------ Install PubChemPy using: :: pip install pubchempy Alternatively, try one of the other `installation options`_. Documentation ------------- Full documentation is available at http://pubchempy.readthedocs.io. Contribute ---------- - Feature ideas and bug reports are welcome on the `Issue Tracker`_. - Fork the `source code`_ on GitHub, make changes and file a pull request. License ------- PubChemPy is licensed under the `MIT license`_. .. _`installation options`: http://pubchempy.readthedocs.io/en/latest/guide/install.html .. _`source code`: https://github.com/mcs07/PubChemPy .. _`Issue Tracker`: https://github.com/mcs07/PubChemPy/issues .. _`MIT license`: https://github.com/mcs07/PubChemPy/blob/master/LICENSE PubChemPy-1.0.4/docs/000077500000000000000000000000001307322065300143015ustar00rootroot00000000000000PubChemPy-1.0.4/docs/Makefile000066400000000000000000000151771307322065300157540ustar00rootroot00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = build # User-friendly check for sphinx-build ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. 
If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) endif # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " xml to make Docutils-native XML files" @echo " pseudoxml to make pseudoxml-XML files for display purposes" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 
dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/PubChemPy.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/PubChemPy.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." @echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/PubChemPy" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/PubChemPy" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." 
$(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." latexpdfja: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through platex and dvipdfmx..." $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." xml: $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml @echo @echo "Build finished. The XML files are in $(BUILDDIR)/xml." pseudoxml: $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml @echo @echo "Build finished. 
The pseudo-XML files are in $(BUILDDIR)/pseudoxml." PubChemPy-1.0.4/docs/README.rst000066400000000000000000000017061307322065300157740ustar00rootroot00000000000000PubChemPy Documentation ======================= This file provides a quick guide on how to compile the PubChemPy documentation. You will find all the documentation source files in the ``docs/source`` directory, written in reStructuredText format. All generated documentation is saved to the ``docs/build`` directory. Requirements ------------ Sphinx is required to compile the documentation. Sphinx also requires docutils and jinja. Install them all using:: pip install Sphinx Compile the documentation ------------------------- To compile the documentation and produce HTML output, run the following command from this ``docs`` directory:: make html Documentation will be generated in HTML format and saved to the ``build/html`` directory. Open the ``index.html`` file in a browser to view it. Reset ----- To clear all generated documentation files and start over from scratch, run:: make clean This will not delete any of the source files. PubChemPy-1.0.4/docs/source/000077500000000000000000000000001307322065300156015ustar00rootroot00000000000000PubChemPy-1.0.4/docs/source/api.rst000066400000000000000000000043141307322065300171060ustar00rootroot00000000000000.. _api: API documentation ================= .. module:: pubchempy This part of the documentation is automatically generated from the PubChemPy source code and comments. Search functions ---------------- .. autofunction:: get_compounds .. autofunction:: get_substances .. autofunction:: get_assays .. autofunction:: get_properties Compound -------- .. autoclass:: pubchempy.Compound :members: Atom ---- .. autoclass:: pubchempy.Atom :members: Bond ---- .. autoclass:: pubchempy.Bond :members: Substance --------- .. autoclass:: pubchempy.Substance :members: Assay ----- .. 
autoclass:: pubchempy.Assay :members: *pandas* functions ------------------ Each of the search functions, :func:`~pubchempy.get_compounds`, :func:`~pubchempy.get_substances` and :func:`~pubchempy.get_properties` has an ``as_dataframe`` parameter. When set to ``True``, these functions automatically extract properties from each result in the list into a pandas :class:`~pandas.DataFrame` and return that instead of the results themselves. If you already have a list of Compounds or Substances, the functions below allow a :class:`~pandas.DataFrame` to be constructed easily. .. autofunction:: compounds_to_frame .. autofunction:: substances_to_frame Exceptions ---------- .. autoexception:: pubchempy.PubChemPyError() .. autoexception:: pubchempy.ResponseParseError() .. autoexception:: pubchempy.PubChemHTTPError() .. autoexception:: pubchempy.BadRequestError() .. autoexception:: pubchempy.NotFoundError() .. autoexception:: pubchempy.MethodNotAllowedError() .. autoexception:: pubchempy.TimeoutError() .. autoexception:: pubchempy.UnimplementedError() .. autoexception:: pubchempy.ServerError() Changes ------- - As of v1.0.3, the ``atoms`` and ``bonds`` properties on :class:`Compounds ` now return lists of :class:`~pubchempy.Atom` and :class:`~pubchempy.Bond` objects, rather than dicts. - As of v1.0.2, search functions now return an empty list instead of raising a :class:`~pubchempy.NotFoundError` exception when no results are found. :class:`~pubchempy.NotFoundError` is still raised when attempting to create a :class:`~pubchempy.Compound` using the ``from_cid`` class method with an invalid CID. PubChemPy-1.0.4/docs/source/conf.py000066400000000000000000000212421307322065300171010ustar00rootroot00000000000000# -*- coding: utf-8 -*- # PubChemPy documentation build configuration file, created by sphinx-quickstart on Thu Jan 23 10:39:02 2014. # This file is execfile()d with the current directory set to its containing dir. 
# Note that not all possible configuration values are present in this autogenerated file. # All configuration values have a default; values that are commented out serve to show the default. import sys import os # on_rtd is whether we are on readthedocs.org on_rtd = os.environ.get('READTHEDOCS', None) == 'True' # If extensions (or modules to document with autodoc) are in another directory, add these directories to sys.path here. # If the directory is relative to the documentation root, use os.path.abspath to make it absolute, like shown here. sys.path.insert(0, os.path.abspath('../..')) # -- General configuration ------------------------------------------------ # If your documentation needs a minimal Sphinx version, state it here. #needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be extensions coming with Sphinx # (named 'sphinx.ext.*') or your custom ones. extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.intersphinx', 'sphinx.ext.ifconfig', ] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = u'PubChemPy' copyright = u'2014, Matt Swain' # The version info for the project you're documenting, acts as replacement for |version| and |release|, also used in # various other places throughout the built documents. # The short X.Y version. version = '1.0.4' # The full version, including alpha/beta/rc tags. release = '1.0.4' # The language for content autogenerated by Sphinx. Refer to documentation for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. 
#today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and directories to ignore when looking for source # files. exclude_patterns = [] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # If true, keep warnings as "system message" paragraphs in the built documents. #keep_warnings = False # -- Options for HTML output ---------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for a list of builtin themes. if not on_rtd: # only import and set the theme if we're building docs locally import sphinx_rtd_theme html_theme = 'sphinx_rtd_theme' html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] #html_theme = 'default' # Theme options are theme-specific and customize the look and feel of a theme further. For a list of options available # for each theme, see the documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top of the sidebar. 
#html_logo = None # The name of an image file (within the static path) to use as favicon of the docs. This file should be a Windows icon # file (.ico) being 16x16 or 32x32 pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, relative to this directory. They are # copied after the builtin static files, so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # Add any extra paths that contain custom files (such as robots.txt or .htaccess) here, relative to this directory. # These files are copied directly to the root of the documentation. #html_extra_path = [] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will contain a tag referring to it. The # value of this option must be the base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). 
#html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'PubChemPydoc' # -- Options for LaTeX output --------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). 'papersize': 'a4paper', # The font size ('10pt', '11pt' or '12pt'). 'pointsize': '12pt', # Additional stuff for the LaTeX preamble. #'preamble': '', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto, manual, or own class]). latex_documents = [ ('index', 'PubChemPy.tex', u'PubChemPy Documentation', u'Matt Swain', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, not chapters. latex_use_parts = False # If true, show page references after internal links. latex_show_pagerefs = True # If true, show URL addresses after external links. latex_show_urls = True # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. latex_domain_indices = False # -- Options for manual page output --------------------------------------- # One entry per manual page. List of tuples (source start file, name, description, authors, manual section). man_pages = [ ('index', 'pubchempy', u'PubChemPy Documentation', [u'Matt Swain'], 1) ] # If true, show URL addresses after external links. #man_show_urls = False # -- Options for Texinfo output ------------------------------------------- # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, dir menu entry, description, category) texinfo_documents = [ ('index', 'PubChemPy', u'PubChemPy Documentation', u'Matt Swain', 'PubChemPy', 'One line description of project.', 'Miscellaneous'), ] # Documents to append as an appendix to all manuals. 
#texinfo_appendices = [] # If false, no module index is generated. #texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. #texinfo_show_urls = 'footnote' # If true, do not generate a @detailmenu in the "Top" node's menu. #texinfo_no_detailmenu = False # -- Miscellaneous options ------------------------------------------------ intersphinx_mapping = { 'python': ('http://docs.python.org/', None), 'pandas': ('http://pandas.pydata.org/pandas-docs/stable/', None), } # Sort autodoc members by the order they appear in the source code autodoc_member_order = 'bysource' # Concatenate the class and __init__ docstrings together autoclass_content = 'both' PubChemPy-1.0.4/docs/source/guide/000077500000000000000000000000001307322065300166765ustar00rootroot00000000000000PubChemPy-1.0.4/docs/source/guide/advanced.rst000066400000000000000000000116531307322065300212030ustar00rootroot00000000000000.. _advanced: Advanced ======== .. _avoiding_timeouterror: Avoiding TimeoutError --------------------- If there are too many results for a request, you will receive a TimeoutError. There are different ways to avoid this, depending on what type of request you are doing. If retrieving full compound or substance records, instead request a list of cids or sids for your input, and then request the full records for those identifiers individually or in small groups. For example:: sids = get_sids('Aspirin', 'name') for sid in sids: s = Substance.from_sid(sid) When using the ``formula`` namespace or a ``searchtype``, you can also alternatively use the ``listkey_count`` and ``listkey_start`` keyword arguments to specify pagination. The ``listkey_count`` value specifies the number of results per page, and the ``listkey_start`` value specifies which page to return. 
For example:: get_compounds('CC', 'smiles', searchtype='substructure', listkey_count=5) get('C10H21N', 'formula', listkey_count=3, listkey_start=6) Logging ------- PubChemPy can generate logging statements if required. Just set the desired logging level:: import logging logging.basicConfig(level=logging.DEBUG) The logger is named 'pubchempy'. There is more information on logging in the `Python logging documentation`_. Using behind a proxy -------------------- When using PubChemPy behind a proxy, you may receive a ``URLError``:: URLError: A simple fix is to specify the proxy information via urllib. For Python 3:: import urllib.request proxy_support = urllib.request.ProxyHandler({ 'http': 'http://:', 'https': 'https://:' }) opener = urllib.request.build_opener(proxy_support) urllib.request.install_opener(opener) For Python 2:: import urllib2 proxy_support = urllib2.ProxyHandler({ 'http': 'http://:', 'https': 'https://:' }) opener = urllib2.build_opener(proxy_support) urllib2.install_opener(opener) Custom requests --------------- If you wish to perform more complicated requests, you can use the ``request`` function. This is an extremely simple wrapper around the REST API that allows you to construct any sort of request from a few parameters. The `PUG REST Specification`_ has all the information you will need to formulate your requests. The ``request`` function simply returns the exact response from the PubChem server as a string. This can be parsed in different ways depending on the output format you choose. See the Python `json`_, `xml`_ and `csv`_ packages for more information. Additionally, cheminformatics toolkits such as `Open Babel`_ and `RDKit`_ offer tools for handling SDF files in Python. The ``get`` function is very similar to the ``request`` function, except it handles ``listkey`` type responses automatically for you. 
This makes things simpler, however it means you can't take advantage of using the same ``listkey`` repeatedly to obtain different types of information. See the `PUG REST specification`_ for more information on how `listkey` responses work. Summary of possible inputs ~~~~~~~~~~~~~~~~~~~~~~~~~~ :: = list of cid, sid, aid, source, inchikey, listkey; string of name, smiles, xref, inchi, sdf; = substance | compound | assay compound domain = cid | name | smiles | inchi | sdf | inchikey | | | listkey | formula = record | property/[comma-separated list of property tags] | synonyms | sids | cids | aids | assaysummary | classification substance domain = sid | sourceid/ | sourceall/ | name | | listkey = record | synonyms | sids | cids | aids | assaysummary | classification assay domain = aid | listkey | type/ | sourceall/ = all | confirmatory | doseresponse | onhold | panel | rnai | screening | summary = record | aids | sids | cids | description | targets/{ProteinGI, ProteinName, GeneID, GeneSymbol} | doseresponse/sid = {substructure | superstructure | similarity | identity}/{smiles | inchi | sdf | cid} = xref/{RegistryID | RN | PubMedID | MMDBID | ProteinGI | NucleotideGI | TaxonomyID | MIMID | GeneID | ProbeID | PatentID} = XML | ASNT | ASNB | JSON | JSONP [ ?callback= ] | SDF | CSV | PNG | TXT .. _`Python logging documentation`: http://docs.python.org/2/howto/logging.html .. _`json`: http://docs.python.org/2/library/json.html .. _`xml`: http://docs.python.org/2/library/xml.etree.elementtree.html .. _`csv`: http://docs.python.org/2/library/csv.html .. _`PUG REST Specification`: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html .. _`Open Babel`: http://openbabel.org/docs/current/UseTheLibrary/Python.html .. _`RDKit`: http://www.rdkit.org PubChemPy-1.0.4/docs/source/guide/compound.rst000066400000000000000000000033501307322065300212550ustar00rootroot00000000000000.. 
_compound: Compound ======== The :func:`~pubchempy.get_compounds` function returns a list of :class:`~pubchempy.Compound` objects. You can also instantiate a :class:`~pubchempy.Compound` object directly if you know its CID:: c = pcp.Compound.from_cid(6819) Dictionary representation ------------------------- Each :class:`~pubchempy.Compound` has a ``record`` property, which is a dictionary that contains all the information about the compound, taken directly from the JSON response from the PubChem API. All other properties are derived from this record. Additionally, each :class:`~pubchempy.Compound` provides a ``to_dict()`` method that returns PubChemPy's own dictionary representation of the Compound data. As well as being more concisely formatted than the raw ``record``, this method also takes an optional ``properties`` parameter that restricts the output to the listed properties:: >>> c = pcp.Compound.from_cid(962) >>> c.to_dict(properties=['atoms', 'bonds', 'inchi']) {'atoms': [{'aid': 1, 'element': 'o', 'x': 2.5369, 'y': -0.155}, {'aid': 2, 'element': 'h', 'x': 3.0739, 'y': 0.155}, {'aid': 3, 'element': 'h', 'x': 2, 'y': 0.155}], 'bonds': [{'aid1': 1, 'aid2': 2, 'order': 'single'}, {'aid1': 1, 'aid2': 3, 'order': 'single'}], 'inchi': u'InChI=1S/H2O/h1H2'} 3D Compounds ------------ Many properties are missing from 3D records, and the following properties are *only* available on 3D records: - ``volume_3d`` - ``multipoles_3d`` - ``conformer_rmsd_3d`` - ``effective_rotor_count_3d`` - ``pharmacophore_features_3d`` - ``mmff94_partial_charges_3d`` - ``mmff94_energy_3d`` - ``conformer_id_3d`` - ``shape_selfoverlap_3d`` - ``feature_selfoverlap_3d`` - ``shape_fingerprint_3d`` PubChemPy-1.0.4/docs/source/guide/contribute.rst000066400000000000000000000016561307322065300216160ustar00rootroot00000000000000.. _contribute: Contribute ========== The `Issue Tracker`_ is the best place to post any feature ideas, requests and bug reports. 
If you are able to contribute changes yourself, just fork the `source code`_ on GitHub, make changes and file a pull request. Contributors ------------ - |ghi| `mcs07 `_ (Matt Swain) - |ghi| `ekaakurniawan `_ (Eka A. Kurniawan) - |ghi| `zachcp `_ (Zach Powers) - |ghi| `hsiaoyi0504 `_ (Hsiao Yi) - |ghi| `llazzaro `_ (Leonardo Lazzaro) - |ghi| `bjodah `_ (Björn Dahlgren) - |ghi| `RickardSjogren `_ (Rickard Sjögren) .. _`source code`: https://github.com/mcs07/PubChemPy .. _`Issue Tracker`: https://github.com/mcs07/PubChemPy/issues .. |ghi| raw:: html PubChemPy-1.0.4/docs/source/guide/download.rst000066400000000000000000000012061307322065300212360ustar00rootroot00000000000000.. _download: Download ======== The ``download`` function saves a file to disk. The following formats are available: XML, ASNT/B, JSON, SDF, CSV, PNG, TXT. Beware that not all formats are available for all types of information. SDF and PNG are only available for full Compound and Substance records, and CSV is best suited to tables of properties and identifiers. Examples:: pcp.download('PNG', 'asp.png', 'Aspirin', 'name') pcp.download('CSV', 's.csv', [1,2,3], operation='property/CanonicalSMILES,IsomericSMILES') For PNG images, the ``image_size`` argument can be used to specify ``large``, ``small`` or ``<width>x<height>``. PubChemPy-1.0.4/docs/source/guide/gettingstarted.rst000066400000000000000000000055201307322065300224620ustar00rootroot00000000000000.. _gettingstarted: Getting started =============== This page gives an introduction to getting started with PubChemPy. It assumes you already have PubChemPy :ref:`installed <install>`. Retrieving a Compound --------------------- Retrieving information about a specific Compound in the PubChem database is simple. Begin by importing PubChemPy:: >>> import pubchempy as pcp Let's get the Compound with `CID 5090`_:: >>> c = pcp.Compound.from_cid(5090) Now we have a :class:`~pubchempy.Compound` object called ``c``.
We can get all the information we need from this object:: >>> print(c.molecular_formula) C17H14O4S >>> print(c.molecular_weight) 314.35566 >>> print(c.isomeric_smiles) CS(=O)(=O)C1=CC=C(C=C1)C2=C(C(=O)OC2)C3=CC=CC=C3 >>> print(c.xlogp) 2.3 >>> print(c.iupac_name) 3-(4-methylsulfonylphenyl)-4-phenyl-2H-furan-5-one >>> print(c.synonyms) [u'rofecoxib', u'Vioxx', u'Ceoxx', u'162011-90-7', u'MK 966', ... ] .. note:: All the code examples in this documentation assume you have imported PubChemPy as ``pcp``. If you prefer, you can instead import specific functions and classes by name and use them directly:: from pubchempy import Compound, get_compounds c = Compound.from_cid(1423) cs = get_compounds('Aspirin', 'name') Searching --------- What if you don't know the PubChem CID of the Compound you want? Just use the :func:`~pubchempy.get_compounds` function:: >>> results = pcp.get_compounds('Glucose', 'name') >>> print(results) [Compound(79025), Compound(5793), Compound(64689), Compound(206)] The first argument is the identifier, and the second argument is the identifier type, which must be one of ``name``, ``smiles``, ``sdf``, ``inchi``, ``inchikey`` or ``formula``. It looks like there are 4 compounds in the PubChem Database that have the name Glucose associated with them. Let's take a look at them in more detail:: >>> for compound in results: ... print(compound.isomeric_smiles) C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O C(C1C(C(C(C(O1)O)O)O)O)O It looks like they all have different stereochemistry information. Retrieving the record for a SMILES string is just as easy:: >>> pcp.get_compounds('C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1', 'smiles') [Compound(1318)] .. 
note:: Beware that line notation inputs like SMILES and InChI can return automatically generated records that aren't actually present in PubChem, and therefore have no CID and are missing many properties that are too complicated to calculate on the fly. Those are the most basic things you can do with PubChemPy. Read on for some more advanced usage examples. .. _`CID 5090`: https://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5090 PubChemPy-1.0.4/docs/source/guide/install.rst000066400000000000000000000037601307322065300211040ustar00rootroot00000000000000.. _install: Installation ============ PubChemPy supports Python versions 2.7, 3.5, and 3.6. There are no other dependencies. There are a variety of ways to download and install PubChemPy. Option 1: Use pip (recommended) ------------------------------- The easiest and recommended way to install is using pip:: pip install pubchempy This will download the latest version of PubChemPy, and place it in your `site-packages` folder so it is automatically available to all your Python scripts. If you don't already have pip installed, you can `install it using get-pip.py`_:: curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py python get-pip.py Option 2: Use conda ------------------- If you use `Anaconda Python`_, install with conda:: conda install -c mcs07 pubchempy Option 3: Download the latest release ------------------------------------- Alternatively, `download the latest release`_ manually and install yourself:: tar -xzvf PubChemPy-1.0.4.tar.gz cd PubChemPy-1.0.4 python setup.py install The setup.py command will install PubChemPy in your `site-packages` folder so it is automatically available to all your Python scripts. Instead, you may prefer to just copy the pubchempy.py file into the desired project directory to only make it available to that project. Option 4: Clone the repository ------------------------------ The latest development version of PubChemPy is always `available on GitHub`_.
This version is not guaranteed to be stable, but may include new features that have not yet been released. Simply clone the repository and install as usual:: git clone https://github.com/mcs07/PubChemPy.git cd PubChemPy python setup.py install .. _`install it using get-pip.py`: http://www.pip-installer.org/en/latest/installing.html .. _`Anaconda Python`: https://www.continuum.io/anaconda-overview .. _`download the latest release`: https://github.com/mcs07/PubChemPy/releases/ .. _`available on GitHub`: https://github.com/mcs07/PubChemPy PubChemPy-1.0.4/docs/source/guide/introduction.rst000066400000000000000000000044561307322065300221620ustar00rootroot00000000000000.. _introduction: Introduction ============ How PubChemPy works ------------------- PubChemPy relies entirely on the PubChem database and chemical toolkits provided via their PUG REST web service [#f1]_. This service provides an interface for programs to automatically carry out the tasks that you might otherwise perform manually via the `PubChem website`_. This is important to remember when using PubChemPy: Every request you make is transmitted to the PubChem servers, evaluated, and then a response is sent back. There are some downsides to this: It is less suitable for confidential work, it requires a constant internet connection, and some tasks will be slower than if they were performed locally on your own computer. On the other hand, this means we have the vast resources of the PubChem database and chemical toolkits at our disposal. As a result, it is possible to do complex similarity and substructure searching against a database containing tens of millions of compounds in seconds, without needing any of the storage space or computational power on your own local computer. The PUG REST web service ------------------------ You don't need to worry too much about how the PubChem web service works, because PubChemPy handles all of the details for you. 
But if you want to go beyond the capabilities of PubChemPy, there is some helpful documentation on the PubChem website. - `PUG REST Tutorial`_: Explains how the web service works with a variety of usage examples. - `PUG REST Specification`_: A more comprehensive but dense specification that details every possible way to use the web service. PubChemPy license ----------------- .. include:: ../../../LICENSE .. rubric:: Footnotes .. [#f1] That's a lot of acronyms! PUG stands for "Power User Gateway", a term used to describe a variety of methods for programmatic access to PubChem data and services. REST stands for `Representational State Transfer`_, which describes the specific architectural style of the web service. .. _`PubChem website`: https://pubchem.ncbi.nlm.nih.gov .. _`PUG REST Tutorial`: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST_Tutorial.html .. _`PUG REST Specification`: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html .. _`Representational State Transfer`: https://en.wikipedia.org/wiki/Representational_state_transfer PubChemPy-1.0.4/docs/source/guide/pandas.rst000066400000000000000000000017541307322065300207050ustar00rootroot00000000000000.. _pandas: *pandas* integration ==================== Getting *pandas* ---------------- *pandas* must be installed to use its functionality from within PubChemPy. The easiest way is to use pip:: pip install pandas See the `pandas documentation`_ for more information. 
Usage ----- It is possible for ``get_compounds``, ``get_substances`` and ``get_properties`` to return a pandas DataFrame:: df1 = pcp.get_compounds('C20H41Br', 'formula', as_dataframe=True) df2 = pcp.get_substances([1, 2, 3, 4], as_dataframe=True) df3 = pcp.get_properties(['isomeric_smiles', 'xlogp', 'rotatable_bond_count'], 'C20H41Br', 'formula', as_dataframe=True) An existing list of Compound objects can be converted into a dataframe, optionally specifying the desired columns:: cs = pcp.get_compounds('C20H41Br', 'formula') df4 = pcp.compounds_to_frame(cs, properties=['isomeric_smiles', 'xlogp', 'rotatable_bond_count']) .. _`pandas documentation`: http://pandas.pydata.org/pandas-docs/stable/ PubChemPy-1.0.4/docs/source/guide/properties.rst000066400000000000000000000033621307322065300216300ustar00rootroot00000000000000.. _properties: Properties ========== The ``get_properties`` function allows the retrieval of specific properties without having to deal with entire compound records. This is especially useful for retrieving the properties of a large number of compounds at once:: p = pcp.get_properties('IsomericSMILES', 'CC', 'smiles', searchtype='superstructure') Multiple properties may be specified in a list, or in a comma-separated string. The available properties are: MolecularFormula, MolecularWeight, CanonicalSMILES, IsomericSMILES, InChI, InChIKey, IUPACName, XLogP, ExactMass, MonoisotopicMass, TPSA, Complexity, Charge, HBondDonorCount, HBondAcceptorCount, RotatableBondCount, HeavyAtomCount, IsotopeAtomCount, AtomStereoCount, DefinedAtomStereoCount, UndefinedAtomStereoCount, BondStereoCount, DefinedBondStereoCount, UndefinedBondStereoCount, CovalentUnitCount, Volume3D, XStericQuadrupole3D, YStericQuadrupole3D, ZStericQuadrupole3D, FeatureCount3D, FeatureAcceptorCount3D, FeatureDonorCount3D, FeatureAnionCount3D, FeatureCationCount3D, FeatureRingCount3D, FeatureHydrophobeCount3D, ConformerModelRMSD3D, EffectiveRotorCount3D, ConformerCount3D. 
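To make the "list or comma-separated string" rule above more concrete, here is a minimal sketch of how a set of property tags might map onto a PUG REST request path. The ``property_url`` helper is hypothetical and is not part of PubChemPy; it assumes Python 3 and only builds the URL, without contacting the PubChem servers.

```python
# Illustrative sketch only: property_url is a hypothetical helper,
# not a PubChemPy function. It builds a URL and makes no network request.
BASE = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug'


def property_url(properties, cids, output='JSON'):
    """Build a PUG REST URL requesting property tags for a list of CIDs."""
    # Accept either a list of tags or a single comma-separated string.
    if not isinstance(properties, str):
        properties = ','.join(properties)
    # Multiple CIDs are also joined with commas in the request path.
    cid_part = ','.join(str(cid) for cid in cids)
    return '%s/compound/cid/%s/property/%s/%s' % (BASE, cid_part, properties, output)


print(property_url(['MolecularFormula', 'XLogP'], [2244, 5090]))
```

Both ``property_url(['TPSA', 'Charge'], [1])`` and ``property_url('TPSA,Charge', [1])`` produce the same path, which mirrors the two input styles that ``get_properties`` accepts.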
Synonyms -------- Get a list of synonyms for a given input using the ``get_synonyms`` function:: pcp.get_synonyms('Aspirin', 'name') pcp.get_synonyms('Aspirin', 'name', 'substance') Inputs that match more than one SID/CID will have multiple, separate synonym lists returned. Identifiers ----------- There are three functions for getting a list of identifiers for a given input: - ``pcp.get_cids`` - ``pcp.get_sids`` - ``pcp.get_aids`` For example, passing a CID to ``get_sids`` will return a list of SIDs corresponding to the Substance records that were standardised and merged to produce the given Compound. PubChemPy-1.0.4/docs/source/guide/searching.rst000066400000000000000000000051141307322065300213740ustar00rootroot00000000000000.. _searching: Searching ========= 2D and 3D coordinates --------------------- By default, compounds are returned with 2D coordinates. Use the ``record_type`` keyword argument to specify otherwise:: pcp.get_compounds('Aspirin', 'name', record_type='3d') Advanced search types --------------------- By default, requests look for an exact match with the input. Alternatively, you can specify substructure, superstructure, similarity and identity searches using the ``searchtype`` keyword argument:: pcp.get_compounds('CC', 'smiles', searchtype='superstructure', listkey_count=3) The ``listkey_count`` and ``listkey_start`` arguments can be used for pagination. Each ``searchtype`` has its own options that can be specified as keyword arguments. For example, similarity searches have a ``Threshold``, and super/substructure searches have ``MatchIsotopes``. A full list of options is available in the `PUG REST Specification`_. Note: These types of search are *slow*. Getting a full results list for common compound names ----------------------------------------------------- For some very common names, PubChem maintains a filtered whitelist of human-chosen CIDs with the intention of reducing confusion about which is the 'right' result.
In the past, a search for Glucose would return four different results, each with different stereochemistry information. But now, a single result is returned, which has been chosen as 'correct' by the PubChem team. Unfortunately it isn't directly possible to return to the previous behaviour, but there is a straightforward workaround: Search for Substances with that name (which are completely unfiltered) and then get the compounds that are derived from those substances. There are a few different ways you can do this using PubChemPy, but the easiest is probably using the ``get_cids`` function: >>> pcp.get_cids('2-nonenal', 'name', 'substance', list_return='flat') [17166, 5283335, 5354833] This searches the substance database for '2-nonenal', and gets the CID for the compound associated with each substance. By default, this returns a mapping between each SID and CID, but the ``list_return='flat'`` parameter flattens this into just a single list of unique CIDs. You can then use ``Compound.from_cid`` to get the full Compound record, equivalent to what is returned by ``get_compounds``: >>> cids = pcp.get_cids('2-nonenal', 'name', 'substance', list_return='flat') >>> [pcp.Compound.from_cid(cid) for cid in cids] [Compound(17166), Compound(5283335), Compound(5354833)] .. _`PUG REST Specification`: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html PubChemPy-1.0.4/docs/source/guide/substance.rst000066400000000000000000000032721307322065300214230ustar00rootroot00000000000000.. _substance: Substance ========= The PubChem Substance database contains all chemical records deposited in PubChem in their most raw form, before any significant processing is applied. As a result, it contains duplicates, mixtures, and some records that don't make chemical sense. This means that Substance records contain fewer calculated properties, but they do have additional information about the original source that deposited the record.
The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Hence each Compound may be derived from a number of different Substances. Retrieving substances --------------------- Retrieve Substances using the :func:`~pubchempy.get_substances` function:: >>> results = pcp.get_substances('Coumarin 343', 'name') >>> print results [Substance(24864499), Substance(85084977), Substance(126686397), Substance(143491255), Substance(152243230), Substance(162092514), Substance(162189467), Substance(186021999), Substance(206257050)] You can also instantiate a Substance directly from its SID:: >>> substance = pcp.Substance.from_sid(223766453) >>> print substance.synonyms ['2-(Acetyloxy)-benzoic acid', '2-(acetyloxy)benzoic acid', '2-acetoxy benzoic acid', '2-acetoxy-benzoic acid', '2-acetoxybenzoic acid', '2-acetyloxybenzoic acid', 'BSYNRYMUTXBXSQ-UHFFFAOYSA-N', 'acetoxybenzoic acid', 'acetyl salicylic acid', 'acetyl-salicylic acid', 'acetylsalicylic acid', 'aspirin', 'o-acetoxybenzoic acid'] >>> print substance.source_id BSYNRYMUTXBXSQ-UHFFFAOYSA-N >>> print substance.standardized_cid 2244 >>> print substance.standardized_compound Compound(2244) PubChemPy-1.0.4/docs/source/index.rst000066400000000000000000000052131307322065300174430ustar00rootroot00000000000000.. PubChemPy documentation master file, created by sphinx-quickstart on Thu Jan 23 10:39:02 2014. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. PubChemPy documentation ======================= PubChemPy provides a way to interact with PubChem in Python. It allows chemical searches by name, substructure and similarity, chemical standardization, conversion between chemical file formats, depiction and retrieval of chemical properties. 
Here's a quick example showing how to search for a compound by name:: for compound in get_compounds('glucose', 'name'): print(compound.cid) print(compound.isomeric_smiles) Here's how you get calculated properties for a specific compound:: vioxx = Compound.from_cid(5090) print(vioxx.molecular_formula) print(vioxx.molecular_weight) print(vioxx.xlogp) All the heavy lifting is done by PubChem's servers, using their database and chemical toolkits. Features -------- - Search PubChem Substance and Compound databases by name, SMILES, InChI and SDF. - Retrieve the standardised Compound record for a given input structure. - Convert between SDF, SMILES, InChI, PubChem CID and more. - Retrieve calculated properties, fingerprints and descriptors. - Generate 2D and 3D coordinates. - Get IUPAC systematic names, trade names and all known synonyms for a given Compound. - Download compound records as XML, ASNT/B, JSON, SDF and depiction as a PNG image. - Construct property tables using *pandas* DataFrames. - A complete Python wrapper around the `PubChem PUG REST web service`_. - Supports Python versions 2.7, 3.5 and 3.6. Useful links ------------ - Source code is available on `GitHub`_. - Ask a question or report a bug on the `Issue Tracker`_. - PUG REST API `tutorial`_ and `documentation`_. User guide ---------- A step-by-step guide to getting started with PubChemPy. .. toctree:: :maxdepth: 2 guide/introduction guide/install guide/gettingstarted guide/searching guide/compound guide/substance guide/properties guide/pandas guide/download guide/advanced guide/contribute API documentation ----------------- Comprehensive API documentation with information on every function, class and method. .. toctree:: :maxdepth: 2 api .. _`PubChem PUG REST web service`: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST_Tutorial.html .. _`GitHub`: https://github.com/mcs07/PubChemPy .. _`Issue Tracker`: https://github.com/mcs07/PubChemPy/issues .. 
_`tutorial`: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST_Tutorial.html .. _`documentation`: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html PubChemPy-1.0.4/examples/000077500000000000000000000000001307322065300151675ustar00rootroot00000000000000PubChemPy-1.0.4/examples/1-introduction.ipynb000066400000000000000000000101761307322065300211160ustar00rootroot00000000000000{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PubChemPy examples\n", "\n", "## Table of Contents\n", "\n", "- [1. Introduction](1-introduction.ipynb)\n", "- [2. Getting Started](2-getting-started.ipynb)\n", "\n", "# 1. Introduction\n", "\n", "PubChemPy provides a way to interact with PubChem in Python. It allows chemical searches by name, substructure and similarity, chemical standardization, conversion between chemical file formats, depiction and retrieval of chemical properties.\n", "\n", "Here’s a quick example showing how to search for a compound by name:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5793\n", "C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O\n", "79025\n", "C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O\n", "64689\n", "C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O\n", "206\n", "C(C1C(C(C(C(O1)O)O)O)O)O\n" ] } ], "source": [ "from pubchempy import get_compounds\n", "\n", "for compound in get_compounds('glucose', 'name'):\n", " print(compound.cid)\n", " print(compound.isomeric_smiles)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So how does this work behind the scenes?\n", "\n", "1. We call the PubChemPy function `get_compounds` with the parameters `'glucose'` and `'name'`\n", "2. This is translated into [a request](https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/glucose/JSON) for the PubChem PUG REST API.\n", "3. PubChemPy parses the JSON response into a list of `Compound` objects.\n", "4. 
Each `Compound` has properties like `cid` and `isomeric_smiles`, which we print.\n", "\n", "Here’s how you get calculated properties for a specific compound:\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C17H14O4S\n", "314.35566\n", "2.3\n" ] } ], "source": [ "from pubchempy import Compound\n", "\n", "vioxx = Compound.from_cid(5090)\n", "print vioxx.molecular_formula\n", "print vioxx.molecular_weight\n", "print vioxx.xlogp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using PubChemPy, it is important to remember that every request you make is transmitted to the PubChem servers, evaluated, and then a response is sent back. There are some downsides to this: It is less suitable for confidential work, it requires a constant internet connection, and some tasks will be slower than if they were performed locally on your own computer. On the other hand, this means we have the vast resources of the PubChem database and chemical toolkits at our disposal. As a result, it is possible to do complex similarity and substructure searching against a database containing tens of millions of compounds in seconds, without needing any of the storage space or computational power on your own local computer.\n", "\n", "You don’t need to worry too much about how the PubChem web service works, because PubChemPy handles all of the details for you. But if you want to go beyond the capabilities of PubChemPy, there is some [helpful documentation on the PubChem website](https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST_Tutorial.html)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "Next: [Getting Started](2-getting-started.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 } PubChemPy-1.0.4/examples/2-getting-started.ipynb000066400000000000000000000221151307322065300214770ustar00rootroot00000000000000{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PubChemPy examples\n", "\n", "## Table of Contents\n", "\n", "- [1. Introduction](1-introduction.ipynb)\n", "- [2. Getting Started](2-getting-started.ipynb)\n", "\n", "# 2. Getting Started\n", "\n", "## Retrieving a Compound\n", "\n", "Retrieving information about a specific Compound in the PubChem database is simple.\n", "\n", "Begin by importing PubChemPy:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pubchempy as pcp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s get the Compound with [CID 5090](https://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5090):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Compound(5090)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = pcp.Compound.from_cid(5090)\n", "c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have a `Compound` object called `c`. 
We can get all the information we need from this object:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C17H14O4S\n" ] } ], "source": [ "print(c.molecular_formula)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "314.35566\n" ] } ], "source": [ "print(c.molecular_weight)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CS(=O)(=O)C1=CC=C(C=C1)C2=C(C(=O)OC2)C3=CC=CC=C3\n" ] } ], "source": [ "print(c.isomeric_smiles)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.3\n" ] } ], "source": [ "print(c.xlogp)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3-(4-methylsulfonylphenyl)-4-phenyl-2H-furan-5-one\n" ] } ], "source": [ "print(c.iupac_name)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[u'rofecoxib', u'Vioxx', u'Ceoxx', u'162011-90-7', u'MK 966', u'MK-966', u'4-[4-(methylsulfonyl)phenyl]-3-phenylfuran-2(5H)-one', u'MK-0966', u'Vioxx (trademark)', u'MK 0966', u'CCRIS 8967', u'CHEBI:8887', u'Vioxx (TN)', u'HSDB 7262', u'Spectrum_000119', u'SpecPlus_000669', u'Spectrum2_000446', u'Spectrum3_001153', u'Spectrum4_000631', u'Spectrum5_001598', u'UNII-0QTW8Z7MCR', u'MK 996', u'MK0966', u'CHEMBL122', u'AC1L1JL6', u'KS-1107', u'3-(4-methylsulfonylphenyl)-4-phenyl-2H-furan-5-one', u'4-(4-methylsulfonylphenyl)-3-phenyl-5H-furan-2-one', u'NCGC00095118-01', u'BSPBio_002705', u'KBioGR_001242', u'KBioGR_002345', u'KBioSS_000559', 
u'KBioSS_002348', u'Rofecoxib (JAN/USAN/INN)', u'BIDD:GT0399', u'Bio-0094', u'DivK1c_006765', u'4-[4-(methylsulfonyl)phenyl]-3-phenyl-2(5H)-furanone', u'SPBio_000492', u'SPECTRUM1504235', u'MLS000759440', u'MLS001165770', u'MLS001195623', u'MLS001424113', u'Jsp003237', u'C17H14O4S', u'KBio1_001709', u'KBio2_000559', u'KBio2_002345', u'KBio2_003127', u'KBio2_004913', u'KBio2_005695', u'KBio2_007481', u'KBio3_002205', u'KBio3_002825', u'cMAP_000024', u'MolPort-000-883-878', u'MolPort-006-817-786', u'HMS1922H11', u'HMS2051G16', u'HMS2089H20', u'HMS2093E04', u'NSC720256', u'STK635144', u'ZINC00007455', u'AKOS000280931', u'4-(4-(Methylsulfonyl)phenyl)-3-phenyl-2(5H)-furanone', u'4-(p-(Methylsulfonyl)phenyl)-3-phenyl-2(5H)-furanone', u'DB00533', u'MK 0996', u'2(5H)-Furanone, 4-(4-(methylsulfonyl)phenyl)-3-phenyl-', u'3-Phenyl-4-(4-(methylsulfonyl)phenyl))-2(5H)-furanone', u'NCGC00095118-02', u'NCGC00095118-03', u'NCGC00095118-04', u'AC-13144', u'CPD000466331', u'LS-70511', u'NCI60_041175', u'SAM001246617', u'SMR000466331', u'FT-0081390', u'C07590', u'D00568', u'L000912', u'186912-82-3', u'BRD-K21733600-001-02-6', u'I01-1042', u'3-phenyl-4-[4-(methylsulfonyl)phenyl]-2(5H)-furanone', u'4-(4-(Methylsulfonyl)phenyl)-3-phenylfuran-2(5H)-one', u'refecoxib', u'2(5H)-Furanone, 4-[4-(methyl-sulfonyl)phenyl]-3-phenyl-', u'Vioxx Dolor', u'MSD brand of rofecoxib', u'Merck brand of rofecoxib', u'2(5H)-Furanone, 4-[4-(methylsulfonyl)phenyl]-3-phenyl-', u'Merck Frosst brand of rofecoxib', u'CID5090', u'Rofecoxib (Vioxx)', u'Rofecoxib [USAN]', u'Cahill May Roberts brand of rofecoxib', u'Merck Sharp & Dhome brand of rofecoxib', u'PubChem15028', u'SureCN3050', u'DSSTox_CID_3567', u'AGN-PC-00E0TK', u'DSSTox_RID_77084', u'C116926', u'DSSTox_GSID_23567', u'Rofecoxib [USAN:INN:BAN]', u'GTPL2893', u'HMS2232G21', u'HMS3371P11', u'HMS3393G16', u'MK966', u'Pharmakon1600-01504235', u'Tox21_111430', u'ANW-71936', u'CCG-40253', u'DAP001338', u'NSC758705', u'AB07701', u'BD41342', u'CS-0997', 
u'MCULE-4806636118', u'NC00132', u'NSC-720256', u'NSC-758705', u'NCGC00095118-05', u'AK-60971', u'HY-17372', u'CAS-162011-90-7', u'FT-0631192', u'K-5064', u'AB00052090-06', u'AB00052090-08', u'A810324', u'3B2-0954', u'Rofecoxib|162011-90-7|Vioxx|MK966|MK-966', u'4-(4-METHANESULFONYL-PHENYL)-3-PHENYL-5H-FURAN-2-ONE', u'4-(4-methanesulfonylphenyl)-3-phenyl-2,5-dihydrofuran-2-one']\n" ] } ], "source": [ "print(c.synonyms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Searching\n", "\n", "What if you don’t know the PubChem CID of the Compound you want? Just use the `get_compounds()` function:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[Compound(5793), Compound(79025), Compound(64689), Compound(206)]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results = pcp.get_compounds('Glucose', 'name')\n", "results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first argument is the identifier, and the second argument is the identifier type, which must be one of name, smiles, sdf, inchi, inchikey or formula. It looks like there are 4 compounds in the PubChem Database that have the name Glucose associated with them. 
Let’s take a look at them in more detail:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O\n", "C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O\n", "C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O\n", "C(C1C(C(C(C(O1)O)O)O)O)O\n" ] } ], "source": [ "for compound in results:\n", " print compound.isomeric_smiles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like they all have different stereochemistry information.\n", "\n", "Retrieving the record for a SMILES string is just as easy:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[Compound(1318)]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pcp.get_compounds('C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1', 'smiles')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's worth being aware that line notation inputs like SMILES and InChI can return automatically generated records that aren’t actually present in PubChem, and therefore have no CID and are missing many properties that are too complicated to calculate on the fly." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "Previous: [Introduction](1-introduction.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 } PubChemPy-1.0.4/examples/CAS registry numbers.ipynb000066400000000000000000000156001307322065300221270ustar00rootroot00000000000000{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Retrieving CAS registry numbers" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import re\n", "import pubchempy as pcp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Enable debug logging to make it easier to see what is going on:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import logging\n", "\n", "logging.getLogger('pubchempy').setLevel(logging.DEBUG)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function to get the CAS registry numbers for compounds with a particular SMILES substructure:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def get_substructure_cas(smiles):\n", " cas_rns = []\n", " results = pcp.get_synonyms(smiles, 'smiles', searchtype='substructure')\n", " for result in results:\n", " for syn in result.get('Synonym', []):\n", " match = re.match('(\\d{2,7}-\\d\\d-\\d)', syn)\n", " if match:\n", " cas_rns.append(match.group(1))\n", " return cas_rns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Test some inputs:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": 
"stderr", "output_type": "stream", "text": [ "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/substructure/smiles/JSON\n", "DEBUG:pubchempy:Request data: smiles=%5BPb%5D\n", "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/3178699647975629202/synonyms/JSON\n", "DEBUG:pubchempy:Request data: None\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "2174\n", "[u'7439-92-1', u'15875-18-0', u'54076-28-7', u'14701-27-0', u'15158-12-0', u'52229-97-7', u'724427-66-1', u'598-63-0', u'13427-42-4', u'17398-75-3']\n" ] } ], "source": [ "cas_rns = get_substructure_cas('[Pb]')\n", "print(len(cas_rns))\n", "print(cas_rns[:10])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/substructure/smiles/JSON\n", "DEBUG:pubchempy:Request data: smiles=%5BSe%5D\n", "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/852509384123203131/synonyms/JSON\n", "DEBUG:pubchempy:Request data: None\n", "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/852509384123203131/synonyms/JSON\n", "DEBUG:pubchempy:Request data: None\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "14577\n", "[u'10102-18-8', u'26970-82-1', u'15498-87-0', u'7782-82-3', u'14013-56-0', u'14013-56-0', u'29528-97-0', u'50647-14-8', u'7782-49-2', u'11125-23-8']\n" ] } ], "source": [ "cas_rns = get_substructure_cas('[Se]')\n", "print(len(cas_rns))\n", "print(cas_rns[:10])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/substructure/smiles/JSON\n", "DEBUG:pubchempy:Request data: smiles=%5BTi%5D\n", 
"DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/1812119792714669902/synonyms/JSON\n", "DEBUG:pubchempy:Request data: None\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "1630\n", "[u'13463-67-7', u'1317-80-2', u'1317-70-0', u'98084-96-9', u'100292-32-8', u'101239-53-6', u'116788-85-3', u'12000-59-8', u'12036-20-3', u'12701-76-7']\n" ] } ], "source": [ "cas_rns = get_substructure_cas('[Ti]')\n", "print(len(cas_rns))\n", "print(cas_rns[:10])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/substructure/smiles/JSON\n", "DEBUG:pubchempy:Request data: smiles=%5BPd%5D\n", "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/2802290965277166497/synonyms/JSON\n", "DEBUG:pubchempy:Request data: None\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "1401\n", "[u'7440-05-3', u'17637-99-9', u'53092-86-7', u'7647-10-1', u'10038-97-8', u'10102-05-3', u'14846-30-1', u'884739-77-9', u'3375-31-3', u'19807-27-3']\n" ] } ], "source": [ "cas_rns = get_substructure_cas('[Pd]')\n", "print(len(cas_rns))\n", "print(cas_rns[:10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could potentially get a TimeoutError if there are too many results. 
In this case, it might be better to perform the substructure search and then get the synonyms separately:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/substructure/smiles/JSON\n", "DEBUG:pubchempy:Request data: smiles=%5BPd%5D\n", "DEBUG:pubchempy:Request URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/3838589667186536348/cids/JSON\n", "DEBUG:pubchempy:Request data: None\n" ] } ], "source": [ "cids = pcp.get_cids('[Pd]', 'smiles', searchtype='substructure')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Then you can do `pcp.get_synonyms(cids)` with the list of CIDs." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 } PubChemPy-1.0.4/examples/Chemical fingerprints and similarity.ipynb000066400000000000000000000240121307322065300253230ustar00rootroot00000000000000{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Chemical similarity using PubChem fingerprints" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pubchempy as pcp\n", "from IPython.display import Image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we'll get some compounds. Here we just use PubChem CIDs to retrieve, but you could search (e.g. using name, SMILES, SDF, etc.)." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coumarin = pcp.Compound.from_cid(323)\n", "Image(url='https://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=323&t=l')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coumarin_314 = pcp.Compound.from_cid(72653)\n", "Image(url='https://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=72653&t=l')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coumarin_343 = pcp.Compound.from_cid(108770)\n", "Image(url='https://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=108770&t=l')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aspirin = pcp.Compound.from_cid(2244)\n", "Image(url='https://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=2244&t=l')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The similarity between two molecules is typically calculated using molecular fingerprints that encode structural information about the molecule as a series of bits (0 or 1). 
These bits represent the presence or absence of particular patterns or substructures — two molecules that contain more of the same patterns will have more bits in common, indicating that they are more similar.\n", "\n", "The PubChem CACTVS fingerprint is available on each compound using the `fingerprint` method. This is returned as a hex-encoded string:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "u'0000037180703000000000000000000000000000000000000000304000000000000000810000001A00000000000C04809800300E80000400880220D208000208002020000888000608C80C262284311A823A20A4C01108A98780C0200E00000000000800000000000000100000000000000000'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coumarin.fingerprint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can decode this from hexadecimal and then display as a binary string as follows:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ 
"'0b1101110001100000000111000000110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011000001000000000000000000000000000000000000000000000000000000000000001000000100000000000000000000000000011010000000000000000000000000000000000000000000001100000001001000000010011000000000000011000000001110100000000000000000000100000000001000100000000010001000001101001000001000000000000000001000001000000000000010000000100000000000000000100010001000000000000000011000001000110010000000110000100110001000101000010000110001000110101000001000111010001000001010010011000000000100010000100010101001100001111000000011000000001000000000111000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bin(int(coumarin.fingerprint, 16))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is more information about the PubChem fingerprints at \n", "\n", "The most commonly used measure for quantifying the similarity of two fingerprints is the Tanimoto Coefficient, given by:\n", "\n", "$$ T = \\frac{N_{ab}}{N_{a} + N_{b} - N_{ab}} $$\n", "\n", "where $N_{a}$ and $N_{b}$ are the number of 1-bits (i.e corresponding to the presence of a pattern) in the fingerprints of molecule $a$ and molecule $b$ respectively. $N_{ab}$ is the number of 1-bits common to the fingerprints of both molecule $a$ and $b$. 
The Tanimoto coefficient ranges from 0 when the fingerprints have no bits in common, to 1 when the fingerprints are identical.\n", "\n", "Here's a simple way to calculate the Tanimoto coefficient between two compounds in Python:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def tanimoto(compound1, compound2):\n", "    fp1 = int(compound1.fingerprint, 16)\n", "    fp2 = int(compound2.fingerprint, 16)\n", "    fp1_count = bin(fp1).count('1')\n", "    fp2_count = bin(fp2).count('1')\n", "    both_count = bin(fp1 & fp2).count('1')\n", "    return float(both_count) / (fp1_count + fp2_count - both_count)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try it out:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tanimoto(coumarin, coumarin)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.6011904761904762" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tanimoto(coumarin, coumarin_314)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.6011904761904762" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tanimoto(coumarin, coumarin_343)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.9529411764705882" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tanimoto(coumarin_314, coumarin_343)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.8211382113821138" ] }, 
"execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tanimoto(coumarin, aspirin)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.6123595505617978" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tanimoto(coumarin_343, aspirin)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a nice simple method, but not particularly efficient. If you are looking for better performance, check out Andrew Dalke's work:\n", "\n", "- [Computing Tanimoto scores, quickly](http://www.dalkescientific.com/writings/diary/archive/2008/06/27/computing_tanimoto_scores.html)\n", "- [chemfp](http://chemfp.com)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 } PubChemPy-1.0.4/pubchempy.py000066400000000000000000001365251307322065300157330ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ PubChemPy Python interface for the PubChem PUG REST service. 
https://github.com/mcs07/PubChemPy """ from __future__ import print_function from __future__ import unicode_literals from __future__ import division import functools import json import logging import os import sys import time import warnings import binascii try: from urllib.error import HTTPError from urllib.parse import quote, urlencode from urllib.request import urlopen except ImportError: from urllib import urlencode from urllib2 import quote, urlopen, HTTPError try: from itertools import zip_longest except ImportError: from itertools import izip_longest as zip_longest __author__ = 'Matt Swain' __email__ = 'm.swain@me.com' __version__ = '1.0.4' __license__ = 'MIT' API_BASE = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug' log = logging.getLogger('pubchempy') log.addHandler(logging.NullHandler()) if sys.version_info[0] == 3: text_types = str, bytes else: text_types = basestring, class CompoundIdType(object): """""" #: Original Deposited Compound DEPOSITED = 0 #: Standardized Form of the Deposited Compound STANDARDIZED = 1 #: Component of the Standardized Form COMPONENT = 2 #: Neutralized Form of the Standardized Form NEUTRALIZED = 3 #: Deposited Mixture Component MIXTURE = 4 #: Alternate Tautomer Form of the Standardized Form TAUTOMER = 5 #: Ionized pKa Form of the Standardized Form IONIZED = 6 #: Unspecified or Unknown Compound Type UNKNOWN = 255 class BondType(object): SINGLE = 1 DOUBLE = 2 TRIPLE = 3 QUADRUPLE = 4 DATIVE = 5 COMPLEX = 6 IONIC = 7 UNKNOWN = 255 class CoordinateType(object): TWO_D = 1 THREE_D = 2 SUBMITTED = 3 EXPERIMENTAL = 4 COMPUTED = 5 STANDARDIZED = 6 AUGMENTED = 7 ALIGNED = 8 COMPACT = 9 UNITS_ANGSTROMS = 10 UNITS_NANOMETERS = 11 UNITS_PIXEL = 12 UNITS_POINTS = 13 UNITS_STDBONDS = 14 UNITS_UNKNOWN = 255 class ProjectCategory(object): MLSCN = 1 MPLCN = 2 MLSCN_AP = 3 MPLCN_AP = 4 JOURNAL_ARTICLE = 5 ASSAY_VENDOR = 6 LITERATURE_EXTRACTED = 7 LITERATURE_AUTHOR = 8 LITERATURE_PUBLISHER = 9 RNAIGI = 10 OTHER = 255 ELEMENTS = { 1: 'H', 2: 'He', 3: 
'Li', 4: 'Be', 5: 'B', 6: 'C', 7: 'N', 8: 'O', 9: 'F', 10: 'Ne', 11: 'Na', 12: 'Mg', 13: 'Al', 14: 'Si', 15: 'P', 16: 'S', 17: 'Cl', 18: 'Ar', 19: 'K', 20: 'Ca', 21: 'Sc', 22: 'Ti', 23: 'V', 24: 'Cr', 25: 'Mn', 26: 'Fe', 27: 'Co', 28: 'Ni', 29: 'Cu', 30: 'Zn', 31: 'Ga', 32: 'Ge', 33: 'As', 34: 'Se', 35: 'Br', 36: 'Kr', 37: 'Rb', 38: 'Sr', 39: 'Y', 40: 'Zr', 41: 'Nb', 42: 'Mo', 43: 'Tc', 44: 'Ru', 45: 'Rh', 46: 'Pd', 47: 'Ag', 48: 'Cd', 49: 'In', 50: 'Sn', 51: 'Sb', 52: 'Te', 53: 'I', 54: 'Xe', 55: 'Cs', 56: 'Ba', 57: 'La', 58: 'Ce', 59: 'Pr', 60: 'Nd', 61: 'Pm', 62: 'Sm', 63: 'Eu', 64: 'Gd', 65: 'Tb', 66: 'Dy', 67: 'Ho', 68: 'Er', 69: 'Tm', 70: 'Yb', 71: 'Lu', 72: 'Hf', 73: 'Ta', 74: 'W', 75: 'Re', 76: 'Os', 77: 'Ir', 78: 'Pt', 79: 'Au', 80: 'Hg', 81: 'Tl', 82: 'Pb', 83: 'Bi', 84: 'Po', 85: 'At', 86: 'Rn', 87: 'Fr', 88: 'Ra', 89: 'Ac', 90: 'Th', 91: 'Pa', 92: 'U', 93: 'Np', 94: 'Pu', 95: 'Am', 96: 'Cm', 97: 'Bk', 98: 'Cf', 99: 'Es', 100: 'Fm', 101: 'Md', 102: 'No', 103: 'Lr', 104: 'Rf', 105: 'Db', 106: 'Sg', 107: 'Bh', 108: 'Hs', 109: 'Mt', 110: 'Ds', 111: 'Rg', 112: 'Cn', 113: 'Nh', 114: 'Fl', 115: 'Mc', 116: 'Lv', 117: 'Ts', 118: 'Og', } def request(identifier, namespace='cid', domain='compound', operation=None, output='JSON', searchtype=None, **kwargs): """ Construct API request from parameters and return the response. 
Full specification at http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html """ if not identifier: raise ValueError('identifier/cid cannot be None') # If identifier is a list, join with commas into string if isinstance(identifier, int): identifier = str(identifier) if not isinstance(identifier, text_types): identifier = ','.join(str(x) for x in identifier) # Filter None values from kwargs kwargs = dict((k, v) for k, v in kwargs.items() if v is not None) # Build API URL urlid, postdata = None, None if namespace == 'sourceid': identifier = identifier.replace('/', '.') if namespace in ['listkey', 'formula', 'sourceid'] \ or searchtype == 'xref' \ or (searchtype and namespace == 'cid') or domain == 'sources': urlid = quote(identifier.encode('utf8')) else: postdata = urlencode([(namespace, identifier)]).encode('utf8') comps = filter(None, [API_BASE, domain, searchtype, namespace, urlid, operation, output]) apiurl = '/'.join(comps) if kwargs: apiurl += '?%s' % urlencode(kwargs) # Make request try: log.debug('Request URL: %s', apiurl) log.debug('Request data: %s', postdata) response = urlopen(apiurl, postdata) return response except HTTPError as e: raise PubChemHTTPError(e) def get(identifier, namespace='cid', domain='compound', operation=None, output='JSON', searchtype=None, **kwargs): """Request wrapper that automatically handles async requests.""" if (searchtype and searchtype != 'xref') or namespace in ['formula']: response = request(identifier, namespace, domain, None, 'JSON', searchtype, **kwargs).read() status = json.loads(response.decode()) if 'Waiting' in status and 'ListKey' in status['Waiting']: identifier = status['Waiting']['ListKey'] namespace = 'listkey' while 'Waiting' in status and 'ListKey' in status['Waiting']: time.sleep(2) response = request(identifier, namespace, domain, operation, 'JSON', **kwargs).read() status = json.loads(response.decode()) if not output == 'JSON': response = request(identifier, namespace, domain, operation, output, searchtype, 
**kwargs).read() else: response = request(identifier, namespace, domain, operation, output, searchtype, **kwargs).read() return response def get_json(identifier, namespace='cid', domain='compound', operation=None, searchtype=None, **kwargs): """Request wrapper that automatically parses JSON response and suppresses NotFoundError.""" try: return json.loads(get(identifier, namespace, domain, operation, 'JSON', searchtype, **kwargs).decode()) except NotFoundError as e: log.info(e) return None def get_sdf(identifier, namespace='cid', domain='compound', operation=None, searchtype=None, **kwargs): """Request wrapper that automatically decodes the SDF response to text and suppresses NotFoundError.""" try: return get(identifier, namespace, domain, operation, 'SDF', searchtype, **kwargs).decode() except NotFoundError as e: log.info(e) return None def get_compounds(identifier, namespace='cid', searchtype=None, as_dataframe=False, **kwargs): """Retrieve the specified compound records from PubChem. :param identifier: The compound identifier to use as a search query. :param namespace: (optional) The identifier type, one of cid, name, smiles, sdf, inchi, inchikey or formula. :param searchtype: (optional) The advanced search type, one of substructure, superstructure or similarity. :param as_dataframe: (optional) Automatically extract the :class:`~pubchempy.Compound` properties into a pandas :class:`~pandas.DataFrame` and return that. """ results = get_json(identifier, namespace, searchtype=searchtype, **kwargs) compounds = [Compound(r) for r in results['PC_Compounds']] if results else [] if as_dataframe: return compounds_to_frame(compounds) return compounds def get_substances(identifier, namespace='sid', as_dataframe=False, **kwargs): """Retrieve the specified substance records from PubChem. :param identifier: The substance identifier to use as a search query. :param namespace: (optional) The identifier type, one of sid, name or sourceid/. 
:param as_dataframe: (optional) Automatically extract the :class:`~pubchempy.Substance` properties into a pandas :class:`~pandas.DataFrame` and return that. """ results = get_json(identifier, namespace, 'substance', **kwargs) substances = [Substance(r) for r in results['PC_Substances']] if results else [] if as_dataframe: return substances_to_frame(substances) return substances def get_assays(identifier, namespace='aid', **kwargs): """Retrieve the specified assay records from PubChem. :param identifier: The assay identifier to use as a search query. :param namespace: (optional) The identifier type. """ results = get_json(identifier, namespace, 'assay', 'description', **kwargs) return [Assay(r) for r in results['PC_AssayContainer']] if results else [] # Allows properties to optionally be specified as underscore_separated, consistent with Compound attributes PROPERTY_MAP = { 'molecular_formula': 'MolecularFormula', 'molecular_weight': 'MolecularWeight', 'canonical_smiles': 'CanonicalSMILES', 'isomeric_smiles': 'IsomericSMILES', 'inchi': 'InChI', 'inchikey': 'InChIKey', 'iupac_name': 'IUPACName', 'xlogp': 'XLogP', 'exact_mass': 'ExactMass', 'monoisotopic_mass': 'MonoisotopicMass', 'tpsa': 'TPSA', 'complexity': 'Complexity', 'charge': 'Charge', 'h_bond_donor_count': 'HBondDonorCount', 'h_bond_acceptor_count': 'HBondAcceptorCount', 'rotatable_bond_count': 'RotatableBondCount', 'heavy_atom_count': 'HeavyAtomCount', 'isotope_atom_count': 'IsotopeAtomCount', 'atom_stereo_count': 'AtomStereoCount', 'defined_atom_stereo_count': 'DefinedAtomStereoCount', 'undefined_atom_stereo_count': 'UndefinedAtomStereoCount', 'bond_stereo_count': 'BondStereoCount', 'defined_bond_stereo_count': 'DefinedBondStereoCount', 'undefined_bond_stereo_count': 'UndefinedBondStereoCount', 'covalent_unit_count': 'CovalentUnitCount', 'volume_3d': 'Volume3D', 'conformer_rmsd_3d': 'ConformerModelRMSD3D', 'conformer_model_rmsd_3d': 'ConformerModelRMSD3D', 'x_steric_quadrupole_3d': 'XStericQuadrupole3D', 
'y_steric_quadrupole_3d': 'YStericQuadrupole3D', 'z_steric_quadrupole_3d': 'ZStericQuadrupole3D', 'feature_count_3d': 'FeatureCount3D', 'feature_acceptor_count_3d': 'FeatureAcceptorCount3D', 'feature_donor_count_3d': 'FeatureDonorCount3D', 'feature_anion_count_3d': 'FeatureAnionCount3D', 'feature_cation_count_3d': 'FeatureCationCount3D', 'feature_ring_count_3d': 'FeatureRingCount3D', 'feature_hydrophobe_count_3d': 'FeatureHydrophobeCount3D', 'effective_rotor_count_3d': 'EffectiveRotorCount3D', 'conformer_count_3d': 'ConformerCount3D', } def get_properties(properties, identifier, namespace='cid', searchtype=None, as_dataframe=False, **kwargs): """Retrieve the specified properties from PubChem. :param identifier: The compound, substance or assay identifier to use as a search query. :param namespace: (optional) The identifier type. :param searchtype: (optional) The advanced search type, one of substructure, superstructure or similarity. :param as_dataframe: (optional) Automatically extract the properties into a pandas :class:`~pandas.DataFrame`. 
""" if isinstance(properties, text_types): properties = properties.split(',') properties = ','.join([PROPERTY_MAP.get(p, p) for p in properties]) properties = 'property/%s' % properties results = get_json(identifier, namespace, 'compound', properties, searchtype=searchtype, **kwargs) results = results['PropertyTable']['Properties'] if results else [] if as_dataframe: import pandas as pd return pd.DataFrame.from_records(results, index='CID') return results def get_synonyms(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs): results = get_json(identifier, namespace, domain, 'synonyms', searchtype=searchtype, **kwargs) return results['InformationList']['Information'] if results else [] def get_cids(identifier, namespace='name', domain='compound', searchtype=None, **kwargs): results = get_json(identifier, namespace, domain, 'cids', searchtype=searchtype, **kwargs) if not results: return [] elif 'IdentifierList' in results: return results['IdentifierList']['CID'] elif 'InformationList' in results: return results['InformationList']['Information'] def get_sids(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs): results = get_json(identifier, namespace, domain, 'sids', searchtype=searchtype, **kwargs) if not results: return [] elif 'IdentifierList' in results: return results['IdentifierList']['SID'] elif 'InformationList' in results: return results['InformationList']['Information'] def get_aids(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs): results = get_json(identifier, namespace, domain, 'aids', searchtype=searchtype, **kwargs) if not results: return [] elif 'IdentifierList' in results: return results['IdentifierList']['AID'] elif 'InformationList' in results: return results['InformationList']['Information'] def get_all_sources(domain='substance'): """Return a list of all current depositors of substances or assays.""" results = json.loads(get(domain, None, 'sources').decode()) return 
results['InformationList']['SourceName'] def download(outformat, path, identifier, namespace='cid', domain='compound', operation=None, searchtype=None, overwrite=False, **kwargs): """Format can be XML, ASNT/B, JSON, SDF, CSV, PNG, TXT.""" response = get(identifier, namespace, domain, operation, outformat, searchtype, **kwargs) if not overwrite and os.path.isfile(path): raise IOError("%s already exists. Use 'overwrite=True' to overwrite it." % path) with open(path, 'wb') as f: f.write(response) def memoized_property(fget): """Decorator to create memoized properties. Used to cache :class:`~pubchempy.Compound` and :class:`~pubchempy.Substance` properties that require an additional request. """ attr_name = '_{0}'.format(fget.__name__) @functools.wraps(fget) def fget_memoized(self): if not hasattr(self, attr_name): setattr(self, attr_name, fget(self)) return getattr(self, attr_name) return property(fget_memoized) def deprecated(message=None): """Decorator to mark functions as deprecated. A warning will be emitted when the function is used.""" def deco(func): @functools.wraps(func) def wrapped(*args, **kwargs): warnings.warn( message or 'Call to deprecated function {}'.format(func.__name__), category=PubChemPyDeprecationWarning, stacklevel=2 ) return func(*args, **kwargs) return wrapped return deco class Atom(object): """Class to represent an atom in a :class:`~pubchempy.Compound`.""" def __init__(self, aid, number, x=None, y=None, z=None, charge=0): """Initialize with an atom ID, atomic number, coordinates and optional charge. :param int aid: Atom ID. :param int number: Atomic number. :param float x: X coordinate. :param float y: Y coordinate. :param float z: (optional) Z coordinate. :param int charge: (optional) Formal charge on atom. 
""" self.aid = aid """The atom ID within the owning Compound.""" self.number = number """The atomic number for this atom.""" self.x = x """The x coordinate for this atom.""" self.y = y """The y coordinate for this atom.""" self.z = z """The z coordinate for this atom. Will be ``None`` in 2D Compound records.""" self.charge = charge """The formal charge on this atom.""" def __repr__(self): return 'Atom(%s, %s)' % (self.aid, self.element) def __eq__(self, other): return (isinstance(other, type(self)) and self.aid == other.aid and self.element == other.element and self.x == other.x and self.y == other.y and self.z == other.z and self.charge == other.charge) @deprecated('Dictionary style access to Atom attributes is deprecated') def __getitem__(self, prop): """Allow dict-style access to attributes to ease transition from when atoms were dicts.""" if prop in {'element', 'x', 'y', 'z', 'charge'}: return getattr(self, prop) raise KeyError(prop) @deprecated('Dictionary style access to Atom attributes is deprecated') def __setitem__(self, prop, val): """Allow dict-style setting of attributes to ease transition from when atoms were dicts.""" setattr(self, prop, val) @deprecated('Dictionary style access to Atom attributes is deprecated') def __contains__(self, prop): """Allow dict-style checking of attributes to ease transition from when atoms were dicts.""" if prop in {'element', 'x', 'y', 'z', 'charge'}: return getattr(self, prop) is not None return False @property def element(self): """The element symbol for this atom.""" return ELEMENTS.get(self.number, None) def to_dict(self): """Return a dictionary containing Atom data.""" data = {'aid': self.aid, 'number': self.number, 'element': self.element} for coord in {'x', 'y', 'z'}: if getattr(self, coord) is not None: data[coord] = getattr(self, coord) if self.charge is not 0: data['charge'] = self.charge return data def set_coordinates(self, x, y, z=None): """Set all coordinate dimensions at once.""" self.x = x self.y = y 
self.z = z @property def coordinate_type(self): """Whether this atom has 2D or 3D coordinates.""" return '2d' if self.z is None else '3d' class Bond(object): """Class to represent a bond between two atoms in a :class:`~pubchempy.Compound`.""" def __init__(self, aid1, aid2, order=BondType.SINGLE, style=None): """Initialize with begin and end atom IDs, bond order and bond style. :param int aid1: Begin atom ID. :param int aid2: End atom ID. :param int order: Bond order. :param style: (optional) Bond style annotation. """ self.aid1 = aid1 """ID of the begin atom of this bond.""" self.aid2 = aid2 """ID of the end atom of this bond.""" self.order = order """Bond order.""" self.style = style """Bond style annotation.""" def __repr__(self): return 'Bond(%s, %s, %s)' % (self.aid1, self.aid2, self.order) def __eq__(self, other): return (isinstance(other, type(self)) and self.aid1 == other.aid1 and self.aid2 == other.aid2 and self.order == other.order and self.style == other.style) @deprecated('Dictionary style access to Bond attributes is deprecated') def __getitem__(self, prop): """Allow dict-style access to attributes to ease transition from when bonds were dicts.""" if prop in {'order', 'style'}: return getattr(self, prop) raise KeyError(prop) @deprecated('Dictionary style access to Bond attributes is deprecated') def __setitem__(self, prop, val): """Allow dict-style setting of attributes to ease transition from when bonds were dicts.""" setattr(self, prop, val) @deprecated('Dictionary style access to Bond attributes is deprecated') def __contains__(self, prop): """Allow dict-style checking of attributes to ease transition from when bonds were dicts.""" if prop in {'order', 'style'}: return getattr(self, prop) is not None return False @deprecated('Dictionary style access to Bond attributes is deprecated') def __delitem__(self, prop): """Delete the attribute prop from this Bond.""" if not hasattr(self, prop): raise KeyError(prop) delattr(self, prop) def to_dict(self): """Return a dictionary 
containing Bond data.""" data = {'aid1': self.aid1, 'aid2': self.aid2, 'order': self.order} if self.style is not None: data['style'] = self.style return data class Compound(object): """Corresponds to a single record from the PubChem Compound database. The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Each Compound is uniquely identified by a CID. """ def __init__(self, record): """Initialize with a record dict from the PubChem PUG REST service. For most users, the ``from_cid()`` class method is probably a better way of creating Compounds. :param dict record: A compound record returned by the PubChem PUG REST service. """ self._record = None self._atoms = {} self._bonds = {} self.record = record @property def record(self): """The raw compound record returned by the PubChem PUG REST service.""" return self._record @record.setter def record(self, record): self._record = record log.debug('Created %s' % self) self._setup_atoms() self._setup_bonds() def _setup_atoms(self): """Derive Atom objects from the record.""" # Delete existing atoms self._atoms = {} # Create atoms aids = self.record['atoms']['aid'] elements = self.record['atoms']['element'] if not len(aids) == len(elements): raise ResponseParseError('Error parsing atom elements') for aid, element in zip(aids, elements): self._atoms[aid] = Atom(aid=aid, number=element) # Add coordinates if 'coords' in self.record: coord_ids = self.record['coords'][0]['aid'] xs = self.record['coords'][0]['conformers'][0]['x'] ys = self.record['coords'][0]['conformers'][0]['y'] zs = self.record['coords'][0]['conformers'][0].get('z', []) if not len(coord_ids) == len(xs) == len(ys) == len(self._atoms) or (zs and not len(zs) == len(coord_ids)): raise ResponseParseError('Error parsing atom coordinates') for aid, x, y, z in zip_longest(coord_ids, xs, ys, zs): self._atoms[aid].set_coordinates(x, y, z) # Add charges if 'charge' in self.record['atoms']: for charge in 
self.record['atoms']['charge']: self._atoms[charge['aid']].charge = charge['value'] def _setup_bonds(self): """Derive Bond objects from the record.""" self._bonds = {} if 'bonds' not in self.record: return # Create bonds aid1s = self.record['bonds']['aid1'] aid2s = self.record['bonds']['aid2'] orders = self.record['bonds']['order'] if not len(aid1s) == len(aid2s) == len(orders): raise ResponseParseError('Error parsing bonds') for aid1, aid2, order in zip(aid1s, aid2s, orders): self._bonds[frozenset((aid1, aid2))] = Bond(aid1=aid1, aid2=aid2, order=order) # Add styles if 'coords' in self.record and 'style' in self.record['coords'][0]['conformers'][0]: aid1s = self.record['coords'][0]['conformers'][0]['style']['aid1'] aid2s = self.record['coords'][0]['conformers'][0]['style']['aid2'] styles = self.record['coords'][0]['conformers'][0]['style']['annotation'] for aid1, aid2, style in zip(aid1s, aid2s, styles): self._bonds[frozenset((aid1, aid2))].style = style @classmethod def from_cid(cls, cid, **kwargs): """Retrieve the Compound record for the specified CID. Usage:: c = Compound.from_cid(6819) :param int cid: The PubChem Compound Identifier (CID). """ record = json.loads(request(cid, **kwargs).read().decode())['PC_Compounds'][0] return cls(record) def __repr__(self): return 'Compound(%s)' % self.cid if self.cid else 'Compound()' def __eq__(self, other): return isinstance(other, type(self)) and self.record == other.record def to_dict(self, properties=None): """Return a dictionary containing Compound data. Optionally specify a list of the desired properties. synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request. 
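The default selection can be pictured with a small stand-in class. This is only a sketch under stated assumptions: ``Record`` and its fields are hypothetical names that are not part of PubChemPy. It mirrors the idea of collecting every ``property`` defined on the class except those in a skip set.

```python
class Record(object):
    # Hypothetical stand-in for Compound, used only for illustration.

    def __init__(self, data):
        self._data = data

    @property
    def cid(self):
        return self._data['cid']

    @property
    def charge(self):
        return self._data.get('charge', 0)

    @property
    def synonyms(self):
        # Imagine that this property would trigger an extra request.
        return self._data.get('synonyms', [])

    def to_dict(self, properties=None):
        if not properties:
            # Default: every property on the class except the skipped ones.
            skip = {'synonyms'}
            cls = type(self)
            properties = [p for p in dir(cls)
                          if isinstance(getattr(cls, p), property) and p not in skip]
        return {p: getattr(self, p) for p in properties}


r = Record({'cid': 241, 'synonyms': ['benzene']})
default = r.to_dict()                       # skips 'synonyms'
explicit = r.to_dict(['cid', 'synonyms'])   # includes it on request
```

Naming a skipped property explicitly is the only way it ends up in the result, which keeps the default call free of extra requests.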
""" if not properties: skip = {'aids', 'sids', 'synonyms'} properties = [p for p in dir(Compound) if isinstance(getattr(Compound, p), property) and p not in skip] return {p: [i.to_dict() for i in getattr(self, p)] if p in {'atoms', 'bonds'} else getattr(self, p) for p in properties} def to_series(self, properties=None): """Return a pandas :class:`~pandas.Series` containing Compound data. Optionally specify a list of the desired properties. synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request. """ import pandas as pd return pd.Series(self.to_dict(properties)) @property def cid(self): """The PubChem Compound Identifier (CID). .. note:: When searching using a SMILES or InChI query that is not present in the PubChem Compound database, an automatically generated record may be returned that contains properties that have been calculated on the fly. These records will not have a CID property. """ if 'id' in self.record and 'id' in self.record['id'] and 'cid' in self.record['id']['id']: return self.record['id']['id']['cid'] @property def elements(self): """List of element symbols for atoms in this Compound.""" return [a.element for a in self.atoms] @property def atoms(self): """List of :class:`Atoms ` in this Compound.""" return sorted(self._atoms.values(), key=lambda x: x.aid) @property def bonds(self): """List of :class:`Bonds ` between :class:`Atoms ` in this Compound.""" return sorted(self._bonds.values(), key=lambda x: (x.aid1, x.aid2)) @memoized_property def synonyms(self): """A ranked list of all the names associated with this Compound. Requires an extra request. Result is cached. """ if self.cid: results = get_json(self.cid, operation='synonyms') return results['InformationList']['Information'][0]['Synonym'] if results else [] @memoized_property def sids(self): """Requires an extra request. 
Result is cached.""" if self.cid: results = get_json(self.cid, operation='sids') return results['InformationList']['Information'][0]['SID'] if results else [] @memoized_property def aids(self): """Requires an extra request. Result is cached.""" if self.cid: results = get_json(self.cid, operation='aids') return results['InformationList']['Information'][0]['AID'] if results else [] @property def coordinate_type(self): if CoordinateType.TWO_D in self.record['coords'][0]['type']: return '2d' elif CoordinateType.THREE_D in self.record['coords'][0]['type']: return '3d' @property def charge(self): """Formal charge on this Compound.""" return self.record['charge'] if 'charge' in self.record else 0 @property def molecular_formula(self): """Molecular formula.""" return _parse_prop({'label': 'Molecular Formula'}, self.record['props']) @property def molecular_weight(self): """Molecular Weight.""" return _parse_prop({'label': 'Molecular Weight'}, self.record['props']) @property def canonical_smiles(self): """Canonical SMILES, with no stereochemistry information.""" return _parse_prop({'label': 'SMILES', 'name': 'Canonical'}, self.record['props']) @property def isomeric_smiles(self): """Isomeric SMILES.""" return _parse_prop({'label': 'SMILES', 'name': 'Isomeric'}, self.record['props']) @property def inchi(self): """InChI string.""" return _parse_prop({'label': 'InChI', 'name': 'Standard'}, self.record['props']) @property def inchikey(self): """InChIKey.""" return _parse_prop({'label': 'InChIKey', 'name': 'Standard'}, self.record['props']) @property def iupac_name(self): """Preferred IUPAC name.""" # Note: Allowed, CAS-like Style, Preferred, Systematic, Traditional are available in full record return _parse_prop({'label': 'IUPAC Name', 'name': 'Preferred'}, self.record['props']) @property def xlogp(self): """XLogP.""" return _parse_prop({'label': 'Log P'}, self.record['props']) @property def exact_mass(self): """Exact mass.""" return _parse_prop({'label': 'Mass', 'name': 
'Exact'}, self.record['props']) @property def monoisotopic_mass(self): """Monoisotopic mass.""" return _parse_prop({'label': 'Weight', 'name': 'MonoIsotopic'}, self.record['props']) @property def tpsa(self): """Topological Polar Surface Area.""" return _parse_prop({'implementation': 'E_TPSA'}, self.record['props']) @property def complexity(self): """Complexity.""" return _parse_prop({'implementation': 'E_COMPLEXITY'}, self.record['props']) @property def h_bond_donor_count(self): """Hydrogen bond donor count.""" return _parse_prop({'implementation': 'E_NHDONORS'}, self.record['props']) @property def h_bond_acceptor_count(self): """Hydrogen bond acceptor count.""" return _parse_prop({'implementation': 'E_NHACCEPTORS'}, self.record['props']) @property def rotatable_bond_count(self): """Rotatable bond count.""" return _parse_prop({'implementation': 'E_NROTBONDS'}, self.record['props']) @property def fingerprint(self): """Raw padded and hex-encoded fingerprint, as returned by the PUG REST API.""" return _parse_prop({'implementation': 'E_SCREEN'}, self.record['props']) @property def cactvs_fingerprint(self): """PubChem CACTVS fingerprint. Each bit in the fingerprint represents the presence or absence of one of 881 chemical substructures. 
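As a rough illustration, the sketch below decodes a raw hex fingerprint by hand. It assumes the raw value is a 4-byte length prefix followed by 881 fingerprint bits and 7 padding bits. The helper name ``decode_cactvs`` and the synthetic input are made up for this example and are not part of PubChemPy.

```python
def decode_cactvs(raw_hex):
    # Hypothetical helper, not part of PubChemPy.
    # Drop the 8 hex characters (4 bytes) that hold the fingerprint length.
    payload = raw_hex[8:]
    # Convert the remaining hex digits to a binary string.
    bits = '{0:b}'.format(int(payload, 16))
    # Remove the 7 trailing padding bits, then restore any leading zeros
    # that were lost in the integer conversion.
    return bits[:-7].zfill(881)


# Synthetic fingerprint in which only the first and last bits are set.
expected = '1' + '0' * 879 + '1'
padded = expected + '0' * 7                              # 888 bits = 111 bytes
raw = '00000371' + '{0:0222x}'.format(int(padded, 2))    # 0x371 == 881
decoded = decode_cactvs(raw)
```

The ``zfill`` step matters whenever the first substructure bits are unset, because converting the hex payload to an integer discards leading zeros.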
More information at ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt """ # Skip first 4 bytes (contain length of fingerprint) and last 7 bits (padding) then re-pad to 881 bits return '{0:020b}'.format(int(self.fingerprint[8:], 16))[:-7].zfill(881) @property def heavy_atom_count(self): """Heavy atom count.""" if 'count' in self.record and 'heavy_atom' in self.record['count']: return self.record['count']['heavy_atom'] @property def isotope_atom_count(self): """Isotope atom count.""" if 'count' in self.record and 'isotope_atom' in self.record['count']: return self.record['count']['isotope_atom'] @property def atom_stereo_count(self): """Atom stereocenter count.""" if 'count' in self.record and 'atom_chiral' in self.record['count']: return self.record['count']['atom_chiral'] @property def defined_atom_stereo_count(self): """Defined atom stereocenter count.""" if 'count' in self.record and 'atom_chiral_def' in self.record['count']: return self.record['count']['atom_chiral_def'] @property def undefined_atom_stereo_count(self): """Undefined atom stereocenter count.""" if 'count' in self.record and 'atom_chiral_undef' in self.record['count']: return self.record['count']['atom_chiral_undef'] @property def bond_stereo_count(self): """Bond stereocenter count.""" if 'count' in self.record and 'bond_chiral' in self.record['count']: return self.record['count']['bond_chiral'] @property def defined_bond_stereo_count(self): """Defined bond stereocenter count.""" if 'count' in self.record and 'bond_chiral_def' in self.record['count']: return self.record['count']['bond_chiral_def'] @property def undefined_bond_stereo_count(self): """Undefined bond stereocenter count.""" if 'count' in self.record and 'bond_chiral_undef' in self.record['count']: return self.record['count']['bond_chiral_undef'] @property def covalent_unit_count(self): """Covalently-bonded unit count.""" if 'count' in self.record and 'covalent_unit' in self.record['count']: return 
self.record['count']['covalent_unit'] @property def volume_3d(self): conf = self.record['coords'][0]['conformers'][0] if 'data' in conf: return _parse_prop({'label': 'Shape', 'name': 'Volume'}, conf['data']) @property def multipoles_3d(self): conf = self.record['coords'][0]['conformers'][0] if 'data' in conf: return _parse_prop({'label': 'Shape', 'name': 'Multipoles'}, conf['data']) @property def conformer_rmsd_3d(self): coords = self.record['coords'][0] if 'data' in coords: return _parse_prop({'label': 'Conformer', 'name': 'RMSD'}, coords['data']) @property def effective_rotor_count_3d(self): return _parse_prop({'label': 'Count', 'name': 'Effective Rotor'}, self.record['props']) @property def pharmacophore_features_3d(self): return _parse_prop({'label': 'Features', 'name': 'Pharmacophore'}, self.record['props']) @property def mmff94_partial_charges_3d(self): return _parse_prop({'label': 'Charge', 'name': 'MMFF94 Partial'}, self.record['props']) @property def mmff94_energy_3d(self): conf = self.record['coords'][0]['conformers'][0] if 'data' in conf: return _parse_prop({'label': 'Energy', 'name': 'MMFF94 NoEstat'}, conf['data']) @property def conformer_id_3d(self): conf = self.record['coords'][0]['conformers'][0] if 'data' in conf: return _parse_prop({'label': 'Conformer', 'name': 'ID'}, conf['data']) @property def shape_selfoverlap_3d(self): conf = self.record['coords'][0]['conformers'][0] if 'data' in conf: return _parse_prop({'label': 'Shape', 'name': 'Self Overlap'}, conf['data']) @property def feature_selfoverlap_3d(self): conf = self.record['coords'][0]['conformers'][0] if 'data' in conf: return _parse_prop({'label': 'Feature', 'name': 'Self Overlap'}, conf['data']) @property def shape_fingerprint_3d(self): conf = self.record['coords'][0]['conformers'][0] if 'data' in conf: return _parse_prop({'label': 'Fingerprint', 'name': 'Shape'}, conf['data']) def _parse_prop(search, proplist): """Extract property value from record using the given urn search filter.""" 
props = [i for i in proplist if all(item in i['urn'].items() for item in search.items())] if len(props) > 0: return props[0]['value'][list(props[0]['value'].keys())[0]] class Substance(object): """Corresponds to a single record from the PubChem Substance database. The PubChem Substance database contains all chemical records deposited in PubChem in their most raw form, before any significant processing is applied. As a result, it contains duplicates, mixtures, and some records that don't make chemical sense. This means that Substance records contain fewer calculated properties, however they do have additional information about the original source that deposited the record. The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Hence each Compound may be derived from a number of different Substances. """ @classmethod def from_sid(cls, sid): """Retrieve the Substance record for the specified SID. :param int sid: The PubChem Substance Identifier (SID). """ record = json.loads(request(sid, 'sid', 'substance').read().decode())['PC_Substances'][0] return cls(record) def __init__(self, record): self.record = record """A dictionary containing the full Substance record that all other properties are obtained from.""" def __repr__(self): return 'Substance(%s)' % self.sid if self.sid else 'Substance()' def __eq__(self, other): return isinstance(other, type(self)) and self.record == other.record def to_dict(self, properties=None): """Return a dictionary containing Substance data. If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve. :param properties: (optional) A list of the desired properties. 
""" if not properties: skip = {'deposited_compound', 'standardized_compound', 'cids', 'aids'} properties = [p for p in dir(Substance) if isinstance(getattr(Substance, p), property) and p not in skip] return {p: getattr(self, p) for p in properties} def to_series(self, properties=None): """Return a pandas :class:`~pandas.Series` containing Substance data. If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve. :param properties: (optional) A list of the desired properties. """ import pandas as pd return pd.Series(self.to_dict(properties)) @property def sid(self): """The PubChem Substance Idenfitier (SID).""" return self.record['sid']['id'] @property def synonyms(self): """A ranked list of all the names associated with this Substance.""" if 'synonyms' in self.record: return self.record['synonyms'] @property def source_name(self): """The name of the PubChem depositor that was the source of this Substance.""" return self.record['source']['db']['name'] @property def source_id(self): """Unique ID for this Substance within those from the same PubChem depositor source.""" return self.record['source']['db']['source_id']['str'] @property def standardized_cid(self): """The CID of the Compound that was produced when this Substance was standardized. May not exist if this Substance was not standardizable. """ for c in self.record['compound']: if c['id']['type'] == CompoundIdType.STANDARDIZED: return c['id']['id']['cid'] @memoized_property def standardized_compound(self): """Return the :class:`~pubchempy.Compound` that was produced when this Substance was standardized. Requires an extra request. Result is cached. 
""" for c in self.record['compound']: if c['id']['type'] == CompoundIdType.STANDARDIZED: return Compound.from_cid(c['id']['id']['cid']) @property def deposited_compound(self): """Return a :class:`~pubchempy.Compound` produced from the unstandardized Substance record as deposited. The resulting :class:`~pubchempy.Compound` will not have a ``cid`` and will be missing most properties. """ for c in self.record['compound']: if c['id']['type'] == CompoundIdType.DEPOSITED: return Compound(c) @memoized_property def cids(self): """A list of all CIDs for Compounds that were produced when this Substance was standardized. Requires an extra request. Result is cached.""" results = get_json(self.sid, 'sid', 'substance', 'cids') return results['InformationList']['Information'][0]['CID'] if results else [] @memoized_property def aids(self): """A list of all AIDs for Assays associated with this Substance. Requires an extra request. Result is cached.""" results = get_json(self.sid, 'sid', 'substance', 'aids') return results['InformationList']['Information'][0]['AID'] if results else [] class Assay(object): @classmethod def from_aid(cls, aid): """Retrieve the Assay record for the specified AID. :param int aid: The PubChem Assay Identifier (AID). """ record = json.loads(request(aid, 'aid', 'assay', 'description').read().decode())['PC_AssayContainer'][0] return cls(record) def __init__(self, record): self.record = record """A dictionary containing the full Assay record that all other properties are obtained from.""" def __repr__(self): return 'Assay(%s)' % self.aid if self.aid else 'Assay()' def __eq__(self, other): return isinstance(other, type(self)) and self.record == other.record def to_dict(self, properties=None): """Return a dictionary containing Assay data. If the properties parameter is not specified, everything is included. :param properties: (optional) A list of the desired properties. 
""" if not properties: properties = [p for p in dir(Assay) if isinstance(getattr(Assay, p), property)] return {p: getattr(self, p) for p in properties} @property def aid(self): """The PubChem Substance Idenfitier (SID).""" return self.record['assay']['descr']['aid']['id'] @property def name(self): """The short assay name, used for display purposes.""" return self.record['assay']['descr']['name'] @property def description(self): """Description""" return self.record['assay']['descr']['description'] @property def project_category(self): """A category to distinguish projects funded through MLSCN, MLPCN or from literature. Possible values include mlscn, mlpcn, mlscn-ap, mlpcn-ap, literature-extracted, literature-author, literature-publisher, rnaigi. """ if 'project_category' in self.record['assay']['descr']: return self.record['assay']['descr']['project_category'] @property def comments(self): """Comments and additional information.""" return [comment for comment in self.record['assay']['descr']['comment'] if comment] @property def results(self): """A list of dictionaries containing details of the results from this Assay.""" return self.record['assay']['descr']['results'] @property def target(self): """A list of dictionaries containing details of the Assay targets.""" if 'target' in self.record['assay']['descr']: return self.record['assay']['descr']['target'] @property def revision(self): """Revision identifier for textual description.""" return self.record['assay']['descr']['revision'] @property def aid_version(self): """Incremented when the original depositor updates the record.""" return self.record['assay']['descr']['aid']['version'] def compounds_to_frame(compounds, properties=None): """Construct a pandas :class:`~pandas.DataFrame` from a list of :class:`~pubchempy.Compound` objects. Optionally specify a list of the desired :class:`~pubchempy.Compound` properties. 
""" import pandas as pd if isinstance(compounds, Compound): compounds = [compounds] properties = set(properties) | set(['cid']) if properties else None return pd.DataFrame.from_records([c.to_dict(properties) for c in compounds], index='cid') def substances_to_frame(substances, properties=None): """Construct a pandas :class:`~pandas.DataFrame` from a list of :class:`~pubchempy.Substance` objects. Optionally specify a list of the desired :class:`~pubchempy.Substance` properties. """ import pandas as pd if isinstance(substances, Substance): substances = [substances] properties = set(properties) | set(['sid']) if properties else None return pd.DataFrame.from_records([s.to_dict(properties) for s in substances], index='sid') # def add_columns_to_frame(dataframe, id_col, id_namespace, add_cols): # """""" # # Existing dataframe with some identifier column # # But consider what to do if the identifier column is an index? # # What about having the Compound/Substance object as a column? class PubChemPyDeprecationWarning(Warning): """Warning category for deprecated features.""" pass class PubChemPyError(Exception): """Base class for all PubChemPy exceptions.""" pass class ResponseParseError(PubChemPyError): """PubChem response is uninterpretable.""" pass class PubChemHTTPError(PubChemPyError): """Generic error class to handle all HTTP error codes.""" def __init__(self, e): self.code = e.code self.msg = e.reason try: self.msg += ': %s' % json.loads(e.read().decode())['Fault']['Details'][0] except (ValueError, IndexError, KeyError): pass if self.code == 400: raise BadRequestError(self.msg) elif self.code == 404: raise NotFoundError(self.msg) elif self.code == 405: raise MethodNotAllowedError(self.msg) elif self.code == 504: raise TimeoutError(self.msg) elif self.code == 501: raise UnimplementedError(self.msg) elif self.code == 500: raise ServerError(self.msg) def __str__(self): return repr(self.msg) class BadRequestError(PubChemHTTPError): """Request is improperly formed (syntax 
error in the URL, POST body, etc.).""" def __init__(self, msg='Request is improperly formed'): self.msg = msg class NotFoundError(PubChemHTTPError): """The input record was not found (e.g. invalid CID).""" def __init__(self, msg='The input record was not found'): self.msg = msg class MethodNotAllowedError(PubChemHTTPError): """Request not allowed (such as invalid MIME type in the HTTP Accept header).""" def __init__(self, msg='Request not allowed'): self.msg = msg class TimeoutError(PubChemHTTPError): """The request timed out, from server overload or too broad a request. See :ref:`Avoiding TimeoutError <avoiding_timeouterror>` for more information. """ def __init__(self, msg='The request timed out'): self.msg = msg class UnimplementedError(PubChemHTTPError): """The requested operation has not (yet) been implemented by the server.""" def __init__(self, msg='The requested operation has not been implemented'): self.msg = msg class ServerError(PubChemHTTPError): """Some problem on the server side (such as a database server down, etc.).""" def __init__(self, msg='Some problem on the server side'): self.msg = msg if __name__ == '__main__': print(__version__) PubChemPy-1.0.4/requirements/000077500000000000000000000000001307322065300160745ustar00rootroot00000000000000PubChemPy-1.0.4/requirements/common.txt000066400000000000000000000000171307322065300201230ustar00rootroot00000000000000pandas>=0.16.2 PubChemPy-1.0.4/requirements/dev.txt000066400000000000000000000001621307322065300174120ustar00rootroot00000000000000-r common.txt bumpversion>=0.5.3 coverage>=4.0 coveralls>=1.0 pytest>=3.0.7 Sphinx>=1.3.1 sphinx-rtd-theme>=0.1.9 PubChemPy-1.0.4/setup.py000066400000000000000000000033461307322065300150710ustar00rootroot00000000000000#!/usr/bin/env python import os from setuptools import setup if os.path.exists('README.rst'): long_description = open('README.rst').read() else: long_description = '''PubChemPy is a wrapper around the PubChem PUG REST API that provides a way to interact with PubChem in
Python. It allows chemical searches (including by name, substructure and similarity), chemical standardization, conversion between chemical file formats, depiction and retrieval of chemical properties. ''' setup( name='PubChemPy', version='1.0.4', author='Matt Swain', author_email='m.swain@me.com', license='MIT', url='https://github.com/mcs07/PubChemPy', py_modules=['pubchempy'], description='A simple Python wrapper around the PubChem PUG REST API.', long_description=long_description, keywords='pubchem python rest api chemistry cheminformatics', extras_require={'pandas': ['pandas']}, test_suite='pubchempy_test', classifiers=[ 'Intended Audience :: Science/Research', 'Intended Audience :: Healthcare Industry', 'Intended Audience :: Developers', 'Topic :: Scientific/Engineering', 'Topic :: Scientific/Engineering :: Bio-Informatics', 'Topic :: Scientific/Engineering :: Chemistry', 'Topic :: Database :: Front-Ends', 'Topic :: Software Development :: Libraries :: Python Modules', 'Topic :: Internet', 'License :: OSI Approved :: MIT License', 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', ], ) PubChemPy-1.0.4/tests/000077500000000000000000000000001307322065300145135ustar00rootroot00000000000000PubChemPy-1.0.4/tests/test_assay.py000066400000000000000000000016771307322065300172570ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ test_assay ~~~~~~~~~~ Test assay object. 
""" from __future__ import absolute_import from __future__ import division from __future__ import print_function from __future__ import unicode_literals import pytest from pubchempy import * @pytest.fixture(scope='module') def a1(): """Assay AID 490.""" return Assay.from_aid(490) def test_basic(a1): assert a1.aid == 490 assert repr(a1) == 'Assay(490)' assert a1.record def test_meta(a1): assert isinstance(a1.name, text_types) assert a1.project_category == ProjectCategory.LITERATURE_EXTRACTED assert isinstance(a1.description, list) assert isinstance(a1.comments, list) def test_assay_equality(): first = Assay.from_aid(490) second = Assay.from_aid(1000) assert first == first assert second == second assert first != second def test_assay_dict(a1): assert isinstance(a1.to_dict(), dict) assert a1.to_dict() PubChemPy-1.0.4/tests/test_compound.py000066400000000000000000000117711307322065300177570ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ test_compound ~~~~~~~~~~~~~ Test compound object. 
""" from __future__ import absolute_import from __future__ import division from __future__ import print_function from __future__ import unicode_literals import re import pytest from pubchempy import * @pytest.fixture(scope='module') def c1(): """Compound CID 241.""" return Compound.from_cid(241) @pytest.fixture(scope='module') def c2(): """Compound CID 175.""" return Compound.from_cid(175) def test_basic(c1): """Test Compound is retrieved and has a record and correct CID.""" assert c1.cid == 241 assert repr(c1) == 'Compound(241)' assert c1.record def test_atoms(c1): assert len(c1.atoms) == 12 assert set(a.element for a in c1.atoms) == {'C', 'H'} assert set(c1.elements) == {'C', 'H'} def test_atoms_deprecated(c1): with warnings.catch_warnings(record=True) as w: assert set(a['element'] for a in c1.atoms) == {'C', 'H'} assert len(w) == 1 assert w[0].category == PubChemPyDeprecationWarning assert str(w[0].message) == 'Dictionary style access to Atom attributes is deprecated' def test_single_atom(): """Test Compound when there is a single atom and no bonds.""" c = Compound.from_cid(259) assert c.atoms == [Atom(aid=1, number=35, x=2, y=0, charge=-1)] assert c.bonds == [] def test_bonds(c1): assert len(c1.bonds) == 12 assert set(b.order for b in c1.bonds) == {BondType.SINGLE, BondType.DOUBLE} def test_bonds_deprecated(c1): with warnings.catch_warnings(record=True) as w: assert set(b['order'] for b in c1.bonds) == {BondType.SINGLE, BondType.DOUBLE} assert len(w) == 1 assert w[0].category == PubChemPyDeprecationWarning assert str(w[0].message) == 'Dictionary style access to Bond attributes is deprecated' def test_charge(c1): assert c1.charge == 0 def test_coordinates(c1): for a in c1.atoms: assert isinstance(a.x, (float, int)) assert isinstance(a.y, (float, int)) assert a.z is None def test_coordinates_deprecated(c1): with warnings.catch_warnings(record=True) as w: assert isinstance(c1.atoms[0]['x'], (float, int)) assert isinstance(c1.atoms[0]['y'], (float, int)) assert 'z' 
not in c1.atoms[0] assert len(w) == 3 assert w[0].category == PubChemPyDeprecationWarning assert str(w[0].message) == 'Dictionary style access to Atom attributes is deprecated' def test_identifiers(c1): assert len(c1.canonical_smiles) > 10 assert len(c1.isomeric_smiles) > 10 assert c1.inchi.startswith('InChI=') assert re.match(r'^[A-Z]{14}-[A-Z]{10}-[A-Z\d]$', c1.inchikey) # TODO: c1.molecular_formula def test_properties_types(c1): assert isinstance(c1.molecular_weight, float) assert isinstance(c1.iupac_name, text_types) assert isinstance(c1.xlogp, float) assert isinstance(c1.exact_mass, float) assert isinstance(c1.monoisotopic_mass, float) assert isinstance(c1.tpsa, (int, float)) assert isinstance(c1.complexity, float) assert isinstance(c1.h_bond_donor_count, int) assert isinstance(c1.h_bond_acceptor_count, int) assert isinstance(c1.rotatable_bond_count, int) assert isinstance(c1.heavy_atom_count, int) assert isinstance(c1.isotope_atom_count, int) assert isinstance(c1.atom_stereo_count, int) assert isinstance(c1.defined_atom_stereo_count, int) assert isinstance(c1.undefined_atom_stereo_count, int) assert isinstance(c1.bond_stereo_count, int) assert isinstance(c1.defined_bond_stereo_count, int) assert isinstance(c1.undefined_bond_stereo_count, int) assert isinstance(c1.covalent_unit_count, int) assert isinstance(c1.fingerprint, text_types) def test_coordinate_type(c1): assert c1.coordinate_type == '2d' def test_compound_equality(): assert Compound.from_cid(241) == Compound.from_cid(241) assert get_compounds('Benzene', 'name')[0] == get_compounds('c1ccccc1', 'smiles')[0] def test_synonyms(c1): assert len(c1.synonyms) > 5 assert len(c1.synonyms) > 5 def test_related_records(c1): assert len(c1.sids) > 20 assert len(c1.aids) > 20 def test_compound_dict(c1): assert isinstance(c1.to_dict(), dict) assert c1.to_dict() assert 'atoms' in c1.to_dict() assert 'bonds' in c1.to_dict() assert 'element' in c1.to_dict()['atoms'][0] def test_charged_compound(c2): assert
len(c2.atoms) == 7 assert c2.atoms[0].charge == -1 def test_charged_compound_deprecated(c2): with warnings.catch_warnings(record=True) as w: assert c2.atoms[0]['charge'] == -1 assert len(w) == 1 assert w[0].category == PubChemPyDeprecationWarning assert str(w[0].message) == 'Dictionary style access to Atom attributes is deprecated' def test_fingerprint(c1): # CACTVS fingerprint is 881 bits assert len(c1.cactvs_fingerprint) == 881 # Raw fingerprint has 4 byte prefix, 7 bit suffix, and is hex encoded (/4) = 230 assert len(c1.fingerprint) == (881 + (4 * 8) + 7) / 4 PubChemPy-1.0.4/tests/test_compound3d.py000066400000000000000000000044041307322065300202010ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ test_compound3d ~~~~~~~~~~~~~~~ Test compound object with 3D record. """ from __future__ import absolute_import from __future__ import division from __future__ import print_function from __future__ import unicode_literals import pytest from pubchempy import * @pytest.fixture def c3d(): """Compound CID 1234, 3D.""" return Compound.from_cid(1234, record_type='3d') def test_properties_types(c3d): assert isinstance(c3d.volume_3d, float) assert isinstance(c3d.multipoles_3d, list) assert isinstance(c3d.conformer_rmsd_3d, float) assert isinstance(c3d.effective_rotor_count_3d, int) assert isinstance(c3d.pharmacophore_features_3d, list) assert isinstance(c3d.mmff94_partial_charges_3d, list) assert isinstance(c3d.mmff94_energy_3d, float) assert isinstance(c3d.conformer_id_3d, text_types) assert isinstance(c3d.shape_selfoverlap_3d, float) assert isinstance(c3d.feature_selfoverlap_3d, float) assert isinstance(c3d.shape_fingerprint_3d, list) assert isinstance(c3d.volume_3d, float) def test_coordinate_type(c3d): assert c3d.coordinate_type == '3d' def test_atoms(c3d): assert len(c3d.atoms) == 75 assert set(a.element for a in c3d.atoms) == {'C', 'H', 'O', 'N'} assert set(c3d.elements) == {'C', 'H', 'O', 'N'} def test_atoms_deprecated(c3d): with 
warnings.catch_warnings(record=True) as w: assert set(a['element'] for a in c3d.atoms) == {'C', 'H', 'O', 'N'} assert len(w) == 1 assert w[0].category == PubChemPyDeprecationWarning assert str(w[0].message) == 'Dictionary style access to Atom attributes is deprecated' def test_coordinates(c3d): for a in c3d.atoms: assert isinstance(a.x, (float, int)) assert isinstance(a.y, (float, int)) assert isinstance(a.z, (float, int)) def test_coordinates_deprecated(c3d): with warnings.catch_warnings(record=True) as w: assert isinstance(c3d.atoms[0]['x'], (float, int)) assert isinstance(c3d.atoms[0]['y'], (float, int)) assert isinstance(c3d.atoms[0]['z'], (float, int)) assert len(w) == 3 assert w[0].category == PubChemPyDeprecationWarning assert str(w[0].message) == 'Dictionary style access to Atom attributes is deprecated' PubChemPy-1.0.4/tests/test_download.py000066400000000000000000000022011307322065300177260ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ test_download ~~~~~~~~~~~~~ Test downloading. 
""" from __future__ import absolute_import from __future__ import division from __future__ import print_function from __future__ import unicode_literals import csv import shutil import tempfile import pytest from pubchempy import * @pytest.fixture(scope='module') def tmp_dir(): dir = tempfile.mkdtemp() yield dir shutil.rmtree(dir) def test_image_download(tmp_dir): download('PNG', os.path.join(tmp_dir, 'aspirin.png'), 'Aspirin', 'name') with pytest.raises(IOError): download('PNG', os.path.join(tmp_dir, 'aspirin.png'), 'Aspirin', 'name') download('PNG', os.path.join(tmp_dir, 'aspirin.png'), 'Aspirin', 'name', overwrite=True) def test_csv_download(tmp_dir): download('CSV', os.path.join(tmp_dir, 's.csv'), [1, 2, 3], operation='property/CanonicalSMILES,IsomericSMILES') with open(os.path.join(tmp_dir, 's.csv')) as f: rows = list(csv.reader(f)) assert rows[0] == ['CID', 'CanonicalSMILES', 'IsomericSMILES'] assert rows[1][0] == '1' assert rows[2][0] == '2' assert rows[3][0] == '3' PubChemPy-1.0.4/tests/test_errors.py000066400000000000000000000020441307322065300174400ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ test_errors ~~~~~~~~~~~~~ Test errors. 
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import pytest

from pubchempy import *


def test_invalid_identifier():
    """BadRequestError should be raised if identifier is not a positive integer."""
    with pytest.raises(BadRequestError):
        Compound.from_cid('aergaerhg')
    with pytest.raises(BadRequestError):
        get_compounds('srthrthsr')
    with pytest.raises(BadRequestError):
        get_substances('grgrqjksa')


def test_notfound_identifier():
    """NotFoundError should be raised if identifier is a positive integer but record doesn't exist."""
    with pytest.raises(NotFoundError):
        Compound.from_cid(999999999)
    with pytest.raises(NotFoundError):
        Substance.from_sid(999999999)


def test_notfound_search():
    """No error should be raised if a search returns no results."""
    get_compounds(999999999)
    get_substances(999999999)
PubChemPy-1.0.4/tests/test_identifiers.py000066400000000000000000000024741307322065300204400ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
"""
test_identifiers
~~~~~~~~~~~~~~~~

Test identifier requests.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import pytest

from pubchempy import *


def test_identifiers_from_name():
    """Use a name input to retrieve lists of identifiers."""
    # Get CID for each compound linked to substances with name Aspirin
    assert len(get_cids('Aspirin', 'name', 'substance')) >= 10
    # Get CID for each compound with name Aspirin
    assert len(get_cids('Aspirin', 'name', 'compound')) >= 1
    # Get SID for substances linked to compound with name Aspirin
    assert len(get_sids('Aspirin', 'name', 'substance')) >= 10
    # Get AID for each assay linked to substances with name Aspirin
    assert len(get_aids('Aspirin', 'name', 'substance')) >= 10
    # Get AID for each assay linked to compound with name Aspirin
    assert len(get_aids('Aspirin', 'name', 'compound')) >= 1


def test_no_identifiers():
    """Test retrieving no identifier results."""
    assert get_cids('asfgaerghaeirughae', 'name', 'substance') == []
    assert get_cids('asfgaerghaeirughae', 'name', 'compound') == []
    assert get_sids(999999999, 'cid', 'compound') == []
    assert get_aids(150194, 'cid', 'compound') == []
PubChemPy-1.0.4/tests/test_pandas.py000066400000000000000000000035421307322065300173760ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
"""
test_pandas
~~~~~~~~~~~

Test optional pandas functionality.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging  # Imported explicitly rather than relying on the star import below

import pytest

from pubchempy import *


log = logging.getLogger(__name__)


# Import pandas as pd, skipping tests in this module if pandas is not installed
pd = pytest.importorskip('pandas')


def test_compounds_dataframe():
    """Retrieve compounds matching a formula as a DataFrame indexed by CID."""
    df = get_compounds('C20H41Br', 'formula', as_dataframe=True)
    assert df.ndim == 2
    assert df.index.names == ['cid']
    assert len(df.index) > 5
    columns = df.columns.values.tolist()
    assert 'atom_stereo_count' in columns
    assert 'atoms' in columns
    assert 'canonical_smiles' in columns
    assert 'exact_mass' in columns


def test_substances_dataframe():
    df = get_substances([1, 2, 3, 4], as_dataframe=True)
    assert df.ndim == 2
    assert df.index.names == ['sid']
    assert len(df.index) == 4
    assert df.columns.values.tolist() == ['source_id', 'source_name', 'standardized_cid', 'synonyms']


def test_properties_dataframe():
    df = get_properties(['isomeric_smiles', 'xlogp', 'inchikey'], '1,2,3,4', 'cid', as_dataframe=True)
    assert df.ndim == 2
    assert df.index.names == ['CID']
    assert len(df.index) == 4
    assert df.columns.values.tolist() == ['InChIKey', 'IsomericSMILES', 'XLogP']


def test_compound_series():
    s = Compound.from_cid(241).to_series()
    assert isinstance(s, pd.Series)


def test_substance_series():
    s = Substance.from_sid(1234).to_series()
    assert isinstance(s, pd.Series)


def test_compound_to_frame():
    s = compounds_to_frame(Compound.from_cid(241))
    assert isinstance(s, pd.DataFrame)


def test_substance_to_frame():
    s = substances_to_frame(Substance.from_sid(1234))
    assert isinstance(s, pd.DataFrame)
PubChemPy-1.0.4/tests/test_properties.py000066400000000000000000000033621307322065300203240ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
"""
test_properties
~~~~~~~~~~~~~~~

Test properties requests.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import pytest

from pubchempy import *


def test_properties():
    """Properties can be specified as a list of CamelCase names."""
    results = get_properties(['IsomericSMILES', 'InChIKey'], 'tris-(1,10-phenanthroline)ruthenium', 'name')
    assert len(results) > 0
    for result in results:
        assert 'CID' in result
        assert 'IsomericSMILES' in result
        assert 'InChIKey' in result


def test_underscore_properties():
    """Properties can also be specified as underscore-separated words, rather than CamelCase."""
    results = get_properties(['isomeric_smiles', 'molecular_weight'], 'tris-(1,10-phenanthroline)ruthenium', 'name')
    assert len(results) > 0
    for result in results:
        assert 'CID' in result
        assert 'IsomericSMILES' in result
        assert 'MolecularWeight' in result


def test_comma_string_properties():
    """Properties can also be specified as a comma-separated string, rather than a list."""
    results = get_properties('isomeric_smiles,InChIKey,molecular_weight', 'tris-(1,10-phenanthroline)ruthenium', 'name')
    assert len(results) > 0
    for result in results:
        assert 'CID' in result
        assert 'IsomericSMILES' in result
        assert 'MolecularWeight' in result
        assert 'InChIKey' in result


def test_synonyms():
    results = get_synonyms('C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1', 'smiles')
    assert len(results) > 0
    for result in results:
        assert 'CID' in result
        assert 'Synonym' in result
        assert isinstance(result['Synonym'], list)
        assert len(result['Synonym']) > 0
PubChemPy-1.0.4/tests/test_requests.py000066400000000000000000000037231307322065300200040ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
"""
test_requests
~~~~~~~~~~~~~

Test basic requests.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import pytest

from pubchempy import *


def test_requests():
    """Test a variety of basic raw requests and ensure they don't return an error code."""
    assert request('c1ccccc1', 'smiles').getcode() == 200
    assert request('DTP/NCI', 'sourceid', 'substance', '747285', 'SDF').getcode() == 200
    assert request('coumarin', 'name', output='PNG', image_size='50x50').getcode() == 200


def test_content_type():
    """Test content type header matches desired output format."""
    assert request(241, output='JSON').headers['Content-Type'] == 'application/json'
    assert request(241, output='XML').headers['Content-Type'] == 'application/xml'
    assert request(241, output='SDF').headers['Content-Type'] == 'chemical/x-mdl-sdfile'
    assert request(241, output='ASNT').headers['Content-Type'] == 'text/plain'
    assert request(241, output='PNG').headers['Content-Type'] == 'image/png'


def test_listkey_requests():
    """Test asynchronous listkey requests."""
    r1 = get_json('CC', 'smiles', operation='cids', searchtype='superstructure')
    assert 'IdentifierList' in r1 and 'CID' in r1['IdentifierList']
    r2 = get_json('C10H21N', 'formula', listkey_count=3)
    assert 'PC_Compounds' in r2 and len(r2['PC_Compounds']) == 3


def test_xref_request():
    """Test requests with xref inputs."""
    response = request('US6187568B1', 'PatentID', 'substance', operation='sids', searchtype='xref')
    assert response.code == 200
    response2 = get_json('US6187568B1', 'PatentID', 'substance', operation='sids', searchtype='xref')
    assert 'IdentifierList' in response2
    assert 'SID' in response2['IdentifierList']
    sids = get_sids('US6187568B1', 'PatentID', 'substance', searchtype='xref')
    assert all(isinstance(sid, int) for sid in sids)
PubChemPy-1.0.4/tests/test_search.py000066400000000000000000000013451307322065300173740ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
"""
test_search
~~~~~~~~~~~

Test
searching.

"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import pytest

from pubchempy import *


def test_search_assays():
    assays = get_assays([1, 1000, 490])
    for assay in assays:
        assert isinstance(assay.name, text_types)


def test_substructure():
    results = get_compounds('C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1', 'smiles', searchtype='substructure', listkey_count=3)
    assert len(results) == 3
    for result in results:
        # Attribute access avoids the deprecated dictionary-style Atom access
        assert all(el in [a.element for a in result.atoms] for el in {'C', 'N', 'H'})
        assert result.heavy_atom_count >= 14
PubChemPy-1.0.4/tests/test_sources.py000066400000000000000000000015701307322065300176120ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
"""
test_sources
~~~~~~~~~~~~

Test sources.

"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import pytest

from pubchempy import *


def test_substance_sources():
    """Retrieve a list of all Substance sources."""
    substance_sources = get_all_sources()
    assert len(substance_sources) > 20
    assert isinstance(substance_sources, list)
    assert 'SureChEMBL' in substance_sources
    assert 'DiscoveryGate' in substance_sources
    assert 'ZINC' in substance_sources


def test_assay_sources():
    """Retrieve a list of all Assay sources."""
    assay_sources = get_all_sources('assay')
    assert len(assay_sources) > 20
    assert isinstance(assay_sources, list)
    assert 'ChEMBL' in assay_sources
    assert 'DTP/NCI' in assay_sources
PubChemPy-1.0.4/tests/test_substance.py000066400000000000000000000033231307322065300201140ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
"""
test_substance
~~~~~~~~~~~~~~

Test substance object.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import pytest

from pubchempy import *


@pytest.fixture(scope='module')
def s1():
    """Substance SID 24864499."""
    return Substance.from_sid(24864499)


def test_basic(s1):
    """Test Substance is retrieved and has a record and correct SID."""
    assert s1.sid == 24864499
    assert repr(s1) == 'Substance(24864499)'
    assert s1.record


def test_substance_equality():
    assert Substance.from_sid(24864499) == Substance.from_sid(24864499)
    assert get_substances('Coumarin 343, Dye Content 97 %', 'name')[0] == get_substances(24864499)[0]


def test_synonyms(s1):
    assert len(s1.synonyms) == 1


def test_source(s1):
    assert s1.source_name == 'Sigma-Aldrich'
    assert s1.source_id == '393029_ALDRICH'


def test_deposited_compound(s1):
    """Check that a Compound object can be constructed from the embedded deposited compound record."""
    assert s1.deposited_compound.record


def test_deposited_compound2():
    """Check that a Compound object can be constructed from the embedded deposited compound record."""
    s2 = Substance.from_sid(223766453)
    assert s2.deposited_compound.record


def test_standardized_compound(s1):
    """Check the CID is correct and that the Compound can be retrieved."""
    assert s1.standardized_cid == 108770
    assert s1.standardized_compound.cid == 108770


def test_related_records(s1):
    assert len(s1.cids) == 1
    assert len(s1.aids) == 0


def test_substance_dict(s1):
    assert isinstance(s1.to_dict(), dict)
    assert s1.to_dict()