pax_global_header00006660000000000000000000000064125357034130014515gustar00rootroot0000000000000052 comment=174ded08546919c1e0dbd0cc663122c5027ffeb1 elasticsearch-py-1.6.0/000077500000000000000000000000001253570341300147615ustar00rootroot00000000000000elasticsearch-py-1.6.0/.coveragerc000066400000000000000000000002551253570341300171040ustar00rootroot00000000000000[run] omit = */python?.?/* */lib-python/?.?/*.py */lib_pypy/* */site-packages/* *.egg/* test_elasticsearch/* elasticsearch/connection/esthrift/* elasticsearch-py-1.6.0/.gitignore000066400000000000000000000002621253570341300167510ustar00rootroot00000000000000.*.swp *~ *.py[co] .coverage test_elasticsearch/cover test_elasticsearch/local.py docs/_build elasticsearch.egg-info .tox dist *.egg coverage.xml nosetests.xml junit-*.xml build elasticsearch-py-1.6.0/.gitmodules000066400000000000000000000001541253570341300171360ustar00rootroot00000000000000[submodule "rest-api-spec"] path = rest-api-spec url = https://github.com/elasticsearch/elasticsearch.git elasticsearch-py-1.6.0/.travis.yml000066400000000000000000000014471253570341300171000ustar00rootroot00000000000000language: python python: - "2.6" - "2.7" - "3.2" - "3.3" - "3.4" - "pypy" env: # different connection classes to test - TEST_ES_CONNECTION=Urllib3HttpConnection - TEST_ES_CONNECTION=RequestsHttpConnection install: - mkdir /tmp/elasticsearch - wget -O - http://s3-eu-west-1.amazonaws.com/build-eu.elasticsearch.org/origin/master/nightly/JDK7/elasticsearch-latest-SNAPSHOT.tar.gz | tar xz --directory=/tmp/elasticsearch --strip-components=1 - /tmp/elasticsearch/bin/elasticsearch -d --path.data /tmp --discovery.zen.ping.multicast.enabled false --script.inline on --script.indexed on - git clone https://github.com/elastic/elasticsearch.git ../elasticsearch - pip install . script: - python setup.py test notifications: email: recipients: - honza.kral@gmail.com elasticsearch-py-1.6.0/AUTHORS000066400000000000000000000031351253570341300160330ustar00rootroot00000000000000Honza Král Jordi Llonch Rob Hudson Njal Karevoll Boaz Leskes Graeme Coupar Murhaf Fares Hari haran Richard Boulton Brian Hicks Karan Gupta H. İbrahim Güngör James Yu Andrey Balandin Marco Hoyer Max Kutsevol Jan Gaedicke Klaas Bosteels starenka Mathieu Geli Bruno Renié Ondrej Sika Alex Ksikes Ronan Amicel speedplane Corey Farwell Andrew Snare Armagnac Alexandru Ghitza Aarni Koskela Michael Schier Yuri Khrustalev Rémy HUBSCHER j0hnsmith Julian Mehnle Steven Moy syllogismos Magnus Bäck Marc Abramowitz J Charitopoulos Sven Wästlund Russell Savage Georges Toth Malthe Borch Jim Kelly elasticsearch-py-1.6.0/CONTRIBUTING.md000066400000000000000000000037301253570341300172150ustar00rootroot00000000000000If you have a bugfix or new feature that you would like to contribute to elasticsearch-py, please find or open an issue about it first. Talk about what you would like to do. It may be that somebody is already working on it, or that there are particular issues that you should know about before implementing the change. We enjoy working with contributors to get their code accepted. There are many approaches to fixing a problem and it is important to find the best approach before writing too much code. The process for contributing to any of the Elasticsearch repositories is similar. 1. Please make sure you have signed the [Contributor License Agreement](http://www.elastic.co/contributor-agreement/). We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. 
We ask this of all contributors in order to assure our users of the origin and continuing existence of the code. You only need to sign the CLA once. 2. Run the test suite to ensure your changes do not break existing code: ```` python setup.py test ```` See the README file in `test_elasticsearch` dirctory for more information on running the test suite. 3. Rebase your changes. Update your local repository with the most recent code from the main elasticsearch-py repository, and rebase your branch on top of the latest master branch. We prefer your changes to be squashed into a single commit. 4. Submit a pull request. Push your local changes to your forked copy of the repository and submit a pull request. In the pull request, describe what your changes do and mention the number of the issue where discussion has taken place, eg “Closes #123″. Please consider adding or modifying tests related to your changes. Then sit back and wait. There will probably be discussion about the pull request and, if any changes are needed, we would love to work with you to get your pull request merged into elasticsearch-py. elasticsearch-py-1.6.0/Changelog.rst000066400000000000000000000100041253570341300173750ustar00rootroot00000000000000.. _changelog: Changelog ========= 1.6.0 (2015-06-10) ------------------ * Add ``indices.flush_synced`` API 1.5.0 (2015-05-18) ------------------ * Add support for ``query_cache`` parameter when searching * helpers have been made more secure by changing defaults to raise an exception on errors * removed deprecated options ``replication`` and the deprecated benchmark api. * Added ``AddonClient`` class to allow for extending the client from outside 1.4.0 (2015-02-11) ------------------ * Using insecure SSL configuration (``verify_cert=False``) raises a warning * ``reindex`` accepts a ``query`` parameter * enable ``reindex`` helper to accept any kwargs for underlying ``bulk`` and ``scan`` calls * when doing an initial sniff (via ``sniff_on_start``) ignore special sniff timeout * option to treat ``TransportError`` as normal failure in ``bulk`` helpers * fixed an issue with sniffing when only a single host was passed in 1.3.0 (2014-12-31) ------------------ * Timeout now doesn't trigger a retry by default (can be overriden by setting ``retry_on_timeout=True``) * Introduced new parameter ``retry_on_status`` (defaulting to ``(503, 504, )``) controls which http status code should lead to a retry. * Implemented url parsing according to RFC-1738 * Added support for proper SSL certificate handling * Required parameters are now checked for non-empty values * ConnectionPool now checks if any connections were defined * DummyConnectionPool introduced when no load balancing is needed (only one connection defined) * Fixed a race condition in ConnectionPool 1.2.0 (2014-08-03) ------------------ Compatibility with newest (1.3) Elasticsearch APIs. * Filter out master-only nodes when sniffing * Improved docs and error messages 1.1.1 (2014-07-04) ------------------ Bugfix release fixing escaping issues with ``request_timeout``. 1.1.0 (2014-07-02) ------------------ Compatibility with newest Elasticsearch APIs. * Test helpers - ``ElasticsearchTestCase`` and ``get_test_client`` for use in your tests * Python 3.2 compatibility * Use ``simplejson`` if installed instead of stdlib json library * Introducing a global ``request_timeout`` parameter for per-call timeout * Bug fixes 1.0.0 (2014-02-11) ------------------ Elasticsearch 1.0 compatibility. 
See 0.4.X releases (and 0.4 branch) for code compatible with 0.90 elasticsearch. * major breaking change - compatible with 1.0 elasticsearch releases only! * Add an option to change the timeout used for sniff requests (``sniff_timeout``). * empty responses from the server are now returned as empty strings instead of None * ``get_alias`` now has ``name`` as another optional parameter due to issue #4539 in es repo. Note that the order of params have changed so if you are not using keyword arguments this is a breaking change. 0.4.4 (2013-12-23) ------------------ * ``helpers.bulk_index`` renamed to ``helpers.bulk`` (alias put in place for backwards compatibility, to be removed in future versions) * Added ``helpers.streaming_bulk`` to consume an iterator and yield results per operation * ``helpers.bulk`` and ``helpers.streaming_bulk`` are no longer limitted to just index operations. * unicode body (for ``incices.analyze`` for example) is now handled correctly * changed ``perform_request`` on ``Connection`` classes to return headers as well. This is a backwards incompatible change for people who have developed their own connection class. * changed deserialization mechanics. Users who provided their own serializer that didn't extend ``JSONSerializer`` need to specify a ``mimetype`` class attribute. * minor bug fixes 0.4.3 (2013-10-22) ------------------ * Fixes to ``helpers.bulk_index``, better error handling * More benevolent ``hosts`` argument parsing for ``Elasticsearch`` * ``requests`` no longer required (nor recommended) for install 0.4.2 (2013-10-08) ------------------ * ``ignore`` param acceted by all APIs * Fixes to ``helpers.bulk_index`` 0.4.1 (2013-09-24) ------------------ Initial release. elasticsearch-py-1.6.0/LICENSE000066400000000000000000000236371253570341300160010ustar00rootroot00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. elasticsearch-py-1.6.0/MANIFEST.in000066400000000000000000000005351253570341300165220ustar00rootroot00000000000000include AUTHORS include Changelog.rst include CONTRIBUTING.md include LICENSE include MANIFEST.in include README.rst include README include tox.ini include setup.py include elasticsearch/connection/esthrift/Rest-remote recursive-include docs * prune docs/_build prune test_elasticsearch recursive-exclude * __pycache__ recursive-exclude * *.py[co] elasticsearch-py-1.6.0/README000077700000000000000000000000001253570341300173232README.rstustar00rootroot00000000000000elasticsearch-py-1.6.0/README.rst000066400000000000000000000064441253570341300164600ustar00rootroot00000000000000Python Elasticsearch Client =========================== Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be opinion-free and very extendable. For a more high level client library with more limited scope, have a look at `elasticsearch-dsl`_ - it is a more pythonic library sitting on top of ``elasticsearch-py``. .. _elasticsearch-dsl: http://elasticsearch-dsl.rtfd.org/ Compatibility ------------- The library is compatible with both Elasticsearch 1.x and 0.90.x but you **have to use a matching version**. For **Elasticsearch 1.0** and later, use the major version 1 (``1.x.y``) of the library. For **Elasticsearch 0.90.x**, use a version from ``0.4.x`` releases of the library. The recommended way to set your requirements in your `setup.py` or `requirements.txt` is:: # Elasticsearch 1.0 elasticsearch>=1.0.0,<2.0.0 # Elasticsearch 0.90 elasticsearch<1.0.0 The development is happening on ``master`` and ``0.4`` branches, respectively. 
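If your project declares its dependencies in ``setup.py`` rather than
``requirements.txt``, the same pin can be expressed through
``install_requires``. A minimal sketch (the project name and version below are
placeholders)::

    # setup.py - pin the client to the 1.x line for an Elasticsearch 1.0 cluster
    from setuptools import setup

    setup(
        name='my-project',
        version='0.1.0',
        install_requires=['elasticsearch>=1.0.0,<2.0.0'],
    )
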
Installation ------------ Install the ``elasticsearch`` package with `pip `_:: pip install elasticsearch Example use ----------- Simple use-case:: >>> from datetime import datetime >>> from elasticsearch import Elasticsearch # by default we connect to localhost:9200 >>> es = Elasticsearch() # create an index in elasticsearch, ignore status code 400 (index already exists) >>> es.indices.create(index='my-index', ignore=400) {u'acknowledged': True} # datetimes will be serialized >>> es.index(index="my-index", doc_type="test-type", id=42, body={"any": "data", "timestamp": datetime.now()}) {u'_id': u'42', u'_index': u'my-index', u'_type': u'test-type', u'_version': 1, u'ok': True} # but not deserialized >>> es.get(index="my-index", doc_type="test-type", id=42)['_source'] {u'any': u'data', u'timestamp': u'2013-05-12T19:45:31.804229'} `Full documentation`_. .. _Full documentation: http://elasticsearch-py.rtfd.org/ Features -------- The client's features include: * translating basic Python data types to and from json (datetimes are not decoded for performance reasons) * configurable automatic discovery of cluster nodes * persistent connections * load balancing (with pluggable selection strategy) across all available nodes * failed connection penalization (time based - failed connections won't be retried until a timeout is reached) * support for ssl and http authentication * thread safety * pluggable architecture License ------- Copyright 2015 Elasticsearch Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Build status ------------ .. image:: https://secure.travis-ci.org/elastic/elasticsearch-py.png :target: http://travis-ci.org/#!/elastic/elasticsearch-py elasticsearch-py-1.6.0/docs/000077500000000000000000000000001253570341300157115ustar00rootroot00000000000000elasticsearch-py-1.6.0/docs/Changelog.rst000077700000000000000000000000001253570341300231612../Changelog.rstustar00rootroot00000000000000elasticsearch-py-1.6.0/docs/Makefile000066400000000000000000000152061253570341300173550ustar00rootroot00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = _build # User-friendly check for sphinx-build ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) endif # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " xml to make Docutils-native XML files" @echo " pseudoxml to make pseudoxml-XML files for display purposes" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Elasticsearch.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Elasticsearch.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." @echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/Elasticsearch" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Elasticsearch" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." 
$(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." latexpdfja: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through platex and dvipdfmx..." $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." xml: $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml @echo @echo "Build finished. The XML files are in $(BUILDDIR)/xml." pseudoxml: $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml @echo @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." elasticsearch-py-1.6.0/docs/api.rst000066400000000000000000000047041253570341300172210ustar00rootroot00000000000000.. _api: API Documentation ================= All the API calls map the raw REST api as closely as possible, including the distinction between required and optional arguments to the calls. This means that the code makes distinction between positional and keyword arguments; we, however, recommend that people **use keyword arguments for all calls for consistency and safety**. .. note:: for compatibility with the Python ecosystem we use ``from_`` instead of ``from`` and ``doc_type`` instead of ``type`` as parameter names. Global options -------------- Some parameters are added by the client itself and can be used in all API calls. Ignore ~~~~~~ An API call is considered successful (and will return a response) if elasticsearch returns a 2XX response. Otherwise an instance of :class:`~elasticsearch.TransportError` (or a more specific subclass) will be raised. You can see other exception and error states in :ref:`exceptions`. 
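These exceptions are ordinary Python exceptions and can be handled with a
regular ``try``/``except`` block. A minimal sketch (the index name and
document id below are placeholders)::

    from elasticsearch import Elasticsearch, NotFoundError

    es = Elasticsearch()

    try:
        doc = es.get(index='my-index', doc_type='test-type', id=42)
    except NotFoundError:
        # Elasticsearch returned a 404 - the document (or index) does not exist
        doc = None
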
If you do not wish an exception to be raised you can always pass in an ``ignore`` parameter with either a single status code that should be ignored or a list of them:: from elasticsearch import Elasticsearch es = Elasticsearch() # ignore 400 cause by IndexAlreadyExistsException when creating an index es.indices.create(index='test-index', ignore=400) # ignore 404 and 400 es.indices.delete(index='test-index', ignore=[400, 404]) Timeout ~~~~~~~ Global timeout can be set when constructing the client (see :class:`~elasticsearch.Connection`'s ``timeout`` parameter) or on a per-request basis using ``request_timeout`` (float value in seconds) as part of any API call, this value will get passed to the ``perform_request`` method of the connection class:: # only wait for 1 second, regardless of the client's default es.cluster.health(wait_for_status='yellow', request_timeout=1) .. note:: Some API calls also accept a ``timeout`` parameter that is passed to Elasticsearch server. This timeout is internal and doesn't guarantee that the request will end in the specified time. .. py:module:: elasticsearch Elasticsearch ------------- .. autoclass:: Elasticsearch :members: .. py:module:: elasticsearch.client Indices ------- .. autoclass:: IndicesClient :members: Cluster ------- .. autoclass:: ClusterClient :members: Nodes ----- .. autoclass:: NodesClient :members: Cat --- .. autoclass:: CatClient :members: Snapshot --- .. autoclass:: SnapshotClient :members: elasticsearch-py-1.6.0/docs/conf.py000066400000000000000000000204131253570341300172100ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Elasticsearch documentation build configuration file, created by # sphinx-quickstart on Mon May 6 15:38:41 2013. # # This file is execfile()d with the current directory set to its containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys, os # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. #sys.path.insert(0, os.path.abspath('.')) # -- General configuration ----------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. #needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest'] autoclass_content = "both" # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = u'Elasticsearch' copyright = u'2013, Honza Král' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = '1.6.0' # The full version, including alpha/beta/rc tags. release = '1.6.0' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. 
#language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = ['_build'] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # If true, keep warnings as "system message" paragraphs in the built documents. #keep_warnings = False # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. on_rtd = os.environ.get('READTHEDOCS', None) == 'True' if not on_rtd: # only import and set the theme if we're building docs locally import sphinx_rtd_theme html_theme = 'sphinx_rtd_theme' html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. 
#html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'Elasticsearchdoc' # -- Options for LaTeX output -------------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). #'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). #'pointsize': '10pt', # Additional stuff for the LaTeX preamble. #'preamble': '', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', 'Elasticsearch.tex', u'Elasticsearch Documentation', u'Honza Král', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # If true, show page references after internal links. #latex_show_pagerefs = False # If true, show URL addresses after external links. #latex_show_urls = False # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_domain_indices = True # -- Options for manual page output -------------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ ('index', 'elasticsearch', u'Elasticsearch Documentation', [u'Honza Král'], 1) ] # If true, show URL addresses after external links. #man_show_urls = False # -- Options for Texinfo output ------------------------------------------------ # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ ('index', 'Elasticsearch', u'Elasticsearch Documentation', u'Honza Král', 'Elasticsearch', 'One line description of project.', 'Miscellaneous'), ] # Documents to append as an appendix to all manuals. #texinfo_appendices = [] # If false, no module index is generated. #texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. #texinfo_show_urls = 'footnote' # If true, do not generate a @detailmenu in the "Top" node's menu. #texinfo_no_detailmenu = False elasticsearch-py-1.6.0/docs/connection.rst000066400000000000000000000030771253570341300206110ustar00rootroot00000000000000.. _connection_api: Connection Layer API ==================== All of the classes reponsible for handling the connection to the Elasticsearch cluster. The default subclasses used can be overriden by passing parameters to the :class:`~elasticsearch.Elasticsearch` class. All of the arguments to the client will be passed on to :class:`~elasticsearch.Transport`, :class:`~elasticsearch.ConnectionPool` and :class:`~elasticsearch.Connection`. 
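In practice this means the client constructor accepts configuration for all of
these layers at once. A small sketch using only options described elsewhere in
these docs (``timeout`` is consumed by the :class:`~elasticsearch.Connection`
instances, ``max_retries`` and ``sniff_on_start`` by the
:class:`~elasticsearch.Transport`)::

    from elasticsearch import Elasticsearch

    es = Elasticsearch(
        ['esnode1', 'esnode2'],
        # forwarded to each Connection instance
        timeout=10,
        # forwarded to the Transport
        max_retries=5,
        sniff_on_start=True,
    )
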
For example if you wanted to use your own implementation of the :class:`~elasticsearch.ConnectionSelector` class you can just pass in the ``selector_class`` parameter. .. note:: :class:`~elasticsearch.ConnectionPool` and related options (like ``selector_class``) will only be used if more than one connection is defined. Either directly or via the :ref:`sniffing` mechanism. .. py:module:: elasticsearch Transport --------- .. autoclass:: Transport(hosts, connection_class=Urllib3HttpConnection, connection_pool_class=ConnectionPool, nodes_to_host_callback=construct_hosts_list, sniff_on_start=False, sniffer_timeout=None, sniff_on_connection_fail=False, serializer=JSONSerializer(), max_retries=3, ** kwargs) :members: Connection Pool --------------- .. autoclass:: ConnectionPool(connections, dead_timeout=60, selector_class=RoundRobinSelector, randomize_hosts=True, ** kwargs) :members: Connection Selector ------------------- .. autoclass:: ConnectionSelector(opts) :members: Urllib3HttpConnection (default connection_class) ------------------------------------------------ .. autoclass:: Urllib3HttpConnection :members: elasticsearch-py-1.6.0/docs/exceptions.rst000066400000000000000000000011211253570341300206170ustar00rootroot00000000000000.. _exceptions: Exceptions ========== .. py:module:: elasticsearch .. autoclass:: ImproperlyConfigured .. autoclass:: ElasticsearchException .. autoclass:: SerializationError(ElasticsearchException) .. autoclass:: TransportError(ElasticsearchException) :members: .. autoclass:: ConnectionError(TransportError) .. autoclass:: ConnectionTimeout(ConnectionError) .. autoclass:: SSLError(ConnectionError) .. autoclass:: NotFoundError(TransportError) .. autoclass:: ConflictError(TransportError) .. autoclass:: RequestError(TransportError) .. autoclass:: ConnectionError(TransportError) elasticsearch-py-1.6.0/docs/helpers.rst000066400000000000000000000004061253570341300201050ustar00rootroot00000000000000.. _helpers: Helpers ======= Collection of simple helper functions that abstract some specifics or the raw API. .. py:module:: elasticsearch.helpers .. autofunction:: streaming_bulk .. autofunction:: bulk .. autofunction:: scan .. autofunction:: reindex elasticsearch-py-1.6.0/docs/index.rst000066400000000000000000000176341253570341300175650ustar00rootroot00000000000000Python Elasticsearch Client =========================== Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be opinion-free and very extendable. For a more high level client library with more limited scope, have a look at `elasticsearch-dsl`_ - it is a more pythonic library sitting on top of ``elasticsearch-py``. .. _elasticsearch-dsl: http://elasticsearch-dsl.rtfd.org/ Compatibility ------------- The library is compatible with both Elasticsearch 1.x and 0.90.x but you **have to use a matching version**. For **Elasticsearch 1.0** and later, use the major version 1 (``1.x.y``) of the library. For **Elasticsearch 0.90.x**, use a version from ``0.4.x`` releases of the library. The recommended way to set your requirements in your `setup.py` or `requirements.txt` is:: # Elasticsearch 1.0 elasticsearch>=1.0.0,<2.0.0 # Elasticsearch 0.90 elasticsearch<1.0.0 The development is happening on ``master`` and ``0.4`` branches, respectively. Example Usage ------------- :: from datetime import datetime from elasticsearch import Elasticsearch es = Elasticsearch() doc = { 'author': 'kimchy', 'text': 'Elasticsearch: cool. 
bonsai cool.', 'timestamp': datetime(2010, 10, 10, 10, 10, 10) } res = es.index(index="test-index", doc_type='tweet', id=1, body=doc) print(res['created']) res = es.get(index="test-index", doc_type='tweet', id=1) print(res['_source']) es.indices.refresh(index="test-index") res = es.search(index="test-index", body={"query": {"match_all": {}}}) print("Got %d Hits:" % res['hits']['total']) for hit in res['hits']['hits']: print("%(timestamp)s %(author)s: %(text)s" % hit["_source"]) Features -------- This client was designed as very thin wrapper around Elasticseach's REST API to allow for maximum flexibility. This means that there are no opinions in this client; it also means that some of the APIs are a little cumbersome to use from Python. We have created some :ref:`helpers` to help with this issue. Persistent Connections ~~~~~~~~~~~~~~~~~~~~~~ ``elasticsearch-py`` uses persistent connections inside of individual connection pools (one per each configured or sniffed node). Out of the box you can choose to use ``http``, ``thrift`` or an experimental ``memcached`` protocol to communicate with the elasticsearch nodes. See :ref:`transports` for more information. The transport layer will create an instance of the selected connection class per node and keep track of the health of individual nodes - if a node becomes unresponsive (throwing exceptions while connecting to it) it's put on a timeout by the :class:`~elasticsearch.ConnectionPool` class and only returned to the circulation after the timeout is over (or when no live nodes are left). By default nodes are randomized before being passed into the pool and round-robin strategy is used for load balancing. You can customize this behavior by passing parameters to the :ref:`connection_api` (all keyword arguments to the :class:`~elasticsearch.Elasticsearch` class will be passed through). If what you want to accomplish is not supported you should be able to create a subclass of the relevant component and pass it in as a parameter to be used instead of the default implementation. Automatic Retries ~~~~~~~~~~~~~~~~~ If a connection to a node fails due to connection issues (raises :class:`~elasticsearch.ConnectionError`) it is considered in faulty state. It will be placed on hold for ``dead_timeout`` seconds and the request will be retried on another node. If a connection fails multiple times in a row the timeout will get progressively larger to avoid hitting a node that's, by all indication, down. If no live connection is availible, the connection that has the smallest timeout will be used. By default retries are not triggered by a timeout (:class:`~elasticsearch.ConnectionTimeout`), set ``retry_on_timeout`` to ``True`` to also retry on timeouts. .. _sniffing: Sniffing ~~~~~~~~ The client can be configured to inspect the cluster state to get a list of nodes upon startup, periodically and/or on failure. See :class:`~elasticsearch.Transport` parameters for details. 
Some example configurations:: from elasticsearch import Elasticsearch # by default we don't sniff, ever es = Elasticsearch() # you can specify to sniff on startup to inspect the cluster and load # balance across all nodes es = Elasticsearch(["seed1", "seed2"], sniff_on_start=True) # you can also sniff periodically and/or after failure: es = Elasticsearch(["seed1", "seed2"], sniff_on_start=True, sniff_on_connection_fail=True, sniffer_timeout=60) SSL and Authentication ~~~~~~~~~~~~~~~~~~~~~~ You can configure the client to use ``SSL`` for connecting to your elasticsearch cluster, including certificate verification and http auth:: from elasticsearch import Elasticsearch # you can use RFC-1738 to specify the url es = Elasticsearch(['https://user:secret@localhost:443']) # ... or specify common parameters as kwargs # use certifi for CA certificates import certifi es = Elasticsearch( ['localhost', 'otherhost'], http_auth=('user', 'secret'), port=443, use_ssl=True, verify_certs=True, ca_certs=certifi.where(), ) .. warning:: By default SSL certificates won't be verified, pass in ``verify_certs=True`` to make sure your certificates will get verified. The client doesn't ship with any CA certificates; easiest way to obtain the common set is by using the `certifi`_ package (as shown above). See class :class:`~elasticsearch.Urllib3HttpConnection` for detailed description of the options. .. _certifi: http://certifi.io/ Logging ~~~~~~~ ``elasticsearch-py`` uses the standard `logging library`_ from python to define two loggers: ``elasticsearch`` and ``elasticsearch.trace``. ``elasticsearch`` is used by the client to log standard activity, depending on the log level. ``elasticsearch.trace`` can be used to log requests to the server in the form of ``curl`` commands using pretty-printed json that can then be executed from command line. If the trace logger has not been configured already it is set to `propagate=False` so it needs to be activated separately. .. _logging library: http://docs.python.org/3.3/library/logging.html Environment considerations -------------------------- When using the client there are several limitations of your environment that could come into play. When using an http load balancer you cannot use the :ref:`sniffing` functionality - the cluster would supply the client with IP addresses to directly cnnect to the cluster, circumventing the load balancer. Depending on your configuration this might be something you don't want or break completely. In some environments (notably on Google App Engine) your http requests might be restricted so that ``GET`` requests won't accept body. In that case use the ``send_get_body_as`` parameter of :class:`~elasticsearch.Transport` to send all bodies via post:: from elasticsearch import Elasticsearch es = Elasticsearch(send_get_body_as='POST') Contents -------- .. toctree:: :maxdepth: 2 api exceptions connection transports helpers Changelog License ------- Copyright 2013 Elasticsearch Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
Indices and tables ------------------ * :ref:`genindex` * :ref:`modindex` * :ref:`search` elasticsearch-py-1.6.0/docs/transports.rst000066400000000000000000000021641253570341300206650ustar00rootroot00000000000000.. _transports: Transport classes ================= List of transport classes that can be used, simply import your choice and pass it to the constructor of :class:`~elasticsearch.Elasticsearch` as `connection_class`. Note that Thrift and Memcached protocols are experimental and require a plugin to be installed in your cluster as well as additional dependencies (`thrift==0.9` and `pylibmc==1.2`). For example to use the thrift connection just import it and use it. The connection classes are aware of their respective default ports (9500 for thrift) so there is no need to specify them unless modified:: from elasticsearch import Elasticsearch, ThriftConnection es = Elasticsearch(connection_class=ThriftConnection) .. py:module:: elasticsearch.connection Connection ---------- .. autoclass:: Connection Urllib3HttpConnection --------------------- .. autoclass:: Urllib3HttpConnection RequestsHttpConnection ---------------------- .. autoclass:: RequestsHttpConnection ThriftConnection ---------------- .. autoclass:: ThriftConnection MemcachedConnection ------------------- .. autoclass:: MemcachedConnection elasticsearch-py-1.6.0/elasticsearch/000077500000000000000000000000001253570341300175735ustar00rootroot00000000000000elasticsearch-py-1.6.0/elasticsearch/__init__.py000066400000000000000000000015541253570341300217110ustar00rootroot00000000000000from __future__ import absolute_import VERSION = (1, 6, 0) __version__ = VERSION __versionstr__ = '.'.join(map(str, VERSION)) import sys if (2, 7) <= sys.version_info < (3, 2): # On Python 2.7 and Python3 < 3.2, install no-op handler to silence # `No handlers could be found for logger "elasticsearch"` message per # import logging logger = logging.getLogger('elasticsearch') logger.addHandler(logging.NullHandler()) from .client import Elasticsearch from .transport import Transport from .connection_pool import ConnectionPool, ConnectionSelector, \ RoundRobinSelector from .serializer import JSONSerializer from .connection import Connection, RequestsHttpConnection, \ Urllib3HttpConnection, MemcachedConnection, ThriftConnection from .exceptions import * elasticsearch-py-1.6.0/elasticsearch/client/000077500000000000000000000000001253570341300210515ustar00rootroot00000000000000elasticsearch-py-1.6.0/elasticsearch/client/__init__.py000066400000000000000000002002751253570341300231700ustar00rootroot00000000000000from __future__ import unicode_literals import weakref import logging from ..transport import Transport from ..exceptions import NotFoundError, TransportError from ..compat import string_types, urlparse from .indices import IndicesClient from .cluster import ClusterClient from .cat import CatClient from .nodes import NodesClient from .snapshot import SnapshotClient from .utils import query_params, _make_path, SKIP_IN_PATH logger = logging.getLogger('elasticsearch') def _normalize_hosts(hosts): """ Helper function to transform hosts argument to :class:`~elasticsearch.Elasticsearch` to a list of dicts. 
""" # if hosts are empty, just defer to defaults down the line if hosts is None: return [{}] # passed in just one string if isinstance(hosts, string_types): hosts = [hosts] out = [] # normalize hosts to dicts for host in hosts: if isinstance(host, string_types): if '://' not in host: host = "//%s" % host parsed_url = urlparse(host) h = {"host": parsed_url.hostname} if parsed_url.port: h["port"] = parsed_url.port if parsed_url.scheme == "https": h['port'] = parsed_url.port or 443 h['use_ssl'] = True h['scheme'] = 'http' elif parsed_url.scheme: h['scheme'] = parsed_url.scheme if parsed_url.username or parsed_url.password: h['http_auth'] = '%s:%s' % (parsed_url.username, parsed_url.password) if parsed_url.path and parsed_url.path != '/': h['url_prefix'] = parsed_url.path out.append(h) else: out.append(host) return out class Elasticsearch(object): """ Elasticsearch low-level client. Provides a straightforward mapping from Python to ES REST endpoints. The instance has attributes ``cat``, ``cluster``, ``indices``, ``nodes`` and ``snapshot`` that provide access to instances of :class:`~elasticsearch.client.CatClient`, :class:`~elasticsearch.client.ClusterClient`, :class:`~elasticsearch.client.IndicesClient`, :class:`~elasticsearch.client.NodesClient` and :class:`~elasticsearch.client.SnapshotClient` respectively. This is the preferred (and only supported) way to get access to those classes and their methods. You can specify your own connection class which should be used by providing the ``connection_class`` parameter:: # create connection to localhost using the ThriftConnection es = Elasticsearch(connection_class=ThriftConnection) If you want to turn on :ref:`sniffing` you have several options (described in :class:`~elasticsearch.Transport`):: # create connection that will automatically inspect the cluster to get # the list of active nodes. Start with nodes running on 'esnode1' and # 'esnode2' es = Elasticsearch( ['esnode1', 'esnode2'], # sniff before doing anything sniff_on_start=True, # refresh nodes after a node fails to respond sniff_on_connection_fail=True, # and also every 60 seconds sniffer_timeout=60 ) Different hosts can have different parameters, use a dictionary per node to specify those:: # connect to localhost directly and another node using SSL on port 443 # and an url_prefix. Note that ``port`` needs to be an int. es = Elasticsearch([ {'host': 'localhost'}, {'host': 'othernode', 'port': 443, 'url_prefix': 'es', 'use_ssl': True}, ]) If using SSL, there are several parameters that control how we deal with certificates (see :class:`~elasticsearch.Urllib3HttpConnection` for detailed description of the options):: es = Elasticsearch( ['localhost:443', 'other_host:443'], # turn on SSL use_ssl=True, # make sure we verify SSL certificates (off by default) verify_certs=True, # provide a path to CA certs on disk ca_certs='/path/to/CA_certs' ) Alternatively you can use RFC-1738 formatted URLs, as long as they are not in conflict with other options:: es = Elasticsearch( [ 'http://user:secret@localhost:9200/', 'https://user:secret@other_host:443/production' ], verify_certs=True ) """ def __init__(self, hosts=None, transport_class=Transport, **kwargs): """ :arg hosts: list of nodes we should connect to. Node should be a dictionary ({"host": "localhost", "port": 9200}), the entire dictionary will be passed to the :class:`~elasticsearch.Connection` class as kwargs, or a string in the format of ``host[:port]`` which will be translated to a dictionary automatically. 
If no value is given the :class:`~elasticsearch.Urllib3HttpConnection` class defaults will be used. :arg transport_class: :class:`~elasticsearch.Transport` subclass to use. :arg kwargs: any additional arguments will be passed on to the :class:`~elasticsearch.Transport` class and, subsequently, to the :class:`~elasticsearch.Connection` instances. """ self.transport = transport_class(_normalize_hosts(hosts), **kwargs) # namespaced clients for compatibility with API names # use weakref to make GC's work a little easier self.indices = IndicesClient(weakref.proxy(self)) self.cluster = ClusterClient(weakref.proxy(self)) self.cat = CatClient(weakref.proxy(self)) self.nodes = NodesClient(weakref.proxy(self)) self.snapshot = SnapshotClient(weakref.proxy(self)) def __repr__(self): try: # get a lost of all connections cons = self.transport.hosts # truncate to 10 if there are too many if len(cons) > 5: cons = cons[:5] + ['...'] return '' % cons except: # probably operating on custom transport and connection_pool, ignore return super(Elasticsearch, self).__repr__() def _bulk_body(self, body): # if not passed in a string, serialize items and join by newline if not isinstance(body, string_types): body = '\n'.join(map(self.transport.serializer.dumps, body)) # bulk body must end with a newline if not body.endswith('\n'): body += '\n' return body @query_params() def ping(self, params=None): """ Returns True if the cluster is up, False otherwise. """ try: self.transport.perform_request('HEAD', '/', params=params) except TransportError: return False return True @query_params() def info(self, params=None): """ Get the basic info from the current cluster. """ _, data = self.transport.perform_request('GET', '/', params=params) return data @query_params('consistency', 'parent', 'percolate', 'refresh', 'routing', 'timeout', 'timestamp', 'ttl', 'version', 'version_type') def create(self, index, doc_type, body, id=None, params=None): """ Adds a typed JSON document in a specific index, making it searchable. Behind the scenes this method calls index(..., op_type='create') ``_ :arg index: The name of the index :arg doc_type: The type of the document :arg body: The document :arg id: Specific document ID (when the POST method is used) :arg consistency: Explicit write consistency setting for the operation :arg parent: ID of the parent document :arg percolate: Percolator queries to execute while indexing the document :arg refresh: Refresh the index after performing the operation :arg routing: Specific routing value :arg timeout: Explicit operation timeout :arg timestamp: Explicit timestamp for the document :arg ttl: Expiration time for the document :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ return self.index(index, doc_type, body, id=id, params=params, op_type='create') @query_params('consistency', 'op_type', 'parent', 'refresh', 'routing', 'timeout', 'timestamp', 'ttl', 'version', 'version_type') def index(self, index, doc_type, body, id=None, params=None): """ Adds or updates a typed JSON document in a specific index, making it searchable. 
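        A minimal usage sketch -- the index name, type, id and document body are
        purely illustrative and assume ``es`` is a connected
        :class:`~elasticsearch.Elasticsearch` instance::

            es.index(index='my-index', doc_type='my-type', id=42,
                     body={'title': 'Hello', 'published': True})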
``_ :arg index: The name of the index :arg doc_type: The type of the document :arg body: The document :arg id: Document ID :arg consistency: Explicit write consistency setting for the operation :arg op_type: Explicit operation type (default: index) :arg parent: ID of the parent document :arg refresh: Refresh the index after performing the operation :arg routing: Specific routing value :arg timeout: Explicit operation timeout :arg timestamp: Explicit timestamp for the document :arg ttl: Expiration time for the document :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (index, doc_type, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") method = 'POST' if id in SKIP_IN_PATH else 'PUT' _, data = self.transport.perform_request(method, _make_path(index, doc_type, id), params=params, body=body) return data @query_params('parent', 'preference', 'realtime', 'refresh', 'routing') def exists(self, index, id, doc_type='_all', params=None): """ Returns a boolean indicating whether or not given document exists in Elasticsearch. ``_ :arg index: The name of the index :arg id: The document ID :arg doc_type: The type of the document (uses `_all` by default to fetch the first document matching the ID across all types) :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg routing: Specific routing value """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") try: self.transport.perform_request('HEAD', _make_path(index, doc_type, id), params=params) except NotFoundError: return False return True @query_params('_source', '_source_exclude', '_source_include', 'fields', 'parent', 'preference', 'realtime', 'refresh', 'routing', 'version', 'version_type') def get(self, index, id, doc_type='_all', params=None): """ Get a typed JSON document from the index based on its id. 
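        For example (illustrative values, assuming ``es`` is an
        :class:`~elasticsearch.Elasticsearch` instance and the document exists)::

            doc = es.get(index='my-index', doc_type='my-type', id=42)
            source = doc['_source']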
``_ :arg index: The name of the index :arg id: The document ID :arg doc_type: The type of the document (uses `_all` by default to fetch the first document matching the ID across all types) :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg fields: A comma-separated list of fields to return in the response :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg routing: Specific routing value :arg version: Explicit version number for concurrency control :arg version_type: Explicit version number for concurrency control """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, id), params=params) return data @query_params('_source', '_source_exclude', '_source_include', 'parent', 'preference', 'realtime', 'refresh', 'routing', 'version', 'version_type') def get_source(self, index, id, doc_type='_all', params=None): """ Get the source of a document by it's index, type and id. ``_ :arg index: The name of the index :arg doc_type: The type of the document (uses `_all` by default to fetch the first document matching the ID across all types) :arg id: The document ID :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg routing: Specific routing value :arg version: Explicit version number for concurrency control :arg version_type: Explicit version number for concurrency control """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, id, '_source'), params=params) return data @query_params('_source', '_source_exclude', '_source_include', 'fields', 'parent', 'preference', 'realtime', 'refresh', 'routing') def mget(self, body, index=None, doc_type=None, params=None): """ Get multiple documents based on an index, type (optional) and ids. ``_ :arg body: Document identifiers; can be either `docs` (containing full document information) or `ids` (when index and type is provided in the URL. 
:arg index: The name of the index :arg doc_type: The type of the document :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg fields: A comma-separated list of fields to return in the response :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg routing: Specific routing value """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_mget'), params=params, body=body) return data @query_params('consistency', 'fields', 'lang', 'parent', 'refresh', 'retry_on_conflict', 'routing', 'script', 'script_id', 'scripted_upsert', 'timeout', 'timestamp', 'ttl', 'version', 'version_type') def update(self, index, doc_type, id, body=None, params=None): """ Update a document based on a script or partial data provided. ``_ :arg index: The name of the index :arg doc_type: The type of the document :arg id: Document ID :arg body: The request definition using either `script` or partial `doc` :arg consistency: Explicit write consistency setting for the operation :arg fields: A comma-separated list of fields to return in the response :arg lang: The script language (default: mvel) :arg parent: ID of the parent document :arg refresh: Refresh the index after performing the operation :arg retry_on_conflict: Specify how many times should the operation be retried when a conflict occurs (default: 0) :arg routing: Specific routing value :arg script: The URL-encoded script definition (instead of using request body) :arg script_id: The id of a stored script :arg scripted_upsert: True if the script referenced in script or script_id should be called to perform inserts - defaults to false :arg timeout: Explicit operation timeout :arg timestamp: Explicit timestamp for the document :arg ttl: Expiration time for the document :arg version: Explicit version number for concurrency control :arg version_type: Explicit version number for concurrency control """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('POST', _make_path(index, doc_type, id, '_update'), params=params, body=body) return data @query_params('_source', '_source_exclude', '_source_include', 'analyze_wildcard', 'analyzer', 'default_operator', 'df', 'explain', 'fielddata_fields', 'fields', 'indices_boost', 'lenient', 'allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'lowercase_expanded_terms', 'from_', 'preference', 'q', 'query_cache', 'routing', 'scroll', 'search_type', 'size', 'sort', 'source', 'stats', 'suggest_field', 'suggest_mode', 'suggest_size', 'suggest_text', 'terminate_after', 'timeout', 'track_scores', 'version') def search(self, index=None, doc_type=None, body=None, params=None): """ Execute a search query and get back search hits that match the query. 
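        A minimal usage sketch with an illustrative index name and a simple
        ``match`` query (any Query DSL body can be passed as ``body``)::

            res = es.search(index='my-index', size=10,
                            body={'query': {'match': {'title': 'hello'}}})
            hits = res['hits']['hits']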
``_ :arg index: A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices :arg doc_type: A comma-separated list of document types to search; leave empty to perform the operation on all types :arg body: The search definition using the Query DSL :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg default_operator: The default operator for query string query (AND or OR) (default: OR) :arg df: The field to use as default where no field prefix is given in the query string :arg explain: Specify whether to return detailed information about score computation as part of a hit :arg fielddata_fields: A comma-separated list of fields to return as the field data representation of a field for each hit :arg fields: A comma-separated list of fields to return as part of a hit :arg indices_boost: Comma-separated list of index boosts :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg from\_: Starting offset (default: 0) :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg query_cache: Enable or disable caching on a per-query basis :arg routing: A comma-separated list of specific routing values :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg search_type: Search operation type :arg size: Number of hits to return (default: 10) :arg sort: A comma-separated list of : pairs :arg source: The URL-encoded request definition using the Query DSL (instead of using request body) :arg stats: Specific 'tag' of the request for logging and statistical purposes :arg suggest_field: Specify which field to use for suggestions :arg suggest_mode: Specify suggest mode (default: missing) :arg suggest_size: How many suggestions to return in response :arg suggest_text: The source text for which the suggestions should be returned :arg terminate_after: The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early. 
:arg timeout: Explicit operation timeout :arg track_scores: Whether to calculate and return scores even if they are not used for sorting :arg version: Specify whether to return document version as part of a hit """ # from is a reserved word so it cannot be used, use from_ instead if 'from_' in params: params['from'] = params.pop('from_') if doc_type and not index: index = '_all' _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_search'), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local', 'preference', 'routing') def search_shards(self, index=None, doc_type=None, params=None): """ The search shards api returns the indices and shards that a search request would be executed against. This can give useful feedback for working out issues or planning optimizations with routing and shard preferences. ``_ :arg index: The name of the index :arg doc_type: The type of the document :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. (default: '"open"') :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: Specific routing value """ _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_search_shards'), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'preference', 'routing', 'scroll', 'search_type') def search_template(self, index=None, doc_type=None, body=None, params=None): """ A query that accepts a query template and a map of key/value pairs to fill in template parameters. ``_ :arg index: A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices :arg doc_type: A comma-separated list of document types to search; leave empty to perform the operation on all types :arg body: The search definition template and its params :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: A comma-separated list of specific routing values :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg search_type: Search operation type """ _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_search', 'template'), params=params, body=body) return data @query_params('_source', '_source_exclude', '_source_include', 'analyze_wildcard', 'analyzer', 'default_operator', 'df', 'fields', 'lenient', 'lowercase_expanded_terms', 'parent', 'preference', 'q', 'routing', 'source') def explain(self, index, doc_type, id, body=None, params=None): """ The explain api computes a score explanation for a query and a specific document. This can give useful feedback whether a document matches or didn't match a specific query. ``_ :arg index: The name of the index :arg doc_type: The type of the document :arg id: The document ID :arg body: The query definition using the Query DSL :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg analyze_wildcard: Specify whether wildcards and prefix queries in the query string query should be analyzed (default: false) :arg analyzer: The analyzer for the query string query :arg default_operator: The default operator for query string query (AND or OR), (default: OR) :arg df: The default field for query string query (default: _all) :arg fields: A comma-separated list of fields to return in the response :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg routing: Specific routing value :arg source: The URL-encoded query definition (instead of using the request body) """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, id, '_explain'), params=params, body=body) return data @query_params('scroll') def scroll(self, scroll_id=None, body=None, params=None): """ Scroll a search request created by specifying the scroll parameter. 
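        A usage sketch (the index name and scroll window are illustrative); the
        initial scroll id comes from a search issued with the ``scroll``
        parameter::

            page = es.search(index='my-index', scroll='1m', size=100,
                             body={'query': {'match_all': {}}})
            page = es.scroll(scroll_id=page['_scroll_id'], scroll='1m')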
``_ :arg scroll_id: The scroll ID :arg body: The scroll ID if not passed by URL or query parameter :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search """ if scroll_id in SKIP_IN_PATH and body in SKIP_IN_PATH: raise ValueError("You need to supply scroll_id or body.") elif scroll_id and not body: body = scroll_id elif scroll_id: params['scroll_id'] = scroll_id _, data = self.transport.perform_request('GET', '/_search/scroll', params=params, body=body) return data @query_params() def clear_scroll(self, scroll_id=None, body=None, params=None): """ Clear the scroll request created by specifying the scroll parameter to search. ``_ :arg scroll_id: The scroll ID or a list of scroll IDs :arg body: A comma-separated list of scroll IDs to clear if none was specified via the scroll_id parameter """ _, data = self.transport.perform_request('DELETE', _make_path('_search', 'scroll', scroll_id), body=body, params=params) return data @query_params('consistency', 'parent', 'refresh', 'routing', 'timeout', 'version', 'version_type') def delete(self, index, doc_type, id, params=None): """ Delete a typed JSON document from a specific index based on its id. ``_ :arg index: The name of the index :arg doc_type: The type of the document :arg id: The document ID :arg consistency: Specific write consistency setting for the operation :arg parent: ID of parent document :arg refresh: Refresh the index after performing the operation :arg routing: Specific routing value :arg timeout: Explicit operation timeout :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('DELETE', _make_path(index, doc_type, id), params=params) return data @query_params('allow_no_indices', 'analyze_wildcard', 'analyzer', 'default_operator', 'df', 'expand_wildcards', 'ignore_unavailable', 'min_score', 'lenient', 'lowercase_expanded_terms', 'min_score', 'preference', 'q', 'routing') def count(self, index=None, doc_type=None, body=None, params=None): """ Execute a query and get the number of matches for that query. ``_ :arg index: A comma-separated list of indices to restrict the results :arg doc_type: A comma-separated list of types to restrict the results :arg body: A query to restrict the results (optional) :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg default_operator: The default operator for query string query (AND or OR), default u'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg min_score: Include only documents with a specific `_score` value in the result :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg min_score: Include only documents with a specific `_score` value in the result :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg routing: Specific routing value """ if doc_type and not index: index = '_all' _, data = self.transport.perform_request('POST', _make_path(index, doc_type, '_count'), params=params, body=body) return data @query_params('consistency', 'refresh', 'routing', 'timeout') def bulk(self, body, index=None, doc_type=None, params=None): """ Perform many index/delete operations in a single API call. ``_ See the :func:`~elasticsearch.helpers.bulk` helper function for a more friendly API. :arg body: The operation definition and data (action-data pairs), as either a newline separated string, or a sequence of dicts to serialize (one per row). :arg index: Default index for items which don't provide one :arg doc_type: Default document type for items which don't provide one :arg consistency: Explicit write consistency setting for the operation :arg refresh: Refresh the index after performing the operation :arg routing: Specific routing value :arg timeout: Explicit operation timeout """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") _, data = self.transport.perform_request('POST', _make_path(index, doc_type, '_bulk'), params=params, body=self._bulk_body(body)) return data @query_params('search_type') def msearch(self, body, index=None, doc_type=None, params=None): """ Execute several search requests within the same API. ``_ :arg body: The request definitions (metadata-search request definition pairs), as either a newline separated string, or a sequence of dicts to serialize (one per row). :arg index: A comma-separated list of index names to use as default :arg doc_type: A comma-separated list of document types to use as default :arg search_type: Search operation type """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_msearch'), params=params, body=self._bulk_body(body)) return data @query_params('allow_no_indices', 'analyzer', 'consistency', 'default_operator', 'df', 'expand_wildcards', 'ignore_unavailable', 'q', 'routing', 'timeout') def delete_by_query(self, index, doc_type=None, body=None, params=None): """ Delete documents from one or more indices and one or more types based on a query. 
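        An illustrative sketch (index, type and query are examples only; note
        that this removes every document matching the query)::

            es.delete_by_query(index='my-index', doc_type='my-type',
                               body={'query': {'term': {'status': 'stale'}}})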
``_ :arg index: A comma-separated list of indices to restrict the operation; use `_all` to perform the operation on all indices :arg doc_type: A comma-separated list of types to restrict the operation :arg body: A query to restrict the operation specified with the Query DSL :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg analyzer: The analyzer to use for the query string :arg consistency: Specific write consistency setting for the operation :arg default_operator: The default operator for query string query (AND or OR), default u'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg q: Query in the Lucene query string syntax :arg routing: Specific routing value :arg timeout: Explicit operation timeout """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") _, data = self.transport.perform_request('DELETE', _make_path(index, doc_type, '_query'), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'preference', 'routing') def suggest(self, body, index=None, params=None): """ The suggest feature suggests similar looking terms based on a provided text by using a suggester. ``_ :arg index: A comma-separated list of index names to restrict the operation; use `_all` or empty string to perform the operation on all indices :arg body: The request definition :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: Specific routing value """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") _, data = self.transport.perform_request('POST', _make_path(index, '_suggest'), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'percolate_format', 'percolate_index', 'percolate_preference', 'percolate_routing', 'percolate_type', 'preference', 'routing', 'version', 'version_type') def percolate(self, index, doc_type, id=None, body=None, params=None): """ The percolator allows to register queries against an index, and then send percolate requests which include a doc, and getting back the queries that match on that doc out of the set of registered queries. ``_ :arg index: The index of the document being percolated. :arg doc_type: The type of the document being percolated. :arg id: Substitute the document in the request body with a document that is known by the specified id. On top of the id, the index and type parameter will be used to retrieve the document from within the cluster. 
:arg body: The percolator request definition using the percolate DSL :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg percolate_format: Return an array of matching query IDs instead of objects :arg percolate_index: The index to percolate the document into. Defaults to index. :arg percolate_preference: Which shard to prefer when executing the percolate request. :arg percolate_routing: The routing value to use when percolating the existing document. :arg percolate_type: The type to percolate document into. Defaults to type. :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: A comma-separated list of specific routing values :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (index, doc_type): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, id, '_percolate'), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable') def mpercolate(self, body, index=None, doc_type=None, params=None): """ The percolator allows to register queries against an index, and then send percolate requests which include a doc, and getting back the queries that match on that doc out of the set of registered queries. ``_ :arg index: The index of the document being count percolated to use as default :arg doc_type: The type of the document being percolated to use as default. :arg body: The percolate request definitions (header & body pair), separated by newlines :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_mpercolate'), params=params, body=self._bulk_body(body)) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'percolate_index', 'percolate_type', 'preference', 'routing', 'version', 'version_type') def count_percolate(self, index, doc_type, id=None, body=None, params=None): """ The percolator allows to register queries against an index, and then send percolate requests which include a doc, and getting back the queries that match on that doc out of the set of registered queries. ``_ :arg index: The index of the document being count percolated. :arg doc_type: The type of the document being count percolated. :arg id: Substitute the document in the request body with a document that is known by the specified id. On top of the id, the index and type parameter will be used to retrieve the document from within the cluster. 
:arg body: The count percolator request definition using the percolate DSL :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg percolate_index: The index to count percolate the document into. Defaults to index. :arg percolate_type: The type to count percolate document into. Defaults to type. :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: A comma-separated list of specific routing values :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (index, doc_type): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, id, '_percolate', 'count'), params=params, body=body) return data @query_params('boost_terms', 'include', 'max_doc_freq', 'max_query_terms', 'max_word_length', 'min_doc_freq', 'min_term_freq', 'min_word_length', 'mlt_fields', 'percent_terms_to_match', 'routing', 'search_from', 'search_indices', 'search_query_hint', 'search_scroll', 'search_size', 'search_source', 'search_type', 'search_types', 'stop_words') def mlt(self, index, doc_type, id, body=None, params=None): """ Get documents that are "like" a specified document. ``_ :arg index: The name of the index :arg doc_type: The type of the document (use `_all` to fetch the first document matching the ID across all types) :arg id: The document ID :arg body: A specific search request definition :arg boost_terms: The boost factor :arg include: Whether to include the queried document from the response :arg max_doc_freq: The word occurrence frequency as count: words with higher occurrence in the corpus will be ignored :arg max_query_terms: The maximum query terms to be included in the generated query :arg max_word_length: The minimum length of the word: longer words will be ignored :arg min_doc_freq: The word occurrence frequency as count: words with lower occurrence in the corpus will be ignored :arg min_term_freq: The term frequency as percent: terms with lower occurence in the source document will be ignored :arg min_word_length: The minimum length of the word: shorter words will be ignored :arg mlt_fields: Specific fields to perform the query against :arg percent_terms_to_match: How many terms have to match in order to consider the document a match (default: 0.3) :arg routing: Specific routing value :arg search_from: The offset from which to return results :arg search_indices: A comma-separated list of indices to perform the query against (default: the index containing the document) :arg search_query_hint: The search query hint :arg search_scroll: A scroll search request definition :arg search_size: The number of documents to return (default: 10) :arg search_source: A specific search request definition (instead of using the request body) :arg search_type: Specific search type (eg. 
`dfs_then_fetch`, `count`, etc) :arg search_types: A comma-separated list of types to perform the query against (default: the same type as the document) :arg stop_words: A list of stop words to be ignored """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, id, '_mlt'), params=params, body=body) return data @query_params('dfs', 'field_statistics', 'fields', 'offsets', 'parent', 'payloads', 'positions', 'preference', 'realtime', 'routing', 'term_statistics', 'version', 'version_type') def termvectors(self, index, doc_type, id=None, body=None, params=None): """ Returns information and statistics on terms in the fields of a particular document. The document could be stored in the index or artificially provided by the user (Added in 1.4). Note that for documents stored in the index, this is a near realtime API as the term vectors are not available until the next refresh. `` :arg index: The index in which the document resides. :arg doc_type: The type of the document. :arg id: The id of the document, when not specified a doc param should be supplied. :arg body: Define parameters and or supply a document to get termvectors for. See documentation. :arg dfs: Specifies if distributed frequencies should be returned instead shard frequencies., default False :arg field_statistics: Specifies if document count, sum of document frequencies and sum of total term frequencies should be returned., default True :arg fields: A comma-separated list of fields to return. :arg offsets: Specifies if term offsets should be returned., default True :arg parent: Parent id of documents. :arg payloads: Specifies if term payloads should be returned., default True :arg positions: Specifies if term positions should be returned., default True :arg preference: Specify the node or shard the operation should be performed on (default: random). :arg realtime: Specifies if request is real-time as opposed to near- real-time (default: true). :arg routing: Specific routing value. :arg term_statistics: Specifies if total term frequency and document frequency should be returned., default False :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, id, '_termvectors'), params=params, body=body) return data @query_params('field_statistics', 'fields', 'offsets', 'parent', 'payloads', 'positions', 'preference', 'realtime', 'routing', 'term_statistics') def termvector(self, index, doc_type, id, body=None, params=None): for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path(index, doc_type, id, '_termvector'), params=params, body=body) return data termvector.__doc__ = termvectors.__doc__ @query_params('field_statistics', 'fields', 'ids', 'offsets', 'parent', 'payloads', 'positions', 'preference', 'realtime', 'routing', 'term_statistics') def mtermvectors(self, index=None, doc_type=None, body=None, params=None): """ Multi termvectors API allows to get multiple termvectors based on an index, type and id. ``_ :arg index: The index in which the document resides. :arg doc_type: The type of the document. 
:arg body: Define ids, parameters or a list of parameters per document here. You must at least provide a list of document ids. See documentation. :arg field_statistics: Specifies if document count, sum of document frequencies and sum of total term frequencies should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default True :arg fields: A comma-separated list of fields to return. Applies to all returned documents unless otherwise specified in body "params" or "docs". :arg ids: A comma-separated list of documents ids. You must define ids as parameter or set "ids" or "docs" in the request body :arg offsets: Specifies if term offsets should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default True :arg parent: Parent id of documents. Applies to all returned documents unless otherwise specified in body "params" or "docs". :arg payloads: Specifies if term payloads should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default True :arg positions: Specifies if term positions should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default True :arg preference: Specify the node or shard the operation should be performed on (default: random) .Applies to all returned documents unless otherwise specified in body "params" or "docs". :arg realtime: Specifies if requests are real-time as opposed to near- real-time (default: true). :arg routing: Specific routing value. Applies to all returned documents unless otherwise specified in body "params" or "docs". :arg term_statistics: Specifies if total term frequency and document frequency should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default False """ _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_mtermvectors'), params=params, body=body) return data @query_params('op_type', 'version', 'version_type') def put_script(self, lang, id, body, params=None): """ Create a script in given language with specified ID. ``_ :arg lang: Script language :arg id: Script ID :arg body: The document :arg op_type: Explicit operation type, default u'index' :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (lang, id, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('PUT', _make_path('_scripts', lang, id), params=params, body=body) return data @query_params('version', 'version_type') def get_script(self, lang, id, params=None): """ Retrieve a script from the API. ``_ :arg lang: Script language :arg id: Script ID :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (lang, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path('_scripts', lang, id), params=params) return data @query_params('version', 'version_type') def delete_script(self, lang, id, params=None): """ Remove a stored script from elasticsearch. 
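        For example (the script language and ID below are illustrative)::

            es.delete_script(lang='groovy', id='my-script')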
``_ :arg lang: Script language :arg id: Script ID :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (lang, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('DELETE', _make_path('_scripts', lang, id), params=params) return data @query_params('op_type', 'version', 'version_type') def put_template(self, id, body, params=None): """ Create a search template. ``_ :arg id: Template ID :arg body: The document :arg op_type: Explicit operation type, default u'index' :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ for param in (id, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('PUT', _make_path('_search', 'template', id), params=params, body=body) return data @query_params('version', 'version_type') def get_template(self, id, params=None): """ Retrieve a search template. ``_ :arg id: Template ID :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ if id in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'id'.") _, data = self.transport.perform_request('GET', _make_path('_search', 'template', id), params=params) return data @query_params('version', 'version_type') def delete_template(self, id=None, params=None): """ Delete a search template. ``_ :arg id: Template ID :arg version: Explicit version number for concurrency control :arg version_type: Specific version type """ _, data = self.transport.perform_request('DELETE', _make_path('_search', 'template', id), params=params) return data @query_params('allow_no_indices', 'analyze_wildcard', 'analyzer', 'default_operator', 'df', 'expand_wildcards', 'ignore_unavailable', 'lenient', 'lowercase_expanded_terms', 'min_score', 'preference', 'q', 'routing') def search_exists(self, index=None, doc_type=None, body=None, params=None): """ The exists API allows to easily determine if any matching documents exist for a provided query. ``_ :arg index: A comma-separated list of indices to restrict the results :arg doc_type: A comma-separated list of types to restrict the results :arg body: A query to restrict the results specified with the Query DSL (optional) :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg default_operator: The default operator for query string query (AND or OR), default u'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg min_score: Include only documents with a specific `_score` value in the result :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg routing: Specific routing value """ try: self.transport.perform_request('POST', _make_path(index, doc_type, '_search', 'exists'), params=params, body=body) except NotFoundError: return False return True @query_params('allow_no_indices', 'expand_wildcards', 'fields', 'ignore_unavailable', 'level') def field_stats(self, index=None, params=None): """ The field stats api allows one to find statistical properties of a field without executing a search, but looking up measurements that are natively available in the Lucene index. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg fields: A comma-separated list of fields for to get field statistics for (min value, max value, and more) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg level: Defines if field stats should be returned on a per index level or on a cluster wide level, default u'cluster' """ _, data = self.transport.perform_request('GET', _make_path(index, '_field_stats'), params=params) return data elasticsearch-py-1.6.0/elasticsearch/client/cat.py000066400000000000000000000326441253570341300222030ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path class CatClient(NamespacedClient): @query_params('h', 'help', 'local', 'master_timeout', 'v') def aliases(self, name=None, params=None): """ ``_ :arg name: A comma-separated list of alias names to return :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. 
Display column headers, default False """ _, data = self.transport.perform_request('GET', _make_path('_cat', 'aliases', name), params=params) return data @query_params('bytes', 'h', 'help', 'local', 'master_timeout', 'v') def allocation(self, node_id=None, params=None): """ Allocation provides a snapshot of how shards have located around the cluster and the state of disk usage. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information :arg bytes: The unit in which to display byte values :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', _make_path('_cat', 'allocation', node_id), params=params) return data @query_params('h', 'help', 'local', 'master_timeout', 'v') def count(self, index=None, params=None): """ Count provides quick access to the document count of the entire cluster, or individual indices. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', _make_path('_cat', 'count', index), params=params) return data @query_params('h', 'help', 'local', 'master_timeout', 'ts', 'v') def health(self, params=None): """ health is a terse, one-line representation of the same information from :meth:`~elasticsearch.client.cluster.ClusterClient.health` API ``_ :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg ts: Set to false to disable timestamping, default True :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', '/_cat/health', params=params) return data @query_params('help') def help(self, params=None): """ A simple help for the cat api. ``_ :arg help: Return help information, default False """ _, data = self.transport.perform_request('GET', '/_cat', params=params) return data @query_params('bytes', 'h', 'help', 'local', 'master_timeout', 'pri', 'v') def indices(self, index=None, params=None): """ The indices command provides a cross-section of each index. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg bytes: The unit in which to display byte values :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg pri: Set to true to return stats only for primary shards, default False :arg v: Verbose mode. 
Display column headers, default False """ _, data = self.transport.perform_request('GET', _make_path('_cat', 'indices', index), params=params) return data @query_params('h', 'help', 'local', 'master_timeout', 'v') def master(self, params=None): """ Displays the master's node ID, bound IP address, and node name. ``_ :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', '/_cat/master', params=params) return data @query_params('h', 'help', 'local', 'master_timeout', 'v') def nodes(self, params=None): """ The nodes command shows the cluster topology. ``_ :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', '/_cat/nodes', params=params) return data @query_params('bytes', 'h', 'help', 'local', 'master_timeout', 'v') def recovery(self, index=None, params=None): """ recovery is a view of shard replication. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg bytes: The unit in which to display byte values :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', _make_path('_cat', 'recovery', index), params=params) return data @query_params('h', 'help', 'local', 'master_timeout', 'v') def shards(self, index=None, params=None): """ The shards command is the detailed view of what nodes contain which shards. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', _make_path('_cat', 'shards', index), params=params) return data @query_params('h', 'help', 'local', 'master_timeout', 'v') def segments(self, index=None, params=None): """ The segments command is the detailed view of Lucene segments per index. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. 
Display column headers, default False """ _, data = self.transport.perform_request('GET', _make_path('_cat', 'segments', index), params=params) return data @query_params('h', 'help', 'local', 'master_timeout', 'v') def pending_tasks(self, params=None): """ pending_tasks provides the same information as the :meth:`~elasticsearch.client.cluster.ClusterClient.pending_tasks` API in a convenient tabular format. ``_ :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', '/_cat/pending_tasks', params=params) return data @query_params('full_id', 'h', 'help', 'local', 'master_timeout', 'v') def thread_pool(self, params=None): """ Get information about thread pools. ``_ :arg full_id: Enables displaying the complete node ids (default: 'false') :arg h: Comma-separated list of column names to display :arg help: Return help information (default: 'false') :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers (default: 'false') """ _, data = self.transport.perform_request('GET', '/_cat/thread_pool', params=params) return data @query_params('bytes', 'fields', 'h', 'help', 'local', 'master_timeout', 'v') def fielddata(self, fields=None, params=None): """ Shows information about currently loaded fielddata on a per-node basis. ``_ :arg fields: A comma-separated list of fields to return the fielddata size :arg bytes: The unit in which to display byte values :arg fields: A comma-separated list of fields to return the fielddata size :arg h: Comma-separated list of column names to display :arg help: Return help information (default: 'false') :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers (default: 'false') """ _, data = self.transport.perform_request('GET', _make_path('_cat', 'fielddata', fields), params=params) return data @query_params('h', 'help', 'local', 'master_timeout', 'v') def plugins(self, params=None): """ ``_ :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg v: Verbose mode. Display column headers, default False """ _, data = self.transport.perform_request('GET', '/_cat/plugins', params=params) return data elasticsearch-py-1.6.0/elasticsearch/client/cluster.py000066400000000000000000000161371253570341300231140ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path class ClusterClient(NamespacedClient): @query_params('level', 'local', 'master_timeout', 'timeout', 'wait_for_active_shards', 'wait_for_nodes', 'wait_for_relocating_shards', 'wait_for_status') def health(self, index=None, params=None): """ Get a very simple status on the health of the cluster. 
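        A typical usage sketch (the target status and timeout are just
        examples, assuming ``es`` is an :class:`~elasticsearch.Elasticsearch`
        instance)::

            health = es.cluster.health(wait_for_status='yellow', timeout='30s')
            status = health['status']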
``_ :arg index: Limit the information returned to a specific index :arg level: Specify the level of detail for returned information, default u'cluster' :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout :arg wait_for_active_shards: Wait until the specified number of shards is active :arg wait_for_nodes: Wait until the specified number of nodes is available :arg wait_for_relocating_shards: Wait until the specified number of relocating shards is finished :arg wait_for_status: Wait until cluster is in a specific state, default None """ _, data = self.transport.perform_request('GET', _make_path('_cluster', 'health', index), params=params) return data @query_params('local', 'master_timeout') def pending_tasks(self, params=None): """ The pending cluster tasks API returns a list of any cluster-level changes (e.g. create index, update mapping, allocate or fail shard) which have not yet been executed. ``_ :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Specify timeout for connection to master """ _, data = self.transport.perform_request('GET', '/_cluster/pending_tasks', params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'flat_settings', 'ignore_unavailable', 'local', 'master_timeout') def state(self, metric=None, index=None, params=None): """ Get a comprehensive state information of the whole cluster. ``_ :arg metric: Limit the information returned to the specified metrics. Possible values: "_all", "blocks", "index_templates", "metadata", "nodes", "routing_table", "master_node", "version" :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether wildcard expressions should get expanded to open or closed indices (default: open) :arg flat_settings: Return settings in flat format (default: false) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Specify timeout for connection to master """ if index and not metric: metric = '_all' _, data = self.transport.perform_request('GET', _make_path('_cluster', 'state', metric, index), params=params) return data @query_params('flat_settings', 'human') def stats(self, node_id=None, params=None): """ The Cluster Stats API allows to retrieve statistics from a cluster wide perspective. The API returns basic index metrics and information about the current nodes that form the cluster. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg flat_settings: Return settings in flat format (default: false) :arg human: Whether to return time and byte values in human-readable format. 
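# A brief sketch of the cluster APIs described above, assuming a local client;
# the 'yellow' status and '5s' timeout are example values only.
from elasticsearch import Elasticsearch

es = Elasticsearch()
health = es.cluster.health(wait_for_status='yellow', timeout='5s')
print(health['status'])                           # e.g. 'green' or 'yellow'
state = es.cluster.state(metric='metadata')       # only the metadata portion of the state
stats = es.cluster.stats()                        # cluster-wide index and node statistics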
""" url = '/_cluster/stats' if node_id: url = _make_path('_cluster/stats/nodes', node_id) _, data = self.transport.perform_request('GET', url, params=params) return data @query_params('dry_run', 'explain', 'master_timeout', 'metric', 'timeout') def reroute(self, body=None, params=None): """ Explicitly execute a cluster reroute allocation command including specific commands. ``_ :arg body: The definition of `commands` to perform (`move`, `cancel`, `allocate`) :arg dry_run: Simulate the operation only and return the resulting state :arg explain: Return an explanation of why the commands can or cannot be executed :arg filter_metadata: Don't return cluster state metadata (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg metric: Limit the information returned to the specified metrics. Defaults to all but metadata :arg timeout: Explicit operation timeout """ _, data = self.transport.perform_request('POST', '/_cluster/reroute', params=params, body=body) return data @query_params('flat_settings', 'master_timeout', 'timeout') def get_settings(self, params=None): """ Get cluster settings. ``_ :arg flat_settings: Return settings in flat format (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ _, data = self.transport.perform_request('GET', '/_cluster/settings', params=params) return data @query_params('flat_settings', 'master_timeout', 'timeout') def put_settings(self, body, params=None): """ Update cluster wide specific settings. ``_ :arg body: The settings to be updated. Can be either `transient` or `persistent` (survives cluster restart). :arg flat_settings: Return settings in flat format (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ _, data = self.transport.perform_request('PUT', '/_cluster/settings', params=params, body=body) return data elasticsearch-py-1.6.0/elasticsearch/client/indices.py000066400000000000000000001434301253570341300230460ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path, SKIP_IN_PATH from ..exceptions import NotFoundError class IndicesClient(NamespacedClient): @query_params('analyzer', 'char_filters', 'field', 'filters', 'format', 'prefer_local', 'text', 'tokenizer') def analyze(self, index=None, body=None, params=None): """ Perform the analysis process on a text and return the tokens breakdown of the text. 
``_ :arg index: The name of the index to scope the operation :arg body: The text on which the analysis should be performed :arg analyzer: The name of the analyzer to use :arg char_filters: A comma-separated list of character filters to use for the analysis :arg field: Use the analyzer configured for this field (instead of passing the analyzer name) :arg filters: A comma-separated list of filters to use for the analysis :arg format: Format of the output, default u'detailed' :arg prefer_local: With `true`, specify that a local shard should be used if available, with `false`, use a random shard (default: true) :arg text: The text on which the analysis should be performed (when request body is not used) :arg tokenizer: The name of the tokenizer to use for the analysis """ _, data = self.transport.perform_request('GET', _make_path(index, '_analyze'), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'force') def refresh(self, index=None, params=None): """ Explicitly refresh one or more index, making all operations performed since the last refresh available for search. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones, default u'none' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg force: Force a refresh even if not required """ _, data = self.transport.perform_request('POST', _make_path(index, '_refresh'), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'force', 'full', 'ignore_indices', 'ignore_unavailable', 'wait_if_ongoing') def flush(self, index=None, params=None): """ Explicitly flush one or more indices. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string for all indices :arg force: Whether a flush should be forced even if it is not necessarily needed ie. if no changes will be committed to the index. :arg full: If set to true a new index writer is created and settings that have been changed related to the index writer will be refreshed. :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones (default: none) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg wait_if_ongoing: If set to true the flush operation will block until the flush can be executed if another flush operation is already executing. The default is false and will cause an exception to be thrown on the shard level if another flush operation is already running. 
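# A short sketch of analyze/refresh/flush as documented above; 'my-index' and the
# sample text are placeholder assumptions and the index is assumed to exist.
from elasticsearch import Elasticsearch

es = Elasticsearch()
tokens = es.indices.analyze(index='my-index', analyzer='standard',
                            body='Quick brown fox')
print([t['token'] for t in tokens['tokens']])
es.indices.refresh(index='my-index')                       # make recent writes searchable
es.indices.flush(index='my-index', wait_if_ongoing=True)   # commit the transaction log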
""" _, data = self.transport.perform_request('POST', _make_path(index, '_flush'), params=params) return data @query_params('timeout', 'master_timeout') def create(self, index, body=None, params=None): """ Create an index in Elasticsearch. ``_ :arg index: The name of the index :arg body: The configuration for the index (`settings` and `mappings`) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") _, data = self.transport.perform_request('PUT', _make_path(index), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local') def get(self, index, feature=None, params=None): """ The get index API allows to retrieve information about one or more indexes. ``_ :arg index: A comma-separated list of index names :arg feature: A comma-separated list of features :arg allow_no_indices: Ignore if a wildcard expression resolves to no concrete indices (default: false) :arg expand_wildcards: Whether wildcard expressions should get expanded to open or closed indices (default: open) :arg ignore_unavailable: Ignore unavailable indexes (default: false) :arg local: Return local information, do not retrieve the state from master node (default: false) """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") _, data = self.transport.perform_request('GET', _make_path(index, feature), params=params) return data @query_params('timeout', 'master_timeout' 'allow_no_indices', 'expand_wildcards', 'ignore_unavailable') def open(self, index, params=None): """ Open a closed index to make it available for search. ``_ :arg index: The name of the index :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") _, data = self.transport.perform_request('POST', _make_path(index, '_open'), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'master_timeout', 'timeout') def close(self, index, params=None): """ Close an index to remove it's overhead from the cluster. Closed index is blocked for read/write operations. ``_ :arg index: A comma-separated list of indices to close; use `_all` or '*' to close all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") _, data = self.transport.perform_request('POST', _make_path(index, '_close'), params=params) return data @query_params('timeout', 'master_timeout') def delete(self, index, params=None): """ Delete an index in Elasticsearch ``_ :arg index: A comma-separated list of indices to delete; use `_all` or '*' to delete all indices :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") _, data = self.transport.perform_request('DELETE', _make_path(index), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local') def exists(self, index, params=None): """ Return a boolean indicating whether given index exists. ``_ :arg index: A list of indices to check :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") try: self.transport.perform_request('HEAD', _make_path(index), params=params) except NotFoundError: return False return True @query_params('allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'local') def exists_type(self, index, doc_type, params=None): """ Check if a type/types exists in an index/indices. ``_ :arg index: A comma-separated list of index names; use `_all` to check the types across all indices :arg doc_type: A comma-separated list of document types to check :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. 
:arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones (default: none) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ for param in (index, doc_type): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") try: self.transport.perform_request('HEAD', _make_path(index, doc_type), params=params) except NotFoundError: return False return True @query_params('allow_no_indices', 'expand_wildcards', 'ignore_conflicts', 'ignore_unavailable', 'master_timeout', 'timeout') def put_mapping(self, doc_type, body, index=None, params=None): """ Register specific mapping definition for a specific type. ``_ :arg doc_type: The name of the document type :arg body: The mapping definition :arg index: A list of index names the mapping should be added to (supports wildcards); use `_all` or omit to add the mapping on all indices. :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg ignore_conflicts: Specify whether to ignore conflicts while updating the mapping (default: false) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ for param in (doc_type, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('PUT', _make_path(index, '_mapping', doc_type), params=params, body=body) return data @query_params('ignore_unavailable', 'allow_no_indices', 'expand_wildcards', 'local') def get_mapping(self, index=None, doc_type=None, params=None): """ Retrieve mapping definition of index or index/type. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string for all indices :arg doc_type: A comma-separated list of document types :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ _, data = self.transport.perform_request('GET', _make_path(index, '_mapping', doc_type), params=params) return data @query_params("include_defaults", 'ignore_unavailable', 'allow_no_indices', 'expand_wildcards', 'local') def get_field_mapping(self, field, index=None, doc_type=None, params=None): """ Retrieve mapping definition of a specific field. 
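# A compact sketch of the mapping APIs above, using Elasticsearch 1.x mapping
# syntax; 'my-index' and 'article' are placeholder index and type names.
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.put_mapping(
    index='my-index', doc_type='article',
    body={'article': {'properties': {'title': {'type': 'string'}}}})
mapping = es.indices.get_mapping(index='my-index', doc_type='article')
title_field = es.indices.get_field_mapping(field='title', index='my-index',
                                            doc_type='article')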
``_ :arg index: A comma-separated list of index names; use `_all` or empty string for all indices :arg doc_type: A comma-separated list of document types :arg field: A comma-separated list of fields to retrieve the mapping for :arg include_defaults: A boolean indicating whether to return default values :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ if field in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'field'.") _, data = self.transport.perform_request('GET', _make_path(index, '_mapping', doc_type, 'field', field), params=params) return data @query_params('master_timeout') def delete_mapping(self, index, doc_type, params=None): """ Delete a mapping (type) along with its data. ``_ :arg index: A comma-separated list of index names (supports wildcard); use `_all` for all indices :arg doc_type: A comma-separated list of document types to delete (supports wildcards); use `_all` to delete all document types in the specified indices. :arg master_timeout: Specify timeout for connection to master """ for param in (index, doc_type): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('DELETE', _make_path(index, '_mapping', doc_type), params=params) return data @query_params('timeout', 'master_timeout') def put_alias(self, name, index, body=None, params=None): """ Create an alias for a specific index/indices. ``_ :arg index: A comma-separated list of index names the alias should point to (supports wildcards); use `_all` or omit to perform the operation on all indices. :arg name: The name of the alias to be created or updated :arg body: The settings for the alias, such as `routing` or `filter` :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit timestamp for the document """ for param in (index, name): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('PUT', _make_path(index, '_alias', name), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'local') def exists_alias(self, name, index=None, params=None): """ Return a boolean indicating whether given alias exists. ``_ :arg name: A comma-separated list of alias names to return :arg index: A comma-separated list of index names to filter aliases :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. 
:arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones (default: none) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ try: self.transport.perform_request('HEAD', _make_path(index, '_alias', name), params=params) except NotFoundError: return False return True @query_params('allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'local') def get_alias(self, index=None, name=None, params=None): """ Retrieve a specified alias. ``_ :arg name: A comma-separated list of alias names to return :arg index: A comma-separated list of index names to filter aliases :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones, default u'none' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ _, data = self.transport.perform_request('GET', _make_path(index, '_alias', name), params=params) return data @query_params('local', 'timeout') def get_aliases(self, index=None, name=None, params=None): """ Retrieve specified aliases ``_ :arg index: A comma-separated list of index names to filter aliases :arg name: A comma-separated list of alias names to filter :arg local: Return local information, do not retrieve the state from master node (default: false) :arg timeout: Explicit operation timeout """ _, data = self.transport.perform_request('GET', _make_path(index, '_aliases', name), params=params) return data @query_params('timeout', 'master_timeout') def update_aliases(self, body, params=None): """ Update specified aliases. ``_ :arg body: The definition of `actions` to perform :arg master_timeout: Specify timeout for connection to master :arg timeout: Request timeout """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") _, data = self.transport.perform_request('POST', '/_aliases', params=params, body=body) return data @query_params('timeout', 'master_timeout') def delete_alias(self, index, name, params=None): """ Delete specific alias. ``_ :arg index: A comma-separated list of index names (supports wildcards); use `_all` for all indices :arg name: A comma-separated list of aliases to delete (supports wildcards); use `_all` to delete all aliases for the specified indices. :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit timestamp for the document """ for param in (index, name): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('DELETE', _make_path(index, '_alias', name), params=params) return data @query_params('create', 'order', 'timeout', 'master_timeout', 'flat_settings') def put_template(self, name, body, params=None): """ Create an index template that will automatically be applied to new indices created. 
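# A minimal sketch of the alias and template calls above; the alias, index and
# template names (and the 'logs-*' pattern) are placeholder assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.put_alias(index='my-index', name='my-alias')
if es.indices.exists_alias(name='my-alias'):
    print(es.indices.get_alias(name='my-alias'))
es.indices.put_template(
    name='log-template',
    body={'template': 'logs-*', 'settings': {'number_of_shards': 1}})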
``_ :arg name: The name of the template :arg body: The template definition :arg create: Whether the index template should only be added if new or can also replace an existing one :arg order: The order for this template when merging multiple matching ones (higher numbers are merged later, overriding the lower numbers) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout :arg flat_settings: Return settings in flat format (default: false) """ for param in (name, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('PUT', _make_path('_template', name), params=params, body=body) return data @query_params('local', 'master_timeout') def exists_template(self, name, params=None): """ Return a boolean indicating whether given template exists. ``_ :arg name: The name of the template :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node """ if name in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'name'.") try: self.transport.perform_request('HEAD', _make_path('_template', name), params=params) except NotFoundError: return False return True @query_params('flat_settings', 'local', 'master_timeout') def get_template(self, name=None, params=None): """ Retrieve an index template by its name. ``_ :arg name: The name of the template :arg flat_settings: Return settings in flat format (default: false) :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node """ _, data = self.transport.perform_request('GET', _make_path('_template', name), params=params) return data @query_params('timeout', 'master_timeout') def delete_template(self, name, params=None): """ Delete an index template by its name. ``_ :arg name: The name of the template :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ if name in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'name'.") _, data = self.transport.perform_request('DELETE', _make_path('_template', name), params=params) return data @query_params('expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'flat_settings', 'local') def get_settings(self, index=None, name=None, params=None): """ Retrieve settings for one or more (or all) indices. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg name: The name of the settings that should be included :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. 
:arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones, default u'none' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg flat_settings: Return settings in flat format (default: false) :arg local: Return local information, do not retrieve the state from master node (default: false) """ _, data = self.transport.perform_request('GET', _make_path(index, '_settings', name), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'flat_settings', 'ignore_unavailable', 'master_timeout') def put_settings(self, body, index=None, params=None): """ Change specific index level settings in real time. ``_ :arg body: The index settings to be updated :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg flat_settings: Return settings in flat format (default: false) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg master_timeout: Specify timeout for connection to master """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") _, data = self.transport.perform_request('PUT', _make_path(index, '_settings'), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'master_timeout') def put_warmer(self, name, body, index=None, doc_type=None, params=None): """ Create an index warmer to run registered search requests to warm up the index before it is available for search. ``_ :arg name: The name of the warmer :arg body: The search request definition for the warmer (query, filters, facets, sorting, etc) :arg index: A comma-separated list of index names to register the warmer for; use `_all` or omit to perform the operation on all indices :arg doc_type: A comma-separated list of document types to register the warmer for; leave empty to perform the operation on all types :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices in the search request to warm. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both, in the search request to warm., default u'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) in the search request to warm :arg master_timeout: Specify timeout for connection to master """ for param in (name, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") if doc_type and not index: index = '_all' _, data = self.transport.perform_request('PUT', _make_path(index, doc_type, '_warmer', name), params=params, body=body) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local') def get_warmer(self, index=None, doc_type=None, name=None, params=None): """ Retreieve an index warmer. 
``_ :arg index: A comma-separated list of index names to restrict the operation; use `_all` to perform the operation on all indices :arg doc_type: A comma-separated list of document types to restrict the operation; leave empty to perform the operation on all types :arg name: The name of the warmer (supports wildcards); leave empty to get all warmers :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_warmer', name), params=params) return data @query_params('master_timeout') def delete_warmer(self, index, name, params=None): """ Delete an index warmer. ``_ :arg index: A comma-separated list of index names to delete warmers from (supports wildcards); use `_all` to perform the operation on all indices. :arg name: A comma-separated list of warmer names to delete (supports wildcards); use `_all` to delete all warmers in the specified indices. :arg master_timeout: Specify timeout for connection to master """ for param in (index, name): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('DELETE', _make_path(index, '_warmer', name), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'operation_threading', 'recovery', 'snapshot', 'human') def status(self, index=None, params=None): """ Get a comprehensive status information of one or more indices. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones, default u'none' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg operation_threading: TODO: ? :arg recovery: Return information about shard recovery :arg snapshot: TODO: ? :arg human: Whether to return time and byte values in human-readable format. """ _, data = self.transport.perform_request('GET', _make_path(index, '_status'), params=params) return data @query_params('completion_fields', 'docs', 'fielddata_fields', 'fields', 'groups', 'allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'human', 'level', 'types') def stats(self, index=None, metric=None, params=None): """ Retrieve statistics on different operations happening on an index. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg metric: A comma-separated list of metrics to display. 
Possible values: "_all", "completion", "docs", "fielddata", "filter_cache", "flush", "get", "id_cache", "indexing", "merge", "percolate", "refresh", "search", "segments", "store", "warmer" :arg completion_fields: A comma-separated list of fields for `completion` metric (supports wildcards) :arg fielddata_fields: A comma-separated list of fields for `fielddata` metric (supports wildcards) :arg fields: A comma-separated list of fields for `fielddata` and `completion` metric (supports wildcards) :arg groups: A comma-separated list of search groups for `search` statistics :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones (default: none) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg human: Whether to return time and byte values in human-readable format. :arg level: Return stats aggregated at cluster, index or shard level. ("cluster", "indices" or "shards", default: "indices") :arg types: A comma-separated list of document types for the `indexing` index metric """ _, data = self.transport.perform_request('GET', _make_path(index, '_stats', metric), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'human') def segments(self, index=None, params=None): """ Provide low level segments information that a Lucene index (shard level) is built with. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones, default u'none' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg human: Whether to return time and byte values in human-readable format (default: false) """ _, data = self.transport.perform_request('GET', _make_path(index, '_segments'), params=params) return data @query_params('flush', 'allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'max_num_segments', 'only_expunge_deletes', 'operation_threading', 'wait_for_merge') def optimize(self, index=None, params=None): """ Explicitly optimize one or more indices through an API. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg flush: Specify whether the index should be flushed after performing the operation (default: true) :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. 
:arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones, default u'none' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg max_num_segments: The number of segments the index should be merged into (default: dynamic) :arg only_expunge_deletes: Specify whether the operation should only expunge deleted documents :arg operation_threading: TODO: ? :arg wait_for_merge: Specify whether the request should block until the merge process is finished (default: true) """ _, data = self.transport.perform_request('POST', _make_path(index, '_optimize'), params=params) return data @query_params('allow_no_indices', 'analyze_wildcard', 'analyzer', 'default_operator', 'df', 'expand_wildcards', 'explain', 'ignore_unavailable', 'lenient', 'lowercase_expanded_terms', 'operation_threading', 'q') def validate_query(self, index=None, doc_type=None, body=None, params=None): """ Validate a potentially expensive query without executing it. ``_ :arg index: A comma-separated list of index names to restrict the operation; use `_all` or empty string to perform the operation on all indices :arg doc_type: A comma-separated list of document types to restrict the operation; leave empty to perform the operation on all types :arg body: The query definition specified with the Query DSL :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg default_operator: The default operator for query string query (AND or OR), default u'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg explain: Return detailed information about the error :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg operation_threading: TODO: ? :arg q: Query in the Lucene query string syntax """ _, data = self.transport.perform_request('GET', _make_path(index, doc_type, '_validate', 'query'), params=params, body=body) return data @query_params('field_data', 'fielddata', 'fields', 'filter', 'filter_cache', 'filter_keys', 'id', 'id_cache', 'allow_no_indices', 'expand_wildcards', 'ignore_indices', 'ignore_unavailable', 'query_cache', 'recycler') def clear_cache(self, index=None, params=None): """ Clear either all caches or specific cached associated with one ore more indices. 
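# A short sketch of validate_query, optimize and clear_cache as documented above;
# the index name and the query string are placeholder assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch()
result = es.indices.validate_query(index='my-index', q='title:python', explain=True)
print(result['valid'])
es.indices.optimize(index='my-index', max_num_segments=1)   # merge each shard down to one segment
es.indices.clear_cache(index='my-index', fielddata=True)    # drop only the fielddata caches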
``_ :arg index: A comma-separated list of index name to limit the operation :arg field_data: Clear field data :arg fielddata: Clear field data :arg fields: A comma-separated list of fields to clear when using the `field_data` parameter (default: all) :arg filter: Clear filter caches :arg filter_cache: Clear filter caches :arg filter_keys: A comma-separated list of keys to clear when using the `filter_cache` parameter (default: all) :arg id: Clear ID caches for parent/child :arg id_cache: Clear ID caches for parent/child :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both. :arg ignore_indices: When performed on multiple indices, allows to ignore `missing` ones (default: none) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg query_cache: Clear query cache :arg recycler: Clear the recycler cache """ _, data = self.transport.perform_request('POST', _make_path(index, '_cache', 'clear'), params=params) return data @query_params('active_only', 'detailed', 'human') def recovery(self, index=None, params=None): """ The indices recovery API provides insight into on-going shard recoveries. Recovery status may be reported for specific indices, or cluster-wide. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg active_only: Display only those recoveries that are currently on- going (default: 'false') :arg detailed: Whether to display detailed information about shard recovery (default: 'false') :arg human: Whether to return time and byte values in human-readable format. (default: 'false') """ _, data = self.transport.perform_request('GET', _make_path(index, '_recovery'), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'only_ancient_segments', 'wait_for_completion') def upgrade(self, index=None, params=None): """ Upgrade one or more indices to the latest format through an API. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg only_ancient_segments: If true, only ancient (an older Lucene major release) segments will be upgraded :arg wait_for_completion: Specify whether the request should block until the all segments are upgraded (default: true) """ _, data = self.transport.perform_request('POST', _make_path(index, '_upgrade'), params=params) return data @query_params('allow_no_indices', 'expand_wildcards', 'human', 'ignore_unavailable') def get_upgrade(self, index=None, params=None): """ Monitor how much of one or more index is upgraded. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default u'open' :arg human: Whether to return time and byte values in human-readable format., default False :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) """ _, data = self.transport.perform_request('GET', _make_path(index, '_upgrade'), params=params) return data @query_params() def flush_synced(self, index=None, params=None): """ Perform a normal flush, then add a generated unique marker (sync_id) to all shards. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string for all indices """ _, data = self.transport.perform_request('POST', _make_path(index, '_flush', 'synced'), params=params) return data elasticsearch-py-1.6.0/elasticsearch/client/nodes.py000066400000000000000000000133641253570341300225420ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path class NodesClient(NamespacedClient): @query_params('flat_settings', 'human') def info(self, node_id=None, metric=None, params=None): """ The cluster nodes info API allows to retrieve one or more (or all) of the cluster nodes information. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg metric: A comma-separated list of metrics you wish returned. Leave empty to return all. Choices are "settings", "os", "process", "jvm", "thread_pool", "network", "transport", "http", "plugin" :arg flat_settings: Return settings in flat format (default: false) :arg human: Whether to return time and byte values in human-readable format., default False """ _, data = self.transport.perform_request('GET', _make_path('_nodes', node_id, metric), params=params) return data @query_params('delay', 'exit') def shutdown(self, node_id=None, params=None): """ The nodes shutdown API allows to shutdown one or more (or all) nodes in the cluster. ``_ :arg node_id: A comma-separated list of node IDs or names to perform the operation on; use `_local` to perform the operation on the node you're connected to, leave empty to perform the operation on all nodes :arg delay: Set the delay for the operation (default: 1s) :arg exit: Exit the JVM as well (default: true) """ _, data = self.transport.perform_request('POST', _make_path('_cluster', 'nodes', node_id, '_shutdown'), params=params) return data @query_params('completion_fields', 'fielddata_fields', 'fields', 'groups', 'human', 'level', 'types') def stats(self, node_id=None, metric=None, index_metric=None, params=None): """ The cluster nodes stats API allows to retrieve one or more (or all) of the cluster nodes statistics. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg metric: Limit the information returned to the specified metrics. Possible options are: "_all", "breaker", "fs", "http", "indices", "jvm", "network", "os", "process", "thread_pool", "transport" :arg index_metric: Limit the information returned for `indices` metric to the specific index metrics. Isn't used if `indices` (or `all`) metric isn't specified. 
Possible options are: "_all", "completion", "docs", "fielddata", "filter_cache", "flush", "get", "id_cache", "indexing", "merge", "percolate", "refresh", "search", "segments", "store", "warmer" :arg completion_fields: A comma-separated list of fields for `fielddata` and `suggest` index metric (supports wildcards) :arg fielddata_fields: A comma-separated list of fields for `fielddata` index metric (supports wildcards) :arg fields: A comma-separated list of fields for `fielddata` and `completion` index metric (supports wildcards) :arg groups: A comma-separated list of search groups for `search` index metric :arg human: Whether to return time and byte values in human-readable format., default False :arg level: Return indices stats aggregated at node, index or shard level, default 'node' :arg types: A comma-separated list of document types for the `indexing` index metric """ _, data = self.transport.perform_request('GET', _make_path('_nodes', node_id, 'stats', metric, index_metric), params=params) return data @query_params('type_', 'ignore_idle_threads', 'interval', 'snapshots', 'threads') def hot_threads(self, node_id=None, params=None): """ An API allowing to get the current hot threads on each node in the cluster. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg type_: The type to sample (default: cpu) :arg ignore_idle_threads: Don't show threads that are in known-idle places, such as waiting on a socket select or pulling from an empty task queue (default: true) :arg interval: The interval for the second sampling of threads :arg snapshots: Number of samples of thread stacktrace (default: 10) :arg threads: Specify the number of threads to provide information for (default: 3) """ # avoid python reserved words if params and 'type_' in params: params['type'] = params.pop('type_') _, data = self.transport.perform_request('GET', _make_path('_nodes', node_id, 'hot_threads'), params=params) return data elasticsearch-py-1.6.0/elasticsearch/client/snapshot.py000066400000000000000000000166601253570341300232730ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path, SKIP_IN_PATH class SnapshotClient(NamespacedClient): @query_params('master_timeout', 'wait_for_completion') def create(self, repository, snapshot, body=None, params=None): """ Create a snapshot in repository ``_ :arg repository: A repository name :arg snapshot: A snapshot name :arg body: The snapshot definition :arg master_timeout: Explicit operation timeout for connection to master node :arg wait_for_completion: Should this request wait until the operation has completed before returning, default False """ for param in (repository, snapshot): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('PUT', _make_path('_snapshot', repository, snapshot), params=params, body=body) return data @query_params('master_timeout') def delete(self, repository, snapshot, params=None): """ Deletes a snapshot from a repository. 
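# A minimal sketch of the nodes APIs above, assuming a locally reachable cluster;
# the metric selections are example values.
from elasticsearch import Elasticsearch

es = Elasticsearch()
info = es.nodes.info(metric='jvm,os')             # static node information
stats = es.nodes.stats(metric='jvm')              # runtime statistics, here JVM only
print(es.nodes.hot_threads(threads=2))            # plain-text hot threads dump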
``_ :arg repository: A repository name :arg snapshot: A snapshot name :arg master_timeout: Explicit operation timeout for connection to master node """ for param in (repository, snapshot): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('DELETE', _make_path('_snapshot', repository, snapshot), params=params) return data @query_params('master_timeout') def get(self, repository, snapshot, params=None): """ Retrieve information about a snapshot. ``_ :arg repository: A comma-separated list of repository names :arg snapshot: A comma-separated list of snapshot names :arg master_timeout: Explicit operation timeout for connection to master node """ for param in (repository, snapshot): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('GET', _make_path('_snapshot', repository, snapshot), params=params) return data @query_params('master_timeout', 'timeout') def delete_repository(self, repository, params=None): """ Removes a shared file system repository. ``_ :arg repository: A comma-separated list of repository names :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ if repository in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'repository'.") _, data = self.transport.perform_request('DELETE', _make_path('_snapshot', repository), params=params) return data @query_params('local', 'master_timeout') def get_repository(self, repository=None, params=None): """ Return information about registered repositories. ``_ :arg repository: A comma-separated list of repository names :arg master_timeout: Explicit operation timeout for connection to master node :arg local: Return local information, do not retrieve the state from master node (default: false) """ _, data = self.transport.perform_request('GET', _make_path('_snapshot', repository), params=params) return data @query_params('master_timeout', 'timeout') def create_repository(self, repository, body, params=None): """ Registers a shared file system repository. ``_ :arg repository: A repository name :arg body: The repository definition :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ for param in (repository, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('PUT', _make_path('_snapshot', repository), params=params, body=body) return data @query_params('master_timeout', 'wait_for_completion') def restore(self, repository, snapshot, body=None, params=None): """ Restore a snapshot. ``_ :arg repository: A repository name :arg snapshot: A snapshot name :arg body: Details of what to restore :arg master_timeout: Explicit operation timeout for connection to master node :arg wait_for_completion: Should this request wait until the operation has completed before returning, default False """ for param in (repository, snapshot): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") _, data = self.transport.perform_request('POST', _make_path('_snapshot', repository, snapshot, '_restore'), params=params, body=body) return data @query_params('master_timeout') def status(self, repository=None, snapshot=None, params=None): """ Return information about all currently running snapshots. 
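# A compact sketch of the snapshot workflow covered above; the repository name,
# filesystem location and snapshot name are placeholder assumptions, and the
# location must be accessible to every node for an 'fs' repository.
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.snapshot.create_repository(
    repository='my_backup',
    body={'type': 'fs', 'settings': {'location': '/mnt/backups/my_backup'}})
es.snapshot.create(repository='my_backup', snapshot='snapshot_1',
                   wait_for_completion=True)
print(es.snapshot.get(repository='my_backup', snapshot='snapshot_1'))
print(es.snapshot.status(repository='my_backup'))  # progress of any running snapshots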
By specifying a repository name, it's possible to limit the results to a particular repository. ``_ :arg repository: A repository name :arg snapshot: A comma-separated list of snapshot names :arg master_timeout: Explicit operation timeout for connection to master node """ _, data = self.transport.perform_request('GET', _make_path('_snapshot', repository, snapshot, '_status'), params=params) return data @query_params('master_timeout', 'timeout') def verify_repository(self, repository, params=None): """ Returns a list of nodes where repository was successfully verified or an error message if verification process failed. ``_ :arg repository: A repository name :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ if repository in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'repository'.") _, data = self.transport.perform_request('POST', _make_path('_snapshot', repository, '_verify'), params=params) return data elasticsearch-py-1.6.0/elasticsearch/client/utils.py000066400000000000000000000051251253570341300225660ustar00rootroot00000000000000from __future__ import unicode_literals import weakref from datetime import date, datetime from functools import wraps from ..compat import string_types, quote_plus # parts of URL to be omitted SKIP_IN_PATH = (None, '', b'', [], ()) def _escape(value): """ Escape a single value of a URL string or a query parameter. If it is a list or tuple, turn it into a comma-separated string first. """ # make sequences into comma-separated stings if isinstance(value, (list, tuple)): value = ','.join(value) # dates and datetimes into isoformat elif isinstance(value, (date, datetime)): value = value.isoformat() # make bools into true/false strings elif isinstance(value, bool): value = str(value).lower() # encode strings to utf-8 if isinstance(value, string_types): try: return value.encode('utf-8') except UnicodeDecodeError: # Python 2 and str, no need to re-encode pass return str(value) def _make_path(*parts): """ Create a URL string from parts, omit all `None` values and empty strings. Convert lists nad tuples to comma separated values. """ #TODO: maybe only allow some parts to be lists/tuples ? return '/' + '/'.join( # preserve ',' and '*' in url for nicer URLs in logs quote_plus(_escape(p), b',*') for p in parts if p not in SKIP_IN_PATH) # parameters that apply to all methods GLOBAL_PARAMS = ('pretty', 'format', 'filter_path') def query_params(*es_query_params): """ Decorator that pops all accepted parameters from method's kwargs and puts them in the params argument. 
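# A small sketch of how the URL helper above behaves: `None` and empty values are
# dropped from the path, list arguments are joined with commas, and ',' / '*' are
# left unescaped for readable URLs.
from elasticsearch.client.utils import _make_path

_make_path('my-index', None, '_search')               # -> '/my-index/_search'
_make_path('my-index', '_mapping', ['t1', 't2'])      # -> '/my-index/_mapping/t1,t2'
_make_path('_snapshot', 'repo', 'snap*', '_status')   # -> '/_snapshot/repo/snap*/_status'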
""" def _wrapper(func): @wraps(func) def _wrapped(*args, **kwargs): params = kwargs.pop('params', {}) for p in es_query_params + GLOBAL_PARAMS: if p in kwargs: params[p] = _escape(kwargs.pop(p)) # don't treat ignore and request_timeout as other params to avoid escaping for p in ('ignore', 'request_timeout'): if p in kwargs: params[p] = kwargs.pop(p) return func(*args, params=params, **kwargs) return _wrapped return _wrapper class NamespacedClient(object): def __init__(self, client): self.client = client @property def transport(self): return self.client.transport class AddonClient(NamespacedClient): @classmethod def infect_client(cls, client): addon = cls(weakref.proxy(client)) setattr(client, cls.namespace, addon) return client elasticsearch-py-1.6.0/elasticsearch/compat.py000066400000000000000000000004701253570341300214310ustar00rootroot00000000000000import sys PY2 = sys.version_info[0] == 2 if PY2: string_types = basestring, from urllib import quote_plus, urlencode from urlparse import urlparse from itertools import imap as map else: string_types = str, bytes from urllib.parse import quote_plus, urlencode, urlparse map = map elasticsearch-py-1.6.0/elasticsearch/connection/000077500000000000000000000000001253570341300217325ustar00rootroot00000000000000elasticsearch-py-1.6.0/elasticsearch/connection/__init__.py000066400000000000000000000003411253570341300240410ustar00rootroot00000000000000from .base import Connection from .http_requests import RequestsHttpConnection from .http_urllib3 import Urllib3HttpConnection from .memcached import MemcachedConnection from .thrift import ThriftConnection, THRIFT_AVAILABLE elasticsearch-py-1.6.0/elasticsearch/connection/base.py000066400000000000000000000102511253570341300232150ustar00rootroot00000000000000import logging try: import simplejson as json except ImportError: import json from ..exceptions import TransportError, HTTP_EXCEPTIONS logger = logging.getLogger('elasticsearch') # create the elasticsearch.trace logger, but only set propagate to False if the # logger hasn't already been configured _tracer_already_configured = 'elasticsearch.trace' in logging.Logger.manager.loggerDict tracer = logging.getLogger('elasticsearch.trace') if not _tracer_already_configured: tracer.propagate = False class Connection(object): """ Class responsible for maintaining a connection to an Elasticsearch node. It holds persistent connection pool to it and it's main interface (`perform_request`) is thread-safe. Also responsible for logging. """ transport_schema = 'http' def __init__(self, host='localhost', port=9200, url_prefix='', timeout=10, **kwargs): """ :arg host: hostname of the node (default: localhost) :arg port: port to use (integer, default: 9200) :arg url_prefix: optional url prefix for elasticsearch :arg timeout: default timeout in seconds (float, default: 10) """ self.host = '%s://%s:%s' % (self.transport_schema, host, port) if url_prefix: url_prefix = '/' + url_prefix.strip('/') self.url_prefix = url_prefix self.timeout = timeout def __repr__(self): return '<%s: %s>' % (self.__class__.__name__, self.host) def log_request_success(self, method, full_url, path, body, status_code, response, duration): """ Log a successful API call. 
""" # TODO: optionally pass in params instead of full_url and do urlencode only when needed def _pretty_json(data): # pretty JSON in tracer curl logs try: return json.dumps(json.loads(data), sort_keys=True, indent=2, separators=(',', ': ')).replace("'", r'\u0027') except (ValueError, TypeError): # non-json data or a bulk request return data # body has already been serialized to utf-8, deserialize it for logging # TODO: find a better way to avoid (de)encoding the body back and forth if body: body = body.decode('utf-8') logger.info( '%s %s [status:%s request:%.3fs]', method, full_url, status_code, duration ) logger.debug('> %s', body) logger.debug('< %s', response) if tracer.isEnabledFor(logging.INFO): # include pretty in trace curls path = path.replace('?', '?pretty&', 1) if '?' in path else path + '?pretty' if self.url_prefix: path = path.replace(self.url_prefix, '', 1) tracer.info("curl -X%s 'http://localhost:9200%s' -d '%s'", method, path, _pretty_json(body) if body else '') if tracer.isEnabledFor(logging.DEBUG): tracer.debug('#[%s] (%.3fs)\n#%s', status_code, duration, _pretty_json(response).replace('\n', '\n#') if response else '') def log_request_fail(self, method, full_url, body, duration, status_code=None, exception=None): """ Log an unsuccessful API call. """ logger.warning( '%s %s [status:%s request:%.3fs]', method, full_url, status_code or 'N/A', duration, exc_info=exception is not None ) # body has already been serialized to utf-8, deserialize it for logging # TODO: find a better way to avoid (de)encoding the body back and forth if body: body = body.decode('utf-8') logger.debug('> %s', body) def _raise_error(self, status_code, raw_data): """ Locate appropriate exception and raise it. """ error_message = raw_data additional_info = None try: additional_info = json.loads(raw_data) error_message = additional_info.get('error', error_message) if isinstance(error_message, dict) and 'type' in error_message: error_message = error_message['type'] except: # we don't care what went wrong pass raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info) elasticsearch-py-1.6.0/elasticsearch/connection/esthrift/000077500000000000000000000000001253570341300235625ustar00rootroot00000000000000elasticsearch-py-1.6.0/elasticsearch/connection/esthrift/Rest-remote000077500000000000000000000036061253570341300257230ustar00rootroot00000000000000#!/usr/bin/env python # # Autogenerated by Thrift Compiler (0.9.0) # # DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING # # options string: py:new_style=true,utf8strings=true # import sys import pprint from urlparse import urlparse from thrift.transport import TTransport from thrift.transport import TSocket from thrift.transport import THttpClient from thrift.protocol import TBinaryProtocol import Rest from ttypes import * if len(sys.argv) <= 1 or sys.argv[1] == '--help': print '' print 'Usage: ' + sys.argv[0] + ' [-h host[:port]] [-u url] [-f[ramed]] function [arg1 [arg2...]]' print '' print 'Functions:' print ' RestResponse execute(RestRequest request)' print '' sys.exit(0) pp = pprint.PrettyPrinter(indent = 2) host = 'localhost' port = 9090 uri = '' framed = False http = False argi = 1 if sys.argv[argi] == '-h': parts = sys.argv[argi+1].split(':') host = parts[0] if len(parts) > 1: port = int(parts[1]) argi += 2 if sys.argv[argi] == '-u': url = urlparse(sys.argv[argi+1]) parts = url[1].split(':') host = parts[0] if len(parts) > 1: port = int(parts[1]) else: port = 80 uri = url[2] if url[4]: uri += '?%s' % 
url[4] http = True argi += 2 if sys.argv[argi] == '-f' or sys.argv[argi] == '-framed': framed = True argi += 1 cmd = sys.argv[argi] args = sys.argv[argi+1:] if http: transport = THttpClient.THttpClient(host, port, uri) else: socket = TSocket.TSocket(host, port) if framed: transport = TTransport.TFramedTransport(socket) else: transport = TTransport.TBufferedTransport(socket) protocol = TBinaryProtocol.TBinaryProtocol(transport) client = Rest.Client(protocol) transport.open() if cmd == 'execute': if len(args) != 1: print 'execute requires 1 args' sys.exit(1) pp.pprint(client.execute(eval(args[0]),)) else: print 'Unrecognized method %s' % cmd sys.exit(1) transport.close() elasticsearch-py-1.6.0/elasticsearch/connection/esthrift/Rest.py000066400000000000000000000146411253570341300250570ustar00rootroot00000000000000# # Autogenerated by Thrift Compiler (0.9.0) # # DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING # # options string: py:new_style=true,utf8strings=true # from thrift.Thrift import TType, TMessageType, TException, TApplicationException from ttypes import * from thrift.Thrift import TProcessor from thrift.transport import TTransport from thrift.protocol import TBinaryProtocol, TProtocol try: from thrift.protocol import fastbinary except: fastbinary = None class Iface(object): def execute(self, request): """ Parameters: - request """ pass class Client(Iface): def __init__(self, iprot, oprot=None): self._iprot = self._oprot = iprot if oprot is not None: self._oprot = oprot self._seqid = 0 def execute(self, request): """ Parameters: - request """ self.send_execute(request) return self.recv_execute() def send_execute(self, request): self._oprot.writeMessageBegin('execute', TMessageType.CALL, self._seqid) args = execute_args() args.request = request args.write(self._oprot) self._oprot.writeMessageEnd() self._oprot.trans.flush() def recv_execute(self, ): (fname, mtype, rseqid) = self._iprot.readMessageBegin() if mtype == TMessageType.EXCEPTION: x = TApplicationException() x.read(self._iprot) self._iprot.readMessageEnd() raise x result = execute_result() result.read(self._iprot) self._iprot.readMessageEnd() if result.success is not None: return result.success raise TApplicationException(TApplicationException.MISSING_RESULT, "execute failed: unknown result"); class Processor(Iface, TProcessor): def __init__(self, handler): self._handler = handler self._processMap = {} self._processMap["execute"] = Processor.process_execute def process(self, iprot, oprot): (name, type, seqid) = iprot.readMessageBegin() if name not in self._processMap: iprot.skip(TType.STRUCT) iprot.readMessageEnd() x = TApplicationException(TApplicationException.UNKNOWN_METHOD, 'Unknown function %s' % (name)) oprot.writeMessageBegin(name, TMessageType.EXCEPTION, seqid) x.write(oprot) oprot.writeMessageEnd() oprot.trans.flush() return else: self._processMap[name](self, seqid, iprot, oprot) return True def process_execute(self, seqid, iprot, oprot): args = execute_args() args.read(iprot) iprot.readMessageEnd() result = execute_result() result.success = self._handler.execute(args.request) oprot.writeMessageBegin("execute", TMessageType.REPLY, seqid) result.write(oprot) oprot.writeMessageEnd() oprot.trans.flush() # HELPER FUNCTIONS AND STRUCTURES class execute_args(object): """ Attributes: - request """ thrift_spec = ( None, # 0 (1, TType.STRUCT, 'request', (RestRequest, RestRequest.thrift_spec), None, ), # 1 ) def __init__(self, request=None,): self.request = request def read(self, iprot): if iprot.__class__ == 
TBinaryProtocol.TBinaryProtocolAccelerated and isinstance(iprot.trans, TTransport.CReadableTransport) and self.thrift_spec is not None and fastbinary is not None: fastbinary.decode_binary(self, iprot.trans, (self.__class__, self.thrift_spec)) return iprot.readStructBegin() while True: (fname, ftype, fid) = iprot.readFieldBegin() if ftype == TType.STOP: break if fid == 1: if ftype == TType.STRUCT: self.request = RestRequest() self.request.read(iprot) else: iprot.skip(ftype) else: iprot.skip(ftype) iprot.readFieldEnd() iprot.readStructEnd() def write(self, oprot): if oprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and self.thrift_spec is not None and fastbinary is not None: oprot.trans.write(fastbinary.encode_binary(self, (self.__class__, self.thrift_spec))) return oprot.writeStructBegin('execute_args') if self.request is not None: oprot.writeFieldBegin('request', TType.STRUCT, 1) self.request.write(oprot) oprot.writeFieldEnd() oprot.writeFieldStop() oprot.writeStructEnd() def validate(self): if self.request is None: raise TProtocol.TProtocolException(message='Required field request is unset!') return def __repr__(self): L = ['%s=%r' % (key, value) for key, value in self.__dict__.iteritems()] return '%s(%s)' % (self.__class__.__name__, ', '.join(L)) def __eq__(self, other): return isinstance(other, self.__class__) and self.__dict__ == other.__dict__ def __ne__(self, other): return not (self == other) class execute_result(object): """ Attributes: - success """ thrift_spec = ( (0, TType.STRUCT, 'success', (RestResponse, RestResponse.thrift_spec), None, ), # 0 ) def __init__(self, success=None,): self.success = success def read(self, iprot): if iprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and isinstance(iprot.trans, TTransport.CReadableTransport) and self.thrift_spec is not None and fastbinary is not None: fastbinary.decode_binary(self, iprot.trans, (self.__class__, self.thrift_spec)) return iprot.readStructBegin() while True: (fname, ftype, fid) = iprot.readFieldBegin() if ftype == TType.STOP: break if fid == 0: if ftype == TType.STRUCT: self.success = RestResponse() self.success.read(iprot) else: iprot.skip(ftype) else: iprot.skip(ftype) iprot.readFieldEnd() iprot.readStructEnd() def write(self, oprot): if oprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and self.thrift_spec is not None and fastbinary is not None: oprot.trans.write(fastbinary.encode_binary(self, (self.__class__, self.thrift_spec))) return oprot.writeStructBegin('execute_result') if self.success is not None: oprot.writeFieldBegin('success', TType.STRUCT, 0) self.success.write(oprot) oprot.writeFieldEnd() oprot.writeFieldStop() oprot.writeStructEnd() def validate(self): return def __repr__(self): L = ['%s=%r' % (key, value) for key, value in self.__dict__.iteritems()] return '%s(%s)' % (self.__class__.__name__, ', '.join(L)) def __eq__(self, other): return isinstance(other, self.__class__) and self.__dict__ == other.__dict__ def __ne__(self, other): return not (self == other) elasticsearch-py-1.6.0/elasticsearch/connection/esthrift/__init__.py000066400000000000000000000000521253570341300256700ustar00rootroot00000000000000__all__ = ['ttypes', 'constants', 'Rest'] elasticsearch-py-1.6.0/elasticsearch/connection/esthrift/constants.py000066400000000000000000000004241253570341300261500ustar00rootroot00000000000000# # Autogenerated by Thrift Compiler (0.9.0) # # DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING # # options string: py:new_style=true,utf8strings=true # from 
thrift.Thrift import TType, TMessageType, TException, TApplicationException from ttypes import * elasticsearch-py-1.6.0/elasticsearch/connection/esthrift/ttypes.py000066400000000000000000000274101253570341300254700ustar00rootroot00000000000000# # Autogenerated by Thrift Compiler (0.9.0) # # DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING # # options string: py:new_style=true,utf8strings=true # from thrift.Thrift import TType, TMessageType, TException, TApplicationException from thrift.transport import TTransport from thrift.protocol import TBinaryProtocol, TProtocol try: from thrift.protocol import fastbinary except: fastbinary = None class Method(object): GET = 0 PUT = 1 POST = 2 DELETE = 3 HEAD = 4 OPTIONS = 5 _VALUES_TO_NAMES = { 0: "GET", 1: "PUT", 2: "POST", 3: "DELETE", 4: "HEAD", 5: "OPTIONS", } _NAMES_TO_VALUES = { "GET": 0, "PUT": 1, "POST": 2, "DELETE": 3, "HEAD": 4, "OPTIONS": 5, } class Status(object): CONT = 100 SWITCHING_PROTOCOLS = 101 OK = 200 CREATED = 201 ACCEPTED = 202 NON_AUTHORITATIVE_INFORMATION = 203 NO_CONTENT = 204 RESET_CONTENT = 205 PARTIAL_CONTENT = 206 MULTI_STATUS = 207 MULTIPLE_CHOICES = 300 MOVED_PERMANENTLY = 301 FOUND = 302 SEE_OTHER = 303 NOT_MODIFIED = 304 USE_PROXY = 305 TEMPORARY_REDIRECT = 307 BAD_REQUEST = 400 UNAUTHORIZED = 401 PAYMENT_REQUIRED = 402 FORBIDDEN = 403 NOT_FOUND = 404 METHOD_NOT_ALLOWED = 405 NOT_ACCEPTABLE = 406 PROXY_AUTHENTICATION = 407 REQUEST_TIMEOUT = 408 CONFLICT = 409 GONE = 410 LENGTH_REQUIRED = 411 PRECONDITION_FAILED = 412 REQUEST_ENTITY_TOO_LARGE = 413 REQUEST_URI_TOO_LONG = 414 UNSUPPORTED_MEDIA_TYPE = 415 REQUESTED_RANGE_NOT_SATISFIED = 416 EXPECTATION_FAILED = 417 UNPROCESSABLE_ENTITY = 422 LOCKED = 423 FAILED_DEPENDENCY = 424 INTERNAL_SERVER_ERROR = 500 NOT_IMPLEMENTED = 501 BAD_GATEWAY = 502 SERVICE_UNAVAILABLE = 503 GATEWAY_TIMEOUT = 504 INSUFFICIENT_STORAGE = 506 _VALUES_TO_NAMES = { 100: "CONT", 101: "SWITCHING_PROTOCOLS", 200: "OK", 201: "CREATED", 202: "ACCEPTED", 203: "NON_AUTHORITATIVE_INFORMATION", 204: "NO_CONTENT", 205: "RESET_CONTENT", 206: "PARTIAL_CONTENT", 207: "MULTI_STATUS", 300: "MULTIPLE_CHOICES", 301: "MOVED_PERMANENTLY", 302: "FOUND", 303: "SEE_OTHER", 304: "NOT_MODIFIED", 305: "USE_PROXY", 307: "TEMPORARY_REDIRECT", 400: "BAD_REQUEST", 401: "UNAUTHORIZED", 402: "PAYMENT_REQUIRED", 403: "FORBIDDEN", 404: "NOT_FOUND", 405: "METHOD_NOT_ALLOWED", 406: "NOT_ACCEPTABLE", 407: "PROXY_AUTHENTICATION", 408: "REQUEST_TIMEOUT", 409: "CONFLICT", 410: "GONE", 411: "LENGTH_REQUIRED", 412: "PRECONDITION_FAILED", 413: "REQUEST_ENTITY_TOO_LARGE", 414: "REQUEST_URI_TOO_LONG", 415: "UNSUPPORTED_MEDIA_TYPE", 416: "REQUESTED_RANGE_NOT_SATISFIED", 417: "EXPECTATION_FAILED", 422: "UNPROCESSABLE_ENTITY", 423: "LOCKED", 424: "FAILED_DEPENDENCY", 500: "INTERNAL_SERVER_ERROR", 501: "NOT_IMPLEMENTED", 502: "BAD_GATEWAY", 503: "SERVICE_UNAVAILABLE", 504: "GATEWAY_TIMEOUT", 506: "INSUFFICIENT_STORAGE", } _NAMES_TO_VALUES = { "CONT": 100, "SWITCHING_PROTOCOLS": 101, "OK": 200, "CREATED": 201, "ACCEPTED": 202, "NON_AUTHORITATIVE_INFORMATION": 203, "NO_CONTENT": 204, "RESET_CONTENT": 205, "PARTIAL_CONTENT": 206, "MULTI_STATUS": 207, "MULTIPLE_CHOICES": 300, "MOVED_PERMANENTLY": 301, "FOUND": 302, "SEE_OTHER": 303, "NOT_MODIFIED": 304, "USE_PROXY": 305, "TEMPORARY_REDIRECT": 307, "BAD_REQUEST": 400, "UNAUTHORIZED": 401, "PAYMENT_REQUIRED": 402, "FORBIDDEN": 403, "NOT_FOUND": 404, "METHOD_NOT_ALLOWED": 405, "NOT_ACCEPTABLE": 406, "PROXY_AUTHENTICATION": 407, "REQUEST_TIMEOUT": 408, "CONFLICT": 409, "GONE": 410, 
"LENGTH_REQUIRED": 411, "PRECONDITION_FAILED": 412, "REQUEST_ENTITY_TOO_LARGE": 413, "REQUEST_URI_TOO_LONG": 414, "UNSUPPORTED_MEDIA_TYPE": 415, "REQUESTED_RANGE_NOT_SATISFIED": 416, "EXPECTATION_FAILED": 417, "UNPROCESSABLE_ENTITY": 422, "LOCKED": 423, "FAILED_DEPENDENCY": 424, "INTERNAL_SERVER_ERROR": 500, "NOT_IMPLEMENTED": 501, "BAD_GATEWAY": 502, "SERVICE_UNAVAILABLE": 503, "GATEWAY_TIMEOUT": 504, "INSUFFICIENT_STORAGE": 506, } class RestRequest(object): """ Attributes: - method - uri - parameters - headers - body """ thrift_spec = ( None, # 0 (1, TType.I32, 'method', None, None, ), # 1 (2, TType.STRING, 'uri', None, None, ), # 2 (3, TType.MAP, 'parameters', (TType.STRING,None,TType.STRING,None), None, ), # 3 (4, TType.MAP, 'headers', (TType.STRING,None,TType.STRING,None), None, ), # 4 (5, TType.STRING, 'body', None, None, ), # 5 ) def __init__(self, method=None, uri=None, parameters=None, headers=None, body=None,): self.method = method self.uri = uri self.parameters = parameters self.headers = headers self.body = body def read(self, iprot): if iprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and isinstance(iprot.trans, TTransport.CReadableTransport) and self.thrift_spec is not None and fastbinary is not None: fastbinary.decode_binary(self, iprot.trans, (self.__class__, self.thrift_spec)) return iprot.readStructBegin() while True: (fname, ftype, fid) = iprot.readFieldBegin() if ftype == TType.STOP: break if fid == 1: if ftype == TType.I32: self.method = iprot.readI32(); else: iprot.skip(ftype) elif fid == 2: if ftype == TType.STRING: self.uri = iprot.readString().decode('utf-8') else: iprot.skip(ftype) elif fid == 3: if ftype == TType.MAP: self.parameters = {} (_ktype1, _vtype2, _size0 ) = iprot.readMapBegin() for _i4 in xrange(_size0): _key5 = iprot.readString().decode('utf-8') _val6 = iprot.readString().decode('utf-8') self.parameters[_key5] = _val6 iprot.readMapEnd() else: iprot.skip(ftype) elif fid == 4: if ftype == TType.MAP: self.headers = {} (_ktype8, _vtype9, _size7 ) = iprot.readMapBegin() for _i11 in xrange(_size7): _key12 = iprot.readString().decode('utf-8') _val13 = iprot.readString().decode('utf-8') self.headers[_key12] = _val13 iprot.readMapEnd() else: iprot.skip(ftype) elif fid == 5: if ftype == TType.STRING: self.body = iprot.readString(); else: iprot.skip(ftype) else: iprot.skip(ftype) iprot.readFieldEnd() iprot.readStructEnd() def write(self, oprot): if oprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and self.thrift_spec is not None and fastbinary is not None: oprot.trans.write(fastbinary.encode_binary(self, (self.__class__, self.thrift_spec))) return oprot.writeStructBegin('RestRequest') if self.method is not None: oprot.writeFieldBegin('method', TType.I32, 1) oprot.writeI32(self.method) oprot.writeFieldEnd() if self.uri is not None: oprot.writeFieldBegin('uri', TType.STRING, 2) oprot.writeString(self.uri.encode('utf-8')) oprot.writeFieldEnd() if self.parameters is not None: oprot.writeFieldBegin('parameters', TType.MAP, 3) oprot.writeMapBegin(TType.STRING, TType.STRING, len(self.parameters)) for kiter14,viter15 in self.parameters.items(): oprot.writeString(kiter14.encode('utf-8')) oprot.writeString(viter15.encode('utf-8')) oprot.writeMapEnd() oprot.writeFieldEnd() if self.headers is not None: oprot.writeFieldBegin('headers', TType.MAP, 4) oprot.writeMapBegin(TType.STRING, TType.STRING, len(self.headers)) for kiter16,viter17 in self.headers.items(): oprot.writeString(kiter16.encode('utf-8')) oprot.writeString(viter17.encode('utf-8')) 
oprot.writeMapEnd() oprot.writeFieldEnd() if self.body is not None: oprot.writeFieldBegin('body', TType.STRING, 5) oprot.writeString(self.body) oprot.writeFieldEnd() oprot.writeFieldStop() oprot.writeStructEnd() def validate(self): if self.method is None: raise TProtocol.TProtocolException(message='Required field method is unset!') if self.uri is None: raise TProtocol.TProtocolException(message='Required field uri is unset!') return def __repr__(self): L = ['%s=%r' % (key, value) for key, value in self.__dict__.iteritems()] return '%s(%s)' % (self.__class__.__name__, ', '.join(L)) def __eq__(self, other): return isinstance(other, self.__class__) and self.__dict__ == other.__dict__ def __ne__(self, other): return not (self == other) class RestResponse(object): """ Attributes: - status - headers - body """ thrift_spec = ( None, # 0 (1, TType.I32, 'status', None, None, ), # 1 (2, TType.MAP, 'headers', (TType.STRING,None,TType.STRING,None), None, ), # 2 (3, TType.STRING, 'body', None, None, ), # 3 ) def __init__(self, status=None, headers=None, body=None,): self.status = status self.headers = headers self.body = body def read(self, iprot): if iprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and isinstance(iprot.trans, TTransport.CReadableTransport) and self.thrift_spec is not None and fastbinary is not None: fastbinary.decode_binary(self, iprot.trans, (self.__class__, self.thrift_spec)) return iprot.readStructBegin() while True: (fname, ftype, fid) = iprot.readFieldBegin() if ftype == TType.STOP: break if fid == 1: if ftype == TType.I32: self.status = iprot.readI32(); else: iprot.skip(ftype) elif fid == 2: if ftype == TType.MAP: self.headers = {} (_ktype19, _vtype20, _size18 ) = iprot.readMapBegin() for _i22 in xrange(_size18): _key23 = iprot.readString().decode('utf-8') _val24 = iprot.readString().decode('utf-8') self.headers[_key23] = _val24 iprot.readMapEnd() else: iprot.skip(ftype) elif fid == 3: if ftype == TType.STRING: self.body = iprot.readString(); else: iprot.skip(ftype) else: iprot.skip(ftype) iprot.readFieldEnd() iprot.readStructEnd() def write(self, oprot): if oprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and self.thrift_spec is not None and fastbinary is not None: oprot.trans.write(fastbinary.encode_binary(self, (self.__class__, self.thrift_spec))) return oprot.writeStructBegin('RestResponse') if self.status is not None: oprot.writeFieldBegin('status', TType.I32, 1) oprot.writeI32(self.status) oprot.writeFieldEnd() if self.headers is not None: oprot.writeFieldBegin('headers', TType.MAP, 2) oprot.writeMapBegin(TType.STRING, TType.STRING, len(self.headers)) for kiter25,viter26 in self.headers.items(): oprot.writeString(kiter25.encode('utf-8')) oprot.writeString(viter26.encode('utf-8')) oprot.writeMapEnd() oprot.writeFieldEnd() if self.body is not None: oprot.writeFieldBegin('body', TType.STRING, 3) oprot.writeString(self.body) oprot.writeFieldEnd() oprot.writeFieldStop() oprot.writeStructEnd() def validate(self): if self.status is None: raise TProtocol.TProtocolException(message='Required field status is unset!') return def __repr__(self): L = ['%s=%r' % (key, value) for key, value in self.__dict__.iteritems()] return '%s(%s)' % (self.__class__.__name__, ', '.join(L)) def __eq__(self, other): return isinstance(other, self.__class__) and self.__dict__ == other.__dict__ def __ne__(self, other): return not (self == other) 
elasticsearch-py-1.6.0/elasticsearch/connection/http_requests.py000066400000000000000000000071131253570341300252200ustar00rootroot00000000000000import time import warnings try: import requests REQUESTS_AVAILABLE = True except ImportError: REQUESTS_AVAILABLE = False from .base import Connection from ..exceptions import ConnectionError, ImproperlyConfigured, ConnectionTimeout, SSLError from ..compat import urlencode, string_types class RequestsHttpConnection(Connection): """ Connection using the `requests` library. :arg http_auth: optional http auth information as either ':' separated string or a tuple. Any value will be passed into requests as `auth`. :arg use_ssl: use ssl for the connection if `True` :arg verify_certs: whether to verify SSL certificates :arg ca_certs: optional path to CA bundle. By default standard requests' bundle will be used. :arg client_cert: path to the file containing the private key and the certificate """ def __init__(self, host='localhost', port=9200, http_auth=None, use_ssl=False, verify_certs=False, ca_certs=None, client_cert=None, **kwargs): if not REQUESTS_AVAILABLE: raise ImproperlyConfigured("Please install requests to use RequestsHttpConnection.") super(RequestsHttpConnection, self).__init__(host= host, port=port, **kwargs) self.session = requests.session() if http_auth is not None: if isinstance(http_auth, (tuple, list)): http_auth = tuple(http_auth) elif isinstance(http_auth, string_types): http_auth = tuple(http_auth.split(':', 1)) self.session.auth = http_auth self.base_url = 'http%s://%s:%d%s' % ( 's' if use_ssl else '', host, port, self.url_prefix ) self.session.verify = verify_certs self.session.cert = client_cert if ca_certs: if not verify_certs: raise ImproperlyConfigured("You cannot pass CA certificates when verify SSL is off.") self.session.verify = ca_certs if use_ssl and not verify_certs: warnings.warn( 'Connecting to %s using SSL with verify_certs=False is insecure.' 
% self.base_url) def perform_request(self, method, url, params=None, body=None, timeout=None, ignore=()): url = self.base_url + url if params: url = '%s?%s' % (url, urlencode(params or {})) start = time.time() try: response = self.session.request(method, url, data=body, timeout=timeout or self.timeout) duration = time.time() - start raw_data = response.text except requests.exceptions.SSLError as e: self.log_request_fail(method, url, body, time.time() - start, exception=e) raise SSLError('N/A', str(e), e) except requests.Timeout as e: self.log_request_fail(method, url, body, time.time() - start, exception=e) raise ConnectionTimeout('TIMEOUT', str(e), e) except requests.ConnectionError as e: self.log_request_fail(method, url, body, time.time() - start, exception=e) raise ConnectionError('N/A', str(e), e) # raise errors based on http status codes, let the client handle those if needed if not (200 <= response.status_code < 300) and response.status_code not in ignore: self.log_request_fail(method, url, body, duration, response.status_code) self._raise_error(response.status_code, raw_data) self.log_request_success(method, url, response.request.path_url, body, response.status_code, raw_data, duration) return response.status_code, response.headers, raw_data elasticsearch-py-1.6.0/elasticsearch/connection/http_urllib3.py000066400000000000000000000077141253570341300247300ustar00rootroot00000000000000import time import urllib3 from urllib3.exceptions import ReadTimeoutError, SSLError as UrllibSSLError import warnings from .base import Connection from ..exceptions import ConnectionError, ImproperlyConfigured, ConnectionTimeout, SSLError from ..compat import urlencode class Urllib3HttpConnection(Connection): """ Default connection class using the `urllib3` library and the http protocol. :arg http_auth: optional http auth information as either ':' separated string or a tuple :arg use_ssl: use ssl for the connection if `True` :arg verify_certs: whether to verify SSL certificates :arg ca_certs: optional path to CA bundle. See http://urllib3.readthedocs.org/en/latest/security.html#using-certifi-with-urllib3 for instructions how to get default set :arg client_cert: path to the file containing the private key and the certificate :arg maxsize: the maximum number of connections which will be kept open to this host. """ def __init__(self, host='localhost', port=9200, http_auth=None, use_ssl=False, verify_certs=False, ca_certs=None, client_cert=None, maxsize=10, **kwargs): super(Urllib3HttpConnection, self).__init__(host=host, port=port, **kwargs) self.headers = {} if http_auth is not None: if isinstance(http_auth, (tuple, list)): http_auth = ':'.join(http_auth) self.headers = urllib3.make_headers(basic_auth=http_auth) pool_class = urllib3.HTTPConnectionPool kw = {} if use_ssl: pool_class = urllib3.HTTPSConnectionPool if verify_certs: kw['cert_reqs'] = 'CERT_REQUIRED' kw['ca_certs'] = ca_certs kw['cert_file'] = client_cert elif ca_certs: raise ImproperlyConfigured("You cannot pass CA certificates when verify SSL is off.") else: warnings.warn( 'Connecting to %s using SSL with verify_certs=False is insecure.' 
% host) self.pool = pool_class(host, port=port, timeout=self.timeout, maxsize=maxsize, **kw) def perform_request(self, method, url, params=None, body=None, timeout=None, ignore=()): url = self.url_prefix + url if params: url = '%s?%s' % (url, urlencode(params)) full_url = self.host + url start = time.time() try: kw = {} if timeout: kw['timeout'] = timeout # in python2 we need to make sure the url and method are not # unicode. Otherwise the body will be decoded into unicode too and # that will fail (#133, #201). if not isinstance(url, str): url = url.encode('utf-8') if not isinstance(method, str): method = method.encode('utf-8') response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw) duration = time.time() - start raw_data = response.data.decode('utf-8') except UrllibSSLError as e: self.log_request_fail(method, full_url, body, time.time() - start, exception=e) raise SSLError('N/A', str(e), e) except ReadTimeoutError as e: self.log_request_fail(method, full_url, body, time.time() - start, exception=e) raise ConnectionTimeout('TIMEOUT', str(e), e) except Exception as e: self.log_request_fail(method, full_url, body, time.time() - start, exception=e) raise ConnectionError('N/A', str(e), e) if not (200 <= response.status < 300) and response.status not in ignore: self.log_request_fail(method, url, body, duration, response.status) self._raise_error(response.status, raw_data) self.log_request_success(method, full_url, url, body, response.status, raw_data, duration) return response.status, response.getheaders(), raw_data elasticsearch-py-1.6.0/elasticsearch/connection/memcached.py000066400000000000000000000054761253570341300242260ustar00rootroot00000000000000import time try: import simplejson as json except ImportError: import json from ..exceptions import TransportError, ConnectionError, ImproperlyConfigured from ..compat import urlencode from .pooling import PoolingConnection class MemcachedConnection(PoolingConnection): """ Client using the `pylibmc` python library to communicate with elasticsearch using the memcached protocol. Requires plugin in the cluster. See https://github.com/elasticsearch/elasticsearch-transport-memcached for more details. 
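    A minimal usage sketch (assuming the transport-memcached plugin is installed
    in the cluster and ``pylibmc`` is available locally; host and port are
    illustrative)::

        from elasticsearch import Elasticsearch
        from elasticsearch.connection import MemcachedConnection

        client = Elasticsearch(
            [{'host': 'localhost', 'port': 11211}],
            connection_class=MemcachedConnection,
        )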
""" transport_schema = 'memcached' method_map = { 'PUT': 'set', 'POST': 'set', 'DELETE': 'delete', 'HEAD': 'get', 'GET': 'get', } def __init__(self, host='localhost', port=11211, **kwargs): try: import pylibmc except ImportError: raise ImproperlyConfigured("You need to install pylibmc to use the MemcachedConnection class.") super(MemcachedConnection, self).__init__(host=host, port=port, **kwargs) self._make_connection = lambda: pylibmc.Client(['%s:%s' % (host, port)], behaviors={"tcp_nodelay": True}) def perform_request(self, method, url, params=None, body=None, timeout=None, ignore=()): mc = self._get_connection() url = self.url_prefix + url if params: url = '%s?%s' % (url, urlencode(params or {})) full_url = self.host + url mc_method = self.method_map.get(method, 'get') start = time.time() try: status = 200 if mc_method == 'set': # no response from set commands response = '' if not json.dumps(mc.set(url, body)): status = 500 else: response = mc.get(url) duration = time.time() - start if response: response = response.decode('utf-8') except Exception as e: self.log_request_fail(method, full_url, body, time.time() - start, exception=e) raise ConnectionError('N/A', str(e), e) finally: self._release_connection(mc) # try not to load the json every time if response and response[0] == '{' and ('"status"' in response or '"error"' in response): data = json.loads(response) if 'status' in data and isinstance(data['status'], int): status = data['status'] elif 'error' in data: raise TransportError('N/A', data['error']) if not (200 <= status < 300) and status not in ignore: self.log_request_fail(method, url, body, duration, status) self._raise_error(status, response) self.log_request_success(method, full_url, url, body, status, response, duration) return status, {}, response elasticsearch-py-1.6.0/elasticsearch/connection/pooling.py000066400000000000000000000010351253570341300237520ustar00rootroot00000000000000try: import queue except ImportError: import Queue as queue from .base import Connection class PoolingConnection(Connection): def __init__(self, *args, **kwargs): self._free_connections = queue.Queue() super(PoolingConnection, self).__init__(*args, **kwargs) def _get_connection(self): try: return self._free_connections.get_nowait() except queue.Empty: return self._make_connection() def _release_connection(self, con): self._free_connections.put(con) elasticsearch-py-1.6.0/elasticsearch/connection/thrift.py000066400000000000000000000073201253570341300236060ustar00rootroot00000000000000from __future__ import absolute_import from socket import timeout as SocketTimeout from socket import error as SocketError import time import logging try: from .esthrift import Rest from .esthrift.ttypes import Method, RestRequest from thrift.transport import TTransport, TSocket, TSSLSocket from thrift.protocol import TBinaryProtocol from thrift.Thrift import TException THRIFT_AVAILABLE = True except ImportError: THRIFT_AVAILABLE = False from ..exceptions import ConnectionError, ImproperlyConfigured, ConnectionTimeout from .pooling import PoolingConnection logger = logging.getLogger('elasticsearch') class ThriftConnection(PoolingConnection): """ Connection using the `thrift` protocol to communicate with elasticsearch. See https://github.com/elasticsearch/elasticsearch-transport-thrift for additional info. 
""" transport_schema = 'thrift' def __init__(self, host='localhost', port=9500, framed_transport=False, use_ssl=False, **kwargs): """ :arg framed_transport: use `TTransport.TFramedTransport` instead of `TTransport.TBufferedTransport` """ if not THRIFT_AVAILABLE: raise ImproperlyConfigured("Thrift is not available.") super(ThriftConnection, self).__init__(host=host, port=port, **kwargs) self._framed_transport = framed_transport self._tsocket_class = TSocket.TSocket if use_ssl: self._tsocket_class = TSSLSocket.TSSLSocket self._tsocket_args = (host, port) def _make_connection(self): socket = self._tsocket_class(*self._tsocket_args) socket.setTimeout(self.timeout * 1000.0) if self._framed_transport: transport = TTransport.TFramedTransport(socket) else: transport = TTransport.TBufferedTransport(socket) protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport) client = Rest.Client(protocol) client.transport = transport transport.open() return client def perform_request(self, method, url, params=None, body=None, timeout=None, ignore=()): request = RestRequest(method=Method._NAMES_TO_VALUES[method.upper()], uri=url, parameters=params, body=body) start = time.time() tclient = None try: tclient = self._get_connection() response = tclient.execute(request) duration = time.time() - start except SocketTimeout as e: self.log_request_fail(method, url, body, time.time() - start, exception=e) raise ConnectionTimeout('TIMEOUT', str(e), e) except (TException, SocketError) as e: self.log_request_fail(method, url, body, time.time() - start, exception=e) if tclient: try: # try closing transport socket tclient.transport.close() except Exception as e: logger.warning( 'Exception %s occured when closing a failed thrift connection.', e, exc_info=True ) raise ConnectionError('N/A', str(e), e) self._release_connection(tclient) if not (200 <= response.status < 300) and response.status not in ignore: self.log_request_fail(method, url, body, duration, response.status) self._raise_error(response.status, response.body) self.log_request_success(method, url, url, body, response.status, response.body, duration) headers = {} if response.headers: headers = dict((k.lower(), v) for k, v in response.headers.items()) return response.status, headers, response.body or '' elasticsearch-py-1.6.0/elasticsearch/connection_pool.py000066400000000000000000000226071253570341300233440ustar00rootroot00000000000000import time import random import logging try: from Queue import PriorityQueue, Empty except ImportError: from queue import PriorityQueue, Empty from .exceptions import ImproperlyConfigured logger = logging.getLogger('elasticsearch') class ConnectionSelector(object): """ Simple class used to select a connection from a list of currently live connection instances. In init time it is passed a dictionary containing all the connections' options which it can then use during the selection process. When the `select` method is called it is given a list of *currently* live connections to choose from. The options dictionary is the one that has been passed to :class:`~elasticsearch.Transport` as `hosts` param and the same that is used to construct the Connection object itself. When the Connection was created from information retrieved from the cluster via the sniffing process it will be the dictionary returned by the `host_info_callback`. Example of where this would be useful is a zone-aware selector that would only select connections from it's own zones and only fall back to other connections where there would be none in it's zones. 
""" def __init__(self, opts): """ :arg opts: dictionary of connection instances and their options """ self.connection_opts = opts def select(self, connections): """ Select a connection from the given list. :arg connections: list of live connections to choose from """ pass class RandomSelector(ConnectionSelector): """ Select a connection at random """ def select(self, connections): return random.choice(connections) class RoundRobinSelector(ConnectionSelector): """ Selector using round-robin. """ def __init__(self, opts): super(RoundRobinSelector, self).__init__(opts) self.rr = -1 def select(self, connections): self.rr += 1 self.rr %= len(connections) return connections[self.rr] class ConnectionPool(object): """ Container holding the :class:`~elasticsearch.Connection` instances, managing the selection process (via a :class:`~elasticsearch.ConnectionSelector`) and dead connections. It's only interactions are with the :class:`~elasticsearch.Transport` class that drives all the actions within `ConnectionPool`. Initially connections are stored on the class as a list and, along with the connection options, get passed to the `ConnectionSelector` instance for future reference. Upon each request the `Transport` will ask for a `Connection` via the `get_connection` method. If the connection fails (it's `perform_request` raises a `ConnectionError`) it will be marked as dead (via `mark_dead`) and put on a timeout (if it fails N times in a row the timeout is exponentially longer - the formula is `default_timeout * 2 ** (fail_count - 1)`). When the timeout is over the connection will be resurrected and returned to the live pool. A connection that has been peviously marked as dead and succeedes will be marked as live (it's fail count will be deleted). """ def __init__(self, connections, dead_timeout=60, timeout_cutoff=5, selector_class=RoundRobinSelector, randomize_hosts=True, **kwargs): """ :arg connections: list of tuples containing the :class:`~elasticsearch.Connection` instance and it's options :arg dead_timeout: number of seconds a connection should be retired for after a failure, increases on consecutive failures :arg timeout_cutoff: number of consecutive failures after which the timeout doesn't increase :arg selector_class: :class:`~elasticsearch.ConnectionSelector` subclass to use if more than one connection is live :arg randomize_hosts: shuffle the list of connections upon arrival to avoid dog piling effect across processes """ if not connections: raise ImproperlyConfigured("No defined connections, you need to " "specify at least one host.") self.connection_opts = connections self.connections = [c for (c, opts) in connections] # remember original connection list for resurrect(force=True) self.orig_connections = tuple(self.connections) # PriorityQueue for thread safety and ease of timeout management self.dead = PriorityQueue(len(self.connections)) self.dead_count = {} if randomize_hosts: # randomize the connection list to avoid all clients hitting same node # after startup/restart random.shuffle(self.connections) # default timeout after which to try resurrecting a connection self.dead_timeout = dead_timeout self.timeout_cutoff = timeout_cutoff self.selector = selector_class(dict(connections)) def mark_dead(self, connection, now=None): """ Mark the connection as dead (failed). Remove it from the live pool and put it on a timeout. 
:arg connection: the failed instance """ # allow injection for testing purposes now = now if now else time.time() try: self.connections.remove(connection) except ValueError: # connection not alive or another thread marked it already, ignore return else: dead_count = self.dead_count.get(connection, 0) + 1 self.dead_count[connection] = dead_count timeout = self.dead_timeout * 2 ** min(dead_count - 1, self.timeout_cutoff) self.dead.put((now + timeout, connection)) logger.warning( 'Connection %r has failed %i times in a row, putting on %i second timeout.', connection, dead_count, timeout ) def mark_live(self, connection): """ Mark the connection as healthy after a resurrection. Resets the fail counter for the connection. :arg connection: the connection to redeem """ try: del self.dead_count[connection] except KeyError: # race condition, safe to ignore pass def resurrect(self, force=False): """ Attempt to resurrect a connection from the dead pool. It will try to locate one (not all) eligible (its timeout is over) connection to return to the live pool. Any resurrected connection is also returned. :arg force: resurrect a connection even if there is none eligible (used when we have no live connections). If force is specified resurrect always returns a connection. """ # no dead connections if self.dead.empty(): # we are forced to return a connection, take one from the original # list. This is to avoid a race condition where get_connection can # see no live connections but when it calls resurrect self.dead is # also empty. We assume that other thread has resurrected all # available connections so we can safely return one at random. if force: return random.choice(self.orig_connections) return try: # retrieve a connection to check timeout, connection = self.dead.get(block=False) except Empty: # other thread has been faster and the queue is now empty. If we # are forced, return a connection at random again. if force: return random.choice(self.orig_connections) return if not force and timeout > time.time(): # return it back if not eligible and not forced self.dead.put((timeout, connection)) return # either we were forced or the connection is eligible to be retried self.connections.append(connection) logger.info('Resurrecting connection %r (force=%s).', connection, force) return connection def get_connection(self): """ Return a connection from the pool using the `ConnectionSelector` instance. It tries to resurrect eligible connections, forces a resurrection when no connections are available and passes the list of live connections to the selector instance to choose from. Returns a connection instance.
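        A rough sketch of how the pool is typically driven (this mirrors the
        class description above; it is not the actual ``Transport``
        implementation)::

            connection = pool.get_connection()
            try:
                status, headers, raw_data = connection.perform_request(method, url)
            except ConnectionError:
                pool.mark_dead(connection)
                raise
            else:
                pool.mark_live(connection)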
""" self.resurrect() connections = self.connections[:] # no live nodes, resurrect one by force and return it if not connections: return self.resurrect(True) # only call selector if we have a selection if len(connections) > 1: return self.selector.select(self.connections) # only one connection, no need for a selector return connections[0] class DummyConnectionPool(ConnectionPool): def __init__(self, connections, **kwargs): if len(connections) != 1: raise ImproperlyConfigured("DummyConnectionPool needs exactly one " "connection defined.") # we need connection opts for sniffing logic self.connection_opts = connections self.connection = connections[0][0] self.connections = (self.connection, ) def get_connection(self): return self.connection def _noop(self, *args, **kwargs): pass mark_dead = mark_live = resurrect = _noop elasticsearch-py-1.6.0/elasticsearch/exceptions.py000066400000000000000000000055701253570341300223350ustar00rootroot00000000000000__all__ = [ 'ImproperlyConfigured', 'ElasticsearchException', 'SerializationError', 'TransportError', 'NotFoundError', 'ConflictError', 'RequestError', 'ConnectionError', 'SSLError', 'ConnectionTimeout' ] class ImproperlyConfigured(Exception): """ Exception raised when the config passed to the client is inconsistent or invalid. """ class ElasticsearchException(Exception): """ Base class for all exceptions raised by this package's operations (doesn't apply to :class:`~elasticsearch.ImproperlyConfigured`). """ class SerializationError(ElasticsearchException): """ Data passed in failed to serialize properly in the ``Serializer`` being used. """ class TransportError(ElasticsearchException): """ Exception raised when ES returns a non-OK (>=400) HTTP status code. Or when an actual connection error happens; in that case the ``status_code`` will be set to ``'N/A'``. """ @property def status_code(self): """ The HTTP status code of the response that precipitated the error or ``'N/A'`` if not applicable. """ return self.args[0] @property def error(self): """ A string error message. """ return self.args[1] @property def info(self): """ Dict of returned error info from ES, where available. """ return self.args[2] def __str__(self): return 'TransportError(%s, %r)' % (self.status_code, self.error) class ConnectionError(TransportError): """ Error raised when there was an exception while talking to ES. Original exception from the underlying :class:`~elasticsearch.Connection` implementation is available as ``.info.`` """ def __str__(self): return 'ConnectionError(%s) caused by: %s(%s)' % ( self.error, self.info.__class__.__name__, self.info) class SSLError(ConnectionError): """ Error raised when encountering SSL errors. """ class ConnectionTimeout(ConnectionError): """ A network timeout. Doesn't cause a node retry by default. """ def __str__(self): return 'ConnectionTimeout caused by - %s(%s)' % ( self.info.__class__.__name__, self.info) class NotFoundError(TransportError): """ Exception representing a 404 status code. """ class ConflictError(TransportError): """ Exception representing a 409 status code. """ class RequestError(TransportError): """ Exception representing a 400 status code. """ class AuthenticationException(TransportError): """ Exception representing a 401 status code. """ class AuthorizationException(TransportError): """ Exception representing a 403 status code. 
""" # more generic mappings from status_code to python exceptions HTTP_EXCEPTIONS = { 400: RequestError, 401: AuthenticationException, 403: AuthorizationException, 404: NotFoundError, 409: ConflictError, } elasticsearch-py-1.6.0/elasticsearch/helpers/000077500000000000000000000000001253570341300212355ustar00rootroot00000000000000elasticsearch-py-1.6.0/elasticsearch/helpers/__init__.py000066400000000000000000000265711253570341300233610ustar00rootroot00000000000000import logging from itertools import islice from operator import methodcaller from ..exceptions import ElasticsearchException, TransportError from ..compat import map logger = logging.getLogger('elasticsearch.helpers') class BulkIndexError(ElasticsearchException): @property def errors(self): """ List of errors from execution of the last chunk. """ return self.args[1] class ScanError(ElasticsearchException): pass def expand_action(data): """ From one document or action definition passed in by the user extract the action/data lines needed for elasticsearch's :meth:`~elasticsearch.Elasticsearch.bulk` api. """ # make sure we don't alter the action data = data.copy() op_type = data.pop('_op_type', 'index') action = {op_type: {}} for key in ('_index', '_parent', '_percolate', '_routing', '_timestamp', '_ttl', '_type', '_version', '_version_type', '_id', '_retry_on_conflict'): if key in data: action[op_type][key] = data.pop(key) # no data payload for delete if op_type == 'delete': return action, None return action, data.get('_source', data) def streaming_bulk(client, actions, chunk_size=500, raise_on_error=True, expand_action_callback=expand_action, raise_on_exception=True, **kwargs): """ Streaming bulk consumes actions from the iterable passed in and yields results per action. For non-streaming usecases use :func:`~elasticsearch.helpers.bulk` which is a wrapper around streaming bulk that returns summary information about the bulk operation once the entire input is consumed and sent. This function expects the action to be in the format as returned by :meth:`~elasticsearch.Elasticsearch.search`, for example:: { '_index': 'index-name', '_type': 'document', '_id': 42, '_parent': 5, '_ttl': '1d', '_source': { ... } } Alternatively, if `_source` is not present, it will pop all metadata fields from the doc and use the rest as the document data. If you wish to perform other operations, like `delete` or `update` use the `_op_type` field in your actions (`_op_type` defaults to `index`):: { '_op_type': 'delete', '_index': 'index-name', '_type': 'document', '_id': 42, } { '_op_type': 'update', '_index': 'index-name', '_type': 'document', '_id': 42, 'doc': {'question': 'The life, universe and everything.'} } :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use :arg actions: iterable containing the actions to be executed :arg chunk_size: number of docs in one chunk sent to es (default: 500) :arg raise_on_error: raise ``BulkIndexError`` containing errors (as `.errors`) from the execution of the last chunk when some occur. By default we raise. :arg raise_on_exception: if ``False`` then don't propagate exceptions from call to ``bulk`` and just report the items that failed as failed. :arg expand_action_callback: callback executed on each action passed in, should return a tuple containing the action line and the data line (`None` if data line should be omitted). 
""" actions = map(expand_action_callback, actions) # if raise on error is set, we need to collect errors per chunk before raising them errors = [] while True: chunk = islice(actions, chunk_size) # raise on exception means we might need to iterate on chunk twice if not raise_on_exception: chunk = list(chunk) bulk_actions = [] for action, data in chunk: bulk_actions.append(action) if data is not None: bulk_actions.append(data) if not bulk_actions: return try: # send the actual request resp = client.bulk(bulk_actions, **kwargs) except TransportError as e: # default behavior - just propagate exception if raise_on_exception: raise e # if we are not propagating, mark all actions in current chunk as failed err_message = str(e) exc_errors = [] for action, data in chunk: info = {"error": err_message, "status": e.status_code, "exception": e, "data": data} op_type, action = action.popitem() info.update(action) exc_errors.append({op_type: info}) # emulate standard behavior for failed actions if raise_on_error: raise BulkIndexError('%i document(s) failed to index.' % len(exc_errors), exc_errors) else: for err in exc_errors: yield False, err continue # go through request-reponse pairs and detect failures for op_type, item in map(methodcaller('popitem'), resp['items']): ok = 200 <= item.get('status', 500) < 300 if not ok and raise_on_error: errors.append({op_type: item}) if not errors: # if we are not just recording all errors to be able to raise # them all at once, yield items individually yield ok, {op_type: item} if errors: raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors) def bulk(client, actions, stats_only=False, **kwargs): """ Helper for the :meth:`~elasticsearch.Elasticsearch.bulk` api that provides a more human friendly interface - it consumes an iterator of actions and sends them to elasticsearch in chunks. It returns a tuple with summary information - number of successfully executed actions and either list of errors or number of errors if `stats_only` is set to `True`. See :func:`~elasticsearch.helpers.streaming_bulk` for more information and accepted formats. :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use :arg actions: iterator containing the actions :arg stats_only: if `True` only report number of successful/failed operations instead of just number of successful and a list of error responses Any additional keyword arguments will be passed to :func:`~elasticsearch.helpers.streaming_bulk` which is used to execute the operation. """ success, failed = 0, 0 # list of errors to be collected is not stats_only errors = [] for ok, item in streaming_bulk(client, actions, **kwargs): # go through request-reponse pairs and detect failures if not ok: if not stats_only: errors.append(item) failed += 1 else: success += 1 return success, failed if stats_only else errors # preserve the name for backwards compatibility bulk_index = bulk def scan(client, query=None, scroll='5m', raise_on_error=True, preserve_order=False, **kwargs): """ Simple abstraction on top of the :meth:`~elasticsearch.Elasticsearch.scroll` api - a simple iterator that yields all hits as returned by underlining scroll requests. By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use ``preserve_order=True``. This may be an expensive operation and will negate the performance benefits of using ``scan``. 
:arg client: instance of :class:`~elasticsearch.Elasticsearch` to use :arg query: body for the :meth:`~elasticsearch.Elasticsearch.search` api :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg raise_on_error: raises an exception (``ScanError``) if an error is encountered (some shards fail to execute). By default we raise. :arg preserve_order: don't set the ``search_type`` to ``scan`` - this will cause the scroll to paginate with preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results, use with caution. Any additional keyword arguments will be passed to the initial :meth:`~elasticsearch.Elasticsearch.search` call:: scan(es, query={"match": {"title": "python"}}, index="orders-*", doc_type="books" ) """ if not preserve_order: kwargs['search_type'] = 'scan' # initial search resp = client.search(body=query, scroll=scroll, **kwargs) scroll_id = resp.get('_scroll_id') if scroll_id is None: return first_run = True while True: # if we didn't set search_type to scan initial search contains data if preserve_order and first_run: first_run = False else: resp = client.scroll(scroll_id, scroll=scroll) for hit in resp['hits']['hits']: yield hit # check if we have any errrors if resp["_shards"]["failed"]: logger.warning( 'Scrol request has failed on %d shards out of %d.', resp['_shards']['failed'], resp['_shards']['total'] ) if raise_on_error: raise ScanError( 'Scrol request has failed on %d shards out of %d.', resp['_shards']['failed'], resp['_shards']['total'] ) scroll_id = resp.get('_scroll_id') # end of scroll if scroll_id is None or not resp['hits']['hits']: break def reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=500, scroll='5m', scan_kwargs={}, bulk_kwargs={}): """ Reindex all documents from one index that satisfy a given query to another, potentially (if `target_client` is specified) on a different cluster. If you don't specify the query you will reindex all the documents. .. note:: This helper doesn't transfer mappings, just the data. 
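    A minimal usage sketch (index names and the target cluster address are
    illustrative only)::

        from elasticsearch import Elasticsearch
        from elasticsearch.helpers import reindex

        client = Elasticsearch()
        target = Elasticsearch(['target-host:9200'])
        success, failed = reindex(client, 'old-index', 'new-index',
                                  target_client=target)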
:arg client: instance of :class:`~elasticsearch.Elasticsearch` to use (for read if `target_client` is specified as well) :arg source_index: index (or list of indices) to read documents from :arg target_index: name of the index in the target cluster to populate :arg query: body for the :meth:`~elasticsearch.Elasticsearch.search` api :arg target_client: optional, is specified will be used for writing (thus enabling reindex between clusters) :arg chunk_size: number of docs in one chunk sent to es (default: 500) :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg scan_kwargs: additional kwargs to be passed to :func:`~elasticsearch.helpers.scan` :arg bulk_kwargs: additional kwargs to be passed to :func:`~elasticsearch.helpers.bulk` """ target_client = client if target_client is None else target_client docs = scan(client, query=query, index=source_index, scroll=scroll, **scan_kwargs) def _change_doc_index(hits, index): for h in hits: h['_index'] = index yield h kwargs = { 'stats_only': True, } kwargs.update(bulk_kwargs) return bulk(target_client, _change_doc_index(docs, target_index), chunk_size=chunk_size, **kwargs) elasticsearch-py-1.6.0/elasticsearch/helpers/test.py000066400000000000000000000034021253570341300225650ustar00rootroot00000000000000import time import os try: # python 2.6 from unittest2 import TestCase, SkipTest except ImportError: from unittest import TestCase, SkipTest from elasticsearch import Elasticsearch from elasticsearch.exceptions import ConnectionError def get_test_client(nowait=False): # construct kwargs from the environment kw = {} if 'TEST_ES_CONNECTION' in os.environ: from elasticsearch import connection kw['connection_class'] = getattr(connection, os.environ['TEST_ES_CONNECTION']) client = Elasticsearch([os.environ.get('TEST_ES_SERVER', {})], **kw) # wait for yellow status for _ in range(1 if nowait else 100): try: client.cluster.health(wait_for_status='yellow') return client except ConnectionError: time.sleep(.1) else: # timeout raise SkipTest("Elasticsearch failed to start.") def _get_version(version_string): if '.' not in version_string: return () version = version_string.strip().split('.') return tuple(int(v) if v.isdigit() else 999 for v in version) class ElasticsearchTestCase(TestCase): @staticmethod def _get_client(): return get_test_client() @classmethod def setUpClass(cls): super(ElasticsearchTestCase, cls).setUpClass() cls.client = cls._get_client() def tearDown(self): super(ElasticsearchTestCase, self).tearDown() self.client.indices.delete(index='*') self.client.indices.delete_template(name='*', ignore=404) @property def es_version(self): if not hasattr(self, '_es_version'): version_string = self.client.info()['version']['number'] self._es_version = _get_version(version_string) return self._es_version elasticsearch-py-1.6.0/elasticsearch/serializer.py000066400000000000000000000042101253570341300223130ustar00rootroot00000000000000try: import simplejson as json except ImportError: import json from datetime import date, datetime from decimal import Decimal from .exceptions import SerializationError, ImproperlyConfigured from .compat import string_types class TextSerializer(object): mimetype = 'text/plain' def loads(self, s): return s def dumps(self, data): if isinstance(data, string_types): return data raise SerializationError('Cannot serialize %r into text.' 
% data) class JSONSerializer(object): mimetype = 'application/json' def default(self, data): if isinstance(data, (date, datetime)): return data.isoformat() elif isinstance(data, Decimal): return float(data) raise TypeError("Unable to serialize %r (type: %s)" % (data, type(data))) def loads(self, s): try: return json.loads(s) except (ValueError, TypeError) as e: raise SerializationError(s, e) def dumps(self, data): # don't serialize strings if isinstance(data, string_types): return data try: return json.dumps(data, default=self.default) except (ValueError, TypeError) as e: raise SerializationError(data, e) DEFAULT_SERIALIZERS = { JSONSerializer.mimetype: JSONSerializer(), TextSerializer.mimetype: TextSerializer(), } class Deserializer(object): def __init__(self, serializers, default_mimetype='application/json'): try: self.default = serializers[default_mimetype] except KeyError: raise ImproperlyConfigured('Cannot find default serializer (%s)' % default_mimetype) self.serializers = serializers def loads(self, s, mimetype=None): if not mimetype: deserializer = self.default else: # split out charset mimetype = mimetype.split(';', 1)[0] try: deserializer = self.serializers[mimetype] except KeyError: raise SerializationError('Unknown mimetype, unable to deserialize: %s' % mimetype) return deserializer.loads(s) elasticsearch-py-1.6.0/elasticsearch/transport.py000066400000000000000000000344161253570341300222110ustar00rootroot00000000000000import re import time from itertools import chain from .connection import Urllib3HttpConnection from .connection_pool import ConnectionPool, DummyConnectionPool from .serializer import JSONSerializer, Deserializer, DEFAULT_SERIALIZERS from .exceptions import ConnectionError, TransportError, SerializationError, \ ConnectionTimeout, ImproperlyConfigured # get ip/port from "inet[wind/127.0.0.1:9200]" ADDRESS_RE = re.compile(r'/(?P[\.:0-9a-f]*):(?P[0-9]+)\]?$') def get_host_info(node_info, host): """ Simple callback that takes the node info from `/_cluster/nodes` and a parsed connection information and return the connection information. If `None` is returned this node will be skipped. Useful for filtering nodes (by proximity for example) or if additional information needs to be provided for the :class:`~elasticsearch.Connection` class. By default master only nodes are filtered out since they shouldn't typically be used for API operations. :arg node_info: node information from `/_cluster/nodes` :arg host: connection information (host, port) extracted from the node info """ attrs = node_info.get('attributes', {}) # ignore master only nodes if (attrs.get('data', 'true') == 'false' and attrs.get('client', 'false') == 'false' and attrs.get('master', 'true') == 'true'): return None return host class Transport(object): """ Encapsulation of transport-related to logic. Handles instantiation of the individual connections as well as creating a connection pool to hold them. Main interface is the `perform_request` method. 
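    Instances are normally created for you by :class:`~elasticsearch.Elasticsearch`;
    a minimal sketch of direct use (the host values are placeholders)::

        t = Transport([{'host': 'localhost', 'port': 9200}])
        status, data = t.perform_request('GET', '/_cluster/health')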
""" def __init__(self, hosts, connection_class=Urllib3HttpConnection, connection_pool_class=ConnectionPool, host_info_callback=get_host_info, sniff_on_start=False, sniffer_timeout=None, sniff_timeout=.1, sniff_on_connection_fail=False, serializer=JSONSerializer(), serializers=None, default_mimetype='application/json', max_retries=3, retry_on_status=(503, 504, ), retry_on_timeout=False, send_get_body_as='GET', **kwargs): """ :arg hosts: list of dictionaries, each containing keyword arguments to create a `connection_class` instance :arg connection_class: subclass of :class:`~elasticsearch.Connection` to use :arg connection_pool_class: subclass of :class:`~elasticsearch.ConnectionPool` to use :arg host_info_callback: callback responsible for taking the node information from `/_cluser/nodes`, along with already extracted information, and producing a list of arguments (same as `hosts` parameter) :arg sniff_on_start: flag indicating whether to obtain a list of nodes from the cluser at startup time :arg sniffer_timeout: number of seconds between automatic sniffs :arg sniff_on_connection_fail: flag controlling if connection failure triggers a sniff :arg sniff_timeout: timeout used for the sniff request - it should be a fast api call and we are talking potentially to more nodes so we want to fail quickly. Not used during initial sniffing (if ``sniff_on_start`` is on) when the connection still isn't initialized. :arg serializer: serializer instance :arg serializers: optional dict of serializer instances that will be used for deserializing data coming from the server. (key is the mimetype) :arg default_mimetype: when no mimetype is specified by the server response assume this mimetype, defaults to `'application/json'` :arg max_retries: maximum number of retries before an exception is propagated :arg retry_on_status: set of HTTP status codes on which we should retry on a different node. defaults to ``(503, 504, )`` :arg retry_on_timeout: should timeout trigger a retry on different node? (default `False`) :arg send_get_body_as: for GET requests with body this option allows you to specify an alternate way of execution for environments that don't support passing bodies with GET requests. If you set this to 'POST' a POST method will be used instead, if to 'source' then the body will be serialized and passed as a query parameter `source`. Any extra keyword arguments will be passed to the `connection_class` when creating and instance unless overriden by that connection's options provided as part of the hosts parameter. """ # serialization config _serializers = DEFAULT_SERIALIZERS.copy() # if a serializer has been specified, use it for deserialization as well _serializers[serializer.mimetype] = serializer # if custom serializers map has been supplied, override the defaults with it if serializers: _serializers.update(serializers) # create a deserializer with our config self.deserializer = Deserializer(_serializers, default_mimetype) self.max_retries = max_retries self.retry_on_timeout = retry_on_timeout self.retry_on_status = retry_on_status self.send_get_body_as = send_get_body_as # data serializer self.serializer = serializer # store all strategies... 
self.connection_pool_class = connection_pool_class self.connection_class = connection_class # ...save kwargs to be passed to the connections self.kwargs = kwargs self.hosts = hosts # ...and instantiate them self.set_connections(hosts) # retain the original connection instances for sniffing self.seed_connections = self.connection_pool.connections[:] # sniffing data self.sniffer_timeout = sniffer_timeout self.sniff_on_connection_fail = sniff_on_connection_fail self.last_sniff = time.time() self.sniff_timeout = sniff_timeout # callback to construct host dict from data in /_cluster/nodes self.host_info_callback = host_info_callback if sniff_on_start: self.sniff_hosts(True) def add_connection(self, host): """ Create a new :class:`~elasticsearch.Connection` instance and add it to the pool. :arg host: kwargs that will be used to create the instance """ self.hosts.append(host) self.set_connections(self.hosts) def set_connections(self, hosts): """ Instantiate all the connections and crate new connection pool to hold them. Tries to identify unchanged hosts and re-use existing :class:`~elasticsearch.Connection` instances. :arg hosts: same as `__init__` """ # construct the connections def _create_connection(host): # if this is not the initial setup look at the existing connection # options and identify connections that haven't changed and can be # kept around. if hasattr(self, 'connection_pool'): for (connection, old_host) in self.connection_pool.connection_opts: if old_host == host: return connection # previously unseen params, create new connection kwargs = self.kwargs.copy() kwargs.update(host) if 'scheme' in host and host['scheme'] != self.connection_class.transport_schema: raise ImproperlyConfigured( 'Scheme specified in connection (%s) is not the same as the connection class (%s) specifies (%s).' % ( host['scheme'], self.connection_class.__name__, self.connection_class.transport_schema )) return self.connection_class(**kwargs) connections = map(_create_connection, hosts) connections = list(zip(connections, hosts)) if len(connections) == 1: self.connection_pool = DummyConnectionPool(connections) else: # pass the hosts dicts to the connection pool to optionally extract parameters from self.connection_pool = self.connection_pool_class(connections, **self.kwargs) def get_connection(self): """ Retreive a :class:`~elasticsearch.Connection` instance from the :class:`~elasticsearch.ConnectionPool` instance. """ if self.sniffer_timeout: if time.time() >= self.last_sniff + self.sniffer_timeout: self.sniff_hosts() return self.connection_pool.get_connection() def sniff_hosts(self, initial=False): """ Obtain a list of nodes from the cluster and create a new connection pool using the information retrieved. To extract the node connection parameters use the ``nodes_to_host_callback``. 
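    Sniffing is usually configured through the constructor rather than called
    directly; for example (a sketch, the host names are placeholders)::

        Transport(
            [{'host': 'es1'}, {'host': 'es2'}],
            sniff_on_start=True,
            sniff_on_connection_fail=True,
            sniffer_timeout=60
        )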
:arg initial: flag indicating if this is during startup (``sniff_on_start``), ignore the ``sniff_timeout`` if ``True`` """ previous_sniff = self.last_sniff try: # reset last_sniff timestamp self.last_sniff = time.time() # go through all current connections as well as the # seed_connections for good measure for c in chain(self.connection_pool.connections, self.seed_connections): try: # use small timeout for the sniffing request, should be a fast api call _, headers, node_info = c.perform_request('GET', '/_nodes/_all/clear', timeout=self.sniff_timeout if not initial else None) node_info = self.deserializer.loads(node_info, headers.get('content-type')) break except (ConnectionError, SerializationError): pass else: raise TransportError("N/A", "Unable to sniff hosts.") except: # keep the previous value on error self.last_sniff = previous_sniff raise hosts = [] address = self.connection_class.transport_schema + '_address' for n in node_info['nodes'].values(): match = ADDRESS_RE.search(n.get(address, '')) if not match: continue host = match.groupdict() if 'port' in host: host['port'] = int(host['port']) host = self.host_info_callback(n, host) if host is not None: hosts.append(host) # we weren't able to get any nodes, maybe using an incompatible # transport_schema or host_info_callback blocked all - raise error. if not hosts: raise TransportError("N/A", "Unable to sniff hosts - no viable hosts found.") self.set_connections(hosts) def mark_dead(self, connection): """ Mark a connection as dead (failed) in the connection pool. If sniffing on failure is enabled this will initiate the sniffing process. :arg connection: instance of :class:`~elasticsearch.Connection` that failed """ # mark as dead even when sniffing to avoid hitting this host during the sniff process self.connection_pool.mark_dead(connection) if self.sniff_on_connection_fail: self.sniff_hosts() def perform_request(self, method, url, params=None, body=None): """ Perform the actual request. Retrieve a connection from the connection pool, pass all the information to it's perform_request method and return the data. If an exception was raised, mark the connection as failed and retry (up to `max_retries` times). If the operation was succesful and the connection used was previously marked as dead, mark it as live, resetting it's failure count. 
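    Per-request options such as ``ignore`` and ``request_timeout`` are picked
    out of ``params`` before the call is made; for example (the url is only
    illustrative)::

        transport.perform_request('GET', '/missing-index',
            params={'ignore': 404, 'request_timeout': 2})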
:arg method: HTTP method to use :arg url: absolute url (without host) to target :arg params: dictionary of query parameters, will be handed over to the underlying :class:`~elasticsearch.Connection` class for serialization :arg body: body of the request, will be serializes using serializer and passed to the connection """ if body is not None: body = self.serializer.dumps(body) # some clients or environments don't support sending GET with body if method in ('HEAD', 'GET') and self.send_get_body_as != 'GET': # send it as post instead if self.send_get_body_as == 'POST': method = 'POST' # or as source parameter elif self.send_get_body_as == 'source': if params is None: params = {} params['source'] = body body = None if body is not None: try: body = body.encode('utf-8') except (UnicodeDecodeError, AttributeError): # bytes/str - no need to re-encode pass ignore = () timeout = None if params: timeout = params.pop('request_timeout', None) ignore = params.pop('ignore', ()) if isinstance(ignore, int): ignore = (ignore, ) for attempt in range(self.max_retries + 1): connection = self.get_connection() try: status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout) except TransportError as e: retry = False if isinstance(e, ConnectionTimeout): retry = self.retry_on_timeout elif isinstance(e, ConnectionError): retry = True elif e.status_code in self.retry_on_status: retry = True if retry: # only mark as dead if we are retrying self.mark_dead(connection) # raise exception on last retry if attempt == self.max_retries: raise else: raise else: # connection didn't fail, confirm it's live status self.connection_pool.mark_live(connection) if data: data = self.deserializer.loads(data, headers.get('content-type')) return status, data elasticsearch-py-1.6.0/example/000077500000000000000000000000001253570341300164145ustar00rootroot00000000000000elasticsearch-py-1.6.0/example/README.rst000066400000000000000000000015531253570341300201070ustar00rootroot00000000000000Example code for `elasticsearch-py` =================================== This example code demonstrates the features and use patterns for the Python client. To run this example make sure you have elasticsearch running on port 9200, install additional dependencies (on top of `elasticsearch-py`):: pip install python-dateutil GitPython And now you can load the index (the index will be called `git`):: python load.py This will create an index with mappings and parse the git information of this repository and load all the commits into it. You can run some sample queries by running:: python queries.py Look at the `queries.py` file for querying example and `load.py` on examples on loading data into elasticsearch. Both `load` and `queries` set up logging so in `/tmp/es_trace.log` you will have a transcript of the commands being run in the curl format. 
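Both scripts connect to ``localhost:9200`` by default; if your cluster lives elsewhere you can, for example, adjust the client construction in `load.py` and `queries.py` (the host below is only a placeholder)::

    es = Elasticsearch(['http://example-host:9200'])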
elasticsearch-py-1.6.0/example/load.py000066400000000000000000000150741253570341300177140ustar00rootroot00000000000000#!/usr/bin/env python # -*- coding: utf-8 -*- from __future__ import print_function from os.path import dirname, basename, abspath from itertools import chain from datetime import datetime import logging import git from elasticsearch import Elasticsearch from elasticsearch.helpers import bulk, streaming_bulk def create_git_index(client, index): # create empty index client.indices.create( index=index, body={ 'settings': { # just one shard, no replicas for testing 'number_of_shards': 1, 'number_of_replicas': 0, # custom analyzer for analyzing file paths 'analysis': { 'analyzer': { 'file_path': { 'type': 'custom', 'tokenizer': 'path_hierarchy', 'filter': ['lowercase'] } } } } }, # ignore already existing index ignore=400 ) # we will use user on several places user_mapping = { 'properties': { 'name': { 'type': 'multi_field', 'fields': { 'raw': {'type' : 'string', 'index' : 'not_analyzed'}, 'name': {'type' : 'string'} } } } } client.indices.put_mapping( index=index, doc_type='repos', body={ 'repos': { 'properties': { 'owner': user_mapping, 'created_at': {'type': 'date'}, 'description': { 'type': 'string', 'analyzer': 'snowball', }, 'tags': { 'type': 'string', 'index': 'not_analyzed' } } } } ) client.indices.put_mapping( index=index, doc_type='commits', body={ 'commits': { '_parent': { 'type': 'repos' }, 'properties': { 'author': user_mapping, 'authored_date': {'type': 'date'}, 'committer': user_mapping, 'committed_date': {'type': 'date'}, 'parent_shas': {'type': 'string', 'index' : 'not_analyzed'}, 'description': {'type': 'string', 'analyzer': 'snowball'}, 'files': {'type': 'string', 'analyzer': 'file_path'} } } } ) def parse_commits(repo, name): """ Go through the git repository log and generate a document per commit containing all the metadata. """ for commit in repo.log(): yield { '_id': commit.id, '_parent': name, 'committed_date': datetime(*commit.committed_date[:6]), 'committer': { 'name': commit.committer.name, 'email': commit.committer.email, }, 'authored_date': datetime(*commit.authored_date[:6]), 'author': { 'name': commit.author.name, 'email': commit.author.email, }, 'description': commit.message, 'parent_shas': [p.id for p in commit.parents], # we only care about the filenames, not the per-file stats 'files': list(chain(commit.stats.files)), 'stats': commit.stats.total, } def load_repo(client, path=None, index='git'): """ Parse a git repository with all it's commits and load it into elasticsearch using `client`. If the index doesn't exist it will be created. 
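    For example (the path below is only an illustrative placeholder)::

        load_repo(es, path='/path/to/some/repo', index='git')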
""" path = dirname(dirname(abspath(__file__))) if path is None else path repo_name = basename(path) repo = git.Repo(path) create_git_index(client, index) # create the parent document in case it doesn't exist client.create( index=index, doc_type='repos', id=repo_name, body={}, ignore=409 # 409 - conflict - would be returned if the document is already there ) # we let the streaming bulk continuously process the commits as they come # in - since the `parse_commits` function is a generator this will avoid # loading all the commits into memory for ok, result in streaming_bulk( client, parse_commits(repo, repo_name), index=index, doc_type='commits', chunk_size=50 # keep the batch sizes small for appearances only ): action, result = result.popitem() doc_id = '/%s/commits/%s' % (index, result['_id']) # process the information from ES whether the document has been # successfully indexed if not ok: print('Failed to %s document %s: %r' % (action, doc_id, result)) else: print(doc_id) # we manually create es repo document and update elasticsearch-py to include metadata REPO_ACTIONS = [ {'_type': 'repos', '_id': 'elasticsearch', '_source': { 'owner': {'name': 'Shay Bannon', 'email': 'kimchy@gmail.com'}, 'created_at': datetime(2010, 2, 8, 15, 22, 27), 'tags': ['search', 'distributed', 'lucene'], 'description': 'You know, for search.'} }, {'_type': 'repos', '_id': 'elasticsearch-py', '_op_type': 'update', 'doc': { 'owner': {'name': 'Honza Král', 'email': 'honza.kral@gmail.com'}, 'created_at': datetime(2013, 5, 1, 16, 37, 32), 'tags': ['elasticsearch', 'search', 'python', 'client'], 'description': 'For searching snakes.'} }, ] if __name__ == '__main__': # get trace logger and set level tracer = logging.getLogger('elasticsearch.trace') tracer.setLevel(logging.INFO) tracer.addHandler(logging.FileHandler('/tmp/es_trace.log')) # instantiate es client, connects to localhost:9200 by default es = Elasticsearch() # we load the repo and all commits load_repo(es) # run the bulk operations success, _ = bulk(es, REPO_ACTIONS, index='git', raise_on_error=True) print('Performed %d actions' % success) # now we can retrieve the documents es_repo = es.get(index='git', doc_type='repos', id='elasticsearch') print('%s: %s' % (es_repo['_id'], es_repo['_source']['description'])) # update - add java to es tags es.update( index='git', doc_type='repos', id='elasticsearch', body={ "script" : "ctx._source.tags += tag", "params" : { "tag" : "java" } } ) # refresh to make the documents available for search es.indices.refresh(index='git') # and now we can count the documents print(es.count(index='git')['count'], 'documents in index') elasticsearch-py-1.6.0/example/queries.py000066400000000000000000000057671253570341300204620ustar00rootroot00000000000000#!/usr/bin/env python from __future__ import print_function import logging from dateutil.parser import parse as parse_date from elasticsearch import Elasticsearch def print_hits(results, facet_masks={}): " Simple utility function to print results of a search query. 
" print('=' * 80) print('Total %d found in %dms' % (results['hits']['total'], results['took'])) if results['hits']['hits']: print('-' * 80) for hit in results['hits']['hits']: # get created date for a repo and fallback to authored_date for a commit created_at = parse_date(hit['_source'].get('created_at', hit['_source']['authored_date'])) print('/%s/%s/%s (%s): %s' % ( hit['_index'], hit['_type'], hit['_id'], created_at.strftime('%Y-%m-%d'), hit['_source']['description'].replace('\n', ' '))) for facet, mask in facet_masks.items(): print('-' * 80) for d in results['facets'][facet]['terms']: print(mask % d) print('=' * 80) print() # get trace logger and set level tracer = logging.getLogger('elasticsearch.trace') tracer.setLevel(logging.INFO) tracer.addHandler(logging.FileHandler('/tmp/es_trace.log')) # instantiate es client, connects to localhost:9200 by default es = Elasticsearch() print('Empty search:') print_hits(es.search(index='git')) print('Find commits that says "fix" without touching tests:') result = es.search( index='git', doc_type='commits', body={ 'query': { 'filtered': { 'query': { 'match': {'description': 'fix'} }, 'filter': { 'not': { 'term': {'files': 'test_elasticsearch'} } } } } } ) print_hits(result) print('Last 8 Commits for elasticsearch-py:') result = es.search( index='git', doc_type='commits', body={ 'query': { 'filtered': { 'filter': { 'term': { # parent ref is stored as type#id '_parent': 'repos#elasticsearch-py' } } } }, 'sort': [ {'committed_date': {'order': 'desc'}} ], 'size': 8 } ) print_hits(result) print('Stats for top 10 python committers:') result = es.search( index='git', doc_type='commits', body={ 'size': 0, 'query': { 'filtered': { 'filter': { 'has_parent': { 'type': 'repos', 'query': { 'filtered': { 'filter': { 'term': { 'tags': 'python' } } } } } } } }, 'facets': { 'committers': { 'terms_stats': { 'key_field': 'committer.name.raw', 'value_field': 'stats.lines' } } } } ) print_hits(result, {'committers': '%(term)15s: %(count)3d commits changing %(total)6d lines'}) elasticsearch-py-1.6.0/setup.cfg000066400000000000000000000002151253570341300166000ustar00rootroot00000000000000[build_sphinx] source-dir = docs/ build-dir = docs/_build all_files = 1 [wheel] universal = 1 [bdist_rpm] requires = python python-urllib3 elasticsearch-py-1.6.0/setup.py000066400000000000000000000041301253570341300164710ustar00rootroot00000000000000# -*- coding: utf-8 -*- from os.path import join, dirname from setuptools import setup, find_packages import sys import os VERSION = (1, 6, 0) __version__ = VERSION __versionstr__ = '.'.join(map(str, VERSION)) f = open(join(dirname(__file__), 'README')) long_description = f.read().strip() f.close() install_requires = [ 'urllib3>=1.8, <2.0', ] tests_require = [ 'requests>=1.0.0, <3.0.0', 'nose', 'coverage', 'mock', 'pyaml', 'nosexcover' ] # use external unittest for 2.6 if sys.version_info[:2] == (2, 6): install_requires.append('unittest2') if sys.version_info[0] == 2: # only require thrift if we are going to use it if os.environ.get('TEST_ES_CONNECTION', None) == 'ThriftConnection': tests_require.append('thrift==0.9.1') tests_require.append('pylibmc==1.4.1') setup( name = 'elasticsearch', description = "Python client for Elasticsearch", license="Apache License, Version 2.0", url = "https://github.com/elastic/elasticsearch-py", long_description = long_description, version = __versionstr__, author = "Honza Král", author_email = "honza.kral@gmail.com", packages=find_packages( where='.', exclude=('test_elasticsearch*', ) ), classifiers = [ 
"Development Status :: 5 - Production/Stable", "License :: OSI Approved :: Apache Software License", "Intended Audience :: Developers", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy", ], install_requires=install_requires, test_suite='test_elasticsearch.run_tests.run_all', tests_require=tests_require, ) elasticsearch-py-1.6.0/test_elasticsearch/000077500000000000000000000000001253570341300206325ustar00rootroot00000000000000elasticsearch-py-1.6.0/test_elasticsearch/README.rst000066400000000000000000000044101253570341300223200ustar00rootroot00000000000000elasticsearch-py test suite =========================== Warning - by default the tests will try and connect to `localhost:9200` and will destroy all contents of given cluster! The tests also rely on a checkout of `elasticsearch` repository existing on the same level as the `elasticsearch-py` clone. Before running the tests we will, by default, pull latest changes for that repo and perform `git reset --hard` to the exact version that was used to build the server we are running against. Running the tests ----------------- To simply run the tests just execute the `run_tests.py` script or invoke `python setup.py test`. The behavior is driven by environmental variables: * `TEST_ES_SERVER` - can contain "hostname[:port]" of running es cluster * `TEST_ES_CONNECTION` - name of the connection class to use from `elasticsearch.connection` module. If you want to run completely with your own see section on customizing tests. * `TEST_ES_YAML_DIR` - path to the yaml test suite contained in the elasticsearch repo. Defaults to `$TEST_ES_REPO/rest-api-spec/test` * `TEST_ES_REPO` - path to the elasticsearch repo, by default it will look in the same directory as `elasticsearch-py` is in. It will not be used if `TEST_ES_YAML_DIR` is specified directly. * `TEST_ES_NOFETCH` - controls if we should fetch new updates to elasticsearch repo and reset it's version to the sha used to build the current es server. Defaults to `False` which means we will fetch the elasticsearch repo and `git reset --hard` the sha used to build the server. Alternatively, if you wish to control what you are doing you have several additional options: * `run_tests.py` will pass any parameters specified to `nosetests` * you can just run your favorite runner in the `test_elasticsearch` directory (verified to work with nose and py.test) and bypass the fetch logic entirely. Customizing the tests --------------------- You can create a `local.py` file in the `test_elasticsearch` directory which should contain a `get_client` function. If this file exists the function will be used instead of `elasticsearch.helpers.test.get_test_client` to construct the client used for any integration tests. You can use this to make sure your plugins and extensions work with `elasticsearch-py`. 
elasticsearch-py-1.6.0/test_elasticsearch/__init__.py000066400000000000000000000000001253570341300227310ustar00rootroot00000000000000elasticsearch-py-1.6.0/test_elasticsearch/run_tests.py000077500000000000000000000042261253570341300232410ustar00rootroot00000000000000#!/usr/bin/env python from __future__ import print_function import sys from os import environ from os.path import dirname, join, pardir, abspath, exists import subprocess import nose def fetch_es_repo(): # user is manually setting YAML dir, don't tamper with it if 'TEST_ES_YAML_DIR' in environ: return repo_path = environ.get( 'TEST_ES_REPO', abspath(join(dirname(__file__), pardir, pardir, 'elasticsearch')) ) # no repo if not exists(repo_path) or not exists(join(repo_path, '.git')): print('No elasticsearch repo found...') # set YAML DIR to empty to skip yaml tests environ['TEST_ES_YAML_DIR'] = '' return # set YAML test dir environ['TEST_ES_YAML_DIR'] = join(repo_path, 'rest-api-spec', 'test') # fetching of yaml tests disabled, we'll run with what's there if environ.get('TEST_ES_NOFETCH', False): return from test_elasticsearch.test_server import get_client from test_elasticsearch.test_cases import SkipTest # find out the sha of the running es try: es = get_client() sha = es.info()['version']['build_hash'] except (SkipTest, KeyError): print('No running elasticsearch >1.X server...') return # fetch new commits to be sure... print('Fetching elasticsearch repo...') subprocess.check_call('cd %s && git fetch https://github.com/elasticsearch/elasticsearch.git' % repo_path, shell=True) # reset to the version fron info() subprocess.check_call('cd %s && git reset --hard %s' % (repo_path, sha), shell=True) def run_all(argv=None): sys.exitfunc = lambda: sys.stderr.write('Shutting down....\n') # fetch yaml tests fetch_es_repo() # always insert coverage when running tests if argv is None: argv = [ 'nosetests', '--with-xunit', '--with-xcoverage', '--cover-package=elasticsearch', '--cover-erase', '--logging-filter=elasticsearch', '--logging-level=DEBUG', '--verbose', ] nose.run_exit( argv=argv, defaultTest=abspath(dirname(__file__)) ) if __name__ == '__main__': run_all(sys.argv) elasticsearch-py-1.6.0/test_elasticsearch/test_cases.py000066400000000000000000000034151253570341300233440ustar00rootroot00000000000000from collections import defaultdict try: # python 2.6 from unittest2 import TestCase, SkipTest except ImportError: from unittest import TestCase, SkipTest from elasticsearch import Elasticsearch class DummyTransport(object): def __init__(self, hosts, responses=None, **kwargs): self.hosts = hosts self.responses = responses self.call_count = 0 self.calls = defaultdict(list) def perform_request(self, method, url, params=None, body=None): resp = 200, {} if self.responses: resp = self.responses[self.call_count] self.call_count += 1 self.calls[(method, url)].append((params, body)) return resp class ElasticsearchTestCase(TestCase): def setUp(self): super(ElasticsearchTestCase, self).setUp() self.client = Elasticsearch(transport_class=DummyTransport) def assert_call_count_equals(self, count): self.assertEquals(count, self.client.transport.call_count) def assert_url_called(self, method, url, count=1): self.assertIn((method, url), self.client.transport.calls) calls = self.client.transport.calls[(method, url)] self.assertEquals(count, len(calls)) return calls class TestElasticsearchTestCase(ElasticsearchTestCase): def test_our_transport_used(self): self.assertIsInstance(self.client.transport, DummyTransport) def test_start_with_0_call(self): 
self.assert_call_count_equals(0) def test_each_call_is_recorded(self): self.client.transport.perform_request('GET', '/') self.client.transport.perform_request('DELETE', '/42', params={}, body='body') self.assert_call_count_equals(2) self.assertEquals([({}, 'body')], self.assert_url_called('DELETE', '/42', 1)) elasticsearch-py-1.6.0/test_elasticsearch/test_client/000077500000000000000000000000001253570341300231475ustar00rootroot00000000000000elasticsearch-py-1.6.0/test_elasticsearch/test_client/__init__.py000066400000000000000000000056641253570341300252730ustar00rootroot00000000000000from __future__ import unicode_literals from elasticsearch.client import _normalize_hosts, Elasticsearch from ..test_cases import TestCase, ElasticsearchTestCase class TestNormalizeHosts(TestCase): def test_none_uses_defaults(self): self.assertEquals([{}], _normalize_hosts(None)) def test_strings_are_used_as_hostnames(self): self.assertEquals([{"host": "elastic.co"}], _normalize_hosts(["elastic.co"])) def test_strings_are_parsed_for_port_and_user(self): self.assertEquals( [{"host": "elastic.co", "port": 42}, {"host": "elastic.co", "http_auth": "user:secret"}], _normalize_hosts(["elastic.co:42", "user:secret@elastic.co"]) ) def test_strings_are_parsed_for_scheme(self): self.assertEquals( [ { "host": "elastic.co", "port": 42, "use_ssl": True, 'scheme': 'http' }, { "host": "elastic.co", "http_auth": "user:secret", "use_ssl": True, "port": 443, 'scheme': 'http', 'url_prefix': '/prefix' } ], _normalize_hosts(["https://elastic.co:42", "https://user:secret@elastic.co/prefix"]) ) def test_dicts_are_left_unchanged(self): self.assertEquals([{"host": "local", "extra": 123}], _normalize_hosts([{"host": "local", "extra": 123}])) def test_single_string_is_wrapped_in_list(self): self.assertEquals( [{"host": "elastic.co"}], _normalize_hosts("elastic.co") ) class TestClient(ElasticsearchTestCase): def test_request_timeout_is_passed_through_unescaped(self): self.client.ping(request_timeout=.1) calls = self.assert_url_called('HEAD', '/') self.assertEquals([({'request_timeout': .1}, None)], calls) def test_from_in_search(self): self.client.search(index='i', doc_type='t', from_=10) calls = self.assert_url_called('GET', '/i/t/_search') self.assertEquals([({'from': '10'}, None)], calls) def test_repr_contains_hosts(self): self.assertEquals('', repr(self.client)) def test_repr_contains_hosts_passed_in(self): self.assertIn("es.org", repr(Elasticsearch(['es.org:123']))) def test_repr_truncates_host_to_10(self): hosts = [{"host": "es" + str(i)} for i in range(20)] self.assertNotIn("es5", repr(Elasticsearch(hosts))) def test_index_uses_post_if_id_is_empty(self): self.client.index(index='my-index', doc_type='test-doc', id='', body={}) self.assert_url_called('POST', '/my-index/test-doc') def test_index_uses_put_if_id_is_not_empty(self): self.client.index(index='my-index', doc_type='test-doc', id=0, body={}) self.assert_url_called('PUT', '/my-index/test-doc/0') elasticsearch-py-1.6.0/test_elasticsearch/test_client/test_indices.py000066400000000000000000000016421253570341300262010ustar00rootroot00000000000000from test_elasticsearch.test_cases import ElasticsearchTestCase class TestIndices(ElasticsearchTestCase): def test_create_one_index(self): self.client.indices.create('test-index') self.assert_url_called('PUT', '/test-index') def test_delete_multiple_indices(self): self.client.indices.delete(['test-index', 'second.index', 'third/index']) self.assert_url_called('DELETE', '/test-index,second.index,third%2Findex') def test_exists_index(self): 
self.client.indices.exists('second.index,third/index') self.assert_url_called('HEAD', '/second.index,third%2Findex') def test_passing_empty_value_for_required_param_raises_exception(self): self.assertRaises(ValueError, self.client.indices.exists, index=None) self.assertRaises(ValueError, self.client.indices.exists, index=[]) self.assertRaises(ValueError, self.client.indices.exists, index='') elasticsearch-py-1.6.0/test_elasticsearch/test_client/test_utils.py000066400000000000000000000012131253570341300257150ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals from elasticsearch.client.utils import _make_path from elasticsearch.compat import PY2 from ..test_cases import TestCase, SkipTest class TestMakePath(TestCase): def test_handles_unicode(self): id = "中文" self.assertEquals('/some-index/type/%E4%B8%AD%E6%96%87', _make_path('some-index', 'type', id)) def test_handles_utf_encoded_string(self): if not PY2: raise SkipTest('Only relevant for py2') id = "中文".encode('utf-8') self.assertEquals('/some-index/type/%E4%B8%AD%E6%96%87', _make_path('some-index', 'type', id)) elasticsearch-py-1.6.0/test_elasticsearch/test_connection.py000066400000000000000000000237141253570341300244110ustar00rootroot00000000000000import re from mock import Mock, patch import urllib3 import warnings from requests.auth import AuthBase from elasticsearch.exceptions import TransportError, ConflictError, RequestError, NotFoundError from elasticsearch.connection import RequestsHttpConnection, \ Urllib3HttpConnection, THRIFT_AVAILABLE, ThriftConnection from .test_cases import TestCase, SkipTest class TestThriftConnection(TestCase): def setUp(self): if not THRIFT_AVAILABLE: raise SkipTest('Thrift is not available.') super(TestThriftConnection, self).setUp() def test_use_ssl_uses_ssl_socket(self): from thrift.transport import TSSLSocket con = ThriftConnection(use_ssl=True) self.assertIs(con._tsocket_class, TSSLSocket.TSSLSocket) def test_use_normal_tsocket_by_default(self): from thrift.transport import TSocket con = ThriftConnection() self.assertIs(con._tsocket_class, TSocket.TSocket) def test_timeout_set(self): con = ThriftConnection(timeout=42) self.assertEquals(42, con.timeout) class TestUrllib3Connection(TestCase): def test_timeout_set(self): con = Urllib3HttpConnection(timeout=42) self.assertEquals(42, con.timeout) def test_http_auth(self): con = Urllib3HttpConnection(http_auth='username:secret') self.assertEquals({'authorization': 'Basic dXNlcm5hbWU6c2VjcmV0'}, con.headers) def test_http_auth_tuple(self): con = Urllib3HttpConnection(http_auth=('username', 'secret')) self.assertEquals({'authorization': 'Basic dXNlcm5hbWU6c2VjcmV0'}, con.headers) def test_http_auth_list(self): con = Urllib3HttpConnection(http_auth=['username', 'secret']) self.assertEquals({'authorization': 'Basic dXNlcm5hbWU6c2VjcmV0'}, con.headers) def test_uses_https_if_specified(self): with warnings.catch_warnings(record=True) as w: con = Urllib3HttpConnection(use_ssl=True) self.assertEquals(1, len(w)) self.assertEquals('Connecting to localhost using SSL with verify_certs=False is insecure.', str(w[0].message)) self.assertIsInstance(con.pool, urllib3.HTTPSConnectionPool) def test_doesnt_use_https_if_not_specified(self): con = Urllib3HttpConnection() self.assertIsInstance(con.pool, urllib3.HTTPConnectionPool) class TestRequestsConnection(TestCase): def _get_mock_connection(self, connection_params={}, status_code=200, response_body='{}'): con = RequestsHttpConnection(**connection_params) def _dummy_send(*args, 
**kwargs): dummy_response = Mock() dummy_response.headers = {} dummy_response.status_code = status_code dummy_response.text = response_body dummy_response.request = args[0] dummy_response.cookies = {} _dummy_send.call_args = (args, kwargs) return dummy_response con.session.send = _dummy_send return con def _get_request(self, connection, *args, **kwargs): if 'body' in kwargs: kwargs['body'] = kwargs['body'].encode('utf-8') status, headers, data = connection.perform_request(*args, **kwargs) self.assertEquals(200, status) self.assertEquals('{}', data) timeout = kwargs.pop('timeout', connection.timeout) args, kwargs = connection.session.send.call_args self.assertEquals(timeout, kwargs['timeout']) self.assertEquals(1, len(args)) return args[0] def test_custom_http_auth_is_allowed(self): auth = AuthBase() c = RequestsHttpConnection(http_auth=auth) self.assertEquals(auth, c.session.auth) def test_timeout_set(self): con = RequestsHttpConnection(timeout=42) self.assertEquals(42, con.timeout) def test_use_https_if_specified(self): with warnings.catch_warnings(record=True) as w: con = self._get_mock_connection({'use_ssl': True, 'url_prefix': 'url'}) self.assertEquals(1, len(w)) self.assertEquals('Connecting to https://localhost:9200/url using SSL with verify_certs=False is insecure.', str(w[0].message)) request = self._get_request(con, 'GET', '/') self.assertEquals('https://localhost:9200/url/', request.url) self.assertEquals('GET', request.method) self.assertEquals(None, request.body) def test_http_auth(self): con = RequestsHttpConnection(http_auth='username:secret') self.assertEquals(('username', 'secret'), con.session.auth) def test_http_auth_tuple(self): con = RequestsHttpConnection(http_auth=('username', 'secret')) self.assertEquals(('username', 'secret'), con.session.auth) def test_http_auth_list(self): con = RequestsHttpConnection(http_auth=['username', 'secret']) self.assertEquals(('username', 'secret'), con.session.auth) def test_repr(self): con = self._get_mock_connection({"host": "elasticsearch.com", "port": 443}) self.assertEquals('', repr(con)) def test_conflict_error_is_returned_on_409(self): con = self._get_mock_connection(status_code=409) self.assertRaises(ConflictError, con.perform_request, 'GET', '/', {}, '') def test_not_found_error_is_returned_on_404(self): con = self._get_mock_connection(status_code=404) self.assertRaises(NotFoundError, con.perform_request, 'GET', '/', {}, '') def test_request_error_is_returned_on_400(self): con = self._get_mock_connection(status_code=400) self.assertRaises(RequestError, con.perform_request, 'GET', '/', {}, '') @patch('elasticsearch.connection.base.tracer') @patch('elasticsearch.connection.base.logger') def test_failed_request_logs_and_traces(self, logger, tracer): con = self._get_mock_connection(response_body='{"answer": 42}', status_code=500) self.assertRaises(TransportError, con.perform_request, 'GET', '/', {'param': 42}, '{}'.encode('utf-8')) # no trace request self.assertEquals(0, tracer.info.call_count) # no trace response self.assertEquals(0, tracer.debug.call_count) # log url and duration self.assertEquals(1, logger.warning.call_count) self.assertTrue(re.match( '^GET http://localhost:9200/\?param=42 \[status:500 request:0.[0-9]{3}s\]', logger.warning.call_args[0][0] % logger.warning.call_args[0][1:] )) @patch('elasticsearch.connection.base.tracer') @patch('elasticsearch.connection.base.logger') def test_success_logs_and_traces(self, logger, tracer): con = self._get_mock_connection(response_body='''{"answer": "that's it!"}''') status, 
headers, data = con.perform_request('GET', '/', {'param': 42}, '''{"question": "what's that?"}'''.encode('utf-8')) # trace request self.assertEquals(1, tracer.info.call_count) self.assertEquals( """curl -XGET 'http://localhost:9200/?pretty¶m=42' -d '{\n "question": "what\\u0027s that?"\n}'""", tracer.info.call_args[0][0] % tracer.info.call_args[0][1:] ) # trace response self.assertEquals(1, tracer.debug.call_count) self.assertTrue(re.match( '#\[200\] \(0.[0-9]{3}s\)\n#\{\n# "answer": "that\\\\u0027s it!"\n#\}', tracer.debug.call_args[0][0] % tracer.debug.call_args[0][1:] )) # log url and duration self.assertEquals(1, logger.info.call_count) self.assertTrue(re.match( 'GET http://localhost:9200/\?param=42 \[status:200 request:0.[0-9]{3}s\]', logger.info.call_args[0][0] % logger.info.call_args[0][1:] )) # log request body and response self.assertEquals(2, logger.debug.call_count) req, resp = logger.debug.call_args_list self.assertEquals( '> {"question": "what\'s that?"}', req[0][0] % req[0][1:] ) self.assertEquals( '< {"answer": "that\'s it!"}', resp[0][0] % resp[0][1:] ) def test_defaults(self): con = self._get_mock_connection() request = self._get_request(con, 'GET', '/') self.assertEquals('http://localhost:9200/', request.url) self.assertEquals('GET', request.method) self.assertEquals(None, request.body) def test_params_properly_encoded(self): con = self._get_mock_connection() request = self._get_request(con, 'GET', '/', params={'param': 'value with spaces'}) self.assertEquals('http://localhost:9200/?param=value+with+spaces', request.url) self.assertEquals('GET', request.method) self.assertEquals(None, request.body) def test_body_attached(self): con = self._get_mock_connection() request = self._get_request(con, 'GET', '/', body='{"answer": 42}') self.assertEquals('http://localhost:9200/', request.url) self.assertEquals('GET', request.method) self.assertEquals('{"answer": 42}'.encode('utf-8'), request.body) def test_http_auth_attached(self): con = self._get_mock_connection({'http_auth': 'username:secret'}) request = self._get_request(con, 'GET', '/') self.assertEquals(request.headers['authorization'], 'Basic dXNlcm5hbWU6c2VjcmV0') @patch('elasticsearch.connection.base.tracer') def test_url_prefix(self, tracer): con = self._get_mock_connection({"url_prefix": "/some-prefix/"}) request = self._get_request(con, 'GET', '/_search', body='{"answer": 42}', timeout=0.1) self.assertEquals('http://localhost:9200/some-prefix/_search', request.url) self.assertEquals('GET', request.method) self.assertEquals('{"answer": 42}'.encode('utf-8'), request.body) # trace request self.assertEquals(1, tracer.info.call_count) self.assertEquals( "curl -XGET 'http://localhost:9200/_search?pretty' -d '{\n \"answer\": 42\n}'", tracer.info.call_args[0][0] % tracer.info.call_args[0][1:] ) elasticsearch-py-1.6.0/test_elasticsearch/test_connection_pool.py000066400000000000000000000104651253570341300254410ustar00rootroot00000000000000import time from elasticsearch.connection_pool import ConnectionPool, RoundRobinSelector, DummyConnectionPool from elasticsearch.exceptions import ImproperlyConfigured from .test_cases import TestCase class TestConnectionPool(TestCase): def test_dummy_cp_raises_exception_on_more_connections(self): self.assertRaises(ImproperlyConfigured, DummyConnectionPool, []) self.assertRaises(ImproperlyConfigured, DummyConnectionPool, [object(), object()]) def test_raises_exception_when_no_connections_defined(self): self.assertRaises(ImproperlyConfigured, ConnectionPool, []) def 
test_default_round_robin(self): pool = ConnectionPool([(x, {}) for x in range(100)]) connections = set() for _ in range(100): connections.add(pool.get_connection()) self.assertEquals(connections, set(range(100))) def test_disable_shuffling(self): pool = ConnectionPool([(x, {}) for x in range(100)], randomize_hosts=False) connections = [] for _ in range(100): connections.append(pool.get_connection()) self.assertEquals(connections, list(range(100))) def test_selectors_have_access_to_connection_opts(self): class MySelector(RoundRobinSelector): def select(self, connections): return self.connection_opts[super(MySelector, self).select(connections)]["actual"] pool = ConnectionPool([(x, {"actual": x*x}) for x in range(100)], selector_class=MySelector, randomize_hosts=False) connections = [] for _ in range(100): connections.append(pool.get_connection()) self.assertEquals(connections, [x*x for x in range(100)]) def test_dead_nodes_are_removed_from_active_connections(self): pool = ConnectionPool([(x, {}) for x in range(100)]) now = time.time() pool.mark_dead(42, now=now) self.assertEquals(99, len(pool.connections)) self.assertEquals(1, pool.dead.qsize()) self.assertEquals((now + 60, 42), pool.dead.get()) def test_connection_is_skipped_when_dead(self): pool = ConnectionPool([(x, {}) for x in range(2)]) pool.mark_dead(0) self.assertEquals([1, 1, 1], [pool.get_connection(), pool.get_connection(), pool.get_connection(), ]) def test_connection_is_forcibly_resurrected_when_no_live_ones_are_availible(self): pool = ConnectionPool([(x, {}) for x in range(2)]) pool.dead_count[0] = 1 pool.mark_dead(0) # failed twice, longer timeout pool.mark_dead(1) # failed the first time, first to be resurrected self.assertEquals([], pool.connections) self.assertEquals(1, pool.get_connection()) self.assertEquals([1,], pool.connections) def test_connection_is_resurrected_after_its_timeout(self): pool = ConnectionPool([(x, {}) for x in range(100)]) now = time.time() pool.mark_dead(42, now=now-61) pool.get_connection() self.assertEquals(42, pool.connections[-1]) self.assertEquals(100, len(pool.connections)) def test_force_resurrect_always_returns_a_connection(self): pool = ConnectionPool([(0, {})]) pool.connections = [] self.assertEquals(0, pool.get_connection()) self.assertEquals([], pool.connections) self.assertTrue(pool.dead.empty()) def test_already_failed_connection_has_longer_timeout(self): pool = ConnectionPool([(x, {}) for x in range(100)]) now = time.time() pool.dead_count[42] = 2 pool.mark_dead(42, now=now) self.assertEquals(3, pool.dead_count[42]) self.assertEquals((now + 4*60, 42), pool.dead.get()) def test_timeout_for_failed_connections_is_limitted(self): pool = ConnectionPool([(x, {}) for x in range(100)]) now = time.time() pool.dead_count[42] = 245 pool.mark_dead(42, now=now) self.assertEquals(246, pool.dead_count[42]) self.assertEquals((now + 32*60, 42), pool.dead.get()) def test_dead_count_is_wiped_clean_for_connection_if_marked_live(self): pool = ConnectionPool([(x, {}) for x in range(100)]) now = time.time() pool.dead_count[42] = 2 pool.mark_dead(42, now=now) self.assertEquals(3, pool.dead_count[42]) pool.mark_live(42) self.assertNotIn(42, pool.dead_count) elasticsearch-py-1.6.0/test_elasticsearch/test_serializer.py000066400000000000000000000046211253570341300244170ustar00rootroot00000000000000# -*- coding: utf-8 -*- import sys from datetime import datetime from decimal import Decimal from elasticsearch.serializer import JSONSerializer, Deserializer, DEFAULT_SERIALIZERS, TextSerializer from 
elasticsearch.exceptions import SerializationError, ImproperlyConfigured from .test_cases import TestCase, SkipTest class TestJSONSerializer(TestCase): def test_datetime_serialization(self): self.assertEquals('{"d": "2010-10-01T02:30:00"}', JSONSerializer().dumps({'d': datetime(2010, 10, 1, 2, 30)})) def test_decimal_serialization(self): if sys.version_info[:2] == (2, 6): raise SkipTest("Float rounding is broken in 2.6.") self.assertEquals('{"d": 3.8}', JSONSerializer().dumps({'d': Decimal('3.8')})) def test_raises_serialization_error_on_dump_error(self): self.assertRaises(SerializationError, JSONSerializer().dumps, object()) def test_raises_serialization_error_on_load_error(self): self.assertRaises(SerializationError, JSONSerializer().loads, object()) self.assertRaises(SerializationError, JSONSerializer().loads, '') self.assertRaises(SerializationError, JSONSerializer().loads, '{{') def test_strings_are_left_untouched(self): self.assertEquals("你好", JSONSerializer().dumps("你好")) class TestTextSerializer(TestCase): def test_strings_are_left_untouched(self): self.assertEquals("你好", TextSerializer().dumps("你好")) def test_raises_serialization_error_on_dump_error(self): self.assertRaises(SerializationError, TextSerializer().dumps, {}) class TestDeserializer(TestCase): def setUp(self): super(TestDeserializer, self).setUp() self.de = Deserializer(DEFAULT_SERIALIZERS) def test_deserializes_json_by_default(self): self.assertEquals({"some": "data"}, self.de.loads('{"some":"data"}')) def test_deserializes_text_with_correct_ct(self): self.assertEquals('{"some":"data"}', self.de.loads('{"some":"data"}', 'text/plain')) self.assertEquals('{"some":"data"}', self.de.loads('{"some":"data"}', 'text/plain; charset=whatever')) def test_raises_serialization_error_on_unknown_mimetype(self): self.assertRaises(SerializationError, self.de.loads, '{}', 'text/html') def test_raises_improperly_configured_when_default_mimetype_cannot_be_deserialized(self): self.assertRaises(ImproperlyConfigured, Deserializer, {}) elasticsearch-py-1.6.0/test_elasticsearch/test_server/000077500000000000000000000000001253570341300231775ustar00rootroot00000000000000elasticsearch-py-1.6.0/test_elasticsearch/test_server/__init__.py000066400000000000000000000012051253570341300253060ustar00rootroot00000000000000from elasticsearch.helpers.test import get_test_client, ElasticsearchTestCase as BaseTestCase client = None def get_client(): global client if client is not None: return client # try and locate manual override in the local environment try: from test_elasticsearch.local import get_client as local_get_client client = local_get_client() except ImportError: # fallback to using vanilla client client = get_test_client() return client def setup(): get_client() class ElasticsearchTestCase(BaseTestCase): @staticmethod def _get_client(): return get_client() elasticsearch-py-1.6.0/test_elasticsearch/test_server/test_client.py000066400000000000000000000003551253570341300260710ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals from . import ElasticsearchTestCase class TestUnicode(ElasticsearchTestCase): def test_indices_analyze(self): self.client.indices.analyze(body='привет') elasticsearch-py-1.6.0/test_elasticsearch/test_server/test_common.py000066400000000000000000000222001253570341300260740ustar00rootroot00000000000000""" Dynamically generated set of TestCases based on set of yaml files decribing some integration tests. These files are shared among all official Elasticsearch clients. 
""" import re from os import walk, environ from os.path import exists, join, dirname, pardir import yaml from elasticsearch import TransportError from elasticsearch.compat import string_types from elasticsearch.helpers.test import _get_version from ..test_cases import SkipTest from . import ElasticsearchTestCase # some params had to be changed in python, keep track of them so we can rename # those in the tests accordingly PARAMS_RENAMES = { 'type': 'doc_type', 'from': 'from_', } # mapping from catch values to http status codes CATCH_CODES = { 'missing': 404, 'conflict': 409, } # test features we have implemented IMPLEMENTED_FEATURES = ('gtelte', 'stash_in_path') # broken YAML tests on some releases SKIP_TESTS = { (1, 1, 2): set(('TestCatRecovery10Basic', )) } class InvalidActionType(Exception): pass class YamlTestCase(ElasticsearchTestCase): def setUp(self): super(YamlTestCase, self).setUp() if hasattr(self, '_setup_code'): self.run_code(self._setup_code) self.last_response = None self._state = {} def _resolve(self, value): # resolve variables if isinstance(value, string_types) and value.startswith('$'): value = value[1:] self.assertIn(value, self._state) value = self._state[value] if isinstance(value, string_types): value = value.strip() elif isinstance(value, dict): value = dict((k, self._resolve(v)) for (k, v) in value.items()) elif isinstance(value, list): value = list(map(self._resolve, value)) return value def _lookup(self, path): # fetch the possibly nested value from last_response value = self.last_response if path == '$body': return value path = path.replace(r'\.', '\1') for step in path.split('.'): if not step: continue step = step.replace('\1', '.') step = self._resolve(step) if step.isdigit() and step not in value: step = int(step) self.assertIsInstance(value, list) self.assertGreater(len(value), step) else: self.assertIn(step, value) value = value[step] return value def run_code(self, test): """ Execute an instruction based on it's type. """ for action in test: self.assertEquals(1, len(action)) action_type, action = list(action.items())[0] if hasattr(self, 'run_' + action_type): getattr(self, 'run_' + action_type)(action) else: raise InvalidActionType(action_type) def run_do(self, action): """ Perform an api call with given parameters. """ catch = action.pop('catch', None) self.assertEquals(1, len(action)) method, args = list(action.items())[0] # locate api endpoint api = self.client for m in method.split('.'): self.assertTrue(hasattr(api, m)) api = getattr(api, m) # some parameters had to be renamed to not clash with python builtins, # compensate for k in PARAMS_RENAMES: if k in args: args[PARAMS_RENAMES[k]] = args.pop(k) # resolve vars for k in args: args[k] = self._resolve(args[k]) try: self.last_response = api(**args) except Exception as e: if not catch: raise self.run_catch(catch, e) else: if catch: raise AssertionError('Failed to catch %r in %r.' 
% (catch, self.last_response)) def _get_nodes(self): if not hasattr(self, '_node_info'): self._node_info = list(self.client.nodes.info(node_id='_all', metric='clear')['nodes'].values()) return self._node_info def _get_data_nodes(self): return len([info for info in self._get_nodes() if info.get('attributes', {}).get('data', 'true') == 'true']) def _get_benchmark_nodes(self): return len([info for info in self._get_nodes() if info.get('attributes', {}).get('bench', 'false') == 'true']) def run_skip(self, skip): if 'features' in skip: if skip['features'] in IMPLEMENTED_FEATURES: return elif skip['features'] == 'requires_replica': if self._get_data_nodes() > 1: return elif skip['features'] == 'benchmark': if self._get_benchmark_nodes(): return raise SkipTest(skip.get('reason', 'Feature %s is not supported' % skip['features'])) if 'version' in skip: version, reason = skip['version'], skip['reason'] if version == 'all': raise SkipTest(reason) min_version, max_version = version.split('-') min_version = _get_version(min_version) or (0, ) max_version = _get_version(max_version) or (999, ) if min_version <= self.es_version <= max_version: raise SkipTest(reason) def run_catch(self, catch, exception): if catch == 'param': self.assertIsInstance(exception, TypeError) return self.assertIsInstance(exception, TransportError) if catch in CATCH_CODES: self.assertEquals(CATCH_CODES[catch], exception.status_code) elif catch[0] == '/' and catch[-1] == '/': self.assertTrue(re.search(catch[1:-1], repr(exception.info)), '%s not in %r' % (catch, exception.info)) self.last_response = exception.info def run_gt(self, action): for key, value in action.items(): self.assertGreater(self._lookup(key), value) def run_gte(self, action): for key, value in action.items(): self.assertGreaterEqual(self._lookup(key), value) def run_lt(self, action): for key, value in action.items(): self.assertLess(self._lookup(key), value) def run_lte(self, action): for key, value in action.items(): self.assertLessEqual(self._lookup(key), value) def run_set(self, action): for key, value in action.items(): self._state[value] = self._lookup(key) def run_is_false(self, action): try: value = self._lookup(action) except AssertionError: pass else: self.assertIn(value, ('', None, False, 0)) def run_is_true(self, action): value = self._lookup(action) self.assertNotIn(value, ('', None, False, 0)) def run_length(self, action): for path, expected in action.items(): value = self._lookup(path) expected = self._resolve(expected) self.assertEquals(expected, len(value)) def run_match(self, action): for path, expected in action.items(): value = self._lookup(path) expected = self._resolve(expected) if isinstance(expected, string_types) and \ expected.startswith('/') and expected.endswith('/'): expected = re.compile(expected[1:-1], re.VERBOSE) self.assertTrue(expected.search(value)) else: self.assertEquals(expected, value) def construct_case(filename, name): """ Parse a definition of a test case from a yaml file and construct the TestCase subclass dynamically. 
""" def make_test(test_name, definition, i): def m(self): if name in SKIP_TESTS.get(self.es_version, ()): raise SkipTest() self.run_code(definition) m.__doc__ = '%s:%s.test_from_yaml_%d (%s): %s' % ( __name__, name, i, '/'.join(filename.split('/')[-2:]), test_name) m.__name__ = 'test_from_yaml_%d' % i return m with open(filename) as f: tests = list(yaml.load_all(f)) attrs = { '_yaml_file': filename } i = 0 for test in tests: for test_name, definition in test.items(): if test_name == 'setup': attrs['_setup_code'] = definition continue attrs['test_from_yaml_%d' % i] = make_test(test_name, definition, i) i += 1 return type(name, (YamlTestCase, ), attrs) YAML_DIR = environ.get( 'TEST_ES_YAML_DIR', join( dirname(__file__), pardir, pardir, pardir, 'elasticsearch', 'rest-api-spec', 'test' ) ) if exists(YAML_DIR): # find all the test definitions in yaml files ... for (path, dirs, files) in walk(YAML_DIR): for filename in files: if not filename.endswith('.yaml'): continue # ... parse them name = ('Test' + ''.join(s.title() for s in path[len(YAML_DIR) + 1:].split('/')) + filename.rsplit('.', 1)[0].title()).replace('_', '').replace('.', '') # and insert them into locals for test runner to find them locals()[name] = construct_case(join(path, filename), name) elasticsearch-py-1.6.0/test_elasticsearch/test_server/test_helpers.py000066400000000000000000000252401253570341300262550ustar00rootroot00000000000000from elasticsearch import helpers, TransportError from . import ElasticsearchTestCase from ..test_cases import SkipTest class FailingBulkClient(object): def __init__(self, client, fail_at=1): self.client = client self._called = -1 self._fail_at = fail_at def bulk(self, *args, **kwargs): self._called += 1 if self._called == self._fail_at: raise TransportError(599, "Error!", "INFO") return self.client.bulk(*args, **kwargs) class TestStreamingBulk(ElasticsearchTestCase): def test_actions_remain_unchanged(self): actions = [{'_id': 1}, {'_id': 2}] for ok, item in helpers.streaming_bulk(self.client, actions, index='test-index', doc_type='answers'): self.assertTrue(ok) self.assertEquals([{'_id': 1}, {'_id': 2}], actions) def test_all_documents_get_inserted(self): docs = [{"answer": x, '_id': x} for x in range(100)] for ok, item in helpers.streaming_bulk(self.client, docs, index='test-index', doc_type='answers', refresh=True): self.assertTrue(ok) self.assertEquals(100, self.client.count(index='test-index', doc_type='answers')['count']) self.assertEquals({"answer": 42}, self.client.get(index='test-index', doc_type='answers', id=42)['_source']) def test_all_errors_from_chunk_are_raised_on_failure(self): self.client.indices.create("i", { "mappings": {"t": {"properties": {"a": {"type": "integer"}}}}, "settings": {"number_of_shards": 1, "number_of_replicas": 0} }) self.client.cluster.health(wait_for_status="yellow") try: for ok, item in helpers.streaming_bulk(self.client, [{"a": "b"}, {"a": "c"}], index="i", doc_type="t", raise_on_error=True): self.assertTrue(ok) except helpers.BulkIndexError as e: self.assertEquals(2, len(e.errors)) else: assert False, "exception should have been raised" def test_different_op_types(self): if self.es_version < (0, 90, 1): raise SkipTest('update supported since 0.90.1') self.client.index(index='i', doc_type='t', id=45, body={}) self.client.index(index='i', doc_type='t', id=42, body={}) docs = [ {'_index': 'i', '_type': 't', '_id': 47, 'f': 'v'}, {'_op_type': 'delete', '_index': 'i', '_type': 't', '_id': 45}, {'_op_type': 'update', '_index': 'i', '_type': 't', '_id': 42, 'doc': 
{'answer': 42}} ] for ok, item in helpers.streaming_bulk(self.client, docs): self.assertTrue(ok) self.assertFalse(self.client.exists(index='i', id=45)) self.assertEquals({'answer': 42}, self.client.get(index='i', id=42)['_source']) self.assertEquals({'f': 'v'}, self.client.get(index='i', id=47)['_source']) def test_transport_error_can_becaught(self): failing_client = FailingBulkClient(self.client) docs = [ {'_index': 'i', '_type': 't', '_id': 47, 'f': 'v'}, {'_index': 'i', '_type': 't', '_id': 45, 'f': 'v'}, {'_index': 'i', '_type': 't', '_id': 42, 'f': 'v'}, ] results = list(helpers.streaming_bulk(failing_client, docs, raise_on_exception=False, raise_on_error=False, chunk_size=1)) self.assertEquals(3, len(results)) self.assertEquals([True, False, True], [r[0] for r in results]) exc = results[1][1]['index'].pop('exception') self.assertIsInstance(exc, TransportError) self.assertEquals(599, exc.status_code) self.assertEquals( { 'index': { '_index': 'i', '_type': 't', '_id': 45, 'data': {'f': 'v'}, 'error': "TransportError(599, 'Error!')", 'status': 599 } }, results[1][1] ) class TestBulk(ElasticsearchTestCase): def test_bulk_works_with_single_item(self): docs = [{"answer": 42, '_id': 1}] success, failed = helpers.bulk(self.client, docs, index='test-index', doc_type='answers', refresh=True) self.assertEquals(1, success) self.assertFalse(failed) self.assertEquals(1, self.client.count(index='test-index', doc_type='answers')['count']) self.assertEquals({"answer": 42}, self.client.get(index='test-index', doc_type='answers', id=1)['_source']) def test_all_documents_get_inserted(self): docs = [{"answer": x, '_id': x} for x in range(100)] success, failed = helpers.bulk(self.client, docs, index='test-index', doc_type='answers', refresh=True) self.assertEquals(100, success) self.assertFalse(failed) self.assertEquals(100, self.client.count(index='test-index', doc_type='answers')['count']) self.assertEquals({"answer": 42}, self.client.get(index='test-index', doc_type='answers', id=42)['_source']) def test_stats_only_reports_numbers(self): docs = [{"answer": x} for x in range(100)] success, failed = helpers.bulk(self.client, docs, index='test-index', doc_type='answers', refresh=True, stats_only=True) self.assertEquals(100, success) self.assertEquals(0, failed) self.assertEquals(100, self.client.count(index='test-index', doc_type='answers')['count']) def test_errors_are_reported_correctly(self): self.client.indices.create("i", { "mappings": {"t": {"properties": {"a": {"type": "integer"}}}}, "settings": {"number_of_shards": 1, "number_of_replicas": 0} }) self.client.cluster.health(wait_for_status="yellow") success, failed = helpers.bulk( self.client, [{"a": 42}, {"a": "c", '_id': 42}], index="i", doc_type="t", raise_on_error=False ) self.assertEquals(1, success) self.assertEquals(1, len(failed)) error = failed[0] self.assertEquals('42', error['index']['_id']) self.assertEquals('t', error['index']['_type']) self.assertEquals('i', error['index']['_index']) self.assertIn('MapperParsingException', error['index']['error']) def test_error_is_raised(self): self.client.indices.create("i", { "mappings": {"t": {"properties": {"a": {"type": "integer"}}}}, "settings": {"number_of_shards": 1, "number_of_replicas": 0} }) self.client.cluster.health(wait_for_status="yellow") self.assertRaises(helpers.BulkIndexError, helpers.bulk, self.client, [{"a": 42}, {"a": "c"}], index="i", doc_type="t" ) def test_errors_are_collected_properly(self): self.client.indices.create("i", { "mappings": {"t": {"properties": {"a": {"type": 
"integer"}}}}, "settings": {"number_of_shards": 1, "number_of_replicas": 0} }) self.client.cluster.health(wait_for_status="yellow") success, failed = helpers.bulk( self.client, [{"a": 42}, {"a": "c"}], index="i", doc_type="t", stats_only=True, raise_on_error=False ) self.assertEquals(1, success) self.assertEquals(1, failed) class TestScan(ElasticsearchTestCase): def test_order_can_be_preserved(self): bulk = [] for x in range(100): bulk.append({"index": {"_index": "test_index", "_type": "answers", "_id": x}}) bulk.append({"answer": x, "correct": x == 42}) self.client.bulk(bulk, refresh=True) docs = list(helpers.scan(self.client, index="test_index", doc_type="answers", size=2, query={"sort": ["answer"]}, preserve_order=True)) self.assertEquals(100, len(docs)) self.assertEquals(list(map(str, range(100))), list(d['_id'] for d in docs)) self.assertEquals(list(range(100)), list(d['_source']['answer'] for d in docs)) def test_all_documents_are_read(self): bulk = [] for x in range(100): bulk.append({"index": {"_index": "test_index", "_type": "answers", "_id": x}}) bulk.append({"answer": x, "correct": x == 42}) self.client.bulk(bulk, refresh=True) docs = list(helpers.scan(self.client, index="test_index", doc_type="answers", size=2)) self.assertEquals(100, len(docs)) self.assertEquals(set(map(str, range(100))), set(d['_id'] for d in docs)) self.assertEquals(set(range(100)), set(d['_source']['answer'] for d in docs)) class TestReindex(ElasticsearchTestCase): def setUp(self): super(TestReindex, self).setUp() bulk = [] for x in range(100): bulk.append({"index": {"_index": "test_index", "_type": "answers" if x % 2 == 0 else "questions", "_id": x}}) bulk.append({"answer": x, "correct": x == 42}) self.client.bulk(bulk, refresh=True) def test_reindex_passes_kwargs_to_scan_and_bulk(self): helpers.reindex(self.client, "test_index", "prod_index", scan_kwargs={'doc_type': 'answers'}, bulk_kwargs={'refresh': True}) self.assertTrue(self.client.indices.exists("prod_index")) self.assertFalse(self.client.indices.exists_type(index='prod_index', doc_type='questions')) self.assertEquals(50, self.client.count(index='prod_index', doc_type='answers')['count']) self.assertEquals({"answer": 42, "correct": True}, self.client.get(index="prod_index", doc_type="answers", id=42)['_source']) def test_reindex_accepts_a_query(self): helpers.reindex(self.client, "test_index", "prod_index", query={"query": {"filtered": {"filter": {"term": {"_type": "answers"}}}}}) self.client.indices.refresh() self.assertTrue(self.client.indices.exists("prod_index")) self.assertFalse(self.client.indices.exists_type(index='prod_index', doc_type='questions')) self.assertEquals(50, self.client.count(index='prod_index', doc_type='answers')['count']) self.assertEquals({"answer": 42, "correct": True}, self.client.get(index="prod_index", doc_type="answers", id=42)['_source']) def test_all_documents_get_moved(self): helpers.reindex(self.client, "test_index", "prod_index") self.client.indices.refresh() self.assertTrue(self.client.indices.exists("prod_index")) self.assertEquals(50, self.client.count(index='prod_index', doc_type='questions')['count']) self.assertEquals(50, self.client.count(index='prod_index', doc_type='answers')['count']) self.assertEquals({"answer": 42, "correct": True}, self.client.get(index="prod_index", doc_type="answers", id=42)['_source']) elasticsearch-py-1.6.0/test_elasticsearch/test_server/test_memcached.py000066400000000000000000000033621253570341300265220ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import 
unicode_literals from elasticsearch import Elasticsearch, MemcachedConnection, NotFoundError from elasticsearch.transport import ADDRESS_RE from . import ElasticsearchTestCase from ..test_cases import SkipTest class TestMemcachedConnection(ElasticsearchTestCase): def setUp(self): try: import pylibmc except ImportError: raise SkipTest("No pylibmc.") super(TestMemcachedConnection, self).setUp() nodes = self.client.nodes.info() for node_id, node_info in nodes["nodes"].items(): if 'memcached_address' in node_info: connection_info = ADDRESS_RE.search(node_info['memcached_address']).groupdict() self.mc_client = Elasticsearch( [connection_info], connection_class=MemcachedConnection ) break else: raise SkipTest("No memcached plugin.") def test_index(self): self.mc_client.index("test_index", "test_type", {"answer": 42}, id=1) self.assertTrue(self.client.exists("test_index", doc_type="test_type", id=1)) def test_get(self): self.client.index("test_index", "test_type", {"answer": 42}, id=1) self.assertEquals({"answer": 42}, self.mc_client.get("test_index", doc_type="test_type", id=1)["_source"]) def test_unicode(self): self.mc_client.index("test_index", "test_type", {"answer": "你好"}, id="你好") self.assertEquals({"answer": "你好"}, self.mc_client.get("test_index", doc_type="test_type", id="你好")["_source"]) def test_missing(self): self.assertRaises(NotFoundError, self.mc_client.get, "test_index", doc_type="test_type", id=42) elasticsearch-py-1.6.0/test_elasticsearch/test_transport.py000066400000000000000000000231221253570341300242770ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals import time from elasticsearch.transport import Transport, get_host_info from elasticsearch.connection import Connection, ThriftConnection from elasticsearch.connection_pool import DummyConnectionPool from elasticsearch.exceptions import ConnectionError, ImproperlyConfigured from .test_cases import TestCase class DummyConnection(Connection): def __init__(self, **kwargs): self.exception = kwargs.pop('exception', None) self.status, self.data = kwargs.pop('status', 200), kwargs.pop('data', '{}') self.headers = kwargs.pop('headers', {}) self.calls = [] super(DummyConnection, self).__init__(**kwargs) def perform_request(self, *args, **kwargs): self.calls.append((args, kwargs)) if self.exception: raise self.exception return self.status, self.headers, self.data CLUSTER_NODES = '''{ "ok" : true, "cluster_name" : "super_cluster", "nodes" : { "wE_6OGBNSjGksbONNncIbg" : { "name" : "Nightwind", "transport_address" : "inet[/127.0.0.1:9300]", "hostname" : "wind", "version" : "0.20.4", "http_address" : "inet[/1.1.1.1:123]", "thrift_address" : "/1.1.1.1:9500]" } } }''' class TestHostsInfoCallback(TestCase): def test_master_only_nodes_are_ignored(self): nodes = [ {'attributes': {'data': 'false', 'client': 'true'}}, {'attributes': {'data': 'false'}}, {'attributes': {'data': 'false', 'master': 'true'}}, {'attributes': {'data': 'false', 'master': 'false'}}, {'attributes': {}}, {} ] chosen = [ i for i, node_info in enumerate(nodes) if get_host_info(node_info, i) is not None] self.assertEquals([0, 3, 4, 5], chosen) class TestTransport(TestCase): def test_single_connection_uses_dummy_connection_pool(self): t = Transport([{}]) self.assertIsInstance(t.connection_pool, DummyConnectionPool) t = Transport([{'host': 'localhost'}]) self.assertIsInstance(t.connection_pool, DummyConnectionPool) def test_host_with_scheme_different_from_connection_fails(self): self.assertRaises(ImproperlyConfigured, Transport, [{'host': 
'localhost', 'scheme': 'thrift'}]) self.assertRaises(ImproperlyConfigured, Transport, [{'host': 'localhost', 'scheme': 'http'}], connection_class=ThriftConnection) def test_request_timeout_extracted_from_params_and_passed(self): t = Transport([{}], connection_class=DummyConnection) t.perform_request('GET', '/', params={'request_timeout': 42}) self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('GET', '/', {}, None), t.get_connection().calls[0][0]) self.assertEquals({'timeout': 42, 'ignore': ()}, t.get_connection().calls[0][1]) def test_send_get_body_as_source(self): t = Transport([{}], send_get_body_as='source', connection_class=DummyConnection) t.perform_request('GET', '/', body={}) self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('GET', '/', {'source': '{}'}, None), t.get_connection().calls[0][0]) def test_send_get_body_as_post(self): t = Transport([{}], send_get_body_as='POST', connection_class=DummyConnection) t.perform_request('GET', '/', body={}) self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('POST', '/', None, b'{}'), t.get_connection().calls[0][0]) def test_body_gets_encoded_into_bytes(self): t = Transport([{}], connection_class=DummyConnection) t.perform_request('GET', '/', body='你好') self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('GET', '/', None, b'\xe4\xbd\xa0\xe5\xa5\xbd'), t.get_connection().calls[0][0]) def test_body_bytes_get_passed_untouched(self): t = Transport([{}], connection_class=DummyConnection) body = b'\xe4\xbd\xa0\xe5\xa5\xbd' t.perform_request('GET', '/', body=body) self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('GET', '/', None, body), t.get_connection().calls[0][0]) def test_kwargs_passed_on_to_connections(self): t = Transport([{'host': 'google.com'}], port=123) self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals('%s://google.com:123' % t.connection_pool.connections[0].transport_schema, t.connection_pool.connections[0].host) def test_kwargs_passed_on_to_connection_pool(self): dt = object() t = Transport([{}, {}], dead_timeout=dt) self.assertIs(dt, t.connection_pool.dead_timeout) def test_custom_connection_class(self): class MyConnection(object): def __init__(self, **kwargs): self.kwargs = kwargs t = Transport([{}], connection_class=MyConnection) self.assertEquals(1, len(t.connection_pool.connections)) self.assertIsInstance(t.connection_pool.connections[0], MyConnection) def test_add_connection(self): t = Transport([{}], randomize_hosts=False) t.add_connection({"host": "google.com", "port": 1234}) self.assertEquals(2, len(t.connection_pool.connections)) self.assertEquals('%s://google.com:1234' % t.connection_pool.connections[1].transport_schema, t.connection_pool.connections[1].host) def test_request_will_fail_after_X_retries(self): t = Transport([{'exception': ConnectionError('abandon ship')}], connection_class=DummyConnection) self.assertRaises(ConnectionError, t.perform_request, 'GET', '/') self.assertEquals(4, len(t.get_connection().calls)) def test_failed_connection_will_be_marked_as_dead(self): t = Transport([{'exception': ConnectionError('abandon ship')}] * 2, connection_class=DummyConnection) self.assertRaises(ConnectionError, t.perform_request, 'GET', '/') self.assertEquals(0, len(t.connection_pool.connections)) def test_resurrected_connection_will_be_marked_as_live_on_success(self): t = Transport([{}, {}], connection_class=DummyConnection) con1 = t.connection_pool.get_connection() con2 = 
t.connection_pool.get_connection() t.connection_pool.mark_dead(con1) t.connection_pool.mark_dead(con2) t.perform_request('GET', '/') self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals(1, len(t.connection_pool.dead_count)) def test_sniff_will_use_seed_connections(self): t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyConnection) t.set_connections([{'data': 'invalid'}]) t.sniff_hosts() self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals('http://1.1.1.1:123', t.get_connection().host) def test_sniff_on_start_fetches_and_uses_nodes_list_for_its_schema(self): class DummyThriftConnection(DummyConnection): transport_schema = 'thrift' t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyThriftConnection, sniff_on_start=True) self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals('thrift://1.1.1.1:9500', t.get_connection().host) def test_sniff_on_start_fetches_and_uses_nodes_list(self): t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyConnection, sniff_on_start=True) self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals('http://1.1.1.1:123', t.get_connection().host) def test_sniff_on_start_ignores_sniff_timeout(self): t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyConnection, sniff_on_start=True, sniff_timeout=12) self.assertEquals((('GET', '/_nodes/_all/clear'), {'timeout': None}), t.seed_connections[0].calls[0]) def test_sniff_uses_sniff_timeout(self): t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyConnection, sniff_timeout=42) t.sniff_hosts() self.assertEquals((('GET', '/_nodes/_all/clear'), {'timeout': 42}), t.seed_connections[0].calls[0]) def test_sniff_reuses_connection_instances_if_possible(self): t = Transport([{'data': CLUSTER_NODES}, {"host": "1.1.1.1", "port": 123}], connection_class=DummyConnection, randomize_hosts=False) connection = t.connection_pool.connections[1] t.sniff_hosts() self.assertEquals(1, len(t.connection_pool.connections)) self.assertIs(connection, t.get_connection()) def test_sniff_on_fail_triggers_sniffing_on_fail(self): t = Transport([{'exception': ConnectionError('abandon ship')}, {"data": CLUSTER_NODES}], connection_class=DummyConnection, sniff_on_connection_fail=True, max_retries=0, randomize_hosts=False) self.assertRaises(ConnectionError, t.perform_request, 'GET', '/') self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals('http://1.1.1.1:123', t.get_connection().host) def test_sniff_after_n_seconds(self): t = Transport([{"data": CLUSTER_NODES}], connection_class=DummyConnection, sniffer_timeout=5) for _ in range(4): t.perform_request('GET', '/') self.assertEquals(1, len(t.connection_pool.connections)) self.assertIsInstance(t.get_connection(), DummyConnection) t.last_sniff = time.time() - 5.1 t.perform_request('GET', '/') self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals('http://1.1.1.1:123', t.get_connection().host) self.assertTrue(time.time() - 1 < t.last_sniff < time.time() + 0.01 ) elasticsearch-py-1.6.0/tox.ini000066400000000000000000000002731253570341300162760ustar00rootroot00000000000000[tox] envlist = pypy,py26,py27,py33,py34 [testenv] whitelist_externals = git setenv = NOSE_XUNIT_FILE = junit-{envname}.xml commands = git submodule init python setup.py test
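
The test bootstrap in test_elasticsearch/test_server/__init__.py above first looks for an optional local override in test_elasticsearch/local.py before falling back to get_test_client(). A minimal sketch of such an override, assuming an Elasticsearch node you manage yourself (the module and address below are placeholders for your own environment, not part of the package):

    # test_elasticsearch/local.py -- hypothetical local override, not shipped with the package
    from elasticsearch import Elasticsearch

    def get_client():
        # point the test suite at a manually managed node instead of the
        # default test client; replace the address with your own
        return Elasticsearch(['localhost:9250'])

With this file in place, get_client() in the test_server package imports and uses it transparently, so the server-side tests run against that node.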