pax_global_header00006660000000000000000000000064131073527120014513gustar00rootroot0000000000000052 comment=aea9824d53635edb12a75fdcd8d5f71d40521e54 elasticsearch-py-5.4.0/000077500000000000000000000000001310735271200147615ustar00rootroot00000000000000elasticsearch-py-5.4.0/.coveragerc000066400000000000000000000002551310735271200171040ustar00rootroot00000000000000[run] omit = */python?.?/* */lib-python/?.?/*.py */lib_pypy/* */site-packages/* *.egg/* test_elasticsearch/* elasticsearch/connection/esthrift/* elasticsearch-py-5.4.0/.gitignore000066400000000000000000000002721310735271200167520ustar00rootroot00000000000000.eggs/* .*.swp *~ *.py[co] .coverage test_elasticsearch/cover test_elasticsearch/local.py docs/_build elasticsearch.egg-info .tox dist *.egg coverage.xml nosetests.xml junit-*.xml build elasticsearch-py-5.4.0/.travis.yml000066400000000000000000000016451310735271200171000ustar00rootroot00000000000000language: python python: - "2.6" - "2.7" - "3.3" - "3.4" - "3.5" - "3.6" addons: apt: packages: - oracle-java8-installer env: # different connection classes to test - TEST_ES_CONNECTION=Urllib3HttpConnection - TEST_ES_CONNECTION=RequestsHttpConnection before_install: - sudo update-java-alternatives -s java-8-oracle - export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre - java -version install: - curl -L -o /tmp/es-snap.zip https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.3.0.zip - unzip /tmp/es-snap.zip -d /tmp/ - /tmp/elasticsearch-*/bin/elasticsearch -E script.inline=true -E path.repo=/tmp -E repositories.url.allowed_urls='http://*' -E node.attr.testattr=test -d - git clone https://github.com/elastic/elasticsearch.git ../elasticsearch - pip install . script: - python setup.py test notifications: email: recipients: - honza.kral@gmail.com elasticsearch-py-5.4.0/AUTHORS000066400000000000000000000040451310735271200160340ustar00rootroot00000000000000Honza Král Jordi Llonch Rob Hudson Njal Karevoll Boaz Leskes Graeme Coupar Murhaf Fares Hari haran Richard Boulton Brian Hicks Karan Gupta H. İbrahim Güngör James Yu Andrey Balandin Marco Hoyer Max Kutsevol Jan Gaedicke Klaas Bosteels starenka Mathieu Geli Bruno Renié Ondrej Sika Alex Ksikes Ronan Amicel speedplane Corey Farwell Andrew Snare Armagnac Alexandru Ghitza Aarni Koskela Michael Schier Yuri Khrustalev Rémy HUBSCHER j0hnsmith Julian Mehnle Steven Moy syllogismos Magnus Bäck Marc Abramowitz J Charitopoulos Sven Wästlund Russell Savage Georges Toth Malthe Borch Jim Kelly Xiuming Chen David Szotten Jorgen Jorgensen Jason Veatch Daniel Emil Hessman Dmitry Sadovnychyi Chris Earle Tomas Mozes Alexei Peters Michael Gibson Darryl Ring Will McGinnis elasticsearch-py-5.4.0/CONTRIBUTING.md000066400000000000000000000037331310735271200172200ustar00rootroot00000000000000If you have a bugfix or new feature that you would like to contribute to elasticsearch-py, please find or open an issue about it first. Talk about what you would like to do. It may be that somebody is already working on it, or that there are particular issues that you should know about before implementing the change. We enjoy working with contributors to get their code accepted. There are many approaches to fixing a problem and it is important to find the best approach before writing too much code. The process for contributing to any of the Elasticsearch repositories is similar. 1. Please make sure you have signed the [Contributor License Agreement](http://www.elastic.co/contributor-agreement/). 
We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code. You only need to sign the CLA once. 2. Run the test suite to ensure your changes do not break existing code: ```` python setup.py test ```` See the README file in `test_elasticsearch` directory for more information on running the test suite. 3. Rebase your changes. Update your local repository with the most recent code from the main elasticsearch-py repository, and rebase your branch on top of the latest master branch. We prefer your changes to be squashed into a single commit. 4. Submit a pull request. Push your local changes to your forked copy of the repository and submit a pull request. In the pull request, describe what your changes do and mention the number of the issue where discussion has taken place, eg “Closes #123″. Please consider adding or modifying tests related to your changes. Then sit back and wait. There will probably be a discussion about the pull request and, if any changes are needed, we would love to work with you to get your pull request merged into elasticsearch-py. elasticsearch-py-5.4.0/Changelog.rst000066400000000000000000000154001310735271200174020ustar00rootroot00000000000000.. _changelog: Changelog ========= 5.4.0 (2017-05-18) ------------------ * ``bulk`` helpers now extract ``pipeline`` parameter from the action dictionary. 5.3.0 (2017-03-30) ------------------ Compatibility with elasticsearch 5.3 5.2.0 (2017-02-12) ------------------ The client now automatically sends ``Content-Type`` http header set to ``application/json``. If you are explicitly passing in other encoding than ``json`` you need to set the header manually. 5.1.0 (2017-01-11) ------------------ * Fixed sniffing 5.0.1 (2016-11-02) ------------------ Fixed performance regression in ``scan`` helper 5.0.0 (2016-10-19) ------------------ Version compatible with elasticsearch 5.0 * when using SSL certificate validation is now on by default. Install ``certifi`` or supply root certificate bundle. * ``elasticsearch.trace`` logger now also logs failed requests, signature of internal logging method ``log_request_fail`` has changed, all custom connection classes need to be updated * added ``headers`` arg to connections to support custom http headers * passing in a keyword parameter with ``None`` as value will cause that param to be ignored 2.4.0 (2016-08-17) ------------------ * ``ping`` now ignores all ``TransportError`` exceptions and just returns ``False`` * expose ``scroll_id`` on ``ScanError`` * increase default size for ``scan`` helper to 1000 Internal: * changed ``Transport.perform_request`` to just return the body, not status as well. 2.3.0 (2016-02-29) ------------------ * added ``client_key`` argument to configure client certificates * debug logging now includes response body even for failed requests 2.2.0 (2016-01-05) ------------------ Due to change in json encoding the client will no longer mask issues with encoding - if you work with non-ascii data in python 2 you must use the ``unicode`` type or have proper encoding set in your environment. 
* adding additional options for ssl - ``ssl_assert_hostname`` and
  ``ssl_assert_fingerprint`` to the default connection class
* fix sniffing

2.1.0 (2015-10-19)
------------------

* move multiprocessing import inside parallel bulk for Google App Engine

2.0.0 (2015-10-14)
------------------

* Elasticsearch 2.0 compatibility release

1.8.0 (2015-10-14)
------------------

* removed thrift and memcached connections; if you wish to continue using
  those, extract the classes and use them separately.
* added a new, parallel version of the bulk helper using thread pools
* In helpers, removed ``bulk_index`` as an alias for ``bulk``. Use ``bulk``
  instead.

1.7.0 (2015-09-21)
------------------

* elasticsearch 2.0 compatibility
* thrift now deprecated, to be removed in a future version
* make sure urllib3 always uses keep-alive

1.6.0 (2015-06-10)
------------------

* Add ``indices.flush_synced`` API
* ``helpers.reindex`` now supports reindexing parent/child documents

1.5.0 (2015-05-18)
------------------

* Add support for ``query_cache`` parameter when searching
* helpers have been made more secure by changing defaults to raise an
  exception on errors
* removed deprecated option ``replication`` and the deprecated benchmark api.
* Added ``AddonClient`` class to allow for extending the client from outside

1.4.0 (2015-02-11)
------------------

* Using insecure SSL configuration (``verify_cert=False``) raises a warning
* ``reindex`` accepts a ``query`` parameter
* enable ``reindex`` helper to accept any kwargs for underlying ``bulk`` and
  ``scan`` calls
* when doing an initial sniff (via ``sniff_on_start``) ignore the special
  sniff timeout
* option to treat ``TransportError`` as a normal failure in ``bulk`` helpers
* fixed an issue with sniffing when only a single host was passed in

1.3.0 (2014-12-31)
------------------

* Timeout now doesn't trigger a retry by default (can be overridden by
  setting ``retry_on_timeout=True``)
* Introduced a new parameter ``retry_on_status`` (defaulting to
  ``(503, 504, )``) which controls which http status codes should lead to a
  retry.
* Implemented url parsing according to RFC-1738
* Added support for proper SSL certificate handling
* Required parameters are now checked for non-empty values
* ConnectionPool now checks if any connections were defined
* DummyConnectionPool introduced when no load balancing is needed (only one
  connection defined)
* Fixed a race condition in ConnectionPool

1.2.0 (2014-08-03)
------------------

Compatibility with newest (1.3) Elasticsearch APIs.

* Filter out master-only nodes when sniffing
* Improved docs and error messages

1.1.1 (2014-07-04)
------------------

Bugfix release fixing escaping issues with ``request_timeout``.

1.1.0 (2014-07-02)
------------------

Compatibility with newest Elasticsearch APIs.

* Test helpers - ``ElasticsearchTestCase`` and ``get_test_client`` for use in
  your tests
* Python 3.2 compatibility
* Use ``simplejson`` if installed instead of stdlib json library
* Introducing a global ``request_timeout`` parameter for per-call timeout
* Bug fixes

1.0.0 (2014-02-11)
------------------

Elasticsearch 1.0 compatibility. See 0.4.X releases (and the 0.4 branch) for
code compatible with 0.90 elasticsearch.

* major breaking change - compatible with 1.0 elasticsearch releases only!
* Add an option to change the timeout used for sniff requests
  (``sniff_timeout``).
* empty responses from the server are now returned as empty strings instead
  of None
* ``get_alias`` now has ``name`` as another optional parameter due to issue
  #4539 in es repo.
Note that the order of params have changed so if you are not using keyword arguments this is a breaking change. 0.4.4 (2013-12-23) ------------------ * ``helpers.bulk_index`` renamed to ``helpers.bulk`` (alias put in place for backwards compatibility, to be removed in future versions) * Added ``helpers.streaming_bulk`` to consume an iterator and yield results per operation * ``helpers.bulk`` and ``helpers.streaming_bulk`` are no longer limited to just index operations. * unicode body (for ``incices.analyze`` for example) is now handled correctly * changed ``perform_request`` on ``Connection`` classes to return headers as well. This is a backwards incompatible change for people who have developed their own connection class. * changed deserialization mechanics. Users who provided their own serializer that didn't extend ``JSONSerializer`` need to specify a ``mimetype`` class attribute. * minor bug fixes 0.4.3 (2013-10-22) ------------------ * Fixes to ``helpers.bulk_index``, better error handling * More benevolent ``hosts`` argument parsing for ``Elasticsearch`` * ``requests`` no longer required (nor recommended) for install 0.4.2 (2013-10-08) ------------------ * ``ignore`` param accepted by all APIs * Fixes to ``helpers.bulk_index`` 0.4.1 (2013-09-24) ------------------ Initial release. elasticsearch-py-5.4.0/LICENSE000066400000000000000000000236371310735271200160010ustar00rootroot00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. elasticsearch-py-5.4.0/MANIFEST.in000066400000000000000000000005351310735271200165220ustar00rootroot00000000000000include AUTHORS include Changelog.rst include CONTRIBUTING.md include LICENSE include MANIFEST.in include README.rst include README include tox.ini include setup.py include elasticsearch/connection/esthrift/Rest-remote recursive-include docs * prune docs/_build prune test_elasticsearch recursive-exclude * __pycache__ recursive-exclude * *.py[co] elasticsearch-py-5.4.0/README000066400000000000000000000122611310735271200156430ustar00rootroot00000000000000Python Elasticsearch Client =========================== Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be opinion-free and very extendable. For a more high level client library with more limited scope, have a look at `elasticsearch-dsl`_ - a more pythonic library sitting on top of ``elasticsearch-py``. It provides a more convenient and idiomatic way to write and manipulate `queries`_. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure while exposing the whole range of the DSL from Python either directly using defined classes or a queryset-like expressions. It also provides an optional `persistence layer`_ for working with documents as Python objects in an ORM-like fashion: defining mappings, retrieving and saving documents, wrapping the document data in user-defined classes. .. _elasticsearch-dsl: https://elasticsearch-dsl.readthedocs.io/ .. _queries: https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html .. _persistence layer: https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#doctype Compatibility ------------- The library is compatible with all Elasticsearch versions since ``0.90.x`` but you **have to use a matching major version**: For **Elasticsearch 5.0** and later, use the major version 5 (``5.x.y``) of the library. For **Elasticsearch 2.0** and later, use the major version 2 (``2.x.y``) of the library. For **Elasticsearch 1.0** and later, use the major version 1 (``1.x.y``) of the library. 
For **Elasticsearch 0.90.x**, use a version from ``0.4.x`` releases of the
library.

The recommended way to set your requirements in your `setup.py` or
`requirements.txt` is::

    # Elasticsearch 5.x
    elasticsearch>=5.0.0,<6.0.0

    # Elasticsearch 2.x
    elasticsearch>=2.0.0,<3.0.0

    # Elasticsearch 1.x
    elasticsearch>=1.0.0,<2.0.0

    # Elasticsearch 0.90.x
    elasticsearch<1.0.0

The development is happening on the ``master`` and ``2.x`` branches
respectively.

Installation
------------

Install the ``elasticsearch`` package with `pip `_::

    pip install elasticsearch

Run Elasticsearch in a Container
--------------------------------

To run elasticsearch in a container, optionally set the `ES_VERSION`
environment variable to either 5.4, 5.3 or 2.4. `ES_VERSION` defaults to
`latest`. Then run ./start_elasticsearch.sh::

    export ES_VERSION=5.4
    ./start_elasticsearch.sh

This will run a version of Elasticsearch in a Docker container suitable for
running the tests. To check that elasticsearch is running, first wait for a
`healthy` status in `docker ps`::

    $ docker ps
    CONTAINER ID   IMAGE          COMMAND                  CREATED         STATUS                   PORTS                              NAMES
    955e57564e53   7d2ad83f8446   "/docker-entrypoin..."   6 minutes ago   Up 6 minutes (healthy)   0.0.0.0:9200->9200/tcp, 9300/tcp   trusting_brattain

Then you can navigate to `localhost:9200` in your browser.

Example use
-----------

Simple use-case::

    >>> from datetime import datetime
    >>> from elasticsearch import Elasticsearch

    # by default we connect to localhost:9200
    >>> es = Elasticsearch()

    # create an index in elasticsearch, ignore status code 400 (index already exists)
    >>> es.indices.create(index='my-index', ignore=400)
    {u'acknowledged': True}

    # datetimes will be serialized
    >>> es.index(index="my-index", doc_type="test-type", id=42, body={"any": "data", "timestamp": datetime.now()})
    {u'_id': u'42', u'_index': u'my-index', u'_type': u'test-type', u'_version': 1, u'ok': True}

    # but not deserialized
    >>> es.get(index="my-index", doc_type="test-type", id=42)['_source']
    {u'any': u'data', u'timestamp': u'2013-05-12T19:45:31.804229'}

`Full documentation`_.

.. _Full documentation: https://elasticsearch-py.readthedocs.io/

Features
--------

The client's features include:

* translating basic Python data types to and from json (datetimes are not
  decoded for performance reasons)
* configurable automatic discovery of cluster nodes
* persistent connections
* load balancing (with pluggable selection strategy) across all available nodes
* failed connection penalization (time based - failed connections won't be
  retried until a timeout is reached)
* support for ssl and http authentication
* thread safety
* pluggable architecture

License
-------

Copyright 2015 Elasticsearch

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Build status
------------

..
image:: https://secure.travis-ci.org/elastic/elasticsearch-py.png :target: http://travis-ci.org/#!/elastic/elasticsearch-py elasticsearch-py-5.4.0/README.rst000077700000000000000000000000001310735271200173232READMEustar00rootroot00000000000000elasticsearch-py-5.4.0/docs/000077500000000000000000000000001310735271200157115ustar00rootroot00000000000000elasticsearch-py-5.4.0/docs/Changelog.rst000077700000000000000000000000001310735271200231612../Changelog.rstustar00rootroot00000000000000elasticsearch-py-5.4.0/docs/Makefile000066400000000000000000000152061310735271200173550ustar00rootroot00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = _build # User-friendly check for sphinx-build ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) endif # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " xml to make Docutils-native XML files" @echo " pseudoxml to make pseudoxml-XML files for display purposes" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." 
json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Elasticsearch.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Elasticsearch.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." @echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/Elasticsearch" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Elasticsearch" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." $(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." latexpdfja: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through platex and dvipdfmx..." $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." xml: $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml @echo @echo "Build finished. The XML files are in $(BUILDDIR)/xml." pseudoxml: $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml @echo @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." 
elasticsearch-py-5.4.0/docs/api.rst000066400000000000000000000057441310735271200172260ustar00rootroot00000000000000.. _api: API Documentation ================= All the API calls map the raw REST api as closely as possible, including the distinction between required and optional arguments to the calls. This means that the code makes distinction between positional and keyword arguments; we, however, recommend that people **use keyword arguments for all calls for consistency and safety**. .. note:: for compatibility with the Python ecosystem we use ``from_`` instead of ``from`` and ``doc_type`` instead of ``type`` as parameter names. Global options -------------- Some parameters are added by the client itself and can be used in all API calls. Ignore ~~~~~~ An API call is considered successful (and will return a response) if elasticsearch returns a 2XX response. Otherwise an instance of :class:`~elasticsearch.TransportError` (or a more specific subclass) will be raised. You can see other exception and error states in :ref:`exceptions`. If you do not wish an exception to be raised you can always pass in an ``ignore`` parameter with either a single status code that should be ignored or a list of them:: from elasticsearch import Elasticsearch es = Elasticsearch() # ignore 400 cause by IndexAlreadyExistsException when creating an index es.indices.create(index='test-index', ignore=400) # ignore 404 and 400 es.indices.delete(index='test-index', ignore=[400, 404]) Timeout ~~~~~~~ Global timeout can be set when constructing the client (see :class:`~elasticsearch.Connection`'s ``timeout`` parameter) or on a per-request basis using ``request_timeout`` (float value in seconds) as part of any API call, this value will get passed to the ``perform_request`` method of the connection class:: # only wait for 1 second, regardless of the client's default es.cluster.health(wait_for_status='yellow', request_timeout=1) .. note:: Some API calls also accept a ``timeout`` parameter that is passed to Elasticsearch server. This timeout is internal and doesn't guarantee that the request will end in the specified time. .. py:module:: elasticsearch Response Filtering ~~~~~~~~~~~~~~~~~~ The ``filter_path`` parameter is used to reduce the response returned by elasticsearch. For example, to only return ``_id`` and ``_type``, do:: es.search(index='test-index', filter_path=['hits.hits._id', 'hits.hits._type']) It also supports the ``*`` wildcard character to match any field or part of a field's name:: es.search(index='test-index', filter_path=['hits.hits._*']) Elasticsearch ------------- .. autoclass:: Elasticsearch :members: .. py:module:: elasticsearch.client Indices ------- .. autoclass:: IndicesClient :members: Ingest ------ .. autoclass:: IngestClient :members: Cluster ------- .. autoclass:: ClusterClient :members: Nodes ----- .. autoclass:: NodesClient :members: Cat --- .. autoclass:: CatClient :members: Snapshot -------- .. autoclass:: SnapshotClient :members: Tasks ----- .. autoclass:: TasksClient :members: elasticsearch-py-5.4.0/docs/conf.py000066400000000000000000000204711310735271200172140ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Elasticsearch documentation build configuration file, created by # sphinx-quickstart on Mon May 6 15:38:41 2013. # # This file is execfile()d with the current directory set to its containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. 
# # All configuration values have a default; values that are commented out # serve to show the default. import sys, os # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. #sys.path.insert(0, os.path.abspath('.')) # -- General configuration ----------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. #needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest'] autoclass_content = "both" # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = u'Elasticsearch' copyright = u'2013, Honza Král' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # import elasticsearch # The short X.Y version. version = elasticsearch.__versionstr__ # The full version, including alpha/beta/rc tags. release = version # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = ['_build'] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # If true, keep warnings as "system message" paragraphs in the built documents. #keep_warnings = False # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. on_rtd = os.environ.get('READTHEDOCS', None) == 'True' if not on_rtd: # only import and set the theme if we're building docs locally import sphinx_rtd_theme html_theme = 'sphinx_rtd_theme' html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. 
#html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'Elasticsearchdoc' # -- Options for LaTeX output -------------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). #'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). #'pointsize': '10pt', # Additional stuff for the LaTeX preamble. #'preamble': '', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', 'Elasticsearch.tex', u'Elasticsearch Documentation', u'Honza Král', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # If true, show page references after internal links. #latex_show_pagerefs = False # If true, show URL addresses after external links. #latex_show_urls = False # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_domain_indices = True # -- Options for manual page output -------------------------------------------- # One entry per manual page. 
List of tuples # (source start file, name, description, authors, manual section). man_pages = [ ('index', 'elasticsearch-py', u'Elasticsearch Documentation', [u'Honza Král'], 1) ] # If true, show URL addresses after external links. #man_show_urls = False # -- Options for Texinfo output ------------------------------------------------ # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ ('index', 'Elasticsearch', u'Elasticsearch Documentation', u'Honza Král', 'Elasticsearch', 'One line description of project.', 'Miscellaneous'), ] # Documents to append as an appendix to all manuals. #texinfo_appendices = [] # If false, no module index is generated. #texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. #texinfo_show_urls = 'footnote' # If true, do not generate a @detailmenu in the "Top" node's menu. #texinfo_no_detailmenu = False elasticsearch-py-5.4.0/docs/connection.rst000066400000000000000000000031001310735271200205740ustar00rootroot00000000000000.. _connection_api: Connection Layer API ==================== All of the classes responsible for handling the connection to the Elasticsearch cluster. The default subclasses used can be overriden by passing parameters to the :class:`~elasticsearch.Elasticsearch` class. All of the arguments to the client will be passed on to :class:`~elasticsearch.Transport`, :class:`~elasticsearch.ConnectionPool` and :class:`~elasticsearch.Connection`. For example if you wanted to use your own implementation of the :class:`~elasticsearch.ConnectionSelector` class you can just pass in the ``selector_class`` parameter. .. note:: :class:`~elasticsearch.ConnectionPool` and related options (like ``selector_class``) will only be used if more than one connection is defined. Either directly or via the :ref:`sniffing` mechanism. .. py:module:: elasticsearch Transport --------- .. autoclass:: Transport(hosts, connection_class=Urllib3HttpConnection, connection_pool_class=ConnectionPool, nodes_to_host_callback=construct_hosts_list, sniff_on_start=False, sniffer_timeout=None, sniff_on_connection_fail=False, serializer=JSONSerializer(), max_retries=3, ** kwargs) :members: Connection Pool --------------- .. autoclass:: ConnectionPool(connections, dead_timeout=60, selector_class=RoundRobinSelector, randomize_hosts=True, ** kwargs) :members: Connection Selector ------------------- .. autoclass:: ConnectionSelector(opts) :members: Urllib3HttpConnection (default connection_class) ------------------------------------------------ .. autoclass:: Urllib3HttpConnection :members: elasticsearch-py-5.4.0/docs/exceptions.rst000066400000000000000000000010421310735271200206210ustar00rootroot00000000000000.. _exceptions: Exceptions ========== .. py:module:: elasticsearch .. autoclass:: ImproperlyConfigured .. autoclass:: ElasticsearchException .. autoclass:: SerializationError(ElasticsearchException) .. autoclass:: TransportError(ElasticsearchException) :members: .. autoclass:: ConnectionError(TransportError) .. autoclass:: ConnectionTimeout(ConnectionError) .. autoclass:: SSLError(ConnectionError) .. autoclass:: NotFoundError(TransportError) .. autoclass:: ConflictError(TransportError) .. autoclass:: RequestError(TransportError) elasticsearch-py-5.4.0/docs/helpers.rst000066400000000000000000000044671310735271200201200ustar00rootroot00000000000000.. 
_helpers: Helpers ======= Collection of simple helper functions that abstract some specifics or the raw API. Bulk helpers ------------ There are several helpers for the ``bulk`` API since it's requirement for specific formatting and other considerations can make it cumbersome if used directly. All bulk helpers accept an instance of ``Elasticsearch`` class and an iterable ``actions`` (any iterable, can also be a generator, which is ideal in most cases since it will allow you to index large datasets without the need of loading them into memory). The items in the ``action`` iterable should be the documents we wish to index in several formats. The most common one is the same as returned by :meth:`~elasticsearch.Elasticsearch.search`, for example: .. code:: python { '_index': 'index-name', '_type': 'document', '_id': 42, '_parent': 5, 'pipeline': 'my-ingest-pipeline', '_source': { "title": "Hello World!", "body": "..." } } Alternatively, if `_source` is not present, it will pop all metadata fields from the doc and use the rest as the document data: .. code:: python { "_id": 42, "_parent": 5, "title": "Hello World!", "body": "..." } The :meth:`~elasticsearch.Elasticsearch.bulk` api accepts ``index``, ``create``, ``delete``, and ``update`` actions. Use the ``_op_type`` field to specify an action (``_op_type`` defaults to ``index``): .. code:: python { '_op_type': 'delete', '_index': 'index-name', '_type': 'document', '_id': 42, } { '_op_type': 'update', '_index': 'index-name', '_type': 'document', '_id': 42, 'doc': {'question': 'The life, universe and everything.'} } .. note:: When reading raw json strings from a file, you can also pass them in directly (without decoding to dicts first). In that case, however, you lose the ability to specify anything (index, type, even id) on a per-record basis, all documents will just be sent to elasticsearch to be indexed as-is. .. py:module:: elasticsearch.helpers .. autofunction:: streaming_bulk .. autofunction:: parallel_bulk .. autofunction:: bulk Scan ---- .. autofunction:: scan Reindex ------- .. autofunction:: reindex elasticsearch-py-5.4.0/docs/index.rst000066400000000000000000000246671310735271200175710ustar00rootroot00000000000000Python Elasticsearch Client =========================== Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be opinion-free and very extendable. For a more high level client library with more limited scope, have a look at `elasticsearch-dsl`_ - it is a more pythonic library sitting on top of ``elasticsearch-py``. .. _elasticsearch-dsl: https://elasticsearch-dsl.readthedocs.io/ Compatibility ------------- The library is compatible with all Elasticsearch versions since ``0.90.x`` but you **have to use a matching major version**: For **Elasticsearch 5.0** and later, use the major version 5 (``5.x.y``) of the library. For **Elasticsearch 2.0** and later, use the major version 2 (``2.x.y``) of the library. For **Elasticsearch 1.0** and later, use the major version 1 (``1.x.y``) of the library. For **Elasticsearch 0.90.x**, use a version from ``0.4.x`` releases of the library. The recommended way to set your requirements in your `setup.py` or `requirements.txt` is:: # Elasticsearch 5.x elasticsearch>=5.0.0,<6.0.0 # Elasticsearch 2.x elasticsearch>=2.0.0,<3.0.0 # Elasticsearch 1.x elasticsearch>=1.0.0,<2.0.0 # Elasticsearch 0.90.x elasticsearch<1.0.0 The development is happening on ``master`` and ``2.x`` branches respectively. 
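As a quick runtime sanity check, the minimal sketch below (illustrative only,
not part of the library's API; it assumes a cluster reachable on the default
``localhost:9200``) compares the major version of the installed client with
the major version reported by the cluster::

    import elasticsearch
    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # major version of the installed client, e.g. (5, 4, 0) -> 5
    client_major = elasticsearch.VERSION[0]

    # major version reported by the cluster, e.g. "5.4.0" -> 5
    cluster_major = int(es.info()['version']['number'].split('.')[0])

    if client_major != cluster_major:
        raise RuntimeError(
            'elasticsearch-py major version %s does not match cluster major version %s'
            % (client_major, cluster_major))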
Example Usage ------------- :: from datetime import datetime from elasticsearch import Elasticsearch es = Elasticsearch() doc = { 'author': 'kimchy', 'text': 'Elasticsearch: cool. bonsai cool.', 'timestamp': datetime.now(), } res = es.index(index="test-index", doc_type='tweet', id=1, body=doc) print(res['created']) res = es.get(index="test-index", doc_type='tweet', id=1) print(res['_source']) es.indices.refresh(index="test-index") res = es.search(index="test-index", body={"query": {"match_all": {}}}) print("Got %d Hits:" % res['hits']['total']) for hit in res['hits']['hits']: print("%(timestamp)s %(author)s: %(text)s" % hit["_source"]) Features -------- This client was designed as very thin wrapper around Elasticsearch's REST API to allow for maximum flexibility. This means that there are no opinions in this client; it also means that some of the APIs are a little cumbersome to use from Python. We have created some :ref:`helpers` to help with this issue as well as a more high level library (`elasticsearch-dsl`_) on top of this one to provide a more convenient way of working with Elasticsearch. .. _elasticsearch-dsl: https://elasticsearch-dsl.readthedocs.io/ Persistent Connections ~~~~~~~~~~~~~~~~~~~~~~ ``elasticsearch-py`` uses persistent connections inside of individual connection pools (one per each configured or sniffed node). Out of the box you can choose between two ``http`` protocol implementations. See :ref:`transports` for more information. The transport layer will create an instance of the selected connection class per node and keep track of the health of individual nodes - if a node becomes unresponsive (throwing exceptions while connecting to it) it's put on a timeout by the :class:`~elasticsearch.ConnectionPool` class and only returned to the circulation after the timeout is over (or when no live nodes are left). By default nodes are randomized before being passed into the pool and round-robin strategy is used for load balancing. You can customize this behavior by passing parameters to the :ref:`connection_api` (all keyword arguments to the :class:`~elasticsearch.Elasticsearch` class will be passed through). If what you want to accomplish is not supported you should be able to create a subclass of the relevant component and pass it in as a parameter to be used instead of the default implementation. Automatic Retries ~~~~~~~~~~~~~~~~~ If a connection to a node fails due to connection issues (raises :class:`~elasticsearch.ConnectionError`) it is considered in faulty state. It will be placed on hold for ``dead_timeout`` seconds and the request will be retried on another node. If a connection fails multiple times in a row the timeout will get progressively larger to avoid hitting a node that's, by all indication, down. If no live connection is available, the connection that has the smallest timeout will be used. By default retries are not triggered by a timeout (:class:`~elasticsearch.ConnectionTimeout`), set ``retry_on_timeout`` to ``True`` to also retry on timeouts. .. _sniffing: Sniffing ~~~~~~~~ The client can be configured to inspect the cluster state to get a list of nodes upon startup, periodically and/or on failure. See :class:`~elasticsearch.Transport` parameters for details. 
Some example configurations:: from elasticsearch import Elasticsearch # by default we don't sniff, ever es = Elasticsearch() # you can specify to sniff on startup to inspect the cluster and load # balance across all nodes es = Elasticsearch(["seed1", "seed2"], sniff_on_start=True) # you can also sniff periodically and/or after failure: es = Elasticsearch(["seed1", "seed2"], sniff_on_start=True, sniff_on_connection_fail=True, sniffer_timeout=60) Thread safety ~~~~~~~~~~~~~ The client is thread safe and can be used in a multi threaded environment. Best practice is to create a single global instance of the client and use it throughout your application. If your application is long-running consider turning on :ref:`sniffing` to make sure the client is up to date on the cluster location. By default we allow ``urllib3`` to open up to 10 connections to each node, if your application calls for more parallelism, use the ``maxsize`` parameter to raise the limit:: # allow up to 25 connections to each node es = Elasticsearch(["host1", "host2"], maxsize=25) .. note:: Since we use persistent connections throughout the client it means that the client doesn't tolerate ``fork`` very well. If your application calls for multiple processes make sure you create a fresh client after call to ``fork``. Note that Python's ``multiprocessing`` module uses ``fork`` to create new processes on POSIX systems. SSL and Authentication ~~~~~~~~~~~~~~~~~~~~~~ You can configure the client to use ``SSL`` for connecting to your elasticsearch cluster, including certificate verification and http auth:: from elasticsearch import Elasticsearch # you can use RFC-1738 to specify the url es = Elasticsearch(['https://user:secret@localhost:443']) # ... or specify common parameters as kwargs # use certifi for CA certificates import certifi es = Elasticsearch( ['localhost', 'otherhost'], http_auth=('user', 'secret'), port=443, use_ssl=True ) # SSL client authentication using client_cert and client_key es = Elasticsearch( ['localhost', 'otherhost'], http_auth=('user', 'secret'), port=443, use_ssl=True, ca_certs='/path/to/cacert.pem', client_cert='/path/to/client_cert.pem', client_key='/path/to/client_key.pem', ) .. warning:: ``elasticsearch-py`` doesn't ship with default set of root certificates. To have working SSL certificate validation you need to either specify your own as ``ca_certs`` or install `certifi`_ which will be picked up automatically. See class :class:`~elasticsearch.Urllib3HttpConnection` for detailed description of the options. .. _certifi: http://certifi.io/ Logging ~~~~~~~ ``elasticsearch-py`` uses the standard `logging library`_ from python to define two loggers: ``elasticsearch`` and ``elasticsearch.trace``. ``elasticsearch`` is used by the client to log standard activity, depending on the log level. ``elasticsearch.trace`` can be used to log requests to the server in the form of ``curl`` commands using pretty-printed json that can then be executed from command line. Because it is designed to be shared (for example to demonstrate an issue) it also just uses ``localhost:9200`` as the address instead of the actual address of the host. If the trace logger has not been configured already it is set to `propagate=False` so it needs to be activated separately. .. _logging library: http://docs.python.org/3.3/library/logging.html Environment considerations -------------------------- When using the client there are several limitations of your environment that could come into play. 
Environment considerations
--------------------------

When using the client there are several limitations of your environment that
could come into play.

When using an http load balancer you cannot use the :ref:`sniffing`
functionality - the cluster would supply the client with IP addresses to
directly connect to the cluster, circumventing the load balancer. Depending
on your configuration this might be something you don't want, or it might
break completely.

In some environments (notably on Google App Engine) your http requests might
be restricted so that ``GET`` requests won't accept a body. In that case use
the ``send_get_body_as`` parameter of :class:`~elasticsearch.Transport` to
send all bodies via POST::

    from elasticsearch import Elasticsearch
    es = Elasticsearch(send_get_body_as='POST')

Running on AWS with IAM
~~~~~~~~~~~~~~~~~~~~~~~

If you want to use this client with IAM-based authentication on AWS you can
use the `requests-aws4auth`_ package::

    from elasticsearch import Elasticsearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth

    host = 'YOURHOST.us-east-1.es.amazonaws.com'
    awsauth = AWS4Auth(YOUR_ACCESS_KEY, YOUR_SECRET_KEY, REGION, 'es')

    es = Elasticsearch(
        hosts=[{'host': host, 'port': 443}],
        http_auth=awsauth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )
    print(es.info())

.. _requests-aws4auth: https://pypi.python.org/pypi/requests-aws4auth

Contents
--------

.. toctree::
   :maxdepth: 2

   api
   exceptions
   connection
   transports
   helpers
   Changelog

License
-------

Copyright 2013 Elasticsearch

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

elasticsearch-py-5.4.0/docs/transports.rst

.. _transports:

Transport classes
=================

A list of transport classes that can be used: simply import your choice and
pass it to the constructor of :class:`~elasticsearch.Elasticsearch` as
`connection_class`. Note that the
:class:`~elasticsearch.connection.RequestsHttpConnection` requires
``requests`` to be installed.

For example, to use the ``requests``-based connection just import it and use
it::

    from elasticsearch import Elasticsearch, RequestsHttpConnection
    es = Elasticsearch(connection_class=RequestsHttpConnection)

The default connection class is based on ``urllib3`` which is more performant
and lightweight than the optional ``requests``-based class. Only use
``RequestsHttpConnection`` if you need any of ``requests``' advanced features
like custom auth plugins.
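Because all keyword arguments given to :class:`~elasticsearch.Elasticsearch`
are passed down through the transport to the connection instances, the class
selection can be combined with connection-level options (a minimal sketch;
the host names and values below are placeholders, not recommendations)::

    from elasticsearch import Elasticsearch, Urllib3HttpConnection

    # explicitly select the default class and tune its per-node pool size
    # and default request timeout
    es = Elasticsearch(
        ["host1", "host2"],
        connection_class=Urllib3HttpConnection,
        maxsize=25,
        timeout=30
    )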
.. py:module:: elasticsearch.connection

Connection
----------

.. autoclass:: Connection

Urllib3HttpConnection
---------------------

.. autoclass:: Urllib3HttpConnection

RequestsHttpConnection
----------------------

.. autoclass:: RequestsHttpConnection

elasticsearch-py-5.4.0/elasticsearch/__init__.py

from __future__ import absolute_import

VERSION = (5, 4, 0)
__version__ = VERSION
__versionstr__ = '.'.join(map(str, VERSION))

import sys

if (2, 7) <= sys.version_info < (3, 2):
    # On Python 2.7 and Python3 < 3.2, install no-op handler to silence
    # `No handlers could be found for logger "elasticsearch"` message, per
    # the logging documentation's advice for library authors.
    import logging
    logger = logging.getLogger('elasticsearch')
    logger.addHandler(logging.NullHandler())

from .client import Elasticsearch
from .transport import Transport
from .connection_pool import ConnectionPool, ConnectionSelector, \
    RoundRobinSelector
from .serializer import JSONSerializer
from .connection import Connection, RequestsHttpConnection, \
    Urllib3HttpConnection
from .exceptions import *

elasticsearch-py-5.4.0/elasticsearch/client/__init__.py

from __future__ import unicode_literals
import logging

from ..transport import Transport
from ..exceptions import TransportError
from ..compat import string_types, urlparse, unquote
from .indices import IndicesClient
from .ingest import IngestClient
from .cluster import ClusterClient
from .cat import CatClient
from .nodes import NodesClient
from .snapshot import SnapshotClient
from .tasks import TasksClient
from .utils import query_params, _make_path, SKIP_IN_PATH

logger = logging.getLogger('elasticsearch')


def _normalize_hosts(hosts):
    """
    Helper function to transform the hosts argument to
    :class:`~elasticsearch.Elasticsearch` into a list of dicts.
    """
    # if hosts are empty, just defer to defaults down the line
    if hosts is None:
        return [{}]

    # passed in just one string
    if isinstance(hosts, string_types):
        hosts = [hosts]

    out = []
    # normalize hosts to dicts
    for host in hosts:
        if isinstance(host, string_types):
            if '://' not in host:
                host = "//%s" % host

            parsed_url = urlparse(host)
            h = {"host": parsed_url.hostname}

            if parsed_url.port:
                h["port"] = parsed_url.port

            if parsed_url.scheme == "https":
                h['port'] = parsed_url.port or 443
                h['use_ssl'] = True

            if parsed_url.username or parsed_url.password:
                h['http_auth'] = '%s:%s' % (unquote(parsed_url.username),
                                            unquote(parsed_url.password))

            if parsed_url.path and parsed_url.path != '/':
                h['url_prefix'] = parsed_url.path

            out.append(h)
        else:
            out.append(host)
    return out


class Elasticsearch(object):
    """
    Elasticsearch low-level client. Provides a straightforward mapping from
    Python to ES REST endpoints.

    The instance has attributes ``cat``, ``cluster``, ``indices``, ``ingest``,
    ``nodes``, ``snapshot`` and ``tasks`` that provide access to instances of
    :class:`~elasticsearch.client.CatClient`,
    :class:`~elasticsearch.client.ClusterClient`,
    :class:`~elasticsearch.client.IndicesClient`,
    :class:`~elasticsearch.client.IngestClient`,
    :class:`~elasticsearch.client.NodesClient`,
    :class:`~elasticsearch.client.SnapshotClient` and
    :class:`~elasticsearch.client.TasksClient` respectively. This is the
    preferred (and only supported) way to get access to those classes and
    their methods.
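    For example, index- and cluster-level APIs are reached through those
    attributes (a minimal sketch; the index name and parameters below are
    placeholders)::

        es = Elasticsearch()

        # index-level APIs live under the ``indices`` namespace
        es.indices.create(index='test-index', ignore=400)

        # cluster-level APIs live under the ``cluster`` namespace
        es.cluster.health(wait_for_status='yellow')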
You can specify your own connection class which should be used by providing the ``connection_class`` parameter:: # create connection to localhost using the ThriftConnection es = Elasticsearch(connection_class=ThriftConnection) If you want to turn on :ref:`sniffing` you have several options (described in :class:`~elasticsearch.Transport`):: # create connection that will automatically inspect the cluster to get # the list of active nodes. Start with nodes running on 'esnode1' and # 'esnode2' es = Elasticsearch( ['esnode1', 'esnode2'], # sniff before doing anything sniff_on_start=True, # refresh nodes after a node fails to respond sniff_on_connection_fail=True, # and also every 60 seconds sniffer_timeout=60 ) Different hosts can have different parameters, use a dictionary per node to specify those:: # connect to localhost directly and another node using SSL on port 443 # and an url_prefix. Note that ``port`` needs to be an int. es = Elasticsearch([ {'host': 'localhost'}, {'host': 'othernode', 'port': 443, 'url_prefix': 'es', 'use_ssl': True}, ]) If using SSL, there are several parameters that control how we deal with certificates (see :class:`~elasticsearch.Urllib3HttpConnection` for detailed description of the options):: es = Elasticsearch( ['localhost:443', 'other_host:443'], # turn on SSL use_ssl=True, # make sure we verify SSL certificates (off by default) verify_certs=True, # provide a path to CA certs on disk ca_certs='/path/to/CA_certs' ) SSL client authentication is supported (see :class:`~elasticsearch.Urllib3HttpConnection` for detailed description of the options):: es = Elasticsearch( ['localhost:443', 'other_host:443'], # turn on SSL use_ssl=True, # make sure we verify SSL certificates (off by default) verify_certs=True, # provide a path to CA certs on disk ca_certs='/path/to/CA_certs', # PEM formatted SSL client certificate client_cert='/path/to/clientcert.pem', # PEM formatted SSL client key client_key='/path/to/clientkey.pem' ) Alternatively you can use RFC-1738 formatted URLs, as long as they are not in conflict with other options:: es = Elasticsearch( [ 'http://user:secret@localhost:9200/', 'https://user:secret@other_host:443/production' ], verify_certs=True ) """ def __init__(self, hosts=None, transport_class=Transport, **kwargs): """ :arg hosts: list of nodes we should connect to. Node should be a dictionary ({"host": "localhost", "port": 9200}), the entire dictionary will be passed to the :class:`~elasticsearch.Connection` class as kwargs, or a string in the format of ``host[:port]`` which will be translated to a dictionary automatically. If no value is given the :class:`~elasticsearch.Urllib3HttpConnection` class defaults will be used. :arg transport_class: :class:`~elasticsearch.Transport` subclass to use. :arg kwargs: any additional arguments will be passed on to the :class:`~elasticsearch.Transport` class and, subsequently, to the :class:`~elasticsearch.Connection` instances. 
""" self.transport = transport_class(_normalize_hosts(hosts), **kwargs) # namespaced clients for compatibility with API names self.indices = IndicesClient(self) self.ingest = IngestClient(self) self.cluster = ClusterClient(self) self.cat = CatClient(self) self.nodes = NodesClient(self) self.snapshot = SnapshotClient(self) self.tasks = TasksClient(self) def __repr__(self): try: # get a lost of all connections cons = self.transport.hosts # truncate to 10 if there are too many if len(cons) > 5: cons = cons[:5] + ['...'] return '' % cons except: # probably operating on custom transport and connection_pool, ignore return super(Elasticsearch, self).__repr__() def _bulk_body(self, body): # if not passed in a string, serialize items and join by newline if not isinstance(body, string_types): body = '\n'.join(map(self.transport.serializer.dumps, body)) # bulk body must end with a newline if not body.endswith('\n'): body += '\n' return body @query_params() def ping(self, params=None): """ Returns True if the cluster is up, False otherwise. ``_ """ try: return self.transport.perform_request('HEAD', '/', params=params) except TransportError: return False @query_params() def info(self, params=None): """ Get the basic info from the current cluster. ``_ """ return self.transport.perform_request('GET', '/', params=params) @query_params('parent', 'pipeline', 'refresh', 'routing', 'timeout', 'timestamp', 'ttl', 'version', 'version_type', 'wait_for_active_shards') def create(self, index, doc_type, id, body, params=None): """ Adds a typed JSON document in a specific index, making it searchable. Behind the scenes this method calls index(..., op_type='create') ``_ :arg index: The name of the index :arg doc_type: The type of the document :arg id: Document ID :arg body: The document :arg parent: ID of the parent document :arg pipeline: The pipeline id to preprocess incoming documents with :arg refresh: If `true` then refresh the affected shards to make this operation visible to search, if `wait_for` then wait for a refresh to make this operation visible to search, if `false` (the default) then do nothing with refreshes., valid choices are: u'true', u'false', u'wait_for' :arg routing: Specific routing value :arg timeout: Explicit operation timeout :arg timestamp: Explicit timestamp for the document :arg ttl: Expiration time for the document :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: u'internal', u'external', u'external_gte', u'force' :arg wait_for_active_shards: Sets the number of shard copies that must be active before proceeding with the index operation. Defaults to 1, meaning the primary shard only. Set to `all` for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1) """ for param in (index, doc_type, id, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path(index, doc_type, id, '_create'), params=params, body=body) @query_params('op_type', 'parent', 'pipeline', 'refresh', 'routing', 'timeout', 'timestamp', 'ttl', 'version', 'version_type', 'wait_for_active_shards') def index(self, index, doc_type, body, id=None, params=None): """ Adds or updates a typed JSON document in a specific index, making it searchable. 
``_ :arg index: The name of the index :arg doc_type: The type of the document :arg body: The document :arg id: Document ID :arg op_type: Explicit operation type, default 'index', valid choices are: 'index', 'create' :arg parent: ID of the parent document :arg pipeline: The pipeline id to preprocess incoming documents with :arg refresh: If `true` then refresh the affected shards to make this operation visible to search, if `wait_for` then wait for a refresh to make this operation visible to search, if `false` (the default) then do nothing with refreshes., valid choices are: u'true', u'false', u'wait_for' :arg routing: Specific routing value :arg timeout: Explicit operation timeout :arg timestamp: Explicit timestamp for the document :arg ttl: Expiration time for the document :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' :arg wait_for_active_shards: Sets the number of shard copies that must be active before proceeding with the index operation. Defaults to 1, meaning the primary shard only. Set to `all` for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1) """ for param in (index, doc_type, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('POST' if id in SKIP_IN_PATH else 'PUT', _make_path(index, doc_type, id), params=params, body=body) @query_params('_source', '_source_exclude', '_source_include', 'parent', 'preference', 'realtime', 'refresh', 'routing', 'stored_fields', 'version', 'version_type') def exists(self, index, doc_type, id, params=None): """ Returns a boolean indicating whether or not given document exists in Elasticsearch. 
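        For example (a minimal sketch; the index, type and id values are
        placeholders)::

            if es.exists(index='test-index', doc_type='tweet', id=1):
                print('document is present')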
``_ :arg index: The name of the index :arg doc_type: The type of the document (use `_all` to fetch the first document matching the ID across all types) :arg id: The document ID :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg routing: Specific routing value :arg stored_fields: A comma-separated list of stored fields to return in the response :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('HEAD', _make_path(index, doc_type, id), params=params) @query_params('_source', '_source_exclude', '_source_include', 'parent', 'preference', 'realtime', 'refresh', 'routing', 'version', 'version_type') def exists_source(self, index, doc_type, id, params=None): """ ``_ :arg index: The name of the index :arg doc_type: The type of the document; use `_all` to fetch the first document matching the ID across all types :arg id: The document ID :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg routing: Specific routing value :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('HEAD', _make_path(index, doc_type, id, '_source'), params=params) @query_params('_source', '_source_exclude', '_source_include', 'parent', 'preference', 'realtime', 'refresh', 'routing', 'stored_fields', 'version', 'version_type') def get(self, index, id, doc_type='_all', params=None): """ Get a typed JSON document from the index based on its id. 
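        For example (a minimal sketch; the index, type and id values are
        placeholders)::

            doc = es.get(index='test-index', doc_type='tweet', id=1)
            print(doc['_source'])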
``_ :arg index: The name of the index :arg id: The document ID :arg doc_type: The type of the document (use `_all` to fetch the first document matching the ID across all types) :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg routing: Specific routing value :arg stored_fields: A comma-separated list of stored fields to return in the response :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('GET', _make_path(index, doc_type, id), params=params) @query_params('_source', '_source_exclude', '_source_include', 'parent', 'preference', 'realtime', 'refresh', 'routing', 'version', 'version_type') def get_source(self, index, doc_type, id, params=None): """ Get the source of a document by it's index, type and id. ``_ :arg index: The name of the index :arg doc_type: The type of the document; use `_all` to fetch the first document matching the ID across all types :arg id: The document ID :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg routing: Specific routing value :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('GET', _make_path(index, doc_type, id, '_source'), params=params) @query_params('_source', '_source_exclude', '_source_include', 'preference', 'realtime', 'refresh', 'stored_fields') def mget(self, body, index=None, doc_type=None, params=None): """ Get multiple documents based on an index, type (optional) and ids. ``_ :arg body: Document identifiers; can be either `docs` (containing full document information) or `ids` (when index and type is provided in the URL. 
:arg index: The name of the index :arg doc_type: The type of the document :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg realtime: Specify whether to perform the operation in realtime or search mode :arg refresh: Refresh the shard containing the document before performing the operation :arg stored_fields: A comma-separated list of stored fields to return in the response """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('GET', _make_path(index, doc_type, '_mget'), params=params, body=body) @query_params('_source', '_source_exclude', '_source_include', 'fields', 'lang', 'parent', 'refresh', 'retry_on_conflict', 'routing', 'timeout', 'timestamp', 'ttl', 'version', 'version_type', 'wait_for_active_shards') def update(self, index, doc_type, id, body=None, params=None): """ Update a document based on a script or partial data provided. ``_ :arg index: The name of the index :arg doc_type: The type of the document :arg id: Document ID :arg body: The request definition using either `script` or partial `doc` :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg fields: A comma-separated list of fields to return in the response :arg lang: The script language (default: painless) :arg parent: ID of the parent document. Is is only used for routing and when for the upsert request :arg refresh: If `true` then refresh the effected shards to make this operation visible to search, if `wait_for` then wait for a refresh to make this operation visible to search, if `false` (the default) then do nothing with refreshes., valid choices are: 'true', 'false', 'wait_for' :arg retry_on_conflict: Specify how many times should the operation be retried when a conflict occurs (default: 0) :arg routing: Specific routing value :arg timeout: Explicit operation timeout :arg timestamp: Explicit timestamp for the document :arg ttl: Expiration time for the document :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'force' :arg wait_for_active_shards: Sets the number of shard copies that must be active before proceeding with the update operation. Defaults to 1, meaning the primary shard only. 
Set to `all` for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1) """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('POST', _make_path(index, doc_type, id, '_update'), params=params, body=body) @query_params('_source', '_source_exclude', '_source_include', 'allow_no_indices', 'analyze_wildcard', 'analyzer', 'batched_reduce_size', 'default_operator', 'df', 'docvalue_fields', 'expand_wildcards', 'explain', 'fielddata_fields', 'from_', 'ignore_unavailable', 'lenient', 'lowercase_expanded_terms', 'preference', 'q', 'request_cache', 'routing', 'scroll', 'search_type', 'size', 'sort', 'stats', 'stored_fields', 'suggest_field', 'suggest_mode', 'suggest_size', 'suggest_text', 'terminate_after', 'timeout', 'track_scores', 'typed_keys', 'version') def search(self, index=None, doc_type=None, body=None, params=None): """ Execute a search query and get back search hits that match the query. ``_ :arg index: A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices :arg doc_type: A comma-separated list of document types to search; leave empty to perform the operation on all types :arg body: The search definition using the Query DSL :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg batched_reduce_size: The number of shard results that should be reduced at once on the coordinating node. 
This value should be used as a protection mechanism to reduce the memory overhead per search request if the potential number of shards in the request can be large., default 512 :arg default_operator: The default operator for query string query (AND or OR), default 'OR', valid choices are: 'AND', 'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg docvalue_fields: A comma-separated list of fields to return as the docvalue representation of a field for each hit :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg explain: Specify whether to return detailed information about score computation as part of a hit :arg fielddata_fields: A comma-separated list of fields to return as the docvalue representation of a field for each hit :arg from\_: Starting offset (default: 0) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg request_cache: Specify if request cache should be used for this request or not, defaults to index level setting :arg routing: A comma-separated list of specific routing values :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg search_type: Search operation type, valid choices are: 'query_then_fetch', 'dfs_query_then_fetch' :arg size: Number of hits to return (default: 10) :arg sort: A comma-separated list of : pairs :arg stats: Specific 'tag' of the request for logging and statistical purposes :arg stored_fields: A comma-separated list of stored fields to return as part of a hit :arg suggest_field: Specify which field to use for suggestions :arg suggest_mode: Specify suggest mode, default 'missing', valid choices are: 'missing', 'popular', 'always' :arg suggest_size: How many suggestions to return in response :arg suggest_text: The source text for which the suggestions should be returned :arg terminate_after: The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early. 
:arg timeout: Explicit operation timeout :arg track_scores: Whether to calculate and return scores even if they are not used for sorting :arg typed_keys: Specify whether aggregation and suggester names should be prefixed by their respective types in the response :arg version: Specify whether to return document version as part of a hit """ # from is a reserved word so it cannot be used, use from_ instead if 'from_' in params: params['from'] = params.pop('from_') if doc_type and not index: index = '_all' return self.transport.perform_request('GET', _make_path(index, doc_type, '_search'), params=params, body=body) @query_params('_source', '_source_exclude', '_source_include', 'allow_no_indices', 'analyze_wildcard', 'analyzer', 'conflicts', 'default_operator', 'df', 'docvalue_fields', 'expand_wildcards', 'explain', 'fielddata_fields', 'from_', 'ignore_unavailable', 'lenient', 'lowercase_expanded_terms', 'pipeline', 'preference', 'q', 'refresh', 'request_cache', 'requests_per_second', 'routing', 'scroll', 'scroll_size', 'search_timeout', 'search_type', 'size', 'sort', 'stats', 'stored_fields', 'suggest_field', 'suggest_mode', 'suggest_size', 'suggest_text', 'terminate_after', 'timeout', 'track_scores', 'version', 'version_type', 'wait_for_active_shards', 'wait_for_completion') def update_by_query(self, index, doc_type=None, body=None, params=None): """ Perform an update on all documents matching a query. ``_ :arg index: A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices :arg doc_type: A comma-separated list of document types to search; leave empty to perform the operation on all types :arg body: The search definition using the Query DSL :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg conflicts: What to do when the reindex hits version conflicts?, default 'abort', valid choices are: 'abort', 'proceed' :arg default_operator: The default operator for query string query (AND or OR), default 'OR', valid choices are: 'AND', 'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg docvalue_fields: A comma-separated list of fields to return as the docvalue representation of a field for each hit :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg explain: Specify whether to return detailed information about score computation as part of a hit :arg fielddata_fields: A comma-separated list of fields to return as the docvalue representation of a field for each hit :arg from\_: Starting offset (default: 0) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg pipeline: Ingest pipeline to set on index requests made by this action. (default: none) :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg refresh: Should the effected indexes be refreshed? :arg request_cache: Specify if request cache should be used for this request or not, defaults to index level setting :arg requests_per_second: The throttle to set on this request in sub- requests per second. -1 means set no throttle as does "unlimited" which is the only non-float this accepts., default 0 :arg routing: A comma-separated list of specific routing values :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg scroll_size: Size on the scroll request powering the update_by_query :arg search_timeout: Explicit timeout for each search request. Defaults to no timeout. :arg search_type: Search operation type, valid choices are: 'query_then_fetch', 'dfs_query_then_fetch' :arg size: Number of hits to return (default: 10) :arg slices: The number of slices this task should be divided into. Defaults to 1 meaning the task isn't sliced into subtasks., default 1 :arg sort: A comma-separated list of : pairs :arg stats: Specific 'tag' of the request for logging and statistical purposes :arg stored_fields: A comma-separated list of stored fields to return as part of a hit :arg suggest_field: Specify which field to use for suggestions :arg suggest_mode: Specify suggest mode, default 'missing', valid choices are: 'missing', 'popular', 'always' :arg suggest_size: How many suggestions to return in response :arg suggest_text: The source text for which the suggestions should be returned :arg terminate_after: The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early. 
:arg timeout: Time each individual bulk request should wait for shards that are unavailable., default '1m' :arg track_scores: Whether to calculate and return scores even if they are not used for sorting :arg version: Specify whether to return document version as part of a hit :arg version_type: Should the document increment the version number (internal) on hit or not (reindex) :arg wait_for_active_shards: Sets the number of shard copies that must be active before proceeding with the update by query operation. Defaults to 1, meaning the primary shard only. Set to `all` for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1) :arg wait_for_completion: Should the request should block until the reindex is complete., default True """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") return self.transport.perform_request('POST', _make_path(index, doc_type, '_update_by_query'), params=params, body=body) @query_params('refresh', 'requests_per_second', 'slices', 'timeout', 'wait_for_active_shards', 'wait_for_completion') def reindex(self, body, params=None): """ Reindex all documents from one index to another. ``_ :arg body: The search definition using the Query DSL and the prototype for the index request. :arg refresh: Should the effected indexes be refreshed? :arg requests_per_second: The throttle to set on this request in sub- requests per second. -1 means set no throttle as does "unlimited" which is the only non-float this accepts., default 0 :arg slices: The number of slices this task should be divided into. Defaults to 1 meaning the task isn't sliced into subtasks., default 1 :arg timeout: Time each individual bulk request should wait for shards that are unavailable., default '1m' :arg wait_for_active_shards: Sets the number of shard copies that must be active before proceeding with the reindex operation. Defaults to 1, meaning the primary shard only. Set to `all` for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1) :arg wait_for_completion: Should the request should block until the reindex is complete., default True """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('POST', '/_reindex', params=params, body=body) @query_params('requests_per_second') def reindex_rethrottle(self, task_id=None, params=None): """ Change the value of ``requests_per_second`` of a running ``reindex`` task. ``_ :arg task_id: The task id to rethrottle :arg requests_per_second: The throttle to set on this request in floating sub-requests per second. -1 means set no throttle. 
""" return self.transport.perform_request('POST', _make_path('_reindex', task_id, '_rethrottle'), params=params) @query_params('_source', '_source_exclude', '_source_include', 'allow_no_indices', 'analyze_wildcard', 'analyzer', 'conflicts', 'default_operator', 'df', 'docvalue_fields', 'expand_wildcards', 'explain', 'from_', 'ignore_unavailable', 'lenient', 'lowercase_expanded_terms', 'preference', 'q', 'refresh', 'request_cache', 'requests_per_second', 'routing', 'scroll', 'slices', 'scroll_size', 'search_timeout', 'search_type', 'size', 'sort', 'stats', 'stored_fields', 'suggest_field', 'suggest_mode', 'suggest_size', 'suggest_text', 'terminate_after', 'timeout', 'track_scores', 'version', 'wait_for_active_shards', 'wait_for_completion') def delete_by_query(self, index, body, doc_type=None, params=None): """ Delete all documents matching a query. ``_ :arg index: A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices :arg body: The search definition using the Query DSL :arg doc_type: A comma-separated list of document types to search; leave empty to perform the operation on all types :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg conflicts: What to do when the delete-by-query hits version conflicts?, default 'abort', valid choices are: 'abort', 'proceed' :arg default_operator: The default operator for query string query (AND or OR), default 'OR', valid choices are: 'AND', 'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg docvalue_fields: A comma-separated list of fields to return as the docvalue representation of a field for each hit :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg explain: Specify whether to return detailed information about score computation as part of a hit :arg from\_: Starting offset (default: 0) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg refresh: Should the effected indexes be refreshed? :arg request_cache: Specify if request cache should be used for this request or not, defaults to index level setting :arg requests_per_second: The throttle for this request in sub-requests per second. 
-1 means no throttle., default 0 :arg routing: A comma-separated list of specific routing values :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg scroll_size: Size on the scroll request powering the update_by_query :arg search_timeout: Explicit timeout for each search request. Defaults to no timeout. :arg search_type: Search operation type, valid choices are: 'query_then_fetch', 'dfs_query_then_fetch' :arg size: Number of hits to return (default: 10) :arg slices: The number of slices this task should be divided into. Defaults to 1 meaning the task isn't sliced into subtasks., default 1 :arg sort: A comma-separated list of : pairs :arg stats: Specific 'tag' of the request for logging and statistical purposes :arg stored_fields: A comma-separated list of stored fields to return as part of a hit :arg suggest_field: Specify which field to use for suggestions :arg suggest_mode: Specify suggest mode, default 'missing', valid choices are: 'missing', 'popular', 'always' :arg suggest_size: How many suggestions to return in response :arg suggest_text: The source text for which the suggestions should be returned :arg terminate_after: The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early. :arg timeout: Time each individual bulk request should wait for shards that are unavailable., default '1m' :arg track_scores: Whether to calculate and return scores even if they are not used for sorting :arg version: Specify whether to return document version as part of a hit :arg wait_for_active_shards: Sets the number of shard copies that must be active before proceeding with the delete by query operation. Defaults to 1, meaning the primary shard only. Set to `all` for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1) :arg wait_for_completion: Should the request should block until the delete-by-query is complete., default True """ for param in (index, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('POST', _make_path(index, doc_type, '_delete_by_query'), params=params, body=body) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local', 'preference', 'routing') def search_shards(self, index=None, doc_type=None, params=None): """ The search shards api returns the indices and shards that a search request would be executed against. This can give useful feedback for working out issues or planning optimizations with routing and shard preferences. ``_ :arg index: A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices :arg doc_type: A comma-separated list of document types to search; leave empty to perform the operation on all types :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: Specific routing value """ return self.transport.perform_request('GET', _make_path(index, doc_type, '_search_shards'), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'explain', 'ignore_unavailable', 'preference', 'profile', 'routing', 'scroll', 'search_type', 'typed_keys') def search_template(self, index=None, doc_type=None, body=None, params=None): """ A query that accepts a query template and a map of key/value pairs to fill in template parameters. ``_ :arg index: A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices :arg doc_type: A comma-separated list of document types to search; leave empty to perform the operation on all types :arg body: The search definition template and its params :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg explain: Specify whether to return detailed information about score computation as part of a hit :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg profile: Specify whether to profile the query execution :arg routing: A comma-separated list of specific routing values :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg search_type: Search operation type, valid choices are: 'query_then_fetch', 'query_and_fetch', 'dfs_query_then_fetch', 'dfs_query_and_fetch' :arg typed_keys: Specify whether aggregation and suggester names should be prefixed by their respective types in the response """ return self.transport.perform_request('GET', _make_path(index, doc_type, '_search', 'template'), params=params, body=body) @query_params('_source', '_source_exclude', '_source_include', 'analyze_wildcard', 'analyzer', 'default_operator', 'df', 'lenient', 'lowercase_expanded_terms', 'parent', 'preference', 'q', 'routing', 'stored_fields') def explain(self, index, doc_type, id, body=None, params=None): """ The explain api computes a score explanation for a query and a specific document. This can give useful feedback whether a document matches or didn't match a specific query. 
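        For example (a minimal sketch; the index, type, id and query below
        are placeholders)::

            es.explain(index='test-index', doc_type='tweet', id=1,
                       body={'query': {'match': {'text': 'elasticsearch'}}})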
``_ :arg index: The name of the index :arg doc_type: The type of the document :arg id: The document ID :arg body: The query definition using the Query DSL :arg _source: True or false to return the _source field or not, or a list of fields to return :arg _source_exclude: A list of fields to exclude from the returned _source field :arg _source_include: A list of fields to extract and return from the _source field :arg analyze_wildcard: Specify whether wildcards and prefix queries in the query string query should be analyzed (default: false) :arg analyzer: The analyzer for the query string query :arg default_operator: The default operator for query string query (AND or OR), default 'OR', valid choices are: 'AND', 'OR' :arg df: The default field for query string query (default: _all) :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg parent: The ID of the parent document :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg routing: Specific routing value :arg stored_fields: A comma-separated list of stored fields to return in the response """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('GET', _make_path(index, doc_type, id, '_explain'), params=params, body=body) @query_params('scroll') def scroll(self, scroll_id=None, body=None, params=None): """ Scroll a search request created by specifying the scroll parameter. ``_ :arg scroll_id: The scroll ID :arg body: The scroll ID if not passed by URL or query parameter. :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search """ if scroll_id in SKIP_IN_PATH and body in SKIP_IN_PATH: raise ValueError("You need to supply scroll_id or body.") elif scroll_id and not body: body = {'scroll_id':scroll_id} elif scroll_id: params['scroll_id'] = scroll_id return self.transport.perform_request('GET', '/_search/scroll', params=params, body=body) @query_params() def clear_scroll(self, scroll_id=None, body=None, params=None): """ Clear the scroll request created by specifying the scroll parameter to search. ``_ :arg scroll_id: A comma-separated list of scroll IDs to clear :arg body: A comma-separated list of scroll IDs to clear if none was specified via the scroll_id parameter """ if scroll_id in SKIP_IN_PATH and body in SKIP_IN_PATH: raise ValueError("You need to supply scroll_id or body.") elif scroll_id and not body: body = {'scroll_id':[scroll_id]} elif scroll_id: params['scroll_id'] = scroll_id return self.transport.perform_request('DELETE', '/_search/scroll', params=params, body=body) @query_params('parent', 'refresh', 'routing', 'timeout', 'version', 'version_type', 'wait_for_active_shards') def delete(self, index, doc_type, id, params=None): """ Delete a typed JSON document from a specific index based on its id. 
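        For example (a minimal sketch; the index, type and id values are
        placeholders)::

            es.delete(index='test-index', doc_type='tweet', id=1)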
``_ :arg index: The name of the index :arg doc_type: The type of the document :arg id: The document ID :arg parent: ID of parent document :arg refresh: If `true` then refresh the effected shards to make this operation visible to search, if `wait_for` then wait for a refresh to make this operation visible to search, if `false` (the default) then do nothing with refreshes., valid choices are: 'true', 'false', 'wait_for' :arg routing: Specific routing value :arg timeout: Explicit operation timeout :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' :arg wait_for_active_shards: Sets the number of shard copies that must be active before proceeding with the delete operation. Defaults to 1, meaning the primary shard only. Set to `all` for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1) """ for param in (index, doc_type, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('DELETE', _make_path(index, doc_type, id), params=params) @query_params('allow_no_indices', 'analyze_wildcard', 'analyzer', 'default_operator', 'df', 'expand_wildcards', 'ignore_unavailable', 'lenient', 'lowercase_expanded_terms', 'min_score', 'preference', 'q', 'routing') def count(self, index=None, doc_type=None, body=None, params=None): """ Execute a query and get the number of matches for that query. ``_ :arg index: A comma-separated list of indices to restrict the results :arg doc_type: A comma-separated list of types to restrict the results :arg body: A query to restrict the results specified with the Query DSL (optional) :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg default_operator: The default operator for query string query (AND or OR), default 'OR', valid choices are: 'AND', 'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg min_score: Include only documents with a specific `_score` value in the result :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg q: Query in the Lucene query string syntax :arg routing: Specific routing value """ if doc_type and not index: index = '_all' return self.transport.perform_request('GET', _make_path(index, doc_type, '_count'), params=params, body=body) @query_params('_source', '_source_exclude', '_source_include', 'fields', 'pipeline', 'refresh', 'routing', 'timeout', 'wait_for_active_shards') def bulk(self, body, index=None, doc_type=None, params=None): """ Perform many index/delete operations in a single API call. 
See the :func:`~elasticsearch.helpers.bulk` helper function for a more friendly API. ``_ :arg body: The operation definition and data (action-data pairs), separated by newlines :arg index: Default index for items which don't provide one :arg doc_type: Default document type for items which don't provide one :arg _source: True or false to return the _source field or not, or default list of fields to return, can be overridden on each sub- request :arg _source_exclude: Default list of fields to exclude from the returned _source field, can be overridden on each sub-request :arg _source_include: Default list of fields to extract and return from the _source field, can be overridden on each sub-request :arg fields: Default comma-separated list of fields to return in the response for updates, can be overridden on each sub-request :arg pipeline: The pipeline id to preprocess incoming documents with :arg refresh: If `true` then refresh the effected shards to make this operation visible to search, if `wait_for` then wait for a refresh to make this operation visible to search, if `false` (the default) then do nothing with refreshes., valid choices are: 'true', 'false', 'wait_for' :arg routing: Specific routing value :arg timeout: Explicit operation timeout :arg wait_for_active_shards: Sets the number of shard copies that must be active before proceeding with the bulk operation. Defaults to 1, meaning the primary shard only. Set to `all` for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1) """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('POST', _make_path(index, doc_type, '_bulk'), params=params, body=self._bulk_body(body)) @query_params('max_concurrent_searches', 'search_type', 'typed_keys') def msearch(self, body, index=None, doc_type=None, params=None): """ Execute several search requests within the same API. ``_ :arg body: The request definitions (metadata-search request definition pairs), separated by newlines :arg index: A comma-separated list of index names to use as default :arg doc_type: A comma-separated list of document types to use as default :arg max_concurrent_searches: Controls the maximum number of concurrent searches the multi search api will execute :arg search_type: Search operation type, valid choices are: 'query_then_fetch', 'query_and_fetch', 'dfs_query_then_fetch', 'dfs_query_and_fetch' :arg typed_keys: Specify whether aggregation and suggester names should be prefixed by their respective types in the response """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('GET', _make_path(index, doc_type, '_msearch'), params=params, body=self._bulk_body(body)) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'preference', 'routing') def suggest(self, body, index=None, params=None): """ The suggest feature suggests similar looking terms based on a provided text by using a suggester. ``_ :arg body: The request definition :arg index: A comma-separated list of index names to restrict the operation; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: Specific routing value """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('POST', _make_path(index, '_suggest'), params=params, body=body) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'percolate_format', 'percolate_index', 'percolate_preference', 'percolate_routing', 'percolate_type', 'preference', 'routing', 'version', 'version_type') def percolate(self, index, doc_type, id=None, body=None, params=None): """ The percolator allows to register queries against an index, and then send percolate requests which include a doc, and getting back the queries that match on that doc out of the set of registered queries. ``_ :arg index: The index of the document being percolated. :arg doc_type: The type of the document being percolated. :arg id: Substitute the document in the request body with a document that is known by the specified id. On top of the id, the index and type parameter will be used to retrieve the document from within the cluster. :arg body: The percolator request definition using the percolate DSL :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg percolate_format: Return an array of matching query IDs instead of objects, valid choices are: 'ids' :arg percolate_index: The index to percolate the document into. Defaults to index. :arg percolate_preference: Which shard to prefer when executing the percolate request. :arg percolate_routing: The routing value to use when percolating the existing document. :arg percolate_type: The type to percolate document into. Defaults to type. :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: A comma-separated list of specific routing values :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' """ for param in (index, doc_type): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('GET', _make_path(index, doc_type, id, '_percolate'), params=params, body=body) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable') def mpercolate(self, body, index=None, doc_type=None, params=None): """ The percolator allows to register queries against an index, and then send percolate requests which include a doc, and getting back the queries that match on that doc out of the set of registered queries. 
``_ :arg body: The percolate request definitions (header & body pair), separated by newlines :arg index: The index of the document being count percolated to use as default :arg doc_type: The type of the document being percolated to use as default. :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('GET', _make_path(index, doc_type, '_mpercolate'), params=params, body=self._bulk_body(body)) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'percolate_index', 'percolate_type', 'preference', 'routing', 'version', 'version_type') def count_percolate(self, index, doc_type, id=None, body=None, params=None): """ The percolator allows to register queries against an index, and then send percolate requests which include a doc, and getting back the queries that match on that doc out of the set of registered queries. ``_ :arg index: The index of the document being count percolated. :arg doc_type: The type of the document being count percolated. :arg id: Substitute the document in the request body with a document that is known by the specified id. On top of the id, the index and type parameter will be used to retrieve the document from within the cluster. :arg body: The count percolator request definition using the percolate DSL :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg percolate_index: The index to count percolate the document into. Defaults to index. :arg percolate_type: The type to count percolate document into. Defaults to type. :arg preference: Specify the node or shard the operation should be performed on (default: random) :arg routing: A comma-separated list of specific routing values :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' """ for param in (index, doc_type): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('GET', _make_path(index, doc_type, id, '_percolate', 'count'), params=params, body=body) @query_params('field_statistics', 'fields', 'offsets', 'parent', 'payloads', 'positions', 'preference', 'realtime', 'routing', 'term_statistics', 'version', 'version_type') def termvectors(self, index, doc_type, id=None, body=None, params=None): """ Returns information and statistics on terms in the fields of a particular document. The document could be stored in the index or artificially provided by the user (Added in 1.4). 
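For example (a minimal sketch; the index, type, document id and field name are illustrative)::

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Term vectors for the 'message' field of an already indexed document.
    tv = es.termvectors(
        index='my-index',
        doc_type='my-type',
        id='1',
        fields='message',
        term_statistics=True,
    )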
Note that for documents stored in the index, this is a near realtime API as the term vectors are not available until the next refresh. ``_ :arg index: The index in which the document resides. :arg doc_type: The type of the document. :arg id: The id of the document, when not specified a doc param should be supplied. :arg body: Define parameters and or supply a document to get termvectors for. See documentation. :arg field_statistics: Specifies if document count, sum of document frequencies and sum of total term frequencies should be returned., default True :arg fields: A comma-separated list of fields to return. :arg offsets: Specifies if term offsets should be returned., default True :arg parent: Parent id of documents. :arg payloads: Specifies if term payloads should be returned., default True :arg positions: Specifies if term positions should be returned., default True :arg preference: Specify the node or shard the operation should be performed on (default: random). :arg realtime: Specifies if request is real-time as opposed to near- real-time (default: true). :arg routing: Specific routing value. :arg term_statistics: Specifies if total term frequency and document frequency should be returned., default False :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' """ for param in (index, doc_type): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('GET', _make_path(index, doc_type, id, '_termvectors'), params=params, body=body) @query_params('field_statistics', 'fields', 'ids', 'offsets', 'parent', 'payloads', 'positions', 'preference', 'realtime', 'routing', 'term_statistics', 'version', 'version_type') def mtermvectors(self, index=None, doc_type=None, body=None, params=None): """ Multi termvectors API allows to get multiple termvectors based on an index, type and id. ``_ :arg index: The index in which the document resides. :arg doc_type: The type of the document. :arg body: Define ids, documents, parameters or a list of parameters per document here. You must at least provide a list of document ids. See documentation. :arg field_statistics: Specifies if document count, sum of document frequencies and sum of total term frequencies should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default True :arg fields: A comma-separated list of fields to return. Applies to all returned documents unless otherwise specified in body "params" or "docs". :arg ids: A comma-separated list of documents ids. You must define ids as parameter or set "ids" or "docs" in the request body :arg offsets: Specifies if term offsets should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default True :arg parent: Parent id of documents. Applies to all returned documents unless otherwise specified in body "params" or "docs". :arg payloads: Specifies if term payloads should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default True :arg positions: Specifies if term positions should be returned. 
Applies to all returned documents unless otherwise specified in body "params" or "docs"., default True :arg preference: Specify the node or shard the operation should be performed on (default: random) .Applies to all returned documents unless otherwise specified in body "params" or "docs". :arg realtime: Specifies if requests are real-time as opposed to near- real-time (default: true). :arg routing: Specific routing value. Applies to all returned documents unless otherwise specified in body "params" or "docs". :arg term_statistics: Specifies if total term frequency and document frequency should be returned. Applies to all returned documents unless otherwise specified in body "params" or "docs"., default False :arg version: Explicit version number for concurrency control :arg version_type: Specific version type, valid choices are: 'internal', 'external', 'external_gte', 'force' """ return self.transport.perform_request('GET', _make_path(index, doc_type, '_mtermvectors'), params=params, body=body) @query_params() def put_script(self, lang, id, body, params=None): """ Create a script in given language with specified ID. ``_ :arg lang: Script language :arg id: Script ID :arg body: The document """ for param in (lang, id, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path('_scripts', lang, id), params=params, body=body) @query_params() def get_script(self, lang, id, params=None): """ Retrieve a script from the API. ``_ :arg lang: Script language :arg id: Script ID """ for param in (lang, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('GET', _make_path('_scripts', lang, id), params=params) @query_params() def delete_script(self, lang, id, params=None): """ Remove a stored script from elasticsearch. ``_ :arg lang: Script language :arg id: Script ID """ for param in (lang, id): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('DELETE', _make_path('_scripts', lang, id), params=params) @query_params() def put_template(self, id, body, params=None): """ Create a search template. ``_ :arg id: Template ID :arg body: The document """ for param in (id, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path('_search', 'template', id), params=params, body=body) @query_params() def get_template(self, id, params=None): """ Retrieve a search template. ``_ :arg id: Template ID """ if id in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'id'.") return self.transport.perform_request('GET', _make_path('_search', 'template', id), params=params) @query_params() def delete_template(self, id, params=None): """ Delete a search template. ``_ :arg id: Template ID """ if id in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'id'.") return self.transport.perform_request('DELETE', _make_path('_search', 'template', id), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'fields', 'ignore_unavailable', 'level') def field_stats(self, index=None, body=None, params=None): """ The field stats api allows one to find statistical properties of a field without executing a search, but looking up measurements that are natively available in the Lucene index. 
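For example (a minimal sketch; the index and field names are illustrative)::

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Per-index min/max and other measurements for the listed fields.
    stats = es.field_stats(
        index='my-index',
        fields='timestamp,message',
        level='indices',
    )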
``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg body: Field json objects containing the name and optionally a range to filter out indices result, that have results outside the defined bounds :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg fields: A comma-separated list of fields for to get field statistics for (min value, max value, and more) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg level: Defines if field stats should be returned on a per index level or on a cluster wide level, default 'cluster', valid choices are: 'indices', 'cluster' """ return self.transport.perform_request('GET', _make_path(index, '_field_stats'), params=params, body=body) @query_params() def render_search_template(self, id=None, body=None, params=None): """ ``_ :arg id: The id of the stored search template :arg body: The search definition template and its params """ return self.transport.perform_request('GET', _make_path('_render', 'template', id), params=params, body=body) @query_params('search_type') def msearch_template(self, body, index=None, doc_type=None, params=None): """ The /_search/template endpoint allows to use the mustache language to pre render search requests, before they are executed and fill existing templates with template parameters. ``_ :arg body: The request definitions (metadata-search request definition pairs), separated by newlines :arg index: A comma-separated list of index names to use as default :arg doc_type: A comma-separated list of document types to use as default :arg search_type: Search operation type, valid choices are: 'query_then_fetch', 'query_and_fetch', 'dfs_query_then_fetch', 'dfs_query_and_fetch' """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('GET', _make_path(index, doc_type, '_msearch', 'template'), params=params, body=self._bulk_body(body)) elasticsearch-py-5.4.0/elasticsearch/client/cat.py000066400000000000000000000524551310735271200222050ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path, SKIP_IN_PATH class CatClient(NamespacedClient): @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def aliases(self, name=None, params=None): """ ``_ :arg name: A comma-separated list of alias names to return :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. 
Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'aliases', name), params=params) @query_params('bytes', 'format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def allocation(self, node_id=None, params=None): """ Allocation provides a snapshot of how shards have located around the cluster and the state of disk usage. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information :arg bytes: The unit in which to display byte values, valid choices are: 'b', 'k', 'kb', 'm', 'mb', 'g', 'gb', 't', 'tb', 'p', 'pb' :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'allocation', node_id), params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def count(self, index=None, params=None): """ Count provides quick access to the document count of the entire cluster, or individual indices. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'count', index), params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'ts', 'v') def health(self, params=None): """ health is a terse, one-line representation of the same information from :meth:`~elasticsearch.client.cluster.ClusterClient.health` API ``_ :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg ts: Set to false to disable timestamping, default True :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', '/_cat/health', params=params) @query_params('help', 's') def help(self, params=None): """ A simple help for the cat api. ``_ :arg help: Return help information, default False :arg s: Comma-separated list of column names or column aliases to sort by """ return self.transport.perform_request('GET', '/_cat', params=params) @query_params('bytes', 'format', 'h', 'health', 'help', 'local', 'master_timeout', 'pri', 's', 'v') def indices(self, index=None, params=None): """ The indices command provides a cross-section of each index. 
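For example (a minimal sketch; the column selection is illustrative)::

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Tabular, human-readable output with column headers.
    print(es.cat.indices(v=True))
    # The same information as structured data, e.g. for further processing.
    rows = es.cat.indices(format='json', h='index,health,docs.count')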
``_ :arg index: A comma-separated list of index names to limit the returned information :arg bytes: The unit in which to display byte values, valid choices are: 'b', 'k', 'm', 'g' :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg health: A health status ("green", "yellow", or "red" to filter only indices matching the specified health status, default None, valid choices are: 'green', 'yellow', 'red' :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg pri: Set to true to return stats only for primary shards, default False :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'indices', index), params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def master(self, params=None): """ Displays the master's node ID, bound IP address, and node name. ``_ :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', '/_cat/master', params=params) @query_params('format', 'full_id', 'h', 'help', 'local', 'master_timeout', 's', 'v') def nodes(self, params=None): """ The nodes command shows the cluster topology. ``_ :arg format: a short version of the Accept header, e.g. json, yaml :arg full_id: Return the full node ID instead of the shortened version (default: false) :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', '/_cat/nodes', params=params) @query_params('bytes', 'format', 'h', 'help', 'master_timeout', 's', 'v') def recovery(self, index=None, params=None): """ recovery is a view of shard replication. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg bytes: The unit in which to display byte values, valid choices are: 'b', 'k', 'kb', 'm', 'mb', 'g', 'gb', 't', 'tb', 'p', 'pb' :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. 
Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'recovery', index), params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def shards(self, index=None, params=None): """ The shards command is the detailed view of what nodes contain which shards. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'shards', index), params=params) @query_params('format', 'h', 'help', 's', 'v') def segments(self, index=None, params=None): """ The segments command is the detailed view of Lucene segments per index. ``_ :arg index: A comma-separated list of index names to limit the returned information :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'segments', index), params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def pending_tasks(self, params=None): """ pending_tasks provides the same information as the :meth:`~elasticsearch.client.cluster.ClusterClient.pending_tasks` API in a convenient tabular format. ``_ :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', '/_cat/pending_tasks', params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'size', 'v') def thread_pool(self, thread_pool_patterns=None, params=None): """ Get information about thread pools. ``_ :arg thread_pool_patterns: A comma-separated list of regular-expressions to filter the thread pools in the output :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg size: The multiplier in which to display values, valid choices are: '', 'k', 'm', 'g', 't', 'p' :arg v: Verbose mode. 
Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'thread_pool', thread_pool_patterns), params=params) @query_params('bytes', 'format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def fielddata(self, fields=None, params=None): """ Shows information about currently loaded fielddata on a per-node basis. ``_ :arg fields: A comma-separated list of fields to return the fielddata size :arg bytes: The unit in which to display byte values, valid choices are: 'b', 'k', 'kb', 'm', 'mb', 'g', 'gb', 't', 'tb', 'p', 'pb' :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'fielddata', fields), params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def plugins(self, params=None): """ ``_ :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', '/_cat/plugins', params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def nodeattrs(self, params=None): """ ``_ :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', '/_cat/nodeattrs', params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def repositories(self, params=None): """ ``_ :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node, default False :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', '/_cat/repositories', params=params) @query_params('format', 'h', 'help', 'ignore_unavailable', 'master_timeout', 's', 'v') def snapshots(self, repository=None, params=None): """ ``_ :arg repository: Name of repository from which to fetch the snapshot information :arg format: a short version of the Accept header, e.g. 
json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg ignore_unavailable: Set to true to ignore unavailable snapshots, default False :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'snapshots', repository), params=params) @query_params('actions', 'detailed', 'format', 'h', 'help', 'node_id', 'parent_node', 'parent_task', 's', 'v') def tasks(self, params=None): """ ``_ :arg actions: A comma-separated list of actions that should be returned. Leave empty to return all. :arg detailed: Return detailed task information (default: false) :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg parent_node: Return tasks with specified parent node. :arg parent_task: Return tasks with specified parent task id. Set to -1 to return all. :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', '/_cat/tasks', params=params) @query_params('format', 'h', 'help', 'local', 'master_timeout', 's', 'v') def templates(self, name=None, params=None): """ ``_ :arg name: A pattern that returned template names must match :arg format: a short version of the Accept header, e.g. json, yaml :arg h: Comma-separated list of column names to display :arg help: Return help information, default False :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg s: Comma-separated list of column names or column aliases to sort by :arg v: Verbose mode. Display column headers, default False """ return self.transport.perform_request('GET', _make_path('_cat', 'templates', name), params=params) elasticsearch-py-5.4.0/elasticsearch/client/cluster.py000066400000000000000000000210031310735271200231000ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path class ClusterClient(NamespacedClient): @query_params('level', 'local', 'master_timeout', 'timeout', 'wait_for_active_shards', 'wait_for_events', 'wait_for_no_relocating_shards', 'wait_for_nodes', 'wait_for_status') def health(self, index=None, params=None): """ Get a very simple status on the health of the cluster. 
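For example (a minimal sketch; the timeout value is illustrative)::

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Block until the cluster reaches at least yellow status, or time out.
    health = es.cluster.health(wait_for_status='yellow', timeout='30s')
    print(health['status'])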
``_ :arg index: Limit the information returned to a specific index :arg level: Specify the level of detail for returned information, default 'cluster', valid choices are: 'cluster', 'indices', 'shards' :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout :arg wait_for_active_shards: Wait until the specified number of shards is active :arg wait_for_events: Wait until all currently queued events with the given priorty are processed, valid choices are: 'immediate', 'urgent', 'high', 'normal', 'low', 'languid' :arg wait_for_no_relocating_shards: Whether to wait until there are no relocating shards in the cluster :arg wait_for_nodes: Wait until the specified number of nodes is available :arg wait_for_status: Wait until cluster is in a specific state, default None, valid choices are: 'green', 'yellow', 'red' """ return self.transport.perform_request('GET', _make_path('_cluster', 'health', index), params=params) @query_params('local', 'master_timeout') def pending_tasks(self, params=None): """ The pending cluster tasks API returns a list of any cluster-level changes (e.g. create index, update mapping, allocate or fail shard) which have not yet been executed. ``_ :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Specify timeout for connection to master """ return self.transport.perform_request('GET', '/_cluster/pending_tasks', params=params) @query_params('allow_no_indices', 'expand_wildcards', 'flat_settings', 'ignore_unavailable', 'local', 'master_timeout') def state(self, metric=None, index=None, params=None): """ Get a comprehensive state information of the whole cluster. ``_ :arg metric: Limit the information returned to the specified metrics :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg flat_settings: Return settings in flat format (default: false) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Specify timeout for connection to master """ if index and not metric: metric = '_all' return self.transport.perform_request('GET', _make_path('_cluster', 'state', metric, index), params=params) @query_params('flat_settings', 'human', 'timeout') def stats(self, node_id=None, params=None): """ The Cluster Stats API allows to retrieve statistics from a cluster wide perspective. The API returns basic index metrics and information about the current nodes that form the cluster. 
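For example (a minimal sketch)::

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    stats = es.cluster.stats()
    # Cluster-wide index metrics and node information.
    print(stats['indices']['count'], stats['nodes']['count']['total'])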
``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg flat_settings: Return settings in flat format (default: false) :arg human: Whether to return time and byte values in human-readable format., default False :arg timeout: Explicit operation timeout """ url = '/_cluster/stats' if node_id: url = _make_path('_cluster/stats/nodes', node_id) return self.transport.perform_request('GET', url, params=params) @query_params('dry_run', 'explain', 'master_timeout', 'metric', 'retry_failed', 'timeout') def reroute(self, body=None, params=None): """ Explicitly execute a cluster reroute allocation command including specific commands. ``_ :arg body: The definition of `commands` to perform (`move`, `cancel`, `allocate`) :arg dry_run: Simulate the operation only and return the resulting state :arg explain: Return an explanation of why the commands can or cannot be executed :arg master_timeout: Explicit operation timeout for connection to master node :arg metric: Limit the information returned to the specified metrics. Defaults to all but metadata, valid choices are: '_all', 'blocks', 'metadata', 'nodes', 'routing_table', 'master_node', 'version' :arg retry_failed: Retries allocation of shards that are blocked due to too many subsequent allocation failures :arg timeout: Explicit operation timeout """ return self.transport.perform_request('POST', '/_cluster/reroute', params=params, body=body) @query_params('flat_settings', 'include_defaults', 'master_timeout', 'timeout') def get_settings(self, params=None): """ Get cluster settings. ``_ :arg flat_settings: Return settings in flat format (default: false) :arg include_defaults: Whether to return all default clusters setting., default False :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ return self.transport.perform_request('GET', '/_cluster/settings', params=params) @query_params('flat_settings', 'master_timeout', 'timeout') def put_settings(self, body=None, params=None): """ Update cluster wide specific settings. ``_ :arg body: The settings to be updated. Can be either `transient` or `persistent` (survives cluster restart). :arg flat_settings: Return settings in flat format (default: false) :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ return self.transport.perform_request('PUT', '/_cluster/settings', params=params, body=body) @query_params('include_disk_info', 'include_yes_decisions') def allocation_explain(self, body=None, params=None): """ ``_ :arg body: The index, shard, and primary flag to explain. 
Empty means 'explain the first unassigned shard' :arg include_disk_info: Return information about disk usage and shard sizes (default: false) :arg include_yes_decisions: Return 'YES' decisions in explanation (default: false) """ return self.transport.perform_request('GET', '/_cluster/allocation/explain', params=params, body=body) elasticsearch-py-5.4.0/elasticsearch/client/indices.py000066400000000000000000001400421310735271200230420ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path, SKIP_IN_PATH class IndicesClient(NamespacedClient): @query_params('analyzer', 'attributes', 'char_filter', 'explain', 'field', 'filter', 'format', 'prefer_local', 'text', 'tokenizer') def analyze(self, index=None, body=None, params=None): """ Perform the analysis process on a text and return the tokens breakdown of the text. ``_ :arg index: The name of the index to scope the operation :arg body: The text on which the analysis should be performed :arg analyzer: The name of the analyzer to use :arg attributes: A comma-separated list of token attributes to output, this parameter works only with `explain=true` :arg char_filter: A comma-separated list of character filters to use for the analysis :arg explain: With `true`, outputs more advanced details. (default: false) :arg field: Use the analyzer configured for this field (instead of passing the analyzer name) :arg filter: A comma-separated list of filters to use for the analysis :arg format: Format of the output, default 'detailed', valid choices are: 'detailed', 'text' :arg prefer_local: With `true`, specify that a local shard should be used if available, with `false`, use a random shard (default: true) :arg text: The text on which the analysis should be performed (when request body is not used) :arg tokenizer: The name of the tokenizer to use for the analysis """ return self.transport.perform_request('GET', _make_path(index, '_analyze'), params=params, body=body) @query_params('allow_no_indices', 'expand_wildcards', 'force', 'ignore_unavailable', 'operation_threading') def refresh(self, index=None, params=None): """ Explicitly refresh one or more index, making all operations performed since the last refresh available for search. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg force: Force a refresh even if not required, default False :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg operation_threading: TODO: ? """ return self.transport.perform_request('POST', _make_path(index, '_refresh'), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'force', 'ignore_unavailable', 'wait_if_ongoing') def flush(self, index=None, params=None): """ Explicitly flush one or more indices. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string for all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg force: Whether a flush should be forced even if it is not necessarily needed ie. if no changes will be committed to the index. This is useful if transaction log IDs should be incremented even if no uncommitted changes are present. (This setting can be considered as internal) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg wait_if_ongoing: If set to true the flush operation will block until the flush can be executed if another flush operation is already executing. The default is true. If set to false the flush will be skipped iff if another flush operation is already running. """ return self.transport.perform_request('POST', _make_path(index, '_flush'), params=params) @query_params('master_timeout', 'timeout', 'update_all_types', 'wait_for_active_shards') def create(self, index, body=None, params=None): """ Create an index in Elasticsearch. ``_ :arg index: The name of the index :arg body: The configuration for the index (`settings` and `mappings`) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout :arg update_all_types: Whether to update the mapping for all fields with the same name across all types or not :arg wait_for_active_shards: Set the number of active shards to wait for before the operation returns. """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") return self.transport.perform_request('PUT', _make_path(index), params=params, body=body) @query_params('allow_no_indices', 'expand_wildcards', 'flat_settings', 'human', 'ignore_unavailable', 'include_defaults', 'local') def get(self, index, feature=None, params=None): """ The get index API allows to retrieve information about one or more indexes. ``_ :arg index: A comma-separated list of index names :arg feature: A comma-separated list of features :arg allow_no_indices: Ignore if a wildcard expression resolves to no concrete indices (default: false) :arg expand_wildcards: Whether wildcard expressions should get expanded to open or closed indices (default: open), default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg flat_settings: Return settings in flat format (default: false) :arg human: Whether to return version and creation date values in human- readable format., default False :arg ignore_unavailable: Ignore unavailable indexes (default: false) :arg include_defaults: Whether to return all default setting for each of the indices., default False :arg local: Return local information, do not retrieve the state from master node (default: false) """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") return self.transport.perform_request('GET', _make_path(index, feature), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'master_timeout', 'timeout') def open(self, index, params=None): """ Open a closed index to make it available for search. ``_ :arg index: The name of the index :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'closed', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") return self.transport.perform_request('POST', _make_path(index, '_open'), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'master_timeout', 'timeout') def close(self, index, params=None): """ Close an index to remove it's overhead from the cluster. Closed index is blocked for read/write operations. ``_ :arg index: The name of the index :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") return self.transport.perform_request('POST', _make_path(index, '_close'), params=params) @query_params('master_timeout', 'timeout') def delete(self, index, params=None): """ Delete an index in Elasticsearch ``_ :arg index: A comma-separated list of indices to delete; use `_all` or `*` string to delete all indices :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") return self.transport.perform_request('DELETE', _make_path(index), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local') def exists(self, index, params=None): """ Return a boolean indicating whether given index exists. ``_ :arg index: A comma-separated list of indices to check :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ if index in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'index'.") return self.transport.perform_request('HEAD', _make_path(index), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local') def exists_type(self, index, doc_type, params=None): """ Check if a type/types exists in an index/indices. 
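For example (a minimal sketch; the index and type names are illustrative)::

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Create the index only if it does not exist yet.
    if not es.indices.exists(index='my-index'):
        es.indices.create(index='my-index')
    # Then check whether a mapping for the given type is present.
    has_type = es.indices.exists_type(index='my-index', doc_type='my-type')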
``_ :arg index: A comma-separated list of index names; use `_all` to check the types across all indices :arg doc_type: A comma-separated list of document types to check :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ for param in (index, doc_type): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('HEAD', _make_path(index, '_mapping', doc_type), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'master_timeout', 'timeout', 'update_all_types') def put_mapping(self, doc_type, body, index=None, params=None): """ Register specific mapping definition for a specific type. ``_ :arg doc_type: The name of the document type :arg body: The mapping definition :arg index: A comma-separated list of index names the mapping should be added to (supports wildcards); use `_all` or omit to add the mapping on all indices. :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout :arg update_all_types: Whether to update the mapping for all fields with the same name across all types or not """ for param in (doc_type, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path(index, '_mapping', doc_type), params=params, body=body) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local') def get_mapping(self, index=None, doc_type=None, params=None): """ Retrieve mapping definition of index or index/type. ``_ :arg index: A comma-separated list of index names :arg doc_type: A comma-separated list of document types :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ return self.transport.perform_request('GET', _make_path(index, '_mapping', doc_type), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'include_defaults', 'local') def get_field_mapping(self, fields, index=None, doc_type=None, params=None): """ Retrieve mapping definition of a specific field. ``_ :arg fields: A comma-separated list of fields :arg index: A comma-separated list of index names :arg doc_type: A comma-separated list of document types :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg include_defaults: Whether the default mapping values should be returned as well :arg local: Return local information, do not retrieve the state from master node (default: false) """ if fields in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'fields'.") return self.transport.perform_request('GET', _make_path(index, '_mapping', doc_type, 'field', fields), params=params) @query_params('master_timeout', 'timeout') def put_alias(self, index, name, body=None, params=None): """ Create an alias for a specific index/indices. ``_ :arg index: A comma-separated list of index names the alias should point to (supports wildcards); use `_all` to perform the operation on all indices. :arg name: The name of the alias to be created or updated :arg body: The settings for the alias, such as `routing` or `filter` :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit timeout for the operation """ for param in (index, name): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path(index, '_alias', name), params=params, body=body) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local') def exists_alias(self, index=None, name=None, params=None): """ Return a boolean indicating whether given alias exists. ``_ :arg index: A comma-separated list of index names to filter aliases :arg name: A comma-separated list of alias names to return :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default ['open', 'closed'], valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ return self.transport.perform_request('HEAD', _make_path(index, '_alias', name), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'local') def get_alias(self, index=None, name=None, params=None): """ Retrieve a specified alias. ``_ :arg index: A comma-separated list of index names to filter aliases :arg name: A comma-separated list of alias names to return :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'all', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg local: Return local information, do not retrieve the state from master node (default: false) """ return self.transport.perform_request('GET', _make_path(index, '_alias', name), params=params) @query_params('master_timeout', 'timeout') def update_aliases(self, body, params=None): """ Update specified aliases. ``_ :arg body: The definition of `actions` to perform :arg master_timeout: Specify timeout for connection to master :arg timeout: Request timeout """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('POST', '/_aliases', params=params, body=body) @query_params('master_timeout', 'timeout') def delete_alias(self, index, name, params=None): """ Delete specific alias. ``_ :arg index: A comma-separated list of index names (supports wildcards); use `_all` for all indices :arg name: A comma-separated list of aliases to delete (supports wildcards); use `_all` to delete all aliases for the specified indices. :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit timeout for the operation """ for param in (index, name): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('DELETE', _make_path(index, '_alias', name), params=params) @query_params('create', 'flat_settings', 'master_timeout', 'order', 'timeout') def put_template(self, name, body, params=None): """ Create an index template that will automatically be applied to new indices created. 
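For example (a minimal sketch; the template name, pattern and settings are illustrative and use the 5.x ``template`` pattern key)::

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # New indices whose names match 'logs-*' will pick up these settings.
    es.indices.put_template(
        name='logs-template',
        body={
            'template': 'logs-*',
            'settings': {'number_of_shards': 1},
        },
    )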
``_ :arg name: The name of the template :arg body: The template definition :arg create: Whether the index template should only be added if new or can also replace an existing one, default False :arg flat_settings: Return settings in flat format (default: false) :arg master_timeout: Specify timeout for connection to master :arg order: The order for this template when merging multiple matching ones (higher numbers are merged later, overriding the lower numbers) :arg timeout: Explicit operation timeout """ for param in (name, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path('_template', name), params=params, body=body) @query_params('local', 'master_timeout') def exists_template(self, name, params=None): """ Return a boolean indicating whether given template exists. ``_ :arg name: The name of the template :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node """ if name in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'name'.") return self.transport.perform_request('HEAD', _make_path('_template', name), params=params) @query_params('flat_settings', 'local', 'master_timeout') def get_template(self, name=None, params=None): """ Retrieve an index template by its name. ``_ :arg name: The name of the template :arg flat_settings: Return settings in flat format (default: false) :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node """ return self.transport.perform_request('GET', _make_path('_template', name), params=params) @query_params('master_timeout', 'timeout') def delete_template(self, name, params=None): """ Delete an index template by its name. ``_ :arg name: The name of the template :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout """ if name in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'name'.") return self.transport.perform_request('DELETE', _make_path('_template', name), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'flat_settings', 'human', 'ignore_unavailable', 'include_defaults', 'local') def get_settings(self, index=None, name=None, params=None): """ Retrieve settings for one or more (or all) indices. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg name: The name of the settings that should be included :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default ['open', 'closed'], valid choices are: 'open', 'closed', 'none', 'all' :arg flat_settings: Return settings in flat format (default: false) :arg human: Whether to return version and creation date values in human- readable format., default False :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg include_defaults: Whether to return all default setting for each of the indices., default False :arg local: Return local information, do not retrieve the state from master node (default: false) """ return self.transport.perform_request('GET', _make_path(index, '_settings', name), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'flat_settings', 'ignore_unavailable', 'master_timeout', 'preserve_existing') def put_settings(self, body, index=None, params=None): """ Change specific index level settings in real time. ``_ :arg body: The index settings to be updated :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg flat_settings: Return settings in flat format (default: false) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg master_timeout: Specify timeout for connection to master :arg preserve_existing: Whether to update existing settings. If set to `true` existing settings on an index remain unchanged, the default is `false` """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('PUT', _make_path(index, '_settings'), params=params, body=body) @query_params('completion_fields', 'fielddata_fields', 'fields', 'groups', 'include_segment_file_sizes', 'level', 'types') def stats(self, index=None, metric=None, params=None): """ Retrieve statistics on different operations happening on an index. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg metric: Limit the information returned the specific metrics. 
:arg completion_fields: A comma-separated list of fields for `fielddata` and `suggest` index metric (supports wildcards) :arg fielddata_fields: A comma-separated list of fields for `fielddata` index metric (supports wildcards) :arg fields: A comma-separated list of fields for `fielddata` and `completion` index metric (supports wildcards) :arg groups: A comma-separated list of search groups for `search` index metric :arg include_segment_file_sizes: Whether to report the aggregated disk usage of each one of the Lucene index files (only applies if segment stats are requested), default False :arg level: Return stats aggregated at cluster, index or shard level, default 'indices', valid choices are: 'cluster', 'indices', 'shards' :arg types: A comma-separated list of document types for the `indexing` index metric """ return self.transport.perform_request('GET', _make_path(index, '_stats', metric), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'operation_threading', 'verbose') def segments(self, index=None, params=None): """ Provide low level segments information that a Lucene index (shard level) is built with. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg operation_threading: TODO: ? :arg verbose: Includes detailed memory usage by Lucene., default False """ return self.transport.perform_request('GET', _make_path(index, '_segments'), params=params) @query_params('allow_no_indices', 'analyze_wildcard', 'analyzer', 'default_operator', 'df', 'expand_wildcards', 'explain', 'ignore_unavailable', 'lenient', 'lowercase_expanded_terms', 'operation_threading', 'q', 'rewrite') def validate_query(self, index=None, doc_type=None, body=None, params=None): """ Validate a potentially expensive query without executing it. ``_ :arg index: A comma-separated list of index names to restrict the operation; use `_all` or empty string to perform the operation on all indices :arg doc_type: A comma-separated list of document types to restrict the operation; leave empty to perform the operation on all types :arg body: The query definition specified with the Query DSL :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg analyze_wildcard: Specify whether wildcard and prefix queries should be analyzed (default: false) :arg analyzer: The analyzer to use for the query string :arg default_operator: The default operator for query string query (AND or OR), default 'OR', valid choices are: 'AND', 'OR' :arg df: The field to use as default where no field prefix is given in the query string :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg explain: Return detailed information about the error :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg lenient: Specify whether format-based query failures (such as providing text to a numeric field) should be ignored :arg lowercase_expanded_terms: Specify whether query terms should be lowercased :arg operation_threading: TODO: ? :arg q: Query in the Lucene query string syntax :arg rewrite: Provide a more detailed explanation showing the actual Lucene query that will be executed. """ return self.transport.perform_request('GET', _make_path(index, doc_type, '_validate', 'query'), params=params, body=body) @query_params('allow_no_indices', 'expand_wildcards', 'field_data', 'fielddata', 'fields', 'ignore_unavailable', 'query', 'recycler', 'request') def clear_cache(self, index=None, params=None): """ Clear either all caches or specific cached associated with one ore more indices. ``_ :arg index: A comma-separated list of index name to limit the operation :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg field_data: Clear field data :arg fielddata: Clear field data :arg fields: A comma-separated list of fields to clear when using the `field_data` parameter (default: all) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg query: Clear query caches :arg recycler: Clear the recycler cache :arg request: Clear request cache """ return self.transport.perform_request('POST', _make_path(index, '_cache', 'clear'), params=params) @query_params('active_only', 'detailed') def recovery(self, index=None, params=None): """ The indices recovery API provides insight into on-going shard recoveries. Recovery status may be reported for specific indices, or cluster-wide. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg active_only: Display only those recoveries that are currently on- going, default False :arg detailed: Whether to display detailed information about shard recovery, default False """ return self.transport.perform_request('GET', _make_path(index, '_recovery'), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'only_ancient_segments', 'wait_for_completion') def upgrade(self, index=None, params=None): """ Upgrade one or more indices to the latest format through an API. 
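A minimal sketch, assuming ``es`` is an :class:`~elasticsearch.Elasticsearch` client and using a made-up index name::

    es.indices.upgrade(index='logs-2015-old', wait_for_completion=True)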
``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg only_ancient_segments: If true, only ancient (an older Lucene major release) segments will be upgraded :arg wait_for_completion: Specify whether the request should block until the all segments are upgraded (default: false) """ return self.transport.perform_request('POST', _make_path(index, '_upgrade'), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable') def get_upgrade(self, index=None, params=None): """ Monitor how much of one or more index is upgraded. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) """ return self.transport.perform_request('GET', _make_path(index, '_upgrade'), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable') def flush_synced(self, index=None, params=None): """ Perform a normal flush, then add a generated unique marker (sync_id) to all shards. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string for all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) """ return self.transport.perform_request('POST', _make_path(index, '_flush', 'synced'), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'ignore_unavailable', 'operation_threading', 'status') def shard_stores(self, index=None, params=None): """ Provides store information for shard copies of indices. Store information reports on which nodes shard copies exist, the shard copy version, indicating how recent they are, and any exceptions encountered while opening the shard index or from earlier engine failure. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. 
(This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg operation_threading: TODO: ? :arg status: A comma-separated list of statuses used to filter on shards to get store information for, valid choices are: 'green', 'yellow', 'red', 'all' """ return self.transport.perform_request('GET', _make_path(index, '_shard_stores'), params=params) @query_params('allow_no_indices', 'expand_wildcards', 'flush', 'ignore_unavailable', 'max_num_segments', 'only_expunge_deletes', 'operation_threading', 'wait_for_merge') def forcemerge(self, index=None, params=None): """ The force merge API allows to force merging of one or more indices through an API. The merge relates to the number of segments a Lucene index holds within each shard. The force merge operation allows to reduce the number of segments by merging them. This call will block until the merge is complete. If the http connection is lost, the request will continue in the background, and any new requests will block until the previous force merge is complete. ``_ :arg index: A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices :arg allow_no_indices: Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes `_all` string or when no indices have been specified) :arg expand_wildcards: Whether to expand wildcard expression to concrete indices that are open, closed or both., default 'open', valid choices are: 'open', 'closed', 'none', 'all' :arg flush: Specify whether the index should be flushed after performing the operation (default: true) :arg ignore_unavailable: Whether specified concrete indices should be ignored when unavailable (missing or closed) :arg max_num_segments: The number of segments the index should be merged into (default: dynamic) :arg only_expunge_deletes: Specify whether the operation should only expunge deleted documents :arg operation_threading: TODO: ? :arg wait_for_merge: Specify whether the request should block until the merge process is finished (default: true) """ return self.transport.perform_request('POST', _make_path(index, '_forcemerge'), params=params) @query_params('master_timeout', 'timeout', 'wait_for_active_shards') def shrink(self, index, target, body=None, params=None): """ The shrink index API allows you to shrink an existing index into a new index with fewer primary shards. The number of primary shards in the target index must be a factor of the shards in the source index. For example an index with 8 primary shards can be shrunk into 4, 2 or 1 primary shards or an index with 15 primary shards can be shrunk into 5, 3 or 1. If the number of shards in the index is a prime number it can only be shrunk into a single primary shard. Before shrinking, a (primary or replica) copy of every shard in the index must be present on the same node. 
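A sketch of the usual workflow under those constraints; the index, target and node names are made up and ``es`` is assumed to be an :class:`~elasticsearch.Elasticsearch` client::

    # move one copy of every shard to a single node and block writes
    es.indices.put_settings(index='big-index', body={
        'index.routing.allocation.require._name': 'shrink-node-1',
        'index.blocks.write': True,
    })
    # shrink into a new index with a single primary shard
    es.indices.shrink(index='big-index', target='small-index', body={
        'settings': {'index.number_of_shards': 1},
    })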
``_ :arg index: The name of the source index to shrink :arg target: The name of the target index to shrink into :arg body: The configuration for the target index (`settings` and `aliases`) :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout :arg wait_for_active_shards: Set the number of active shards to wait for on the shrunken index before the operation returns. """ for param in (index, target): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path(index, '_shrink', target), params=params, body=body) @query_params('dry_run', 'master_timeout', 'timeout', 'wait_for_active_shards') def rollover(self, alias, new_index=None, body=None, params=None): """ The rollover index API rolls an alias over to a new index when the existing index is considered to be too large or too old. The API accepts a single alias name and a list of conditions. The alias must point to a single index only. If the index satisfies the specified conditions then a new index is created and the alias is switched to point to the new alias. ``_ :arg alias: The name of the alias to rollover :arg new_index: The name of the rollover index :arg body: The conditions that needs to be met for executing rollover :arg dry_run: If set to true the rollover action will only be validated but not actually performed even if a condition matches. The default is false :arg master_timeout: Specify timeout for connection to master :arg timeout: Explicit operation timeout :arg wait_for_active_shards: Set the number of active shards to wait for on the newly created rollover index before the operation returns. """ if alias in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'alias'.") return self.transport.perform_request('POST', _make_path(alias, '_rollover', new_index), params=params, body=body) elasticsearch-py-5.4.0/elasticsearch/client/ingest.py000066400000000000000000000050241310735271200227150ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path, SKIP_IN_PATH class IngestClient(NamespacedClient): @query_params('master_timeout') def get_pipeline(self, id=None, params=None): """ ``_ :arg id: Comma separated list of pipeline ids. 
Wildcards supported :arg master_timeout: Explicit operation timeout for connection to master node """ return self.transport.perform_request('GET', _make_path('_ingest', 'pipeline', id), params=params) @query_params('master_timeout', 'timeout') def put_pipeline(self, id, body, params=None): """ ``_ :arg id: Pipeline ID :arg body: The ingest definition :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ for param in (id, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path('_ingest', 'pipeline', id), params=params, body=body) @query_params('master_timeout', 'timeout') def delete_pipeline(self, id, params=None): """ ``_ :arg id: Pipeline ID :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ if id in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'id'.") return self.transport.perform_request('DELETE', _make_path('_ingest', 'pipeline', id), params=params) @query_params('verbose') def simulate(self, body, id=None, params=None): """ ``_ :arg body: The simulate definition :arg id: Pipeline ID :arg verbose: Verbose mode. Display data output for each processor in executed pipeline, default False """ if body in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'body'.") return self.transport.perform_request('GET', _make_path('_ingest', 'pipeline', id, '_simulate'), params=params, body=body) elasticsearch-py-5.4.0/elasticsearch/client/nodes.py000066400000000000000000000111661310735271200225400ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path class NodesClient(NamespacedClient): @query_params('flat_settings', 'timeout') def info(self, node_id=None, metric=None, params=None): """ The cluster nodes info API allows to retrieve one or more (or all) of the cluster nodes information. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg metric: A comma-separated list of metrics you wish returned. Leave empty to return all. :arg flat_settings: Return settings in flat format (default: false) :arg timeout: Explicit operation timeout """ return self.transport.perform_request('GET', _make_path('_nodes', node_id, metric), params=params) @query_params('completion_fields', 'fielddata_fields', 'fields', 'groups', 'include_segment_file_sizes', 'level', 'timeout', 'types') def stats(self, node_id=None, metric=None, index_metric=None, params=None): """ The cluster nodes stats API allows to retrieve one or more (or all) of the cluster nodes statistics. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg metric: Limit the information returned to the specified metrics :arg index_metric: Limit the information returned for `indices` metric to the specific index metrics. Isn't used if `indices` (or `all`) metric isn't specified. 
:arg completion_fields: A comma-separated list of fields for `fielddata` and `suggest` index metric (supports wildcards) :arg fielddata_fields: A comma-separated list of fields for `fielddata` index metric (supports wildcards) :arg fields: A comma-separated list of fields for `fielddata` and `completion` index metric (supports wildcards) :arg groups: A comma-separated list of search groups for `search` index metric :arg include_segment_file_sizes: Whether to report the aggregated disk usage of each one of the Lucene index files (only applies if segment stats are requested), default False :arg level: Return indices stats aggregated at index, node or shard level, default 'node', valid choices are: 'indices', 'node', 'shards' :arg timeout: Explicit operation timeout :arg types: A comma-separated list of document types for the `indexing` index metric """ return self.transport.perform_request('GET', _make_path('_nodes', node_id, 'stats', metric, index_metric), params=params) @query_params('type', 'ignore_idle_threads', 'interval', 'snapshots', 'threads', 'timeout') def hot_threads(self, node_id=None, params=None): """ An API allowing to get the current hot threads on each node in the cluster. ``_ :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg type: The type to sample (default: cpu), valid choices are: 'cpu', 'wait', 'block' :arg ignore_idle_threads: Don't show threads that are in known-idle places, such as waiting on a socket select or pulling from an empty task queue (default: true) :arg interval: The interval for the second sampling of threads :arg snapshots: Number of samples of thread stacktrace (default: 10) :arg threads: Specify the number of threads to provide information for (default: 3) :arg timeout: Explicit operation timeout """ # avoid python reserved words if params and 'type_' in params: params['type'] = params.pop('type_') return self.transport.perform_request('GET', _make_path('_cluster', 'nodes', node_id, 'hotthreads'), params=params) elasticsearch-py-5.4.0/elasticsearch/client/snapshot.py000066400000000000000000000171071310735271200232700ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path, SKIP_IN_PATH class SnapshotClient(NamespacedClient): @query_params('master_timeout', 'wait_for_completion') def create(self, repository, snapshot, body=None, params=None): """ Create a snapshot in repository ``_ :arg repository: A repository name :arg snapshot: A snapshot name :arg body: The snapshot definition :arg master_timeout: Explicit operation timeout for connection to master node :arg wait_for_completion: Should this request wait until the operation has completed before returning, default False """ for param in (repository, snapshot): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path('_snapshot', repository, snapshot), params=params, body=body) @query_params('master_timeout') def delete(self, repository, snapshot, params=None): """ Deletes a snapshot from a repository. 
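A minimal sketch, assuming the (made-up) repository and snapshot already exist and ``es`` is an :class:`~elasticsearch.Elasticsearch` client::

    es.snapshot.delete(repository='my_backup', snapshot='snapshot_2017_05_18')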
``_ :arg repository: A repository name :arg snapshot: A snapshot name :arg master_timeout: Explicit operation timeout for connection to master node """ for param in (repository, snapshot): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('DELETE', _make_path('_snapshot', repository, snapshot), params=params) @query_params('ignore_unavailable', 'master_timeout') def get(self, repository, snapshot, params=None): """ Retrieve information about a snapshot. ``_ :arg repository: A repository name :arg snapshot: A comma-separated list of snapshot names :arg ignore_unavailable: Whether to ignore unavailable snapshots, defaults to false which means a SnapshotMissingException is thrown :arg master_timeout: Explicit operation timeout for connection to master node """ for param in (repository, snapshot): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('GET', _make_path('_snapshot', repository, snapshot), params=params) @query_params('master_timeout', 'timeout') def delete_repository(self, repository, params=None): """ Removes a shared file system repository. ``_ :arg repository: A comma-separated list of repository names :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ if repository in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'repository'.") return self.transport.perform_request('DELETE', _make_path('_snapshot', repository), params=params) @query_params('local', 'master_timeout') def get_repository(self, repository=None, params=None): """ Return information about registered repositories. ``_ :arg repository: A comma-separated list of repository names :arg local: Return local information, do not retrieve the state from master node (default: false) :arg master_timeout: Explicit operation timeout for connection to master node """ return self.transport.perform_request('GET', _make_path('_snapshot', repository), params=params) @query_params('master_timeout', 'timeout', 'verify') def create_repository(self, repository, body, params=None): """ Registers a shared file system repository. ``_ :arg repository: A repository name :arg body: The repository definition :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout :arg verify: Whether to verify the repository after creation """ for param in (repository, body): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('PUT', _make_path('_snapshot', repository), params=params, body=body) @query_params('master_timeout', 'wait_for_completion') def restore(self, repository, snapshot, body=None, params=None): """ Restore a snapshot. 
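A minimal sketch; the repository, snapshot and index names are made up and ``es`` is assumed to be an :class:`~elasticsearch.Elasticsearch` client::

    es.snapshot.restore(repository='my_backup', snapshot='snapshot_1',
        body={
            'indices': 'logs-2017-05',
            'include_global_state': False,
        },
        wait_for_completion=True)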
``_ :arg repository: A repository name :arg snapshot: A snapshot name :arg body: Details of what to restore :arg master_timeout: Explicit operation timeout for connection to master node :arg wait_for_completion: Should this request wait until the operation has completed before returning, default False """ for param in (repository, snapshot): if param in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument.") return self.transport.perform_request('POST', _make_path('_snapshot', repository, snapshot, '_restore'), params=params, body=body) @query_params('ignore_unavailable', 'master_timeout') def status(self, repository=None, snapshot=None, params=None): """ Return information about all currently running snapshots. By specifying a repository name, it's possible to limit the results to a particular repository. ``_ :arg repository: A repository name :arg snapshot: A comma-separated list of snapshot names :arg ignore_unavailable: Whether to ignore unavailable snapshots, defaults to false which means a SnapshotMissingException is thrown :arg master_timeout: Explicit operation timeout for connection to master node """ return self.transport.perform_request('GET', _make_path('_snapshot', repository, snapshot, '_status'), params=params) @query_params('master_timeout', 'timeout') def verify_repository(self, repository, params=None): """ Returns a list of nodes where repository was successfully verified or an error message if verification process failed. ``_ :arg repository: A repository name :arg master_timeout: Explicit operation timeout for connection to master node :arg timeout: Explicit operation timeout """ if repository in SKIP_IN_PATH: raise ValueError("Empty value passed for a required argument 'repository'.") return self.transport.perform_request('POST', _make_path('_snapshot', repository, '_verify'), params=params) elasticsearch-py-5.4.0/elasticsearch/client/tasks.py000066400000000000000000000057551310735271200225640ustar00rootroot00000000000000from .utils import NamespacedClient, query_params, _make_path, SKIP_IN_PATH class TasksClient(NamespacedClient): @query_params('actions', 'detailed', 'group_by', 'node_id', 'parent_node', 'parent_task', 'wait_for_completion') def list(self, task_id=None, params=None): """ ``_ :arg task_id: Return the task with specified id (node_id:task_number) :arg actions: A comma-separated list of actions that should be returned. Leave empty to return all. :arg detailed: Return detailed task information (default: false) :arg group_by: Group tasks by nodes or parent/child relationships, default 'nodes', valid choices are: 'nodes', 'parents' :arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg parent_node: Return tasks with specified parent node. :arg parent_task: Return tasks with specified parent task id (node_id:task_number). Set to -1 to return all. :arg wait_for_completion: Wait for the matching tasks to complete (default: false) """ return self.transport.perform_request('GET', _make_path('_tasks', task_id), params=params) @query_params('actions', 'node_id', 'parent_node', 'parent_task') def cancel(self, task_id=None, params=None): """ ``_ :arg task_id: Cancel the task with specified task id (node_id:task_number) :arg actions: A comma-separated list of actions that should be cancelled. Leave empty to cancel all. 
:arg node_id: A comma-separated list of node IDs or names to limit the returned information; use `_local` to return information from the node you're connecting to, leave empty to get information from all nodes :arg parent_node: Cancel tasks with specified parent node. :arg parent_task: Cancel tasks with specified parent task id (node_id:task_number). Set to -1 to cancel all. """ return self.transport.perform_request('POST', _make_path('_tasks', task_id, '_cancel'), params=params) @query_params('wait_for_completion') def get(self, task_id=None, params=None): """ Retrieve information for a particular task. ``_ :arg task_id: Return the task with specified id (node_id:task_number) :arg wait_for_completion: Wait for the matching tasks to complete (default: false) """ return self.transport.perform_request('GET', _make_path('_tasks', task_id), params=params) elasticsearch-py-5.4.0/elasticsearch/client/utils.py000066400000000000000000000053571310735271200225750ustar00rootroot00000000000000from __future__ import unicode_literals import weakref from datetime import date, datetime from functools import wraps from ..compat import string_types, quote_plus # parts of URL to be omitted SKIP_IN_PATH = (None, '', b'', [], ()) def _escape(value): """ Escape a single value of a URL string or a query parameter. If it is a list or tuple, turn it into a comma-separated string first. """ # make sequences into comma-separated stings if isinstance(value, (list, tuple)): value = ','.join(value) # dates and datetimes into isoformat elif isinstance(value, (date, datetime)): value = value.isoformat() # make bools into true/false strings elif isinstance(value, bool): value = str(value).lower() # encode strings to utf-8 if isinstance(value, string_types): try: return value.encode('utf-8') except UnicodeDecodeError: # Python 2 and str, no need to re-encode pass return str(value) def _make_path(*parts): """ Create a URL string from parts, omit all `None` values and empty strings. Convert lists nad tuples to comma separated values. """ #TODO: maybe only allow some parts to be lists/tuples ? return '/' + '/'.join( # preserve ',' and '*' in url for nicer URLs in logs quote_plus(_escape(p), b',*') for p in parts if p not in SKIP_IN_PATH) # parameters that apply to all methods GLOBAL_PARAMS = ('pretty', 'human', 'error_trace', 'format', 'filter_path') def query_params(*es_query_params): """ Decorator that pops all accepted parameters from method's kwargs and puts them in the params argument. 
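For example, a (made-up) method declared as::

    @query_params('timeout')
    def frobnicate(self, params=None):
        ...

can be called as ``client.frobnicate(timeout='30s', pretty=True)``; both keyword arguments are popped, passed through ``_escape`` and collected in ``params``, so the method body only has to forward ``params`` to the transport.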
""" def _wrapper(func): @wraps(func) def _wrapped(*args, **kwargs): params = {} if 'params' in kwargs: params = kwargs.pop('params').copy() for p in es_query_params + GLOBAL_PARAMS: if p in kwargs: v = kwargs.pop(p) if v is not None: params[p] = _escape(v) # don't treat ignore and request_timeout as other params to avoid escaping for p in ('ignore', 'request_timeout'): if p in kwargs: params[p] = kwargs.pop(p) return func(*args, params=params, **kwargs) return _wrapped return _wrapper class NamespacedClient(object): def __init__(self, client): self.client = client @property def transport(self): return self.client.transport class AddonClient(NamespacedClient): @classmethod def infect_client(cls, client): addon = cls(weakref.proxy(client)) setattr(client, cls.namespace, addon) return client elasticsearch-py-5.4.0/elasticsearch/compat.py000066400000000000000000000005121310735271200214260ustar00rootroot00000000000000import sys PY2 = sys.version_info[0] == 2 if PY2: string_types = basestring, from urllib import quote_plus, urlencode, unquote from urlparse import urlparse from itertools import imap as map else: string_types = str, bytes from urllib.parse import quote_plus, urlencode, urlparse, unquote map = map elasticsearch-py-5.4.0/elasticsearch/connection/000077500000000000000000000000001310735271200217325ustar00rootroot00000000000000elasticsearch-py-5.4.0/elasticsearch/connection/__init__.py000066400000000000000000000001771310735271200240500ustar00rootroot00000000000000from .base import Connection from .http_requests import RequestsHttpConnection from .http_urllib3 import Urllib3HttpConnection elasticsearch-py-5.4.0/elasticsearch/connection/base.py000066400000000000000000000117071310735271200232240ustar00rootroot00000000000000import logging try: import simplejson as json except ImportError: import json from ..exceptions import TransportError, HTTP_EXCEPTIONS logger = logging.getLogger('elasticsearch') # create the elasticsearch.trace logger, but only set propagate to False if the # logger hasn't already been configured _tracer_already_configured = 'elasticsearch.trace' in logging.Logger.manager.loggerDict tracer = logging.getLogger('elasticsearch.trace') if not _tracer_already_configured: tracer.propagate = False class Connection(object): """ Class responsible for maintaining a connection to an Elasticsearch node. It holds persistent connection pool to it and it's main interface (`perform_request`) is thread-safe. Also responsible for logging. 
""" def __init__(self, host='localhost', port=9200, use_ssl=False, url_prefix='', timeout=10, **kwargs): """ :arg host: hostname of the node (default: localhost) :arg port: port to use (integer, default: 9200) :arg url_prefix: optional url prefix for elasticsearch :arg timeout: default timeout in seconds (float, default: 10) """ scheme = kwargs.get('scheme', 'http') if use_ssl or scheme == 'https': scheme = 'https' use_ssl = True self.use_ssl = use_ssl self.host = '%s://%s:%s' % (scheme, host, port) if url_prefix: url_prefix = '/' + url_prefix.strip('/') self.url_prefix = url_prefix self.timeout = timeout def __repr__(self): return '<%s: %s>' % (self.__class__.__name__, self.host) def _pretty_json(self, data): # pretty JSON in tracer curl logs try: return json.dumps(json.loads(data), sort_keys=True, indent=2, separators=(',', ': ')).replace("'", r'\u0027') except (ValueError, TypeError): # non-json data or a bulk request return data def _log_trace(self, method, path, body, status_code, response, duration): if not tracer.isEnabledFor(logging.INFO) or not tracer.handlers: return # include pretty in trace curls path = path.replace('?', '?pretty&', 1) if '?' in path else path + '?pretty' if self.url_prefix: path = path.replace(self.url_prefix, '', 1) tracer.info("curl %s-X%s 'http://localhost:9200%s' -d '%s'", "-H 'Content-Type: application/json' " if body else '', method, path, self._pretty_json(body) if body else '') if tracer.isEnabledFor(logging.DEBUG): tracer.debug('#[%s] (%.3fs)\n#%s', status_code, duration, self._pretty_json(response).replace('\n', '\n#') if response else '') def log_request_success(self, method, full_url, path, body, status_code, response, duration): """ Log a successful API call. """ # TODO: optionally pass in params instead of full_url and do urlencode only when needed # body has already been serialized to utf-8, deserialize it for logging # TODO: find a better way to avoid (de)encoding the body back and forth if body: body = body.decode('utf-8') logger.info( '%s %s [status:%s request:%.3fs]', method, full_url, status_code, duration ) logger.debug('> %s', body) logger.debug('< %s', response) self._log_trace(method, path, body, status_code, response, duration) def log_request_fail(self, method, full_url, path, body, duration, status_code=None, response=None, exception=None): """ Log an unsuccessful API call. """ # do not log 404s on HEAD requests if method == 'HEAD' and status_code == 404: return logger.warning( '%s %s [status:%s request:%.3fs]', method, full_url, status_code or 'N/A', duration, exc_info=exception is not None ) # body has already been serialized to utf-8, deserialize it for logging # TODO: find a better way to avoid (de)encoding the body back and forth if body: body = body.decode('utf-8') logger.debug('> %s', body) self._log_trace(method, path, body, status_code, response, duration) if response is not None: logger.debug('< %s', response) def _raise_error(self, status_code, raw_data): """ Locate appropriate exception and raise it. 
""" error_message = raw_data additional_info = None try: if raw_data: additional_info = json.loads(raw_data) error_message = additional_info.get('error', error_message) if isinstance(error_message, dict) and 'type' in error_message: error_message = error_message['type'] except (ValueError, TypeError) as err: logger.warning('Undecodable raw error response from server: %s', err) raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info) elasticsearch-py-5.4.0/elasticsearch/connection/http_requests.py000066400000000000000000000107401310735271200252200ustar00rootroot00000000000000import time import warnings try: import requests REQUESTS_AVAILABLE = True except ImportError: REQUESTS_AVAILABLE = False from .base import Connection from ..exceptions import ConnectionError, ImproperlyConfigured, ConnectionTimeout, SSLError from ..compat import urlencode, string_types class RequestsHttpConnection(Connection): """ Connection using the `requests` library. :arg http_auth: optional http auth information as either ':' separated string or a tuple. Any value will be passed into requests as `auth`. :arg use_ssl: use ssl for the connection if `True` :arg verify_certs: whether to verify SSL certificates :arg ca_certs: optional path to CA bundle. By default standard requests' bundle will be used. :arg client_cert: path to the file containing the private key and the certificate, or cert only if using client_key :arg client_key: path to the file containing the private key if using separate cert and key files (client_cert will contain only the cert) :arg headers: any custom http headers to be add to requests """ def __init__(self, host='localhost', port=9200, http_auth=None, use_ssl=False, verify_certs=True, ca_certs=None, client_cert=None, client_key=None, headers=None, **kwargs): if not REQUESTS_AVAILABLE: raise ImproperlyConfigured("Please install requests to use RequestsHttpConnection.") super(RequestsHttpConnection, self).__init__(host=host, port=port, use_ssl=use_ssl, **kwargs) self.session = requests.Session() self.session.headers = headers or {} self.session.headers.setdefault('content-type', 'application/json') if http_auth is not None: if isinstance(http_auth, (tuple, list)): http_auth = tuple(http_auth) elif isinstance(http_auth, string_types): http_auth = tuple(http_auth.split(':', 1)) self.session.auth = http_auth self.base_url = 'http%s://%s:%d%s' % ( 's' if self.use_ssl else '', host, port, self.url_prefix ) self.session.verify = verify_certs if not client_key: self.session.cert = client_cert elif client_cert: # cert is a tuple of (certfile, keyfile) self.session.cert = (client_cert, client_key) if ca_certs: if not verify_certs: raise ImproperlyConfigured("You cannot pass CA certificates when verify SSL is off.") self.session.verify = ca_certs if self.use_ssl and not verify_certs: warnings.warn( 'Connecting to %s using SSL with verify_certs=False is insecure.' 
% self.base_url) def perform_request(self, method, url, params=None, body=None, timeout=None, ignore=()): url = self.base_url + url if params: url = '%s?%s' % (url, urlencode(params or {})) start = time.time() request = requests.Request(method=method, url=url, data=body) prepared_request = self.session.prepare_request(request) settings = self.session.merge_environment_settings(prepared_request.url, {}, None, None, None) send_kwargs = {'timeout': timeout or self.timeout} send_kwargs.update(settings) try: response = self.session.send(prepared_request, **send_kwargs) duration = time.time() - start raw_data = response.text except Exception as e: self.log_request_fail(method, url, prepared_request.path_url, body, time.time() - start, exception=e) if isinstance(e, requests.exceptions.SSLError): raise SSLError('N/A', str(e), e) if isinstance(e, requests.Timeout): raise ConnectionTimeout('TIMEOUT', str(e), e) raise ConnectionError('N/A', str(e), e) # raise errors based on http status codes, let the client handle those if needed if not (200 <= response.status_code < 300) and response.status_code not in ignore: self.log_request_fail(method, url, response.request.path_url, body, duration, response.status_code, raw_data) self._raise_error(response.status_code, raw_data) self.log_request_success(method, url, response.request.path_url, body, response.status_code, raw_data, duration) return response.status_code, response.headers, raw_data def close(self): """ Explicitly closes connections """ self.session.close() elasticsearch-py-5.4.0/elasticsearch/connection/http_urllib3.py000066400000000000000000000137041310735271200247240ustar00rootroot00000000000000import time import urllib3 from urllib3.exceptions import ReadTimeoutError, SSLError as UrllibSSLError import warnings CA_CERTS = None try: import certifi CA_CERTS = certifi.where() except ImportError: pass from .base import Connection from ..exceptions import ConnectionError, ImproperlyConfigured, ConnectionTimeout, SSLError from ..compat import urlencode class Urllib3HttpConnection(Connection): """ Default connection class using the `urllib3` library and the http protocol. :arg host: hostname of the node (default: localhost) :arg port: port to use (integer, default: 9200) :arg url_prefix: optional url prefix for elasticsearch :arg timeout: default timeout in seconds (float, default: 10) :arg http_auth: optional http auth information as either ':' separated string or a tuple :arg use_ssl: use ssl for the connection if `True` :arg verify_certs: whether to verify SSL certificates :arg ca_certs: optional path to CA bundle. See https://urllib3.readthedocs.io/en/latest/security.html#using-certifi-with-urllib3 for instructions how to get default set :arg client_cert: path to the file containing the private key and the certificate, or cert only if using client_key :arg client_key: path to the file containing the private key if using separate cert and key files (client_cert will contain only the cert) :arg ssl_version: version of the SSL protocol to use. Choices are: SSLv23 (default) SSLv2 SSLv3 TLSv1 (see ``PROTOCOL_*`` constants in the ``ssl`` module for exact options for your environment). :arg ssl_assert_hostname: use hostname verification if not `False` :arg ssl_assert_fingerprint: verify the supplied certificate fingerprint if not `None` :arg maxsize: the number of connections which will be kept open to this host. See https://urllib3.readthedocs.io/en/1.4/pools.html#api for more information. 
:arg headers: any custom http headers to be add to requests """ def __init__(self, host='localhost', port=9200, http_auth=None, use_ssl=False, verify_certs=True, ca_certs=None, client_cert=None, client_key=None, ssl_version=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, maxsize=10, headers=None, **kwargs): super(Urllib3HttpConnection, self).__init__(host=host, port=port, use_ssl=use_ssl, **kwargs) self.headers = urllib3.make_headers(keep_alive=True) if http_auth is not None: if isinstance(http_auth, (tuple, list)): http_auth = ':'.join(http_auth) self.headers.update(urllib3.make_headers(basic_auth=http_auth)) # update headers in lowercase to allow overriding of auth headers if headers: for k in headers: self.headers[k.lower()] = headers[k] self.headers.setdefault('content-type', 'application/json') ca_certs = CA_CERTS if ca_certs is None else ca_certs pool_class = urllib3.HTTPConnectionPool kw = {} if use_ssl: pool_class = urllib3.HTTPSConnectionPool kw.update({ 'ssl_version': ssl_version, 'assert_hostname': ssl_assert_hostname, 'assert_fingerprint': ssl_assert_fingerprint, }) if verify_certs: if not ca_certs: raise ImproperlyConfigured("Root certificates are missing for certificate " "validation. Either pass them in using the ca_certs parameter or " "install certifi to use it automatically.") kw.update({ 'cert_reqs': 'CERT_REQUIRED', 'ca_certs': ca_certs, 'cert_file': client_cert, 'key_file': client_key, }) else: warnings.warn( 'Connecting to %s using SSL with verify_certs=False is insecure.' % host) self.pool = pool_class(host, port=port, timeout=self.timeout, maxsize=maxsize, **kw) def perform_request(self, method, url, params=None, body=None, timeout=None, ignore=()): url = self.url_prefix + url if params: url = '%s?%s' % (url, urlencode(params)) full_url = self.host + url start = time.time() try: kw = {} if timeout: kw['timeout'] = timeout # in python2 we need to make sure the url and method are not # unicode. Otherwise the body will be decoded into unicode too and # that will fail (#133, #201). if not isinstance(url, str): url = url.encode('utf-8') if not isinstance(method, str): method = method.encode('utf-8') response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw) duration = time.time() - start raw_data = response.data.decode('utf-8') except Exception as e: self.log_request_fail(method, full_url, url, body, time.time() - start, exception=e) if isinstance(e, UrllibSSLError): raise SSLError('N/A', str(e), e) if isinstance(e, ReadTimeoutError): raise ConnectionTimeout('TIMEOUT', str(e), e) raise ConnectionError('N/A', str(e), e) # raise errors based on http status codes, let the client handle those if needed if not (200 <= response.status < 300) and response.status not in ignore: self.log_request_fail(method, full_url, url, body, duration, response.status, raw_data) self._raise_error(response.status, raw_data) self.log_request_success(method, full_url, url, body, response.status, raw_data, duration) return response.status, response.getheaders(), raw_data def close(self): """ Explicitly closes connection """ self.pool.close() elasticsearch-py-5.4.0/elasticsearch/connection/pooling.py000066400000000000000000000015761310735271200237640ustar00rootroot00000000000000try: import queue except ImportError: import Queue as queue from .base import Connection class PoolingConnection(Connection): """ Base connection class for connections that use libraries without thread safety and no capacity for connection pooling. 
To use this just implement a ``_make_connection`` method that constructs a new connection and returns it. """ def __init__(self, *args, **kwargs): self._free_connections = queue.Queue() super(PoolingConnection, self).__init__(*args, **kwargs) def _get_connection(self): try: return self._free_connections.get_nowait() except queue.Empty: return self._make_connection() def _release_connection(self, con): self._free_connections.put(con) def close(self): """ Explicitly close connection """ pass elasticsearch-py-5.4.0/elasticsearch/connection_pool.py000066400000000000000000000233341310735271200233420ustar00rootroot00000000000000import time import random import logging import threading try: from Queue import PriorityQueue, Empty except ImportError: from queue import PriorityQueue, Empty from .exceptions import ImproperlyConfigured logger = logging.getLogger('elasticsearch') class ConnectionSelector(object): """ Simple class used to select a connection from a list of currently live connection instances. In init time it is passed a dictionary containing all the connections' options which it can then use during the selection process. When the `select` method is called it is given a list of *currently* live connections to choose from. The options dictionary is the one that has been passed to :class:`~elasticsearch.Transport` as `hosts` param and the same that is used to construct the Connection object itself. When the Connection was created from information retrieved from the cluster via the sniffing process it will be the dictionary returned by the `host_info_callback`. Example of where this would be useful is a zone-aware selector that would only select connections from it's own zones and only fall back to other connections where there would be none in it's zones. """ def __init__(self, opts): """ :arg opts: dictionary of connection instances and their options """ self.connection_opts = opts def select(self, connections): """ Select a connection from the given list. :arg connections: list of live connections to choose from """ pass class RandomSelector(ConnectionSelector): """ Select a connection at random """ def select(self, connections): return random.choice(connections) class RoundRobinSelector(ConnectionSelector): """ Selector using round-robin. """ def __init__(self, opts): super(RoundRobinSelector, self).__init__(opts) self.data = threading.local() def select(self, connections): self.data.rr = getattr(self.data, 'rr', -1) + 1 self.data.rr %= len(connections) return connections[self.data.rr] class ConnectionPool(object): """ Container holding the :class:`~elasticsearch.Connection` instances, managing the selection process (via a :class:`~elasticsearch.ConnectionSelector`) and dead connections. It's only interactions are with the :class:`~elasticsearch.Transport` class that drives all the actions within `ConnectionPool`. Initially connections are stored on the class as a list and, along with the connection options, get passed to the `ConnectionSelector` instance for future reference. Upon each request the `Transport` will ask for a `Connection` via the `get_connection` method. If the connection fails (it's `perform_request` raises a `ConnectionError`) it will be marked as dead (via `mark_dead`) and put on a timeout (if it fails N times in a row the timeout is exponentially longer - the formula is `default_timeout * 2 ** (fail_count - 1)`). When the timeout is over the connection will be resurrected and returned to the live pool. 
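With the default ``dead_timeout`` of 60 seconds and ``timeout_cutoff`` of 5, consecutive failures therefore put a connection on timeouts of::

    60, 120, 240, 480, 960, 1920, 1920, ...   # seconds, capped at 60 * 2 ** 5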
A connection that has been previously marked as dead and succeeds will be marked as live (its fail count will be deleted). """ def __init__(self, connections, dead_timeout=60, timeout_cutoff=5, selector_class=RoundRobinSelector, randomize_hosts=True, **kwargs): """ :arg connections: list of tuples containing the :class:`~elasticsearch.Connection` instance and its options :arg dead_timeout: number of seconds a connection should be retired for after a failure, increases on consecutive failures :arg timeout_cutoff: number of consecutive failures after which the timeout doesn't increase :arg selector_class: :class:`~elasticsearch.ConnectionSelector` subclass to use if more than one connection is live :arg randomize_hosts: shuffle the list of connections upon arrival to avoid dog piling effect across processes """ if not connections: raise ImproperlyConfigured("No defined connections, you need to " "specify at least one host.") self.connection_opts = connections self.connections = [c for (c, opts) in connections] # remember original connection list for resurrect(force=True) self.orig_connections = tuple(self.connections) # PriorityQueue for thread safety and ease of timeout management self.dead = PriorityQueue(len(self.connections)) self.dead_count = {} if randomize_hosts: # randomize the connection list to avoid all clients hitting same node # after startup/restart random.shuffle(self.connections) # default timeout after which to try resurrecting a connection self.dead_timeout = dead_timeout self.timeout_cutoff = timeout_cutoff self.selector = selector_class(dict(connections)) def mark_dead(self, connection, now=None): """ Mark the connection as dead (failed). Remove it from the live pool and put it on a timeout. :arg connection: the failed instance """ # allow injecting `now` for testing purposes now = now if now else time.time() try: self.connections.remove(connection) except ValueError: # connection not alive or another thread marked it already, ignore return else: dead_count = self.dead_count.get(connection, 0) + 1 self.dead_count[connection] = dead_count timeout = self.dead_timeout * 2 ** min(dead_count - 1, self.timeout_cutoff) self.dead.put((now + timeout, connection)) logger.warning( 'Connection %r has failed for %i times in a row, putting on %i second timeout.', connection, dead_count, timeout ) def mark_live(self, connection): """ Mark connection as healthy after a resurrection. Resets the fail counter for the connection. :arg connection: the connection to redeem """ try: del self.dead_count[connection] except KeyError: # race condition, safe to ignore pass def resurrect(self, force=False): """ Attempt to resurrect a connection from the dead pool. It will try to locate one (not all) eligible (its timeout is over) connection to return to the live pool. Any resurrected connection is also returned. :arg force: resurrect a connection even if there is none eligible (used when we have no live connections). If force is specified resurrect always returns a connection. """ # no dead connections if self.dead.empty(): # we are forced to return a connection, take one from the original # list. This is to avoid a race condition where get_connection can # see no live connections but when it calls resurrect self.dead is # also empty. We assume that other thread has resurrected all # available connections so we can safely return one at random. 
if force: return random.choice(self.orig_connections) return try: # retrieve a connection to check timeout, connection = self.dead.get(block=False) except Empty: # other thread has been faster and the queue is now empty. If we # are forced, return a connection at random again. if force: return random.choice(self.orig_connections) return if not force and timeout > time.time(): # return it back if not eligible and not forced self.dead.put((timeout, connection)) return # either we were forced or the connection is elligible to be retried self.connections.append(connection) logger.info('Resurrecting connection %r (force=%s).', connection, force) return connection def get_connection(self): """ Return a connection from the pool using the `ConnectionSelector` instance. It tries to resurrect eligible connections, forces a resurrection when no connections are availible and passes the list of live connections to the selector instance to choose from. Returns a connection instance and it's current fail count. """ self.resurrect() connections = self.connections[:] # no live nodes, resurrect one by force and return it if not connections: return self.resurrect(True) # only call selector if we have a selection if len(connections) > 1: return self.selector.select(connections) # only one connection, no need for a selector return connections[0] def close(self): """ Explicitly closes connections """ for conn in self.orig_connections: conn.close() class DummyConnectionPool(ConnectionPool): def __init__(self, connections, **kwargs): if len(connections) != 1: raise ImproperlyConfigured("DummyConnectionPool needs exactly one " "connection defined.") # we need connection opts for sniffing logic self.connection_opts = connections self.connection = connections[0][0] self.connections = (self.connection, ) def get_connection(self): return self.connection def close(self): """ Explicitly closes connections """ self.connection.close() def _noop(self, *args, **kwargs): pass mark_dead = mark_live = resurrect = _noop elasticsearch-py-5.4.0/elasticsearch/exceptions.py000066400000000000000000000060671310735271200223370ustar00rootroot00000000000000__all__ = [ 'ImproperlyConfigured', 'ElasticsearchException', 'SerializationError', 'TransportError', 'NotFoundError', 'ConflictError', 'RequestError', 'ConnectionError', 'SSLError', 'ConnectionTimeout' ] class ImproperlyConfigured(Exception): """ Exception raised when the config passed to the client is inconsistent or invalid. """ class ElasticsearchException(Exception): """ Base class for all exceptions raised by this package's operations (doesn't apply to :class:`~elasticsearch.ImproperlyConfigured`). """ class SerializationError(ElasticsearchException): """ Data passed in failed to serialize properly in the ``Serializer`` being used. """ class TransportError(ElasticsearchException): """ Exception raised when ES returns a non-OK (>=400) HTTP status code. Or when an actual connection error happens; in that case the ``status_code`` will be set to ``'N/A'``. """ @property def status_code(self): """ The HTTP status code of the response that precipitated the error or ``'N/A'`` if not applicable. """ return self.args[0] @property def error(self): """ A string error message. """ return self.args[1] @property def info(self): """ Dict of returned error info from ES, where available. 
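A sketch of inspecting a failed call (the request is made up and ``es`` is assumed to be an :class:`~elasticsearch.Elasticsearch` client)::

    from elasticsearch import TransportError

    try:
        es.indices.create(index='already-there')
    except TransportError as e:
        print(e.status_code)  # HTTP status, e.g. 400
        print(e.error)        # short error type string
        print(e.info)         # full decoded error body, where available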
""" return self.args[2] def __str__(self): cause = '' try: if self.info: cause = ', %r' % self.info['error']['root_cause'][0]['reason'] except LookupError: pass return 'TransportError(%s, %r%s)' % (self.status_code, self.error, cause) class ConnectionError(TransportError): """ Error raised when there was an exception while talking to ES. Original exception from the underlying :class:`~elasticsearch.Connection` implementation is available as ``.info.`` """ def __str__(self): return 'ConnectionError(%s) caused by: %s(%s)' % ( self.error, self.info.__class__.__name__, self.info) class SSLError(ConnectionError): """ Error raised when encountering SSL errors. """ class ConnectionTimeout(ConnectionError): """ A network timeout. Doesn't cause a node retry by default. """ def __str__(self): return 'ConnectionTimeout caused by - %s(%s)' % ( self.info.__class__.__name__, self.info) class NotFoundError(TransportError): """ Exception representing a 404 status code. """ class ConflictError(TransportError): """ Exception representing a 409 status code. """ class RequestError(TransportError): """ Exception representing a 400 status code. """ class AuthenticationException(TransportError): """ Exception representing a 401 status code. """ class AuthorizationException(TransportError): """ Exception representing a 403 status code. """ # more generic mappings from status_code to python exceptions HTTP_EXCEPTIONS = { 400: RequestError, 401: AuthenticationException, 403: AuthorizationException, 404: NotFoundError, 409: ConflictError, } elasticsearch-py-5.4.0/elasticsearch/helpers/000077500000000000000000000000001310735271200212355ustar00rootroot00000000000000elasticsearch-py-5.4.0/elasticsearch/helpers/__init__.py000066400000000000000000000360551310735271200233570ustar00rootroot00000000000000from __future__ import unicode_literals import logging from operator import methodcaller from ..exceptions import ElasticsearchException, TransportError from ..compat import map, string_types logger = logging.getLogger('elasticsearch.helpers') class BulkIndexError(ElasticsearchException): @property def errors(self): """ List of errors from execution of the last chunk. """ return self.args[1] class ScanError(ElasticsearchException): def __init__(self, scroll_id, *args, **kwargs): super(ScanError, self).__init__(*args, **kwargs) self.scroll_id = scroll_id def expand_action(data): """ From one document or action definition passed in by the user extract the action/data lines needed for elasticsearch's :meth:`~elasticsearch.Elasticsearch.bulk` api. """ # when given a string, assume user wants to index raw json if isinstance(data, string_types): return '{"index":{}}', data # make sure we don't alter the action data = data.copy() op_type = data.pop('_op_type', 'index') action = {op_type: {}} for key in ('_index', '_parent', '_percolate', '_routing', '_timestamp', '_type', '_version', '_version_type', '_id', '_retry_on_conflict', 'pipeline'): if key in data: action[op_type][key] = data.pop(key) # no data payload for delete if op_type == 'delete': return action, None return action, data.get('_source', data) def _chunk_actions(actions, chunk_size, max_chunk_bytes, serializer): """ Split actions into chunks by number or size, serialize them into strings in the process. 
""" bulk_actions = [] size, action_count = 0, 0 for action, data in actions: action = serializer.dumps(action) cur_size = len(action) + 1 if data is not None: data = serializer.dumps(data) cur_size += len(data) + 1 # full chunk, send it and start a new one if bulk_actions and (size + cur_size > max_chunk_bytes or action_count == chunk_size): yield bulk_actions bulk_actions = [] size, action_count = 0, 0 bulk_actions.append(action) if data is not None: bulk_actions.append(data) size += cur_size action_count += 1 if bulk_actions: yield bulk_actions def _process_bulk_chunk(client, bulk_actions, raise_on_exception=True, raise_on_error=True, **kwargs): """ Send a bulk request to elasticsearch and process the output. """ # if raise on error is set, we need to collect errors per chunk before raising them errors = [] try: # send the actual request resp = client.bulk('\n'.join(bulk_actions) + '\n', **kwargs) except TransportError as e: # default behavior - just propagate exception if raise_on_exception: raise e # if we are not propagating, mark all actions in current chunk as failed err_message = str(e) exc_errors = [] # deserialize the data back, thisis expensive but only run on # errors if raise_on_exception is false, so shouldn't be a real # issue bulk_data = map(client.transport.serializer.loads, bulk_actions) while True: try: # collect all the information about failed actions action = next(bulk_data) op_type, action = action.popitem() info = {"error": err_message, "status": e.status_code, "exception": e} if op_type != 'delete': info['data'] = next(bulk_data) info.update(action) exc_errors.append({op_type: info}) except StopIteration: break # emulate standard behavior for failed actions if raise_on_error: raise BulkIndexError('%i document(s) failed to index.' % len(exc_errors), exc_errors) else: for err in exc_errors: yield False, err return # go through request-reponse pairs and detect failures for op_type, item in map(methodcaller('popitem'), resp['items']): ok = 200 <= item.get('status', 500) < 300 if not ok and raise_on_error: errors.append({op_type: item}) if ok or not errors: # if we are not just recording all errors to be able to raise # them all at once, yield items individually yield ok, {op_type: item} if errors: raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors) def streaming_bulk(client, actions, chunk_size=500, max_chunk_bytes=100 * 1024 * 1024, raise_on_error=True, expand_action_callback=expand_action, raise_on_exception=True, **kwargs): """ Streaming bulk consumes actions from the iterable passed in and yields results per action. For non-streaming usecases use :func:`~elasticsearch.helpers.bulk` which is a wrapper around streaming bulk that returns summary information about the bulk operation once the entire input is consumed and sent. :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use :arg actions: iterable containing the actions to be executed :arg chunk_size: number of docs in one chunk sent to es (default: 500) :arg max_chunk_bytes: the maximum size of the request in bytes (default: 100MB) :arg raise_on_error: raise ``BulkIndexError`` containing errors (as `.errors`) from the execution of the last chunk when some occur. By default we raise. :arg raise_on_exception: if ``False`` then don't propagate exceptions from call to ``bulk`` and just report the items that failed as failed. 
:arg expand_action_callback: callback executed on each action passed in, should return a tuple containing the action line and the data line (`None` if data line should be omitted). """ actions = map(expand_action_callback, actions) for bulk_actions in _chunk_actions(actions, chunk_size, max_chunk_bytes, client.transport.serializer): for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs): yield result def bulk(client, actions, stats_only=False, **kwargs): """ Helper for the :meth:`~elasticsearch.Elasticsearch.bulk` api that provides a more human friendly interface - it consumes an iterator of actions and sends them to elasticsearch in chunks. It returns a tuple with summary information - number of successfully executed actions and either list of errors or number of errors if ``stats_only`` is set to ``True``. Note that by default we raise a ``BulkIndexError`` when we encounter an error so options like ``stats_only`` only apply when ``raise_on_error`` is set to ``False``. See :func:`~elasticsearch.helpers.streaming_bulk` for more accepted parameters :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use :arg actions: iterator containing the actions :arg stats_only: if `True` only report number of successful/failed operations instead of just number of successful and a list of error responses Any additional keyword arguments will be passed to :func:`~elasticsearch.helpers.streaming_bulk` which is used to execute the operation. """ success, failed = 0, 0 # list of errors to be collected is not stats_only errors = [] for ok, item in streaming_bulk(client, actions, **kwargs): # go through request-reponse pairs and detect failures if not ok: if not stats_only: errors.append(item) failed += 1 else: success += 1 return success, failed if stats_only else errors def parallel_bulk(client, actions, thread_count=4, chunk_size=500, max_chunk_bytes=100 * 1024 * 1024, expand_action_callback=expand_action, **kwargs): """ Parallel version of the bulk helper run in multiple threads at once. :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use :arg actions: iterator containing the actions :arg thread_count: size of the threadpool to use for the bulk requests :arg chunk_size: number of docs in one chunk sent to es (default: 500) :arg max_chunk_bytes: the maximum size of the request in bytes (default: 100MB) :arg raise_on_error: raise ``BulkIndexError`` containing errors (as `.errors`) from the execution of the last chunk when some occur. By default we raise. :arg raise_on_exception: if ``False`` then don't propagate exceptions from call to ``bulk`` and just report the items that failed as failed. :arg expand_action_callback: callback executed on each action passed in, should return a tuple containing the action line and the data line (`None` if data line should be omitted). 
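    Since this helper is itself a generator its results have to be consumed
    for any indexing to happen, e.g. (``generate_actions()`` stands in for any
    iterable of actions)::

        for ok, info in parallel_bulk(client, generate_actions(), thread_count=8):
            if not ok:
                print('A document failed:', info)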
""" # Avoid importing multiprocessing unless parallel_bulk is used # to avoid exceptions on restricted environments like App Engine from multiprocessing.dummy import Pool actions = map(expand_action_callback, actions) pool = Pool(thread_count) try: for result in pool.imap( lambda chunk: list(_process_bulk_chunk(client, chunk, **kwargs)), _chunk_actions(actions, chunk_size, max_chunk_bytes, client.transport.serializer) ): for item in result: yield item finally: pool.close() pool.join() def scan(client, query=None, scroll='5m', raise_on_error=True, preserve_order=False, size=1000, request_timeout=None, clear_scroll=True, **kwargs): """ Simple abstraction on top of the :meth:`~elasticsearch.Elasticsearch.scroll` api - a simple iterator that yields all hits as returned by underlining scroll requests. By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use ``preserve_order=True``. This may be an expensive operation and will negate the performance benefits of using ``scan``. :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use :arg query: body for the :meth:`~elasticsearch.Elasticsearch.search` api :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg raise_on_error: raises an exception (``ScanError``) if an error is encountered (some shards fail to execute). By default we raise. :arg preserve_order: don't set the ``search_type`` to ``scan`` - this will cause the scroll to paginate with preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results, use with caution. :arg size: size (per shard) of the batch send at each iteration. :arg request_timeout: explicit timeout for each call to ``scan`` :arg clear_scroll: explicitly calls delete on the scroll id via the clear scroll API at the end of the method on completion or error, defaults to true. Any additional keyword arguments will be passed to the initial :meth:`~elasticsearch.Elasticsearch.search` call:: scan(es, query={"query": {"match": {"title": "python"}}}, index="orders-*", doc_type="books" ) """ if not preserve_order: query = query.copy() if query else {} query["sort"] = "_doc" # initial search resp = client.search(body=query, scroll=scroll, size=size, request_timeout=request_timeout, **kwargs) scroll_id = resp.get('_scroll_id') if scroll_id is None: return try: first_run = True while True: # if we didn't set search_type to scan initial search contains data if first_run: first_run = False else: resp = client.scroll(scroll_id, scroll=scroll, request_timeout=request_timeout) for hit in resp['hits']['hits']: yield hit # check if we have any errrors if resp["_shards"]["failed"]: logger.warning( 'Scroll request has failed on %d shards out of %d.', resp['_shards']['failed'], resp['_shards']['total'] ) if raise_on_error: raise ScanError( scroll_id, 'Scroll request has failed on %d shards out of %d.' 
% (resp['_shards']['failed'], resp['_shards']['total']) ) scroll_id = resp.get('_scroll_id') # end of scroll if scroll_id is None or not resp['hits']['hits']: break finally: if scroll_id and clear_scroll: client.clear_scroll(body={'scroll_id': [scroll_id]}, ignore=(404, )) def reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=500, scroll='5m', scan_kwargs={}, bulk_kwargs={}): """ Reindex all documents from one index that satisfy a given query to another, potentially (if `target_client` is specified) on a different cluster. If you don't specify the query you will reindex all the documents. Since ``2.3`` a :meth:`~elasticsearch.Elasticsearch.reindex` api is available as part of elasticsearch itself. It is recommended to use the api instead of this helper wherever possible. The helper is here mostly for backwards compatibility and for situations where more flexibility is needed. .. note:: This helper doesn't transfer mappings, just the data. :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use (for read if `target_client` is specified as well) :arg source_index: index (or list of indices) to read documents from :arg target_index: name of the index in the target cluster to populate :arg query: body for the :meth:`~elasticsearch.Elasticsearch.search` api :arg target_client: optional, is specified will be used for writing (thus enabling reindex between clusters) :arg chunk_size: number of docs in one chunk sent to es (default: 500) :arg scroll: Specify how long a consistent view of the index should be maintained for scrolled search :arg scan_kwargs: additional kwargs to be passed to :func:`~elasticsearch.helpers.scan` :arg bulk_kwargs: additional kwargs to be passed to :func:`~elasticsearch.helpers.bulk` """ target_client = client if target_client is None else target_client docs = scan(client, query=query, index=source_index, scroll=scroll, **scan_kwargs ) def _change_doc_index(hits, index): for h in hits: h['_index'] = index if 'fields' in h: h.update(h.pop('fields')) yield h kwargs = { 'stats_only': True, } kwargs.update(bulk_kwargs) return bulk(target_client, _change_doc_index(docs, target_index), chunk_size=chunk_size, **kwargs) elasticsearch-py-5.4.0/elasticsearch/helpers/test.py000066400000000000000000000034731310735271200225750ustar00rootroot00000000000000import time import os try: # python 2.6 from unittest2 import TestCase, SkipTest except ImportError: from unittest import TestCase, SkipTest from elasticsearch import Elasticsearch from elasticsearch.exceptions import ConnectionError def get_test_client(nowait=False, **kwargs): # construct kwargs from the environment kw = {'timeout': 30} if 'TEST_ES_CONNECTION' in os.environ: from elasticsearch import connection kw['connection_class'] = getattr(connection, os.environ['TEST_ES_CONNECTION']) kw.update(kwargs) client = Elasticsearch([os.environ.get('TEST_ES_SERVER', {})], **kw) # wait for yellow status for _ in range(1 if nowait else 100): try: client.cluster.health(wait_for_status='yellow') return client except ConnectionError: time.sleep(.1) else: # timeout raise SkipTest("Elasticsearch failed to start.") def _get_version(version_string): if '.' 
not in version_string: return () version = version_string.strip().split('.') return tuple(int(v) if v.isdigit() else 999 for v in version) class ElasticsearchTestCase(TestCase): @staticmethod def _get_client(): return get_test_client() @classmethod def setUpClass(cls): super(ElasticsearchTestCase, cls).setUpClass() cls.client = cls._get_client() def tearDown(self): super(ElasticsearchTestCase, self).tearDown() self.client.indices.delete(index='*', ignore=404) self.client.indices.delete_template(name='*', ignore=404) @property def es_version(self): if not hasattr(self, '_es_version'): version_string = self.client.info()['version']['number'] self._es_version = _get_version(version_string) return self._es_version elasticsearch-py-5.4.0/elasticsearch/serializer.py000066400000000000000000000043571310735271200223270ustar00rootroot00000000000000try: import simplejson as json except ImportError: import json import uuid from datetime import date, datetime from decimal import Decimal from .exceptions import SerializationError, ImproperlyConfigured from .compat import string_types class TextSerializer(object): mimetype = 'text/plain' def loads(self, s): return s def dumps(self, data): if isinstance(data, string_types): return data raise SerializationError('Cannot serialize %r into text.' % data) class JSONSerializer(object): mimetype = 'application/json' def default(self, data): if isinstance(data, (date, datetime)): return data.isoformat() elif isinstance(data, Decimal): return float(data) elif isinstance(data, uuid.UUID): return str(data) raise TypeError("Unable to serialize %r (type: %s)" % (data, type(data))) def loads(self, s): try: return json.loads(s) except (ValueError, TypeError) as e: raise SerializationError(s, e) def dumps(self, data): # don't serialize strings if isinstance(data, string_types): return data try: return json.dumps(data, default=self.default, ensure_ascii=False) except (ValueError, TypeError) as e: raise SerializationError(data, e) DEFAULT_SERIALIZERS = { JSONSerializer.mimetype: JSONSerializer(), TextSerializer.mimetype: TextSerializer(), } class Deserializer(object): def __init__(self, serializers, default_mimetype='application/json'): try: self.default = serializers[default_mimetype] except KeyError: raise ImproperlyConfigured('Cannot find default serializer (%s)' % default_mimetype) self.serializers = serializers def loads(self, s, mimetype=None): if not mimetype: deserializer = self.default else: # split out charset mimetype = mimetype.split(';', 1)[0] try: deserializer = self.serializers[mimetype] except KeyError: raise SerializationError('Unknown mimetype, unable to deserialize: %s' % mimetype) return deserializer.loads(s) elasticsearch-py-5.4.0/elasticsearch/transport.py000066400000000000000000000347371310735271200222170ustar00rootroot00000000000000import time from itertools import chain from .connection import Urllib3HttpConnection from .connection_pool import ConnectionPool, DummyConnectionPool from .serializer import JSONSerializer, Deserializer, DEFAULT_SERIALIZERS from .exceptions import ConnectionError, TransportError, SerializationError, \ ConnectionTimeout, ImproperlyConfigured def get_host_info(node_info, host): """ Simple callback that takes the node info from `/_cluster/nodes` and a parsed connection information and return the connection information. If `None` is returned this node will be skipped. Useful for filtering nodes (by proximity for example) or if additional information needs to be provided for the :class:`~elasticsearch.Connection` class. 
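    For example, a hypothetical callback that only keeps nodes advertising the
    ``data`` role could be passed to :class:`~elasticsearch.Transport` as
    ``host_info_callback``::

        def only_data_nodes(node_info, host):
            return host if 'data' in node_info.get('roles', []) else None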
By default master only nodes are filtered out since they shouldn't typically be used for API operations. :arg node_info: node information from `/_cluster/nodes` :arg host: connection information (host, port) extracted from the node info """ # ignore master only nodes if node_info.get('roles', []) == ['master']: return None return host class Transport(object): """ Encapsulation of transport-related to logic. Handles instantiation of the individual connections as well as creating a connection pool to hold them. Main interface is the `perform_request` method. """ def __init__(self, hosts, connection_class=Urllib3HttpConnection, connection_pool_class=ConnectionPool, host_info_callback=get_host_info, sniff_on_start=False, sniffer_timeout=None, sniff_timeout=.1, sniff_on_connection_fail=False, serializer=JSONSerializer(), serializers=None, default_mimetype='application/json', max_retries=3, retry_on_status=(502, 503, 504, ), retry_on_timeout=False, send_get_body_as='GET', **kwargs): """ :arg hosts: list of dictionaries, each containing keyword arguments to create a `connection_class` instance :arg connection_class: subclass of :class:`~elasticsearch.Connection` to use :arg connection_pool_class: subclass of :class:`~elasticsearch.ConnectionPool` to use :arg host_info_callback: callback responsible for taking the node information from `/_cluser/nodes`, along with already extracted information, and producing a list of arguments (same as `hosts` parameter) :arg sniff_on_start: flag indicating whether to obtain a list of nodes from the cluser at startup time :arg sniffer_timeout: number of seconds between automatic sniffs :arg sniff_on_connection_fail: flag controlling if connection failure triggers a sniff :arg sniff_timeout: timeout used for the sniff request - it should be a fast api call and we are talking potentially to more nodes so we want to fail quickly. Not used during initial sniffing (if ``sniff_on_start`` is on) when the connection still isn't initialized. :arg serializer: serializer instance :arg serializers: optional dict of serializer instances that will be used for deserializing data coming from the server. (key is the mimetype) :arg default_mimetype: when no mimetype is specified by the server response assume this mimetype, defaults to `'application/json'` :arg max_retries: maximum number of retries before an exception is propagated :arg retry_on_status: set of HTTP status codes on which we should retry on a different node. defaults to ``(502, 503, 504)`` :arg retry_on_timeout: should timeout trigger a retry on different node? (default `False`) :arg send_get_body_as: for GET requests with body this option allows you to specify an alternate way of execution for environments that don't support passing bodies with GET requests. If you set this to 'POST' a POST method will be used instead, if to 'source' then the body will be serialized and passed as a query parameter `source`. Any extra keyword arguments will be passed to the `connection_class` when creating and instance unless overriden by that connection's options provided as part of the hosts parameter. 
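        These options are usually supplied through the
        :class:`~elasticsearch.Elasticsearch` constructor, which forwards its
        keyword arguments to the transport, for example (hosts are
        illustrative)::

            es = Elasticsearch(
                ['es1:9200', 'es2:9200'],
                sniff_on_start=True,
                sniffer_timeout=300,
                retry_on_timeout=True,
                max_retries=5,
            )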
""" # serialization config _serializers = DEFAULT_SERIALIZERS.copy() # if a serializer has been specified, use it for deserialization as well _serializers[serializer.mimetype] = serializer # if custom serializers map has been supplied, override the defaults with it if serializers: _serializers.update(serializers) # create a deserializer with our config self.deserializer = Deserializer(_serializers, default_mimetype) self.max_retries = max_retries self.retry_on_timeout = retry_on_timeout self.retry_on_status = retry_on_status self.send_get_body_as = send_get_body_as # data serializer self.serializer = serializer # store all strategies... self.connection_pool_class = connection_pool_class self.connection_class = connection_class # ...save kwargs to be passed to the connections self.kwargs = kwargs self.hosts = hosts # ...and instantiate them self.set_connections(hosts) # retain the original connection instances for sniffing self.seed_connections = self.connection_pool.connections[:] # sniffing data self.sniffer_timeout = sniffer_timeout self.sniff_on_connection_fail = sniff_on_connection_fail self.last_sniff = time.time() self.sniff_timeout = sniff_timeout # callback to construct host dict from data in /_cluster/nodes self.host_info_callback = host_info_callback if sniff_on_start: self.sniff_hosts(True) def add_connection(self, host): """ Create a new :class:`~elasticsearch.Connection` instance and add it to the pool. :arg host: kwargs that will be used to create the instance """ self.hosts.append(host) self.set_connections(self.hosts) def set_connections(self, hosts): """ Instantiate all the connections and crate new connection pool to hold them. Tries to identify unchanged hosts and re-use existing :class:`~elasticsearch.Connection` instances. :arg hosts: same as `__init__` """ # construct the connections def _create_connection(host): # if this is not the initial setup look at the existing connection # options and identify connections that haven't changed and can be # kept around. if hasattr(self, 'connection_pool'): for (connection, old_host) in self.connection_pool.connection_opts: if old_host == host: return connection # previously unseen params, create new connection kwargs = self.kwargs.copy() kwargs.update(host) return self.connection_class(**kwargs) connections = map(_create_connection, hosts) connections = list(zip(connections, hosts)) if len(connections) == 1: self.connection_pool = DummyConnectionPool(connections) else: # pass the hosts dicts to the connection pool to optionally extract parameters from self.connection_pool = self.connection_pool_class(connections, **self.kwargs) def get_connection(self): """ Retreive a :class:`~elasticsearch.Connection` instance from the :class:`~elasticsearch.ConnectionPool` instance. """ if self.sniffer_timeout: if time.time() >= self.last_sniff + self.sniffer_timeout: self.sniff_hosts() return self.connection_pool.get_connection() def _get_sniff_data(self, initial=False): """ Perform the request to get sniffins information. Returns a list of dictionaries (one per node) containing all the information from the cluster. It also sets the last_sniff attribute in case of a successful attempt. In rare cases it might be possible to override this method in your custom Transport class to serve data from alternative source like configuration management. 
""" previous_sniff = self.last_sniff try: # reset last_sniff timestamp self.last_sniff = time.time() # go through all current connections as well as the # seed_connections for good measure for c in chain(self.connection_pool.connections, self.seed_connections): try: # use small timeout for the sniffing request, should be a fast api call _, headers, node_info = c.perform_request( 'GET', '/_nodes/_all/http', timeout=self.sniff_timeout if not initial else None) node_info = self.deserializer.loads(node_info, headers.get('content-type')) break except (ConnectionError, SerializationError): pass else: raise TransportError("N/A", "Unable to sniff hosts.") except: # keep the previous value on error self.last_sniff = previous_sniff raise return list(node_info['nodes'].values()) def _get_host_info(self, host_info): host = {} address = host_info.get('http', {}).get('publish_address') # malformed or no address given if not address or ':' not in address: return None host['host'], host['port'] = address.rsplit(':', 1) host['port'] = int(host['port']) return self.host_info_callback(host_info, host) def sniff_hosts(self, initial=False): """ Obtain a list of nodes from the cluster and create a new connection pool using the information retrieved. To extract the node connection parameters use the ``nodes_to_host_callback``. :arg initial: flag indicating if this is during startup (``sniff_on_start``), ignore the ``sniff_timeout`` if ``True`` """ node_info = self._get_sniff_data(initial) hosts = list(filter(None, (self._get_host_info(n) for n in node_info))) # we weren't able to get any nodes or host_info_callback blocked all - # raise error. if not hosts: raise TransportError("N/A", "Unable to sniff hosts - no viable hosts found.") self.set_connections(hosts) def mark_dead(self, connection): """ Mark a connection as dead (failed) in the connection pool. If sniffing on failure is enabled this will initiate the sniffing process. :arg connection: instance of :class:`~elasticsearch.Connection` that failed """ # mark as dead even when sniffing to avoid hitting this host during the sniff process self.connection_pool.mark_dead(connection) if self.sniff_on_connection_fail: self.sniff_hosts() def perform_request(self, method, url, params=None, body=None): """ Perform the actual request. Retrieve a connection from the connection pool, pass all the information to it's perform_request method and return the data. If an exception was raised, mark the connection as failed and retry (up to `max_retries` times). If the operation was succesful and the connection used was previously marked as dead, mark it as live, resetting it's failure count. 
:arg method: HTTP method to use :arg url: absolute url (without host) to target :arg params: dictionary of query parameters, will be handed over to the underlying :class:`~elasticsearch.Connection` class for serialization :arg body: body of the request, will be serializes using serializer and passed to the connection """ if body is not None: body = self.serializer.dumps(body) # some clients or environments don't support sending GET with body if method in ('HEAD', 'GET') and self.send_get_body_as != 'GET': # send it as post instead if self.send_get_body_as == 'POST': method = 'POST' # or as source parameter elif self.send_get_body_as == 'source': if params is None: params = {} params['source'] = body body = None if body is not None: try: body = body.encode('utf-8') except (UnicodeDecodeError, AttributeError): # bytes/str - no need to re-encode pass ignore = () timeout = None if params: timeout = params.pop('request_timeout', None) ignore = params.pop('ignore', ()) if isinstance(ignore, int): ignore = (ignore, ) for attempt in range(self.max_retries + 1): connection = self.get_connection() try: status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout) except TransportError as e: if method == 'HEAD' and e.status_code == 404: return False retry = False if isinstance(e, ConnectionTimeout): retry = self.retry_on_timeout elif isinstance(e, ConnectionError): retry = True elif e.status_code in self.retry_on_status: retry = True if retry: # only mark as dead if we are retrying self.mark_dead(connection) # raise exception on last retry if attempt == self.max_retries: raise else: raise else: if method == 'HEAD': return 200 <= status < 300 # connection didn't fail, confirm it's live status self.connection_pool.mark_live(connection) if data: data = self.deserializer.loads(data, headers.get('content-type')) return data def close(self): """ Explcitly closes connections """ self.connection_pool.close() elasticsearch-py-5.4.0/example/000077500000000000000000000000001310735271200164145ustar00rootroot00000000000000elasticsearch-py-5.4.0/example/README.rst000066400000000000000000000015531310735271200201070ustar00rootroot00000000000000Example code for `elasticsearch-py` =================================== This example code demonstrates the features and use patterns for the Python client. To run this example make sure you have elasticsearch running on port 9200, install additional dependencies (on top of `elasticsearch-py`):: pip install python-dateutil GitPython And now you can load the index (the index will be called `git`):: python load.py This will create an index with mappings and parse the git information of this repository and load all the commits into it. You can run some sample queries by running:: python queries.py Look at the `queries.py` file for querying example and `load.py` on examples on loading data into elasticsearch. Both `load` and `queries` set up logging so in `/tmp/es_trace.log` you will have a transcript of the commands being run in the curl format. 
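The loader also accepts an optional path to a different git repository, for
example (the path is illustrative)::

    python load.py ~/src/some-other-repo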
elasticsearch-py-5.4.0/example/load.py000066400000000000000000000147311310735271200177130ustar00rootroot00000000000000#!/usr/bin/env python # -*- coding: utf-8 -*- from __future__ import print_function from os.path import dirname, basename, abspath from datetime import datetime import logging import sys import git from elasticsearch import Elasticsearch from elasticsearch.exceptions import TransportError from elasticsearch.helpers import bulk, streaming_bulk def create_git_index(client, index): # we will use user on several places user_mapping = { 'properties': { 'name': { 'type': 'text', 'fields': { 'raw': {'type': 'keyword'}, } } } } create_index_body = { 'settings': { # just one shard, no replicas for testing 'number_of_shards': 1, 'number_of_replicas': 0, # custom analyzer for analyzing file paths 'analysis': { 'analyzer': { 'file_path': { 'type': 'custom', 'tokenizer': 'path_hierarchy', 'filter': ['lowercase'] } } } }, 'mappings': { 'commits': { '_parent': { 'type': 'repos' }, 'properties': { 'author': user_mapping, 'authored_date': {'type': 'date'}, 'committer': user_mapping, 'committed_date': {'type': 'date'}, 'parent_shas': {'type': 'keyword'}, 'description': {'type': 'text', 'analyzer': 'snowball'}, 'files': {'type': 'text', 'analyzer': 'file_path', "fielddata": True} } }, 'repos': { 'properties': { 'owner': user_mapping, 'created_at': {'type': 'date'}, 'description': { 'type': 'text', 'analyzer': 'snowball', }, 'tags': {'type': 'keyword'} } } } } # create empty index try: client.indices.create( index=index, body=create_index_body, ) except TransportError as e: # ignore already existing index if e.error == 'index_already_exists_exception': pass else: raise def parse_commits(head, name): """ Go through the git repository log and generate a document per commit containing all the metadata. """ for commit in head.traverse(): yield { '_id': commit.hexsha, '_parent': name, 'committed_date': datetime.fromtimestamp(commit.committed_date), 'committer': { 'name': commit.committer.name, 'email': commit.committer.email, }, 'authored_date': datetime.fromtimestamp(commit.authored_date), 'author': { 'name': commit.author.name, 'email': commit.author.email, }, 'description': commit.message, 'parent_shas': [p.hexsha for p in commit.parents], # we only care about the filenames, not the per-file stats 'files': list(commit.stats.files), 'stats': commit.stats.total, } def load_repo(client, path=None, index='git'): """ Parse a git repository with all it's commits and load it into elasticsearch using `client`. If the index doesn't exist it will be created. 
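    A typical invocation could look like (the path is illustrative)::

        load_repo(Elasticsearch(), path='/path/to/some-repo', index='git')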
""" path = dirname(dirname(abspath(__file__))) if path is None else path repo_name = basename(path) repo = git.Repo(path) create_git_index(client, index) # create the parent document in case it doesn't exist client.create( index=index, doc_type='repos', id=repo_name, body={}, ignore=409 # 409 - conflict - would be returned if the document is already there ) # we let the streaming bulk continuously process the commits as they come # in - since the `parse_commits` function is a generator this will avoid # loading all the commits into memory for ok, result in streaming_bulk( client, parse_commits(repo.refs.master.commit, repo_name), index=index, doc_type='commits', chunk_size=50 # keep the batch sizes small for appearances only ): action, result = result.popitem() doc_id = '/%s/commits/%s' % (index, result['_id']) # process the information from ES whether the document has been # successfully indexed if not ok: print('Failed to %s document %s: %r' % (action, doc_id, result)) else: print(doc_id) # we manually create es repo document and update elasticsearch-py to include metadata REPO_ACTIONS = [ {'_type': 'repos', '_id': 'elasticsearch', '_source': { 'owner': {'name': 'Shay Bannon', 'email': 'kimchy@gmail.com'}, 'created_at': datetime(2010, 2, 8, 15, 22, 27), 'tags': ['search', 'distributed', 'lucene'], 'description': 'You know, for search.'} }, {'_type': 'repos', '_id': 'elasticsearch-py', '_op_type': 'update', 'doc': { 'owner': {'name': u'Honza Král', 'email': 'honza.kral@gmail.com'}, 'created_at': datetime(2013, 5, 1, 16, 37, 32), 'tags': ['elasticsearch', 'search', 'python', 'client'], 'description': 'For searching snakes.'} }, ] if __name__ == '__main__': # get trace logger and set level tracer = logging.getLogger('elasticsearch.trace') tracer.setLevel(logging.INFO) tracer.addHandler(logging.FileHandler('/tmp/es_trace.log')) # instantiate es client, connects to localhost:9200 by default es = Elasticsearch() # we load the repo and all commits load_repo(es, path=sys.argv[1] if len(sys.argv) == 2 else None) # run the bulk operations success, _ = bulk(es, REPO_ACTIONS, index='git', raise_on_error=True) print('Performed %d actions' % success) # now we can retrieve the documents es_repo = es.get(index='git', doc_type='repos', id='elasticsearch') print('%s: %s' % (es_repo['_id'], es_repo['_source']['description'])) # update - add java to es tags es.update( index='git', doc_type='repos', id='elasticsearch', body={ "script": { "inline" : "ctx._source.tags.add(params.tag)", "params" : { "tag" : "java" } } } ) # refresh to make the documents available for search es.indices.refresh(index='git') # and now we can count the documents print(es.count(index='git')['count'], 'documents in index') elasticsearch-py-5.4.0/example/queries.py000066400000000000000000000053551310735271200204530ustar00rootroot00000000000000#!/usr/bin/env python from __future__ import print_function import logging from dateutil.parser import parse as parse_date from elasticsearch import Elasticsearch def print_search_stats(results): print('=' * 80) print('Total %d found in %dms' % (results['hits']['total'], results['took'])) print('-' * 80) def print_hits(results): " Simple utility function to print results of a search query. 
" print_search_stats(results) for hit in results['hits']['hits']: # get created date for a repo and fallback to authored_date for a commit created_at = parse_date(hit['_source'].get('created_at', hit['_source']['authored_date'])) print('/%s/%s/%s (%s): %s' % ( hit['_index'], hit['_type'], hit['_id'], created_at.strftime('%Y-%m-%d'), hit['_source']['description'].replace('\n', ' '))) print('=' * 80) print() # get trace logger and set level tracer = logging.getLogger('elasticsearch.trace') tracer.setLevel(logging.INFO) tracer.addHandler(logging.FileHandler('/tmp/es_trace.log')) # instantiate es client, connects to localhost:9200 by default es = Elasticsearch() print('Empty search:') print_hits(es.search(index='git')) print('Find commits that says "fix" without touching tests:') result = es.search( index='git', doc_type='commits', body={ 'query': { 'bool': { 'must': { 'match': {'description': 'fix'} }, 'must_not': { 'term': {'files': 'test_elasticsearch'} } } } } ) print_hits(result) print('Last 8 Commits for elasticsearch-py:') result = es.search( index='git', doc_type='commits', body={ 'query': { 'parent_id': { 'type': 'commits', 'id': 'elasticsearch-py' } }, 'sort': [ {'committed_date': {'order': 'desc'}} ], 'size': 8 } ) print_hits(result) print('Stats for top 10 python committers:') result = es.search( index='git', doc_type='commits', body={ 'size': 0, 'query': { 'has_parent': { 'parent_type': 'repos', 'query': { 'term': { 'tags': 'python' } } } }, 'aggs': { 'committers': { 'terms': { 'field': 'committer.name.raw', }, 'aggs': { 'line_stats': { 'stats': {'field': 'stats.lines'} } } } } } ) print_search_stats(result) for committer in result['aggregations']['committers']['buckets']: print('%15s: %3d commits changing %6d lines' % ( committer['key'], committer['doc_count'], committer['line_stats']['sum'])) print('=' * 80) elasticsearch-py-5.4.0/setup.cfg000066400000000000000000000002151310735271200166000ustar00rootroot00000000000000[build_sphinx] source-dir = docs/ build-dir = docs/_build all_files = 1 [wheel] universal = 1 [bdist_rpm] requires = python python-urllib3 elasticsearch-py-5.4.0/setup.py000066400000000000000000000036711310735271200165020ustar00rootroot00000000000000# -*- coding: utf-8 -*- from os.path import join, dirname from setuptools import setup, find_packages import sys VERSION = (5, 4, 0) __version__ = VERSION __versionstr__ = '.'.join(map(str, VERSION)) f = open(join(dirname(__file__), 'README')) long_description = f.read().strip() f.close() install_requires = [ 'urllib3>=1.8, <2.0', ] tests_require = [ 'requests>=2.0.0, <3.0.0', 'nose', 'coverage', 'mock', 'pyaml', 'nosexcover' ] # use external unittest for 2.6 if sys.version_info[:2] == (2, 6): install_requires.append('unittest2') setup( name = 'elasticsearch', description = "Python client for Elasticsearch", license="Apache License, Version 2.0", url = "https://github.com/elastic/elasticsearch-py", long_description = long_description, version = __versionstr__, author = "Honza Král", author_email = "honza.kral@gmail.com", packages=find_packages( where='.', exclude=('test_elasticsearch*', ) ), classifiers = [ "Development Status :: 5 - Production/Stable", "License :: OSI Approved :: Apache Software License", "Intended Audience :: Developers", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", 
"Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy", ], install_requires=install_requires, test_suite='test_elasticsearch.run_tests.run_all', tests_require=tests_require, extras_require={ 'develop': tests_require + ["sphinx", "sphinx_rtd_theme"] }, ) elasticsearch-py-5.4.0/start_elasticsearch.sh000077500000000000000000000007241310735271200213520ustar00rootroot00000000000000#!/bin/bash # Start elasticsearch in a docker container ES_VERSION=${ES_VERSION:-"latest"} ES_TEST_SERVER=${ES_TEST_SERVER:-"http://localhost:9200"} exec docker run -d \ -e script.inline=true \ -e path.repo=/tmp \ -e "repositories.url.allowed_urls=http://*" \ -e node.attr.testattr=test \ -e ES_HOST=$ES_TEST_SERVER \ -v `pwd`/../elasticsearch:/code/elasticsearch \ -v /tmp:/tmp \ -p "9200:9200" \ fxdgear/elasticsearch:$ES_VERSION elasticsearch-py-5.4.0/test_elasticsearch/000077500000000000000000000000001310735271200206325ustar00rootroot00000000000000elasticsearch-py-5.4.0/test_elasticsearch/README.rst000066400000000000000000000044101310735271200223200ustar00rootroot00000000000000elasticsearch-py test suite =========================== Warning - by default the tests will try and connect to `localhost:9200` and will destroy all contents of given cluster! The tests also rely on a checkout of `elasticsearch` repository existing on the same level as the `elasticsearch-py` clone. Before running the tests we will, by default, pull latest changes for that repo and perform `git reset --hard` to the exact version that was used to build the server we are running against. Running the tests ----------------- To simply run the tests just execute the `run_tests.py` script or invoke `python setup.py test`. The behavior is driven by environmental variables: * `TEST_ES_SERVER` - can contain "hostname[:port]" of running es cluster * `TEST_ES_CONNECTION` - name of the connection class to use from `elasticsearch.connection` module. If you want to run completely with your own see section on customizing tests. * `TEST_ES_YAML_DIR` - path to the yaml test suite contained in the elasticsearch repo. Defaults to `$TEST_ES_REPO/rest-api-spec/test` * `TEST_ES_REPO` - path to the elasticsearch repo, by default it will look in the same directory as `elasticsearch-py` is in. It will not be used if `TEST_ES_YAML_DIR` is specified directly. * `TEST_ES_NOFETCH` - controls if we should fetch new updates to elasticsearch repo and reset it's version to the sha used to build the current es server. Defaults to `False` which means we will fetch the elasticsearch repo and `git reset --hard` the sha used to build the server. Alternatively, if you wish to control what you are doing you have several additional options: * `run_tests.py` will pass any parameters specified to `nosetests` * you can just run your favorite runner in the `test_elasticsearch` directory (verified to work with nose and py.test) and bypass the fetch logic entirely. Customizing the tests --------------------- You can create a `local.py` file in the `test_elasticsearch` directory which should contain a `get_client` function. If this file exists the function will be used instead of `elasticsearch.helpers.test.get_test_client` to construct the client used for any integration tests. You can use this to make sure your plugins and extensions work with `elasticsearch-py`. 
elasticsearch-py-5.4.0/test_elasticsearch/__init__.py000066400000000000000000000000001310735271200227310ustar00rootroot00000000000000elasticsearch-py-5.4.0/test_elasticsearch/run_tests.py000077500000000000000000000043031310735271200232350ustar00rootroot00000000000000#!/usr/bin/env python from __future__ import print_function import sys from os import environ from os.path import dirname, join, pardir, abspath, exists import subprocess import nose def fetch_es_repo(): # user is manually setting YAML dir, don't tamper with it if 'TEST_ES_YAML_DIR' in environ: return repo_path = environ.get( 'TEST_ES_REPO', abspath(join(dirname(__file__), pardir, pardir, 'elasticsearch')) ) # no repo if not exists(repo_path) or not exists(join(repo_path, '.git')): print('No elasticsearch repo found...') # set YAML DIR to empty to skip yaml tests environ['TEST_ES_YAML_DIR'] = '' return # set YAML test dir environ['TEST_ES_YAML_DIR'] = join(repo_path, 'rest-api-spec', 'src', 'main', 'resources', 'rest-api-spec', 'test') # fetching of yaml tests disabled, we'll run with what's there if environ.get('TEST_ES_NOFETCH', False): return from test_elasticsearch.test_server import get_client from test_elasticsearch.test_cases import SkipTest # find out the sha of the running es try: es = get_client() sha = es.info()['version']['build_hash'] except (SkipTest, KeyError): print('No running elasticsearch >1.X server...') return # fetch new commits to be sure... print('Fetching elasticsearch repo...') subprocess.check_call('cd %s && git fetch https://github.com/elasticsearch/elasticsearch.git' % repo_path, shell=True) # reset to the version fron info() subprocess.check_call('cd %s && git reset --hard %s' % (repo_path, sha), shell=True) def run_all(argv=None): sys.exitfunc = lambda: sys.stderr.write('Shutting down....\n') # fetch yaml tests fetch_es_repo() # always insert coverage when running tests if argv is None: argv = [ 'nosetests', '--with-xunit', '--with-xcoverage', '--cover-package=elasticsearch', '--cover-erase', '--logging-filter=elasticsearch', '--logging-level=DEBUG', '--verbose', ] nose.run_exit( argv=argv, defaultTest=abspath(dirname(__file__)) ) if __name__ == '__main__': run_all(sys.argv) elasticsearch-py-5.4.0/test_elasticsearch/test_cases.py000066400000000000000000000034151310735271200233440ustar00rootroot00000000000000from collections import defaultdict try: # python 2.6 from unittest2 import TestCase, SkipTest except ImportError: from unittest import TestCase, SkipTest from elasticsearch import Elasticsearch class DummyTransport(object): def __init__(self, hosts, responses=None, **kwargs): self.hosts = hosts self.responses = responses self.call_count = 0 self.calls = defaultdict(list) def perform_request(self, method, url, params=None, body=None): resp = 200, {} if self.responses: resp = self.responses[self.call_count] self.call_count += 1 self.calls[(method, url)].append((params, body)) return resp class ElasticsearchTestCase(TestCase): def setUp(self): super(ElasticsearchTestCase, self).setUp() self.client = Elasticsearch(transport_class=DummyTransport) def assert_call_count_equals(self, count): self.assertEquals(count, self.client.transport.call_count) def assert_url_called(self, method, url, count=1): self.assertIn((method, url), self.client.transport.calls) calls = self.client.transport.calls[(method, url)] self.assertEquals(count, len(calls)) return calls class TestElasticsearchTestCase(ElasticsearchTestCase): def test_our_transport_used(self): self.assertIsInstance(self.client.transport, 
DummyTransport) def test_start_with_0_call(self): self.assert_call_count_equals(0) def test_each_call_is_recorded(self): self.client.transport.perform_request('GET', '/') self.client.transport.perform_request('DELETE', '/42', params={}, body='body') self.assert_call_count_equals(2) self.assertEquals([({}, 'body')], self.assert_url_called('DELETE', '/42', 1)) elasticsearch-py-5.4.0/test_elasticsearch/test_client/000077500000000000000000000000001310735271200231475ustar00rootroot00000000000000elasticsearch-py-5.4.0/test_elasticsearch/test_client/__init__.py000066400000000000000000000065071310735271200252700ustar00rootroot00000000000000from __future__ import unicode_literals from elasticsearch.client import _normalize_hosts, Elasticsearch from ..test_cases import TestCase, ElasticsearchTestCase class TestNormalizeHosts(TestCase): def test_none_uses_defaults(self): self.assertEquals([{}], _normalize_hosts(None)) def test_strings_are_used_as_hostnames(self): self.assertEquals([{"host": "elastic.co"}], _normalize_hosts(["elastic.co"])) def test_strings_are_parsed_for_port_and_user(self): self.assertEquals( [{"host": "elastic.co", "port": 42}, {"host": "elastic.co", "http_auth": "user:secre]"}], _normalize_hosts(["elastic.co:42", "user:secre%5D@elastic.co"]) ) def test_strings_are_parsed_for_scheme(self): self.assertEquals( [ { "host": "elastic.co", "port": 42, "use_ssl": True, }, { "host": "elastic.co", "http_auth": "user:secret", "use_ssl": True, "port": 443, 'url_prefix': '/prefix' } ], _normalize_hosts(["https://elastic.co:42", "https://user:secret@elastic.co/prefix"]) ) def test_dicts_are_left_unchanged(self): self.assertEquals([{"host": "local", "extra": 123}], _normalize_hosts([{"host": "local", "extra": 123}])) def test_single_string_is_wrapped_in_list(self): self.assertEquals( [{"host": "elastic.co"}], _normalize_hosts("elastic.co") ) class TestClient(ElasticsearchTestCase): def test_request_timeout_is_passed_through_unescaped(self): self.client.ping(request_timeout=.1) calls = self.assert_url_called('HEAD', '/') self.assertEquals([({'request_timeout': .1}, None)], calls) def test_params_is_copied_when(self): rt = object() params = dict(request_timeout=rt) self.client.ping(params=params) self.client.ping(params=params) calls = self.assert_url_called('HEAD', '/', 2) self.assertEquals( [ ({'request_timeout': rt}, None), ({'request_timeout': rt}, None) ], calls ) self.assertFalse(calls[0][0] is calls[1][0]) def test_from_in_search(self): self.client.search(index='i', doc_type='t', from_=10) calls = self.assert_url_called('GET', '/i/t/_search') self.assertEquals([({'from': '10'}, None)], calls) def test_repr_contains_hosts(self): self.assertEquals('', repr(self.client)) def test_repr_contains_hosts_passed_in(self): self.assertIn("es.org", repr(Elasticsearch(['es.org:123']))) def test_repr_truncates_host_to_10(self): hosts = [{"host": "es" + str(i)} for i in range(20)] self.assertNotIn("es5", repr(Elasticsearch(hosts))) def test_index_uses_post_if_id_is_empty(self): self.client.index(index='my-index', doc_type='test-doc', id='', body={}) self.assert_url_called('POST', '/my-index/test-doc') def test_index_uses_put_if_id_is_not_empty(self): self.client.index(index='my-index', doc_type='test-doc', id=0, body={}) self.assert_url_called('PUT', '/my-index/test-doc/0') elasticsearch-py-5.4.0/test_elasticsearch/test_client/test_indices.py000066400000000000000000000016421310735271200262010ustar00rootroot00000000000000from test_elasticsearch.test_cases import ElasticsearchTestCase class 
TestIndices(ElasticsearchTestCase): def test_create_one_index(self): self.client.indices.create('test-index') self.assert_url_called('PUT', '/test-index') def test_delete_multiple_indices(self): self.client.indices.delete(['test-index', 'second.index', 'third/index']) self.assert_url_called('DELETE', '/test-index,second.index,third%2Findex') def test_exists_index(self): self.client.indices.exists('second.index,third/index') self.assert_url_called('HEAD', '/second.index,third%2Findex') def test_passing_empty_value_for_required_param_raises_exception(self): self.assertRaises(ValueError, self.client.indices.exists, index=None) self.assertRaises(ValueError, self.client.indices.exists, index=[]) self.assertRaises(ValueError, self.client.indices.exists, index='') elasticsearch-py-5.4.0/test_elasticsearch/test_client/test_utils.py000066400000000000000000000012131310735271200257150ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals from elasticsearch.client.utils import _make_path from elasticsearch.compat import PY2 from ..test_cases import TestCase, SkipTest class TestMakePath(TestCase): def test_handles_unicode(self): id = "中文" self.assertEquals('/some-index/type/%E4%B8%AD%E6%96%87', _make_path('some-index', 'type', id)) def test_handles_utf_encoded_string(self): if not PY2: raise SkipTest('Only relevant for py2') id = "中文".encode('utf-8') self.assertEquals('/some-index/type/%E4%B8%AD%E6%96%87', _make_path('some-index', 'type', id)) elasticsearch-py-5.4.0/test_elasticsearch/test_connection.py000066400000000000000000000241731310735271200244110ustar00rootroot00000000000000import re from mock import Mock, patch import urllib3 import warnings from requests.auth import AuthBase from elasticsearch.exceptions import TransportError, ConflictError, RequestError, NotFoundError from elasticsearch.connection import RequestsHttpConnection, \ Urllib3HttpConnection from .test_cases import TestCase class TestUrllib3Connection(TestCase): def test_timeout_set(self): con = Urllib3HttpConnection(timeout=42) self.assertEquals(42, con.timeout) def test_keep_alive_is_on_by_default(self): con = Urllib3HttpConnection() self.assertEquals({'connection': 'keep-alive', 'content-type': 'application/json'}, con.headers) def test_http_auth(self): con = Urllib3HttpConnection(http_auth='username:secret') self.assertEquals({ 'authorization': 'Basic dXNlcm5hbWU6c2VjcmV0', 'connection': 'keep-alive', 'content-type': 'application/json' }, con.headers) def test_http_auth_tuple(self): con = Urllib3HttpConnection(http_auth=('username', 'secret')) self.assertEquals({'authorization': 'Basic dXNlcm5hbWU6c2VjcmV0', 'content-type': 'application/json', 'connection': 'keep-alive'}, con.headers) def test_http_auth_list(self): con = Urllib3HttpConnection(http_auth=['username', 'secret']) self.assertEquals({'authorization': 'Basic dXNlcm5hbWU6c2VjcmV0', 'content-type': 'application/json', 'connection': 'keep-alive'}, con.headers) def test_uses_https_if_verify_certs_is_off(self): with warnings.catch_warnings(record=True) as w: con = Urllib3HttpConnection(use_ssl=True, verify_certs=False) self.assertEquals(1, len(w)) self.assertEquals('Connecting to localhost using SSL with verify_certs=False is insecure.', str(w[0].message)) self.assertIsInstance(con.pool, urllib3.HTTPSConnectionPool) def test_doesnt_use_https_if_not_specified(self): con = Urllib3HttpConnection() self.assertIsInstance(con.pool, urllib3.HTTPConnectionPool) class TestRequestsConnection(TestCase): def _get_mock_connection(self, 
connection_params={}, status_code=200, response_body='{}'): con = RequestsHttpConnection(**connection_params) def _dummy_send(*args, **kwargs): dummy_response = Mock() dummy_response.headers = {} dummy_response.status_code = status_code dummy_response.text = response_body dummy_response.request = args[0] dummy_response.cookies = {} _dummy_send.call_args = (args, kwargs) return dummy_response con.session.send = _dummy_send return con def _get_request(self, connection, *args, **kwargs): if 'body' in kwargs: kwargs['body'] = kwargs['body'].encode('utf-8') status, headers, data = connection.perform_request(*args, **kwargs) self.assertEquals(200, status) self.assertEquals('{}', data) timeout = kwargs.pop('timeout', connection.timeout) args, kwargs = connection.session.send.call_args self.assertEquals(timeout, kwargs['timeout']) self.assertEquals(1, len(args)) return args[0] def test_custom_http_auth_is_allowed(self): auth = AuthBase() c = RequestsHttpConnection(http_auth=auth) self.assertEquals(auth, c.session.auth) def test_timeout_set(self): con = RequestsHttpConnection(timeout=42) self.assertEquals(42, con.timeout) def test_uses_https_if_verify_certs_is_off(self): with warnings.catch_warnings(record=True) as w: con = self._get_mock_connection({'use_ssl': True, 'url_prefix': 'url', 'verify_certs': False}) self.assertEquals(1, len(w)) self.assertEquals('Connecting to https://localhost:9200/url using SSL with verify_certs=False is insecure.', str(w[0].message)) request = self._get_request(con, 'GET', '/') self.assertEquals('https://localhost:9200/url/', request.url) self.assertEquals('GET', request.method) self.assertEquals(None, request.body) def test_http_auth(self): con = RequestsHttpConnection(http_auth='username:secret') self.assertEquals(('username', 'secret'), con.session.auth) def test_http_auth_tuple(self): con = RequestsHttpConnection(http_auth=('username', 'secret')) self.assertEquals(('username', 'secret'), con.session.auth) def test_http_auth_list(self): con = RequestsHttpConnection(http_auth=['username', 'secret']) self.assertEquals(('username', 'secret'), con.session.auth) def test_repr(self): con = self._get_mock_connection({"host": "elasticsearch.com", "port": 443}) self.assertEquals('', repr(con)) def test_conflict_error_is_returned_on_409(self): con = self._get_mock_connection(status_code=409) self.assertRaises(ConflictError, con.perform_request, 'GET', '/', {}, '') def test_not_found_error_is_returned_on_404(self): con = self._get_mock_connection(status_code=404) self.assertRaises(NotFoundError, con.perform_request, 'GET', '/', {}, '') def test_request_error_is_returned_on_400(self): con = self._get_mock_connection(status_code=400) self.assertRaises(RequestError, con.perform_request, 'GET', '/', {}, '') @patch('elasticsearch.connection.base.logger') def test_head_with_404_doesnt_get_logged(self, logger): con = self._get_mock_connection(status_code=404) self.assertRaises(NotFoundError, con.perform_request, 'HEAD', '/', {}, '') self.assertEquals(0, logger.warning.call_count) @patch('elasticsearch.connection.base.tracer') @patch('elasticsearch.connection.base.logger') def test_failed_request_logs_and_traces(self, logger, tracer): con = self._get_mock_connection(response_body='{"answer": 42}', status_code=500) self.assertRaises(TransportError, con.perform_request, 'GET', '/', {'param': 42}, '{}'.encode('utf-8')) # trace request self.assertEquals(1, tracer.info.call_count) # trace response self.assertEquals(1, tracer.debug.call_count) # log url and duration self.assertEquals(1, 
logger.warning.call_count) self.assertTrue(re.match( '^GET http://localhost:9200/\?param=42 \[status:500 request:0.[0-9]{3}s\]', logger.warning.call_args[0][0] % logger.warning.call_args[0][1:] )) @patch('elasticsearch.connection.base.tracer') @patch('elasticsearch.connection.base.logger') def test_success_logs_and_traces(self, logger, tracer): con = self._get_mock_connection(response_body='''{"answer": "that's it!"}''') status, headers, data = con.perform_request('GET', '/', {'param': 42}, '''{"question": "what's that?"}'''.encode('utf-8')) # trace request self.assertEquals(1, tracer.info.call_count) self.assertEquals( """curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/?pretty¶m=42' -d '{\n "question": "what\\u0027s that?"\n}'""", tracer.info.call_args[0][0] % tracer.info.call_args[0][1:] ) # trace response self.assertEquals(1, tracer.debug.call_count) self.assertTrue(re.match( '#\[200\] \(0.[0-9]{3}s\)\n#\{\n# "answer": "that\\\\u0027s it!"\n#\}', tracer.debug.call_args[0][0] % tracer.debug.call_args[0][1:] )) # log url and duration self.assertEquals(1, logger.info.call_count) self.assertTrue(re.match( 'GET http://localhost:9200/\?param=42 \[status:200 request:0.[0-9]{3}s\]', logger.info.call_args[0][0] % logger.info.call_args[0][1:] )) # log request body and response self.assertEquals(2, logger.debug.call_count) req, resp = logger.debug.call_args_list self.assertEquals( '> {"question": "what\'s that?"}', req[0][0] % req[0][1:] ) self.assertEquals( '< {"answer": "that\'s it!"}', resp[0][0] % resp[0][1:] ) def test_defaults(self): con = self._get_mock_connection() request = self._get_request(con, 'GET', '/') self.assertEquals('http://localhost:9200/', request.url) self.assertEquals('GET', request.method) self.assertEquals(None, request.body) def test_params_properly_encoded(self): con = self._get_mock_connection() request = self._get_request(con, 'GET', '/', params={'param': 'value with spaces'}) self.assertEquals('http://localhost:9200/?param=value+with+spaces', request.url) self.assertEquals('GET', request.method) self.assertEquals(None, request.body) def test_body_attached(self): con = self._get_mock_connection() request = self._get_request(con, 'GET', '/', body='{"answer": 42}') self.assertEquals('http://localhost:9200/', request.url) self.assertEquals('GET', request.method) self.assertEquals('{"answer": 42}'.encode('utf-8'), request.body) def test_http_auth_attached(self): con = self._get_mock_connection({'http_auth': 'username:secret'}) request = self._get_request(con, 'GET', '/') self.assertEquals(request.headers['authorization'], 'Basic dXNlcm5hbWU6c2VjcmV0') @patch('elasticsearch.connection.base.tracer') def test_url_prefix(self, tracer): con = self._get_mock_connection({"url_prefix": "/some-prefix/"}) request = self._get_request(con, 'GET', '/_search', body='{"answer": 42}', timeout=0.1) self.assertEquals('http://localhost:9200/some-prefix/_search', request.url) self.assertEquals('GET', request.method) self.assertEquals('{"answer": 42}'.encode('utf-8'), request.body) # trace request self.assertEquals(1, tracer.info.call_count) self.assertEquals( "curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/_search?pretty' -d '{\n \"answer\": 42\n}'", tracer.info.call_args[0][0] % tracer.info.call_args[0][1:] ) elasticsearch-py-5.4.0/test_elasticsearch/test_connection_pool.py000066400000000000000000000104651310735271200254410ustar00rootroot00000000000000import time from elasticsearch.connection_pool import ConnectionPool, RoundRobinSelector, 
        DummyConnectionPool
from elasticsearch.exceptions import ImproperlyConfigured

from .test_cases import TestCase

class TestConnectionPool(TestCase):
    def test_dummy_cp_raises_exception_on_more_connections(self):
        self.assertRaises(ImproperlyConfigured, DummyConnectionPool, [])
        self.assertRaises(ImproperlyConfigured, DummyConnectionPool, [object(), object()])

    def test_raises_exception_when_no_connections_defined(self):
        self.assertRaises(ImproperlyConfigured, ConnectionPool, [])

    def test_default_round_robin(self):
        pool = ConnectionPool([(x, {}) for x in range(100)])

        connections = set()
        for _ in range(100):
            connections.add(pool.get_connection())
        self.assertEquals(connections, set(range(100)))

    def test_disable_shuffling(self):
        pool = ConnectionPool([(x, {}) for x in range(100)], randomize_hosts=False)

        connections = []
        for _ in range(100):
            connections.append(pool.get_connection())
        self.assertEquals(connections, list(range(100)))

    def test_selectors_have_access_to_connection_opts(self):
        class MySelector(RoundRobinSelector):
            def select(self, connections):
                return self.connection_opts[super(MySelector, self).select(connections)]["actual"]

        pool = ConnectionPool([(x, {"actual": x*x}) for x in range(100)], selector_class=MySelector, randomize_hosts=False)

        connections = []
        for _ in range(100):
            connections.append(pool.get_connection())
        self.assertEquals(connections, [x*x for x in range(100)])

    def test_dead_nodes_are_removed_from_active_connections(self):
        pool = ConnectionPool([(x, {}) for x in range(100)])
        now = time.time()

        pool.mark_dead(42, now=now)
        self.assertEquals(99, len(pool.connections))
        self.assertEquals(1, pool.dead.qsize())
        self.assertEquals((now + 60, 42), pool.dead.get())

    def test_connection_is_skipped_when_dead(self):
        pool = ConnectionPool([(x, {}) for x in range(2)])
        pool.mark_dead(0)

        self.assertEquals([1, 1, 1], [pool.get_connection(), pool.get_connection(), pool.get_connection(), ])

    def test_connection_is_forcibly_resurrected_when_no_live_ones_are_availible(self):
        pool = ConnectionPool([(x, {}) for x in range(2)])
        pool.dead_count[0] = 1
        pool.mark_dead(0)  # failed twice, longer timeout
        pool.mark_dead(1)  # failed the first time, first to be resurrected

        self.assertEquals([], pool.connections)
        self.assertEquals(1, pool.get_connection())
        self.assertEquals([1, ], pool.connections)

    def test_connection_is_resurrected_after_its_timeout(self):
        pool = ConnectionPool([(x, {}) for x in range(100)])
        now = time.time()

        pool.mark_dead(42, now=now-61)
        pool.get_connection()
        self.assertEquals(42, pool.connections[-1])
        self.assertEquals(100, len(pool.connections))

    def test_force_resurrect_always_returns_a_connection(self):
        pool = ConnectionPool([(0, {})])
        pool.connections = []

        self.assertEquals(0, pool.get_connection())
        self.assertEquals([], pool.connections)
        self.assertTrue(pool.dead.empty())

    def test_already_failed_connection_has_longer_timeout(self):
        pool = ConnectionPool([(x, {}) for x in range(100)])
        now = time.time()
        pool.dead_count[42] = 2

        pool.mark_dead(42, now=now)
        self.assertEquals(3, pool.dead_count[42])
        self.assertEquals((now + 4*60, 42), pool.dead.get())

    def test_timeout_for_failed_connections_is_limitted(self):
        pool = ConnectionPool([(x, {}) for x in range(100)])
        now = time.time()
        pool.dead_count[42] = 245

        pool.mark_dead(42, now=now)
        self.assertEquals(246, pool.dead_count[42])
        self.assertEquals((now + 32*60, 42), pool.dead.get())

    def test_dead_count_is_wiped_clean_for_connection_if_marked_live(self):
        pool = ConnectionPool([(x, {}) for x in range(100)])
        now = time.time()
        pool.dead_count[42] = 2
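        # Illustrative note (a rough sketch, not part of the original test and not
        # asserting the library's exact internals): the expectations in the two
        # tests above -- resurrection scheduled at now + 4*60 when dead_count
        # reaches 3, and at now + 32*60 even for a very large dead_count -- are
        # consistent with mark_dead() incrementing dead_count and applying an
        # exponential backoff of roughly base_timeout * 2 ** (dead_count - 1)
        # (base 60s here), capped at 32 times the base:
        #     dead_count 3   -> 60 * 2**2 == 240  == 4*60
        #     dead_count 246 -> capped at 60 * 32 == 1920 == 32*60
        # mark_live(), exercised below, simply discards that counter.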
        pool.mark_dead(42, now=now)
        self.assertEquals(3, pool.dead_count[42])

        pool.mark_live(42)
        self.assertNotIn(42, pool.dead_count)
elasticsearch-py-5.4.0/test_elasticsearch/test_helpers.py
import mock
import time
import threading

from elasticsearch import helpers, Elasticsearch
from elasticsearch.serializer import JSONSerializer

from .test_cases import TestCase

class TestParallelBulk(TestCase):
    @mock.patch('elasticsearch.helpers._process_bulk_chunk', return_value=[])
    def test_all_chunks_sent(self, _process_bulk_chunk):
        actions = ({'x': i} for i in range(100))
        list(helpers.parallel_bulk(Elasticsearch(), actions, chunk_size=2))

        self.assertEquals(50, _process_bulk_chunk.call_count)

    @mock.patch(
        'elasticsearch.helpers._process_bulk_chunk',
        # make sure we spend some time in the thread
        side_effect=lambda *a: [(True, time.sleep(.001) or threading.current_thread().ident)]
    )
    def test_chunk_sent_from_different_threads(self, _process_bulk_chunk):
        actions = ({'x': i} for i in range(100))
        results = list(helpers.parallel_bulk(Elasticsearch(), actions, thread_count=10, chunk_size=2))
        self.assertTrue(len(set([r[1] for r in results])) > 1)

class TestChunkActions(TestCase):
    def setUp(self):
        super(TestChunkActions, self).setUp()
        self.actions = [({'index': {}}, {'some': 'data', 'i': i}) for i in range(100)]

    def test_chunks_are_chopped_by_byte_size(self):
        self.assertEquals(100, len(list(helpers._chunk_actions(self.actions, 100000, 1, JSONSerializer()))))

    def test_chunks_are_chopped_by_chunk_size(self):
        self.assertEquals(10, len(list(helpers._chunk_actions(self.actions, 10, 99999999, JSONSerializer()))))

class TestExpandActions(TestCase):
    def test_string_actions_are_marked_as_simple_inserts(self):
        self.assertEquals(('{"index":{}}', "whatever"), helpers.expand_action('whatever'))
elasticsearch-py-5.4.0/test_elasticsearch/test_serializer.py
# -*- coding: utf-8 -*-
import sys
import uuid
from datetime import datetime
from decimal import Decimal

from elasticsearch.serializer import JSONSerializer, Deserializer, DEFAULT_SERIALIZERS, TextSerializer
from elasticsearch.exceptions import SerializationError, ImproperlyConfigured

from .test_cases import TestCase, SkipTest

class TestJSONSerializer(TestCase):
    def test_datetime_serialization(self):
        self.assertEquals('{"d": "2010-10-01T02:30:00"}', JSONSerializer().dumps({'d': datetime(2010, 10, 1, 2, 30)}))

    def test_decimal_serialization(self):
        if sys.version_info[:2] == (2, 6):
            raise SkipTest("Float rounding is broken in 2.6.")
        self.assertEquals('{"d": 3.8}', JSONSerializer().dumps({'d': Decimal('3.8')}))

    def test_uuid_serialization(self):
        self.assertEquals('{"d": "00000000-0000-0000-0000-000000000003"}', JSONSerializer().dumps({'d': uuid.UUID('00000000-0000-0000-0000-000000000003')}))

    def test_raises_serialization_error_on_dump_error(self):
        self.assertRaises(SerializationError, JSONSerializer().dumps, object())

    def test_raises_serialization_error_on_load_error(self):
        self.assertRaises(SerializationError, JSONSerializer().loads, object())
        self.assertRaises(SerializationError, JSONSerializer().loads, '')
        self.assertRaises(SerializationError, JSONSerializer().loads, '{{')

    def test_strings_are_left_untouched(self):
        self.assertEquals("你好", JSONSerializer().dumps("你好"))

class TestTextSerializer(TestCase):
    def test_strings_are_left_untouched(self):
        self.assertEquals("你好", TextSerializer().dumps("你好"))

    def
test_raises_serialization_error_on_dump_error(self):
        self.assertRaises(SerializationError, TextSerializer().dumps, {})

class TestDeserializer(TestCase):
    def setUp(self):
        super(TestDeserializer, self).setUp()
        self.de = Deserializer(DEFAULT_SERIALIZERS)

    def test_deserializes_json_by_default(self):
        self.assertEquals({"some": "data"}, self.de.loads('{"some":"data"}'))

    def test_deserializes_text_with_correct_ct(self):
        self.assertEquals('{"some":"data"}', self.de.loads('{"some":"data"}', 'text/plain'))
        self.assertEquals('{"some":"data"}', self.de.loads('{"some":"data"}', 'text/plain; charset=whatever'))

    def test_raises_serialization_error_on_unknown_mimetype(self):
        self.assertRaises(SerializationError, self.de.loads, '{}', 'text/html')

    def test_raises_improperly_configured_when_default_mimetype_cannot_be_deserialized(self):
        self.assertRaises(ImproperlyConfigured, Deserializer, {})
elasticsearch-py-5.4.0/test_elasticsearch/test_server/
elasticsearch-py-5.4.0/test_elasticsearch/test_server/__init__.py
from elasticsearch.helpers.test import get_test_client, ElasticsearchTestCase as BaseTestCase

client = None

def get_client(**kwargs):
    global client
    if client is not None and not kwargs:
        return client

    # try and locate manual override in the local environment
    try:
        from test_elasticsearch.local import get_client as local_get_client
        new_client = local_get_client(**kwargs)
    except ImportError:
        # fallback to using vanilla client
        new_client = get_test_client(**kwargs)

    if not kwargs:
        client = new_client

    return new_client

def setup():
    get_client()

class ElasticsearchTestCase(BaseTestCase):
    @staticmethod
    def _get_client(**kwargs):
        return get_client(**kwargs)
elasticsearch-py-5.4.0/test_elasticsearch/test_server/test_client.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from . import ElasticsearchTestCase

class TestUnicode(ElasticsearchTestCase):
    def test_indices_analyze(self):
        self.client.indices.analyze(body='{"text": "привет"}')
elasticsearch-py-5.4.0/test_elasticsearch/test_server/test_common.py
"""
Dynamically generated set of TestCases based on a set of yaml files describing
some integration tests. These files are shared among all official Elasticsearch
clients.
"""
import re
from os import walk, environ
from os.path import exists, join, dirname, pardir
import yaml
from shutil import rmtree

from elasticsearch import TransportError
from elasticsearch.compat import string_types
from elasticsearch.helpers.test import _get_version

from ..test_cases import SkipTest
from .
import ElasticsearchTestCase # some params had to be changed in python, keep track of them so we can rename # those in the tests accordingly PARAMS_RENAMES = { 'type': 'doc_type', 'from': 'from_', } # mapping from catch values to http status codes CATCH_CODES = { 'missing': 404, 'conflict': 409, 'unauthorized': 401, } # test features we have implemented IMPLEMENTED_FEATURES = ('gtelte', 'stash_in_path', 'headers', 'catch_unauthorized') # broken YAML tests on some releases SKIP_TESTS = { (1, 1, 2): set(('TestCatRecovery10Basic', )), '*': set(('TestSearchExists20QueryString', 'TestSearchExists10Basic')) } class InvalidActionType(Exception): pass class YamlTestCase(ElasticsearchTestCase): def setUp(self): super(YamlTestCase, self).setUp() if hasattr(self, '_setup_code'): self.run_code(self._setup_code) self.last_response = None self._state = {} def tearDown(self): if hasattr(self, '_teardown_code'): self.run_code(self._teardown_code) super(YamlTestCase, self).tearDown() for repo, definition in self.client.snapshot.get_repository(repository='_all').items(): self.client.snapshot.delete_repository(repository=repo) if definition['type'] == 'fs': rmtree('/tmp/%s' % definition['settings']['location']) def _resolve(self, value): # resolve variables if isinstance(value, string_types) and value.startswith('$'): value = value[1:] self.assertIn(value, self._state) value = self._state[value] if isinstance(value, string_types): value = value.strip() elif isinstance(value, dict): value = dict((k, self._resolve(v)) for (k, v) in value.items()) elif isinstance(value, list): value = list(map(self._resolve, value)) return value def _lookup(self, path): # fetch the possibly nested value from last_response value = self.last_response if path == '$body': return value path = path.replace(r'\.', '\1') for step in path.split('.'): if not step: continue step = step.replace('\1', '.') step = self._resolve(step) if step.isdigit() and step not in value: step = int(step) self.assertIsInstance(value, list) self.assertGreater(len(value), step) else: self.assertIn(step, value) value = value[step] return value def run_code(self, test): """ Execute an instruction based on it's type. """ for action in test: self.assertEquals(1, len(action)) action_type, action = list(action.items())[0] if hasattr(self, 'run_' + action_type): getattr(self, 'run_' + action_type)(action) else: raise InvalidActionType(action_type) def run_do(self, action): """ Perform an api call with given parameters. """ api = self.client if 'headers' in action: api = self._get_client(headers=action.pop('headers')) catch = action.pop('catch', None) self.assertEquals(1, len(action)) method, args = list(action.items())[0] # locate api endpoint for m in method.split('.'): self.assertTrue(hasattr(api, m)) api = getattr(api, m) # some parameters had to be renamed to not clash with python builtins, # compensate for k in PARAMS_RENAMES: if k in args: args[PARAMS_RENAMES[k]] = args.pop(k) # resolve vars for k in args: args[k] = self._resolve(args[k]) try: self.last_response = api(**args) except Exception as e: if not catch: raise self.run_catch(catch, e) else: if catch: raise AssertionError('Failed to catch %r in %r.' 
% (catch, self.last_response)) def _get_nodes(self): if not hasattr(self, '_node_info'): self._node_info = list(self.client.nodes.info(node_id='_all', metric='clear')['nodes'].values()) return self._node_info def _get_data_nodes(self): return len([info for info in self._get_nodes() if info.get('attributes', {}).get('data', 'true') == 'true']) def _get_benchmark_nodes(self): return len([info for info in self._get_nodes() if info.get('attributes', {}).get('bench', 'false') == 'true']) def run_skip(self, skip): if 'features' in skip: features = skip['features'] if not isinstance(features, (tuple, list)): features = [features] for feature in features: if feature in IMPLEMENTED_FEATURES: continue elif feature == 'requires_replica': if self._get_data_nodes() > 1: continue elif feature == 'benchmark': if self._get_benchmark_nodes(): continue raise SkipTest(skip.get('reason', 'Feature %s is not supported' % feature)) if 'version' in skip: version, reason = skip['version'], skip['reason'] if version == 'all': raise SkipTest(reason) min_version, max_version = version.split('-') min_version = _get_version(min_version) or (0, ) max_version = _get_version(max_version) or (999, ) if min_version <= self.es_version <= max_version: raise SkipTest(reason) def run_catch(self, catch, exception): if catch == 'param': self.assertIsInstance(exception, TypeError) return self.assertIsInstance(exception, TransportError) if catch in CATCH_CODES: self.assertEquals(CATCH_CODES[catch], exception.status_code) elif catch[0] == '/' and catch[-1] == '/': self.assertTrue(re.search(catch[1:-1], exception.error + ' ' + repr(exception.info)), '%s not in %r' % (catch, exception.info)) self.last_response = exception.info def run_gt(self, action): for key, value in action.items(): self.assertGreater(self._lookup(key), value) def run_gte(self, action): for key, value in action.items(): self.assertGreaterEqual(self._lookup(key), value) def run_lt(self, action): for key, value in action.items(): self.assertLess(self._lookup(key), value) def run_lte(self, action): for key, value in action.items(): self.assertLessEqual(self._lookup(key), value) def run_set(self, action): for key, value in action.items(): self._state[value] = self._lookup(key) def run_is_false(self, action): try: value = self._lookup(action) except AssertionError: pass else: self.assertIn(value, ('', None, False, 0)) def run_is_true(self, action): value = self._lookup(action) self.assertNotIn(value, ('', None, False, 0)) def run_length(self, action): for path, expected in action.items(): value = self._lookup(path) expected = self._resolve(expected) self.assertEquals(expected, len(value)) def run_match(self, action): for path, expected in action.items(): value = self._lookup(path) expected = self._resolve(expected) if isinstance(expected, string_types) and \ expected.startswith('/') and expected.endswith('/'): expected = re.compile(expected[1:-1], re.VERBOSE) self.assertTrue(expected.search(value)) else: self.assertEquals(expected, value) def construct_case(filename, name): """ Parse a definition of a test case from a yaml file and construct the TestCase subclass dynamically. 
""" def make_test(test_name, definition, i): def m(self): if name in SKIP_TESTS.get(self.es_version, ()) or name in SKIP_TESTS.get('*', ()): raise SkipTest() self.run_code(definition) m.__doc__ = '%s:%s.test_from_yaml_%d (%s): %s' % ( __name__, name, i, '/'.join(filename.split('/')[-2:]), test_name) m.__name__ = 'test_from_yaml_%d' % i return m with open(filename) as f: tests = list(yaml.load_all(f)) attrs = { '_yaml_file': filename } i = 0 for test in tests: for test_name, definition in test.items(): if test_name in ('setup', 'teardown'): attrs['_%s_code' % test_name] = definition continue attrs['test_from_yaml_%d' % i] = make_test(test_name, definition, i) i += 1 return type(name, (YamlTestCase, ), attrs) YAML_DIR = environ.get( 'TEST_ES_YAML_DIR', join( dirname(__file__), pardir, pardir, pardir, 'elasticsearch', 'rest-api-spec', 'src', 'main', 'resources', 'rest-api-spec', 'test' ) ) if exists(YAML_DIR): # find all the test definitions in yaml files ... for (path, dirs, files) in walk(YAML_DIR): for filename in files: if not filename.endswith(('.yaml', '.yml')): continue # ... parse them name = ('Test' + ''.join(s.title() for s in path[len(YAML_DIR) + 1:].split('/')) + filename.rsplit('.', 1)[0].title()).replace('_', '').replace('.', '') # and insert them into locals for test runner to find them locals()[name] = construct_case(join(path, filename), name) elasticsearch-py-5.4.0/test_elasticsearch/test_server/test_helpers.py000066400000000000000000000313451310735271200262600ustar00rootroot00000000000000from elasticsearch import helpers, TransportError from . import ElasticsearchTestCase from ..test_cases import SkipTest class FailingBulkClient(object): def __init__(self, client, fail_at=1): self.client = client self._called = -1 self._fail_at = fail_at self.transport = client.transport def bulk(self, *args, **kwargs): self._called += 1 if self._called == self._fail_at: raise TransportError(599, "Error!", {}) return self.client.bulk(*args, **kwargs) class TestStreamingBulk(ElasticsearchTestCase): def test_actions_remain_unchanged(self): actions = [{'_id': 1}, {'_id': 2}] for ok, item in helpers.streaming_bulk(self.client, actions, index='test-index', doc_type='answers'): self.assertTrue(ok) self.assertEquals([{'_id': 1}, {'_id': 2}], actions) def test_all_documents_get_inserted(self): docs = [{"answer": x, '_id': x} for x in range(100)] for ok, item in helpers.streaming_bulk(self.client, docs, index='test-index', doc_type='answers', refresh=True): self.assertTrue(ok) self.assertEquals(100, self.client.count(index='test-index', doc_type='answers')['count']) self.assertEquals({"answer": 42}, self.client.get(index='test-index', doc_type='answers', id=42)['_source']) def test_all_errors_from_chunk_are_raised_on_failure(self): self.client.indices.create("i", { "mappings": {"t": {"properties": {"a": {"type": "integer"}}}}, "settings": {"number_of_shards": 1, "number_of_replicas": 0} }) self.client.cluster.health(wait_for_status="yellow") try: for ok, item in helpers.streaming_bulk(self.client, [{"a": "b"}, {"a": "c"}], index="i", doc_type="t", raise_on_error=True): self.assertTrue(ok) except helpers.BulkIndexError as e: self.assertEquals(2, len(e.errors)) else: assert False, "exception should have been raised" def test_different_op_types(self): if self.es_version < (0, 90, 1): raise SkipTest('update supported since 0.90.1') self.client.index(index='i', doc_type='t', id=45, body={}) self.client.index(index='i', doc_type='t', id=42, body={}) docs = [ {'_index': 'i', '_type': 't', '_id': 47, 'f': 
'v'}, {'_op_type': 'delete', '_index': 'i', '_type': 't', '_id': 45}, {'_op_type': 'update', '_index': 'i', '_type': 't', '_id': 42, 'doc': {'answer': 42}} ] for ok, item in helpers.streaming_bulk(self.client, docs): self.assertTrue(ok) self.assertFalse(self.client.exists(index='i', doc_type='t', id=45)) self.assertEquals({'answer': 42}, self.client.get(index='i', id=42)['_source']) self.assertEquals({'f': 'v'}, self.client.get(index='i', id=47)['_source']) def test_transport_error_can_becaught(self): failing_client = FailingBulkClient(self.client) docs = [ {'_index': 'i', '_type': 't', '_id': 47, 'f': 'v'}, {'_index': 'i', '_type': 't', '_id': 45, 'f': 'v'}, {'_index': 'i', '_type': 't', '_id': 42, 'f': 'v'}, ] results = list(helpers.streaming_bulk(failing_client, docs, raise_on_exception=False, raise_on_error=False, chunk_size=1)) self.assertEquals(3, len(results)) self.assertEquals([True, False, True], [r[0] for r in results]) exc = results[1][1]['index'].pop('exception') self.assertIsInstance(exc, TransportError) self.assertEquals(599, exc.status_code) self.assertEquals( { 'index': { '_index': 'i', '_type': 't', '_id': 45, 'data': {'f': 'v'}, 'error': "TransportError(599, 'Error!')", 'status': 599 } }, results[1][1] ) class TestBulk(ElasticsearchTestCase): def test_bulk_works_with_single_item(self): docs = [{"answer": 42, '_id': 1}] success, failed = helpers.bulk(self.client, docs, index='test-index', doc_type='answers', refresh=True) self.assertEquals(1, success) self.assertFalse(failed) self.assertEquals(1, self.client.count(index='test-index', doc_type='answers')['count']) self.assertEquals({"answer": 42}, self.client.get(index='test-index', doc_type='answers', id=1)['_source']) def test_all_documents_get_inserted(self): docs = [{"answer": x, '_id': x} for x in range(100)] success, failed = helpers.bulk(self.client, docs, index='test-index', doc_type='answers', refresh=True) self.assertEquals(100, success) self.assertFalse(failed) self.assertEquals(100, self.client.count(index='test-index', doc_type='answers')['count']) self.assertEquals({"answer": 42}, self.client.get(index='test-index', doc_type='answers', id=42)['_source']) def test_stats_only_reports_numbers(self): docs = [{"answer": x} for x in range(100)] success, failed = helpers.bulk(self.client, docs, index='test-index', doc_type='answers', refresh=True, stats_only=True) self.assertEquals(100, success) self.assertEquals(0, failed) self.assertEquals(100, self.client.count(index='test-index', doc_type='answers')['count']) def test_errors_are_reported_correctly(self): self.client.indices.create("i", { "mappings": {"t": {"properties": {"a": {"type": "integer"}}}}, "settings": {"number_of_shards": 1, "number_of_replicas": 0} }) self.client.cluster.health(wait_for_status="yellow") success, failed = helpers.bulk( self.client, [{"a": 42}, {"a": "c", '_id': 42}], index="i", doc_type="t", raise_on_error=False ) self.assertEquals(1, success) self.assertEquals(1, len(failed)) error = failed[0] self.assertEquals('42', error['index']['_id']) self.assertEquals('t', error['index']['_type']) self.assertEquals('i', error['index']['_index']) print(error['index']['error']) self.assertTrue('MapperParsingException' in repr(error['index']['error']) or 'mapper_parsing_exception' in repr(error['index']['error'])) def test_error_is_raised(self): self.client.indices.create("i", { "mappings": {"t": {"properties": {"a": {"type": "integer"}}}}, "settings": {"number_of_shards": 1, "number_of_replicas": 0} }) 
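        # Illustrative note (a rough sketch, not part of the original test): the
        # behaviour exercised just below is the usual guarded call pattern for
        # the bulk helper, roughly
        #
        #     try:
        #         helpers.bulk(client, docs, index="i", doc_type="t")
        #     except helpers.BulkIndexError as e:
        #         failed_items = e.errors  # per-document error dicts
        #
        # i.e. with raise_on_error left at its default, a mapping failure for any
        # document in the chunk surfaces as BulkIndexError carrying the failed
        # items, instead of being returned in the (success, failed) tuple.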
self.client.cluster.health(wait_for_status="yellow") self.assertRaises(helpers.BulkIndexError, helpers.bulk, self.client, [{"a": 42}, {"a": "c"}], index="i", doc_type="t" ) def test_errors_are_collected_properly(self): self.client.indices.create("i", { "mappings": {"t": {"properties": {"a": {"type": "integer"}}}}, "settings": {"number_of_shards": 1, "number_of_replicas": 0} }) self.client.cluster.health(wait_for_status="yellow") success, failed = helpers.bulk( self.client, [{"a": 42}, {"a": "c"}], index="i", doc_type="t", stats_only=True, raise_on_error=False ) self.assertEquals(1, success) self.assertEquals(1, failed) class TestScan(ElasticsearchTestCase): def test_order_can_be_preserved(self): bulk = [] for x in range(100): bulk.append({"index": {"_index": "test_index", "_type": "answers", "_id": x}}) bulk.append({"answer": x, "correct": x == 42}) self.client.bulk(bulk, refresh=True) docs = list(helpers.scan(self.client, index="test_index", doc_type="answers", query={"sort": "answer"}, preserve_order=True)) self.assertEquals(100, len(docs)) self.assertEquals(list(map(str, range(100))), list(d['_id'] for d in docs)) self.assertEquals(list(range(100)), list(d['_source']['answer'] for d in docs)) def test_all_documents_are_read(self): bulk = [] for x in range(100): bulk.append({"index": {"_index": "test_index", "_type": "answers", "_id": x}}) bulk.append({"answer": x, "correct": x == 42}) self.client.bulk(bulk, refresh=True) docs = list(helpers.scan(self.client, index="test_index", doc_type="answers", size=2)) self.assertEquals(100, len(docs)) self.assertEquals(set(map(str, range(100))), set(d['_id'] for d in docs)) self.assertEquals(set(range(100)), set(d['_source']['answer'] for d in docs)) class TestReindex(ElasticsearchTestCase): def setUp(self): super(TestReindex, self).setUp() bulk = [] for x in range(100): bulk.append({"index": {"_index": "test_index", "_type": "answers" if x % 2 == 0 else "questions", "_id": x}}) bulk.append({"answer": x, "correct": x == 42}) self.client.bulk(bulk, refresh=True) def test_reindex_passes_kwargs_to_scan_and_bulk(self): helpers.reindex(self.client, "test_index", "prod_index", scan_kwargs={'doc_type': 'answers'}, bulk_kwargs={'refresh': True}) self.assertTrue(self.client.indices.exists("prod_index")) self.assertFalse(self.client.indices.exists_type(index='prod_index', doc_type='questions')) self.assertEquals(50, self.client.count(index='prod_index', doc_type='answers')['count']) self.assertEquals({"answer": 42, "correct": True}, self.client.get(index="prod_index", doc_type="answers", id=42)['_source']) def test_reindex_accepts_a_query(self): helpers.reindex(self.client, "test_index", "prod_index", query={"query": {"bool": {"filter": {"term": {"_type": "answers"}}}}}) self.client.indices.refresh() self.assertTrue(self.client.indices.exists("prod_index")) self.assertFalse(self.client.indices.exists_type(index='prod_index', doc_type='questions')) self.assertEquals(50, self.client.count(index='prod_index', doc_type='answers')['count']) self.assertEquals({"answer": 42, "correct": True}, self.client.get(index="prod_index", doc_type="answers", id=42)['_source']) def test_all_documents_get_moved(self): helpers.reindex(self.client, "test_index", "prod_index") self.client.indices.refresh() self.assertTrue(self.client.indices.exists("prod_index")) self.assertEquals(50, self.client.count(index='prod_index', doc_type='questions')['count']) self.assertEquals(50, self.client.count(index='prod_index', doc_type='answers')['count']) self.assertEquals({"answer": 42, 
"correct": True}, self.client.get(index="prod_index", doc_type="answers", id=42)['_source']) class TestParentChildReindex(ElasticsearchTestCase): def setUp(self): super(TestParentChildReindex, self).setUp() body={ 'settings': {"number_of_shards": 1, "number_of_replicas": 0}, 'mappings': { 'question': { }, 'answer': { '_parent': {'type': 'question'}, } } } self.client.indices.create(index='test-index', body=body) self.client.indices.create(index='real-index', body=body) self.client.index( index='test-index', doc_type='question', id=42, body={}, ) self.client.index( index='test-index', doc_type='answer', id=47, body={'some': 'data'}, parent=42 ) self.client.indices.refresh(index='test-index') def test_children_are_reindexed_correctly(self): helpers.reindex(self.client, 'test-index', 'real-index') q = self.client.get( index='real-index', doc_type='question', id=42 ) self.assertEquals( { '_id': '42', '_index': 'real-index', '_source': {}, '_type': 'question', '_version': 1, 'found': True }, q ) q = self.client.get( index='test-index', doc_type='answer', id=47, parent=42 ) if '_routing' in q: self.assertEquals(q.pop('_routing'), '42') self.assertEquals( { '_id': '47', '_index': 'test-index', '_source': {'some': 'data'}, '_type': 'answer', '_version': 1, '_parent': '42', 'found': True }, q ) elasticsearch-py-5.4.0/test_elasticsearch/test_transport.py000066400000000000000000000215421310735271200243030ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals import time from elasticsearch.transport import Transport, get_host_info from elasticsearch.connection import Connection from elasticsearch.connection_pool import DummyConnectionPool from elasticsearch.exceptions import ConnectionError, ImproperlyConfigured from .test_cases import TestCase class DummyConnection(Connection): def __init__(self, **kwargs): self.exception = kwargs.pop('exception', None) self.status, self.data = kwargs.pop('status', 200), kwargs.pop('data', '{}') self.headers = kwargs.pop('headers', {}) self.calls = [] super(DummyConnection, self).__init__(**kwargs) def perform_request(self, *args, **kwargs): self.calls.append((args, kwargs)) if self.exception: raise self.exception return self.status, self.headers, self.data CLUSTER_NODES = '''{ "_nodes" : { "total" : 1, "successful" : 1, "failed" : 0 }, "cluster_name" : "elasticsearch", "nodes" : { "SRZpKFZdQguhhvifmN6UVA" : { "name" : "SRZpKFZ", "transport_address" : "127.0.0.1:9300", "host" : "127.0.0.1", "ip" : "127.0.0.1", "version" : "5.0.0", "build_hash" : "253032b", "roles" : [ "master", "data", "ingest" ], "http" : { "bound_address" : [ "[fe80::1]:9200", "[::1]:9200", "127.0.0.1:9200" ], "publish_address" : "1.1.1.1:123", "max_content_length_in_bytes" : 104857600 } } } }''' class TestHostsInfoCallback(TestCase): def test_master_only_nodes_are_ignored(self): nodes = [ {'roles': [ "master"]}, {'roles': [ "master", "data", "ingest"]}, {'roles': [ "data", "ingest"]}, {'roles': [ ]}, {} ] chosen = [i for i, node_info in enumerate(nodes) if get_host_info(node_info, i) is not None] self.assertEquals([1, 2, 3, 4], chosen) class TestTransport(TestCase): def test_single_connection_uses_dummy_connection_pool(self): t = Transport([{}]) self.assertIsInstance(t.connection_pool, DummyConnectionPool) t = Transport([{'host': 'localhost'}]) self.assertIsInstance(t.connection_pool, DummyConnectionPool) def test_request_timeout_extracted_from_params_and_passed(self): t = Transport([{}], connection_class=DummyConnection) t.perform_request('GET', '/', 
params={'request_timeout': 42}) self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('GET', '/', {}, None), t.get_connection().calls[0][0]) self.assertEquals({'timeout': 42, 'ignore': ()}, t.get_connection().calls[0][1]) def test_send_get_body_as_source(self): t = Transport([{}], send_get_body_as='source', connection_class=DummyConnection) t.perform_request('GET', '/', body={}) self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('GET', '/', {'source': '{}'}, None), t.get_connection().calls[0][0]) def test_send_get_body_as_post(self): t = Transport([{}], send_get_body_as='POST', connection_class=DummyConnection) t.perform_request('GET', '/', body={}) self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('POST', '/', None, b'{}'), t.get_connection().calls[0][0]) def test_body_gets_encoded_into_bytes(self): t = Transport([{}], connection_class=DummyConnection) t.perform_request('GET', '/', body='你好') self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('GET', '/', None, b'\xe4\xbd\xa0\xe5\xa5\xbd'), t.get_connection().calls[0][0]) def test_body_bytes_get_passed_untouched(self): t = Transport([{}], connection_class=DummyConnection) body = b'\xe4\xbd\xa0\xe5\xa5\xbd' t.perform_request('GET', '/', body=body) self.assertEquals(1, len(t.get_connection().calls)) self.assertEquals(('GET', '/', None, body), t.get_connection().calls[0][0]) def test_kwargs_passed_on_to_connections(self): t = Transport([{'host': 'google.com'}], port=123) self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals('http://google.com:123', t.connection_pool.connections[0].host) def test_kwargs_passed_on_to_connection_pool(self): dt = object() t = Transport([{}, {}], dead_timeout=dt) self.assertIs(dt, t.connection_pool.dead_timeout) def test_custom_connection_class(self): class MyConnection(object): def __init__(self, **kwargs): self.kwargs = kwargs t = Transport([{}], connection_class=MyConnection) self.assertEquals(1, len(t.connection_pool.connections)) self.assertIsInstance(t.connection_pool.connections[0], MyConnection) def test_add_connection(self): t = Transport([{}], randomize_hosts=False) t.add_connection({"host": "google.com", "port": 1234}) self.assertEquals(2, len(t.connection_pool.connections)) self.assertEquals('http://google.com:1234', t.connection_pool.connections[1].host) def test_request_will_fail_after_X_retries(self): t = Transport([{'exception': ConnectionError('abandon ship')}], connection_class=DummyConnection) self.assertRaises(ConnectionError, t.perform_request, 'GET', '/') self.assertEquals(4, len(t.get_connection().calls)) def test_failed_connection_will_be_marked_as_dead(self): t = Transport([{'exception': ConnectionError('abandon ship')}] * 2, connection_class=DummyConnection) self.assertRaises(ConnectionError, t.perform_request, 'GET', '/') self.assertEquals(0, len(t.connection_pool.connections)) def test_resurrected_connection_will_be_marked_as_live_on_success(self): t = Transport([{}, {}], connection_class=DummyConnection) con1 = t.connection_pool.get_connection() con2 = t.connection_pool.get_connection() t.connection_pool.mark_dead(con1) t.connection_pool.mark_dead(con2) t.perform_request('GET', '/') self.assertEquals(1, len(t.connection_pool.connections)) self.assertEquals(1, len(t.connection_pool.dead_count)) def test_sniff_will_use_seed_connections(self): t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyConnection) t.set_connections([{'data': 'invalid'}]) t.sniff_hosts() 
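        # Illustrative note (a rough sketch, not part of the original test):
        # sniffing appears to issue GET /_nodes/_all/http against a seed
        # connection and rebuild the pool from each node's http.publish_address.
        # In the CLUSTER_NODES fixture above that address is "1.1.1.1:123",
        # which is why the assertions below expect a single live connection
        # whose host is 'http://1.1.1.1:123'.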
        self.assertEquals(1, len(t.connection_pool.connections))
        self.assertEquals('http://1.1.1.1:123', t.get_connection().host)

    def test_sniff_on_start_fetches_and_uses_nodes_list(self):
        t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyConnection, sniff_on_start=True)
        self.assertEquals(1, len(t.connection_pool.connections))
        self.assertEquals('http://1.1.1.1:123', t.get_connection().host)

    def test_sniff_on_start_ignores_sniff_timeout(self):
        t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyConnection, sniff_on_start=True, sniff_timeout=12)
        self.assertEquals((('GET', '/_nodes/_all/http'), {'timeout': None}), t.seed_connections[0].calls[0])

    def test_sniff_uses_sniff_timeout(self):
        t = Transport([{'data': CLUSTER_NODES}], connection_class=DummyConnection, sniff_timeout=42)
        t.sniff_hosts()
        self.assertEquals((('GET', '/_nodes/_all/http'), {'timeout': 42}), t.seed_connections[0].calls[0])

    def test_sniff_reuses_connection_instances_if_possible(self):
        t = Transport([{'data': CLUSTER_NODES}, {"host": "1.1.1.1", "port": 123}], connection_class=DummyConnection, randomize_hosts=False)
        connection = t.connection_pool.connections[1]

        t.sniff_hosts()
        self.assertEquals(1, len(t.connection_pool.connections))
        self.assertIs(connection, t.get_connection())

    def test_sniff_on_fail_triggers_sniffing_on_fail(self):
        t = Transport([{'exception': ConnectionError('abandon ship')}, {"data": CLUSTER_NODES}],
            connection_class=DummyConnection, sniff_on_connection_fail=True, max_retries=0, randomize_hosts=False)

        self.assertRaises(ConnectionError, t.perform_request, 'GET', '/')
        self.assertEquals(1, len(t.connection_pool.connections))
        self.assertEquals('http://1.1.1.1:123', t.get_connection().host)

    def test_sniff_after_n_seconds(self):
        t = Transport([{"data": CLUSTER_NODES}], connection_class=DummyConnection, sniffer_timeout=5)

        for _ in range(4):
            t.perform_request('GET', '/')
        self.assertEquals(1, len(t.connection_pool.connections))
        self.assertIsInstance(t.get_connection(), DummyConnection)
        t.last_sniff = time.time() - 5.1

        t.perform_request('GET', '/')
        self.assertEquals(1, len(t.connection_pool.connections))
        self.assertEquals('http://1.1.1.1:123', t.get_connection().host)
        self.assertTrue(time.time() - 1 < t.last_sniff < time.time() + 0.01)
elasticsearch-py-5.4.0/tox.ini
[tox]
envlist = pypy,py26,py27,py33,py34

[testenv]
whitelist_externals = git
setenv =
    NOSE_XUNIT_FILE = junit-{envname}.xml
commands =
    git submodule init
    python setup.py test