python-pysolr-2.0.15/0000755000175000017500000000000011577720657013507 5ustar vladyvladypython-pysolr-2.0.15/LICENSE0000644000175000017500000000302511365701340014473 0ustar vladyvladyCopyright (c) Joseph Kocherhans, Jacob Kaplan-Moss, Daniel Lindsley. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of pysolr nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. python-pysolr-2.0.15/README0000644000175000017500000000153411407261374014356 0ustar vladyvlady====== pysolr ====== ``pysolr`` is a lightweight Python wrapper for Apache Solr. It provides an interface that queries the server and returns results based on the query. 
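As a rough illustration of what "queries the server" involves, the sketch below assembles a select request along the lines of the ``_select`` method in ``pysolr.py``: it asks for a JSON response, URL-encodes the parameters, and falls back to a POST when the encoded query reaches 1024 characters. It is a standalone Python 3 approximation, not part of ``pysolr``; the helper name ``build_select_request`` is hypothetical and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit


def build_select_request(solr_url, q, **params):
    """Return (method, path, body) for a Solr select call.

    Hypothetical helper for illustration only; pysolr does this
    internally and then sends the request itself.
    """
    query = {'q': q}
    query.update(params)
    query['wt'] = 'json'  # ask Solr for a JSON response
    encoded = urlencode(query, doseq=True)
    path = urlsplit(solr_url).path.rstrip('/')

    if len(encoded) < 1024:
        # Typical case: parameters travel in the query string.
        return 'GET', '%s/select/?%s' % (path, encoded), None

    # Very long queries are sent as a form-encoded POST body instead.
    return 'POST', '%s/select/' % path, encoded


method, path, body = build_select_request(
    'http://127.0.0.1:8983/solr/', 'Verlaine', sort='order_i asc')
print(method, path)  # GET /solr/select/?q=Verlaine&sort=order_i+asc&wt=json
```

With the real library the equivalent call is ``Solr('http://127.0.0.1:8983/solr/').search('Verlaine', sort='order_i asc')``, which requires a running Solr server.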
Features ======== * Basic operations such as selecting, updating & deleting. * Index optimization. * "More Like This" support (if setup in Solr). * Spelling correction (if setup in Solr). * Timeout support. Requirements ============ * Python 2.4+ (tested under Python 2.6+) * **Optional** - ``lxml`` (Python 2.4.X and below) * **Optional** - ``simplejson`` (Python 2.4.X and below) * **Optional** - ``httplib2`` for timeout support * **Optional** - ``BeautifulSoup`` for Tomcat error support Installation ============ ``sudo python setup.py install`` or drop the ``pysolr.py`` file anywhere on your PYTHONPATH. LICENSE ======= ``pysolr`` is licensed under the New BSD license. python-pysolr-2.0.15/PKG-INFO0000644000175000017500000000107411573733356014602 0ustar vladyvladyMetadata-Version: 1.0 Name: pysolr Version: 2.0.15 Summary: Lightweight python wrapper for Apache Solr. Home-page: http://github.com/toastdriven/pysolr/ Author: Daniel Lindsley Author-email: daniel@toastdriven.com License: UNKNOWN Description: UNKNOWN Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: BSD License Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search python-pysolr-2.0.15/pysolr.py0000644000175000017500000007115311573733177015415 0ustar vladyvlady# -*- coding: utf-8 -*- """ All we need to create a Solr connection is a url. >>> conn = Solr('http://127.0.0.1:8983/solr/') First, completely clear the index. >>> conn.delete(q='*:*') For now, we can only index python dictionaries. Each key in the dictionary will correspond to a field in Solr. >>> docs = [ ... {'id': 'testdoc.1', 'order_i': 1, 'name': 'document 1', 'text': u'Paul Verlaine'}, ... {'id': 'testdoc.2', 'order_i': 2, 'name': 'document 2', 'text': u'Владимир Маякoвский'}, ... 
{'id': 'testdoc.3', 'order_i': 3, 'name': 'document 3', 'text': u'test'}, ... {'id': 'testdoc.4', 'order_i': 4, 'name': 'document 4', 'text': u'test'} ... ] We can add documents to the index by passing a list of docs to the connection's add method. >>> conn.add(docs) >>> results = conn.search('Verlaine') >>> len(results) 1 >>> results = conn.search(u'Владимир') >>> len(results) 1 Simple tests for searching. We can optionally sort the results using Solr's sort syntax, that is, the field name and either asc or desc. >>> results = conn.search('test', sort='order_i asc') >>> for result in results: ... print result['name'] document 3 document 4 >>> results = conn.search('test', sort='order_i desc') >>> for result in results: ... print result['name'] document 4 document 3 To update documents, we just use the add method. >>> docs = [ ... {'id': 'testdoc.4', 'order_i': 4, 'name': 'document 4', 'text': u'blah'} ... ] >>> conn.add(docs) >>> len(conn.search('blah')) 1 >>> len(conn.search('test')) 1 We can delete documents from the index by id, or by supplying a query. >>> conn.delete(id='testdoc.1') >>> conn.delete(q='name:"document 2"') >>> results = conn.search('Verlaine') >>> len(results) 0 Docs can also have multiple values for any particular key. This lets us use Solr's multiValue fields. >>> docs = [ ... {'id': 'testdoc.5', 'cat': ['poetry', 'science'], 'name': 'document 5', 'text': u''}, ... {'id': 'testdoc.6', 'cat': ['science-fiction',], 'name': 'document 6', 'text': u''}, ... ] >>> conn.add(docs) >>> results = conn.search('cat:"poetry"') >>> for result in results: ... print result['name'] document 5 >>> results = conn.search('cat:"science-fiction"') >>> for result in results: ... print result['name'] document 6 >>> results = conn.search('cat:"science"') >>> for result in results: ... print result['name'] document 5 Docs can also boost any particular key. This lets us use Solr's boost on a field. >>> docs = [ ... 
    {'id': 'testdoc.7', 'order_i': '7', 'name': 'document 7', 'text': u'eight', 'author': 'seven'},
...     {'id': 'testdoc.8', 'order_i': '8', 'name': 'document 8', 'text': u'seven', 'author': 'eight'},
... ]
>>> conn.add(docs, boost={'author': '2.0',})
>>> results = conn.search('seven author:seven')
>>> for result in results:
...     print result['name']
document 7
document 8
>>> results = conn.search('eight author:eight')
>>> for result in results:
...     print result['name']
document 8
document 7

"""

# TODO: unicode support is pretty sloppy. define it better.

from datetime import datetime
import htmlentitydefs
import logging
import re
import time
import urllib
import urllib2
from urlparse import urlsplit, urlunsplit

try:
    # for python 2.5
    from xml.etree import cElementTree as ET
except ImportError:
    try:
        # use etree from lxml if it is installed
        from lxml import etree as ET
    except ImportError:
        try:
            # use cElementTree if available
            import cElementTree as ET
        except ImportError:
            try:
                from elementtree import ElementTree as ET
            except ImportError:
                raise ImportError("No suitable ElementTree implementation was found.")

try:
    # For Python < 2.6 or people using a newer version of simplejson
    import simplejson as json
except ImportError:
    # For Python >= 2.6
    import json

try:
    # Desirable from a timeout perspective.
    from httplib2 import Http
    TIMEOUTS_AVAILABLE = True
except ImportError:
    from httplib import HTTPConnection
    TIMEOUTS_AVAILABLE = False

try:
    set
except NameError:
    from sets import Set as set

__author__ = 'Joseph Kocherhans, Jacob Kaplan-Moss, Daniel Lindsley'
__all__ = ['Solr']
__version__ = (2, 0, 15)


def get_version():
    return "%s.%s.%s" % __version__[:3]


DATETIME_REGEX = re.compile('^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(\.\d+)?Z$')


class NullHandler(logging.Handler):
    def emit(self, record):
        pass


# Add the ``NullHandler`` to avoid logging by default while still allowing
# others to attach their own handlers.
LOG = logging.getLogger('pysolr')
h = NullHandler()
LOG.addHandler(h)

# For debugging...
if False:
    LOG.setLevel(logging.DEBUG)
    stream = logging.StreamHandler()
    LOG.addHandler(stream)


def unescape_html(text):
    """
    Removes HTML or XML character references and entities from a text string.

    @param text The HTML (or XML) source text.
    @return The plain text, as a Unicode string, if necessary.

    Source: http://effbot.org/zone/re-sub.htm#unescape-html
    """
    def fixup(m):
        text = m.group(0)
        if text[:2] == "&#":
            # character reference
            try:
                if text[:3] == "&#x":
                    return unichr(int(text[3:-1], 16))
                else:
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
        else:
            # named entity
            try:
                text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
            except KeyError:
                pass
        return text # leave as is
    return re.sub("&#?\w+;", fixup, text)


def safe_urlencode(params, doseq=0):
    """
    UTF-8-safe version of ``urllib.urlencode``.

    The stdlib ``urlencode`` prior to Python 3.x chokes on UTF-8 values
    which can't be converted to ASCII.
    """
    if hasattr(params, "items"):
        params = params.items()

    new_params = list()

    for k, v in params:
        k = k.encode("utf-8")

        if isinstance(v, basestring):
            new_params.append((k, v.encode("utf-8")))
        elif isinstance(v, (list, tuple)):
            new_params.append((k, [i.encode("utf-8") for i in v]))
        else:
            new_params.append((k, unicode(v)))

    return urllib.urlencode(new_params, doseq)


class SolrError(Exception):
    pass


class Results(object):
    def __init__(self, docs, hits, highlighting=None, facets=None, spellcheck=None, stats=None, qtime=None, debug=None):
        self.docs = docs
        self.hits = hits
        self.highlighting = highlighting or {}
        self.facets = facets or {}
        self.spellcheck = spellcheck or {}
        self.stats = stats or {}
        self.qtime = qtime
        self.debug = debug or {}

    def __len__(self):
        return len(self.docs)

    def __iter__(self):
        return iter(self.docs)


class Solr(object):
    def __init__(self, url, decoder=None, timeout=60):
        self.decoder = decoder or json.JSONDecoder()
        self.url = url
        self.scheme, netloc, path, query, fragment =
urlsplit(url) self.base_url = urlunsplit((self.scheme, netloc, '', '', '')) netloc = netloc.split(':') self.host = netloc[0] if len(netloc) == 1: self.host, self.port = netloc[0], None else: self.host, self.port = netloc[0], int(netloc[1]) self.path = path.rstrip('/') self.timeout = timeout self.log = self._get_log() def _get_log(self): return LOG def _send_request(self, method, path, body=None, headers=None): if TIMEOUTS_AVAILABLE: http = Http(timeout=self.timeout) url = self.base_url + path try: start_time = time.time() self.log.debug("Starting request to '%s' (%s) with body '%s'..." % (url, method, str(body)[:10])) headers, response = http.request(url, method=method, body=body, headers=headers) end_time = time.time() self.log.info("Finished '%s' (%s) with body '%s' in %0.3f seconds." % (url, method, str(body)[:10], end_time - start_time)) except AttributeError: # For httplib2. error_message = "Failed to connect to server at '%s'. Are you sure '%s' is correct? Checking it in a browser might help..." % (url, self.base_url) self.log.error(error_message) raise SolrError(error_message) if int(headers['status']) != 200: error_message = self._extract_error(headers, response) self.log.error(error_message) raise SolrError(error_message) return response else: if headers is None: headers = {} conn = HTTPConnection(self.host, self.port) start_time = time.time() self.log.debug("Starting request to '%s:%s/%s' (%s) with body '%s'..." % (self.host, self.port, path, method, str(body)[:10])) conn.request(method, path, body, headers) response = conn.getresponse() end_time = time.time() self.log.info("Finished '%s:%s/%s' (%s) with body '%s' in %0.3f seconds." 
                          % (self.host, self.port, path, method, str(body)[:10], end_time - start_time))

            if response.status != 200:
                error_message = self._extract_error(dict(response.getheaders()), response.read())
                self.log.error(error_message)
                raise SolrError(error_message)

            return response.read()

    def _select(self, params):
        # specify json encoding of results
        params['wt'] = 'json'
        params_encoded = safe_urlencode(params, True)

        if len(params_encoded) < 1024:
            # Typical case.
            path = '%s/select/?%s' % (self.path, params_encoded)
            return self._send_request('GET', path)
        else:
            # Handles very long queries by submitting as a POST.
            path = '%s/select/' % (self.path,)
            headers = {
                'Content-type': 'application/x-www-form-urlencoded; charset=utf-8',
            }
            return self._send_request('POST', path, body=params_encoded, headers=headers)

    def _mlt(self, params):
        params['wt'] = 'json' # specify json encoding of results
        path = '%s/mlt/?%s' % (self.path, safe_urlencode(params, True))
        return self._send_request('GET', path)

    def _suggest_terms(self, params):
        params['wt'] = 'json' # specify json encoding of results
        path = '%s/terms/?%s' % (self.path, safe_urlencode(params, True))
        return self._send_request('GET', path)

    def _update(self, message, clean_ctrl_chars=True, commit=True, waitFlush=None, waitSearcher=None):
        """
        Posts the given xml message to http://<host>:<port>/solr/update and
        returns the result.

        Passing `clean_ctrl_chars` as False will prevent the message from
        being cleaned of control characters (default True). This is done by
        default because these characters would cause Solr to fail to parse
        the XML. Only pass False if you're positive your data is clean.
        """
        path = '%s/update/' % self.path

        # Per http://wiki.apache.org/solr/UpdateXmlMessages, we can append a
        # ``commit=true`` to the URL and have the commit happen without a
        # second request.
query_vars = [] if commit is not None: query_vars.append('commit=%s' % str(bool(commit)).lower()) if waitFlush is not None: query_vars.append('waitFlush=%s' % str(bool(waitFlush)).lower()) if waitSearcher is not None: query_vars.append('waitSearcher=%s' % str(bool(waitSearcher)).lower()) if query_vars: path = '%s?%s' % (path, '&'.join(query_vars)) # Clean the message of ctrl characters. if clean_ctrl_chars: message = sanitize(message) return self._send_request('POST', path, message, {'Content-type': 'text/xml; charset=utf-8'}) def _extract_error(self, headers, response): """ Extract the actual error message from a solr response. """ reason = headers.get('reason', None) full_html = None if reason is None: reason, full_html = self._scrape_response(headers, response) msg = "[Reason: %s]" % reason if reason is None: msg += "\n%s" % unescape_html(full_html) return msg def _scrape_response(self, headers, response): """ Scrape the html response. """ # identify the responding server server_type = None server_string = headers.get('server', '') if server_string and 'jetty' in server_string.lower(): server_type = 'jetty' if server_string and 'coyote' in server_string.lower(): # TODO: During the pysolr 3 effort, make this no longer a # conditional and consider using ``lxml.html`` instead. 
            from BeautifulSoup import BeautifulSoup
            server_type = 'tomcat'

        reason = None
        full_html = ''
        dom_tree = None

        if server_type == 'tomcat':
            # Tomcat doesn't produce a valid XML response
            soup = BeautifulSoup(response)
            body_node = soup.find('body')
            p_nodes = body_node.findAll('p')

            for p_node in p_nodes:
                children = p_node.findChildren()

                if len(children) >= 2 and 'message' in children[0].renderContents().lower():
                    reason = children[1].renderContents()

            if reason is None:
                full_html = soup.prettify()
        else:
            # Let's assume others do produce a valid XML response
            try:
                dom_tree = ET.fromstring(response)
                reason_node = None

                # html page might be different for every server
                if server_type == 'jetty':
                    reason_node = dom_tree.find('body/pre')

                if reason_node is not None:
                    reason = reason_node.text

                if reason is None:
                    full_html = ET.tostring(dom_tree)
            except SyntaxError, e:
                full_html = "%s" % response

        full_html = full_html.replace('\n', '')
        full_html = full_html.replace('\r', '')
        full_html = full_html.replace('<br/>', '')
        full_html = full_html.replace('<br />', '')
        full_html = full_html.strip()
        return reason, full_html

    # Conversion #############################################################

    def _from_python(self, value):
        """
        Converts python values to a form suitable for insertion into the xml
        we send to solr.
        """
        if hasattr(value, 'strftime'):
            if hasattr(value, 'hour'):
                value = "%sZ" % value.isoformat()
            else:
                value = "%sT00:00:00Z" % value.isoformat()
        elif isinstance(value, bool):
            if value:
                value = 'true'
            else:
                value = 'false'
        elif isinstance(value, str):
            value = unicode(value, errors='replace')
        else:
            value = unicode(value)
        return value

    def _to_python(self, value):
        """
        Converts values from Solr to native Python values.
        """
        if isinstance(value, (int, float, long, complex)):
            return value

        if isinstance(value, (list, tuple)):
            value = value[0]

        if value == 'true':
            return True
        elif value == 'false':
            return False

        if isinstance(value, basestring):
            possible_datetime = DATETIME_REGEX.search(value)

            if possible_datetime:
                date_values = possible_datetime.groupdict()

                for dk, dv in date_values.items():
                    date_values[dk] = int(dv)

                return datetime(date_values['year'], date_values['month'], date_values['day'], date_values['hour'], date_values['minute'], date_values['second'])

        try:
            # This is slightly gross but it's hard to tell otherwise what the
            # string's original type might have been. Be careful who you trust.
            converted_value = eval(value)

            # Try to handle most built-in types.
            if isinstance(converted_value, (list, tuple, set, dict, int, float, long, complex)):
                return converted_value
        except:
            # If it fails (SyntaxError or its ilk) or we don't trust it,
            # continue on.
            pass

        return value

    def _is_null_value(self, value):
        """
        Check if a given value is ``null``.

        Criteria for this is based on values that shouldn't be included
        in the Solr ``add`` request at all.
        """
        # TODO: This should probably be removed when solved in core Solr level?
return (value is None) or (isinstance(value, basestring) and len(value) == 0) # API Methods ############################################################ def search(self, q, **kwargs): """Performs a search and returns the results.""" params = {'q': q} params.update(kwargs) response = self._select(params) # TODO: make result retrieval lazy and allow custom result objects result = self.decoder.decode(response) result_kwargs = {} if result.get('debug'): result_kwargs['debug'] = result['debug'] if result.get('highlighting'): result_kwargs['highlighting'] = result['highlighting'] if result.get('facet_counts'): result_kwargs['facets'] = result['facet_counts'] if result.get('spellcheck'): result_kwargs['spellcheck'] = result['spellcheck'] if result.get('stats'): result_kwargs['stats'] = result['stats'] if 'QTime' in result.get('responseHeader', {}): result_kwargs['qtime'] = result['responseHeader']['QTime'] self.log.debug("Found '%s' search results." % result['response']['numFound']) return Results(result['response']['docs'], result['response']['numFound'], **result_kwargs) def more_like_this(self, q, mltfl, **kwargs): """ Finds and returns results similar to the provided query. Requires Solr 1.3+. """ params = { 'q': q, 'mlt.fl': mltfl, } params.update(kwargs) response = self._mlt(params) result = self.decoder.decode(response) if result['response'] is None: result['response'] = { 'docs': [], 'numFound': 0, } self.log.debug("Found '%s' MLT results." % result['response']['numFound']) return Results(result['response']['docs'], result['response']['numFound']) def suggest_terms(self, fields, prefix, **kwargs): """ Accepts a list of field names and a prefix Returns a dictionary keyed on field name containing a list of ``(term, count)`` pairs Requires Solr 1.4+. 
""" params = { 'terms.fl': fields, 'terms.prefix': prefix, } params.update(kwargs) response = self._suggest_terms(params) result = self.decoder.decode(response) terms = result.get("terms", {}) res = {} while terms: # The raw values are a flat list: ["dance",23,"dancers",10,"dancing",8,"dancer",6]] field = terms.pop(0) values = terms.pop(0) tmp = list() while values: tmp.append((values.pop(0), values.pop(0))) res[field] = tmp self.log.debug("Found '%d' Term suggestions results.", sum(len(j) for i, j in res.items())) return res def add(self, docs, commit=True, boost=None, commitWithin=None, waitFlush=None, waitSearcher=None): """Adds or updates documents. For now, docs is a list of dictionaries where each key is the field name and each value is the value to index. """ start_time = time.time() self.log.debug("Starting to build add request...") message = ET.Element('add') if commitWithin: message.set('commitWithin', commitWithin) for doc in docs: d = ET.Element('doc') for key, value in doc.items(): if key == 'boost': d.set('boost', str(value)) continue # handle lists, tuples, and other iterables if hasattr(value, '__iter__'): for v in value: if self._is_null_value(value): continue if boost and v in boost: if not isinstance(boost, basestring): boost[v] = str(boost[v]) f = ET.Element('field', name=key, boost=boost[v]) else: f = ET.Element('field', name=key) f.text = self._from_python(v) d.append(f) # handle strings and unicode else: if self._is_null_value(value): continue if boost and key in boost: if not isinstance(boost, basestring): boost[key] = str(boost[key]) f = ET.Element('field', name=key, boost=boost[key]) else: f = ET.Element('field', name=key) f.text = self._from_python(value) d.append(f) message.append(d) m = ET.tostring(message, encoding='utf-8') end_time = time.time() self.log.debug("Built add request of %s docs in %0.2f seconds." 
                       % (len(docs), end_time - start_time))

        response = self._update(m, commit=commit, waitFlush=waitFlush, waitSearcher=waitSearcher)

    def delete(self, id=None, q=None, commit=True, waitFlush=None, waitSearcher=None):
        """Deletes documents."""
        if id is None and q is None:
            raise ValueError('You must specify "id" or "q".')
        elif id is not None and q is not None:
            raise ValueError('You may only specify "id" OR "q", not both.')
        elif id is not None:
            m = '<delete><id>%s</id></delete>' % id
        elif q is not None:
            m = '<delete><query>%s</query></delete>' % q

        response = self._update(m, commit=commit, waitFlush=waitFlush, waitSearcher=waitSearcher)

    def commit(self, waitFlush=None, waitSearcher=None, expungeDeletes=None):
        if expungeDeletes is not None:
            msg = '<commit expungeDeletes="%s" />' % str(bool(expungeDeletes)).lower()
        else:
            msg = '<commit />'

        response = self._update(msg, waitFlush=waitFlush, waitSearcher=waitSearcher)

    def optimize(self, waitFlush=None, waitSearcher=None, maxSegments=None):
        if maxSegments:
            msg = '<optimize maxSegments="%s" />' % maxSegments
        else:
            msg = '<optimize />'

        # Send the message built above so that ``maxSegments`` is honored.
        response = self._update(msg, waitFlush=waitFlush, waitSearcher=waitSearcher)


class SolrCoreAdmin(object):
    """
    Handles core admin operations: see http://wiki.apache.org/solr/CoreAdmin

    Operations offered by Solr are:
       1. STATUS
       2. CREATE
       3. RELOAD
       4. RENAME
       5. ALIAS
       6. SWAP
       7. UNLOAD
       8. LOAD (not currently implemented)
    """
    def __init__(self, url, *args, **kwargs):
        super(SolrCoreAdmin, self).__init__(*args, **kwargs)
        self.url = url

    def _get_url(self, url, params={}, headers={}):
        request = urllib2.Request(url, data=safe_urlencode(params), headers=headers)
        # Let ``socket.error``, ``urllib2.HTTPError`` and ``urllib2.URLError``
        # propagate up the stack.
        response = urllib2.urlopen(request)
        return response.read()

    def status(self, core=None):
        """http://wiki.apache.org/solr/CoreAdmin#head-9be76f5a459882c5c093a7a1456e98bea7723953"""
        params = {
            'action': 'STATUS',
        }

        if core is not None:
            params.update(core=core)

        return self._get_url(self.url, params=params)

    def create(self, name, instance_dir=None, config='solrconfig.xml', schema='schema.xml'):
        """http://wiki.apache.org/solr/CoreAdmin#head-7ca1b98a9df8b8ca0dcfbfc49940ed5ac98c4a08"""
        params = {
            'action': 'CREATE',
            'name': name,
            'config': config,
            'schema': schema,
        }

        if instance_dir is None:
            params.update(instanceDir=name)
        else:
            params.update(instanceDir=instance_dir)

        return self._get_url(self.url, params=params)

    def reload(self, core):
        """http://wiki.apache.org/solr/CoreAdmin#head-3f125034c6a64611779442539812067b8b430930"""
        params = {
            'action': 'RELOAD',
            'core': core,
        }
        return self._get_url(self.url, params=params)

    def rename(self, core, other):
        """http://wiki.apache.org/solr/CoreAdmin#head-9473bee1abed39e8583ba45ef993bebb468e3afe"""
        params = {
            'action': 'RENAME',
            'core': core,
            'other': other,
        }
        return self._get_url(self.url, params=params)

    def alias(self, core, other):
        """
        http://wiki.apache.org/solr/CoreAdmin#head-8bf9004eaa4d86af23d2758aafb0d31e2e8fe0d2

        Experimental feature in Solr 1.3
        """
        params = {
            'action': 'ALIAS',
            'core': core,
            'other': other,
        }
        return self._get_url(self.url, params=params)

    def swap(self, core, other):
        """http://wiki.apache.org/solr/CoreAdmin#head-928b872300f1b66748c85cebb12a59bb574e501b"""
        params = {
            'action': 'SWAP',
            'core': core,
            'other': other,
        }
        return self._get_url(self.url, params=params)

    def unload(self, core):
        """http://wiki.apache.org/solr/CoreAdmin#head-f5055a885932e2c25096a8856de840b06764d143"""
        params = {
            'action': 'UNLOAD',
            'core': core,
        }
        return self._get_url(self.url, params=params)

    def load(self, core):
        raise NotImplementedError('Solr 1.4 and below do not support this operation.')


# Using two-tuples to preserve order.
REPLACEMENTS = ( # Nuke nasty control characters. ('\x00', ''), # Start of heading ('\x01', ''), # Start of heading ('\x02', ''), # Start of text ('\x03', ''), # End of text ('\x04', ''), # End of transmission ('\x05', ''), # Enquiry ('\x06', ''), # Acknowledge ('\x07', ''), # Ring terminal bell ('\x08', ''), # Backspace ('\x0b', ''), # Vertical tab ('\x0c', ''), # Form feed ('\x0e', ''), # Shift out ('\x0f', ''), # Shift in ('\x10', ''), # Data link escape ('\x11', ''), # Device control 1 ('\x12', ''), # Device control 2 ('\x13', ''), # Device control 3 ('\x14', ''), # Device control 4 ('\x15', ''), # Negative acknowledge ('\x16', ''), # Synchronous idle ('\x17', ''), # End of transmission block ('\x18', ''), # Cancel ('\x19', ''), # End of medium ('\x1a', ''), # Substitute character ('\x1b', ''), # Escape ('\x1c', ''), # File separator ('\x1d', ''), # Group separator ('\x1e', ''), # Record separator ('\x1f', ''), # Unit separator ) def sanitize(data): fixed_string = data for bad, good in REPLACEMENTS: fixed_string = fixed_string.replace(bad, good) return fixed_string if __name__ == "__main__": import doctest doctest.testmod() python-pysolr-2.0.15/AUTHORS0000644000175000017500000000165611577720657014567 0ustar vladyvladyPrimaries: * Joseph Kocherhans * Daniel Lindsley * Jacob Kaplan-Moss Contributors: * initcrash for a patch regarding datetime formatting. * maciekp.lists for a patch correcting URL construction. * jarek & dekstop for a patch regarding sending Unicode documents. * Tomasz.Wegrzanowski for a patch to enable document boosting. * thomas.j.lee for a patch to add stats support. * Chak for a patch regarding empty string being unnecessarily sent. * james.colin.brady for a patch to enable working with the cores. * anti-social for a patch on charset sending. * akaihola for a patch regarding long queries. * bochecha for various patches. * stugots for an invalid character patch. * notanumber for a field boosting patch. * acdha for various patches. 
* zyegfryed for various patches. * girasquid for a patch related to server string. * David Cramer (dcramer) for various patches. * dourvais for a query time patch. * soypunk for a debug patch. python-pysolr-2.0.15/setup.py0000644000175000017500000000116711573733257015222 0ustar vladyvladyfrom distutils.core import setup setup( name = "pysolr", version = "2.0.15", description = "Lightweight python wrapper for Apache Solr.", author = 'Daniel Lindsley', author_email = 'daniel@toastdriven.com', py_modules = ['pysolr'], classifiers = [ 'Development Status :: 5 - Production/Stable', 'Intended Audience :: Developers', 'License :: OSI Approved :: BSD License', 'Operating System :: OS Independent', 'Programming Language :: Python', 'Topic :: Internet :: WWW/HTTP :: Indexing/Search' ], url = 'http://github.com/toastdriven/pysolr/' ) python-pysolr-2.0.15/ChangeLog0000644000175000017500000004216311577717632015266 0ustar vladyvladycommit 4d62da4c9d1989a2ea54b0b4428dfa0bf22293b2 Author: Daniel Lindsley Date: Wed Jun 8 12:52:31 2011 -0500 Bumped to v2.0.15! commit d41e2095612f4c74cbbef07d4e3b5c3bef3f8f5a Author: Daniel Lindsley Date: Wed Jun 8 12:52:18 2011 -0500 Fixed a bug where ``server_string`` could come back as ``None``. Thanks to croddy for the report! commit fecaf5193741197f153b9fb4e0ae192d97dd20ab Author: Daniel Lindsley Date: Mon May 16 00:35:06 2011 -0500 Added dourvais & soypunk to AUTHORS. commit 6f1b1db9becfc657efd73dfad6f273c3b1716562 Author: David Cramer Date: Mon May 9 16:48:04 2011 -0700 Unescape html entities in error messages commit 555d16e13620cc7c85c9765be604c7ce19694de5 Author: Shawn Medero Date: Thu May 5 15:10:04 2011 -0700 Added support for getting at the Solr querying debug data when using search(). Passing ``debug=True`` as kwarg, the ``search()`` method will activate this property in the JSON results. commit b0b13e7a58262327e031d074155357f3ca3ccaa4 Author: Daniel Dourvaris Date: Fri Feb 11 11:29:29 2011 +0200 Fixed bug, qtime wasn't set when it was 0. 
commit 7e3802c20a0e70bb6e9560baa51acb6c463bf9f0 Author: Daniel Dourvaris Date: Tue Jan 18 10:35:53 2011 +0200 Added query time to results as attribute. commit fcad5c731d4515d608f93e19bad0d6bac4abb9b2 Author: Daniel Lindsley Date: Fri Apr 29 00:36:29 2011 -0500 Bumped revision for dev on the next release. commit 33fea7e58da8b0029600cea9abce63188864ac21 Author: Daniel Lindsley Date: Fri Apr 29 00:35:34 2011 -0500 v2.0.14 commit ff6858912092fb4d291678d44f57b62b3c9a1be2 Author: David Cramer Date: Wed Apr 27 15:57:49 2011 -0700 Always send commit if its not-null commit efb6b7e9a371fb405f613d8f05c87e2d61529405 Author: David Cramer Date: Wed Apr 27 15:46:40 2011 -0700 Add support for waitFlush and waitSearcher on update queries. Added support for expungeDeletes on commit(). Added support for maxSegments on optimize() commit c43f5e607169e0d2a3691bb38d25dd4668015770 Author: David Cramer Date: Tue Mar 8 17:10:51 2011 -0800 Ensure port is coerced to an integer as (at least some version of) socket does not handle unicode ports nicely commit 4b6c2bb5cdd668a025f1fde3541930f06110dc96 Author: David Cramer Date: Fri Jan 7 11:55:44 2011 -0800 Add support for commitWithin on Solr.add commit ae4796db49c5896ef9e0c7e49c1ea430f0a214d6 Author: Daniel Lindsley Date: Tue Dec 14 20:45:26 2010 -0600 Better compatibility with the latest revisions of lxml. Thanks to ghostmob for pointing this out! commit e42ae892cb141458e3621677de76e4e5a0e11319 Author: Daniel Lindsley Date: Tue Dec 14 20:29:47 2010 -0600 Fixed occasionally trying to call ``lower`` on ``None``. Thanks to girasquid for the report & original patch! commit df50f7d7ade50e593e02cf16d42b4231755badd5 Author: Daniel Lindsley Date: Tue Sep 14 20:15:45 2010 -0500 Cleaned up how parameters are checked. Thanks to zyegfryed for the patch. v2.0.13. commit 45a188e106f00cde9202867e339e11e8745137f8 Author: Daniel Lindsley Date: Mon Sep 13 01:32:06 2010 -0500 Fixed a bug in the weighting when given a string field that's weighted. 
Thanks to akaihola for the report. commit 80ad7b78db1a302a34f5cada3c29d88a909faca6 Author: Daniel Lindsley Date: Thu Jul 22 21:17:54 2010 -0500 Fixed the case where the data being converted would be clean unicode. Thanks to acdha for submitting another version of this patch. commit b25d897f18e725f145697276aa41fa647c548caa Author: Daniel Lindsley Date: Thu Jul 22 21:14:02 2010 -0500 Fixed the long URL support to correctly deal with sequences. commit 2c0061d1ec17055eb8892d40bd41703a3cdd6769 Author: Daniel Lindsley Date: Wed Jul 21 22:24:36 2010 -0500 Fixed a bug where additional parameters could cause the URL to be longer than 1024 even if the query is not. Thanks to zyegfryed for the report & patch! commit db89b1d94f649a07230f4a2900add9f601187842 Author: Daniel Lindsley Date: Wed Jul 21 22:22:36 2010 -0500 Boost values are now coerced into a string. Thanks to notanumber for the patch! commit 80d9ded51911d56c728b9edea635e848de6857e2 Author: Daniel Lindsley Date: Wed Jul 21 22:18:58 2010 -0500 All params are now safely encoded. Thanks to acdha for the patch! commit 6f76564970ce856805c7140026a4a687b75a9d8e Author: Daniel Lindsley Date: Wed Jul 21 22:15:37 2010 -0500 Added term suggestion. Requires Solr 1.4+. Thanks to acdha for the patch! commit bd29cf03ee8822698ef78e1b722d6934dc1eaf74 Author: Daniel Lindsley Date: Wed Jul 21 21:59:55 2010 -0500 If invalid characters are found, replace them. Thanks to stugots for the report and fix. commit 1b5c427ef919948f3caa74674feca94787527f5f Author: Daniel Lindsley Date: Sat Jun 19 22:24:45 2010 -0500 Slicing ``None`` doesn't work. Make it a string... commit f4444971e449f5079b788078e44f783b234db2f7 Author: Daniel Lindsley Date: Sat Jun 19 20:34:51 2010 -0500 Added basic logging support. Thanks to sjaday for the suggestion. commit 46139ecd75b685c91c486a3840489028cc56c8d9 Author: Daniel Lindsley Date: Sat Jun 19 19:39:02 2010 -0500 Releasing version v2.0.12. 
commit 745e803b0794945c7dae974e8b69f0adeadfd525
Author: Daniel Lindsley
Date:   Sat Jun 19 19:35:45 2010 -0500

    Added a more helpful message for the ever classic "'NoneType' object has no attribute 'makefile'" error when providing an incorrect URL.

commit a8b7d088f818075e6cfa28b2d450741e6ec7e357
Author: Daniel Lindsley
Date:   Sat Jun 19 19:29:00 2010 -0500

    Added better error support when using Tomcat. Thanks to bochecha for the original patch.

commit 901520a04d4ee9c3dad3db7864f1bf2c72aebeba
Author: Daniel Lindsley
Date:   Sat Jun 19 19:16:06 2010 -0500

    Fixed a long-standing TODO, allowing commits to happen without a second request. Thanks to lyblandin for finally chiding me into fixing it.

commit 610d3ce7ac48d12c75fa30c202bc2281ac35154c
Author: Daniel Lindsley
Date:   Wed Jun 16 01:19:11 2010 -0500

    Fixed a bug when sending long queries. Thanks to akaihola & gthb for the report and patch.

commit 279b82587bf135ef4d42b07963e4d2a25d7372b1
Author: Daniel Lindsley
Date:   Wed Jun 16 01:16:25 2010 -0500

    Corrected a bug where Unicode characters might not transmit correctly. Thanks to anti-social for the initial patch.

commit a441641b1d9305e26b38482e156ce4f5499294a2
Author: David Sauve
Date:   Tue Apr 20 21:16:31 2010 -0400

    Added field-based boost support. Thanks to notanumber for the patch.

commit 2ea84356947af64b6d212de9b650f5ff6ccd06f0
Author: Daniel Lindsley
Date:   Thu Apr 29 01:39:00 2010 -0500

    Better error messages are now provided when things go south. Thanks to bochecha for the patch.

commit 44340708850c16ce6f065d552a4f4d3c42b05eaa
Author: Daniel Lindsley
Date:   Thu Apr 29 00:55:09 2010 -0500

    Added support for working with Solr cores. Thanks to james.colin.brady for the original patch.

commit f35f968ec2b1aa3e2b70bd4572ecedf7eb53f36d
Author: Daniel Lindsley
Date:   Thu Apr 29 00:37:10 2010 -0500

    Fixed a bug where empty strings/``None`` would be erroneously sent. Thanks to Chak for the patch.

commit aaacaec1c4a9581da84a4c15182780aa2c3d0805
Author: Daniel Lindsley
Date:   Thu Apr 29 00:24:57 2010 -0500

    Added support for the Stats component. Thanks to thomas.j.lee for the original patch.

commit c4426c5559c2c2d2c32af6c8b0d1937cbbc5df1c
Author: Daniel Lindsley
Date:   Tue Apr 27 21:13:00 2010 -0500

    Fixed datetime/date handling to use ``isoformat`` instead of manually constructing the string. Thanks to joegermuska for the suggestion.

commit 6e92dd23be97422a777ab6661325fb60ceed9391
Author: Daniel Lindsley
Date:   Tue Apr 27 20:05:03 2010 -0500

    Added document boost support. Thanks to Tomasz.Wegrzanowski for the patch.

commit d89290d41ff266af13d91a7346c8d593e1d33e3f
Author: Daniel Lindsley
Date:   Tue Apr 27 19:53:56 2010 -0500

    Fixed pysolr to add documents explicitly using UTF-8. Thanks to jarek & dekstop for the patch.

commit 8ccd48f695510d4bd008c6124487efc127565ee8
Author: Daniel Lindsley
Date:   Tue Apr 27 19:46:15 2010 -0500

    Fixed initialization parameters on ``Results``. Thanks to jonathan.slenders for pointing this out. v2.0.11

commit 83b0f87b06faffc2a89bd692ed1936190a89c452
Author: Daniel Lindsley
Date:   Tue Apr 27 19:33:00 2010 -0500

    Added a sane .gitignore.

commit 12a51a8eb63f61bc81ec47ffa9e52e3d923ab836
Author: Daniel Lindsley
Date:   Tue Apr 27 19:13:40 2010 -0500

    Fixed a bug in URL construction with httplib2. Thanks to maciekp.lists for the patch. v2.0.10

commit 622da5181a61c15ab8cb6337309125d661cc20d0
Author: Daniel Lindsley
Date:   Sun Feb 21 10:46:55 2010 -0600

    Added a way to handle queries longer than 1024. Adapted from cogtree's Python Solr fork.

commit c338c1cef12758b821cd10e117d1dc047a6b323a
Author: Daniel Lindsley
Date:   Wed Aug 26 14:08:23 2009 -0500

    Fixed isinstance bug that can occur with the now potentially different datetime/date objects.

commit 874688d366144bef15412d8c9274d2fa54bfcffb
Author: Daniel Lindsley
Date:   Wed Aug 26 10:48:04 2009 -0500

    Altered pysolr to use, if available, Django's implementation of datetime for dates before 1900.
    Falls back to the default implementation of datetime.

commit e19e2cd24bef833f72b8cc95fde6749fcfad1f7e
Author: Daniel Lindsley
Date:   Wed Jul 15 11:27:08 2009 -0500

    If MLT was enabled but no reindexing was performed, Solr returns null instead of no docs. Handle this slightly more gracefully.

commit 8ab965b5cf8473501c66fd3a4eec7224d9171521
Author: Daniel Lindsley
Date:   Thu Jul 2 00:33:08 2009 -0500

    Corrected a regression when errors occur while using httplib.

commit 0d59ddf77a2831ec577d7c78800cf2405d6f6aec
Author: Daniel Lindsley
Date:   Thu Jul 2 00:32:23 2009 -0500

    Bumped version number for previous commit.

commit d4a6f03d2a3e74f9fe0042c69dc2c398a5463f23
Author: Daniel Lindsley
Date:   Wed Jul 1 12:01:46 2009 -0500

    Altered the '_extract_error' method to be a little more useful when things go south.

commit 2943f26b23c989ff48406d0413e0630a98c10e36
Author: polarcowz
Date:   Wed Jul 1 15:02:57 2009 +0000

    Bumped version for previous commit.

commit 92f3b447c3f8a9e4c7d9a294bd20c8811f6b7690
Author: polarcowz
Date:   Wed Jul 1 15:02:54 2009 +0000

    Added (optional but default) sanitizing for updates. This cleans the XML sent of control characters which cause Solr's XML parser to break.

commit d09af73edeaa86fed3fcf842fb887d0660344a79
Author: polarcowz
Date:   Sun Jun 21 06:10:45 2009 +0000

    Fixed up a couple distribution bits.

commit 059315e37a827221847b2d574d3a3ce635a12a35
Author: polarcowz
Date:   Fri Jun 19 18:54:14 2009 +0000

    Added spellchecking support.

commit 945c60ec02c9ab7973c3f9ddf391f63c1e87d30f
Author: polarcowz
Date:   Fri Jun 19 18:54:10 2009 +0000

    Added timeouts (optional if httplib2 is installed).

commit 6bcd192a37782223da049ce18db18d0085c1ef0e
Author: polarcowz
Date:   Wed May 27 03:11:32 2009 +0000

    Fixed DATETIME_REGEX & _from_python to match Solr documentation. Thanks initcrash!

commit 8ad217c0718697420e0e2b15df0a7d60151c32ce
Author: polarcowz
Date:   Mon May 18 04:21:21 2009 +0000

    Under some circumstances, Solr returns a regular data type instead of a string.
    Deal with it in _to_python as best as possible.

commit c3481e4b3ebf61acbb6c7614325941fc228fa794
Author: polarcowz
Date:   Sat May 9 21:33:11 2009 +0000

    Added '_to_python' method for converting data back to its native Python type. Backward compatible (requires manually calling).

commit e399bdfee6b90589fd7497010ac703296b840a36
Author: polarcowz
Date:   Mon May 4 06:06:18 2009 +0000

    Updated pysolr to version 2.0.

    New bits:
      * Now uses JSON instead of parsing XML. (jkocherhans)
      * Added support for passing many types of query parameters to Solr. (daniellindsley)
      * Added support for More Like This (requires Solr 1.3+). (daniellindsley)
      * Added support for highlighting. (daniellindsley)
      * Added support for faceting. (daniellindsley)

    Ought to be fairly backward-compatible (no known issues) but caution is advised when upgrading.
    Newly requires either the 'json' or 'simplejson' modules.

commit 96cb415a4f83875bbf2eff0f945cac3f8ca48b0f
Author: jacob.kaplanmoss
Date:   Thu Feb 5 18:06:45 2009 +0000

    Added the stuff needed to easy_install pysolr. And a LICENSE, since I just made fun of another project for not having one.

commit 2391204216bfb67c853727123062a5cc88c1e46c
Author: jkocherhans
Date:   Fri Mar 28 21:21:13 2008 +0000

    It would probably help if I imported the correct thing.

commit 21fb177cc7ceadb3fa90456cba4875dfa94c12e5
Author: jkocherhans
Date:   Fri Mar 28 20:46:52 2008 +0000

    This is getting a bit hairy, but try to import ElementTree from lxml as well.

commit 0efb33a45155d898e45e1d569364d769d4aa7464
Author: jkocherhans
Date:   Sat Mar 15 00:34:52 2008 +0000

    Use cElementTree if it's available.

commit e8614c181118c646d7934143358b3e51a3b09499
Author: jkocherhans
Date:   Fri Mar 14 22:28:16 2008 +0000

    Removed unused import. Thanks, jarek.zgoda.

commit 1785e1247b6a56ad0c228d69a591e9228e6e074a
Author: jkocherhans
Date:   Fri Mar 14 22:27:09 2008 +0000

    Removed default values for start and rows from the search method. Thanks, jarek.zgoda.
    This will allow people to let solr determine what the default for those should be.

commit 5e8afff9a9d05b8829e084f4036352eb0e09cbba
Author: jkocherhans
Date:   Fri Mar 14 22:21:38 2008 +0000

    Added converters for float and decimal. This references Issue 1. Thanks, jarek.zgoda.

commit 5e2bedbf9c2f9f8d56709260c1bca892c6ba5748
Author: jkocherhans
Date:   Wed Jan 30 17:12:37 2008 +0000

    Fixed a bug for connections that don't specify a port number.

commit 3304bdc2658833899889250f7de3cacc1fd0d20f
Author: jkocherhans
Date:   Thu Jan 24 20:16:56 2008 +0000

    Fixed Python 2.5-ism.

commit f75adc4a72ec6b0a8b74ff74805a06fa2f41cb5b
Author: jkocherhans
Date:   Thu Jan 24 18:23:41 2008 +0000

    Allowed for connections to solr instances that don't live at /solr.

commit 5e239eaf37e6d68f582e9d8623f3aa214ea303b7
Author: jkocherhans
Date:   Fri Jan 18 20:32:40 2008 +0000

    Added multiValue field handling support.

commit 7d1d48329e511d0f3a2b96e7dfba6368db5093ab
Author: jkocherhans
Date:   Thu Jan 17 21:56:08 2008 +0000

    Broke results out into a separate object with docs and hits attributes.

commit 35f5f047b1f1aba61b519fc8c224b71ec1155dc5
Author: jkocherhans
Date:   Tue Jan 15 23:03:24 2008 +0000

    Fixed typo that caused breakage with python < 2.5.

commit e2207e545f5266872bb4e7d31e5ee1d4b9f4964a
Author: jkocherhans
Date:   Wed Jan 9 17:43:58 2008 +0000

    Fixed a small typo.

commit 8e4c671829c066531ec30aea3e134bb6070aeb64
Author: jkocherhans
Date:   Wed Jan 9 17:22:25 2008 +0000

    Initial import of pysolr.

commit af39f919b725dfb80d53a5e424c2d05823bb16d3
Author: (no author) <(no author)@13ae9d4a-4d43-0410-997b-81b7443f7ec1>
Date:   Wed Jan 9 17:19:14 2008 +0000

    Initial directory structure.