pyquery-1.2.4/setup.py

#-*- coding:utf-8 -*-
#
# Copyright (C) 2008 - Olivier Lauzanne <olauzanne@gmail.com>
#
# Distributed under the BSD license, see LICENSE.txt
from setuptools import setup, find_packages
import os


def read(*names):
    values = dict()
    for name in names:
        filename = name + '.rst'
        if os.path.isfile(filename):
            fd = open(filename)
            value = fd.read()
            fd.close()
        else:
            value = ''
        values[name] = value
    return values

long_description = """
%(README)s

See http://packages.python.org/pyquery/ for the full documentation

News
====

%(CHANGES)s
""" % read('README', 'CHANGES')

version = '1.2.4'

setup(name='pyquery',
      version=version,
      description='A jquery-like library for python',
      long_description=long_description,
      classifiers=[
          "Intended Audience :: Developers",
          "Development Status :: 5 - Production/Stable",
          "Programming Language :: Python :: 2",
          "Programming Language :: Python :: 2.6",
          "Programming Language :: Python :: 2.7",
          "Programming Language :: Python :: 3",
          "Programming Language :: Python :: 3.2",
          "Programming Language :: Python :: 3.3",
      ],
      keywords='jquery html xml scraping',
      author='Olivier Lauzanne',
      author_email='olauzanne@gmail.com',
      maintainer='Gael Pasgrimaud',
      maintainer_email='gael@gawel.org',
      url='https://github.com/gawel/pyquery',
      license='BSD',
      packages=find_packages(exclude=['bootstrap', 'bootstrap-py3k']),
      include_package_data=True,
      zip_safe=False,
      install_requires=[
          'lxml>=2.1',
          'cssselect',
      ],
      entry_points="""
      # -*- Entry points: -*-
      """,
      )

pyquery-1.2.4/docs/changes.txt

News
=====

..
include:: ../CHANGES.rst

pyquery-1.2.4/docs/scrap.txt

Scraping
=========

PyQuery is able to load an html document from a url::

    >>> pq('http://duckduckgo.com/')
    [<html>]

By default it uses python's urllib. If `requests`_ is installed then it will
be used instead. This allows you to use most of `requests`_ parameters::

    >>> pq('http://duckduckgo.com/', headers={'user-agent': 'pyquery'})
    [<html>]

    >>> pq('https://duckduckgo.com/', {'q': 'foo'}, method='post', verify=True)
    [<html>]

.. _requests: http://docs.python-requests.org/en/latest/

pyquery-1.2.4/docs/index.txt

.. include:: ../README.rst

Full documentation
==================

.. toctree::
   :maxdepth: 1

   attributes
   css
   manipulating
   traversing
   api
   scrap
   ajax
   tips
   testing
   future
   changes

More documentation
==================

First there is the Sphinx documentation `here`_. Then for more documentation
about the API you can use the `jquery website`_. The reference I'm now using
for the API is ... the `color cheat sheet`_. Then you can always look at the
`code`_.

.. _jquery website: http://docs.jquery.com/
.. _code: https://github.com/gawel/pyquery
.. _color cheat sheet: http://colorcharge.com/wp-content/uploads/2007/12/jquery12_colorcharge.png
.. _here: http://packages.python.org/pyquery/

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

pyquery-1.2.4/docs/attributes.txt

Attributes
----------

You can play with the attributes with the jquery API::

    >>> p = pq('<p id="hello" class="hello"></p>')('p')
    >>> p.attr("id")
    'hello'
    >>> p.attr("id", "plop")
    [<p#plop.hello>]
    >>> p.attr("id", "hello")
    [<p#hello.hello>]

Or in a more pythonic way::

    >>> p.attr.id = "plop"
    >>> p.attr.id
    'plop'
    >>> p.attr["id"] = "ola"
    >>> p.attr["id"]
    'ola'
    >>> p.attr(id='hello', class_='hello2')
    [<p#hello.hello2>]
    >>> p.attr.class_
    'hello2'
    >>> p.attr.class_ = 'hello'

pyquery-1.2.4/docs/future.txt

Future
-------

- SELECTORS: done
- ATTRIBUTES: done
- CSS: done
- HTML: done
- MANIPULATING: missing the wrapInner method
- TRAVERSING: about half done
- EVENTS: nothing to do with server side; might be used later for automatic ajax
- CORE UI EFFECTS: did hide and show; the rest doesn't really make sense on the server side
- AJAX: some with wsgi app

pyquery-1.2.4/docs/testing.txt

Testing
-------

If you want to run the tests that you can see above, you should do::

    $ git clone git://github.com/gawel/pyquery.git
    $ cd pyquery
    $ python bootstrap.py
    $ bin/buildout install tox
    $ bin/tox

You can build the Sphinx documentation by doing::

    $ cd docs
    $ make html

pyquery-1.2.4/docs/manipulating.txt

Manipulating
------------

You can also add content to the end of tags::

    >>> d = pq('<p class="hello" id="hello">you know Python rocks</p>')
    >>> d('p').append(' check out <a><span>reddit</span></a>')
    [<p#hello.hello>]
    >>> print d
    <p class="hello" id="hello">you know Python rocks check out <a><span>reddit</span></a></p>

Or to the beginning::

    >>> p = d('p')
    >>> p.prepend('check out <a>reddit</a>')
    [<p#hello.hello>]
    >>> p.html()
    u'check out <a>reddit</a>you know ...'

Prepend or append an element into another::

    >>> d = pq('<html><body><div id="test"><a>python</a> !</div></body></html>')
    >>> p.prependTo(d('#test'))
    [<p#hello.hello>]
    >>> d('#test').html()
    u'<p class="hello" ...'

Or after::

    >>> p.insertAfter(d('#test'))
    [<p#hello.hello>]
    >>> d('#test').html()
    u'<a>python</a> !'

Or before::

    >>> p.insertBefore(d('#test'))
    [<p#hello.hello>]
    >>> d('body').html()
    u'<p class="hello" ...'

Doing something for each element::

    >>> p.each(lambda e: e.addClass('hello2'))
    [<p#hello.hello.hello2>]

Remove an element::

    >>> d = pq('<html><body><p id="id">Yeah!</p><p>python rocks !</p></body></html>')
    >>> d.remove('p#id')
    [<html>]
    >>> d('p#id')
    []

Remove what's inside the selection::

    >>> d('p').empty()
    [<p>]

And you can get back the modified html::

    >>> print d
    <html><body><p></p></body></html>

You can generate html stuff::

    >>> from pyquery import PyQuery as pq
    >>> print pq('<div>Yeah !</div>').addClass('myclass') + pq('<b>cool</b>')
    <div class="myclass">Yeah !</div><b>cool</b>

pyquery-1.2.4/docs/tips.txt

Tips
====

Making links absolute
---------------------

You can make links absolute, which can be useful for screen scraping::

    >>> d = pq(url='http://duckduckgo.com/', parser='html')
    >>> d('a[tabindex="-1"]').attr('href')
    '/about.html'
    >>> d.make_links_absolute()
    [<html>]
    >>> d('a[tabindex="-1"]').attr('href')
    'http://duckduckgo.com/about.html'

Using different parsers
-----------------------

By default pyquery uses the lxml xml parser and, if that fails, falls back to
the html parser from lxml.html. The xml parser can sometimes be problematic
when parsing xhtml pages, because the parser will not raise an error but will
return an unusable tree (on w3c.org for example).

You can also choose which parser to use explicitly::

    >>> pq('

toto

', parser='xml') [] >>> pq('

toto

', parser='html') [] >>> pq('

toto

', parser='html_fragments') [

]

The html and html_fragments parsers are the ones from lxml.html.

pyquery-1.2.4/docs/api.txt

================================================
:mod:`~pyquery.pyquery` -- PyQuery complete API
================================================

.. automodule:: pyquery.pyquery

.. autoclass:: PyQuery
   :members:

pyquery-1.2.4/docs/ajax.txt

=============================================
:mod:`pyquery.ajax` -- PyQuery AJAX extension
=============================================

.. automodule:: pyquery.ajax

.. fake imports

    >>> from ajax import PyQuery as pq

You can query some wsgi app if `WebOb`_ is installed (it's not a pyquery
dependency). In this example the test app returns a simple input at `/` and a
submit button at `/submit`::

    >>> d = pq('<form></form>', app=input_app)
    >>> d.append(d.get('/'))
    [<form>]
    >>> print d

The app is also available in new nodes::

    >>> d.get('/').app is d.app is d('form').app
    True

You can also request another path::

    >>> d.append(d.get('/submit'))
    [<form>]
    >>> print d
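The doctests above rely on a small test WSGI application exposed as ``input_app``, which lives in the test suite and is not shown in this file. A minimal sketch of what such an app could look like — the markup and the ``name`` attribute are assumptions for illustration, not the project's actual fixture::

```python
def input_app(environ, start_response):
    """A tiny WSGI app: a text input at `/` and a submit button at `/submit`.

    Illustrative sketch only; not pyquery's real test fixture.
    """
    start_response('200 OK', [('Content-Type', 'text/html')])
    if environ.get('PATH_INFO') == '/submit':
        body = '<input type="submit" value="OK" />'
    else:
        body = '<input name="youyou" type="text" value="" />'
    # WSGI applications must return an iterable of bytes
    return [body.encode('utf-8')]
```

With something like this in scope, ``d.get('/')`` returns the markup served at ``/`` wrapped in a new PyQuery object, which is what the ``append`` calls above consume.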
If `restkit`_ is installed, you can fetch urls directly with a `HostProxy`_
app::

    >>> a = d.get('http://packages.python.org/pyquery/')
    >>> a
    [<html>]

You can retrieve the app response::

    >>> print a.response.status
    200 OK

The response attribute is a `WebOb`_ `Response`_

.. _webob: http://pythonpaste.org/webob/
.. _response: http://pythonpaste.org/webob/#response
.. _restkit: http://benoitc.github.com/restkit/
.. _hostproxy: http://benoitc.github.com/restkit/wsgi_proxy.html

Api
---

.. autoclass:: PyQuery
   :members:

pyquery-1.2.4/docs/css.txt

CSS
---

.. Initialize tests

    >>> from pyquery import PyQuery
    >>> p = PyQuery('<p id="hello" class="hello"></p>')('p')

You can play with css classes::

    >>> p.addClass("toto")
    [<p#hello.hello.toto>]
    >>> p.toggleClass("titi toto")
    [<p#hello.hello.titi>]
    >>> p.removeClass("titi")
    [<p#hello.hello>]

Or the css style::

    >>> p.css("font-size", "15px")
    [<p#hello.hello>]
    >>> p.attr("style")
    'font-size: 15px'
    >>> p.css({"font-size": "17px"})
    [<p#hello.hello>]
    >>> p.attr("style")
    'font-size: 17px'

Same thing the pythonic way ('_' characters are translated to '-')::

    >>> p.css.font_size = "16px"
    >>> p.attr.style
    'font-size: 16px'
    >>> p.css['font-size'] = "15px"
    >>> p.attr.style
    'font-size: 15px'
    >>> p.css(font_size="16px")
    [<p#hello.hello>]
    >>> p.attr.style
    'font-size: 16px'
    >>> p.css = {"font-size": "17px"}
    >>> p.attr.style
    'font-size: 17px'

pyquery-1.2.4/docs/traversing.txt

Traversing
----------

Some jQuery traversal methods are supported. Here are a few examples.

You can filter the selection list using a string selector::

    >>> d = pq('

') >>> d('p').filter('.hello') [] It is possible to select a single element with eq:: >>> d('p').eq(0) [] You can find nested elements:: >>> d('p').find('a') [, ] >>> d('p').eq(1).find('a') [] Breaking out of a level of traversal is also supported using end:: >>> d('p').find('a').end() [, ] >>> d('p').eq(0).end() [, ] >>> d('p').filter(lambda i: i == 1).end() [, ] pyquery-1.2.4/pyquery/0000775000175000017500000000000012055767744015215 5ustar gawelgawel00000000000000pyquery-1.2.4/pyquery/pyquery.py0000664000175000017500000012170212055765201017272 0ustar gawelgawel00000000000000#-*- coding:utf-8 -*- # # Copyright (C) 2008 - Olivier Lauzanne # # Distributed under the BSD license, see LICENSE.txt from .cssselectpatch import JQueryTranslator from .openers import url_opener from copy import deepcopy from lxml import etree import lxml.html import inspect import sys PY3k = sys.version_info >= (3,) if PY3k: from urllib.parse import urlencode from urllib.parse import urljoin basestring = (str, bytes) unicode = str else: from urllib import urlencode # NOQA from urlparse import urljoin # NOQA def func_globals(f): return f.__globals__ if PY3k else f.func_globals def func_code(f): return f.__code__ if PY3k else f.func_code def fromstring(context, parser=None, custom_parser=None): """use html parser if we don't have clean xml """ if hasattr(context, 'read') and hasattr(context.read, '__call__'): meth = 'parse' else: meth = 'fromstring' if custom_parser is None: if parser is None: try: result = getattr(etree, meth)(context) except etree.XMLSyntaxError: result = getattr(lxml.html, meth)(context) if isinstance(result, etree._ElementTree): return [result.getroot()] else: return [result] elif parser == 'xml': custom_parser = getattr(etree, meth) elif parser == 'html': custom_parser = getattr(lxml.html, meth) elif parser == 'soup': from lxml.html import soupparser custom_parser = getattr(soupparser, meth) elif parser == 'html_fragments': custom_parser = lxml.html.fragments_fromstring 
else: ValueError('No such parser: "%s"' % parser) result = custom_parser(context) if type(result) is list: return result elif isinstance(result, etree._ElementTree): return [result.getroot()] elif result is not None: return [result] else: return [] def callback(func, *args): return func(*args[:func_code(func).co_argcount]) class NoDefault(object): def __repr__(self): """clean representation in Sphinx""" return '' no_default = NoDefault() del NoDefault class FlexibleElement(object): """property to allow a flexible api""" def __init__(self, pget, pset=no_default, pdel=no_default): self.pget = pget self.pset = pset self.pdel = pdel def __get__(self, instance, klass): class _element(object): """real element to support set/get/del attr and item and js call style""" def __call__(prop, *args, **kwargs): return self.pget(instance, *args, **kwargs) __getattr__ = __getitem__ = __setattr__ = __setitem__ = __call__ def __delitem__(prop, name): if self.pdel is not no_default: return self.pdel(instance, name) else: raise NotImplementedError() __delattr__ = __delitem__ def __repr__(prop): return '' % self.pget.__name__ return _element() def __set__(self, instance, value): if self.pset is not no_default: self.pset(instance, value) else: raise NotImplementedError() class PyQuery(list): """The main class """ def __init__(self, *args, **kwargs): html = None elements = [] self._base_url = None self.parser = kwargs.get('parser', None) if 'parser' in kwargs: del kwargs['parser'] if len(args) >= 1 and \ (not PY3k and isinstance(args[0], basestring) or \ (PY3k and isinstance(args[0], str))) and \ args[0].split('://', 1)[0] in ('http', 'https'): kwargs['url'] = args[0] if len(args) >= 2: kwargs['data'] = args[1] args = [] if 'parent' in kwargs: self._parent = kwargs.pop('parent') else: self._parent = no_default if 'css_translator' in kwargs: self._translator = kwargs.pop('css_translator') elif self.parser in ('xml',): self._translator = JQueryTranslator(xhtml=True) elif self._parent is not 
no_default: self._translator = self._parent._translator else: self._translator = JQueryTranslator(xhtml=False) namespaces = kwargs.get('namespaces', {}) if 'namespaces' in kwargs: del kwargs['namespaces'] if kwargs: # specific case to get the dom if 'filename' in kwargs: html = open(kwargs['filename']) elif 'url' in kwargs: url = kwargs.pop('url') if 'opener' in kwargs: opener = kwargs.pop('opener') html = opener(url, **kwargs) else: html = url_opener(url, kwargs) if not self.parser: self.parser = 'html' self._base_url = url else: raise ValueError('Invalid keyword arguments %s' % kwargs) elements = fromstring(html, self.parser) # close open descriptor if possible if hasattr(html, 'close'): try: html.close() except: pass else: # get nodes # determine context and selector if any selector = context = no_default length = len(args) if length == 1: context = args[0] elif length == 2: selector, context = args else: raise ValueError("You can't do that." +\ " Please, provide arguments") # get context if isinstance(context, basestring): try: elements = fromstring(context, self.parser) except Exception: raise elif isinstance(context, self.__class__): # copy elements = context[:] elif isinstance(context, list): elements = context elif isinstance(context, etree._Element): elements = [context] # select nodes if elements and selector is not no_default: xpath = self._css_to_xpath(selector) results = [] for tag in elements: results.extend(tag.xpath(xpath, namespaces=namespaces)) elements = results list.__init__(self, elements) def _css_to_xpath(self, selector, prefix='descendant-or-self::'): selector = selector.replace('[@', '[') return self._translator.css_to_xpath(selector, prefix) def __call__(self, *args, **kwargs): """return a new PyQuery instance """ length = len(args) if length == 0: raise ValueError('You must provide at least a selector') if args[0] == '': return self.__class__([]) if len(args) == 1 and \ isinstance(args[0], str) and \ not args[0].startswith('<'): args += 
(self,) result = self.__class__(*args, parent=self, **kwargs) return result # keep original list api prefixed with _ _append = list.append _extend = list.extend # improve pythonic api def __add__(self, other): assert isinstance(other, self.__class__) return self.__class__(self[:] + other[:]) def extend(self, other): """Extend with anoter PyQuery object""" assert isinstance(other, self.__class__) self._extend(other[:]) def items(self, selector=None): """Iter over elements. Return PyQuery objects: >>> d = PyQuery('
foobar
') >>> [i.text() for i in d.items('span')] ['foo', 'bar'] >>> [i.text() for i in d('span').items()] ['foo', 'bar'] """ elems = selector and self(selector) or self for elem in elems: yield self.__class__(elem) def xhtml_to_html(self): """Remove xhtml namespace: >>> doc = PyQuery( ... '') >>> doc [<{http://www.w3.org/1999/xhtml}html>] >>> doc.remove_namespaces() [] """ try: root = self[0].getroottree() except IndexError: pass else: lxml.html.xhtml_to_html(root) return self def remove_namespaces(self): """Remove all namespaces: >>> doc = PyQuery('') >>> doc [<{http://example.com/foo}foo>] >>> doc.remove_namespaces() [] """ try: root = self[0].getroottree() except IndexError: pass else: for el in root.iter('{*}*'): if el.tag.startswith('{'): el.tag = el.tag.split('}', 1)[1] return self def __str__(self): """xml representation of current nodes:: >>> xml = PyQuery( ... '', parser='html_fragments') >>> print(str(xml)) """ if PY3k: return ''.join([etree.tostring(e, encoding=str) for e in self]) else: return ''.join([etree.tostring(e) for e in self]) def __unicode__(self): """xml representation of current nodes""" return unicode('').join([etree.tostring(e, encoding=unicode) \ for e in self]) def __html__(self): """html representation of current nodes:: >>> html = PyQuery( ... '', parser='html_fragments') >>> print(html.__html__()) """ return unicode('').join([lxml.html.tostring(e, encoding=unicode) \ for e in self]) def __repr__(self): r = [] try: for el in self: c = el.get('class') c = c and '.' 
+ '.'.join(c.split(' ')) or '' id = el.get('id') id = id and '#' + id or '' r.append('<%s%s%s>' % (el.tag, id, c)) return '[' + (', '.join(r)) + ']' except AttributeError: if PY3k: return list.__repr__(self) else: for el in self: if isinstance(el, unicode): r.append(el.encode('utf-8')) else: r.append(el) return repr(r) @property def root(self): """return the xml root element """ if self._parent is not no_default: return self._parent.getroottree() return self[0].getroottree() @property def encoding(self): """return the xml encoding of the root element """ root = self.root if root is not None: return self.root.docinfo.encoding ############## # Traversing # ############## def _filter_only(self, selector, elements, reverse=False, unique=False): """Filters the selection set only, as opposed to also including descendants. """ if selector is None: results = elements else: xpath = self._css_to_xpath(selector, 'self::') results = [] for tag in elements: results.extend(tag.xpath(xpath)) if reverse: results.reverse() if unique: result_list = results results = [] for item in result_list: if not item in results: results.append(item) return self.__class__(results, **dict(parent=self)) def parent(self, selector=None): return self._filter_only( selector, [e.getparent() for e in self if e.getparent() is not None], unique=True) def prev(self, selector=None): return self._filter_only( selector, [e.getprevious() for e in self if e.getprevious() is not None]) def next(self, selector=None): return self._filter_only( selector, [e.getnext() for e in self if e.getnext() is not None]) def _traverse(self, method): for e in self: current = getattr(e, method)() while current is not None: yield current current = getattr(current, method)() def _traverse_parent_topdown(self): for e in self: this_list = [] current = e.getparent() while current is not None: this_list.append(current) current = current.getparent() this_list.reverse() for j in this_list: yield j def _nextAll(self): return [e for e in 
self._traverse('getnext')] def nextAll(self, selector=None): """ >>> h = '

Hi

Bye

' >>> d = PyQuery(h) >>> d('p:last').nextAll() [] """ return self._filter_only(selector, self._nextAll()) def _prevAll(self): return [e for e in self._traverse('getprevious')] def prevAll(self, selector=None): """ >>> h = '

Hi

Bye

' >>> d = PyQuery(h) >>> d('p:last').prevAll() [] """ return self._filter_only(selector, self._prevAll(), reverse=True) def siblings(self, selector=None): """ >>> h = '

Hi

Bye

' >>> d = PyQuery(h) >>> d('.hello').siblings() [

, ] >>> d('.hello').siblings('img') [] """ return self._filter_only(selector, self._prevAll() + self._nextAll()) def parents(self, selector=None): """ >>> d = PyQuery('

Hi

Bye

') >>> d('p').parents() [] >>> d('.hello').parents('span') [] >>> d('.hello').parents('p') [] """ return self._filter_only( selector, [e for e in self._traverse_parent_topdown()], unique=True ) def children(self, selector=None): """Filter elements that are direct children of self using optional selector: >>> d = PyQuery('

Hi

Bye

') >>> d [] >>> d.children() [,

] >>> d.children('.hello') [] """ elements = [child for tag in self for child in tag.getchildren()] return self._filter_only(selector, elements) def closest(self, selector=None): """ >>> d = PyQuery( ... '

This is a ' ... 'test

') >>> d('strong').closest('div') [] >>> d('strong').closest('.hello') [] >>> d('strong').closest('form') [] """ result = [] for current in self: while current is not None and \ not self.__class__(current).is_(selector): current = current.getparent() if current is not None: result.append(current) return self.__class__(result, **dict(parent=self)) def contents(self): """ Return contents (with text nodes): >>> d = PyQuery('hello bold') >>> d.contents() # doctest: +ELLIPSIS ['hello ', ] """ results = [] for elem in self: results.extend(elem.xpath('child::text()|child::*')) return self.__class__(results, **dict(parent=self)) def filter(self, selector): """Filter elements in self using selector (string or function): >>> d = PyQuery('

Hi

Bye

') >>> d('p') [,

] >>> d('p').filter('.hello') [] >>> d('p').filter(lambda i: i == 1) [

] >>> d('p').filter(lambda i: PyQuery(this).text() == 'Hi') [] >>> d('p').filter(lambda i, this: PyQuery(this).text() == 'Hi') [] """ if not hasattr(selector, '__call__'): return self._filter_only(selector, self) else: elements = [] args = inspect.getargspec(callback).args try: for i, this in enumerate(self): if len(args) == 1: func_globals(selector)['this'] = this if callback(selector, i, this): elements.append(this) finally: f_globals = func_globals(selector) if 'this' in f_globals: del f_globals['this'] return self.__class__(elements, **dict(parent=self)) def not_(self, selector): """Return elements that don't match the given selector: >>> d = PyQuery('

Hi

Bye

') >>> d('p').not_('.hello') [

] """ exclude = set(self.__class__(selector, self)) return self.__class__([e for e in self if e not in exclude], **dict(parent=self)) def is_(self, selector): """Returns True if selector matches at least one current element, else False: >>> d = PyQuery('

Hi

Bye

') >>> d('p').eq(0).is_('.hello') True >>> d('p').eq(1).is_('.hello') False .. """ return bool(self.__class__(selector, self)) def find(self, selector): """Find elements using selector traversing down from self: >>> m = '

Whoah!

there

' >>> d = PyQuery(m) >>> d('p').find('em') [, ] >>> d('p').eq(1).find('em') [] """ xpath = self._css_to_xpath(selector) results = [child.xpath(xpath) for tag in self \ for child in tag.getchildren()] # Flatten the results elements = [] for r in results: elements.extend(r) return self.__class__(elements, **dict(parent=self)) def eq(self, index): """Return PyQuery of only the element with the provided index:: >>> d = PyQuery('

Hi

Bye

') >>> d('p').eq(0) [] >>> d('p').eq(1) [

] >>> d('p').eq(2) [] .. """ # Use slicing to silently handle out of bounds indexes items = self[index:index + 1] return self.__class__(items, **dict(parent=self)) def each(self, func): """apply func on each node """ try: for i, element in enumerate(self): func_globals(func)['this'] = element if callback(func, i, element) == False: break finally: f_globals = func_globals(func) if 'this' in f_globals: del f_globals['this'] return self def map(self, func): """Returns a new PyQuery after transforming current items with func. func should take two arguments - 'index' and 'element'. Elements can also be referred to as 'this' inside of func:: >>> d = PyQuery('

Hi there

Bye


') >>> d('p').map(lambda i, e: PyQuery(e).text()) ['Hi there', 'Bye'] >>> d('p').map(lambda i, e: len(PyQuery(this).text())) [8, 3] >>> d('p').map(lambda i, e: PyQuery(this).text().split()) ['Hi', 'there', 'Bye'] """ items = [] try: for i, element in enumerate(self): func_globals(func)['this'] = element result = callback(func, i, element) if result is not None: if not isinstance(result, list): items.append(result) else: items.extend(result) finally: f_globals = func_globals(func) if 'this' in f_globals: del f_globals['this'] return self.__class__(items, **dict(parent=self)) @property def length(self): return len(self) def size(self): return len(self) def end(self): """Break out of a level of traversal and return to the parent level. >>> m = '

Whoah!

there

' >>> d = PyQuery(m) >>> d('p').eq(1).find('em').end().end() [

,

] """ return self._parent ############## # Attributes # ############## def attr(self, *args, **kwargs): """Attributes manipulation """ mapping = {'class_': 'class', 'for_': 'for'} attr = value = no_default length = len(args) if length == 1: attr = args[0] attr = mapping.get(attr, attr) elif length == 2: attr, value = args attr = mapping.get(attr, attr) elif kwargs: attr = {} for k, v in kwargs.items(): attr[mapping.get(k, k)] = v else: raise ValueError('Invalid arguments %s %s' % (args, kwargs)) if not self: return None elif isinstance(attr, dict): for tag in self: for key, value in attr.items(): tag.set(key, value) elif value is no_default: return self[0].get(attr) elif value is None or value == '': return self.removeAttr(attr) else: for tag in self: tag.set(attr, value) return self def removeAttr(self, name): """Remove an attribute:: >>> d = PyQuery('

') >>> d.removeAttr('id') [
] .. """ for tag in self: del tag.attrib[name] return self attr = FlexibleElement(pget=attr, pdel=removeAttr) ####### # CSS # ####### def height(self, value=no_default): """set/get height of element """ return self.attr('height', value) def width(self, value=no_default): """set/get width of element """ return self.attr('width', value) def hasClass(self, name): """Return True if element has class:: >>> d = PyQuery('
') >>> d.hasClass('myclass') True .. """ return self.is_('.%s' % name) def addClass(self, value): """Add a css class to elements:: >>> d = PyQuery('
') >>> d.addClass('myclass') [] .. """ for tag in self: values = value.split(' ') classes = (tag.get('class') or '').split() classes += [v for v in values if v not in classes] tag.set('class', ' '.join(classes)) return self def removeClass(self, value): """Remove a css class from elements:: >>> d = PyQuery('
') >>> d.removeClass('myclass') [
] .. """ for tag in self: values = value.split(' ') classes = set((tag.get('class') or '').split()) classes.difference_update(values) classes.difference_update(['']) tag.set('class', ' '.join(classes)) return self def toggleClass(self, value): """Toggle a css class to elements >>> d = PyQuery('
') >>> d.toggleClass('myclass') [] """ for tag in self: values = value.split(' ') classes = (tag.get('class') or '').split() values_to_add = [v for v in values if v not in classes] values_to_del = [v for v in values if v in classes] classes = [v for v in classes if v not in values_to_del] classes += values_to_add tag.set('class', ' '.join(classes)) return self def css(self, *args, **kwargs): """css attributes manipulation """ attr = value = no_default length = len(args) if length == 1: attr = args[0] elif length == 2: attr, value = args elif kwargs: attr = kwargs else: raise ValueError('Invalid arguments %s %s' % (args, kwargs)) if isinstance(attr, dict): for tag in self: stripped_keys = [key.strip().replace('_', '-') for key in attr.keys()] current = [el.strip() for el in (tag.get('style') or '').split(';') if el.strip() and not el.split(':')[0].strip() in stripped_keys] for key, value in attr.items(): key = key.replace('_', '-') current.append('%s: %s' % (key, value)) tag.set('style', '; '.join(current)) elif isinstance(value, basestring): attr = attr.replace('_', '-') for tag in self: current = [el.strip() for el in (tag.get('style') or '').split(';') if el.strip() and not el.split(':')[0].strip() == attr.strip()] current.append('%s: %s' % (attr, value)) tag.set('style', '; '.join(current)) return self css = FlexibleElement(pget=css, pset=css) ################### # CORE UI EFFECTS # ################### def hide(self): """add display:none to elements style >>> print(PyQuery('
').hide())
""" return self.css('display', 'none') def show(self): """add display:block to elements style >>> print(PyQuery('
').show())
""" return self.css('display', 'block') ######## # HTML # ######## def val(self, value=no_default): """Set the attribute value:: >>> d = PyQuery('') >>> d.val('Youhou') [] Get the attribute value:: >>> d.val() 'Youhou' """ return self.attr('value', value) or None def html(self, value=no_default, **kwargs): """Get or set the html representation of sub nodes. Get the text value:: >>> d = PyQuery('
toto
') >>> print(d.html()) toto Extra args are passed to ``lxml.etree.tostring``:: >>> d = PyQuery('
') >>> print(d.html()) >>> print(d.html(method='html')) Set the text value:: >>> d.html('Youhou !') [
] >>> print(d)
Youhou !
""" if value is no_default: if not self: return None tag = self[0] children = tag.getchildren() if not children: return tag.text html = tag.text or '' if 'encoding' not in kwargs: kwargs['encoding'] = unicode html += unicode('').join([etree.tostring(e, **kwargs) \ for e in children]) return html else: if isinstance(value, self.__class__): new_html = unicode(value) elif isinstance(value, basestring): new_html = value elif not value: new_html = '' else: raise ValueError(type(value)) for tag in self: for child in tag.getchildren(): tag.remove(child) root = fromstring( unicode('') + new_html + unicode(''), self.parser)[0] children = root.getchildren() if children: tag.extend(children) tag.text = root.text tag.tail = root.tail return self def outerHtml(self): """Get the html representation of the first selected element:: >>> d = PyQuery('
toto rocks
') >>> print(d('span')) toto rocks >>> print(d('span').outerHtml()) toto >>> S = PyQuery('

Only me & myself

') >>> print(S('b').outerHtml()) me .. """ if not self: return None e0 = self[0] if e0.tail: e0 = deepcopy(e0) e0.tail = '' return lxml.html.tostring(e0, encoding=unicode) def text(self, value=no_default): """Get or set the text representation of sub nodes. Get the text value:: >>> doc = PyQuery('
tototata
') >>> print(doc.text()) toto tata Set the text value:: >>> doc.text('Youhou !') [
] >>> print(doc)
Youhou !
""" if value is no_default: if not self: return None text = [] def add_text(tag, no_tail=False): if tag.text and not isinstance(tag, lxml.etree._Comment): text.append(tag.text) for child in tag.getchildren(): add_text(child) if not no_tail and tag.tail: text.append(tag.tail) for tag in self: add_text(tag, no_tail=True) return ' '.join([t.strip() for t in text if t.strip()]) for tag in self: for child in tag.getchildren(): tag.remove(child) tag.text = value return self ################ # Manipulating # ################ def _get_root(self, value): if isinstance(value, basestring): root = fromstring(unicode('') + value + unicode(''), self.parser)[0] elif isinstance(value, etree._Element): root = self.__class__(value) elif isinstance(value, PyQuery): root = value else: raise TypeError( 'Value must be string, PyQuery or Element. Got %r' % value) if hasattr(root, 'text') and isinstance(root.text, basestring): root_text = root.text else: root_text = '' return root, root_text def append(self, value): """append value to each nodes """ root, root_text = self._get_root(value) for i, tag in enumerate(self): if len(tag) > 0: # if the tag has children last_child = tag[-1] if not last_child.tail: last_child.tail = '' last_child.tail += root_text else: if not tag.text: tag.text = '' tag.text += root_text if i > 0: root = deepcopy(list(root)) tag.extend(root) root = tag[-len(root):] return self def appendTo(self, value): """append nodes to value """ value.append(self) return self def prepend(self, value): """prepend value to nodes """ root, root_text = self._get_root(value) for i, tag in enumerate(self): if not tag.text: tag.text = '' if len(root) > 0: root[-1].tail = tag.text tag.text = root_text else: tag.text = root_text + tag.text if i > 0: root = deepcopy(list(root)) tag[:0] = root root = tag[:len(root)] return self def prependTo(self, value): """prepend nodes to value """ value.prepend(self) return self def after(self, value): """add value after nodes """ root, root_text = 
self._get_root(value) for i, tag in enumerate(self): if not tag.tail: tag.tail = '' tag.tail += root_text if i > 0: root = deepcopy(list(root)) parent = tag.getparent() index = parent.index(tag) + 1 parent[index:index] = root root = parent[index:len(root)] return self def insertAfter(self, value): """insert nodes after value """ value.after(self) return self def before(self, value): """insert value before nodes """ root, root_text = self._get_root(value) for i, tag in enumerate(self): previous = tag.getprevious() if previous != None: if not previous.tail: previous.tail = '' previous.tail += root_text else: parent = tag.getparent() if not parent.text: parent.text = '' parent.text += root_text if i > 0: root = deepcopy(list(root)) parent = tag.getparent() index = parent.index(tag) parent[index:index] = root root = parent[index:len(root)] return self def insertBefore(self, value): """insert nodes before value """ value.before(self) return self def wrap(self, value): """A string of HTML that will be created on the fly and wrapped around each target: >>> d = PyQuery('youhou') >>> d.wrap('
') [
] >>> print(d)
youhou
""" assert isinstance(value, basestring) value = fromstring(value)[0] nodes = [] for tag in self: wrapper = deepcopy(value) # FIXME: using iterchildren is probably not optimal if not wrapper.getchildren(): wrapper.append(deepcopy(tag)) else: childs = [c for c in wrapper.iterchildren()] child = childs[-1] child.append(deepcopy(tag)) nodes.append(wrapper) parent = tag.getparent() if parent is not None: for t in parent.iterchildren(): if t is tag: t.addnext(wrapper) parent.remove(t) break self[:] = nodes return self def wrapAll(self, value): """Wrap all the elements in the matched set into a single wrapper element:: >>> d = PyQuery('
Heyyou !
') >>> print(d('span').wrapAll('
'))
Heyyou !
.. """ if not self: return self assert isinstance(value, basestring) value = fromstring(value)[0] wrapper = deepcopy(value) if not wrapper.getchildren(): child = wrapper else: childs = [c for c in wrapper.iterchildren()] child = childs[-1] replace_childs = True parent = self[0].getparent() if parent is None: parent = no_default # add nodes to wrapper and check parent for tag in self: child.append(deepcopy(tag)) if tag.getparent() is not parent: replace_childs = False # replace nodes i parent if possible if parent is not no_default and replace_childs: childs = [c for c in parent.iterchildren()] if len(childs) == len(self): for tag in self: parent.remove(tag) parent.append(wrapper) self[:] = [wrapper] return self def replaceWith(self, value): """replace nodes by value """ if hasattr(value, '__call__'): for i, element in enumerate(self): self.__class__(element).before( value(i, element) + (element.tail or '')) parent = element.getparent() parent.remove(element) else: for tag in self: self.__class__(tag).before(value + (tag.tail or '')) parent = tag.getparent() parent.remove(tag) return self def replaceAll(self, expr): """replace nodes by expr """ if self._parent is no_default: raise ValueError( 'replaceAll can only be used with an object with parent') self._parent(expr).replaceWith(self) return self def clone(self): """return a copy of nodes """ return PyQuery([deepcopy(tag) for tag in self]) def empty(self): """remove nodes content """ for tag in self: tag.text = None tag[:] = [] return self def remove(self, expr=no_default): """Remove nodes: >>> h = '
Maybe she does NOT know
' >>> d = PyQuery(h) >>> d('strong').remove() [] >>> print(d)
Maybe she does know
""" if expr is no_default: for tag in self: parent = tag.getparent() if parent is not None: if tag.tail: prev = tag.getprevious() if prev is None: if not parent.text: parent.text = '' parent.text += ' ' + tag.tail else: if not prev.tail: prev.tail = '' prev.tail += ' ' + tag.tail parent.remove(tag) else: results = self.__class__(expr, self) results.remove() return self class Fn(object): """Hook for defining custom function (like the jQuery.fn): .. sourcecode:: python >>> fn = lambda: this.map(lambda i, el: PyQuery(this).outerHtml()) >>> PyQuery.fn.listOuterHtml = fn >>> S = PyQuery( ... '
  1. Coffee
  2. Tea
  3. Milk
') >>> S('li').listOuterHtml() ['
  • Coffee
  • ', '
  • Tea
  • ', '
  • Milk
  • '] """ def __setattr__(self, name, func): def fn(self, *args): func_globals(func)['this'] = self return func(*args) fn.__name__ = name setattr(PyQuery, name, fn) fn = Fn() ##################################################### # Additional methods that are not in the jQuery API # ##################################################### @property def base_url(self): """Return the url of current html document or None if not available. """ if self._base_url is not None: return self._base_url if self._parent is not no_default: return self._parent.base_url def make_links_absolute(self, base_url=None): """Make all links absolute. """ if base_url is None: base_url = self.base_url if base_url is None: raise ValueError('You need a base URL to make your links ' 'absolute. It can be provided by the base_url parameter.') self('a').each(lambda: self(this).attr('href', urljoin(base_url, self(this).attr('href')))) # NOQA return self pyquery-1.2.4/pyquery/rules.py0000664000175000017500000000201512055472046016703 0ustar gawelgawel00000000000000# -*- coding: utf-8 -*- try: from deliverance.pyref import PyReference from deliverance import rules from ajax import PyQuery as pq except ImportError: pass else: class PyQuery(rules.AbstractAction): """Python function""" name = 'py' def __init__(self, source_location, pyref): self.source_location = source_location self.pyref = pyref def apply(self, content_doc, theme_doc, resource_fetcher, log): self.pyref(pq([content_doc]), pq([theme_doc]), resource_fetcher, log) @classmethod def from_xml(cls, el, source_location): """Parses and instantiates the class from an element""" pyref = PyReference.parse_xml( el, source_location=source_location, default_function='transform') return cls(source_location, pyref) rules._actions['pyquery'] = PyQuery def deliverance_proxy(): import deliverance.proxycommand deliverance.proxycommand.main() pyquery-1.2.4/pyquery/__init__.py0000664000175000017500000000042112055741454017311 0ustar gawelgawel00000000000000#-*-
coding:utf-8 -*- # # Copyright (C) 2008 - Olivier Lauzanne # # Distributed under the BSD license, see LICENSE.txt try: import webob import restkit except ImportError: from .pyquery import PyQuery else: from .ajax import PyQuery pyquery-1.2.4/pyquery/test.html0000664000175000017500000000021712055744553017053 0ustar gawelgawel00000000000000

    Hello world !

    hello python !

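The ajax.py module that follows layers pyquery over WSGI: PyQuery.get() builds a GET environ, wraps it in a WebOb Request, and runs it against an application. A pure-stdlib sketch of that round-trip follows; `app` and `wsgi_get` here are illustrative stand-ins, not pyquery or WebOb API.

```python
from io import BytesIO
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Trivial WSGI app, similar to the `application` fixture in test.py below.
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<html><body><pre>Yeah !</pre></body></html>']


def wsgi_get(app, path_info):
    # Roughly what PyQuery.get() does through WebOb: build a GET environ,
    # call the app, and collect the status line and body bytes.
    environ = {
        'REQUEST_METHOD': 'GET',
        'PATH_INFO': path_info,
        'CONTENT_LENGTH': '0',
        'wsgi.input': BytesIO(),
    }
    setup_testing_defaults(environ)  # fill in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured['status'] = status

    body = b''.join(app(environ, start_response))
    return captured['status'], body


status, body = wsgi_get(app, '/')
```

The real `_wsgi_get` additionally strips caching/range headers from the environ and only keeps the body for parsing when the status is not 4xx/5xx and the content type is text/html.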
    pyquery-1.2.4/pyquery/ajax.py0000664000175000017500000000517212055760343016503 0ustar gawelgawel00000000000000# -*- coding: utf-8 -*- from .pyquery import PyQuery as Base from .pyquery import no_default from webob import Request, Response try: from restkit.contrib.wsgi_proxy import HostProxy except ImportError: HostProxy = no_default # NOQA class PyQuery(Base): def __init__(self, *args, **kwargs): if 'response' in kwargs: self.response = kwargs.pop('response') else: self.response = Response() if 'app' in kwargs: self.app = kwargs.pop('app') if len(args) == 0: args = [[]] else: self.app = no_default Base.__init__(self, *args, **kwargs) if self._parent is not no_default: self.app = self._parent.app def _wsgi_get(self, path_info, **kwargs): if path_info.startswith('/'): if 'app' in kwargs: app = kwargs.pop('app') elif self.app is not no_default: app = self.app else: raise ValueError('There is no app available') else: if HostProxy is not no_default: app = HostProxy(path_info) path_info = '/' else: raise ImportError('restkit is not installed') environ = kwargs.pop('environ').copy() environ.update(kwargs) # unsuported (came from Deliverance) for key in ['HTTP_ACCEPT_ENCODING', 'HTTP_IF_MATCH', 'HTTP_IF_UNMODIFIED_SINCE', 'HTTP_RANGE', 'HTTP_IF_RANGE']: if key in environ: del environ[key] req = Request.blank(path_info) req.environ.update(environ) resp = req.get_response(app) status = resp.status.split() ctype = resp.content_type.split(';')[0] if status[0] not in '45' and ctype == 'text/html': body = resp.body else: body = [] result = self.__class__(body, parent=self._parent, app=self.app, # always return self.app response=resp) return result def get(self, path_info, **kwargs): """GET a path from wsgi app or url """ environ = kwargs.setdefault('environ', {}) environ['REQUEST_METHOD'] = 'GET' environ['CONTENT_LENGTH'] = '0' return self._wsgi_get(path_info, **kwargs) def post(self, path_info, **kwargs): """POST a path from wsgi app or url """ environ = 
kwargs.setdefault('environ', {}) environ['REQUEST_METHOD'] = 'POST' return self._wsgi_get(path_info, **kwargs) pyquery-1.2.4/pyquery/openers.py0000664000175000017500000000332212055756575017242 0ustar gawelgawel00000000000000# -*- coding: utf-8 -*- import sys PY3k = sys.version_info >= (3,) if PY3k: from urllib.request import urlopen from urllib.parse import urlencode from urllib.parse import urljoin basestring = (str, bytes) else: from urllib2 import urlopen # NOQA from urllib import urlencode # NOQA from urlparse import urljoin # NOQA try: import requests HAS_REQUEST = True except ImportError: HAS_REQUEST = False allowed_args = ( 'auth', 'data', 'headers', 'verify', 'cert', 'config', 'hooks', 'proxies') def _query(url, method, kwargs): data = None if 'data' in kwargs: data = kwargs.pop('data') if type(data) in (dict, list, tuple): data = urlencode(data) if isinstance(method, basestring) and \ method.lower() == 'get' and data: if '?' not in url: url += '?' elif url[-1] not in ('?', '&'): url += '&' url += data data = None if data and PY3k: data = data.encode('utf-8') return url, data def _requests(url, kwargs): encoding = kwargs.get('encoding') method = kwargs.get('method', 'get').lower() meth = getattr(requests, str(method)) if method == 'get': url, data = _query(url, method, kwargs) kw = {} for k in allowed_args: if k in kwargs: kw[k] = kwargs[k] resp = meth(url=url, **kw) if encoding: resp.encoding = encoding html = resp.content return html def _urllib(url, kwargs): method = kwargs.get('method') url, data = _query(url, method, kwargs) return urlopen(url, data) def url_opener(url, kwargs): if HAS_REQUEST: return _requests(url, kwargs) return _urllib(url, kwargs) pyquery-1.2.4/pyquery/test.py0000664000175000017500000004325412055762746016553 0ustar gawelgawel00000000000000#-*- coding:utf-8 -*- # # Copyright (C) 2008 - Olivier Lauzanne # # Distributed under the BSD license, see LICENSE.txt from webob import Request, Response, exc from lxml import etree import 
unittest import doctest import socket import sys import os PY3k = sys.version_info >= (3,) if PY3k: from io import StringIO import pyquery from pyquery.pyquery import PyQuery as pq from pyquery.ajax import PyQuery as pqa from http.client import HTTPConnection text_type = str def u(value, encoding): return str(value) def b(value): return value.encode('utf-8') else: from cStringIO import StringIO # NOQA import pyquery # NOQA from httplib import HTTPConnection # NOQA from pyquery import PyQuery as pq # NOQA from ajax import PyQuery as pqa # NOQA text_type = unicode def u(value, encoding): # NOQA return unicode(value, encoding) def b(value): # NOQA return str(value) def not_py3k(func): if not PY3k: return func socket.setdefaulttimeout(5) try: conn = HTTPConnection("packages.python.org:80") conn.request("GET", "/pyquery/") response = conn.getresponse() except (socket.timeout, socket.error): GOT_NET = False else: GOT_NET = True try: import requests # NOQA HAS_REQUEST = True except ImportError: HAS_REQUEST = False def with_net(func): if GOT_NET: return func dirname = os.path.dirname(os.path.abspath(pyquery.__file__)) docs = os.path.join(os.path.dirname(dirname), 'docs') path_to_html_file = os.path.join(dirname, 'test.html') def input_app(environ, start_response): resp = Response() req = Request(environ) if req.path_info == '/': resp.body = '' elif req.path_info == '/submit': resp.body = '' else: resp.body = '' return resp(environ, start_response) class TestReadme(doctest.DocFileCase): path = os.path.join(dirname, '..', 'README.rst') def __init__(self, *args, **kwargs): parser = doctest.DocTestParser() fd = open(self.path) doc = fd.read() fd.close() test = parser.get_doctest(doc, globals(), '', self.path, 0) doctest.DocFileCase.__init__(self, test, optionflags=doctest.ELLIPSIS) def setUp(self): test = self._dt_test test.globs.update(globals()) for filename in os.listdir(docs): if filename.endswith('.txt'): if not GOT_NET and filename in ('ajax.txt', 'tips.txt', 
'scrap.txt'): continue if not HAS_REQUEST and filename in ('scrap.txt',): continue if PY3k and filename in ('ajax.txt',): continue klass_name = 'Test%s' % filename.replace('.txt', '').title() path = os.path.join(docs, filename) exec('%s = type("%s", (TestReadme,), dict(path=path))' % ( klass_name, klass_name)) class TestTests(doctest.DocFileCase): path = os.path.join(dirname, 'tests.txt') def __init__(self, *args, **kwargs): parser = doctest.DocTestParser() fd = open(self.path) doc = fd.read() fd.close() test = parser.get_doctest(doc, globals(), '', self.path, 0) doctest.DocFileCase.__init__(self, test, optionflags=doctest.ELLIPSIS) class TestUnicode(unittest.TestCase): def test_unicode(self): xml = pq(u("

    é

    ", 'utf-8')) self.assertEqual(type(xml.html()), text_type) if PY3k: self.assertEqual(str(xml), '

    é

    ') else: self.assertEqual(unicode(xml), u("

    é

    ", 'utf-8')) self.assertEqual(str(xml), '

    é

    ') class TestAttributeCase(unittest.TestCase): def test_xml_upper_element_name(self): xml = pq('foo', parser='xml') self.assertEqual(len(xml('X')), 1) self.assertEqual(len(xml('x')), 0) def test_html_upper_element_name(self): xml = pq('foo', parser='html') self.assertEqual(len(xml('X')), 1) self.assertEqual(len(xml('x')), 1) class TestSelector(unittest.TestCase): klass = pq html = """
    node1
    node2
    node3
    """ html2 = """
    node1
    """ html3 = """
    node1
    node2
    node3
    """ html4 = """
    """ html5 = """

    Heading 1

    Heading 2

    Heading 3

    Heading 4

    Heading 5
    Heading 6
    """ def test_get_root(self): doc = pq(b('

    ')) self.assertEqual(isinstance(doc.root, etree._ElementTree), True) self.assertEqual(doc.encoding, 'UTF-8') def test_selector_from_doc(self): doc = etree.fromstring(self.html) assert len(self.klass(doc)) == 1 assert len(self.klass('div', doc)) == 3 assert len(self.klass('div#node2', doc)) == 1 def test_selector_from_html(self): assert len(self.klass(self.html)) == 1 assert len(self.klass('div', self.html)) == 3 assert len(self.klass('div#node2', self.html)) == 1 def test_selector_from_obj(self): e = self.klass(self.html) assert len(e('div')) == 3 assert len(e('div#node2')) == 1 def test_selector_from_html_from_obj(self): e = self.klass(self.html) assert len(e('div', self.html2)) == 1 assert len(e('div#node2', self.html2)) == 0 def test_class(self): e = self.klass(self.html) assert isinstance(e, self.klass) n = e('div', self.html2) assert isinstance(n, self.klass) assert n._parent is e def test_pseudo_classes(self): e = self.klass(self.html) self.assertEqual(e('div:first').text(), 'node1') self.assertEqual(e('div:last').text(), 'node3') self.assertEqual(e('div:even').text(), 'node1 node3') self.assertEqual(e('div div:even').text(), None) self.assertEqual(e('body div:even').text(), 'node1 node3') self.assertEqual(e('div:gt(0)').text(), 'node2 node3') self.assertEqual(e('div:lt(1)').text(), 'node1') self.assertEqual(e('div:eq(2)').text(), 'node3') #test on the form e = self.klass(self.html4) assert len(e(':disabled')) == 1 assert len(e('input:enabled')) == 9 assert len(e(':selected')) == 1 assert len(e(':checked')) == 2 assert len(e(':file')) == 1 assert len(e(':input')) == 12 assert len(e(':button')) == 2 assert len(e(':radio')) == 3 assert len(e(':checkbox')) == 3 #test on other elements e = self.klass(self.html5) assert len(e(":header")) == 6 assert len(e(":parent")) == 2 assert len(e(":empty")) == 6 assert len(e(":contains('Heading')")) == 6 def test_on_the_fly_dom_creation(self): e = self.klass(self.html) assert e('

    Hello world

    ').text() == 'Hello world' assert e('').text() == None class TestTraversal(unittest.TestCase): klass = pq html = """
    node1
    node2 booyah
    """ def test_filter(self): assert len(self.klass('div', self.html).filter('.node3')) == 1 assert len(self.klass('div', self.html).filter('#node2')) == 1 assert len(self.klass('div', self.html).filter(lambda i: i == 0)) == 1 d = pq('

    Hello warming world

    ') self.assertEqual(d('strong').filter(lambda el: True), []) def test_not(self): assert len(self.klass('div', self.html).not_('.node3')) == 1 def test_is(self): assert self.klass('div', self.html).is_('.node3') assert not self.klass('div', self.html).is_('.foobazbar') def test_find(self): assert len(self.klass('#node1', self.html).find('span')) == 1 assert len(self.klass('#node2', self.html).find('span')) == 2 assert len(self.klass('div', self.html).find('span')) == 3 def test_each(self): doc = self.klass(self.html) doc('span').each(lambda: doc(this).wrap("")) # NOQA assert len(doc('em')) == 3 def test_map(self): def ids_minus_one(i, elem): return int(self.klass(elem).attr('id')[-1]) - 1 assert self.klass('div', self.html).map(ids_minus_one) == [0, 1] d = pq('

    Hello warming world

    ') self.assertEqual(d('strong').map(lambda i, el: pq(this).text()), []) # NOQA def test_end(self): assert len(self.klass('div', self.html).find('span').end()) == 2 assert len(self.klass('#node2', self.html).find('span').end()) == 1 def test_closest(self): assert len(self.klass('#node1 span', self.html).closest('body')) == 1 assert self.klass('#node2', self.html).closest('.node3').attr('id') \ == 'node2' assert self.klass('.node3', self.html).closest('form') == [] class TestOpener(unittest.TestCase): def test_custom_opener(self): def opener(url): return '
    ' doc = pq(url='http://example.com', opener=opener) assert len(doc('.node')) == 1, doc class TestComment(unittest.TestCase): def test_comment(self): doc = pq('
    bar
    ') self.assertEqual(doc.text(), 'bar') class TestCallback(unittest.TestCase): html = """
    1. Coffee
    2. Tea
    3. Milk
    """ def test_S_this_inside_callback(self): S = pq(self.html) self.assertEqual(S('li').map(lambda i, el: S(this).html()), # NOQA ['Coffee', 'Tea', 'Milk']) def test_parameterless_callback(self): S = pq(self.html) self.assertEqual(S('li').map(lambda: S(this).html()), # NOQA ['Coffee', 'Tea', 'Milk']) def application(environ, start_response): req = Request(environ) response = Response() if req.method == 'GET': response.body = b('
    Yeah !
    ') else: response.body = b('
    Yeah !') return response(environ, start_response) def secure_application(environ, start_response): if 'REMOTE_USER' not in environ: return exc.HTTPUnauthorized('vomis')(environ, start_response) return application(environ, start_response) class TestAjaxSelector(TestSelector): klass = pqa @not_py3k @with_net def test_proxy(self): e = self.klass([]) val = e.get('http://packages.python.org/pyquery/') assert len(val('body')) == 1, (str(val.response), val) def test_get(self): e = self.klass(app=application) val = e.get('/') assert len(val('pre')) == 1, val def test_secure_get(self): e = self.klass(app=secure_application) val = e.get('/', environ=dict(REMOTE_USER='gawii')) assert len(val('pre')) == 1, val val = e.get('/', REMOTE_USER='gawii') assert len(val('pre')) == 1, val def test_secure_get_not_authorized(self): e = self.klass(app=secure_application) val = e.get('/') assert len(val('pre')) == 0, val def test_post(self): e = self.klass(app=application) val = e.post('/') assert len(val('a')) == 1, val def test_subquery(self): e = self.klass(app=application) n = e('div') val = n.post('/') assert len(val('a')) == 1, val class TestManipulating(unittest.TestCase): html = ''' ''' def test_remove(self): d = pq(self.html) d('img').remove() val = d('a:first').html() assert val == 'Test My link text', repr(val) val = d('a:last').html() assert val == ' My link text 2', repr(val) class TestHTMLParser(unittest.TestCase): xml = "
    I'm valid XML
    " html = '''
    TestimageMy link text imageMy link text 2 Behind you, a three-headed HTML‐Entity!
    ''' def test_parser_persistance(self): d = pq(self.xml, parser='xml') self.assertRaises(etree.XMLSyntaxError, lambda: d.after(self.html)) d = pq(self.xml, parser='html') d.after(self.html) # this should not fail @not_py3k def test_soup_parser(self): d = pq('Hello</head><body onload=crash()>Hi all<p>', parser='soup') self.assertEqual(str(d), ( '<html><meta/><head><title>Hello' 'Hi all

    ')) def test_replaceWith(self): expected = '''

    TestimageMy link text imageMy link text 2 Behind you, a three-headed HTML&dash;Entity!
    ''' d = pq(self.html) d('img').replaceWith('image') val = d.__html__() assert val == expected, (repr(val), repr(expected)) def test_replaceWith_with_function(self): expected = '''
    TestimageMy link text imageMy link text 2 Behind you, a three-headed HTML&dash;Entity!
    ''' d = pq(self.html) d('a').replaceWith(lambda i, e: pq(e).html()) val = d.__html__() assert val == expected, (repr(val), repr(expected)) class TestXMLNamespace(unittest.TestCase): xml = ''' What 123 ''' xhtml = '''
    What
    ''' def test_selector(self): expected = 'What' d = pq(b(self.xml), parser='xml') val = d('bar|blah', namespaces={'bar': 'http://example.com/bar'}).text() self.assertEqual(repr(val), repr(expected)) def test_selector_with_xml(self): expected = 'What' d = pq('bar|blah', b(self.xml), parser='xml', namespaces={'bar': 'http://example.com/bar'}) val = d.text() self.assertEqual(repr(val), repr(expected)) def test_selector_html(self): expected = 'What' d = pq('blah', self.xml.split('?>', 1)[1], parser='html') val = d.text() self.assertEqual(repr(val), repr(expected)) def test_xhtml_namespace(self): expected = 'What' d = pq(b(self.xhtml), parser='xml') d.xhtml_to_html() val = d('div').text() self.assertEqual(repr(val), repr(expected)) def test_xhtml_namespace_html_parser(self): expected = 'What' d = pq(self.xhtml, parser='html') d.xhtml_to_html() val = d('div').text() self.assertEqual(repr(val), repr(expected)) def test_remove_namespaces(self): expected = 'What' d = pq(b(self.xml), parser='xml').remove_namespaces() val = d('blah').text() self.assertEqual(repr(val), repr(expected)) class TestWebScrapping(unittest.TestCase): @with_net def test_get(self): d = pq('http://duckduckgo.com/', {'q': 'foo'}, method='get') print(d) self.assertEqual(d('input[name=q]:last').val(), 'foo') @with_net def test_post(self): d = pq('http://duckduckgo.com/', {'q': 'foo'}, method='post') self.assertEqual(d('input[name=q]:last').val(), 'foo') if __name__ == '__main__': fails, total = unittest.main() if fails == 0: print('OK') pyquery-1.2.4/pyquery/cssselectpatch.py0000664000175000017500000001701712055472046020571 0ustar gawelgawel00000000000000#-*- coding:utf-8 -*- # # Copyright (C) 2008 - Olivier Lauzanne # # Distributed under the BSD license, see LICENSE.txt from cssselect import xpath as cssselect_xpath from cssselect.xpath import ExpressionError class JQueryTranslator(cssselect_xpath.HTMLTranslator): """This class is used to implement the css pseudo classes (:first, :last, ...) 
that are not defined in the css standard, but are defined in the jquery API.
    """

    def xpath_first_pseudo(self, xpath):
        """Matches the first selected element.
        """
        xpath.add_post_condition('position() = 1')
        return xpath

    def xpath_last_pseudo(self, xpath):
        """Matches the last selected element.
        """
        xpath.add_post_condition('position() = last()')
        return xpath

    def xpath_even_pseudo(self, xpath):
        """Matches even elements, zero-indexed.
        """
        # the first element is 1 in xpath and 0 in python and js
        xpath.add_post_condition('position() mod 2 = 1')
        return xpath

    def xpath_odd_pseudo(self, xpath):
        """Matches odd elements, zero-indexed.
        """
        xpath.add_post_condition('position() mod 2 = 0')
        return xpath

    def xpath_checked_pseudo(self, xpath):
        """Matches all input elements that are checked.
        """
        xpath.add_condition("@checked and name(.) = 'input'")
        return xpath

    def xpath_selected_pseudo(self, xpath):
        """Matches all elements that are selected.
        """
        xpath.add_condition("@selected and name(.) = 'option'")
        return xpath

    def xpath_disabled_pseudo(self, xpath):
        """Matches all elements that are disabled.
        """
        xpath.add_condition("@disabled")
        return xpath

    def xpath_enabled_pseudo(self, xpath):
        """Matches all elements that are enabled.
        """
        xpath.add_condition("not(@disabled) and name(.) = 'input'")
        return xpath

    def xpath_file_pseudo(self, xpath):
        """Matches all input elements of type file.
        """
        xpath.add_condition("@type = 'file' and name(.) = 'input'")
        return xpath

    def xpath_input_pseudo(self, xpath):
        """Matches all input elements.
        """
        xpath.add_condition("(name(.) = 'input' or name(.) = 'select') " +
                            "or (name(.) = 'textarea' or name(.) = 'button')")
        return xpath

    def xpath_button_pseudo(self, xpath):
        """Matches all button input elements and the button element.
        """
        xpath.add_condition("(@type = 'button' and name(.) = 'input') " +
                            "or name(.) = 'button'")
        return xpath

    def xpath_radio_pseudo(self, xpath):
        """Matches all radio input elements.
        """
        xpath.add_condition("@type = 'radio' and name(.)
= 'input'") return xpath def xpath_text_pseudo(self, xpath): """Matches all text input elements. """ xpath.add_condition("@type = 'text' and name(.) = 'input'") return xpath def xpath_checkbox_pseudo(self, xpath): """Matches all checkbox input elements. """ xpath.add_condition("@type = 'checkbox' and name(.) = 'input'") return xpath def xpath_password_pseudo(self, xpath): """Matches all password input elements. """ xpath.add_condition("@type = 'password' and name(.) = 'input'") return xpath def xpath_submit_pseudo(self, xpath): """Matches all submit input elements. """ xpath.add_condition("@type = 'submit' and name(.) = 'input'") return xpath def xpath_image_pseudo(self, xpath): """Matches all image input elements. """ xpath.add_condition("@type = 'image' and name(.) = 'input'") return xpath def xpath_reset_pseudo(self, xpath): """Matches all reset input elements. """ xpath.add_condition("@type = 'reset' and name(.) = 'input'") return xpath def xpath_header_pseudo(self, xpath): """Matches all header elelements (h1, ..., h6) """ # this seems kind of brute-force, is there a better way? xpath.add_condition( "(name(.) = 'h1' or name(.) = 'h2' or name (.) = 'h3') " + "or (name(.) = 'h4' or name (.) = 'h5' or name(.) = 'h6')") return xpath def xpath_parent_pseudo(self, xpath): """Match all elements that contain other elements """ xpath.add_condition("count(child::*) > 0") return xpath def xpath_empty_pseudo(self, xpath): """Match all elements that do not contain other elements """ xpath.add_condition("count(child::*) = 0") return xpath def xpath_eq_function(self, xpath, function): """Matches a single element by its index. 
""" if function.argument_types() != ['NUMBER']: raise ExpressionError( "Expected a single integer for :eq(), got %r" % function.arguments ) value = int(function.arguments[0].value) xpath.add_post_condition( 'position() = %s' % (value + 1)) return xpath def xpath_gt_function(self, xpath, function): """Matches all elements with an index over the given one. """ if function.argument_types() != ['NUMBER']: raise ExpressionError( "Expected a single integer for :gt(), got %r" % function.arguments ) value = int(function.arguments[0].value) xpath.add_post_condition( 'position() > %s' % (value + 1)) return xpath def xpath_lt_function(self, xpath, function): """Matches all elements with an index below the given one. """ if function.argument_types() != ['NUMBER']: raise ExpressionError( "Expected a single integer for :gt(), got %r" % function.arguments ) value = int(function.arguments[0].value) xpath.add_post_condition( 'position() < %s' % (value + 1)) return xpath def xpath_contains_function(self, xpath, function): """Matches all elements that contain the given text """ if function.argument_types() != ['STRING']: raise ExpressionError( "Expected a single string for :contains(), got %r" % function.arguments ) value = str(function.arguments[0].value) xpath.add_post_condition( "contains(text(), '%s')" % value) return xpath XPathExprOrig = cssselect_xpath.XPathExpr class XPathExpr(XPathExprOrig): def __init__(self, path='', element='*', condition='', star_prefix=False): self.path = path self.element = element self.condition = condition self.post_condition = None def add_post_condition(self, post_condition): if self.post_condition: self.post_condition = '%s and (%s)' % (self.post_condition, post_condition) else: self.post_condition = post_condition def __str__(self): path = XPathExprOrig.__str__(self) if self.post_condition: path = '%s[%s]' % (path, self.post_condition) return path def join(self, combiner, other): res = XPathExprOrig.join(self, combiner, other) self.post_condition 
= other.post_condition return res cssselect_xpath.XPathExpr = XPathExpr pyquery-1.2.4/pyquery/tests.txt0000664000175000017500000000140312055472046017102 0ustar gawelgawel00000000000000 Assume spaces normalization:: >>> pq('
    ').text() '' >>> print(pq('
    • toto
    • tata
    ').text()) toto tata Complex wrapping:: >>> d = pq('
    youhou
    ') >>> s = d('span') >>> s is d False >>> s.wrap('
    ') [
    ] We get the original doc with new node:: >>> print(d)
    youhou
    Complex wrapAll:: >>> doc = pq('
    Heyyou !
    ') >>> s = doc('span') >>> s.wrapAll('
    ') [] >>> print(doc)
    Heyyou !
    pyquery-1.2.4/setup.cfg0000664000175000017500000000052212055767744015317 0ustar gawelgawel00000000000000[nosetests] with-doctest = true verbosity = 3 [aliases] sphinx = build_sphinx release = sdist --formats=zip,gztar register upload build_sphinx upload_sphinx [build_sphinx] source-dir = docs/ build-dir = docs/_build all_files = 1 [upload_sphinx] upload-dir = docs/_build/html [egg_info] tag_build = tag_date = 0 tag_svn_revision = 0 pyquery-1.2.4/CHANGES.rst0000664000175000017500000000255412055765675015310 0ustar gawelgawel000000000000001.2.4 ----- Moved to github. So a few files are renamed from .txt to .rst Added .xhtml_to_html() and .remove_namespaces() Use requests to fetch urls (if available) Use restkit's proxy instead of Paste (which will die with py3) Allow to open https urls python2.5 is no longer supported (may work, but tests are broken) 1.2.3 ----- Allow to pass this in .filter() callback Add .contents() .items() Add tox.ini Bug fixes: fix #35 #55 #64 #66 1.2.2 ----- Fix cssselectpatch to match the newer implementation of cssselect. Fixes issue #62, #52 and #59 (Haoyu Bai) Fix issue #37 (Caleb Burns) 1.2.1 ----- Allow to use a custom css translator. Fix issue 44: case problem with xml documents 1.2 --- PyQuery now use `cssselect `_. See issue 43. Fix issue 40: forward .html() extra arguments to ``lxml.etree.tostring`` 1.1.1 ----- Minor release. Include test file so you can run tests from the tarball. 
1.1 --- fix issues 30, 31, 32 - py3 improvements / webob 1.2+ support 1.0 --- fix issues 24 0.7 --- Python 3 compatible Add __unicode__ method Add root and encoding attribute fix issues 19, 20, 22, 23 0.6.1 ------ Move README.txt at package root Add CHANGES.txt and add it to long_description 0.6 ---- Added PyQuery.outerHtml Added PyQuery.fn Added PyQuery.map Change PyQuery.each behavior to reflect jQuery api pyquery-1.2.4/tox.ini0000664000175000017500000000210012055754352014772 0ustar gawelgawel00000000000000[tox] envlist=py26,py27,py27-requests,py32,py33 [testenv:py26] basepython=python2.6 changedir={toxinidir} commands = rm -f .installed.cfg {envbindir}/buildout buildout:parts-directory={envdir}/parts buildout:bin-directory={envbindir} {envbindir}/nosetests [] deps = zc.buildout [testenv:py27] basepython=python2.7 changedir={toxinidir} commands = rm -f .installed.cfg {envbindir}/buildout buildout:parts-directory={envdir}/parts buildout:bin-directory={envbindir} {envbindir}/nosetests [] deps = zc.buildout [testenv:py27-requests] basepython=python2.7 changedir={toxinidir} commands = rm -f .installed.cfg {envbindir}/buildout buildout:parts-directory={envdir}/parts buildout:bin-directory={envbindir} \ eggs:eggs+=requests {envbindir}/nosetests [] deps = zc.buildout [testenv:py32] basepython=python3.2 changedir={toxinidir} commands = {envbindir}/nosetests [] deps = nose webob [testenv:py33] basepython=python3.3 changedir={toxinidir} commands = {envbindir}/nosetests [] deps = nose webob pyquery-1.2.4/PKG-INFO0000664000175000017500000001314512055767744014600 0ustar gawelgawel00000000000000Metadata-Version: 1.0 Name: pyquery Version: 1.2.4 Summary: A jquery-like library for python Home-page: https://github.com/gawel/pyquery Author: Gael Pasgrimaud Author-email: gael@gawel.org License: BSD Description: pyquery: a jquery-like library for python ========================================= pyquery allows you to make jquery queries on xml documents. 
The API is as similar as possible to jquery. pyquery uses lxml for fast
xml and html manipulation.

This is not (or at least not yet) a library to produce or interact with
javascript code. I just liked the jquery API and I missed it in python
so I told myself "Hey let's make jquery in python". This is the result.

It can be used for many purposes, one idea that I might try in the
future is to use it for templating with pure http templates that you
modify using pyquery. It can also be used for web scraping or for
theming applications with `Deliverance`_.

The `project`_ is being actively developed in a git repository on
Github. I have the policy of giving push access to anyone who wants it
and then to review what they do. So if you want to contribute just
email me.

Please report bugs on the `github `_ issue tracker.

.. _deliverance: http://www.gawel.org/weblog/en/2008/12/skinning-with-pyquery-and-deliverance
.. _project: https://github.com/gawel/pyquery/

Quickstart
==========

You can use the PyQuery class to load an xml document from a string, an
lxml document, a file or a url::

    >>> from pyquery import PyQuery as pq
    >>> from lxml import etree
    >>> import urllib
    >>> d = pq("")
    >>> d = pq(etree.fromstring(""))
    >>> d = pq(url='http://google.com/')
    >>> # d = pq(url='http://google.com/', opener=lambda url, **kw: urllib.urlopen(url).read())
    >>> d = pq(filename=path_to_html_file)

Now d is like the $ in jquery::

    >>> d("#hello")
    []
    >>> p = d("#hello")
    >>> print(p.html())
    Hello world !
    >>> p.html("you know Python rocks")
    []
    >>> print(p.html())
    you know Python rocks
    >>> print(p.text())
    you know Python rocks

You can use some of the pseudo classes that are available in jQuery but
that are not standard in css such as :first :last :even :odd :eq :lt
:gt :checked :selected :file::

    >>> d('p:first')
    []

See http://packages.python.org/pyquery/ for the full documentation

News
====

1.2.4
-----

Moved to github.
So a few files are renamed from .txt to .rst Added .xhtml_to_html() and .remove_namespaces() Use requests to fetch urls (if available) Use restkit's proxy instead of Paste (which will die with py3) Allow to open https urls python2.5 is no longer supported (may work, but tests are broken) 1.2.3 ----- Allow to pass this in .filter() callback Add .contents() .items() Add tox.ini Bug fixes: fix #35 #55 #64 #66 1.2.2 ----- Fix cssselectpatch to match the newer implementation of cssselect. Fixes issue #62, #52 and #59 (Haoyu Bai) Fix issue #37 (Caleb Burns) 1.2.1 ----- Allow to use a custom css translator. Fix issue 44: case problem with xml documents 1.2 --- PyQuery now use `cssselect `_. See issue 43. Fix issue 40: forward .html() extra arguments to ``lxml.etree.tostring`` 1.1.1 ----- Minor release. Include test file so you can run tests from the tarball. 1.1 --- fix issues 30, 31, 32 - py3 improvements / webob 1.2+ support 1.0 --- fix issues 24 0.7 --- Python 3 compatible Add __unicode__ method Add root and encoding attribute fix issues 19, 20, 22, 23 0.6.1 ------ Move README.txt at package root Add CHANGES.txt and add it to long_description 0.6 ---- Added PyQuery.outerHtml Added PyQuery.fn Added PyQuery.map Change PyQuery.each behavior to reflect jQuery api Keywords: jquery html xml scraping Platform: UNKNOWN Classifier: Intended Audience :: Developers Classifier: Development Status :: 5 - Production/Stable Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.6 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.2 Classifier: Programming Language :: Python :: 3.3 pyquery-1.2.4/pyquery.egg-info/0000775000175000017500000000000012055767744016707 5ustar gawelgawel00000000000000pyquery-1.2.4/pyquery.egg-info/SOURCES.txt0000664000175000017500000000210412055767744020570 0ustar gawelgawel00000000000000CHANGES.rst MANIFEST.in README.rst 
setup.cfg setup.py tox.ini docs/ajax.txt docs/api.txt docs/attributes.txt docs/changes.txt docs/css.txt docs/future.txt docs/index.txt docs/manipulating.txt docs/scrap.txt docs/testing.txt docs/tips.txt docs/traversing.txt docs/_build/html/_sources/ajax.txt docs/_build/html/_sources/api.txt docs/_build/html/_sources/attributes.txt docs/_build/html/_sources/changes.txt docs/_build/html/_sources/css.txt docs/_build/html/_sources/future.txt docs/_build/html/_sources/index.txt docs/_build/html/_sources/manipulating.txt docs/_build/html/_sources/scrap.txt docs/_build/html/_sources/testing.txt docs/_build/html/_sources/tips.txt docs/_build/html/_sources/traversing.txt pyquery/__init__.py pyquery/ajax.py pyquery/cssselectpatch.py pyquery/openers.py pyquery/pyquery.py pyquery/rules.py pyquery/test.html pyquery/test.py pyquery/tests.txt pyquery.egg-info/PKG-INFO pyquery.egg-info/SOURCES.txt pyquery.egg-info/dependency_links.txt pyquery.egg-info/entry_points.txt pyquery.egg-info/not-zip-safe pyquery.egg-info/requires.txt pyquery.egg-info/top_level.txtpyquery-1.2.4/pyquery.egg-info/top_level.txt0000664000175000017500000000001012055767744021430 0ustar gawelgawel00000000000000pyquery pyquery-1.2.4/pyquery.egg-info/dependency_links.txt0000664000175000017500000000000112055767744022755 0ustar gawelgawel00000000000000 pyquery-1.2.4/pyquery.egg-info/entry_points.txt0000664000175000017500000000004512055767744022204 0ustar gawelgawel00000000000000 # -*- Entry points: -*- pyquery-1.2.4/pyquery.egg-info/requires.txt0000664000175000017500000000002312055767744021302 0ustar gawelgawel00000000000000lxml>=2.1 cssselectpyquery-1.2.4/pyquery.egg-info/not-zip-safe0000664000175000017500000000000112055472137021122 0ustar gawelgawel00000000000000 pyquery-1.2.4/pyquery.egg-info/PKG-INFO0000664000175000017500000001314512055767744020010 0ustar gawelgawel00000000000000Metadata-Version: 1.0 Name: pyquery Version: 1.2.4 Summary: A jquery-like library for python Home-page: 
https://github.com/gawel/pyquery
Author: Gael Pasgrimaud
Author-email: gael@gawel.org
License: BSD
Description: pyquery: a jquery-like library for python
=========================================

pyquery allows you to make jquery queries on xml documents. The API is
as similar as possible to jquery. pyquery uses lxml for fast xml and
html manipulation.

This is not (or at least not yet) a library to produce or interact with
javascript code. I just liked the jquery API and I missed it in python
so I told myself "Hey let's make jquery in python". This is the result.

It can be used for many purposes, one idea that I might try in the
future is to use it for templating with pure http templates that you
modify using pyquery. It can also be used for web scraping or for
theming applications with `Deliverance`_.

The `project`_ is being actively developed in a git repository on
Github. I have the policy of giving push access to anyone who wants it
and then to review what they do. So if you want to contribute just
email me.

Please report bugs on the `github `_ issue tracker.

.. _deliverance: http://www.gawel.org/weblog/en/2008/12/skinning-with-pyquery-and-deliverance
.. _project: https://github.com/gawel/pyquery/

Quickstart
==========

You can use the PyQuery class to load an xml document from a string, an
lxml document, a file or a url::

    >>> from pyquery import PyQuery as pq
    >>> from lxml import etree
    >>> import urllib
    >>> d = pq("")
    >>> d = pq(etree.fromstring(""))
    >>> d = pq(url='http://google.com/')
    >>> # d = pq(url='http://google.com/', opener=lambda url, **kw: urllib.urlopen(url).read())
    >>> d = pq(filename=path_to_html_file)

Now d is like the $ in jquery::

    >>> d("#hello")
    []
    >>> p = d("#hello")
    >>> print(p.html())
    Hello world !
>>> p.html("you know Python rocks") [] >>> print(p.html()) you know Python rocks >>> print(p.text()) you know Python rocks You can use some of the pseudo classes that are available in jQuery but that are not standard in css such as :first :last :even :odd :eq :lt :gt :checked :selected :file:: >>> d('p:first') [] See http://packages.python.org/pyquery/ for the full documentation News ==== 1.2.4 ----- Moved to github. So a few files are renamed from .txt to .rst Added .xhtml_to_html() and .remove_namespaces() Use requests to fetch urls (if available) Use restkit's proxy instead of Paste (which will die with py3) Allow to open https urls python2.5 is no longer supported (may work, but tests are broken) 1.2.3 ----- Allow to pass this in .filter() callback Add .contents() .items() Add tox.ini Bug fixes: fix #35 #55 #64 #66 1.2.2 ----- Fix cssselectpatch to match the newer implementation of cssselect. Fixes issue #62, #52 and #59 (Haoyu Bai) Fix issue #37 (Caleb Burns) 1.2.1 ----- Allow to use a custom css translator. Fix issue 44: case problem with xml documents 1.2 --- PyQuery now use `cssselect `_. See issue 43. Fix issue 40: forward .html() extra arguments to ``lxml.etree.tostring`` 1.1.1 ----- Minor release. Include test file so you can run tests from the tarball. 
1.1
---

fix issues 30, 31, 32 - py3 improvements / webob 1.2+ support

1.0
---

fix issues 24

0.7
---

Python 3 compatible
Add __unicode__ method
Add root and encoding attribute
fix issues 19, 20, 22, 23

0.6.1
------

Move README.txt at package root
Add CHANGES.txt and add it to long_description

0.6
----

Added PyQuery.outerHtml
Added PyQuery.fn
Added PyQuery.map
Change PyQuery.each behavior to reflect jQuery api

Keywords: jquery html xml scraping
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
pyquery-1.2.4/MANIFEST.in0000664000175000017500000000022512055501701015211 0ustar gawelgawel00000000000000
recursive-include pyquery *.txt
recursive-include pyquery *.html
recursive-include docs *.txt
include README.rst
include CHANGES.rst
include tox.ini
pyquery-1.2.4/README.rst0000664000175000017500000000431312055766527015165 0ustar gawelgawel00000000000000
pyquery: a jquery-like library for python
=========================================

pyquery allows you to make jquery queries on xml documents. The API is
as similar as possible to jquery. pyquery uses lxml for fast xml and
html manipulation.

This is not (or at least not yet) a library to produce or interact with
javascript code. I just liked the jquery API and I missed it in python
so I told myself "Hey let's make jquery in python". This is the result.

It can be used for many purposes, one idea that I might try in the
future is to use it for templating with pure http templates that you
modify using pyquery. It can also be used for web scraping or for
theming applications with `Deliverance`_.
The `project`_ is being actively developed in a git repository on
Github. I have the policy of giving push access to anyone who wants it
and then to review what they do. So if you want to contribute just
email me.

Please report bugs on the `github `_ issue tracker.

.. _deliverance: http://www.gawel.org/weblog/en/2008/12/skinning-with-pyquery-and-deliverance
.. _project: https://github.com/gawel/pyquery/

Quickstart
==========

You can use the PyQuery class to load an xml document from a string, an
lxml document, a file or a url::

    >>> from pyquery import PyQuery as pq
    >>> from lxml import etree
    >>> import urllib
    >>> d = pq("")
    >>> d = pq(etree.fromstring(""))
    >>> d = pq(url='http://google.com/')
    >>> # d = pq(url='http://google.com/', opener=lambda url, **kw: urllib.urlopen(url).read())
    >>> d = pq(filename=path_to_html_file)

Now d is like the $ in jquery::

    >>> d("#hello")
    []
    >>> p = d("#hello")
    >>> print(p.html())
    Hello world !
    >>> p.html("you know Python rocks")
    []
    >>> print(p.html())
    you know Python rocks
    >>> print(p.text())
    you know Python rocks

You can use some of the pseudo classes that are available in jQuery but
that are not standard in css such as :first :last :even :odd :eq :lt
:gt :checked :selected :file::

    >>> d('p:first')
    []