pax_global_header00006660000000000000000000000064124644255510014522gustar00rootroot0000000000000052 comment=558ec81e0af6befa554771095747bd2eeecbfbc9
xmltodict-0.9.2/000077500000000000000000000000001246442555100135415ustar00rootroot00000000000000
xmltodict-0.9.2/.gitignore000066400000000000000000000004651246442555100155360ustar00rootroot00000000000000
*.py[cod]

# C extensions
*.so

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox
nosetests.xml

#Translations
*.mo

#Mr Developer
.mr.developer.cfg

#setuptools MANIFEST
MANIFEST
xmltodict-0.9.2/.travis.yml000066400000000000000000000016731246442555100156610ustar00rootroot00000000000000
language: python
python:
  - "2.6"
  - "2.7"
  - "3.2"
  - "3.3"
  - "3.4"
  - "pypy"
env:
  - JYTHON=true
  - JYTHON=false
matrix:
  exclude:
    - python: "2.6"
      env: JYTHON=true
    - python: "2.7"
      env: JYTHON=true
    - python: "3.2"
      env: JYTHON=true
    - python: "3.3"
      env: JYTHON=true
    - python: "3.4"
      env: JYTHON=true
before_install:
  - export JYTHON_URL='http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7-b3/jython-installer-2.7-b3.jar'
  - if [ "$JYTHON" == "true" ]; then wget $JYTHON_URL -O jython_installer.jar; java -jar jython_installer.jar -s -d $HOME/jython; export PATH=$HOME/jython:$PATH; jython ez_setup.py; $HOME/jython/bin/easy_install nose; fi
before_script: if [ "$JYTHON" == "true" ]; then export NOSE=$HOME/jython/bin/nosetests NOSE_OPTIONS=""; else export NOSE=nosetests NOSE_OPTIONS="--with-coverage --cover-package=xmltodict"; fi
script: $NOSE $NOSE_OPTIONS
xmltodict-0.9.2/CHANGELOG.md000066400000000000000000000100771246442555100153570ustar00rootroot00000000000000
CHANGELOG
=========

v0.9.2
------

* Fix multiroot check for list values (edge case reported by @JKillian)

v0.9.1
------

* Only check single root when full_document=True (Thanks @JKillian!)

v0.9.0
------

* Added CHANGELOG.md
* Avoid ternary operator in call to ParserCreate().
* Adding Python 3.4 to Tox test environment.
* Added full_document flag to unparse (default=True).

v0.8.7
------

* Merge pull request #56 from HansWeltar/master
* Improve performance for large files
* Updated README unparse example with pretty=True.

v0.8.6
------

* Fixed extra newlines in pretty print mode.
* Fixed all flake8 warnings.

v0.8.5
------

* Added Tox config.
* Let expat figure out the doc encoding.

v0.8.4
------

* Fixed Jython TravisCI build.
* Moved nose and coverage to tests_require.
* Dropping python 2.5 from travis.yml.

v0.8.3
------

* Use system setuptools if available.

v0.8.2
------

* Switch to latest setuptools.

v0.8.1
------

* Include distribute_setup.py in MANIFEST.in
* Updated package classifiers (python versions, PyPy, Jython).

v0.8.0
------

* Merge pull request #40 from martinblech/jython-support
* Adding Jython support.
* Fix streaming example callback (must return True)

v0.7.0
------

* Merge pull request #35 from martinblech/namespace-support
* Adding support for XML namespaces.
* Merge pull request #33 from bgilb/master
* fixes whitespace style
* changes module import syntax and assertRaises
* adds unittest assertRaises

v0.6.0
------

* Merge pull request #31 from martinblech/document-unparse
* Adding documentation for unparse()
* Merge pull request #30 from martinblech/prettyprint
* Adding support for pretty print in unparse()

v0.5.1
------

* Merge pull request #29 from dusual/master
* ordereddict import for less 2.6 if available

v0.5.0
------

* Allow using alternate versions of `expat`.
* Added shameless link to GitTip.
* Merge pull request #20 from kevbo/master
* Adds unparse example to README

v0.4.6
------

* fix try/catch block for pypi (throws AttributeError instead of TypeError)
* prevent encoding an already encoded string
* removed unnecessary try/catch for xml_input.encode(). check if file or string, EAFP style. (thanks @turicas)

v0.4.5
------

* test with python 3.3 too
* avoid u'unicode' syntax (fails in python 3.2)
* handle unicode input strings properly
* add strip_whitespace option (default=True)
* Merge pull request #16 from slestak/master
* fix unittest
* working with upstream to improve #15
* remove pythonpath tweaks, change loc of #15 patch
* upstream #15

v0.4.4
------

* test attribute order roundtrip only if OrderedDict is available (python >= 2.7)
* Merge branch 'master' of github.com:martinblech/xmltodict
* preserve xml attribute order (fixes #13)

v0.4.3
------

* fix #12: postprocess cdata items too
* added info about official fedora package

v0.4.2
------

* Merge pull request #11 from ralphbean/master
* Include README, LICENSE, and tests in the distributed tarball.

v0.4.1
------

* take all characters (no need to strip and filter)
* fixed CLI (marshal only takes dict, not OrderedDict)
* ignore MANIFEST

v0.4
----

* #8 preprocessing callback in unparse()

v0.3
----

* implemented postprocessor callback (#6)
* update readme with install instructions

v0.2
----

* link to travis-ci build status
* more complete info in setup.py (for uploading to PyPi)
* coverage annotations for tricky py3k workarounds
* py3k compatibility
* removed unused __future__ print_function
* using io.StringIO on py3k
* removed unnecessary exception catching
* initial travis-ci configuration
* made _emit function private
* unparse functionality
* added tests
* updated (c) notice to acknowledge individual contributors
* added license information
* fixed README
* removed temp build directory and added a .gitignore to avoid that happening again
* Merge pull request #1 from scottscott/master
* Added setup script to make xmltodict a Python module.
* fixed bad handling of cdata in semistructured xml, changed _CDATA_ to #text as default
* added attr_prefix, cdata_key and force_cdata parameters
* links in README
* links in README
* improved README
* initial commit
xmltodict-0.9.2/LICENSE000066400000000000000000000020751246442555100145520ustar00rootroot00000000000000
Copyright (C) 2012 Martin Blech and individual contributors.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
xmltodict-0.9.2/MANIFEST.in000066400000000000000000000001201246442555100152700ustar00rootroot00000000000000
include README.md
include LICENSE
include ez_setup.py
recursive-include tests *
xmltodict-0.9.2/README.md000066400000000000000000000100471246442555100150220ustar00rootroot00000000000000
# xmltodict

`xmltodict` is a Python module that makes working with XML feel like you are working with [JSON](http://docs.python.org/library/json.html), as in this ["spec"](http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html):

[![Build Status](https://secure.travis-ci.org/martinblech/xmltodict.png)](http://travis-ci.org/martinblech/xmltodict)

```python
>>> doc = xmltodict.parse("""
... <mydocument has="an attribute">
...   <and>
...     <many>elements</many>
...     <many>more elements</many>
...   </and>
...   <plus a="complex">
...     element as well
...   </plus>
... </mydocument>
... """)
>>>
>>> doc['mydocument']['@has']
u'an attribute'
>>> doc['mydocument']['and']['many']
[u'elements', u'more elements']
>>> doc['mydocument']['plus']['@a']
u'complex'
>>> doc['mydocument']['plus']['#text']
u'element as well'
```

## Namespace support

By default, `xmltodict` does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing `process_namespaces=True` will make it expand namespaces for you:

```python
>>> xml = """
... <root xmlns="http://defaultns.com/"
...       xmlns:a="http://a.com/"
...       xmlns:b="http://b.com/">
...   <x>1</x>
...   <a:y>2</a:y>
...   <b:z>3</b:z>
... </root>
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }
True
```

It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:

```python
>>> namespaces = {
...     'http://defaultns.com/': None, # skip this namespace
...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True
```

## Streaming mode

`xmltodict` is very fast ([Expat](http://docs.python.org/library/pyexpat.html)-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like [Discogs](http://discogs.com/data/) or [Wikipedia](http://dumps.wikimedia.org/):

```python
>>> def handle_artist(_, artist):
...     print artist['name']
...     return True
>>>
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...
```

It can also be used from the command line to pipe objects to a script like this:

```python
import sys, marshal
while True:
    _, article = marshal.load(sys.stdin)
    print article['title']
```

```sh
$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...
```

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

```sh
$ cat enwiki-pages-articles.xml.bz2 | bunzip2 | xmltodict.py 2 | gzip > enwiki.dicts.gz
```

And you reuse the dicts with every script that needs them:

```sh
$ cat enwiki.dicts.gz | gunzip | script1.py
$ cat enwiki.dicts.gz | gunzip | script2.py
...
```

## Roundtripping

You can also convert in the other direction, using the `unparse()` method:

```python
>>> mydict = {
...     'response': {
...             'status': 'good',
...             'last_updated': '2014-02-16T23:10:12Z',
...     }
... }
>>> print unparse(mydict, pretty=True)
<?xml version="1.0" encoding="utf-8"?>
<response>
	<status>good</status>
	<last_updated>2014-02-16T23:10:12Z</last_updated>
</response>
```

## Ok, how do I get it?

You just need to

```sh
$ pip install xmltodict
```

There is an [official Fedora package for xmltodict](https://admin.fedoraproject.org/pkgdb/acls/name/python-xmltodict). If you are on Fedora or RHEL, you can do:

```sh
$ sudo yum install python-xmltodict
```

## Donate

If you love `xmltodict`, consider supporting the author [on Gittip](https://www.gittip.com/martinblech/).
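
## How repeated elements become lists

In the first README example, the repeated `many` element comes back as a list while every other element comes back as a single value. The sketch below is a simplified, standalone rendition of that accumulation rule, modeled on the `push_data` method in `xmltodict.py`. It is an illustration and not the library's code: it assumes a plain `dict` instead of `OrderedDict` and leaves out the `postprocessor` hook.

```python
def push_data(item, key, data):
    """Store `data` under `key`, promoting a repeated key to a list."""
    if item is None:
        # First child of this element: start a fresh mapping.
        item = {}
    if key not in item:
        # First occurrence of the key: keep the bare value.
        item[key] = data
    elif isinstance(item[key], list):
        # Third or later occurrence: extend the existing list.
        item[key].append(data)
    else:
        # Second occurrence: wrap the old and new values in a list.
        item[key] = [item[key], data]
    return item


item = None
for text in ("elements", "more elements"):
    item = push_data(item, "many", text)
item = push_data(item, "plus", "element as well")
print(item)
```

One consequence of this rule is that an element occurring once yields a scalar and an element occurring several times yields a list, so code that consumes parsed documents may need to handle both shapes.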
xmltodict-0.9.2/ez_setup.py000066400000000000000000000270751246442555100157640ustar00rootroot00000000000000#!python """Bootstrap setuptools installation If you want to use setuptools in your package's setup.py, just include this file in the same directory with it, and add this to the top of your setup.py:: from ez_setup import use_setuptools use_setuptools() If you want to require a specific version of setuptools, set a download mirror, or use an alternate download directory, you can do so by supplying the appropriate options to ``use_setuptools()``. This file can also be run as a script to install or upgrade setuptools. """ import os import shutil import sys import tempfile import tarfile import optparse import subprocess import platform from distutils import log try: from site import USER_SITE except ImportError: USER_SITE = None DEFAULT_VERSION = "1.1.6" DEFAULT_URL = "https://pypi.python.org/packages/source/s/setuptools/" def _python_cmd(*args): args = (sys.executable,) + args return subprocess.call(args) == 0 def _check_call_py24(cmd, *args, **kwargs): res = subprocess.call(cmd, *args, **kwargs) class CalledProcessError(Exception): pass if not res == 0: msg = "Command '%s' return non-zero exit status %d" % (cmd, res) raise CalledProcessError(msg) vars(subprocess).setdefault('check_call', _check_call_py24) def _install(tarball, install_args=()): # extracting the tarball tmpdir = tempfile.mkdtemp() log.warn('Extracting in %s', tmpdir) old_wd = os.getcwd() try: os.chdir(tmpdir) tar = tarfile.open(tarball) _extractall(tar) tar.close() # going in the directory subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0]) os.chdir(subdir) log.warn('Now working in %s', subdir) # installing log.warn('Installing Setuptools') if not _python_cmd('setup.py', 'install', *install_args): log.warn('Something went wrong during the installation.') log.warn('See the error message above.') # exitcode will be 2 return 2 finally: os.chdir(old_wd) shutil.rmtree(tmpdir) def _build_egg(egg, 
tarball, to_dir): # extracting the tarball tmpdir = tempfile.mkdtemp() log.warn('Extracting in %s', tmpdir) old_wd = os.getcwd() try: os.chdir(tmpdir) tar = tarfile.open(tarball) _extractall(tar) tar.close() # going in the directory subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0]) os.chdir(subdir) log.warn('Now working in %s', subdir) # building an egg log.warn('Building a Setuptools egg in %s', to_dir) _python_cmd('setup.py', '-q', 'bdist_egg', '--dist-dir', to_dir) finally: os.chdir(old_wd) shutil.rmtree(tmpdir) # returning the result log.warn(egg) if not os.path.exists(egg): raise IOError('Could not build the egg.') def _do_download(version, download_base, to_dir, download_delay): egg = os.path.join(to_dir, 'setuptools-%s-py%d.%d.egg' % (version, sys.version_info[0], sys.version_info[1])) if not os.path.exists(egg): tarball = download_setuptools(version, download_base, to_dir, download_delay) _build_egg(egg, tarball, to_dir) sys.path.insert(0, egg) # Remove previously-imported pkg_resources if present (see # https://bitbucket.org/pypa/setuptools/pull-request/7/ for details). if 'pkg_resources' in sys.modules: del sys.modules['pkg_resources'] import setuptools setuptools.bootstrap_install_from = egg def use_setuptools(version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=os.curdir, download_delay=15): # making sure we use the absolute path to_dir = os.path.abspath(to_dir) was_imported = 'pkg_resources' in sys.modules or \ 'setuptools' in sys.modules try: import pkg_resources except ImportError: return _do_download(version, download_base, to_dir, download_delay) try: pkg_resources.require("setuptools>=" + version) return except pkg_resources.VersionConflict: e = sys.exc_info()[1] if was_imported: sys.stderr.write( "The required version of setuptools (>=%s) is not available,\n" "and can't be installed while this script is running. Please\n" "install a more recent version first, using\n" "'easy_install -U setuptools'." 
"\n\n(Currently using %r)\n" % (version, e.args[0])) sys.exit(2) else: del pkg_resources, sys.modules['pkg_resources'] # reload ok return _do_download(version, download_base, to_dir, download_delay) except pkg_resources.DistributionNotFound: return _do_download(version, download_base, to_dir, download_delay) def download_file_powershell(url, target): """ Download the file at url to target using Powershell (which will validate trust). Raise an exception if the command cannot complete. """ target = os.path.abspath(target) cmd = [ 'powershell', '-Command', "(new-object System.Net.WebClient).DownloadFile(%(url)r, %(target)r)" % vars(), ] subprocess.check_call(cmd) def has_powershell(): if platform.system() != 'Windows': return False cmd = ['powershell', '-Command', 'echo test'] devnull = open(os.path.devnull, 'wb') try: try: subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except: return False finally: devnull.close() return True download_file_powershell.viable = has_powershell def download_file_curl(url, target): cmd = ['curl', url, '--silent', '--output', target] subprocess.check_call(cmd) def has_curl(): cmd = ['curl', '--version'] devnull = open(os.path.devnull, 'wb') try: try: subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except: return False finally: devnull.close() return True download_file_curl.viable = has_curl def download_file_wget(url, target): cmd = ['wget', url, '--quiet', '--output-document', target] subprocess.check_call(cmd) def has_wget(): cmd = ['wget', '--version'] devnull = open(os.path.devnull, 'wb') try: try: subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except: return False finally: devnull.close() return True download_file_wget.viable = has_wget def download_file_insecure(url, target): """ Use Python to download the file, even though it cannot authenticate the connection. 
""" try: from urllib.request import urlopen except ImportError: from urllib2 import urlopen src = dst = None try: src = urlopen(url) # Read/write all in one block, so we don't create a corrupt file # if the download is interrupted. data = src.read() dst = open(target, "wb") dst.write(data) finally: if src: src.close() if dst: dst.close() download_file_insecure.viable = lambda: True def get_best_downloader(): downloaders = [ download_file_powershell, download_file_curl, download_file_wget, download_file_insecure, ] for dl in downloaders: if dl.viable(): return dl def download_setuptools(version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=os.curdir, delay=15, downloader_factory=get_best_downloader): """Download setuptools from a specified location and return its filename `version` should be a valid setuptools version number that is available as an egg for download under the `download_base` URL (which should end with a '/'). `to_dir` is the directory where the egg will be downloaded. `delay` is the number of seconds to pause before an actual download attempt. ``downloader_factory`` should be a function taking no arguments and returning a function for downloading a URL to a target. """ # making sure we use the absolute path to_dir = os.path.abspath(to_dir) tgz_name = "setuptools-%s.tar.gz" % version url = download_base + tgz_name saveto = os.path.join(to_dir, tgz_name) if not os.path.exists(saveto): # Avoid repeated downloads log.warn("Downloading %s", url) downloader = downloader_factory() downloader(url, saveto) return os.path.realpath(saveto) def _extractall(self, path=".", members=None): """Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards. `path' specifies a different directory to extract to. `members' is optional and must be a subset of the list returned by getmembers(). 
""" import copy import operator from tarfile import ExtractError directories = [] if members is None: members = self for tarinfo in members: if tarinfo.isdir(): # Extract directories with a safe mode. directories.append(tarinfo) tarinfo = copy.copy(tarinfo) tarinfo.mode = 448 # decimal for oct 0700 self.extract(tarinfo, path) # Reverse sort directories. if sys.version_info < (2, 4): def sorter(dir1, dir2): return cmp(dir1.name, dir2.name) directories.sort(sorter) directories.reverse() else: directories.sort(key=operator.attrgetter('name'), reverse=True) # Set correct owner, mtime and filemode on directories. for tarinfo in directories: dirpath = os.path.join(path, tarinfo.name) try: self.chown(tarinfo, dirpath) self.utime(tarinfo, dirpath) self.chmod(tarinfo, dirpath) except ExtractError: e = sys.exc_info()[1] if self.errorlevel > 1: raise else: self._dbg(1, "tarfile: %s" % e) def _build_install_args(options): """ Build the arguments to 'python setup.py install' on the setuptools package """ install_args = [] if options.user_install: if sys.version_info < (2, 6): log.warn("--user requires Python 2.6 or later") raise SystemExit(1) install_args.append('--user') return install_args def _parse_args(): """ Parse the command line for options """ parser = optparse.OptionParser() parser.add_option( '--user', dest='user_install', action='store_true', default=False, help='install in user site package (requires Python 2.6 or later)') parser.add_option( '--download-base', dest='download_base', metavar="URL", default=DEFAULT_URL, help='alternative URL from where to download the setuptools package') parser.add_option( '--insecure', dest='downloader_factory', action='store_const', const=lambda: download_file_insecure, default=get_best_downloader, help='Use internal, non-validating downloader' ) options, args = parser.parse_args() # positional arguments are ignored return options def main(version=DEFAULT_VERSION): """Install or upgrade setuptools and EasyInstall""" options = 
_parse_args() tarball = download_setuptools(download_base=options.download_base, downloader_factory=options.downloader_factory) return _install(tarball, _build_install_args(options)) if __name__ == '__main__': sys.exit(main()) xmltodict-0.9.2/setup.py000077500000000000000000000024751246442555100152660ustar00rootroot00000000000000#!/usr/bin/env python try: from setuptools import setup except ImportError: from ez_setup import use_setuptools use_setuptools() from setuptools import setup import xmltodict setup(name='xmltodict', version=xmltodict.__version__, description=xmltodict.__doc__, author=xmltodict.__author__, author_email='martinblech@gmail.com', url='https://github.com/martinblech/xmltodict', license=xmltodict.__license__, platforms=['all'], classifiers=[ 'Intended Audience :: Developers', 'License :: OSI Approved :: MIT License', 'Operating System :: OS Independent', 'Programming Language :: Python', 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.5', 'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.2', 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: Implementation :: Jython', 'Programming Language :: Python :: Implementation :: PyPy', 'Topic :: Text Processing :: Markup :: XML', ], py_modules=['xmltodict'], tests_require=['nose>=1.0', 'coverage'], ) xmltodict-0.9.2/tests/000077500000000000000000000000001246442555100147035ustar00rootroot00000000000000xmltodict-0.9.2/tests/test_dicttoxml.py000066400000000000000000000120221246442555100203200ustar00rootroot00000000000000import sys from xmltodict import parse, unparse, OrderedDict try: import unittest2 as unittest except ImportError: import unittest import re import collections from textwrap import dedent IS_JYTHON = sys.platform.startswith('java') _HEADER_RE = re.compile(r'^[^\n]*\n') def _strip(fullxml): return _HEADER_RE.sub('', fullxml) class 
DictToXMLTestCase(unittest.TestCase): def test_root(self): obj = {'a': None} self.assertEqual(obj, parse(unparse(obj))) self.assertEqual(unparse(obj), unparse(parse(unparse(obj)))) def test_simple_cdata(self): obj = {'a': 'b'} self.assertEqual(obj, parse(unparse(obj))) self.assertEqual(unparse(obj), unparse(parse(unparse(obj)))) def test_cdata(self): obj = {'a': {'#text': 'y'}} self.assertEqual(obj, parse(unparse(obj), force_cdata=True)) self.assertEqual(unparse(obj), unparse(parse(unparse(obj)))) def test_attrib(self): obj = {'a': {'@href': 'x'}} self.assertEqual(obj, parse(unparse(obj))) self.assertEqual(unparse(obj), unparse(parse(unparse(obj)))) def test_attrib_and_cdata(self): obj = {'a': {'@href': 'x', '#text': 'y'}} self.assertEqual(obj, parse(unparse(obj))) self.assertEqual(unparse(obj), unparse(parse(unparse(obj)))) def test_list(self): obj = {'a': {'b': ['1', '2', '3']}} self.assertEqual(obj, parse(unparse(obj))) self.assertEqual(unparse(obj), unparse(parse(unparse(obj)))) def test_no_root(self): self.assertRaises(ValueError, unparse, {}) def test_multiple_roots(self): self.assertRaises(ValueError, unparse, {'a': '1', 'b': '2'}) self.assertRaises(ValueError, unparse, {'a': ['1', '2', '3']}) def test_no_root_nofulldoc(self): self.assertEqual(unparse({}, full_document=False), '') def test_multiple_roots_nofulldoc(self): obj = OrderedDict((('a', 1), ('b', 2))) xml = unparse(obj, full_document=False) self.assertEqual(xml, '12') obj = {'a': [1, 2]} xml = unparse(obj, full_document=False) self.assertEqual(xml, '12') def test_nested(self): obj = {'a': {'b': '1', 'c': '2'}} self.assertEqual(obj, parse(unparse(obj))) self.assertEqual(unparse(obj), unparse(parse(unparse(obj)))) obj = {'a': {'b': {'c': {'@a': 'x', '#text': 'y'}}}} self.assertEqual(obj, parse(unparse(obj))) self.assertEqual(unparse(obj), unparse(parse(unparse(obj)))) def test_semistructured(self): xml = 'abcefg' self.assertEqual(_strip(unparse(parse(xml))), 'abcefg') if hasattr(collections, 
'OrderedDict'): def test_preprocessor(self): obj = {'a': OrderedDict((('b:int', [1, 2]), ('b', 'c')))} def p(key, value): try: key, _ = key.split(':') except ValueError: pass return key, value self.assertEqual(_strip(unparse(obj, preprocessor=p)), '12c') def test_preprocessor_skipkey(self): obj = {'a': {'b': 1, 'c': 2}} def p(key, value): if key == 'b': return None return key, value self.assertEqual(_strip(unparse(obj, preprocessor=p)), '2') if hasattr(collections, 'OrderedDict') and not IS_JYTHON: # Jython's SAX does not preserve attribute order def test_attr_order_roundtrip(self): xml = '' self.assertEqual(xml, _strip(unparse(parse(xml)))) if hasattr(collections, 'OrderedDict'): def test_pretty_print(self): obj = {'a': OrderedDict(( ('b', [{'c': [1, 2]}, 3]), ('x', 'y'), ))} newl = '\n' indent = '....' xml = dedent('''\ .... ........1 ........2 .... ....3 ....y ''') self.assertEqual(xml, unparse(obj, pretty=True, newl=newl, indent=indent)) def test_encoding(self): try: value = unichr(39321) except NameError: value = chr(39321) obj = {'a': value} utf8doc = unparse(obj, encoding='utf-8') latin1doc = unparse(obj, encoding='iso-8859-1') self.assertEqual(parse(utf8doc), parse(latin1doc)) self.assertEqual(parse(utf8doc), obj) def test_fulldoc(self): xml_declaration_re = re.compile( '^' + re.escape('')) self.assertTrue(xml_declaration_re.match(unparse({'a': 1}))) self.assertFalse( xml_declaration_re.match(unparse({'a': 1}, full_document=False))) xmltodict-0.9.2/tests/test_xmltodict.py000066400000000000000000000154031246442555100203260ustar00rootroot00000000000000from xmltodict import parse, ParsingInterrupted try: import unittest2 as unittest except ImportError: import unittest try: from io import BytesIO as StringIO except ImportError: from xmltodict import StringIO def _encode(s): try: return bytes(s, 'ascii') except (NameError, TypeError): return s class XMLToDictTestCase(unittest.TestCase): def test_string_vs_file(self): xml = 'data' self.assertEqual(parse(xml), 
parse(StringIO(_encode(xml)))) def test_minimal(self): self.assertEqual(parse(''), {'a': None}) self.assertEqual(parse('', force_cdata=True), {'a': None}) def test_simple(self): self.assertEqual(parse('data'), {'a': 'data'}) def test_force_cdata(self): self.assertEqual(parse('data', force_cdata=True), {'a': {'#text': 'data'}}) def test_custom_cdata(self): self.assertEqual(parse('data', force_cdata=True, cdata_key='_CDATA_'), {'a': {'_CDATA_': 'data'}}) def test_list(self): self.assertEqual(parse('123'), {'a': {'b': ['1', '2', '3']}}) def test_attrib(self): self.assertEqual(parse(''), {'a': {'@href': 'xyz'}}) def test_skip_attrib(self): self.assertEqual(parse('', xml_attribs=False), {'a': None}) def test_custom_attrib(self): self.assertEqual(parse('', attr_prefix='!'), {'a': {'!href': 'xyz'}}) def test_attrib_and_cdata(self): self.assertEqual(parse('123'), {'a': {'@href': 'xyz', '#text': '123'}}) def test_semi_structured(self): self.assertEqual(parse('abcdef'), {'a': {'b': None, '#text': 'abcdef'}}) self.assertEqual(parse('abcdef', cdata_separator='\n'), {'a': {'b': None, '#text': 'abc\ndef'}}) def test_nested_semi_structured(self): self.assertEqual(parse('abc123456def'), {'a': {'#text': 'abcdef', 'b': { '#text': '123456', 'c': None}}}) def test_skip_whitespace(self): xml = """ hello """ self.assertEqual( parse(xml), {'root': {'emptya': None, 'emptyb': {'@attr': 'attrvalue'}, 'value': 'hello'}}) def test_keep_whitespace(self): xml = " " self.assertEqual(parse(xml), dict(root=None)) self.assertEqual(parse(xml, strip_whitespace=False), dict(root=' ')) def test_streaming(self): def cb(path, item): cb.count += 1 self.assertEqual(path, [('a', {'x': 'y'}), ('b', None)]) self.assertEqual(item, str(cb.count)) return True cb.count = 0 parse('123', item_depth=2, item_callback=cb) self.assertEqual(cb.count, 3) def test_streaming_interrupt(self): cb = lambda path, item: False self.assertRaises(ParsingInterrupted, parse, 'x', item_depth=1, item_callback=cb) def 
test_postprocessor(self):
        def postprocessor(path, key, value):
            try:
                return key + ':int', int(value)
            except (ValueError, TypeError):
                return key, value
        self.assertEqual({'a': {'b:int': [1, 2], 'b': 'x'}},
                         parse('<a><b>1</b><b>2</b><b>x</b></a>',
                               postprocessor=postprocessor))

    def test_postprocessor_skip(self):
        def postprocessor(path, key, value):
            if key == 'b':
                value = int(value)
                if value == 3:
                    return None
            return key, value
        self.assertEqual({'a': {'b': [1, 2]}},
                         parse('<a><b>1</b><b>2</b><b>3</b></a>',
                               postprocessor=postprocessor))

    def test_unicode(self):
        try:
            value = unichr(39321)
        except NameError:
            value = chr(39321)
        self.assertEqual({'a': value},
                         parse('<a>%s</a>' % value))

    def test_encoded_string(self):
        try:
            value = unichr(39321)
        except NameError:
            value = chr(39321)
        xml = '<a>%s</a>' % value
        self.assertEqual(parse(xml),
                         parse(xml.encode('utf-8')))

    def test_namespace_support(self):
        xml = """
        <root xmlns="http://defaultns.com/"
              xmlns:a="http://a.com/"
              xmlns:b="http://b.com/">
          <x>1</x>
          <a:y>2</a:y>
          <b:z>3</b:z>
        </root>
        """
        d = {
            'http://defaultns.com/:root': {
                'http://defaultns.com/:x': '1',
                'http://a.com/:y': '2',
                'http://b.com/:z': '3',
            }
        }
        self.assertEqual(parse(xml, process_namespaces=True), d)

    def test_namespace_collapse(self):
        xml = """
        <root xmlns="http://defaultns.com/"
              xmlns:a="http://a.com/"
              xmlns:b="http://b.com/">
          <x>1</x>
          <a:y>2</a:y>
          <b:z>3</b:z>
        </root>
        """
        namespaces = {
            'http://defaultns.com/': None,
            'http://a.com/': 'ns_a',
        }
        d = {
            'root': {
                'x': '1',
                'ns_a:y': '2',
                'http://b.com/:z': '3',
            },
        }
        self.assertEqual(
            parse(xml, process_namespaces=True, namespaces=namespaces), d)

    def test_namespace_ignore(self):
        xml = """
        <root xmlns="http://defaultns.com/"
              xmlns:a="http://a.com/"
              xmlns:b="http://b.com/">
          <x>1</x>
          <a:y>2</a:y>
          <b:z>3</b:z>
        </root>
        """
        d = {
            'root': {
                '@xmlns': 'http://defaultns.com/',
                '@xmlns:a': 'http://a.com/',
                '@xmlns:b': 'http://b.com/',
                'x': '1',
                'a:y': '2',
                'b:z': '3',
            },
        }
        self.assertEqual(parse(xml), d)
xmltodict-0.9.2/tox.ini000066400000000000000000000002041246442555100150500ustar00rootroot00000000000000
[tox]
envlist = py27,py34,pypy,py26

[testenv]
deps =
    nose
    coverage
commands=nosetests --with-coverage --cover-package=xmltodict
xmltodict-0.9.2/xmltodict.py000077500000000000000000000302611246442555100161270ustar00rootroot00000000000000
#!/usr/bin/env python
"Makes working with XML feel like you are working with JSON"

from xml.parsers import expat
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl
try:  # pragma no cover
    from cStringIO import StringIO
except ImportError:  # pragma no cover
    try:
        from StringIO import StringIO
    except ImportError:
        from io import StringIO
try:  # pragma no cover
    from collections import OrderedDict
except ImportError:  # pragma no cover
    try:
        from ordereddict import OrderedDict
    except ImportError:
        OrderedDict = dict

try:  # pragma no cover
    _basestring = basestring
except NameError:  # pragma no cover
    _basestring = str
try:  # pragma no cover
    _unicode = unicode
except NameError:  # pragma no cover
    _unicode = str

__author__ = 'Martin Blech'
__version__ = '0.9.2'
__license__ = 'MIT'


class ParsingInterrupted(Exception):
    pass


class _DictSAXHandler(object):
    def __init__(self,
                 item_depth=0,
                 item_callback=lambda *args: True,
                 xml_attribs=True,
                 attr_prefix='@',
                 cdata_key='#text',
                 force_cdata=False,
                 cdata_separator='',
                 postprocessor=None,
                 dict_constructor=OrderedDict,
                 strip_whitespace=True,
                 namespace_separator=':',
                 namespaces=None):
        self.path = []
        self.stack = []
        self.data = None
        self.item = None
        self.item_depth = item_depth
        self.xml_attribs = xml_attribs
        self.item_callback = item_callback
        self.attr_prefix = attr_prefix
        self.cdata_key = cdata_key
        self.force_cdata = force_cdata
        self.cdata_separator = cdata_separator
        self.postprocessor = postprocessor
        self.dict_constructor = dict_constructor
        self.strip_whitespace = strip_whitespace
        self.namespace_separator = namespace_separator
        self.namespaces = namespaces

    def _build_name(self, full_name):
        if not self.namespaces:
            return full_name
        i = full_name.rfind(self.namespace_separator)
        if i == -1:
            return full_name
        namespace, name = full_name[:i], full_name[i+1:]
        short_namespace = self.namespaces.get(namespace, namespace)
        if not short_namespace:
            return name
        else:
            return self.namespace_separator.join((short_namespace, name))

    def _attrs_to_dict(self, attrs):
        if isinstance(attrs, dict):
            return attrs
        return self.dict_constructor(zip(attrs[0::2], attrs[1::2]))

    def startElement(self, full_name, attrs):
        name = self._build_name(full_name)
        attrs = self._attrs_to_dict(attrs)
        self.path.append((name, attrs or None))
        if len(self.path) > self.item_depth:
            self.stack.append((self.item, self.data))
            if self.xml_attribs:
                attrs = self.dict_constructor(
                    (self.attr_prefix+key, value)
                    for (key, value) in attrs.items())
            else:
                attrs = None
            self.item = attrs or None
            self.data = None

    def endElement(self, full_name):
        name = self._build_name(full_name)
        if len(self.path) == self.item_depth:
            item = self.item
            if item is None:
                item = self.data
            should_continue = self.item_callback(self.path, item)
            if not should_continue:
                raise ParsingInterrupted()
        if len(self.stack):
            item, data = self.item, self.data
            self.item, self.data = self.stack.pop()
            if self.strip_whitespace and data is not None:
                data = data.strip() or None
            if data and self.force_cdata and item is None:
                item = self.dict_constructor()
            if item is not None:
                if data:
                    self.push_data(item, self.cdata_key, data)
                self.item = self.push_data(self.item, name, item)
            else:
                self.item = self.push_data(self.item, name, data)
        else:
            self.item = self.data = None
        self.path.pop()

    def characters(self, data):
        if not self.data:
            self.data = data
        else:
            self.data += self.cdata_separator + data

    def push_data(self, item, key, data):
        if self.postprocessor is not None:
            result = self.postprocessor(self.path, key, data)
            if result is None:
                return item
            key, data = result
        if item is None:
            item = self.dict_constructor()
        try:
            value = item[key]
            if isinstance(value, list):
                value.append(data)
            else:
                item[key] = [value, data]
        except KeyError:
            item[key] = data
        return item


def parse(xml_input, encoding=None, expat=expat, process_namespaces=False,
          namespace_separator=':', **kwargs):
    """Parse the given XML input and convert it into a dictionary.

    `xml_input` can either be a `string` or a file-like object.

    If `xml_attribs` is `True`, element attributes are put in the dictionary
    among regular child elements, using `@` as a prefix to avoid collisions. If
    set to `False`, they are just ignored.

    Simple example::

        >>> import xmltodict
        >>> doc = xmltodict.parse(\"\"\"
        ... <a prop="x">
        ...   <b>1</b>
        ...   <b>2</b>
        ... </a>
        ... \"\"\")
        >>> doc['a']['@prop']
        u'x'
        >>> doc['a']['b']
        [u'1', u'2']

    If `item_depth` is `0`, the function returns a dictionary for the root
    element (default behavior). Otherwise, it calls `item_callback` every time
    an item at the specified depth is found and returns `None` in the end
    (streaming mode).

    The callback function receives two parameters: the `path` from the document
    root to the item (name-attribs pairs), and the `item` (dict). If the
    callback's return value is false-ish, parsing will be stopped with the
    :class:`ParsingInterrupted` exception.

    Streaming example::

        >>> def handle(path, item):
        ...     print 'path:%s item:%s' % (path, item)
        ...     return True
        ...
        >>> xmltodict.parse(\"\"\"
        ... <a prop="x">
        ...   <b>1</b>
        ...   <b>2</b>
        ... </a>\"\"\", item_depth=2, item_callback=handle)
        path:[(u'a', {u'prop': u'x'}), (u'b', None)] item:1
        path:[(u'a', {u'prop': u'x'}), (u'b', None)] item:2

    The optional argument `postprocessor` is a function that takes `path`,
    `key` and `value` as positional arguments and returns a new `(key, value)`
    pair where both `key` and `value` may have changed. Usage example::

        >>> def postprocessor(path, key, value):
        ...     try:
        ...         return key + ':int', int(value)
        ...     except (ValueError, TypeError):
        ...         return key, value
        >>> xmltodict.parse('<a><b>1</b><b>2</b><b>x</b></a>',
        ...                 postprocessor=postprocessor)
        OrderedDict([(u'a', OrderedDict([(u'b:int', [1, 2]), (u'b', u'x')]))])

    You can pass an alternate version of `expat` (such as `defusedexpat`) by
    using the `expat` parameter.
    E.g.:

        >>> import defusedexpat
        >>> xmltodict.parse('<a>hello</a>', expat=defusedexpat.pyexpat)
        OrderedDict([(u'a', u'hello')])

    """
    handler = _DictSAXHandler(namespace_separator=namespace_separator,
                              **kwargs)
    if isinstance(xml_input, _unicode):
        if not encoding:
            encoding = 'utf-8'
        xml_input = xml_input.encode(encoding)
    if not process_namespaces:
        namespace_separator = None
    parser = expat.ParserCreate(
        encoding,
        namespace_separator
    )
    try:
        parser.ordered_attributes = True
    except AttributeError:
        # Jython's expat does not support ordered_attributes
        pass
    parser.StartElementHandler = handler.startElement
    parser.EndElementHandler = handler.endElement
    parser.CharacterDataHandler = handler.characters
    parser.buffer_text = True
    try:
        parser.ParseFile(xml_input)
    except (TypeError, AttributeError):
        parser.Parse(xml_input, True)
    return handler.item


def _emit(key, value, content_handler,
          attr_prefix='@',
          cdata_key='#text',
          depth=0,
          preprocessor=None,
          pretty=False,
          newl='\n',
          indent='\t',
          full_document=True):
    if preprocessor is not None:
        result = preprocessor(key, value)
        if result is None:
            return
        key, value = result
    if not isinstance(value, (list, tuple)):
        value = [value]
    if full_document and depth == 0 and len(value) > 1:
        raise ValueError('document with multiple roots')
    for v in value:
        if v is None:
            v = OrderedDict()
        elif not isinstance(v, dict):
            v = _unicode(v)
        if isinstance(v, _basestring):
            v = OrderedDict(((cdata_key, v),))
        cdata = None
        attrs = OrderedDict()
        children = []
        for ik, iv in v.items():
            if ik == cdata_key:
                cdata = iv
                continue
            if ik.startswith(attr_prefix):
                attrs[ik[len(attr_prefix):]] = iv
                continue
            children.append((ik, iv))
        if pretty:
            content_handler.ignorableWhitespace(depth * indent)
        content_handler.startElement(key, AttributesImpl(attrs))
        if pretty and children:
            content_handler.ignorableWhitespace(newl)
        for child_key, child_value in children:
            _emit(child_key, child_value, content_handler,
                  attr_prefix, cdata_key, depth+1, preprocessor,
                  pretty, newl, indent)
        if cdata is not None:
            content_handler.characters(cdata)
        if pretty and children:
            content_handler.ignorableWhitespace(depth * indent)
        content_handler.endElement(key)
        if pretty and depth:
            content_handler.ignorableWhitespace(newl)


def unparse(input_dict, output=None, encoding='utf-8', full_document=True,
            **kwargs):
    """Emit an XML document for the given `input_dict` (reverse of `parse`).

    The resulting XML document is returned as a string, but if `output` (a
    file-like object) is specified, it is written there instead.

    Dictionary keys prefixed with `attr_prefix` (default=`'@'`) are interpreted
    as XML node attributes, whereas keys equal to `cdata_key`
    (default=`'#text'`) are treated as character data.

    The `pretty` parameter (default=`False`) enables pretty-printing. In this
    mode, lines are terminated with `'\n'` and indented with `'\t'`, but this
    can be customized with the `newl` and `indent` parameters.

    """
    if full_document and len(input_dict) != 1:
        raise ValueError('Document must have exactly one root.')
    must_return = False
    if output is None:
        output = StringIO()
        must_return = True
    content_handler = XMLGenerator(output, encoding)
    if full_document:
        content_handler.startDocument()
    for key, value in input_dict.items():
        _emit(key, value, content_handler, full_document=full_document,
              **kwargs)
    if full_document:
        content_handler.endDocument()
    if must_return:
        value = output.getvalue()
        try:  # pragma no cover
            value = value.decode(encoding)
        except AttributeError:  # pragma no cover
            pass
        return value

if __name__ == '__main__':  # pragma: no cover
    import sys
    import marshal

    (item_depth,) = sys.argv[1:]
    item_depth = int(item_depth)

    def handle_item(path, item):
        marshal.dump((path, item), sys.stdout)
        return True

    try:
        root = parse(sys.stdin,
                     item_depth=item_depth,
                     item_callback=handle_item,
                     dict_constructor=dict)
        if item_depth == 0:
            handle_item([], root)
    except KeyboardInterrupt:
        pass
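
The rule that `_DictSAXHandler.push_data` applies to repeated sibling elements (the first value is stored as-is, later values promote the entry to a list) can be sketched in isolation. This is a minimal standalone sketch, not part of the module: `fold_child` is a hypothetical name, the `postprocessor` step is left out, and a membership test stands in for the module's `try`/`except KeyError`.

```python
from collections import OrderedDict


def fold_child(item, key, data):
    """Store `data` under `key`, promoting repeated keys to a list.

    Hypothetical helper that mirrors the folding done by
    `_DictSAXHandler.push_data`, without the postprocessor hook.
    """
    if item is None:
        # The first child of an element creates the container lazily.
        item = OrderedDict()
    if key in item:
        value = item[key]
        if isinstance(value, list):
            # Third and later occurrences extend the existing list.
            value.append(data)
        else:
            # The second occurrence turns the scalar into a two-item list.
            item[key] = [value, data]
    else:
        item[key] = data
    return item


# Folding the text of three sibling <b> elements, then one <c> element.
item = None
for text in ('1', '2', 'x'):
    item = fold_child(item, 'b', text)
item = fold_child(item, 'c', 'only')
```

One consequence of this design is that the shape of a value depends on the input: a key maps to a plain value when the element occurs once and to a list when it occurs more than once, so callers generally need to handle both cases.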