pax_global_header 0000666 0000000 0000000 00000000064 14235663755 0014532 g ustar 00root root 0000000 0000000 52 comment=096db4ad337e4168ef891551814cdd829716192b
xmltodict-0.13.0/ 0000775 0000000 0000000 00000000000 14235663755 0013622 5 ustar 00root root 0000000 0000000 xmltodict-0.13.0/.gitignore 0000664 0000000 0000000 00000000465 14235663755 0015617 0 ustar 00root root 0000000 0000000 *.py[cod]
# C extensions
*.so
# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64
# Installer logs
pip-log.txt
# Unit test / coverage reports
.coverage
.tox
nosetests.xml
#Translations
*.mo
#Mr Developer
.mr.developer.cfg
#setuptools MANIFEST
MANIFEST
xmltodict-0.13.0/.travis.yml 0000664 0000000 0000000 00000000267 14235663755 0015740 0 ustar 00root root 0000000 0000000 language: python
python:
- "3.4"
- "3.5"
- "3.6"
- "3.7"
- "3.8"
- "3.9"
- "3.10-dev"
- "pypy"
install: pip install nose2
script: nose2 -vv --coverage=xmltodict.py
xmltodict-0.13.0/CHANGELOG.md 0000664 0000000 0000000 00000014703 14235663755 0015440 0 ustar 00root root 0000000 0000000 CHANGELOG
=========
v0.13.0
-------
* Add install info to readme for openSUSE. (#205)
  * Thanks, @smarlowucf!
* Support defaultdict for namespace mapping (#211)
  * Thanks, @nathanalderson!
* parse(generator) is now possible (#212)
  * Thanks, @xandey!
* Processing comments on parsing from xml to dict (connected to #109) (#221)
  * Thanks, @svetazol!
* Add expand_iter kw to unparse to expand iterables (#213)
  * Thanks, @claweyenuk!
* Fixed some typos
  * Thanks, @timgates42 and @kianmeng!
* Add support for python3.8
  * Thanks, @t0b3!
* Drop Jython/Python 2 and add Python 3.9/3.10.
* Drop OrderedDict in Python >= 3.7
* Do not use len() to determine if a sequence is empty
  * Thanks, @DimitriPapadopoulos!
* Add more namespace attribute tests
  * Thanks, @leogregianin!
* Fix encoding issue in setup.py
  * Thanks, @rjarry!
v0.12.0
-------
* Allow force_list=True for getting all keys as lists (#204)
* README.md: fix useless uses of cat (#200)
* Add FreeBSD install instructions (#199)
* Fix and simplify travis config (#192)
* Add support for Python 3.7 (#189)
* Drop support for EOL Python (#191)
* Use Markdown long_description on PyPI (#190)
* correct spelling mistake (#165)
* correctly unparse booleans (#180)
* Updates README.md with svg badge
v0.11.0
-------
* Determine fileness by checking for `read` attr
  * Thanks, @jwodder!
* Add support for Python 3.6.
  * Thanks, @cclauss!
* Release as a universal wheel.
  * Thanks, @adamchainz!
* Updated docs examples to use print function.
  * Thanks, @cdeil!
* unparse: pass short_empty_elements to XMLGenerator
  * Thanks, @zhanglei002!
* Added namespace support when unparsing.
  * Thanks, @imiric!
v0.10.2
-------
* Fixed defusedexpat expat import.
  * Thanks, @fiebiga!
v0.10.1
-------
* Use defusedexpat if available.
* Allow non-string attributes in unparse.
* Add postprocessor support for attributes.
* Make command line interface Python 3-compatible.
v0.10.0
-------
* Add force_list feature.
  * Thanks, @guewen and @jonlooney!
* Add support for Python 3.4 and 3.5.
* Performance optimization: use list instead of string for CDATA.
  * Thanks, @bharel!
* Include Arch Linux package instructions in README.
  * Thanks, @felixonmars!
* Improved documentation.
  * Thanks, @ubershmekel!
* Allow any iterable in unparse, not just lists.
  * Thanks, @bzamecnik!
* Bugfix: Process namespaces in attributes too.
* Better testing under Python 2.6.
  * Thanks, @TyMaszWeb!
v0.9.2
------
* Fix multiroot check for list values (edge case reported by @JKillian)
v0.9.1
------
* Only check single root when full_document=True (Thanks @JKillian!)
v0.9.0
------
* Added CHANGELOG.md
* Avoid ternary operator in call to ParserCreate().
* Adding Python 3.4 to Tox test environment.
* Added full_document flag to unparse (default=True).
v0.8.7
------
* Merge pull request #56 from HansWeltar/master
* Improve performance for large files
* Updated README unparse example with pretty=True.
v0.8.6
------
* Fixed extra newlines in pretty print mode.
* Fixed all flake8 warnings.
v0.8.5
------
* Added Tox config.
* Let expat figure out the doc encoding.
v0.8.4
------
* Fixed Jython TravisCI build.
* Moved nose and coverage to tests_require.
* Dropping python 2.5 from travis.yml.
v0.8.3
------
* Use system setuptools if available.
v0.8.2
------
* Switch to latest setuptools.
v0.8.1
------
* Include distribute_setup.py in MANIFEST.in
* Updated package classifiers (python versions, PyPy, Jython).
v0.8.0
------
* Merge pull request #40 from martinblech/jython-support
* Adding Jython support.
* Fix streaming example callback (must return True)
v0.7.0
------
* Merge pull request #35 from martinblech/namespace-support
* Adding support for XML namespaces.
* Merge pull request #33 from bgilb/master
* fixes whitespace style
* changes module import syntax and assertRaises
* adds unittest assertRaises
v0.6.0
------
* Merge pull request #31 from martinblech/document-unparse
* Adding documentation for unparse()
* Merge pull request #30 from martinblech/prettyprint
* Adding support for pretty print in unparse()
v0.5.1
------
* Merge pull request #29 from dusual/master
* ordereddict import for less 2.6 if available
v0.5.0
------
* Allow using alternate versions of `expat`.
* Added shameless link to GitTip.
* Merge pull request #20 from kevbo/master
* Adds unparse example to README
v0.4.6
------
* fix try/catch block for pypy (throws AttributeError instead of TypeError)
* prevent encoding an already encoded string
* removed unnecessary try/catch for xml_input.encode(). check if file or string, EAFP style. (thanks @turicas)
v0.4.5
------
* test with python 3.3 too
* avoid u'unicode' syntax (fails in python 3.2)
* handle unicode input strings properly
* add strip_whitespace option (default=True)
* Merge pull request #16 from slestak/master
* fix unittest
* working with upstream to improve #15
* remove pythonpath tweaks, change loc of #15 patch
* upstream #15
v0.4.4
------
* test attribute order roundtrip only if OrderedDict is available (python >= 2.7)
* Merge branch 'master' of github.com:martinblech/xmltodict
* preserve xml attribute order (fixes #13)
v0.4.3
------
* fix #12: postprocess cdata items too
* added info about official fedora package
v0.4.2
------
* Merge pull request #11 from ralphbean/master
* Include README, LICENSE, and tests in the distributed tarball.
v0.4.1
------
* take all characters (no need to strip and filter)
* fixed CLI (marshal only takes dict, not OrderedDict)
* ignore MANIFEST
v0.4
----
* #8 preprocessing callback in unparse()
v0.3
----
* implemented postprocessor callback (#6)
* update readme with install instructions
v0.2
----
* link to travis-ci build status
* more complete info in setup.py (for uploading to PyPi)
* coverage annotations for tricky py3k workarounds
* py3k compatibility
* removed unused __future__ print_function
* using io.StringIO on py3k
* removed unnecessary exception catching
* initial travis-ci configuration
* made _emit function private
* unparse functionality
* added tests
* updated (c) notice to acknowledge individual contributors
* added license information
* fixed README
* removed temp build directory and added a .gitignore to avoid that happening again
* Merge pull request #1 from scottscott/master
* Added setup script to make xmltodict a Python module.
* fixed bad handling of cdata in semistructured xml, changed _CDATA_ to #text as default
* added attr_prefix, cdata_key and force_cdata parameters
* links in README
* links in README
* improved README
* initial commit
xmltodict-0.13.0/LICENSE 0000664 0000000 0000000 00000002075 14235663755 0014633 0 ustar 00root root 0000000 0000000 Copyright (C) 2012 Martin Blech and individual contributors.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
xmltodict-0.13.0/MANIFEST.in 0000664 0000000 0000000 00000000120 14235663755 0015351 0 ustar 00root root 0000000 0000000 include README.md
include LICENSE
include ez_setup.py
recursive-include tests *
xmltodict-0.13.0/README.md 0000664 0000000 0000000 00000014714 14235663755 0015110 0 ustar 00root root 0000000 0000000 # xmltodict
`xmltodict` is a Python module that makes working with XML feel like you are working with [JSON](http://docs.python.org/library/json.html), as in this ["spec"](http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html):
[Build status](https://travis-ci.com/martinblech/xmltodict)
```python
>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}
```
## Namespace support
By default, `xmltodict` does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing `process_namespaces=True` will make it expand namespaces for you:
```python
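>>> xml = """
... <root xmlns="http://defaultns.com/"
...       xmlns:a="http://a.com/"
...       xmlns:b="http://b.com/">
...   <x>1</x>
...   <a:y>2</a:y>
...   <b:z>3</b:z>
... </root>
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }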
True
```
It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:
```python
>>> namespaces = {
...     'http://defaultns.com/': None, # skip this namespace
...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True
```
## Streaming mode
`xmltodict` is very fast ([Expat](http://docs.python.org/library/pyexpat.html)-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like [Discogs](http://discogs.com/data/) or [Wikipedia](http://dumps.wikimedia.org/):
```python
>>> def handle_artist(_, artist):
...     print(artist['name'])
...     return True
>>>
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...
```
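To illustrate what `item_depth` means, here is a rough sketch of depth-based streaming written directly against the standard library's Expat bindings. It is not `xmltodict`'s implementation: the helper `stream_items` and the sample document are made up for illustration, and the sketch ignores attributes and deeper nesting.

```python
from xml.parsers import expat


def stream_items(xml_bytes, callback, item_depth=2):
    """Call callback(item) for each element that closes at item_depth.

    Each item is a flat dict of child tag -> text (hypothetical helper,
    not part of xmltodict).
    """
    path = []   # names of the currently open elements, root first
    text = []   # character data collected for the innermost element
    item = {}   # fields of the item currently being assembled

    def start(name, attrs):
        path.append(name)
        if len(path) == item_depth:
            item.clear()
        del text[:]

    def chars(data):
        text.append(data)

    def end(name):
        if len(path) == item_depth + 1:
            # a direct child of the item just closed: record its text
            item[name] = ''.join(text).strip()
        elif len(path) == item_depth:
            # the item itself just closed: hand a copy to the callback
            callback(dict(item))
        path.pop()
        del text[:]

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)


doc = (b'<artists>'
       b'<artist><name>A Perfect Circle</name></artist>'
       b'<artist><name>King Crimson</name></artist>'
       b'</artists>')
names = []
stream_items(doc, lambda artist: names.append(artist['name']))
print(names)
```

Only one item is held in memory at a time, which is why this style suits very large dumps. `xmltodict` does this bookkeeping for you and also handles attributes, nesting and namespaces.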
It can also be used from the command line to pipe objects to a script like this:
```python
import sys, marshal
while True:
    try:
        # marshal needs the binary stream, not the text wrapper
        _, article = marshal.load(sys.stdin.buffer)
    except EOFError:
        break
    print(article['title'])
```
```sh
$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...
```
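A reader script has to stop once the pipe closes. The following is a minimal sketch of that loop, assuming Python 3 (where `marshal.load` needs a binary stream such as `sys.stdin.buffer`). An in-memory buffer with two made-up `(path, item)` pairs stands in for real command-line output, and `iter_marshalled` is a hypothetical helper name.

```python
import io
import marshal


def iter_marshalled(stream):
    """Yield every object in a binary marshal stream until it runs out."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            return


# Stand-in for the pipe: two hypothetical (path, item) pairs, back to back.
pairs = [
    ([('page', None)], {'title': 'Anarchism'}),
    ([('page', None)], {'title': 'Autism'}),
]
pipe = io.BytesIO()
for pair in pairs:
    marshal.dump(pair, pipe)
pipe.seek(0)

titles = [article['title'] for _, article in iter_marshalled(pipe)]
print(titles)
```

In a real script, `iter_marshalled(sys.stdin.buffer)` would take the place of the in-memory buffer.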
Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:
```sh
$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | gzip > enwiki.dicts.gz
```
And you reuse the dicts with every script that needs them:
```sh
$ gunzip -c enwiki.dicts.gz | script1.py
$ gunzip -c enwiki.dicts.gz | script2.py
...
```
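The same caching idea can be sketched without a shell pipeline, using only the standard library. The helpers `write_cache` and `read_cache` and the sample articles are illustrative assumptions, not part of `xmltodict`:

```python
import gzip
import marshal
import os
import tempfile


def write_cache(path, items):
    """Marshal each item, back to back, into a gzip-compressed file."""
    with gzip.open(path, 'wb') as out:
        for item in items:
            marshal.dump(item, out)


def read_cache(path):
    """Yield the cached items in the order they were written."""
    with gzip.open(path, 'rb') as src:
        while True:
            try:
                yield marshal.load(src)
            except EOFError:
                return


# Hypothetical parsed articles standing in for the real dump.
articles = [{'title': 'Anarchism'}, {'title': 'Autism'}]
cache_path = os.path.join(tempfile.mkdtemp(), 'enwiki.dicts.gz')
write_cache(cache_path, articles)
restored = list(read_cache(cache_path))
print(restored == articles)
```

Keep in mind that the `marshal` format is specific to the Python version that wrote it, so a cache like this is best treated as a local, disposable artifact.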
## Roundtripping
You can also convert in the other direction, using the `unparse()` function:
```python
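>>> mydict = {
...     'response': {
...             'status': 'good',
...             'last_updated': '2014-02-16T23:10:12Z',
...     }
... }
>>> print(unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<response>
	<status>good</status>
	<last_updated>2014-02-16T23:10:12Z</last_updated>
</response>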
```
Text values for nodes can be specified with the `cdata_key` key in the Python dict, while node attributes can be specified by prefixing the key name with `attr_prefix`. The default value for `attr_prefix` is `@` and the default value for `cdata_key` is `#text`.
```python
>>> import xmltodict
>>>
>>> mydict = {
...     'text': {
...         '@color':'red',
...         '@stroke':'2',
...         '#text':'This is a test'
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<text color="red" stroke="2">This is a test</text>
```
Lists that are specified under a key in a dictionary use the key as a tag for each item. But if a list does not have a parent key, for example if a list exists inside another list, it has no tag to use and its items are converted to a string, as shown in the example below. To give tags to nested lists, use the `expand_iter` keyword argument to provide a tag, as demonstrated below. Note that using `expand_iter` will break roundtripping.
```python
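>>> mydict = {
...     "line": {
...         "points": [
...             [1, 5],
...             [2, 6],
...         ]
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<line>
	<points>[1, 5]</points>
	<points>[2, 6]</points>
</line>
>>> print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))
<?xml version="1.0" encoding="utf-8"?>
<line>
	<points>
		<coord>1</coord>
		<coord>5</coord>
	</points>
	<points>
		<coord>2</coord>
		<coord>6</coord>
	</points>
</line>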
```
## Ok, how do I get it?
### Using pypi
You just need to
```sh
$ pip install xmltodict
```
### RPM-based distro (Fedora, RHEL, …)
There is an [official Fedora package for xmltodict](https://apps.fedoraproject.org/packages/python-xmltodict).
```sh
$ sudo yum install python-xmltodict
```
### Arch Linux
There is an [official Arch Linux package for xmltodict](https://www.archlinux.org/packages/community/any/python-xmltodict/).
```sh
$ sudo pacman -S python-xmltodict
```
### Debian-based distro (Debian, Ubuntu, …)
There is an [official Debian package for xmltodict](https://tracker.debian.org/pkg/python-xmltodict).
```sh
$ sudo apt install python-xmltodict
```
### FreeBSD
There is an [official FreeBSD port for xmltodict](https://svnweb.freebsd.org/ports/head/devel/py-xmltodict/).
```sh
$ pkg install py36-xmltodict
```
### openSUSE/SLE (SLE 15, Leap 15, Tumbleweed)
There is an [official openSUSE package for xmltodict](https://software.opensuse.org/package/python-xmltodict).
```sh
# Python2
$ zypper in python2-xmltodict
# Python3
$ zypper in python3-xmltodict
```
xmltodict-0.13.0/ez_setup.py 0000664 0000000 0000000 00000030371 14235663755 0016036 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
Setuptools bootstrapping installer.
Maintained at https://github.com/pypa/setuptools/tree/bootstrap.
Run this script to install or upgrade setuptools.
This method is DEPRECATED. Check https://github.com/pypa/setuptools/issues/581 for more details.
"""
import os
import shutil
import sys
import tempfile
import zipfile
import optparse
import subprocess
import platform
import textwrap
import contextlib
from distutils import log
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen

try:
    from site import USER_SITE
except ImportError:
    USER_SITE = None

# 33.1.1 is the last version that supports setuptools self upgrade/installation.
DEFAULT_VERSION = "33.1.1"
DEFAULT_URL = "https://pypi.io/packages/source/s/setuptools/"
DEFAULT_SAVE_DIR = os.curdir
DEFAULT_DEPRECATION_MESSAGE = "ez_setup.py is deprecated and when using it setuptools will be pinned to {0} since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools"
MEANINGFUL_INVALID_ZIP_ERR_MSG = 'Maybe {0} is corrupted, delete it and try again.'
log.warn(DEFAULT_DEPRECATION_MESSAGE.format(DEFAULT_VERSION))


def _python_cmd(*args):
    """
    Execute a command.
    Return True if the command succeeded.
    """
    args = (sys.executable,) + args
    return subprocess.call(args) == 0


def _install(archive_filename, install_args=()):
    """Install Setuptools."""
    with archive_context(archive_filename):
        # installing
        log.warn('Installing Setuptools')
        if not _python_cmd('setup.py', 'install', *install_args):
            log.warn('Something went wrong during the installation.')
            log.warn('See the error message above.')
            # exitcode will be 2
            return 2


def _build_egg(egg, archive_filename, to_dir):
    """Build Setuptools egg."""
    with archive_context(archive_filename):
        # building an egg
        log.warn('Building a Setuptools egg in %s', to_dir)
        _python_cmd('setup.py', '-q', 'bdist_egg', '--dist-dir', to_dir)
    # returning the result
    log.warn(egg)
    if not os.path.exists(egg):
        raise IOError('Could not build the egg.')


class ContextualZipFile(zipfile.ZipFile):
    """Supplement ZipFile class to support context manager for Python 2.6."""

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    def __new__(cls, *args, **kwargs):
        """Construct a ZipFile or ContextualZipFile as appropriate."""
        if hasattr(zipfile.ZipFile, '__exit__'):
            return zipfile.ZipFile(*args, **kwargs)
        return super(ContextualZipFile, cls).__new__(cls)


@contextlib.contextmanager
def archive_context(filename):
    """
    Unzip filename to a temporary directory, set to the cwd.
    The unzipped target is cleaned up after.
    """
    tmpdir = tempfile.mkdtemp()
    log.warn('Extracting in %s', tmpdir)
    old_wd = os.getcwd()
    try:
        os.chdir(tmpdir)
        try:
            with ContextualZipFile(filename) as archive:
                archive.extractall()
        except zipfile.BadZipfile as err:
            if not err.args:
                err.args = ('', )
            err.args = err.args + (
                MEANINGFUL_INVALID_ZIP_ERR_MSG.format(filename),
            )
            raise

        # going in the directory
        subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0])
        os.chdir(subdir)
        log.warn('Now working in %s', subdir)
        yield

    finally:
        os.chdir(old_wd)
        shutil.rmtree(tmpdir)


def _do_download(version, download_base, to_dir, download_delay):
    """Download Setuptools."""
    py_desig = 'py{sys.version_info[0]}.{sys.version_info[1]}'.format(sys=sys)
    tp = 'setuptools-{version}-{py_desig}.egg'
    egg = os.path.join(to_dir, tp.format(**locals()))
    if not os.path.exists(egg):
        archive = download_setuptools(version, download_base,
                                      to_dir, download_delay)
        _build_egg(egg, archive, to_dir)
    sys.path.insert(0, egg)

    # Remove previously-imported pkg_resources if present (see
    # https://bitbucket.org/pypa/setuptools/pull-request/7/ for details).
    if 'pkg_resources' in sys.modules:
        _unload_pkg_resources()

    import setuptools
    setuptools.bootstrap_install_from = egg


def use_setuptools(
        version=DEFAULT_VERSION, download_base=DEFAULT_URL,
        to_dir=DEFAULT_SAVE_DIR, download_delay=15):
    """
    Ensure that a setuptools version is installed.
    Return None. Raise SystemExit if the requested version
    or later cannot be installed.
    """
    to_dir = os.path.abspath(to_dir)

    # prior to importing, capture the module state for
    # representative modules.
    rep_modules = 'pkg_resources', 'setuptools'
    imported = set(sys.modules).intersection(rep_modules)

    try:
        import pkg_resources
        pkg_resources.require("setuptools>=" + version)
        # a suitable version is already installed
        return
    except ImportError:
        # pkg_resources not available; setuptools is not installed; download
        pass
    except pkg_resources.DistributionNotFound:
        # no version of setuptools was found; allow download
        pass
    except pkg_resources.VersionConflict as VC_err:
        if imported:
            _conflict_bail(VC_err, version)

        # otherwise, unload pkg_resources to allow the downloaded version to
        # take precedence.
        del pkg_resources
        _unload_pkg_resources()

    return _do_download(version, download_base, to_dir, download_delay)


def _conflict_bail(VC_err, version):
    """
    Setuptools was imported prior to invocation, so it is
    unsafe to unload it. Bail out.
    """
    conflict_tmpl = textwrap.dedent("""
        The required version of setuptools (>={version}) is not available,
        and can't be installed while this script is running. Please
        install a more recent version first, using
        'easy_install -U setuptools'.
        (Currently using {VC_err.args[0]!r})
        """)
    msg = conflict_tmpl.format(**locals())
    sys.stderr.write(msg)
    sys.exit(2)


def _unload_pkg_resources():
    sys.meta_path = [
        importer
        for importer in sys.meta_path
        if importer.__class__.__module__ != 'pkg_resources.extern'
    ]
    del_modules = [
        name for name in sys.modules
        if name.startswith('pkg_resources')
    ]
    for mod_name in del_modules:
        del sys.modules[mod_name]


def _clean_check(cmd, target):
    """
    Run the command to download target.
    If the command fails, clean up before re-raising the error.
    """
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError:
        if os.access(target, os.F_OK):
            os.unlink(target)
        raise


def download_file_powershell(url, target):
    """
    Download the file at url to target using Powershell.
    Powershell will validate trust.
    Raise an exception if the command cannot complete.
    """
    target = os.path.abspath(target)
    ps_cmd = (
        "[System.Net.WebRequest]::DefaultWebProxy.Credentials = "
        "[System.Net.CredentialCache]::DefaultCredentials; "
        '(new-object System.Net.WebClient).DownloadFile("%(url)s", "%(target)s")'
        % locals()
    )
    cmd = [
        'powershell',
        '-Command',
        ps_cmd,
    ]
    _clean_check(cmd, target)


def has_powershell():
    """Determine if Powershell is available."""
    if platform.system() != 'Windows':
        return False
    cmd = ['powershell', '-Command', 'echo test']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_powershell.viable = has_powershell


def download_file_curl(url, target):
    cmd = ['curl', url, '--location', '--silent', '--output', target]
    _clean_check(cmd, target)


def has_curl():
    cmd = ['curl', '--version']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_curl.viable = has_curl


def download_file_wget(url, target):
    cmd = ['wget', url, '--quiet', '--output-document', target]
    _clean_check(cmd, target)


def has_wget():
    cmd = ['wget', '--version']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_wget.viable = has_wget


def download_file_insecure(url, target):
    """Use Python to download the file, without connection authentication."""
    src = urlopen(url)
    try:
        # Read all the data in one block.
        data = src.read()
    finally:
        src.close()

    # Write all the data in one block to avoid creating a partial file.
    with open(target, "wb") as dst:
        dst.write(data)
download_file_insecure.viable = lambda: True


def get_best_downloader():
    downloaders = (
        download_file_powershell,
        download_file_curl,
        download_file_wget,
        download_file_insecure,
    )
    viable_downloaders = (dl for dl in downloaders if dl.viable())
    return next(viable_downloaders, None)


def download_setuptools(
        version=DEFAULT_VERSION, download_base=DEFAULT_URL,
        to_dir=DEFAULT_SAVE_DIR, delay=15,
        downloader_factory=get_best_downloader):
    """
    Download setuptools from a specified location and return its filename.
    `version` should be a valid setuptools version number that is available
    as an sdist for download under the `download_base` URL (which should end
    with a '/'). `to_dir` is the directory where the egg will be downloaded.
    `delay` is the number of seconds to pause before an actual download
    attempt.
    ``downloader_factory`` should be a function taking no arguments and
    returning a function for downloading a URL to a target.
    """
    # making sure we use the absolute path
    to_dir = os.path.abspath(to_dir)
    zip_name = "setuptools-%s.zip" % version
    url = download_base + zip_name
    saveto = os.path.join(to_dir, zip_name)
    if not os.path.exists(saveto):  # Avoid repeated downloads
        log.warn("Downloading %s", url)
        downloader = downloader_factory()
        downloader(url, saveto)
    return os.path.realpath(saveto)


def _build_install_args(options):
    """
    Build the arguments to 'python setup.py install' on the setuptools package.
    Returns list of command line arguments.
    """
    return ['--user'] if options.user_install else []


def _parse_args():
    """Parse the command line for options."""
    parser = optparse.OptionParser()
    parser.add_option(
        '--user', dest='user_install', action='store_true', default=False,
        help='install in user site package')
    parser.add_option(
        '--download-base', dest='download_base', metavar="URL",
        default=DEFAULT_URL,
        help='alternative URL from where to download the setuptools package')
    parser.add_option(
        '--insecure', dest='downloader_factory', action='store_const',
        const=lambda: download_file_insecure, default=get_best_downloader,
        help='Use internal, non-validating downloader'
    )
    parser.add_option(
        '--version', help="Specify which version to download",
        default=DEFAULT_VERSION,
    )
    parser.add_option(
        '--to-dir',
        help="Directory to save (and re-use) package",
        default=DEFAULT_SAVE_DIR,
    )
    options, args = parser.parse_args()
    # positional arguments are ignored
    return options


def _download_args(options):
    """Return args for download_setuptools function from cmdline args."""
    return dict(
        version=options.version,
        download_base=options.download_base,
        downloader_factory=options.downloader_factory,
        to_dir=options.to_dir,
    )


def main():
    """Install or upgrade setuptools and EasyInstall."""
    options = _parse_args()
    archive = download_setuptools(**_download_args(options))
    return _install(archive, _build_install_args(options))


if __name__ == '__main__':
    sys.exit(main())
xmltodict-0.13.0/push_release.sh 0000775 0000000 0000000 00000000101 14235663755 0016630 0 ustar 00root root 0000000 0000000 #!/usr/bin/env sh
python setup.py clean sdist bdist_wheel upload
xmltodict-0.13.0/setup.cfg 0000664 0000000 0000000 00000000034 14235663755 0015440 0 ustar 00root root 0000000 0000000 [bdist_wheel]
universal = 1
xmltodict-0.13.0/setup.py 0000775 0000000 0000000 00000002771 14235663755 0015346 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
try:
    from setuptools import setup
except ImportError:
    from ez_setup import use_setuptools
    use_setuptools()
    from setuptools import setup

import xmltodict

with open('README.md', 'rb') as f:
    long_description = f.read().decode('utf-8')


setup(name='xmltodict',
      version=xmltodict.__version__,
      description=xmltodict.__doc__,
      long_description=long_description,
      long_description_content_type='text/markdown',
      author=xmltodict.__author__,
      author_email='martinblech@gmail.com',
      url='https://github.com/martinblech/xmltodict',
      license=xmltodict.__license__,
      platforms=['all'],
      python_requires='>=3.4',
      classifiers=[
          'Intended Audience :: Developers',
          'License :: OSI Approved :: MIT License',
          'Operating System :: OS Independent',
          'Programming Language :: Python',
          'Programming Language :: Python :: 3',
          'Programming Language :: Python :: 3.4',
          'Programming Language :: Python :: 3.5',
          'Programming Language :: Python :: 3.6',
          'Programming Language :: Python :: 3.7',
          'Programming Language :: Python :: 3.8',
          'Programming Language :: Python :: 3.9',
          'Programming Language :: Python :: 3.10',
          'Programming Language :: Python :: Implementation :: PyPy',
          'Topic :: Text Processing :: Markup :: XML',
      ],
      py_modules=['xmltodict'],
      tests_require=['nose2', 'coverage'],
      )
xmltodict-0.13.0/tests/ 0000775 0000000 0000000 00000000000 14235663755 0014764 5 ustar 00root root 0000000 0000000 xmltodict-0.13.0/tests/test_dicttoxml.py 0000664 0000000 0000000 00000016342 14235663755 0020412 0 ustar 00root root 0000000 0000000 import sys
from xmltodict import parse, unparse
from collections import OrderedDict
import unittest
import re
from textwrap import dedent
IS_JYTHON = sys.platform.startswith('java')
_HEADER_RE = re.compile(r'^[^\n]*\n')


def _strip(fullxml):
    return _HEADER_RE.sub('', fullxml)


class DictToXMLTestCase(unittest.TestCase):
    def test_root(self):
        obj = {'a': None}
        self.assertEqual(obj, parse(unparse(obj)))
        self.assertEqual(unparse(obj), unparse(parse(unparse(obj))))

    def test_simple_cdata(self):
        obj = {'a': 'b'}
        self.assertEqual(obj, parse(unparse(obj)))
        self.assertEqual(unparse(obj), unparse(parse(unparse(obj))))

    def test_cdata(self):
        obj = {'a': {'#text': 'y'}}
        self.assertEqual(obj, parse(unparse(obj), force_cdata=True))
        self.assertEqual(unparse(obj), unparse(parse(unparse(obj))))

    def test_attrib(self):
        obj = {'a': {'@href': 'x'}}
        self.assertEqual(obj, parse(unparse(obj)))
        self.assertEqual(unparse(obj), unparse(parse(unparse(obj))))

    def test_attrib_and_cdata(self):
        obj = {'a': {'@href': 'x', '#text': 'y'}}
        self.assertEqual(obj, parse(unparse(obj)))
        self.assertEqual(unparse(obj), unparse(parse(unparse(obj))))

    def test_list(self):
        obj = {'a': {'b': ['1', '2', '3']}}
        self.assertEqual(obj, parse(unparse(obj)))
        self.assertEqual(unparse(obj), unparse(parse(unparse(obj))))

    def test_list_expand_iter(self):
        obj = {'a': {'b': [['1', '2'], ['3',]]}}
        #self.assertEqual(obj, parse(unparse(obj, expand_iter="item")))
        exp_xml = dedent('''\
        <?xml version="1.0" encoding="utf-8"?>
        <a><b><item>1</item><item>2</item></b><b><item>3</item></b></a>''')
        self.assertEqual(exp_xml, unparse(obj, expand_iter="item"))

    def test_generator(self):
        obj = {'a': {'b': ['1', '2', '3']}}

        def lazy_obj():
            return {'a': {'b': (i for i in ('1', '2', '3'))}}
        self.assertEqual(obj, parse(unparse(lazy_obj())))
        self.assertEqual(unparse(lazy_obj()),
                         unparse(parse(unparse(lazy_obj()))))

    def test_no_root(self):
        self.assertRaises(ValueError, unparse, {})

    def test_multiple_roots(self):
        self.assertRaises(ValueError, unparse, {'a': '1', 'b': '2'})
        self.assertRaises(ValueError, unparse, {'a': ['1', '2', '3']})

    def test_no_root_nofulldoc(self):
        self.assertEqual(unparse({}, full_document=False), '')

    def test_multiple_roots_nofulldoc(self):
        obj = OrderedDict((('a', 1), ('b', 2)))
        xml = unparse(obj, full_document=False)
        self.assertEqual(xml, '<a>1</a><b>2</b>')
        obj = {'a': [1, 2]}
        xml = unparse(obj, full_document=False)
        self.assertEqual(xml, '<a>1</a><a>2</a>')

    def test_nested(self):
        obj = {'a': {'b': '1', 'c': '2'}}
        self.assertEqual(obj, parse(unparse(obj)))
        self.assertEqual(unparse(obj), unparse(parse(unparse(obj))))
        obj = {'a': {'b': {'c': {'@a': 'x', '#text': 'y'}}}}
        self.assertEqual(obj, parse(unparse(obj)))
        self.assertEqual(unparse(obj), unparse(parse(unparse(obj))))

    def test_semistructured(self):
        xml = '<a>abc<d/>efg</a>'
        self.assertEqual(_strip(unparse(parse(xml))),
                         '<a><d></d>abcefg</a>')

    def test_preprocessor(self):
        obj = {'a': OrderedDict((('b:int', [1, 2]), ('b', 'c')))}

        def p(key, value):
            try:
                key, _ = key.split(':')
            except ValueError:
                pass
            return key, value

        self.assertEqual(_strip(unparse(obj, preprocessor=p)),
                         '<a><b>1</b><b>2</b><b>c</b></a>')

    def test_preprocessor_skipkey(self):
        obj = {'a': {'b': 1, 'c': 2}}

        def p(key, value):
            if key == 'b':
                return None
            return key, value

        self.assertEqual(_strip(unparse(obj, preprocessor=p)),
                         '<a><c>2</c></a>')

    if not IS_JYTHON:
        # Jython's SAX does not preserve attribute order
        def test_attr_order_roundtrip(self):
            xml = '<root a="1" b="2" c="3"></root>'
            self.assertEqual(xml, _strip(unparse(parse(xml))))

    def test_pretty_print(self):
        obj = {'a': OrderedDict((
            ('b', [{'c': [1, 2]}, 3]),
            ('x', 'y'),
        ))}
        newl = '\n'
        indent = '....'
        xml = dedent('''\
        <?xml version="1.0" encoding="utf-8"?>
        <a>
        ....<b>
        ........<c>1</c>
        ........<c>2</c>
        ....</b>
        ....<b>3</b>
        ....<x>y</x>
        </a>''')
        self.assertEqual(xml, unparse(obj, pretty=True,
                                      newl=newl, indent=indent))

    def test_encoding(self):
        try:
            value = unichr(39321)
        except NameError:
            value = chr(39321)
        obj = {'a': value}
        utf8doc = unparse(obj, encoding='utf-8')
        latin1doc = unparse(obj, encoding='iso-8859-1')
        self.assertEqual(parse(utf8doc), parse(latin1doc))
        self.assertEqual(parse(utf8doc), obj)

    def test_fulldoc(self):
        xml_declaration_re = re.compile(
            '^' + re.escape('<?xml version="1.0" encoding="utf-8"?>'))
        self.assertTrue(xml_declaration_re.match(unparse({'a': 1})))
        self.assertFalse(
            xml_declaration_re.match(unparse({'a': 1}, full_document=False)))

    def test_non_string_value(self):
        obj = {'a': 1}
        self.assertEqual('<a>1</a>', _strip(unparse(obj)))

    def test_non_string_attr(self):
        obj = {'a': {'@attr': 1}}
        self.assertEqual('<a attr="1"></a>', _strip(unparse(obj)))

    def test_short_empty_elements(self):
        if sys.version_info[0] < 3:
            return
        obj = {'a': None}
        self.assertEqual('<a/>', _strip(unparse(obj, short_empty_elements=True)))
def test_namespace_support(self):
obj = OrderedDict((
('http://defaultns.com/:root', OrderedDict((
('@xmlns', OrderedDict((
('', 'http://defaultns.com/'),
('a', 'http://a.com/'),
('b', 'http://b.com/'),
))),
('http://defaultns.com/:x', OrderedDict((
('@http://a.com/:attr', 'val'),
('#text', '1'),
))),
('http://a.com/:y', '2'),
('http://b.com/:z', '3'),
))),
))
ns = {
'http://defaultns.com/': '',
'http://a.com/': 'a',
'http://b.com/': 'b',
}
expected_xml = '''
123'''
xml = unparse(obj, namespaces=ns)
self.assertEqual(xml, expected_xml)
    def test_boolean_unparse(self):
        expected_xml = '<?xml version="1.0" encoding="utf-8"?>\n<x>true</x>'
        xml = unparse(dict(x=True))
        self.assertEqual(xml, expected_xml)

        expected_xml = '<?xml version="1.0" encoding="utf-8"?>\n<x>false</x>'
        xml = unparse(dict(x=False))
        self.assertEqual(xml, expected_xml)
# xmltodict-0.13.0/tests/test_xmltodict.py
from xmltodict import parse, ParsingInterrupted
import collections
import unittest
try:
from io import BytesIO as StringIO
except ImportError:
from xmltodict import StringIO
from xml.parsers.expat import ParserCreate
from xml.parsers import expat
def _encode(s):
try:
return bytes(s, 'ascii')
except (NameError, TypeError):
return s
class XMLToDictTestCase(unittest.TestCase):

    def test_string_vs_file(self):
        xml = '<a>data</a>'
        self.assertEqual(parse(xml),
                         parse(StringIO(_encode(xml))))

    def test_minimal(self):
        self.assertEqual(parse('<a/>'),
                         {'a': None})
        self.assertEqual(parse('<a/>', force_cdata=True),
                         {'a': None})

    def test_simple(self):
        self.assertEqual(parse('<a>data</a>'),
                         {'a': 'data'})

    def test_force_cdata(self):
        self.assertEqual(parse('<a>data</a>', force_cdata=True),
                         {'a': {'#text': 'data'}})

    def test_custom_cdata(self):
        self.assertEqual(parse('<a>data</a>',
                               force_cdata=True,
                               cdata_key='_CDATA_'),
                         {'a': {'_CDATA_': 'data'}})

    def test_list(self):
        self.assertEqual(parse('<a><b>1</b><b>2</b><b>3</b></a>'),
                         {'a': {'b': ['1', '2', '3']}})

    def test_attrib(self):
        self.assertEqual(parse('<a href="xyz"/>'),
                         {'a': {'@href': 'xyz'}})

    def test_skip_attrib(self):
        self.assertEqual(parse('<a href="xyz"/>', xml_attribs=False),
                         {'a': None})

    def test_custom_attrib(self):
        self.assertEqual(parse('<a href="xyz"/>',
                               attr_prefix='!'),
                         {'a': {'!href': 'xyz'}})

    def test_attrib_and_cdata(self):
        self.assertEqual(parse('<a href="xyz">123</a>'),
                         {'a': {'@href': 'xyz', '#text': '123'}})

    def test_semi_structured(self):
        self.assertEqual(parse('<a>abc<b/>def</a>'),
                         {'a': {'b': None, '#text': 'abcdef'}})
        self.assertEqual(parse('<a>abc<b/>def</a>',
                               cdata_separator='\n'),
                         {'a': {'b': None, '#text': 'abc\ndef'}})

    def test_nested_semi_structured(self):
        self.assertEqual(parse('<a>abc<b>123<c/>456</b>def</a>'),
                         {'a': {'#text': 'abcdef', 'b': {
                             '#text': '123456', 'c': None}}})
def test_skip_whitespace(self):
xml = """
hello
"""
self.assertEqual(
parse(xml),
{'root': {'emptya': None,
'emptyb': {'@attr': 'attrvalue'},
'value': 'hello'}})
def test_keep_whitespace(self):
xml = " "
self.assertEqual(parse(xml), dict(root=None))
self.assertEqual(parse(xml, strip_whitespace=False),
dict(root=' '))
    def test_streaming(self):
        def cb(path, item):
            cb.count += 1
            self.assertEqual(path, [('a', {'x': 'y'}), ('b', None)])
            self.assertEqual(item, str(cb.count))
            return True
        cb.count = 0
        parse('<a x="y"><b>1</b><b>2</b><b>3</b></a>',
              item_depth=2, item_callback=cb)
        self.assertEqual(cb.count, 3)

    def test_streaming_interrupt(self):
        cb = lambda path, item: False
        self.assertRaises(ParsingInterrupted,
                          parse, '<a>x</a>',
                          item_depth=1, item_callback=cb)

    def test_streaming_generator(self):
        def cb(path, item):
            cb.count += 1
            self.assertEqual(path, [('a', {'x': 'y'}), ('b', None)])
            self.assertEqual(item, str(cb.count))
            return True
        cb.count = 0
        parse((n for n in '<a x="y"><b>1</b><b>2</b><b>3</b></a>'),
              item_depth=2, item_callback=cb)
        self.assertEqual(cb.count, 3)

    def test_postprocessor(self):
        def postprocessor(path, key, value):
            try:
                return key + ':int', int(value)
            except (ValueError, TypeError):
                return key, value
        self.assertEqual({'a': {'b:int': [1, 2], 'b': 'x'}},
                         parse('<a><b>1</b><b>2</b><b>x</b></a>',
                               postprocessor=postprocessor))

    def test_postprocessor_attribute(self):
        def postprocessor(path, key, value):
            try:
                return key + ':int', int(value)
            except (ValueError, TypeError):
                return key, value
        self.assertEqual({'a': {'@b:int': 1}},
                         parse('<a b="1"/>',
                               postprocessor=postprocessor))

    def test_postprocessor_skip(self):
        def postprocessor(path, key, value):
            if key == 'b':
                value = int(value)
                if value == 3:
                    return None
            return key, value
        self.assertEqual({'a': {'b': [1, 2]}},
                         parse('<a><b>1</b><b>2</b><b>3</b></a>',
                               postprocessor=postprocessor))

    def test_unicode(self):
        try:
            value = unichr(39321)
        except NameError:
            value = chr(39321)
        self.assertEqual({'a': value},
                         parse('<a>%s</a>' % value))

    def test_encoded_string(self):
        try:
            value = unichr(39321)
        except NameError:
            value = chr(39321)
        xml = '<a>%s</a>' % value
        self.assertEqual(parse(xml),
                         parse(xml.encode('utf-8')))
def test_namespace_support(self):
xml = """
1
2
3
"""
d = {
'http://defaultns.com/:root': {
'@version': '1.00',
'@xmlns': {
'': 'http://defaultns.com/',
'a': 'http://a.com/',
'b': 'http://b.com/',
},
'http://defaultns.com/:x': {
'@http://a.com/:attr': 'val',
'#text': '1',
},
'http://a.com/:y': '2',
'http://b.com/:z': '3',
}
}
res = parse(xml, process_namespaces=True)
self.assertEqual(res, d)
def test_namespace_collapse(self):
xml = """
1
2
3
"""
namespaces = {
'http://defaultns.com/': '',
'http://a.com/': 'ns_a',
}
d = {
'root': {
'@version': '1.00',
'@xmlns': {
'': 'http://defaultns.com/',
'a': 'http://a.com/',
'b': 'http://b.com/',
},
'x': {
'@ns_a:attr': 'val',
'#text': '1',
},
'ns_a:y': '2',
'http://b.com/:z': '3',
},
}
res = parse(xml, process_namespaces=True, namespaces=namespaces)
self.assertEqual(res, d)
def test_namespace_collapse_all(self):
xml = """
1
2
3
"""
namespaces = collections.defaultdict(lambda: None)
d = {
'root': {
'@version': '1.00',
'@xmlns': {
'': 'http://defaultns.com/',
'a': 'http://a.com/',
'b': 'http://b.com/',
},
'x': {
'@attr': 'val',
'#text': '1',
},
'y': '2',
'z': '3',
},
}
res = parse(xml, process_namespaces=True, namespaces=namespaces)
self.assertEqual(res, d)
def test_namespace_ignore(self):
xml = """
1
2
3
"""
d = {
'root': {
'@xmlns': 'http://defaultns.com/',
'@xmlns:a': 'http://a.com/',
'@xmlns:b': 'http://b.com/',
'@version': '1.00',
'x': '1',
'a:y': '2',
'b:z': '3',
},
}
self.assertEqual(parse(xml), d)
def test_force_list_basic(self):
xml = """
server1
os1
"""
expectedResult = {
'servers': {
'server': [
{
'name': 'server1',
'os': 'os1',
},
],
}
}
self.assertEqual(parse(xml, force_list=('server',)), expectedResult)
def test_force_list_callable(self):
xml = """
server1
os1
"""
def force_list(path, key, value):
"""Only return True for servers/server, but not for skip/server."""
if key != 'server':
return False
return path and path[-1][0] == 'servers'
expectedResult = {
'config': {
'servers': {
'server': [
{
'name': 'server1',
'os': 'os1',
},
],
},
'skip': {
'server': None,
},
},
}
self.assertEqual(parse(xml, force_list=force_list, dict_constructor=dict), expectedResult)
def test_disable_entities_true_ignores_xmlbomb(self):
xml = """
]>
&c;
"""
expectedResult = {'bomb': None}
try:
parse_attempt = parse(xml, disable_entities=True)
except expat.ExpatError:
self.assertTrue(True)
else:
self.assertEqual(parse_attempt, expectedResult)
def test_disable_entities_false_returns_xmlbomb(self):
xml = """
]>
&c;
"""
bomb = "1234567890" * 64
expectedResult = {'bomb': bomb}
self.assertEqual(parse(xml, disable_entities=False), expectedResult)
def test_disable_entities_true_ignores_external_dtd(self):
xml = """
]>
ⅇ
"""
expectedResult = {'root': None}
try:
parse_attempt = parse(xml, disable_entities=True)
except expat.ExpatError:
self.assertTrue(True)
else:
self.assertEqual(parse_attempt, expectedResult)
def test_disable_entities_true_attempts_external_dtd(self):
xml = """
]>
ⅇ
"""
def raising_external_ref_handler(*args, **kwargs):
parser = ParserCreate(*args, **kwargs)
parser.ExternalEntityRefHandler = lambda *x: 0
try:
feature = "http://apache.org/xml/features/disallow-doctype-decl"
parser._reader.setFeature(feature, True)
except AttributeError:
pass
return parser
expat.ParserCreate = raising_external_ref_handler
# Using this try/catch because a TypeError is thrown before
# the ExpatError, and Python 2.6 is confused by that.
try:
parse(xml, disable_entities=False, expat=expat)
except expat.ExpatError:
self.assertTrue(True)
else:
self.assertTrue(False)
expat.ParserCreate = ParserCreate
def test_comments(self):
xml = """
1
2
"""
expectedResult = {
'a': {
'b': {
'#comment': 'b comment',
'c': {
'#comment': 'c comment',
'#text': '1',
},
'd': '2',
},
}
}
self.assertEqual(parse(xml, process_comments=True), expectedResult)
# xmltodict-0.13.0/tox.ini
[tox]
envlist = py34, py35, py36, py37, py38, py39, py310, pypy
[testenv]
deps =
coverage
nose2
commands=nose2 --coverage=xmltodict.py
# xmltodict-0.13.0/xmltodict.py
#!/usr/bin/env python
"Makes working with XML feel like you are working with JSON"
try:
from defusedexpat import pyexpat as expat
except ImportError:
from xml.parsers import expat
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl
try: # pragma no cover
from cStringIO import StringIO
except ImportError: # pragma no cover
try:
from StringIO import StringIO
except ImportError:
from io import StringIO
_dict = dict
import platform
if tuple(map(int, platform.python_version_tuple()[:2])) < (3, 7):
from collections import OrderedDict as _dict
from inspect import isgenerator
try: # pragma no cover
_basestring = basestring
except NameError: # pragma no cover
_basestring = str
try: # pragma no cover
_unicode = unicode
except NameError: # pragma no cover
_unicode = str
__author__ = 'Martin Blech'
__version__ = '0.13.0'
__license__ = 'MIT'
class ParsingInterrupted(Exception):
pass
class _DictSAXHandler(object):
def __init__(self,
item_depth=0,
item_callback=lambda *args: True,
xml_attribs=True,
attr_prefix='@',
cdata_key='#text',
force_cdata=False,
cdata_separator='',
postprocessor=None,
dict_constructor=_dict,
strip_whitespace=True,
namespace_separator=':',
namespaces=None,
force_list=None,
comment_key='#comment'):
self.path = []
self.stack = []
self.data = []
self.item = None
self.item_depth = item_depth
self.xml_attribs = xml_attribs
self.item_callback = item_callback
self.attr_prefix = attr_prefix
self.cdata_key = cdata_key
self.force_cdata = force_cdata
self.cdata_separator = cdata_separator
self.postprocessor = postprocessor
self.dict_constructor = dict_constructor
self.strip_whitespace = strip_whitespace
self.namespace_separator = namespace_separator
self.namespaces = namespaces
self.namespace_declarations = dict_constructor()
self.force_list = force_list
self.comment_key = comment_key
def _build_name(self, full_name):
if self.namespaces is None:
return full_name
i = full_name.rfind(self.namespace_separator)
if i == -1:
return full_name
namespace, name = full_name[:i], full_name[i+1:]
try:
short_namespace = self.namespaces[namespace]
except KeyError:
short_namespace = namespace
if not short_namespace:
return name
else:
return self.namespace_separator.join((short_namespace, name))
def _attrs_to_dict(self, attrs):
if isinstance(attrs, dict):
return attrs
return self.dict_constructor(zip(attrs[0::2], attrs[1::2]))
def startNamespaceDecl(self, prefix, uri):
self.namespace_declarations[prefix or ''] = uri
def startElement(self, full_name, attrs):
name = self._build_name(full_name)
attrs = self._attrs_to_dict(attrs)
if attrs and self.namespace_declarations:
attrs['xmlns'] = self.namespace_declarations
self.namespace_declarations = self.dict_constructor()
self.path.append((name, attrs or None))
if len(self.path) > self.item_depth:
self.stack.append((self.item, self.data))
if self.xml_attribs:
attr_entries = []
for key, value in attrs.items():
key = self.attr_prefix+self._build_name(key)
if self.postprocessor:
entry = self.postprocessor(self.path, key, value)
else:
entry = (key, value)
if entry:
attr_entries.append(entry)
attrs = self.dict_constructor(attr_entries)
else:
attrs = None
self.item = attrs or None
self.data = []
def endElement(self, full_name):
name = self._build_name(full_name)
if len(self.path) == self.item_depth:
item = self.item
if item is None:
item = (None if not self.data
else self.cdata_separator.join(self.data))
should_continue = self.item_callback(self.path, item)
if not should_continue:
raise ParsingInterrupted()
if self.stack:
data = (None if not self.data
else self.cdata_separator.join(self.data))
item = self.item
self.item, self.data = self.stack.pop()
if self.strip_whitespace and data:
data = data.strip() or None
if data and self.force_cdata and item is None:
item = self.dict_constructor()
if item is not None:
if data:
self.push_data(item, self.cdata_key, data)
self.item = self.push_data(self.item, name, item)
else:
self.item = self.push_data(self.item, name, data)
else:
self.item = None
self.data = []
self.path.pop()
def characters(self, data):
if not self.data:
self.data = [data]
else:
self.data.append(data)
def comments(self, data):
if self.strip_whitespace:
data = data.strip()
self.item = self.push_data(self.item, self.comment_key, data)
def push_data(self, item, key, data):
if self.postprocessor is not None:
result = self.postprocessor(self.path, key, data)
if result is None:
return item
key, data = result
if item is None:
item = self.dict_constructor()
try:
value = item[key]
if isinstance(value, list):
value.append(data)
else:
item[key] = [value, data]
except KeyError:
if self._should_force_list(key, data):
item[key] = [data]
else:
item[key] = data
return item
def _should_force_list(self, key, value):
if not self.force_list:
return False
if isinstance(self.force_list, bool):
return self.force_list
try:
return key in self.force_list
except TypeError:
return self.force_list(self.path[:-1], key, value)
def parse(xml_input, encoding=None, expat=expat, process_namespaces=False,
namespace_separator=':', disable_entities=True, process_comments=False, **kwargs):
"""Parse the given XML input and convert it into a dictionary.
`xml_input` can either be a `string`, a file-like object, or a generator of strings.
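Generator input is handed to the expat parser one chunk at a time instead of
being joined up front. The stdlib-only sketch below illustrates that feeding
pattern under simple assumptions; the `chunks` generator and the `seen` list
are illustrative names and not part of xmltodict:

```python
from xml.parsers import expat


def chunks():
    # Hypothetical source: one XML document split at an arbitrary boundary.
    yield '<a><b>1</b>'
    yield '<b>2</b></a>'


seen = []
parser = expat.ParserCreate()
# Record every element name as soon as its start tag has been parsed.
parser.StartElementHandler = lambda name, attrs: seen.append(name)
for chunk in chunks():
    parser.Parse(chunk, False)  # False: more data will follow
parser.Parse(b'', True)         # True: signal the end of the input
```

Because each chunk is parsed as it arrives, the handlers fire before the whole
document has been produced.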
If `xml_attribs` is `True`, element attributes are put in the dictionary
among regular child elements, using `@` as a prefix to avoid collisions. If
set to `False`, they are just ignored.
Simple example::
    >>> import xmltodict
    >>> doc = xmltodict.parse(\"\"\"
    ... <a prop="x">
    ...   <b>1</b>
    ...   <b>2</b>
    ... </a>
    ... \"\"\")
    >>> doc['a']['@prop']
    'x'
    >>> doc['a']['b']
    ['1', '2']
If `item_depth` is `0`, the function returns a dictionary for the root
element (default behavior). Otherwise, it calls `item_callback` every time
an item at the specified depth is found and returns `None` in the end
(streaming mode).
The callback function receives two parameters: the `path` from the document
root to the item (name-attribs pairs), and the `item` (dict). If the
callback's return value is false-ish, parsing will be stopped with the
:class:`ParsingInterrupted` exception.
Streaming example::
    >>> def handle(path, item):
    ...     print('path:%s item:%s' % (path, item))
    ...     return True
    ...
    >>> xmltodict.parse(\"\"\"
    ... <a prop="x">
    ...   <b>1</b>
    ...   <b>2</b>
    ... </a>\"\"\", item_depth=2, item_callback=handle)
    path:[('a', {'prop': 'x'}), ('b', None)] item:1
    path:[('a', {'prop': 'x'}), ('b', None)] item:2
The optional argument `postprocessor` is a function that takes `path`,
`key` and `value` as positional arguments and returns a new `(key, value)`
pair where both `key` and `value` may have changed. Usage example::
    >>> def postprocessor(path, key, value):
    ...     try:
    ...         return key + ':int', int(value)
    ...     except (ValueError, TypeError):
    ...         return key, value
    >>> xmltodict.parse('<a><b>1</b><b>2</b><b>x</b></a>',
    ...                 postprocessor=postprocessor)
    {'a': {'b:int': [1, 2], 'b': 'x'}}
You can pass an alternate version of `expat` (such as `defusedexpat`) by
using the `expat` parameter. E.g.:
    >>> import defusedexpat
    >>> xmltodict.parse('<a>hello</a>', expat=defusedexpat.pyexpat)
    {'a': 'hello'}
You can use the force_list argument to force lists to be created even
when there is only a single child of a given level of hierarchy. The
force_list argument is a tuple of keys. If the key for a given level
of hierarchy is in the force_list argument, that level of hierarchy
will have a list as a child (even if there is only one sub-element).
The index_keys operation takes precedence over this. This is applied
after any user-supplied postprocessor has already run.
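For example, given this input:
    <servers>
      <server>
        <name>host1</name>
        <os>Linux</os>
        <interfaces>
          <interface>
            <name>em0</name>
            <ip_address>10.0.0.1</ip_address>
          </interface>
        </interfaces>
      </server>
    </servers>
If called with force_list=('interface',), it will produce
this dictionary:
    {'servers':
      {'server':
        {'name': 'host1',
         'os': 'Linux',
         'interfaces':
          {'interface':
            [{'name': 'em0', 'ip_address': '10.0.0.1'}]}}}}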
`force_list` can also be a callable that receives `path`, `key` and
`value`. This is helpful in cases where the logic that decides whether
a list should be forced is more complex.
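As a minimal sketch of such a callable, assuming `path` holds the
`(name, attrs)` pairs of the ancestors of the element being stored: the
function name `force_server_list` and the sample paths are hypothetical and
chosen only for illustration.

```python
def force_server_list(path, key, value):
    # Force a list only for <server> elements directly under <servers>.
    return key == 'server' and bool(path) and path[-1][0] == 'servers'


# Direct calls with hand-written paths, to show what the predicate decides.
under_servers = force_server_list(
    [('config', None), ('servers', None)], 'server', None)
under_skip = force_server_list(
    [('config', None), ('skip', None)], 'server', None)
```

A predicate like this would be passed as `force_list=force_server_list`, so
that the same key can be list-forced in one part of the document and left
alone in another.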
If `process_comments` is `True`, each comment is added under `comment_key`
(default=`'#comment'`) to the tag that contains it.
For example, given this input:
    <a>
      <b>
        <!-- b comment -->
        <c>
          <!-- c comment -->
          1
        </c>
        <d>2</d>
      </b>
    </a>
If called with process_comments=True, it will produce
this dictionary:
'a': {
'b': {
'#comment': 'b comment',
'c': {
'#comment': 'c comment',
'#text': '1',
},
'd': '2',
},
}
"""
handler = _DictSAXHandler(namespace_separator=namespace_separator,
**kwargs)
if isinstance(xml_input, _unicode):
if not encoding:
encoding = 'utf-8'
xml_input = xml_input.encode(encoding)
if not process_namespaces:
namespace_separator = None
parser = expat.ParserCreate(
encoding,
namespace_separator
)
try:
parser.ordered_attributes = True
except AttributeError:
# Jython's expat does not support ordered_attributes
pass
parser.StartNamespaceDeclHandler = handler.startNamespaceDecl
parser.StartElementHandler = handler.startElement
parser.EndElementHandler = handler.endElement
parser.CharacterDataHandler = handler.characters
if process_comments:
parser.CommentHandler = handler.comments
parser.buffer_text = True
if disable_entities:
try:
# Attempt to disable DTD in Jython's expat parser (Xerces-J).
feature = "http://apache.org/xml/features/disallow-doctype-decl"
parser._reader.setFeature(feature, True)
except AttributeError:
# For CPython / expat parser.
# Anything not handled ends up here and entities aren't expanded.
parser.DefaultHandler = lambda x: None
# Expects an integer return; zero means failure -> expat.ExpatError.
parser.ExternalEntityRefHandler = lambda *x: 1
if hasattr(xml_input, 'read'):
parser.ParseFile(xml_input)
elif isgenerator(xml_input):
for chunk in xml_input:
parser.Parse(chunk, False)
parser.Parse(b'', True)
else:
parser.Parse(xml_input, True)
return handler.item
def _process_namespace(name, namespaces, ns_sep=':', attr_prefix='@'):
if not namespaces:
return name
try:
ns, name = name.rsplit(ns_sep, 1)
except ValueError:
pass
else:
ns_res = namespaces.get(ns.strip(attr_prefix))
name = '{}{}{}{}'.format(
attr_prefix if ns.startswith(attr_prefix) else '',
ns_res, ns_sep, name) if ns_res else name
return name
def _emit(key, value, content_handler,
attr_prefix='@',
cdata_key='#text',
depth=0,
preprocessor=None,
pretty=False,
newl='\n',
indent='\t',
namespace_separator=':',
namespaces=None,
full_document=True,
expand_iter=None):
key = _process_namespace(key, namespaces, namespace_separator, attr_prefix)
if preprocessor is not None:
result = preprocessor(key, value)
if result is None:
return
key, value = result
if (not hasattr(value, '__iter__')
or isinstance(value, _basestring)
or isinstance(value, dict)):
value = [value]
for index, v in enumerate(value):
if full_document and depth == 0 and index > 0:
raise ValueError('document with multiple roots')
if v is None:
v = _dict()
elif isinstance(v, bool):
if v:
v = _unicode('true')
else:
v = _unicode('false')
elif not isinstance(v, dict):
if expand_iter and hasattr(v, '__iter__') and not isinstance(v, _basestring):
v = _dict(((expand_iter, v),))
else:
v = _unicode(v)
if isinstance(v, _basestring):
v = _dict(((cdata_key, v),))
cdata = None
attrs = _dict()
children = []
for ik, iv in v.items():
if ik == cdata_key:
cdata = iv
continue
if ik.startswith(attr_prefix):
ik = _process_namespace(ik, namespaces, namespace_separator,
attr_prefix)
if ik == '@xmlns' and isinstance(iv, dict):
for ns_prefix, ns_uri in iv.items():
attr = 'xmlns{}'.format(':{}'.format(ns_prefix) if ns_prefix else '')
attrs[attr] = _unicode(ns_uri)
continue
if not isinstance(iv, _unicode):
iv = _unicode(iv)
attrs[ik[len(attr_prefix):]] = iv
continue
children.append((ik, iv))
if pretty:
content_handler.ignorableWhitespace(depth * indent)
content_handler.startElement(key, AttributesImpl(attrs))
if pretty and children:
content_handler.ignorableWhitespace(newl)
for child_key, child_value in children:
_emit(child_key, child_value, content_handler,
attr_prefix, cdata_key, depth+1, preprocessor,
pretty, newl, indent, namespaces=namespaces,
namespace_separator=namespace_separator,
expand_iter=expand_iter)
if cdata is not None:
content_handler.characters(cdata)
if pretty and children:
content_handler.ignorableWhitespace(depth * indent)
content_handler.endElement(key)
if pretty and depth:
content_handler.ignorableWhitespace(newl)
def unparse(input_dict, output=None, encoding='utf-8', full_document=True,
short_empty_elements=False,
**kwargs):
"""Emit an XML document for the given `input_dict` (reverse of `parse`).
The resulting XML document is returned as a string, but if `output` (a
file-like object) is specified, it is written there instead.
Dictionary keys prefixed with `attr_prefix` (default=`'@'`) are interpreted
as XML node attributes, whereas keys equal to `cdata_key`
(default=`'#text'`) are treated as character data.
The `pretty` parameter (default=`False`) enables pretty-printing. In this
mode, lines are terminated with `'\n'` and indented with `'\t'`, but this
can be customized with the `newl` and `indent` parameters.
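The key conventions above can be illustrated with a much simplified,
stdlib-only sketch. `emit_flat` is a hypothetical helper written for this
example, not the function used by `unparse`; it handles a single element and
ignores children, lists and namespaces.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def emit_flat(tag, node, attr_prefix='@', cdata_key='#text'):
    out = StringIO()
    gen = XMLGenerator(out, 'utf-8')
    # Keys starting with attr_prefix become attributes (prefix removed).
    attrs = {k[len(attr_prefix):]: str(v)
             for k, v in node.items() if k.startswith(attr_prefix)}
    gen.startElement(tag, AttributesImpl(attrs))
    # The cdata_key entry becomes the character data of the element.
    if cdata_key in node:
        gen.characters(str(node[cdata_key]))
    gen.endElement(tag)
    return out.getvalue()


result = emit_flat('a', {'@href': 'xyz', '#text': 123})
# result == '<a href="xyz">123</a>'
```

Non-string values are converted to text before being written, which mirrors
how attribute and character data end up as strings in the emitted document.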
"""
if full_document and len(input_dict) != 1:
raise ValueError('Document must have exactly one root.')
must_return = False
if output is None:
output = StringIO()
must_return = True
if short_empty_elements:
content_handler = XMLGenerator(output, encoding, True)
else:
content_handler = XMLGenerator(output, encoding)
if full_document:
content_handler.startDocument()
for key, value in input_dict.items():
_emit(key, value, content_handler, full_document=full_document,
**kwargs)
if full_document:
content_handler.endDocument()
if must_return:
value = output.getvalue()
try: # pragma no cover
value = value.decode(encoding)
except AttributeError: # pragma no cover
pass
return value
if __name__ == '__main__': # pragma: no cover
import sys
import marshal
try:
stdin = sys.stdin.buffer
stdout = sys.stdout.buffer
except AttributeError:
stdin = sys.stdin
stdout = sys.stdout
(item_depth,) = sys.argv[1:]
item_depth = int(item_depth)
def handle_item(path, item):
marshal.dump((path, item), stdout)
return True
try:
root = parse(stdin,
item_depth=item_depth,
item_callback=handle_item,
dict_constructor=dict)
if item_depth == 0:
handle_item([], root)
except KeyboardInterrupt:
pass