pax_global_header 0000666 0000000 0000000 00000000064 14346315420 0014514 g ustar 00root root 0000000 0000000 52 comment=3eeccba1b98f77cd5961379b33521d795ec6275e
docxcompose-1.4.0/ 0000775 0000000 0000000 00000000000 14346315420 0014041 5 ustar 00root root 0000000 0000000 docxcompose-1.4.0/.gitignore 0000664 0000000 0000000 00000000166 14346315420 0016034 0 ustar 00root root 0000000 0000000 __pycache__/
*.py[cod]
*$py.class
.Python
include/
lib/
*.egg-info/
pip-selfcheck.json
.pytest_cache/
.tox
/.cache/
docxcompose-1.4.0/HISTORY.txt 0000664 0000000 0000000 00000012753 14346315420 0015753 0 ustar 00root root 0000000 0000000 Changelog
=========
1.4.0 (2022-12-14)
------------------
- Add support for updating multiline plain text Content Controls. [lgraf]
1.3.7 (2022-11-18)
------------------
- Respect document language when updating datefields. [njohner]
1.3.6 (2022-10-05)
------------------
- vt2value(): Convert empty nodes to empty string instead of None. [lgraf]
1.3.5 (2022-07-08)
------------------
- Support missing style elements. [BryceStevenWilley]
- Correctly handle headers and footers when merging documents with sections. [njohner]
1.3.4 (2021-12-20)
------------------
- Avoid IndexError when processing documents that have custom styled numbering definitions. [lonetwin]
1.3.3 (2021-08-12)
------------------
- Add support for Smart Art (fixes #23)
- Correctly handle mapped styles in restart_first_numbering. [njohner]
1.3.2 (2021-04-27)
------------------
- Make Doc Properties case-insensitive. [buchi]
1.3.1 (2021-01-13)
------------------
- Add support for complex fields with fieldname split into several runs. [njohner]
- Add support for date format switches. [njohner]
1.3.0 (2020-10-06)
------------------
- Support updating complex properties with no existing value. [deiferni]
1.2.0 (2020-07-13)
------------------
- Add method to nullify a docproperty. [deiferni]
1.1.2 (2020-06-11)
------------------
- Handle embedded images that also have an external reference.
[buchi]
- Fix renumbering of non-visual image and drawing properties.
[buchi]
1.1.1 (2020-05-04)
------------------
- Fix an issue with non-ascii binary_type docproperties. [deiferni]
1.1.0 (2020-04-07)
------------------
- Add support for updating docproperties in header and footer of documents. [deiferni]
1.0.2 (2019-09-09)
------------------
- Do not fail when complex field does not have a separate node. [njohner]
1.0.1 (2019-07-25)
------------------
- Correctly treat two complex fields in the same paragraph. [njohner]
- Correctly handle the case when a docproperty appears multiple times in a document. [njohner]
- Handle docproperties with extra space before or no quotes around the property name. [njohner]
1.0.0 (2019-06-13)
------------------
- Change license from GPL to MIT.
[buchi]
- Add support for adding, setting and deleting of doc properties.
[buchi]
1.0.0a17 (2019-04-25)
---------------------
- Add functionality to get and set content of plain text content controls
(structured document tags).
[buchi]
1.0.0a16 (2019-01-15)
---------------------
- Prevent artifacts of previously cached doc property values during update. [deiferni]
1.0.0a15 (2018-12-12)
---------------------
- Fix updating doc-properties with non-ascii names. [deiferni]
- Don't handle hyperlink references twice. [deiferni]
1.0.0a14 (2018-12-04)
---------------------
- Implement generic handling of referenced parts. Among other things, this adds
support for embedded Excel charts.
[buchi]
- Handle embedded SVGs.
[buchi]
- Add styles from other parts, e.g. footnotes.
[buchi]
1.0.0a13 (2018-11-05)
---------------------
- Fix list-styles being set incorrectly when restarting numberings.
[deiferni]
1.0.0a12 (2018-10-30)
---------------------
- Fix setting section type for appended documents with only one section.
[deiferni]
1.0.0a11 (2018-07-30)
---------------------
- Fix handling of section type.
[buchi]
- Fix an issue where the listing style of the first element was different.
[deiferni]
- Fix issue when restarting intermittent numbering.
[deiferni]
1.0.0a10 (2018-07-18)
---------------------
- Add console script command to compose two or more word files.
[deiferni]
1.0.0a9 (2018-05-01)
--------------------
- Fix error in mapping of num_ids introduced in 1.0.0a7.
[buchi]
- Do not fail when numbering zero is referenced.
[deiferni]
1.0.0a8 (2018-04-26)
--------------------
- Only attempt to set the nsid when it is available.
[deiferni]
1.0.0a7 (2018-04-20)
--------------------
- Fix handling of images in WordprocessingGroups ().
[buchi]
- Fix handling of shapes in shape groups ().
[buchi]
- Fix handling of numberings, avoid inserting multiple numbering properties.
[buchi]
- Fix renumbering of bookmarks.
[buchi]
- Renumber ids of drawing object properties ().
[buchi]
1.0.0a6 (2018-02-20)
--------------------
- Do not restart numbering of bullets.
[buchi]
1.0.0a5 (2018-01-11)
--------------------
- Renumber bookmarks to avoid duplicate ids.
[buchi]
- Add support for shapes.
[buchi]
1.0.0a4 (2017-12-27)
--------------------
- Fix handling of styles when composing documents with different languages.
[buchi]
- Also add numberings referenced in styles.
[buchi]
- Avoid having multiple elements for the same style.
[buchi]
- Restart first numbering of inserted documents
[buchi]
- Add support for anchored images.
[buchi]
- Handle referenced style ids that are not defined in styles.xml
[buchi]
- Remove header and footer references in paragraph properties.
[buchi]
1.0.0a3 (2017-11-22)
--------------------
- Make removal of property fields optional.
[buchi]
1.0.0a2 (2017-11-06)
--------------------
- Fix handling of footnotes containing hyperlinks.
[buchi]
- Add functionality to deal with custom document properties. Properties can be
updated and fields containing properties can be removed. When appending or
inserting documents their custom document properties get removed automatically.
[buchi]
1.0.0a1 (2017-09-13)
--------------------
- Initial release
[buchi]
docxcompose-1.4.0/LICENSE 0000664 0000000 0000000 00000002066 14346315420 0015052 0 ustar 00root root 0000000 0000000 The MIT License (MIT)
Copyright (c) 2019 4teamwork AG
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. docxcompose-1.4.0/MANIFEST.in 0000664 0000000 0000000 00000000142 14346315420 0015574 0 ustar 00root root 0000000 0000000 graft docxcompose
prune tests
include *.rst
include *.txt
exclude .gitignore
global-exclude *.pyc
docxcompose-1.4.0/README.rst 0000664 0000000 0000000 00000003741 14346315420 0015535 0 ustar 00root root 0000000 0000000
*docxcompose* is a Python library for concatenating/appending Microsoft
Word (.docx) files.
Example usage
-------------
Append a document to another document:
.. code::

    from docxcompose.composer import Composer
    from docx import Document

    master = Document("master.docx")
    composer = Composer(master)
    doc1 = Document("doc1.docx")
    composer.append(doc1)
    composer.save("combined.docx")

The docxcompose console script
------------------------------
The ``docxcompose`` console script composes docx files from the command
line, e.g.:

.. code:: sh

    $ docxcompose files/master.docx files/content.docx -o files/composed.docx

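The command line takes one master document, one or more files to append, and an optional output path. The sketch below rebuilds an equivalent parser with the standard library only, so the argument layout can be inspected without any .docx files at hand. ``build_parser`` is a name chosen for this illustration; the real entry point lives in ``docxcompose/command.py``.

```python
from argparse import ArgumentParser


def build_parser():
    """Return a parser with the same argument layout as the console script."""
    parser = ArgumentParser(
        description="compose multiple docx files into one file.")
    # First positional argument: the master template.
    parser.add_argument("master", help="path to the master template")
    # Remaining positionals: at least one file to append to the master.
    parser.add_argument("files", nargs="+", metavar="file",
                        help="one or more files appended to the master")
    # Optional output path; argparse stores it as ``output_document``.
    parser.add_argument("-o", "--output-document", default="composed.docx",
                        metavar="file", help="path to the output file")
    return parser


args = build_parser().parse_args(
    ["files/master.docx", "files/content.docx", "-o", "files/composed.docx"])
print(args.master, args.files, args.output_document)
```

When ``-o`` is omitted, the output path falls back to ``composed.docx`` in the current directory.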
Installation for development
----------------------------
To install docxcompose for development, clone the repository and install it with pip, using a Python that has setuptools available (for example a fresh virtualenv):

.. code:: sh

    $ pip install -e .[tests]

Tests can then be run with ``pytest``.
A note about testing
--------------------
The tests provide helpers for blackbox testing that can compare whole Word
files. To do so, the following files should be provided:
- a file for the expected output that should be added to the folder
`docs/composed_fixture`
- multiple files that can be composed into the file above should be added
to the folder `docs`.
The expected output can now be tested as follows:
.. code:: python

    def test_example():
        fixture = FixtureDocument("expected.docx")
        composed = ComposedDocument("master.docx", "slave1.docx", "slave2.docx")
        assert fixture == composed

Should the assertion fail, the output file will be stored in the folder
`docs/composed_debug` under the filename of the fixture file, `expected.docx`
in this example.
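The store-on-failure behaviour can be pictured with a small stand-alone sketch. It compares raw bytes rather than parsed Word documents, and ``check_against_fixture`` is a hypothetical helper written for this illustration, not part of the docxcompose test suite.

```python
import tempfile
from pathlib import Path


def check_against_fixture(expected, actual, fixture_name, debug_dir):
    """Compare two byte payloads; on mismatch, keep the actual one for inspection."""
    if expected == actual:
        return True
    debug_dir.mkdir(parents=True, exist_ok=True)
    # Reuse the fixture's filename so the failing output is easy to find.
    (debug_dir / fixture_name).write_bytes(actual)
    return False


with tempfile.TemporaryDirectory() as tmp:
    debug_dir = Path(tmp) / "composed_debug"
    matched = check_against_fixture(
        b"expected", b"expected", "expected.docx", debug_dir)
    mismatched = check_against_fixture(
        b"expected", b"actual", "expected.docx", debug_dir)
    dumped = (debug_dir / "expected.docx").read_bytes()
```

Only the mismatching call writes a file, so the debug folder stays empty while all comparisons pass.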
Headers and footers
-------------------
The first document is treated as the main template: headers and footers from the other documents are ignored, so the header and footer of the first document are used throughout the merged file.
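As a rough illustration of what ignoring those headers and footers involves, the sketch below strips ``w:headerReference`` and ``w:footerReference`` elements from a copied paragraph. It uses the standard library's ``xml.etree.ElementTree`` so it runs without dependencies, whereas ``Composer.remove_header_and_footer_references`` works on lxml elements; the XML snippet and its ``rId`` values are made up for the example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hypothetical paragraph carrying section properties with header/footer references.
SNIPPET = (
    '<w:p xmlns:w="%s" xmlns:r="%s">'
    '<w:pPr><w:sectPr>'
    '<w:headerReference w:type="default" r:id="rId7"/>'
    '<w:footerReference w:type="default" r:id="rId8"/>'
    '<w:pgSz w:w="11906" w:h="16838"/>'
    '</w:sectPr></w:pPr>'
    '</w:p>' % (W, R)
)

REFERENCE_TAGS = ("{%s}headerReference" % W, "{%s}footerReference" % W)


def strip_header_footer_refs(element):
    """Remove every header/footer reference below element; return the count."""
    removed = 0
    # ElementTree has no getparent(), so visit each node and inspect its children.
    for parent in list(element.iter()):
        for child in list(parent):
            if child.tag in REFERENCE_TAGS:
                parent.remove(child)
                removed += 1
    return removed


paragraph = ET.fromstring(SNIPPET)
removed = strip_header_footer_refs(paragraph)
sect_pr = paragraph.find(".//{%s}sectPr" % W)
remaining = [child.tag.split("}")[1] for child in sect_pr]
print(removed, remaining)
```

The page size element survives, which matches the idea that only the references to header and footer parts are dropped from the copied content.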
docxcompose-1.4.0/bin/ 0000775 0000000 0000000 00000000000 14346315420 0014611 5 ustar 00root root 0000000 0000000 docxcompose-1.4.0/bin/.gitignore 0000664 0000000 0000000 00000000027 14346315420 0016600 0 ustar 00root root 0000000 0000000 *
!.gitignore
!cibuild
docxcompose-1.4.0/bin/cibuild 0000775 0000000 0000000 00000000062 14346315420 0016150 0 ustar 00root root 0000000 0000000 #!/usr/bin/env sh
set -euo pipefail
tox
exit $?
docxcompose-1.4.0/docxcompose/ 0000775 0000000 0000000 00000000000 14346315420 0016364 5 ustar 00root root 0000000 0000000 docxcompose-1.4.0/docxcompose/__init__.py 0000664 0000000 0000000 00000000000 14346315420 0020463 0 ustar 00root root 0000000 0000000 docxcompose-1.4.0/docxcompose/command.py 0000664 0000000 0000000 00000003307 14346315420 0020357 0 ustar 00root root 0000000 0000000 from argparse import ArgumentParser
from docx import Document
from docxcompose.composer import Composer
import os.path
import sys
def setup_parser():
    parser = ArgumentParser(
        description='compose multiple docx files into one file.')
    parser.add_argument('master',
                        help='path to master template that defines styles, '
                        'headings and so on')
    parser.add_argument('files', nargs='+',
                        help='path to one or more word-files to be appended '
                        'to the master template',
                        metavar='file')
    parser.add_argument('-o', '--output-document', dest='output_document',
                        default='composed.docx',
                        help='path to the output file', metavar='file')
    return parser


def require_valid_file(parser, path):
    if not os.path.isfile(path):
        parser.error(message='file not found {}'.format(path))


def parse_args(parser, args):
    parsed_args = parser.parse_args(args=args)
    require_valid_file(parser, parsed_args.master)
    for file_path in parsed_args.files:
        require_valid_file(parser, file_path)
    return parsed_args


def compose_files(parser, parsed_args):
    composer = Composer(Document(parsed_args.master))
    for slave_path in parsed_args.files:
        composer.append(Document(slave_path))
    composer.save(parsed_args.output_document)
    parser.exit(message='successfully composed file at {}\n'.format(
        parsed_args.output_document))


def main(args=None):
    if args is None:
        args = sys.argv[1:]
    parser = setup_parser()
    parsed_args = parse_args(parser, args)
    compose_files(parser, parsed_args)
docxcompose-1.4.0/docxcompose/composer.py 0000664 0000000 0000000 00000071672 14346315420 0020602 0 ustar 00root root 0000000 0000000 from collections import OrderedDict
from copy import deepcopy
from docx.opc.constants import CONTENT_TYPE as CT
from docx.opc.constants import RELATIONSHIP_TYPE as RT
from docx.opc.oxml import serialize_part_xml
from docx.opc.packuri import PackURI
from docx.opc.part import Part
from docx.oxml import parse_xml
from docx.oxml.section import CT_SectPr
from docx.parts.numbering import NumberingPart
from docxcompose.image import ImageWrapper
from docxcompose.properties import CustomProperties
from docxcompose.utils import NS
from docxcompose.utils import xpath
import os.path
import random
import re
FILENAME_IDX_RE = re.compile('([a-zA-Z/_-]+)([1-9][0-9]*)?')
RID_IDX_RE = re.compile('rId([0-9]*)')
REFERENCED_PARTS_IGNORED_RELTYPES = set([
RT.IMAGE,
RT.HEADER,
RT.FOOTER,
])
PART_RELTYPES_WITH_STYLES = [
RT.FOOTNOTES,
]
class Composer(object):
def __init__(self, doc):
self.doc = doc
self.pkg = doc.part.package
self.restart_numbering = True
self.reset_reference_mapping()
self.first_section_properties_added = False
def reset_reference_mapping(self):
self.num_id_mapping = {}
self.anum_id_mapping = {}
self._numbering_restarted = set()
def append(self, doc, remove_property_fields=True):
"""Append the given document."""
index = self.append_index()
self.insert(index, doc, remove_property_fields=remove_property_fields)
def insert(self, index, doc, remove_property_fields=True):
"""Insert the given document at the given index."""
self.reset_reference_mapping()
# Remove custom property fields but keep the values
if remove_property_fields:
cprops = CustomProperties(doc)
for name in cprops.keys():
cprops.dissolve_fields(name)
self._create_style_id_mapping(doc)
for element in doc.element.body:
if isinstance(element, CT_SectPr):
"""This will lead to unexpected behaviors, for example if one
of the added documents has landscape set for the last section,
the page orientation will get lost here. Still this is mostly
ok, and otherwise we would need to create a section for each
document added, i.e. move the properties into the last
paragraph and also decide which properties we allow to overwrite
and which should inherit from the main template."""
continue
element = deepcopy(element)
self.doc.element.body.insert(index, element)
self.add_referenced_parts(doc.part, self.doc.part, element)
self.add_styles(doc, element)
self.add_numberings(doc, element)
self.restart_first_numbering(doc, element)
self.add_images(doc, element)
self.add_diagrams(doc, element)
self.add_shapes(doc, element)
self.add_footnotes(doc, element)
self.remove_header_and_footer_references(doc, element)
index += 1
self.add_styles_from_other_parts(doc)
self.renumber_bookmarks()
self.renumber_docpr_ids()
self.renumber_nvpicpr_ids()
# The two methods below attempt to fix a general issue we have with
# sections and their properties which is not correctly solved yet.
# Right now the situation is really messy. When there is only one
# section per document being assembled, then we remove the properties
# of all added documents and only use the properties of the main template.
# When a document has more than one section, then we keep the properties
# for all the sections of that document except the last one. Also
# note that for the first such document added, the properties of its
# first section will get applied to everything that came before.
# This is because of how sections and section properties are
# defined, i.e. sections are defined by the sectPr tags inside the last
# paragraph of a section, except for the last section which has its
# sectPr tag in the body...
self.fix_section_types(doc)
self.fix_header_and_footers(doc)
def save(self, filename):
self.doc.save(filename)
def append_index(self):
section_props = self.doc.element.body.xpath('w:sectPr')
if section_props:
return self.doc.element.body.index(section_props[0])
return len(self.doc.element.body)
def add_referenced_parts(self, src_part, dst_part, element):
rid_elements = xpath(element, './/*[@r:id]')
for rid_element in rid_elements:
rid = rid_element.get('{%s}id' % NS['r'])
rel = src_part.rels[rid]
if rel.reltype in REFERENCED_PARTS_IGNORED_RELTYPES:
continue
new_rel = self.add_relationship(src_part, dst_part, rel)
rid_element.set('{%s}id' % NS['r'], new_rel.rId)
def add_relationship(self, src_part, dst_part, relationship):
"""Add relationship and its target part"""
if relationship.is_external:
new_rid = dst_part.rels.get_or_add_ext_rel(
relationship.reltype, relationship.target_ref)
return dst_part.rels[new_rid]
part = relationship.target_part
# Determine next partname
name = FILENAME_IDX_RE.match(part.partname).group(1)
used_part_numbers = [
FILENAME_IDX_RE.match(p.partname).group(2)
for p in dst_part.package.iter_parts()
if p.partname.startswith(name)
]
used_part_numbers = [
int(idx) for idx in used_part_numbers if idx is not None]
for n in range(1, len(used_part_numbers)+2):
if n not in used_part_numbers:
next_part_number = n
break
next_partname = PackURI('%s%d.%s' % (
name, next_part_number, part.partname.ext))
new_part = Part(
next_partname, part.content_type, part.blob,
dst_part.package)
new_rel = dst_part.rels.get_or_add(relationship.reltype, new_part)
# Sort relationships by rId to get the same rId when adding them to the
# new part. This avoids fixing references.
def sort_key(r):
match = RID_IDX_RE.match(r.rId)
return int(match.group(1))
for rel in sorted(part.rels.values(), key=sort_key):
self.add_relationship(part, new_part, rel)
return new_rel
def add_diagrams(self, doc, element):
dgm_rels = xpath(element, './/dgm:relIds[@r:dm]')
for dgm_rel in dgm_rels:
for item, rt_type in (
('dm', RT.DIAGRAM_DATA),
('lo', RT.DIAGRAM_LAYOUT),
('qs', RT.DIAGRAM_QUICK_STYLE),
('cs', RT.DIAGRAM_COLORS)
):
dm_rid = dgm_rel.get('{%s}%s' % (NS['r'], item))
dm_part = doc.part.rels[dm_rid].target_part
new_rid = self.doc.part.relate_to(dm_part, rt_type)
dgm_rel.set('{%s}%s' % (NS['r'], item), new_rid)
def add_images(self, doc, element):
"""Add images from the given document used in the given element."""
blips = xpath(
element, '(.//a:blip|.//asvg:svgBlip)[@r:embed]')
for blip in blips:
rid = blip.get('{%s}embed' % NS['r'])
img_part = doc.part.rels[rid].target_part
new_img_part = self.pkg.image_parts._get_by_sha1(img_part.sha1)
if new_img_part is None:
image = ImageWrapper(img_part)
new_img_part = self.pkg.image_parts._add_image_part(image)
new_rid = self.doc.part.relate_to(new_img_part, RT.IMAGE)
blip.set('{%s}embed' % NS['r'], new_rid)
# handle external reference as images can be embedded and have an
# external reference
rid = blip.get('{%s}link' % NS['r'])
if rid:
rel = doc.part.rels[rid]
new_rel = self.add_relationship(None, self.doc.part, rel)
blip.set('{%s}link' % NS['r'], new_rel.rId)
def add_shapes(self, doc, element):
shapes = xpath(element, './/v:shape/v:imagedata')
for shape in shapes:
rid = shape.get('{%s}id' % NS['r'])
img_part = doc.part.rels[rid].target_part
new_img_part = self.pkg.image_parts._get_by_sha1(img_part.sha1)
if new_img_part is None:
image = ImageWrapper(img_part)
new_img_part = self.pkg.image_parts._add_image_part(image)
new_rid = self.doc.part.relate_to(new_img_part, RT.IMAGE)
shape.set('{%s}id' % NS['r'], new_rid)
def add_footnotes(self, doc, element):
"""Add footnotes from the given document used in the given element."""
footnotes_refs = element.findall('.//w:footnoteReference', NS)
if not footnotes_refs:
return
footnote_part = doc.part.rels.part_with_reltype(RT.FOOTNOTES)
my_footnote_part = self.footnote_part()
footnotes = parse_xml(my_footnote_part.blob)
next_id = len(footnotes) + 1
for ref in footnotes_refs:
id_ = ref.get('{%s}id' % NS['w'])
element = parse_xml(footnote_part.blob)
footnote = deepcopy(element.find('.//w:footnote[@w:id="%s"]' % id_, NS))
footnotes.append(footnote)
footnote.set('{%s}id' % NS['w'], str(next_id))
ref.set('{%s}id' % NS['w'], str(next_id))
next_id += 1
self.add_referenced_parts(footnote_part, my_footnote_part, element)
my_footnote_part._blob = serialize_part_xml(footnotes)
def footnote_part(self):
"""The footnote part of the document."""
try:
footnote_part = self.doc.part.rels.part_with_reltype(RT.FOOTNOTES)
except KeyError:
# Create a new empty footnotes part
partname = PackURI('/word/footnotes.xml')
content_type = CT.WML_FOOTNOTES
xml_path = os.path.join(
os.path.dirname(__file__), 'templates', 'footnotes.xml')
with open(xml_path, 'rb') as f:
xml_bytes = f.read()
footnote_part = Part(
partname, content_type, xml_bytes, self.doc.part.package)
self.doc.part.relate_to(footnote_part, RT.FOOTNOTES)
return footnote_part
def mapped_style_id(self, style_id):
if style_id not in self._style_id2name:
return style_id
return self._style_name2id.get(
self._style_id2name[style_id], style_id)
def _create_style_id_mapping(self, doc):
# Style ids are language-specific, but names not (always), WTF?
# The inserted document may have another language than the composed one.
# Thus we map the style id using the style name.
self._style_id2name = {s.style_id: s.name for s in doc.styles}
self._style_name2id = {s.name: s.style_id for s in self.doc.styles}
def add_styles_from_other_parts(self, doc):
for reltype in PART_RELTYPES_WITH_STYLES:
try:
el = parse_xml(doc.part.rels.part_with_reltype(reltype).blob)
except (KeyError, ValueError):
pass
else:
self.add_styles(doc, el)
def add_styles(self, doc, element):
"""Add styles from the given document used in the given element."""
our_style_ids = [s.style_id for s in self.doc.styles]
# de-duplicate ids and keep order to make sure tests are not flaky
used_style_ids = list(OrderedDict.fromkeys([e.val for e in xpath(
element, './/w:tblStyle|.//w:pStyle|.//w:rStyle')]))
for style_id in used_style_ids:
our_style_id = self.mapped_style_id(style_id)
if our_style_id not in our_style_ids:
style_element = deepcopy(doc.styles.element.get_by_id(style_id))
if style_element is not None:
self.doc.styles.element.append(style_element)
self.add_numberings(doc, style_element)
# Also add linked styles
linked_style_ids = xpath(style_element, './/w:link/@w:val')
if linked_style_ids:
linked_style_id = linked_style_ids[0]
our_linked_style_id = self.mapped_style_id(linked_style_id)
if our_linked_style_id not in our_style_ids:
our_linked_style = doc.styles.element.get_by_id(
linked_style_id)
if our_linked_style is not None:
self.doc.styles.element.append(deepcopy(
our_linked_style))
else:
# Create a mapping for abstractNumIds used in existing styles
# This is used when adding numberings to avoid having multiple
# elements for the same style.
style_element = doc.styles.element.get_by_id(style_id)
if style_element is not None:
num_ids = xpath(style_element, './/w:numId/@w:val')
if num_ids:
anum_ids = xpath(
doc.part.numbering_part.element,
'.//w:num[@w:numId="%s"]/w:abstractNumId/@w:val' % num_ids[0])
if anum_ids:
our_style_element = self.doc.styles.element.get_by_id(our_style_id)
our_num_ids = xpath(our_style_element, './/w:numId/@w:val')
if our_num_ids:
numbering_part = self.numbering_part()
our_anum_ids = xpath(
numbering_part.element,
'.//w:num[@w:numId="%s"]/w:abstractNumId/@w:val' % our_num_ids[0])
if our_anum_ids:
self.anum_id_mapping[int(anum_ids[0])] = int(our_anum_ids[0])
# Replace language-specific style id with our style id
if our_style_id != style_id and our_style_id is not None:
style_elements = xpath(
element,
'.//w:tblStyle[@w:val="%(styleid)s"]|'
'.//w:pStyle[@w:val="%(styleid)s"]|'
'.//w:rStyle[@w:val="%(styleid)s"]' % dict(styleid=style_id))
for el in style_elements:
el.val = our_style_id
# Update our style ids
our_style_ids = [s.style_id for s in self.doc.styles]
def add_numberings(self, doc, element):
"""Add numberings from the given document used in the given element."""
# Search for numbering references
num_ids = set([n.val for n in xpath(element, './/w:numId')])
if not num_ids:
return
next_num_id, next_anum_id = self._next_numbering_ids()
src_numbering_part = doc.part.numbering_part
for num_id in num_ids:
if num_id in self.num_id_mapping:
continue
# Find the referenced element
res = src_numbering_part.element.xpath(
'.//w:num[@w:numId="%s"]' % num_id)
if not res:
continue
num_element = deepcopy(res[0])
num_element.numId = next_num_id
self.num_id_mapping[num_id] = next_num_id
anum_id = num_element.xpath('.//w:abstractNumId')[0]
if anum_id.val not in self.anum_id_mapping:
# Find the referenced element
res = src_numbering_part.element.xpath(
'.//w:abstractNum[@w:abstractNumId="%s"]' % anum_id.val)
if not res:
continue
anum_element = deepcopy(res[0])
self.anum_id_mapping[anum_id.val] = next_anum_id
anum_id.val = next_anum_id
# anum_element.abstractNumId = next_anum_id
anum_element.set('{%s}abstractNumId' % NS['w'], str(next_anum_id))
# Make sure we have a unique nsid so numberings restart properly
nsid = anum_element.find('.//w:nsid', NS)
if nsid is not None:
nsid.set(
'{%s}val' % NS['w'],
"{0:08X}".format(int(10**8 * random.random()))
)
self._insert_abstract_num(anum_element)
else:
anum_id.val = self.anum_id_mapping[anum_id.val]
self._insert_num(num_element)
# Fix references
for num_id_ref in xpath(element, './/w:numId'):
num_id_ref.val = self.num_id_mapping.get(
num_id_ref.val, num_id_ref.val)
def _next_numbering_ids(self):
numbering_part = self.numbering_part()
# Determine next unused numId (numbering starts with 1)
current_num_ids = [
n.numId for n in xpath(numbering_part.element, './/w:num')]
if current_num_ids:
next_num_id = max(current_num_ids) + 1
else:
next_num_id = 1
# Determine next unused abstractNumId (numbering starts with 0)
current_anum_ids = [
int(n) for n in
xpath(numbering_part.element, './/w:abstractNum/@w:abstractNumId')]
if current_anum_ids:
next_anum_id = max(current_anum_ids) + 1
else:
next_anum_id = 0
return next_num_id, next_anum_id
def _insert_num(self, element):
# Find position of the last w:num element and insert before it
numbering_part = self.numbering_part()
nums = numbering_part.element.xpath('.//w:num')
if nums:
num_index = numbering_part.element.index(nums[-1])
numbering_part.element.insert(num_index, element)
else:
numbering_part.element.append(element)
def _insert_abstract_num(self, element):
# Find position of first element
# We'll insert before that
numbering_part = self.numbering_part()
nums = numbering_part.element.xpath('.//w:num')
if nums:
anum_index = numbering_part.element.index(nums[0])
else:
anum_index = 0
numbering_part.element.insert(anum_index, element)
def _replace_mapped_num_id(self, old_id, new_id):
"""Replace a mapped numId with a new one."""
for key, value in self.num_id_mapping.items():
if value == old_id:
self.num_id_mapping[key] = new_id
return
def numbering_part(self):
"""The numbering part of the document."""
try:
numbering_part = self.doc.part.rels.part_with_reltype(RT.NUMBERING)
except KeyError:
# Create a new empty numbering part
partname = PackURI('/word/numbering.xml')
content_type = CT.WML_NUMBERING
xml_path = os.path.join(
os.path.dirname(__file__), 'templates', 'numbering.xml')
with open(xml_path, 'rb') as f:
xml_bytes = f.read()
element = parse_xml(xml_bytes)
numbering_part = NumberingPart(
partname, content_type, element, self.doc.part.package)
self.doc.part.relate_to(numbering_part, RT.NUMBERING)
return numbering_part
def restart_first_numbering(self, doc, element):
if not self.restart_numbering:
return
style_id = xpath(element, './/w:pStyle/@w:val')
if not style_id:
return
style_id = style_id[0]
if style_id in self._numbering_restarted:
return
style_element = self.doc.styles.element.get_by_id(style_id)
if style_element is None:
return
outline_lvl = xpath(style_element, './/w:outlineLvl')
if outline_lvl:
# Styles with an outline level are probably headings.
# Do not restart numbering of headings
return
# if there is a numId referenced from the paragraph, that numId is
# relevant, otherwise fall back to the style's numId
local_num_id = xpath(element, './/w:numPr/w:numId/@w:val')
if local_num_id:
num_id = local_num_id[0]
else:
style_num_id = xpath(style_element, './/w:numId/@w:val')
if not style_num_id:
return
num_id = style_num_id[0]
numbering_part = self.numbering_part()
num_element = xpath(
numbering_part.element,
'.//w:num[@w:numId="%s"]' % num_id)
if not num_element:
# Styles with no numbering element should not be processed
return
anum_id = xpath(num_element[0], './/w:abstractNumId/@w:val')[0]
anum_element = xpath(
numbering_part.element,
'.//w:abstractNum[@w:abstractNumId="%s"]' % anum_id)
num_fmt = xpath(
anum_element[0], './/w:lvl[@w:ilvl="0"]/w:numFmt/@w:val')
# Do not restart numbering of bullets
if num_fmt and num_fmt[0] == 'bullet':
return
new_num_element = deepcopy(num_element[0])
lvl_override = parse_xml(
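'<w:lvlOverride xmlns:w="http://schemas.openxmlformats.org'
'/wordprocessingml/2006/main" w:ilvl="0"><w:startOverride '
'w:val="1"/></w:lvlOverride>')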
new_num_element.append(lvl_override)
next_num_id, next_anum_id = self._next_numbering_ids()
new_num_element.numId = next_num_id
self._insert_num(new_num_element)
paragraph_props = xpath(element, './/w:pPr/w:pStyle[@w:val="%s"]/parent::w:pPr' % style_id)
num_pr = xpath(paragraph_props[0], './/w:numPr')
if num_pr:
num_pr = num_pr[0]
previous_num_id = num_pr.numId.val
self._replace_mapped_num_id(previous_num_id, next_num_id)
num_pr.numId.val = next_num_id
else:
num_pr = parse_xml(
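'<w:numPr xmlns:w="http://schemas.openxmlformats.org'
'/wordprocessingml/2006/main"><w:ilvl w:val="0"/>'
'<w:numId w:val="%s"/></w:numPr>' % next_num_id)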
paragraph_props[0].append(num_pr)
self._numbering_restarted.add(style_id)
def header_part(self, content=None):
"""The header part of the document."""
header_rels = [
rel for rel in self.doc.part.rels.values() if rel.reltype == RT.HEADER]
next_id = len(header_rels) + 1
# Create a new header part
partname = PackURI('/word/header%s.xml' % next_id)
content_type = CT.WML_HEADER
if not content:
xml_path = os.path.join(
os.path.dirname(__file__), 'templates', 'header.xml')
with open(xml_path, 'rb') as f:
content = f.read()
header_part = Part(
partname, content_type, content, self.doc.part.package)
self.doc.part.relate_to(header_part, RT.HEADER)
return header_part
def footer_part(self, content=None):
"""The footer part of the document."""
footer_rels = [
rel for rel in self.doc.part.rels.values() if rel.reltype == RT.FOOTER]
next_id = len(footer_rels) + 1
# Create a new footer part
partname = PackURI('/word/footer%s.xml' % next_id)
content_type = CT.WML_FOOTER
if not content:
xml_path = os.path.join(
os.path.dirname(__file__), 'templates', 'footer.xml')
with open(xml_path, 'rb') as f:
content = f.read()
footer_part = Part(
partname, content_type, content, self.doc.part.package)
self.doc.part.relate_to(footer_part, RT.FOOTER)
return footer_part
def remove_header_and_footer_references(self, doc, element):
refs = xpath(
element, './/w:headerReference|.//w:footerReference')
for ref in refs:
ref.getparent().remove(ref)
def renumber_bookmarks(self):
bookmarks_start = xpath(self.doc.element.body, './/w:bookmarkStart')
bookmark_id = 0
for bookmark in bookmarks_start:
bookmark.set('{%s}id' % NS['w'], str(bookmark_id))
bookmark_id += 1
bookmarks_end = xpath(self.doc.element.body, './/w:bookmarkEnd')
bookmark_id = 0
for bookmark in bookmarks_end:
bookmark.set('{%s}id' % NS['w'], str(bookmark_id))
bookmark_id += 1
def renumber_docpr_ids(self):
# Ensure that non-visual drawing properties have a unique id
doc_prs = xpath(
self.doc.element.body, './/wp:docPr')
doc_pr_id = 1
for doc_pr in doc_prs:
doc_pr.id = doc_pr_id
doc_pr_id += 1
parts = [
rel.target_part for rel in self.doc.part.rels.values()
if rel.reltype in [RT.HEADER, RT.FOOTER, ]
]
for part in parts:
doc_prs = xpath(part.element, './/wp:docPr')
for doc_pr in doc_prs:
doc_pr.id = doc_pr_id
doc_pr_id += 1
def renumber_nvpicpr_ids(self):
# Ensure that non-visual image properties have a unique id
c_nv_prs = xpath(
self.doc.element.body, './/pic:cNvPr')
c_nv_pr_id = 1
for c_nv_pr in c_nv_prs:
c_nv_pr.id = c_nv_pr_id
c_nv_pr_id += 1
parts = [
rel.target_part for rel in self.doc.part.rels.values()
if rel.reltype in [RT.HEADER, RT.FOOTER, ]
]
for part in parts:
c_nv_prs = xpath(part.element, './/pic:cNvPr')
for c_nv_pr in c_nv_prs:
c_nv_pr.id = c_nv_pr_id
c_nv_pr_id += 1
def fix_section_types(self, doc):
# The section type determines how the contents of the section will be
# placed relative to the *previous* section.
# The last section always stays at the end. Therefore we need to adjust
# the type of first new section.
# We also need to change the type of the last section of the composed
# document to the one from the appended document.
# TODO: Support when inserting document at an arbitrary position
if len(self.doc.sections) == 1 or len(doc.sections) == 1:
return
first_new_section_idx = len(self.doc.sections) - len(doc.sections)
self.doc.sections[first_new_section_idx].start_type = self.doc.sections[-1].start_type
self.doc.sections[-1].start_type = doc.sections[-1].start_type
def fix_header_and_footers(self, doc):
"""
The master document usually only has one section, hence its section
properties are defined directly in the body of the document and apply
to the last section of the document. For all other sections but the
last one, section properties are defined in the last paragraph of
the section.
Headers and footers are inherited from the previous section properties
if they are not defined in a given section. If they are not defined in the
first section, blank headers and footers are used, so we need to make sure
to add the definition from the main template to the first section of the
document if there is more than one section.
"""
if self.first_section_properties_added:
return
if len(self.doc.sections) == 1 or len(doc.sections) == 1:
return
first_new_section_idx = len(self.doc.sections) - len(doc.sections)
last_section = self.doc.sections[-1]
first_section = self.doc.sections[first_new_section_idx]
for footer_name in ('footer', 'even_page_footer', 'first_page_footer'):
footer_main = getattr(last_section, footer_name)
if not footer_main._has_definition:
continue
footer_sec = getattr(first_section, footer_name)
rid = footer_main._sectPr.get_footerReference(footer_main._hdrftr_index).rId
footer_sec._sectPr.add_footerReference(footer_main._hdrftr_index, rid)
for header_name in ('header', 'even_page_header', 'first_page_header'):
header_main = getattr(last_section, header_name)
if not header_main._has_definition:
continue
header_sec = getattr(first_section, header_name)
rid = header_main._sectPr.get_headerReference(header_main._hdrftr_index).rId
header_sec._sectPr.add_headerReference(header_main._hdrftr_index, rid)
# We also need to move the page number type tag to that section
# properties and remove it from the section properties from the body.
last_sect_pr = last_section._sectPr
first_sect_pr = first_section._sectPr
pg_num_types = last_sect_pr.xpath("w:pgNumType")
for pg_num_type in pg_num_types:
last_sect_pr.remove(pg_num_type)
first_sect_pr.append(pg_num_type)
self.first_section_properties_added = True
# docxcompose/image.py
import os.path


class ImageWrapper(object):
    """Image wrapper for image part creation out of an existing image part."""

    def __init__(self, img_part):
        self.sha1 = img_part.sha1
        self.filename = img_part.filename
        self.ext = os.path.splitext(self.filename)[1][1:]
        self.content_type = img_part.content_type
        self.blob = img_part.blob
# docxcompose/properties.py
from babel.dates import format_datetime
from copy import deepcopy
from datetime import datetime
from docx.opc.constants import CONTENT_TYPE as CT
from docx.opc.constants import RELATIONSHIP_TYPE as RT
from docx.opc.oxml import serialize_part_xml
from docx.opc.packuri import PackURI
from docx.opc.part import Part
from docx.oxml import parse_xml
from docx.oxml.coreprops import CT_CoreProperties
from docxcompose.utils import NS
from docxcompose.utils import word_to_python_date_format
from docxcompose.utils import xpath
from lxml.etree import FunctionNamespace
from lxml.etree import QName
from six import binary_type
from six import text_type
import pkg_resources
import re
CUSTOM_PROPERTY_FMTID = '{D5CDD505-2E9C-101B-9397-08002B2CF9AE}'
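CUSTOM_PROPERTY_TYPES = {
    'text': '<vt:lpwstr xmlns:vt="{}"/>'.format(NS['vt']),
    'int': '<vt:i4 xmlns:vt="{}"/>'.format(NS['vt']),
    'bool': '<vt:bool xmlns:vt="{}"/>'.format(NS['vt']),
    'datetime': '<vt:filetime xmlns:vt="{}"/>'.format(NS['vt']),
    'float': '<vt:r8 xmlns:vt="{}"/>'.format(NS['vt']),
}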
MIN_PID = 2 # Property IDs have to start with 2
def value2vt(value):
if isinstance(value, bool):
el = parse_xml(CUSTOM_PROPERTY_TYPES['bool'])
el.text = 'true' if value else 'false'
elif isinstance(value, int):
el = parse_xml(CUSTOM_PROPERTY_TYPES['int'])
el.text = text_type(value)
elif isinstance(value, float):
el = parse_xml(CUSTOM_PROPERTY_TYPES['float'])
el.text = text_type(value)
elif isinstance(value, datetime):
el = parse_xml(CUSTOM_PROPERTY_TYPES['datetime'])
el.text = value.strftime('%Y-%m-%dT%H:%M:%SZ')
elif isinstance(value, text_type):
el = parse_xml(CUSTOM_PROPERTY_TYPES['text'])
el.text = value
elif isinstance(value, binary_type):
value = value.decode('utf-8')
el = parse_xml(CUSTOM_PROPERTY_TYPES['text'])
el.text = value
else:
raise TypeError('Unsupported type {}'.format(type(value)))
return el
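# A minimal, standalone sketch of the type dispatch in value2vt() above. It
# does not use docxcompose or lxml; `vt_tag_and_text` is a hypothetical helper,
# and the tag names are assumed to be the ones that vt2value() recognises.
# The point it illustrates is ordering: bool has to be tested before int,
# because bool is a subclass of int in Python.

```python
from datetime import datetime


def vt_tag_and_text(value):
    """Return the (assumed) vt tag name and text content for a value."""
    if isinstance(value, bool):  # must come before the int check
        return 'bool', 'true' if value else 'false'
    if isinstance(value, int):
        return 'i4', str(value)
    if isinstance(value, float):
        return 'r8', str(value)
    if isinstance(value, datetime):
        return 'filetime', value.strftime('%Y-%m-%dT%H:%M:%SZ')
    if isinstance(value, bytes):
        # Byte strings are decoded and then stored as text.
        value = value.decode('utf-8')
    if isinstance(value, str):
        return 'lpwstr', value
    raise TypeError('Unsupported type {}'.format(type(value)))
```

# If the int check came first, True would be stored as the integer 1.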
def vt2value(element):
tag = QName(element).localname
if tag == 'bool':
if element.text.lower() == u'true':
return True
else:
return False
elif tag in ['i1', 'i2', 'i4', 'int', 'ui1', 'ui2', 'ui4', 'uint']:
return int(element.text)
elif tag in ['r4', 'r8']:
return float(element.text)
elif tag == 'filetime':
return CT_CoreProperties._parse_W3CDTF_to_datetime(element.text)
elif tag == 'lpwstr':
return element.text if element.text else u''
else:
return element.text
def is_text_property(property):
tag = QName(property).localname
return tag in ['bstr', 'lpstr', 'lpwstr']
ns = FunctionNamespace(None)
# lxml doesn't support XPath 2.0 functions
# Thus we implement lower-case() as an extension function
@ns('lower-case')
def lower_case(context, a):
return [el.lower() for el in a]
class CustomProperties(object):
"""Custom doc properties stored in ``/docProps/custom.xml``.
Allows updating of doc properties in a document.
"""
def __init__(self, doc):
self.doc = doc
self.part = None
self._element = None
self.language = self.get_doc_language()
try:
part = doc.part.package.part_related_by(RT.CUSTOM_PROPERTIES)
except KeyError:
self._element = parse_xml(self._part_template())
else:
self.part = part
self._element = parse_xml(part.blob)
def _part_template(self):
return pkg_resources.resource_string(
'docxcompose', 'templates/custom.xml')
def _update_part(self):
if self.part is None:
# Create a new part for custom properties
partname = PackURI('/docProps/custom.xml')
self.part = Part(
partname, CT.OFC_CUSTOM_PROPERTIES,
serialize_part_xml(self._element), self.doc.part.package)
self.doc.part.package.relate_to(self.part, RT.CUSTOM_PROPERTIES)
self._element = parse_xml(self.part.blob)
else:
self.part._blob = serialize_part_xml(self._element)
def __getitem__(self, key):
"""Get the value of a property."""
props = xpath(
self._element,
u'.//cp:property[lower-case(@name)="{}"]'.format(key.lower()))
if not props:
raise KeyError(key)
return vt2value(props[0][0])
def __setitem__(self, key, value):
"""Set the value of a property."""
props = xpath(
self._element,
u'.//cp:property[lower-case(@name)="{}"]'.format(key.lower()))
if not props:
self.add(key, value)
return
value_el = props[0][0]
new_value_el = value2vt(value)
value_el.getparent().replace(value_el, new_value_el)
self._update_part()
def __delitem__(self, key):
"""Delete a property."""
props = xpath(
self._element,
u'.//cp:property[lower-case(@name)="{}"]'.format(key.lower()))
if not props:
raise KeyError(key)
props[0].getparent().remove(props[0])
# Renumber pids
pid = MIN_PID
for prop in self._element:
prop.set('pid', text_type(pid))
pid += 1
self._update_part()
def get_doc_language(self):
"""Ideally we would determine the correct language for each field.
Instead we simply take the language from the first w:lang tag in the
document and, if none is found there, from the w:lang tag in the default
style.
"""
lang_tags = xpath(self.doc.element, ".//w:lang")
lang_tags.extend(xpath(self.doc.styles.element, ".//w:lang"))
# keep the first tag containing a setting for Latin languages
latin_lang_key = "{{{}}}val".format(NS["w"])
lang_tags = [tag for tag in lang_tags if latin_lang_key in tag.keys()]
if lang_tags:
language = lang_tags[0].attrib[latin_lang_key]
# babel does not support dashes in combined language codes
return language.replace("-", "_")
return None
def nullify(self, key):
"""Delete key for non text-properties, set key to empty string for
text.
"""
props = xpath(
self._element,
u'.//cp:property[lower-case(@name)="{}"]'.format(key.lower()))
if not props:
raise KeyError(key)
if is_text_property(props[0][0]):
self[key] = ''
else:
del self[key]
def __contains__(self, item):
props = xpath(
self._element,
u'.//cp:property[lower-case(@name)="{}"]'.format(item.lower()))
if props:
return True
else:
return False
def get(self, key, default=None):
try:
return self[key]
except KeyError:
return default
def add(self, name, value):
"""Add a property."""
pids = [int(pid) for pid in xpath(self._element, u'.//cp:property/@pid')]
if pids:
pid = max(pids) + 1
else:
pid = MIN_PID
prop = parse_xml('<cp:property xmlns:cp="{}"/>'.format(NS['cp']))
prop.set('fmtid', CUSTOM_PROPERTY_FMTID)
prop.set('name', name)
prop.set('pid', text_type(pid))
value_el = value2vt(value)
prop.append(value_el)
self._element.append(prop)
self._update_part()
def keys(self):
if self._element is None:
return []
props = xpath(self._element, u'.//cp:property')
return [prop.get('name') for prop in props]
def values(self):
if self._element is None:
return []
props = xpath(self._element, u'.//cp:property')
return [vt2value(prop[0]) for prop in props]
def items(self):
if self._element is None:
return []
props = xpath(self._element, u'.//cp:property')
return [(prop.get('name'), vt2value(prop[0])) for prop in props]
def set_properties(self, properties):
for name, value in properties.items():
# CustomProperties defines no set() method; use item assignment instead.
self[name] = value
def find_docprops_in_document(self, name=None):
"""This method searches for all doc-properties in the document and
in section headers and footers.
"""
docprops = []
for section in self.doc.sections:
all_header_footers = [section.first_page_header,
section.header,
section.even_page_header,
section.first_page_footer,
section.footer,
section.even_page_footer,
]
# word seems to keep "hidden" header and footer definitions, so
# even though some may have been deactivated via the
# "different first page" or "different odd & even pages" checkboxes
# the definitions will be accessible and also reactivated when the
# checkboxes are re-enabled.
# we deliberately bypass the `different_first_page_header_footer`
# accessor method and check via the underlying `_has_definition`
# method if the header/footer has a definition in xml.
for container in all_header_footers:
if container._has_definition and not container.is_linked_to_previous:
docprops.extend(self._find_docprops_in(
container.part.element, name=name))
docprops.extend(self._find_docprops_in(
self.doc.element.body, name=name))
return docprops
def _find_docprops_in(self, element, name=None):
# First we search for the simple fields:
sfield_nodes = xpath(
element,
u'.//w:fldSimple[contains(@w:instr, \'DOCPROPERTY \')]')
docprops = [SimpleField(sfield_node) for sfield_node in sfield_nodes]
# Now for the complex fields
cfield_nodes = xpath(
element,
u'.//w:instrText[contains(.,\'DOCPROPERTY \')]')
docprops.extend([ComplexField(cfield_node) for cfield_node in cfield_nodes])
if name is not None:
docprops = filter(lambda prop: prop.name == name, docprops)
return docprops
def update_all(self):
"""Update all the document's doc-properties."""
docprops = self.find_docprops_in_document()
available_docprops = dict(self.items())
for docprop in docprops:
value = available_docprops.get(docprop.name)
if value is None:
continue
docprop.update(value, language=self.language)
def update(self, name, value):
"""Update all instances of a given doc-property in the document."""
docprops = self.find_docprops_in_document(name)
for docprop in docprops:
docprop.update(value, language=self.language)
def dissolve_fields(self, name):
"""Remove the property fields but keep their value."""
docprops = self.find_docprops_in_document(name)
for docprop in docprops:
docprop.replace_field_with_value()
class FieldBase(object):
"""Class used to represent a docproperty field in the document.xml.
"""
fieldname_and_format_search_expr = re.compile(
r'DOCPROPERTY +"{0,1}([^\\]*?)"{0,1} +(?:\\\@ +"{0,1}([^\\]*?)"{0,1} +){0,1}\\\* MERGEFORMAT',
flags=re.UNICODE)
def __init__(self, field_node):
self.node = field_node
self.name, self.date_format = self._parse_fieldname_and_format()
if self.date_format:
self.date_format = word_to_python_date_format(self.date_format)
else:
self.date_format = "short"
def _format_value(self, value, language=None):
if isinstance(value, bool):
return u'Y' if value else u'N'
elif isinstance(value, datetime):
if language is not None:
return format_datetime(value, self.date_format, locale=language)
return format_datetime(value, self.date_format)
else:
return text_type(value)
def update(self, value, language=None):
""" Sets the value of the docproperty in the document
"""
raise NotImplementedError()
def replace_field_with_value(self):
""" Removes the field from the document, replacing it with
its value.
"""
raise NotImplementedError()
def _get_fieldname_string(self):
raise NotImplementedError()
def _parse_fieldname_and_format(self):
match = self.fieldname_and_format_search_expr.search(
self._get_fieldname_string())
if match is None:
return None, None
return match.groups()
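# The sketch below exercises the field-code regular expression from FieldBase
# on its own, to show what it extracts. The pattern is copied from the class
# above; `FIELD_RE`, `parse_field_code` and the sample field codes are
# illustrative names and inputs, not part of docxcompose.

```python
import re

# Same pattern as FieldBase.fieldname_and_format_search_expr, split over two
# raw string literals for readability.
FIELD_RE = re.compile(
    r'DOCPROPERTY +"{0,1}([^\\]*?)"{0,1} +'
    r'(?:\\\@ +"{0,1}([^\\]*?)"{0,1} +){0,1}\\\* MERGEFORMAT',
    flags=re.UNICODE)


def parse_field_code(instr):
    """Return (property name, date format switch) or (None, None)."""
    match = FIELD_RE.search(instr)
    if match is None:
        return None, None
    return match.groups()


plain = parse_field_code(r' DOCPROPERTY "Text Property"  \* MERGEFORMAT ')
dated = parse_field_code(r'DOCPROPERTY Date \@ "dd.MM.yyyy" \* MERGEFORMAT')
other = parse_field_code('PAGE')
```

# The second group is only filled when the field code carries a \@ date
# format switch; otherwise FieldBase falls back to the "short" format.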
class SimpleField(FieldBase):
""" Represents a simple field, i.e. a <w:fldSimple> node in the
document.xml, its body containing the value of the field.
self.node here is the <w:fldSimple> node.
"""
attr_name = "{{{}}}instr".format(NS["w"])
def _get_fieldname_string(self):
return self.node.attrib[self.attr_name]
def update(self, value, language=None):
text = xpath(self.node, './/w:t')
if text:
text[0].text = self._format_value(value, language=language)
def replace_field_with_value(self):
parent = self.node.getparent()
index = list(parent).index(self.node)
w_r = deepcopy(self.node[0])
parent.remove(self.node)
parent.insert(index, w_r)
class InvalidComplexField(Exception):
"""This exception is raised when a complex field cannot
be handled correctly."""
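class ComplexField(FieldBase):
""" Represents a complex field, i.e. several <w:r> nodes delimited by runs
containing <w:fldChar w:fldCharType="begin"/> and
<w:fldChar w:fldCharType="end"/>.
In these fields, the actual value is stored in <w:r> nodes that come after a
<w:fldChar w:fldCharType="separate"/> node.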
"""
XPATH_PRECEDING_BEGINS = "./preceding-sibling::w:r/w:fldChar[@w:fldCharType=\"begin\"]/.."
XPATH_FOLLOWING_ENDS = "./following-sibling::w:r/w:fldChar[@w:fldCharType=\"end\"]/.."
XPATH_FOLLOWING_SEPARATES = "./following-sibling::w:r/w:fldChar[@w:fldCharType=\"separate\"]/.."
XPATH_TEXTS = "w:instrText"
def __init__(self, field_node):
# run and paragraph containing the field
self.w_r = field_node.getparent()
self.w_p = self.w_r.getparent()
super(ComplexField, self).__init__(field_node)
def _get_fieldname_string(self):
"""The field name can be split up in several instrText runs
so we look for all the instrText nodes between the begin and either
separate or end runs
"""
separate_run = self.get_separate_run()
last = (self.w_p.index(separate_run) if separate_run is not None
else self.w_p.index(self.end_run))
runs = [run for run in self._runs if self.w_p.index(run) < last]
texts = []
for run in runs:
texts.extend(xpath(run, self.XPATH_TEXTS))
return "".join([each.text for each in texts])
@property
def begin_run(self):
begins = xpath(self.w_r, self.XPATH_PRECEDING_BEGINS)
if not begins:
msg = "Complex field without begin node is not supported"
raise InvalidComplexField(msg)
return begins[-1]
@property
def end_run(self):
if not hasattr(self, "_end_run"):
ends = xpath(self.w_r, self.XPATH_FOLLOWING_ENDS)
if not ends:
msg = "Complex field without end node is not supported"
raise InvalidComplexField(msg)
self._end_run = ends[0]
return self._end_run
def get_separate_run(self):
"""The ooxml format standard says that the separate node is optional,
so we check whether we find one in our complex field, otherwise
we return None."""
separates = xpath(self.w_r, self.XPATH_FOLLOWING_SEPARATES)
if not separates:
return None
separate = separates[0]
if not self.w_p.index(separate) < self.w_p.index(self.end_run):
return None
return separate
@property
def _runs(self):
return xpath(self.begin_run, "./following-sibling::w:r")
def get_runs_for_update(self):
"""
Get the runs after <w:fldChar w:fldCharType="separate"/> and before the
<w:fldChar w:fldCharType="end"/> run
"""
end_index = self.w_p.index(self.end_run)
separate_run = self.get_separate_run()
# if there is no separate, we have no value to update
if separate_run is None:
return []
separate_index = self.w_p.index(separate_run)
return [run for run in self._runs
if self.w_p.index(run) > separate_index and
self.w_p.index(run) < end_index]
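# To make the begin / separate / end layout of a complex field concrete, here
# is a small standalone sketch using only the standard library. The sample
# paragraph and the helpers `fld_char_type` and `value_runs` are hypothetical;
# the sketch assumes a single field per paragraph and is not how ComplexField
# itself locates its runs (that is done with XPath on sibling axes).

```python
import xml.etree.ElementTree as ET

W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

PARAGRAPH = '''<w:p xmlns:w="%s">
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> DOCPROPERTY "Title" \\* MERGEFORMAT </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>Cached value</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>''' % W


def fld_char_type(run):
    """Return the fldCharType of a run, or None for ordinary runs."""
    fld_char = run.find('{%s}fldChar' % W)
    if fld_char is None:
        return None
    return fld_char.get('{%s}fldCharType' % W)


def value_runs(paragraph):
    """Return the runs strictly between the separate and end runs."""
    runs = paragraph.findall('{%s}r' % W)
    types = [fld_char_type(run) for run in runs]
    if 'separate' not in types or 'end' not in types:
        return []  # no separate run means there is no cached value
    separate_index = types.index('separate')
    end_index = types.index('end')
    return runs[separate_index + 1:end_index]


paragraph = ET.fromstring(PARAGRAPH)
cached = [run.find('{%s}t' % W).text for run in value_runs(paragraph)]
```

# Only the run holding the cached value lies between separate and end; the
# field code itself lives in the instrText run before the separate run.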
def get_runs_to_replace_field_with_value(self):
"""
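Get all <w:r> nodes between <w:fldChar w:fldCharType="begin"/>
and <w:fldChar w:fldCharType="separate"/> including boundaries,
plus the <w:fldChar w:fldCharType="end"/> node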
"""
separate_run = self.get_separate_run()
# If there is no separate, then the field has no value
# meaning we can remove the whole field.
if separate_run is None:
end_index = self.w_p.index(self.end_run)
runs = [run for run in self._runs
if self.w_p.index(run) < end_index]
else:
separate_index = self.w_p.index(separate_run)
runs = [run for run in self._runs
if self.w_p.index(run) <= separate_index]
runs.insert(0, self.begin_run)
runs.append(self.end_run)
return runs
def update(self, value, language=None):
runs_after_separate = self.get_runs_for_update()
if runs_after_separate:
first_w_r = runs_after_separate[0]
text = xpath(first_w_r, u'.//w:t')
if text:
text[0].text = self._format_value(value, language=language)
# remove any additional text-nodes inside the first run. we
# update the first text-node only with the full cached
# docproperty value. if for some reason the initial cached
# value is split into multiple text nodes we remove any
# additional node after updating the first node.
for unnecessary_w_t in text[1:]:
first_w_r.remove(unnecessary_w_t)
# if there are multiple runs between "separate" and "end" they
# all may contain a piece of the cached docproperty value. we
# can't reliably handle this situation and only update the
# first node in the first run with the full cached value. it
# appears any additional runs with text nodes should then be
# removed to avoid duplicating parts of the cached docproperty
# value.
for run in runs_after_separate[1:]:
text = xpath(run, u'.//w:t')
if text:
self.w_p.remove(run)
else:
# create <w:fldChar w:fldCharType="separate"/> run using
# the <w:fldChar w:fldCharType="begin"/> run as a template.
# the <w:r> node can contain all kinds of formatting information, the
# easiest way to preserve it seems to be to base new nodes on an
# existing node.
# we just swap out the fldCharType from begin to separate.
separate_run = deepcopy(self.begin_run)
w_fld_char = xpath(separate_run, 'w:fldChar')[0]
w_fld_char.set('{{{}}}fldCharType'.format(NS['w']), 'separate')
# create new run containing the actual docproperty value using
# the <w:fldChar w:fldCharType="begin"/> run as a template.
# the <w:r> node can contain all kinds of formatting information, the
# easiest way to preserve it seems to be to base new nodes on an
# existing node.
# we drop the fldChar node and insert a text node instead.
value_run = deepcopy(self.begin_run)
value_run.remove(xpath(value_run, 'w:fldChar')[0])
text = parse_xml('<w:t xmlns:w="{}"/>'.format(NS['w']))
text.text = self._format_value(value, language=language)
value_run.append(text)
# insert newly created nodes after the <w:r> node containing the
# docproperty field code in <w:instrText>.
docprop_index = self.w_p.index(self.w_r)
self.w_p.insert(docprop_index + 1, separate_run)
self.w_p.insert(docprop_index + 2, value_run)
def replace_field_with_value(self):
# Get list of nodes for removal
runs_to_remove = self.get_runs_to_replace_field_with_value()
for run in runs_to_remove:
self.w_p.remove(run)
# docxcompose/sdt.py
from docxcompose.utils import xpath
from lxml.etree import Element
from lxml.etree import QName
class StructuredDocumentTags(object):
"""Structured Document Tags (aka Content Controls)"""
def __init__(self, doc):
self.doc = doc
def tags_by_alias(self, alias):
"""Get Structured Document Tags by alias."""
return xpath(
self.doc.element.body,
'.//w:sdt/w:sdtPr/w:alias[@w:val="%s"]/ancestor::w:sdt' % alias)
def set_text(self, alias, text):
"""Set the text content of all Structured Document Tags identified by
an alias. Only plain text SDTs are supported.
If the SDT has the 'multiLine' property, newlines in `text` will be
respected, and the SDTs content will be updated with lines separated
by line breaks.
"""
text = text.strip()
tags = self.tags_by_alias(alias)
for tag in tags:
# Ignore if it's not a plain text SDT
plain_text = xpath(tag, './w:sdtPr/w:text')
if not plain_text:
continue
nsmap = tag.nsmap
is_multiline = bool(plain_text[0].xpath('./@w:multiLine', namespaces=nsmap))
properties = xpath(tag, './w:sdtPr')
content = xpath(tag, './w:sdtContent')
if not content:
continue
run_elements = xpath(content[0], './/w:r')
if not run_elements:
continue
# First, prepare the SDT for easy updating of its value.
#
# We do this by cleaning out the SDT content to only preserve
# the first of possibly many runs, and remove the contents of
# that run (except w:rPr formatting properties).
#
# That run can then be filled with new text nodes and line breaks
# as needed. This should allow us to preserve formatting, but
# otherwise start from a clean slate where we create new nodes
# instead of having to carefully update an existing structure.
first_run = run_elements[0]
self._remove_placeholder(properties, content, first_run)
self._remove_all_runs_except_first(run_elements)
self._clean_first_run(first_run)
# Now update contents by appending new text nodes.
#
# If the SDT has the multiLine property, we respect newlines
# in the input value string and create text nodes delimited by
# line breaks.
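# Work on a per-tag copy so that flattening newlines for a single-line
# SDT does not affect later multiLine SDTs with the same alias.
tag_text = text if is_multiline else text.replace('\n', ' ')
lines = tag_text.splitlines()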
for i, line in enumerate(lines, start=1):
txt_node = Element(QName(nsmap['w'], "t"))
txt_node.text = line
first_run.append(txt_node)
if i != len(lines):
br = Element(QName(nsmap['w'], "br"))
first_run.append(br)
def _remove_placeholder(self, properties, content, first_run):
"""Remove placeholder marker and style.
"""
showing_placeholder = xpath(properties[0], './w:showingPlcHdr')
if showing_placeholder:
properties[0].remove(showing_placeholder[0])
run_props = xpath(first_run, './w:rPr')
if run_props:
first_run.remove(run_props[0])
def _remove_all_runs_except_first(self, run_elements):
"""Remove all runs except the first one.
"""
for run in run_elements[1:]:
run.getparent().remove(run)
def _clean_first_run(self, first_run):
"""Remove all elements from the first run except run formatting.
"""
for child in first_run.getchildren():
# Preserve formatting
if QName(child).localname == 'rPr':
continue
first_run.remove(child)
def get_text(self, alias):
"""Get the text content of the first Structured Document Tag identified
by the given alias.
"""
tags = self.tags_by_alias(alias)
for tag in tags:
# Ignore if it's not a plain text SDT
if not xpath(tag, './w:sdtPr/w:text'):
continue
tokens = []
text_and_brs = xpath(tag, './w:sdtContent//w:r/*[self::w:t or self::w:br]')
for el in text_and_brs:
if QName(el).localname == 't':
tokens.append(el.text)
elif QName(el).localname == 'br':
tokens.append('\n')
return ''.join(tokens)
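# A standalone sketch of how set_text() and get_text() map between a Python
# string and the w:t / w:br children of a run. It uses only the standard
# library; `fill_run` and `read_run` are hypothetical helpers that mirror the
# loops above and are not part of docxcompose.

```python
import xml.etree.ElementTree as ET

W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def fill_run(run, text, multiline=True):
    """Append one w:t per line, separated by w:br elements."""
    if not multiline:
        text = text.replace('\n', ' ')
    lines = text.splitlines()
    for i, line in enumerate(lines, start=1):
        txt_node = ET.SubElement(run, '{%s}t' % W)
        txt_node.text = line
        if i != len(lines):  # no trailing line break after the last line
            ET.SubElement(run, '{%s}br' % W)
    return run


def read_run(run):
    """Join w:t text and turn each w:br back into a newline."""
    tokens = []
    for el in run:
        local_name = el.tag.rsplit('}', 1)[-1]
        if local_name == 't':
            tokens.append(el.text or '')
        elif local_name == 'br':
            tokens.append('\n')
    return ''.join(tokens)


multi = fill_run(ET.Element('{%s}r' % W), 'first\nsecond')
single = fill_run(ET.Element('{%s}r' % W), 'first\nsecond', multiline=False)
```

# For a multiLine SDT the newline survives the round trip as a w:br element;
# for a single-line SDT it is replaced by a space before any node is created.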
# docxcompose/templates/custom.xml