PubTal-3.5/LICENSE.txt

PubTal 3.5
--------------------------------------------------------------------
Copyright (c) 2011 Colin Stewart (http://www.owlfish.com/)
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products
   derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

PubTal-3.5/README.txt

PubTal 3.5
----------
A template driven web site builder for small sites.

Installation
------------
Full installation instructions for Linux, MacOS X and Windows can be found in
documentation/html/installation.html.
To install PubTal under Unix:
(Note that to perform the installation of PubTal you will probably need to have
the Python Development package installed.)

1 - Become root
2 - Run "python setup.py install"

Installing Plugins
------------------
PubTal supports the addition of new functionality through a plugin architecture.
Several plugins are installed by default with PubTal to provide support for
HTMLText, OpenOffice, Catalogue, Binary, and Raw content types.

Additional plugins that are not installed by default can be found in the
optional-plugins directory. Currently these include:

textile.py - Provides Textile (http://www.textism.com/tools/textile/) support.
  This requires pyTextile (http://diveintomark.org/projects/pytextile/) and
  Python 2.2 to be installed.
abiwordContent - Provides AbiWord content support. AbiWord currently has
  some significant bugs, which is why this plugin is not installed by default.
CSVPlugin - Provides support for generating multiple web pages based on the
  contents of a .CSV file. Documentation on how to use this plugin is
  included in the main documentation.

To install these extra plugins (or any other PubTal 2.x plugin) simply copy the
plugin to the PubTal plugin directory, beneath the Python site-packages directory.
(Under Debian this can be found in: /usr/lib/python2.2/site-packages/pubtal/plugins/)

Alternatively, add the following configuration option to your site configuration
file, replacing /usr/local/PubTal/plugins/ with the path to the plugin dir:
additional-plugins-dir /usr/local/PubTal/plugins/

The Commands
------------
PubTal has one command used to build a site's HTML pages, and one to upload
generated pages. The command for generating a site is:
updateSite.py configFile

To upload a site use the uploadSite.py command, details of which can be found
in the documentation.

Getting Started
---------------
Documentation for PubTal can be found in the documentation/html/ directory.
Additionally there are several example websites included for experimentation
under the examples directory. A very straightforward example is in
examples/homepage/ and a more complicated example demonstrating macros is in
examples/macro-example/

Migrating between PubTal 2.x and 3.0
------------------------------------
All 2.x PubTal sites should work, without any changes, in the 3.x series. Any
custom written plugins for 2.x will need to be updated to reflect internal
API changes.

PubTal-3.5/lib/pubtal/DateContext.py

""" A class that can provide a date/time in any timeformat.format() format and both local
    and UTC timezones within a ContextVariable.

    Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/)
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
    3. The name of the author may not be used to endorse or promote products
       derived from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
    IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
    OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
    IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
    NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
    THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    If you make any bug fixes or feature enhancements please let me know!
"""

import re, time, math, string
import timeformat
from simpletal import simpleTALES

PATHREGEX = re.compile ('^((?:local)|(?:utc))/?(.*)$')

class Date (simpleTALES.ContextVariable):
    """ Wraps a DateTime and provides context paths local and utc.
        These paths in turn can take TimeFormat formats, for example:
        utc/%d-%m-%Y
    """
    def __init__ (self, value = None, defaultFormat = '%a[SHORT], %d %b[SHORT] %Y %H:%M:%S %Z'):
        """ The value should be in the LOCAL timezone.
        """
        self.ourValue = value
        self.defaultFormat = defaultFormat

    def value (self, currentPath=None):
        # Default to local timezone and RFC822 format
        utcTime = 0
        strFrmt = self.defaultFormat
        if (currentPath is not None):
            index, paths = currentPath
            currentPath = '/'.join (paths[index:])
            match = PATHREGEX.match (currentPath)
            if (match is not None):
                type = match.group(1)
                if (type == 'local'):
                    utcTime = 0
                else:
                    utcTime = 1
                strFrmt = match.group(2)
                if (strFrmt == ""):
                    strFrmt = self.defaultFormat
        if (self.ourValue is None):
            # Default to the current time!
            timeValue = time.localtime()
        else:
            timeValue = self.ourValue

        if (utcTime):
            # Convert to UTC (GMT)
            timeValue = time.gmtime (time.mktime (timeValue))
        value = timeformat.format (strFrmt, timeValue, utctime=utcTime)
        raise simpleTALES.ContextVariable (value)

PubTal-3.5/lib/pubtal/MessageBus.py

""" A really simple internal message bus for PubTal.

    Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/)
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
    3. The name of the author may not be used to endorse or promote products
       derived from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
    IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
    OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
    IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
    NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
    THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    If you make any bug fixes or feature enhancements please let me know!
"""

import logging

class MessageBus:
    def __init__ (self):
        # This contains eventType: FunctionDictionary pairs.
        self.listeners = {}
        self.log = logging.getLogger ("MessageBus")

    def registerListener (self, eventType, func):
        currentListeners = self.listeners.get (eventType, {})
        currentListeners [func] = func
        self.listeners [eventType] = currentListeners
        self.log.info ("Function %s registered for event type %s" % (repr (func), eventType))

    def unregisterListener (self, eventType, func):
        currentListeners = self.listeners.get (eventType, {})
        try:
            del currentListeners [func]
            self.log.info ("Function %s un-registered for event type %s" % (repr (func), eventType))
        except KeyError:
            # The function was never registered for this event type.
            self.log.warn ("Function %s was not registered for event type %s, but tried to unregister." % (repr (func), eventType))

    def notifyEvent (self, eventType, data=None):
        self.log.info ("Event %s data %s" % (eventType, repr (data)))
        currentListeners = self.listeners.get (eventType, {})
        for listener in currentListeners.values():
            self.log.debug ("Calling %s" % repr (listener))
            listener (eventType, data)

PubTal-3.5/lib/pubtal/CatalogueContent.py

class CatalogueContent:
    def __init__ (self, contentFile, codec):
        self.filename = contentFile
        self.codec = codec
        sourceFile = open (contentFile, 'r')
        self.items = []
        # The first header block describes the catalogue itself.
        self.catalogueHeaders = self._readHeaders_(sourceFile)
        # Each following header block describes one item; an empty block ends the file.
        headers = self._readHeaders_ (sourceFile)
        while (len (headers) > 0):
            self.items.append (headers)
            headers = self._readHeaders_ (sourceFile)
        sourceFile.close()

    def getCatalogueHeaders (self):
        return self.catalogueHeaders

    def getItems (self):
        return self.items

    def _readHeaders_ (self, sourceFile):
        readingHeaders = 1
        headers = {}
        while (readingHeaders):
            line = self.codec (sourceFile.readline())[0]
            offSet = line.find (':')
            if (offSet > 0):
                headers [line[0:offSet]] = line[offSet + 1:].strip()
            else:
                readingHeaders = 0
        return headers

PubTal-3.5/lib/pubtal/ContentToHTMLConverter.py

""" Classes to convert entered content into HTML

    Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/)
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
    3. The name of the author may not be used to endorse or promote products
       derived from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
    IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
    OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
    IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
    NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
    THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""

import os, sgmllib, StringIO, cgi, re, codecs
import xml.sax

try:
    import logging
except ImportError:
    import InfoLogging as logging

import HTMLWriter

# Used to spot already escaped attributes in HTMLText
ESCAPED_TEXT_REGEX=re.compile (r"\&\S+?;")

# Used to determine how much of the resulting output should be shown:
MAX_CONTEXT=200

class ContentParseException (Exception):
    def __init__ (self, msg):
        self.msg = msg

    def __str__ (self):
        return self.msg

class BaseContentConverter:
    def handleAccumulatedData (self, openingNewBlock=0, docEnded=0):
        data = u"".join (self.characterData)
        if (docEnded):
            if (len (data.strip()) == 0):
                # We do nothing - it's the last thing!
                return
        # We are not in a block, so let's do the paragraph thing.
        paraData = data.split ('\n\n')
        paraCount = len (paraData) - 1
        for para in paraData:
            # We have something useful in this paragraph data.
            lines = para.split ('\n')
            # Loop over *all* lines.
            lineCount = len (lines) - 1
            for line in lines:
                # Check to see whether we have already written data.
                if (self.currentParagraph.getDataLength() > 0):
                    # Yes, see whether we wanted to write a newline.
                    if (self.writeNewLine):
                        # Special case: When we are about to open a new block level item and we are on the last
                        # paragraph and last line, we don't want a newline!
                        if (openingNewBlock == 1 and paraCount == 0 and lineCount == 0):
                            self.log.debug ("Suppressing new line for last bit!")
                        else:
                            if (not self.ignoreNewLines):
                                self.currentParagraph.lineBreak ()
                            else:
                                self.currentParagraph.write ('\n')
                        # We've done this request now.
                        self.writeNewLine = 0
                self.currentParagraph.write (line)
                # We want a newline next...
                self.writeNewLine = 1
                lineCount -= 1
            # We have finished paragraph, so clear the new line flag
            self.writeNewLine = 0
            if (paraCount > 0):
                # We have another paragraph coming, so close this one.
                self.closeParagraph()
            paraCount -= 1
        self.characterData = []

    def closeParagraph (self):
        self.currentParagraph.endElement ('p')
        outputStream = self.currentParagraph.getOutput()
        asData = outputStream.getvalue()
        outputStream.close()
        withNoPTags = asData [3:-5]
        if (len (withNoPTags.strip()) > 0):
            # There is actual, useful, content in this paragraph.
            self.result.write (asData)
        # Prepare a new paragraph for the future.
        if (self.plainTextOuput):
            self.currentParagraph = HTMLWriter.PlainTextWriter (outputStream = StringIO.StringIO(), outputXHTML=self.outputXHTML, preserveSpaces = self.preserveSpaces, exceptionOnError=1)
        else:
            self.currentParagraph = HTMLWriter.HTMLWriter (outputStream = StringIO.StringIO(), outputXHTML=self.outputXHTML, preserveSpaces = self.preserveSpaces, exceptionOnError=1)
        self.currentParagraph.startElement ('p')

class ContentToXHTMLConverter (xml.sax.handler.ContentHandler, xml.sax.handler.DTDHandler, xml.sax.handler.ErrorHandler, BaseContentConverter):
    """ Convert entered markup into XHTML1.  Paragraph and line break elements are added
        to the content, taking into consideration any block level markup that might have
        been entered by the user.
    """
    def __init__ (self):
        xml.sax.handler.ContentHandler.__init__ (self)
        self.log = logging.getLogger ("PubTal.ContentToXHTMLConverter")
        self.outputXHTML = 1
        # Use utf-8 instead of utf-16 internally because the Python SAX implementation is
        # a bit broken, and doesn't understand U+feff and keeps it when sending it to the SAX handler
        self.utf8Encoder = codecs.lookup ("utf-8")[0]
        # The content is wrapped in a special root element so that it parses as one XML document.
        self.SPECIAL_START_TAG = self.utf8Encoder (u'<?xml version="1.0" encoding="utf-8"?>\n<XMLCONTENTTYPESPECIALTAG>')[0]
        self.SPECIAL_END_TAG = self.utf8Encoder (u'</XMLCONTENTTYPESPECIALTAG>')[0]

    def convertContent (self, content, ignoreNewLines=0, preserveSpaces=1, plainTextOuput=0):
        self.ignoreNewLines = ignoreNewLines
        self.preserveSpaces = preserveSpaces
        # This is how deep the non-paragraph tags have reached.
        self.depthOfTags = 0
        self.writeNewLine = 0
        self.characterData = []
        self.plainTextOuput = plainTextOuput
        if (plainTextOuput):
            self.result = HTMLWriter.PlainTextWriter (outputStream = StringIO.StringIO(), outputXHTML=1, exceptionOnError=1)
            self.currentParagraph = HTMLWriter.PlainTextWriter (outputStream = StringIO.StringIO(), outputXHTML=1, exceptionOnError=1)
        else:
            self.result = HTMLWriter.HTMLWriter (outputStream = StringIO.StringIO(), outputXHTML=1, preserveSpaces = self.preserveSpaces, exceptionOnError=1)
            self.currentParagraph = HTMLWriter.HTMLWriter (outputStream = StringIO.StringIO(), outputXHTML=1, preserveSpaces = self.preserveSpaces, exceptionOnError=1)
        self.currentParagraph.startElement ('p')

        self.ourParser = xml.sax.make_parser()
        self.log.debug ("Setting features of parser")
        self.ourParser.setFeature (xml.sax.handler.feature_external_ges, 0)
        self.ourParser.setContentHandler (self)
        self.ourParser.setErrorHandler (self)

        file = StringIO.StringIO (self.SPECIAL_START_TAG + self.utf8Encoder (content)[0] + self.SPECIAL_END_TAG)
        # Parse the content as XML
        try:
            self.ourParser.parse (file)
        except Exception, e:
            self.log.error ("Error parsing input: " + str (e))
            raise

        # Handle any accumulated character data
        if len (self.characterData) > 0:
            self.handleAccumulatedData(docEnded=1)

        # See if there is anything not yet written out
        self.closeParagraph()

        resultFile = self.result.getOutput()
        data = resultFile.getvalue()
        resultFile.close()
        return data

    def startElement (self, origtag, attributes):
        #self.log.debug ("Received Real Start Tag: " + origtag + " Attributes: " + str (attributes))
        tag = origtag.lower()
        # Convert the attributes into a single escaped attribute string
        atts = []
        for att in attributes.getNames():
            self.log.debug ("Attribute name %s has value %s" % (att, attributes[att]))
            atts.append (' ')
            atts.append (att)
            atts.append ('="')
            atts.append (cgi.escape (attributes[att], quote=1))
            atts.append ('"')
        atts = u"".join (atts)

        if (origtag != u"XMLCONTENTTYPESPECIALTAG"):
            if (self.depthOfTags > 0):
                # We are simply writing this out
                self.depthOfTags += 1
                try:
                    self.result.startElement (tag, atts)
                except HTMLWriter.TagNotAllowedException, tagErr:
                    curResult = self.result.getOutput().getvalue()
                    if (len (curResult) > MAX_CONTEXT):
                        curResult = "...%s" % curResult [-MAX_CONTEXT:]
                    msg = "Element <%s%s> is not allowed in current location: %s" % (tag, atts, curResult)
                    self.log.error (msg)
                    raise ContentParseException (msg)
            else:
                # We are currently writing to a paragraph.  Can this continue?
                # Find out whether this tag can go in a paragraph.
                if (self.currentParagraph.isElementAllowed (tag)):
                    self.log.debug ("Element %s found to be allowed." % str (tag))
                    # We can keep going.
                    # Handle any accumulated character data
                    if len (self.characterData) > 0:
                        self.handleAccumulatedData()
                    self.currentParagraph.startElement (tag, atts)
                else:
                    self.log.debug ("Element %s not allowed, closing paragraph." % str (tag))
                    # Handle any accumulated character data
                    if len (self.characterData) > 0:
                        self.handleAccumulatedData(openingNewBlock=1)
                    # We aren't allowed this element in a paragraph.
                    # We have to assume that the user knows it can go into the template as-is.
                    # First we check that there aren't any tags left open that need to be closed...
                    paraStack = self.currentParagraph.getCurrentElementStack()
                    if (len (paraStack) > 1):
                        paraResult = self.currentParagraph.getOutput().getvalue()
                        if (len (paraResult) > MAX_CONTEXT):
                            paraResult = "...%s" % paraResult [-MAX_CONTEXT:]
                        msg = "Element <%s%s> is not allowed in current location: %s" % (tag, atts, paraResult)
                        self.log.error (msg)
                        raise ContentParseException (msg)
                    # Write out the current paragraph, and then pass this on directly to the output
                    self.closeParagraph()
                    # Now write this out...
                    self.depthOfTags += 1
                    self.result.startElement (tag, atts)

    def endElement (self, tag):
        #self.log.debug ("Received Real End Tag: " + tag)
        if (tag != u"XMLCONTENTTYPESPECIALTAG"):
            tag = tag.lower()
            # Handle any accumulated character data
            if len (self.characterData) > 0:
                self.handleAccumulatedData()
            if (self.depthOfTags > 0):
                # This doesn't belong to a paragraph
                self.result.endElement (tag)
                self.depthOfTags -= 1
            else:
                # This is destined for a paragraph
                self.currentParagraph.endElement (tag)

    def fatalError (self, msg):
        self.error (msg)

    def error (self, msg):
        if (self.depthOfTags > 0):
            # Error occurred in the main document.
            curResult = self.result.getOutput().getvalue()
            if (len (curResult) > MAX_CONTEXT):
                curResult = "...%s" % curResult [-MAX_CONTEXT:]
            msg = "Error %s occurred shortly after: %s" % (msg, curResult)
            self.log.error (msg)
            raise ContentParseException (msg)
        else:
            # Flush out any remaining data so that our error message is more complete.
            if len (self.characterData) > 0:
                self.handleAccumulatedData()
            curResult = self.currentParagraph.getOutput().getvalue()
            if (len (curResult) > MAX_CONTEXT):
                curResult = "...%s" % curResult [-MAX_CONTEXT:]
            msg = "Error %s occurred shortly after: %s" % (msg, curResult)
            self.log.error (msg)
            raise ContentParseException (msg)

    def characters (self, data):
        if (self.depthOfTags > 0):
            # We are in a block, so just output
            self.result.write (cgi.escape (data))
            return
        # Accumulate the character data together so that we can merge all the newline events
        self.characterData.append (cgi.escape (data))

class ContentToHTMLConverter (sgmllib.SGMLParser, BaseContentConverter):
    """ Convert entered markup into HTML.  Paragraph and line break elements are added
        to the content, taking into consideration any block level markup that might have
        been entered by the user.
    """
    def __init__ (self):
        self.outputXHTML = 0
        self.log = logging.getLogger ("PubTal.HTMLText.ContentToHTMLConverter")
        sgmllib.SGMLParser.__init__ (self)

    def convertContent (self, content, ignoreNewLines=0, preserveSpaces = 1, plainTextOuput=0):
        self.ignoreNewLines = ignoreNewLines
        self.preserveSpaces = preserveSpaces
        # This is how deep the non-paragraph tags have reached.
        self.depthOfTags = 0
        self.writeNewLine = 0
        self.characterData = []
        self.plainTextOuput = plainTextOuput
        if (plainTextOuput):
            self.result = HTMLWriter.PlainTextWriter (outputStream = StringIO.StringIO(), outputXHTML=0, exceptionOnError=1)
            self.currentParagraph = HTMLWriter.PlainTextWriter (outputStream = StringIO.StringIO(), outputXHTML=0, exceptionOnError=1)
        else:
            self.result = HTMLWriter.HTMLWriter (outputStream = StringIO.StringIO(), outputXHTML=0, preserveSpaces = self.preserveSpaces, exceptionOnError=1)
            self.currentParagraph = HTMLWriter.HTMLWriter (outputStream = StringIO.StringIO(), outputXHTML=0, preserveSpaces = self.preserveSpaces, exceptionOnError=1)
        self.currentParagraph.startElement ('p')

        self.feed (content.strip())

        # Handle any accumulated character data
        if len (self.characterData) > 0:
            self.handleAccumulatedData(docEnded=1)

        # See if there is anything not yet written out
        self.closeParagraph()

        resultFile = self.result.getOutput()
        data = resultFile.getvalue()
        resultFile.close()
        return data

    def unknown_starttag (self, origtag, attributes):
        attStack = []
        for name, value in attributes:
            attStack.append (' ')
            attStack.append (name)
            attStack.append ('="')
            if (ESCAPED_TEXT_REGEX.search (value) is not None):
                # We already have some escaped characters in here, so assume it's all valid
                attStack.append (value)
            else:
                attStack.append (cgi.escape (value))
            attStack.append ('"')
        atts = u"".join (attStack)
        tag = origtag.lower()
        self.log.debug ("Received start tag %s" % tag)
        if (self.depthOfTags > 0):
            # We are simply writing this out
            # Are we expecting an end tag?
            if (not self.result.isEndTagForbidden (tag)):
                # Yes, we'll count it
                self.depthOfTags += 1
            try:
                self.result.startElement (tag, atts)
            except HTMLWriter.TagNotAllowedException, tagErr:
                curResult = self.result.getOutput().getvalue()
                if (len (curResult) > MAX_CONTEXT):
                    curResult = "...%s" % curResult [-MAX_CONTEXT:]
                msg = "Element <%s%s> is not allowed in current location: %s" % (tag, atts, curResult)
                self.log.error (msg)
                raise ContentParseException (msg)
        else:
            # We are currently writing to a paragraph.  Can this continue?
            # Find out whether this tag can go in a paragraph.
            if (self.currentParagraph.isElementAllowed (tag)):
                self.log.debug ("Element %s found to be allowed." % str (tag))
                # We can keep going.
                # Handle any accumulated character data
                if len (self.characterData) > 0:
                    self.handleAccumulatedData()
                self.currentParagraph.startElement (tag, atts)
            else:
                self.log.debug ("Element %s not allowed, closing paragraph." % str (tag))
                # Handle any accumulated character data
                if len (self.characterData) > 0:
                    self.handleAccumulatedData(openingNewBlock=1)
                # We aren't allowed this element in a paragraph.
                # We have to assume that the user knows it can go into the template as-is.
                # First we check that there aren't any tags left open that need to be closed...
                paraStack = self.currentParagraph.getCurrentElementStack()
                if (len (paraStack) > 1):
                    paraResult = self.currentParagraph.getOutput().getvalue()
                    if (len (paraResult) > MAX_CONTEXT):
                        paraResult = "...%s" % paraResult [-MAX_CONTEXT:]
                    msg = "Element <%s%s> is not allowed in current location: %s" % (tag, atts, paraResult)
                    self.log.error (msg)
                    raise ContentParseException (msg)
                # Write out the current paragraph, and then pass this on directly to the output
                self.closeParagraph()
                # Now write this out...
                if (not self.result.isEndTagForbidden (tag)):
                    # Yes, we'll count it
                    self.depthOfTags += 1
                self.result.startElement (tag, atts)

    def unknown_endtag (self, tag):
        tag = tag.lower()
        # Handle any accumulated character data
        if len (self.characterData) > 0:
            self.handleAccumulatedData()
        if (self.depthOfTags > 0):
            # This doesn't belong to a paragraph
            try:
                self.result.endElement (tag)
            except HTMLWriter.BadCloseTagException, badTag:
                curResult = self.result.getOutput().getvalue()
                if (len (curResult) > MAX_CONTEXT):
                    curResult = "...%s" % curResult [-MAX_CONTEXT:]
                msg = "End tag </%s> is not allowed in current location: %s" % (tag, curResult)
                self.log.error (msg)
                raise ContentParseException (msg)
            self.depthOfTags -= 1
        else:
            # This is destined for a paragraph
            try:
                self.currentParagraph.endElement (tag)
            except HTMLWriter.BadCloseTagException, badTag:
                curResult = self.currentParagraph.getOutput().getvalue()
                if (len (curResult) > MAX_CONTEXT):
                    curResult = "...%s" % curResult [-MAX_CONTEXT:]
                msg = "End tag </%s> is not allowed in current location: %s" % (tag, curResult)
                self.log.error (msg)
                raise ContentParseException (msg)

    def handle_data (self, data):
        if (self.depthOfTags > 0):
            # We are in a block, so just output
            self.result.write (cgi.escape (data))
            return
        # Accumulate the character data together so that we can merge all the newline events
        self.characterData.append (cgi.escape (data))

    def handle_charref (self, ref):
        data = u'&#%s;' % ref
        if (self.depthOfTags > 0):
            # We are in a block, so just output
            self.result.write (data)
        else:
            # Write to the paragraph
            # We *don't* call cgi.escape because we already have it in encoded form.
            self.characterData.append (data)

    def handle_entityref (self, ref):
        data = u'&%s;' % ref
        if (self.depthOfTags > 0):
            # We are in a block, so just output
            self.result.write (data)
        else:
            # Write to the paragraph
            # We *don't* call cgi.escape because we already have it in encoded form.
            self.characterData.append (data)

    def report_unbalanced (self, tag):
        raise ContentParseException ("Received close tag '%s', but no corresponding open tag." % tag)

PubTal-3.5/lib/pubtal/BuiltInPlugins.py

""" Classes to handle HTMLText and Catalogues in PubTal.

    Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/)
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
    3. The name of the author may not be used to endorse or promote products
       derived from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
    IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
    OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
    IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
    NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
    THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    If you make any bug fixes or feature enhancements please let me know!
"""

try:
    import logging
except ImportError:
    import InfoLogging as logging

import SitePublisher, CatalogueContent, ContentToHTMLConverter, SiteUploader, FtpLibrary
import os, time, anydbm, codecs
import timeformat
from simpletal import simpleTAL, simpleTALES

# getPluginInfo provides the list of built-in supported content.
def getPluginInfo ():
    builtInContent = [{'functionality': 'content', 'content-type': 'HTMLText' ,'file-type': 'txt','class': HTMLTextPagePublisher}
                    , {'functionality': 'content', 'content-type': 'Catalogue','file-type': 'catalogue','class': CataloguePublisher}
                    , {'functionality': 'upload-method', 'method-type': 'FTP', 'class': FTPUploadMethod}]
    return builtInContent

class CataloguePublisher (SitePublisher.ContentPublisher):
    def __init__ (self, pagePublisher):
        SitePublisher.ContentPublisher.__init__ (self, pagePublisher)
        self.log = logging.getLogger ("PubTal.CataloguePublisher")
        self.ui = pagePublisher.getUI ()

    def publish (self, page):
        indexTemplate = self.templateConfig.getTemplate (page.getOption ('catalogue-index-template', 'template.html'))
        itemTemplate = self.templateConfig.getTemplate (page.getOption ('catalogue-item-template', 'template.html'))
        maxCols = int (page.getOption ('catalogue-max-columns', '5'))
        buildIndexPage = 0
        buildItemPages = 0
        catalogueBuildPages = page.getOption ('catalogue-build-pages', 'index,item')
        for option in catalogueBuildPages.split (','):
            if (option == "index"):
                if (indexTemplate is not None):
                    buildIndexPage = 1
                else:
                    msg = "Unable to build the index page for catalogue %s because no catalogue-index-template has been specified." % page.getSource()
                    self.log.warn (msg)
                    self.ui.warn (msg)
            elif (option == "item"):
                if (itemTemplate is not None):
                    buildItemPages = 1
                else:
                    msg = "Unable to build the item pages for catalogue %s because no catalogue-item-template has been specified." % page.getSource()
                    self.log.warn (msg)
                    self.ui.warn (msg)

        if (not (buildIndexPage or buildItemPages)):
            msg = "Neither index nor item pages are being built for catalogue %s" % page.getSource()
            self.log.warn (msg)
            self.ui.warn (msg)
            return

        itemContentType = page.getOption ('catalogue-item-content-type', None)
        if (itemContentType is None or itemContentType.lower() == 'none'):
            # We wish to turn off item content publishing
            itemContentPublisher = None
        else:
            itemContentPublisher = self.pagePublisher.getContentPublisher (itemContentType)
            if (itemContentPublisher is None):
                msg = "Unable to find a publisher for catalogue item content type %s." % itemContentType
                self.log.error (msg)
                raise SitePublisher.PublisherException (msg)

        # Build the context pieces we are going to need
        pageCharSet = page.getOption ('character-set', None)
        if (pageCharSet is not None):
            # This page has its own character set.
            # codecs.lookup returns (encoder, decoder, reader, writer); CatalogueContent
            # calls the codec on raw file data, so the decoder is the one needed here.
            pageCodec = codecs.lookup (pageCharSet)[1]
        else:
            # This page uses the default character set.
            pageCodec = self.characterSetCodec

        catalogue = CatalogueContent.CatalogueContent (page.getSource(), pageCodec)
        items = []
        rows = []
        col = []
        lastModDate = timeformat.format ('%a[SHORT], %d %b[SHORT] %Y %H:%M:%S %Z', time.localtime (page.getModificationTime()))
        copyrightYear = timeformat.format ('%Y')

        # Source paths
        relativeSourcePath = page.getRelativePath()
        contentDir = self.contentDir
        absSourcePath = page.getSource()
        localDestinationDir = os.path.dirname (page.getRelativePath())
        depth = page.getDepthString()

        self.log.debug ("Building the context for each item in the catalogue.")
        for itemHeaders in catalogue.getItems():
            # Destination paths
            filename = itemHeaders.get ('filename', None)
            if (filename is None and buildItemPages):
                msg = "Unable to publish catalogue %s.  Missing filename header in catalogue but item publishing is enabled." % page.getSource()
                self.log.error (msg)
                raise SitePublisher.PublisherException (msg)

            actualHeaders = {}
            actualHeaders.update (page.getHeaders())
            actualHeaders.update (itemHeaders)
            if (filename is not None):
                # Used to determine the file to write to, kept in case the pageContext doesn't contain them.
                relativeDestPath = os.path.join (localDestinationDir, os.path.splitext (filename)[0] + '.html')
                destPath = os.path.join (self.destDir, relativeDestPath)

                if (itemContentPublisher is not None):
                    self.log.debug ("Retrieving page context for this catalogue entry.")
                    # We need a page for this entry so that we can get its content.
                    itemPageList = self.contentConfig.getPages (os.path.join (contentDir, filename), {})
                    if (len (itemPageList) > 1):
                        self.ui.warn ("Catalogue contains content type that returns more than one page!  Only building first page.")
                    itemPage = itemPageList [0]
                    pageContext = itemContentPublisher.getPageContext (itemPage, itemTemplate)
                    actualHeaders.update (pageContext.get ('headers', {}))
                    pageContext ['headers'] = actualHeaders
                    if (not pageContext.has_key ('destinationPath')):
                        pageContext ['destinationPath'] = relativeDestPath
                    if (not pageContext.has_key ('absoluteDestinationPath')):
                        pageContext ['absoluteDestinationPath'] = destPath
                else:
                    self.log.debug ("No content type for this catalogue entry - just publish what we have.")
                    # Get the generic page information for this file
                    relativeDestPath = os.path.join (localDestinationDir, os.path.splitext (filename)[0] + '.' + itemTemplate.getTemplateExtension())
                    destPath = os.path.join (self.destDir, relativeDestPath)
                    destFilename = os.path.basename (destPath)
                    actualHeaders = {}
                    actualHeaders.update (page.getHeaders())
                    actualHeaders.update (itemHeaders)
                    pageContext = {'lastModifiedDate': lastModDate
                                ,'copyrightYear': copyrightYear
                                ,'sourcePath': relativeSourcePath
                                ,'absoluteSourcePath': absSourcePath
                                ,'destinationPath': relativeDestPath
                                ,'absoluteDestinationPath': destPath
                                ,'destinationFilename': destFilename
                                ,'depth': depth
                                ,'headers': actualHeaders
                                }
            else:
                # No filename specified for this entry
                pageContext = {'headers': actualHeaders}
            items.append (pageContext)
            # Start a new row once the current one holds maxCols entries.
            if (len (col) == maxCols):
                rows.append (col)
                col = []
            col.append (pageContext)
        if (len (col) > 0):
            rows.append (col)
            col = []

        # Build the Catalogue context
        catalogueMap = {'entries': items, 'rows': rows, 'headers': catalogue.getCatalogueHeaders()}
        # Do the individual items now
        if (buildItemPages):
            itemCount = 0
            itemLength = len (items)
            for item in items:
                relativeDestPath = item['destinationPath']
                context = simpleTALES.Context(allowPythonPath=1)
                context.addGlobal ('page', item)
                if (itemCount > 0):
                    catalogueMap ['previous'] = items[itemCount - 1]
                elif (catalogueMap.has_key ('previous')):
                    del catalogueMap ['previous']
                if (itemCount < itemLength - 1):
                    catalogueMap ['next'] = items[itemCount + 1]
                elif (catalogueMap.has_key ('next')):
                    del catalogueMap ['next']

                context.addGlobal ('catalogue', catalogueMap)

                macros = page.getMacros()
                self.pagePublisher.expandTemplate (itemTemplate, context, relativeDestPath, macros)
                itemCount += 1

        if (buildIndexPage):
            # Cleanup the catalogueMap from the items pages.
if (catalogueMap.has_key ('previous')): del catalogueMap ['previous'] if (catalogueMap.has_key ('next')): del catalogueMap ['next'] indexMap = self.getPageContext (page, indexTemplate, catalogue) relativeDestPath = indexMap ['destinationPath'] context = simpleTALES.Context(allowPythonPath=1) context.addGlobal ('page', indexMap) context.addGlobal ('catalogue', catalogueMap) macros = page.getMacros() self.pagePublisher.expandTemplate (indexTemplate, context, relativeDestPath, macros) def getPageContext (self, page, template, catalogue=None): # The page context for a Catalogue is fairly boring, but someone might use it indexMap = SitePublisher.ContentPublisher.getPageContext (self, page, template) if (catalogue is None): localCatalogue = CatalogueContent.CatalogueContent (page.getSource(), self.characterSetCodec) else: localCatalogue = catalogue actualHeaders = indexMap ['headers'] actualHeaders.update (localCatalogue.getCatalogueHeaders()) indexMap ['headers'] = actualHeaders return indexMap class HTMLTextPagePublisher (SitePublisher.ContentPublisher): def __init__ (self, pagePublisher): SitePublisher.ContentPublisher.__init__ (self, pagePublisher) self.htmlConverter = ContentToHTMLConverter.ContentToHTMLConverter() self.xhtmlConverter = ContentToHTMLConverter.ContentToXHTMLConverter() self.log = logging.getLogger ("PubTal.HTMLTextPagePublisher") def publish (self, page): templateName = page.getOption ('template') # Get this template's configuration template = self.templateConfig.getTemplate (templateName) context = simpleTALES.Context(allowPythonPath=1) # Get the page context for this content map = self.getPageContext (page, template) # Determine the destination for this page relativeDestPath = map ['destinationPath'] context.addGlobal ('page', map) macros = page.getMacros() self.pagePublisher.expandTemplate (template, context, relativeDestPath, macros) def getPageContext (self, page, template): pageMap = SitePublisher.ContentPublisher.getPageContext (self, page, 
template) ignoreNewlines = page.getBooleanOption ('htmltext-ignorenewlines') preserveSpaces = page.getBooleanOption ('preserve-html-spaces', 1) headers, rawContent = self.readHeadersAndContent(page) # Determine desired output type, HTML or XHTML outputType = template.getOption ('output-type') if (outputType == 'HTML'): content = self.htmlConverter.convertContent (rawContent, ignoreNewLines=ignoreNewlines, preserveSpaces=preserveSpaces) elif (outputType == 'XHTML'): content = self.xhtmlConverter.convertContent (rawContent, ignoreNewLines=ignoreNewlines, preserveSpaces=preserveSpaces) elif (outputType == 'PlainText'): # It doesn't actually matter how the markup has been entered in the HTMLText, because we # are going to output Plain Text anyway. We use HTML because it's the least demanding. content = self.htmlConverter.convertContent (rawContent, ignoreNewLines=ignoreNewlines, plainTextOuput=1) else: msg = "HTMLText content doesn't support output in type '%s'." % outputType self.log.error (msg) raise SitePublisher.PublisherException (msg) actualHeaders = pageMap ['headers'] actualHeaders.update (headers) pageMap ['headers'] = actualHeaders pageMap ['content'] = content pageMap ['rawContent'] = rawContent return pageMap class FTPUploadMethod (SiteUploader.UploadMethod): def __init__ (self, siteConfig, uploadConfig): self.siteConfig = siteConfig self.uploadConfig = uploadConfig self.utfencoder = codecs.lookup ("utf8")[0] self.utfdecoder = codecs.lookup ("utf8")[1] self.db = None self.ftpClient = None self.log = logging.getLogger ("FTPUploadMethod") try: conf = 'host' self.hostname = uploadConfig [conf] conf = 'username' self.username = uploadConfig [conf] except: raise Exception ("Missing FTP configuration option %s" % conf) self.password = uploadConfig.get ('password', None) self.initialDir = uploadConfig.get ('base-dir') def getDB (self): if (self.db is None): self.db = anydbm.open (os.path.join (self.siteConfig.getDataDir(), 'FtpCache-%s-%s' % (self.hostname, 
self.username)), 'c') return self.db def uploadFiles (self, fileDict, userInteraction): "Return 1 for success, 0 for failure. Must notify userInteraction directly." if (self.ftpClient is None): self.log.debug ("First file, there is no ftp client yet.") if (self.password is None): self.log.debug ("Asking for password - none in config file.") self.password = userInteraction.promptPassword ("Password required (%s@%s)" % (self.username, self.hostname)) self.ftpClient = FtpLibrary.FTPUpload (self.hostname, self.username, self.password, self.initialDir) try: self.log.info ("Connecting to FTP site.") userInteraction.info ("Connecting to FTP site.") if (not self.ftpClient.connect (userInteraction)): return 0 self.log.info ("Connected.") userInteraction.info ("Connected.") except Exception, e: msg = "Error connecting to FTP site: %s" % str (e) userInteraction.taskError ("Error connecting to FTP site: %s" % str (e)) return 0 percentageDone = 0.0 incrementSize = 100.0/float (len (fileDict)) db = self.getDB() allFiles = fileDict.keys() # Try to keep files together to help FTP performance. 
allFiles.sort() for fName in allFiles: userInteraction.taskProgress ("Uploading %s" % fName, percentageDone) percentageDone += incrementSize if (self.ftpClient.uploadFile (self.siteConfig.getDestinationDir(), fName, userInteraction)): db [self.utfencoder (fName)[0]] = fileDict [fName] return 1 def finished (self): if (self.ftpClient is not None): self.ftpClient.disconnect() self.ftpClient = None if (self.db is not None): self.db.close() self.db = None
# File: PubTal-3.5/lib/pubtal/FtpLibrary.py
import string, os, ftplib, os.path, time try: import logging except: import InfoLogging as logging class FTPDirList: """ Brings back a list of the entries in the current working directory on an FTP server.""" def __init__ (self, client): self.list = [] client.retrlines ('LIST', self.callback) def callback (self, entry): self.list.append (entry) def getList (self): newList = [] for dir in self.list: components = dir.split () if (len (components) > 8): # We have an actual FTP entry... newList.append ("".join (components[8:])) return newList class DirectoryMaker: """ A class that creates directories on an FTP site. This class will ensure that a directory already exists, or make it if required. It caches the available directory list to reduce load on the server. """ def __init__ (self): self.dirCache = {} self.log = logging.getLogger ("FtpLibrary.DirectoryMaker") def makeDir (self, aDir, client): """ Make this directory structure if required. NOTE: This only works on relative paths! """ curDir = client.pwd() head, tail = os.path.split (aDir) dirsList = [] while (len (head) > 0): # We have a directory element to check dirsList.insert (0,tail) head, tail = os.path.split (head) dirsList.insert (0, tail) # We have a list of directories to check - now get to it!
currentLocation = "" skippedDirs = [] hadToMove = 0 for dirToCheck in dirsList: currentLocation = os.path.join (currentLocation, dirToCheck) # Check the cache first. if (self.dirCache.has_key (os.path.join (curDir, currentLocation))): self.log.debug ("Directory %s already exists - found in cache." % currentLocation) skippedDirs.append (dirToCheck) else: self.log.debug ("Directory %s not found in cache." % currentLocation) # Note that we changed directory. hadToMove = 1 for skipDir in skippedDirs: self.log.debug ("Changing skipped directory to %s" % skipDir) client.cwd (skipDir) self.log.debug ("Sleeping to give the server a rest") time.sleep (0.1) self.log.debug ("Woken up, time to carry on.") skippedDirs = [] dirList = FTPDirList (client) if (dirToCheck not in dirList.getList()): # Does not exists! self.log.info ("Directory %s does not exist, creating dir." % currentLocation) client.mkd (dirToCheck) self.log.debug ("Sleeping to give the server a rest") time.sleep (0.1) self.log.debug ("Woken up, time to carry on.") # Add the directory to the cache as one that exists! 
self.dirCache [os.path.join (curDir, currentLocation)] = 1 client.cwd (dirToCheck) self.log.debug ("Sleeping to give the server a rest") time.sleep (0.1) self.log.debug ("Woken up, time to carry on.") if (hadToMove): self.log.debug ("Had to move directories, going back to original.") client.cwd (curDir) else: self.log.debug ("Cache saved us moving directory.") class FTPUpload: def __init__ (self, host, username, password, initialDir=None): self.host = host self.username = username self.password = password self.initialDir = initialDir self.log = logging.getLogger ("FtpLibrary.FTPUpload") self.dirMaker = DirectoryMaker() self.client = None def connect (self, userInteraction): self.client = self._getClient_ (userInteraction) if (self.client is None): return 0 return 1 def uploadFile (self, localDir, fName, userInteraction): if (self.client is None): return 0 self.log.debug ("Processing file %s" % fName) remoteDir, remoteFile = os.path.split (fName) # Ensure that the directory is present if (len (remoteDir) > 0): self.dirMaker.makeDir (remoteDir, self.client) path = os.path.join (localDir, fName) self.log.debug ("Attempting to upload %s to %s" % (path, fName)) try: uploadFile = open (path,'rb') self.client.storbinary ('STOR %s' % fName, uploadFile) uploadFile.close() self.log.debug ("Uploaded: " + fName) except Exception, e: self.log.error ("Error uploading: " + str (e)) userInteraction.taskError ("Error uploading: " + str (e)) return 0 return 1 def disconnect (self): try: self.client.quit() except Exception: self.log.warn ("Exception occurred while closing FTP connection") def _getClient_ (self, userInteraction): try: client = ftplib.FTP (self.host, self.username, self.password) client.set_pasv (True) except (ftplib.all_errors), e: self.log.error ("Error connecting to FTP Site: " + str (e)) userInteraction.taskError ("Error connecting to FTP Site: " + str (e)) return None if (self.initialDir is not None): self.log.debug ("Attempting to change directory to: " + str
(self.initialDir)) try: client.cwd (self.initialDir) except (ftplib.all_errors), e: self.log.error ("Error changing directory: " + str (e)) userInteraction.taskError ("Error changing directory: " + str (e)) return None return client
# File: PubTal-3.5/lib/pubtal/SiteConfiguration.py
""" Classes to handle configuration of a PubTal site. Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know!
""" try: import logging except: import InfoLogging as logging import pubtal import ConfigurationParser, BuiltInPlugins, EncodingCapabilities, MessageBus import os, os.path, stat, re, sys, anydbm, copy, fnmatch class SiteConfig: def __init__ (self, configFile): self.log = logging.getLogger ("PubTal.SiteConfig") # Get the real path to the config file configFile = os.path.normpath (configFile) if (hasattr (os.path, "realpath")): # Under Unix and 2.2 we can remove symlinks. configFile = os.path.realpath (configFile) self.log.info ("Normalised configuration file path: " + configFile) self.contentConfig = ContentConfig (self) self.templateConfig = TemplateConfig (self) self.messageBus = MessageBus.MessageBus() self.encodingCapabilities = EncodingCapabilities.EncodingCapabilities() # List of upload config objects. self.uploadList = [] # List of regular expressions that match files to be ignored self.ignoreFilters = [] # Supported content types self.supportedContent = {} # Supported upload types self.supportedUploadMethods = {} # Directories where plugins may reside self.extraPluginDirs = [] self.plugins = [] self.baseDir = os.path.abspath (os.path.split (configFile)[0]) self.localCacheDB = None # Defaults self.setDefaultCharacterSet ('ISO-8859-15') self.contentDir = os.path.join (self.baseDir, 'content') self.destDir = os.path.join (self.baseDir, 'dest') self.templateDir = os.path.join (self.baseDir, 'template') self.pubtalDataDir = os.path.join (self.baseDir, 'PubTalData') self.readConfig(configFile) # Load plugins self.log.info ("Loading plugins...") systemPlugins = self.getPluginModules(os.path.join (pubtal.__path__[0],'plugins')) self.plugins.extend (systemPlugins) self.log.debug ("Loading extra plugins.") for extraDir in self.extraPluginDirs: self.log.debug ("Looking for plugins in %s" % extraDir) morePlugins = self.getPluginModules (extraDir) self.plugins.extend (morePlugins) # Add in the built in supported content self.plugins.append (BuiltInPlugins) for plugin in 
self.plugins: for pluginInfo in plugin.getPluginInfo(): pluginType = pluginInfo.get ('functionality', None) if (pluginType is None): self.log.warn ("Plugin did not include 'functionality' key - skipping.") elif (pluginType == "content"): contentType = pluginInfo ['content-type'] klass = pluginInfo ['class'] self.supportedContent [contentType] = klass if (pluginInfo.has_key ('file-type')): fileTypeInfo = pluginInfo ['file-type'] if (type (fileTypeInfo) == type ([])): fileTypeInfos = fileTypeInfo else: fileTypeInfos = [fileTypeInfo] for fileTypeInfo in fileTypeInfos: contentTypeConfig = PageConfigItem() contentTypeConfig.setOption ('content-type', contentType) self.log.debug ("Adding plugin file-type config for %s" % fileTypeInfo) self.contentConfig.addFileType (fileTypeInfo, contentTypeConfig) elif (pluginType == "upload-method"): methodType = pluginInfo ['method-type'] klass = pluginInfo ['class'] self.supportedUploadMethods [methodType] = klass self.log.info ("Upload method %s supported." % methodType) else: self.log.warn ("Plugin offers functionality %s which is not understood." % pluginType) self.messageBus.notifyEvent ("PubTal.InitComplete") def getMessageBus (self): return self.messageBus def finished (self): """ Called to let us know that no further publishing is going to happen.
""" self.messageBus.notifyEvent ("PubTal.Shutdown") def getContentDir (self): return self.contentDir def getDestinationDir (self): return self.destDir def getTemplateDir (self): return self.templateDir def getDataDir (self): return self.pubtalDataDir def getIgnoreFilters (self): return self.ignoreFilters def getDefaultCharacterSet (self): return self.characterSet def setDefaultCharacterSet (self, charSet): self.characterSet = charSet def getSupportedContent (self): return self.supportedContent def getSupportedUploadMethods (self): return self.supportedUploadMethods def getUploadConfigs (self): return self.uploadList def getEncodingCapabilities (self): return self.encodingCapabilities def getContentConfig (self): return self.contentConfig def getTemplateConfig (self): return self.templateConfig def getLocalCacheDB (self): if (self.localCacheDB is None): # Create the directory if needed and open the DB. dataDir = self.getDataDir() if (not os.path.exists (dataDir)): self.log.info ("Creating PubTal Data directory %s" % dataDir) os.makedirs (dataDir) self.localCacheDB = anydbm.open (os.path.join (dataDir, 'localCache'), 'c') self.messageBus.registerListener ("PubTal.Shutdown", self._shutdown_) return self.localCacheDB def readConfig (self, configFile): parser = ConfigurationParser.ConfigurationParser() parser.addTopLevelHandler ('SiteConfig', self) parser.addTopLevelHandler ('Content', self.contentConfig) parser.addTopLevelHandler ('Template', self.templateConfig) self.currentDirective = [] confFile = open (configFile, 'r') parser.parse(confFile) confFile.close() self.contentConfig.configFinished() self.templateConfig.configFinished() def startDirective (self, directive, options): self.currentDirective.append (directive) if (directive == 'UPLOAD'): self.uploadList.append ({}) def endDirective (self, directive): self.currentDirective.pop() def option (self, line): if (len (self.currentDirective) == 0): self.log.warn ("Received option with no directive in place.") return 
directive = self.currentDirective [-1] if (directive == 'SITECONFIG'): if (line.lower().startswith ('content-dir')): self.contentDir = os.path.join (self.baseDir, line [line.find (' ')+1:]) elif (line.lower().startswith ('template-dir')): self.templateDir = os.path.join (self.baseDir, line [line.find (' ')+1:]) elif (line.lower().startswith ('dest-dir')): self.destDir = os.path.join (self.baseDir, line [line.find (' ')+1:]) elif (line.lower().startswith ('ignore-filter')): filter = line [line.find (' ')+1:] self.log.info ("Adding filter of content to ignore: %s" % filter) self.ignoreFilters.append (re.compile (filter)) elif (line.lower().startswith ('character-set')): self.setDefaultCharacterSet (line[line.find (' ') + 1:]) elif (line.lower().startswith ('additional-plugins-dir')): self.extraPluginDirs.append (os.path.join (self.baseDir, line [line.find (' ')+1:])) elif (line.lower().startswith ('pubtal-data-dir')): self.pubtalDataDir = os.path.join (self.baseDir, line [line.find (' ')+1:]) else: self.log.warn ("SiteConfig Option %s not supported" % line) elif (directive == 'UPLOAD'): uploadConf = self.uploadList[-1] nvBreak = line.find (' ') uploadConf [line [0:nvBreak]] = line [nvBreak + 1:] def getPluginModules (self, path): try: dirList = os.listdir (path) except: return [] self.log.debug ("Adding %s to the Python path." % path) sys.path.insert(0, path) foundPlugins = [] for fileName in dirList: try: if (os.path.isfile (os.path.join (path,fileName))): if (fileName == "__init__.py"): self.log.debug ("Skipping init file for plugins dir.") elif (fileName.endswith ('.py')): pluginModuleName = fileName[:-3] plugin = __import__ (pluginModuleName, globals(), locals(), pluginModuleName) foundPlugins.append (plugin) self.log.info ("Loaded PubTal plugin %s" % pluginModuleName) else: self.log.debug ("Skipping file %s while looking for plugins." 
% fileName) elif (os.path.isdir (os.path.join (path, fileName))): pluginModuleName = fileName try: plugin = __import__ (pluginModuleName, globals(), locals(), pluginModuleName) foundPlugins.append (plugin) self.log.info ("Loaded PubTal plugin %s" % pluginModuleName) except: self.log.warn ("Error trying to load dir %s as a module." % pluginModuleName) else: self.log.warn ("Found neither file nor dir in plugin directory.") except ImportError, e: self.log.warn ("Error loading PubTal plugin %s." % fileName) self.log.debug ("Exception was: %s" % str (e)) return foundPlugins def _shutdown_ (self, event, data): # Only called when PubTal is shutting down. self.localCacheDB.close() class ContentConfig: def __init__ (self, siteConfig): self.log = logging.getLogger ("PubTal.ContentConfig") self.patternMap = {} self.directoryMap = {} self.fileMap = {} self.pageBuilders = {} self.currentConfigItem = None self.siteConfig = siteConfig def getPage (self, contentPath): """ This returns a single page with all configuration information populated. It does *NOT* call page builders or respect classes. """ page = Page (contentPath, self.siteConfig.contentDir) # Now work through directory and pattern maps dirConfigItems = [] head, tail = os.path.split (contentPath) contentFilename = tail while (tail != ''): self.log.debug ("Looking for directory and pattern configuration at: %s" % head) # Do directories first - they have higher priorities than pattern matches configItem = self.directoryMap.get (head, None) if (configItem is not None): dirConfigItems.insert (0,configItem) # Do the pattern matching second. 
patternList = self.patternMap.get (head, []) for pattern, configItem in patternList: if (fnmatch.fnmatch (contentFilename, pattern)): # We have a match dirConfigItems.insert (0, configItem) if (self.siteConfig.contentDir.startswith (head)): # We have reached the top of the content dir, so stop now tail = '' else: head, tail = os.path.split (head) for confItem in dirConfigItems: self.log.debug ("Updating config using %s" % str (confItem)) confItem.updatePage (page) # Now look for this specific file configItem = self.fileMap.get (contentPath) self.log.debug ("Looking for file config item for %s", contentPath) if (configItem is not None): self.log.debug ("Found file config item of %s", str (configItem)) configItem.updatePage (page) return page def getPages (self, contentPath, options): """ Returns a list of pages (i.e. elements to be built) for a given content path. This method determines the content type that applies for this page, then filters the page based on content. Finally content-specific methods are called (if applicable) which can in turn create their own page objects. """ # See what, if any, classes we should check for. classList = options.get ('classes', 'normal').split (',') allowAllClasses = options.get ('buildAllClasses', 0) page = self.getPage (contentPath) # Now filter based on class. pageClasses = page.getOption ('class', 'normal').split (',') self.log.debug ("Checking that page class %s is in the list of classes to build." % str (pageClasses)) allow = 0 if (allowAllClasses): allow = 1 else: for pageClass in pageClasses: if (pageClass in classList): allow = 1 if (not allow): return [] # Now see whether this content type has it's own page builder. pageContentType = page.getOption ('content-type') if (self.pageBuilders.has_key (pageContentType)): self.log.debug ("Page content type %s has builder - calling." 
% pageContentType) return self.pageBuilders [pageContentType] (page, options) else: return [page] def addFileType (self, fileExtension, extensionConfig): """ Adds default configuration options for a file type. """ newPattern = "*.%s" % fileExtension self.addPattern (self.siteConfig.contentDir, newPattern, extensionConfig) def addPattern (self, targetDirectory, newPattern, newConfig): count = 0 foundExisting = 0 existingExtensions = self.patternMap.get (targetDirectory, []) for existingPattern, existingConfig in existingExtensions: if (existingPattern == newPattern): self.log.debug ("Configuration for existing pattern %s already exists, merging." % existingPattern) # We already have a config. # The existing configuration should take precedence - so merge it into the new config newConfig.updateConfigItem (existingConfig) # Update the existing entry existingExtensions [count] = (existingPattern, newConfig) foundExisting = 1 count = count + 1 if (not foundExisting): existingExtensions.append ((newPattern, newConfig)) self.patternMap [targetDirectory] = existingExtensions def registerPageBuilder (self, contentType, builderMethod): self.pageBuilders [contentType] = builderMethod def startDirective (self, directive, options): self.currentDirective = directive if (directive == 'CONTENT'): self.currentConfigItem = PageConfigItem() targetPath = os.path.join (self.siteConfig.contentDir, options) targetDirectory, targetFilename = os.path.split (targetPath) if (os.path.isfile (targetPath)): # This is an individual file self.log.debug ("Found file configuration item: %s", targetPath) self.fileMap [targetPath] = self.currentConfigItem elif (os.path.isdir (targetPath)): # This is a directory directive self.log.debug ("Found directory configuration item: %s", targetPath) if (targetPath.endswith (os.sep)): targetPath = targetPath [:-1] self.directoryMap [targetPath] = self.currentConfigItem else: # Pattern for matching file content. 
self.log.debug ("Found pattern configuration item.") self.addPattern (targetDirectory, targetFilename, self.currentConfigItem) else: self.currentConfigItem = None def option (self, line): if (self.currentConfigItem is not None): if (line.lower().startswith ('macro')): mnStart = line.find (' ') mnEnd = line.find (' ', mnStart+1) macroName = line [mnStart+1:mnEnd] macro = line [mnEnd + 1:] self.currentConfigItem.addMacro (macroName, os.path.join (self.siteConfig.templateDir, macro)) elif (line.lower().startswith ('header')): mnStart = line.find (' ') mnEnd = line.find (' ', mnStart+1) headerName = line [mnStart+1:mnEnd] header = line [mnEnd + 1:] self.currentConfigItem.addHeader (headerName, header) else: firstSpace = line.find (' ') name = line [:firstSpace] value = line [firstSpace+1:] self.currentConfigItem.setOption (name, value) def endDirective (self, directive): self.currentDirective = None self.currentConfigItem = None def configFinished (self): # Some debug messages if enabled. if (self.log.isEnabledFor (logging.DEBUG)): for directory in self.patternMap.keys(): configDump = [] for pattern, configValue in self.patternMap [directory]: configDump.append ("Pattern %s: Config values: %s" % (pattern, str (configValue))) self.log.debug ("PatternMap %s has Configuration: %s" % (directory, "".join (configDump))) for dir in self.directoryMap.keys(): self.log.debug ("Directory %s has Configuration: %s" % (dir, str (self.directoryMap [dir]))) for file in self.fileMap.keys(): self.log.debug ("File %s has Configuration: %s" % (file, str (self.fileMap [file]))) class TemplateConfig: def __init__ (self, siteConfig): self.log = logging.getLogger ("PubTal.TemplateConfig") self.patternMap = {} self.fileMap = {} self.directoryMap = {} self.currentConfigItem = None self.siteConfig = siteConfig def getTemplate (self, templateName): """ Get a Template object which hold the configuration information for this template. 
Note that the templateName is relative to the templateDirectory - not absolute! """ template = Template (templateName, self.siteConfig.templateDir) templatePath = template.getTemplatePath() # Now work through directory map dirConfigItems = [] head, tail = os.path.split (templatePath) templateFilename = tail while (tail != ''): self.log.debug ("Looking for directory configuration at: %s" % head) configItem = self.directoryMap.get (head, None) if (configItem is not None): dirConfigItems.insert (0,configItem) # Do the pattern matching second. patternList = self.patternMap.get (head, []) for pattern, configItem in patternList: if (fnmatch.fnmatch (templateFilename, pattern)): # We have a match dirConfigItems.insert (0, configItem) if (self.siteConfig.templateDir.startswith (head)): # We have reached the top of the content dir, so stop now tail = '' else: head, tail = os.path.split (head) for confItem in dirConfigItems: template.options.update (confItem) # Now look for this specific file configItem = self.fileMap.get (templatePath) if (configItem is not None): template.options.update (configItem) return template def addFileType (self, fileExtension, extensionConfig): newPattern = "*.%s" % fileExtension self.addPattern (self.siteConfig.templateDir, newPattern, extensionConfig) def addPattern (self, targetDirectory, newPattern, newConfig): count = 0 foundExisting = 0 existingExtensions = self.patternMap.get (targetDirectory, []) for existingPattern, existingConfig in existingExtensions: if (existingPattern == newPattern): self.log.debug ("Configuration for existing pattern %s already exists, merging." % existingPattern) # We already have a config. 
# The existing configuration should take precedence - so merge it into the new config newConfig.updateConfigItem (existingConfig) # Update the existing entry existingExtensions [count] = (existingPattern, newConfig) foundExisting = 1 count = count + 1 if (not foundExisting): existingExtensions.append ((newPattern, newConfig)) self.patternMap [targetDirectory] = existingExtensions def startDirective (self, directive, options): self.currentDirective = directive if (directive == 'TEMPLATE'): self.currentConfigItem = {} targetPath = os.path.join (self.siteConfig.templateDir, options) targetDirectory, targetFilename = os.path.split (targetPath) if (os.path.isfile (targetPath)): # This is an individual file self.log.debug ("Found file configuration item.") self.fileMap [targetPath] = self.currentConfigItem elif (os.path.isdir (targetPath)): # This is a directory directive self.log.debug ("Found directory configuration item: %s" % targetPath) if (targetPath.endswith (os.sep)): targetPath = targetPath [:-1] self.directoryMap [targetPath] = self.currentConfigItem else: # Pattern for matching file content. self.log.debug ("Found pattern configuration item.") self.addPattern (targetDirectory, targetFilename, self.currentConfigItem) else: self.currentConfigItem = None def option (self, line): if (self.currentConfigItem is not None): firstSpace = line.find (' ') name = line [:firstSpace] value = line [firstSpace+1:] self.currentConfigItem [name] = value def endDirective (self, directive): self.currentDirective = None self.currentConfigItem = None def configFinished (self): # Some debug messages if enabled. 
if (self.log.isEnabledFor (logging.DEBUG)): for dir in self.patternMap.keys(): self.log.debug ("PatternMap directory %s has Configuration: %s" % (dir, str (self.patternMap [dir]))) for dir in self.directoryMap.keys(): self.log.debug ("Directory %s has Configuration: %s" % (dir, str (self.directoryMap [dir]))) for file in self.fileMap.keys(): self.log.debug ("File %s has Configuration: %s" % (file, str (self.fileMap [file]))) class PageConfigItem: def __init__ (self): self.macros = {} self.headers = {} self.options = {} def updatePage (self, page): page.macros.update (self.macros) page.headers.update (self.headers) page.options.update (self.options) def updateConfigItem (self, anotherItem): """ This method merges in the configuration options held by another item. """ self.macros.update (anotherItem.macros) self.headers.update (anotherItem.headers) self.options.update (anotherItem.options) def addMacro (self, name, template): self.macros [name] = template def addHeader (self, name, header): self.headers [name] = header def setOption (self, name, value): if (not self.options.has_key (name)): self.options [name] = value else: curVal = self.options [name] if (type (curVal) == type ([])): curVal.append (value) else: curVal = [curVal, value] self.options [name] = curVal def __str__ (self): desc = "" if (len (self.macros) > 0): desc += "Macros: %s\n" % str (self.macros) if (len (self.headers) > 0): desc += "Headers: %s\n" % str (self.headers) if (len (self.options) > 0): desc += "Options: %s\n" % str (self.options) return desc class Page: def __init__ (self, source, contentDir): # Things we need self.source = source self.macros = {} self.options = {'template': 'template.html'} self.headers = {} self.contentDir = contentDir commonRoot = os.path.commonprefix ([contentDir, source]) self.relativePath = source[len (commonRoot)+1:] # relativePath is of the form: [dir/]file depth = 0 head, tail = os.path.split (self.relativePath) while (tail != ''): depth += 1 head, tail = 
os.path.split (head)
        # We go too far, so knock one off!
        self.depth = depth - 1
        self.name = self.relativePath

    def getDuplicatePage (self, newSource):
        newPage = Page (newSource, self.contentDir)
        newPage.macros = copy.deepcopy (self.macros)
        newPage.options = copy.deepcopy (self.options)
        newPage.headers = copy.deepcopy (self.headers)
        return newPage

    def setName (self, name):
        self.name = name

    def setOption (self, name, value):
        self.options [name] = value

    def hasOption (self, name):
        return self.options.has_key (name)

    def getOption (self, name, defaultValue=None):
        return self.options.get (name, defaultValue)

    def getBooleanOption (self, name, defaultValue='0'):
        value = self.getOption (name, str (defaultValue))
        lowerValue = value.lower()
        if (lowerValue == 'y' or lowerValue == 'true'):
            return 1
        if (lowerValue == 'n' or lowerValue == 'false'):
            return 0
        try:
            val = int (lowerValue)
            return val
        except:
            return 0

    def getListOption (self, name):
        if (self.options.has_key (name)):
            option = self.options [name]
            if (type (option) == type ([])):
                return option
            return [option]
        return None

    def getSource (self):
        return self.source

    def getRelativePath (self):
        return self.relativePath

    def getMacros (self):
        return self.macros

    def getHeaders (self):
        return self.headers

    def getDepthString (self):
        return "../"*self.depth

    def getModificationTime (self):
        """ Returns the modification time (st_mtime) of the source file.
""" info = os.stat (self.source) return info[stat.ST_MTIME] def __str__ (self): return self.name class Template: def __init__ (self, templateName, templateDir): self.templateName = templateName self.templatePath = os.path.join (templateDir, templateName) self.options = {'output-type': 'HTML'} def getTemplateName (self): return self.templateName def getTemplatePath (self): return self.templatePath def getTemplateExtension (self): return os.path.splitext (self.templateName)[1][1:] def getOption (self, name, defaultValue=None): return self.options.get (name, defaultValue) def getBooleanOption (self, name, defaultValue='0'): value = self.getOption (name, str (defaultValue)) lowerValue = value.lower() if (lowerValue == 'y' or lowerValue == 'true'): return 1 if (lowerValue == 'n' or lowerValue == 'false'): return 0 try: val = int (lowerValue) return val except: return 0 def __str__ (self): return self.templateName PubTal-3.5/lib/pubtal/InfoLogging.py0000644000105000010500000000404011555340742016145 0ustar cms103cms103""" PubTal logging for when Python logging is missing. Copyright (c) 2003 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! Dummy logging module, used when logging (http://www.red-dove.com/python_logging.html) is not installed. """ class InfoLogger: def debug (self, *args): pass def info (self, *args): print args[0] % args[1:] def warn (self, *args): print "WARNING: " + args[0] % args[1:] def error (self, *args): print "ERROR: " + args[0] % args[1:] def critical (self, *args): print "CRITICAL: " + args[0] % args[1:] def getLogger (*params): return InfoLogger() PubTal-3.5/lib/pubtal/SiteUploader.py0000644000105000010500000002073211555340742016351 0ustar cms103cms103""" Utility classes to help automate the PubTal testing. Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. 
    THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
    IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
    OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
    IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
    NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
    THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    If you make any bug fixes or feature enhancements please let me know!
"""

try:
    import logging
except:
    import InfoLogging as logging

import os, os.path, copy, hashlib, getpass, codecs

class UploadMethod:
    """ An upload method should implement these methods, although it isn't
        a requirement to inherit from this class.
    """
    def __init__ (self, siteConfig, uploadConfig):
        self.siteConfig = siteConfig
        self.uploadConfig = uploadConfig

    def getDB (self):
        return {}

    def uploadFiles (self, fileDict, userInteraction):
        "Return 1 for success, 0 for failure. Must notify UserInteraction directly."
        pass

    def markFilesUpToDate (self, fileDict, userInteraction):
        percentage = 0.0
        increment = 100.0/float (len (fileDict))
        db = self.getDB()
        for fName in fileDict.keys():
            userInteraction.taskProgress ("Marking file %s as already uploaded."
% fName, percentage) db [fName] = fileDict [fName] percentage += increment def finished (self): pass class SiteUploader: """ A class that determines which files should be uploaded at this time.""" def __init__ (self, config): self.config = config self.log = logging.getLogger ('SiteUploader') self.currentDestFiles = [] self.destDir = config.getDestinationDir() self.localCache = config.getLocalCacheDB() self.utfencode = codecs.lookup ("utf8")[0] self.utfdecode = codecs.lookup ("utf8")[1] self.ignoreFilters = config.getIgnoreFilters() def uploadSite (self, uploadConfig, userInteraction, target=None, options = {}): """ Determines what the upload method is, finds a provider to use that method, then determines what files need to be uploaded, and then does the actual upload. """ # Get the options being used for the upload allFiles = options.get ('allFiles', 0) forceUpload = options.get ('forceUpload', 0) markFilesUpToDate = options.get ('markFilesUpToDate', 0) dryRun = options.get ('dry-run', 0) # Get the method try: uploadMethod = uploadConfig['method'] except: self.log.warn ("Upload method not specified, assuming FTP") uploadMethod = 'FTP' try: methodKlass = self.config.getSupportedUploadMethods ()[uploadMethod] except: msg = "There is no support for upload method %s" % uploadMethod self.log.error (msg) userInteraction.taskError (msg) return # Get an instance of the UploadMethod self.log.debug ("Creating upload method instance.") method = methodKlass (self.config, uploadConfig) self.log.debug ("Asking for database.") self.uploadDB = method.getDB() self.log.debug ("Getting files to upload...") fileDict = self._getFilesToUpload_ (target, allFiles, forceUpload) if (len (fileDict) == 0): self.log.info ("No files in file dictionary.") userInteraction.info ("No files to process, the site is up-to-date.") self.uploadDB = None method.finished() userInteraction.taskDone() return if (markFilesUpToDate): if (dryRun): userInteraction.info ("dry-run: Would mark the following files 
as being up-to-date") for fName in fileDict.keys(): userInteraction.info ("dry-run: Would mark %s as up-to-date" % fName) else: userInteraction.info ("Marking files as being up-to-date.") method.markFilesUpToDate (fileDict, userInteraction) else: if (dryRun): userInteraction.info ("dry-run: Would upload the following files.") for fName in fileDict.keys(): userInteraction.info ("dry-run: Would upload %s" % fName) else: method.uploadFiles (fileDict, userInteraction) self.uploadDB = None method.finished() userInteraction.taskDone() def _getFilesToUpload_ (self, target, allFiles, forceUpload): """ Returns a dictionary of files that need uploading. The key is the filepath, the value is the current checksum. target is either: None - Get all files List of files or dir paths - Get only files in these paths allFiles is a flag. True means include files PubTal didn't create. forceUpload is a flag. True means upload files even if they haven't changed. """ result = {} if (target is None): self.log.debug ("Looking at whole site for upload material.") targetList = [self.destDir] else: targetList = [] for t in target: targetList.append (os.path.abspath (t)) for targetPath in targetList: self.log.debug ("Checking target path: %s" % targetPath) # Are we doing just one file or a dir? if (os.path.isfile (targetPath)): qualified = self.qualifyPath (targetPath, allFiles, forceUpload) if (qualified is not None): relPath, curChecksum = qualified result [relPath] = curChecksum else: os.path.walk (targetPath, self.walkPaths, None) for destFile in self.currentDestFiles: qualified = self.qualifyPath (destFile, allFiles, forceUpload) if (qualified is not None): relPath, curChecksum = qualified result [relPath] = curChecksum self.currentDestFiles = [] return result def qualifyPath (self, path, allFiles, forceUpload): # Determine whether the path should be uploaded, and turn into a relative path if so. # Get the relative path as first step. 
        commonRoot = os.path.commonprefix ([self.destDir, path])
        relativePath = path[len (commonRoot)+1:]
        utfRelativePath = self.utfencode (relativePath)[0]
        if (not allFiles):
            # Check that PubTal created this file.
            if (not self.localCache.has_key (utfRelativePath)):
                self.log.debug ("File %s is not in localCache, but allFiles is false." % utfRelativePath)
                return None
        # We always need the current checksum
        try:
            curChecksum = self.localCache [utfRelativePath]
            # This was a PubTal generated file
        except:
            # Not a PubTal generated file, so we need to generate our own checksum on the real file.
            curChecksum = self._getChecksum_ (path)

        if (not forceUpload):
            if (self.uploadDB.has_key (utfRelativePath)):
                # We've uploaded this file before, time to compare checksums
                lastChecksum = self.uploadDB [utfRelativePath]
                if (lastChecksum == curChecksum):
                    self.log.debug ("File %s has same checksum as last upload, and forceUpload is false." % relativePath)
                    return None
                self.log.info ("File %s has changed since last upload." % relativePath)
            else:
                # We have never seen the file before, so we assume that we must upload it!
                self.log.info ("File %s is new and has never been uploaded." % relativePath)
        else:
            self.log.info ("Forcing upload of file %s" % relativePath)
        return (relativePath, curChecksum)

    def _getChecksum_ (self, path):
        sum = hashlib.md5()
        # Read in binary mode so the checksum covers the exact bytes on disk
        # (text mode would translate line endings on some platforms).
        readFile = open (path, 'rb')
        while 1:
            buffer = readFile.read(1024*1024)
            if len(buffer) == 0:
                break
            sum.update(buffer)
        readFile.close()
        return sum.hexdigest()

    def walkPaths (self, arg, dirname, names):
        for name in names:
            realName = os.path.join (dirname, name)
            if (os.path.isfile (realName)):
                # Check to see whether it is a file we should ignore.
                addFile = 1
                for filter in self.ignoreFilters:
                    if (filter.match (realName)):
                        addFile = 0
                if (addFile):
                    self.currentDestFiles.append (realName)

PubTal-3.5/lib/pubtal/SitePublisher.py
""" Classes to handle publishing of a PubTal site.
Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! 
""" try: import logging except: import InfoLogging as logging import time, codecs, os, os.path, hashlib from simpletal import simpleTAL, simpleTALES, simpleTALUtils import pubtal import timeformat import ContentToHTMLConverter, SiteUtils, DateContext class PagePublisher: def __init__ (self, config, ui=SiteUtils.SilentUI()): self.ui = ui self.templateCache = simpleTALUtils.TemplateCache() self.templateDir = config.getTemplateDir() self.destDir = config.getDestinationDir() self.characterSet = config.getDefaultCharacterSet() self.templateConfig = config.getTemplateConfig() self.config = config self.messageBus = self.config.getMessageBus() self.localCache = config.getLocalCacheDB() supportedContent = config.getSupportedContent() self.supportedContent = {} self.log = logging.getLogger ("PubTal.PagePublisher") self.log.info ("Looking for supported content types...") for contentType in supportedContent.keys(): klass = supportedContent [contentType] obj = klass (self) self.supportedContent [contentType] = obj msg = "Support for content types: %s" % ", ".join (self.supportedContent.keys()) self.log.info (msg) ui.info (msg) # Used in the Context self.ContextFunctions = ContextFunctions(self.config) self.pubTalInfo = {'version': pubtal.__version__ ,'url': "http://www.owlfish.com/software/PubTal/" ,'linkText': """

Made with PubTal %s

""" % pubtal.__version__ } self.messageBus.notifyEvent ("PagePublisher.InitComplete") def getUI (self): return self.ui def getConfig (self): return self.config def getContentPublisher (self, contentType): return self.supportedContent.get (contentType, None) def publish (self, page): contentType = page.getOption ('content-type') try: publisher = self.supportedContent [contentType] except: msg = "Unsupported content type: %s" % contentType self.log.warn (msg) self.ui.warn (msg) return 1 try: publisher.publish (page) return 1 except Exception, e: self.log.error ("Exception publishing page: %s" % repr (e)) self.ui.taskError ("Page Publication failed: %s " % str (e)) return 0 def expandTemplate (self, template, context, relativeOutputPath, macros): """ Expand the given Template object using the context, writing to the output path. Looks up the character-set for each template and macro. """ absTemplateName = template.getTemplatePath() templateCharset = template.getOption ('character-set', self.characterSet) suppressXMLDeclaration = template.getOption ('suppress-xmldecl') outputType = template.getOption ('output-type') if (outputType == 'HTML'): # For HTML output-type we guess as to the SimpleTAL template kind taltemplate = self.templateCache.getTemplate (absTemplateName, inputEncoding=templateCharset) else: # Assume it's XML taltemplate = self.templateCache.getXMLTemplate (absTemplateName) # Handle XHTML DOCTYPE xmlDoctype = template.getOption ('xml-doctype', None) self.ContextFunctions.setCurrentPage (relativeOutputPath, context) context.addGlobal ('ispage', self.ContextFunctions.isPage) context.addGlobal ('readFile', self.ContextFunctions.readFile) context.addGlobal ('pubtal', self.pubTalInfo) # Add macros to the context macroTemplates = {} for macroName in macros.keys(): macTemplate = self.templateConfig.getTemplate (macros [macroName]) macroCharSet = macTemplate.getOption ('character-set', self.characterSet) mTemp = self.templateCache.getTemplate (macros [macroName], 
inputEncoding=macroCharSet)
            macroTemplates [macroName] = mTemp.macros
        context.addGlobal ('macros', macroTemplates)

        if (self.log.isEnabledFor (logging.DEBUG)):
            self.log.debug (str (context))

        dest = self.openOuputFile (relativeOutputPath)
        if (isinstance (taltemplate, simpleTAL.XMLTemplate)):
            if (xmlDoctype is not None):
                taltemplate.expand (context, dest, outputEncoding=templateCharset, docType=xmlDoctype, suppressXMLDeclaration=suppressXMLDeclaration)
                dest.close()
                return
            else:
                taltemplate.expand (context, dest, outputEncoding=templateCharset, suppressXMLDeclaration=suppressXMLDeclaration)
                dest.close()
                return
        taltemplate.expand (context, dest, outputEncoding=templateCharset)
        dest.close()

    def openOuputFile (self, relativeOutputPath):
        """ Creates any required directories and opens a file-like object to the destination path.
            This provides a common point for PubTal to note all directories it has created and
            files it has written.

            The file-like object will keep track of the MD5 of the file written.
        """
        # Make directories if required.
        outputPath = os.path.join (self.destDir, relativeOutputPath)
        destDir = os.path.split (outputPath)[0]
        if (not os.path.exists (destDir)):
            os.makedirs (destDir)

        dest = MD5File (outputPath, relativeOutputPath, 'wb', self.localCache)
        return dest

class MD5File:
    """ This presents a file object to the world, and calculates an MD5 checksum on the fly.
        When the file is closed it updates a dictionary with the resulting hex digest.

        This file type should only be used for writing!
    """
    def __init__ (self, filePath, relativeOutputPath, mode, dictionary):
        self.dictionary = dictionary
        self.ourmd5 = hashlib.md5()
        self.ourFile = open (filePath, mode)
        # We need to transform the path name into ascii compatible strings for some anydbm implementations.
        utfencode = codecs.lookup ("utf8")[0]
        self.relativeOutputPath = utfencode (relativeOutputPath)[0]
        self.closed = 0

    def close (self):
        self.ourFile.close()
        self.dictionary [self.relativeOutputPath] = self.ourmd5.hexdigest()
        self.closed = 1

    def __del__ (self):
        if (not self.closed):
            self.close()

    def flush (self):
        return self.ourFile.flush()

    def fileno (self):
        return self.ourFile.fileno()

    def read (self, size=None):
        return self.ourFile.read(size)

    def readline (self, size=None):
        return self.ourFile.readline(size)

    def readlines (self, size=None):
        return self.ourFile.readlines (size)

    def xreadlines (self):
        return self.ourFile.xreadlines()

    def seek (self, offset, whence=0):
        return self.ourFile.seek(offset, whence)

    def tell (self):
        return self.ourFile.tell()

    def truncate (self, size=None):
        return self.ourFile.truncate (size)

    def write (self, str):
        self.ourFile.write (str)
        self.ourmd5.update (str)

    def writelines (self, aseq):
        for value in aseq:
            self.ourmd5.update (value)
            self.ourFile.write (value)

    def __iter__ (self):
        return self.ourFile.__iter__()

class ContextFunctions:
    def __init__ (self, siteConfig):
        self.log = logging.getLogger ("PubTal.PagePublisher")
        self.currentTargetPath = None
        self.currentContext = None
        self.config = siteConfig
        self.contentDir = self.config.getContentDir()
        self.destinationDir = self.config.getDestinationDir()

        self.isPage = simpleTALES.PathFunctionVariable (self.isCurrentPage)
        self.readFile = simpleTALES.PathFunctionVariable (self.readExternalFile)

    def setCurrentPage (self, targetPath, context):
        self.currentTargetPath = targetPath.replace (os.sep, '/')
        self.currentContext = context

    def isCurrentPage (self, targetPath):
        if (self.currentTargetPath == targetPath.replace (os.sep, '/')):
            return 1
        return 0

    def readExternalFile (self, targetPath):
        # Start by evaluating the targetPath to resolve the filename
        targetFileName = self.currentContext.evaluate (targetPath)
        self.log.info ("Resolved path %s to filename %s" % (targetPath, str (targetFileName)))
        if
(targetFileName):
            # Read the file (relative to the content directory)
            try:
                targetFile = open (os.path.join (self.contentDir, targetFileName))
                targetData = targetFile.read()
                targetFile.close()
                return targetData
            except Exception, e:
                self.log.error ("Error reading file %s: %s" % (os.path.join (self.contentDir, targetFileName), str (e)))
                raise
        return None

class ContentPublisher:
    def __init__ (self, pagePublisher):
        self.pagePublisher = pagePublisher
        self.config = self.pagePublisher.config
        self.contentConfig = self.config.getContentConfig()
        self.templateConfig = self.config.getTemplateConfig()
        self.characterSet = self.config.getDefaultCharacterSet()
        self.characterSetCodec = codecs.lookup (self.characterSet)[1]
        self.destDir = self.config.getDestinationDir()
        self.contentDir = self.config.getContentDir()

    def readHeadersAndContent (self, page, preserveCharacterSet = 0):
        """ This method reads the source file for this page, and then returns the headers
            defined in this file and the raw content of the body of the file.

            If preserveCharacterSet is false then Unicode is returned.
        """
        sourceFile = open (page.getSource(), 'r')
        readingHeaders = 1
        headers = {}
        if (not preserveCharacterSet):
            pageCharSet = page.getOption ('character-set', None)
            if (pageCharSet is not None):
                # This page has its own character set
                pageCodec = codecs.lookup (pageCharSet)[1]
            else:
                # This page uses the default character set.
                pageCodec = self.characterSetCodec
        else:
            # We use a dummy function that doesn't alter the string if we are preserving the character set.
            pageCodec = lambda decodedString: (decodedString, 0)

        while (readingHeaders):
            line = pageCodec (sourceFile.readline())[0]
            offSet = line.find (':')
            if (offSet > 0):
                headers [line[0:offSet]] = line[offSet + 1:].strip()
            else:
                readingHeaders = 0

        rawContent = pageCodec (sourceFile.read())[0]
        sourceFile.close()
        return (headers, rawContent)

    def getPageContext (self, page, template):
        """ Returns the default context which will apply to most pages of content.
            Template is the template that this context will eventually be used in, and is used
            to extract the type of output (HTML, XHTML, WML, etc) and the destination file extension.
        """
        copyrightYear = timeformat.format ('%Y')
        destExtension = '.' + template.getTemplateExtension()
        relativeDestPath = os.path.splitext (page.getRelativePath())[0] + destExtension
        destPath = os.path.join (self.destDir, relativeDestPath)
        destFilename = os.path.basename (destPath)

        pageContext = {'lastModifiedDate': DateContext.Date (time.localtime (page.getModificationTime()), '%a[SHORT], %d %b[SHORT] %Y %H:%M:%S %Z')
                    ,'copyrightYear': DateContext.Date (time.localtime(), '%Y')
                    ,'sourcePath': page.getRelativePath()
                    ,'absoluteSourcePath': page.getSource()
                    ,'destinationPath': relativeDestPath
                    ,'absoluteDestinationPath': destPath
                    ,'destinationFilename': destFilename
                    ,'depth': page.getDepthString()
                    ,'headers': page.getHeaders()
                    }

        siteURLPrefix = page.getOption ('url-prefix')
        if (siteURLPrefix is not None):
            pageContext ['absoluteDestinationURL'] = '%s/%s' % (siteURLPrefix, relativeDestPath)

        return pageContext

class PublisherException (Exception):
    pass

PubTal-3.5/lib/pubtal/SiteUtils.py
""" Utility classes to help automate the PubTal testing.

    Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/)
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
    3. The name of the author may not be used to endorse or promote products
       derived from this software without specific prior written permission.
    THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
    IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
    OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
    IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
    NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
    THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    If you make any bug fixes or feature enhancements please let me know!
"""

try:
    import logging
except:
    import InfoLogging as logging

import os, os.path, copy, hashlib, getpass
import xml.sax, xml.sax.handler, StringIO

class BlockFilter:
    def filter (self, msg):
        return 0

class UserInteraction:
    """ This class defines the interface that should be provided to interact with
        the core PubTal library. This implementation is for command line clients.

        It isn't necessary to inherit from this class.
    """
    def prompt (self, msg):
        return raw_input ('%s: ' % msg)

    def promptPassword (self, msg):
        return getpass.getpass ('%s: ' % msg)

    def taskProgress (self, msg, percentageDone):
        print "(%s %%) %s" % (str (int (percentageDone)), msg)

    def taskError (self, msg):
        print "ERROR: %s" % msg

    def taskDone (self):
        print "Finished."
def warn (self, msg): print "Warning: %s" % msg def info (self, msg): print msg class SilentUI (UserInteraction): def prompt (self, msg): return "" def promptPassword (self, msg): return "" def taskProgress (self, msg, percentageDone): pass def taskError (self, msg): pass def taskDone (self): pass def warn (self, msg): pass def info (self, msg): pass class SiteBuilder: def __init__ (self, location=None): self.log = logging.getLogger ("PubTal.SiteCreation") if (location is None): self.siteDir = os.tempnam() else: self.siteDir = location if (os.access (self.siteDir, os.F_OK)): msg = "Directory %s already exists!" % self.siteDir self.log.error (msg) raise Exception (msg) def buildDirs (self, templateDir="template", destinationDir="dest", contentDir="content"): self.log.debug ("Building site directory %s" % self.siteDir) os.mkdir (self.siteDir) self.contentDir = os.path.join (self.siteDir, contentDir) self.log.debug ("Building content directory %s" % self.contentDir) os.mkdir (self.contentDir) self.destinationDir = os.path.join (self.siteDir, destinationDir) self.log.debug ("Building destination directory %s" % self.destinationDir) os.mkdir (self.destinationDir) self.templateDir = os.path.join (self.siteDir, templateDir) self.log.debug ("Building template directory %s" % self.templateDir) os.mkdir (self.templateDir) def createContent (self, filePath, content): self.log.debug ("Creating content file %s" % filePath) destPath = os.path.join (self.contentDir, filePath) self._createDirsAndFile_ (destPath, content) def createTemplate (self, filePath, template): self.log.debug ("Creating template file %s" % filePath) destPath = os.path.join (self.templateDir, filePath) self._createDirsAndFile_ (destPath, template) def createConfigFile (self, filePath, config): self.log.debug ("Creating configuration file %s" % filePath) destPath = os.path.join (self.siteDir, filePath) self._createDirsAndFile_ (destPath, config) def getSiteDir (self): return self.siteDir def getContentDir 
(self): return self.contentDir def getDestDir (self): return self.destinationDir def _createDirsAndFile_ (self, destPath, content): # Make directories if required. destDir = os.path.split (destPath)[0] if (not os.path.exists (destDir)): os.makedirs (destDir) dest = open (destPath, 'w') dest.write (content) dest.close() def destroySite (self): self.log.debug ("Destroying site directory and contents") pathCleaner = pathRemover () pathCleaner.walk (self.siteDir) class PageBuilder: """ A class for determining the pages to be generated.""" def __init__ (self, config, ui=SilentUI()): self.ui = ui self.config = config self.messageBus = self.config.getMessageBus() self.contentConfig = config.getContentConfig() self.currentContent = [] self.log = logging.getLogger ('PageBuilder') self.contentDir = config.getContentDir() self.destDir = config.getDestinationDir() self.ignoreFilters = config.getIgnoreFilters() def getPages (self, target, options={}): """ Returns a Page list target is either: None - Get all files List of files or dir paths. """ result = [] self.messageBus.notifyEvent ("PageBuilder.Start", options) if (target is None): self.log.info ("Building whole site.") self.ui.info ("Building whole site.") targetList = [self.contentDir] else: targetList = [] for t in target: tFile = os.path.normpath (os.path.abspath (t)) if (hasattr (os.path, "realpath")): # Under Unix and 2.2 we can remove symlinks. tFile = os.path.realpath (tFile) targetList.append (os.path.abspath (tFile)) self.log.debug ("Target path: %s" % str (targetList)) for targetPath in targetList: self.log.debug ("Checking target path: %s" % targetPath) # Are we doing just one file or a dir? 
if (os.path.isfile (targetPath)): # Just get this entry try: result.extend (self.contentConfig.getPages (targetPath, options)) except: self.ui.taskError ("Unable to build Page %s" % targetPath) self.messageBus.notifyEvent ("PageBuilder.Error") raise else: os.path.walk (targetPath, self.walkPaths, None) for content in self.currentContent: try: result.extend (self.contentConfig.getPages (content, options)) except: self.ui.taskError ("Unable to build Page %s" % content) self.messageBus.notifyEvent ("PageBuilder.Error") raise self.currentContent = [] self.messageBus.notifyEvent ("PageBuilder.End") return result def walkPaths (self, arg, dirname, names): for name in names: self.log.debug ("Checking path %s for content." % os.path.join (dirname, name)) realName = os.path.join (dirname, name) if (os.path.isfile (realName)): contentFile = 1 for filter in self.ignoreFilters: if (filter.match (realName)): contentFile = 0 if (contentFile): self.currentContent.append (realName) else: self.log.debug ("Ignoring path %s" % realName) class pathRemover: def __init__ (self): self.dirsToRemove = [] self.log = logging.getLogger ("PubTal.SiteCreation.pathRemover") def walk (self, path): self.dirsToRemove = [path] os.path.walk (path, self.walking, None) # Now remove all of the directories we saw, starting with the last one self.dirsToRemove.reverse() for dir in self.dirsToRemove: os.rmdir (dir) self.dirsToRemove = [] def walking (self, arg, dirname, names): for name in names: #self.log.debug ("Would delete file: %s" % os.path.join (dirname, name)) target = os.path.join (dirname, name) if (os.path.islink (target)): os.remove (target) elif (os.path.isfile (target)): os.remove (target) elif (os.path.isdir (target)): self.dirsToRemove.append (target) else: self.log.error ("Path %s is neither a directory or a file!" 
% target) class XMLChecksumHandler (xml.sax.handler.ContentHandler, xml.sax.handler.DTDHandler, xml.sax.handler.ErrorHandler): """ A class that parses an XML document and generates an MD5 checksum for the document. This allows two XML documents to be compared, ignoring differences in attribute ordering and other such differences. """ def __init__ (self, parser): xml.sax.handler.ContentHandler.__init__ (self) self.ourParser = parser def startDocument (self): self.digest = hashlib.md5() def startPrefixMapping (self, prefix, uri): self.digest.update (prefix) self.digest.update (uri) def endPrefixMapping (self, prefix): self.digest.update (prefix) def startElement (self, name, atts): self.digest.update (name) allAtts = atts.getNames() allAtts.sort() for att in allAtts: self.digest.update (att) self.digest.update (atts [att]) def endElement (self, name): self.digest.update (name) def characters (self, data): self.digest.update (data) def processingInstruction (self, target, data): self.digest.update (target) self.digest.update (data) def skippedEntity (self, name): self.digest.update (name) # DTD Handler def notationDecl(self, name, publicId, systemId): self.digest.update (name) self.digest.update (publicId) self.digest.update (systemId) def unparsedEntityDecl(self, name, publicId, systemId, ndata): self.digest.update (name) self.digest.update (publicId) self.digest.update (systemId) self.digest.update (ndata) def error (self, excpt): print "Error: %s" % str (excpt) def warning (self, excpt): print "Warning: %s" % str (excpt) def getDigest (self): return self.digest.hexdigest() class DirCompare: def __init__ (self): self.xmlParser = None def compare (self, path, expected, comparisonFunc = None): """ By default do a string comparison between all files in the given path, and all expected files. Use compare (path, expected, comparisonFunc = dirCompare.compareXML) to do an XML comparison.
""" self.expected = copy.copy (expected) self.path = path self.badFile = None if (comparisonFunc is None): comparisonFunc = self.compareStrings os.path.walk (path, self.walking, comparisonFunc) if (self.badFile is not None): return self.badFile if (len (self.expected) > 0): return "Missing files: " + str (self.expected.keys()) return None def compareStrings (self, target, relTarget): testFile = open (target, 'r') content = testFile.read() testFile.close() if (content != self.expected [relTarget]): self.badFile = "File %s had content:\n%s\nexpected:\n%s\n" % (relTarget, content, self.expected [relTarget]) return 0 return 1 def compareXML (self, target, relTarget): """ Compares XML documents, discounting ordering of attributes, etc. """ if (self.xmlParser is None): self.xmlParser = xml.sax.make_parser() self.xmlParser.setFeature (xml.sax.handler.feature_external_ges, 0) self.xmlParser.setFeature (xml.sax.handler.feature_namespaces, 1) self.checksumHandler = XMLChecksumHandler(self.xmlParser) self.xmlParser.setContentHandler (self.checksumHandler) self.xmlParser.setDTDHandler (self.checksumHandler) self.xmlParser.setErrorHandler (self.checksumHandler) # Get the XML checksum of the file we are testing. testFile = open (target, 'r') self.xmlParser.parse (testFile) realChecksum = self.checksumHandler.getDigest() testFile.close() # Get the XML checksum of the expected result.
testFile = StringIO.StringIO (self.expected [relTarget]) self.xmlParser.parse (testFile) expectedChecksum = self.checksumHandler.getDigest() testFile.close() if (realChecksum != expectedChecksum): testFile = open (target, 'r') content = testFile.read() testFile.close() self.badFile = "File %s had content:\n%s\nexpected:\n%s\n" % (relTarget, content, self.expected [relTarget]) return 0 return 1 def walking (self, arg, dirname, names): if (self.badFile is not None): return comparisonFunc = arg commonRoot = os.path.commonprefix ([self.path, dirname]) for name in names: target = os.path.join (dirname, name) relTarget = target[len (commonRoot)+1:] if (os.path.isfile (target)): if (not self.expected.has_key (relTarget)): self.badFile = "Found unexpected file %s" % relTarget return if (not comparisonFunc (target, relTarget)): return del self.expected [relTarget] PubTal-3.5/lib/pubtal/plugins/0000755000105000010500000000000011555341012015043 5ustar cms103cms103PubTal-3.5/lib/pubtal/plugins/openOfficeContent/0000755000105000010500000000000011555341012020453 5ustar cms103cms103PubTal-3.5/lib/pubtal/plugins/openOfficeContent/__init__.py0000644000105000010500000001034411555340742022577 0ustar cms103cms103""" OpenOffice to HTML Plugin for PubTal Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! """ try: import logging except: from pubtal import InfoLogging as logging from pubtal import SitePublisher from simpletal import simpleTAL, simpleTALES import OpenOfficeToHTMLConverter def getPluginInfo (): builtInContent = [{'functionality': 'content', 'content-type': 'OpenOffice' ,'file-type': 'sxw','class': OpenOfficePagePublisher}] return builtInContent class OpenOfficePagePublisher (SitePublisher.ContentPublisher): def __init__ (self, pagePublisher): SitePublisher.ContentPublisher.__init__ (self, pagePublisher) self.log = logging.getLogger ("PubTal.OpenOfficePagePublisher") self.converter = OpenOfficeToHTMLConverter.OpenOfficeConverter() # Get the default character set for the site. 
config = pagePublisher.getConfig() self.defaultCharset = config.getDefaultCharacterSet() self.encodingCapabilities = config.getEncodingCapabilities() def publish (self, page): template = self.templateConfig.getTemplate (page.getOption ('template', 'template.html')) context = simpleTALES.Context(allowPythonPath=1) # Get the page context for this content map = self.getPageContext (page, template) context.addGlobal ('page', map) macros = page.getMacros() # Determine the destination for this page relativeDestPath = map ['destinationPath'] self.pagePublisher.expandTemplate (template, context, relativeDestPath, macros) # Publish any bundled pictures. for fileName, data in self.converter.getPictures(): destFile = self.pagePublisher.openOuputFile (fileName) destFile.write (data) destFile.close() def getPageContext (self, page, template): pageMap = SitePublisher.ContentPublisher.getPageContext (self, page, template) # Determine the character set that will be used on output templateCharset = template.getOption ('character-set', self.defaultCharset) # Now determine what capabilities this character set offers smartQuotes = not self.encodingCapabilities.getCapability (templateCharset, 'SmartQuotes') hyphens = not self.encodingCapabilities.getCapability (templateCharset, 'Hyphen') # Parse the page options = {'CleanSmartQuotes': smartQuotes, 'CleanHyphens': hyphens} options ['DestinationFile'] = pageMap ['destinationPath'] options ['output-type'] = template.getOption ('output-type', 'HTML') options ['preserveSpaces'] = page.getBooleanOption ('preserve-html-spaces', 1) self.converter.convert (page.getSource(), options) headers = self.converter.getMetaInfo() content = self.converter.getContent() footNotes = self.converter.getFootNotes() actualHeaders = pageMap ['headers'] actualHeaders.update (headers) pageMap ['headers'] = actualHeaders pageMap ['content'] = content pageMap ['footnotes'] = footNotes return pageMap 
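The option-building step in getPageContext above can be sketched in isolation. This is a minimal illustration, not PubTal's code: the CAPABILITIES table and the build_convert_options and clean_text helpers are hypothetical names, and the table contents are assumed. Only the option keys (CleanSmartQuotes, CleanHyphens, DestinationFile, output-type) and the replaced characters come from the source.

```python
# Hypothetical capability table (an assumption, not PubTal's real data):
# which character sets can represent typographic quotes and en dashes.
CAPABILITIES = {
    "utf-8": {"SmartQuotes": True, "Hyphen": True},
    "us-ascii": {"SmartQuotes": False, "Hyphen": False},
}


def build_convert_options(charset, destination_path, output_type="HTML"):
    """Build converter options: request cleaning for anything the
    target character set cannot express."""
    caps = CAPABILITIES.get(charset.lower(), {})
    return {
        "CleanSmartQuotes": not caps.get("SmartQuotes", False),
        "CleanHyphens": not caps.get("Hyphen", False),
        "DestinationFile": destination_path,
        "output-type": output_type,
    }


def clean_text(data, options):
    """Replace smart quotes and en dashes with ASCII stand-ins when the
    options ask for it; otherwise return the text unchanged."""
    if options["CleanSmartQuotes"]:
        data = data.replace(u"\u201c", '"').replace(u"\u201d", '"')
    if options["CleanHyphens"]:
        data = data.replace(u"\u2013", "-")
    return data
```

An unknown character set falls back to cleaning everything, which is the conservative choice for this sketch; whether PubTal's own capability lookup behaves that way is not shown in the source.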
PubTal-3.5/lib/pubtal/plugins/openOfficeContent/OOFilter.py0000644000105000010500000002344511555340742022531 0ustar cms103cms103import copy, xml.sax try: import logging except: from pubtal import InfoLogging as logging # These are the tags that we explicitly handle. We also handle all field elements as well, but only by # ignoring them. #~ office:document-content #~ meta:keyword #~ style:style #~ style:properties #~ text:h #~ text:p #~ text:ordered-list #~ text:unordered-list #~ text:list-item #~ text:span #~ text:a #~ text:footnote #~ text:endnote #~ text:footnote-body #~ text:endnote-body #~ text:bookmark-start #~ text:bookmark #~ text:line-break #~ draw:image #~ draw:a #~ svg:desc #~ table:table #~table:sub-table #~ table:table-header-rows #~ table:table-row #~ table:table-cell # ALL Dublin core elements. # The TAG_MAP lists all tags we can handle # meta.xml has document-meta as root element. TAG_MAP = {'office:document-meta': ['office:meta'] ,'office:meta': ['meta:keywords', 'meta:creation-date', 'dc:title', 'dc:description', 'dc:subject', 'dc:creator', 'dc:date', 'dc:language'] ,'meta:keywords': ['meta:keyword'] ,'meta:keyword': [] ,'meta:creation-date': [] ,'dc:title': [] ,'dc:description': [] ,'dc:subject': [] ,'dc:creator': [] ,'dc:date': [] ,'dc:language': [] # styles.xml has document-styles ,'office:document-styles': ['office:styles'] ,'office:styles': ['style:style'] # style:style is used in both style.xml and content.xml, and only contains style:properties. ,'style:style': ['style:properties'] ,'style:properties': [] # The content.xml starts with office:document-content. We only care about styles and the body. 
,'office:document-content': ['office:automatic-styles', 'office:body'] ,'office:automatic-styles': ['style:style'] ,'office:body': ['text:h', 'text:p', 'text:ordered-list', 'text:unordered-list' ,'table:table', 'draw:a', 'text:section']} # This list is taken from the OO DTD (text.mod) from the %fields ENTITY # The following elements have been removed, because our parser does not have # any code to handle them: # 'office:annotation', FIELD_ELEMENTS = ['text:date','text:time','text:page-number','text:page-continuation','text:sender-firstname','text:sender-lastname','text:sender-initials' ,'text:sender-title','text:sender-position','text:sender-email','text:sender-phone-private','text:sender-fax' ,'text:sender-company','text:sender-phone-work','text:sender-street','text:sender-city','text:sender-postal-code' ,'text:sender-country','text:sender-state-or-province','text:author-name','text:author-initials','text:placeholder','text:variable-set' ,'text:variable-get','text:variable-input','text:user-field-get','text:user-field-input','text:sequence','text:expression' ,'text:text-input','text:database-display','text:database-next','text:database-select','text:database-row-number','text:database-name' ,'text:initial-creator','text:creation-date','text:creation-time','text:description','text:user-defined','text:print-time','text:print-date' ,'text:printed-by','text:title','text:subject','text:keywords','text:editing-cycles','text:editing-duration','text:modification-time' ,'text:modification-date','text:creator','text:conditional-text','text:hidden-text','text:hidden-paragraph','text:chapter','text:file-name' ,'text:template-name','text:page-variable-set','text:page-variable-get','text:execute-macro','text:dde-connection','text:reference-ref' ,'text:sequence-ref','text:bookmark-ref','text:footnote-ref','text:endnote-ref','text:sheet-name','text:bibliography-mark','text:page-count' 
,'text:paragraph-count','text:word-count','text:character-count','text:table-count','text:image-count','text:object-count' ,'text:script','text:measure'] # FIELD_ELEMENTS need to be all empty for us to handle them # These are the ones we can handle, despite not doing so explicitly for elmn in FIELD_ELEMENTS: TAG_MAP [elmn] = [] INLINE_ELEMENTS = copy.copy (FIELD_ELEMENTS) # This is based on the definition in the text.mod DTD. # I've NOT listed those elements that are harmless but unimplemented (e.g. tab-stop) # Excluded: text:tab-stop, text:bookmark-stop, text:reference-mark, text:reference-mark-start, text:reference-mark-end # %shape, text:toc-mark-start, text:toc-mark-end, text:toc-mark, text:user-index-mark-start, text:user-index-mark-end # text:user-index-mark, text:alphabetical-index-mark-start, text:alphabetical-index-mark-end, text:alphabetical-index-mark # %change-marks;, text:ruby # # We do list draw:text-box as implemented, otherwise we can not handle images with captions. INLINE_ELEMENTS.extend (['text:span', 'text:line-break', 'text:footnote', 'text:endnote' , 'text:a', 'text:s', 'text:bookmark', 'text:bookmark-start', 'draw:a' , 'draw:image']) # Now we need to add these extra elements to the TAG_MAP, otherwise we'll filter them out!
TAG_MAP ['text:span'] = INLINE_ELEMENTS TAG_MAP ['text:line-break'] = [] TAG_MAP ['text:footnote'] = ['text:footnote-body'] TAG_MAP ['text:footnote-body'] = ['text:h', 'text:p', 'text:ordered-list', 'text:unordered-list'] TAG_MAP ['text:endnote'] = ['text:endnote-body'] TAG_MAP ['text:endnote-body'] = ['text:h', 'text:p', 'text:ordered-list', 'text:unordered-list'] TAG_MAP ['text:a'] = INLINE_ELEMENTS TAG_MAP ['text:s'] = [] TAG_MAP ['text:bookmark'] = [] TAG_MAP ['text:bookmark-start'] = [] TAG_MAP ['draw:a'] = ['draw:image'] TAG_MAP ['draw:image'] = ['svg:desc'] TAG_MAP ['svg:desc'] = [] # Used by %textSections TEXT_SECTIONS_ELEMENTS = ['text:p', 'text:h', 'text:ordered-list', 'text:unordered-list' ,'table:table', 'text:section'] # We have the following elements left over that need to be defined in the TAG_MAP: # 'text:h', 'text:p', 'text:ordered-list', 'text:unordered-list', 'table:table', 'text:section' TAG_MAP ['text:h'] = INLINE_ELEMENTS TAG_MAP ['text:p'] = INLINE_ELEMENTS TAG_MAP ['text:unordered-list'] = ['text:list-item'] TAG_MAP ['text:ordered-list'] = ['text:list-item'] TAG_MAP ['text:list-item'] = ['text:p', 'text:h', 'text:ordered-list', 'text:unordered-list'] TAG_MAP ['table:table'] = ['table:table-header-rows', 'table:table-row', 'table:table-cell'] TAG_MAP ['table:table-header-rows'] = ['table:table-row'] TAG_MAP ['table:table-row'] = ['table:table-cell'] TAG_MAP ['table:table-cell'] = ['table:sub-table', 'text:h', 'text:p', 'text:ordered-list' ,'text:unordered-list'] TAG_MAP ['table:sub-table'] = ['table:table-header-rows', 'table:table-row', 'table:table-cell'] TAG_MAP ['text:section'] = TEXT_SECTIONS_ELEMENTS URLMAP = {'http://openoffice.org/2000/office': 'office' ,'http://openoffice.org/2000/text': 'text' ,'http://openoffice.org/2000/style': 'style' ,'http://openoffice.org/2000/table': 'table' ,'http://www.w3.org/1999/XSL/Format': 'fo' ,'http://purl.org/dc/elements/1.1/': 'dc' ,'http://openoffice.org/2000/meta': 'meta' 
,'http://www.w3.org/1999/xlink': 'xlink' ,'http://www.w3.org/2000/svg': 'svg' ,'http://openoffice.org/2000/drawing': 'draw'} def validateTagMap(): errorMap = {} for element in TAG_MAP.keys(): for child in TAG_MAP [element]: if (not TAG_MAP.has_key (child)): errorMap [child] = 1 return errorMap.keys() class SAXFilter(xml.sax.handler.ContentHandler): """ The purpose of this class is to filter out calls that we don't handle. It also dispatches to other SAX handlers based on the namespaces that they register with. """ def __init__ (self): xml.sax.handler.ContentHandler.__init__ (self) self.log = logging.getLogger ("PubTal.OOC.SAXFilter") self.debugOn = self.log.isEnabledFor (logging.DEBUG) self.documentHandlers = {} self.handlerStack = [] self.allowedElementsStack = [] self.skipDepth = 0 def setHandler (self, namespace, handler): self.documentHandlers [namespace] = handler def startElementNS (self, name, qname, atts): # Are we skipping elements? if (self.skipDepth != 0): # Skipping, so just increment the depth self.skipDepth += 1 if (self.debugOn): self.log.debug ("Skipping element %s - depth now %s" % ('%s:%s' % (URLMAP.get (name[0],''), name[1]), str (self.skipDepth))) return # Determine whether this tag is allowed or not. 
elementName = '%s:%s' % (URLMAP.get (name[0],''), name[1]) if (not self.__checkAllowed__ (elementName)): self.skipDepth += 1 return # This element is allowed, so find a handler and pass it through handler = self.documentHandlers.get (name[0], None) self.handlerStack.append (handler) if (handler is not None): handler.startElementNS (name, qname, atts) def endElementNS (self, name, qname): if (self.skipDepth != 0): self.skipDepth -= 1 if (self.debugOn): self.log.debug ("Skipping END element %s - depth now %s" % ('%s:%s' % (URLMAP.get (name[0],''), name[1]), str (self.skipDepth))) return handler = self.handlerStack.pop() self.allowedElementsStack.pop() if (self.debugOn): self.log.debug ("Allowed END element %s" % '%s:%s' % (URLMAP.get (name[0],''), name[1])) if (handler is not None): handler.endElementNS (name, qname) def characters (self, data): if (self.skipDepth != 0): return handler = self.handlerStack [-1] if (handler is not None): handler.characters (data) def __checkAllowed__ (self, tagName): if (len (self.allowedElementsStack) == 0): # We are allowed, so let's record what we expect next. self.log.debug ("Root element passed, adding allowed elements to stack.") self.allowedElementsStack.append (TAG_MAP.get (tagName, [])) # We re-check for debug status when we see a root element, that way if logging # config is changed between runs we will pick it up. self.debugOn = self.log.isEnabledFor (logging.DEBUG) return 1 if tagName in self.allowedElementsStack[-1]: # We are allowed, so let's record what we expect next. if (self.debugOn): self.log.debug ("Found element %s, allowing" % tagName) self.allowedElementsStack.append (TAG_MAP.get (tagName, [])) return 1 #self.log.debug ("Element %s blocked." 
% tagName) return 0 PubTal-3.5/lib/pubtal/plugins/openOfficeContent/OpenOfficeToHTMLConverter.py0000644000105000010500000006647411555340742025754 0ustar cms103cms103""" OpenOffice to HTML Converter for PubTal Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! 
""" import xml.sax, zipfile, StringIO, cgi, re, os.path try: import logging except: from pubtal import InfoLogging as logging import OOFilter from pubtal import HTMLWriter OFFICE_URI='http://openoffice.org/2000/office' TEXT_URI='http://openoffice.org/2000/text' STYLE_URI='http://openoffice.org/2000/style' TABLE_URI='http://openoffice.org/2000/table' FORMAT_URI='http://www.w3.org/1999/XSL/Format' DUBLIN_URI='http://purl.org/dc/elements/1.1/' META_URI='http://openoffice.org/2000/meta' XLINK_URI='http://www.w3.org/1999/xlink' SVG_URI='http://www.w3.org/2000/svg' DRAW_URI='http://openoffice.org/2000/drawing' # These are the fo styles that will be treated as CSS styles. SUPPORTED_FO_STYLES = {'text-align':1, 'font-weight':1, 'font-style':1, 'margin-left':1} # These lists act as filters on which styles are applied to which kind of elements. HEADING_STYLE_FILTER = ['text-align', 'margin-left'] PARAGRAPH_STYLE_FILTER = ['text-align', 'underline', 'line-through', 'overline' ,'font-weight', 'font-style', 'vertical-align', 'margin-left'] SPAN_STYLE_FILTER = PARAGRAPH_STYLE_FILTER # These are the assumed defaults for paragraphs - OO setting these will be ignored. 
DEFAULT_PARAGRAPH_STYLES = { 'text-align': 'start', 'font-weight': 'normal' ,'font-style': 'normal', 'margin-left': '0cm'} class OpenOfficeConverter: """ Convert OpenOffice format to HTML, XHTML or PlainText """ def __init__ (self): self.log = logging.getLogger ("PubTal.OOC") self.contentParser = SXWContentPraser () def convert (self, fileName, config={}): archive = zipfile.ZipFile (fileName, 'r') self.contentParser.parseContent (archive, config) archive.close() def getMetaInfo (self): return self.contentParser.getMetaInfo() def getContent (self): return self.contentParser.getContent() def getFootNotes (self): return self.contentParser.getFootNotes() def getPictures (self): return self.contentParser.getPictures() class SXWContentPraser (xml.sax.handler.DTDHandler): """ Convert OpenOffice format to HTML, XHTML or PlainText """ def __init__ (self): self.log = logging.getLogger ("PubTal.OOC.SWXContentParser") self.saxFilter = OOFilter.SAXFilter () def parseContent (self, archive, config): self.officeHandler = OfficeHandler(config) self.styleHandler = StyleHandler(config) self.textHandler = TextHandler (self.styleHandler, config) self.tableHandler = TableHandler (self.styleHandler, self.textHandler.result, config) self.drawHandler = DrawHandler (self.styleHandler, self.textHandler, config) self.saxFilter.setHandler (OFFICE_URI, self.officeHandler) self.saxFilter.setHandler (DUBLIN_URI, self.officeHandler) self.saxFilter.setHandler (META_URI, self.officeHandler) self.saxFilter.setHandler (STYLE_URI, self.styleHandler) self.saxFilter.setHandler (TEXT_URI, self.textHandler) self.saxFilter.setHandler (TABLE_URI, self.tableHandler) self.saxFilter.setHandler (DRAW_URI, self.drawHandler) self.saxFilter.setHandler (SVG_URI, self.drawHandler) self.ourParser = xml.sax.make_parser() self.log.debug ("Setting features of parser") self.ourParser.setFeature (xml.sax.handler.feature_external_ges, 0) self.ourParser.setFeature (xml.sax.handler.feature_namespaces, 1) 
self.ourParser.setContentHandler (self.saxFilter) # Initialise our variables self.pictureList = [] self.log.debug ("Parsing meta data.") sxwContent = archive.read ('meta.xml') contentFile = StringIO.StringIO (sxwContent) self.ourParser.parse (contentFile) self.log.debug ("Parsing styles.") sxwContent = archive.read ('styles.xml') contentFile = StringIO.StringIO (sxwContent) self.ourParser.parse (contentFile) self.log.debug ("Parsing actual content.") sxwContent = archive.read ('content.xml') contentFile = StringIO.StringIO (sxwContent) self.ourParser.parse (contentFile) # Read pictures for pictureFilename, newFilename in self.drawHandler.getBundledPictures(): self.pictureList.append ((newFilename, archive.read (pictureFilename))) def getMetaInfo (self): return self.officeHandler.getMetaInfo() def getContent (self): return self.textHandler.getContent() def getFootNotes (self): return self.textHandler.getFootNotes() def getPictures (self): return self.pictureList class OfficeHandler: def __init__ (self, config): self.log = logging.getLogger ("PubTal.OOC.OfficeHandler") self.metaData = {} self.keywords = [] self.charData = [] self.cleanSmartQuotes = config.get ('CleanSmartQuotes', 0) self.cleanHyphens = config.get ('CleanHyphens', 0) def startElementNS (self, name, qname, atts): self.charData = [] if (name[1] == 'document-content'): try: version = atts [(OFFICE_URI,'version')] self.log.debug ("Open Office format %s found." % version) if (float (version) != 1.0): self.log.warn ("Only OpenOffice format 1.0 is supported, version %s detected." % version) except Exception, e: msg = "Error determining OO version. 
Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) def endElementNS (self, name, qname): data = u"".join (self.charData) self.charData = [] if (name[0] == META_URI): if (name [1] == 'keyword'): self.keywords.append (data) elif (name [1] == 'creation-date'): self.metaData [name [1]] = data if (name[0] == DUBLIN_URI): self.metaData [name [1]] = data def characters (self, data): if (self.cleanSmartQuotes): data = data.replace (u'\u201c', '"') data = data.replace (u'\u201d', '"') if (self.cleanHyphens): data = data.replace (u'\u2013', '-') self.charData.append (data) def getMetaInfo (self): self.metaData ['keywords'] = self.keywords return self.metaData class StyleHandler: def __init__ (self, config): self.log = logging.getLogger ("PubTal.OOC.StyleHandler") self.textStyleMap = {} self.paragraphStyleMap = {} self.currentStyleFamily = None self.currentStyle = None def startElementNS (self, name, qname, atts): realName = name [1] if (realName == 'style'): try: self.currentStyle = {} self.currentStyle ['name'] = atts [(STYLE_URI, 'name')] self.currentStyleFamily = atts [(STYLE_URI, 'family')] self.currentStyle ['parent-name'] = atts.get ((STYLE_URI, 'parent-style-name'), None) except Exception, e: msg = "Error parsing style information. 
Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) if (realName == 'properties' and self.currentStyle is not None): for uri, attName in atts.keys(): if (uri == FORMAT_URI): if SUPPORTED_FO_STYLES.has_key (attName): attValue = atts [(FORMAT_URI, attName)] self.currentStyle [attName] = attValue if (uri == STYLE_URI): attValue = atts [(STYLE_URI, attName)] if (attValue != 'none'): if (attName == 'text-underline'): self.currentStyle ['underline'] = 'underline' if (attName == 'text-crossing-out'): self.currentStyle ['line-through'] = 'line-through' if (attName == 'text-position'): actualPosition = attValue [0:attValue.find (' ')] self.currentStyle ['vertical-align'] = actualPosition def endElementNS (self, name, qname): if (name[1] == 'style'): if (self.currentStyle is not None): name = self.currentStyle ['name'] if (self.currentStyleFamily == "paragraph"): self.log.debug ("Recording paragraph style %s" % name) self.paragraphStyleMap [name] = self.currentStyle elif (self.currentStyleFamily == "text"): self.log.debug ("Recording text style %s" % name) self.textStyleMap [name] = self.currentStyle else: self.log.debug ("Unsupported style family %s" % self.currentStyleFamily) self.currentStyle = None self.currentStyleFamily = None def characters (self, data): pass def getTextStyle (self, name): return self.styleLookup (name, self.textStyleMap) def getParagraphStyle (self, name): return self.styleLookup (name, self.paragraphStyleMap) def styleLookup (self, name, map): foundStyle = {} styleHierachy = [] lookupName = name while (lookupName is not None): lookupStyle = map.get (lookupName, None) if (lookupStyle is not None): styleHierachy.append (lookupStyle) lookupName = lookupStyle ['parent-name'] else: self.log.debug ("Style %s not found!"
% lookupName) lookupName = None styleHierachy.reverse() for style in styleHierachy: foundStyle.update (style) return foundStyle class TextHandler: def __init__ (self, styleHandler, config): self.log = logging.getLogger ("PubTal.OOC.TextHandler") self.styleHandler = styleHandler # Check for the kind of output we are generating outputType = config.get ('output-type', 'HTML') self.outputPlainText = 0 if (outputType == 'HTML'): self.outputXHTML = 0 elif (outputType == 'XHTML'): self.outputXHTML = 1 elif (outputType == 'PlainText'): # Plain text trumps outputXHTML self.outputPlainText = 1 else: msg = "Attempt to configure for unsupported output-type %s." % outputType self.log.error (msg) raise OpenOfficeFormatException (msg) if (self.outputPlainText): # We do not preserve spaces with &nbsp; because our output is not space clean. self.result = HTMLWriter.PlainTextWriter(outputStream=StringIO.StringIO(), outputXHTML=1, preserveSpaces = 0) else: self.result = HTMLWriter.HTMLWriter(outputStream=StringIO.StringIO(), outputXHTML=self.outputXHTML, preserveSpaces = 0) # We use this stack to re-direct output into footnotes. self.resultStack = [] # We treat footnotes and endnotes the same. self.footNoteID = None self.footnotes = [] self.charData = [] # The closeTagsStack holds one entry per open OO text tag. # Those that have corresponding HTML tags have text, everything else has None self.closeTagsStack = [] # The effectiveStyleStack holds the effective style (e.g. paragraph) and is used to filter out # un-needed style changes.
self.effectiveStyleStack = [DEFAULT_PARAGRAPH_STYLES] self.cleanSmartQuotes = config.get ('CleanSmartQuotes', 0) self.cleanHyphens = config.get ('CleanHyphens', 0) self.preserveSpaces = config.get ('preserveSpaces', 1) def startElementNS (self, name, qname, atts): #self.log.debug ("Start: %s" % name[1]) realName = name [1] styleName = atts.get ((TEXT_URI, 'style-name'), None) if (realName == 'h'): self.charData = [] # We have a heading - get the level and style. try: headingLevel = int (atts [(TEXT_URI, 'level')]) applicableStyle = self.styleHandler.getParagraphStyle (styleName) if (headingLevel > 6): self.log.warn ("Heading level of %s used, but HTML only supports up to level 6." % str (headingLevel)) headingLevel = 6 self.result.startElement ('h%s' % str (headingLevel), self.getCSSStyle (applicableStyle, HEADING_STYLE_FILTER)) self.closeTagsStack.append ('h%s' % str (headingLevel)) except Exception, e: msg = "Error parsing heading. Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) elif (realName == 'p'): # We have a paragraph self.charData = [] applicableStyle = self.styleHandler.getParagraphStyle (styleName) if (styleName == "Preformatted Text"): # We have PRE text self.result.startElement ('pre', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append ('pre') elif (styleName == "Quotations"): # We have a block quote. self.result.startElement ('blockquote') self.result.startElement ('p', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append (['p', 'blockquote']) else: self.result.startElement ('p', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append ('p') # Footnotes can start with either paragraphs or lists.
if (self.footNoteID is not None): self.result.startElement ('a', ' name="%s" style="vertical-align: super" href="#src%s"'% (self.footNoteID, self.footNoteID)) self.result.write (str (len (self.footnotes) + 1)) self.result.endElement ('a') self.footNoteID = None elif (realName == 'ordered-list'): self.charData = [] applicableStyle = self.styleHandler.getParagraphStyle (styleName) self.result.startElement ('ol', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append ('ol') # Footnotes can start with either paragraphs or lists. if (self.footNoteID is not None): self.result.startElement ('a', ' name="%s" style="vertical-align: super" href="#src%s"'% (self.footNoteID, self.footNoteID)) self.result.write (str (len (self.footnotes) + 1)) self.result.endElement ('a') self.footNoteID = None elif (realName == 'unordered-list'): self.charData = [] applicableStyle = self.styleHandler.getParagraphStyle (styleName) self.result.startElement ('ul', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append ('ul') # Footnotes can start with either paragraphs or lists. if (self.footNoteID is not None): self.result.startElement ('a', ' name="%s" style="vertical-align: super" href="#src%s"'% (self.footNoteID, self.footNoteID)) self.result.write (str (len (self.footnotes) + 1)) self.result.endElement ('a') self.footNoteID = None elif (realName == 'list-item'): applicableStyle = self.styleHandler.getTextStyle (styleName) self.result.startElement ('li', self.getCSSStyle (applicableStyle, SPAN_STYLE_FILTER)) self.closeTagsStack.append ('li') elif (realName == 'span'): # We have some text formatting - write out any data already accumulated. 
self.writeData() applicableStyle = self.styleHandler.getTextStyle (styleName) if (styleName == "Source Text"): # We have PRE text self.result.startElement ('code', self.getCSSStyle (applicableStyle, SPAN_STYLE_FILTER)) self.closeTagsStack.append ('code') else: cssStyle = self.getCSSStyle (applicableStyle, SPAN_STYLE_FILTER) if (len (cssStyle) > 0): self.result.startElement ('span', cssStyle) self.closeTagsStack.append ('span') else: #self.log.debug ("Suppressing span - no change in style.") self.closeTagsStack.append (None) elif (realName == 'a'): self.writeData() linkDest = atts.get ((XLINK_URI, 'href'), None) if (linkDest is not None): self.result.startElement ('a', ' href="%s"' % linkDest) self.closeTagsStack.append ('a') else: self.closeTagsStack.append (None) # Links are underlined - we want this done by the style sheet, so ignore the underline. newEffectiveStyle = {} newEffectiveStyle.update (self.effectiveStyleStack[-1]) newEffectiveStyle ['underline'] = 'underline' self.effectiveStyleStack.append (newEffectiveStyle) elif (realName == 'footnote' or realName == 'endnote'): try: footnoteID = atts[(TEXT_URI, 'id')] except Exception, e: msg = "Error getting footnoteid. Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) # Write out any data we have currently stored. 
            self.writeData()
            # Now write out the link to the footnote
            self.result.startElement ('a', ' name="src%s" style="vertical-align: super" href="#%s"' % (footnoteID, footnoteID))
            self.result.write (str (len (self.footnotes) + 1))
            self.result.endElement ('a')

            self.resultStack.append (self.result)
            if (self.outputPlainText):
                self.result = HTMLWriter.PlainTextWriter (outputStream = StringIO.StringIO(), outputXHTML=1, preserveSpaces = 0)
            else:
                self.result = HTMLWriter.HTMLWriter(outputStream = StringIO.StringIO(), outputXHTML=self.outputXHTML, preserveSpaces = 0)
            self.closeTagsStack.append (None)
            # Reset the style stack for the footnote
            self.effectiveStyleStack.append (DEFAULT_PARAGRAPH_STYLES)
            # Keep this footnote id around for the first paragraph.
            self.footNoteID = footnoteID
        elif (realName == 'footnote-body' or realName == 'endnote-body'):
            self.closeTagsStack.append (None)
            # Keep the effective style as-is
            self.effectiveStyleStack.append (self.effectiveStyleStack[-1])
        elif (realName == 'bookmark-start' or realName == 'bookmark'):
            try:
                bookmarkName = atts[(TEXT_URI, 'name')]
            except Exception, e:
                msg = "Error getting bookmark name. 
Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) self.writeData() self.result.startElement ('a', ' name="%s"' % bookmarkName) self.closeTagsStack.append ('a') # Keep the effective style as-is self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) elif (realName == 'line-break'): self.writeData() self.result.lineBreak() self.closeTagsStack.append (None) # Keep the effective style as-is self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) elif (realName == 's'): # An extra space or two # Remove the leading space if possible so that we can output '  ' instead of '  ' removedSpace = 0 if (len (self.charData) > 0): if (self.charData [-1][-1] == u" "): self.charData [-1] = self.charData [-1][:-1] removedSpace = 1 self.writeData() count = int (atts.get ((TEXT_URI, 'c'), 1)) if (self.preserveSpaces): for spaces in xrange (count): self.result.nonbreakingSpace() if (removedSpace): # Add it back now self.charData.append (u" ") # Keep the effective style as-is, and ignore the close element self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) self.closeTagsStack.append (None) else: # We have no HTML output associated with this OO tag. self.closeTagsStack.append (None) # Keep the effective style as-is self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) def endElementNS (self, name, qname): if (len (self.closeTagsStack) > 0): htmlTag = self.closeTagsStack.pop() if (htmlTag is not None): self.writeData() if (type (htmlTag) == type ([])): for a in htmlTag: self.result.endElement (a) else: self.result.endElement (htmlTag) # Remove this effective style. self.effectiveStyleStack.pop() if (name[1] == 'footnote' or name[1] == 'endnote'): # We have just closed a footnote or endnote - record the result, pop the stack. 
outputFile = self.result.getOutput() self.footnotes.append (outputFile.getvalue()) outputFile.close() self.result = self.resultStack.pop() def characters (self, data): if (self.cleanSmartQuotes): data = data.replace (u'\u201c', '"') data = data.replace (u'\u201d', '"') if (self.cleanHyphens): data = data.replace (u'\u2013', '-') self.charData.append (data) def writeData (self): data = u"".join (self.charData) self.result.write (cgi.escape (data)) self.charData = [] def getCSSStyle (self, applicableStyle, styleList): #self.log.debug ("Filtering styles %s for styles %s" % (str (applicableStyle), str (styleList))) textDecoration = [] cssStyles = [] # Take a look at the effective styles. effectiveStyles = self.effectiveStyleStack [-1] # Store the new effective style for future comparison newEffectiveStyle = {} newEffectiveStyle.update (effectiveStyles) for style in styleList: if (applicableStyle.has_key (style)): if (style in ["underline", "line-through", "overline"]): if (not effectiveStyles.has_key (style)): textDecoration.append (style) else: # We check to see whether the effective style already has this value # I.e. handle paragraph of font-style=normal and span of font-style=normal styleValue = applicableStyle [style] if (effectiveStyles.has_key (style)): if (effectiveStyles[style] != styleValue): cssStyles.append (u"%s:%s" % (style, styleValue)) else: #self.log.debug ("Style %s already in effect with value %s" % (style, styleValue)) pass else: cssStyles.append (u"%s:%s" % (style, styleValue)) # Note this new effective style newEffectiveStyle [style] = styleValue if (len (textDecoration) > 0): cssStyles.append (u"text-decoration: %s" % u",".join (textDecoration)) #self.log.debug ("Adding real effective style (%s) to stack." 
% str (newEffectiveStyle)) self.effectiveStyleStack.append (newEffectiveStyle) cssStyleList = ";".join (cssStyles) if (len (cssStyleList) > 0): return ' style="%s"' % cssStyleList return '' def getContent (self): return self.result.getOutput().getvalue() def getFootNotes (self): return self.footnotes class DrawHandler: def __init__ (self, styleHandler, textHandler, config): self.log = logging.getLogger ("PubTal.OOC.DrawHandler") self.styleHandler = styleHandler self.result = textHandler.result self.textHandler = textHandler self.charData = [] # The effectiveStyleStack holds the effective style (e.g. paragraph) and is used to filter out # un-needed style changes. self.effectiveStyleStack = [DEFAULT_PARAGRAPH_STYLES] self.closeTagsStack = [] self.bundledPictureList = [] self.currentImage = None # Check for the kind of output we are generating self.cleanSmartQuotes = config.get ('CleanSmartQuotes', 0) self.cleanHyphens = config.get ('CleanHyphens', 0) self.picturePrefix = os.path.join ('Pictures', config.get ('DestinationFile', '').replace ('.', '_')) self.log.debug ("Determined picture prefix as %s" % self.picturePrefix) def getBundledPictures (self): return self.bundledPictureList def startElementNS (self, name, qname, atts): theURI = name [0] realName = name [1] if (theURI == DRAW_URI): if (realName == 'image'): styleName = atts.get ((DRAW_URI, 'style-name'), None) href = atts.get ((XLINK_URI, 'href'), None) if (href is None): self.log.warn ("No href attribute found for image!") self.closeTagsStack = None return # Deal with bundled pictures if (href.startswith ('#Pictures/')): self.log.debug ("Found bundled picture %s" % href) archivePicName = href [1:] href = self.picturePrefix + archivePicName[9:] self.bundledPictureList.append ((archivePicName, href)) alt = atts.get ((DRAW_URI, 'name'), None) self.currentImage = {'href': href, 'alt': alt} self.closeTagsStack.append (None) elif (realName == 'a'): linkDest = atts.get ((XLINK_URI, 'href'), None) if (linkDest is not 
None): self.textHandler.writeData() self.result.startElement ('a', ' href="%s"' % linkDest) self.closeTagsStack.append ('a') else: self.closeTagsStack.append (None) elif (theURI == SVG_URI): if (realName == 'desc'): self.charData = [] self.closeTagsStack.append (None) else: self.closeTagsStack.append (None) def endElementNS (self, name, qname): if (len (self.closeTagsStack) > 0): htmlTag = self.closeTagsStack.pop() if (htmlTag is not None): self.result.endElement (htmlTag) # Remove this effective style. #self.effectiveStyleStack.pop() theURI = name [0] realName = name [1] if (theURI == SVG_URI): if (realName == 'desc'): # We have an image description - note it! altText = cgi.escape (u"".join (self.charData)) self.charData = [] if (self.currentImage is not None): self.currentImage ['alt'] = altText elif (theURI == DRAW_URI): if (realName == 'image'): self.textHandler.writeData() self.result.startElement ('img', ' src="%s" alt="%s"' % (self.currentImage ['href'], self.currentImage ['alt'])) self.result.endElement ('img') self.currentImage = None def characters (self, data): if (self.cleanSmartQuotes): data = data.replace (u'\u201c', '"') data = data.replace (u'\u201d', '"') if (self.cleanHyphens): data = data.replace (u'\u2013', '-') self.charData.append (data) class TableHandler: def __init__ (self, styleHandler, resultWriter, config): self.log = logging.getLogger ("PubTal.OOC.TextHandler") self.styleHandler = styleHandler self.result = resultWriter self.closeTagsStack = [] self.tableStatusStack = [] def startElementNS (self, name, qname, atts): #self.log.debug ("Start: %s" % name[1]) realName = name [1] styleName = atts.get ((TABLE_URI, 'style-name'), None) if (realName == 'table' or realName == 'sub-table'): self.result.startElement ('table') self.closeTagsStack.append ('table') self.tableStatusStack.append ({'inHeader':0, 'firstRow': 1}) elif (realName == 'table-header-rows'): status = self.tableStatusStack [-1] status ['inHeader'] = 1 self.result.startElement 
('thead') self.closeTagsStack.append ('thead') elif (realName == 'table-row'): status = self.tableStatusStack [-1] if ((not status ['inHeader']) and (status ['firstRow'])): status ['firstRow'] = 0 self.result.startElement ('tbody') self.result.startElement ('tr') self.closeTagsStack.append ('tr') elif (realName == 'table-cell'): status = self.tableStatusStack [-1] colSpan = int (atts.get ((TABLE_URI, 'number-columns-spanned'), 0)) if (colSpan != 0): colSpanTxt = ' colspan="%s"' % str (colSpan) else: colSpanTxt = '' if (status ['inHeader']): self.result.startElement ('th', colSpanTxt) self.closeTagsStack.append ('th') else: self.result.startElement ('td', colSpanTxt) self.closeTagsStack.append ('td') else: self.closeTagsStack.append (None) def endElementNS (self, name, qname): realName = name [1] # We check for table because we want to insert tbody close before table close. if (len (self.tableStatusStack) > 0): status = self.tableStatusStack [-1] if (realName == 'table' or realName == 'sub-table'): if (not status ['firstRow']): # The table actually had content. self.result.endElement ('tbody') if (len (self.closeTagsStack) > 0): htmlTag = self.closeTagsStack.pop() if (htmlTag is not None): self.result.endElement (htmlTag) # We check for table header rows here. if (realName == 'table-header-rows'): status ['inHeader'] = 0 if (realName == 'table'): # Pop this table status off the stack self.tableStatusStack.pop() def characters (self, data): pass class OpenOfficeFormatException (Exception): pass PubTal-3.5/lib/pubtal/plugins/weblog/0000755000105000010500000000000011555341012016322 5ustar cms103cms103PubTal-3.5/lib/pubtal/plugins/weblog/__init__.py0000644000105000010500000003721011555340742020447 0ustar cms103cms103""" Weblog plugin for PubTal Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. 
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! 
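
The weblog pages built by this plugin follow a fixed layout: day pages go to yyyy/mm/ddmmyyyy.html beneath the weblog home, and the index links to each post through an anchor on its day page. The sketch below only illustrates that layout and is not part of the plugin's API. The helper names day_page_path and index_perma_link are hypothetical, and the sample creation date assumes an internal format of yyyymmdd followed by an eight character time.

```python
import os.path

def day_page_path(weblog_home, creation_date, extension=".html"):
    # Day pages live at <home>/yyyy/mm/ddmmyyyy<extension>.
    year, month, day = creation_date[0:4], creation_date[4:6], creation_date[6:8]
    return os.path.join(weblog_home, year, month, day + month + year + extension)

def index_perma_link(creation_date, extension=".html"):
    # Seen from the index page, a permalink is relative to the weblog home
    # and ends with an anchor built from characters 8 to 16 of the date.
    anchor = "#" + creation_date[8:16]
    return day_page_path("", creation_date, extension) + anchor

sample_date = "2004011213:45:00"  # hypothetical value, format assumed
print(day_page_path("weblog", sample_date))
print(index_perma_link(sample_date))
```

Month archive and day pages sit two directories below the weblog home, which is presumably why their links back to the archives start with two parent directory steps.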
""" import os.path, time try: import logging except: from pubtal import InfoLogging as logging from pubtal import SitePublisher, DateContext from simpletal import simpleTAL, simpleTALES import WeblogContent # These two maps provide a fast lookup for month names SHORT_MONTH_MAP = {} LONG_MONTH_MAP = {} for month in range (1,13): SHORT_MONTH_MAP[month] = time.strftime ('%b', (2004,month,1,1,1,1,0,1,0)) LONG_MONTH_MAP[month] = time.strftime ('%B', (2004,month,1,1,1,1,0,1,0)) def getPluginInfo (): builtInContent = [{'functionality': 'content', 'content-type': 'Weblog' ,'file-type': 'post','class': WeblogPagePublisher}] return builtInContent class WeblogPagePublisher (SitePublisher.ContentPublisher): def __init__ (self, pagePublisher): SitePublisher.ContentPublisher.__init__ (self, pagePublisher) self.log = logging.getLogger ("PubTal.WeblogPagePublisher") self.manager = WeblogContent.WeblogManager(pagePublisher) self.log.info ("Registering page builder with content config.") siteConfig = pagePublisher.getConfig() contentConfig = siteConfig.getContentConfig() contentConfig.registerPageBuilder ('Weblog', self.manager.pageBuilder) self.templateConfig = siteConfig.getTemplateConfig() self.contentConfig = contentConfig self.contentDir = siteConfig.getContentDir() def publish (self, page): pageType = page.getOption ('weblogPageType') weblog = self.manager.getWeblog(page) if (pageType == 'day'): self.log.debug ("Getting template for day page.") template = self.templateConfig.getTemplate (page.getOption ('weblog-day-template', 'template.html')) self.log.debug ("Found weblog day template name of: " + str (template)) elif (pageType == 'index'): self.log.debug ("Getting template for index page.") template = self.templateConfig.getTemplate (page.getOption ('weblog-index-template', 'template.html')) self.log.debug ("Found weblog index template name of: " + str (template)) elif (pageType == 'syndication'): self.log.debug ("Getting templates for syndication pages.") 
weblogSyndicationTemplates = page.getListOption ('weblog-syndication-template') if (weblogSyndicationTemplates is None or len (weblogSyndicationTemplates) == 0): msg = "Syndication attempted, but no templates are defined!" self.log.error (msg) raise SitePublisher.PublisherException (msg) msg = "Syndication attempted, but no template defined!" for templateName in weblogSyndicationTemplates: context = simpleTALES.Context(allowPythonPath=1) template = self.templateConfig.getTemplate (templateName) # Get the page context for this content map = self.getPageContext (page, template) context.addGlobal ('page', map) macros = page.getMacros() # Determine the destination for this page relativeDestPath = map ['destinationPath'] self.pagePublisher.expandTemplate (template, context, relativeDestPath, macros) weblog.notePagePublished (page.getOption ('pageName')) elif (pageType == 'month'): self.log.debug ("Getting template for monthly archive page.") template = self.templateConfig.getTemplate (page.getOption ('weblog-month-template', 'template.html')) if (pageType != 'syndication'): self.log.debug ("Building non-syndication page.") context = simpleTALES.Context(allowPythonPath=1) # Get the page context for this content self.log.debug ("Getting page context.") map = self.getPageContext (page, template) self.log.debug ("Adding 'page' object to SimpleTALES.Context") context.addGlobal ('page', map) macros = page.getMacros() # Determine the destination for this page relativeDestPath = map ['destinationPath'] self.log.debug ("Expanding template.") self.pagePublisher.expandTemplate (template, context, relativeDestPath, macros) weblog.notePagePublished (page.getOption ('pageName')) def getPageContext (self, page, template): pageMap = SitePublisher.ContentPublisher.getPageContext (self, page, template) # The pageMap will contain two top level entries: months and days. 
# Pages go in the following locations: # day - yyyy/mm/ddmmyyyy.html # index - index.html # syndication - rss.xml # archive - yyyy/mm/archive.html # links are to the URL location: ddmmyyyy.html#HH:mi:ss # Default depth is 0. i.e. posts appear in weblog/a.post and we want to generate the index # no directories higher, in weblog/ self.log.debug ("Determining weblog home.") weblogDepth = int (page.getOption ('weblog-post-depth', '0')) + 1 # The monthly template is used to determine whether to generate the monthlyArchive object. monthlyTemplate = page.getOption ('weblog-month-template', None) # The site's hostname is needed for creating absolute URLs siteURLPrefix = page.getOption ('url-prefix') # Used for the default value for header/weblog-name weblogName = page.getOption ('weblog-name', 'Weblog') outputType = template.getOption ('output-type') plainTextMaxSize = template.getOption ('plaintext-maxsize') if (plainTextMaxSize is not None): plainTextMaxSize = int (plainTextMaxSize) destExtension = '.' + template.getTemplateExtension() # We need the day's extension for permaLinks - so let's work that out. dailyTemplateName = page.getOption ('weblog-day-template', None) if (dailyTemplateName is not None): dayTemplate = self.templateConfig.getTemplate (dailyTemplateName) dayExtension = '.' + dayTemplate.getTemplateExtension() else: dayExtension = None weblogRelativeHomeDestDir = pageMap ['destinationPath'] # Takes weblog/2004/01/12-34.html and turns it into weblog for depth in range (weblogDepth): weblogRelativeHomeDestDir = os.path.split (weblogRelativeHomeDestDir)[0] self.log.debug ("weblogRelativeHomeDestDir is %s" % weblogRelativeHomeDestDir) # Now get the depth of the weblog... 
head, tail = os.path.split (weblogRelativeHomeDestDir) weblogDepth = 0 while (tail != ''): weblogDepth += 1 head, tail = os.path.split (head) # We need the data associated with this weblog self.log.debug ("Getting weblog data object.") weblog = self.manager.getWeblog(page) postData = weblog.getPostData() postTree = weblog.getPostTree() pageType = page.getOption ('weblogPageType') if (pageType == 'day'): # We need to generate the list of posts for this day. dayStr = page.getOption ("weblogPageDay") self.log.debug ("Determining all posts for day %s." % dayStr) postList = postTree.getDaysPosts (dayStr) relativeDestPath = os.path.join (weblogRelativeHomeDestDir, dayStr[0:4], dayStr[4:6], "%s%s%s%s" % (dayStr [6:8],dayStr [4:6], dayStr [0:4], destExtension)) elif (pageType == 'index' or pageType == 'syndication'): # We need to get the index list of posts. indexSize = int (page.getOption ('weblog-index-size', '5')) self.log.debug ("Determining latest posts for index or syndication.") postList = postTree.getLatestPosts (indexSize) if (pageType == 'index'): relativeDestPath = os.path.join (weblogRelativeHomeDestDir, "index%s" % destExtension) else: self.log.debug ("Determining name of syndication file (template name is %s." % template.getTemplateName()) relativeDestPath = os.path.join (weblogRelativeHomeDestDir, "%s" % os.path.split (template.getTemplateName())[1]) elif (pageType == 'month'): archiveStr = page.getOption ("weblogArchiveYearMonth") self.log.debug ("Determining all posts for monthly archive %s." 
% archiveStr) postList = postTree.getMonthsPosts (archiveStr) relativeDestPath = os.path.join (weblogRelativeHomeDestDir, archiveStr[0:4], archiveStr[4:6], "archive%s" % destExtension) dayList = [] curDay = "00000000" curRealDate = None dayMap = None lastModifiedDate = None for post in postList: # Get the context for this post self.log.debug ("Getting context for post %s" % post) fullPathToPost = os.path.join (self.contentDir, post) self.log.debug ("Full path to the post is %s" % fullPathToPost) pageForPost = self.contentConfig.getPage (fullPathToPost) postLastModified = pageForPost.getModificationTime() if (lastModifiedDate is None or (lastModifiedDate < postLastModified)): lastModifiedDate = postLastModified postContext = postData.getPostContextMap (pageForPost, template) postCreationDate = postContext ['headers']['postCreationDate'] if (curDay != postCreationDate [0:8]): self.log.debug ("Found a new day %s." % postCreationDate) # It's a brand new day! if (dayMap is not None): self.log.debug ("Adding old day to the map.") # Get the date as Monday, 11 November 2002 dayMap ['date'] = DateContext.Date (curRealDate, '%a[LONG], %d[NP] %b[LONG] %Y') dayMap ['posts'] = dayPostList dayList.append (dayMap) dayMap = {} dayPostList = [] curRealDate = time.strptime (postCreationDate, WeblogContent.INTERNAL_DATE_FORMAT) curDay = postCreationDate [0:8] # Just need to add permaLink to postContext and we are done! # We only do perma-links if daily archives are enabled. # Perma-links are relative to the current file only for day pages. 
if (dayExtension is not None): permaLink = "#%s" % postCreationDate [8:16] if (pageType == 'index' or pageType == 'syndication'): # Permalinks for posts have to index into the yyyy/mm/ddmmyyyy.html permaLink = os.path.join (postCreationDate [0:4], postCreationDate [4:6], "%s%s%s%s%s" % (postCreationDate [6:8],postCreationDate [4:6], postCreationDate [0:4], dayExtension, permaLink)) elif (pageType == 'month'): # Permalinks for posts have to index into the yyyy/mm/ddmmyyyy.html permaLink = "%s%s%s%s%s" % (postCreationDate [6:8],postCreationDate [4:6], postCreationDate [0:4], dayExtension, permaLink) if (pageType == 'day'): # Permalink name postContext ['permaLinkName'] = postCreationDate [8:16] if (pageType != 'day'): postContext ['permaLink'] = permaLink if (siteURLPrefix is not None): postContext ['absolutePermaLink'] = '%s/%s' % (siteURLPrefix, os.path.join (weblogRelativeHomeDestDir, permaLink)) # RSS requires truncating of output, so we need to check for that here. if (pageType == 'syndication' and outputType == 'PlainText' and plainTextMaxSize is not None): self.log.info ("Truncating syndication PlainText output to %s" % str (plainTextMaxSize)) postBody = postContext.get ('content', None) if (postBody is not None): if (len (postBody) > plainTextMaxSize): postBody = postBody [:plainTextMaxSize] + "..." postContext ['content'] = postBody else: self.log.warn ("Post body not found!") dayPostList.append (postContext) if (dayMap is not None): self.log.debug ("Adding final day to the map.") # Get the date as Monday, 11 November 2002 dayMap ['date'] = DateContext.Date (curRealDate, '%a[LONG], %d[NP] %b[LONG] %Y') dayMap ['posts'] = dayPostList dayList.append (dayMap) pageMap ['days'] = dayList # Now do the months object if applicable... 
if (monthlyTemplate is not None): self.log.debug ("Monthly template is defined, so creating monthlyArchive object.") monthlyArchiveList = [] yearObject = {} yearsMonthList = [] currentYear = None for monthYearStr in postTree.getAllMonthlyNames(): # monthlyYearStr is yyyymm self.log.debug ("Handling year/month %s" % monthYearStr) year = int (monthYearStr [0:4]) month = int (monthYearStr [4:6]) if (currentYear != year): self.log.debug ("A new year found.") if (currentYear is not None): self.log.debug ("Old year will be added to the list.") yearObject ['yearName'] = str (currentYear) yearObject ['monthList'] = yearsMonthList monthlyArchiveList.append (yearObject) yearObject = {} yearsMonthList = [] currentYear = year monthLong = LONG_MONTH_MAP [month] monthShort = SHORT_MONTH_MAP [month] # archiveLink depends on current page type, but should point to yyyy/mm/archive.html if (pageType == 'index' or pageType == 'syndication'): # Montly archives have to index into the yyyy/mm/archive.html archiveLink = os.path.join (monthYearStr [0:4], monthYearStr [4:6], "archive%s" % destExtension) elif (pageType == 'month' or pageType == 'day'): # Monthly archives and posts have to index into ../../yyyy/mm/archive.html archiveLink = os.path.join ('..', '..', monthYearStr [0:4], monthYearStr [4:6], "archive%s" % destExtension) yearsMonthList.append ({'monthNameLong': monthLong, 'monthNameShort': monthShort ,'monthNumber': str (month), 'archiveLink': archiveLink}) # Do the final year... if (currentYear is not None): self.log.debug ("A final year found.") yearObject ['yearName'] = str (currentYear) yearObject ['monthList'] = yearsMonthList monthlyArchiveList.append (yearObject) pageMap ['monthlyArchive'] = monthlyArchiveList if (pageType == 'month'): # We do special things for monthly archives. 
month = int (archiveStr[4:6]) monthLong = LONG_MONTH_MAP [month] monthShort = SHORT_MONTH_MAP [month] pageMap ['yearName'] = archiveStr[0:4] pageMap ['monthNameLong'] = monthLong pageMap ['monthNameShort'] = monthShort pageMap ['depth'] = "../"*(weblogDepth + 2) elif (pageType == 'day'): pageMap ['dayDate'] = DateContext.Date (curRealDate, '%a[LONG], %d[NP] %b[LONG] %Y') pageMap ['depth'] = "../"*(weblogDepth + 2) else: pageMap ['depth'] = "../"*(weblogDepth) # The last modified date of this page is the latest modification date of its components. pageMap ['lastModifiedDate'] = DateContext.Date (time.localtime (lastModifiedDate), '%a[SHORT], %d %b[SHORT] %Y %H:%M:%S %Z') pageMap ['weblog-name'] = weblogName weblogTagPrefix = page.getOption ('weblog-tag-prefix') if (weblogTagPrefix is not None): pageMap ['weblog-tag-prefix'] = "tag:%s" % weblogTagPrefix if (siteURLPrefix is not None): if (len (weblogRelativeHomeDestDir) > 0): pageMap ['weblog-link'] = "%s/%s/" % (siteURLPrefix, weblogRelativeHomeDestDir) else: pageMap ['weblog-link'] = "%s/" % siteURLPrefix if (siteURLPrefix is not None): pageMap ['absoluteDestinationURL'] = '%s/%s' % (siteURLPrefix, relativeDestPath) pageMap ['destinationPath'] = relativeDestPath pageMap ['absoluteDestinationPath'] = os.path.join (self.destDir, relativeDestPath) return pageMap PubTal-3.5/lib/pubtal/plugins/weblog/WeblogContent.py0000644000105000010500000006533011555340742021466 0ustar cms103cms103""" Weblog plugin for PubTal Copyright (c) 2009 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. 
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! """ import anydbm, os, os.path, re, time, string, hashlib, codecs from pubtal import timeformat FIELDREGEX=re.compile ('(? 
        if (len (textDecoration) > 0):
            cssStyles.append (u"text-decoration: %s" % u",".join (textDecoration))
        cssStyleList = ";".join (cssStyles)
        self.log.debug ("Built style list %s", cssStyleList)
        if (len (cssStyleList) > 0):
            return ' style="%s"' % cssStyleList
        return ''

    def parseContent (self, tree):
        self.log.info ("Parsing content")
        body = tree.find ("//%(o)sbody" % PM)
        # contentGenerator uses recursion to go through all tags
        self.contentGenerator (body)

    def contentGenerator (self, node):
        tagHandler = self.tagHandlers.get (node.tag, self.defaultTagHandler)
        requestedStyle = self.determineRequestedStyle (node)
        self.styleStack.append (requestedStyle)
        for child in tagHandler (node, requestedStyle):
            #self.log.debug ("Recursively handle child elements")
            self.contentGenerator (child)
        self.styleStack.pop()
        return

    def defaultTagHandler (self, node, requestedStyle):
        if (node.tag.startswith ("%(t)s" % PM)):
            unhandledText = True
            if (node.text):
                self.result.write (cgi.escape (node.text))
        else:
            unhandledText = False
        for child in node.getchildren():
            yield child
        if (node.tail and unhandledText):
            self.result.write (cgi.escape (node.tail))

    def headingTag (self, node, requestedStyle):
        """ Heading Tag Handler
            Deals with the start tag and any initial text, then yields all children, then
            deals with any remaining text and the close tag.
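
            The handler pattern described here can be sketched outside PubTal as follows.
            This is a simplified illustration, not the plugin's code: it assumes the standard
            xml.etree.ElementTree and html modules in place of the plugin's own writer and
            cgi.escape, and heading_handler, default_handler, walk and render are hypothetical names.

```python
import html
import io
import xml.etree.ElementTree as ET

def heading_handler(node, out):
    # Write the start tag and leading text, hand each child back to the
    # walker, then write the close tag once the children are done.
    out.write("<h1>")
    if node.text:
        out.write(html.escape(node.text))
    for child in node:
        yield child
    out.write("</h1>")

def default_handler(node, out):
    # Pass text through unchanged apart from escaping.
    if node.text:
        out.write(html.escape(node.text))
    for child in node:
        yield child
    if node.tail:
        out.write(html.escape(node.tail))

HANDLERS = {"h": heading_handler}

def walk(node, out):
    # The walker recurses into every child that a handler yields.
    handler = HANDLERS.get(node.tag, default_handler)
    for child in handler(node, out):
        walk(child, out)

def render(xml_text):
    out = io.StringIO()
    walk(ET.fromstring(xml_text), out)
    return out.getvalue()

print(render("<h>Title <b>one</b> &amp; two</h>"))
```

            Because each handler is a generator, the code after its loop only runs once the
            walker has finished with all the children, which keeps open and close tags paired.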
""" headingLevel = int(node.get ("%(t)soutline-level" %PM,1)) if (headingLevel > 6): self.log.warn ("Heading level of %s used, but HTML only supports up to level 6.", headingLevel) headingLevel = 6 self.result.startElement ('h%s' % str (headingLevel), self.getCSSStyle (HEADING_STYLE_FILTER)) if (node.text): self.result.write (cgi.escape (node.text)) for child in node.getchildren(): yield child if (node.tail): self.result.write (cgi.escape (node.tail)) self.result.endElement ('h%s' % str (headingLevel)) def paragraphTag (self, node, requestedStyle): """ Paragrah Tag Handler Deals with the start tag and any initial text, then yeilds all children, then deals with any remaining text and the close tag. """ self.result.startElement ('p', self.getCSSStyle (PARAGRAPH_STYLE_FILTER)) if (node.text): self.result.write (cgi.escape (node.text)) for child in node.getchildren(): yield child if (node.tail): self.result.write (cgi.escape (node.tail)) self.result.endElement ('p') def parseMetaData (self, tree): metaNode = tree.find ("%(o)smeta/" % PM) self.metaData ['creation-date'] = metaNode.findtext ("%(m)screation-date" % PM) self.metaData ['title'] = metaNode.findtext ("%(dub)stitle" % PM) self.metaData ['description'] = metaNode.findtext ("%(dub)sdescription" % PM) self.metaData ['subject'] = metaNode.findtext ("%(dub)ssubject" % PM) self.metaData ['language'] = metaNode.findtext ("%(dub)slanguage" % PM) self.metaData ['keywords'] = [] for keywords in metaNode.findall ("%(m)skeyword" % PM): self.metaData ['keywords'].append (keywords.text) self.log.debug ("Meta information %s", self.metaData) def parseStyleData (self, tree): """ Parse style data is called twice, once with the tree for styles once with the content tree. This allows us to build a full style library covering both styles, and automatic styles. 
""" for styleElm in tree.findall ("//%(s)sstyle" % PM): styleInfo = {} styleName = styleElm.get ("%(s)sname" % PM) styleInfo ['family'] = styleElm.get ("%(s)sfamily" % PM) styleInfo ['parent'] = styleElm.get ("%(s)sparent-style-name" % PM, None) styleInfo ['display-name'] = styleElm.get ("%(s)sdisplay-name" % PM) children = styleElm.getchildren() if (len (children) > 0): # Assume the first child is the properties styleProp = children [0] for key, attValue in styleProp.items(): if key in SUPPORTED_FO_STYLES: prop = key[key.find ("}")+1:] styleInfo [prop] = attValue if (key == "%(s)stext-underline-style" % PM): styleInfo ["underline"] = "underline" if (key == "%(s)stext-line-through-style" % PM): styleInfo ["line-through"] = "line-through" if (key == "%(s)stext-position" % PM): actualPosition = attValue[0:attValue.find (' ')] styleInfo ["vertical-align"] = actualPosition self.log.debug ("Adding style %s with config %s", styleName, styleInfo) self.styles [styleName] = styleInfo logging.info ("Looking for list styles") for styleElm in tree.findall ("//%(t)slist-style" % PM): listStyleStack = [] styleName = styleElm.get ("%(s)sname" % PM) for listStyleElm in styleElm.getchildren(): orderedList = False if listStyleElm.tag == "%(t)slist-level-style-number" % PM: orderedList = True listStyleStack.append (orderedList) self.log.debug ("Adding list style %s with stack %s", styleName, listStyleStack) self.listStyles [styleName] = listStyleStack def init_configuration (self): outputType = self.configuration.get ('output-type', 'HTML') self.outputXHTML = True if outputType == "HTML" else False self.outputPlainText = True if outputType == "PlainText" else False if (self.outputPlainText): # We do not preserve spaces with   because our output is not space clean. 
            self.result = HTMLWriter.PlainTextWriter(outputStream=StringIO.StringIO(), outputXHTML=1, preserveSpaces = 0)
        else:
            self.result = HTMLWriter.HTMLWriter(outputStream=StringIO.StringIO(), outputXHTML=self.outputXHTML, preserveSpaces = 0)

        # We use this stack to re-direct output into footnotes.
        self.resultStack = []

        # We treat footnotes and endnotes the same.
        self.footNoteID = None
        self.footnotes = []

        # The effectiveStyleStack holds the effective style (e.g. paragraph) and is used to filter out
        # un-needed style changes.
        self.effectiveStyleStack = [DEFAULT_PARAGRAPH_STYLES]

        self.cleanSmartQuotes = self.configuration.get ('CleanSmartQuotes', 0)
        self.cleanHyphens = self.configuration.get ('CleanHyphens', 0)
        self.preserveSpaces = self.configuration.get ('preserveSpaces', 1)

    def getMetaInfo (self):
        return self.metaData

    def getContent (self):
        cnt = self.result.getOutput().getvalue()
        self.log.debug ("Content: %s", cnt)
        return cnt

    def getFootNotes (self):
        # Return the (possibly empty) footnote list instead of None, so callers can iterate over it.
        return self.footnotes

    def getPictures (self):
        return self.pictureList
PubTal-3.5/lib/pubtal/plugins/openDocumentFormat/__init__.py0000644000105000010500000001035211555340743023000 0ustar cms103cms103
""" Open Document Format to HTML Plugin for PubTal

    Copyright (c) 2009 Colin Stewart (http://www.owlfish.com/)
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
    3. The name of the author may not be used to endorse or promote products
       derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! """ try: import logging except: from pubtal import InfoLogging as logging from pubtal import SitePublisher from simpletal import simpleTAL, simpleTALES import ODFToHTMLConverter def getPluginInfo (): builtInContent = [{'functionality': 'content', 'content-type': 'OpenDocument' ,'file-type': 'odt','class': OpenDocumentPagePublisher}] return builtInContent class OpenDocumentPagePublisher (SitePublisher.ContentPublisher): def __init__ (self, pagePublisher): SitePublisher.ContentPublisher.__init__ (self, pagePublisher) self.log = logging.getLogger ("PubTal.OpenDocumentPagePublisher") self.converter = ODFToHTMLConverter.OpenDocumentConverter() # Get the default character set for the site. 
config = pagePublisher.getConfig() self.defaultCharset = config.getDefaultCharacterSet() self.encodingCapabilities = config.getEncodingCapabilities() def publish (self, page): template = self.templateConfig.getTemplate (page.getOption ('template', 'template.html')) context = simpleTALES.Context(allowPythonPath=1) # Get the page context for this content map = self.getPageContext (page, template) context.addGlobal ('page', map) macros = page.getMacros() # Determine the destination for this page relativeDestPath = map ['destinationPath'] self.pagePublisher.expandTemplate (template, context, relativeDestPath, macros) # Publish any bundled pictures. for fileName, data in self.converter.getPictures(): destFile = self.pagePublisher.openOuputFile (fileName) destFile.write (data) destFile.close() def getPageContext (self, page, template): pageMap = SitePublisher.ContentPublisher.getPageContext (self, page, template) # Determine the character set that will be used on output templateCharset = template.getOption ('character-set', self.defaultCharset) # Now determine what capabilities this character set offers smartQuotes = not self.encodingCapabilities.getCapability (templateCharset, 'SmartQuotes') hyphens = not self.encodingCapabilities.getCapability (templateCharset, 'Hyphen') # Parse the page options = {'CleanSmartQuotes': smartQuotes, 'CleanHyphens': hyphens} options ['DestinationFile'] = pageMap ['destinationPath'] options ['output-type'] = template.getOption ('output-type', 'HTML') options ['preserveSpaces'] = page.getBooleanOption ('preserve-html-spaces', 1) self.converter.convert (page.getSource(), options) headers = self.converter.getMetaInfo() content = self.converter.getContent() footNotes = self.converter.getFootNotes() actualHeaders = pageMap ['headers'] actualHeaders.update (headers) pageMap ['headers'] = actualHeaders pageMap ['content'] = content pageMap ['footnotes'] = footNotes return pageMap 
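
The option handling in getPageContext above can be sketched in isolation. This is a minimal illustration, not PubTal code: build_clean_options and clean_text are hypothetical helper names, and the two booleans stand in for the results of encodingCapabilities.getCapability. The substitutions mirror the characters() handlers in OpenDocumentToHTMLConverter.py; the option keys are the ones the plugin passes to the converter.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: build_clean_options and clean_text are hypothetical
# helpers written for this example, they are not part of PubTal.

def build_clean_options(supports_smart_quotes, supports_hyphen):
    """Request cleaning of a character class only when the output
    character set cannot represent it (same negation as getPageContext)."""
    return {
        'CleanSmartQuotes': not supports_smart_quotes,
        'CleanHyphens': not supports_hyphen,
    }

def clean_text(data, options):
    """Apply the substitutions made by the converter's characters() handlers."""
    if options.get('CleanSmartQuotes', 0):
        # Curly double quotes become plain ASCII double quotes.
        data = data.replace(u'\u201c', u'"')
        data = data.replace(u'\u201d', u'"')
    if options.get('CleanHyphens', 0):
        # An en dash becomes a plain hyphen.
        data = data.replace(u'\u2013', u'-')
    return data

if __name__ == '__main__':
    # Assume a character set that offers neither capability.
    options = build_clean_options(False, False)
    print(clean_text(u'\u201cPubTal\u201d 2004\u20132011', options))
```

Keeping the decision (which classes to clean) separate from the substitution itself matches how the plugin works: the publisher decides per template character set, and the converter only reads the resulting flags.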
PubTal-3.5/lib/pubtal/plugins/openDocumentFormat/OpenDocumentToHTMLConverter.py0000644000105000010500000007006011555340743026523 0ustar cms103cms103""" OpenDocument to HTML Converter for PubTal Copyright (c) 2009 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! 
""" import xml.sax, zipfile, StringIO, cgi, re, os.path try: import logging except: from pubtal import InfoLogging as logging import OOFilter from pubtal import HTMLWriter OFFICE_URI='urn:oasis:names:tc:opendocument:xmlns:office:1.0' TEXT_URI='urn:oasis:names:tc:opendocument:xmlns:text:1.0' STYLE_URI='urn:oasis:names:tc:opendocument:xmlns:style:1.0' TABLE_URI='urn:oasis:names:tc:opendocument:xmlns:table:1.0' #FORMAT_URI='http://www.w3.org/1999/XSL/Format' FORMAT_URI='urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0' DUBLIN_URI='http://purl.org/dc/elements/1.1/' META_URI='urn:oasis:names:tc:opendocument:xmlns:meta:1.0' XLINK_URI='http://www.w3.org/1999/xlink' SVG_URI='urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0' DRAW_URI='urn:oasis:names:tc:opendocument:xmlns:drawing:1.0' # These are the fo styles that will be treated as CSS styles. SUPPORTED_FO_STYLES = {'text-align':1, 'font-weight':1, 'font-style':1, 'margin-left':1} # These lists act as filters on which styles are applied to which kind of elements. HEADING_STYLE_FILTER = ['text-align', 'margin-left'] PARAGRAPH_STYLE_FILTER = ['text-align', 'underline', 'line-through', 'overline' ,'font-weight', 'font-style', 'vertical-align', 'margin-left'] SPAN_STYLE_FILTER = PARAGRAPH_STYLE_FILTER # These are the assumed defaults for paragraphs - OO setting these will be ignored. 
DEFAULT_PARAGRAPH_STYLES = { 'text-align': 'start', 'font-weight': 'normal' ,'font-style': 'normal', 'margin-left': '0cm'} class OpenDocumentConverter: """ Convert OpenOffice format to HTML, XHTML or PlainText """ def __init__ (self): self.log = logging.getLogger ("PubTal.ODC") self.contentParser = ODFContentPraser () def convert (self, fileName, config={}): archive = zipfile.ZipFile (fileName, 'r') self.contentParser.parseContent (archive, config) archive.close() def getMetaInfo (self): return self.contentParser.getMetaInfo() def getContent (self): return self.contentParser.getContent() def getFootNotes (self): return self.contentParser.getFootNotes() def getPictures (self): return self.contentParser.getPictures() class ODFContentPraser (xml.sax.handler.DTDHandler): """ Convert OpenDocument format to HTML, XHTML or PlainText """ def __init__ (self): self.log = logging.getLogger ("PubTal.ODF.ODFContentParser") self.saxFilter = ODFFilter.SAXFilter () def parseContent (self, archive, config): self.officeHandler = OfficeHandler(config) self.styleHandler = StyleHandler(config) self.textHandler = TextHandler (self.styleHandler, config) self.tableHandler = TableHandler (self.styleHandler, self.textHandler.result, config) self.drawHandler = DrawHandler (self.styleHandler, self.textHandler, config) self.saxFilter.setHandler (OFFICE_URI, self.officeHandler) self.saxFilter.setHandler (DUBLIN_URI, self.officeHandler) self.saxFilter.setHandler (META_URI, self.officeHandler) self.saxFilter.setHandler (STYLE_URI, self.styleHandler) self.saxFilter.setHandler (TEXT_URI, self.textHandler) self.saxFilter.setHandler (TABLE_URI, self.tableHandler) self.saxFilter.setHandler (DRAW_URI, self.drawHandler) self.saxFilter.setHandler (SVG_URI, self.drawHandler) self.ourParser = xml.sax.make_parser() self.log.debug ("Setting features of parser") self.ourParser.setFeature (xml.sax.handler.feature_external_ges, 0) self.ourParser.setFeature (xml.sax.handler.feature_namespaces, 1) 
self.ourParser.setContentHandler (self.saxFilter) # Initialise our variables self.pictureList = [] self.log.debug ("Parsing meta data.") odfContent = archive.read ('meta.xml') contentFile = StringIO.StringIO (odfContent) self.ourParser.parse (contentFile) self.log.debug ("Parsing styles.") odfContent = archive.read ('styles.xml') contentFile = StringIO.StringIO (odfContent) self.ourParser.parse (contentFile) self.log.debug ("Parsing actual content.") odfContent = archive.read ('content.xml') contentFile = StringIO.StringIO (odfContent) self.ourParser.parse (contentFile) # Read pictures for pictureFilename, newFilename in self.drawHandler.getBundledPictures(): self.pictureList.append ((newFilename, archive.read (pictureFilename))) def getMetaInfo (self): return self.officeHandler.getMetaInfo() def getContent (self): return self.textHandler.getContent() def getFootNotes (self): return self.textHandler.getFootNotes() def getPictures (self): return self.pictureList class OfficeHandler: def __init__ (self, config): self.log = logging.getLogger ("PubTal.ODF.OfficeHandler") self.metaData = {} self.keywords = [] self.charData = [] self.cleanSmartQuotes = config.get ('CleanSmartQuotes', 0) self.cleanHyphens = config.get ('CleanHyphens', 0) def startElementNS (self, name, qname, atts): self.charData = [] if (name[1] == 'document-content'): try: version = atts [(OFFICE_URI,'version')] self.log.debug ("Open Document format %s found." % version) if (float (version) != 1.2): self.log.warn ("Only Open Document format 1.2 is supported, version %s detected." % version) except Exception, e: msg = "Error determining ODF version. 
Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) def endElementNS (self, name, qname): data = u"".join (self.charData) self.charData = [] if (name[0] == META_URI): if (name [1] == 'keyword'): self.keywords.append (data) elif (name [1] == 'creation-date'): self.metaData [name [1]] = data if (name[0] == DUBLIN_URI): self.metaData [name [1]] = data def characters (self, data): if (self.cleanSmartQuotes): data = data.replace (u'\u201c', '"') data = data.replace (u'\u201d', '"') if (self.cleanHyphens): data = data.replace (u'\u2013', '-') self.charData.append (data) def getMetaInfo (self): self.metaData ['keywords'] = self.keywords return self.metaData class StyleHandler: def __init__ (self, config): self.log = logging.getLogger ("PubTal.ODF.StyleHandler") self.textStyleMap = {} self.paragraphStyleMap = {} self.currentStyleFamily = None self.currentStyle = None def startElementNS (self, name, qname, atts): realName = name [1] if (realName == 'style'): try: self.currentStyle = {} self.currentStyle ['name'] = atts [(STYLE_URI, 'name')] self.currentStyleFamily = atts [(STYLE_URI, 'family')] self.currentStyle ['parent-name'] = atts.get ((STYLE_URI, 'parent-style-name'), None) # Capture the display name as it's the only way to handle # Preformatted text, etc. self.currentStyle ['display-name'] = atts.get ((STYLE_URI, 'display-name'), None) except Exception, e: msg = "Error parsing style information. 
Error: " + str (e)
                self.log.error (msg)
                raise OpenDocumentFormatException (msg)
        if (realName in ('graphic-properties', 'paragraph-properties', 'text-properties', 'table-properties', 'table-row-properties') and self.currentStyle is not None):
            for uri, attName in atts.keys():
                if (uri == FORMAT_URI):
                    if SUPPORTED_FO_STYLES.has_key (attName):
                        attValue = atts [(FORMAT_URI, attName)]
                        self.currentStyle [attName] = attValue
                if (uri == STYLE_URI):
                    attValue = atts [(STYLE_URI, attName)]
                    if (attValue != 'none'):
                        if (attName == 'text-underline-style'):
                            self.currentStyle ['underline'] = 'underline'
                        if (attName == 'text-line-through-style'):
                            self.currentStyle ['line-through'] = 'line-through'
                        if (attName == 'text-position'):
                            actualPosition = attValue [0:attValue.find (' ')]
                            self.currentStyle ['vertical-align'] = actualPosition

    def endElementNS (self, name, qname):
        if (name[1] == 'style'):
            if (self.currentStyle is not None):
                name = self.currentStyle ['name']
                if (self.currentStyleFamily == "paragraph"):
                    self.log.debug ("Recording paragraph style %s" % name)
                    self.paragraphStyleMap [name] = self.currentStyle
                elif (self.currentStyleFamily == "text"):
                    self.log.debug ("Recording text style %s" % name)
                    self.textStyleMap [name] = self.currentStyle
                else:
                    self.log.debug ("Unsupported style family %s" % self.currentStyleFamily)
            self.currentStyle = None
            self.currentStyleFamily = None

    def characters (self, data):
        pass

    def getTextStyle (self, name):
        return self.styleLookup (name, self.textStyleMap)

    def getParagraphStyle (self, name):
        return self.styleLookup (name, self.paragraphStyleMap)

    def styleLookup (self, name, map):
        # Walk up the parent chain, then apply styles from the root down so children override parents.
        foundStyle = {}
        styleHierachy = []
        lookupName = name
        while (lookupName is not None):
            lookupStyle = map.get (lookupName, None)
            if (lookupStyle is not None):
                styleHierachy.append (lookupStyle)
                lookupName = lookupStyle ['parent-name']
            else:
                self.log.debug ("Style %s not found!"
% lookupName)
                lookupName = None
        styleHierachy.reverse()
        for style in styleHierachy:
            foundStyle.update (style)
        return foundStyle

class TextHandler:
    def __init__ (self, styleHandler, config):
        self.log = logging.getLogger ("PubTal.ODF.TextHandler")
        self.styleHandler = styleHandler
        # Check for the kind of output we are generating
        outputType = config.get ('output-type', 'HTML')
        self.outputPlainText = 0
        if (outputType == 'HTML'):
            self.outputXHTML = 0
        elif (outputType == 'XHTML'):
            self.outputXHTML = 1
        elif (outputType == 'PlainText'):
            # Plain text trumps outputXHTML
            self.outputPlainText = 1
        else:
            # Interpolate the offending value into the message rather than appending it after the "%s".
            msg = "Attempt to configure for unsupported output-type %s. " % outputType
            self.log.error (msg)
            raise OpenDocumentFormatException (msg)

        if (self.outputPlainText):
            # We do not preserve spaces with &nbsp; because our output is not space clean.
            self.result = HTMLWriter.PlainTextWriter(outputStream=StringIO.StringIO(), outputXHTML=1, preserveSpaces = 0)
        else:
            self.result = HTMLWriter.HTMLWriter(outputStream=StringIO.StringIO(), outputXHTML=self.outputXHTML, preserveSpaces = 0)

        # We use this stack to re-direct output into footnotes.
        self.resultStack = []

        # We treat footnotes and endnotes the same.
        self.footNoteID = None
        self.footnotes = []

        self.charData = []
        # The closeTagsStack holds one entry per open OO text tag.
        # Those that have corresponding HTML tags have text, everything else has None
        self.closeTagsStack = []

        # The effectiveStyleStack holds the effective style (e.g. paragraph) and is used to filter out
        # un-needed style changes.
self.effectiveStyleStack = [DEFAULT_PARAGRAPH_STYLES] self.cleanSmartQuotes = config.get ('CleanSmartQuotes', 0) self.cleanHyphens = config.get ('CleanHyphens', 0) self.preserveSpaces = config.get ('preserveSpaces', 1) def startElementNS (self, name, qname, atts): #self.log.debug ("Start: %s" % name[1]) realName = name [1] styleName = atts.get ((TEXT_URI, 'style-name'), None) if (realName == 'h'): self.charData = [] # We have a heading - get the level and style. try: headingLevel = int (atts [(TEXT_URI, 'outline-level')]) applicableStyle = self.styleHandler.getParagraphStyle (styleName) if (headingLevel > 6): self.log.warn ("Heading level of %s used, but HTML only supports up to level 6." % str (headingLevel)) headingLevel = 6 self.result.startElement ('h%s' % str (headingLevel), self.getCSSStyle (applicableStyle, HEADING_STYLE_FILTER)) self.closeTagsStack.append ('h%s' % str (headingLevel)) except Exception, e: msg = "Error parsing heading. Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) elif (realName == 'p'): # We have a paragraph self.charData = [] applicableStyle = self.styleHandler.getParagraphStyle (styleName) if ("Preformatted Text" in (applicableStyle ['display-name'], styleName)): # We have PRE textStyleMap self.result.startElement ('pre', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append ('pre') elif ("Quotations" in (styleName, applicableStyle ['display-name'])): # We have a block qutoe. self.result.startElement ('blockquote') self.result.startElement ('p', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append (['p', 'blockquote']) else: self.result.startElement ('p', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append ('p') # Footnotes can start with either paragraphs or lists. 
if (self.footNoteID is not None): self.result.startElement ('a', ' name="%s" style="vertical-align: super" href="#src%s"'% (self.footNoteID, self.footNoteID)) self.result.write (str (len (self.footnotes) + 1)) self.result.endElement ('a') self.footNoteID = None elif (realName == 'list'): # In ODF both ordered and unordered lists come through as lists # TODO: Continue to work through the plugin and find other changes. # Add handling for list style support! self.charData = [] applicableStyle = self.styleHandler.getParagraphStyle (styleName) self.result.startElement ('ol', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append ('ol') # Footnotes can start with either paragraphs or lists. if (self.footNoteID is not None): self.result.startElement ('a', ' name="%s" style="vertical-align: super" href="#src%s"'% (self.footNoteID, self.footNoteID)) self.result.write (str (len (self.footnotes) + 1)) self.result.endElement ('a') self.footNoteID = None elif (realName == 'unordered-list'): self.charData = [] applicableStyle = self.styleHandler.getParagraphStyle (styleName) self.result.startElement ('ul', self.getCSSStyle (applicableStyle, PARAGRAPH_STYLE_FILTER)) self.closeTagsStack.append ('ul') # Footnotes can start with either paragraphs or lists. if (self.footNoteID is not None): self.result.startElement ('a', ' name="%s" style="vertical-align: super" href="#src%s"'% (self.footNoteID, self.footNoteID)) self.result.write (str (len (self.footnotes) + 1)) self.result.endElement ('a') self.footNoteID = None elif (realName == 'list-item'): applicableStyle = self.styleHandler.getTextStyle (styleName) self.result.startElement ('li', self.getCSSStyle (applicableStyle, SPAN_STYLE_FILTER)) self.closeTagsStack.append ('li') elif (realName == 'span'): # We have some text formatting - write out any data already accumulated. 
self.writeData() applicableStyle = self.styleHandler.getTextStyle (styleName) if (styleName == "Source Text"): # We have PRE text self.result.startElement ('code', self.getCSSStyle (applicableStyle, SPAN_STYLE_FILTER)) self.closeTagsStack.append ('code') else: cssStyle = self.getCSSStyle (applicableStyle, SPAN_STYLE_FILTER) if (len (cssStyle) > 0): self.result.startElement ('span', cssStyle) self.closeTagsStack.append ('span') else: #self.log.debug ("Suppressing span - no change in style.") self.closeTagsStack.append (None) elif (realName == 'a'): self.writeData() linkDest = atts.get ((XLINK_URI, 'href'), None) if (linkDest is not None): self.result.startElement ('a', ' href="%s"' % linkDest) self.closeTagsStack.append ('a') else: self.closeTagsStack.append (None) # Links are underlined - we want this done by the style sheet, so ignore the underline. newEffectiveStyle = {} newEffectiveStyle.update (self.effectiveStyleStack[-1]) newEffectiveStyle ['underline'] = 'underline' self.effectiveStyleStack.append (newEffectiveStyle) elif (realName == 'footnote' or realName == 'endnote'): try: footnoteID = atts[(TEXT_URI, 'id')] except Exception, e: msg = "Error getting footnoteid. Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) # Write out any data we have currently stored. 
self.writeData() # Now write out the link to the footnote self.result.startElement ('a', ' name="src%s" style="vertical-align: super" href="#%s"' % (footnoteID, footnoteID)) self.result.write (str (len (self.footnotes) + 1)) self.result.endElement ('a') self.resultStack.append (self.result) if (self.outputPlainText): self.result = HTMLWriter.PlainTextWriter (outputStream = StringIO.StringIO(), outputXHTML=1, preserveSpaces = 0) else: self.result = HTMLWriter.HTMLWriter(outputStream = StringIO.StringIO(), outputXHTML=self.outputXHTML, preserveSpaces = 0) self.closeTagsStack.append (None) # Re-set the style stack for the footenote self.effectiveStyleStack.append (DEFAULT_PARAGRAPH_STYLES) # Keep this foonote id around for the first paragraph. self.footNoteID = footnoteID elif (realName == 'footnote-body' or realName == 'endnote-body'): self.closeTagsStack.append (None) # Keep the effective style as-is self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) elif (realName == 'bookmark-start' or realName == 'bookmark'): try: bookmarkName = atts[(TEXT_URI, 'name')] except Exception, e: msg = "Error getting bookmark name. 
Error: " + str (e) self.log.error (msg) raise OpenOfficeFormatException (msg) self.writeData() self.result.startElement ('a', ' name="%s"' % bookmarkName) self.closeTagsStack.append ('a') # Keep the effective style as-is self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) elif (realName == 'line-break'): self.writeData() self.result.lineBreak() self.closeTagsStack.append (None) # Keep the effective style as-is self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) elif (realName == 's'): # An extra space or two # Remove the leading space if possible so that we can output '  ' instead of '  ' removedSpace = 0 if (len (self.charData) > 0): if (self.charData [-1][-1] == u" "): self.charData [-1] = self.charData [-1][:-1] removedSpace = 1 self.writeData() count = int (atts.get ((TEXT_URI, 'c'), 1)) if (self.preserveSpaces): for spaces in xrange (count): self.result.nonbreakingSpace() if (removedSpace): # Add it back now self.charData.append (u" ") # Keep the effective style as-is, and ignore the close element self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) self.closeTagsStack.append (None) else: # We have no HTML output associated with this OO tag. self.closeTagsStack.append (None) # Keep the effective style as-is self.effectiveStyleStack.append (self.effectiveStyleStack[-1]) def endElementNS (self, name, qname): if (len (self.closeTagsStack) > 0): htmlTag = self.closeTagsStack.pop() if (htmlTag is not None): self.writeData() if (type (htmlTag) == type ([])): for a in htmlTag: self.result.endElement (a) else: self.result.endElement (htmlTag) # Remove this effective style. self.effectiveStyleStack.pop() if (name[1] == 'footnote' or name[1] == 'endnote'): # We have just closed a footnote or endnote - record the result, pop the stack. 
outputFile = self.result.getOutput() self.footnotes.append (outputFile.getvalue()) outputFile.close() self.result = self.resultStack.pop() def characters (self, data): if (self.cleanSmartQuotes): data = data.replace (u'\u201c', '"') data = data.replace (u'\u201d', '"') if (self.cleanHyphens): data = data.replace (u'\u2013', '-') self.charData.append (data) def writeData (self): data = u"".join (self.charData) self.result.write (cgi.escape (data)) self.charData = [] def getCSSStyle (self, applicableStyle, styleList): #self.log.debug ("Filtering styles %s for styles %s" % (str (applicableStyle), str (styleList))) textDecoration = [] cssStyles = [] # Take a look at the effective styles. effectiveStyles = self.effectiveStyleStack [-1] # Store the new effective style for future comparison newEffectiveStyle = {} newEffectiveStyle.update (effectiveStyles) for style in styleList: if (applicableStyle.has_key (style)): if (style in ["underline", "line-through", "overline"]): if (not effectiveStyles.has_key (style)): textDecoration.append (style) else: # We check to see whether the effective style already has this value # I.e. handle paragraph of font-style=normal and span of font-style=normal styleValue = applicableStyle [style] if (effectiveStyles.has_key (style)): if (effectiveStyles[style] != styleValue): cssStyles.append (u"%s:%s" % (style, styleValue)) else: #self.log.debug ("Style %s already in effect with value %s" % (style, styleValue)) pass else: cssStyles.append (u"%s:%s" % (style, styleValue)) # Note this new effective style newEffectiveStyle [style] = styleValue if (len (textDecoration) > 0): cssStyles.append (u"text-decoration: %s" % u",".join (textDecoration)) #self.log.debug ("Adding real effective style (%s) to stack." 
% str (newEffectiveStyle))
        self.effectiveStyleStack.append (newEffectiveStyle)

        cssStyleList = ";".join (cssStyles)
        if (len (cssStyleList) > 0):
            return ' style="%s"' % cssStyleList
        return ''

    def getContent (self):
        return self.result.getOutput().getvalue()

    def getFootNotes (self):
        return self.footnotes

class DrawHandler:
    def __init__ (self, styleHandler, textHandler, config):
        self.log = logging.getLogger ("PubTal.OOC.DrawHandler")
        self.styleHandler = styleHandler
        self.result = textHandler.result
        self.textHandler = textHandler
        self.charData = []

        # The effectiveStyleStack holds the effective style (e.g. paragraph) and is used to filter out
        # un-needed style changes.
        self.effectiveStyleStack = [DEFAULT_PARAGRAPH_STYLES]

        self.closeTagsStack = []
        self.bundledPictureList = []
        self.currentImage = None

        # Check for the kind of output we are generating
        self.cleanSmartQuotes = config.get ('CleanSmartQuotes', 0)
        self.cleanHyphens = config.get ('CleanHyphens', 0)
        self.picturePrefix = os.path.join ('Pictures', config.get ('DestinationFile', '').replace ('.', '_'))
        self.log.debug ("Determined picture prefix as %s" % self.picturePrefix)

    def getBundledPictures (self):
        return self.bundledPictureList

    def startElementNS (self, name, qname, atts):
        theURI = name [0]
        realName = name [1]
        if (theURI == DRAW_URI):
            if (realName == 'image'):
                styleName = atts.get ((DRAW_URI, 'style-name'), None)
                href = atts.get ((XLINK_URI, 'href'), None)
                if (href is None):
                    self.log.warn ("No href attribute found for image!")
                    # Push a placeholder so endElementNS can still pop one entry for this element.
                    self.closeTagsStack.append (None)
                    return
                # Deal with bundled pictures
                if (href.startswith ('#Pictures/')):
                    self.log.debug ("Found bundled picture %s" % href)
                    archivePicName = href [1:]
                    href = self.picturePrefix + archivePicName[9:]
                    self.bundledPictureList.append ((archivePicName, href))
                alt = atts.get ((DRAW_URI, 'name'), None)
                self.currentImage = {'href': href, 'alt': alt}
                self.closeTagsStack.append (None)
            elif (realName == 'a'):
                linkDest = atts.get ((XLINK_URI, 'href'), None)
                if (linkDest is not
None): self.textHandler.writeData() self.result.startElement ('a', ' href="%s"' % linkDest) self.closeTagsStack.append ('a') else: self.closeTagsStack.append (None) elif (theURI == SVG_URI): if (realName == 'desc'): self.charData = [] self.closeTagsStack.append (None) else: self.closeTagsStack.append (None) def endElementNS (self, name, qname): if (len (self.closeTagsStack) > 0): htmlTag = self.closeTagsStack.pop() if (htmlTag is not None): self.result.endElement (htmlTag) # Remove this effective style. #self.effectiveStyleStack.pop() theURI = name [0] realName = name [1] if (theURI == SVG_URI): if (realName == 'desc'): # We have an image description - note it! altText = cgi.escape (u"".join (self.charData)) self.charData = [] if (self.currentImage is not None): self.currentImage ['alt'] = altText elif (theURI == DRAW_URI): if (realName == 'image'): self.textHandler.writeData() self.result.startElement ('img', ' src="%s" alt="%s"' % (self.currentImage ['href'], self.currentImage ['alt'])) self.result.endElement ('img') self.currentImage = None def characters (self, data): if (self.cleanSmartQuotes): data = data.replace (u'\u201c', '"') data = data.replace (u'\u201d', '"') if (self.cleanHyphens): data = data.replace (u'\u2013', '-') self.charData.append (data) class TableHandler: def __init__ (self, styleHandler, resultWriter, config): self.log = logging.getLogger ("PubTal.OOC.TextHandler") self.styleHandler = styleHandler self.result = resultWriter self.closeTagsStack = [] self.tableStatusStack = [] def startElementNS (self, name, qname, atts): #self.log.debug ("Start: %s" % name[1]) realName = name [1] styleName = atts.get ((TABLE_URI, 'style-name'), None) if (realName == 'table' or realName == 'sub-table'): self.result.startElement ('table') self.closeTagsStack.append ('table') self.tableStatusStack.append ({'inHeader':0, 'firstRow': 1}) elif (realName == 'table-header-rows'): status = self.tableStatusStack [-1] status ['inHeader'] = 1 self.result.startElement 
('thead') self.closeTagsStack.append ('thead') elif (realName == 'table-row'): status = self.tableStatusStack [-1] if ((not status ['inHeader']) and (status ['firstRow'])): status ['firstRow'] = 0 self.result.startElement ('tbody') self.result.startElement ('tr') self.closeTagsStack.append ('tr') elif (realName == 'table-cell'): status = self.tableStatusStack [-1] colSpan = int (atts.get ((TABLE_URI, 'number-columns-spanned'), 0)) if (colSpan != 0): colSpanTxt = ' colspan="%s"' % str (colSpan) else: colSpanTxt = '' if (status ['inHeader']): self.result.startElement ('th', colSpanTxt) self.closeTagsStack.append ('th') else: self.result.startElement ('td', colSpanTxt) self.closeTagsStack.append ('td') else: self.closeTagsStack.append (None) def endElementNS (self, name, qname): realName = name [1] # We check for table because we want to insert tbody close before table close. if (len (self.tableStatusStack) > 0): status = self.tableStatusStack [-1] if (realName == 'table' or realName == 'sub-table'): if (not status ['firstRow']): # The table actually had content. self.result.endElement ('tbody') if (len (self.closeTagsStack) > 0): htmlTag = self.closeTagsStack.pop() if (htmlTag is not None): self.result.endElement (htmlTag) # We check for table header rows here. if (realName == 'table-header-rows'): status ['inHeader'] = 0 if (realName == 'table'): # Pop this table status off the stack self.tableStatusStack.pop() def characters (self, data): pass class OpenDocumentFormatException (Exception): pass PubTal-3.5/lib/pubtal/__init__.py0000644000105000010500000000013011555340743015477 0ustar cms103cms103__version__ = "3.5" #Use for testing against the output of 3.1.1 #__version__ = "3.1.3" PubTal-3.5/lib/pubtal/ConfigurationParser.py0000644000105000010500000001026511555340743017736 0ustar cms103cms103""" Configuration Parser - Part of PubTal. Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/) All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. If you make any bug fixes or feature enhancements please let me know! 
""" import re try: import logging except: import InfoLogging as logging class ConfigurationParser: def __init__ (self): self.handlers = {} self.directiveRegex = re.compile ('^\s*<([^ /]+)(.*)>$') self.endDirective = re.compile ('^\s*$') self.commentRegex = re.compile ('(^\s*#.*$)|(^\W*$)') self.log = logging.getLogger ("PubTal.ConfigurationParser") self.defaultHandler = None def addTopLevelHandler (self, directive, handler): self.handlers [directive.upper()] = handler def setDefaultHandler (self, handler): self.defaultHandler = handler def parse (self, fileStream): directiveStack = [] currentHandler = None for line in fileStream.readlines(): lineHandled = 0 match = self.directiveRegex.match (line) if (match is not None): directiveStack.append (match.group(1).upper()) if (currentHandler is None): # Do we have an handler? handler = self.handlers.get (match.group(1).upper(), None) if (handler is not None): # We have a good handler for this! currentHandler = handler currentHandler.startDirective (match.group(1).upper(), match.group(2).strip()) else: if (self.defaultHandler is not None): self.defaultHandler.startDirective (match.group(1).upper(), match.group(2).strip()) else: self.log.warn ("Handler not found for directive %s" % match.group(1)) else: # We already have a handler, just pass them the nested directive. currentHandler.startDirective (match.group(1).upper(), match.group(2).strip()) lineHandled = 1 match = self.endDirective.match (line) if (not lineHandled and match is not None): # Pop off a directive. 
looking = 1 while (looking and len (directiveStack) > 0): lastDir = directiveStack.pop() if (lastDir == match.group(1).upper()): looking = 0 else: self.log.warn ("Un-closed directive tag: %s (looking for %s)" % (lastDir, match.group(1).upper())) if (currentHandler is not None): currentHandler.endDirective (lastDir) elif (self.defaultHandler is not None): self.defaultHandler.endDirective (lastDir) if (currentHandler is not None): currentHandler.endDirective (match.group(1).upper()) if (len (directiveStack) == 0): currentHandler = None elif (self.defaultHandler is not None): self.defaultHandler.endDirective (match.group(1).upper()) lineHandled = 1 if (not lineHandled): if (not self.commentRegex.match (line)): if (currentHandler is not None): currentHandler.option (line.strip()) elif (self.defaultHandler is not None): self.defaultHandler.option (line.strip()) PubTal-3.5/lib/pubtal/timeformat.py0000644000105000010500000003677611555340743016140 0ustar cms103cms103""" TimeFormat -------------------------------------------------------------------- Copyright (c) 2004 Colin Stewart (http://www.owlfish.com/) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. """ import re, time, locale, string, math, os __version__ = "1.1.0" goodLocaleModule = 1 for attribute in ['DAY_1', 'ABDAY_1', 'MON_1', 'ABMON_1', 'nl_langinfo']: if (not hasattr (locale, attribute)): goodLocaleModule = 0 # We only use this alternative locale implementation if we have to. class alternativeLocale: def __init__ (self): self.__counter__ = 0 self.__lookupMap__ = {} self.__getDays__ () self.__getMonths__ () def nl_langinfo (self, aconstant): # __lookupMap__ is a dictionary, so it must be indexed rather than called. return self.__lookupMap__ [aconstant] def __getDays__ (self): # The reference date we use to build up the locale data # The 24th of May 2004 was a Monday refDate = [2004,5,24, 00, 00, 00, 0, 146, 0] day_keys = [] abday_keys = [] for weekDay in range (0,7): # Get the long and the short locale for this day longDay = time.strftime ('%A', refDate) shortDay = time.strftime ('%a', refDate) self.__lookupMap__ [self.__counter__] = longDay day_keys.append (self.__counter__) self.__counter__ = self.__counter__ + 1 self.__lookupMap__ [self.__counter__] = shortDay abday_keys.append (self.__counter__) self.__counter__ = self.__counter__ + 1 # Move on to the next day refDate [2] = refDate [2] + 1 refDate [6] = refDate [6] + 1 refDate [7] = refDate [7] + 1 # Fill out the constants at the module level # DAY_1 is a Sunday, so do that last.
self.DAY_2 = day_keys[0] self.DAY_3 = day_keys[1] self.DAY_4 = day_keys[2] self.DAY_5 = day_keys[3] self.DAY_6 = day_keys[4] self.DAY_7 = day_keys[5] self.DAY_1 = day_keys[6] # The short day constants self.ABDAY_2 = abday_keys[0] self.ABDAY_3 = abday_keys[1] self.ABDAY_4 = abday_keys[2] self.ABDAY_5 = abday_keys[3] self.ABDAY_6 = abday_keys[4] self.ABDAY_7 = abday_keys[5] self.ABDAY_1 = abday_keys[6] def __getMonths__ (self): # The reference date we use to build up the locale data month_keys = [] abmonth_keys = [] for month in range (1,13): # Get a time module format date for this month refDate = time.strptime ("%s 2004" % str (month), '%m %Y') # Get the long and the short locale for this month longMonth = time.strftime ('%B', refDate) shortMonth = time.strftime ('%b', refDate) self.__lookupMap__ [self.__counter__] = longMonth month_keys.append (self.__counter__) self.__counter__ = self.__counter__ + 1 self.__lookupMap__ [self.__counter__] = shortMonth abmonth_keys.append (self.__counter__) self.__counter__ = self.__counter__ + 1 # Fill out the constants at the module level # MON_1 is January self.MON_1 = month_keys[0] self.MON_2 = month_keys[1] self.MON_3 = month_keys[2] self.MON_4 = month_keys[3] self.MON_5 = month_keys[4] self.MON_6 = month_keys[5] self.MON_7 = month_keys[6] self.MON_8 = month_keys[7] self.MON_9 = month_keys[8] self.MON_10 = month_keys[9] self.MON_11 = month_keys[10] self.MON_12 = month_keys[11] # The short constants self.ABMON_1 = abmonth_keys[0] self.ABMON_2 = abmonth_keys[1] self.ABMON_3 = abmonth_keys[2] self.ABMON_4 = abmonth_keys[3] self.ABMON_5 = abmonth_keys[4] self.ABMON_6 = abmonth_keys[5] self.ABMON_7 = abmonth_keys[6] self.ABMON_8 = abmonth_keys[7] self.ABMON_9 = abmonth_keys[8] self.ABMON_10 = abmonth_keys[9] self.ABMON_11 = abmonth_keys[10] self.ABMON_12 = abmonth_keys[11] if (goodLocaleModule): localeModule = locale else: # We are on Windows or some other platform with an incomplete locale module localeModule = alternativeLocale ()
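# Illustrative sketch only, not part of PubTal: the fallback class above obtains localised day and month names by formatting a known reference date with time.strftime. The helper name weekday_names and the day-of-year value 145 are assumptions made for this example, and it is written for current Python rather than the Python 2 this module targets.

```python
import time

def weekday_names(fmt='%A'):
    """Return seven weekday names, Monday first, as produced by time.strftime.

    fmt is '%A' for long names or '%a' for abbreviated names.
    """
    names = []
    for offset in range(7):
        # 24 May 2004 was a Monday (tm_wday 0); it is day 145 of a leap year.
        ref = (2004, 5, 24 + offset, 0, 0, 0, offset, 145 + offset, 0)
        names.append(time.strftime(fmt, ref))
    return names

long_names = weekday_names('%A')
short_names = weekday_names('%a')
```

# The two lists correspond to the long and short entries that the fallback stores in its lookup map; the actual strings depend on the active LC_TIME locale.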
# Regex for our date/time format strings. Format is: %TYPE[\[MODIFIER\]] # %% is used to escape the % regex = re.compile ('(%[%abcCdHIjmMnpPrStTuUwWxXyYZz])(\[[^\]]*\])?') strftime_regex = re.compile ('((? 12): resultBuf.append (_getNumber_ (hour - 12, formatModifier, '[0]')) else: resultBuf.append (_getNumber_ (hour, formatModifier, '[0]')) elif (formatType == '%j'): # Day of year as a number resultBuf.append (_getNumber_ (ourTime[7], formatModifier, '[0]', 3)) elif (formatType == '%m'): # Month of year as a number resultBuf.append (_getNumber_ (ourTime[1], formatModifier, '[0]')) elif (formatType == '%M'): # Minute as a number resultBuf.append (_getNumber_ (ourTime[4], formatModifier, '[0]')) elif (formatType == '%n'): resultBuf.append (os.linesep) elif (formatType == '%t'): resultBuf.append ('\t') elif (formatType == '%r'): try: ampmFmt = localeModule.nl_langinfo (localeModule.T_FMT_AMPM) except: ampmFmt = DEFAULT_T_FMT_AMPM # Now get a translation and expansion of this. resultBuf.append (strftime (ampmFmt, ourTime)) elif (formatType == '%c'): try: prefFmt = localeModule.nl_langinfo (localeModule.D_T_FMT) except: prefFmt = DEFAULT_D_T_FMT # Now get a translation and expansion of this. resultBuf.append (strftime (prefFmt, ourTime)) elif (formatType == '%x'): try: prefFmt = localeModule.nl_langinfo (localeModule.D_FMT) except: prefFmt = DEFAULT_D_FMT # Now get a translation and expansion of this. resultBuf.append (strftime (prefFmt, ourTime)) elif (formatType == '%X'): try: prefFmt = localeModule.nl_langinfo (localeModule.T_FMT) except: prefFmt = DEFAULT_T_FMT # Now get a translation and expansion of this. 
resultBuf.append (strftime (prefFmt, ourTime)) elif (formatType == '%S'): # Second as a number resultBuf.append (_getNumber_ (ourTime[5], formatModifier, '[0]')) elif (formatType == '%w'): # Day of week as number, Sunday = 0 weekDayNum = ourTime [6] + 1 if (weekDayNum == 7): weekDayNum = 0 resultBuf.append (str (weekDayNum)) elif (formatType == '%u'): # Day of week as number, Monday = 1 weekDayNum = ourTime [6] + 1 resultBuf.append (str (weekDayNum)) elif (formatType == '%y'): # 2 digit year year = int (str (ourTime [0])[2:]) resultBuf.append (_getNumber_ (year, formatModifier, '[0]')) elif (formatType == '%Y'): # 4 digit year resultBuf.append (str (ourTime [0])) elif (formatType == '%C'): # 2 digit century resultBuf.append (str (ourTime [0])[0:2]) elif (formatType == '%p' or formatType == '%P'): # Have to fall back on the 'C' version resultBuf.append (time.strftime (formatType, ourTime)) elif (formatType == '%U' or formatType == '%W'): # Fall back to 'C' version, but still allow the extra modifier. resultBuf.append (_getNumber_ (int (time.strftime (formatType, ourTime)), formatModifier, '[0]')) elif (formatType == '%z'): # W3C Timezone format resultBuf.append (_getTimeZone_ (w3cFormat = 1, utctime = utctime)) elif (formatType == '%T'): # hhmm timezone format resultBuf.append (_getTimeZone_ (w3cFormat = 0, utctime = utctime)) elif (formatType == '%Z'): # TLA timezone format if (utctime): resultBuf.append ('UTC') elif (ourTime [8] == 1): resultBuf.append (time.tzname[1]) else: resultBuf.append (time.tzname[0]) elif (formatType == '%%'): resultBuf.append ('%') else: # Silently ignore the error - print as litteral if (formatModifier is None): resultBuf.append ("%s" % formatType) else: resultBuf.append ("%s%s" % (formatType, formatModifier)) last = match.end() match = regex.search (informat, last) resultBuf.append (informat [last:]) return u"".join (resultBuf) def strftime (informat, intime = None): """ Provides a backwards-compatible strftime implementation. 
This converts strftime format codes into TimeFormat codes, and then expands them using format() """ resultBuf = [] position = 0 last = 0 match = strftime_regex.search (informat) while (match): resultBuf.append (informat [last:match.start()]) formatType = match.group(1) if (STRFMAP.has_key (formatType)): resultBuf.append (STRFMAP [formatType]) else: # Silently ignore the error - print as litteral resultBuf.append ("%s" % formatType) last = match.end() match = strftime_regex.search (informat, last) resultBuf.append (informat [last:]) # Now expand the TimeFormat string return format (u"".join (resultBuf), intime) def _getWeekday_ (dayOfWeek, formatModifier): constantList = DAY_WEEK_LONG if (formatModifier == '[SHORT]'): constantList = DAY_WEEK_SHORT localeConst = constantList [dayOfWeek] try: weekDay = localeModule.nl_langinfo (localeConst) return weekDay except: # nl_langinfo not supported return DEFAULT_DAY_WEEK [localeConst] def _getMonth_ (monthNum, formatModifier): constantList = MONTH_LONG if (formatModifier == '[SHORT]'): constantList = MONTH_SHORT # Months are 1-12 not 0-11 localeConst = constantList [monthNum-1] try: monthName = localeModule.nl_langinfo (localeConst) return monthName except: # nl_langinfo not supported return DEFAULT_MONTH [localeConst] def _getNumber_ (theNumber, formatModifier, defaultModifier, cols=2): " Returns a positive digit number either as-is, or padded" if (formatModifier is None): formatModifier = defaultModifier # By default do no padding padding = 0 if (formatModifier == '[NP]'): padding = 0 elif (formatModifier == '[SP]'): padding = 1 padder = " " elif (formatModifier == '[0]'): padding = 1 padder = "0" if (padding == 0): return str (theNumber) ourNum = str (theNumber) return "%s%s" % (padder * (cols - len (ourNum)), ourNum) def _getTimeZone_ (w3cFormat, utctime): if (utctime): if (w3cFormat): return "Z" return "-0000" # Work out the timezone in +/-HH:MM format. 
if (time.daylight): offset = time.altzone else: offset = time.timezone absoffset = abs (offset) hours = int (math.floor (absoffset/3600.0)) mins = int (math.floor ((absoffset - (hours * 3600))/60.0)) if (offset > 0): thesign = "-" else: thesign = "+" if (w3cFormat): return "%s%s:%s" % (thesign, string.zfill (hours,2), string.zfill (mins, 2)) else: return "%s%s%s" % (thesign, string.zfill (hours,2), string.zfill (mins, 2)) PubTal-3.5/lib/pubtal/HTMLWriter.py0000644000105000010500000002747611555340743015727 0ustar cms103cms103import copy, StringIO, re try: import logging except: import InfoLogging as logging import dtdcode # HTML Class uses this to suppress end-tag output, XHTML class uses these to write singletons. TAG_OPTIONAL=1 TAG_REQUIRED=0 TAG_FORBIDDEN=2 MLT_SPACE = re.compile (' +') NBSP_REF = " " class TagNotAllowedException (Exception): def __init__ (self, tag, stack): stackMsg = [] self.tag = tag for oldtag, atts in stack: stackMsg.append ("<%s%s>" % (oldtag, atts)) stackMsg = " ".join (stackMsg) self.msg = "Tag %s not allowed here: %s" % (tag, stackMsg) def getTag (self): return self.tag def __str__ (self): return self.msg class BadCloseTagException (Exception): def __init__ (self, tag, stack, expected=None): stackMsg = [] self.tag = tag for oldtag, atts in stack: stackMsg.append ("<%s%s>" % (oldtag, atts)) stackMsg = " ".join (stackMsg) if (expected is None): self.msg = "Close tag %s has no corresponding open tag. (Elements currently open are: %s)" % (tag, stackMsg) else: self.msg = "Received close tag %s when expecting %s. (Elements currently open are: %s)" % (tag, expected, stackMsg) def getTag (self): return self.tag def __str__ (self): return self.msg class HTMLWriter: """ The purpose of this class is to provide a simple way of writing valid HTML fragements. The class has enough logic to keep track of simple HTML rules (e.g.
<p>
elements can not be nested), and to silently correct when an attempt is made to write HTML that would break those rules. The class will not enforce rules such as only having no more than one element. WARNING: All Start calls must be matched by End calls, this class will not magically nest elements correctly! outputStream - File like object to write output to. outputXHTML - Whether to generate XHTML or HTML tags. preserveSpaces - If true then   will be inserted into the output as required. """ def __init__ (self, outputStream=None, outputXHTML=1, preserveSpaces = 1, exceptionOnError=0): if (outputStream is None): self.output = StringIO.StringIO() else: self.output = outputStream self.exceptionOnError = exceptionOnError self.log = logging.getLogger ("HTMLWriter") self.debugOn = self.log.isEnabledFor (logging.DEBUG) self.allowedElementsStack = [] self.currentElementsStack = [] self.currentText = [] self.skipDepth = 0 self.outputXHTML = outputXHTML self.preserveSpaces = preserveSpaces self.dataLength = 0 self.log.debug ("XHTML Status :%s " % str (self.outputXHTML)) if (self.outputXHTML): self.tagmap = dtdcode.XHTML_TAG_MAP self.blocklist = dtdcode.XHTML_BLOCK_LIST self.log.info ("Selected XHTML tag map.") else: self.tagmap = dtdcode.HTML_TAG_MAP self.blocklist = dtdcode.HTML_BLOCK_LIST self.log.info ("Selected HTML tag map.") def getOutput (self): self.flush() return self.output def startElement (self, elementName, attributes=""): if (self.skipDepth != 0): self.skipDepth += 1 return if (not self.__checkAllowed__ (elementName)): self.skipDepth += 1 self.log.warn ("Element %s not allowed." 
% elementName) if (self.log.isEnabledFor (logging.DEBUG)): self.log.debug ("Allowed stack follows:") for nest in self.allowedElementsStack: self.log.debug ("Elements %s allowed" % str (nest)) if (self.exceptionOnError): raise TagNotAllowedException (elementName, self.currentElementsStack) return if (self.debugOn): self.log.debug ("Writing start element %s atts: %s" % (elementName, attributes)) # Write out any pending data if (self.currentText): self.__outputText__() try: allowedTags, endTagPolicy = self.tagmap [elementName] # Only put the tag on the stacks if it *can* have an end tag. if (endTagPolicy != TAG_FORBIDDEN): self.allowedElementsStack.append (allowedTags) self.currentElementsStack.append ((elementName, attributes)) if (self.outputXHTML and endTagPolicy == TAG_FORBIDDEN): self.output.write (u'<%s%s />' % (elementName, attributes)) else: self.output.write (u'<%s%s>' % (elementName, attributes)) except KeyError, e: msg = "HTML element %s is not supported!" % elementName self.log.warn (msg) if (self.exceptionOnError): raise Exception (msg) def endElement (self, elementName): if (self.skipDepth != 0): self.log.warn ("End element %s is not allowed (start tag was suppressed)." % elementName) self.skipDepth -= 1 return if (self.currentText): self.__outputText__() # Ensure that this type of end element is allowed, if not then ignore it. allowedTags, endTagPolicy = self.tagmap [elementName] if (endTagPolicy == TAG_FORBIDDEN): self.log.debug ("End tag %s forbidden, skipping." 
% elementName) return if (len (self.currentElementsStack) == 0): raise BadCloseTagException (elementName, self.currentElementsStack) self.allowedElementsStack.pop() expectedElement = self.currentElementsStack.pop()[0] if (elementName != expectedElement): looking = 1 while (looking): expectedElementTagPolicy = self.tagmap [expectedElement][1] self.log.debug ("Tag %s has policy %s" % (expectedElement, str (expectedElementTagPolicy))) # No end-tag-forbidden elements ever go on the stack, which means we have an un-closed tag! if (self.exceptionOnError): raise BadCloseTagException (elementName, self.currentElementsStack, expectedElement) self.log.warn ("Closing un-closed tag %s" % expectedElement) if (elementName in self.blocklist): self.output.write (u'</%s>\n' % expectedElement) else: self.output.write (u'</%s>' % expectedElement) if (len (self.currentElementsStack) > 0): self.allowedElementsStack.pop() expectedElement = self.currentElementsStack.pop()[0] if (expectedElement == elementName): looking = 0 else: raise BadCloseTagException (elementName, self.currentElementsStack) if (self.debugOn): self.log.debug ("Writing end element %s" % elementName) if (elementName in self.blocklist): self.output.write (u'</%s>\n' % elementName) else: self.output.write (u'</%s>' % elementName) def write (self, data): if (self.skipDepth == 0): if (self.preserveSpaces): # Ensure that we are allowed to write text into this element. if (len (self.currentElementsStack) > 0): if (dtdcode.TEXT_ALLOWED_MAP.has_key (self.currentElementsStack [-1][0])): # We are allowed text, so feel free to do the &nbsp; thing. self.currentText.append (data) else: # We could just suppress this output, but it might include white space that is good for formatting. # Using a regex to validate that would allow for even more checking! self.output.write (data) else: # We have no open element, assume that this is OK.
self.currentText.append (data) else: self.output.write (data) self.dataLength += len (data) def flush (self): """ Write out any cached data. """ if (self.currentText): self.__outputText__() def lineBreak (self): if (not self.__checkAllowed__ ('br')): return if (self.currentText): self.__outputText__() if (self.outputXHTML): self.output.write (u'
<br />\n') else: self.output.write (u'<br>
\n') def nonbreakingSpace (self): if (self.skipDepth == 0): if (self.preserveSpaces): self.currentText.append (NBSP_REF) else: self.output.write (NBSP_REF) def getCurrentElementStack (self): return self.currentElementsStack def isElementAllowed (self, tagName): return self.__checkAllowed__ (tagName) def isEndTagForbidden (self, tagName): allowedTags, endTagPolicy = self.tagmap [tagName] if (endTagPolicy == TAG_FORBIDDEN): return 1 def getDataLength (self): return self.dataLength def __outputText__ (self): # Get one big text string realText = "".join (self.currentText) # Determine whether there are any double spaces to escape. match = MLT_SPACE.search (realText) pos = 0 while (match): # Output the current chunk of text start, end = match.start(), match.end() self.output.write (realText [pos:start]) self.output.write (NBSP_REF*(end - start - 1) + " ") pos = end match = MLT_SPACE.search (realText, pos) self.output.write (realText [pos:]) self.currentText = [] def __checkAllowed__ (self, tagName): if (len (self.allowedElementsStack) == 0): return 1 if tagName in self.allowedElementsStack[-1]: return 1 return 0 class PlainTextWriter (HTMLWriter): """ This class works like the HTMLWriter class, except that it doesn't output any markup. This is useful when we need to produce output suitable for including in an RSS feed. The class still enforces correct nesting of elements so that callers can rely on this logic. """ def __init__ (self, outputStream=StringIO.StringIO(), outputXHTML=1, preserveSpaces=1, exceptionOnError=0): HTMLWriter.__init__ (self, outputStream, outputXHTML, preserveSpaces = 0, exceptionOnError = exceptionOnError) self.log = logging.getLogger ("PlainTextWriter") def startElement (self, elementName, attributes=""): if (self.skipDepth != 0): self.skipDepth += 1 return if (not self.__checkAllowed__ (elementName)): self.skipDepth += 1 self.log.warn ("Element %s not allowed." 
% elementName) if (self.log.isEnabledFor (logging.DEBUG)): self.log.debug ("Allowed stack follows:") for nest in self.allowedElementsStack: self.log.debug ("Elements %s allowed" % str (nest)) if (self.exceptionOnError): raise TagNotAllowedException (elementName, self.currentElementsStack) return if (self.currentText): self.__outputText__() try: allowedTags, endTagPolicy = self.tagmap [elementName] # Only put the tag on the stacks if it *can* have an end tag. if (endTagPolicy != TAG_FORBIDDEN): self.allowedElementsStack.append (allowedTags) self.currentElementsStack.append ((elementName, attributes)) except KeyError, e: msg = "HTML element %s is not supported!" % elementName self.log.warn (msg) if (self.exceptionOnError): raise Exception (msg) def endElement (self, elementName): if (self.skipDepth != 0): self.log.warn ("End element %s is not allowed (start tag was suppressed)." % elementName) self.skipDepth -= 1 return if (self.currentText): self.__outputText__() # Ensure that this type of end element is allowed, if not then ignore it. allowedTags, endTagPolicy = self.tagmap [elementName] if (endTagPolicy == TAG_FORBIDDEN): self.log.debug ("End tag %s forbidden, skipping." % elementName) return if (len (self.currentElementsStack) == 0): raise BadCloseTagException (elementName, self.currentElementsStack) self.allowedElementsStack.pop() expectedElement = self.currentElementsStack.pop()[0] if (elementName != expectedElement): looking = 1 while (looking): expectedElementTagPolicy = self.tagmap [expectedElement][1] self.log.debug ("Tag %s has policy %s" % (expectedElement, str (expectedElementTagPolicy))) # No end-tag-forbidden elements ever go on the stack, which means we have an un-closed tag! 
if (self.exceptionOnError): raise BadCloseTagException (elementName, self.currentElementsStack, expectedElement) self.log.warn ("Closing un-closed tag %s" % expectedElement) if (elementName in self.blocklist): self.output.write (u'\n') if (len (self.currentElementsStack) > 0): self.allowedElementsStack.pop() expectedElement = self.currentElementsStack.pop()[0] if (expectedElement == elementName): looking = 0 else: raise BadCloseTagException (elementName, self.currentElementsStack) if (elementName in self.blocklist): self.output.write (u'\n') def lineBreak (self): if (self.currentText): self.__outputText__() self.output.write (u'\n') def write (self, data): if (self.skipDepth == 0): self.output.write (data) self.dataLength += len (data) PubTal-3.5/lib/pubtal/dtdcode.py0000644000105000010500000012421011555340743015354 0ustar cms103cms103""" Lookup maps to assist in generating valid HTML. Each key is a lower case tag name. Values are a tuple of two values: a list of valid children and a number. 
Numbers have the following significance: 0 - Close tag is required 1 - Close tag is optional 2 - Close tag is forbidden """ HTML_TAG_MAP = { 'dir': (['li'], 0) ,'col': ([], 2) ,'tt': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'div': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'p': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 1) ,'iframe': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'del': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 
'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'applet': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'caption': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'q': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'isindex': ([], 2) ,'button': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'hr', 'table', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo'], 0) ,'i': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'colgroup': (['col'], 1) 
,'textarea': ([], 0) ,'center': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'script': ([], 0) ,'b': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'span': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'a': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'thead': (['tr'], 1) ,'legend': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'strong': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 
'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'dt': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 1) ,'var': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'optgroup': (['option'], 0) ,'address': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'br': ([], 2) ,'abbr': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'link': ([], 2) ,'base': ([], 2) ,'ul': (['li'], 0) ,'u': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'object': (['param', 'p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 
'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'dd': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 1) ,'big': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'basefont': ([], 2) ,'form': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'li': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 
'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 1) ,'fieldset': (['legend', 'p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'blockquote': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address'], 0) ,'dl': (['dt', 'dd'], 0) ,'map': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'area'], 0) ,'kbd': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'cite': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'samp': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 
'button'], 0) ,'td': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 1) ,'input': ([], 2) ,'strike': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'acronym': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'th': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 1) ,'tfoot': (['tr'], 1) ,'dfn': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'label': 
(['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'param': ([], 2) ,'tbody': (['tr'], 1) ,'tr': (['th', 'td'], 1) ,'bdo': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'menu': (['li'], 0) ,'area': ([], 2) ,'img': ([], 2) ,'sub': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'ol': (['li'], 0) ,'style': ([], 0) ,'h5': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'noscript': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'noframes': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 
'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button', 'body'], 0) ,'select': (['optgroup', 'option'], 0) ,'font': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'table': (['caption', 'col', 'colgroup', 'thead', 'tfoot', 'tbody', 'tr'], 0) ,'option': ([], 1) ,'s': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'sup': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'pre': (['tt', 'i', 'b', 'u', 's', 'strike', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'applet', 'font', 'basefont', 'br', 'script', 'map', 'q', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'ins': (['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 
'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h4': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h6': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h1': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h3': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h2': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'hr': ([], 2) ,'code': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 
'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'small': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'em': (['tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'a', 'img', 'applet', 'object', 'font', 'basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe', 'input', 'select', 'textarea', 'label', 'button'], 0)} XHTML_TAG_MAP = { 'dir': (['li'], 0) ,'col': ([], 2) ,'tt': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'div': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'p': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 1) ,'iframe': (['p', 
'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'del': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'applet': (['param', 'p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'caption': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'q': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 
'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'isindex': ([], 2) ,'button': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'table', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'noscript', 'ins', 'del', 'script'], 0) ,'i': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'colgroup': (['col'], 1) ,'textarea': ([], 0) ,'center': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'script': ([], 0) ,'b': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'span': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 
'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'a': (['br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'ins', 'del', 'script'], 0) ,'thead': (['tr'], 1) ,'legend': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'strong': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'dt': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 1) ,'var': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'optgroup': (['option'], 0) ,'address': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 
'select', 'textarea', 'label', 'button', 'ins', 'del', 'script', 'p'], 0) ,'br': ([], 2) ,'abbr': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'link': ([], 2) ,'base': ([], 2) ,'ul': (['li'], 0) ,'u': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'object': (['param', 'p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'dd': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 1) ,'big': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 
'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'basefont': ([], 2) ,'form': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'li': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 1) ,'fieldset': (['legend', 'p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'blockquote': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 
's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'dl': (['dt', 'dd'], 0) ,'map': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'noscript', 'ins', 'del', 'script', 'area'], 0) ,'kbd': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'cite': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'samp': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'td': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 1) ,'input': ([], 2) ,'strike': (['a', 'br', 'span', 'bdo', 
'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'acronym': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'th': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 1) ,'tfoot': (['tr'], 1) ,'dfn': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'label': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'param': ([], 2) ,'tbody': (['tr'], 1) ,'tr': (['th', 'td'], 1) ,'bdo': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 
'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'menu': (['li'], 0) ,'area': ([], 2) ,'img': ([], 2) ,'sub': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'ol': (['li'], 0) ,'style': ([], 0) ,'h5': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'noscript': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'noframes': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'select': (['optgroup', 'option'], 0) ,'font': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 
'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'table': (['caption', 'col', 'colgroup', 'thead', 'tfoot', 'tbody', 'tr'], 0) ,'option': ([], 1) ,'s': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'sup': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'pre': (['a', 'br', 'span', 'bdo', 'tt', 'i', 'b', 'u', 's', 'strike', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'input', 'select', 'textarea', 'label', 'button', 'ins', 'del', 'script'], 0) ,'ins': (['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table', 'form', 'a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button', 'noscript', 'ins', 'del', 'script'], 0) ,'h4': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h6': (['a', 'br', 
'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h1': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h3': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'h2': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'hr': ([], 2) ,'code': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'small': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0) ,'em': (['a', 'br', 'span', 'bdo', 'object', 'applet', 'img', 'map', 'iframe', 'tt', 'i', 'b', 'u', 's', 'strike', 'big', 'small', 'font', 'basefont', 'em', 'strong', 'dfn', 
'code', 'q', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym', 'sub', 'sup', 'input', 'select', 'textarea', 'label', 'button'], 0)}

HTML_BLOCK_LIST = ['p', 'ol', 'ul', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address']

XHTML_BLOCK_LIST = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'ul', 'ol', 'dl', 'menu', 'dir', 'pre', 'hr', 'blockquote', 'address', 'center', 'noframes', 'isindex', 'fieldset', 'table']

# A list of elements that are allowed to have text in them.
# Map version allows fast lookup.
TEXT_ALLOWED_LIST = ["tt", "i", "b", "big", "small", "em", "strong", "dfn", "code", "samp", "kbd", "var", "cite", "abbr", "acronym", "sub", "sup", "span", "bdo", "address", "a", "p", "h1", "h2", "h3", "h4", "h5", "h6", "pre", "q", "dt", "label", "legend", "caption", "div", "object", "ins", "del", "dd", "li", "fieldset", "button", "th", "td", "option", "textarea", "title"]
TEXT_ALLOWED_MAP = {}
for element in TEXT_ALLOWED_LIST:
    TEXT_ALLOWED_MAP [element] = 1

PubTal-3.5/lib/pubtal/EncodingCapabilities.py0000644000105000010500000000565511555340743020021 0ustar cms103cms103
""" Classes to determine and cache character set capabilities.

Copyright (c) 2003 Colin Stewart (http://www.owlfish.com/)
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products
   derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

If you make any bug fixes or feature enhancements please let me know!
"""

# Fall back to the bundled InfoLogging module when the standard
# logging module is not available.
try:
    import logging
except:
    import InfoLogging as logging

import codecs

# List of tests and capability names.
# Each test is a Unicode string that must encode cleanly for the
# named capability to be reported as supported.
tests = [(u'\u201c\u201d', 'SmartQuotes'), (u'\u2013', 'Hyphen')]

class EncodingCapabilities:
    def __init__ (self):
        """ Class for determining character set encoding capabilities."""
        self.log = logging.getLogger ("PubTal.SiteConfig")
        # Maps character set name to a dictionary of capability name to 0 or 1.
        self.cache = {}

    def getCapability (self, characterSet, capability):
        if (not self.cache.has_key (characterSet)):
            self.log.debug ("Cache miss for character set %s" % characterSet)
            self._getCapabilities_ (characterSet)
        else:
            self.log.debug ("Cache hit for character set %s" % characterSet)
        return self.cache [characterSet][capability]

    def _getCapabilities_ (self, characterSet):
        try:
            encoder = codecs.lookup (characterSet)[0]
        except Exception, e:
            self.log.error ("Character set %s not supported." % characterSet)
            raise e
        capabilities = {}
        self.log.debug ("Testing capabilities for character set %s" % characterSet)
        for testcase, testname in tests:
            try:
                self.log.info ("About to execute testcase: %s" % repr (testcase))
                # A strict encode raises an error if any character cannot be represented.
                encoder (testcase, "strict")
                capability = 1
                self.log.debug ("%s supported." % testname)
            except:
                capability = 0
                self.log.debug ("%s not supported." % testname)
            capabilities [testname] = capability
        self.cache [characterSet] = capabilities

PubTal-3.5/PKG-INFO0000644000105000010500000000052211555341012012421 0ustar cms103cms103
Metadata-Version: 1.0
Name: PubTal
Version: 3.5
Summary: A template driven web site builder for small sites.
Home-page: http://www.owlfish.com/software/PubTal/index.html
Author: Colin Stewart
Author-email: colin@owlfish.com
License: UNKNOWN
Description: PubTal is a template driven web site builder for small web sites.
Platform: UNKNOWN

PubTal-3.5/Changes.txt0000644000105000010500000001470111555340743013453 0ustar cms103cms103
PubTal Change Log
-----------------

Version 3.5
-----------
Bug fixes:
  - Updated Textile plugin to work with python-textile 2.1.4 (thanks to Rodrigo Gallardo for the fix)

Version 3.4
-----------
Bug fixes:
  - File separator characters (e.g. /) can now be used in weblog title names.
  - Replaced use of md5 module with hashlib
  - Wrapped all instances of a String being raised as an exception with the Exception class.

Version 3.3
-----------
New Features:
  - Changed example template for Atom to use content rather than summary tags. This enables better display of the feed in Sage.
  - Added new option "weblog-index-disabled" to allow the index page of a weblog to be disabled.
  - Made filename optional for catalogue entries when catalogue-build-pages disables item page build.

Bug fixes:
  - Added TR to the list of tags allowed within the TABLE element.

Version 3.2.1
-------------
New Features:
  - Sort files during FTP so that fewer directory changes are required.
  - Added a short sleep between FTP commands to improve reliability.

Bug fixes:
  - Changed MANIFEST.IN to include the Atom and RSS XML templates.

Version 3.2.0
-------------
New Features:
  - Changed "hostname" configuration keyword to "url-prefix", making the "absoluteDestinationURL" property available across all templates
  - Added support for Atom 1.0 to the weblog plugin
  - Updated "full-weblog" example atom.xml template to support Atom 1.0

Bug fixes:
  - Fixed stray "self" in SiteConfiguration (thanks to Luis Rodrigo Gallardo Cruz)
  - Switched from utf-16 to utf-8 internally to avoid broken SAX library issue

Version 3.1.3
-------------
New Features:
  - Special path "readFile" allows page-specific files to be included in the expanded content.

Bug fixes:
  - Textile plugin would fail because the Textile library performs Unicode conversion.

Version 3.1.2
-------------
Added the automatic generation of &nbsp; characters for HTMLText and OpenOffice content.

New Features:
  - Automatically add &nbsp; as required to the output of HTMLText and OpenOffice content.
  - Added ability to suppress &nbsp; generation using the new preserve-html-spaces configuration option.

Version 3.1.1
-------------
This version includes an updated version of TimeFormat that fixes two major issues.

Bug fixes:
  - Fixes usage under Windows.
  - Timezones east of UTC are now handled correctly.

Version 3.1.0
-------------
New Features:
  - Dates can now be formatted using TimeFormat codes.
  - Support for plain text output added for HTMLText and OpenOffice.
  - Weblog support is now included.
  - Configuration files can now specify file types within sub directories.

Bug fixes:
  - OpenOffice plugin now disables external lookups of DTDs for those Python XML libraries which support this.
  - File paths are now converted to UTF-8 before being placed into the database files. This fixes crashes under Fedora Core 2.
  - Unit test cases no longer throw errors under PyXML.

Version 3.0.1
-------------
Bug fix: The setup script was not copying the OpenOffice plugin correctly on install.

Version 3.0
------------
NOTE: This version of PubTal requires SimpleTAL 3.8 or higher!

New Features:
  - Upload to an FTP site is now supported.
  - OpenOffice content can now support Images.
  - OpenOffice content now produces bidirectional footnotes.
  - Added the ability to determine which files have really changed, which reduces the number of files PubTal has to upload.
  - OpenOffice and HTMLText produce nicer HTML output.
  - HTMLText is stricter about the use of valid HTML.
  - User interaction is substantially nicer.
  - Logging can be enabled and sent to a file using command line parameters.
  - Parts of a site can be given a specific class, which will then not be built without passing a command line parameter.
  - New CSVSortedTables content type plugin.

Bug fixes:
  - Using file types in