xml-conduit-1.1.0.9/0000755000000000000000000000000012247623377012326 5ustar0000000000000000xml-conduit-1.1.0.9/xml-conduit.cabal0000644000000000000000000000730612247623377015563 0ustar0000000000000000name: xml-conduit version: 1.1.0.9 license: BSD3 license-file: LICENSE author: Michael Snoyman , Aristid Breitkreuz maintainer: Michael Snoyman synopsis: Pure-Haskell utilities for dealing with XML with the conduit package. description: This package provides parsing and rendering functions for XML. It is based on the datatypes found in the xml-types package. This package is broken up into the following modules: . * Text.XML: DOM-based parsing and rendering. This is the most commonly used module. . * Text.XML.Cursor: A wrapper around "Text.XML" which allows bidirectional traversing of the DOM, similar to XPath. (Note: Text.XML.Cursor.Generic is the same concept, but will work with any node representation.) . * Text.XML.Unresolved: A slight modification to "Text.XML" which does not require all entities to be resolved at parsing. The datatypes are slightly more complicated here, and therefore this module is only recommended when you need to deal directly with raw entities. . * Text.XML.Stream.Parse: Streaming parser, including some streaming parser combinators. . * Text.XML.Stream.Render: Streaming renderer. . Additionally, the xml-hamlet package provides a more convenient syntax for creating XML documents. For a more thorough tutorial on this library, please see . category: XML, Conduit stability: Stable cabal-version: >= 1.8 build-type: Simple homepage: http://github.com/snoyberg/xml extra-source-files: test/main.hs library build-depends: base >= 4 && < 5 , conduit >= 1.0 && < 1.1 , resourcet >= 0.3 && < 0.5 , attoparsec-conduit >= 1.0 , blaze-builder-conduit >= 1.0 , bytestring >= 0.9 , text >= 0.7 , containers >= 0.2 , xml-types >= 0.3.4 && < 0.4 , attoparsec >= 0.10 , blaze-builder >= 0.2 && < 0.4 , transformers >= 0.2 && < 0.4 , failure >= 0.1 && < 0.3 , data-default , system-filepath >= 0.4 && < 0.5 , monad-control >= 0.3 && < 0.4 , blaze-markup >= 0.5 , blaze-html >= 0.5 , deepseq >= 1.1.0.0 exposed-modules: Text.XML.Stream.Parse Text.XML.Stream.Render Text.XML.Unresolved Text.XML.Cursor Text.XML.Cursor.Generic Text.XML other-modules: Text.XML.Stream.Token ghc-options: -Wall test-suite test type: exitcode-stdio-1.0 main-is: main.hs hs-source-dirs: test build-depends: base , containers , text , transformers , bytestring , xml-conduit , hspec >= 1.3 , HUnit , xml-types >= 0.3.1 , conduit , blaze-markup source-repository head type: git location: git://github.com/snoyberg/xml.git xml-conduit-1.1.0.9/LICENSE0000644000000000000000000000253012247623377013333 0ustar0000000000000000The following license covers this documentation, and the source code, except where otherwise indicated. Copyright 2010, Suite Solutions. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. xml-conduit-1.1.0.9/Setup.lhs0000644000000000000000000000016212247623377014135 0ustar0000000000000000#!/usr/bin/env runhaskell > module Main where > import Distribution.Simple > main :: IO () > main = defaultMain xml-conduit-1.1.0.9/Text/0000755000000000000000000000000012247623377013252 5ustar0000000000000000xml-conduit-1.1.0.9/Text/XML.hs0000644000000000000000000002621112247623377014250 0ustar0000000000000000{-# LANGUAGE CPP #-} {-# LANGUAGE DeriveDataTypeable #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE PatternGuards #-} {-# LANGUAGE RankNTypes #-} -- | DOM-based parsing and rendering. -- -- This module requires that all entities be resolved at parsing. If you need -- to interact with unresolved entities, please use "Text.XML.Unresolved". This -- is the recommended module for most uses cases. -- -- While many of the datatypes in this module are simply re-exported from -- @Data.XML.Types@, 'Document', 'Node' and 'Element' are all redefined here to -- disallow the possibility of unresolved entities. Conversion functions are -- provided to switch between the two sets of datatypes. -- -- For simpler, bidirectional traversal of the DOM tree, see the -- "Text.XML.Cursor" module. module Text.XML ( -- * Data types Document (..) , Prologue (..) , Instruction (..) , Miscellaneous (..) , Node (..) , Element (..) , Name (..) , Doctype (..) , ExternalID (..) -- * Parsing -- ** Files , readFile -- ** Bytes , parseLBS , parseLBS_ , sinkDoc -- ** Text , parseText , parseText_ , sinkTextDoc -- ** Other , fromEvents , UnresolvedEntityException (..) , XMLException (..) -- * Rendering , writeFile , renderLBS , renderText , renderBytes -- * Settings , def -- ** Parsing , ParseSettings , psDecodeEntities -- *** Entity decoding , P.decodeXmlEntities , P.decodeHtmlEntities -- ** Rendering , R.RenderSettings , R.rsPretty , R.rsNamespaces , R.rsAttrOrder , R.orderAttrs -- * Conversion , toXMLDocument , fromXMLDocument , toXMLNode , fromXMLNode , toXMLElement , fromXMLElement ) where import qualified Data.XML.Types as X import Data.XML.Types ( Prologue (..) , Miscellaneous (..) , Instruction (..) , Name (..) , Doctype (..) , ExternalID (..) ) import Data.Typeable (Typeable) import Data.Data (Data) import Control.DeepSeq(NFData(rnf)) import Data.Text (Text) import qualified Text.XML.Stream.Parse as P import qualified Text.XML.Unresolved as D import qualified Text.XML.Stream.Render as R import qualified Data.Text as T import Data.Either (partitionEithers) import Prelude hiding (readFile, writeFile, FilePath) import Filesystem.Path.CurrentOS (FilePath, encodeString) import Control.Exception (SomeException, Exception, throwIO, handle) import Text.XML.Stream.Parse (ParseSettings, def, psDecodeEntities) import Data.ByteString (ByteString) import qualified Data.ByteString.Lazy as L import Control.Monad.ST (runST) import qualified Data.Set as Set import qualified Data.Map as Map import Data.Set (Set) import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Encoding as TLE import Data.Conduit import qualified Data.Conduit.List as CL import qualified Data.Conduit.Binary as CB import System.IO.Unsafe (unsafePerformIO) import Control.Exception (throw) import Control.Monad.Trans.Resource (runExceptionT) import Control.Monad.Trans.Class (lift) import Data.Conduit.Lazy (lazyConsume) import qualified Text.Blaze as B import qualified Text.Blaze.Html as B import qualified Text.Blaze.Html5 as B5 import qualified Text.Blaze.Internal as BI import Data.Monoid (mempty, mappend) import Data.String (fromString) import Data.List (foldl') import Control.Arrow (first) data Document = Document { documentPrologue :: Prologue , documentRoot :: Element , documentEpilogue :: [Miscellaneous] } deriving (Show, Eq, Typeable, Data) #if MIN_VERSION_containers(0, 4, 2) instance NFData Document where rnf (Document a b c) = rnf a `seq` rnf b `seq` rnf c `seq` () #endif data Node = NodeElement Element | NodeInstruction Instruction | NodeContent Text | NodeComment Text deriving (Show, Eq, Ord, Typeable, Data) #if MIN_VERSION_containers(0, 4, 2) instance NFData Node where rnf (NodeElement e) = rnf e `seq` () rnf (NodeInstruction i) = rnf i `seq` () rnf (NodeContent t) = rnf t `seq` () rnf (NodeComment t) = rnf t `seq` () #endif data Element = Element { elementName :: Name , elementAttributes :: Map.Map Name Text , elementNodes :: [Node] } deriving (Show, Eq, Ord, Typeable, Data) #if MIN_VERSION_containers(0, 4, 2) instance NFData Element where rnf (Element a b c) = rnf a `seq` rnf b `seq` rnf c `seq` () #endif {- readFile :: FilePath -> ParseSettings -> IO (Either SomeException Document) readFile_ :: FIlePath -> ParseSettings -> IO Document -} toXMLDocument :: Document -> X.Document toXMLDocument = toXMLDocument' def toXMLDocument' :: R.RenderSettings -> Document -> X.Document toXMLDocument' rs (Document a b c) = X.Document a (toXMLElement' rs b) c toXMLElement :: Element -> X.Element toXMLElement = toXMLElement' def toXMLElement' :: R.RenderSettings -> Element -> X.Element toXMLElement' rs (Element name as nodes) = X.Element name as' nodes' where as' = map (\(x, y) -> (x, [X.ContentText y])) $ R.rsAttrOrder rs name as nodes' = map (toXMLNode' rs) nodes toXMLNode :: Node -> X.Node toXMLNode = toXMLNode' def toXMLNode' :: R.RenderSettings -> Node -> X.Node toXMLNode' rs (NodeElement e) = X.NodeElement $ toXMLElement' rs e toXMLNode' _ (NodeContent t) = X.NodeContent $ X.ContentText t toXMLNode' _ (NodeComment c) = X.NodeComment c toXMLNode' _ (NodeInstruction i) = X.NodeInstruction i fromXMLDocument :: X.Document -> Either (Set Text) Document fromXMLDocument (X.Document a b c) = case fromXMLElement b of Left es -> Left es Right b' -> Right $ Document a b' c fromXMLElement :: X.Element -> Either (Set Text) Element fromXMLElement (X.Element name as nodes) = case (lnodes, las) of ([], []) -> Right $ Element name ras rnodes (x, []) -> Left $ Set.unions x ([], y) -> Left $ Set.unions y (x, y) -> Left $ Set.unions x `Set.union` Set.unions y where enodes = map fromXMLNode nodes (lnodes, rnodes) = partitionEithers enodes eas = map go as (las, ras') = partitionEithers eas ras = Map.fromList ras' go (x, y) = case go' [] id y of Left es -> Left es Right y' -> Right (x, y') go' [] front [] = Right $ T.concat $ front [] go' errs _ [] = Left $ Set.fromList errs go' errs front (X.ContentText t:ys) = go' errs (front . (:) t) ys go' errs front (X.ContentEntity t:ys) = go' (t : errs) front ys fromXMLNode :: X.Node -> Either (Set Text) Node fromXMLNode (X.NodeElement e) = either Left (Right . NodeElement) $ fromXMLElement e fromXMLNode (X.NodeContent (X.ContentText t)) = Right $ NodeContent t fromXMLNode (X.NodeContent (X.ContentEntity t)) = Left $ Set.singleton t fromXMLNode (X.NodeComment c) = Right $ NodeComment c fromXMLNode (X.NodeInstruction i) = Right $ NodeInstruction i readFile :: ParseSettings -> FilePath -> IO Document readFile ps fp = handle (throwIO . InvalidXMLFile fp) (runResourceT $ CB.sourceFile (encodeString fp) $$ sinkDoc ps) data XMLException = InvalidXMLFile FilePath SomeException deriving Typeable instance Show XMLException where show (InvalidXMLFile fp e) = concat [ "Error parsing XML file " , encodeString fp , ": " , show e ] instance Exception XMLException parseLBS :: ParseSettings -> L.ByteString -> Either SomeException Document parseLBS ps lbs = runST $ runExceptionT $ CL.sourceList (L.toChunks lbs) $$ sinkDoc ps parseLBS_ :: ParseSettings -> L.ByteString -> Document parseLBS_ ps = either throw id . parseLBS ps sinkDoc :: MonadThrow m => ParseSettings -> Consumer ByteString m Document sinkDoc ps = P.parseBytesPos ps =$= fromEvents parseText :: ParseSettings -> TL.Text -> Either SomeException Document parseText ps tl = runST $ runExceptionT $ CL.sourceList (TL.toChunks tl) $$ sinkTextDoc ps parseText_ :: ParseSettings -> TL.Text -> Document parseText_ ps = either throw id . parseText ps sinkTextDoc :: MonadThrow m => ParseSettings -> Consumer Text m Document sinkTextDoc ps = P.parseText ps =$= fromEvents fromEvents :: MonadThrow m => Consumer P.EventPos m Document fromEvents = do d <- D.fromEvents either (lift . monadThrow . UnresolvedEntityException) return $ fromXMLDocument d data UnresolvedEntityException = UnresolvedEntityException (Set Text) deriving (Show, Typeable) instance Exception UnresolvedEntityException renderBytes :: MonadUnsafeIO m => R.RenderSettings -> Document -> Producer m ByteString renderBytes rs doc = D.renderBytes rs $ toXMLDocument' rs doc writeFile :: R.RenderSettings -> FilePath -> Document -> IO () writeFile rs fp doc = runResourceT $ renderBytes rs doc $$ CB.sinkFile (encodeString fp) renderLBS :: R.RenderSettings -> Document -> L.ByteString renderLBS rs doc = L.fromChunks $ unsafePerformIO -- not generally safe, but we know that runResourceT -- will not deallocate any of the resources being used -- by the process $ lazyConsume $ renderBytes rs doc renderText :: R.RenderSettings -> Document -> TL.Text renderText rs = TLE.decodeUtf8 . renderLBS rs instance B.ToMarkup Document where toMarkup (Document _ root _) = B5.docType >> B.toMarkup root -- | Note that the special element name -- @{http://www.snoyman.com/xml2html}ie-cond@ with the single attribute @cond@ -- is used to indicate an IE conditional comment. instance B.ToMarkup Element where toMarkup (Element "{http://www.snoyman.com/xml2html}ie-cond" attrs children) | [("cond", cond)] <- Map.toList attrs = B.preEscapedToMarkup ("" :: T.Text) toMarkup (Element name' attrs children) = if isVoid then foldl' (B.!) leaf attrs' else foldl' (B.!) parent attrs' childrenHtml where childrenHtml :: B.Html childrenHtml = case (name `elem` ["style", "script"], children) of (True, [NodeContent t]) -> B.preEscapedToMarkup t _ -> mapM_ B.toMarkup children isVoid = nameLocalName name' `Set.member` voidElems parent :: B.Html -> B.Html parent = BI.Parent tag open close leaf :: B.Html leaf = BI.Leaf tag open (fromString " />") name = T.unpack $ nameLocalName name' tag = fromString name open = fromString $ '<' : name close = fromString $ concat [""] attrs' :: [B.Attribute] attrs' = map goAttr $ map (first nameLocalName) $ Map.toList attrs goAttr (key, value) = B.customAttribute (B.textTag key) $ B.toValue value instance B.ToMarkup Node where toMarkup (NodeElement e) = B.toMarkup e toMarkup (NodeContent t) = B.toMarkup t toMarkup _ = mempty voidElems :: Set.Set T.Text voidElems = Set.fromAscList $ T.words $ T.pack "area base br col command embed hr img input keygen link meta param source track wbr" xml-conduit-1.1.0.9/Text/XML/0000755000000000000000000000000012247623377013712 5ustar0000000000000000xml-conduit-1.1.0.9/Text/XML/Cursor.hs0000644000000000000000000002104312247623377015523 0ustar0000000000000000-- | This module provides for simple DOM traversal. It is inspired by XPath. There are two central concepts here: -- -- * A 'Cursor' represents a node in the DOM. It also contains information on the node's /location/. While the 'Node' datatype will only know of its children, a @Cursor@ knows about its parent and siblings as well. (The underlying mechanism allowing this is called a zipper, see and .) -- -- * An 'Axis', in its simplest form, takes a @Cursor@ and returns a list of @Cursor@s. It is used for selections, such as finding children, ancestors, etc. Axes can be chained together to express complex rules, such as all children named /foo/. -- -- The terminology used in this module is taken directly from the XPath -- specification: . For those familiar with XPath, -- the one major difference is that attributes are not considered nodes in this -- module. module Text.XML.Cursor ( -- * Data types Cursor , Axis -- * Production , fromDocument , fromNode , cut -- * Axes , parent , CG.precedingSibling , CG.followingSibling , child , node , CG.preceding , CG.following , CG.ancestor , descendant , orSelf -- ** Filters , check , checkNode , checkElement , checkName , anyElement , element , laxElement , content , attribute , laxAttribute , hasAttribute , attributeIs -- * Operators , (CG.&|) , (CG.&/) , (CG.&//) , (CG.&.//) , (CG.$|) , (CG.$/) , (CG.$//) , (CG.$.//) , (CG.>=>) -- * Type classes , Boolean(..) -- * Error handling , force , forceM ) where import Control.Monad import Data.Function (on) import Text.XML import qualified Control.Failure as F import qualified Data.Text as T import qualified Data.Map as Map import qualified Text.XML.Cursor.Generic as CG import Text.XML.Cursor.Generic (node, child, parent, descendant, orSelf) import Data.Maybe (maybeToList) -- TODO: Consider [Cursor] -> [Cursor]? -- | The type of an Axis that returns a list of Cursors. -- They are roughly modeled after . -- -- Axes can be composed with '>=>', where e.g. @f >=> g@ means that on all results of -- the @f@ axis, the @g@ axis will be applied, and all results joined together. -- Because Axis is just a type synonym for @Cursor -> [Cursor]@, it is possible to use -- other standard functions like '>>=' or 'concatMap' similarly. -- -- The operators '&|', '&/', '&//' and '&.//' can be used to combine axes so that the second -- axis works on the context nodes, children, descendants, respectively the context node as -- well as its descendants of the results of the first axis. -- -- The operators '$|', '$/', '$//' and '$.//' can be used to apply an axis (right-hand side) -- to a cursor so that it is applied on the cursor itself, its children, its descendants, -- respectively itself and its descendants. -- -- Note that many of these operators also work on /generalised Axes/ that can return -- lists of something other than Cursors, for example Content elements. type Axis = Cursor -> [Cursor] -- XPath axes as in http://www.w3.org/TR/xpath/#axes -- TODO: Decide whether to use an existing package for this -- | Something that can be used in a predicate check as a boolean. class Boolean a where bool :: a -> Bool instance Boolean Bool where bool = id instance Boolean [a] where bool = not . null instance Boolean (Maybe a) where bool (Just _) = True bool _ = False instance Boolean (Either a b) where bool (Left _) = False bool (Right _) = True -- | A cursor: contains an XML 'Node' and pointers to its children, ancestors and siblings. type Cursor = CG.Cursor Node -- | Cut a cursor off from its parent. The idea is to allow restricting the scope of queries on it. cut :: Cursor -> Cursor cut = fromNode . CG.node -- | Convert a 'Document' to a 'Cursor'. It will point to the document root. fromDocument :: Document -> Cursor fromDocument = fromNode . NodeElement . documentRoot -- | Convert a 'Node' to a 'Cursor' (without parents). fromNode :: Node -> Cursor fromNode = CG.toCursor cs where cs (NodeElement (Element _ _ x)) = x cs _ = [] -- | Filter cursors that don't pass a check. check :: Boolean b => (Cursor -> b) -> Axis check f c = case bool $ f c of False -> [] True -> [c] -- | Filter nodes that don't pass a check. checkNode :: Boolean b => (Node -> b) -> Axis checkNode f c = check (f . node) c -- | Filter elements that don't pass a check, and remove all non-elements. checkElement :: Boolean b => (Element -> b) -> Axis checkElement f c = case node c of NodeElement e -> case bool $ f e of True -> [c] False -> [] _ -> [] -- | Filter elements that don't pass a name check, and remove all non-elements. checkName :: Boolean b => (Name -> b) -> Axis checkName f c = checkElement (f . elementName) c -- | Remove all non-elements. Compare roughly to XPath: -- /A node test * is true for any node of the principal node type. For example, child::* will select all element children of the context node [...]/. anyElement :: Axis anyElement = checkElement (const True) -- | Select only those elements with a matching tag name. XPath: -- /A node test that is a QName is true if and only if the type of the node (see [5 Data Model]) is the principal node type and has an expanded-name equal to the expanded-name specified by the QName./ element :: Name -> Axis element n = checkName (== n) -- | Select only those elements with a loosely matching tag name. Namespace and case are ignored. XPath: -- /A node test that is a QName is true if and only if the type of the node (see [5 Data Model]) is the principal node type and has an expanded-name equal to the expanded-name specified by the QName./ laxElement :: T.Text -> Axis laxElement n = checkName (on (==) T.toCaseFold n . nameLocalName) -- | Select only text nodes, and directly give the 'Content' values. XPath: -- /The node test text() is true for any text node./ -- -- Note that this is not strictly an 'Axis', but will work with most combinators. content :: Cursor -> [T.Text] content c = case node c of (NodeContent v) -> [v] _ -> [] -- | Select attributes on the current element (or nothing if it is not an element). XPath: -- /the attribute axis contains the attributes of the context node; the axis will be empty unless the context node is an element/ -- -- Note that this is not strictly an 'Axis', but will work with most combinators. -- -- The return list of the generalised axis contains as elements lists of 'Content' -- elements, each full list representing an attribute value. attribute :: Name -> Cursor -> [T.Text] attribute n c = case node c of NodeElement e -> maybeToList $ Map.lookup n $ elementAttributes e _ -> [] -- | Select attributes on the current element (or nothing if it is not an element). Namespace and case are ignored. XPath: -- /the attribute axis contains the attributes of the context node; the axis will be empty unless the context node is an element/ -- -- Note that this is not strictly an 'Axis', but will work with most combinators. -- -- The return list of the generalised axis contains as elements lists of 'Content' -- elements, each full list representing an attribute value. laxAttribute :: T.Text -> Cursor -> [T.Text] laxAttribute n c = case node c of NodeElement e -> do (n', v) <- Map.toList $ elementAttributes e guard $ (on (==) T.toCaseFold) n (nameLocalName n') return v _ -> [] -- | Select only those element nodes with the given attribute. hasAttribute :: Name -> Axis hasAttribute n c = case node c of NodeElement (Element _ as _) -> maybe [] (const [c]) $ Map.lookup n as _ -> [] -- | Select only those element nodes containing the given attribute key/value pair. attributeIs :: Name -> T.Text -> Axis attributeIs n v c = case node c of NodeElement (Element _ as _) -> if Just v == Map.lookup n as then [c] else [] _ -> [] force :: F.Failure e f => e -> [a] -> f a force e [] = F.failure e force _ (x:_) = return x forceM :: F.Failure e f => e -> [f a] -> f a forceM e [] = F.failure e forceM _ (x:_) = x xml-conduit-1.1.0.9/Text/XML/Unresolved.hs0000644000000000000000000002264212247623377016402 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE RankNTypes #-} -- | DOM-based XML parsing and rendering. -- -- In this module, attribute values and content nodes can contain either raw -- text or entities. In most cases, these can be fully resolved at parsing. If -- that is the case for your documents, the "Text.XML" module provides -- simplified datatypes that only contain raw text. module Text.XML.Unresolved ( -- * Non-streaming functions writeFile , readFile -- * Lazy bytestrings , renderLBS , parseLBS , parseLBS_ -- * Text , parseText , parseText_ , sinkTextDoc -- * Byte streams , sinkDoc -- * Streaming functions , toEvents , fromEvents , renderBuilder , renderBytes , renderText -- * Exceptions , InvalidEventStream (..) -- * Settings , P.def -- ** Parse , P.ParseSettings , P.psDecodeEntities -- ** Render , R.RenderSettings , R.rsPretty , R.rsNamespaces ) where import Prelude hiding (writeFile, readFile, FilePath) import Filesystem.Path.CurrentOS (FilePath, encodeString) import Data.XML.Types import Control.Exception (Exception, SomeException) import Data.Typeable (Typeable) import Blaze.ByteString.Builder (Builder) import qualified Text.XML.Stream.Render as R import qualified Text.XML.Stream.Parse as P import Text.XML.Stream.Parse (ParseSettings) import Data.ByteString (ByteString) import Data.Text (Text) import Control.Applicative ((<$>), (<*>)) import Control.Monad (when) import qualified Data.Text as T import qualified Data.Text.Lazy as TL import Data.Char (isSpace) import qualified Data.ByteString.Lazy as L import System.IO.Unsafe (unsafePerformIO) import Data.Conduit import qualified Data.Conduit.List as CL import qualified Data.Conduit.Binary as CB import Control.Exception (throw) import Control.Monad.Trans.Class (lift) import Control.Monad.ST (runST) import Data.Conduit.Lazy (lazyConsume) readFile :: P.ParseSettings -> FilePath -> IO Document readFile ps fp = runResourceT $ CB.sourceFile (encodeString fp) $$ sinkDoc ps sinkDoc :: MonadThrow m => P.ParseSettings -> Consumer ByteString m Document sinkDoc ps = P.parseBytesPos ps =$= fromEvents writeFile :: R.RenderSettings -> FilePath -> Document -> IO () writeFile rs fp doc = runResourceT $ renderBytes rs doc $$ CB.sinkFile (encodeString fp) renderLBS :: R.RenderSettings -> Document -> L.ByteString renderLBS rs doc = L.fromChunks $ unsafePerformIO -- not generally safe, but we know that runResourceT -- will not deallocate any of the resources being used -- by the process $ lazyConsume $ renderBytes rs doc parseLBS :: P.ParseSettings -> L.ByteString -> Either SomeException Document parseLBS ps lbs = runST $ runExceptionT $ CL.sourceList (L.toChunks lbs) $$ sinkDoc ps parseLBS_ :: P.ParseSettings -> L.ByteString -> Document parseLBS_ ps lbs = either throw id $ parseLBS ps lbs data InvalidEventStream = ContentAfterRoot P.EventPos | MissingRootElement | InvalidInlineDoctype P.EventPos | MissingEndElement Name (Maybe P.EventPos) | UnterminatedInlineDoctype deriving Typeable instance Exception InvalidEventStream instance Show InvalidEventStream where show (ContentAfterRoot (pos, e)) = mShowPos pos ++ "Found content after root element: " ++ prettyShowE e show MissingRootElement = "Missing root element" show (InvalidInlineDoctype (pos, e)) = mShowPos pos ++ "Invalid content inside doctype: " ++ prettyShowE e show (MissingEndElement name Nothing) = "Documented ended while expected end element for: " ++ prettyShowName name show (MissingEndElement name (Just (pos, e))) = mShowPos pos ++ "Expected end element for: " ++ prettyShowName name ++ ", but received: " ++ prettyShowE e show UnterminatedInlineDoctype = "Unterminated doctype declaration" mShowPos :: Maybe P.PositionRange -> String mShowPos Nothing = "" mShowPos (Just pos) = show pos ++ ": " prettyShowE :: Event -> String prettyShowE = show -- FIXME prettyShowName :: Name -> String prettyShowName = show -- FIXME renderBuilder :: Monad m => R.RenderSettings -> Document -> Producer m Builder renderBuilder rs doc = CL.sourceList (toEvents doc) =$= R.renderBuilder rs renderBytes :: MonadUnsafeIO m => R.RenderSettings -> Document -> Producer m ByteString renderBytes rs doc = CL.sourceList (toEvents doc) =$= R.renderBytes rs renderText :: (MonadThrow m, MonadUnsafeIO m) => R.RenderSettings -> Document -> Producer m Text renderText rs doc = CL.sourceList (toEvents doc) =$= R.renderText rs fromEvents :: MonadThrow m => Consumer P.EventPos m Document fromEvents = do skip EventBeginDocument d <- Document <$> goP <*> require goE <*> goM skip EventEndDocument y <- CL.head case y of Nothing -> return d Just (_, EventEndDocument) -> lift $ monadThrow MissingRootElement Just z -> lift $ monadThrow $ ContentAfterRoot z where skip e = do x <- CL.peek when (fmap snd x == Just e) (CL.drop 1) many f = go id where go front = do x <- f case x of Nothing -> return $ front [] Just y -> go (front . (:) y) dropReturn x = CL.drop 1 >> return x require f = do x <- f case x of Just y -> return y Nothing -> do my <- CL.head case my of Nothing -> error "Text.XML.Unresolved:impossible" Just (_, EventEndDocument) -> lift $ monadThrow MissingRootElement Just y -> lift $ monadThrow $ ContentAfterRoot y goP = Prologue <$> goM <*> goD <*> goM goM = many goM' goM' = do x <- CL.peek case x of Just (_, EventInstruction i) -> dropReturn $ Just $ MiscInstruction i Just (_, EventComment t) -> dropReturn $ Just $ MiscComment t Just (_, EventContent (ContentText t)) | T.all isSpace t -> CL.drop 1 >> goM' _ -> return Nothing goD = do x <- CL.peek case x of Just (_, EventBeginDoctype name meid) -> do CL.drop 1 dropTillDoctype return (Just $ Doctype name meid) _ -> return Nothing dropTillDoctype = do x <- CL.head case x of -- Leaving the following line commented so that the intention of -- this function stays clear. I figure in the future xml-types will -- be expanded again to support some form of EventDeclaration -- -- Just (EventDeclaration _) -> dropTillDoctype Just (_, EventEndDoctype) -> return () Just epos -> lift $ monadThrow $ InvalidInlineDoctype epos Nothing -> lift $ monadThrow UnterminatedInlineDoctype goE = do x <- CL.peek case x of Just (_, EventBeginElement n as) -> Just <$> goE' n as _ -> return Nothing goE' n as = do CL.drop 1 ns <- many goN y <- CL.head if fmap snd y == Just (EventEndElement n) then return $ Element n as $ compressNodes ns else lift $ monadThrow $ MissingEndElement n y goN = do x <- CL.peek case x of Just (_, EventBeginElement n as) -> (Just . NodeElement) <$> goE' n as Just (_, EventInstruction i) -> dropReturn $ Just $ NodeInstruction i Just (_, EventContent c) -> dropReturn $ Just $ NodeContent c Just (_, EventComment t) -> dropReturn $ Just $ NodeComment t Just (_, EventCDATA t) -> dropReturn $ Just $ NodeContent $ ContentText t _ -> return Nothing toEvents :: Document -> [Event] toEvents (Document prol root epi) = (EventBeginDocument :) . goP prol . goE root . goM epi $ [EventEndDocument] where goP (Prologue before doctype after) = goM before . maybe id goD doctype . goM after goM [] = id goM [x] = (goM' x :) goM (x:xs) = (goM' x :) . goM xs goM' (MiscInstruction i) = EventInstruction i goM' (MiscComment t) = EventComment t goD (Doctype name meid) = (:) (EventBeginDoctype name meid) . (:) EventEndDoctype goE (Element name as ns) = (EventBeginElement name as :) . goN ns . (EventEndElement name :) goN [] = id goN [x] = goN' x goN (x:xs) = goN' x . goN xs goN' (NodeElement e) = goE e goN' (NodeInstruction i) = (EventInstruction i :) goN' (NodeContent c) = (EventContent c :) goN' (NodeComment t) = (EventComment t :) compressNodes :: [Node] -> [Node] compressNodes [] = [] compressNodes [x] = [x] compressNodes (NodeContent (ContentText x) : NodeContent (ContentText y) : z) = compressNodes $ NodeContent (ContentText $ x `T.append` y) : z compressNodes (x:xs) = x : compressNodes xs parseText :: ParseSettings -> TL.Text -> Either SomeException Document parseText ps tl = runST $ runExceptionT $ CL.sourceList (TL.toChunks tl) $$ sinkTextDoc ps parseText_ :: ParseSettings -> TL.Text -> Document parseText_ ps = either throw id . parseText ps sinkTextDoc :: MonadThrow m => ParseSettings -> Consumer Text m Document sinkTextDoc ps = P.parseText ps =$= fromEvents xml-conduit-1.1.0.9/Text/XML/Stream/0000755000000000000000000000000012247623377015145 5ustar0000000000000000xml-conduit-1.1.0.9/Text/XML/Stream/Render.hs0000644000000000000000000003017412247623377016725 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE FlexibleContexts #-} -- | 'Enumeratee's to render XML 'Event's. Unlike libxml-enumerator and -- expat-enumerator, this module does not provide IO and ST variants, since the -- underlying rendering operations are pure functions. module Text.XML.Stream.Render ( renderBuilder , renderBytes , renderText , RenderSettings , def , rsPretty , rsNamespaces , rsAttrOrder , orderAttrs , prettify ) where import Data.XML.Types (Event (..), Content (..), Name (..)) import Text.XML.Stream.Token import qualified Data.Text as T import Data.Text (Text) import Blaze.ByteString.Builder import Data.Conduit.Blaze (builderToByteString) import qualified Data.Map as Map import Data.Map (Map) import Data.Maybe (fromMaybe, mapMaybe) import Data.ByteString (ByteString) import Data.Default (Default (def)) import qualified Data.Set as Set import Data.List (foldl') import Data.Conduit import qualified Data.Conduit.List as CL import qualified Data.Conduit.Text as CT import Data.Monoid (mempty) -- | Render a stream of 'Event's into a stream of 'ByteString's. This function -- wraps around 'renderBuilder' and 'builderToByteString', so it produces -- optimally sized 'ByteString's with minimal buffer copying. -- -- The output is UTF8 encoded. renderBytes :: MonadUnsafeIO m => RenderSettings -> Conduit Event m ByteString renderBytes rs = renderBuilder rs =$= builderToByteString -- | Render a stream of 'Event's into a stream of 'ByteString's. This function -- wraps around 'renderBuilder', 'builderToByteString' and 'renderBytes', so it -- produces optimally sized 'ByteString's with minimal buffer copying. renderText :: (MonadThrow m, MonadUnsafeIO m) => RenderSettings -> Conduit Event m Text renderText rs = renderBytes rs =$= CT.decode CT.utf8 data RenderSettings = RenderSettings { rsPretty :: Bool , rsNamespaces :: [(Text, Text)] -- ^ Defines some top level namespace definitions to be used, in the form -- of (prefix, namespace). This has absolutely no impact on the meaning -- of your documents, but can increase readability by moving commonly -- used namespace declarations to the top level. , rsAttrOrder :: Name -> Map.Map Name Text -> [(Name, Text)] -- ^ Specify how to turn the unordered attributes used by the "Text.XML" -- module into an ordered list. } instance Default RenderSettings where def = RenderSettings { rsPretty = False , rsNamespaces = [] , rsAttrOrder = const Map.toList } -- | Convenience function to create an ordering function suitable for -- use as the value of 'rsAttrOrder'. The ordering function is created -- from an explicit ordering of the attributes, specified as a list of -- tuples, as follows: In each tuple, the first component is the -- 'Name' of an element, and the second component is a list of -- attributes names. When the given element is rendered, the -- attributes listed, when present, appear first in the given order, -- followed by any other attributes in arbitrary order. If an element -- does not appear, all of its attributes are rendered in arbitrary -- order. orderAttrs :: [(Name, [Name])] -> Name -> Map Name Text -> [(Name, Text)] orderAttrs orderSpec = order where order elt attrMap = let initialAttrs = fromMaybe [] $ lookup elt orderSpec mkPair attr = fmap ((,) attr) $ Map.lookup attr attrMap otherAttrMap = Map.filterWithKey (const . not . (`elem` initialAttrs)) attrMap in mapMaybe mkPair initialAttrs ++ Map.toAscList otherAttrMap -- | Render a stream of 'Event's into a stream of 'Builder's. Builders are from -- the blaze-builder package, and allow the create of optimally sized -- 'ByteString's with minimal buffer copying. renderBuilder :: Monad m => RenderSettings -> Conduit Event m Builder renderBuilder RenderSettings { rsPretty = True, rsNamespaces = n } = prettify =$= renderBuilder' n True renderBuilder RenderSettings { rsPretty = False, rsNamespaces = n } = renderBuilder' n False renderBuilder' :: Monad m => [(Text, Text)] -> Bool -> Conduit Event m Builder renderBuilder' namespaces0 isPretty = do loop [] where loop nslevels = await >>= maybe (return ()) (go nslevels) go nslevels e = case e of EventBeginElement n1 as -> do mnext <- CL.peek isClosed <- case mnext of Just (EventEndElement n2) | n1 == n2 -> do CL.drop 1 return True _ -> return False let (token, nslevels') = mkBeginToken isPretty isClosed namespaces0 nslevels n1 as yield token loop nslevels' _ -> do let (token, nslevels') = eventToToken nslevels e yield token loop nslevels' eventToToken :: Stack -> Event -> (Builder, [NSLevel]) eventToToken s EventBeginDocument = (tokenToBuilder $ TokenBeginDocument [ ("version", [ContentText "1.0"]) , ("encoding", [ContentText "UTF-8"]) ] , s) eventToToken s EventEndDocument = (mempty, s) eventToToken s (EventInstruction i) = (tokenToBuilder $ TokenInstruction i, s) eventToToken s (EventBeginDoctype n meid) = (tokenToBuilder $ TokenDoctype n meid [], s) eventToToken s EventEndDoctype = (mempty, s) eventToToken s (EventCDATA t) = (tokenToBuilder $ TokenCDATA t, s) eventToToken s (EventEndElement name) = (tokenToBuilder $ TokenEndElement $ nameToTName sl name, s') where (sl:s') = s eventToToken s (EventContent c) = (tokenToBuilder $ TokenContent c, s) eventToToken s (EventComment t) = (tokenToBuilder $ TokenComment t, s) eventToToken _ EventBeginElement{} = error "eventToToken on EventBeginElement" -- mkBeginToken False s name attrs type Stack = [NSLevel] nameToTName :: NSLevel -> Name -> TName nameToTName _ (Name name _ (Just pref)) | pref == "xml" = TName (Just "xml") name nameToTName _ (Name name Nothing _) = TName Nothing name -- invariant that this is true nameToTName (NSLevel def' sl) (Name name (Just ns) _) | def' == Just ns = TName Nothing name | otherwise = case Map.lookup ns sl of Nothing -> error "nameToTName" Just pref -> TName (Just pref) name mkBeginToken :: Bool -- ^ pretty print attributes? -> Bool -- ^ self closing? -> [(Text, Text)] -- ^ namespaces to apply to top-level -> Stack -> Name -> [(Name, [Content])] -> (Builder, Stack) mkBeginToken isPretty isClosed namespaces0 s name attrs = (tokenToBuilder $ TokenBeginElement tname tattrs3 isClosed indent, if isClosed then s else sl3 : s) where indent = if isPretty then 2 + 4 * length s else 0 prevsl = case s of [] -> NSLevel Nothing Map.empty sl':_ -> sl' (sl1, tname, tattrs1) = newElemStack prevsl name (sl2, tattrs2) = foldr newAttrStack (sl1, tattrs1) $ nubAttrs attrs (sl3, tattrs3) = case s of [] -> (sl2 { prefixes = Map.union (prefixes sl2) $ Map.fromList namespaceSL }, namespaceAttrs ++ tattrs2) _ -> (sl2, tattrs2) (namespaceSL, namespaceAttrs) = unzip $ mapMaybe unused namespaces0 unused (k, v) = case lookup k' tattrs2 of Just{} -> Nothing Nothing -> Just ((v, k), (k', v')) where k' = TName (Just "xmlns") k v' = [ContentText v] newElemStack :: NSLevel -> Name -> (NSLevel, TName, [TAttribute]) newElemStack nsl@(NSLevel def' _) (Name local ns _) | def' == ns = (nsl, TName Nothing local, []) newElemStack (NSLevel _ nsmap) (Name local Nothing _) = (NSLevel Nothing nsmap, TName Nothing local, [(TName Nothing "xmlns", [])]) newElemStack (NSLevel _ nsmap) (Name local (Just ns) Nothing) = (NSLevel (Just ns) nsmap, TName Nothing local, [(TName Nothing "xmlns", [ContentText ns])]) newElemStack (NSLevel def' nsmap) (Name local (Just ns) (Just pref)) = case Map.lookup ns nsmap of Just pref' | pref == pref' -> ( NSLevel def' nsmap , TName (Just pref) local , [] ) _ -> ( NSLevel def' nsmap' , TName (Just pref) local , [(TName (Just "xmlns") pref, [ContentText ns])] ) where nsmap' = Map.insert ns pref nsmap newAttrStack :: (Name, [Content]) -> (NSLevel, [TAttribute]) -> (NSLevel, [TAttribute]) newAttrStack (name, value) (NSLevel def' nsmap, attrs) = (NSLevel def' nsmap', addNS $ (tname, value) : attrs) where (nsmap', tname, addNS) = case name of Name local Nothing _ -> (nsmap, TName Nothing local, id) Name local (Just ns) mpref -> let ppref = fromMaybe "ns" mpref (pref, addNS') = getPrefix ppref nsmap ns in (Map.insert ns pref nsmap, TName (Just pref) local, addNS') getPrefix :: Text -> Map Text Text -> Text -> (Text, [TAttribute] -> [TAttribute]) getPrefix _ _ "http://www.w3.org/XML/1998/namespace" = ("xml", id) getPrefix ppref nsmap ns = case Map.lookup ns nsmap of Just pref -> (pref, id) Nothing -> let pref = findUnused ppref $ Map.elems nsmap in (pref, (:) (TName (Just "xmlns") pref, [ContentText ns])) where findUnused x xs | x `elem` xs = findUnused (x `T.snoc` '_') xs | otherwise = x -- | Convert a stream of 'Event's into a prettified one, adding extra -- whitespace. Note that this can change the meaning of your XML. prettify :: Monad m => Conduit Event m Event prettify = prettify' 0 prettify' :: Monad m => Int -> Conduit Event m Event prettify' level = await >>= maybe (return ()) go where go e@EventBeginDocument = do yield e yield $ EventContent $ ContentText "\n" prettify' level go e@EventBeginElement{} = do yield before yield e mnext <- CL.peek case mnext of Just next@EventEndElement{} -> do CL.drop 1 yield next yield after prettify' level _ -> do yield after prettify' $ level + 1 go e@EventEndElement{} = do let level' = max 0 $ level - 1 yield $ before' level' yield e yield after prettify' level' go (EventContent c) = do cs <- takeContents (c:) let cs' = mapMaybe normalize cs case cs' of [] -> return () _ -> do yield before mapM_ (yield . EventContent) cs' yield after prettify' level go (EventCDATA t) = go $ EventContent $ ContentText t go e@EventInstruction{} = do yield before yield e yield after prettify' level go (EventComment t) = do yield before yield $ EventComment $ T.concat [ " " , T.unwords $ T.words t , " " ] yield after prettify' level go e@EventEndDocument = yield e >> prettify' level go e@EventBeginDoctype{} = yield e >> prettify' level go e@EventEndDoctype{} = yield e >> yield after >> prettify' level takeContents front = do me <- CL.peek case me of Just (EventContent c) -> do CL.drop 1 takeContents $ front . (c:) Just (EventCDATA t) -> do CL.drop 1 takeContents $ front . (ContentText t:) _ -> return $ front [] normalize (ContentText t) | T.null t' = Nothing | otherwise = Just $ ContentText t' where t' = T.unwords $ T.words t normalize c = Just c before = EventContent $ ContentText $ T.replicate level " " before' l = EventContent $ ContentText $ T.replicate l " " after = EventContent $ ContentText "\n" nubAttrs :: [(Name, v)] -> [(Name, v)] nubAttrs orig = front [] where (front, _) = foldl' go (id, Set.empty) orig go (dlist, used) (k, v) | k `Set.member` used = (dlist, used) | otherwise = (dlist . ((k, v):), Set.insert k used) xml-conduit-1.1.0.9/Text/XML/Stream/Token.hs0000644000000000000000000001226712247623377016571 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE FlexibleContexts #-} module Text.XML.Stream.Token ( tokenToBuilder , TName (..) , Token (..) , TAttribute , NSLevel (..) ) where import Data.XML.Types (Instruction (..), Content (..), ExternalID (..)) import qualified Data.Text as T import Data.Text (Text) import Data.String (IsString (fromString)) import Blaze.ByteString.Builder (Builder, fromByteString, writeByteString, copyByteString) import Blaze.ByteString.Builder.Internal.Write (fromWriteList) import Blaze.ByteString.Builder.Char.Utf8 (writeChar, fromText) import Data.Monoid (mconcat, mempty, mappend) import Data.ByteString.Char8 () import Data.Map (Map) import qualified Blaze.ByteString.Builder.Char8 as BC8 import qualified Data.Set as Set import Data.List (foldl') import Control.Arrow (first) oneSpace :: Builder oneSpace = copyByteString " " data Token = TokenBeginDocument [TAttribute] | TokenInstruction Instruction | TokenBeginElement TName [TAttribute] Bool Int -- ^ indent | TokenEndElement TName | TokenContent Content | TokenComment Text | TokenDoctype Text (Maybe ExternalID) [(Text, Text)] | TokenCDATA Text deriving Show tokenToBuilder :: Token -> Builder tokenToBuilder (TokenBeginDocument attrs) = fromByteString "") tokenToBuilder (TokenInstruction (Instruction target data_)) = mconcat [ fromByteString "" ] tokenToBuilder (TokenBeginElement name attrs' isEmpty indent) = copyByteString "<" `mappend` tnameToText name `mappend` foldAttrs (if indent == 0 || lessThan3 attrs then oneSpace else BC8.fromString ('\n' : replicate indent ' ')) attrs (if isEmpty then fromByteString "/>" else fromByteString ">") where attrs = nubAttrs $ map (first splitTName) attrs' lessThan3 [] = True lessThan3 [_] = True lessThan3 [_, _] = True lessThan3 _ = False tokenToBuilder (TokenEndElement name) = mconcat [ fromByteString "" ] tokenToBuilder (TokenContent c) = contentToText c tokenToBuilder (TokenCDATA t) = copyByteString "" tokenToBuilder (TokenComment t) = mconcat [fromByteString ""] tokenToBuilder (TokenDoctype name eid _) = mconcat [ fromByteString "" ] where go Nothing = mempty go (Just (SystemID uri)) = mconcat [ fromByteString " SYSTEM \"" , fromText uri , fromByteString "\"" ] go (Just (PublicID pid uri)) = mconcat [ fromByteString " PUBLIC \"" , fromText pid , fromByteString "\" \"" , fromText uri , fromByteString "\"" ] data TName = TName (Maybe Text) Text deriving (Show, Eq, Ord) tnameToText :: TName -> Builder tnameToText (TName Nothing name) = fromText name tnameToText (TName (Just prefix) name) = mconcat [fromText prefix, fromByteString ":", fromText name] contentToText :: Content -> Builder contentToText (ContentText t) = fromWriteList go $ T.unpack t where go '<' = writeByteString "<" go '>' = writeByteString ">" go '&' = writeByteString "&" -- Not escaping quotes, since this is only called outside of attributes go c = writeChar c contentToText (ContentEntity e) = mconcat [ fromByteString "&" , fromText e , fromByteString ";" ] type TAttribute = (TName, [Content]) foldAttrs :: Builder -- ^ before -> [TAttribute] -> Builder -> Builder foldAttrs before attrs rest' = foldr go rest' attrs where go (key, val) rest = before `mappend` tnameToText key `mappend` copyByteString "=\"" `mappend` foldr go' (fromByteString "\"" `mappend` rest) val go' (ContentText t) rest = fromWriteList h (T.unpack t) `mappend` rest where h '<' = writeByteString "<" h '>' = writeByteString ">" h '&' = writeByteString "&" h '"' = writeByteString """ -- Not escaping single quotes, since our attributes are always double -- quoted h c = writeChar c go' (ContentEntity t) rest = fromByteString "&" `mappend` fromText t `mappend` fromByteString ";" `mappend` rest instance IsString TName where fromString = TName Nothing . T.pack data NSLevel = NSLevel { defaultNS :: Maybe Text , prefixes :: Map Text Text } deriving Show nubAttrs :: [TAttribute] -> [TAttribute] nubAttrs orig = front [] where (front, _) = foldl' go (id, Set.empty) orig go (dlist, used) (k, v) | k `Set.member` used = (dlist, used) | otherwise = (dlist . ((k, v):), Set.insert k used) splitTName :: TName -> TName splitTName x@(TName Just{} _) = x splitTName x@(TName Nothing t) | T.null b = x | otherwise = TName (Just a) $ T.drop 1 b where (a, b) = T.break (== ':') t xml-conduit-1.1.0.9/Text/XML/Stream/Parse.hs0000644000000000000000000010157512247623377016564 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE DeriveDataTypeable #-} {-# LANGUAGE RankNTypes #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE PatternGuards #-} {-# LANGUAGE ImpredicativeTypes #-} {-# LANGUAGE BangPatterns #-} -- | This module provides both a native Haskell solution for parsing XML -- documents into a stream of events, and a set of parser combinators for -- dealing with a stream of events. -- -- As a simple example, if you have the following XML file: -- -- > -- > -- > Michael -- > Eliezer -- > -- -- Then this code: -- -- > {-# LANGUAGE OverloadedStrings #-} -- > import Control.Monad.Trans.Resource -- > import Data.Conduit (($$)) -- > import Data.Text (Text, unpack) -- > import Text.XML.Stream.Parse -- > -- > data Person = Person Int Text -- > deriving Show -- > -- > parsePerson = tagName "person" (requireAttr "age") $ \age -> do -- > name <- content -- > return $ Person (read $ unpack age) name -- > -- > parsePeople = tagNoAttr "people" $ many parsePerson -- > -- > main = do -- > people <- runResourceT $ -- > parseFile def "people.xml" $$ force "people required" parsePeople -- > print people -- -- will produce: -- -- > [Person {age = 25, name = "Michael"},Person {age = 2, name = "Eliezer"}] -- -- Previous versions of this module contained a number of more sophisticated -- functions written by Aristid Breitkreuz and Dmitry Olshansky. To keep this -- package simpler, those functions are being moved to a separate package. This -- note will be updated with the name of the package(s) when available. module Text.XML.Stream.Parse ( -- * Parsing XML files parseBytes , parseBytesPos , parseText , detectUtf , parseFile , parseLBS -- ** Parser settings , ParseSettings , def , DecodeEntities , psDecodeEntities -- *** Entity decoding , decodeXmlEntities , decodeHtmlEntities -- * Event parsing , tag , tagPredicate , tagName , tagNoAttr , content , contentMaybe -- * Attribute parsing , AttrParser , requireAttr , optionalAttr , requireAttrRaw , optionalAttrRaw , ignoreAttrs -- * Combinators , orE , choose , many , force -- * Exceptions , XmlException (..) -- * Other types , PositionRange , EventPos ) where import Data.Attoparsec.Text ( char, Parser, takeWhile1, skipWhile, string , manyTill, takeWhile, try, anyChar ) import qualified Control.Applicative as A import Data.Conduit.Attoparsec (conduitParser, PositionRange) import Data.XML.Types ( Name (..), Event (..), Content (..) , Instruction (..), ExternalID (..) ) import Filesystem.Path.CurrentOS (FilePath, encodeString) import Control.Applicative (Applicative(..), Alternative(empty,(<|>)), (<$>)) import Data.Text (Text, pack) import Control.Arrow ((***)) import qualified Data.Text as T import Data.Text.Read (Reader, decimal, hexadecimal) import Data.Text.Encoding (decodeUtf32BEWith) import Data.Text.Encoding.Error (ignore) import Data.Word (Word32) import Blaze.ByteString.Builder (fromWord32be, toByteString) import Text.XML.Stream.Token import Prelude hiding (takeWhile, FilePath) import qualified Data.ByteString as S import qualified Data.ByteString.Lazy as L import qualified Data.Map as Map import Data.Conduit import qualified Data.Conduit.Text as CT import qualified Data.Conduit.List as CL import qualified Data.Conduit.Internal as CI import Control.Monad (ap, liftM) import qualified Data.Text as TS import Data.List (foldl') import Data.Typeable (Typeable) import Control.Exception (Exception) import Data.Conduit.Binary (sourceFile) import Data.Char (isSpace) import Data.Default (Default (..)) import Control.Monad.Trans.Class (lift) type Ents = [(Text, Text)] tokenToEvent :: Ents -> [NSLevel] -> Token -> (Ents, [NSLevel], [Event]) tokenToEvent es n (TokenBeginDocument _) = (es, n, []) tokenToEvent es n (TokenInstruction i) = (es, n, [EventInstruction i]) tokenToEvent es n (TokenBeginElement name as isClosed _) = (es, n', if isClosed then [begin, end] else [begin]) where l0 = case n of [] -> NSLevel Nothing Map.empty x:_ -> x (as', l') = foldl' go (id, l0) as go (front, l) (TName kpref kname, val) | kpref == Just "xmlns" = (front, l { prefixes = Map.insert kname (contentsToText val) $ prefixes l }) | kpref == Nothing && kname == "xmlns" = (front, l { defaultNS = if T.null $ contentsToText val then Nothing else Just $ contentsToText val }) | otherwise = (front . (:) (TName kpref kname, map resolve val), l) resolve (ContentEntity e) | Just t <- lookup e es = ContentText t resolve c = c n' = if isClosed then n else l' : n fixAttName (name', val) = (tnameToName True l' name', val) elementName = tnameToName False l' name begin = EventBeginElement elementName $ map fixAttName $ as' [] end = EventEndElement elementName tokenToEvent es n (TokenEndElement name) = (es, n', [EventEndElement $ tnameToName False l name]) where (l, n') = case n of [] -> (NSLevel Nothing Map.empty, []) x:xs -> (x, xs) tokenToEvent es n (TokenContent (ContentEntity e)) | Just t <- lookup e es = (es, n, [EventContent $ ContentText t]) tokenToEvent es n (TokenContent c) = (es, n, [EventContent c]) tokenToEvent es n (TokenComment c) = (es, n, [EventComment c]) tokenToEvent es n (TokenDoctype t eid es') = (es ++ es', n, [EventBeginDoctype t eid, EventEndDoctype]) tokenToEvent es n (TokenCDATA t) = (es, n, [EventCDATA t]) tnameToName :: Bool -> NSLevel -> TName -> Name tnameToName _ _ (TName (Just "xml") name) = Name name (Just "http://www.w3.org/XML/1998/namespace") (Just "xml") tnameToName isAttr (NSLevel def' _) (TName Nothing name) = Name name (if isAttr then Nothing else def') Nothing tnameToName _ (NSLevel _ m) (TName (Just pref) name) = case Map.lookup pref m of Just ns -> Name name (Just ns) (Just pref) Nothing -> Name name Nothing (Just pref) -- FIXME is this correct? -- | Automatically determine which UTF variant is being used. This function -- first checks for BOMs, removing them as necessary, and then check for the -- equivalent of Conduit S.ByteString m TS.Text detectUtf = conduit id where conduit front = await >>= maybe (return ()) (push front) push front bss = case getEncoding front bss of Left x -> conduit x Right (bss', continue) -> leftover bss' >> continue getEncoding front bs' | S.length bs < 4 = Left (bs `S.append`) | otherwise = Right (bsOut, CT.decode codec) where bs = front bs' bsOut = S.append (S.drop toDrop x) y (x, y) = S.splitAt 4 bs (toDrop, codec) = case S.unpack x of [0x00, 0x00, 0xFE, 0xFF] -> (4, CT.utf32_be) [0xFF, 0xFE, 0x00, 0x00] -> (4, CT.utf32_le) 0xFE : 0xFF: _ -> (2, CT.utf16_be) 0xFF : 0xFE: _ -> (2, CT.utf16_le) 0xEF : 0xBB: 0xBF : _ -> (3, CT.utf8) [0x00, 0x00, 0x00, 0x3C] -> (0, CT.utf32_be) [0x3C, 0x00, 0x00, 0x00] -> (0, CT.utf32_le) [0x00, 0x3C, 0x00, 0x3F] -> (0, CT.utf16_be) [0x3C, 0x00, 0x3F, 0x00] -> (0, CT.utf16_le) _ -> (0, CT.utf8) -- Assuming UTF-8 type EventPos = (Maybe PositionRange, Event) -- | Parses a byte stream into 'Event's. This function is implemented fully in -- Haskell using attoparsec-text for parsing. The produced error messages do -- not give line/column information, so you may prefer to stick with the parser -- provided by libxml-enumerator. However, this has the advantage of not -- relying on any C libraries. -- -- This relies on 'detectUtf' to determine character encoding, and 'parseText' -- to do the actual parsing. parseBytes :: MonadThrow m => ParseSettings -> Conduit S.ByteString m Event parseBytes = mapOutput snd . parseBytesPos parseBytesPos :: MonadThrow m => ParseSettings -> Conduit S.ByteString m EventPos parseBytesPos ps = detectUtf =$= parseText ps dropBOM :: Monad m => Conduit TS.Text m TS.Text dropBOM = await >>= maybe (return ()) push where push t = case T.uncons t of Nothing -> dropBOM Just (c, cs) -> let output | c == '\xfeef' = cs | otherwise = t in yield output >> idConduit idConduit = await >>= maybe (return ()) (\x -> yield x >> idConduit) -- | Parses a character stream into 'Event's. This function is implemented -- fully in Haskell using attoparsec-text for parsing. The produced error -- messages do not give line/column information, so you may prefer to stick -- with the parser provided by libxml-enumerator. However, this has the -- advantage of not relying on any C libraries. parseText :: MonadThrow m => ParseSettings -> Conduit TS.Text m EventPos parseText de = dropBOM =$= tokenize =$= toEventC =$= addBeginEnd where tokenize = conduitToken de addBeginEnd = yield (Nothing, EventBeginDocument) >> addEnd addEnd = await >>= maybe (yield (Nothing, EventEndDocument)) (\e -> yield e >> addEnd) toEventC :: Monad m => Conduit (PositionRange, Token) m EventPos toEventC = go [] [] where go !es !levels = await >>= maybe (return ()) push where push (position, token) = mapM_ (yield . (,) (Just position)) events >> go es' levels' where (es', levels', events) = tokenToEvent es levels token data ParseSettings = ParseSettings { psDecodeEntities :: DecodeEntities } instance Default ParseSettings where def = ParseSettings { psDecodeEntities = decodeXmlEntities } conduitToken :: MonadThrow m => ParseSettings -> Conduit TS.Text m (PositionRange, Token) conduitToken = conduitParser . parseToken . psDecodeEntities parseToken :: DecodeEntities -> Parser Token parseToken de = (char '<' >> parseLt) <|> TokenContent <$> parseContent de False False where parseLt = (char '?' >> parseInstr) <|> (char '!' >> (parseComment <|> parseCdata <|> parseDoctype)) <|> parseBegin <|> (char '/' >> parseEnd) parseInstr = do name <- parseIdent if name == "xml" then do as <- A.many $ parseAttribute de skipSpace char' '?' char' '>' newline <|> return () return $ TokenBeginDocument as else do skipSpace x <- T.pack <$> manyTill anyChar (try $ string "?>") return $ TokenInstruction $ Instruction name x parseComment = do char' '-' char' '-' c <- T.pack <$> manyTill anyChar (string "-->") -- FIXME use takeWhile instead return $ TokenComment c parseCdata = do _ <- string "[CDATA[" t <- T.pack <$> manyTill anyChar (string "]]>") -- FIXME use takeWhile instead return $ TokenCDATA t parseDoctype = do _ <- string "DOCTYPE" skipSpace name <- parseName let i = case name of TName Nothing x -> x TName (Just x) y -> T.concat [x, ":", y] skipSpace eid <- fmap Just parsePublicID <|> fmap Just parseSystemID <|> return Nothing skipSpace ents <- (do char' '[' ents <- parseEntities id skipSpace return ents) <|> return [] char' '>' newline <|> return () return $ TokenDoctype i eid ents parseEntities front = (char ']' >> return (front [])) <|> (parseEntity >>= \e -> parseEntities (front . (e:))) <|> (char '<' >> parseEntities front) <|> (skipWhile (\t -> t /= ']' && t /= '<') >> parseEntities front) parseEntity = try $ do _ <- string "' return (i, t) parsePublicID = do _ <- string "PUBLIC" x <- quotedText y <- quotedText return $ PublicID x y parseSystemID = do _ <- string "SYSTEM" x <- quotedText return $ SystemID x quotedText = do skipSpace between '"' <|> between '\'' between c = do char' c x <- takeWhile (/=c) char' c return x parseEnd = do skipSpace n <- parseName skipSpace char' '>' return $ TokenEndElement n parseBegin = do skipSpace n <- parseName as <- A.many $ parseAttribute de skipSpace isClose <- (char '/' >> skipSpace >> return True) <|> return False char' '>' return $ TokenBeginElement n as isClose 0 parseAttribute :: DecodeEntities -> Parser TAttribute parseAttribute de = do skipSpace key <- parseName skipSpace char' '=' skipSpace val <- squoted <|> dquoted return (key, val) where squoted = char '\'' *> manyTill (parseContent de False True) (char '\'') dquoted = char '"' *> manyTill (parseContent de True False) (char '"') parseName :: Parser TName parseName = name <$> parseIdent <*> A.optional (char ':' >> parseIdent) where name i1 Nothing = TName Nothing i1 name i1 (Just i2) = TName (Just i1) i2 parseIdent :: Parser Text parseIdent = takeWhile1 valid where valid '&' = False valid '<' = False valid '>' = False valid ':' = False valid '?' = False valid '=' = False valid '"' = False valid '\'' = False valid '/' = False valid c = not $ isXMLSpace c parseContent :: DecodeEntities -> Bool -- break on double quote -> Bool -- break on single quote -> Parser Content parseContent de breakDouble breakSingle = parseEntity <|> parseText' where parseEntity = do char' '&' t <- takeWhile1 (/= ';') char' ';' return $ de t parseText' = do bs <- takeWhile1 valid return $ ContentText bs valid '"' = not breakDouble valid '\'' = not breakSingle valid '&' = False -- amp valid '<' = False -- lt valid _ = True skipSpace :: Parser () skipSpace = skipWhile isXMLSpace -- | Determines whether a character is an XML white space. The list of -- white spaces is given by -- -- > S ::= (#x20 | #x9 | #xD | #xA)+ -- -- in . isXMLSpace :: Char -> Bool isXMLSpace ' ' = True isXMLSpace '\t' = True isXMLSpace '\r' = True isXMLSpace '\n' = True isXMLSpace _ = False newline :: Parser () newline = ((char '\r' >> char '\n') <|> char '\n') >> return () char' :: Char -> Parser () char' c = char c >> return () data ContentType = Ignore | IsContent Text | IsError String | NotContent -- | Grabs the next piece of content if available. This function skips over any -- comments and instructions and concatenates all content until the next start -- or end tag. contentMaybe :: MonadThrow m => Consumer Event m (Maybe Text) contentMaybe = do x <- CL.peek case pc' x of Ignore -> CL.drop 1 >> contentMaybe IsContent t -> CL.drop 1 >> fmap Just (takeContents (t:)) IsError e -> lift $ monadThrow $ XmlException e x NotContent -> return Nothing where pc' Nothing = NotContent pc' (Just x) = pc x pc (EventContent (ContentText t)) = IsContent t pc (EventContent (ContentEntity e)) = IsError $ "Unknown entity: " ++ show e pc (EventCDATA t) = IsContent t pc EventBeginElement{} = NotContent pc EventEndElement{} = NotContent pc EventBeginDocument{} = Ignore pc EventEndDocument = Ignore pc EventBeginDoctype{} = Ignore pc EventEndDoctype = Ignore pc EventInstruction{} = Ignore pc EventComment{} = Ignore takeContents front = do x <- CL.peek case pc' x of Ignore -> CL.drop 1 >> takeContents front IsContent t -> CL.drop 1 >> takeContents (front . (:) t) IsError e -> lift $ monadThrow $ XmlException e x NotContent -> return $ T.concat $ front [] -- | Grabs the next piece of content. If none if available, returns 'T.empty'. -- This is simply a wrapper around 'contentMaybe'. content :: MonadThrow m => Consumer Event m Text content = do x <- contentMaybe case x of Nothing -> return T.empty Just y -> return y -- | The most generic way to parse a tag. It takes a predicate for checking if -- this is the correct tag name, an 'AttrParser' for handling attributes, and -- then a parser for dealing with content. -- -- This function automatically absorbs its balancing closing tag, and will -- throw an exception if not all of the attributes or child elements are -- consumed. If you want to allow extra attributes, see 'ignoreAttrs'. -- -- This function automatically ignores comments, instructions and whitespace. tag :: MonadThrow m => (Name -> Maybe a) -> (a -> AttrParser b) -> (b -> CI.ConduitM Event o m c) -> CI.ConduitM Event o m (Maybe c) tag checkName attrParser f = do x <- dropWS case x of Just (EventBeginElement name as) -> case checkName name of Just y -> case runAttrParser' (attrParser y) as of Left e -> lift $ monadThrow e Right z -> do CL.drop 1 z' <- f z a <- dropWS case a of Just (EventEndElement name') | name == name' -> CL.drop 1 >> return (Just z') _ -> lift $ monadThrow $ XmlException ("Expected end tag for: " ++ show name) a Nothing -> return Nothing _ -> return Nothing where dropWS = do x <- CL.peek let isWS = case x of Just EventBeginDocument -> True Just EventEndDocument -> True Just EventBeginDoctype{} -> True Just EventEndDoctype -> True Just EventInstruction{} -> True Just EventBeginElement{} -> False Just EventEndElement{} -> False Just (EventContent (ContentText t)) | T.all isSpace t -> True | otherwise -> False Just (EventContent ContentEntity{}) -> False Just EventComment{} -> True Just EventCDATA{} -> False Nothing -> False if isWS then CL.drop 1 >> dropWS else return x runAttrParser' p as = case runAttrParser p as of Left e -> Left e Right ([], x) -> Right x Right (attr, _) -> Left $ UnparsedAttributes attr -- | A simplified version of 'tag' which matches against boolean predicates. tagPredicate :: MonadThrow m => (Name -> Bool) -> AttrParser a -> (a -> CI.ConduitM Event o m b) -> CI.ConduitM Event o m (Maybe b) tagPredicate p attrParser = tag (\x -> if p x then Just () else Nothing) (const attrParser) -- | A simplified version of 'tag' which matches for specific tag names instead -- of taking a predicate function. This is often sufficient, and when combined -- with OverloadedStrings and the IsString instance of 'Name', can prove to be -- very concise. tagName :: MonadThrow m => Name -> AttrParser a -> (a -> CI.ConduitM Event o m b) -> CI.ConduitM Event o m (Maybe b) tagName name = tagPredicate (== name) -- | A further simplified tag parser, which requires that no attributes exist. tagNoAttr :: MonadThrow m => Name -> CI.ConduitM Event o m a -> CI.ConduitM Event o m (Maybe a) tagNoAttr name f = tagName name (return ()) $ const f -- | Get the value of the first parser which returns 'Just'. If no parsers -- succeed (i.e., return 'Just'), this function returns 'Nothing'. -- -- > orE a b = choose [a, b] orE :: Monad m => Consumer Event m (Maybe a) -> Consumer Event m (Maybe a) -> Consumer Event m (Maybe a) orE a b = do x <- a case x of Nothing -> b _ -> return x -- | Get the value of the first parser which returns 'Just'. If no parsers -- succeed (i.e., return 'Just'), this function returns 'Nothing'. choose :: Monad m => [Consumer Event m (Maybe a)] -> Consumer Event m (Maybe a) choose [] = return Nothing choose (i:is) = do x <- i case x of Nothing -> choose is Just a -> return $ Just a -- | Force an optional parser into a required parser. All of the 'tag' -- functions, 'choose' and 'many' deal with 'Maybe' parsers. Use this when you -- want to finally force something to happen. force :: MonadThrow m => String -- ^ Error message -> CI.ConduitM Event o m (Maybe a) -> CI.ConduitM Event o m a force msg i = do x <- i case x of Nothing -> lift $ monadThrow $ XmlException msg Nothing Just a -> return a -- | A helper function which reads a file from disk using 'enumFile', detects -- character encoding using 'detectUtf', parses the XML using 'parseBytes', and -- then hands off control to your supplied parser. parseFile :: MonadResource m => ParseSettings -> FilePath -> Producer m Event parseFile ps fp = sourceFile (encodeString fp) =$= parseBytes ps -- | Parse an event stream from a lazy 'L.ByteString'. parseLBS :: MonadThrow m => ParseSettings -> L.ByteString -> Producer m Event parseLBS ps lbs = CL.sourceList (L.toChunks lbs) =$= parseBytes ps data XmlException = XmlException { xmlErrorMessage :: String , xmlBadInput :: Maybe Event } | InvalidEndElement Name | InvalidEntity Text | UnparsedAttributes [(Name, [Content])] deriving (Show, Typeable) instance Exception XmlException -- | A monad for parsing attributes. By default, it requires you to deal with -- all attributes present on an element, and will throw an exception if there -- are unhandled attributes. Use the 'requireAttr', 'optionalAttr' et al -- functions for handling an attribute, and 'ignoreAttrs' if you would like to -- skip the rest of the attributes on an element. -- -- 'Alternative' instance behave like 'First' monoid. It chooses first -- parser which doesn't fail. newtype AttrParser a = AttrParser { runAttrParser :: [(Name, [Content])] -> Either XmlException ([(Name, [Content])], a) } instance Monad AttrParser where return a = AttrParser $ \as -> Right (as, a) (AttrParser f) >>= g = AttrParser $ \as -> case f as of Left e -> Left e Right (as', f') -> runAttrParser (g f') as' instance Functor AttrParser where fmap = liftM instance Applicative AttrParser where pure = return (<*>) = ap instance Alternative AttrParser where empty = AttrParser $ const $ Left $ XmlException "AttrParser.empty" Nothing AttrParser f <|> AttrParser g = AttrParser $ \x -> case f x of Left _ -> g x res -> res optionalAttrRaw :: ((Name, [Content]) -> Maybe b) -> AttrParser (Maybe b) optionalAttrRaw f = AttrParser $ go id where go front [] = Right (front [], Nothing) go front (a:as) = case f a of Nothing -> go (front . (:) a) as Just b -> Right (front as, Just b) requireAttrRaw :: String -> ((Name, [Content]) -> Maybe b) -> AttrParser b requireAttrRaw msg f = do x <- optionalAttrRaw f case x of Just b -> return b Nothing -> AttrParser $ const $ Left $ XmlException msg Nothing -- | Require that a certain attribute be present and return its value. requireAttr :: Name -> AttrParser Text requireAttr n = requireAttrRaw ("Missing attribute: " ++ show n) (\(x, y) -> if x == n then Just (contentsToText y) else Nothing) -- | Return the value for an attribute if present. optionalAttr :: Name -> AttrParser (Maybe Text) optionalAttr n = optionalAttrRaw (\(x, y) -> if x == n then Just (contentsToText y) else Nothing) contentsToText :: [Content] -> Text contentsToText = T.concat . map toText where toText (ContentText t) = t toText (ContentEntity e) = T.concat ["&", e, ";"] -- | Skip the remaining attributes on an element. Since this will clear the -- list of attributes, you must call this /after/ any calls to 'requireAttr', -- 'optionalAttr', etc. ignoreAttrs :: AttrParser () ignoreAttrs = AttrParser $ \_ -> Right ([], ()) -- | Keep parsing elements as long as the parser returns 'Just'. many :: Monad m => Consumer Event m (Maybe a) -> Consumer Event m [a] many i = go id where go front = do x <- i case x of Nothing -> return $ front [] Just y -> go $ front . (:) y type DecodeEntities = Text -> Content -- | Default implementation of 'DecodeEntities': handles numeric entities and -- the five standard character entities (lt, gt, amp, quot, apos). decodeXmlEntities :: DecodeEntities decodeXmlEntities "lt" = ContentText "<" decodeXmlEntities "gt" = ContentText ">" decodeXmlEntities "amp" = ContentText "&" decodeXmlEntities "quot" = ContentText "\"" decodeXmlEntities "apos" = ContentText "'" decodeXmlEntities t = let backup = ContentEntity t in case T.uncons t of Just ('#', t') -> case T.uncons t' of Just ('x', t'') | T.length t'' > 6 -> backup | otherwise -> decodeChar hexadecimal backup t'' _ | T.length t' > 7 -> backup | otherwise -> decodeChar decimal backup t' _ -> backup -- | HTML4-compliant entity decoder. Handles numerics, the five standard -- character entities, and the additional 248 entities defined by HTML 4 and -- XHTML 1. -- -- Note that HTML 5 introduces a drastically larger number of entities, and -- this code does not recognize most of them. decodeHtmlEntities :: DecodeEntities decodeHtmlEntities t = case decodeXmlEntities t of x@ContentText{} -> x backup@ContentEntity{} -> case Map.lookup t htmlEntities of Just x -> ContentText x Nothing -> backup htmlEntities :: Map.Map T.Text T.Text htmlEntities = Map.fromList $ map (pack *** pack) -- Work around the long-compile-time bug [ ("nbsp", "\160") , ("iexcl", "\161") , ("cent", "\162") , ("pound", "\163") , ("curren", "\164") , ("yen", "\165") , ("brvbar", "\166") , ("sect", "\167") , ("uml", "\168") , ("copy", "\169") , ("ordf", "\170") , ("laquo", "\171") , ("not", "\172") , ("shy", "\173") , ("reg", "\174") , ("macr", "\175") , ("deg", "\176") , ("plusmn", "\177") , ("sup2", "\178") , ("sup3", "\179") , ("acute", "\180") , ("micro", "\181") , ("para", "\182") , ("middot", "\183") , ("cedil", "\184") , ("sup1", "\185") , ("ordm", "\186") , ("raquo", "\187") , ("frac14", "\188") , ("frac12", "\189") , ("frac34", "\190") , ("iquest", "\191") , ("Agrave", "\192") , ("Aacute", "\193") , ("Acirc", "\194") , ("Atilde", "\195") , ("Auml", "\196") , ("Aring", "\197") , ("AElig", "\198") , ("Ccedil", "\199") , ("Egrave", "\200") , ("Eacute", "\201") , ("Ecirc", "\202") , ("Euml", "\203") , ("Igrave", "\204") , ("Iacute", "\205") , ("Icirc", "\206") , ("Iuml", "\207") , ("ETH", "\208") , ("Ntilde", "\209") , ("Ograve", "\210") , ("Oacute", "\211") , ("Ocirc", "\212") , ("Otilde", "\213") , ("Ouml", "\214") , ("times", "\215") , ("Oslash", "\216") , ("Ugrave", "\217") , ("Uacute", "\218") , ("Ucirc", "\219") , ("Uuml", "\220") , ("Yacute", "\221") , ("THORN", "\222") , ("szlig", "\223") , ("agrave", "\224") , ("aacute", "\225") , ("acirc", "\226") , ("atilde", "\227") , ("auml", "\228") , ("aring", "\229") , ("aelig", "\230") , ("ccedil", "\231") , ("egrave", "\232") , ("eacute", "\233") , ("ecirc", "\234") , ("euml", "\235") , ("igrave", "\236") , ("iacute", "\237") , ("icirc", "\238") , ("iuml", "\239") , ("eth", "\240") , ("ntilde", "\241") , ("ograve", "\242") , ("oacute", "\243") , ("ocirc", "\244") , ("otilde", "\245") , ("ouml", "\246") , ("divide", "\247") , ("oslash", "\248") , ("ugrave", "\249") , ("uacute", "\250") , ("ucirc", "\251") , ("uuml", "\252") , ("yacute", "\253") , ("thorn", "\254") , ("yuml", "\255") , ("OElig", "\338") , ("oelig", "\339") , ("Scaron", "\352") , ("scaron", "\353") , ("Yuml", "\376") , ("fnof", "\402") , ("circ", "\710") , ("tilde", "\732") , ("Alpha", "\913") , ("Beta", "\914") , ("Gamma", "\915") , ("Delta", "\916") , ("Epsilon", "\917") , ("Zeta", "\918") , ("Eta", "\919") , ("Theta", "\920") , ("Iota", "\921") , ("Kappa", "\922") , ("Lambda", "\923") , ("Mu", "\924") , ("Nu", "\925") , ("Xi", "\926") , ("Omicron", "\927") , ("Pi", "\928") , ("Rho", "\929") , ("Sigma", "\931") , ("Tau", "\932") , ("Upsilon", "\933") , ("Phi", "\934") , ("Chi", "\935") , ("Psi", "\936") , ("Omega", "\937") , ("alpha", "\945") , ("beta", "\946") , ("gamma", "\947") , ("delta", "\948") , ("epsilon", "\949") , ("zeta", "\950") , ("eta", "\951") , ("theta", "\952") , ("iota", "\953") , ("kappa", "\954") , ("lambda", "\955") , ("mu", "\956") , ("nu", "\957") , ("xi", "\958") , ("omicron", "\959") , ("pi", "\960") , ("rho", "\961") , ("sigmaf", "\962") , ("sigma", "\963") , ("tau", "\964") , ("upsilon", "\965") , ("phi", "\966") , ("chi", "\967") , ("psi", "\968") , ("omega", "\969") , ("thetasym", "\977") , ("upsih", "\978") , ("piv", "\982") , ("ensp", "\8194") , ("emsp", "\8195") , ("thinsp", "\8201") , ("zwnj", "\8204") , ("zwj", "\8205") , ("lrm", "\8206") , ("rlm", "\8207") , ("ndash", "\8211") , ("mdash", "\8212") , ("lsquo", "\8216") , ("rsquo", "\8217") , ("sbquo", "\8218") , ("ldquo", "\8220") , ("rdquo", "\8221") , ("bdquo", "\8222") , ("dagger", "\8224") , ("Dagger", "\8225") , ("bull", "\8226") , ("hellip", "\8230") , ("permil", "\8240") , ("prime", "\8242") , ("Prime", "\8243") , ("lsaquo", "\8249") , ("rsaquo", "\8250") , ("oline", "\8254") , ("frasl", "\8260") , ("euro", "\8364") , ("image", "\8465") , ("weierp", "\8472") , ("real", "\8476") , ("trade", "\8482") , ("alefsym", "\8501") , ("larr", "\8592") , ("uarr", "\8593") , ("rarr", "\8594") , ("darr", "\8595") , ("harr", "\8596") , ("crarr", "\8629") , ("lArr", "\8656") , ("uArr", "\8657") , ("rArr", "\8658") , ("dArr", "\8659") , ("hArr", "\8660") , ("forall", "\8704") , ("part", "\8706") , ("exist", "\8707") , ("empty", "\8709") , ("nabla", "\8711") , ("isin", "\8712") , ("notin", "\8713") , ("ni", "\8715") , ("prod", "\8719") , ("sum", "\8721") , ("minus", "\8722") , ("lowast", "\8727") , ("radic", "\8730") , ("prop", "\8733") , ("infin", "\8734") , ("ang", "\8736") , ("and", "\8743") , ("or", "\8744") , ("cap", "\8745") , ("cup", "\8746") , ("int", "\8747") , ("there4", "\8756") , ("sim", "\8764") , ("cong", "\8773") , ("asymp", "\8776") , ("ne", "\8800") , ("equiv", "\8801") , ("le", "\8804") , ("ge", "\8805") , ("sub", "\8834") , ("sup", "\8835") , ("nsub", "\8836") , ("sube", "\8838") , ("supe", "\8839") , ("oplus", "\8853") , ("otimes", "\8855") , ("perp", "\8869") , ("sdot", "\8901") , ("lceil", "\8968") , ("rceil", "\8969") , ("lfloor", "\8970") , ("rfloor", "\8971") , ("lang", "\9001") , ("rang", "\9002") , ("loz", "\9674") , ("spades", "\9824") , ("clubs", "\9827") , ("hearts", "\9829") , ("diams", "\9830") ] decodeChar :: Reader Word32 -> Content -> Text -> Content decodeChar readNum backup = either (const backup) toContent . readNum where toContent (num, extra) | T.null extra = case decodeUtf32BEWith ignore . toByteString $ fromWord32be num of c | T.length c == 1 -> ContentText c | otherwise -> backup toContent _ = backup xml-conduit-1.1.0.9/Text/XML/Cursor/0000755000000000000000000000000012247623377015167 5ustar0000000000000000xml-conduit-1.1.0.9/Text/XML/Cursor/Generic.hs0000644000000000000000000001254412247623377017105 0ustar0000000000000000-- | Generalized cursors to be applied to different nodes. module Text.XML.Cursor.Generic ( -- * Core Cursor , Axis , toCursor , node -- * Axes , child , parent , precedingSibling , followingSibling , ancestor , descendant , orSelf , preceding , following -- * Operators , (&|) , (&/) , (&//) , (&.//) , ($|) , ($/) , ($//) , ($.//) , (>=>) ) where import Data.Maybe (maybeToList) import Data.List (foldl') import Control.Monad ((>=>)) type DiffCursor node = [Cursor node] -> [Cursor node] type Axis node = Cursor node -> [Cursor node] -- | A cursor: contains an XML 'Node' and pointers to its children, ancestors and siblings. data Cursor node = Cursor { parent' :: Maybe (Cursor node) , precedingSibling' :: DiffCursor node , followingSibling' :: DiffCursor node -- | The child axis. XPath: -- /the child axis contains the children of the context node/. , child :: [Cursor node] -- | The current node. , node :: node } instance Show node => Show (Cursor node) where show Cursor { node = n } = "Cursor @ " ++ show n toCursor :: (node -> [node]) -- ^ get children -> node -> Cursor node toCursor cs = toCursor' cs Nothing id id toCursor' :: (node -> [node]) -> Maybe (Cursor node) -> DiffCursor node -> DiffCursor node -> node -> Cursor node toCursor' cs par pre fol n = me where me = Cursor par pre fol chi n chi' = cs n chi = go id chi' [] go _ [] = id go pre' (n':ns') = (:) me' . fol' where me' = toCursor' cs (Just me) pre' fol' n' fol' = go (pre' . (:) me') ns' -- | The parent axis. As described in XPath: -- /the parent axis contains the parent of the context node, if there is one/. -- -- Every node but the root element of the document has a parent. Parent nodes -- will always be 'NodeElement's. parent :: Axis node parent = maybeToList . parent' -- | The preceding-sibling axis. XPath: -- /the preceding-sibling axis contains all the preceding siblings of the context node [...]/. precedingSibling :: Axis node precedingSibling = ($ []) . precedingSibling' -- | The following-sibling axis. XPath: -- /the following-sibling axis contains all the following siblings of the context node [...]/. followingSibling :: Axis node followingSibling = ($ []) . followingSibling' -- | The preceding axis. XPath: -- /the preceding axis contains all nodes in the same document as the context node that are before the context node in document order, excluding any ancestors and excluding attribute nodes and namespace nodes/. preceding :: Axis node preceding c = go (precedingSibling' c []) (parent c >>= preceding) where go x y = foldl' (flip go') y x go' x rest = foldl' (flip go') (x : rest) (child x) -- | The following axis. XPath: -- /the following axis contains all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes/. following :: Axis node following c = go (followingSibling' c) (parent c >>= following) where go x z = foldr go' z (x []) go' x rest = x : foldr go' rest (child x) -- | The ancestor axis. XPath: -- /the ancestor axis contains the ancestors of the context node; the ancestors of the context node consist of the parent of context node and the parent's parent and so on; thus, the ancestor axis will always include the root node, unless the context node is the root node/. ancestor :: Axis node ancestor = parent >=> (\p -> p : ancestor p) -- | The descendant axis. XPath: -- /the descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes/. descendant :: Axis node descendant = child >=> (\c -> c : descendant c) -- | Modify an axis by adding the context node itself as the first element of the result list. orSelf :: Axis node -> Axis node orSelf ax c = c : ax c infixr 1 &| infixr 1 &/ infixr 1 &// infixr 1 &.// infixr 1 $| infixr 1 $/ infixr 1 $// infixr 1 $.// -- | Apply a function to the result of an axis. (&|) :: (Cursor node -> [a]) -> (a -> b) -> (Cursor node -> [b]) f &| g = map g . f -- | Combine two axes so that the second works on the children of the results -- of the first. (&/) :: Axis node -> (Cursor node -> [a]) -> (Cursor node -> [a]) f &/ g = f >=> child >=> g -- | Combine two axes so that the second works on the descendants of the results -- of the first. (&//) :: Axis node -> (Cursor node -> [a]) -> (Cursor node -> [a]) f &// g = f >=> descendant >=> g -- | Combine two axes so that the second works on both the result nodes, and their -- descendants. (&.//) :: Axis node -> (Cursor node -> [a]) -> (Cursor node -> [a]) f &.// g = f >=> orSelf descendant >=> g -- | Apply an axis to a 'Cursor node'. ($|) :: Cursor node -> (Cursor node -> a) -> a v $| f = f v -- | Apply an axis to the children of a 'Cursor node'. ($/) :: Cursor node -> (Cursor node -> [a]) -> [a] v $/ f = child v >>= f -- | Apply an axis to the descendants of a 'Cursor node'. ($//) :: Cursor node -> (Cursor node -> [a]) -> [a] v $// f = descendant v >>= f -- | Apply an axis to a 'Cursor node' as well as its descendants. ($.//) :: Cursor node -> (Cursor node -> [a]) -> [a] v $.// f = orSelf descendant v >>= f xml-conduit-1.1.0.9/test/0000755000000000000000000000000012247623377013305 5ustar0000000000000000xml-conduit-1.1.0.9/test/main.hs0000644000000000000000000005126312247623377014574 0ustar0000000000000000{-# LANGUAGE QuasiQuotes #-} {-# LANGUAGE TemplateHaskell #-} {-# LANGUAGE OverloadedStrings #-} import Control.Monad.IO.Class (liftIO) import Data.XML.Types import Test.HUnit hiding (Test) import Test.Hspec import qualified Data.ByteString.Char8 as S import qualified Data.ByteString.Lazy.Char8 as L import qualified Text.XML.Unresolved as D import qualified Text.XML.Stream.Parse as P import qualified Text.XML as Res import qualified Text.XML.Cursor as Cu import Text.XML.Stream.Parse (def) import Text.XML.Cursor ((&/), (&//), (&.//), ($|), ($/), ($//), ($.//)) import Data.Text(Text) import Control.Monad import Control.Monad.Trans.Class (lift) import qualified Data.Text as T import qualified Data.Set as Set import Control.Exception (toException) import qualified Data.Conduit as C import qualified Data.Conduit.List as CL import qualified Data.Map as Map import Text.Blaze (toMarkup) import Text.Blaze.Renderer.String (renderMarkup) main :: IO () main = hspec $ do describe "XML parsing and rendering" $ do it "is idempotent to parse and render a document" documentParseRender it "has valid parser combinators" combinators it "has working choose function" testChoose it "has working many function" testMany it "has working orE" testOrE it "is idempotent to parse and pretty render a document" documentParsePrettyRender it "ignores the BOM" parseIgnoreBOM it "strips duplicated attributes" stripDuplicateAttributes it "displays comments" testRenderComments it "conduit parser" testConduitParser describe "XML Cursors" $ do it "has correct parent" cursorParent it "has correct ancestor" cursorAncestor it "has correct orSelf" cursorOrSelf it "has correct preceding" cursorPreceding it "has correct following" cursorFollowing it "has correct precedingSibling" cursorPrecedingSib it "has correct followingSibling" cursorFollowingSib it "has correct descendant" cursorDescendant it "has correct check" cursorCheck it "has correct check with lists" cursorPredicate it "has correct checkNode" cursorCheckNode it "has correct checkElement" cursorCheckElement it "has correct checkName" cursorCheckName it "has correct anyElement" cursorAnyElement it "has correct element" cursorElement it "has correct laxElement" cursorLaxElement it "has correct content" cursorContent it "has correct attribute" cursorAttribute it "has correct laxAttribute" cursorLaxAttribute it "has correct &* and $* operators" cursorDeep it "has correct force" cursorForce it "has correct forceM" cursorForceM it "has correct hasAttribute" cursorHasAttribute it "has correct attributeIs" cursorAttributeIs describe "resolved" $ do it "identifies unresolved entities" resolvedIdentifies it "decodeHtmlEntities" testHtmlEntities it "works for resolvable entities" resolvedAllGood it "merges adjacent content nodes" resolvedMergeContent it "understands inline entity declarations" resolvedInline describe "pretty" $ do it "works" casePretty describe "top level namespaces" $ do it "works" caseTopLevelNamespace it "works with prefix" caseTopLevelNamespacePrefix it "handles conflicts" caseTLNConflict describe "blaze-html instances" $ do it "works" caseBlazeHtml describe "attribute reordering" $ do it "works" caseAttrReorder describe "ordering attributes explicitly" $ do it "works" caseOrderAttrs it "parsing CDATA" caseParseCdata documentParseRender :: IO () documentParseRender = mapM_ go docs where go x = x @=? D.parseLBS_ def (D.renderLBS def x) docs = [ Document (Prologue [] Nothing []) (Element "foo" [] []) [] , D.parseLBS_ def "\n" , D.parseLBS_ def "\n&ignore;" , D.parseLBS_ def "]]>" , D.parseLBS_ def "" , D.parseLBS_ def "" , D.parseLBS_ def "" ] documentParsePrettyRender :: IO () documentParsePrettyRender = L.unpack (D.renderLBS def { D.rsPretty = True } (D.parseLBS_ def doc)) @?= L.unpack doc where doc = L.unlines [ "" , "" , " " , " text" , " " , "" ] combinators :: Assertion combinators = C.runResourceT $ P.parseLBS def input C.$$ do P.force "need hello" $ P.tagName "hello" (P.requireAttr "world") $ \world -> do liftIO $ world @?= "true" P.force "need child1" $ P.tagNoAttr "{mynamespace}child1" $ return () P.force "need child2" $ P.tagNoAttr "child2" $ return () P.force "need child3" $ P.tagNoAttr "child3" $ do x <- P.contentMaybe liftIO $ x @?= Just "combine &content" where input = L.concat [ "" , "\n" , "" , "" , "" , "" , " " , "combine <all> \n" , "" ] testChoose :: Assertion testChoose = C.runResourceT $ P.parseLBS def input C.$$ do P.force "need hello" $ P.tagNoAttr "hello" $ do x <- P.choose [ P.tagNoAttr "failure" $ return 1 , P.tagNoAttr "success" $ return 2 ] liftIO $ x @?= Just (2 :: Int) where input = L.concat [ "" , "\n" , "" , "" , "" ] testMany :: Assertion testMany = C.runResourceT $ P.parseLBS def input C.$$ do P.force "need hello" $ P.tagNoAttr "hello" $ do x <- P.many $ P.tagNoAttr "success" $ return () liftIO $ length x @?= 5 where input = L.concat [ "" , "\n" , "" , "" , "" , "" , "" , "" , "" ] testOrE :: IO () testOrE = C.runResourceT $ P.parseLBS def input C.$$ do P.force "need hello" $ P.tagNoAttr "hello" $ do x <- P.tagNoAttr "failure" (return 1) `P.orE` P.tagNoAttr "success" (return 2) liftIO $ x @?= Just (2 :: Int) where input = L.concat [ "" , "\n" , "" , "" , "" ] testConduitParser :: Assertion testConduitParser = C.runResourceT $ do x <- P.parseLBS def input C.$= (P.force "need hello" $ P.tagNoAttr "hello" f) C.$$ CL.consume liftIO $ x @?= [1, 1, 1] where input = L.concat [ "" , "\n" , "" , "" , "" , "" , "" ] f :: C.MonadThrow m => C.Conduit Event m Int f = do ma <- P.tagNoAttr "item" (return 1) maybe (return ()) (\a -> C.yield a >> f) ma name :: [Cu.Cursor] -> [Text] name [] = [] name (c:cs) = ($ name cs) $ case Cu.node c of Res.NodeElement e -> ((Res.nameLocalName $ Res.elementName e) :) _ -> id cursor :: Cu.Cursor cursor = Cu.fromDocument $ Res.parseLBS_ def input where input = L.concat [ "" , "" , "" , "" , "" , "a" , "" , "" , "" , "b" , "" , "" , "" , "" , "" ] bar2, baz2, bar3, bin2 :: Cu.Cursor bar2 = Cu.child cursor !! 1 baz2 = Cu.child bar2 !! 1 bar3 = Cu.child cursor !! 2 bin2 = Cu.child bar3 !! 1 cursorParent, cursorAncestor, cursorOrSelf, cursorPreceding, cursorFollowing, cursorPrecedingSib, cursorFollowingSib, cursorDescendant, cursorCheck, cursorPredicate, cursorCheckNode, cursorCheckElement, cursorCheckName, cursorAnyElement, cursorElement, cursorLaxElement, cursorContent, cursorAttribute, cursorLaxAttribute, cursorHasAttribute, cursorAttributeIs, cursorDeep, cursorForce, cursorForceM, resolvedIdentifies, resolvedAllGood, resolvedMergeContent, testHtmlEntities :: Assertion cursorParent = name (Cu.parent bar2) @?= ["foo"] cursorAncestor = name (Cu.ancestor baz2) @?= ["bar2", "foo"] cursorOrSelf = name (Cu.orSelf Cu.ancestor baz2) @?= ["baz2", "bar2", "foo"] cursorPreceding = do name (Cu.preceding baz2) @?= ["baz1", "bar1"] name (Cu.preceding bin2) @?= ["bin1", "baz3", "baz2", "baz1", "bar2", "bar1"] cursorFollowing = do name (Cu.following baz2) @?= ["baz3", "bar3", "bin1", "bin2", "bin3", "Bar1"] name (Cu.following bar2) @?= ["bar3", "bin1", "bin2", "bin3", "Bar1"] cursorPrecedingSib = name (Cu.precedingSibling baz2) @?= ["baz1"] cursorFollowingSib = name (Cu.followingSibling baz2) @?= ["baz3"] cursorDescendant = (name $ Cu.descendant cursor) @?= T.words "bar1 bar2 baz1 baz2 baz3 bar3 bin1 bin2 bin3 Bar1" cursorCheck = null (cursor $.// Cu.check (const False)) @?= True cursorPredicate = (name $ cursor $.// Cu.check Cu.descendant) @?= T.words "foo bar2 baz3 bar3" cursorCheckNode = (name $ cursor $// Cu.checkNode f) @?= T.words "bar1 bar2 bar3" where f (Res.NodeElement e) = "bar" `T.isPrefixOf` Res.nameLocalName (Res.elementName e) f _ = False cursorCheckElement = (name $ cursor $// Cu.checkElement f) @?= T.words "bar1 bar2 bar3" where f e = "bar" `T.isPrefixOf` Res.nameLocalName (Res.elementName e) cursorCheckName = (name $ cursor $// Cu.checkName f) @?= T.words "bar1 bar2 bar3" where f n = "bar" `T.isPrefixOf` nameLocalName n cursorAnyElement = (name $ cursor $// Cu.anyElement) @?= T.words "bar1 bar2 baz1 baz2 baz3 bar3 bin1 bin2 bin3 Bar1" cursorElement = (name $ cursor $// Cu.element "bar1") @?= ["bar1"] cursorLaxElement = (name $ cursor $// Cu.laxElement "bar1") @?= ["bar1", "Bar1"] cursorContent = do Cu.content cursor @?= [] (cursor $.// Cu.content) @?= ["a", "b"] cursorAttribute = Cu.attribute "attr" cursor @?= ["x"] cursorLaxAttribute = (cursor $.// Cu.laxAttribute "Attr") @?= ["x", "y", "q"] cursorHasAttribute = (length $ cursor $.// Cu.hasAttribute "attr") @?= 2 cursorAttributeIs = (length $ cursor $.// Cu.attributeIs "attr" "y") @?= 1 cursorDeep = do (Cu.element "foo" &/ Cu.element "bar2" &// Cu.attribute "attr") cursor @?= ["y"] (return &.// Cu.attribute "attr") cursor @?= ["x", "y"] (cursor $.// Cu.attribute "attr") @?= ["x", "y"] (cursor $/ Cu.element "bar2" &// Cu.attribute "attr") @?= ["y"] (cursor $/ Cu.element "bar2" &/ Cu.element "baz2" >=> Cu.attribute "attr") @?= ["y"] null (cursor $| Cu.element "foo") @?= False cursorForce = do Cu.force () [] @?= (Nothing :: Maybe Integer) Cu.force () [1] @?= Just (1 :: Int) Cu.force () [1,2] @?= Just (1 :: Int) cursorForceM = do Cu.forceM () [] @?= (Nothing :: Maybe Integer) Cu.forceM () [Just 1, Nothing] @?= Just (1 :: Int) Cu.forceM () [Nothing, Just (1 :: Int)] @?= Nothing showEq :: (Show a, Show b) => Either a b -> Either a b -> Assertion showEq x y = show x @=? show y resolvedIdentifies = Left (toException $ Res.UnresolvedEntityException $ Set.fromList ["foo", "bar", "baz"]) `showEq` Res.parseLBS def "&foo; --- &baz; &foo;" testHtmlEntities = Res.parseLBS_ def { P.psDecodeEntities = P.decodeHtmlEntities } xml1 @=? Res.parseLBS_ def xml2 where xml1 = " " xml2 = " " resolvedAllGood = D.parseLBS_ def xml @=? Res.toXMLDocument (Res.parseLBS_ def xml) where xml = "" resolvedMergeContent = Res.documentRoot (Res.parseLBS_ def xml) @=? Res.Element "foo" Map.empty [Res.NodeContent "bar&baz"] where xml = "bar&baz" parseIgnoreBOM :: Assertion parseIgnoreBOM = do either (const $ Left (1 :: Int)) Right (Res.parseText Res.def "\xfeef") @?= either (const $ Left (2 :: Int)) Right (Res.parseText Res.def "") stripDuplicateAttributes :: Assertion stripDuplicateAttributes = do "" @=? D.renderLBS def (Document (Prologue [] Nothing []) (Element "foo" [("bar", [ContentText "baz"]), ("bar", [ContentText "bin"])] []) []) "" @=? D.renderLBS def (Document (Prologue [] Nothing []) (Element "foo" [ ("x:bar", [ContentText "baz"]) , (Name "bar" (Just "namespace") (Just "x"), [ContentText "bin"]) ] []) []) testRenderComments :: Assertion testRenderComments =do "" @=? D.renderLBS def (Document (Prologue [] Nothing []) (Element "foo" [] [NodeComment "comment"]) []) resolvedInline :: Assertion resolvedInline = do Res.Document _ root _ <- return $ Res.parseLBS_ Res.def "]>&bar;" root @?= Res.Element "foo" Map.empty [Res.NodeContent "baz"] Res.Document _ root2 _ <- return $ Res.parseLBS_ Res.def "]>" root2 @?= Res.Element "foo" (Map.singleton "bar" "baz") [] casePretty :: Assertion casePretty = do let pretty = S.unlines [ "" , "" , "" , " " , " Hello World" , " " , " " , " " , " " , " " , " bar content" , " " , "" ] doctype = Res.Doctype "foo" Nothing doc = Res.Document (Res.Prologue [] (Just doctype) []) root [] root = Res.Element "foo" (Map.fromList [("bar", "bar"), ("baz", "baz")]) [ Res.NodeElement $ Res.Element "foo" (Map.fromList [("bar", "bar"), ("baz", "baz"), ("bin", "bin")]) [ Res.NodeContent " Hello World\n\n" , Res.NodeContent " " ] , Res.NodeElement $ Res.Element "foo" Map.empty [] , Res.NodeInstruction $ Res.Instruction "foo" "bar" , Res.NodeComment "foo bar\n\r\nbaz \tbin " , Res.NodeElement $ Res.Element "bar" Map.empty [Res.NodeContent "bar content"] ] pretty @=? S.concat (L.toChunks $ Res.renderLBS def { D.rsPretty = True } doc) caseTopLevelNamespace :: Assertion caseTopLevelNamespace = do let lbs = S.concat [ "" , "" , "" , "" ] rs = def { D.rsNamespaces = [("bar", "baz")] } doc = Res.Document (Res.Prologue [] Nothing []) (Res.Element "foo" Map.empty [ Res.NodeElement $ Res.Element "subfoo" (Map.singleton "{baz}bin" "") [] ]) [] lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc) caseTopLevelNamespacePrefix :: Assertion caseTopLevelNamespacePrefix = do let lbs = S.concat [ "" , "" , "" , "" ] rs = def { D.rsNamespaces = [("bar", "baz")] } doc = Res.Document (Res.Prologue [] Nothing []) (Res.Element "foo" Map.empty [ Res.NodeElement $ Res.Element "subfoo" (Map.fromList [(Name "bin" (Just "baz") (Just "bar"), "")]) [] ]) [] lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc) caseTLNConflict :: Assertion caseTLNConflict = do let lbs = S.concat [ "" , "" , "" , "" ] rs = def { D.rsNamespaces = [("bar", "baz")] } doc = Res.Document (Res.Prologue [] Nothing []) (Res.Element "foo" (Map.fromList [(Name "x" (Just "something") (Just "bar"), "y")]) [ Res.NodeElement $ Res.Element "subfoo" (Map.fromList [(Name "bin" (Just "baz") (Just "bar"), "")]) [] ]) [] lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc) caseBlazeHtml :: Assertion caseBlazeHtml = expected @=? str where str = renderMarkup $ toMarkup $ Res.Document (Res.Prologue [] Nothing []) root [] root :: Res.Element root = Res.Element "html" Map.empty [ Res.NodeElement $ Res.Element "head" Map.empty [ Res.NodeElement $ Res.Element "title" Map.empty [Res.NodeContent "Test"] , Res.NodeElement $ Res.Element "script" Map.empty [Res.NodeContent "if (5 < 6 || 8 > 9) alert('Hello World!');"] , Res.NodeElement $ Res.Element "{http://www.snoyman.com/xml2html}ie-cond" (Map.singleton "cond" "lt IE 7") [Res.NodeElement $ Res.Element "link" (Map.singleton "href" "ie6.css") []] , Res.NodeElement $ Res.Element "style" Map.empty [Res.NodeContent "body > h1 { color: red }"] ] , Res.NodeElement $ Res.Element "body" Map.empty [ Res.NodeElement $ Res.Element "h1" Map.empty [Res.NodeContent "Hello World!"] ] ] expected :: String expected = concat [ "\n" , "Test" , "" , "" , "" , "

Hello World!

" ] caseAttrReorder :: Assertion caseAttrReorder = do let lbs = S.concat [ "" , "" , "" , "" ] rs = def { Res.rsAttrOrder = \name m -> case name of "foo" -> reverse $ Map.toAscList m _ -> Map.toAscList m } attrs = Map.fromList [("a", "a"), ("b", "b"), ("c", "c")] doc = Res.Document (Res.Prologue [] Nothing []) (Res.Element "foo" attrs [ Res.NodeElement $ Res.Element "bar" attrs [] ]) [] lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc) caseOrderAttrs :: Assertion caseOrderAttrs = do let lbs = S.concat [ "" , "" , "" , "" ] rs = def { Res.rsAttrOrder = Res.orderAttrs [("foo", ["c", "b"])] } attrs = Map.fromList [("a", "a"), ("b", "b"), ("c", "c")] doc = Res.Document (Res.Prologue [] Nothing []) (Res.Element "foo" attrs [ Res.NodeElement $ Res.Element "bar" attrs [] ]) [] lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc) caseParseCdata :: Assertion caseParseCdata = do let lbs = "" doc = Res.Document (Res.Prologue [] Nothing []) (Res.Element "a" Map.empty [ Res.NodeContent "www.google.com" ]) [] Res.parseLBS_ def lbs @?= doc