xml-conduit-1.9.1.3/ 0000755 0000000 0000000 00000000000 07346545000 012317 5 ustar 00 0000000 0000000 xml-conduit-1.9.1.3/ChangeLog.md 0000644 0000000 0000000 00000011622 07346545000 014472 0 ustar 00 0000000 0000000 ## 1.9.1.1
* Entity declarations with tags inside are now correctly handled
* Parser now fails gracefully on malformed entity declarations
* Parameter entity declarations are now ignored
## 1.9.1
* `]` characters inside doctype are now correctly handled
* Entity expansion loops are now detected and avoided
* Add field `psEntityExpansionSizeLimit` in `ParseSettings` to limit the length of an entity expansion; set to 8192 characters by default
## 1.9.0
* Remove deprecated functions (`ignoreTag`, `ignoreAllTreesContent`, `takeAllTreesContent`)
* Rename `parseText'` into `parseText`
* `takeContent` and `ignoreContent` now cover entities
* Align behaviour of `take`* and `ignore`* functions
## 1.8.0.1
* Use doctest to validate code examples from documentation
## 1.8.0
* Upgrade to conduit 1.3.0
## 1.7.1
* Add `psDecodeIllegalCharacters` field in `ParseSettings` to specify how illegal characters references should be decoded
* Fix compatibility with GHC 8.4.1 [#121](https://github.com/snoyberg/xml/issues/121)
## 1.7.0
* `psDecodeEntities` is no longer passed numeric character references (e.g., ` `, `A`) and the predefined XML entities (`&`, `<`, etc). They are now handled by the parser. Both of these construct classes only have one spec-compliant interpretation and this behaviour must always be present, so it makes no sense to force user code to re-implement the parsing logic.
* In prior versions of xml-conduit, hexadecimal character references with a leading `0x` or `0X` like `&0x20;` were accepted. This was not in compliance with the XML specification and it has been corrected.
* xml-conduit now rejects some (but not all) invalid-according-to-spec entities during parsing: specifically, entities with a leading `#` that are not character references are no longer allowed and will be parse errors.
## 1.6.0
* Dropped the dependency on `data-default` for `data-default-class`, reducing the transitive dependency load. For most users, this will not be a breaking change, but it does mean that importing `Text.XML.Conduit` will no longer bring various instances for `Default` into scope. This will break code that relies on those instances and does not otherwise see them. To fix this, import `Data.Default` from `data-default` or one of the more specific instance-providing packages directly (e.g., `data-default-dlist` for the `DList` instance).
## 1.5.1
* New render setting, `rsXMLDeclaration`; setting it to `False` omits the XML declaration.
## 1.5.0
* `tag` function no longer throws an exception when attributes don't match [#93](https://github.com/snoyberg/xml/pull/93)
* Add `many_` combinator to avoid building results in memory [#94](https://github.com/snoyberg/xml/pull/94)
* Turn some functions from `Consumer Event m a` to `ConduitM Event o m a` to allow yielding values
* Replace `takeAllTreesContent` with `takeAnyTreeContent`, that only consumes one tree
* Introduce `NameMatcher` type to refactor tag parsers
* Add a couple of `take*` functions to stream events rather than parse them
* Rename `ignore*` functions to comply with naming convention
## 1.4.0.3
* Compatibility with blaze-markup-0.8.0.0 [#95](https://github.com/snoyberg/xml/issues/95)
## 1.4.0.2
* Parse XML encoding case-insensitively
* Remove extra EOL when printing XmlException
## 1.4.0.1
* Handle CDATA in takeAllTreesContent [#88](https://github.com/snoyberg/xml/pull/88)
## 1.4.0
* Improve XmlException definition and usage
* Add 'takeAllTreesContent' function
## 1.3.5
* Improvements for using xml-conduit for streaming XML protocols [#85](https://github.com/snoyberg/xml/pull/85)
## 1.3.4.2
* transformers dep bump
## 1.3.4.1
* Remove unneeded ImpredicativeTypes
## 1.3.4
* dropWS retains consumed whitespace values [#74](https://github.com/snoyberg/xml/issues/74) [#75](https://github.com/snoyberg/xml/pull/75) [#76](https://github.com/snoyberg/xml/pull/76)
## 1.3.3.1
* Generalize signature of choose (Fixes [#72](https://github.com/snoyberg/xml/issues/72)) [#73](https://github.com/snoyberg/xml/pull/73)
## 1.3.3
* New render setting to control when to use CDATA [#68](https://github.com/snoyberg/xml/pull/68)
* Escaping CDATA closing tag in CDATA [#69](https://github.com/snoyberg/xml/pull/69)
## 1.3.2
* Support for iso-8859-1 [#63](https://github.com/snoyberg/xml/issues/63)
## 1.3.1
* Add functions to ignore subtrees & result-streaming (yield) parsers [#58](https://github.com/snoyberg/xml/pull/58)
## 1.3.0
* Drop system-filepath
## 1.2.6
* Reuse 'MonadThrow' and 'force' for 'AttrParser' [#52](https://github.com/snoyberg/xml/pull/52)
## 1.2.5
* Added helper functions to render XML elements [#48](https://github.com/snoyberg/xml/pull/48)
## 1.2.4
* 'parseText' becomes 'parseText'/'parseTextPos', depending on the output type [#47](https://github.com/snoyberg/xml/pull/47)
## 1.2.3.3
* Allow blaze-builder 0.4
## 1.2.3.2
* Doc fix [#44](https://github.com/snoyberg/xml/pull/44)
## 1.2.3.1
Support monad-control 1.0
xml-conduit-1.9.1.3/LICENSE 0000644 0000000 0000000 00000002066 07346545000 013330 0 ustar 00 0000000 0000000 Copyright 2010, Suite Solutions. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
xml-conduit-1.9.1.3/README.md 0000644 0000000 0000000 00000002222 07346545000 013574 0 ustar 00 0000000 0000000 ## xml-conduit
This package provides parsing and rendering functions for XML. It is based on the datatypes found in the xml-types package. This package is broken up into the following modules:
* Text.XML: DOM-based parsing and rendering. This is the most commonly used module.
* Text.XML.Cursor: A wrapper around `Text.XML` which allows bidirectional traversing of the DOM, similar to XPath. (Note: Text.XML.Cursor.Generic is the same concept, but will work with any node representation.)
* Text.XML.Unresolved: A slight modification to `Text.XML` which does not require all entities to be resolved at parsing. The datatypes are slightly more complicated here, and therefore this module is only recommended when you need to deal directly with raw entities.
* Text.XML.Stream.Parse: Streaming parser, including some streaming parser combinators.
* Text.XML.Stream.Render: Streaming renderer.
Additionally, the [xml-hamlet
package](http://www.stackage.org/package/xml-hamlet) provides a more convenient
syntax for creating XML documents. For a more thorough tutorial on this
library, please see
[http://www.yesodweb.com/book/xml](http://www.yesodweb.com/book/xml).
xml-conduit-1.9.1.3/Setup.hs 0000644 0000000 0000000 00000000207 07346545000 013752 0 ustar 00 0000000 0000000 module Main where
import Distribution.Extra.Doctest (defaultMainWithDoctests)
main :: IO ()
main = defaultMainWithDoctests "doctest"
xml-conduit-1.9.1.3/src/Text/ 0000755 0000000 0000000 00000000000 07346545000 014032 5 ustar 00 0000000 0000000 xml-conduit-1.9.1.3/src/Text/XML.hs 0000644 0000000 0000000 00000030106 07346545000 015026 0 ustar 00 0000000 0000000 {-# LANGUAGE CPP #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE PatternGuards #-}
{-# LANGUAGE RankNTypes #-}
-- | DOM-based parsing and rendering.
--
-- This module requires that all entities be resolved at parsing. If you need
-- to interact with unresolved entities, please use "Text.XML.Unresolved". This
-- is the recommended module for most uses cases.
--
-- While many of the datatypes in this module are simply re-exported from
-- @Data.XML.Types@, 'Document', 'Node' and 'Element' are all redefined here to
-- disallow the possibility of unresolved entities. Conversion functions are
-- provided to switch between the two sets of datatypes.
--
-- For simpler, bidirectional traversal of the DOM tree, see the
-- "Text.XML.Cursor" module.
module Text.XML
( -- * Data types
Document (..)
, Prologue (..)
, Instruction (..)
, Miscellaneous (..)
, Node (..)
, Element (..)
, Name (..)
, Doctype (..)
, ExternalID (..)
-- * Parsing
-- ** Files
, readFile
-- ** Bytes
, parseLBS
, parseLBS_
, sinkDoc
-- ** Text
, parseText
, parseText_
, sinkTextDoc
-- ** Other
, fromEvents
, UnresolvedEntityException (..)
, XMLException (..)
-- * Rendering
, writeFile
, renderLBS
, renderText
, renderBytes
-- * Settings
, def
-- ** Parsing
, ParseSettings
, psDecodeEntities
, P.psRetainNamespaces
-- *** Entity decoding
, P.decodeXmlEntities
, P.decodeHtmlEntities
-- ** Rendering
, R.RenderSettings
, R.rsPretty
, R.rsNamespaces
, R.rsAttrOrder
, R.rsUseCDATA
, R.rsXMLDeclaration
, R.orderAttrs
-- * Conversion
, toXMLDocument
, fromXMLDocument
, toXMLNode
, fromXMLNode
, toXMLElement
, fromXMLElement
) where
import Conduit
import Control.Applicative ((<$>))
import Control.DeepSeq (NFData (rnf))
import Control.Exception (Exception, SomeException, handle,
throw, throwIO)
import Control.Monad.Trans.Resource (MonadThrow, throwM)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy as L
import Data.Data (Data)
import Data.Either (partitionEithers)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T
import Data.Typeable (Typeable)
import Data.XML.Types (Doctype (..), ExternalID (..),
Instruction (..),
Miscellaneous (..), Name (..),
Prologue (..))
import qualified Data.XML.Types as X
import Prelude hiding (readFile, writeFile)
import Text.XML.Stream.Parse (ParseSettings, def,
psDecodeEntities)
import qualified Text.XML.Stream.Parse as P
import qualified Text.XML.Stream.Render as R
import qualified Text.XML.Unresolved as D
import Control.Monad.Trans.Class (lift)
import qualified Data.Conduit.Binary as CB
import Data.Conduit.Lazy (lazyConsume)
import qualified Data.Conduit.List as CL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import System.IO.Unsafe (unsafePerformIO)
import Control.Arrow (first)
import Data.List (foldl')
import Data.Monoid (mappend, mempty)
import Data.String (fromString)
import qualified Text.Blaze as B
import qualified Text.Blaze.Html as B
import qualified Text.Blaze.Html5 as B5
import qualified Text.Blaze.Internal as BI
data Document = Document
{ documentPrologue :: Prologue
, documentRoot :: Element
, documentEpilogue :: [Miscellaneous]
}
deriving (Show, Eq, Typeable, Data)
#if MIN_VERSION_containers(0, 4, 2)
instance NFData Document where
rnf (Document a b c) = rnf a `seq` rnf b `seq` rnf c `seq` ()
#endif
data Node
= NodeElement Element
| NodeInstruction Instruction
| NodeContent Text
| NodeComment Text
deriving (Show, Eq, Ord, Typeable, Data)
#if MIN_VERSION_containers(0, 4, 2)
instance NFData Node where
rnf (NodeElement e) = rnf e `seq` ()
rnf (NodeInstruction i) = rnf i `seq` ()
rnf (NodeContent t) = rnf t `seq` ()
rnf (NodeComment t) = rnf t `seq` ()
#endif
data Element = Element
{ elementName :: Name
, elementAttributes :: Map.Map Name Text
, elementNodes :: [Node]
}
deriving (Show, Eq, Ord, Typeable, Data)
#if MIN_VERSION_containers(0, 4, 2)
instance NFData Element where
rnf (Element a b c) = rnf a `seq` rnf b `seq` rnf c `seq` ()
#endif
{-
readFile :: FilePath -> ParseSettings -> IO (Either SomeException Document)
readFile_ :: FIlePath -> ParseSettings -> IO Document
-}
toXMLDocument :: Document -> X.Document
toXMLDocument = toXMLDocument' def
toXMLDocument' :: R.RenderSettings -> Document -> X.Document
toXMLDocument' rs (Document a b c) = X.Document a (toXMLElement' rs b) c
toXMLElement :: Element -> X.Element
toXMLElement = toXMLElement' def
toXMLElement' :: R.RenderSettings -> Element -> X.Element
toXMLElement' rs (Element name as nodes) =
X.Element name as' nodes'
where
as' = map (\(x, y) -> (x, [X.ContentText y])) $ R.rsAttrOrder rs name as
nodes' = map (toXMLNode' rs) nodes
toXMLNode :: Node -> X.Node
toXMLNode = toXMLNode' def
toXMLNode' :: R.RenderSettings -> Node -> X.Node
toXMLNode' rs (NodeElement e) = X.NodeElement $ toXMLElement' rs e
toXMLNode' _ (NodeContent t) = X.NodeContent $ X.ContentText t
toXMLNode' _ (NodeComment c) = X.NodeComment c
toXMLNode' _ (NodeInstruction i) = X.NodeInstruction i
fromXMLDocument :: X.Document -> Either (Set Text) Document
fromXMLDocument (X.Document a b c) =
case fromXMLElement b of
Left es -> Left es
Right b' -> Right $ Document a b' c
fromXMLElement :: X.Element -> Either (Set Text) Element
fromXMLElement (X.Element name as nodes) =
case (lnodes, las) of
([], []) -> Right $ Element name ras rnodes
(x, []) -> Left $ Set.unions x
([], y) -> Left $ Set.unions y
(x, y) -> Left $ Set.unions x `Set.union` Set.unions y
where
enodes = map fromXMLNode nodes
(lnodes, rnodes) = partitionEithers enodes
eas = map go as
(las, ras') = partitionEithers eas
ras = Map.fromList ras'
go (x, y) =
case go' [] id y of
Left es -> Left es
Right y' -> Right (x, y')
go' [] front [] = Right $ T.concat $ front []
go' errs _ [] = Left $ Set.fromList errs
go' errs front (X.ContentText t:ys) = go' errs (front . (:) t) ys
go' errs front (X.ContentEntity t:ys) = go' (t : errs) front ys
fromXMLNode :: X.Node -> Either (Set Text) Node
fromXMLNode (X.NodeElement e) = NodeElement <$> fromXMLElement e
fromXMLNode (X.NodeContent (X.ContentText t)) = Right $ NodeContent t
fromXMLNode (X.NodeContent (X.ContentEntity t)) = Left $ Set.singleton t
fromXMLNode (X.NodeComment c) = Right $ NodeComment c
fromXMLNode (X.NodeInstruction i) = Right $ NodeInstruction i
readFile :: ParseSettings -> FilePath -> IO Document
readFile ps fp = handle
(throwIO . InvalidXMLFile fp)
(runConduitRes $ CB.sourceFile fp .| sinkDoc ps)
data XMLException = InvalidXMLFile FilePath SomeException
deriving Typeable
instance Show XMLException where
show (InvalidXMLFile fp e) = concat
[ "Error parsing XML file "
, fp
, ": "
, show e
]
instance Exception XMLException
parseLBS :: ParseSettings -> L.ByteString -> Either SomeException Document
parseLBS ps lbs
= runConduit
$ CL.sourceList (L.toChunks lbs)
.| sinkDoc ps
parseLBS_ :: ParseSettings -> L.ByteString -> Document
parseLBS_ ps = either throw id . parseLBS ps
sinkDoc :: MonadThrow m
=> ParseSettings
-> ConduitT ByteString o m Document
sinkDoc ps = P.parseBytesPos ps .| fromEvents
parseText :: ParseSettings -> TL.Text -> Either SomeException Document
parseText ps tl
= runConduit
$ CL.sourceList (TL.toChunks tl)
.| sinkTextDoc ps
parseText_ :: ParseSettings -> TL.Text -> Document
parseText_ ps = either throw id . parseText ps
sinkTextDoc :: MonadThrow m
=> ParseSettings
-> ConduitT Text o m Document
sinkTextDoc ps = P.parseTextPos ps .| fromEvents
fromEvents :: MonadThrow m => ConduitT P.EventPos o m Document
fromEvents = do
d <- D.fromEvents
either (lift . throwM . UnresolvedEntityException) return $ fromXMLDocument d
data UnresolvedEntityException = UnresolvedEntityException (Set Text)
deriving (Show, Typeable)
instance Exception UnresolvedEntityException
renderBytes :: PrimMonad m => D.RenderSettings -> Document -> ConduitT i ByteString m ()
renderBytes rs doc = D.renderBytes rs $ toXMLDocument' rs doc
writeFile :: R.RenderSettings -> FilePath -> Document -> IO ()
writeFile rs fp doc =
runConduitRes $ renderBytes rs doc .| CB.sinkFile fp
renderLBS :: R.RenderSettings -> Document -> L.ByteString
renderLBS rs doc =
L.fromChunks $ unsafePerformIO
-- not generally safe, but we know that runResourceT
-- will not deallocate any of the resources being used
-- by the process
$ lazyConsume
$ renderBytes rs doc
renderText :: R.RenderSettings -> Document -> TL.Text
renderText rs = TLE.decodeUtf8 . renderLBS rs
instance B.ToMarkup Document where
toMarkup (Document _ root _) = B5.docType >> B.toMarkup root
-- | Note that the special element name
-- @{http://www.snoyman.com/xml2html}ie-cond@ with the single attribute @cond@
-- is used to indicate an IE conditional comment.
instance B.ToMarkup Element where
toMarkup (Element "{http://www.snoyman.com/xml2html}ie-cond" attrs children)
| [("cond", cond)] <- Map.toList attrs =
B.preEscapedToMarkup ("" :: T.Text)
toMarkup (Element name' attrs children) =
if isVoid
then foldl' (B.!) leaf attrs'
else foldl' (B.!) parent attrs' childrenHtml
where
childrenHtml :: B.Html
childrenHtml =
case (name `elem` ["style", "script"], children) of
(True, [NodeContent t]) -> B.preEscapedToMarkup t
_ -> mapM_ B.toMarkup children
isVoid = nameLocalName name' `Set.member` voidElems
parent :: B.Html -> B.Html
parent = BI.Parent tag open close
leaf :: B.Html
#if MIN_VERSION_blaze_markup(0,8,0)
leaf = BI.Leaf tag open (fromString " />") ()
#else
leaf = BI.Leaf tag open (fromString " />")
#endif
name = T.unpack $ nameLocalName name'
tag = fromString name
open = fromString $ '<' : name
close = fromString $ concat ["", name, ">"]
attrs' :: [B.Attribute]
attrs' = map (goAttr . first nameLocalName) $ Map.toList attrs
goAttr (key, value) = B.customAttribute (B.textTag key) $ B.toValue value
instance B.ToMarkup Node where
toMarkup (NodeElement e) = B.toMarkup e
toMarkup (NodeContent t) = B.toMarkup t
toMarkup _ = mempty
voidElems :: Set.Set T.Text
voidElems = Set.fromAscList $ T.words $ T.pack "area base br col command embed hr img input keygen link meta param source track wbr"
xml-conduit-1.9.1.3/src/Text/XML/ 0000755 0000000 0000000 00000000000 07346545000 014472 5 ustar 00 0000000 0000000 xml-conduit-1.9.1.3/src/Text/XML/Cursor.hs 0000644 0000000 0000000 00000021065 07346545000 016307 0 ustar 00 0000000 0000000 -- | This module provides for simple DOM traversal. It is inspired by XPath. There are two central concepts here:
--
-- * A 'Cursor' represents a node in the DOM. It also contains information on the node's /location/. While the 'Node' datatype will only know of its children, a @Cursor@ knows about its parent and siblings as well. (The underlying mechanism allowing this is called a zipper, see and .)
--
-- * An 'Axis', in its simplest form, takes a @Cursor@ and returns a list of @Cursor@s. It is used for selections, such as finding children, ancestors, etc. Axes can be chained together to express complex rules, such as all children named /foo/.
--
-- The terminology used in this module is taken directly from the XPath
-- specification: . For those familiar with XPath,
-- the one major difference is that attributes are not considered nodes in this
-- module.
module Text.XML.Cursor
(
-- * Data types
Cursor
, Axis
-- * Production
, fromDocument
, fromNode
, cut
-- * Axes
, parent
, CG.precedingSibling
, CG.followingSibling
, child
, node
, CG.preceding
, CG.following
, CG.ancestor
, descendant
, orSelf
-- ** Filters
, check
, checkNode
, checkElement
, checkName
, anyElement
, element
, laxElement
, content
, attribute
, laxAttribute
, hasAttribute
, attributeIs
-- * Operators
, (CG.&|)
, (CG.&/)
, (CG.&//)
, (CG.&.//)
, (CG.$|)
, (CG.$/)
, (CG.$//)
, (CG.$.//)
, (CG.>=>)
-- * Type classes
, Boolean(..)
-- * Error handling
, force
, forceM
) where
import Control.Exception (Exception)
import Control.Monad
import Control.Monad.Trans.Resource (MonadThrow, throwM)
import Data.Function (on)
import qualified Data.Map as Map
import Data.Maybe (maybeToList)
import qualified Data.Text as T
import Text.XML
import Text.XML.Cursor.Generic (child, descendant, node, orSelf,
parent)
import qualified Text.XML.Cursor.Generic as CG
-- TODO: Consider [Cursor] -> [Cursor]?
-- | The type of an Axis that returns a list of Cursors.
-- They are roughly modeled after .
--
-- Axes can be composed with '>=>', where e.g. @f >=> g@ means that on all results of
-- the @f@ axis, the @g@ axis will be applied, and all results joined together.
-- Because Axis is just a type synonym for @Cursor -> [Cursor]@, it is possible to use
-- other standard functions like '>>=' or 'concatMap' similarly.
--
-- The operators '&|', '&/', '&//' and '&.//' can be used to combine axes so that the second
-- axis works on the context nodes, children, descendants, respectively the context node as
-- well as its descendants of the results of the first axis.
--
-- The operators '$|', '$/', '$//' and '$.//' can be used to apply an axis (right-hand side)
-- to a cursor so that it is applied on the cursor itself, its children, its descendants,
-- respectively itself and its descendants.
--
-- Note that many of these operators also work on /generalised Axes/ that can return
-- lists of something other than Cursors, for example Content elements.
type Axis = Cursor -> [Cursor]
-- XPath axes as in http://www.w3.org/TR/xpath/#axes
-- TODO: Decide whether to use an existing package for this
-- | Something that can be used in a predicate check as a boolean.
class Boolean a where
bool :: a -> Bool
instance Boolean Bool where
bool = id
instance Boolean [a] where
bool = not . null
instance Boolean (Maybe a) where
bool (Just _) = True
bool _ = False
instance Boolean (Either a b) where
bool (Left _) = False
bool (Right _) = True
-- | A cursor: contains an XML 'Node' and pointers to its children, ancestors and siblings.
type Cursor = CG.Cursor Node
-- | Cut a cursor off from its parent. The idea is to allow restricting the scope of queries on it.
cut :: Cursor -> Cursor
cut = fromNode . CG.node
-- | Convert a 'Document' to a 'Cursor'. It will point to the document root.
fromDocument :: Document -> Cursor
fromDocument = fromNode . NodeElement . documentRoot
-- | Convert a 'Node' to a 'Cursor' (without parents).
fromNode :: Node -> Cursor
fromNode =
CG.toCursor cs
where
cs (NodeElement (Element _ _ x)) = x
cs _ = []
-- | Filter cursors that don't pass a check.
check :: Boolean b => (Cursor -> b) -> Axis
check f c = [c | bool $ f c]
-- | Filter nodes that don't pass a check.
checkNode :: Boolean b => (Node -> b) -> Axis
checkNode f = check (f . node)
-- | Filter elements that don't pass a check, and remove all non-elements.
checkElement :: Boolean b => (Element -> b) -> Axis
checkElement f c = case node c of
NodeElement e -> [c | bool $ f e]
_ -> []
-- | Filter elements that don't pass a name check, and remove all non-elements.
checkName :: Boolean b => (Name -> b) -> Axis
checkName f = checkElement (f . elementName)
-- | Remove all non-elements. Compare roughly to XPath:
-- /A node test * is true for any node of the principal node type. For example, child::* will select all element children of the context node [...]/.
anyElement :: Axis
anyElement = checkElement (const True)
-- | Select only those elements with a matching tag name. XPath:
-- /A node test that is a QName is true if and only if the type of the node (see [5 Data Model]) is the principal node type and has an expanded-name equal to the expanded-name specified by the QName./
element :: Name -> Axis
element n = checkName (== n)
-- | Select only those elements with a loosely matching tag name. Namespace and case are ignored. XPath:
-- /A node test that is a QName is true if and only if the type of the node (see [5 Data Model]) is the principal node type and has an expanded-name equal to the expanded-name specified by the QName./
laxElement :: T.Text -> Axis
laxElement n = checkName (on (==) T.toCaseFold n . nameLocalName)
-- | Select only text nodes, and directly give the 'Content' values. XPath:
-- /The node test text() is true for any text node./
--
-- Note that this is not strictly an 'Axis', but will work with most combinators.
content :: Cursor -> [T.Text]
content c = case node c of
(NodeContent v) -> [v]
_ -> []
-- | Select attributes on the current element (or nothing if it is not an element). XPath:
-- /the attribute axis contains the attributes of the context node; the axis will be empty unless the context node is an element/
--
-- Note that this is not strictly an 'Axis', but will work with most combinators.
--
-- The return list of the generalised axis contains as elements lists of 'Content'
-- elements, each full list representing an attribute value.
attribute :: Name -> Cursor -> [T.Text]
attribute n c =
case node c of
NodeElement e -> maybeToList $ Map.lookup n $ elementAttributes e
_ -> []
-- | Select attributes on the current element (or nothing if it is not an element). Namespace and case are ignored. XPath:
-- /the attribute axis contains the attributes of the context node; the axis will be empty unless the context node is an element/
--
-- Note that this is not strictly an 'Axis', but will work with most combinators.
--
-- The return list of the generalised axis contains as elements lists of 'Content'
-- elements, each full list representing an attribute value.
laxAttribute :: T.Text -> Cursor -> [T.Text]
laxAttribute n c =
case node c of
NodeElement e -> do
(n', v) <- Map.toList $ elementAttributes e
guard $ (on (==) T.toCaseFold) n (nameLocalName n')
return v
_ -> []
-- | Select only those element nodes with the given attribute.
hasAttribute :: Name -> Axis
hasAttribute n c =
case node c of
NodeElement (Element _ as _) -> maybe [] (const [c]) $ Map.lookup n as
_ -> []
-- | Select only those element nodes containing the given attribute key/value pair.
attributeIs :: Name -> T.Text -> Axis
attributeIs n v c =
case node c of
NodeElement (Element _ as _) -> [ c | Just v == Map.lookup n as]
_ -> []
force :: (Exception e, MonadThrow f) => e -> [a] -> f a
force e [] = throwM e
force _ (x:_) = return x
forceM :: (Exception e, MonadThrow f) => e -> [f a] -> f a
forceM e [] = throwM e
forceM _ (x:_) = x
xml-conduit-1.9.1.3/src/Text/XML/Cursor/ 0000755 0000000 0000000 00000000000 07346545000 015747 5 ustar 00 0000000 0000000 xml-conduit-1.9.1.3/src/Text/XML/Cursor/Generic.hs 0000644 0000000 0000000 00000012544 07346545000 017665 0 ustar 00 0000000 0000000 -- | Generalized cursors to be applied to different nodes.
module Text.XML.Cursor.Generic
( -- * Core
Cursor
, Axis
, toCursor
, node
-- * Axes
, child
, parent
, precedingSibling
, followingSibling
, ancestor
, descendant
, orSelf
, preceding
, following
-- * Operators
, (&|)
, (&/)
, (&//)
, (&.//)
, ($|)
, ($/)
, ($//)
, ($.//)
, (>=>)
) where
import Data.Maybe (maybeToList)
import Data.List (foldl')
import Control.Monad ((>=>))
type DiffCursor node = [Cursor node] -> [Cursor node]
type Axis node = Cursor node -> [Cursor node]
-- | A cursor: contains an XML 'Node' and pointers to its children, ancestors and siblings.
data Cursor node = Cursor
{ parent' :: Maybe (Cursor node)
, precedingSibling' :: DiffCursor node
, followingSibling' :: DiffCursor node
-- | The child axis. XPath:
-- /the child axis contains the children of the context node/.
, child :: [Cursor node]
-- | The current node.
, node :: node
}
instance Show node => Show (Cursor node) where
show Cursor { node = n } = "Cursor @ " ++ show n
toCursor :: (node -> [node]) -- ^ get children
-> node
-> Cursor node
toCursor cs = toCursor' cs Nothing id id
toCursor' :: (node -> [node])
-> Maybe (Cursor node) -> DiffCursor node -> DiffCursor node -> node -> Cursor node
toCursor' cs par pre fol n =
me
where
me = Cursor par pre fol chi n
chi' = cs n
chi = go id chi' []
go _ [] = id
go pre' (n':ns') =
(:) me' . fol'
where
me' = toCursor' cs (Just me) pre' fol' n'
fol' = go (pre' . (:) me') ns'
-- | The parent axis. As described in XPath:
-- /the parent axis contains the parent of the context node, if there is one/.
--
-- Every node but the root element of the document has a parent. Parent nodes
-- will always be 'NodeElement's.
parent :: Axis node
parent = maybeToList . parent'
-- | The preceding-sibling axis. XPath:
-- /the preceding-sibling axis contains all the preceding siblings of the context node [...]/.
precedingSibling :: Axis node
precedingSibling = ($ []) . precedingSibling'
-- | The following-sibling axis. XPath:
-- /the following-sibling axis contains all the following siblings of the context node [...]/.
followingSibling :: Axis node
followingSibling = ($ []) . followingSibling'
-- | The preceding axis. XPath:
-- /the preceding axis contains all nodes in the same document as the context node that are before the context node in document order, excluding any ancestors and excluding attribute nodes and namespace nodes/.
preceding :: Axis node
preceding c =
go (precedingSibling' c []) (parent c >>= preceding)
where
go x y = foldl' (flip go') y x
go' x rest = foldl' (flip go') (x : rest) (child x)
-- | The following axis. XPath:
-- /the following axis contains all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes/.
following :: Axis node
following c =
go (followingSibling' c) (parent c >>= following)
where
go x z = foldr go' z (x [])
go' x rest = x : foldr go' rest (child x)
-- | The ancestor axis. XPath:
-- /the ancestor axis contains the ancestors of the context node; the ancestors of the context node consist of the parent of context node and the parent's parent and so on; thus, the ancestor axis will always include the root node, unless the context node is the root node/.
ancestor :: Axis node
ancestor = parent >=> (\p -> p : ancestor p)
-- | The descendant axis. XPath:
-- /the descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes/.
descendant :: Axis node
descendant = child >=> (\c -> c : descendant c)
-- | Modify an axis by adding the context node itself as the first element of the result list.
orSelf :: Axis node -> Axis node
orSelf ax c = c : ax c
infixr 1 &|
infixr 1 &/
infixr 1 &//
infixr 1 &.//
infixr 1 $|
infixr 1 $/
infixr 1 $//
infixr 1 $.//
-- | Apply a function to the result of an axis.
(&|) :: (Cursor node -> [a]) -> (a -> b) -> (Cursor node -> [b])
f &| g = map g . f
-- | Combine two axes so that the second works on the children of the results
-- of the first.
(&/) :: Axis node -> (Cursor node -> [a]) -> (Cursor node -> [a])
f &/ g = f >=> child >=> g
-- | Combine two axes so that the second works on the descendants of the results
-- of the first.
(&//) :: Axis node -> (Cursor node -> [a]) -> (Cursor node -> [a])
f &// g = f >=> descendant >=> g
-- | Combine two axes so that the second works on both the result nodes, and their
-- descendants.
(&.//) :: Axis node -> (Cursor node -> [a]) -> (Cursor node -> [a])
f &.// g = f >=> orSelf descendant >=> g
-- | Apply an axis to a 'Cursor node'.
($|) :: Cursor node -> (Cursor node -> a) -> a
v $| f = f v
-- | Apply an axis to the children of a 'Cursor node'.
($/) :: Cursor node -> (Cursor node -> [a]) -> [a]
v $/ f = child v >>= f
-- | Apply an axis to the descendants of a 'Cursor node'.
($//) :: Cursor node -> (Cursor node -> [a]) -> [a]
v $// f = descendant v >>= f
-- | Apply an axis to a 'Cursor node' as well as its descendants.
($.//) :: Cursor node -> (Cursor node -> [a]) -> [a]
v $.// f = orSelf descendant v >>= f
xml-conduit-1.9.1.3/src/Text/XML/Stream/ 0000755 0000000 0000000 00000000000 07346545000 015725 5 ustar 00 0000000 0000000 xml-conduit-1.9.1.3/src/Text/XML/Stream/Parse.hs 0000644 0000000 0000000 00000167061 07346545000 017346 0 ustar 00 0000000 0000000 {-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE PatternGuards #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeFamilies #-}
-- | This module provides both a native Haskell solution for parsing XML
-- documents into a stream of events, and a set of parser combinators for
-- dealing with a stream of events.
--
-- As a simple example:
--
-- >>> :set -XOverloadedStrings
-- >>> import Conduit (runConduit, (.|))
-- >>> import Data.Text (Text, unpack)
-- >>> import Data.XML.Types (Event)
-- >>> data Person = Person Int Text Text deriving Show
-- >>> :{
-- let parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
-- parsePerson = tag' "person" parseAttributes $ \(age, goodAtHaskell) -> do
-- name <- content
-- return $ Person (read $ unpack age) name goodAtHaskell
-- where parseAttributes = (,) <$> requireAttr "age" <*> requireAttr "goodAtHaskell" <* ignoreAttrs
-- parsePeople :: MonadThrow m => ConduitT Event o m (Maybe [Person])
-- parsePeople = tagNoAttr "people" $ many parsePerson
-- inputXml = mconcat
-- [ ""
-- , ""
-- , " Michael"
-- , " Eliezer"
-- , ""
-- ]
-- :}
--
-- >>> runConduit $ parseLBS def inputXml .| force "people required" parsePeople
-- [Person 25 "Michael" "yes",Person 2 "Eliezer" "might become"]
--
--
-- This module also supports streaming results using 'yield'.
-- This allows parser results to be processed using conduits
-- while a particular parser (e.g. 'many') is still running.
-- Without using streaming results, you have to wait until the parser finished
-- before you can process the result list. Large XML files might be easier
-- to process by using streaming results.
-- See http://stackoverflow.com/q/21367423/2597135 for a related discussion.
--
-- >>> import Data.Conduit.List as CL
-- >>> :{
-- let parsePeople' :: MonadThrow m => ConduitT Event Person m (Maybe ())
-- parsePeople' = tagNoAttr "people" $ manyYield parsePerson
-- :}
--
-- >>> runConduit $ parseLBS def inputXml .| force "people required" parsePeople' .| CL.mapM_ print
-- Person 25 "Michael" "yes"
-- Person 2 "Eliezer" "might become"
--
-- Previous versions of this module contained a number of more sophisticated
-- functions written by Aristid Breitkreuz and Dmitry Olshansky. To keep this
-- package simpler, those functions are being moved to a separate package. This
-- note will be updated with the name of the package(s) when available.
module Text.XML.Stream.Parse
( -- * Parsing XML files
parseBytes
, parseBytesPos
, parseText
, parseTextPos
, detectUtf
, parseFile
, parseLBS
-- ** Parser settings
, ParseSettings
, def
, DecodeEntities
, DecodeIllegalCharacters
, psDecodeEntities
, psDecodeIllegalCharacters
, psRetainNamespaces
, psEntityExpansionSizeLimit
, psIgnoreInternalEntityDeclarations
-- *** Entity decoding
, decodeXmlEntities
, decodeHtmlEntities
-- * Event parsing
, tag
, tag'
, tagNoAttr
, tagIgnoreAttrs
, content
, contentMaybe
-- * Ignoring tags/trees
, ignoreEmptyTag
, ignoreTree
, ignoreContent
, ignoreTreeContent
, ignoreAnyTreeContent
-- * Streaming events
, takeContent
, takeTree
, takeTreeContent
, takeAnyTreeContent
-- * Tag name matching
, NameMatcher(..)
, matching
, anyOf
, anyName
-- * Attribute parsing
, AttrParser
, attr
, requireAttr
, optionalAttr
, requireAttrRaw
, optionalAttrRaw
, ignoreAttrs
-- * Combinators
, orE
, choose
, many
, many_
, manyIgnore
, many'
, force
-- * Streaming combinators
, manyYield
, manyYield'
, manyIgnoreYield
-- * Exceptions
, XmlException (..)
-- * Other types
, PositionRange
, EventPos
) where
import Conduit
import Control.Applicative (Alternative (empty, (<|>)),
Applicative (..), (<$>))
import qualified Control.Applicative as A
import Control.Arrow ((***))
import Control.Exception (Exception (..), SomeException)
import Control.Monad (ap, liftM, void)
import Control.Monad.Fix (fix)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.Resource (MonadResource, MonadThrow (..),
throwM)
import Data.Attoparsec.Internal (concatReverse)
import Data.Attoparsec.Text (Parser, anyChar, char, manyTill,
skipWhile, string, takeWhile,
takeWhile1, (>),
notInClass, skipMany, skipMany1,
satisfy, peekChar)
import qualified Data.Attoparsec.Text as AT
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Builder as Builder
import Data.Char (isSpace)
import Data.Conduit.Attoparsec (PositionRange, conduitParser)
import qualified Data.Conduit.Text as CT
import Data.Default.Class (Default (..))
import Data.List (foldl', intercalate)
import qualified Data.Map as Map
import Data.Maybe (fromMaybe, isNothing, mapMaybe)
import Data.String (IsString (..))
import Data.Text (Text, pack)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import Data.Typeable (Typeable)
import Data.XML.Types (Content (..), Event (..),
ExternalID (..),
Instruction (..), Name (..))
import Prelude hiding (takeWhile)
import Text.XML.Stream.Token
-- $setup
-- >>> :set -XOverloadedStrings
-- >>> import Conduit
-- >>> import Control.Monad (void, join)
type EntityTable = [(Text, Text)]
tokenToEvent :: ParseSettings -> EntityTable -> [NSLevel] -> Token -> (EntityTable, [NSLevel], [Event])
tokenToEvent _ es n (TokenXMLDeclaration _) = (es, n, [])
tokenToEvent _ es n (TokenInstruction i) = (es, n, [EventInstruction i])
tokenToEvent ps es n (TokenBeginElement name as isClosed _) =
(es, n', if isClosed then [begin, end] else [begin])
where
l0 = case n of
[] -> NSLevel Nothing Map.empty
x:_ -> x
(as', l') = foldl' go (id, l0) as
go (front, l) (TName kpref kname, val) =
(addNS front, l'')
where
isPrefixed = kpref == Just "xmlns"
isUnprefixed = isNothing kpref && kname == "xmlns"
addNS
| not (psRetainNamespaces ps) && (isPrefixed || isUnprefixed) = id
| otherwise = (. ((tname, resolveEntities' ps es val):))
where
resolveEntities' ps' es' xs =
mapMaybe extractTokenContent
(resolveEntities ps' es'
(map TokenContent xs))
extractTokenContent (TokenContent c) = Just c
extractTokenContent _ = Nothing
tname
| isPrefixed = TName Nothing ("xmlns:" `T.append` kname)
| otherwise = TName kpref kname
l''
| isPrefixed =
l { prefixes = Map.insert kname (contentsToText val)
$ prefixes l }
| isUnprefixed =
l { defaultNS = if T.null $ contentsToText val
then Nothing
else Just $ contentsToText val }
| otherwise = l
n' = if isClosed then n else l' : n
fixAttName (name', val) = (tnameToName True l' name', val)
elementName = tnameToName False l' name
begin = EventBeginElement elementName $ map fixAttName $ as' []
end = EventEndElement elementName
tokenToEvent _ es n (TokenEndElement name) =
(es, n', [EventEndElement $ tnameToName False l name])
where
(l, n') =
case n of
[] -> (NSLevel Nothing Map.empty, [])
x:xs -> (x, xs)
tokenToEvent ps es n tok@(TokenContent c@(ContentEntity e))
= case lookup e es of
Just _ -> (es, n, concatMap toEvents newtoks)
Nothing -> (es, n, [EventContent c])
where
toEvents t =
let (_, _, events) = tokenToEvent ps [] n t
in events
newtoks = resolveEntities ps es [tok]
tokenToEvent _ es n (TokenContent c) = (es, n, [EventContent c])
tokenToEvent _ es n (TokenComment c) = (es, n, [EventComment c])
tokenToEvent _ es n (TokenDoctype t eid es') = (es ++ es', n, [EventBeginDoctype t eid, EventEndDoctype])
tokenToEvent _ es n (TokenCDATA t) = (es, n, [EventCDATA t])
resolveEntities :: ParseSettings
-> EntityTable
-> [Token]
-> [Token]
resolveEntities ps entities = foldr go []
where
go tok@(TokenContent (ContentEntity e)) toks
= case expandEntity entities e of
Just xs -> foldr go toks xs
Nothing -> tok : toks
go tok toks = tok : toks
expandEntity es e
| Just t <- lookup e es =
case AT.parseOnly (manyTill
(parseToken ps :: Parser Token)
AT.endOfInput) t of
Left _ -> Nothing
Right xs -> -- recursively expand
let es' = filter (\(x,_) -> x /= e) es
in fst <$> foldr (goent es') (Just ([], 0)) xs
-- we delete e from the entity map in resolving its contents,
-- to avoid infinite loops in recursive expansion.
| otherwise = Nothing
goent _ _ Nothing = Nothing
goent es (TokenContent (ContentEntity e)) (Just (cs, size))
= expandEntity es e >>= foldr (goent es) (Just (cs, size))
goent _ tok (Just (toks, size)) =
let toksize = fromIntegral $
L.length (Builder.toLazyByteString (tokenToBuilder tok))
in case size + toksize of
n | n > psEntityExpansionSizeLimit ps -> Nothing
| otherwise -> Just (tok:toks, n)
tnameToName :: Bool -> NSLevel -> TName -> Name
tnameToName _ _ (TName (Just "xml") name) =
Name name (Just "http://www.w3.org/XML/1998/namespace") (Just "xml")
tnameToName isAttr (NSLevel def' _) (TName Nothing name) =
Name name (if isAttr then Nothing else def') Nothing
tnameToName _ (NSLevel _ m) (TName (Just pref) name) =
case Map.lookup pref m of
Just ns -> Name name (Just ns) (Just pref)
Nothing -> Name name Nothing (Just pref) -- FIXME is this correct?
-- | Automatically determine which UTF variant is being used. This function
-- first checks for BOMs, removing them as necessary, and then check for the
-- equivalent of ConduitT S.ByteString T.Text m ()
detectUtf =
conduit id
where
conduit front = await >>= maybe (return ()) (push front)
push front bss =
either conduit
(uncurry checkXMLDecl)
(getEncoding front bss)
getEncoding front bs'
| S.length bs < 4 =
Left (bs `S.append`)
| otherwise =
Right (bsOut, mcodec)
where
bs = front bs'
bsOut = S.append (S.drop toDrop x) y
(x, y) = S.splitAt 4 bs
(toDrop, mcodec) =
case S.unpack x of
[0x00, 0x00, 0xFE, 0xFF] -> (4, Just CT.utf32_be)
[0xFF, 0xFE, 0x00, 0x00] -> (4, Just CT.utf32_le)
0xFE : 0xFF: _ -> (2, Just CT.utf16_be)
0xFF : 0xFE: _ -> (2, Just CT.utf16_le)
0xEF : 0xBB: 0xBF : _ -> (3, Just CT.utf8)
[0x00, 0x00, 0x00, 0x3C] -> (0, Just CT.utf32_be)
[0x3C, 0x00, 0x00, 0x00] -> (0, Just CT.utf32_le)
[0x00, 0x3C, 0x00, 0x3F] -> (0, Just CT.utf16_be)
[0x3C, 0x00, 0x3F, 0x00] -> (0, Just CT.utf16_le)
_ -> (0, Nothing) -- Assuming UTF-8
checkXMLDecl :: MonadThrow m
=> S.ByteString
-> Maybe CT.Codec
-> ConduitT S.ByteString T.Text m ()
checkXMLDecl bs (Just codec) = leftover bs >> CT.decode codec
checkXMLDecl bs0 Nothing =
loop [] (AT.parse (parseToken def)) bs0
where
loop chunks0 parser nextChunk =
case parser $ decodeUtf8With lenientDecode nextChunk of
AT.Fail{} -> fallback
AT.Partial f -> await >>= maybe fallback (loop chunks f)
AT.Done _ (TokenXMLDeclaration attrs) -> findEncoding attrs
AT.Done{} -> fallback
where
chunks = nextChunk : chunks0
fallback = complete CT.utf8
complete codec = mapM_ leftover chunks >> CT.decode codec
findEncoding [] = fallback
findEncoding ((TName _ "encoding", [ContentText enc]):_) =
case T.toLower enc of
"iso-8859-1" -> complete CT.iso8859_1
"utf-8" -> complete CT.utf8
_ -> complete CT.utf8
findEncoding (_:xs) = findEncoding xs
type EventPos = (Maybe PositionRange, Event)
-- | Parses a byte stream into 'Event's. This function is implemented fully in
-- Haskell using attoparsec-text for parsing. The produced error messages do
-- not give line/column information, so you may prefer to stick with the parser
-- provided by libxml-enumerator. However, this has the advantage of not
-- relying on any C libraries.
--
-- This relies on 'detectUtf' to determine character encoding, and 'parseText'
-- to do the actual parsing.
parseBytes :: MonadThrow m
=> ParseSettings
-> ConduitT S.ByteString Event m ()
parseBytes = mapOutput snd . parseBytesPos
parseBytesPos :: MonadThrow m
=> ParseSettings
-> ConduitT S.ByteString EventPos m ()
parseBytesPos ps = detectUtf .| parseTextPos ps
dropBOM :: Monad m => ConduitT T.Text T.Text m ()
dropBOM =
await >>= maybe (return ()) push
where
push t =
case T.uncons t of
Nothing -> dropBOM
Just (c, cs) ->
let output
| c == '\xfeef' = cs
| otherwise = t
in yield output >> idConduit
idConduit = await >>= maybe (return ()) (\x -> yield x >> idConduit)
-- | Parses a character stream into 'Event's. This function is implemented
-- fully in Haskell using attoparsec-text for parsing. The produced error
-- messages do not give line/column information, so you may prefer to stick
-- with the parser provided by libxml-enumerator. However, this has the
-- advantage of not relying on any C libraries.
--
-- Since 1.2.4
parseText :: MonadThrow m => ParseSettings -> ConduitT T.Text Event m ()
parseText = mapOutput snd . parseTextPos
-- | Same as 'parseText', but includes the position of each event.
--
-- Since 1.2.4
parseTextPos :: MonadThrow m
=> ParseSettings
-> ConduitT T.Text EventPos m ()
parseTextPos de =
dropBOM
.| tokenize
.| toEventC de
.| addBeginEnd
where
tokenize = conduitToken de
addBeginEnd = yield (Nothing, EventBeginDocument) >> addEnd
addEnd = await >>= maybe
(yield (Nothing, EventEndDocument))
(\e -> yield e >> addEnd)
toEventC :: Monad m => ParseSettings -> ConduitT (PositionRange, Token) EventPos m ()
toEventC ps =
go [] []
where
go !es !levels =
await >>= maybe (return ()) push
where
push (position, token) =
mapM_ (yield . (,) (Just position)) events >> go es' levels'
where
(es', levels', events) = tokenToEvent ps es levels token
type DecodeEntities = Text -> Content
type DecodeIllegalCharacters = Int -> Maybe Char
data ParseSettings = ParseSettings
{ psDecodeEntities :: DecodeEntities
, psRetainNamespaces :: Bool
-- ^ Whether the original xmlns attributes should be retained in the parsed
-- values. For more information on motivation, see:
--
--
--
-- Default: False
--
-- Since 1.2.1
, psDecodeIllegalCharacters :: DecodeIllegalCharacters
-- ^ How to decode illegal character references (@[0-9]+;@ or @[0-9a-fA-F]+;@).
--
-- Character references within the legal ranges defined by are automatically parsed.
-- Others are passed to this function.
--
-- Default: @const Nothing@
--
-- Since 1.7.1
, psEntityExpansionSizeLimit :: Int
-- ^ Maximum number of characters allowed in expanding an
-- internal entity. This is intended to protect against the
-- billion laughs attack.
--
-- Default: @8192@
--
-- Since 1.9.1
, psIgnoreInternalEntityDeclarations :: Bool
-- ^ Whether to resolve any but the predefined entities.
--
-- Default: @False@
}
instance Default ParseSettings where
def = ParseSettings
{ psDecodeEntities = decodeXmlEntities
, psRetainNamespaces = False
, psDecodeIllegalCharacters = const Nothing
, psEntityExpansionSizeLimit = 8192
, psIgnoreInternalEntityDeclarations = False
}
conduitToken :: MonadThrow m => ParseSettings -> ConduitT T.Text (PositionRange, Token) m ()
conduitToken = conduitParser . parseToken
parseToken :: ParseSettings -> Parser Token
parseToken settings = do
mbc <- peekChar
case mbc of
Just '<' -> char '<' >> parseLt
_ -> TokenContent <$> parseContent settings False False
where
parseLt = do
mbc <- peekChar
case mbc of
Just '?' -> char' '?' >> parseInstr
Just '!' -> char' '!' >>
(parseComment <|> parseCdata <|> parseDoctype)
Just '/' -> char' '/' >> parseEnd
_ -> parseBegin
parseInstr = (do
name <- parseIdent
if name == "xml"
then do
as <- A.many $ parseAttribute settings
skipSpace
char' '?'
char' '>'
newline <|> return ()
return $ TokenXMLDeclaration as
else do
skipSpace
x <- T.pack <$> manyTill anyChar (string "?>")
return $ TokenInstruction $ Instruction name x)
> "instruction"
parseComment = (do
char' '-'
char' '-'
c <- T.pack <$> manyTill anyChar (string "-->")
return $ TokenComment c) > "comment"
parseCdata = (do
_ <- string "[CDATA["
t <- T.pack <$> manyTill anyChar (string "]]>")
return $ TokenCDATA t) > "CDATA"
parseDoctype = (do
_ <- string "DOCTYPE"
skipSpace
name <- parseName
let i =
case name of
TName Nothing x -> x
TName (Just x) y -> T.concat [x, ":", y]
skipSpace
eid <- fmap Just parsePublicID <|>
fmap Just parseSystemID <|>
return Nothing
skipSpace
mbc <- peekChar
ents <- case mbc of
Just '[' ->
do char' '['
ents <- parseDeclarations id
skipSpace
if psIgnoreInternalEntityDeclarations settings then
return []
else
return ents
_ -> return []
char' '>'
newline <|> return ()
return $ TokenDoctype i eid ents) > "DOCTYPE"
parseDeclarations front = -- we ignore everything but ENTITY
(char' ']' >> return (front [])) <|>
(parseEntity >>= \f -> parseDeclarations (front . f)) <|>
(string "") >>
parseDeclarations front) <|>
-- this clause handles directives like '\"")) <|> void quotedText)
char' '>'
parseDeclarations front) <|>
(skipMany1 (satisfy (notInClass "]<>")) >>
parseDeclarations front)
parseEntity = (do
_ <- string " skipSpace))
i <- parseIdent
t <- quotedText
skipSpace
char' '>'
return $
if isParameterEntity
then id
else ((i, t):)) > "entity"
parsePublicID = PublicID <$> (string "PUBLIC" *> quotedText) <*> quotedText
parseSystemID = SystemID <$> (string "SYSTEM" *> quotedText)
quotedText = (do
skipSpace
between '"' <|> between '\'') > "quoted text"
between c = do
char' c
x <- takeWhile (/=c)
char' c
return x
parseEnd = (do
skipSpace
n <- parseName
skipSpace
char' '>'
return $ TokenEndElement n) > "close tag"
parseBegin = (do
skipSpace
n <- parseName
as <- A.many $ parseAttribute settings
skipSpace
isClose <- (char '/' >> skipSpace >> return True) <|> return False
char' '>'
return $ TokenBeginElement n as isClose 0) > "open tag"
parseAttribute :: ParseSettings -> Parser TAttribute
parseAttribute settings = (do
skipSpace
key <- parseName
skipSpace
char' '='
skipSpace
val <- squoted <|> dquoted
return (key, val)) > "attribute"
where
squoted = char '\'' *> manyTill (parseContent settings False True) (char '\'')
dquoted = char '"' *> manyTill (parseContent settings True False) (char '"')
parseName :: Parser TName
parseName =
(name <$> parseIdent <*> A.optional (char ':' >> parseIdent)) > "name"
where
name i1 Nothing = TName Nothing i1
name i1 (Just i2) = TName (Just i1) i2
parseIdent :: Parser Text
parseIdent = takeWhile1 valid > "identifier"
where
valid '&' = False
valid '<' = False
valid '>' = False
valid ':' = False
valid '?' = False
valid '=' = False
valid '"' = False
valid '\'' = False
valid '/' = False
valid ';' = False
valid '#' = False
valid '[' = False
valid ']' = False
valid c = not $ isXMLSpace c
parseContent :: ParseSettings
-> Bool -- break on double quote
-> Bool -- break on single quote
-> Parser Content
parseContent (ParseSettings decodeEntities _ decodeIllegalCharacters _ _) breakDouble breakSingle = parseReference <|> (parseTextContent > "text content") where
parseReference = do
char' '&'
t <- parseEntityRef <|> parseHexCharRef <|> parseDecCharRef
char' ';'
return t
parseEntityRef = do
TName ma b <- parseName
let name = maybe "" (`T.append` ":") ma `T.append` b
return $ case name of
"lt" -> ContentText "<"
"gt" -> ContentText ">"
"amp" -> ContentText "&"
"quot" -> ContentText "\""
"apos" -> ContentText "'"
_ -> decodeEntities name
parseHexCharRef = do
void $ string "#x"
n <- AT.hexadecimal
case toValidXmlChar n <|> decodeIllegalCharacters n of
Nothing -> fail "Invalid character from hexadecimal character reference."
Just c -> return $ ContentText $ T.singleton c
parseDecCharRef = do
void $ string "#"
n <- AT.decimal
case toValidXmlChar n <|> decodeIllegalCharacters n of
Nothing -> fail "Invalid character from decimal character reference."
Just c -> return $ ContentText $ T.singleton c
-- Turns @\r\n@ and @\r@ into @\n@. See
-- .
parseTextContent = do
-- Read until the end of this piece of content
-- OR until a carriage return. In the second case, we use
-- handleCR to normalize \r and \r\n into \n.
firstChunk <- takeWhile valid
mbC <- peekChar
case mbC of
Just '\r' ->
handleCR [firstChunk]
_ ->
exit firstChunk
-- This is a duplication of the logic above and could be used instead.
-- Specialising these cases to the case "full text content contains no carriage return"
-- considerably speeds up execution when no carriage returns are in the original source.
handleCRPeek chunks = do
mbC <- peekChar
case mbC of
Just '\r' ->
handleCR chunks
_ ->
exit' chunks
handleCR chunks = do
-- We know that the next character is a carriage return. Discard it.
_ <- anyChar
-- Read the next chunk.
chunk <- takeWhile valid
case T.uncons chunk of
-- If it starts with newline, we're good:
-- We've already discarded the carriage return.
-- This is the case that replaces \r\n by \n.
Just ('\n', _) ->
handleCRPeek $ chunk : chunks
-- Otherwise, we'll have to insert a newline.
-- This is the case that replaces \r by \n.
Just _ ->
handleCRPeek $ chunk : "\n" : chunks
-- If the chunk is empty, we've either hit another carriage
-- return or the end of this piece of content. Since we've discarded
-- a carriage return we need to insert a newline.
Nothing ->
handleCRPeek $ "\n" : chunks
-- exit and exit' fail if the emitted text content is empty.
-- exit' uses Data.Text.concat to efficiently concatenate the collected
-- chunks.
exit c
| T.null c = fail "parseTextContent"
| otherwise = pure $ ContentText c
exit' cs = exit $ T.concat $ reverse cs
-- Check whether a character is valid text content (e.g. not a <)
-- OR a carriage return. The latter is used above in parseTextContent
-- to normalize line endings.
valid '"' = not breakDouble
valid '\'' = not breakSingle
valid '&' = False -- amp
valid '<' = False -- lt
valid '\r' = False
valid _ = True
-- | Is this codepoint a valid XML character? See
-- . This is proudly XML 1.0 only.
toValidXmlChar :: Int -> Maybe Char
toValidXmlChar n
| any checkRange ranges = Just (toEnum n)
| otherwise = Nothing
where
--Inclusive lower bound, inclusive upper bound.
ranges :: [(Int, Int)]
ranges =
[ (0x9, 0xA)
, (0xD, 0xD)
, (0x20, 0xD7FF)
, (0xE000, 0xFFFD)
, (0x10000, 0x10FFFF)
]
checkRange (lb, ub) = lb <= n && n <= ub
skipSpace :: Parser ()
skipSpace = skipWhile isXMLSpace
-- | Determines whether a character is an XML white space. The list of
-- white spaces is given by
--
-- > S ::= (#x20 | #x9 | #xD | #xA)+
--
-- in .
isXMLSpace :: Char -> Bool
isXMLSpace ' ' = True
isXMLSpace '\t' = True
isXMLSpace '\r' = True
isXMLSpace '\n' = True
isXMLSpace _ = False
newline :: Parser ()
newline = void $ (char '\r' >> char '\n') <|> char '\n'
char' :: Char -> Parser ()
char' = void . char
data ContentType = Ignore | IsContent Text | IsError String | NotContent
-- | Grabs the next piece of content if available. This function skips over any
-- comments, instructions or entities, and concatenates all content until the next start
-- or end tag.
contentMaybe :: MonadThrow m => ConduitT Event o m (Maybe Text)
contentMaybe = do
x <- peekC
case pc' x of
Ignore -> dropC 1 >> contentMaybe
IsContent t -> dropC 1 >> fmap Just (takeContents (t:))
IsError e -> lift $ throwM $ InvalidEntity e x
NotContent -> return Nothing
where
pc' Nothing = NotContent
pc' (Just x) = pc x
pc (EventContent (ContentText t)) = IsContent t
pc (EventContent (ContentEntity e)) = IsError $ "Unknown entity: " ++ show e
pc (EventCDATA t) = IsContent t
pc EventBeginElement{} = NotContent
pc EventEndElement{} = NotContent
pc EventBeginDocument{} = Ignore
pc EventEndDocument = Ignore
pc EventBeginDoctype{} = Ignore
pc EventEndDoctype = Ignore
pc EventInstruction{} = Ignore
pc EventComment{} = Ignore
takeContents front = do
x <- peekC
case pc' x of
Ignore -> dropC 1 >> takeContents front
IsContent t -> dropC 1 >> takeContents (front . (:) t)
IsError e -> lift $ throwM $ InvalidEntity e x
NotContent -> return $ T.concat $ front []
-- | Grabs the next piece of content. If none if available, returns 'T.empty'.
-- This is simply a wrapper around 'contentMaybe'.
content :: MonadThrow m => ConduitT Event o m Text
content = fromMaybe T.empty <$> contentMaybe
isWhitespace :: Event -> Bool
isWhitespace EventBeginDocument = True
isWhitespace EventEndDocument = True
isWhitespace EventBeginDoctype{} = True
isWhitespace EventEndDoctype = True
isWhitespace EventInstruction{} = True
isWhitespace (EventContent (ContentText t)) = T.all isSpace t
isWhitespace EventComment{} = True
isWhitespace (EventCDATA t) = T.all isSpace t
isWhitespace _ = False
-- | The most generic way to parse a tag. It takes a 'NameMatcher' to check whether
-- this is a correct tag name, an 'AttrParser' to handle attributes, and
-- then a parser to deal with content.
--
-- 'Events' are consumed if and only if the tag name and its attributes match.
--
-- This function automatically absorbs its balancing closing tag, and will
-- throw an exception if not all of the attributes or child elements are
-- consumed. If you want to allow extra attributes, see 'ignoreAttrs'.
--
-- This function automatically ignores comments, instructions and whitespace.
tag :: MonadThrow m
=> NameMatcher a -- ^ Check if this is a correct tag name
-- and return a value that can be used to get an @AttrParser@.
-- If this fails, the function will return @Nothing@
-> (a -> AttrParser b) -- ^ Given the value returned by the name checker, this function will
-- be used to get an @AttrParser@ appropriate for the specific tag.
-- If the @AttrParser@ fails, the function will also return @Nothing@
-> (b -> ConduitT Event o m c) -- ^ Handler function to handle the attributes and children
-- of a tag, given the value return from the @AttrParser@
-> ConduitT Event o m (Maybe c)
tag nameMatcher attrParser f = do
(x, leftovers) <- dropWS []
res <- case x of
Just (EventBeginElement name as) -> case runNameMatcher nameMatcher name of
Just y -> case runAttrParser' (attrParser y) as of
Left _ -> return Nothing
Right z -> do
z' <- f z
(a, _leftovers') <- dropWS []
case a of
Just (EventEndElement name')
| name == name' -> return (Just z')
_ -> lift $ throwM $ InvalidEndElement name a
Nothing -> return Nothing
_ -> return Nothing
case res of
-- Did not parse, put back all of the leading whitespace events and the
-- final observed event generated by dropWS
Nothing -> mapM_ leftover leftovers
-- Parse succeeded, discard all of those whitespace events and the
-- first parsed event
_ -> return ()
return res
where
-- Drop Events until we encounter a non-whitespace element. Return all of
-- the events consumed here (including the first non-whitespace event) so
-- that the calling function can treat them as leftovers if the parse fails
dropWS leftovers = do
x <- await
let leftovers' = maybe id (:) x leftovers
case isWhitespace <$> x of
Just True -> dropWS leftovers'
_ -> return (x, leftovers')
runAttrParser' p as =
case runAttrParser p as of
Left e -> Left e
Right ([], x) -> Right x
Right (attr', _) -> Left $ toException $ UnparsedAttributes attr'
-- | A simplified version of 'tag' where the 'NameMatcher' result isn't forwarded to the attributes parser.
--
-- Since 1.5.0
tag' :: MonadThrow m
=> NameMatcher a -> AttrParser b -> (b -> ConduitT Event o m c)
-> ConduitT Event o m (Maybe c)
tag' a b = tag a (const b)
-- | A further simplified tag parser, which requires that no attributes exist.
tagNoAttr :: MonadThrow m
=> NameMatcher a -- ^ Check if this is a correct tag name
-> ConduitT Event o m b -- ^ Handler function to handle the children of the matched tag
-> ConduitT Event o m (Maybe b)
tagNoAttr name f = tag' name (return ()) $ const f
-- | A further simplified tag parser, which ignores all attributes, if any exist
tagIgnoreAttrs :: MonadThrow m
=> NameMatcher a -- ^ Check if this is a correct tag name
-> ConduitT Event o m b -- ^ Handler function to handle the children of the matched tag
-> ConduitT Event o m (Maybe b)
tagIgnoreAttrs name f = tag' name ignoreAttrs $ const f
-- | Ignore an empty tag and all of its attributes.
-- This does not ignore the tag recursively
-- (i.e. it assumes there are no child elements).
-- This function returns @Just ()@ if the tag matched.
--
-- Since 1.5.0
ignoreEmptyTag :: MonadThrow m
=> NameMatcher a -- ^ Check if this is a correct tag name
-> ConduitT Event o m (Maybe ())
ignoreEmptyTag nameMatcher = tagIgnoreAttrs nameMatcher (return ())
ignored :: Monad m => ConduitT i o m ()
ignored = fix $ \recurse -> do
event <- await
case event of
Just _ -> recurse
_ -> return ()
-- | Same as `takeTree`, without yielding `Event`s.
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreTree "a" ignoreAttrs >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "b", ...}) [],EventEndElement (Name {nameLocalName = "b", ...}),EventEndDocument]
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreTree "b" ignoreAttrs >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "a", ...}) [],EventContent (ContentText "content"),EventEndElement (Name {nameLocalName = "a", ...}),EventEndDocument]
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreTree anyName ignoreAttrs >> sinkList)
-- [EventContent (ContentText "content"),EventBeginElement (Name {nameLocalName = "a", ...}) [],EventEndElement (Name {nameLocalName = "a", ...}),EventEndDocument]
--
-- Since 1.9.0
ignoreTree :: MonadThrow m => NameMatcher a -> AttrParser b -> ConduitT Event o m (Maybe ())
ignoreTree nameMatcher attrParser = fuseUpstream (takeTree nameMatcher attrParser) ignored
-- | Same as `takeContent`, without yielding `Event`s.
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreContent >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "a", ...}) [],EventContent (ContentText "content"),EventEndElement (Name {nameLocalName = "a", ...}),EventEndDocument]
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreContent >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "a", ...}) [],EventEndElement (Name {nameLocalName = "a", ...}),EventEndDocument]
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreContent >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "a", ...}) [],EventEndElement (Name {nameLocalName = "a", ...}),EventEndDocument]
--
-- Since 1.9.0
ignoreContent :: MonadThrow m => ConduitT Event o m (Maybe ())
ignoreContent = fuseUpstream takeContent ignored
-- | Same as `takeTreeContent`, without yielding `Event`s.
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreTreeContent "a" ignoreAttrs >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "b", ...}) [],EventEndElement (Name {nameLocalName = "b", ...}),EventEndDocument]
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreTreeContent "b" ignoreAttrs >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "a", ...}) [],EventContent (ContentText "content"),EventEndElement (Name {nameLocalName = "a", ...}),EventEndDocument]
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreTreeContent anyName ignoreAttrs >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "a", ...}) [],EventEndElement (Name {nameLocalName = "a", ...}),EventEndDocument]
--
-- Since 1.5.0
ignoreTreeContent :: MonadThrow m => NameMatcher a -> AttrParser b -> ConduitT Event o m (Maybe ())
ignoreTreeContent namePred attrParser = fuseUpstream (takeTreeContent namePred attrParser) ignored
-- | Same as `takeAnyTreeContent`, without yielding `Event`s.
--
-- >>> runConduit $ parseLBS def "content" .| (ignoreAnyTreeContent >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "b", ...}) [],EventEndElement (Name {nameLocalName = "b", ...}),EventEndDocument]
--
-- >>> runConduit $ parseLBS def "text" .| (ignoreAnyTreeContent >> sinkList)
-- [EventBeginElement (Name {nameLocalName = "b", ...}) [],EventEndElement (Name {nameLocalName = "b", ...}),EventEndDocument]
--
-- Since 1.5.0
ignoreAnyTreeContent :: MonadThrow m => ConduitT Event o m (Maybe ())
ignoreAnyTreeContent = fuseUpstream takeAnyTreeContent ignored
-- | Get the value of the first parser which returns 'Just'. If no parsers
-- succeed (i.e., return @Just@), this function returns 'Nothing'.
--
-- > orE a b = choose [a, b]
--
-- Warning: `orE` doesn't backtrack. See 'choose' for detailed explanation.
orE :: Monad m
=> ConduitT Event o m (Maybe a) -- ^ The first (preferred) parser
-> ConduitT Event o m (Maybe a) -- ^ The second parser, only executed if the first parser fails
-> ConduitT Event o m (Maybe a)
orE a b = a >>= \x -> maybe b (const $ return x) x
-- | Get the value of the first parser which returns 'Just'. If no parsers
-- succeed (i.e., return 'Just'), this function returns 'Nothing'.
--
-- Warning: 'choose' doesn't backtrack. If a parser consumed some events,
-- subsequent parsers will continue from the following events. This can be a
-- problem if parsers share an accepted prefix of events, so an earlier
-- (failing) parser will discard the events that the later parser could
-- potentially succeed on.
--
-- An other problematic case is using 'choose' to implement order-independent
-- parsing using a set of parsers, with a final trailing ignore-anything-else
-- action. In this case, certain trees might be skipped.
--
-- >>> :{
-- let parse2Tags name1 name2 = do
-- tag1 <- tagNoAttr name1 (pure ())
-- tag2 <- tagNoAttr name2 (pure tag1)
-- return $ join tag2
-- :}
--
-- >>> :{
-- runConduit $ parseLBS def "" .| choose
-- [ parse2Tags "a" "b"
-- , parse2Tags "a" "c"
-- ]
-- :}
-- Just ()
--
-- >>> :{
-- runConduit $ parseLBS def "" .| choose
-- [ parse2Tags "a" "c"
-- , parse2Tags "a" "b"
-- ]
-- :}
-- Nothing
choose :: Monad m
=> [ConduitT Event o m (Maybe a)] -- ^ List of parsers that will be tried in order.
-> ConduitT Event o m (Maybe a) -- ^ Result of the first parser to succeed, or @Nothing@
-- if no parser succeeded
choose [] = return Nothing
choose (i:is) = i >>= maybe (choose is) (return . Just)
-- | Force an optional parser into a required parser. All of the 'tag'
-- functions, 'attr', 'choose' and 'many' deal with 'Maybe' parsers. Use this when you
-- want to finally force something to happen.
force :: MonadThrow m
=> String -- ^ Error message
-> m (Maybe a) -- ^ Optional parser to be forced
-> m a
force msg i = i >>= maybe (throwM $ XmlException msg Nothing) return
-- | A helper function which reads a file from disk using 'enumFile', detects
-- character encoding using 'detectUtf', parses the XML using 'parseBytes', and
-- then hands off control to your supplied parser.
parseFile :: MonadResource m
=> ParseSettings
-> FilePath
-> ConduitT i Event m ()
parseFile ps fp = sourceFile fp .| transPipe liftIO (parseBytes ps)
-- | Parse an event stream from a lazy 'L.ByteString'.
parseLBS :: MonadThrow m
=> ParseSettings
-> L.ByteString
-> ConduitT i Event m ()
parseLBS ps lbs = sourceLazy lbs .| parseBytes ps
data XmlException = XmlException
{ xmlErrorMessage :: String
, xmlBadInput :: Maybe Event
}
| InvalidEndElement Name (Maybe Event)
| InvalidEntity String (Maybe Event)
| MissingAttribute String
| UnparsedAttributes [(Name, [Content])]
deriving (Show, Typeable)
instance Exception XmlException where
#if MIN_VERSION_base(4, 8, 0)
displayException (XmlException msg (Just event)) = "Error while parsing XML event " ++ show event ++ ": " ++ msg
displayException (XmlException msg _) = "Error while parsing XML: " ++ msg
displayException (InvalidEndElement name (Just event)) = "Error while parsing XML event: expected " ++ T.unpack (nameLocalName name) ++ ">, got " ++ show event
displayException (InvalidEndElement name _) = "Error while parsing XML event: expected " ++ show name ++ ">, got nothing"
displayException (InvalidEntity msg (Just event)) = "Error while parsing XML entity " ++ show event ++ ": " ++ msg
displayException (InvalidEntity msg _) = "Error while parsing XML entity: " ++ msg
displayException (MissingAttribute msg) = "Missing required attribute: " ++ msg
displayException (UnparsedAttributes attrs) = show (length attrs) ++ " remaining unparsed attributes: \n" ++ intercalate "\n" (show <$> attrs)
#endif
-- | A @NameMatcher@ describes which names a tag parser is allowed to match.
--
-- Since 1.5.0
newtype NameMatcher a = NameMatcher { runNameMatcher :: Name -> Maybe a }
deriving instance Functor NameMatcher
instance Applicative NameMatcher where
pure a = NameMatcher $ const $ pure a
NameMatcher f <*> NameMatcher a = NameMatcher $ \name -> f name <*> a name
-- | 'NameMatcher's can be combined with @\<|\>@
instance Alternative NameMatcher where
empty = NameMatcher $ const Nothing
NameMatcher f <|> NameMatcher g = NameMatcher (\a -> f a <|> g a)
-- | Match a single 'Name' in a concise way.
-- Note that 'Name' is namespace sensitive: when using the 'IsString' instance,
-- use @"{http:\/\/a\/b}c"@ to match the tag @c@ in the XML namespace @http://a/b@
instance (a ~ Name) => IsString (NameMatcher a) where
fromString s = matching (== fromString s)
-- | @matching f@ matches @name@ iff @f name@ is true. Returns the matched 'Name'.
--
-- Since 1.5.0
matching :: (Name -> Bool) -> NameMatcher Name
matching f = NameMatcher $ \name -> if f name then Just name else Nothing
-- | Matches any 'Name'. Returns the matched 'Name'.
--
-- Since 1.5.0
anyName :: NameMatcher Name
anyName = matching (const True)
-- | Matches any 'Name' from the given list. Returns the matched 'Name'.
--
-- Since 1.5.0
anyOf :: [Name] -> NameMatcher Name
anyOf values = matching (`elem` values)
-- | A monad for parsing attributes. By default, it requires you to deal with
-- all attributes present on an element, and will throw an exception if there
-- are unhandled attributes. Use the 'requireAttr', 'attr' et al
-- functions for handling an attribute, and 'ignoreAttrs' if you would like to
-- skip the rest of the attributes on an element.
--
-- 'Alternative' instance behaves like 'First' monoid: it chooses first
-- parser which doesn't fail.
newtype AttrParser a = AttrParser { runAttrParser :: [(Name, [Content])] -> Either SomeException ([(Name, [Content])], a) }
instance Monad AttrParser where
return a = AttrParser $ \as -> Right (as, a)
(AttrParser f) >>= g = AttrParser $ \as ->
either Left (\(as', f') -> runAttrParser (g f') as') (f as)
instance Functor AttrParser where
fmap = liftM
instance Applicative AttrParser where
pure = return
(<*>) = ap
instance Alternative AttrParser where
empty = AttrParser $ const $ Left $ toException $ XmlException "AttrParser.empty" Nothing
AttrParser f <|> AttrParser g = AttrParser $ \x ->
either (const $ g x) Right (f x)
instance MonadThrow AttrParser where
throwM = AttrParser . const . throwM
optionalAttrRaw :: ((Name, [Content]) -> Maybe b) -> AttrParser (Maybe b)
optionalAttrRaw f =
AttrParser $ go id
where
go front [] = Right (front [], Nothing)
go front (a:as) =
maybe (go (front . (:) a) as)
(\b -> Right (front as, Just b))
(f a)
requireAttrRaw :: String -> ((Name, [Content]) -> Maybe b) -> AttrParser b
requireAttrRaw msg f = optionalAttrRaw f >>=
maybe (AttrParser $ const $ Left $ toException $ MissingAttribute msg)
return
-- | Return the value for an attribute if present.
attr :: Name -> AttrParser (Maybe Text)
attr n = optionalAttrRaw
(\(x, y) -> if x == n then Just (contentsToText y) else Nothing)
-- | Shortcut composition of 'force' and 'attr'.
requireAttr :: Name -> AttrParser Text
requireAttr n = force ("Missing attribute: " ++ show n) $ attr n
{-# DEPRECATED optionalAttr "Please use 'attr'." #-}
optionalAttr :: Name -> AttrParser (Maybe Text)
optionalAttr = attr
contentsToText :: [Content] -> Text
contentsToText = T.concat . map toText where
toText (ContentText t) = t
toText (ContentEntity e) = T.concat ["&", e, ";"]
-- | Skip the remaining attributes on an element. Since this will clear the
-- list of attributes, you must call this /after/ any calls to 'requireAttr',
-- 'optionalAttr', etc.
ignoreAttrs :: AttrParser ()
ignoreAttrs = AttrParser $ const $ Right ([], ())
-- | Keep parsing elements as long as the parser returns 'Just'.
many :: Monad m
=> ConduitT Event o m (Maybe a)
-> ConduitT Event o m [a]
many i = manyIgnore i $ return Nothing
-- | Like 'many' but discards the results without building an intermediate list.
--
-- Since 1.5.0
many_ :: MonadThrow m
=> ConduitT Event o m (Maybe a)
-> ConduitT Event o m ()
many_ consumer = manyIgnoreYield (return Nothing) (void <$> consumer)
-- | Keep parsing elements as long as the parser returns 'Just'
-- or the ignore parser returns 'Just'.
manyIgnore :: Monad m
=> ConduitT Event o m (Maybe a)
-> ConduitT Event o m (Maybe b)
-> ConduitT Event o m [a]
manyIgnore i ignored' = go id where
go front = i >>= maybe (onFail front) (\y -> go $ front . (:) y)
-- onFail is called if the main parser fails
onFail front = ignored' >>= maybe (return $ front []) (const $ go front)
-- | Like @many@, but any tags and content the consumer doesn't match on
-- are silently ignored.
many' :: MonadThrow m
=> ConduitT Event o m (Maybe a)
-> ConduitT Event o m [a]
many' consumer = manyIgnore consumer ignoreAnyTreeContent
-- | Like 'many', but uses 'yield' so the result list can be streamed
-- to downstream conduits without waiting for 'manyYield' to finish
manyYield :: Monad m
=> ConduitT a b m (Maybe b)
-> ConduitT a b m ()
manyYield consumer = fix $ \loop ->
consumer >>= maybe (return ()) (\x -> yield x >> loop)
-- | Like 'manyIgnore', but uses 'yield' so the result list can be streamed
-- to downstream conduits without waiting for 'manyIgnoreYield' to finish
manyIgnoreYield :: MonadThrow m
=> ConduitT Event b m (Maybe b) -- ^ Consuming parser that generates the result stream
-> ConduitT Event b m (Maybe ()) -- ^ Ignore parser that consumes elements to be ignored
-> ConduitT Event b m ()
manyIgnoreYield consumer ignoreParser = fix $ \loop ->
consumer >>= maybe (onFail loop) (\x -> yield x >> loop)
where onFail loop = ignoreParser >>= maybe (return ()) (const loop)
-- | Like 'many'', but uses 'yield' so the result list can be streamed
-- to downstream conduits without waiting for 'manyYield'' to finish
manyYield' :: MonadThrow m
=> ConduitT Event b m (Maybe b)
-> ConduitT Event b m ()
manyYield' consumer = manyIgnoreYield consumer ignoreAnyTreeContent
-- | Stream a single content 'Event'.
--
-- Returns @Just ()@ if a content 'Event' was consumed, @Nothing@ otherwise.
--
-- >>> runConduit $ parseLBS def "content" .| void takeContent .| sinkList
-- [EventBeginDocument,EventContent (ContentText "content")]
--
-- If next event isn't a content, nothing is consumed.
--
-- >>> runConduit $ parseLBS def "content" .| void takeContent .| sinkList
-- [EventBeginDocument]
--
-- Since 1.5.0
takeContent :: MonadThrow m => ConduitT Event Event m (Maybe ())
takeContent = do
event <- await
case event of
Just e@EventContent{} -> yield e >> return (Just ())
Just e@EventCDATA{} -> yield e >> return (Just ())
Just e -> if isWhitespace e then yield e >> takeContent else leftover e >> return Nothing
_ -> return Nothing
-- | Stream 'Event's corresponding to a single XML element that matches given 'NameMatcher' and 'AttrParser', from the opening- to the closing-tag.
--
-- >>> runConduit $ parseLBS def "content" .| void (takeTree "a" ignoreAttrs) .| sinkList
-- [EventBeginDocument,EventBeginElement (Name {nameLocalName = "a", ...}) [],EventContent (ContentText "content"),EventEndElement (Name {nameLocalName = "a", ...})]
--
-- >>> runConduit $ parseLBS def "content" .| void (takeTree "b" ignoreAttrs) .| sinkList
-- [EventBeginDocument]
--
-- If next 'Event' isn't an element, nothing is consumed.
--
-- >>> runConduit $ parseLBS def "text" .| void (takeTree "a" ignoreAttrs) .| sinkList
-- [EventBeginDocument]
--
-- If an opening-tag is consumed but no matching closing-tag is found, an 'XmlException' is thrown.
--
-- >>> runConduit $ parseLBS def "" .| void (takeTree "a" ignoreAttrs) .| sinkList
-- *** Exception: InvalidEndElement (Name {nameLocalName = "a", nameNamespace = Nothing, namePrefix = Nothing}) Nothing
--
-- This function automatically ignores comments, instructions and whitespace.
--
-- Returns @Just ()@ if an element was consumed, 'Nothing' otherwise.
--
-- Since 1.5.0
takeTree :: MonadThrow m => NameMatcher a -> AttrParser b -> ConduitT Event Event m (Maybe ())
takeTree nameMatcher attrParser = do
event <- await
case event of
Just e@(EventBeginElement name as) -> case runNameMatcher nameMatcher name of
Just _ -> case runAttrParser attrParser as of
Right _ -> do
yield e
whileJust takeAnyTreeContent
endEvent <- await
case endEvent of
Just e'@(EventEndElement name') | name == name' -> yield e' >> return (Just ())
_ -> lift $ throwM $ InvalidEndElement name endEvent
_ -> leftover e >> return Nothing
_ -> leftover e >> return Nothing
Just e -> if isWhitespace e then yield e >> takeTree nameMatcher attrParser else leftover e >> return Nothing
_ -> return Nothing
where
whileJust f = fix $ \loop -> f >>= maybe (return ()) (const loop)
-- | Like 'takeTree', but can also stream a content 'Event'.
--
-- >>> runConduit $ parseLBS def "content" .| void (takeTreeContent "a" ignoreAttrs) .| sinkList
-- [EventBeginDocument,EventBeginElement (Name {nameLocalName = "a", ...}) [],EventContent (ContentText "content"),EventEndElement (Name {nameLocalName = "a", ...})]
--
-- >>> runConduit $ parseLBS def "content" .| void (takeTreeContent "b" ignoreAttrs) .| sinkList
-- [EventBeginDocument]
--
-- >>> runConduit $ parseLBS def "content" .| void (takeTreeContent "a" ignoreAttrs) .| sinkList
-- [EventBeginDocument,EventContent (ContentText "content")]
--
-- Since 1.5.0
takeTreeContent :: MonadThrow m => NameMatcher a -> AttrParser b -> ConduitT Event Event m (Maybe ())
takeTreeContent nameMatcher attrParser = runMaybeT $ MaybeT (takeTree nameMatcher attrParser) <|> MaybeT takeContent
-- | Like 'takeTreeContent', without checking for tag name or attributes.
--
-- >>> runConduit $ parseLBS def "text" .| void takeAnyTreeContent .| sinkList
-- [EventBeginDocument,EventContent (ContentText "text")]
--
-- >>> runConduit $ parseLBS def "" .| void takeAnyTreeContent .| sinkList
-- [EventBeginDocument]
--
-- >>> runConduit $ parseLBS def "text" .| void takeAnyTreeContent .| sinkList
-- [EventBeginDocument,EventBeginElement (Name {nameLocalName = "b", ...}) [],EventBeginElement (Name {nameLocalName = "c", ...}) [],EventEndElement (Name {nameLocalName = "c", ...}),EventEndElement (Name {nameLocalName = "b", ...})]
--
-- Since 1.5.0
takeAnyTreeContent :: MonadThrow m
=> ConduitT Event Event m (Maybe ())
takeAnyTreeContent = takeTreeContent anyName ignoreAttrs
-- | Default implementation of 'DecodeEntities', which leaves the
-- entity as-is. Numeric character references and the five standard
-- entities (lt, gt, amp, quot, pos) are handled internally by the
-- parser.
decodeXmlEntities :: DecodeEntities
decodeXmlEntities = ContentEntity
-- | HTML4-compliant entity decoder. Handles the additional 248
-- entities defined by HTML 4 and XHTML 1.
--
-- Note that HTML 5 introduces a drastically larger number of entities, and
-- this code does not recognize most of them.
decodeHtmlEntities :: DecodeEntities
decodeHtmlEntities t =
maybe (ContentEntity t) ContentText $ Map.lookup t htmlEntities
htmlEntities :: Map.Map T.Text T.Text
htmlEntities = Map.fromList
$ map (pack *** pack) -- Work around the long-compile-time bug
[ ("nbsp", "\160")
, ("iexcl", "\161")
, ("cent", "\162")
, ("pound", "\163")
, ("curren", "\164")
, ("yen", "\165")
, ("brvbar", "\166")
, ("sect", "\167")
, ("uml", "\168")
, ("copy", "\169")
, ("ordf", "\170")
, ("laquo", "\171")
, ("not", "\172")
, ("shy", "\173")
, ("reg", "\174")
, ("macr", "\175")
, ("deg", "\176")
, ("plusmn", "\177")
, ("sup2", "\178")
, ("sup3", "\179")
, ("acute", "\180")
, ("micro", "\181")
, ("para", "\182")
, ("middot", "\183")
, ("cedil", "\184")
, ("sup1", "\185")
, ("ordm", "\186")
, ("raquo", "\187")
, ("frac14", "\188")
, ("frac12", "\189")
, ("frac34", "\190")
, ("iquest", "\191")
, ("Agrave", "\192")
, ("Aacute", "\193")
, ("Acirc", "\194")
, ("Atilde", "\195")
, ("Auml", "\196")
, ("Aring", "\197")
, ("AElig", "\198")
, ("Ccedil", "\199")
, ("Egrave", "\200")
, ("Eacute", "\201")
, ("Ecirc", "\202")
, ("Euml", "\203")
, ("Igrave", "\204")
, ("Iacute", "\205")
, ("Icirc", "\206")
, ("Iuml", "\207")
, ("ETH", "\208")
, ("Ntilde", "\209")
, ("Ograve", "\210")
, ("Oacute", "\211")
, ("Ocirc", "\212")
, ("Otilde", "\213")
, ("Ouml", "\214")
, ("times", "\215")
, ("Oslash", "\216")
, ("Ugrave", "\217")
, ("Uacute", "\218")
, ("Ucirc", "\219")
, ("Uuml", "\220")
, ("Yacute", "\221")
, ("THORN", "\222")
, ("szlig", "\223")
, ("agrave", "\224")
, ("aacute", "\225")
, ("acirc", "\226")
, ("atilde", "\227")
, ("auml", "\228")
, ("aring", "\229")
, ("aelig", "\230")
, ("ccedil", "\231")
, ("egrave", "\232")
, ("eacute", "\233")
, ("ecirc", "\234")
, ("euml", "\235")
, ("igrave", "\236")
, ("iacute", "\237")
, ("icirc", "\238")
, ("iuml", "\239")
, ("eth", "\240")
, ("ntilde", "\241")
, ("ograve", "\242")
, ("oacute", "\243")
, ("ocirc", "\244")
, ("otilde", "\245")
, ("ouml", "\246")
, ("divide", "\247")
, ("oslash", "\248")
, ("ugrave", "\249")
, ("uacute", "\250")
, ("ucirc", "\251")
, ("uuml", "\252")
, ("yacute", "\253")
, ("thorn", "\254")
, ("yuml", "\255")
, ("OElig", "\338")
, ("oelig", "\339")
, ("Scaron", "\352")
, ("scaron", "\353")
, ("Yuml", "\376")
, ("fnof", "\402")
, ("circ", "\710")
, ("tilde", "\732")
, ("Alpha", "\913")
, ("Beta", "\914")
, ("Gamma", "\915")
, ("Delta", "\916")
, ("Epsilon", "\917")
, ("Zeta", "\918")
, ("Eta", "\919")
, ("Theta", "\920")
, ("Iota", "\921")
, ("Kappa", "\922")
, ("Lambda", "\923")
, ("Mu", "\924")
, ("Nu", "\925")
, ("Xi", "\926")
, ("Omicron", "\927")
, ("Pi", "\928")
, ("Rho", "\929")
, ("Sigma", "\931")
, ("Tau", "\932")
, ("Upsilon", "\933")
, ("Phi", "\934")
, ("Chi", "\935")
, ("Psi", "\936")
, ("Omega", "\937")
, ("alpha", "\945")
, ("beta", "\946")
, ("gamma", "\947")
, ("delta", "\948")
, ("epsilon", "\949")
, ("zeta", "\950")
, ("eta", "\951")
, ("theta", "\952")
, ("iota", "\953")
, ("kappa", "\954")
, ("lambda", "\955")
, ("mu", "\956")
, ("nu", "\957")
, ("xi", "\958")
, ("omicron", "\959")
, ("pi", "\960")
, ("rho", "\961")
, ("sigmaf", "\962")
, ("sigma", "\963")
, ("tau", "\964")
, ("upsilon", "\965")
, ("phi", "\966")
, ("chi", "\967")
, ("psi", "\968")
, ("omega", "\969")
, ("thetasym", "\977")
, ("upsih", "\978")
, ("piv", "\982")
, ("ensp", "\8194")
, ("emsp", "\8195")
, ("thinsp", "\8201")
, ("zwnj", "\8204")
, ("zwj", "\8205")
, ("lrm", "\8206")
, ("rlm", "\8207")
, ("ndash", "\8211")
, ("mdash", "\8212")
, ("lsquo", "\8216")
, ("rsquo", "\8217")
, ("sbquo", "\8218")
, ("ldquo", "\8220")
, ("rdquo", "\8221")
, ("bdquo", "\8222")
, ("dagger", "\8224")
, ("Dagger", "\8225")
, ("bull", "\8226")
, ("hellip", "\8230")
, ("permil", "\8240")
, ("prime", "\8242")
, ("Prime", "\8243")
, ("lsaquo", "\8249")
, ("rsaquo", "\8250")
, ("oline", "\8254")
, ("frasl", "\8260")
, ("euro", "\8364")
, ("image", "\8465")
, ("weierp", "\8472")
, ("real", "\8476")
, ("trade", "\8482")
, ("alefsym", "\8501")
, ("larr", "\8592")
, ("uarr", "\8593")
, ("rarr", "\8594")
, ("darr", "\8595")
, ("harr", "\8596")
, ("crarr", "\8629")
, ("lArr", "\8656")
, ("uArr", "\8657")
, ("rArr", "\8658")
, ("dArr", "\8659")
, ("hArr", "\8660")
, ("forall", "\8704")
, ("part", "\8706")
, ("exist", "\8707")
, ("empty", "\8709")
, ("nabla", "\8711")
, ("isin", "\8712")
, ("notin", "\8713")
, ("ni", "\8715")
, ("prod", "\8719")
, ("sum", "\8721")
, ("minus", "\8722")
, ("lowast", "\8727")
, ("radic", "\8730")
, ("prop", "\8733")
, ("infin", "\8734")
, ("ang", "\8736")
, ("and", "\8743")
, ("or", "\8744")
, ("cap", "\8745")
, ("cup", "\8746")
, ("int", "\8747")
, ("there4", "\8756")
, ("sim", "\8764")
, ("cong", "\8773")
, ("asymp", "\8776")
, ("ne", "\8800")
, ("equiv", "\8801")
, ("le", "\8804")
, ("ge", "\8805")
, ("sub", "\8834")
, ("sup", "\8835")
, ("nsub", "\8836")
, ("sube", "\8838")
, ("supe", "\8839")
, ("oplus", "\8853")
, ("otimes", "\8855")
, ("perp", "\8869")
, ("sdot", "\8901")
, ("lceil", "\8968")
, ("rceil", "\8969")
, ("lfloor", "\8970")
, ("rfloor", "\8971")
, ("lang", "\9001")
, ("rang", "\9002")
, ("loz", "\9674")
, ("spades", "\9824")
, ("clubs", "\9827")
, ("hearts", "\9829")
, ("diams", "\9830")
]
xml-conduit-1.9.1.3/src/Text/XML/Stream/Render.hs 0000644 0000000 0000000 00000037156 07346545000 017514 0 ustar 00 0000000 0000000 {-# LANGUAGE CPP #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
-- | 'Enumeratee's to render XML 'Event's. Unlike libxml-enumerator and
-- expat-enumerator, this module does not provide IO and ST variants, since the
-- underlying rendering operations are pure functions.
module Text.XML.Stream.Render
( -- * Rendering XML files
renderBuilder
, renderBuilderFlush
, renderBytes
, renderText
, prettify
-- * Renderer settings
, RenderSettings
, def
, rsPretty
, rsNamespaces
, rsAttrOrder
, rsUseCDATA
, rsXMLDeclaration
, orderAttrs
-- * Event rendering
, tag
, content
-- * Attribute rendering
, Attributes
, attr
, optionalAttr
) where
import Control.Applicative ((<$>))
import Control.Monad.Trans.Resource (MonadThrow)
import Data.ByteString (ByteString)
import Data.ByteString.Builder (Builder)
import Conduit
import Data.Default.Class (Default (def))
import Data.List (foldl')
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (fromMaybe, mapMaybe)
import Data.Monoid (Monoid, mappend, mempty)
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T
import Data.XML.Types (Content (..), Event (..),
Name (..))
import Text.XML.Stream.Token
-- | Render a stream of 'Event's into a stream of 'ByteString's. This function
-- wraps around 'renderBuilder' and 'builderToByteString', so it produces
-- optimally sized 'ByteString's with minimal buffer copying.
--
-- The output is UTF8 encoded.
renderBytes :: PrimMonad m => RenderSettings -> ConduitT Event ByteString m ()
renderBytes rs = renderBuilder rs .| builderToByteString
-- | Render a stream of 'Event's into a stream of 'Text's. This function
-- wraps around 'renderBuilder', 'builderToByteString' and 'renderBytes', so it
-- produces optimally sized 'Text's with minimal buffer copying.
renderText :: (PrimMonad m, MonadThrow m) => RenderSettings -> ConduitT Event Text m ()
renderText rs = renderBytes rs .| decodeUtf8C
data RenderSettings = RenderSettings
{ rsPretty :: Bool
, rsNamespaces :: [(Text, Text)]
-- ^ Defines some top level namespace definitions to be used, in the form
-- of (prefix, namespace). This has absolutely no impact on the meaning
-- of your documents, but can increase readability by moving commonly
-- used namespace declarations to the top level.
, rsAttrOrder :: Name -> Map.Map Name Text -> [(Name, Text)]
-- ^ Specify how to turn the unordered attributes used by the "Text.XML"
-- module into an ordered list.
, rsUseCDATA :: Content -> Bool
-- ^ Determines if for a given text content the renderer should use a
-- CDATA node.
--
-- Default: @False@
--
-- @since 1.3.3
, rsXMLDeclaration :: Bool
-- ^ Determines whether the XML declaration will be output.
--
-- Default: @True@
--
-- @since 1.5.1
}
instance Default RenderSettings where
def = RenderSettings
{ rsPretty = False
, rsNamespaces = []
, rsAttrOrder = const Map.toList
, rsUseCDATA = const False
, rsXMLDeclaration = True
}
-- | Convenience function to create an ordering function suitable for
-- use as the value of 'rsAttrOrder'. The ordering function is created
-- from an explicit ordering of the attributes, specified as a list of
-- tuples, as follows: In each tuple, the first component is the
-- 'Name' of an element, and the second component is a list of
-- attributes names. When the given element is rendered, the
-- attributes listed, when present, appear first in the given order,
-- followed by any other attributes in arbitrary order. If an element
-- does not appear, all of its attributes are rendered in arbitrary
-- order.
orderAttrs :: [(Name, [Name])] ->
Name -> Map Name Text -> [(Name, Text)]
orderAttrs orderSpec = order
where
order elt attrMap =
let initialAttrs = fromMaybe [] $ lookup elt orderSpec
mkPair attr' = (,) attr' <$> Map.lookup attr' attrMap
otherAttrMap =
Map.filterWithKey (const . not . (`elem` initialAttrs)) attrMap
in mapMaybe mkPair initialAttrs ++ Map.toAscList otherAttrMap
-- | Render a stream of 'Event's into a stream of 'Builder's. Builders are from
-- the blaze-builder package, and allow the create of optimally sized
-- 'ByteString's with minimal buffer copying.
renderBuilder :: Monad m => RenderSettings -> ConduitT Event Builder m ()
renderBuilder settings = mapC Chunk .| renderBuilder' yield' settings
where
yield' Flush = return ()
yield' (Chunk bs) = yield bs
-- | Same as 'renderBuilder' but allows you to flush XML stream to ensure that all
-- events at needed point are rendered.
--
-- @since 1.3.5
renderBuilderFlush :: Monad m => RenderSettings -> ConduitT (Flush Event) (Flush Builder) m ()
renderBuilderFlush = renderBuilder' yield
renderBuilder'
:: Monad m
=> (Flush Builder -> ConduitT (Flush Event) o m ())
-> RenderSettings
-> ConduitT (Flush Event) o m ()
renderBuilder' yield' settings =
if rsPretty settings
then prettify .| renderEvent'
else renderEvent'
where
renderEvent' = renderEvent yield' settings
renderEvent
:: Monad m
=> (Flush Builder -> ConduitT (Flush Event) o m ())
-> RenderSettings
-> ConduitT (Flush Event) o m ()
renderEvent yield' RenderSettings { rsPretty = isPretty, rsNamespaces = namespaces0, rsUseCDATA = useCDATA, rsXMLDeclaration = useXMLDecl } =
loop []
where
loop nslevels = await >>= maybe (return ()) (go nslevels)
go nslevels Flush = yield' Flush >> loop nslevels
go nslevels (Chunk e) =
case e of
EventBeginElement n1 as -> do
mnext <- peekC
isClosed <-
case mnext of
Just (Chunk (EventEndElement n2)) | n1 == n2 -> do
dropC 1
return True
_ -> return False
let (token, nslevels') = mkBeginToken isPretty isClosed namespaces0 nslevels n1 as
yield' $ Chunk token
loop nslevels'
_ -> do
let (token, nslevels') = eventToToken nslevels useCDATA useXMLDecl e
yield' $ Chunk token
loop nslevels'
eventToToken :: Stack -> (Content -> Bool) -> Bool -> Event -> (Builder, [NSLevel])
eventToToken s _ True EventBeginDocument =
(tokenToBuilder $ TokenXMLDeclaration
[ ("version", [ContentText "1.0"])
, ("encoding", [ContentText "UTF-8"])
]
, s)
eventToToken s _ False EventBeginDocument = (mempty, s)
eventToToken s _ _ EventEndDocument = (mempty, s)
eventToToken s _ _ (EventInstruction i) = (tokenToBuilder $ TokenInstruction i, s)
eventToToken s _ _ (EventBeginDoctype n meid) = (tokenToBuilder $ TokenDoctype n meid [], s)
eventToToken s _ _ EventEndDoctype = (mempty, s)
eventToToken s _ _ (EventCDATA t) = (tokenToBuilder $ TokenCDATA t, s)
eventToToken s _ _ (EventEndElement name) =
(tokenToBuilder $ TokenEndElement $ nameToTName sl name, s')
where
(sl:s') = s
eventToToken s useCDATA _ (EventContent c)
| useCDATA c =
case c of
ContentText txt -> (tokenToBuilder $ TokenCDATA txt, s)
ContentEntity txt -> (tokenToBuilder $ TokenCDATA txt, s)
| otherwise = (tokenToBuilder $ TokenContent c, s)
eventToToken s _ _ (EventComment t) = (tokenToBuilder $ TokenComment t, s)
eventToToken _ _ _ EventBeginElement{} = error "eventToToken on EventBeginElement" -- mkBeginToken False s name attrs
type Stack = [NSLevel]
nameToTName :: NSLevel -> Name -> TName
nameToTName _ (Name name _ (Just pref))
| pref == "xml" = TName (Just "xml") name
nameToTName _ (Name name Nothing _) = TName Nothing name -- invariant that this is true
nameToTName (NSLevel def' sl) (Name name (Just ns) _)
| def' == Just ns = TName Nothing name
| otherwise =
case Map.lookup ns sl of
Nothing -> error "nameToTName"
Just pref -> TName (Just pref) name
mkBeginToken :: Bool -- ^ pretty print attributes?
-> Bool -- ^ self closing?
-> [(Text, Text)] -- ^ namespaces to apply to top-level
-> Stack
-> Name
-> [(Name, [Content])]
-> (Builder, Stack)
mkBeginToken isPretty isClosed namespaces0 s name attrs =
(tokenToBuilder $ TokenBeginElement tname tattrs3 isClosed indent,
if isClosed then s else sl3 : s)
where
indent = if isPretty then 2 + 4 * length s else 0
prevsl = case s of
[] -> NSLevel Nothing Map.empty
sl':_ -> sl'
(sl1, tname, tattrs1) = newElemStack prevsl name
(sl2, tattrs2) = foldr newAttrStack (sl1, tattrs1) $ nubAttrs attrs
(sl3, tattrs3) =
case s of
[] -> (sl2 { prefixes = Map.union (prefixes sl2) $ Map.fromList namespaceSL }, namespaceAttrs ++ tattrs2)
_ -> (sl2, tattrs2)
(namespaceSL, namespaceAttrs) = unzip $ mapMaybe unused namespaces0
unused (k, v) =
case lookup k' tattrs2 of
Just{} -> Nothing
Nothing -> Just ((v, k), (k', v'))
where
k' = TName (Just "xmlns") k
v' = [ContentText v]
newElemStack :: NSLevel -> Name -> (NSLevel, TName, [TAttribute])
newElemStack nsl@(NSLevel def' _) (Name local ns _)
| def' == ns = (nsl, TName Nothing local, [])
newElemStack (NSLevel _ nsmap) (Name local Nothing _) =
(NSLevel Nothing nsmap, TName Nothing local, [(TName Nothing "xmlns", [])])
newElemStack (NSLevel _ nsmap) (Name local (Just ns) Nothing) =
(NSLevel (Just ns) nsmap, TName Nothing local, [(TName Nothing "xmlns", [ContentText ns])])
newElemStack (NSLevel def' nsmap) (Name local (Just ns) (Just pref)) =
case Map.lookup ns nsmap of
Just pref'
| pref == pref' ->
( NSLevel def' nsmap
, TName (Just pref) local
, []
)
_ -> ( NSLevel def' nsmap'
, TName (Just pref) local
, [(TName (Just "xmlns") pref, [ContentText ns])]
)
where
nsmap' = Map.insert ns pref nsmap
newAttrStack :: (Name, [Content]) -> (NSLevel, [TAttribute]) -> (NSLevel, [TAttribute])
newAttrStack (name, value) (NSLevel def' nsmap, attrs) =
(NSLevel def' nsmap', addNS $ (tname, value) : attrs)
where
(nsmap', tname, addNS) =
case name of
Name local Nothing _ -> (nsmap, TName Nothing local, id)
Name local (Just ns) mpref ->
let ppref = fromMaybe "ns" mpref
(pref, addNS') = getPrefix ppref nsmap ns
in (Map.insert ns pref nsmap, TName (Just pref) local, addNS')
getPrefix :: Text -> Map Text Text -> Text -> (Text, [TAttribute] -> [TAttribute])
getPrefix _ _ "http://www.w3.org/XML/1998/namespace" = ("xml", id)
getPrefix ppref nsmap ns =
case Map.lookup ns nsmap of
Just pref -> (pref, id)
Nothing ->
let pref = findUnused ppref $ Map.elems nsmap
in (pref, (:) (TName (Just "xmlns") pref, [ContentText ns]))
where
findUnused x xs
| x `elem` xs = findUnused (x `T.snoc` '_') xs
| otherwise = x
-- | Convert a stream of 'Event's into a prettified one, adding extra
-- whitespace. Note that this can change the meaning of your XML.
prettify :: Monad m => ConduitT (Flush Event) (Flush Event) m ()
prettify = prettify' 0
prettify' :: Monad m => Int -> ConduitT (Flush Event) (Flush Event) m ()
prettify' level =
await >>= maybe (return ()) goC
where
yield' = yield . Chunk
goC Flush = yield Flush >> prettify' level
goC (Chunk e) = go e
go e@EventBeginDocument = do
yield' e
yield' $ EventContent $ ContentText "\n"
prettify' level
go e@EventBeginElement{} = do
yield' before
yield' e
mnext <- peekC
case mnext of
Just (Chunk next@EventEndElement{}) -> do
dropC 1
yield' next
yield' after
prettify' level
_ -> do
yield' after
prettify' $ level + 1
go e@EventEndElement{} = do
let level' = max 0 $ level - 1
yield' $ before' level'
yield' e
yield' after
prettify' level'
go (EventContent c) = do
cs <- takeContents (c:)
let cs' = mapMaybe normalize cs
case cs' of
[] -> return ()
_ -> do
yield' before
mapM_ (yield' . EventContent) cs'
yield' after
prettify' level
go (EventCDATA t) = go $ EventContent $ ContentText t
go e@EventInstruction{} = do
yield' before
yield' e
yield' after
prettify' level
go (EventComment t) = do
yield' before
yield' $ EventComment $ T.concat
[ " "
, T.unwords $ T.words t
, " "
]
yield' after
prettify' level
go e@EventEndDocument = yield' e >> prettify' level
go e@EventBeginDoctype{} = yield' e >> prettify' level
go e@EventEndDoctype{} = yield' e >> yield' after >> prettify' level
takeContents front = do
me <- peekC
case me of
Just (Chunk (EventContent c)) -> do
dropC 1
takeContents $ front . (c:)
Just (Chunk (EventCDATA t)) -> do
dropC 1
takeContents $ front . (ContentText t:)
_ -> return $ front []
normalize (ContentText t)
| T.null t' = Nothing
| otherwise = Just $ ContentText t'
where
t' = T.unwords $ T.words t
normalize c = Just c
before = EventContent $ ContentText $ T.replicate level " "
before' l = EventContent $ ContentText $ T.replicate l " "
after = EventContent $ ContentText "\n"
nubAttrs :: [(Name, v)] -> [(Name, v)]
nubAttrs orig =
front []
where
(front, _) = foldl' go (id, Set.empty) orig
go (dlist, used) (k, v)
| k `Set.member` used = (dlist, used)
| otherwise = (dlist . ((k, v):), Set.insert k used)
-- | Generate a complete XML 'Element'.
tag :: (Monad m) => Name -> Attributes -> ConduitT i Event m () -- ^ 'Element''s subnodes.
-> ConduitT i Event m ()
tag name (Attributes a) content' = do
yield $ EventBeginElement name a
content'
yield $ EventEndElement name
-- | Generate a textual 'EventContent'.
content :: (Monad m) => Text -> ConduitT i Event m ()
content = yield . EventContent . ContentText
-- | A list of attributes.
data Attributes = Attributes [(Name, [Content])]
instance Monoid Attributes where
mempty = Attributes mempty
#if !MIN_VERSION_base(4,11,0)
(Attributes a) `mappend` (Attributes b) = Attributes (a `mappend` b)
#else
instance Semigroup Attributes where
(Attributes a) <> (Attributes b) = Attributes (a <> b)
#endif
-- | Generate a single attribute.
attr :: Name -- ^ Attribute's name
-> Text -- ^ Attribute's value
-> Attributes
attr name value = Attributes [(name, [ContentText value])]
-- | Helper function that generates a valid attribute if input isn't 'Nothing', or 'mempty' otherwise.
optionalAttr :: Name -> Maybe Text -> Attributes
optionalAttr name = maybe mempty (attr name)
xml-conduit-1.9.1.3/src/Text/XML/Stream/Token.hs 0000644 0000000 0000000 00000013615 07346545000 017347 0 ustar 00 0000000 0000000 {-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleContexts #-}
module Text.XML.Stream.Token
( tokenToBuilder
, TName (..)
, Token (..)
, TAttribute
, NSLevel (..)
) where
import Data.XML.Types (Instruction (..), Content (..), ExternalID (..))
import qualified Data.Text as T
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8Builder, encodeUtf8BuilderEscaped)
import Data.String (IsString (fromString))
import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder.Prim as E
import Data.ByteString.Builder.Prim ((>*<), (>$<), condB)
import Data.Monoid (mconcat, mempty, (<>))
import Data.Map (Map)
import qualified Data.Set as Set
import Data.List (foldl')
import Control.Arrow (first)
import Data.Word (Word8)
oneSpace :: Builder
oneSpace = " "
data Token = TokenXMLDeclaration [TAttribute]
| TokenInstruction Instruction
| TokenBeginElement TName [TAttribute] Bool Int -- ^ indent
| TokenEndElement TName
| TokenContent Content
| TokenComment Text
| TokenDoctype Text (Maybe ExternalID) [(Text, Text)]
| TokenCDATA Text
deriving Show
tokenToBuilder :: Token -> Builder
tokenToBuilder (TokenXMLDeclaration attrs) =
"
foldAttrs oneSpace attrs <>
"?>"
tokenToBuilder (TokenInstruction (Instruction target data_)) =
"" <>
encodeUtf8Builder target <>
" " <>
encodeUtf8Builder data_ <>
"?>"
tokenToBuilder (TokenBeginElement name attrs' isEmpty indent) =
"<" <>
tnameToText name <>
foldAttrs
(if indent == 0 || lessThan3 attrs
then oneSpace
else mconcat $ ("\n" : replicate indent " "))
attrs <>
(if isEmpty then "/>" else ">")
where
attrs = nubAttrs $ map (first splitTName) attrs'
lessThan3 [] = True
lessThan3 [_] = True
lessThan3 [_, _] = True
lessThan3 _ = False
tokenToBuilder (TokenEndElement name) = "" <> tnameToText name <> ">"
tokenToBuilder (TokenContent c) = contentToText c
tokenToBuilder (TokenCDATA t) = " escCDATA t <> "]]>"
tokenToBuilder (TokenComment t) = ""
tokenToBuilder (TokenDoctype name eid _) =
"
encodeUtf8Builder name <>
go eid <>
">"
where
go Nothing = mempty
go (Just (SystemID uri)) = " SYSTEM \"" <> encodeUtf8Builder uri <> "\""
go (Just (PublicID pid uri)) =
" PUBLIC \"" <>
encodeUtf8Builder pid <>
"\" \"" <>
encodeUtf8Builder uri <>
"\""
data TName = TName (Maybe Text) Text
deriving (Show, Eq, Ord)
tnameToText :: TName -> Builder
tnameToText (TName Nothing name) = encodeUtf8Builder name
tnameToText (TName (Just prefix) name) =
encodeUtf8Builder prefix <> ":" <> encodeUtf8Builder name
contentToText :: Content -> Builder
contentToText (ContentText t) = encodeUtf8BuilderEscaped (charUtf8XmlEscaped ECContent) t
contentToText (ContentEntity e) = "&" <> encodeUtf8Builder e <> ";"
-- | What usage are we escaping for?
data EscapeContext = ECContent -- ^ ..
| ECDoubleArg -- ^
| ECSingleArg -- ^
deriving (Show, Eq)
{-# INLINE charUtf8XmlEscaped #-}
charUtf8XmlEscaped :: EscapeContext -> E.BoundedPrim Word8
charUtf8XmlEscaped ec =
(condB (> _gt) (E.liftFixedToBounded E.word8)) $
(condB (== _lt) (fixed4 (_am,(_l,(_t,_sc))))) $ -- <
escapeFor ECContent (condB (== _gt) (fixed4 (_am,(_g,(_t,_sc))))) $ -- >
(condB (== _am) (fixed5 (_am,(_a,(_m,(_p,_sc)))))) $ -- &
escapeFor ECDoubleArg (condB (== _dq) (fixed6 (_am,(_q,(_u,(_o,(_t,_sc))))))) $ -- "
escapeFor ECSingleArg (condB (== _sq) (fixed6 (_am,(_a,(_p,(_o,(_s,_sc))))))) $ -- '
(E.liftFixedToBounded E.word8) -- fallback for Chars smaller than '>'
where
_gt = 62 -- >
_lt = 60 -- <
_am = 38 -- &
_dq = 34 -- "
_sq = 39 -- '
_l = 108 -- l
_t = 116 -- t
_g = 103 -- g
_a = 97 -- a
_m = 109 -- m
_p = 112 -- p
_o = 111 -- o
_s = 115 -- s
_q = 113 -- q
_u = 117 -- u
_sc = 59 -- ;
{-# INLINE escapeFor #-}
escapeFor :: EscapeContext -> (a -> a) -> a -> a
escapeFor ec' f a
| ec == ec' = f a
| otherwise = a
{-# INLINE fixed4 #-}
fixed4 x = E.liftFixedToBounded $ const x >$<
E.word8 >*< E.word8 >*< E.word8 >*< E.word8
{-# INLINE fixed5 #-}
fixed5 x = E.liftFixedToBounded $ const x >$<
E.word8 >*< E.word8 >*< E.word8 >*< E.word8 >*< E.word8
{-# INLINE fixed6 #-}
fixed6 x = E.liftFixedToBounded $ const x >$<
E.word8 >*< E.word8 >*< E.word8 >*< E.word8 >*< E.word8 >*< E.word8
type TAttribute = (TName, [Content])
foldAttrs :: Builder -- ^ before
-> [TAttribute]
-> Builder
foldAttrs before =
foldMap go
where
go (key, val) =
before <>
tnameToText key <>
"=\"" <>
foldMap go' val <>
"\""
go' (ContentText t) =
encodeUtf8BuilderEscaped (charUtf8XmlEscaped ECDoubleArg) t
go' (ContentEntity t) = "&" <> encodeUtf8Builder t <> ";"
instance IsString TName where
fromString = TName Nothing . T.pack
data NSLevel = NSLevel
{ defaultNS :: Maybe Text
, prefixes :: Map Text Text
}
deriving Show
nubAttrs :: [TAttribute] -> [TAttribute]
nubAttrs orig =
front []
where
(front, _) = foldl' go (id, Set.empty) orig
go (dlist, used) (k, v)
| k `Set.member` used = (dlist, used)
| otherwise = (dlist . ((k, v):), Set.insert k used)
splitTName :: TName -> TName
splitTName x@(TName Just{} _) = x
splitTName x@(TName Nothing t)
| T.null b = x
| otherwise = TName (Just a) $ T.drop 1 b
where
(a, b) = T.break (== ':') t
escCDATA :: Text -> Builder
escCDATA s = encodeUtf8Builder (T.replace "]]>" "]]]]>" s)
xml-conduit-1.9.1.3/src/Text/XML/Unresolved.hs 0000644 0000000 0000000 00000025263 07346545000 017164 0 ustar 00 0000000 0000000 {-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE RankNTypes #-}
-- | DOM-based XML parsing and rendering.
--
-- In this module, attribute values and content nodes can contain either raw
-- text or entities. In most cases, these can be fully resolved at parsing. If
-- that is the case for your documents, the "Text.XML" module provides
-- simplified datatypes that only contain raw text.
module Text.XML.Unresolved
( -- * Non-streaming functions
writeFile
, readFile
-- * Lazy bytestrings
, renderLBS
, parseLBS
, parseLBS_
-- * Text
, parseText
, parseText_
, sinkTextDoc
-- * Byte streams
, sinkDoc
-- * Streaming functions
, toEvents
, elementToEvents
, fromEvents
, elementFromEvents
, renderBuilder
, renderBytes
, renderText
-- * Exceptions
, InvalidEventStream (..)
-- * Settings
, P.def
-- ** Parse
, P.ParseSettings
, P.psDecodeEntities
, P.psRetainNamespaces
-- ** Render
, R.RenderSettings
, R.rsPretty
, R.rsNamespaces
) where
import Conduit
import Control.Applicative ((<$>), (<*>))
import Control.Exception (Exception, SomeException, throw)
import Control.Monad (when)
import Control.Monad.Trans.Class (lift)
import Data.ByteString (ByteString)
import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Lazy as L
import Data.Char (isSpace)
import qualified Data.Conduit.Binary as CB
import Data.Conduit.Lazy (lazyConsume)
import qualified Data.Conduit.List as CL
import Data.Maybe (isJust, mapMaybe)
import Data.Monoid (mconcat)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Data.Typeable (Typeable)
import Data.XML.Types
import Prelude hiding (readFile, writeFile)
import System.IO.Unsafe (unsafePerformIO)
import Text.XML.Stream.Parse (ParseSettings)
import qualified Text.XML.Stream.Parse as P
import qualified Text.XML.Stream.Render as R
readFile :: P.ParseSettings -> FilePath -> IO Document
readFile ps fp = runConduitRes $ CB.sourceFile fp .| sinkDoc ps
sinkDoc :: MonadThrow m
=> P.ParseSettings
-> ConduitT ByteString o m Document
sinkDoc ps = P.parseBytesPos ps .| fromEvents
writeFile :: R.RenderSettings -> FilePath -> Document -> IO ()
writeFile rs fp doc =
runConduitRes $ renderBytes rs doc .| CB.sinkFile fp
renderLBS :: R.RenderSettings -> Document -> L.ByteString
renderLBS rs doc =
L.fromChunks $ unsafePerformIO
-- not generally safe, but we know that runResourceT
-- will not deallocate any of the resources being used
-- by the process
$ lazyConsume
$ renderBytes rs doc
parseLBS :: P.ParseSettings -> L.ByteString -> Either SomeException Document
parseLBS ps lbs = runConduit $ CL.sourceList (L.toChunks lbs) .| sinkDoc ps
parseLBS_ :: P.ParseSettings -> L.ByteString -> Document
parseLBS_ ps lbs = either throw id $ parseLBS ps lbs
data InvalidEventStream = ContentAfterRoot P.EventPos
| MissingRootElement
| InvalidInlineDoctype P.EventPos
| MissingEndElement Name (Maybe P.EventPos)
| UnterminatedInlineDoctype
deriving Typeable
instance Exception InvalidEventStream
instance Show InvalidEventStream where
show (ContentAfterRoot (pos, e)) = mShowPos pos ++ "Found content after root element: " ++ prettyShowE e
show MissingRootElement = "Missing root element"
show (InvalidInlineDoctype (pos, e)) = mShowPos pos ++ "Invalid content inside doctype: " ++ prettyShowE e
show (MissingEndElement name Nothing) = "Documented ended while expected end element for: " ++ prettyShowName name
show (MissingEndElement name (Just (pos, e))) = mShowPos pos ++ "Expected end element for: " ++ prettyShowName name ++ ", but received: " ++ prettyShowE e
show UnterminatedInlineDoctype = "Unterminated doctype declaration"
mShowPos :: Maybe P.PositionRange -> String
mShowPos Nothing = ""
mShowPos (Just pos) = show pos ++ ": "
prettyShowE :: Event -> String
prettyShowE = show -- FIXME
prettyShowName :: Name -> String
prettyShowName = show -- FIXME
renderBuilder :: Monad m => R.RenderSettings -> Document -> ConduitT i Builder m ()
renderBuilder rs doc = CL.sourceList (toEvents doc) .| R.renderBuilder rs
renderBytes :: PrimMonad m => R.RenderSettings -> Document -> ConduitT i ByteString m ()
renderBytes rs doc = CL.sourceList (toEvents doc) .| R.renderBytes rs
renderText :: (MonadThrow m, PrimMonad m) => R.RenderSettings -> Document -> ConduitT i Text m ()
renderText rs doc = CL.sourceList (toEvents doc) .| R.renderText rs
manyTries :: Monad m => m (Maybe a) -> m [a]
manyTries f =
go id
where
go front = do
x <- f
case x of
Nothing -> return $ front []
Just y -> go (front . (:) y)
dropReturn :: Monad m => a -> ConduitM i o m a
dropReturn x = CL.drop 1 >> return x
-- | Parse a document from a stream of events.
fromEvents :: MonadThrow m => ConduitT P.EventPos o m Document
fromEvents = do
skip EventBeginDocument
d <- Document <$> goP <*> require elementFromEvents <*> goM
skip EventEndDocument
y <- CL.head
case y of
Nothing -> return d
Just (_, EventEndDocument) -> lift $ throwM MissingRootElement
Just z ->
lift $ throwM $ ContentAfterRoot z
where
skip e = do
x <- CL.peek
when (fmap snd x == Just e) (CL.drop 1)
require f = do
x <- f
case x of
Just y -> return y
Nothing -> do
my <- CL.head
case my of
Nothing -> error "Text.XML.Unresolved:impossible"
Just (_, EventEndDocument) -> lift $ throwM MissingRootElement
Just y -> lift $ throwM $ ContentAfterRoot y
goP = Prologue <$> goM <*> goD <*> goM
goM = manyTries goM'
goM' = do
x <- CL.peek
case x of
Just (_, EventInstruction i) -> dropReturn $ Just $ MiscInstruction i
Just (_, EventComment t) -> dropReturn $ Just $ MiscComment t
Just (_, EventContent (ContentText t))
| T.all isSpace t -> CL.drop 1 >> goM'
_ -> return Nothing
goD = do
x <- CL.peek
case x of
Just (_, EventBeginDoctype name meid) -> do
CL.drop 1
dropTillDoctype
return (Just $ Doctype name meid)
_ -> return Nothing
dropTillDoctype = do
x <- CL.head
case x of
-- Leaving the following line commented so that the intention of
-- this function stays clear. I figure in the future xml-types will
-- be expanded again to support some form of EventDeclaration
--
-- Just (EventDeclaration _) -> dropTillDoctype
Just (_, EventEndDoctype) -> return ()
Just epos -> lift $ throwM $ InvalidInlineDoctype epos
Nothing -> lift $ throwM UnterminatedInlineDoctype
-- | Try to parse a document element (as defined in XML) from a stream of events.
--
-- @since 1.3.5
elementFromEvents :: MonadThrow m => ConduitT P.EventPos o m (Maybe Element)
elementFromEvents = goE
where
goE = do
x <- CL.peek
case x of
Just (_, EventBeginElement n as) -> Just <$> goE' n as
_ -> return Nothing
goE' n as = do
CL.drop 1
ns <- manyTries goN
y <- CL.head
if fmap snd y == Just (EventEndElement n)
then return $ Element n as $ compressNodes ns
else lift $ throwM $ MissingEndElement n y
goN = do
x <- CL.peek
case x of
Just (_, EventBeginElement n as) -> (Just . NodeElement) <$> goE' n as
Just (_, EventInstruction i) -> dropReturn $ Just $ NodeInstruction i
Just (_, EventContent c) -> dropReturn $ Just $ NodeContent c
Just (_, EventComment t) -> dropReturn $ Just $ NodeComment t
Just (_, EventCDATA t) -> dropReturn $ Just $ NodeContent $ ContentText t
_ -> return Nothing
-- | Render a document into events.
toEvents :: Document -> [Event]
toEvents (Document prol root epi) =
(EventBeginDocument :)
. goP prol . elementToEvents' root . goM epi $ [EventEndDocument]
where
goP (Prologue before doctype after) =
goM before . maybe id goD doctype . goM after
goM [] = id
goM [x] = (goM' x :)
goM (x:xs) = (goM' x :) . goM xs
goM' (MiscInstruction i) = EventInstruction i
goM' (MiscComment t) = EventComment t
goD (Doctype name meid) =
(:) (EventBeginDoctype name meid)
. (:) EventEndDoctype
-- | Render a document element into events.
--
-- @since 1.3.5
elementToEvents :: Element -> [Event]
elementToEvents e = elementToEvents' e []
elementToEvents' :: Element -> [Event] -> [Event]
elementToEvents' = goE
where
goE (Element name as ns) =
(EventBeginElement name as :)
. goN ns
. (EventEndElement name :)
goN [] = id
goN [x] = goN' x
goN (x:xs) = goN' x . goN xs
goN' (NodeElement e) = goE e
goN' (NodeInstruction i) = (EventInstruction i :)
goN' (NodeContent c) = (EventContent c :)
goN' (NodeComment t) = (EventComment t :)
compressNodes :: [Node] -> [Node]
compressNodes [] = []
compressNodes [x] = [x]
compressNodes (x@(NodeContent (ContentText _)) : y@(NodeContent (ContentText _)) : z) =
let (textNodes, remainder) = span (isJust . unContent) (x:y:z)
texts = mapMaybe unContent textNodes
in
compressNodes $ NodeContent (ContentText $ mconcat texts) : remainder
where
unContent (NodeContent (ContentText text)) = Just text
unContent _ = Nothing
compressNodes (x:xs) = x : compressNodes xs
parseText :: ParseSettings -> TL.Text -> Either SomeException Document
parseText ps tl =
runConduit
$ CL.sourceList (TL.toChunks tl)
.| sinkTextDoc ps
parseText_ :: ParseSettings -> TL.Text -> Document
parseText_ ps = either throw id . parseText ps
sinkTextDoc :: MonadThrow m
=> ParseSettings
-> ConduitT Text o m Document
sinkTextDoc ps = P.parseTextPos ps .| fromEvents
xml-conduit-1.9.1.3/test/ 0000755 0000000 0000000 00000000000 07346545000 013276 5 ustar 00 0000000 0000000 xml-conduit-1.9.1.3/test/doctest.hs 0000644 0000000 0000000 00000000533 07346545000 015300 0 ustar 00 0000000 0000000 module Main where
import Build_doctests (flags, module_sources, pkgs)
import Data.Foldable (traverse_)
import System.Environment (unsetEnv)
import Test.DocTest (doctest)
main :: IO ()
main = do
traverse_ putStrLn args
doctest args
where
args = flags ++ pkgs ++ module_sources
xml-conduit-1.9.1.3/test/unit.hs 0000644 0000000 0000000 00000127364 07346545000 014626 0 ustar 00 0000000 0000000 {-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (Exception, toException,
fromException)
import Control.Monad.IO.Class (liftIO)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Typeable (Typeable)
import Data.XML.Types
import Test.Hspec
import Test.HUnit hiding (Test)
import qualified Text.XML as Res
import qualified Text.XML.Cursor as Cu
import Text.XML.Stream.Parse (def)
import qualified Text.XML.Stream.Parse as P
import qualified Text.XML.Unresolved as D
import Control.Monad
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T
import Text.XML.Cursor (($.//), ($/), ($//), ($|),
(&.//), (&/), (&//))
import qualified Control.Monad.Trans.Resource as C
import Data.Conduit ((.|), runConduit,
runConduitRes, ConduitT)
import Data.Conduit.Attoparsec (ParseError(..))
import qualified Data.Conduit as C
import qualified Data.Conduit.List as CL
import qualified Data.Map as Map
import Text.Blaze (toMarkup)
import Text.Blaze.Renderer.String (renderMarkup)
main :: IO ()
main = hspec $ do
describe "XML parsing and rendering" $ do
it "is idempotent to parse and render a document" documentParseRender
it "has valid parser combinators" combinators
context "has working choose function" testChoose
it "has working many function" testMany
it "has working many' function" testMany'
it "has working manyYield function" testManyYield
it "has working takeContent function" testTakeContent
it "has working takeTree function" testTakeTree
it "has working takeAnyTreeContent function" testTakeAnyTreeContent
it "has working orE" testOrE
it "is idempotent to parse and pretty render a document" documentParsePrettyRender
it "ignores the BOM" parseIgnoreBOM
it "strips duplicated attributes" stripDuplicateAttributes
it "displays comments" testRenderComments
it "conduit parser" testConduitParser
it "can omit the XML declaration" omitXMLDeclaration
it "doesn't hang on malformed entity declarations" malformedEntityDeclaration
it "escapes <>'\"& as necessary" caseEscapesAsNecessary
it "preserves the order of attributes" casePreservesAttrOrder
context "correctly parses hexadecimal entities" hexEntityParsing
it "normalizes line endings" crlfToLfConversion
it "normalizes \\r at the end of a content" crlfToLfConversionCrAtEnd
it "normalizes multiple \\rs and \\r\\ns" crlfToLfConversionCrCrCr
describe "XML Cursors" $ do
it "has correct parent" cursorParent
it "has correct ancestor" cursorAncestor
it "has correct orSelf" cursorOrSelf
it "has correct preceding" cursorPreceding
it "has correct following" cursorFollowing
it "has correct precedingSibling" cursorPrecedingSib
it "has correct followingSibling" cursorFollowingSib
it "has correct descendant" cursorDescendant
it "has correct check" cursorCheck
it "has correct check with lists" cursorPredicate
it "has correct checkNode" cursorCheckNode
it "has correct checkElement" cursorCheckElement
it "has correct checkName" cursorCheckName
it "has correct anyElement" cursorAnyElement
it "has correct element" cursorElement
it "has correct laxElement" cursorLaxElement
it "has correct content" cursorContent
it "has correct attribute" cursorAttribute
it "has correct laxAttribute" cursorLaxAttribute
it "has correct &* and $* operators" cursorDeep
it "has correct force" cursorForce
it "has correct forceM" cursorForceM
it "has correct hasAttribute" cursorHasAttribute
it "has correct attributeIs" cursorAttributeIs
describe "resolved" $ do
it "identifies unresolved entities" resolvedIdentifies
it "decodeHtmlEntities" testHtmlEntities
it "works for resolvable entities" resolvedAllGood
it "ignores custom entities when psResolveEntities is False" dontResolveEntities
it "merges adjacent content nodes" resolvedMergeContent
it "understands inline entity declarations" resolvedInline
it "understands complex inline with markup" resolvedInlineComplex
it "can expand inline entities recursively" resolvedInlineRecursive
it "doesn't explode with an inline entity loop" resolvedInlineLoop
it "doesn't explode with the billion laughs attack" billionLaughs
it "allows entity expansion size limit to be adjusted" thousandLaughs
it "ignores parameter entity declarations" parameterEntity
it "doesn't break on [] in doctype comments" doctypeComment
it "skips element declarations in doctype" doctypeElements
it "skips processing instructions in doctype" doctypePI
describe "pretty" $ do
it "works" casePretty
describe "top level namespaces" $ do
it "works" caseTopLevelNamespace
it "works with prefix" caseTopLevelNamespacePrefix
it "handles conflicts" caseTLNConflict
describe "blaze-html instances" $ do
it "works" caseBlazeHtml
describe "attribute reordering" $ do
it "works" caseAttrReorder
describe "ordering attributes explicitly" $ do
it "works" caseOrderAttrs
it "parsing CDATA" caseParseCdata
it "retains namespaces when asked" caseRetainNamespaces
it "handles iso-8859-1" caseIso8859_1
it "renders CDATA when asked" caseRenderCDATA
it "escapes CDATA closing tag in CDATA" caseEscapesCDATA
documentParseRender :: IO ()
documentParseRender =
mapM_ go docs
where
go x = x @=? D.parseLBS_ def (D.renderLBS def x)
docs =
[ Document (Prologue [] Nothing [])
(Element "foo" [] [])
[]
, D.parseLBS_ def
"\n"
, D.parseLBS_ def
"\n&ignore;"
, D.parseLBS_ def
"]]>"
, D.parseLBS_ def
""
, D.parseLBS_ def
""
, D.parseLBS_ def
""
]
documentParsePrettyRender :: IO ()
documentParsePrettyRender =
L.unpack (D.renderLBS def { D.rsPretty = True } (D.parseLBS_ def doc)) @?= L.unpack doc
where
doc = L.unlines
[ ""
, ""
, " "
, " text"
, " "
, ""
]
combinators :: Assertion
combinators = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tag' "hello" (P.requireAttr "world") $ \world -> do
liftIO $ world @?= "true"
P.force "need child1" $ P.tagNoAttr "{mynamespace}child1" $ return ()
P.force "need child2" $ P.tagNoAttr "child2" $ return ()
P.force "need child3" $ P.tagNoAttr "child3" $ do
x <- P.contentMaybe
liftIO $ x @?= Just "combine &content"
where
input = L.concat
[ ""
, "\n"
, ""
, ""
, ""
, ""
, " "
, "combine <all> \n"
, ""
]
testChoose :: Spec
testChoose = do
it "can choose between elements"
testChooseEitherElem
it "can choose between elements and text, returning text"
testChooseElemOrTextIsText
it "can choose between elements and text, returning elements"
testChooseElemOrTextIsElem
it "can choose between text and elements, returning text"
testChooseTextOrElemIsText
it "can choose between text and elements, returning elements"
testChooseTextOrElemIsElem
it "can choose between text and elements, when the text is encoded"
testChooseElemOrTextIsEncoded
it "can choose between text and elements, when the text is encoded, NBSP"
testChooseElemOrTextIsEncodedNBSP
it "can choose between elements and text, when the text is whitespace"
testChooseElemOrTextIsWhiteSpace
it "can choose between text and elements, when the text is whitespace"
testChooseTextOrElemIsWhiteSpace
it "can choose between text and elements, when the whitespace is both literal and encoded"
testChooseElemOrTextIsChunkedText
it "can choose between text and elements, when the text is chunked the other way"
testChooseElemOrTextIsChunkedText2
testChooseElemOrTextIsText :: Assertion
testChooseElemOrTextIsText = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.tagNoAttr "failure" $ return "boom"
, P.contentMaybe
]
liftIO $ x @?= Just " something "
where
input = L.concat
[ ""
, "\n"
, ""
, " something "
, ""
]
testChooseElemOrTextIsEncoded :: Assertion
testChooseElemOrTextIsEncoded = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.tagNoAttr "failure" $ return "boom"
, P.contentMaybe
]
liftIO $ x @?= Just "\x20something\x20"
where
input = L.concat
[ ""
, "\n"
, ""
, " something "
, ""
]
testChooseElemOrTextIsEncodedNBSP :: Assertion
testChooseElemOrTextIsEncodedNBSP = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.tagNoAttr "failure" $ return "boom"
, P.contentMaybe
]
liftIO $ x @?= Just "\160something\160"
where
input = L.concat
[ ""
, "\n"
, ""
, " something "
, ""
]
testChooseElemOrTextIsWhiteSpace :: Assertion
testChooseElemOrTextIsWhiteSpace = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.tagNoAttr "failure" $ return "boom"
, P.contentMaybe
]
liftIO $ x @?= Just "\x20\x20\x20"
where
input = L.concat
[ ""
, "\n"
, " "
]
testChooseTextOrElemIsWhiteSpace :: Assertion
testChooseTextOrElemIsWhiteSpace = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.contentMaybe
, P.tagNoAttr "failure" $ return "boom"
]
liftIO $ x @?= Just "\x20\x20\x20"
where
input = L.concat
[ ""
, "\n"
, " "
]
testChooseElemOrTextIsChunkedText :: Assertion
testChooseElemOrTextIsChunkedText = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.tagNoAttr "failure" $ return "boom"
, P.contentMaybe
]
liftIO $ x @?= Just "\x20\x20\x20"
where
input = L.concat
[ ""
, "\n"
, " "
]
testChooseElemOrTextIsChunkedText2 :: Assertion
testChooseElemOrTextIsChunkedText2 = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.tagNoAttr "failure" $ return "boom"
, P.contentMaybe
]
liftIO $ x @?= Just "\x20\x20\x20"
where
input = L.concat
[ ""
, "\n"
, " "
]
testChooseElemOrTextIsElem :: Assertion
testChooseElemOrTextIsElem = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.tagNoAttr "success" $ return "success"
, P.contentMaybe
]
liftIO $ x @?= Just "success"
where
input = L.concat
[ ""
, "\n"
, ""
, ""
, ""
]
testChooseTextOrElemIsText :: Assertion
testChooseTextOrElemIsText = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.contentMaybe
, P.tagNoAttr "failure" $ return "boom"
]
liftIO $ x @?= Just " something "
where
input = L.concat
[ ""
, "\n"
, ""
, " something "
, ""
]
testChooseTextOrElemIsElem :: Assertion
testChooseTextOrElemIsElem = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.contentMaybe
, P.tagNoAttr "success" $ return "success"
]
liftIO $ x @?= Just "success"
where
input = L.concat
[ ""
, "\n"
, ""
, ""
, ""
]
testChooseEitherElem :: Assertion
testChooseEitherElem = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.choose
[ P.tagNoAttr "failure" $ return 1
, P.tagNoAttr "success" $ return 2
]
liftIO $ x @?= Just (2 :: Int)
where
input = L.concat
[ ""
, "\n"
, ""
, ""
, ""
]
testManyYield :: Assertion
testManyYield = do
-- Basically the same as testMany, but consume the streamed result
result <- runConduitRes $
P.parseLBS def input .| helloParser
.| CL.consume
length result @?= 5
where
helloParser = void $ P.tagNoAttr "hello" $ P.manyYield successParser
successParser = P.tagNoAttr "success" $ return ()
input = L.concat
[ ""
, "\n"
, ""
, ""
, ""
, ""
, ""
, ""
, ""
]
testTakeContent :: Assertion
testTakeContent = do
result <- runConduitRes $ P.parseLBS def input .| rootParser
result @?= Just
[ EventContent (ContentText "Hello world !")
]
where
rootParser = P.tagNoAttr "root" $ void (P.takeContent >> P.takeContent) .| CL.consume
input = L.concat
[ ""
, "\n"
, ""
, "Hello world !"
, ""
]
testTakeTree :: Assertion
testTakeTree = do
result <- runConduitRes $ P.parseLBS def input .| rootParser
result @?=
[ EventBeginDocument
, EventBeginDoctype "foo" Nothing
, EventEndDoctype
, EventBeginElement "a" []
, EventBeginElement "em" []
, EventContent (ContentText "Hello world !")
, EventEndElement "em"
, EventEndElement "a"
]
where
rootParser = void (P.takeTree "a" P.ignoreAttrs) .| CL.consume
input = L.concat
[ ""
, "\n"
, ""
, "Hello world !"
, ""
, ""
, ""
]
testTakeAnyTreeContent :: Assertion
testTakeAnyTreeContent = do
result <- runConduitRes $ P.parseLBS def input .| rootParser
result @?= Just
[ EventBeginElement "b" []
, EventContent (ContentText "Hello ")
, EventBeginElement "em" []
, EventContent (ContentText "world")
, EventEndElement "em"
, EventContent (ContentText " !")
, EventEndElement "b"
]
where
rootParser = P.tagNoAttr "root" $ (P.takeAnyTreeContent >> void P.ignoreAnyTreeContent) .| CL.consume
input = L.concat
[ ""
, "\n"
, ""
, "Hello world ! Welcome !"
, ""
]
testMany :: Assertion
testMany = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.many $ P.tagNoAttr "success" $ return ()
liftIO $ length x @?= 5
where
input = L.concat
[ ""
, "\n"
, ""
, ""
, ""
, ""
, ""
, ""
, ""
]
testMany' :: Assertion
testMany' = runConduitRes $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.many' $ P.tagNoAttr "success" $ return ()
liftIO $ length x @?= 5
where
input = L.concat
[ ""
, "\n"
, ""
, ""
, ""
, ""
, ""
, ""
, "some content"
, ""
, ""
]
testOrE :: IO ()
testOrE = runConduitRes $ runConduit $ P.parseLBS def input .| do
P.force "need hello" $ P.tagNoAttr "hello" $ do
x <- P.tagNoAttr "failure" (return 1) `P.orE`
P.tagNoAttr "success" (return 2)
y <- P.tag' "success" (P.requireAttr "failure") (const $ return 1) `P.orE`
P.tag' "success" (P.requireAttr "success") (const $ return 2)
liftIO $ x @?= Just (2 :: Int)
liftIO $ y @?= Just (2 :: Int)
where
input = L.concat
[ ""
, "\n"
, ""
, ""
, ""
, ""
]
testConduitParser :: Assertion
testConduitParser = do
x <- runConduitRes
$ P.parseLBS def input
.| (P.force "need hello" $ P.tagNoAttr "hello" f)
.| CL.consume
liftIO $ x @?= [1, 1, 1]
where
input = L.concat
[ ""
, "\n"
, ""
, " "
, " "
, " "
, ""
]
f :: C.MonadThrow m => ConduitT Event Int m ()
f = do
ma <- P.tagNoAttr "item" (return 1)
maybe (return ()) (\a -> C.yield a >> f) ma
omitXMLDeclaration :: Assertion
omitXMLDeclaration = Res.renderLBS settings input @?= spec
where
settings = def { Res.rsXMLDeclaration = False }
input = Res.Document (Prologue [] Nothing [])
(Res.Element "foo" Map.empty [Res.NodeContent "bar"])
[]
spec = "bar"
malformedEntityDeclaration :: Assertion
malformedEntityDeclaration = do -- missing > after bim
assertBool "raises ParseError" $
case Res.parseLBS Res.def "" of
Left e -> case fromException e of
Just (ParseError ["DOCTYPE"] _ _) -> True
_ -> False
_ -> False
caseEscapesAsNecessary :: Assertion
caseEscapesAsNecessary = do
let doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "a" (Map.fromList [("attr", "'<&val>'")])
[Res.NodeContent "'\"<&test]]>\"'"])
[]
result = Res.renderLBS def doc
result `shouldBe` "'\">'\"<&test]]>\"'"
casePreservesAttrOrder :: Assertion
casePreservesAttrOrder = do
let doc = Document (Prologue [] Nothing [])
(Element "doc" [] [
NodeElement (Element "el" [("attr1", [ContentText "1"]), ("attr2", [ContentText "2"])] []),
NodeElement (Element "el" [("attr2", [ContentText "2"]), ("attr1", [ContentText "1"])] [])
])
[]
rendered = ""
renderResult = D.renderLBS def doc
parseResult = D.parseLBS def rendered
renderResult `shouldBe` rendered
parseResult `shouldSatisfy` either (const False) (doc==)
hexEntityParsing :: Spec
hexEntityParsing = do
it "rejects leading 0x" $
go "xff;" @?= Nothing
it "rejects leading 0X" $
go "Xff;" @?= Nothing
it "accepts lowercase hex digits" $
go "ÿ" @?= Just (spec "\xff")
it "accepts uppercase hex digits" $
go "ÿ" @?= Just (spec "\xff")
--Note: this must be rejected, because, according to the XML spec, a
--legal EntityRef's entity matches Name, which can't start with a
--hash.
it "rejects trailing junk" $
go "ÿhello;" @?= Nothing
--Some of these next tests are XML 1.0 specific (i.e., they would
--differ for XML 1.1), but approximately no-one uses XML 1.1.
it "rejects illegal character #x0" $
go "" @?= Nothing
it "rejects illegal character #xFFFE" $
go "" @?= Nothing
it "rejects illegal character #xFFFF" $
go "" @?= Nothing
it "rejects illegal character #xD900" $
go "" @?= Nothing
it "rejects illegal character #xC" $
go "" @?= Nothing
it "rejects illegal character #x1F" $
go "" @?= Nothing
it "accepts astral plane character" $
go "" @?= Just (spec "\x1006ff")
it "accepts custom character references" $
go' customSettings "" @?= Just (spec "\xff")
where
spec content = Document (Prologue [] Nothing [])
(Element "foo" [] [NodeContent (ContentText content)])
[]
go = either (const Nothing) Just . D.parseLBS def
go' settings = either (const Nothing) Just . D.parseLBS settings
customSettings = def { P.psDecodeIllegalCharacters = customDecoder }
customDecoder 12 = Just '\xff'
customDecoder _ = Nothing
name :: [Cu.Cursor] -> [Text]
name [] = []
name (c:cs) = ($ name cs) $ case Cu.node c of
Res.NodeElement e -> ((Res.nameLocalName $ Res.elementName e) :)
_ -> id
cursor :: Cu.Cursor
cursor =
Cu.fromDocument $ Res.parseLBS_ def input
where
input = L.concat
[ ""
, ""
, ""
, ""
, ""
, "a"
, ""
, ""
, ""
, "b"
, ""
, ""
, ""
, ""
, ""
]
bar2, baz2, bar3, bin2 :: Cu.Cursor
bar2 = Cu.child cursor !! 1
baz2 = Cu.child bar2 !! 1
bar3 = Cu.child cursor !! 2
bin2 = Cu.child bar3 !! 1
cursorParent, cursorAncestor, cursorOrSelf, cursorPreceding, cursorFollowing,
cursorPrecedingSib, cursorFollowingSib, cursorDescendant, cursorCheck,
cursorPredicate, cursorCheckNode, cursorCheckElement, cursorCheckName,
cursorAnyElement, cursorElement, cursorLaxElement, cursorContent,
cursorAttribute, cursorLaxAttribute, cursorHasAttribute,
cursorAttributeIs, cursorDeep, cursorForce, cursorForceM,
resolvedIdentifies, resolvedAllGood, dontResolveEntities,
resolvedMergeContent, testHtmlEntities
:: Assertion
cursorParent = name (Cu.parent bar2) @?= ["foo"]
cursorAncestor = name (Cu.ancestor baz2) @?= ["bar2", "foo"]
cursorOrSelf = name (Cu.orSelf Cu.ancestor baz2) @?= ["baz2", "bar2", "foo"]
cursorPreceding = do
name (Cu.preceding baz2) @?= ["baz1", "bar1"]
name (Cu.preceding bin2) @?= ["bin1", "baz3", "baz2", "baz1", "bar2", "bar1"]
cursorFollowing = do
name (Cu.following baz2) @?= ["baz3", "bar3", "bin1", "bin2", "bin3", "Bar1"]
name (Cu.following bar2) @?= ["bar3", "bin1", "bin2", "bin3", "Bar1"]
cursorPrecedingSib = name (Cu.precedingSibling baz2) @?= ["baz1"]
cursorFollowingSib = name (Cu.followingSibling baz2) @?= ["baz3"]
cursorDescendant = (name $ Cu.descendant cursor) @?= T.words "bar1 bar2 baz1 baz2 baz3 bar3 bin1 bin2 bin3 Bar1"
cursorCheck = null (cursor $.// Cu.check (const False)) @?= True
cursorPredicate = (name $ cursor $.// Cu.check Cu.descendant) @?= T.words "foo bar2 baz3 bar3"
cursorCheckNode = (name $ cursor $// Cu.checkNode f) @?= T.words "bar1 bar2 bar3"
where f (Res.NodeElement e) = "bar" `T.isPrefixOf` Res.nameLocalName (Res.elementName e)
f _ = False
cursorCheckElement = (name $ cursor $// Cu.checkElement f) @?= T.words "bar1 bar2 bar3"
where f e = "bar" `T.isPrefixOf` Res.nameLocalName (Res.elementName e)
cursorCheckName = (name $ cursor $// Cu.checkName f) @?= T.words "bar1 bar2 bar3"
where f n = "bar" `T.isPrefixOf` nameLocalName n
cursorAnyElement = (name $ cursor $// Cu.anyElement) @?= T.words "bar1 bar2 baz1 baz2 baz3 bar3 bin1 bin2 bin3 Bar1"
cursorElement = (name $ cursor $// Cu.element "bar1") @?= ["bar1"]
cursorLaxElement = (name $ cursor $// Cu.laxElement "bar1") @?= ["bar1", "Bar1"]
cursorContent = do
Cu.content cursor @?= []
(cursor $.// Cu.content) @?= ["a", "b"]
cursorAttribute = Cu.attribute "attr" cursor @?= ["x"]
cursorLaxAttribute = (cursor $.// Cu.laxAttribute "Attr") @?= ["x", "y", "q"]
cursorHasAttribute = (length $ cursor $.// Cu.hasAttribute "attr") @?= 2
cursorAttributeIs = (length $ cursor $.// Cu.attributeIs "attr" "y") @?= 1
cursorDeep = do
(Cu.element "foo" &/ Cu.element "bar2" &// Cu.attribute "attr") cursor @?= ["y"]
(return &.// Cu.attribute "attr") cursor @?= ["x", "y"]
(cursor $.// Cu.attribute "attr") @?= ["x", "y"]
(cursor $/ Cu.element "bar2" &// Cu.attribute "attr") @?= ["y"]
(cursor $/ Cu.element "bar2" &/ Cu.element "baz2" >=> Cu.attribute "attr") @?= ["y"]
null (cursor $| Cu.element "foo") @?= False
cursorForce = do
Cu.force DummyEx [] @?= (Nothing :: Maybe Integer)
Cu.force DummyEx [1] @?= Just (1 :: Int)
Cu.force DummyEx [1,2] @?= Just (1 :: Int)
cursorForceM = do
Cu.forceM DummyEx [] @?= (Nothing :: Maybe Integer)
Cu.forceM DummyEx [Just 1, Nothing] @?= Just (1 :: Int)
Cu.forceM DummyEx [Nothing, Just (1 :: Int)] @?= Nothing
data DummyEx = DummyEx
deriving (Show, Typeable)
instance Exception DummyEx
showEq :: (Show a, Show b) => Either a b -> Either a b -> Assertion
showEq x y = show x @=? show y
resolvedIdentifies =
Left (toException $ Res.UnresolvedEntityException $ Set.fromList ["foo", "bar", "baz"]) `showEq`
Res.parseLBS def
"&foo; --- &baz; &foo;"
testHtmlEntities =
Res.parseLBS_ def
{ P.psDecodeEntities = P.decodeHtmlEntities
} xml1 @=? Res.parseLBS_ def xml2
where
xml1 = " "
xml2 = " "
resolvedAllGood =
D.parseLBS_ def xml @=?
Res.toXMLDocument (Res.parseLBS_ def xml)
where
xml = ""
dontResolveEntities =
D.parseLBS_ settings xml @=? expectedDocument
where
settings = def { P.psIgnoreInternalEntityDeclarations = True }
xml = mconcat
[ " ]>"
, ">&foo;&bar;"
]
expectedDocument =
Document
(Prologue [] (Just (Doctype "mydt" Nothing)) [])
(Element "root" mempty
[ NodeContent (ContentText ">")
, NodeContent (ContentEntity "foo")
, NodeContent (ContentEntity "bar")
])
[]
resolvedMergeContent =
Res.documentRoot (Res.parseLBS_ def xml) @=?
Res.Element "foo" Map.empty [Res.NodeContent "bar&baz"]
where
xml = "bar&baz"
parseIgnoreBOM :: Assertion
parseIgnoreBOM = do
either (const $ Left (1 :: Int)) Right (Res.parseText Res.def "\xfeef") @?=
either (const $ Left (2 :: Int)) Right (Res.parseText Res.def "")
stripDuplicateAttributes :: Assertion
stripDuplicateAttributes = do
"" @=?
D.renderLBS def (Document (Prologue [] Nothing []) (Element "foo" [("bar", [ContentText "baz"]), ("bar", [ContentText "bin"])] []) [])
"" @=?
D.renderLBS def (Document (Prologue [] Nothing []) (Element "foo"
[ ("x:bar", [ContentText "baz"])
, (Name "bar" (Just "namespace") (Just "x"), [ContentText "bin"])
] []) [])
testRenderComments :: Assertion
testRenderComments =do
""
@=? D.renderLBS def (Document (Prologue [] Nothing [])
(Element "foo" [] [NodeComment "comment"]) [])
resolvedInline :: Assertion
resolvedInline = do
Res.Document _ root _ <- return $ Res.parseLBS_ Res.def "]>&bar;"
root @?= Res.Element "foo" Map.empty [Res.NodeContent "baz"]
Res.Document _ root2 _ <- return $ Res.parseLBS_ Res.def "]>"
root2 @?= Res.Element "foo" (Map.singleton "bar" "baz") []
resolvedInlineComplex :: Assertion
resolvedInlineComplex = do
Res.Document _ root _ <- return $ Res.parseLBS_ Res.def "baz &bim;
\">]>&bar;"
root @?= Res.Element "foo" Map.empty [Res.NodeElement (Res.Element "p" Map.empty [Res.NodeContent "baz Hello"])]
Res.Document _ root2 _ <- return $ Res.parseLBS_ Res.def "baz\">]>"
root2 @?= Res.Element "foo" (Map.fromList [("class","baz")]) []
resolvedInlineRecursive :: Assertion
resolvedInlineRecursive = do
Res.Document _ root _ <- return $ Res.parseLBS_ Res.def
"]>&bar;"
root @?= Res.Element "foo" Map.empty [Res.NodeContent "bazI&"]
resolvedInlineLoop :: Assertion
resolvedInlineLoop = do
res <- return $ Res.parseLBS Res.def
"]>&bim;"
Left (toException $ Res.UnresolvedEntityException (Set.fromList ["bim"]))
`showEq` res
res2 <- return $ Res.parseLBS Res.def
"]>"
Left (toException $ Res.UnresolvedEntityException (Set.fromList ["bim"]))
`showEq` res2
billionLaughs :: Assertion
billionLaughs = do
res <- return $ Res.parseLBS Res.def
"]>&lol9;"
Left (toException $ Res.UnresolvedEntityException (Set.fromList ["lol9"]))
`showEq` res
thousandLaughs :: Assertion
thousandLaughs = do
res <- return $ Res.parseLBS Res.def{ P.psEntityExpansionSizeLimit = 2999 }
"]>&lol3;"
Left (toException $ Res.UnresolvedEntityException (Set.fromList ["lol3"]))
`showEq` res
-- Raise the entity expansion limit and it should work:
Right (Res.Document {Res.documentRoot = Res.Element{ Res.elementNodes = [Res.NodeContent t] }}) <- return $ Res.parseLBS Res.def{ P.psEntityExpansionSizeLimit = 3001 } "]>&lol3;"
t @?= T.replicate 1000 "lol"
parameterEntity :: Assertion
parameterEntity = do
let res = Res.parseLBS Res.def "]>&bim;"
Left (toException $ Res.UnresolvedEntityException (Set.fromList ["bim"]))
`showEq` res
doctypeComment :: Assertion
doctypeComment = do
Res.Document _ root _ <- return $ Res.parseLBS_
Res.def " ]>&bar;"
root @?= Res.Element "foo" Map.empty [Res.NodeContent "baz"]
doctypeElements :: Assertion
doctypeElements = do
Res.Document _ root _ <- return $ Res.parseLBS_
Res.def "\n\n\n]>&bar;"
root @?= Res.Element "foo" Map.empty [Res.NodeContent "baz"]
doctypePI :: Assertion
doctypePI = do
Res.Document _ root _ <- return $ Res.parseLBS_
Res.def "]>&bar;"
root @?= Res.Element "foo" Map.empty [Res.NodeContent "baz"]
casePretty :: Assertion
casePretty = do
let pretty = S.unlines
[ ""
, ""
, ""
, " "
, " Hello World"
, " "
, " "
, " "
, " "
, " "
, " bar content"
, " "
, ""
]
doctype = Res.Doctype "foo" Nothing
doc = Res.Document (Res.Prologue [] (Just doctype) []) root []
root = Res.Element "foo" (Map.fromList [("bar", "bar"), ("baz", "baz")])
[ Res.NodeElement $ Res.Element "foo" (Map.fromList [("bar", "bar"), ("baz", "baz"), ("bin", "bin")])
[ Res.NodeContent " Hello World\n\n"
, Res.NodeContent " "
]
, Res.NodeElement $ Res.Element "foo" Map.empty []
, Res.NodeInstruction $ Res.Instruction "foo" "bar"
, Res.NodeComment "foo bar\n\r\nbaz \tbin "
, Res.NodeElement $ Res.Element "bar" Map.empty [Res.NodeContent "bar content"]
]
pretty @=? S.concat (L.toChunks $ Res.renderLBS def { D.rsPretty = True } doc)
caseTopLevelNamespace :: Assertion
caseTopLevelNamespace = do
let lbs = S.concat
[ ""
, ""
, ""
, ""
]
rs = def { D.rsNamespaces = [("bar", "baz")] }
doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "foo" Map.empty
[ Res.NodeElement
$ Res.Element "subfoo" (Map.singleton "{baz}bin" "") []
])
[]
lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc)
caseTopLevelNamespacePrefix :: Assertion
caseTopLevelNamespacePrefix = do
let lbs = S.concat
[ ""
, ""
, ""
, ""
]
rs = def { D.rsNamespaces = [("bar", "baz")] }
doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "foo" Map.empty
[ Res.NodeElement
$ Res.Element "subfoo" (Map.fromList [(Name "bin" (Just "baz") (Just "bar"), "")]) []
])
[]
lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc)
caseTLNConflict :: Assertion
caseTLNConflict = do
let lbs = S.concat
[ ""
, ""
, ""
, ""
]
rs = def { D.rsNamespaces = [("bar", "baz")] }
doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "foo" (Map.fromList [(Name "x" (Just "something") (Just "bar"), "y")])
[ Res.NodeElement
$ Res.Element "subfoo" (Map.fromList [(Name "bin" (Just "baz") (Just "bar"), "")]) []
])
[]
lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc)
caseBlazeHtml :: Assertion
caseBlazeHtml =
expected @=? str
where
str = renderMarkup $ toMarkup $ Res.Document (Res.Prologue [] Nothing []) root []
root :: Res.Element
root = Res.Element "html" Map.empty
[ Res.NodeElement $ Res.Element "head" Map.empty
[ Res.NodeElement $ Res.Element "title" Map.empty [Res.NodeContent "Test"]
, Res.NodeElement $ Res.Element "script" Map.empty
[Res.NodeContent "if (5 < 6 || 8 > 9) alert('Hello World!');"]
, Res.NodeElement $ Res.Element "{http://www.snoyman.com/xml2html}ie-cond" (Map.singleton "cond" "lt IE 7")
[Res.NodeElement $ Res.Element "link" (Map.singleton "href" "ie6.css") []]
, Res.NodeElement $ Res.Element "style" Map.empty
[Res.NodeContent "body > h1 { color: red }"]
]
, Res.NodeElement $ Res.Element "body" Map.empty
[ Res.NodeElement $ Res.Element "h1" Map.empty [Res.NodeContent "Hello World!"]
]
]
expected :: String
expected = concat
[ "\n"
, "Test"
, ""
, ""
, ""
, "Hello World!
"
]
caseAttrReorder :: Assertion
caseAttrReorder = do
let lbs = S.concat
[ ""
, ""
, ""
, ""
]
rs = def { Res.rsAttrOrder = \name' m ->
case name' of
"foo" -> reverse $ Map.toAscList m
_ -> Map.toAscList m
}
attrs = Map.fromList [("a", "a"), ("b", "b"), ("c", "c")]
doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "foo" attrs
[ Res.NodeElement
$ Res.Element "bar" attrs []
])
[]
lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc)
caseOrderAttrs :: Assertion
caseOrderAttrs = do
let lbs = S.concat
[ ""
, ""
, ""
, ""
]
rs = def { Res.rsAttrOrder = Res.orderAttrs
[("foo", ["c", "b"])]
}
attrs = Map.fromList [("a", "a"), ("b", "b"), ("c", "c")]
doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "foo" attrs
[ Res.NodeElement
$ Res.Element "bar" attrs []
])
[]
lbs @=? S.concat (L.toChunks $ Res.renderLBS rs doc)
caseParseCdata :: Assertion
caseParseCdata = do
let lbs = ""
doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "a" Map.empty
[ Res.NodeContent "www.google.com"
])
[]
Res.parseLBS_ def lbs @?= doc
caseRetainNamespaces :: Assertion
caseRetainNamespaces = do
let lbs = ""
doc = Res.parseLBS_ def { Res.psRetainNamespaces = True } lbs
doc `shouldBe` Res.Document
(Res.Prologue [] Nothing [])
(Res.Element
"foo"
(Map.singleton "xmlns:bar" "baz")
[ Res.NodeElement $ Res.Element
"{baz}bin"
Map.empty
[]
, Res.NodeElement $ Res.Element
"{bin4}bin3"
(Map.singleton "xmlns" "bin4")
[]
])
[]
caseIso8859_1 :: Assertion
caseIso8859_1 = do
let lbs = "\232"
doc = Res.parseLBS_ def lbs
doc `shouldBe` Res.Document
(Res.Prologue [] Nothing [])
(Res.Element
"foo"
Map.empty
[Res.NodeContent "\232"])
[]
caseRenderCDATA :: Assertion
caseRenderCDATA = do
let doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "a" Map.empty
[ Res.NodeContent "www.google.com"
])
[]
withoutCDATA = Res.renderLBS def doc
withCDATA = Res.renderLBS (def { Res.rsUseCDATA = const True }) doc
withCDATA `shouldBe` ""
withoutCDATA `shouldBe` "www.google.com"
caseEscapesCDATA :: Assertion
caseEscapesCDATA = do
let doc = Res.Document (Res.Prologue [] Nothing [])
(Res.Element "a" Map.empty
[ Res.NodeContent "]]>"
])
[]
result = Res.renderLBS (def { Res.rsUseCDATA = const True }) doc
result `shouldBe` "]]>"
crlfToLfConversion :: Assertion
crlfToLfConversion = (elementContent $ documentRoot crlfDoc) `shouldBe` crlfContent
where
crlfDoc = D.parseLBS_ def "Hello,\rWorld!\r\nWe don't like your kind of line endings around here.\r\n"
crlfContent = [ContentText "Hello,\nWorld!\nWe don't like your kind of line endings around here.\n"]
crlfToLfConversionCrAtEnd :: Assertion
crlfToLfConversionCrAtEnd = (elementContent $ documentRoot doc) `shouldBe` content
where
doc = D.parseLBS_ def "Hello, World!\r"
content = [ContentText "Hello, World!\n"]
crlfToLfConversionCrCrCr :: Assertion
crlfToLfConversionCrCrCr = (elementContent $ documentRoot doc) `shouldBe` content
where
doc = D.parseLBS_ def "\r\r\r\n\r\r\r"
content = [ContentText "\n\n\n\n\n\n"]
xml-conduit-1.9.1.3/xml-conduit.cabal 0000644 0000000 0000000 00000005706 07346545000 015556 0 ustar 00 0000000 0000000 cabal-version: 1.14
name: xml-conduit
version: 1.9.1.3
license: MIT
license-file: LICENSE
author: Michael Snoyman , Aristid Breitkreuz
maintainer: Michael Snoyman
synopsis: Pure-Haskell utilities for dealing with XML with the conduit package.
description: Hackage documentation generation is not reliable. For up to date documentation, please see: .
category: XML, Conduit
stability: Stable
build-type: Custom
homepage: http://github.com/snoyberg/xml
extra-source-files: README.md
ChangeLog.md
tested-with: GHC >=8.0 && <8.12
custom-setup
setup-depends: base >= 4 && <5, Cabal <4, cabal-doctest >= 1.0.9 && <1.1
library
build-depends: base >= 4.12 && < 5
, conduit >= 1.3 && < 1.4
, conduit-extra >= 1.3 && < 1.4
, resourcet >= 1.2 && < 1.4
, bytestring >= 0.10.2
, text >= 0.7
, containers >= 0.2
, xml-types >= 0.3.4 && < 0.4
, attoparsec >= 0.10
, transformers >= 0.2 && < 0.7
, data-default-class
, blaze-markup >= 0.5
, blaze-html >= 0.5
, deepseq >= 1.1.0.0
exposed-modules: Text.XML.Stream.Parse
Text.XML.Stream.Render
Text.XML.Unresolved
Text.XML.Cursor
Text.XML.Cursor.Generic
Text.XML
other-modules: Text.XML.Stream.Token
ghc-options: -Wall
hs-source-dirs: src
default-language: Haskell2010
test-suite unit
type: exitcode-stdio-1.0
main-is: unit.hs
hs-source-dirs: test
build-depends: base
, containers
, text
, transformers
, bytestring
, xml-conduit
, hspec >= 1.3
, HUnit
, xml-types >= 0.3.1
, conduit
, conduit-extra
, blaze-markup
, resourcet
default-language: Haskell2010
test-suite doctest
type: exitcode-stdio-1.0
main-is: doctest.hs
hs-source-dirs: test
build-depends: base
, doctest >= 0.8
, xml-conduit
default-language: Haskell2010
source-repository head
type: git
location: git://github.com/snoyberg/xml.git