text-0.11.3.1/0000755000000000000000000000000012140332415011100 5ustar0000000000000000text-0.11.3.1/LICENSE0000644000000000000000000000245312140332415012111 0ustar0000000000000000Copyright (c) 2008-2009, Tom Harper All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. text-0.11.3.1/README.markdown0000644000000000000000000000256612140332415013612 0ustar0000000000000000# Text: Fast, packed Unicode strings, using stream fusion This package provides the Data.Text library, a library for the space- and time-efficient manipulation of Unicode text in Haskell. # Normalization, conversion, and collation, oh my! This library intentionally provides a simple API based on the Haskell prelude's list manipulation functions. For more complicated real-world tasks, such as Unicode normalization, conversion to and from a larger variety of encodings, and collation, use the [text-icu package](http://hackage.haskell.org/cgi-bin/hackage-scripts/package/text-icu). That library uses the well-respected and liberally licensed ICU library to provide these facilities. # Get involved! Please report bugs via the [github issue tracker](https://github.com/bos/text/issues). Master [git repository](https://github.com/bos/text): * `git clone git://github.com/bos/text.git` There's also a [Mercurial mirror](https://bitbucket.org/bos/text): * `hg clone https://bitbucket.org/bos/text` (You can create and contribute changes using either Mercurial or git.) # Authors The base code for this library was originally written by Tom Harper, based on the stream fusion framework developed by Roman Leshchinskiy, Duncan Coutts, and Don Stewart. The core library was fleshed out, debugged, and tested by Bryan O'Sullivan , and he is the current maintainer. 
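# A quick taste

The API is deliberately close to the Prelude's list functions. Here is a small, purely illustrative sketch (it is not shipped with the package); all the function names are real `Data.Text` exports:

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.Text as T

    main :: IO ()
    main = do
      let t = T.pack "Hello, Unicode text!"
      print (T.length t)          -- 20
      print (T.toUpper t)         -- "HELLO, UNICODE TEXT!"
      print (T.splitOn ", " t)    -- ["Hello","Unicode text!"]

The `OverloadedStrings` extension lets string literals such as `", "` be used directly as `Text` values.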
text-0.11.3.1/Setup.lhs
#!/usr/bin/env runhaskell > import Distribution.Simple > main = defaultMain
text-0.11.3.1/tests-and-benchmarks.markdown
Tests and benchmarks ==================== Prerequisites ------------- To run the tests and benchmarks, you will need the test data, which you can clone from one of the following locations: * Mercurial master repository: [bitbucket.org/bos/text-test-data](https://bitbucket.org/bos/text-test-data) * Git mirror repository: [github.com/bos/text-test-data](https://github.com/bos/text-test-data) You should clone that repository into the `tests` subdirectory (your clone must be named `text-test-data` locally), then run `make -C tests/text-test-data` to uncompress the test files. Many tests and benchmarks will fail if the test files are missing. Functional tests ---------------- The functional tests are located in the `tests` subdirectory. An overview of what's in that directory: Makefile Has targets for common tasks Tests Source files of the testing code scripts Various utility scripts text-tests.cabal Cabal file that compiles all tests The `text-tests.cabal` file builds: - A copy of the text library, sharing the source code, but exposing all internal modules, for testing purposes - The different test suites To compile, run all tests, and generate a coverage report, simply use `make`. Benchmarks ---------- The benchmarks are located in the `benchmarks` subdirectory. An overview of what's in that directory: Makefile Has targets for common tasks haskell Source files of the haskell benchmarks python Python implementations of some benchmarks ruby Ruby implementations of some benchmarks text-benchmarks.cabal Cabal file which compiles all benchmarks To compile the benchmarks, navigate to the `benchmarks` subdirectory and run `cabal configure && cabal build`. Then, you can run the benchmarks using: ./dist/build/text-benchmarks/text-benchmarks However, since there are quite a lot of benchmarks, you usually don't want to run them all. Instead, use the `-l` flag to get a list of benchmarks: ./dist/build/text-benchmarks/text-benchmarks -l Then run only the ones you want to inspect. If you want to configure the benchmarks further, the exact parameters can be changed in `Benchmarks.hs`.
text-0.11.3.1/text.cabal
name: text version: 0.11.3.1 homepage: https://github.com/bos/text bug-reports: https://github.com/bos/text/issues synopsis: An efficient packed Unicode text type. description: . An efficient packed, immutable Unicode text type (both strict and lazy), with a powerful loop fusion optimization framework. . The 'Text' type represents Unicode character strings, in a time and space-efficient manner. This package provides text processing capabilities that are optimized for performance critical use, both in terms of large data quantities and high speed. . The 'Text' type provides character-encoding, type-safe case conversion via whole-string case conversion functions. It also provides a range of functions for converting 'Text' values to and from 'ByteStrings', using several standard encodings. . Efficient locale-sensitive support for text IO is also supported. . These modules are intended to be imported qualified, to avoid name clashes with Prelude functions, e.g. . > import qualified Data.Text as T . 
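A short, purely illustrative fragment in this qualified style (not part of the library itself): . > import qualified Data.Text as T > countWords :: T.Text -> Int > countWords = length . T.words . 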
To use an extended and very rich family of functions for working with Unicode text (including normalization, regular expressions, non-standard encodings, text breaking, and locales), see the @text-icu@ package: . —— RELEASE NOTES —— . Changes in 0.11.2.0: . * String literals are now converted directly from the format in which GHC stores them into 'Text', without an intermediate transformation through 'String', and without inlining of conversion code at each site where a string literal is declared. . license: BSD3 license-file: LICENSE author: Bryan O'Sullivan maintainer: Bryan O'Sullivan copyright: 2009-2011 Bryan O'Sullivan, 2008-2009 Tom Harper category: Data, Text build-type: Simple cabal-version: >= 1.8 extra-source-files: -- scripts/CaseFolding.txt -- scripts/SpecialCasing.txt README.markdown benchmarks/Setup.hs benchmarks/cbits/*.c benchmarks/haskell/*.hs benchmarks/haskell/Benchmarks/*.hs benchmarks/python/*.py benchmarks/ruby/*.rb benchmarks/text-benchmarks.cabal scripts/*.hs tests-and-benchmarks.markdown tests/*.hs tests/.ghci tests/Makefile tests/Tests/*.hs tests/scripts/*.sh tests/text-tests.cabal flag developer description: operate in developer mode default: False flag integer-simple description: Use the simple integer library instead of GMP default: False library c-sources: cbits/cbits.c exposed-modules: Data.Text Data.Text.Array Data.Text.Encoding Data.Text.Encoding.Error Data.Text.Foreign Data.Text.IO Data.Text.Internal Data.Text.Lazy Data.Text.Lazy.Builder Data.Text.Lazy.Builder.Int Data.Text.Lazy.Builder.RealFloat Data.Text.Lazy.Encoding Data.Text.Lazy.IO Data.Text.Lazy.Internal Data.Text.Lazy.Read Data.Text.Read Data.Text.Unsafe other-modules: Data.Text.Encoding.Fusion Data.Text.Encoding.Fusion.Common Data.Text.Encoding.Utf16 Data.Text.Encoding.Utf32 Data.Text.Encoding.Utf8 Data.Text.Fusion Data.Text.Fusion.CaseMapping Data.Text.Fusion.Common Data.Text.Fusion.Internal Data.Text.Fusion.Size Data.Text.IO.Internal Data.Text.Lazy.Builder.Functions Data.Text.Lazy.Builder.Int.Digits Data.Text.Lazy.Builder.Internal Data.Text.Lazy.Builder.RealFloat.Functions Data.Text.Lazy.Encoding.Fusion Data.Text.Lazy.Fusion Data.Text.Lazy.Search Data.Text.Private Data.Text.Search Data.Text.Unsafe.Base Data.Text.UnsafeChar Data.Text.UnsafeShift Data.Text.Util build-depends: array, base < 5, bytestring >= 0.9 if impl(ghc >= 6.10) build-depends: ghc-prim, base >= 4, deepseq >= 1.1.0.0 cpp-options: -DHAVE_DEEPSEQ else build-depends: extensible-exceptions extensions: ScopedTypeVariables ghc-options: -Wall -funbox-strict-fields -O2 if impl(ghc >= 6.8) ghc-options: -fwarn-tabs if flag(developer) ghc-prof-options: -auto-all ghc-options: -Werror cpp-options: -DASSERTS if impl(ghc >= 6.11) if flag(integer-simple) cpp-options: -DINTEGER_SIMPLE build-depends: integer-simple >= 0.1 && < 0.5 else cpp-options: -DINTEGER_GMP build-depends: integer-gmp >= 0.2 if impl(ghc >= 6.9) && impl(ghc < 6.11) cpp-options: -DINTEGER_GMP build-depends: integer >= 0.1 && < 0.2 test-suite tests type: exitcode-stdio-1.0 hs-source-dirs: tests . 
main-is: Tests.hs c-sources: cbits/cbits.c ghc-options: -Wall -threaded -O0 -rtsopts cpp-options: -DASSERTS -DHAVE_DEEPSEQ build-depends: HUnit >= 1.2, QuickCheck >= 2.4, array, base, bytestring, deepseq, directory, ghc-prim, random, test-framework >= 0.4, test-framework-hunit >= 0.2, test-framework-quickcheck2 >= 0.2 if impl(ghc >= 6.11) if flag(integer-simple) cpp-options: -DINTEGER_SIMPLE build-depends: integer-simple >= 0.1 && < 0.5 else cpp-options: -DINTEGER_GMP build-depends: integer-gmp >= 0.2 if impl(ghc >= 6.9) && impl(ghc < 6.11) cpp-options: -DINTEGER_GMP build-depends: integer >= 0.1 && < 0.2 source-repository head type: git location: https://github.com/bos/text source-repository head type: mercurial location: https://bitbucket.org/bos/text
text-0.11.3.1/benchmarks/Setup.hs
import Distribution.Simple main = defaultMain
text-0.11.3.1/benchmarks/text-benchmarks.cabal
name: text-benchmarks version: 0.0.0.0 synopsis: Benchmarks for the text package description: Benchmarks for the text package homepage: https://bitbucket.org/bos/text license: BSD3 license-file: ../LICENSE author: Jasper Van der Jeugt , Bryan O'Sullivan , Tom Harper , Duncan Coutts maintainer: jaspervdj@gmail.com category: Text build-type: Simple cabal-version: >=1.2 flag llvm description: use LLVM default: False executable text-benchmarks hs-source-dirs: haskell .. c-sources: ../cbits/cbits.c cbits/time_iconv.c main-is: Benchmarks.hs ghc-options: -Wall -O2 if flag(llvm) ghc-options: -fllvm cpp-options: -DHAVE_DEEPSEQ -DINTEGER_GMP build-depends: base == 4.*, binary, blaze-builder, bytestring, bytestring-lexing, containers, criterion >= 0.6.0.1, deepseq, directory, filepath, ghc-prim, integer-gmp, stringsearch, utf8-string executable text-multilang hs-source-dirs: haskell main-is: Multilang.hs ghc-options: -Wall -O2 build-depends: base == 4.*, bytestring, text, time
text-0.11.3.1/benchmarks/cbits/time_iconv.c
#include <iconv.h> #include <stddef.h> #include <stdint.h> #include <stdlib.h> int time_iconv(char *srcbuf, size_t srcbufsize) { uint16_t *destbuf = NULL; size_t destbufsize; static uint16_t *origdestbuf; static size_t origdestbufsize; /* cache the conversion descriptor across calls, like the buffers above */ static iconv_t ic = (iconv_t) -1; int ret = 0; if (ic == (iconv_t) -1) { ic = iconv_open("UTF-16LE", "UTF-8"); if (ic == (iconv_t) -1) { ret = -1; goto done; } } destbufsize = srcbufsize * sizeof(uint16_t); if (destbufsize > origdestbufsize) { free(origdestbuf); origdestbuf = destbuf = malloc(origdestbufsize = destbufsize); } else { destbuf = origdestbuf; } iconv(ic, &srcbuf, &srcbufsize, (char**) &destbuf, &destbufsize); done: return ret; }
text-0.11.3.1/benchmarks/haskell/Benchmarks.hs
-- | Main module to run the micro benchmarks -- {-# LANGUAGE OverloadedStrings #-} module Main ( main ) where import Criterion.Main (Benchmark, defaultMain, bgroup) import System.FilePath ((</>)) import System.IO (IOMode (WriteMode), openFile, hSetEncoding, utf8) import qualified Benchmarks.Builder as Builder import qualified Benchmarks.DecodeUtf8 as DecodeUtf8 import 
qualified Benchmarks.EncodeUtf8 as EncodeUtf8 import qualified Benchmarks.Equality as Equality import qualified Benchmarks.FileRead as FileRead import qualified Benchmarks.FoldLines as FoldLines import qualified Benchmarks.Pure as Pure import qualified Benchmarks.ReadNumbers as ReadNumbers import qualified Benchmarks.Replace as Replace import qualified Benchmarks.Search as Search import qualified Benchmarks.Stream as Stream import qualified Benchmarks.WordFrequencies as WordFrequencies import qualified Benchmarks.Programs.BigTable as Programs.BigTable import qualified Benchmarks.Programs.Cut as Programs.Cut import qualified Benchmarks.Programs.Fold as Programs.Fold import qualified Benchmarks.Programs.Sort as Programs.Sort import qualified Benchmarks.Programs.StripTags as Programs.StripTags import qualified Benchmarks.Programs.Throughput as Programs.Throughput main :: IO () main = benchmarks >>= defaultMain benchmarks :: IO [Benchmark] benchmarks = do sink <- openFile "/dev/null" WriteMode hSetEncoding sink utf8 -- Traditional benchmarks bs <- sequence [ Builder.benchmark , DecodeUtf8.benchmark "html" (tf "libya-chinese.html") , DecodeUtf8.benchmark "xml" (tf "yiwiki.xml") , DecodeUtf8.benchmark "ascii" (tf "ascii.txt") , DecodeUtf8.benchmark "russian" (tf "russian.txt") , DecodeUtf8.benchmark "japanese" (tf "japanese.txt") , EncodeUtf8.benchmark "επανάληψη 竺法蘭共譯" , Equality.benchmark (tf "japanese.txt") , FileRead.benchmark (tf "russian.txt") , FoldLines.benchmark (tf "russian.txt") , Pure.benchmark "tiny" (tf "tiny.txt") , Pure.benchmark "japanese" (tf "japanese.txt") , ReadNumbers.benchmark (tf "numbers.txt") , Replace.benchmark (tf "russian.txt") "принимая" "своем" , Search.benchmark (tf "russian.txt") "принимая" , Stream.benchmark (tf "russian.txt") , WordFrequencies.benchmark (tf "russian.txt") ] -- Program-like benchmarks ps <- bgroup "Programs" `fmap` sequence [ Programs.BigTable.benchmark sink , Programs.Cut.benchmark (tf "russian.txt") sink 20 40 , Programs.Fold.benchmark (tf "russian.txt") sink , Programs.Sort.benchmark (tf "russian.txt") sink , Programs.StripTags.benchmark (tf "yiwiki.xml") sink , Programs.Throughput.benchmark (tf "russian.txt") sink ] return $ bs ++ [ps] where -- Location of a test file tf = ("../tests/text-test-data" </>)
text-0.11.3.1/benchmarks/haskell/Multilang.hs
{-# LANGUAGE BangPatterns, OverloadedStrings, RankNTypes #-} module Main ( main ) where import Control.Monad (forM_) import qualified Data.ByteString as B import qualified Data.Text as Text import Data.Text.Encoding (decodeUtf8) import Data.Text (Text) import System.IO (hFlush, stdout) import Timer (timer) type BM = Text -> () bm :: forall a. 
(Text -> a) -> BM bm f t = f t `seq` () benchmarks :: [(String, Text.Text -> ())] benchmarks = [ ("find_first", bm $ Text.isInfixOf "en:Benin") , ("find_index", bm $ Text.findIndex (=='c')) ] main :: IO () main = do !contents <- decodeUtf8 `fmap` B.readFile "../tests/text-test-data/yiwiki.xml" forM_ benchmarks $ \(name, bmark) -> do putStr $ name ++ " " hFlush stdout putStrLn =<< (timer 100 contents bmark) text-0.11.3.1/benchmarks/haskell/Timer.hs0000644000000000000000000000162212140332415016255 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} module Timer (timer) where import Control.Exception (evaluate) import Data.Time.Clock.POSIX (getPOSIXTime) import GHC.Float (FFFormat(..), formatRealFloat) ickyRound :: Int -> Double -> String ickyRound k = formatRealFloat FFFixed (Just k) timer :: Int -> a -> (a -> b) -> IO String timer count a0 f = do let loop !k !fastest | k <= 0 = return fastest | otherwise = do start <- getPOSIXTime let inner a i | i <= 0 = return () | otherwise = evaluate (f a) >> inner a (i-1) inner a0 count end <- getPOSIXTime let elapsed = end - start loop (k-1) (min fastest (elapsed / fromIntegral count)) t <- loop (3::Int) 1e300 let log10 x = log x / log 10 ft = realToFrac t prec = round (log10 (fromIntegral count) - log10 ft) return $! ickyRound prec ft {-# NOINLINE timer #-} text-0.11.3.1/benchmarks/haskell/Benchmarks/0000755000000000000000000000000012140332415016715 5ustar0000000000000000text-0.11.3.1/benchmarks/haskell/Benchmarks/Builder.hs0000644000000000000000000000470312140332415020643 0ustar0000000000000000-- | Testing the internal builder monoid -- -- Tested in this benchmark: -- -- * Concatenating many small strings using a builder -- {-# LANGUAGE OverloadedStrings #-} module Benchmarks.Builder ( benchmark ) where import Criterion (Benchmark, bgroup, bench, nf) import Data.Binary.Builder as B import Data.ByteString.Char8 () import Data.Monoid (mconcat, mempty) import qualified Blaze.ByteString.Builder as Blaze import qualified Blaze.ByteString.Builder.Char.Utf8 as Blaze import qualified Data.ByteString as SB import qualified Data.ByteString.Lazy as LB import qualified Data.Text as T import qualified Data.Text.Lazy as LT import qualified Data.Text.Lazy.Builder as LTB import qualified Data.Text.Lazy.Builder.Int as Int import Data.Int (Int64) benchmark :: IO Benchmark benchmark = return $ bgroup "Builder" [ bgroup "Comparison" [ bench "LazyText" $ nf (LT.length . LTB.toLazyText . mconcat . map LTB.fromText) texts , bench "Binary" $ nf (LB.length . B.toLazyByteString . mconcat . map B.fromByteString) byteStrings , bench "Blaze" $ nf (LB.length . Blaze.toLazyByteString . mconcat . map Blaze.fromString) strings ] , bgroup "Int" [ bgroup "Decimal" [ bgroup "Positive" . flip map numbers $ \n -> (bench (show (length (show n))) $ nf (LTB.toLazyText . Int.decimal) n) , bgroup "Negative" . flip map numbers $ \m -> let n = negate m in (bench (show (length (show n))) $ nf (LTB.toLazyText . Int.decimal) n) , bench "Empty" $ nf LTB.toLazyText mempty , bgroup "Show" . 
flip map numbers $ \n -> (bench (show (length (show n))) $ nf show n) ] ] ] where numbers :: [Int64] numbers = [ 6, 14, 500, 9688, 10654, 620735, 5608880, 37010612, 731223504, 5061580596, 24596952933, 711732309084, 2845910093839, 54601756118340, 735159434806159, 3619097625502435, 95777227510267124, 414944309510675693, 8986407456998704019 ] texts :: [T.Text] texts = take 200000 $ cycle ["foo", "λx", "由の"] {-# NOINLINE texts #-} -- Note that the non-ascii characters will be chopped byteStrings :: [SB.ByteString] byteStrings = take 200000 $ cycle ["foo", "λx", "由の"] {-# NOINLINE byteStrings #-} -- Note that the non-ascii characters will be chopped strings :: [String] strings = take 200000 $ cycle ["foo", "λx", "由の"] {-# NOINLINE strings #-} text-0.11.3.1/benchmarks/haskell/Benchmarks/DecodeUtf8.hs0000644000000000000000000000372312140332415021210 0ustar0000000000000000{-# LANGUAGE ForeignFunctionInterface #-} -- | Test decoding of UTF-8 -- -- Tested in this benchmark: -- -- * Decoding bytes using UTF-8 -- -- In some tests: -- -- * Taking the length of the result -- -- * Taking the init of the result -- -- The latter are used for testing stream fusion. -- module Benchmarks.DecodeUtf8 ( benchmark ) where import Foreign.C.Types import Data.ByteString.Internal (ByteString(..)) import Foreign.Ptr (Ptr, plusPtr) import Foreign.ForeignPtr (withForeignPtr) import Data.Word (Word8) import qualified Criterion as C import Criterion (Benchmark, bgroup, nf) import qualified Codec.Binary.UTF8.Generic as U8 import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as BL import qualified Data.Text as T import qualified Data.Text.Encoding as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Encoding as TL benchmark :: String -> FilePath -> IO Benchmark benchmark kind fp = do bs <- B.readFile fp lbs <- BL.readFile fp let bench name = C.bench (name ++ "+" ++ kind) return $ bgroup "DecodeUtf8" [ bench "Strict" $ nf T.decodeUtf8 bs , bench "IConv" $ iconv bs , bench "StrictLength" $ nf (T.length . T.decodeUtf8) bs , bench "StrictInitLength" $ nf (T.length . T.init . T.decodeUtf8) bs , bench "Lazy" $ nf TL.decodeUtf8 lbs , bench "LazyLength" $ nf (TL.length . TL.decodeUtf8) lbs , bench "LazyInitLength" $ nf (TL.length . TL.init . TL.decodeUtf8) lbs , bench "StrictStringUtf8" $ nf U8.toString bs , bench "StrictStringUtf8Length" $ nf (length . U8.toString) bs , bench "LazyStringUtf8" $ nf U8.toString lbs , bench "LazyStringUtf8Length" $ nf (length . U8.toString) lbs ] iconv :: ByteString -> IO CInt iconv (PS fp off len) = withForeignPtr fp $ \ptr -> time_iconv (ptr `plusPtr` off) (fromIntegral len) foreign import ccall unsafe time_iconv :: Ptr Word8 -> CSize -> IO CInt text-0.11.3.1/benchmarks/haskell/Benchmarks/EncodeUtf8.hs0000644000000000000000000000160512140332415021217 0ustar0000000000000000-- | UTF-8 encode a text -- -- Tested in this benchmark: -- -- * Replicating a string a number of times -- -- * UTF-8 encoding it -- module Benchmarks.EncodeUtf8 ( benchmark ) where import Criterion (Benchmark, bgroup, bench, whnf) import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as BL import qualified Data.Text as T import qualified Data.Text.Encoding as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Encoding as TL benchmark :: String -> IO Benchmark benchmark string = do return $ bgroup "EncodeUtf8" [ bench "Text" $ whnf (B.length . T.encodeUtf8) text , bench "LazyText" $ whnf (BL.length . 
TL.encodeUtf8) lazyText ] where -- The string in different formats text = T.replicate k $ T.pack string lazyText = TL.replicate (fromIntegral k) $ TL.pack string -- Amount k = 100000 text-0.11.3.1/benchmarks/haskell/Benchmarks/Equality.hs0000644000000000000000000000254212140332415021051 0ustar0000000000000000-- | Compare a string with a copy of itself that is identical except -- for the last character. -- -- Tested in this benchmark: -- -- * Comparison of strings (Eq instance) -- module Benchmarks.Equality ( benchmark ) where import Criterion (Benchmark, bgroup, bench, whnf) import qualified Data.ByteString.Char8 as B import qualified Data.ByteString.Lazy.Char8 as BL import qualified Data.Text as T import qualified Data.Text.Encoding as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Encoding as TL benchmark :: FilePath -> IO Benchmark benchmark fp = do b <- B.readFile fp bl1 <- BL.readFile fp -- A lazy bytestring is a list of chunks. When we do not explicitly create two -- different lazy bytestrings at a different address, the bytestring library -- will compare the chunk addresses instead of the chunk contents. This is why -- we read the lazy bytestring twice here. bl2 <- BL.readFile fp l <- readFile fp let t = T.decodeUtf8 b tl = TL.decodeUtf8 bl1 return $ bgroup "Equality" [ bench "Text" $ whnf (== T.init t `T.snoc` '\xfffd') t , bench "LazyText" $ whnf (== TL.init tl `TL.snoc` '\xfffd') tl , bench "ByteString" $ whnf (== B.init b `B.snoc` '\xfffd') b , bench "LazyByteString" $ whnf (== BL.init bl2 `BL.snoc` '\xfffd') bl1 , bench "String" $ whnf (== init l ++ "\xfffd") l ] text-0.11.3.1/benchmarks/haskell/Benchmarks/FileRead.hs0000644000000000000000000000217712140332415020733 0ustar0000000000000000-- | Benchmarks simple file reading -- -- Tested in this benchmark: -- -- * Reading a file from the disk -- module Benchmarks.FileRead ( benchmark ) where import Control.Exception (evaluate) import Criterion (Benchmark, bgroup, bench) import qualified Data.ByteString as SB import qualified Data.ByteString.Lazy as LB import qualified Data.Text as T import qualified Data.Text.Encoding as T import qualified Data.Text.IO as T import qualified Data.Text.Lazy as LT import qualified Data.Text.Lazy.Encoding as LT import qualified Data.Text.Lazy.IO as LT benchmark :: FilePath -> IO Benchmark benchmark p = return $ bgroup "FileRead" [ bench "String" $ readFile p >>= evaluate . length , bench "ByteString" $ SB.readFile p >>= evaluate . SB.length , bench "LazyByteString" $ LB.readFile p >>= evaluate . LB.length , bench "Text" $ T.readFile p >>= evaluate . T.length , bench "LazyText" $ LT.readFile p >>= evaluate . LT.length , bench "TextByteString" $ SB.readFile p >>= evaluate . T.length . T.decodeUtf8 , bench "LazyTextByteString" $ LB.readFile p >>= evaluate . LT.length . LT.decodeUtf8 ] text-0.11.3.1/benchmarks/haskell/Benchmarks/FoldLines.hs0000644000000000000000000000276412140332415021141 0ustar0000000000000000-- | Read a file line-by-line using handles, and perform a fold over the lines. -- The fold is used here to calculate the number of lines in the file. 
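-- The fold accumulator is forced at every step (note the bang pattern on -- 'go' in the folds below); without that strictness, counting lines would -- build a chain of thunks as long as the file.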
-- -- Tested in this benchmark: -- -- * Buffered, line-based IO -- {-# LANGUAGE BangPatterns #-} module Benchmarks.FoldLines ( benchmark ) where import Criterion (Benchmark, bgroup, bench) import System.IO import qualified Data.ByteString as B import qualified Data.Text as T import qualified Data.Text.IO as T benchmark :: FilePath -> IO Benchmark benchmark fp = return $ bgroup "ReadLines" [ bench "Text" $ withHandle $ foldLinesT (\n _ -> n + 1) (0 :: Int) , bench "ByteString" $ withHandle $ foldLinesB (\n _ -> n + 1) (0 :: Int) ] where withHandle f = do h <- openFile fp ReadMode hSetBuffering h (BlockBuffering (Just 16384)) x <- f h hClose h return x -- | Text line fold -- foldLinesT :: (a -> T.Text -> a) -> a -> Handle -> IO a foldLinesT f z0 h = go z0 where go !z = do eof <- hIsEOF h if eof then return z else do l <- T.hGetLine h let z' = f z l in go z' {-# INLINE foldLinesT #-} -- | ByteString line fold -- foldLinesB :: (a -> B.ByteString -> a) -> a -> Handle -> IO a foldLinesB f z0 h = go z0 where go !z = do eof <- hIsEOF h if eof then return z else do l <- B.hGetLine h let z' = f z l in go z' {-# INLINE foldLinesB #-} text-0.11.3.1/benchmarks/haskell/Benchmarks/Pure.hs0000644000000000000000000004647312140332415020202 0ustar0000000000000000-- | Benchmarks various pure functions from the Text library -- -- Tested in this benchmark: -- -- * Most pure functions defined the string types -- {-# LANGUAGE BangPatterns, CPP, GADTs, MagicHash #-} {-# OPTIONS_GHC -fno-warn-orphans #-} module Benchmarks.Pure ( benchmark ) where import Control.DeepSeq (NFData (..)) import Control.Exception (evaluate) import Criterion (Benchmark, bgroup, bench, nf) import Data.Char (toLower, toUpper) import Data.Monoid (mappend, mempty) import GHC.Base (Char (..), Int (..), chr#, ord#, (+#)) import qualified Data.ByteString.Char8 as BS import qualified Data.ByteString.Lazy.Char8 as BL import qualified Data.ByteString.Lazy.Internal as BL import qualified Data.ByteString.UTF8 as UTF8 import qualified Data.List as L import qualified Data.Text as T import qualified Data.Text.Encoding as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Builder as TB import qualified Data.Text.Lazy.Encoding as TL benchmark :: String -> FilePath -> IO Benchmark benchmark kind fp = do -- Evaluate stuff before actually running the benchmark, we don't want to -- count it here. 
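-- 'evaluate' forces each input to weak head normal form inside IO, so the -- setup cost below is not attributed to whichever benchmark touches a -- value first.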
-- ByteString A bsa <- BS.readFile fp -- Text A/B, LazyText A/B ta <- evaluate $ T.decodeUtf8 bsa tb <- evaluate $ T.toUpper ta tla <- evaluate $ TL.fromChunks (T.chunksOf 16376 ta) tlb <- evaluate $ TL.fromChunks (T.chunksOf 16376 tb) -- ByteString B, LazyByteString A/B bsb <- evaluate $ T.encodeUtf8 tb bla <- evaluate $ BL.fromChunks (chunksOf 16376 bsa) blb <- evaluate $ BL.fromChunks (chunksOf 16376 bsb) -- String A/B sa <- evaluate $ UTF8.toString bsa sb <- evaluate $ T.unpack tb -- Lengths bsa_len <- evaluate $ BS.length bsa ta_len <- evaluate $ T.length ta bla_len <- evaluate $ BL.length bla tla_len <- evaluate $ TL.length tla sa_len <- evaluate $ L.length sa -- Lines bsl <- evaluate $ BS.lines bsa bll <- evaluate $ BL.lines bla tl <- evaluate $ T.lines ta tll <- evaluate $ TL.lines tla sl <- evaluate $ L.lines sa return $ bgroup "Pure" [ bgroup "append" [ benchT $ nf (T.append tb) ta , benchTL $ nf (TL.append tlb) tla , benchBS $ nf (BS.append bsb) bsa , benchBSL $ nf (BL.append blb) bla , benchS $ nf ((++) sb) sa ] , bgroup "concat" [ benchT $ nf T.concat tl , benchTL $ nf TL.concat tll , benchBS $ nf BS.concat bsl , benchBSL $ nf BL.concat bll , benchS $ nf L.concat sl ] , bgroup "cons" [ benchT $ nf (T.cons c) ta , benchTL $ nf (TL.cons c) tla , benchBS $ nf (BS.cons c) bsa , benchBSL $ nf (BL.cons c) bla , benchS $ nf (c:) sa ] , bgroup "concatMap" [ benchT $ nf (T.concatMap (T.replicate 3 . T.singleton)) ta , benchTL $ nf (TL.concatMap (TL.replicate 3 . TL.singleton)) tla , benchBS $ nf (BS.concatMap (BS.replicate 3)) bsa , benchBSL $ nf (BL.concatMap (BL.replicate 3)) bla , benchS $ nf (L.concatMap (L.replicate 3 . (:[]))) sa ] , bgroup "decode" [ benchT $ nf T.decodeUtf8 bsa , benchTL $ nf TL.decodeUtf8 bla , benchBS $ nf BS.unpack bsa , benchBSL $ nf BL.unpack bla , benchS $ nf UTF8.toString bsa ] , bgroup "decode'" [ benchT $ nf T.decodeUtf8' bsa , benchTL $ nf TL.decodeUtf8' bla ] , bgroup "drop" [ benchT $ nf (T.drop (ta_len `div` 3)) ta , benchTL $ nf (TL.drop (tla_len `div` 3)) tla , benchBS $ nf (BS.drop (bsa_len `div` 3)) bsa , benchBSL $ nf (BL.drop (bla_len `div` 3)) bla , benchS $ nf (L.drop (sa_len `div` 3)) sa ] , bgroup "encode" [ benchT $ nf T.encodeUtf8 ta , benchTL $ nf TL.encodeUtf8 tla , benchBS $ nf BS.pack sa , benchBSL $ nf BL.pack sa , benchS $ nf UTF8.fromString sa ] , bgroup "filter" [ benchT $ nf (T.filter p0) ta , benchTL $ nf (TL.filter p0) tla , benchBS $ nf (BS.filter p0) bsa , benchBSL $ nf (BL.filter p0) bla , benchS $ nf (L.filter p0) sa ] , bgroup "filter.filter" [ benchT $ nf (T.filter p1 . T.filter p0) ta , benchTL $ nf (TL.filter p1 . TL.filter p0) tla , benchBS $ nf (BS.filter p1 . BS.filter p0) bsa , benchBSL $ nf (BL.filter p1 . BL.filter p0) bla , benchS $ nf (L.filter p1 . L.filter p0) sa ] , bgroup "foldl'" [ benchT $ nf (T.foldl' len 0) ta , benchTL $ nf (TL.foldl' len 0) tla , benchBS $ nf (BS.foldl' len 0) bsa , benchBSL $ nf (BL.foldl' len 0) bla , benchS $ nf (L.foldl' len 0) sa ] , bgroup "foldr" [ benchT $ nf (L.length . T.foldr (:) []) ta , benchTL $ nf (L.length . TL.foldr (:) []) tla , benchBS $ nf (L.length . BS.foldr (:) []) bsa , benchBSL $ nf (L.length . BL.foldr (:) []) bla , benchS $ nf (L.length . 
L.foldr (:) []) sa ] , bgroup "head" [ benchT $ nf T.head ta , benchTL $ nf TL.head tla , benchBS $ nf BS.head bsa , benchBSL $ nf BL.head bla , benchS $ nf L.head sa ] , bgroup "init" [ benchT $ nf T.init ta , benchTL $ nf TL.init tla , benchBS $ nf BS.init bsa , benchBSL $ nf BL.init bla , benchS $ nf L.init sa ] , bgroup "intercalate" [ benchT $ nf (T.intercalate tsw) tl , benchTL $ nf (TL.intercalate tlw) tll , benchBS $ nf (BS.intercalate bsw) bsl , benchBSL $ nf (BL.intercalate blw) bll , benchS $ nf (L.intercalate lw) sl ] , bgroup "intersperse" [ benchT $ nf (T.intersperse c) ta , benchTL $ nf (TL.intersperse c) tla , benchBS $ nf (BS.intersperse c) bsa , benchBSL $ nf (BL.intersperse c) bla , benchS $ nf (L.intersperse c) sa ] , bgroup "isInfixOf" [ benchT $ nf (T.isInfixOf tsw) ta , benchTL $ nf (TL.isInfixOf tlw) tla , benchBS $ nf (BS.isInfixOf bsw) bsa -- no isInfixOf for lazy bytestrings , benchS $ nf (L.isInfixOf lw) sa ] , bgroup "last" [ benchT $ nf T.last ta , benchTL $ nf TL.last tla , benchBS $ nf BS.last bsa , benchBSL $ nf BL.last bla , benchS $ nf L.last sa ] , bgroup "map" [ benchT $ nf (T.map f) ta , benchTL $ nf (TL.map f) tla , benchBS $ nf (BS.map f) bsa , benchBSL $ nf (BL.map f) bla , benchS $ nf (L.map f) sa ] , bgroup "mapAccumL" [ benchT $ nf (T.mapAccumL g 0) ta , benchTL $ nf (TL.mapAccumL g 0) tla , benchBS $ nf (BS.mapAccumL g 0) bsa , benchBSL $ nf (BL.mapAccumL g 0) bla , benchS $ nf (L.mapAccumL g 0) sa ] , bgroup "mapAccumR" [ benchT $ nf (T.mapAccumR g 0) ta , benchTL $ nf (TL.mapAccumR g 0) tla , benchBS $ nf (BS.mapAccumR g 0) bsa , benchBSL $ nf (BL.mapAccumR g 0) bla , benchS $ nf (L.mapAccumR g 0) sa ] , bgroup "map.map" [ benchT $ nf (T.map f . T.map f) ta , benchTL $ nf (TL.map f . TL.map f) tla , benchBS $ nf (BS.map f . BS.map f) bsa , benchBSL $ nf (BL.map f . BL.map f) bla , benchS $ nf (L.map f . 
L.map f) sa ] , bgroup "replicate char" [ benchT $ nf (T.replicate bsa_len) (T.singleton c) , benchTL $ nf (TL.replicate (fromIntegral bsa_len)) (TL.singleton c) , benchBS $ nf (BS.replicate bsa_len) c , benchBSL $ nf (BL.replicate (fromIntegral bsa_len)) c , benchS $ nf (L.replicate bsa_len) c ] , bgroup "replicate string" [ benchT $ nf (T.replicate (bsa_len `div` T.length tsw)) tsw , benchTL $ nf (TL.replicate (fromIntegral bsa_len `div` TL.length tlw)) tlw , benchS $ nf (replicat (bsa_len `div` T.length tsw)) lw ] , bgroup "reverse" [ benchT $ nf T.reverse ta , benchTL $ nf TL.reverse tla , benchBS $ nf BS.reverse bsa , benchBSL $ nf BL.reverse bla , benchS $ nf L.reverse sa ] , bgroup "take" [ benchT $ nf (T.take (ta_len `div` 3)) ta , benchTL $ nf (TL.take (tla_len `div` 3)) tla , benchBS $ nf (BS.take (bsa_len `div` 3)) bsa , benchBSL $ nf (BL.take (bla_len `div` 3)) bla , benchS $ nf (L.take (sa_len `div` 3)) sa ] , bgroup "tail" [ benchT $ nf T.tail ta , benchTL $ nf TL.tail tla , benchBS $ nf BS.tail bsa , benchBSL $ nf BL.tail bla , benchS $ nf L.tail sa ] , bgroup "toLower" [ benchT $ nf T.toLower ta , benchTL $ nf TL.toLower tla , benchBS $ nf (BS.map toLower) bsa , benchBSL $ nf (BL.map toLower) bla , benchS $ nf (L.map toLower) sa ] , bgroup "toUpper" [ benchT $ nf T.toUpper ta , benchTL $ nf TL.toUpper tla , benchBS $ nf (BS.map toUpper) bsa , benchBSL $ nf (BL.map toUpper) bla , benchS $ nf (L.map toUpper) sa ] , bgroup "words" [ benchT $ nf T.words ta , benchTL $ nf TL.words tla , benchBS $ nf BS.words bsa , benchBSL $ nf BL.words bla , benchS $ nf L.words sa ] , bgroup "zipWith" [ benchT $ nf (T.zipWith min tb) ta , benchTL $ nf (TL.zipWith min tlb) tla , benchBS $ nf (BS.zipWith min bsb) bsa , benchBSL $ nf (BL.zipWith min blb) bla , benchS $ nf (L.zipWith min sb) sa ] , bgroup "length" [ bgroup "cons" [ benchT $ nf (T.length . T.cons c) ta , benchTL $ nf (TL.length . TL.cons c) tla , benchBS $ nf (BS.length . BS.cons c) bsa , benchBSL $ nf (BL.length . BL.cons c) bla , benchS $ nf (L.length . (:) c) sa ] , bgroup "decode" [ benchT $ nf (T.length . T.decodeUtf8) bsa , benchTL $ nf (TL.length . TL.decodeUtf8) bla , benchBS $ nf (L.length . BS.unpack) bsa , benchBSL $ nf (L.length . BL.unpack) bla , bench "StringUTF8" $ nf (L.length . UTF8.toString) bsa ] , bgroup "drop" [ benchT $ nf (T.length . T.drop (ta_len `div` 3)) ta , benchTL $ nf (TL.length . TL.drop (tla_len `div` 3)) tla , benchBS $ nf (BS.length . BS.drop (bsa_len `div` 3)) bsa , benchBSL $ nf (BL.length . BL.drop (bla_len `div` 3)) bla , benchS $ nf (L.length . L.drop (sa_len `div` 3)) sa ] , bgroup "filter" [ benchT $ nf (T.length . T.filter p0) ta , benchTL $ nf (TL.length . TL.filter p0) tla , benchBS $ nf (BS.length . BS.filter p0) bsa , benchBSL $ nf (BL.length . BL.filter p0) bla , benchS $ nf (L.length . L.filter p0) sa ] , bgroup "filter.filter" [ benchT $ nf (T.length . T.filter p1 . T.filter p0) ta , benchTL $ nf (TL.length . TL.filter p1 . TL.filter p0) tla , benchBS $ nf (BS.length . BS.filter p1 . BS.filter p0) bsa , benchBSL $ nf (BL.length . BL.filter p1 . BL.filter p0) bla , benchS $ nf (L.length . L.filter p1 . L.filter p0) sa ] , bgroup "init" [ benchT $ nf (T.length . T.init) ta , benchTL $ nf (TL.length . TL.init) tla , benchBS $ nf (BS.length . BS.init) bsa , benchBSL $ nf (BL.length . BL.init) bla , benchS $ nf (L.length . L.init) sa ] , bgroup "intercalate" [ benchT $ nf (T.length . T.intercalate tsw) tl , benchTL $ nf (TL.length . TL.intercalate tlw) tll , benchBS $ nf (BS.length . 
BS.intercalate bsw) bsl , benchBSL $ nf (BL.length . BL.intercalate blw) bll , benchS $ nf (L.length . L.intercalate lw) sl ] , bgroup "intersperse" [ benchT $ nf (T.length . T.intersperse c) ta , benchTL $ nf (TL.length . TL.intersperse c) tla , benchBS $ nf (BS.length . BS.intersperse c) bsa , benchBSL $ nf (BL.length . BL.intersperse c) bla , benchS $ nf (L.length . L.intersperse c) sa ] , bgroup "map" [ benchT $ nf (T.length . T.map f) ta , benchTL $ nf (TL.length . TL.map f) tla , benchBS $ nf (BS.length . BS.map f) bsa , benchBSL $ nf (BL.length . BL.map f) bla , benchS $ nf (L.length . L.map f) sa ] , bgroup "map.map" [ benchT $ nf (T.length . T.map f . T.map f) ta , benchTL $ nf (TL.length . TL.map f . TL.map f) tla , benchBS $ nf (BS.length . BS.map f . BS.map f) bsa , benchS $ nf (L.length . L.map f . L.map f) sa ] , bgroup "replicate char" [ benchT $ nf (T.length . T.replicate bsa_len) (T.singleton c) , benchTL $ nf (TL.length . TL.replicate (fromIntegral bsa_len)) (TL.singleton c) , benchBS $ nf (BS.length . BS.replicate bsa_len) c , benchBSL $ nf (BL.length . BL.replicate (fromIntegral bsa_len)) c , benchS $ nf (L.length . L.replicate bsa_len) c ] , bgroup "replicate string" [ benchT $ nf (T.length . T.replicate (bsa_len `div` T.length tsw)) tsw , benchTL $ nf (TL.length . TL.replicate (fromIntegral bsa_len `div` TL.length tlw)) tlw , benchS $ nf (L.length . replicat (bsa_len `div` T.length tsw)) lw ] , bgroup "take" [ benchT $ nf (T.length . T.take (ta_len `div` 3)) ta , benchTL $ nf (TL.length . TL.take (tla_len `div` 3)) tla , benchBS $ nf (BS.length . BS.take (bsa_len `div` 3)) bsa , benchBSL $ nf (BL.length . BL.take (bla_len `div` 3)) bla , benchS $ nf (L.length . L.take (sa_len `div` 3)) sa ] , bgroup "tail" [ benchT $ nf (T.length . T.tail) ta , benchTL $ nf (TL.length . TL.tail) tla , benchBS $ nf (BS.length . BS.tail) bsa , benchBSL $ nf (BL.length . BL.tail) bla , benchS $ nf (L.length . L.tail) sa ] , bgroup "toLower" [ benchT $ nf (T.length . T.toLower) ta , benchTL $ nf (TL.length . TL.toLower) tla , benchBS $ nf (BS.length . BS.map toLower) bsa , benchBSL $ nf (BL.length . BL.map toLower) bla , benchS $ nf (L.length . L.map toLower) sa ] , bgroup "toUpper" [ benchT $ nf (T.length . T.toUpper) ta , benchTL $ nf (TL.length . TL.toUpper) tla , benchBS $ nf (BS.length . BS.map toUpper) bsa , benchBSL $ nf (BL.length . BL.map toUpper) bla , benchS $ nf (L.length . L.map toUpper) sa ] , bgroup "words" [ benchT $ nf (L.length . T.words) ta , benchTL $ nf (L.length . TL.words) tla , benchBS $ nf (L.length . BS.words) bsa , benchBSL $ nf (L.length . BL.words) bla , benchS $ nf (L.length . L.words) sa ] , bgroup "zipWith" [ benchT $ nf (T.length . T.zipWith min tb) ta , benchTL $ nf (TL.length . TL.zipWith min tlb) tla , benchBS $ nf (L.length . BS.zipWith min bsb) bsa , benchBSL $ nf (L.length . BL.zipWith min blb) bla , benchS $ nf (L.length . L.zipWith min sb) sa ] ] , bgroup "Builder" [ bench "mappend char" $ nf (TL.length . TB.toLazyText . mappendNChar 'a') 10000 , bench "mappend 8 char" $ nf (TL.length . TB.toLazyText . mappend8Char) 'a' , bench "mappend text" $ nf (TL.length . TB.toLazyText . 
mappendNText short) 10000 ] ] where benchS = bench ("String+" ++ kind) benchT = bench ("Text+" ++ kind) benchTL = bench ("LazyText+" ++ kind) benchBS = bench ("ByteString+" ++ kind) benchBSL = bench ("LazyByteString+" ++ kind) c = 'й' p0 = (== c) p1 = (/= 'д') lw = "право" bsw = UTF8.fromString lw blw = BL.fromChunks [bsw] tsw = T.pack lw tlw = TL.fromChunks [tsw] f (C# c#) = C# (chr# (ord# c# +# 1#)) g (I# i#) (C# c#) = (I# (i# +# 1#), C# (chr# (ord# c# +# i#))) len l _ = l + (1::Int) replicat n = concat . L.replicate n short = T.pack "short" #if !MIN_VERSION_bytestring(0,10,0) instance NFData BS.ByteString instance NFData BL.ByteString where rnf BL.Empty = () rnf (BL.Chunk _ ts) = rnf ts #endif data B where B :: NFData a => a -> B instance NFData B where rnf (B b) = rnf b -- | Split a bytestring in chunks -- chunksOf :: Int -> BS.ByteString -> [BS.ByteString] chunksOf k = go where go t = case BS.splitAt k t of (a,b) | BS.null a -> [] | otherwise -> a : go b -- | Append a character n times -- mappendNChar :: Char -> Int -> TB.Builder mappendNChar c n = go 0 where go i | i < n = TB.singleton c `mappend` go (i+1) | otherwise = mempty -- | Gives more opportunity for inlining and elimination of unnecesary -- bounds checks. -- mappend8Char :: Char -> TB.Builder mappend8Char c = TB.singleton c `mappend` TB.singleton c `mappend` TB.singleton c `mappend` TB.singleton c `mappend` TB.singleton c `mappend` TB.singleton c `mappend` TB.singleton c `mappend` TB.singleton c -- | Append a text N times -- mappendNText :: T.Text -> Int -> TB.Builder mappendNText t n = go 0 where go i | i < n = TB.fromText t `mappend` go (i+1) | otherwise = mempty text-0.11.3.1/benchmarks/haskell/Benchmarks/ReadNumbers.hs0000644000000000000000000000646312140332415021471 0ustar0000000000000000-- | Read numbers from a file with a just a number on each line, find the -- minimum of those numbers. The file contains different kinds of numbers: -- -- * Decimals -- -- * Hexadecimals -- -- * Floating point numbers -- -- * Floating point numbers in scientific notation -- -- The different benchmarks will only take into account the values they can -- parse. -- -- Tested in this benchmark: -- -- * Lexing/parsing of different numerical types -- module Benchmarks.ReadNumbers ( benchmark ) where import Criterion (Benchmark, bgroup, bench, whnf) import Data.List (foldl') import Numeric (readDec, readFloat, readHex) import qualified Data.ByteString.Char8 as B import qualified Data.ByteString.Lazy.Char8 as BL import qualified Data.ByteString.Lex.Double as B import qualified Data.ByteString.Lex.Lazy.Double as BL import qualified Data.Text as T import qualified Data.Text.IO as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.IO as TL import qualified Data.Text.Lazy.Read as TL import qualified Data.Text.Read as T benchmark :: FilePath -> IO Benchmark benchmark fp = do -- Read all files into lines: string, text, lazy text, bytestring, lazy -- bytestring s <- lines `fmap` readFile fp t <- T.lines `fmap` T.readFile fp tl <- TL.lines `fmap` TL.readFile fp b <- B.lines `fmap` B.readFile fp bl <- BL.lines `fmap` BL.readFile fp return $ bgroup "ReadNumbers" [ bench "DecimalString" $ whnf (int . string readDec) s , bench "HexadecimalString" $ whnf (int . string readHex) s , bench "DoubleString" $ whnf (double . string readFloat) s , bench "DecimalText" $ whnf (int . text (T.signed T.decimal)) t , bench "HexadecimalText" $ whnf (int . text (T.signed T.hexadecimal)) t , bench "DoubleText" $ whnf (double . 
text T.double) t , bench "RationalText" $ whnf (double . text T.rational) t , bench "DecimalLazyText" $ whnf (int . text (TL.signed TL.decimal)) tl , bench "HexadecimalLazyText" $ whnf (int . text (TL.signed TL.hexadecimal)) tl , bench "DoubleLazyText" $ whnf (double . text TL.double) tl , bench "RationalLazyText" $ whnf (double . text TL.rational) tl , bench "DecimalByteString" $ whnf (int . byteString B.readInt) b , bench "DoubleByteString" $ whnf (double . byteString B.readDouble) b , bench "DecimalLazyByteString" $ whnf (int . byteString BL.readInt) bl , bench "DoubleLazyByteString" $ whnf (double . byteString BL.readDouble) bl ] where -- Used for fixing types int :: Int -> Int int = id double :: Double -> Double double = id string :: (Ord a, Num a) => (t -> [(a, t)]) -> [t] -> a string reader = foldl' go 1000000 where go z t = case reader t of [(n, _)] -> min n z _ -> z text :: (Ord a, Num a) => (t -> Either String (a,t)) -> [t] -> a text reader = foldl' go 1000000 where go z t = case reader t of Left _ -> z Right (n, _) -> min n z byteString :: (Ord a, Num a) => (t -> Maybe (a,t)) -> [t] -> a byteString reader = foldl' go 1000000 where go z t = case reader t of Nothing -> z Just (n, _) -> min n z
text-0.11.3.1/benchmarks/haskell/Benchmarks/Replace.hs
-- | Replace a string by another string -- -- Tested in this benchmark: -- -- * Search and replace of a pattern in a text -- module Benchmarks.Replace ( benchmark ) where import Criterion (Benchmark, bgroup, bench, nf) import qualified Data.ByteString.Char8 as B import qualified Data.ByteString.Lazy as BL import qualified Data.ByteString.Lazy.Search as BL import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Encoding as TL import qualified Data.Text.Lazy.IO as TL benchmark :: FilePath -> String -> String -> IO Benchmark benchmark fp pat sub = do tl <- TL.readFile fp bl <- BL.readFile fp return $ bgroup "Replace" [ bench "LazyText" $ nf (TL.length . TL.replace tpat tsub) tl , bench "LazyByteString" $ nf (BL.length . BL.replace bpat bsub) bl ] where tpat = TL.pack pat tsub = TL.pack sub bpat = B.concat $ BL.toChunks $ TL.encodeUtf8 tpat bsub = B.concat $ BL.toChunks $ TL.encodeUtf8 tsub
text-0.11.3.1/benchmarks/haskell/Benchmarks/Search.hs
-- | Search for a pattern in a file, find the number of occurrences -- -- Tested in this benchmark: -- -- * Searching all occurrences of a pattern using library routines -- module Benchmarks.Search ( benchmark ) where import Criterion (Benchmark, bench, bgroup, whnf) import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as BL import qualified Data.ByteString.Lazy.Search as BL import qualified Data.ByteString.Search as B import qualified Data.Text as T import qualified Data.Text.Encoding as T import qualified Data.Text.IO as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.IO as TL benchmark :: FilePath -> T.Text -> IO Benchmark benchmark fp needleT = do b <- B.readFile fp bl <- BL.readFile fp t <- T.readFile fp tl <- TL.readFile fp return $ bgroup "FileIndices" [ bench "ByteString" $ whnf (byteString needleB) b , bench "LazyByteString" $ whnf (lazyByteString needleB) bl , bench "Text" $ whnf (text needleT) t , bench "LazyText" $ whnf (lazyText needleTL) tl ] where needleB = T.encodeUtf8 needleT needleTL = TL.fromChunks [needleT] byteString :: B.ByteString -> B.ByteString -> Int byteString needle = length . 
B.indices needle lazyByteString :: B.ByteString -> BL.ByteString -> Int lazyByteString needle = length . BL.indices needle text :: T.Text -> T.Text -> Int text = T.count lazyText :: TL.Text -> TL.Text -> Int lazyText needle = fromIntegral . TL.count needle text-0.11.3.1/benchmarks/haskell/Benchmarks/Stream.hs0000644000000000000000000000636512140332415020516 0ustar0000000000000000-- | This module contains a number of benchmarks for the different streaming -- functions -- -- Tested in this benchmark: -- -- * Most streaming functions -- {-# LANGUAGE BangPatterns #-} {-# OPTIONS_GHC -fno-warn-orphans #-} module Benchmarks.Stream ( benchmark ) where import Control.DeepSeq (NFData (..)) import Criterion (Benchmark, bgroup, bench, nf) import Data.Text.Fusion.Internal (Step (..), Stream (..)) import qualified Data.Text.Encoding as T import qualified Data.Text.Encoding.Error as E import qualified Data.Text.Encoding.Fusion as T import qualified Data.Text.Encoding.Fusion.Common as F import qualified Data.Text.Fusion as T import qualified Data.Text.IO as T import qualified Data.Text.Lazy.Encoding as TL import qualified Data.Text.Lazy.Encoding.Fusion as TL import qualified Data.Text.Lazy.Fusion as TL import qualified Data.Text.Lazy.IO as TL instance NFData a => NFData (Stream a) where -- Currently, this implementation does not force evaluation of the size hint rnf (Stream next s0 _) = go s0 where go !s = case next s of Done -> () Skip s' -> go s' Yield x s' -> rnf x `seq` go s' benchmark :: FilePath -> IO Benchmark benchmark fp = do -- Different formats t <- T.readFile fp let !utf8 = T.encodeUtf8 t !utf16le = T.encodeUtf16LE t !utf16be = T.encodeUtf16BE t !utf32le = T.encodeUtf32LE t !utf32be = T.encodeUtf32BE t -- Once again for the lazy variants tl <- TL.readFile fp let !utf8L = TL.encodeUtf8 tl !utf16leL = TL.encodeUtf16LE tl !utf16beL = TL.encodeUtf16BE tl !utf32leL = TL.encodeUtf32LE tl !utf32beL = TL.encodeUtf32BE tl -- For the functions which operate on streams let !s = T.stream t return $ bgroup "Stream" -- Fusion [ bgroup "stream" $ [ bench "Text" $ nf T.stream t , bench "LazyText" $ nf TL.stream tl ] -- Encoding.Fusion , bgroup "streamUtf8" [ bench "Text" $ nf (T.streamUtf8 E.lenientDecode) utf8 , bench "LazyText" $ nf (TL.streamUtf8 E.lenientDecode) utf8L ] , bgroup "streamUtf16LE" [ bench "Text" $ nf (T.streamUtf16LE E.lenientDecode) utf16le , bench "LazyText" $ nf (TL.streamUtf16LE E.lenientDecode) utf16leL ] , bgroup "streamUtf16BE" [ bench "Text" $ nf (T.streamUtf16BE E.lenientDecode) utf16be , bench "LazyText" $ nf (TL.streamUtf16BE E.lenientDecode) utf16beL ] , bgroup "streamUtf32LE" [ bench "Text" $ nf (T.streamUtf32LE E.lenientDecode) utf32le , bench "LazyText" $ nf (TL.streamUtf32LE E.lenientDecode) utf32leL ] , bgroup "streamUtf32BE" [ bench "Text" $ nf (T.streamUtf32BE E.lenientDecode) utf32be , bench "LazyText" $ nf (TL.streamUtf32BE E.lenientDecode) utf32beL ] -- Encoding.Fusion.Common , bench "restreamUtf16LE" $ nf F.restreamUtf16LE s , bench "restreamUtf16BE" $ nf F.restreamUtf16BE s , bench "restreamUtf32LE" $ nf F.restreamUtf32LE s , bench "restreamUtf32BE" $ nf F.restreamUtf32BE s ] text-0.11.3.1/benchmarks/haskell/Benchmarks/WordFrequencies.hs0000644000000000000000000000201112140332415022350 0ustar0000000000000000-- | A word frequency count using the different string types -- -- Tested in this benchmark: -- -- * Splitting into words -- -- * Converting to lowercase -- -- * Comparing: Eq/Ord instances -- module Benchmarks.WordFrequencies ( benchmark ) where import Criterion 
(Benchmark, bench, bgroup, whnf) import Data.Char (toLower) import Data.List (foldl') import Data.Map (Map) import qualified Data.ByteString.Char8 as B import qualified Data.Map as M import qualified Data.Text as T import qualified Data.Text.IO as T benchmark :: FilePath -> IO Benchmark benchmark fp = do s <- readFile fp b <- B.readFile fp t <- T.readFile fp return $ bgroup "WordFrequencies" [ bench "String" $ whnf (frequencies . words . map toLower) s , bench "ByteString" $ whnf (frequencies . B.words . B.map toLower) b , bench "Text" $ whnf (frequencies . T.words . T.toLower) t ] frequencies :: Ord a => [a] -> Map a Int frequencies = foldl' (\m k -> M.insertWith (+) k 1 m) M.empty text-0.11.3.1/benchmarks/python/0000755000000000000000000000000012140332415014536 5ustar0000000000000000text-0.11.3.1/benchmarks/python/cut.py0000644000000000000000000000044412140332415015705 0ustar0000000000000000#!/usr/bin/env python import utils, sys, codecs def cut(filename, l, r): content = open(filename, encoding='utf-8') for line in content: print(line[l:r]) for f in sys.argv[1:]: t = utils.benchmark(lambda: cut(f, 20, 40)) sys.stderr.write('{0}: {1}\n'.format(f, t)) text-0.11.3.1/benchmarks/python/multilang.py0000755000000000000000000000200612140332415017105 0ustar0000000000000000#!/usr/bin/env python import math import sys import time def find_first(): cf = contents.find return timer(lambda: cf("en:Benin")) def timer(f, count=100): a = 1e300 def g(): return for i in xrange(3): start = time.time() for j in xrange(count): g() a = min(a, (time.time() - start) / count) b = 1e300 for i in xrange(3): start = time.time() for j in xrange(count): f() b = min(b, (time.time() - start) / count) return round(b - a, int(round(math.log(count, 10) - math.log(b - a, 10)))) contents = open('../../tests/text-test-data/yiwiki.xml', 'r').read() contents = contents.decode('utf-8') benchmarks = ( find_first, ) to_run = sys.argv[1:] bms = [] if to_run: for r in to_run: for b in benchmarks: if b.__name__.startswith(r): bms.append(b) else: bms = benchmarks for b in bms: sys.stdout.write(b.__name__ + ' ') sys.stdout.flush() print b() text-0.11.3.1/benchmarks/python/sort.py0000644000000000000000000000047312140332415016103 0ustar0000000000000000#!/usr/bin/env python import utils, sys, codecs def sort(filename): content = open(filename, encoding='utf-8').read() lines = content.splitlines() lines.sort() print('\n'.join(lines)) for f in sys.argv[1:]: t = utils.benchmark(lambda: sort(f)) sys.stderr.write('{0}: {1}\n'.format(f, t)) text-0.11.3.1/benchmarks/python/strip_tags.py0000644000000000000000000000067512140332415017277 0ustar0000000000000000#!/usr/bin/env python import utils, sys def strip_tags(filename): string = open(filename, encoding='utf-8').read() d = 0 out = [] for c in string: if c == '<': d += 1 if d > 0: out += ' ' else: out += c if c == '>': d -= 1 print(''.join(out)) for f in sys.argv[1:]: t = utils.benchmark(lambda: strip_tags(f)) sys.stderr.write('{0}: {1}\n'.format(f, t)) text-0.11.3.1/benchmarks/python/utils.py0000755000000000000000000000055112140332415016254 0ustar0000000000000000#!/usr/bin/env python import sys, time def benchmark_once(f): start = time.time() f() end = time.time() return end - start def benchmark(f): runs = 100 total = 0.0 for i in range(runs): result = benchmark_once(f) sys.stderr.write('Run {0}: {1}\n'.format(i, result)) total += result return total / runs text-0.11.3.1/benchmarks/ruby/0000755000000000000000000000000012140332415014176 
text-0.11.3.1/benchmarks/ruby/cut.rb
#!/usr/bin/env ruby require './utils.rb' def cut(filename, l, r) File.open(filename, 'r:utf-8') do |file| file.each_line do |line| puts line[l, r - l] end end end ARGV.each do |f| t = benchmark { cut(f, 20, 40) } STDERR.puts "#{f}: #{t}" end
text-0.11.3.1/benchmarks/ruby/fold.rb
#!/usr/bin/env ruby require './utils.rb' def fold(filename, max_width) File.open(filename, 'r:utf-8') do |file| # Words in this paragraph paragraph = [] file.each_line do |line| # If we encounter an empty line, we reformat and dump the current # paragraph if line.strip.empty? puts fold_paragraph(paragraph, max_width) puts paragraph = [] # Otherwise, we append the words found in the line to the paragraph else paragraph.concat line.split end end # Last paragraph puts fold_paragraph(paragraph, max_width) unless paragraph.empty? end end # Fold a single paragraph to the desired width def fold_paragraph(paragraph, max_width) # Gradually build our output str, *rest = paragraph width = str.length rest.each do |word| if width + word.length + 1 <= max_width str << ' ' << word width += word.length + 1 else str << "\n" << word width = word.length end end str end ARGV.each do |f| t = benchmark { fold(f, 80) } STDERR.puts "#{f}: #{t}" end
text-0.11.3.1/benchmarks/ruby/sort.rb
#!/usr/bin/env ruby require './utils.rb' def sort(filename) File.open(filename, 'r:utf-8') do |file| content = file.read puts content.lines.sort.join end end ARGV.each do |f| t = benchmark { sort(f) } STDERR.puts "#{f}: #{t}" end
text-0.11.3.1/benchmarks/ruby/strip_tags.rb
#!/usr/bin/env ruby require './utils.rb' def strip_tags(filename) File.open(filename, 'r:utf-8') do |file| str = file.read d = 0 str.each_char do |c| d += 1 if c == '<' putc(if d > 0 then ' ' else c end) d -= 1 if c == '>' end end end ARGV.each do |f| t = benchmark { strip_tags(f) } STDERR.puts "#{f}: #{t}" end
text-0.11.3.1/benchmarks/ruby/utils.rb
require 'benchmark' def benchmark(&block) runs = 100 total = 0 runs.times do |i| result = Benchmark.measure(&block).total $stderr.puts "Run #{i}: #{result}" total += result end total / runs end
text-0.11.3.1/cbits/cbits.c
/* * Copyright (c) 2011 Bryan O'Sullivan . * * Portions copyright (c) 2008-2010 Björn Höhrmann . * * See http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ for details. */ #include <string.h> #include <stddef.h> #include <stdint.h> void _hs_text_memcpy(void *dest, size_t doff, const void *src, size_t soff, size_t n) { memcpy(dest + (doff<<1), src + (soff<<1), n<<1); } int _hs_text_memcmp(const void *a, size_t aoff, const void *b, size_t boff, size_t n) { return memcmp(a + (aoff<<1), b + (boff<<1), n<<1); } #define UTF8_ACCEPT 0 #define UTF8_REJECT 12 static const uint8_t utf8d[] = { /* * The first part of the table maps bytes to character classes * to reduce the size of the transition table and create bitmasks. 
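* Class 0 covers the ASCII bytes 00..7f; classes 1, 9 and 7 cover the * continuation-byte ranges 80..8f, 90..9f and a0..bf; the remaining * classes distinguish the lead bytes, with the always-invalid bytes * (c0, c1 and f5..ff) sharing class 8. 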
*/ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 8,8,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 10,3,3,3,3,3,3,3,3,3,3,3,3,4,3,3, 11,6,6,6,5,8,8,8,8,8,8,8,8,8,8,8, /* * The second part is a transition table that maps a combination of * a state of the automaton and a character class to a state. */ 0,12,24,36,60,96,84,12,12,12,48,72, 12,12,12,12,12,12,12,12,12,12,12,12, 12, 0,12,12,12,12,12, 0,12, 0,12,12, 12,24,12,12,12,12,12,24,12,24,12,12, 12,12,12,12,12,12,12,24,12,12,12,12, 12,24,12,12,12,12,12,12,12,24,12,12, 12,12,12,12,12,12,12,36,12,36,12,12, 12,36,12,12,12,12,12,36,12,36,12,12, 12,36,12,12,12,12,12,12,12,12,12,12, }; static inline uint32_t decode(uint32_t *state, uint32_t* codep, uint32_t byte) { uint32_t type = utf8d[byte]; *codep = (*state != UTF8_ACCEPT) ? (byte & 0x3fu) | (*codep << 6) : (0xff >> type) & (byte); return *state = utf8d[256 + *state + type]; } /* * The ISO 8859-1 (aka latin-1) code points correspond exactly to the first 256 unicode * code-points, therefore we can trivially convert from a latin-1 encoded bytestring to * an UTF16 array */ void _hs_text_decode_latin1(uint16_t *dest, const uint8_t const *src, const uint8_t const *srcend) { const uint8_t *p = src; #if defined(__i386__) || defined(__x86_64__) /* This optimization works on a little-endian systems by using (aligned) 32-bit loads instead of 8-bit loads */ /* consume unaligned prefix */ while (p != srcend && (uintptr_t)p & 0x3) *dest++ = *p++; /* iterate over 32-bit aligned loads */ while (p < srcend - 3) { const uint32_t w = *((const uint32_t *)p); *dest++ = w & 0xff; *dest++ = (w >> 8) & 0xff; *dest++ = (w >> 16) & 0xff; *dest++ = (w >> 24) & 0xff; p += 4; } #endif /* handle unaligned suffix */ while (p != srcend) *dest++ = *p++; } /* * A best-effort decoder. Runs until it hits either end of input or * the start of an invalid byte sequence. * * At exit, updates *destoff with the next offset to write to, and * returns the next source offset to read from. */ uint8_t const * _hs_text_decode_utf8(uint16_t *dest, size_t *destoff, const uint8_t const *src, const uint8_t const *srcend) { uint16_t *d = dest + *destoff; const uint8_t const *s = src; uint32_t state = UTF8_ACCEPT; while (s < srcend) { uint32_t codepoint; #if defined(__i386__) || defined(__x86_64__) /* * This code will only work on a little-endian system that * supports unaligned loads. * * It gives a substantial speed win on data that is purely or * partly ASCII (e.g. HTML), at only a slight cost on purely * non-ASCII text. */ if (state == UTF8_ACCEPT) { while (s < srcend - 4) { codepoint = *((uint32_t *) s); if ((codepoint & 0x80808080) != 0) break; s += 4; /* * Tried 32-bit stores here, but the extra bit-twiddling * slowed the code down. 
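     *
     * (On architectures other than i386/x86_64 this fast path is
     * compiled out, and the byte-at-a-time decoder below handles
     * all input.)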
*/ *d++ = (uint16_t) (codepoint & 0xff); *d++ = (uint16_t) ((codepoint >> 8) & 0xff); *d++ = (uint16_t) ((codepoint >> 16) & 0xff); *d++ = (uint16_t) ((codepoint >> 24) & 0xff); } } #endif if (decode(&state, &codepoint, *s++) != UTF8_ACCEPT) { if (state != UTF8_REJECT) continue; break; } if (codepoint <= 0xffff) *d++ = (uint16_t) codepoint; else { *d++ = (uint16_t) (0xD7C0 + (codepoint >> 10)); *d++ = (uint16_t) (0xDC00 + (codepoint & 0x3FF)); } } /* Error recovery - if we're not in a valid finishing state, back up. */ if (state != UTF8_ACCEPT) s -= 1; *destoff = d - dest; return s; } text-0.11.3.1/Data/0000755000000000000000000000000012140332415011751 5ustar0000000000000000text-0.11.3.1/Data/Text.hs0000644000000000000000000015151412140332415013240 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, MagicHash, Rank2Types, UnboxedTuples #-} {-# OPTIONS_GHC -fno-warn-orphans #-} #if __GLASGOW_HASKELL__ >= 702 {-# LANGUAGE Trustworthy #-} #endif -- | -- Module : Data.Text -- Copyright : (c) 2009, 2010, 2011, 2012 Bryan O'Sullivan, -- (c) 2009 Duncan Coutts, -- (c) 2008, 2009 Tom Harper -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- A time and space-efficient implementation of Unicode text. -- Suitable for performance critical use, both in terms of large data -- quantities and high speed. -- -- /Note/: Read below the synopsis for important notes on the use of -- this module. -- -- This module is intended to be imported @qualified@, to avoid name -- clashes with "Prelude" functions, e.g. -- -- > import qualified Data.Text as T -- -- To use an extended and very rich family of functions for working -- with Unicode text (including normalization, regular expressions, -- non-standard encodings, text breaking, and locales), see the -- @text-icu@ package: module Data.Text ( -- * Strict vs lazy types -- $strict -- * Acceptable data -- $replacement -- * Fusion -- $fusion -- * Types Text -- * Creation and elimination , pack , unpack , singleton , empty -- * Basic interface , cons , snoc , append , uncons , head , last , tail , init , null , length , compareLength -- * Transformations , map , intercalate , intersperse , transpose , reverse , replace -- ** Case conversion -- $case , toCaseFold , toLower , toUpper -- ** Justification , justifyLeft , justifyRight , center -- * Folds , foldl , foldl' , foldl1 , foldl1' , foldr , foldr1 -- ** Special folds , concat , concatMap , any , all , maximum , minimum -- * Construction -- ** Scans , scanl , scanl1 , scanr , scanr1 -- ** Accumulating maps , mapAccumL , mapAccumR -- ** Generation and unfolding , replicate , unfoldr , unfoldrN -- * Substrings -- ** Breaking strings , take , drop , takeWhile , dropWhile , dropWhileEnd , dropAround , strip , stripStart , stripEnd , splitAt , breakOn , breakOnEnd , break , span , group , groupBy , inits , tails -- ** Breaking into many substrings -- $split , splitOn , split , chunksOf -- ** Breaking into lines and words , lines --, lines' , words , unlines , unwords -- * Predicates , isPrefixOf , isSuffixOf , isInfixOf -- ** View patterns , stripPrefix , stripSuffix , commonPrefixes -- * Searching , filter , breakOnAll , find , partition -- , findSubstring -- * Indexing -- $index , index , findIndex , count -- * Zipping and unzipping , zip , zipWith -- -* Ordered text -- , sort -- * Low level operations , copy ) where import Prelude (Char, Bool(..), Int, Maybe(..), String, Eq(..), Ord(..), Ordering(..), 
(++), Read(..), Show(..), (&&), (||), (+), (-), (.), ($), ($!), (>>), (*), maxBound, not, return, otherwise, quot) #if defined(HAVE_DEEPSEQ) import Control.DeepSeq (NFData) #endif #if defined(ASSERTS) import Control.Exception (assert) #endif import Data.Char (isSpace) import Data.Data (Data(gfoldl, toConstr, gunfold, dataTypeOf)) #if __GLASGOW_HASKELL__ >= 612 import Data.Data (mkNoRepType) #else import Data.Data (mkNorepType) #endif import Control.Monad (foldM) import qualified Data.Text.Array as A import qualified Data.List as L import Data.Monoid (Monoid(..)) import Data.String (IsString(..)) import qualified Data.Text.Fusion as S import qualified Data.Text.Fusion.Common as S import Data.Text.Fusion (stream, reverseStream, unstream) import Data.Text.Private (span_) import Data.Text.Internal (Text(..), empty, firstf, safe, text, textP) import qualified Prelude as P import Data.Text.Unsafe (Iter(..), iter, iter_, lengthWord16, reverseIter, unsafeHead, unsafeTail) import Data.Text.UnsafeChar (unsafeChr) import qualified Data.Text.Util as U import qualified Data.Text.Encoding.Utf16 as U16 import Data.Text.Search (indices) #if defined(__HADDOCK__) import Data.ByteString (ByteString) import qualified Data.Text.Lazy as L import Data.Int (Int64) #endif #if __GLASGOW_HASKELL__ >= 702 import qualified GHC.CString as GHC #else import qualified GHC.Base as GHC #endif import GHC.Prim (Addr#) -- $strict -- -- This package provides both strict and lazy 'Text' types. The -- strict type is provided by the 'Data.Text' package, while the lazy -- type is provided by the 'Data.Text.Lazy' package. Internally, the -- lazy @Text@ type consists of a list of strict chunks. -- -- The strict 'Text' type requires that an entire string fit into -- memory at once. The lazy @Text@ type is capable of streaming -- strings that are larger than memory using a small memory footprint. -- In many cases, the overhead of chunked streaming makes the lazy -- @Text@ type slower than its strict counterpart, but this is not -- always the case. Sometimes, the time complexity of a function in -- one module may be different from the other, due to their differing -- internal structures. -- -- Each module provides an almost identical API, with the main -- difference being that the strict module uses 'Int' values for -- lengths and counts, while the lazy module uses 'Int64' lengths. -- $replacement -- -- A 'Text' value is a sequence of Unicode scalar values, as defined -- in §3.9, definition D76 of the Unicode 5.2 standard: -- . As -- such, a 'Text' cannot contain values in the range U+D800 to U+DFFF -- inclusive. Haskell implementations admit all Unicode code points -- (§3.4, definition D10) as 'Char' values, including code points -- from this invalid range. This means that there are some 'Char' -- values that are not valid Unicode scalar values, and the functions -- in this module must handle those cases. -- -- Within this module, many functions construct a 'Text' from one or -- more 'Char' values. Those functions will substitute 'Char' values -- that are not valid Unicode scalar values with the replacement -- character \"�\" (U+FFFD). Functions that perform this -- inspection and replacement are documented with the phrase -- \"Performs replacement on invalid scalar values\". -- -- (One reason for this policy of replacement is that internally, a -- 'Text' value is represented as packed UTF-16 data. Values in the -- range U+D800 through U+DFFF are used by UTF-16 to denote surrogate -- code points, and so cannot be represented. 
The functions replace -- invalid scalar values, instead of dropping them, as a security -- measure. For details, see Unicode Technical Report 36, §3.5: -- ) -- $fusion -- -- Most of the functions in this module are subject to /fusion/, -- meaning that a pipeline of such functions will usually allocate at -- most one 'Text' value. -- -- As an example, consider the following pipeline: -- -- > import Data.Text as T -- > import Data.Text.Encoding as E -- > import Data.ByteString (ByteString) -- > -- > countChars :: ByteString -> Int -- > countChars = T.length . T.toUpper . E.decodeUtf8 -- -- From the type signatures involved, this looks like it should -- allocate one 'ByteString' value, and two 'Text' values. However, -- when a module is compiled with optimisation enabled under GHC, the -- two intermediate 'Text' values will be optimised away, and the -- function will be compiled down to a single loop over the source -- 'ByteString'. -- -- Functions that can be fused by the compiler are documented with the -- phrase \"Subject to fusion\". instance Eq Text where Text arrA offA lenA == Text arrB offB lenB | lenA == lenB = A.equal arrA offA arrB offB lenA | otherwise = False {-# INLINE (==) #-} instance Ord Text where compare = compareText instance Show Text where showsPrec p ps r = showsPrec p (unpack ps) r instance Read Text where readsPrec p str = [(pack x,y) | (x,y) <- readsPrec p str] instance Monoid Text where mempty = empty mappend = append mconcat = concat instance IsString Text where fromString = pack #if defined(HAVE_DEEPSEQ) instance NFData Text #endif -- This instance preserves data abstraction at the cost of inefficiency. -- We omit reflection services for the sake of data abstraction. -- -- This instance was created by copying the behavior of Data.Set and -- Data.Map. If you feel a mistake has been made, please feel free to -- submit improvements. -- -- Original discussion is archived here: -- "could we get a Data instance for Data.Text.Text?" -- http://groups.google.com/group/haskell-cafe/browse_thread/thread/b5bbb1b28a7e525d/0639d46852575b93 instance Data Text where gfoldl f z txt = z pack `f` (unpack txt) toConstr _ = P.error "Data.Text.Text.toConstr" gunfold _ _ = P.error "Data.Text.Text.gunfold" #if __GLASGOW_HASKELL__ >= 612 dataTypeOf _ = mkNoRepType "Data.Text.Text" #else dataTypeOf _ = mkNorepType "Data.Text.Text" #endif -- | /O(n)/ Compare two 'Text' values lexicographically. compareText :: Text -> Text -> Ordering compareText ta@(Text _arrA _offA lenA) tb@(Text _arrB _offB lenB) | lenA == 0 && lenB == 0 = EQ | otherwise = go 0 0 where go !i !j | i >= lenA || j >= lenB = compare lenA lenB | a < b = LT | a > b = GT | otherwise = go (i+di) (j+dj) where Iter a di = iter ta i Iter b dj = iter tb j -- ----------------------------------------------------------------------------- -- * Conversion to/from 'Text' -- | /O(n)/ Convert a 'String' into a 'Text'. Subject to -- fusion. Performs replacement on invalid scalar values. pack :: String -> Text pack = unstream . S.map safe . S.streamList {-# INLINE [1] pack #-} -- | /O(n)/ Convert a Text into a String. Subject to fusion. unpack :: Text -> String unpack = S.unstreamList . stream {-# INLINE [1] unpack #-} -- | /O(n)/ Convert a literal string into a Text. Subject to fusion. unpackCString# :: Addr# -> Text unpackCString# addr# = unstream (S.streamCString# addr#) {-# NOINLINE unpackCString# #-} {-# RULES "TEXT literal" forall a. 
unstream (S.map safe (S.streamList (GHC.unpackCString# a))) = unpackCString# a #-} {-# RULES "TEXT literal UTF8" forall a. unstream (S.map safe (S.streamList (GHC.unpackCStringUtf8# a))) = unpackCString# a #-} -- | /O(1)/ Convert a character into a Text. Subject to fusion. -- Performs replacement on invalid scalar values. singleton :: Char -> Text singleton = unstream . S.singleton . safe {-# INLINE [1] singleton #-} -- ----------------------------------------------------------------------------- -- * Basic functions -- | /O(n)/ Adds a character to the front of a 'Text'. This function -- is more costly than its 'List' counterpart because it requires -- copying a new array. Subject to fusion. Performs replacement on -- invalid scalar values. cons :: Char -> Text -> Text cons c t = unstream (S.cons (safe c) (stream t)) {-# INLINE cons #-} infixr 5 `cons` -- | /O(n)/ Adds a character to the end of a 'Text'. This copies the -- entire array in the process, unless fused. Subject to fusion. -- Performs replacement on invalid scalar values. snoc :: Text -> Char -> Text snoc t c = unstream (S.snoc (stream t) (safe c)) {-# INLINE snoc #-} -- | /O(n)/ Appends one 'Text' to the other by copying both of them -- into a new 'Text'. Subject to fusion. append :: Text -> Text -> Text append a@(Text arr1 off1 len1) b@(Text arr2 off2 len2) | len1 == 0 = b | len2 == 0 = a | len > 0 = Text (A.run x) 0 len | otherwise = overflowError "append" where len = len1+len2 x = do arr <- A.new len A.copyI arr 0 arr1 off1 len1 A.copyI arr len1 arr2 off2 len return arr {-# INLINE append #-} {-# RULES "TEXT append -> fused" [~1] forall t1 t2. append t1 t2 = unstream (S.append (stream t1) (stream t2)) "TEXT append -> unfused" [1] forall t1 t2. unstream (S.append (stream t1) (stream t2)) = append t1 t2 #-} -- | /O(1)/ Returns the first character of a 'Text', which must be -- non-empty. Subject to fusion. head :: Text -> Char head t = S.head (stream t) {-# INLINE head #-} -- | /O(1)/ Returns the first character and rest of a 'Text', or -- 'Nothing' if empty. Subject to fusion. uncons :: Text -> Maybe (Char, Text) uncons t@(Text arr off len) | len <= 0 = Nothing | otherwise = Just (c, textP arr (off+d) (len-d)) where Iter c d = iter t 0 {-# INLINE [1] uncons #-} -- | Lifted from Control.Arrow and specialized. second :: (b -> c) -> (a,b) -> (a,c) second f (a, b) = (a, f b) -- | /O(1)/ Returns the last character of a 'Text', which must be -- non-empty. Subject to fusion. last :: Text -> Char last (Text arr off len) | len <= 0 = emptyError "last" | n < 0xDC00 || n > 0xDFFF = unsafeChr n | otherwise = U16.chr2 n0 n where n = A.unsafeIndex arr (off+len-1) n0 = A.unsafeIndex arr (off+len-2) {-# INLINE [1] last #-} {-# RULES "TEXT last -> fused" [~1] forall t. last t = S.last (stream t) "TEXT last -> unfused" [1] forall t. S.last (stream t) = last t #-} -- | /O(1)/ Returns all characters after the head of a 'Text', which -- must be non-empty. Subject to fusion. tail :: Text -> Text tail t@(Text arr off len) | len <= 0 = emptyError "tail" | otherwise = textP arr (off+d) (len-d) where d = iter_ t 0 {-# INLINE [1] tail #-} {-# RULES "TEXT tail -> fused" [~1] forall t. tail t = unstream (S.tail (stream t)) "TEXT tail -> unfused" [1] forall t. unstream (S.tail (stream t)) = tail t #-} -- | /O(1)/ Returns all but the last character of a 'Text', which must -- be non-empty. Subject to fusion. 
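--
-- Examples:
--
-- > init "serpentine" == "serpentin"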
init :: Text -> Text
init (Text arr off len)
    | len <= 0 = emptyError "init"
    | n >= 0xDC00 && n <= 0xDFFF = textP arr off (len-2)
    | otherwise = textP arr off (len-1)
    where
      n = A.unsafeIndex arr (off+len-1)
{-# INLINE [1] init #-}

{-# RULES
"TEXT init -> fused" [~1] forall t.
    init t = unstream (S.init (stream t))
"TEXT init -> unfused" [1] forall t.
    unstream (S.init (stream t)) = init t
 #-}

-- | /O(1)/ Tests whether a 'Text' is empty or not.  Subject to
-- fusion.
null :: Text -> Bool
null (Text _arr _off len) =
#if defined(ASSERTS)
    assert (len >= 0) $
#endif
    len <= 0
{-# INLINE [1] null #-}

{-# RULES
"TEXT null -> fused" [~1] forall t.
    null t = S.null (stream t)
"TEXT null -> unfused" [1] forall t.
    S.null (stream t) = null t
 #-}

-- | /O(1)/ Tests whether a 'Text' contains exactly one character.
-- Subject to fusion.
isSingleton :: Text -> Bool
isSingleton = S.isSingleton . stream
{-# INLINE isSingleton #-}

-- | /O(n)/ Returns the number of characters in a 'Text'.
-- Subject to fusion.
length :: Text -> Int
length t = S.length (stream t)
{-# INLINE length #-}

-- | /O(n)/ Compare the count of characters in a 'Text' to a number.
-- Subject to fusion.
--
-- This function gives the same answer as comparing against the result
-- of 'length', but can short circuit if the count of characters is
-- greater than the number, and hence be more efficient.
compareLength :: Text -> Int -> Ordering
compareLength t n = S.compareLengthI (stream t) n
{-# INLINE [1] compareLength #-}

{-# RULES
"TEXT compareN/length -> compareLength" [~1] forall t n.
    compare (length t) n = compareLength t n
  #-}

{-# RULES
"TEXT ==N/length -> compareLength/==EQ" [~1] forall t n.
    (==) (length t) n = compareLength t n == EQ
  #-}

{-# RULES
"TEXT /=N/length -> compareLength//=EQ" [~1] forall t n.
    (/=) (length t) n = compareLength t n /= EQ
  #-}

{-# RULES
"TEXT <N/length -> compareLength/==LT" [~1] forall t n.
    (<) (length t) n = compareLength t n == LT
  #-}

{-# RULES
"TEXT <=N/length -> compareLength//=GT" [~1] forall t n.
    (<=) (length t) n = compareLength t n /= GT
  #-}

{-# RULES
"TEXT >N/length -> compareLength/==GT" [~1] forall t n.
    (>) (length t) n = compareLength t n == GT
  #-}

{-# RULES
"TEXT >=N/length -> compareLength//=LT" [~1] forall t n.
    (>=) (length t) n = compareLength t n /= LT
  #-}

-- -----------------------------------------------------------------------------
-- * Transformations
-- | /O(n)/ 'map' @f@ @t@ is the 'Text' obtained by applying @f@ to
-- each element of @t@.  Subject to fusion.  Performs replacement on
-- invalid scalar values.
map :: (Char -> Char) -> Text -> Text
map f t = unstream (S.map (safe . f) (stream t))
{-# INLINE [1] map #-}

-- | /O(n)/ The 'intercalate' function takes a 'Text' and a list of
-- 'Text's and concatenates the list after interspersing the first
-- argument between each element of the list.
intercalate :: Text -> [Text] -> Text
intercalate t = concat . (U.intersperse t)
{-# INLINE intercalate #-}

-- | /O(n)/ The 'intersperse' function takes a character and places it
-- between the characters of a 'Text'.  Subject to fusion.  Performs
-- replacement on invalid scalar values.
intersperse :: Char -> Text -> Text
intersperse c t = unstream (S.intersperse (safe c) (stream t))
{-# INLINE intersperse #-}

-- | /O(n)/ Reverse the characters of a string. Subject to fusion.
reverse :: Text -> Text
reverse t = S.reverse (stream t)
{-# INLINE reverse #-}

-- | /O(m+n)/ Replace every occurrence of one substring with another.
--
-- In (unlikely) bad cases, this function's time complexity degrades
-- towards /O(n*m)/.
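--
-- Examples:
--
-- > replace "ofo" "bar" "ofofo" == "barfo"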
replace :: Text -- ^ Text to search for -> Text -- ^ Replacement text -> Text -- ^ Input text -> Text replace s d = intercalate d . splitOn s {-# INLINE replace #-} -- ---------------------------------------------------------------------------- -- ** Case conversions (folds) -- $case -- -- When case converting 'Text' values, do not use combinators like -- @map toUpper@ to case convert each character of a string -- individually, as this gives incorrect results according to the -- rules of some writing systems. The whole-string case conversion -- functions from this module, such as @toUpper@, obey the correct -- case conversion rules. As a result, these functions may map one -- input character to two or three output characters. For examples, -- see the documentation of each function. -- -- /Note/: In some languages, case conversion is a locale- and -- context-dependent operation. The case conversion functions in this -- module are /not/ locale sensitive. Programs that require locale -- sensitivity should use appropriate versions of the case mapping -- functions from the @text-icu@ package: -- -- | /O(n)/ Convert a string to folded case. This function is mainly -- useful for performing caseless (also known as case insensitive) -- string comparisons. -- -- A string @x@ is a caseless match for a string @y@ if and only if: -- -- @toCaseFold x == toCaseFold y@ -- -- The result string may be longer than the input string, and may -- differ from applying 'toLower' to the input string. For instance, -- the Armenian small ligature \"ﬓ\" (men now, U+FB13) is case -- folded to the sequence \"մ\" (men, U+0574) followed by -- \"ն\" (now, U+0576), while the Greek \"µ\" (micro sign, -- U+00B5) is case folded to \"μ\" (small letter mu, U+03BC) -- instead of itself. toCaseFold :: Text -> Text toCaseFold t = unstream (S.toCaseFold (stream t)) {-# INLINE [0] toCaseFold #-} -- | /O(n)/ Convert a string to lower case, using simple case -- conversion. The result string may be longer than the input string. -- For instance, \"İ\" (Latin capital letter I with dot above, -- U+0130) maps to the sequence \"i\" (Latin small letter i, U+0069) followed -- by \" ̇\" (combining dot above, U+0307). toLower :: Text -> Text toLower t = unstream (S.toLower (stream t)) {-# INLINE toLower #-} -- | /O(n)/ Convert a string to upper case, using simple case -- conversion. The result string may be longer than the input string. -- For instance, the German \"ß\" (eszett, U+00DF) maps to the -- two-letter sequence \"SS\". toUpper :: Text -> Text toUpper t = unstream (S.toUpper (stream t)) {-# INLINE toUpper #-} -- | /O(n)/ Left-justify a string to the given length, using the -- specified fill character on the right. Subject to fusion. -- Performs replacement on invalid scalar values. -- -- Examples: -- -- > justifyLeft 7 'x' "foo" == "fooxxxx" -- > justifyLeft 3 'x' "foobar" == "foobar" justifyLeft :: Int -> Char -> Text -> Text justifyLeft k c t | len >= k = t | otherwise = t `append` replicateChar (k-len) c where len = length t {-# INLINE [1] justifyLeft #-} {-# RULES "TEXT justifyLeft -> fused" [~1] forall k c t. justifyLeft k c t = unstream (S.justifyLeftI k c (stream t)) "TEXT justifyLeft -> unfused" [1] forall k c t. unstream (S.justifyLeftI k c (stream t)) = justifyLeft k c t #-} -- | /O(n)/ Right-justify a string to the given length, using the -- specified fill character on the left. Performs replacement on -- invalid scalar values. 
--
-- Examples:
--
-- > justifyRight 7 'x' "bar"    == "xxxxbar"
-- > justifyRight 3 'x' "foobar" == "foobar"
justifyRight :: Int -> Char -> Text -> Text
justifyRight k c t
    | len >= k  = t
    | otherwise = replicateChar (k-len) c `append` t
  where len = length t
{-# INLINE justifyRight #-}

-- | /O(n)/ Center a string to the given length, using the specified
-- fill character on either side.  Performs replacement on invalid
-- scalar values.
--
-- Examples:
--
-- > center 8 'x' "HS" == "xxxHSxxx"
center :: Int -> Char -> Text -> Text
center k c t
    | len >= k  = t
    | otherwise = replicateChar l c `append` t `append` replicateChar r c
  where len = length t
        d   = k - len
        r   = d `quot` 2
        l   = d - r
{-# INLINE center #-}

-- | /O(n)/ The 'transpose' function transposes the rows and columns
-- of its 'Text' argument.  Note that this function uses 'pack',
-- 'unpack', and the list version of transpose, and is thus not very
-- efficient.
transpose :: [Text] -> [Text]
transpose ts = P.map pack (L.transpose (P.map unpack ts))

-- -----------------------------------------------------------------------------
-- * Reducing 'Text's (folds)

-- | /O(n)/ 'foldl', applied to a binary operator, a starting value
-- (typically the left-identity of the operator), and a 'Text',
-- reduces the 'Text' using the binary operator, from left to right.
-- Subject to fusion.
foldl :: (a -> Char -> a) -> a -> Text -> a
foldl f z t = S.foldl f z (stream t)
{-# INLINE foldl #-}

-- | /O(n)/ A strict version of 'foldl'.  Subject to fusion.
foldl' :: (a -> Char -> a) -> a -> Text -> a
foldl' f z t = S.foldl' f z (stream t)
{-# INLINE foldl' #-}

-- | /O(n)/ A variant of 'foldl' that has no starting value argument,
-- and thus must be applied to a non-empty 'Text'.  Subject to fusion.
foldl1 :: (Char -> Char -> Char) -> Text -> Char
foldl1 f t = S.foldl1 f (stream t)
{-# INLINE foldl1 #-}

-- | /O(n)/ A strict version of 'foldl1'.  Subject to fusion.
foldl1' :: (Char -> Char -> Char) -> Text -> Char
foldl1' f t = S.foldl1' f (stream t)
{-# INLINE foldl1' #-}

-- | /O(n)/ 'foldr', applied to a binary operator, a starting value
-- (typically the right-identity of the operator), and a 'Text',
-- reduces the 'Text' using the binary operator, from right to left.
-- Subject to fusion.
foldr :: (Char -> a -> a) -> a -> Text -> a
foldr f z t = S.foldr f z (stream t)
{-# INLINE foldr #-}

-- | /O(n)/ A variant of 'foldr' that has no starting value argument,
-- and thus must be applied to a non-empty 'Text'.  Subject to
-- fusion.
foldr1 :: (Char -> Char -> Char) -> Text -> Char
foldr1 f t = S.foldr1 f (stream t)
{-# INLINE foldr1 #-}

-- -----------------------------------------------------------------------------
-- ** Special folds

-- | /O(n)/ Concatenate a list of 'Text's.
concat :: [Text] -> Text
concat ts = case ts' of
              [] -> empty
              [t] -> t
              _ -> Text (A.run go) 0 len
  where
    ts' = L.filter (not . null) ts
    len = sumP "concat" $ L.map lengthWord16 ts'
    go = do
      arr <- A.new len
      let step i (Text a o l) =
            let !j = i + l in A.copyI arr i a o j >> return j
      foldM step 0 ts' >> return arr

-- | /O(n)/ Map a function over a 'Text' that results in a 'Text', and
-- concatenate the results.
concatMap :: (Char -> Text) -> Text -> Text
concatMap f = concat . foldr ((:) . f) []
{-# INLINE concatMap #-}

-- | /O(n)/ 'any' @p@ @t@ determines whether any character in the
-- 'Text' @t@ satisfies the predicate @p@. Subject to fusion.
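--
-- > any (== 'o') "foo" == True
-- > any (== 'q') "foo" == False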
any :: (Char -> Bool) -> Text -> Bool
any p t = S.any p (stream t)
{-# INLINE any #-}

-- | /O(n)/ 'all' @p@ @t@ determines whether all characters in the
-- 'Text' @t@ satisfy the predicate @p@. Subject to fusion.
all :: (Char -> Bool) -> Text -> Bool
all p t = S.all p (stream t)
{-# INLINE all #-}

-- | /O(n)/ 'maximum' returns the maximum value from a 'Text', which
-- must be non-empty. Subject to fusion.
maximum :: Text -> Char
maximum t = S.maximum (stream t)
{-# INLINE maximum #-}

-- | /O(n)/ 'minimum' returns the minimum value from a 'Text', which
-- must be non-empty. Subject to fusion.
minimum :: Text -> Char
minimum t = S.minimum (stream t)
{-# INLINE minimum #-}

-- -----------------------------------------------------------------------------
-- * Building 'Text's

-- | /O(n)/ 'scanl' is similar to 'foldl', but returns a list of
-- successive reduced values from the left. Subject to fusion.
-- Performs replacement on invalid scalar values.
--
-- > scanl f z [x1, x2, ...] == [z, z `f` x1, (z `f` x1) `f` x2, ...]
--
-- Note that
--
-- > last (scanl f z xs) == foldl f z xs.
scanl :: (Char -> Char -> Char) -> Char -> Text -> Text
scanl f z t = unstream (S.scanl g z (stream t))
    where g a b = safe (f a b)
{-# INLINE scanl #-}

-- | /O(n)/ 'scanl1' is a variant of 'scanl' that has no starting
-- value argument.  Subject to fusion.  Performs replacement on
-- invalid scalar values.
--
-- > scanl1 f [x1, x2, ...] == [x1, x1 `f` x2, ...]
scanl1 :: (Char -> Char -> Char) -> Text -> Text
scanl1 f t | null t    = empty
           | otherwise = scanl f (unsafeHead t) (unsafeTail t)
{-# INLINE scanl1 #-}

-- | /O(n)/ 'scanr' is the right-to-left dual of 'scanl'.  Performs
-- replacement on invalid scalar values.
--
-- > scanr f v == reverse . scanl (flip f) v . reverse
scanr :: (Char -> Char -> Char) -> Char -> Text -> Text
scanr f z = S.reverse . S.reverseScanr g z . reverseStream
    where g a b = safe (f a b)
{-# INLINE scanr #-}

-- | /O(n)/ 'scanr1' is a variant of 'scanr' that has no starting
-- value argument.  Subject to fusion.  Performs replacement on
-- invalid scalar values.
scanr1 :: (Char -> Char -> Char) -> Text -> Text
scanr1 f t | null t    = empty
           | otherwise = scanr f (last t) (init t)
{-# INLINE scanr1 #-}

-- | /O(n)/ Like a combination of 'map' and 'foldl''. Applies a
-- function to each element of a 'Text', passing an accumulating
-- parameter from left to right, and returns a final 'Text'.  Performs
-- replacement on invalid scalar values.
mapAccumL :: (a -> Char -> (a,Char)) -> a -> Text -> (a, Text)
mapAccumL f z0 = S.mapAccumL g z0 . stream
    where g a b = second safe (f a b)
{-# INLINE mapAccumL #-}

-- | The 'mapAccumR' function behaves like a combination of 'map' and
-- a strict 'foldr'; it applies a function to each element of a
-- 'Text', passing an accumulating parameter from right to left, and
-- returning a final value of this accumulator together with the new
-- 'Text'.
-- Performs replacement on invalid scalar values.
mapAccumR :: (a -> Char -> (a,Char)) -> a -> Text -> (a, Text)
mapAccumR f z0 = second reverse . S.mapAccumL g z0 . reverseStream
    where g a b = second safe (f a b)
{-# INLINE mapAccumR #-}

-- -----------------------------------------------------------------------------
-- ** Generating and unfolding 'Text's

-- | /O(n*m)/ 'replicate' @n@ @t@ is a 'Text' consisting of the input
-- @t@ repeated @n@ times.
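--
-- Examples:
--
-- > replicate 3 "na" == "nanana"
-- > replicate 0 "na" == ""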
replicate :: Int -> Text -> Text replicate n t@(Text a o l) | n <= 0 || l <= 0 = empty | n == 1 = t | isSingleton t = replicateChar n (unsafeHead t) | n <= maxBound `quot` l = Text (A.run x) 0 len | otherwise = overflowError "replicate" where len = l * n x = do arr <- A.new len let loop !d !i | i >= n = return arr | otherwise = let m = d + l in A.copyI arr d a o m >> loop m (i+1) loop 0 0 {-# INLINE [1] replicate #-} {-# RULES "TEXT replicate/singleton -> replicateChar" [~1] forall n c. replicate n (singleton c) = replicateChar n c #-} -- | /O(n)/ 'replicateChar' @n@ @c@ is a 'Text' of length @n@ with @c@ the -- value of every element. Subject to fusion. replicateChar :: Int -> Char -> Text replicateChar n c = unstream (S.replicateCharI n (safe c)) {-# INLINE replicateChar #-} -- | /O(n)/, where @n@ is the length of the result. The 'unfoldr' -- function is analogous to the List 'L.unfoldr'. 'unfoldr' builds a -- 'Text' from a seed value. The function takes the element and -- returns 'Nothing' if it is done producing the 'Text', otherwise -- 'Just' @(a,b)@. In this case, @a@ is the next 'Char' in the -- string, and @b@ is the seed value for further production. Subject -- to fusion. Performs replacement on invalid scalar values. unfoldr :: (a -> Maybe (Char,a)) -> a -> Text unfoldr f s = unstream (S.unfoldr (firstf safe . f) s) {-# INLINE unfoldr #-} -- | /O(n)/ Like 'unfoldr', 'unfoldrN' builds a 'Text' from a seed -- value. However, the length of the result should be limited by the -- first argument to 'unfoldrN'. This function is more efficient than -- 'unfoldr' when the maximum length of the result is known and -- correct, otherwise its performance is similar to 'unfoldr'. Subject -- to fusion. Performs replacement on invalid scalar values. unfoldrN :: Int -> (a -> Maybe (Char,a)) -> a -> Text unfoldrN n f s = unstream (S.unfoldrN n (firstf safe . f) s) {-# INLINE unfoldrN #-} -- ----------------------------------------------------------------------------- -- * Substrings -- | /O(n)/ 'take' @n@, applied to a 'Text', returns the prefix of the -- 'Text' of length @n@, or the 'Text' itself if @n@ is greater than -- the length of the Text. Subject to fusion. take :: Int -> Text -> Text take n t@(Text arr off len) | n <= 0 = empty | n >= len = t | otherwise = Text arr off (loop 0 0) where loop !i !cnt | i >= len || cnt >= n = i | otherwise = loop (i+d) (cnt+1) where d = iter_ t i {-# INLINE [1] take #-} {-# RULES "TEXT take -> fused" [~1] forall n t. take n t = unstream (S.take n (stream t)) "TEXT take -> unfused" [1] forall n t. unstream (S.take n (stream t)) = take n t #-} -- | /O(n)/ 'drop' @n@, applied to a 'Text', returns the suffix of the -- 'Text' after the first @n@ characters, or the empty 'Text' if @n@ -- is greater than the length of the 'Text'. Subject to fusion. drop :: Int -> Text -> Text drop n t@(Text arr off len) | n <= 0 = t | n >= len = empty | otherwise = loop 0 0 where loop !i !cnt | i >= len || cnt >= n = Text arr (off+i) (len-i) | otherwise = loop (i+d) (cnt+1) where d = iter_ t i {-# INLINE [1] drop #-} {-# RULES "TEXT drop -> fused" [~1] forall n t. drop n t = unstream (S.drop n (stream t)) "TEXT drop -> unfused" [1] forall n t. unstream (S.drop n (stream t)) = drop n t #-} -- | /O(n)/ 'takeWhile', applied to a predicate @p@ and a 'Text', -- returns the longest prefix (possibly empty) of elements that -- satisfy @p@. Subject to fusion. 
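--
-- > takeWhile (/= ':') "key:value" == "key"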
takeWhile :: (Char -> Bool) -> Text -> Text takeWhile p t@(Text arr off len) = loop 0 where loop !i | i >= len = t | p c = loop (i+d) | otherwise = textP arr off i where Iter c d = iter t i {-# INLINE [1] takeWhile #-} {-# RULES "TEXT takeWhile -> fused" [~1] forall p t. takeWhile p t = unstream (S.takeWhile p (stream t)) "TEXT takeWhile -> unfused" [1] forall p t. unstream (S.takeWhile p (stream t)) = takeWhile p t #-} -- | /O(n)/ 'dropWhile' @p@ @t@ returns the suffix remaining after -- 'takeWhile' @p@ @t@. Subject to fusion. dropWhile :: (Char -> Bool) -> Text -> Text dropWhile p t@(Text arr off len) = loop 0 0 where loop !i !l | l >= len = empty | p c = loop (i+d) (l+d) | otherwise = Text arr (off+i) (len-l) where Iter c d = iter t i {-# INLINE [1] dropWhile #-} {-# RULES "TEXT dropWhile -> fused" [~1] forall p t. dropWhile p t = unstream (S.dropWhile p (stream t)) "TEXT dropWhile -> unfused" [1] forall p t. unstream (S.dropWhile p (stream t)) = dropWhile p t #-} -- | /O(n)/ 'dropWhileEnd' @p@ @t@ returns the prefix remaining after -- dropping characters that fail the predicate @p@ from the end of -- @t@. Subject to fusion. -- Examples: -- -- > dropWhileEnd (=='.') "foo..." == "foo" dropWhileEnd :: (Char -> Bool) -> Text -> Text dropWhileEnd p t@(Text arr off len) = loop (len-1) len where loop !i !l | l <= 0 = empty | p c = loop (i+d) (l+d) | otherwise = Text arr off l where (c,d) = reverseIter t i {-# INLINE [1] dropWhileEnd #-} {-# RULES "TEXT dropWhileEnd -> fused" [~1] forall p t. dropWhileEnd p t = S.reverse (S.dropWhile p (S.reverseStream t)) "TEXT dropWhileEnd -> unfused" [1] forall p t. S.reverse (S.dropWhile p (S.reverseStream t)) = dropWhileEnd p t #-} -- | /O(n)/ 'dropAround' @p@ @t@ returns the substring remaining after -- dropping characters that fail the predicate @p@ from both the -- beginning and end of @t@. Subject to fusion. dropAround :: (Char -> Bool) -> Text -> Text dropAround p = dropWhile p . dropWhileEnd p {-# INLINE [1] dropAround #-} -- | /O(n)/ Remove leading white space from a string. Equivalent to: -- -- > dropWhile isSpace stripStart :: Text -> Text stripStart = dropWhile isSpace {-# INLINE [1] stripStart #-} -- | /O(n)/ Remove trailing white space from a string. Equivalent to: -- -- > dropWhileEnd isSpace stripEnd :: Text -> Text stripEnd = dropWhileEnd isSpace {-# INLINE [1] stripEnd #-} -- | /O(n)/ Remove leading and trailing white space from a string. -- Equivalent to: -- -- > dropAround isSpace strip :: Text -> Text strip = dropAround isSpace {-# INLINE [1] strip #-} -- | /O(n)/ 'splitAt' @n t@ returns a pair whose first element is a -- prefix of @t@ of length @n@, and whose second is the remainder of -- the string. It is equivalent to @('take' n t, 'drop' n t)@. splitAt :: Int -> Text -> (Text, Text) splitAt n t@(Text arr off len) | n <= 0 = (empty, t) | n >= len = (t, empty) | otherwise = (Text arr off k, Text arr (off+k) (len-k)) where k = loop 0 0 loop !i !cnt | i >= len || cnt >= n = i | otherwise = loop (i+d) (cnt+1) where d = iter_ t i {-# INLINE splitAt #-} -- | /O(n)/ 'span', applied to a predicate @p@ and text @t@, returns -- a pair whose first element is the longest prefix (possibly empty) -- of @t@ of elements that satisfy @p@, and whose second is the -- remainder of the list. span :: (Char -> Bool) -> Text -> (Text, Text) span p t = case span_ p t of (# hd,tl #) -> (hd,tl) {-# INLINE span #-} -- | /O(n)/ 'break' is like 'span', but the prefix returned is -- over elements that fail the predicate @p@. 
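--
-- > break (== ':') "key:value" == ("key", ":value")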
break :: (Char -> Bool) -> Text -> (Text, Text) break p = span (not . p) {-# INLINE break #-} -- | /O(n)/ Group characters in a string according to a predicate. groupBy :: (Char -> Char -> Bool) -> Text -> [Text] groupBy p = loop where loop t@(Text arr off len) | null t = [] | otherwise = text arr off n : loop (text arr (off+n) (len-n)) where Iter c d = iter t 0 n = d + findAIndexOrEnd (not . p c) (Text arr (off+d) (len-d)) -- | Returns the /array/ index (in units of 'Word16') at which a -- character may be found. This is /not/ the same as the logical -- index returned by e.g. 'findIndex'. findAIndexOrEnd :: (Char -> Bool) -> Text -> Int findAIndexOrEnd q t@(Text _arr _off len) = go 0 where go !i | i >= len || q c = i | otherwise = go (i+d) where Iter c d = iter t i -- | /O(n)/ Group characters in a string by equality. group :: Text -> [Text] group = groupBy (==) -- | /O(n)/ Return all initial segments of the given 'Text', shortest -- first. inits :: Text -> [Text] inits t@(Text arr off len) = loop 0 where loop i | i >= len = [t] | otherwise = Text arr off i : loop (i + iter_ t i) -- | /O(n)/ Return all final segments of the given 'Text', longest -- first. tails :: Text -> [Text] tails t | null t = [empty] | otherwise = t : tails (unsafeTail t) -- $split -- -- Splitting functions in this library do not perform character-wise -- copies to create substrings; they just construct new 'Text's that -- are slices of the original. -- | /O(m+n)/ Break a 'Text' into pieces separated by the first -- 'Text' argument, consuming the delimiter. An empty delimiter is -- invalid, and will cause an error to be raised. -- -- Examples: -- -- > splitOn "\r\n" "a\r\nb\r\nd\r\ne" == ["a","b","d","e"] -- > splitOn "aaa" "aaaXaaaXaaaXaaa" == ["","X","X","X",""] -- > splitOn "x" "x" == ["",""] -- -- and -- -- > intercalate s . splitOn s == id -- > splitOn (singleton c) == split (==c) -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. splitOn :: Text -> Text -> [Text] splitOn pat@(Text _ _ l) src@(Text arr off len) | l <= 0 = emptyError "splitOn" | isSingleton pat = split (== unsafeHead pat) src | otherwise = go 0 (indices pat src) where go !s (x:xs) = textP arr (s+off) (x-s) : go (x+l) xs go s _ = [textP arr (s+off) (len-s)] {-# INLINE [1] splitOn #-} {-# RULES "TEXT splitOn/singleton -> split/==" [~1] forall c t. splitOn (singleton c) t = split (==c) t #-} -- | /O(n)/ Splits a 'Text' into components delimited by separators, -- where the predicate returns True for a separator element. The -- resulting components do not contain the separators. Two adjacent -- separators result in an empty component in the output. eg. -- -- > split (=='a') "aabbaca" == ["","","bb","c",""] -- > split (=='a') "" == [""] split :: (Char -> Bool) -> Text -> [Text] split _ t@(Text _off _arr 0) = [t] split p t = loop t where loop s | null s' = [l] | otherwise = l : loop (unsafeTail s') where (# l, s' #) = span_ (not . p) s {-# INLINE split #-} -- | /O(n)/ Splits a 'Text' into components of length @k@. The last -- element may be shorter than the other chunks, depending on the -- length of the input. 
Examples: -- -- > chunksOf 3 "foobarbaz" == ["foo","bar","baz"] -- > chunksOf 4 "haskell.org" == ["hask","ell.","org"] chunksOf :: Int -> Text -> [Text] chunksOf k = go where go t = case splitAt k t of (a,b) | null a -> [] | otherwise -> a : go b {-# INLINE chunksOf #-} -- ---------------------------------------------------------------------------- -- * Searching ------------------------------------------------------------------------------- -- ** Searching with a predicate -- | /O(n)/ The 'find' function takes a predicate and a 'Text', and -- returns the first element matching the predicate, or 'Nothing' if -- there is no such element. find :: (Char -> Bool) -> Text -> Maybe Char find p t = S.findBy p (stream t) {-# INLINE find #-} -- | /O(n)/ The 'partition' function takes a predicate and a 'Text', -- and returns the pair of 'Text's with elements which do and do not -- satisfy the predicate, respectively; i.e. -- -- > partition p t == (filter p t, filter (not . p) t) partition :: (Char -> Bool) -> Text -> (Text, Text) partition p t = (filter p t, filter (not . p) t) {-# INLINE partition #-} -- | /O(n)/ 'filter', applied to a predicate and a 'Text', -- returns a 'Text' containing those characters that satisfy the -- predicate. filter :: (Char -> Bool) -> Text -> Text filter p t = unstream (S.filter p (stream t)) {-# INLINE filter #-} -- | /O(n+m)/ Find the first instance of @needle@ (which must be -- non-'null') in @haystack@. The first element of the returned tuple -- is the prefix of @haystack@ before @needle@ is matched. The second -- is the remainder of @haystack@, starting with the match. -- -- Examples: -- -- > breakOn "::" "a::b::c" ==> ("a", "::b::c") -- > breakOn "/" "foobar" ==> ("foobar", "") -- -- Laws: -- -- > append prefix match == haystack -- > where (prefix, match) = breakOn needle haystack -- -- If you need to break a string by a substring repeatedly (e.g. you -- want to break on every instance of a substring), use 'breakOnAll' -- instead, as it has lower startup overhead. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. breakOn :: Text -> Text -> (Text, Text) breakOn pat src@(Text arr off len) | null pat = emptyError "breakOn" | otherwise = case indices pat src of [] -> (src, empty) (x:_) -> (textP arr off x, textP arr (off+x) (len-x)) {-# INLINE breakOn #-} -- | /O(n+m)/ Similar to 'breakOn', but searches from the end of the -- string. -- -- The first element of the returned tuple is the prefix of @haystack@ -- up to and including the last match of @needle@. The second is the -- remainder of @haystack@, following the match. -- -- > breakOnEnd "::" "a::b::c" ==> ("a::b::", "c") breakOnEnd :: Text -> Text -> (Text, Text) breakOnEnd pat src = (reverse b, reverse a) where (a,b) = breakOn (reverse pat) (reverse src) {-# INLINE breakOnEnd #-} -- | /O(n+m)/ Find all non-overlapping instances of @needle@ in -- @haystack@. Each element of the returned list consists of a pair: -- -- * The entire string prior to the /k/th match (i.e. the prefix) -- -- * The /k/th match, followed by the remainder of the string -- -- Examples: -- -- > breakOnAll "::" "" -- > ==> [] -- > breakOnAll "/" "a/b/c/" -- > ==> [("a", "/b/c/"), ("a/b", "/c/"), ("a/b/c", "/")] -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. -- -- The @needle@ parameter may not be empty. 
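--
-- Every pair in the result reassembles the input:
--
-- > all (\(p, m) -> p `append` m == haystack) (breakOnAll needle haystack)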
breakOnAll :: Text -- ^ @needle@ to search for -> Text -- ^ @haystack@ in which to search -> [(Text, Text)] breakOnAll pat src@(Text arr off slen) | null pat = emptyError "breakOnAll" | otherwise = L.map step (indices pat src) where step x = (chunk 0 x, chunk x (slen-x)) chunk !n !l = textP arr (n+off) l {-# INLINE breakOnAll #-} ------------------------------------------------------------------------------- -- ** Indexing 'Text's -- $index -- -- If you think of a 'Text' value as an array of 'Char' values (which -- it is not), you run the risk of writing inefficient code. -- -- An idiom that is common in some languages is to find the numeric -- offset of a character or substring, then use that number to split -- or trim the searched string. With a 'Text' value, this approach -- would require two /O(n)/ operations: one to perform the search, and -- one to operate from wherever the search ended. -- -- For example, suppose you have a string that you want to split on -- the substring @\"::\"@, such as @\"foo::bar::quux\"@. Instead of -- searching for the index of @\"::\"@ and taking the substrings -- before and after that index, you would instead use @breakOnAll \"::\"@. -- | /O(n)/ 'Text' index (subscript) operator, starting from 0. index :: Text -> Int -> Char index t n = S.index (stream t) n {-# INLINE index #-} -- | /O(n)/ The 'findIndex' function takes a predicate and a 'Text' -- and returns the index of the first element in the 'Text' satisfying -- the predicate. Subject to fusion. findIndex :: (Char -> Bool) -> Text -> Maybe Int findIndex p t = S.findIndex p (stream t) {-# INLINE findIndex #-} -- | /O(n+m)/ The 'count' function returns the number of times the -- query string appears in the given 'Text'. An empty query string is -- invalid, and will cause an error to be raised. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. count :: Text -> Text -> Int count pat src | null pat = emptyError "count" | isSingleton pat = countChar (unsafeHead pat) src | otherwise = L.length (indices pat src) {-# INLINE [1] count #-} {-# RULES "TEXT count/singleton -> countChar" [~1] forall c t. count (singleton c) t = countChar c t #-} -- | /O(n)/ The 'countChar' function returns the number of times the -- query element appears in the given 'Text'. Subject to fusion. countChar :: Char -> Text -> Int countChar c t = S.countChar c (stream t) {-# INLINE countChar #-} ------------------------------------------------------------------------------- -- * Zipping -- | /O(n)/ 'zip' takes two 'Text's and returns a list of -- corresponding pairs of bytes. If one input 'Text' is short, -- excess elements of the longer 'Text' are discarded. This is -- equivalent to a pair of 'unpack' operations. zip :: Text -> Text -> [(Char,Char)] zip a b = S.unstreamList $ S.zipWith (,) (stream a) (stream b) {-# INLINE [0] zip #-} -- | /O(n)/ 'zipWith' generalises 'zip' by zipping with the function -- given as the first argument, instead of a tupling function. -- Performs replacement on invalid scalar values. zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text zipWith f t1 t2 = unstream (S.zipWith g (stream t1) (stream t2)) where g a b = safe (f a b) {-# INLINE [0] zipWith #-} -- | /O(n)/ Breaks a 'Text' up into a list of words, delimited by 'Char's -- representing white space. 
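--
-- > words "  foo bar\n baz " == ["foo","bar","baz"]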
words :: Text -> [Text] words t@(Text arr off len) = loop 0 0 where loop !start !n | n >= len = if start == n then [] else [Text arr (start+off) (n-start)] | isSpace c = if start == n then loop (start+1) (start+1) else Text arr (start+off) (n-start) : loop (n+d) (n+d) | otherwise = loop start (n+d) where Iter c d = iter t n {-# INLINE words #-} -- | /O(n)/ Breaks a 'Text' up into a list of 'Text's at -- newline 'Char's. The resulting strings do not contain newlines. lines :: Text -> [Text] lines ps | null ps = [] | otherwise = h : if null t then [] else lines (unsafeTail t) where (# h,t #) = span_ (/= '\n') ps {-# INLINE lines #-} {- -- | /O(n)/ Portably breaks a 'Text' up into a list of 'Text's at line -- boundaries. -- -- A line boundary is considered to be either a line feed, a carriage -- return immediately followed by a line feed, or a carriage return. -- This accounts for both Unix and Windows line ending conventions, -- and for the old convention used on Mac OS 9 and earlier. lines' :: Text -> [Text] lines' ps | null ps = [] | otherwise = h : case uncons t of Nothing -> [] Just (c,t') | c == '\n' -> lines t' | c == '\r' -> case uncons t' of Just ('\n',t'') -> lines t'' _ -> lines t' where (h,t) = span notEOL ps notEOL c = c /= '\n' && c /= '\r' {-# INLINE lines' #-} -} -- | /O(n)/ Joins lines, after appending a terminating newline to -- each. unlines :: [Text] -> Text unlines = concat . L.map (`snoc` '\n') {-# INLINE unlines #-} -- | /O(n)/ Joins words using single space characters. unwords :: [Text] -> Text unwords = intercalate (singleton ' ') {-# INLINE unwords #-} -- | /O(n)/ The 'isPrefixOf' function takes two 'Text's and returns -- 'True' iff the first is a prefix of the second. Subject to fusion. isPrefixOf :: Text -> Text -> Bool isPrefixOf a@(Text _ _ alen) b@(Text _ _ blen) = alen <= blen && S.isPrefixOf (stream a) (stream b) {-# INLINE [1] isPrefixOf #-} {-# RULES "TEXT isPrefixOf -> fused" [~1] forall s t. isPrefixOf s t = S.isPrefixOf (stream s) (stream t) #-} -- | /O(n)/ The 'isSuffixOf' function takes two 'Text's and returns -- 'True' iff the first is a suffix of the second. isSuffixOf :: Text -> Text -> Bool isSuffixOf a@(Text _aarr _aoff alen) b@(Text barr boff blen) = d >= 0 && a == b' where d = blen - alen b' | d == 0 = b | otherwise = Text barr (boff+d) alen {-# INLINE isSuffixOf #-} -- | /O(n+m)/ The 'isInfixOf' function takes two 'Text's and returns -- 'True' iff the first is contained, wholly and intact, anywhere -- within the second. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. isInfixOf :: Text -> Text -> Bool isInfixOf needle haystack | null needle = True | isSingleton needle = S.elem (unsafeHead needle) . S.stream $ haystack | otherwise = not . L.null . indices needle $ haystack {-# INLINE [1] isInfixOf #-} {-# RULES "TEXT isInfixOf/singleton -> S.elem/S.stream" [~1] forall n h. isInfixOf (singleton n) h = S.elem n (S.stream h) #-} ------------------------------------------------------------------------------- -- * View patterns -- | /O(n)/ Return the suffix of the second string if its prefix -- matches the entire first string. 
-- -- Examples: -- -- > stripPrefix "foo" "foobar" == Just "bar" -- > stripPrefix "" "baz" == Just "baz" -- > stripPrefix "foo" "quux" == Nothing -- -- This is particularly useful with the @ViewPatterns@ extension to -- GHC, as follows: -- -- > {-# LANGUAGE ViewPatterns #-} -- > import Data.Text as T -- > -- > fnordLength :: Text -> Int -- > fnordLength (stripPrefix "fnord" -> Just suf) = T.length suf -- > fnordLength _ = -1 stripPrefix :: Text -> Text -> Maybe Text stripPrefix p@(Text _arr _off plen) t@(Text arr off len) | p `isPrefixOf` t = Just $! textP arr (off+plen) (len-plen) | otherwise = Nothing -- | /O(n)/ Find the longest non-empty common prefix of two strings -- and return it, along with the suffixes of each string at which they -- no longer match. -- -- If the strings do not have a common prefix or either one is empty, -- this function returns 'Nothing'. -- -- Examples: -- -- > commonPrefixes "foobar" "fooquux" == Just ("foo","bar","quux") -- > commonPrefixes "veeble" "fetzer" == Nothing -- > commonPrefixes "" "baz" == Nothing commonPrefixes :: Text -> Text -> Maybe (Text,Text,Text) commonPrefixes t0@(Text arr0 off0 len0) t1@(Text arr1 off1 len1) = go 0 0 where go !i !j | i < len0 && j < len1 && a == b = go (i+d0) (j+d1) | i > 0 = Just (Text arr0 off0 i, textP arr0 (off0+i) (len0-i), textP arr1 (off1+j) (len1-j)) | otherwise = Nothing where Iter a d0 = iter t0 i Iter b d1 = iter t1 j -- | /O(n)/ Return the prefix of the second string if its suffix -- matches the entire first string. -- -- Examples: -- -- > stripSuffix "bar" "foobar" == Just "foo" -- > stripSuffix "" "baz" == Just "baz" -- > stripSuffix "foo" "quux" == Nothing -- -- This is particularly useful with the @ViewPatterns@ extension to -- GHC, as follows: -- -- > {-# LANGUAGE ViewPatterns #-} -- > import Data.Text as T -- > -- > quuxLength :: Text -> Int -- > quuxLength (stripSuffix "quux" -> Just pre) = T.length pre -- > quuxLength _ = -1 stripSuffix :: Text -> Text -> Maybe Text stripSuffix p@(Text _arr _off plen) t@(Text arr off len) | p `isSuffixOf` t = Just $! textP arr off (len-plen) | otherwise = Nothing -- | Add a list of non-negative numbers. Errors out on overflow. sumP :: String -> [Int] -> Int sumP fun = go 0 where go !a (x:xs) | ax >= 0 = go ax xs | otherwise = overflowError fun where ax = a + x go a _ = a emptyError :: String -> a emptyError fun = P.error $ "Data.Text." ++ fun ++ ": empty input" overflowError :: String -> a overflowError fun = P.error $ "Data.Text." ++ fun ++ ": size overflow" -- | /O(n)/ Make a distinct copy of the given string, sharing no -- storage with the original string. -- -- As an example, suppose you read a large string, of which you need -- only a small portion. If you do not use 'copy', the entire original -- array will be kept alive in memory by the smaller string. Making a -- copy \"breaks the link\" to the original array, allowing it to be -- garbage collected if there are no other live references to it. 
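--
-- For example (@veryLong@ here stands for some hypothetical large
-- input):
--
-- > firstWord = copy (takeWhile (/= ' ') veryLong)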
copy :: Text -> Text
copy (Text arr off len) = Text (A.run go) 0 len
  where
    go = do
      marr <- A.new len
      A.copyI marr 0 arr off len
      return marr
text-0.11.3.1/Data/Text/0000755000000000000000000000000012140332415012675 5ustar0000000000000000text-0.11.3.1/Data/Text/Array.hs0000644000000000000000000001723412140332415014316 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, ForeignFunctionInterface, MagicHash, Rank2Types,
    RecordWildCards, UnboxedTuples, UnliftedFFITypes #-}
{-# OPTIONS_GHC -fno-warn-unused-matches #-}
-- |
-- Module      : Data.Text.Array
-- Copyright   : (c) 2009, 2010, 2011 Bryan O'Sullivan
--
-- License     : BSD-style
-- Maintainer  : bos@serpentine.com, rtomharper@googlemail.com,
--               duncan@haskell.org
-- Stability   : experimental
-- Portability : portable
--
-- Packed, unboxed, heap-resident arrays.  Suitable for performance
-- critical use, both in terms of large data quantities and high
-- speed.
--
-- This module is intended to be imported @qualified@, to avoid name
-- clashes with "Prelude" functions, e.g.
--
-- > import qualified Data.Text.Array as A
--
-- The names in this module resemble those in the 'Data.Array' family
-- of modules, but are shorter due to the assumption of qualified
-- naming.
module Data.Text.Array
    (
    -- * Types
      Array(aBA)
    , MArray(maBA)
    -- * Functions
    , copyM
    , copyI
    , empty
    , equal
#if defined(ASSERTS)
    , length
#endif
    , run
    , run2
    , toList
    , unsafeFreeze
    , unsafeIndex
    , new
    , unsafeWrite
    ) where

#if defined(ASSERTS)
-- This fugly hack is brought by GHC's apparent reluctance to deal
-- with MagicHash and UnboxedTuples when inferring types. Eek!
# define CHECK_BOUNDS(_func_,_len_,_k_) \
if (_k_) < 0 || (_k_) >= (_len_) then error ("Data.Text.Array." ++ (_func_) ++ ": bounds error, offset " ++ show (_k_) ++ ", length " ++ show (_len_)) else
#else
# define CHECK_BOUNDS(_func_,_len_,_k_)
#endif

#include "MachDeps.h"

#if defined(ASSERTS)
import Control.Exception (assert)
#endif
#if __GLASGOW_HASKELL__ >= 702
import Control.Monad.ST.Unsafe (unsafeIOToST)
#else
import Control.Monad.ST (unsafeIOToST)
#endif
import Data.Bits ((.&.), xor)
import Data.Text.Unsafe.Base (inlinePerformIO)
import Data.Text.UnsafeShift (shiftL, shiftR)
#if __GLASGOW_HASKELL__ >= 703
import Foreign.C.Types (CInt(CInt), CSize(CSize))
#else
import Foreign.C.Types (CInt, CSize)
#endif
import GHC.Base (ByteArray#, MutableByteArray#, Int(..),
                 indexWord16Array#, newByteArray#,
                 unsafeCoerce#, writeWord16Array#)
import GHC.ST (ST(..), runST)
import GHC.Word (Word16(..))
import Prelude hiding (length, read)

-- | Immutable array type.
data Array = Array {
      aBA :: ByteArray#
#if defined(ASSERTS)
    , aLen :: {-# UNPACK #-} !Int -- length (in units of Word16, not bytes)
#endif
    }

-- | Mutable array type, for use in the ST monad.
data MArray s = MArray {
      maBA :: MutableByteArray# s
#if defined(ASSERTS)
    , maLen :: {-# UNPACK #-} !Int -- length (in units of Word16, not bytes)
#endif
    }

#if defined(ASSERTS)
-- | Operations supported by all arrays.
class IArray a where
    -- | Return the length of an array.
    length :: a -> Int

instance IArray Array where
    length = aLen
    {-# INLINE length #-}

instance IArray (MArray s) where
    length = maLen
    {-# INLINE length #-}
#endif

-- | Create an uninitialized mutable array.
new :: forall s. Int -> ST s (MArray s)
new n
  | n < 0 || n .&.
highBit /= 0 = array_size_error | otherwise = ST $ \s1# -> case newByteArray# len# s1# of (# s2#, marr# #) -> (# s2#, MArray marr# #if defined(ASSERTS) n #endif #) where !(I# len#) = bytesInArray n highBit = maxBound `xor` (maxBound `shiftR` 1) {-# INLINE new #-} array_size_error :: a array_size_error = error "Data.Text.Array.new: size overflow" -- | Freeze a mutable array. Do not mutate the 'MArray' afterwards! unsafeFreeze :: MArray s -> ST s Array unsafeFreeze MArray{..} = ST $ \s# -> (# s#, Array (unsafeCoerce# maBA) #if defined(ASSERTS) maLen #endif #) {-# INLINE unsafeFreeze #-} -- | Indicate how many bytes would be used for an array of the given -- size. bytesInArray :: Int -> Int bytesInArray n = n `shiftL` 1 {-# INLINE bytesInArray #-} -- | Unchecked read of an immutable array. May return garbage or -- crash on an out-of-bounds access. unsafeIndex :: Array -> Int -> Word16 unsafeIndex Array{..} i@(I# i#) = CHECK_BOUNDS("unsafeIndex",aLen,i) case indexWord16Array# aBA i# of r# -> (W16# r#) {-# INLINE unsafeIndex #-} -- | Unchecked write of a mutable array. May return garbage or crash -- on an out-of-bounds access. unsafeWrite :: MArray s -> Int -> Word16 -> ST s () unsafeWrite MArray{..} i@(I# i#) (W16# e#) = ST $ \s1# -> CHECK_BOUNDS("unsafeWrite",maLen,i) case writeWord16Array# maBA i# e# s1# of s2# -> (# s2#, () #) {-# INLINE unsafeWrite #-} -- | Convert an immutable array to a list. toList :: Array -> Int -> Int -> [Word16] toList ary off len = loop 0 where loop i | i < len = unsafeIndex ary (off+i) : loop (i+1) | otherwise = [] -- | An empty immutable array. empty :: Array empty = runST (new 0 >>= unsafeFreeze) -- | Run an action in the ST monad and return an immutable array of -- its result. run :: (forall s. ST s (MArray s)) -> Array run k = runST (k >>= unsafeFreeze) -- | Run an action in the ST monad and return an immutable array of -- its result paired with whatever else the action returns. run2 :: (forall s. ST s (MArray s, a)) -> (Array, a) run2 k = runST (do (marr,b) <- k arr <- unsafeFreeze marr return (arr,b)) {-# INLINE run2 #-} -- | Copy some elements of a mutable array. copyM :: MArray s -- ^ Destination -> Int -- ^ Destination offset -> MArray s -- ^ Source -> Int -- ^ Source offset -> Int -- ^ Count -> ST s () copyM dest didx src sidx count | count <= 0 = return () | otherwise = #if defined(ASSERTS) assert (sidx + count <= length src) . assert (didx + count <= length dest) . #endif unsafeIOToST $ memcpyM (maBA dest) (fromIntegral didx) (maBA src) (fromIntegral sidx) (fromIntegral count) {-# INLINE copyM #-} -- | Copy some elements of an immutable array. copyI :: MArray s -- ^ Destination -> Int -- ^ Destination offset -> Array -- ^ Source -> Int -- ^ Source offset -> Int -- ^ First offset in destination /not/ to -- copy (i.e. /not/ length) -> ST s () copyI dest i0 src j0 top | i0 >= top = return () | otherwise = unsafeIOToST $ memcpyI (maBA dest) (fromIntegral i0) (aBA src) (fromIntegral j0) (fromIntegral (top-i0)) {-# INLINE copyI #-} -- | Compare portions of two arrays for equality. No bounds checking -- is performed. equal :: Array -- ^ First -> Int -- ^ Offset into first -> Array -- ^ Second -> Int -- ^ Offset into second -> Int -- ^ Count -> Bool equal arrA offA arrB offB count = inlinePerformIO $ do i <- memcmp (aBA arrA) (fromIntegral offA) (aBA arrB) (fromIntegral offB) (fromIntegral count) return $! 
i == 0 {-# INLINE equal #-} foreign import ccall unsafe "_hs_text_memcpy" memcpyI :: MutableByteArray# s -> CSize -> ByteArray# -> CSize -> CSize -> IO () foreign import ccall unsafe "_hs_text_memcmp" memcmp :: ByteArray# -> CSize -> ByteArray# -> CSize -> CSize -> IO CInt foreign import ccall unsafe "_hs_text_memcpy" memcpyM :: MutableByteArray# s -> CSize -> MutableByteArray# s -> CSize -> CSize -> IO () text-0.11.3.1/Data/Text/Encoding.hs0000644000000000000000000002640612140332415014767 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, ForeignFunctionInterface, MagicHash, UnliftedFFITypes #-} #if __GLASGOW_HASKELL__ >= 702 {-# LANGUAGE Trustworthy #-} #endif -- | -- Module : Data.Text.Encoding -- Copyright : (c) 2009, 2010, 2011 Bryan O'Sullivan, -- (c) 2009 Duncan Coutts, -- (c) 2008, 2009 Tom Harper -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- Functions for converting 'Text' values to and from 'ByteString', -- using several standard encodings. -- -- To gain access to a much larger family of encodings, use the -- @text-icu@ package: module Data.Text.Encoding ( -- * Decoding ByteStrings to Text -- $strict decodeASCII , decodeLatin1 , decodeUtf8 , decodeUtf16LE , decodeUtf16BE , decodeUtf32LE , decodeUtf32BE -- ** Catchable failure , decodeUtf8' -- ** Controllable error handling , decodeUtf8With , decodeUtf16LEWith , decodeUtf16BEWith , decodeUtf32LEWith , decodeUtf32BEWith -- * Encoding Text to ByteStrings , encodeUtf8 , encodeUtf16LE , encodeUtf16BE , encodeUtf32LE , encodeUtf32BE ) where import Control.Exception (evaluate, try) #if __GLASGOW_HASKELL__ >= 702 import Control.Monad.ST.Unsafe (unsafeIOToST, unsafeSTToIO) #else import Control.Monad.ST (unsafeIOToST, unsafeSTToIO) #endif import Data.Bits ((.&.)) import Data.ByteString as B import Data.ByteString.Internal as B import Data.Text.Encoding.Error (OnDecodeError, UnicodeException, strictDecode) import Data.Text.Internal (Text(..), safe, textP) import Data.Text.Private (runText) import Data.Text.UnsafeChar (ord, unsafeWrite) import Data.Text.UnsafeShift (shiftL, shiftR) import Data.Word (Word8) import Foreign.C.Types (CSize) import Foreign.ForeignPtr (withForeignPtr) import Foreign.Marshal.Utils (with) import Foreign.Ptr (Ptr, minusPtr, plusPtr) import Foreign.Storable (peek, poke) import GHC.Base (MutableByteArray#) import qualified Data.Text.Array as A import qualified Data.Text.Encoding.Fusion as E import qualified Data.Text.Encoding.Utf16 as U16 import qualified Data.Text.Fusion as F import Data.Text.Unsafe (unsafeDupablePerformIO) -- $strict -- -- All of the single-parameter functions for decoding bytestrings -- encoded in one of the Unicode Transformation Formats (UTF) operate -- in a /strict/ mode: each will throw an exception if given invalid -- input. -- -- Each function has a variant, whose name is suffixed with -'With', -- that gives greater control over the handling of decoding errors. -- For instance, 'decodeUtf8' will throw an exception, but -- 'decodeUtf8With' allows the programmer to determine what to do on a -- decoding error. -- | /Deprecated/. Decode a 'ByteString' containing 7-bit ASCII -- encoded text. -- -- This function is deprecated. Use 'decodeLatin1' instead. decodeASCII :: ByteString -> Text decodeASCII = decodeUtf8 {-# DEPRECATED decodeASCII "Use decodeUtf8 instead" #-} -- | Decode a 'ByteString' containing Latin-1 (aka ISO-8859-1) encoded text. 
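--
-- As an illustration (an example added here, not from the original
-- documentation), the Latin-1 byte 0xE9 decodes to U+00E9:
--
-- > decodeLatin1 (Data.ByteString.pack [0x63,0x61,0x66,0xe9]) == Data.Text.pack "caf\xe9"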
-- -- 'decodeLatin1' is semantically equivalent to -- @Data.Text.pack . Data.ByteString.Char8.unpack@ decodeLatin1 :: ByteString -> Text decodeLatin1 (PS fp off len) = textP a 0 len where a = A.run (A.new len >>= unsafeIOToST . go) go dest = withForeignPtr fp $ \ptr -> do c_decode_latin1 (A.maBA dest) (ptr `plusPtr` off) (ptr `plusPtr` (off+len)) return dest -- | Decode a 'ByteString' containing UTF-8 encoded text. decodeUtf8With :: OnDecodeError -> ByteString -> Text decodeUtf8With onErr (PS fp off len) = runText $ \done -> do let go dest = withForeignPtr fp $ \ptr -> with (0::CSize) $ \destOffPtr -> do let end = ptr `plusPtr` (off + len) loop curPtr = do curPtr' <- c_decode_utf8 (A.maBA dest) destOffPtr curPtr end if curPtr' == end then do n <- peek destOffPtr unsafeSTToIO (done dest (fromIntegral n)) else do x <- peek curPtr' case onErr desc (Just x) of Nothing -> loop $ curPtr' `plusPtr` 1 Just c -> do destOff <- peek destOffPtr w <- unsafeSTToIO $ unsafeWrite dest (fromIntegral destOff) (safe c) poke destOffPtr (destOff + fromIntegral w) loop $ curPtr' `plusPtr` 1 loop (ptr `plusPtr` off) (unsafeIOToST . go) =<< A.new len where desc = "Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream" {- INLINE[0] decodeUtf8With #-} -- | Decode a 'ByteString' containing UTF-8 encoded text that is known -- to be valid. -- -- If the input contains any invalid UTF-8 data, an exception will be -- thrown that cannot be caught in pure code. For more control over -- the handling of invalid data, use 'decodeUtf8'' or -- 'decodeUtf8With'. decodeUtf8 :: ByteString -> Text decodeUtf8 = decodeUtf8With strictDecode {-# INLINE[0] decodeUtf8 #-} {-# RULES "STREAM stream/decodeUtf8 fusion" [1] forall bs. F.stream (decodeUtf8 bs) = E.streamUtf8 strictDecode bs #-} -- | Decode a 'ByteString' containing UTF-8 encoded text.. -- -- If the input contains any invalid UTF-8 data, the relevant -- exception will be returned, otherwise the decoded text. decodeUtf8' :: ByteString -> Either UnicodeException Text decodeUtf8' = unsafeDupablePerformIO . try . evaluate . decodeUtf8With strictDecode {-# INLINE decodeUtf8' #-} -- | Encode text using UTF-8 encoding. encodeUtf8 :: Text -> ByteString encodeUtf8 (Text arr off len) = unsafeDupablePerformIO $ do let size0 = max len 4 mallocByteString size0 >>= start size0 off 0 where start size n0 m0 fp = withForeignPtr fp $ loop n0 m0 where loop n1 m1 ptr = go n1 m1 where offLen = off + len go !n !m | n == offLen = return (PS fp 0 m) | otherwise = do let poke8 k v = poke (ptr `plusPtr` k) (fromIntegral v :: Word8) ensure k act | size-m >= k = act | otherwise = {-# SCC "resizeUtf8/ensure" #-} do let newSize = size `shiftL` 1 fp' <- mallocByteString newSize withForeignPtr fp' $ \ptr' -> memcpy ptr' ptr (fromIntegral m) start newSize n m fp' {-# INLINE ensure #-} case A.unsafeIndex arr n of w| w <= 0x7F -> ensure 1 $ do poke (ptr `plusPtr` m) (fromIntegral w :: Word8) -- A single ASCII octet is likely to start a run of -- them. We see better performance when we -- special-case this assumption. let end = ptr `plusPtr` size ascii !t !u | t == offLen || u == end || v >= 0x80 = go t (u `minusPtr` ptr) | otherwise = do poke u (fromIntegral v :: Word8) ascii (t+1) (u `plusPtr` 1) where v = A.unsafeIndex arr t ascii (n+1) (ptr `plusPtr` (m+1)) | w <= 0x7FF -> ensure 2 $ do poke8 m $ (w `shiftR` 6) + 0xC0 poke8 (m+1) $ (w .&. 
0x3f) + 0x80 go (n+1) (m+2) | 0xD800 <= w && w <= 0xDBFF -> ensure 4 $ do let c = ord $ U16.chr2 w (A.unsafeIndex arr (n+1)) poke8 m $ (c `shiftR` 18) + 0xF0 poke8 (m+1) $ ((c `shiftR` 12) .&. 0x3F) + 0x80 poke8 (m+2) $ ((c `shiftR` 6) .&. 0x3F) + 0x80 poke8 (m+3) $ (c .&. 0x3F) + 0x80 go (n+2) (m+4) | otherwise -> ensure 3 $ do poke8 m $ (w `shiftR` 12) + 0xE0 poke8 (m+1) $ ((w `shiftR` 6) .&. 0x3F) + 0x80 poke8 (m+2) $ (w .&. 0x3F) + 0x80 go (n+1) (m+3) -- | Decode text from little endian UTF-16 encoding. decodeUtf16LEWith :: OnDecodeError -> ByteString -> Text decodeUtf16LEWith onErr bs = F.unstream (E.streamUtf16LE onErr bs) {-# INLINE decodeUtf16LEWith #-} -- | Decode text from little endian UTF-16 encoding. -- -- If the input contains any invalid little endian UTF-16 data, an -- exception will be thrown. For more control over the handling of -- invalid data, use 'decodeUtf16LEWith'. decodeUtf16LE :: ByteString -> Text decodeUtf16LE = decodeUtf16LEWith strictDecode {-# INLINE decodeUtf16LE #-} -- | Decode text from big endian UTF-16 encoding. decodeUtf16BEWith :: OnDecodeError -> ByteString -> Text decodeUtf16BEWith onErr bs = F.unstream (E.streamUtf16BE onErr bs) {-# INLINE decodeUtf16BEWith #-} -- | Decode text from big endian UTF-16 encoding. -- -- If the input contains any invalid big endian UTF-16 data, an -- exception will be thrown. For more control over the handling of -- invalid data, use 'decodeUtf16BEWith'. decodeUtf16BE :: ByteString -> Text decodeUtf16BE = decodeUtf16BEWith strictDecode {-# INLINE decodeUtf16BE #-} -- | Encode text using little endian UTF-16 encoding. encodeUtf16LE :: Text -> ByteString encodeUtf16LE txt = E.unstream (E.restreamUtf16LE (F.stream txt)) {-# INLINE encodeUtf16LE #-} -- | Encode text using big endian UTF-16 encoding. encodeUtf16BE :: Text -> ByteString encodeUtf16BE txt = E.unstream (E.restreamUtf16BE (F.stream txt)) {-# INLINE encodeUtf16BE #-} -- | Decode text from little endian UTF-32 encoding. decodeUtf32LEWith :: OnDecodeError -> ByteString -> Text decodeUtf32LEWith onErr bs = F.unstream (E.streamUtf32LE onErr bs) {-# INLINE decodeUtf32LEWith #-} -- | Decode text from little endian UTF-32 encoding. -- -- If the input contains any invalid little endian UTF-32 data, an -- exception will be thrown. For more control over the handling of -- invalid data, use 'decodeUtf32LEWith'. decodeUtf32LE :: ByteString -> Text decodeUtf32LE = decodeUtf32LEWith strictDecode {-# INLINE decodeUtf32LE #-} -- | Decode text from big endian UTF-32 encoding. decodeUtf32BEWith :: OnDecodeError -> ByteString -> Text decodeUtf32BEWith onErr bs = F.unstream (E.streamUtf32BE onErr bs) {-# INLINE decodeUtf32BEWith #-} -- | Decode text from big endian UTF-32 encoding. -- -- If the input contains any invalid big endian UTF-32 data, an -- exception will be thrown. For more control over the handling of -- invalid data, use 'decodeUtf32BEWith'. decodeUtf32BE :: ByteString -> Text decodeUtf32BE = decodeUtf32BEWith strictDecode {-# INLINE decodeUtf32BE #-} -- | Encode text using little endian UTF-32 encoding. encodeUtf32LE :: Text -> ByteString encodeUtf32LE txt = E.unstream (E.restreamUtf32LE (F.stream txt)) {-# INLINE encodeUtf32LE #-} -- | Encode text using big endian UTF-32 encoding. 
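--
-- Each character occupies exactly four bytes in the output, since
-- UTF-32 is a fixed-width encoding; illustratively (an example added
-- here, not from the original documentation):
--
-- > Data.ByteString.length (encodeUtf32BE (Data.Text.pack "ab")) == 8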
encodeUtf32BE :: Text -> ByteString encodeUtf32BE txt = E.unstream (E.restreamUtf32BE (F.stream txt)) {-# INLINE encodeUtf32BE #-} foreign import ccall unsafe "_hs_text_decode_utf8" c_decode_utf8 :: MutableByteArray# s -> Ptr CSize -> Ptr Word8 -> Ptr Word8 -> IO (Ptr Word8) foreign import ccall unsafe "_hs_text_decode_latin1" c_decode_latin1 :: MutableByteArray# s -> Ptr Word8 -> Ptr Word8 -> IO () text-0.11.3.1/Data/Text/Foreign.hs0000644000000000000000000001202112140332415014616 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, GeneralizedNewtypeDeriving #-} -- | -- Module : Data.Text.Foreign -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Support for using 'Text' data with native code via the Haskell -- foreign function interface. module Data.Text.Foreign ( -- * Interoperability with native code -- $interop I16 -- * Safe conversion functions , fromPtr , useAsPtr , asForeignPtr -- * Unsafe conversion code , lengthWord16 , unsafeCopyToPtr -- * Low-level manipulation -- $lowlevel , dropWord16 , takeWord16 ) where #if defined(ASSERTS) import Control.Exception (assert) #endif #if __GLASGOW_HASKELL__ >= 702 import Control.Monad.ST.Unsafe (unsafeIOToST) #else import Control.Monad.ST (unsafeIOToST) #endif import Data.Text.Internal (Text(..), empty) import Data.Text.Unsafe (lengthWord16) import qualified Data.Text.Array as A import Data.Word (Word16) import Foreign.Marshal.Alloc (allocaBytes) import Foreign.Ptr (Ptr, castPtr, plusPtr) import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr) import Foreign.Storable (peek, poke) -- $interop -- -- The 'Text' type is implemented using arrays that are not guaranteed -- to have a fixed address in the Haskell heap. All communication with -- native code must thus occur by copying data back and forth. -- -- The 'Text' type's internal representation is UTF-16, using the -- platform's native endianness. This makes copied data suitable for -- use with native libraries that use a similar representation, such -- as ICU. To interoperate with native libraries that use different -- internal representations, such as UTF-8 or UTF-32, consider using -- the functions in the 'Data.Text.Encoding' module. -- | A type representing a number of UTF-16 code units. newtype I16 = I16 Int deriving (Bounded, Enum, Eq, Integral, Num, Ord, Read, Real, Show) -- | /O(n)/ Create a new 'Text' from a 'Ptr' 'Word16' by copying the -- contents of the array. fromPtr :: Ptr Word16 -- ^ source array -> I16 -- ^ length of source array (in 'Word16' units) -> IO Text fromPtr _ (I16 0) = return empty fromPtr ptr (I16 len) = #if defined(ASSERTS) assert (len > 0) $ #endif return $! Text arr 0 len where arr = A.run (A.new len >>= copy) copy marr = loop ptr 0 where loop !p !i | i == len = return marr | otherwise = do A.unsafeWrite marr i =<< unsafeIOToST (peek p) loop (p `plusPtr` 2) (i + 1) -- $lowlevel -- -- Foreign functions that use UTF-16 internally may return indices in -- units of 'Word16' instead of characters. These functions may -- safely be used with such indices, as they will adjust offsets if -- necessary to preserve the validity of a Unicode string. -- | /O(1)/ Return the prefix of the 'Text' of @n@ 'Word16' units in -- length. -- -- If @n@ would cause the 'Text' to end inside a surrogate pair, the -- end of the prefix will be advanced by one additional 'Word16' unit -- to maintain its validity. 
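--
-- For example (an illustrative case added here: @\"\\x10000\"@ is a
-- single character stored as a surrogate pair of two 'Word16' units,
-- so the prefix is widened rather than split):
--
-- > takeWord16 1 (Data.Text.pack "\x10000") == Data.Text.pack "\x10000"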
takeWord16 :: I16 -> Text -> Text
takeWord16 (I16 n) t@(Text arr off len)
    | n <= 0               = empty
    | n >= len || m >= len = t
    | otherwise            = Text arr off m
  where
    m | w < 0xD800 || w > 0xDBFF = n
      | otherwise                = n+1
    w = A.unsafeIndex arr (off+n-1)

-- | /O(1)/ Return the suffix of the 'Text', with @n@ 'Word16' units
-- dropped from its beginning.
--
-- If @n@ would cause the 'Text' to begin inside a surrogate pair, the
-- beginning of the suffix will be advanced by one additional 'Word16'
-- unit to maintain its validity.
dropWord16 :: I16 -> Text -> Text
dropWord16 (I16 n) t@(Text arr off len)
    | n <= 0               = t
    | n >= len || m >= len = empty
    | otherwise            = Text arr (off+m) (len-m)
  where
    m | w < 0xD800 || w > 0xDBFF = n
      | otherwise                = n+1
    w = A.unsafeIndex arr (off+n-1)

-- | /O(n)/ Copy a 'Text' to an array.  The array is assumed to be big
-- enough to hold the contents of the entire 'Text'.
unsafeCopyToPtr :: Text -> Ptr Word16 -> IO ()
unsafeCopyToPtr (Text arr off len) ptr = loop ptr off
  where
    end = off + len
    loop !p !i | i == end  = return ()
               | otherwise = do
      poke p (A.unsafeIndex arr i)
      loop (p `plusPtr` 2) (i + 1)

-- | /O(n)/ Perform an action on a temporary, mutable copy of a
-- 'Text'.  The copy is freed as soon as the action returns.
useAsPtr :: Text -> (Ptr Word16 -> I16 -> IO a) -> IO a
useAsPtr t@(Text _arr _off len) action =
    allocaBytes (len * 2) $ \buf -> do
      unsafeCopyToPtr t buf
      action (castPtr buf) (fromIntegral len)

-- | /O(n)/ Make a mutable copy of a 'Text'.
asForeignPtr :: Text -> IO (ForeignPtr Word16, I16)
asForeignPtr t@(Text _arr _off len) = do
  fp <- mallocForeignPtrArray len
  withForeignPtr fp $ unsafeCopyToPtr t
  return (fp, I16 len)
text-0.11.3.1/Data/Text/Fusion.hs0000644000000000000000000002004012140332415014470 0ustar0000000000000000{-# LANGUAGE BangPatterns, MagicHash #-}
-- |
-- Module      : Data.Text.Fusion
-- Copyright   : (c) Tom Harper 2008-2009,
--               (c) Bryan O'Sullivan 2009-2010,
--               (c) Duncan Coutts 2009
--
-- License     : BSD-style
-- Maintainer  : bos@serpentine.com, rtomharper@googlemail.com,
--               duncan@haskell.org
-- Stability   : experimental
-- Portability : GHC
--
-- Text manipulation functions represented as fusible operations over
-- streams.
module Data.Text.Fusion
    (
    -- * Types
      Stream(..)
    , Step(..)

    -- * Creation and elimination
    , stream
    , unstream
    , reverseStream
    , length

    -- * Transformations
    , reverse

    -- * Construction
    -- ** Scans
    , reverseScanr

    -- ** Accumulating maps
    , mapAccumL

    -- ** Generation and unfolding
    , unfoldrN

    -- * Indexing
    , index
    , findIndex
    , countChar
    ) where

import Prelude (Bool(..), Char, Maybe(..), Monad(..), Int,
                Num(..), Ord(..), ($), (&&),
                fromIntegral, otherwise)
import Data.Bits ((.&.))
import Data.Text.Internal (Text(..))
import Data.Text.Private (runText)
import Data.Text.UnsafeChar (ord, unsafeChr, unsafeWrite)
import Data.Text.UnsafeShift (shiftL, shiftR)
import qualified Data.Text.Array as A
import qualified Data.Text.Fusion.Common as S
import Data.Text.Fusion.Internal
import Data.Text.Fusion.Size
import qualified Data.Text.Internal as I
import qualified Data.Text.Encoding.Utf16 as U16

default(Int)

-- | /O(n)/ Convert a 'Text' into a 'Stream Char'.
stream :: Text -> Stream Char
stream (Text arr off len) = Stream next off (maxSize len)
    where
      !end = off+len
      next !i
          | i >= end                   = Done
          | n >= 0xD800 && n <= 0xDBFF = Yield (U16.chr2 n n2) (i + 2)
          | otherwise                  = Yield (unsafeChr n) (i + 1)
          where
            n  = A.unsafeIndex arr i
            n2 = A.unsafeIndex arr (i + 1)
{-# INLINE [0] stream #-}

-- | /O(n)/ Convert a 'Text' into a 'Stream Char', but iterate
-- backwards.
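--
-- Illustratively (an example added here, not from the original
-- documentation), streaming backwards and rebuilding reverses the
-- input:
--
-- > unstream (reverseStream (Data.Text.pack "abc")) == Data.Text.pack "cba"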
reverseStream :: Text -> Stream Char reverseStream (Text arr off len) = Stream next (off+len-1) (maxSize len) where {-# INLINE next #-} next !i | i < off = Done | n >= 0xDC00 && n <= 0xDFFF = Yield (U16.chr2 n2 n) (i - 2) | otherwise = Yield (unsafeChr n) (i - 1) where n = A.unsafeIndex arr i n2 = A.unsafeIndex arr (i - 1) {-# INLINE [0] reverseStream #-} -- | /O(n)/ Convert a 'Stream Char' into a 'Text'. unstream :: Stream Char -> Text unstream (Stream next0 s0 len) = runText $ \done -> do let mlen = upperBound 4 len arr0 <- A.new mlen let outer arr top = loop where loop !s !i = case next0 s of Done -> done arr i Skip s' -> loop s' i Yield x s' | j >= top -> {-# SCC "unstream/resize" #-} do let top' = (top + 1) `shiftL` 1 arr' <- A.new top' A.copyM arr' 0 arr 0 top outer arr' top' s i | otherwise -> do d <- unsafeWrite arr i x loop s' (i+d) where j | ord x < 0x10000 = i | otherwise = i + 1 outer arr0 mlen s0 0 {-# INLINE [0] unstream #-} {-# RULES "STREAM stream/unstream fusion" forall s. stream (unstream s) = s #-} -- ---------------------------------------------------------------------------- -- * Basic stream functions length :: Stream Char -> Int length = S.lengthI {-# INLINE[0] length #-} -- | /O(n)/ Reverse the characters of a string. reverse :: Stream Char -> Text reverse (Stream next s len0) | isEmpty len0 = I.empty | otherwise = I.textP arr off' len' where len0' = upperBound 4 (larger len0 4) (arr, (off', len')) = A.run2 (A.new len0' >>= loop s (len0'-1) len0') loop !s0 !i !len marr = case next s0 of Done -> return (marr, (j, len-j)) where j = i + 1 Skip s1 -> loop s1 i len marr Yield x s1 | i < least -> {-# SCC "reverse/resize" #-} do let newLen = len `shiftL` 1 marr' <- A.new newLen A.copyM marr' (newLen-len) marr 0 len write s1 (len+i) newLen marr' | otherwise -> write s1 i len marr where n = ord x least | n < 0x10000 = 0 | otherwise = 1 m = n - 0x10000 lo = fromIntegral $ (m `shiftR` 10) + 0xD800 hi = fromIntegral $ (m .&. 0x3FF) + 0xDC00 write t j l mar | n < 0x10000 = do A.unsafeWrite mar j (fromIntegral n) loop t (j-1) l mar | otherwise = do A.unsafeWrite mar (j-1) lo A.unsafeWrite mar j hi loop t (j-2) l mar {-# INLINE [0] reverse #-} -- | /O(n)/ Perform the equivalent of 'scanr' over a list, only with -- the input and result reversed. reverseScanr :: (Char -> Char -> Char) -> Char -> Stream Char -> Stream Char reverseScanr f z0 (Stream next0 s0 len) = Stream next (S1 :*: z0 :*: s0) (len+1) -- HINT maybe too low where {-# INLINE next #-} next (S1 :*: z :*: s) = Yield z (S2 :*: z :*: s) next (S2 :*: z :*: s) = case next0 s of Yield x s' -> let !x' = f x z in Yield x' (S2 :*: x' :*: s') Skip s' -> Skip (S2 :*: z :*: s') Done -> Done {-# INLINE reverseScanr #-} -- | /O(n)/ Like 'unfoldr', 'unfoldrN' builds a stream from a seed -- value. However, the length of the result is limited by the -- first argument to 'unfoldrN'. This function is more efficient than -- 'unfoldr' when the length of the result is known. unfoldrN :: Int -> (a -> Maybe (Char,a)) -> a -> Stream Char unfoldrN n = S.unfoldrNI n {-# INLINE [0] unfoldrN #-} ------------------------------------------------------------------------------- -- ** Indexing streams -- | /O(n)/ stream index (subscript) operator, starting from 0. index :: Stream Char -> Int -> Char index = S.indexI {-# INLINE [0] index #-} -- | The 'findIndex' function takes a predicate and a stream and -- returns the index of the first element in the stream -- satisfying the predicate. 
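--
-- For example (illustrative; the index counts 'Char' values from
-- zero):
--
-- > findIndex (== 'c') (stream (Data.Text.pack "abc")) == Just 2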
findIndex :: (Char -> Bool) -> Stream Char -> Maybe Int findIndex = S.findIndexI {-# INLINE [0] findIndex #-} -- | /O(n)/ The 'count' function returns the number of times the query -- element appears in the given stream. countChar :: Char -> Stream Char -> Int countChar = S.countCharI {-# INLINE [0] countChar #-} -- | /O(n)/ Like a combination of 'map' and 'foldl''. Applies a -- function to each element of a 'Text', passing an accumulating -- parameter from left to right, and returns a final 'Text'. mapAccumL :: (a -> Char -> (a,Char)) -> a -> Stream Char -> (a, Text) mapAccumL f z0 (Stream next0 s0 len) = (nz,I.textP na 0 nl) where (na,(nz,nl)) = A.run2 (A.new mlen >>= \arr -> outer arr mlen z0 s0 0) where mlen = upperBound 4 len outer arr top = loop where loop !z !s !i = case next0 s of Done -> return (arr, (z,i)) Skip s' -> loop z s' i Yield x s' | j >= top -> {-# SCC "mapAccumL/resize" #-} do let top' = (top + 1) `shiftL` 1 arr' <- A.new top' A.copyM arr' 0 arr 0 top outer arr' top' z s i | otherwise -> do let (z',c) = f z x d <- unsafeWrite arr i c loop z' s' (i+d) where j | ord x < 0x10000 = i | otherwise = i + 1 {-# INLINE [0] mapAccumL #-} text-0.11.3.1/Data/Text/Internal.hs0000644000000000000000000000773612140332415015022 0ustar0000000000000000{-# LANGUAGE CPP, DeriveDataTypeable #-} -- | -- Module : Data.Text.Internal -- Copyright : (c) 2008, 2009 Tom Harper, -- (c) 2009, 2010 Bryan O'Sullivan, -- (c) 2009 Duncan Coutts -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- A module containing private 'Text' internals. This exposes the -- 'Text' representation and low level construction functions. -- Modules which extend the 'Text' system may need to use this module. -- -- You should not use this module unless you are determined to monkey -- with the internals, as the functions here do just about nothing to -- preserve data invariants. You have been warned! module Data.Text.Internal ( -- * Types -- $internals Text(..) -- * Construction , text , textP -- * Safety , safe -- * Code that must be here for accessibility , empty -- * Utilities , firstf -- * Debugging , showText ) where #if defined(ASSERTS) import Control.Exception (assert) #endif import Data.Bits ((.&.)) import qualified Data.Text.Array as A import Data.Text.UnsafeChar (ord) import Data.Typeable (Typeable) -- | A space efficient, packed, unboxed Unicode text type. data Text = Text {-# UNPACK #-} !A.Array -- payload (Word16 elements) {-# UNPACK #-} !Int -- offset (units of Word16, not Char) {-# UNPACK #-} !Int -- length (units of Word16, not Char) deriving (Typeable) -- | Smart constructor. text :: A.Array -> Int -> Int -> Text text arr off len = #if defined(ASSERTS) let c = A.unsafeIndex arr off alen = A.length arr in assert (len >= 0) . assert (off >= 0) . assert (alen == 0 || len == 0 || off < alen) . assert (len == 0 || c < 0xDC00 || c > 0xDFFF) $ #endif Text arr off len {-# INLINE text #-} -- | /O(1)/ The empty 'Text'. empty :: Text empty = Text A.empty 0 0 {-# INLINE [1] empty #-} -- | Construct a 'Text' without invisibly pinning its byte array in -- memory if its length has dwindled to zero. textP :: A.Array -> Int -> Int -> Text textP arr off len | len == 0 = empty | otherwise = text arr off len {-# INLINE textP #-} -- | A useful 'show'-like function for debugging purposes. 
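--
-- A sample of its output (illustrative; the list holds the UTF-16
-- code units of the payload array):
--
-- > showText (Data.Text.pack "ab") == "Text [97,98] 0 2"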
showText :: Text -> String showText (Text arr off len) = "Text " ++ show (A.toList arr off len) ++ ' ' : show off ++ ' ' : show len -- | Map a 'Char' to a 'Text'-safe value. -- -- UTF-16 surrogate code points are not included in the set of Unicode -- scalar values, but are unfortunately admitted as valid 'Char' -- values by Haskell. They cannot be represented in a 'Text'. This -- function remaps those code points to the Unicode replacement -- character (U+FFFD, \'�\'), and leaves other code points -- unchanged. safe :: Char -> Char safe c | ord c .&. 0x1ff800 /= 0xd800 = c | otherwise = '\xfffd' {-# INLINE safe #-} -- | Apply a function to the first element of an optional pair. firstf :: (a -> c) -> Maybe (a,b) -> Maybe (c,b) firstf f (Just (a, b)) = Just (f a, b) firstf _ Nothing = Nothing -- $internals -- -- Internally, the 'Text' type is represented as an array of 'Word16' -- UTF-16 code units. The offset and length fields in the constructor -- are in these units, /not/ units of 'Char'. -- -- Invariants that all functions must maintain: -- -- * Since the 'Text' type uses UTF-16 internally, it cannot represent -- characters in the reserved surrogate code point range U+D800 to -- U+DFFF. To maintain this invariant, the 'safe' function maps -- 'Char' values in this range to the replacement character (U+FFFD, -- \'�\'). -- -- * A leading (or \"high\") surrogate code unit (0xD800–0xDBFF) must -- always be followed by a trailing (or \"low\") surrogate code unit -- (0xDC00-0xDFFF). A trailing surrogate code unit must always be -- preceded by a leading surrogate code unit. text-0.11.3.1/Data/Text/IO.hs0000644000000000000000000003151312140332415013543 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, RecordWildCards, ScopedTypeVariables #-} #if __GLASGOW_HASKELL__ >= 702 {-# LANGUAGE Trustworthy #-} #endif -- | -- Module : Data.Text.IO -- Copyright : (c) 2009, 2010 Bryan O'Sullivan, -- (c) 2009 Simon Marlow -- License : BSD-style -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : GHC -- -- Efficient locale-sensitive support for text I\/O. -- -- Skip past the synopsis for some important notes on performance and -- portability across different versions of GHC. 
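--
-- Like the rest of this package, this module is normally imported
-- @qualified@; a usage sketch (not from the original documentation):
--
-- > import qualified Data.Text.IO as T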
module Data.Text.IO ( -- * Performance -- $performance -- * Locale support -- $locale -- * File-at-a-time operations readFile , writeFile , appendFile -- * Operations on handles , hGetContents , hGetChunk , hGetLine , hPutStr , hPutStrLn -- * Special cases for standard input and output , interact , getContents , getLine , putStr , putStrLn ) where import Data.Text (Text) import Prelude hiding (appendFile, getContents, getLine, interact, putStr, putStrLn, readFile, writeFile) import System.IO (Handle, IOMode(..), hPutChar, openFile, stdin, stdout, withFile) #if __GLASGOW_HASKELL__ <= 610 import qualified Data.ByteString.Char8 as B import Data.Text.Encoding (decodeUtf8, encodeUtf8) #else import qualified Control.Exception as E import Control.Monad (liftM2, when) import Data.IORef (readIORef, writeIORef) import qualified Data.Text as T import Data.Text.Fusion (stream) import Data.Text.Fusion.Internal (Step(..), Stream(..)) import Data.Text.IO.Internal (hGetLineWith, readChunk) import GHC.IO.Buffer (Buffer(..), BufferState(..), CharBufElem, CharBuffer, RawCharBuffer, emptyBuffer, isEmptyBuffer, newCharBuffer, writeCharBuf) import GHC.IO.Exception (IOException(ioe_type), IOErrorType(InappropriateType)) import GHC.IO.Handle.Internals (augmentIOError, hClose_help, wantReadableHandle, wantWritableHandle) import GHC.IO.Handle.Text (commitBuffer') import GHC.IO.Handle.Types (BufferList(..), BufferMode(..), Handle__(..), HandleType(..), Newline(..)) import System.IO (hGetBuffering, hFileSize, hSetBuffering, hTell) import System.IO.Error (isEOFError) #endif -- $performance -- #performance# -- -- The functions in this module obey the runtime system's locale, -- character set encoding, and line ending conversion settings. -- -- If you know in advance that you will be working with data that has -- a specific encoding (e.g. UTF-8), and your application is highly -- performance sensitive, you may find that it is faster to perform -- I\/O with bytestrings and to encode and decode yourself than to use -- the functions in this module. -- -- Whether this will hold depends on the version of GHC you are using, -- the platform you are working on, the data you are working with, and -- the encodings you are using, so be sure to test for yourself. -- | The 'readFile' function reads a file and returns the contents of -- the file as a string. The entire file is read strictly, as with -- 'getContents'. readFile :: FilePath -> IO Text readFile name = openFile name ReadMode >>= hGetContents -- | Write a string to a file. The file is truncated to zero length -- before writing begins. writeFile :: FilePath -> Text -> IO () writeFile p = withFile p WriteMode . flip hPutStr -- | Write a string the end of a file. appendFile :: FilePath -> Text -> IO () appendFile p = withFile p AppendMode . flip hPutStr catchError :: String -> Handle -> Handle__ -> IOError -> IO Text catchError caller h Handle__{..} err | isEOFError err = do buf <- readIORef haCharBuffer return $ if isEmptyBuffer buf then T.empty else T.singleton '\r' | otherwise = E.throwIO (augmentIOError err caller h) -- | /Experimental./ Read a single chunk of strict text from a -- 'Handle'. The size of the chunk depends on the amount of input -- currently buffered. -- -- This function blocks only if there is no data available, and EOF -- has not yet been reached. Once EOF is reached, this function -- returns an empty string instead of throwing an exception. 
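--
-- A sketch of draining a handle chunk by chunk, assuming the usual
-- @import qualified Data.Text as T@; @readAllChunks@ is a
-- hypothetical helper, not part of this module:
--
-- > readAllChunks :: Handle -> IO [T.Text]
-- > readAllChunks h = do
-- >   t <- hGetChunk h
-- >   if T.null t
-- >     then return []
-- >     else (t:) `fmap` readAllChunks h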
hGetChunk :: Handle -> IO Text hGetChunk h = wantReadableHandle "hGetChunk" h readSingleChunk where readSingleChunk hh@Handle__{..} = do buf <- readIORef haCharBuffer t <- readChunk hh buf `E.catch` catchError "hGetChunk" h hh return (hh, t) -- | Read the remaining contents of a 'Handle' as a string. The -- 'Handle' is closed once the contents have been read, or if an -- exception is thrown. -- -- Internally, this function reads a chunk at a time from the -- lower-level buffering abstraction, and concatenates the chunks into -- a single string once the entire file has been read. -- -- As a result, it requires approximately twice as much memory as its -- result to construct its result. For files more than a half of -- available RAM in size, this may result in memory exhaustion. hGetContents :: Handle -> IO Text #if __GLASGOW_HASKELL__ <= 610 hGetContents = fmap decodeUtf8 . B.hGetContents #else hGetContents h = do chooseGoodBuffering h wantReadableHandle "hGetContents" h readAll where readAll hh@Handle__{..} = do let readChunks = do buf <- readIORef haCharBuffer t <- readChunk hh buf `E.catch` catchError "hGetContents" h hh if T.null t then return [t] else (t:) `fmap` readChunks ts <- readChunks (hh', _) <- hClose_help hh return (hh'{haType=ClosedHandle}, T.concat ts) -- | Use a more efficient buffer size if we're reading in -- block-buffered mode with the default buffer size. When we can -- determine the size of the handle we're reading, set the buffer size -- to that, so that we can read the entire file in one chunk. -- Otherwise, use a buffer size of at least 16KB. chooseGoodBuffering :: Handle -> IO () chooseGoodBuffering h = do bufMode <- hGetBuffering h case bufMode of BlockBuffering Nothing -> do d <- E.catch (liftM2 (-) (hFileSize h) (hTell h)) $ \(e::IOException) -> if ioe_type e == InappropriateType then return 16384 -- faster than the 2KB default else E.throwIO e when (d > 0) . hSetBuffering h . BlockBuffering . Just . fromIntegral $ d _ -> return () #endif -- | Read a single line from a handle. hGetLine :: Handle -> IO Text #if __GLASGOW_HASKELL__ <= 610 hGetLine = fmap decodeUtf8 . B.hGetLine #else hGetLine = hGetLineWith T.concat #endif -- | Write a string to a handle. hPutStr :: Handle -> Text -> IO () #if __GLASGOW_HASKELL__ <= 610 hPutStr h = B.hPutStr h . encodeUtf8 #else -- This function is lifted almost verbatim from GHC.IO.Handle.Text. hPutStr h t = do (buffer_mode, nl) <- wantWritableHandle "hPutStr" h $ \h_ -> do bmode <- getSpareBuffer h_ return (bmode, haOutputNL h_) let str = stream t case buffer_mode of (NoBuffering, _) -> hPutChars h str (LineBuffering, buf) -> writeLines h nl buf str (BlockBuffering _, buf) | nl == CRLF -> writeBlocksCRLF h buf str | otherwise -> writeBlocksRaw h buf str hPutChars :: Handle -> Stream Char -> IO () hPutChars h (Stream next0 s0 _len) = loop s0 where loop !s = case next0 s of Done -> return () Skip s' -> loop s' Yield x s' -> hPutChar h x >> loop s' -- The following functions are largely lifted from GHC.IO.Handle.Text, -- but adapted to a coinductive stream of data instead of an inductive -- list. -- -- We have several variations of more or less the same code for -- performance reasons. Splitting the original buffered write -- function into line- and block-oriented versions gave us a 2.1x -- performance improvement. Lifting out the raw/cooked newline -- handling gave a few more percent on top. 
writeLines :: Handle -> Newline -> Buffer CharBufElem -> Stream Char -> IO () writeLines h nl buf0 (Stream next0 s0 _len) = outer s0 buf0 where outer s1 Buffer{bufRaw=raw, bufSize=len} = inner s1 (0::Int) where inner !s !n = case next0 s of Done -> commit n False{-no flush-} True{-release-} >> return () Skip s' -> inner s' n Yield x s' | n + 1 >= len -> commit n True{-needs flush-} False >>= outer s | x == '\n' -> do n' <- if nl == CRLF then do n1 <- writeCharBuf raw n '\r' writeCharBuf raw n1 '\n' else writeCharBuf raw n x commit n' True{-needs flush-} False >>= outer s' | otherwise -> writeCharBuf raw n x >>= inner s' commit = commitBuffer h raw len writeBlocksCRLF :: Handle -> Buffer CharBufElem -> Stream Char -> IO () writeBlocksCRLF h buf0 (Stream next0 s0 _len) = outer s0 buf0 where outer s1 Buffer{bufRaw=raw, bufSize=len} = inner s1 (0::Int) where inner !s !n = case next0 s of Done -> commit n False{-no flush-} True{-release-} >> return () Skip s' -> inner s' n Yield x s' | n + 1 >= len -> commit n True{-needs flush-} False >>= outer s | x == '\n' -> do n1 <- writeCharBuf raw n '\r' writeCharBuf raw n1 '\n' >>= inner s' | otherwise -> writeCharBuf raw n x >>= inner s' commit = commitBuffer h raw len writeBlocksRaw :: Handle -> Buffer CharBufElem -> Stream Char -> IO () writeBlocksRaw h buf0 (Stream next0 s0 _len) = outer s0 buf0 where outer s1 Buffer{bufRaw=raw, bufSize=len} = inner s1 (0::Int) where inner !s !n = case next0 s of Done -> commit n False{-no flush-} True{-release-} >> return () Skip s' -> inner s' n Yield x s' | n + 1 >= len -> commit n True{-needs flush-} False >>= outer s | otherwise -> writeCharBuf raw n x >>= inner s' commit = commitBuffer h raw len -- This function is completely lifted from GHC.IO.Handle.Text. getSpareBuffer :: Handle__ -> IO (BufferMode, CharBuffer) getSpareBuffer Handle__{haCharBuffer=ref, haBuffers=spare_ref, haBufferMode=mode} = do case mode of NoBuffering -> return (mode, error "no buffer!") _ -> do bufs <- readIORef spare_ref buf <- readIORef ref case bufs of BufferListCons b rest -> do writeIORef spare_ref rest return ( mode, emptyBuffer b (bufSize buf) WriteBuffer) BufferListNil -> do new_buf <- newCharBuffer (bufSize buf) WriteBuffer return (mode, new_buf) -- This function is completely lifted from GHC.IO.Handle.Text. commitBuffer :: Handle -> RawCharBuffer -> Int -> Int -> Bool -> Bool -> IO CharBuffer commitBuffer hdl !raw !sz !count flush release = wantWritableHandle "commitAndReleaseBuffer" hdl $ commitBuffer' raw sz count flush release {-# INLINE commitBuffer #-} #endif -- | Write a string to a handle, followed by a newline. hPutStrLn :: Handle -> Text -> IO () hPutStrLn h t = hPutStr h t >> hPutChar h '\n' -- | The 'interact' function takes a function of type @Text -> Text@ -- as its argument. The entire input from the standard input device is -- passed to this function as its argument, and the resulting string -- is output on the standard output device. interact :: (Text -> Text) -> IO () interact f = putStr . f =<< getContents -- | Read all user input on 'stdin' as a single string. getContents :: IO Text getContents = hGetContents stdin -- | Read a single line of user input from 'stdin'. getLine :: IO Text getLine = hGetLine stdin -- | Write a string to 'stdout'. putStr :: Text -> IO () putStr = hPutStr stdout -- | Write a string to 'stdout', followed by a newline. 
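--
-- For example (illustrative):
--
-- > putStrLn (Data.Text.pack "hello, world")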
putStrLn :: Text -> IO ()
putStrLn = hPutStrLn stdout

-- $locale
--
-- /Note/: The behaviour of functions in this module depends on the
-- version of GHC you are using.
--
-- Beginning with GHC 6.12, text I\/O is performed using the system or
-- handle's current locale and line ending conventions.
--
-- Under GHC 6.10 and earlier, the system I\/O libraries do not
-- support locale-sensitive I\/O or line ending conversion.  On these
-- versions of GHC, functions in this library all use UTF-8.  What
-- does this mean in practice?
--
-- * All data that is read will be decoded as UTF-8.
--
-- * Before data is written, it is first encoded as UTF-8.
--
-- * On both reading and writing, the platform's native newline
-- conversion is performed.
--
-- If you must use a non-UTF-8 locale on an older version of GHC, you
-- will have to perform the transcoding yourself, e.g. as follows:
--
-- > import qualified Data.ByteString as B
-- > import Data.Text (Text)
-- > import Data.Text.Encoding (encodeUtf16LE)
-- >
-- > putStr_Utf16LE :: Text -> IO ()
-- > putStr_Utf16LE t = B.putStr (encodeUtf16LE t)
text-0.11.3.1/Data/Text/Lazy.hs0000644000000000000000000014243312140332415014157 0ustar0000000000000000{-# OPTIONS_GHC -fno-warn-orphans #-}
{-# LANGUAGE BangPatterns, MagicHash, CPP #-}
#if __GLASGOW_HASKELL__ >= 702
{-# LANGUAGE Trustworthy #-}
#endif
-- |
-- Module      : Data.Text.Lazy
-- Copyright   : (c) 2009, 2010, 2012 Bryan O'Sullivan
--
-- License     : BSD-style
-- Maintainer  : bos@serpentine.com, rtomharper@googlemail.com,
--               duncan@haskell.org
-- Stability   : experimental
-- Portability : GHC
--
-- A time and space-efficient implementation of Unicode text using
-- lists of packed arrays.
--
-- /Note/: Read below the synopsis for important notes on the use of
-- this module.
--
-- The representation used by this module is suitable for high
-- performance use and for streaming large quantities of data.  It
-- provides a means to manipulate a large body of text without
-- requiring that the entire content be resident in memory.
--
-- Some operations, such as 'concat', 'append', 'reverse' and 'cons',
-- have better time complexity than their "Data.Text" equivalents, due
-- to the underlying representation being a list of chunks.  For other
-- operations, lazy 'Text's are usually within a few percent of strict
-- ones, but often with better heap usage if used in a streaming
-- fashion.  For data larger than available memory, or if you have
-- tight memory constraints, this module will be the only option.
--
-- This module is intended to be imported @qualified@, to avoid name
-- clashes with "Prelude" functions, e.g.
-- -- > import qualified Data.Text.Lazy as L module Data.Text.Lazy ( -- * Fusion -- $fusion -- * Acceptable data -- $replacement -- * Types Text -- * Creation and elimination , pack , unpack , singleton , empty , fromChunks , toChunks , toStrict , fromStrict , foldrChunks , foldlChunks -- * Basic interface , cons , snoc , append , uncons , head , last , tail , init , null , length , compareLength -- * Transformations , map , intercalate , intersperse , transpose , reverse , replace -- ** Case conversion -- $case , toCaseFold , toLower , toUpper -- ** Justification , justifyLeft , justifyRight , center -- * Folds , foldl , foldl' , foldl1 , foldl1' , foldr , foldr1 -- ** Special folds , concat , concatMap , any , all , maximum , minimum -- * Construction -- ** Scans , scanl , scanl1 , scanr , scanr1 -- ** Accumulating maps , mapAccumL , mapAccumR -- ** Generation and unfolding , replicate , unfoldr , unfoldrN -- * Substrings -- ** Breaking strings , take , drop , takeWhile , dropWhile , dropWhileEnd , dropAround , strip , stripStart , stripEnd , splitAt , span , breakOn , breakOnEnd , break , group , groupBy , inits , tails -- ** Breaking into many substrings -- $split , splitOn , split , chunksOf -- , breakSubstring -- ** Breaking into lines and words , lines , words , unlines , unwords -- * Predicates , isPrefixOf , isSuffixOf , isInfixOf -- ** View patterns , stripPrefix , stripSuffix , commonPrefixes -- * Searching , filter , find , breakOnAll , partition -- , findSubstring -- * Indexing , index , count -- * Zipping and unzipping , zip , zipWith -- -* Ordered text -- , sort ) where import Prelude (Char, Bool(..), Maybe(..), String, Eq(..), Ord(..), Ordering(..), Read(..), Show(..), (&&), (||), (+), (-), (.), ($), (++), error, flip, fmap, fromIntegral, not, otherwise, quot) import qualified Prelude as P #if defined(HAVE_DEEPSEQ) import Control.DeepSeq (NFData(..)) #endif import Data.Int (Int64) import qualified Data.List as L import Data.Char (isSpace) import Data.Data (Data(gfoldl, toConstr, gunfold, dataTypeOf)) #if __GLASGOW_HASKELL__ >= 612 import Data.Data (mkNoRepType) #else import Data.Data (mkNorepType) #endif import Data.Monoid (Monoid(..)) import Data.String (IsString(..)) import qualified Data.Text as T import qualified Data.Text.Internal as T import qualified Data.Text.Fusion.Common as S import qualified Data.Text.Unsafe as T import qualified Data.Text.Lazy.Fusion as S import Data.Text.Fusion.Internal (PairS(..)) import Data.Text.Lazy.Fusion (stream, unstream) import Data.Text.Lazy.Internal (Text(..), chunk, empty, foldlChunks, foldrChunks) import Data.Text.Internal (firstf, safe, textP) import qualified Data.Text.Util as U import Data.Text.Lazy.Search (indices) #if __GLASGOW_HASKELL__ >= 702 import qualified GHC.CString as GHC #else import qualified GHC.Base as GHC #endif import GHC.Prim (Addr#) -- $fusion -- -- Most of the functions in this module are subject to /fusion/, -- meaning that a pipeline of such functions will usually allocate at -- most one 'Text' value. -- -- As an example, consider the following pipeline: -- -- > import Data.Text.Lazy as T -- > import Data.Text.Lazy.Encoding as E -- > import Data.ByteString.Lazy (ByteString) -- > -- > countChars :: ByteString -> Int -- > countChars = T.length . T.toUpper . E.decodeUtf8 -- -- From the type signatures involved, this looks like it should -- allocate one 'ByteString' value, and two 'Text' values. 
However, -- when a module is compiled with optimisation enabled under GHC, the -- two intermediate 'Text' values will be optimised away, and the -- function will be compiled down to a single loop over the source -- 'ByteString'. -- -- Functions that can be fused by the compiler are documented with the -- phrase \"Subject to fusion\". -- $replacement -- -- A 'Text' value is a sequence of Unicode scalar values, as defined -- in §3.9, definition D76 of the Unicode 5.2 standard: -- . As -- such, a 'Text' cannot contain values in the range U+D800 to U+DFFF -- inclusive. Haskell implementations admit all Unicode code points -- (§3.4, definition D10) as 'Char' values, including code points -- from this invalid range. This means that there are some 'Char' -- values that are not valid Unicode scalar values, and the functions -- in this module must handle those cases. -- -- Within this module, many functions construct a 'Text' from one or -- more 'Char' values. Those functions will substitute 'Char' values -- that are not valid Unicode scalar values with the replacement -- character \"�\" (U+FFFD). Functions that perform this -- inspection and replacement are documented with the phrase -- \"Performs replacement on invalid scalar values\". -- -- (One reason for this policy of replacement is that internally, a -- 'Text' value is represented as packed UTF-16 data. Values in the -- range U+D800 through U+DFFF are used by UTF-16 to denote surrogate -- code points, and so cannot be represented. The functions replace -- invalid scalar values, instead of dropping them, as a security -- measure. For details, see Unicode Technical Report 36, §3.5: -- ) equal :: Text -> Text -> Bool equal Empty Empty = True equal Empty _ = False equal _ Empty = False equal (Chunk a as) (Chunk b bs) = case compare lenA lenB of LT -> a == (T.takeWord16 lenA b) && as `equal` Chunk (T.dropWord16 lenA b) bs EQ -> a == b && as `equal` bs GT -> T.takeWord16 lenB a == b && Chunk (T.dropWord16 lenB a) as `equal` bs where lenA = T.lengthWord16 a lenB = T.lengthWord16 b instance Eq Text where (==) = equal {-# INLINE (==) #-} instance Ord Text where compare = compareText compareText :: Text -> Text -> Ordering compareText Empty Empty = EQ compareText Empty _ = LT compareText _ Empty = GT compareText (Chunk a0 as) (Chunk b0 bs) = outer a0 b0 where outer ta@(T.Text arrA offA lenA) tb@(T.Text arrB offB lenB) = go 0 0 where go !i !j | i >= lenA = compareText as (chunk (T.Text arrB (offB+j) (lenB-j)) bs) | j >= lenB = compareText (chunk (T.Text arrA (offA+i) (lenA-i)) as) bs | a < b = LT | a > b = GT | otherwise = go (i+di) (j+dj) where T.Iter a di = T.iter ta i T.Iter b dj = T.iter tb j instance Show Text where showsPrec p ps r = showsPrec p (unpack ps) r instance Read Text where readsPrec p str = [(pack x,y) | (x,y) <- readsPrec p str] instance Monoid Text where mempty = empty mappend = append mconcat = concat instance IsString Text where fromString = pack #if defined(HAVE_DEEPSEQ) instance NFData Text where rnf Empty = () rnf (Chunk _ ts) = rnf ts #endif instance Data Text where gfoldl f z txt = z pack `f` (unpack txt) toConstr _ = error "Data.Text.Lazy.Text.toConstr" gunfold _ _ = error "Data.Text.Lazy.Text.gunfold" #if __GLASGOW_HASKELL__ >= 612 dataTypeOf _ = mkNoRepType "Data.Text.Lazy.Text" #else dataTypeOf _ = mkNorepType "Data.Text.Lazy.Text" #endif -- | /O(n)/ Convert a 'String' into a 'Text'. -- -- Subject to fusion. Performs replacement on invalid scalar values. pack :: String -> Text pack = unstream . S.streamList . 
L.map safe
{-# INLINE [1] pack #-}

-- | /O(n)/ Convert a 'Text' into a 'String'.
-- Subject to fusion.
unpack :: Text -> String
unpack t = S.unstreamList (stream t)
{-# INLINE [1] unpack #-}

-- | /O(n)/ Convert a literal string into a Text.
unpackCString# :: Addr# -> Text
unpackCString# addr# = unstream (S.streamCString# addr#)
{-# NOINLINE unpackCString# #-}

{-# RULES "TEXT literal" forall a.
    unstream (S.streamList (L.map safe (GHC.unpackCString# a)))
      = unpackCString# a #-}

{-# RULES "TEXT literal UTF8" forall a.
    unstream (S.streamList (L.map safe (GHC.unpackCStringUtf8# a)))
      = unpackCString# a #-}

-- | /O(1)/ Convert a character into a Text.  Subject to fusion.
-- Performs replacement on invalid scalar values.
singleton :: Char -> Text
singleton c = Chunk (T.singleton c) Empty
{-# INLINE [1] singleton #-}

{-# RULES
"LAZY TEXT singleton -> fused" [~1] forall c.
    singleton c = unstream (S.singleton c)
"LAZY TEXT singleton -> unfused" [1] forall c.
    unstream (S.singleton c) = singleton c
 #-}

-- | /O(c)/ Convert a list of strict 'T.Text's into a lazy 'Text'.
fromChunks :: [T.Text] -> Text
fromChunks cs = L.foldr chunk Empty cs

-- | /O(n)/ Convert a lazy 'Text' into a list of strict 'T.Text's.
toChunks :: Text -> [T.Text]
toChunks cs = foldrChunks (:) [] cs

-- | /O(n)/ Convert a lazy 'Text' into a strict 'T.Text'.
toStrict :: Text -> T.Text
toStrict t = T.concat (toChunks t)
{-# INLINE [1] toStrict #-}

-- | /O(c)/ Convert a strict 'T.Text' into a lazy 'Text'.
fromStrict :: T.Text -> Text
fromStrict t = chunk t Empty
{-# INLINE [1] fromStrict #-}

-- -----------------------------------------------------------------------------
-- * Basic functions

-- | /O(1)/ Adds a character to the front of a 'Text'.  Unlike its
-- "Data.Text" counterpart, this requires no copying: the new
-- character simply becomes its own chunk.  Subject to fusion.
cons :: Char -> Text -> Text
cons c t = Chunk (T.singleton c) t
{-# INLINE [1] cons #-}

infixr 5 `cons`

{-# RULES
"LAZY TEXT cons -> fused" [~1] forall c t.
    cons c t = unstream (S.cons c (stream t))
"LAZY TEXT cons -> unfused" [1] forall c t.
    unstream (S.cons c (stream t)) = cons c t
 #-}

-- | /O(n\/c)/ Adds a character to the end of a 'Text'.  This copies
-- the chunk spine, but not the chunk contents, unless fused.  Subject
-- to fusion.
snoc :: Text -> Char -> Text
snoc t c = foldrChunks Chunk (singleton c) t
{-# INLINE [1] snoc #-}

{-# RULES
"LAZY TEXT snoc -> fused" [~1] forall t c.
    snoc t c = unstream (S.snoc (stream t) c)
"LAZY TEXT snoc -> unfused" [1] forall t c.
    unstream (S.snoc (stream t) c) = snoc t c
 #-}

-- | /O(n\/c)/ Appends one 'Text' to another.  Subject to fusion.
append :: Text -> Text -> Text
append xs ys = foldrChunks Chunk ys xs
{-# INLINE [1] append #-}

{-# RULES
"LAZY TEXT append -> fused" [~1] forall t1 t2.
    append t1 t2 = unstream (S.append (stream t1) (stream t2))
"LAZY TEXT append -> unfused" [1] forall t1 t2.
    unstream (S.append (stream t1) (stream t2)) = append t1 t2
 #-}

-- | /O(1)/ Returns the first character and rest of a 'Text', or
-- 'Nothing' if empty. Subject to fusion.
uncons :: Text -> Maybe (Char, Text)
uncons Empty        = Nothing
uncons (Chunk t ts) = Just (T.unsafeHead t, ts')
  where ts' | T.compareLength t 1 == EQ = ts
            | otherwise                 = Chunk (T.unsafeTail t) ts
{-# INLINE uncons #-}

-- | /O(1)/ Returns the first character of a 'Text', which must be
-- non-empty.  Subject to fusion.
head :: Text -> Char
head t = S.head (stream t)
{-# INLINE head #-}

-- | /O(1)/ Returns all characters after the head of a 'Text', which
-- must be non-empty.  Subject to fusion.
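--
-- For example (illustrative):
--
-- > tail "abc" == "bc"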
tail :: Text -> Text tail (Chunk t ts) = chunk (T.tail t) ts tail Empty = emptyError "tail" {-# INLINE [1] tail #-} {-# RULES "LAZY TEXT tail -> fused" [~1] forall t. tail t = unstream (S.tail (stream t)) "LAZY TEXT tail -> unfused" [1] forall t. unstream (S.tail (stream t)) = tail t #-} -- | /O(1)/ Returns all but the last character of a 'Text', which must -- be non-empty. Subject to fusion. init :: Text -> Text init (Chunk t0 ts0) = go t0 ts0 where go t (Chunk t' ts) = Chunk t (go t' ts) go t Empty = chunk (T.init t) Empty init Empty = emptyError "init" {-# INLINE [1] init #-} {-# RULES "LAZY TEXT init -> fused" [~1] forall t. init t = unstream (S.init (stream t)) "LAZY TEXT init -> unfused" [1] forall t. unstream (S.init (stream t)) = init t #-} -- | /O(1)/ Tests whether a 'Text' is empty or not. Subject to -- fusion. null :: Text -> Bool null Empty = True null _ = False {-# INLINE [1] null #-} {-# RULES "LAZY TEXT null -> fused" [~1] forall t. null t = S.null (stream t) "LAZY TEXT null -> unfused" [1] forall t. S.null (stream t) = null t #-} -- | /O(1)/ Tests whether a 'Text' contains exactly one character. -- Subject to fusion. isSingleton :: Text -> Bool isSingleton = S.isSingleton . stream {-# INLINE isSingleton #-} -- | /O(1)/ Returns the last character of a 'Text', which must be -- non-empty. Subject to fusion. last :: Text -> Char last Empty = emptyError "last" last (Chunk t ts) = go t ts where go _ (Chunk t' ts') = go t' ts' go t' Empty = T.last t' {-# INLINE [1] last #-} {-# RULES "LAZY TEXT last -> fused" [~1] forall t. last t = S.last (stream t) "LAZY TEXT last -> unfused" [1] forall t. S.last (stream t) = last t #-} -- | /O(n)/ Returns the number of characters in a 'Text'. -- Subject to fusion. length :: Text -> Int64 length = foldlChunks go 0 where go l t = l + fromIntegral (T.length t) {-# INLINE [1] length #-} {-# RULES "LAZY TEXT length -> fused" [~1] forall t. length t = S.length (stream t) "LAZY TEXT length -> unfused" [1] forall t. S.length (stream t) = length t #-} -- | /O(n)/ Compare the count of characters in a 'Text' to a number. -- Subject to fusion. -- -- This function gives the same answer as comparing against the result -- of 'length', but can short circuit if the count of characters is -- greater than the number, and hence be more efficient. compareLength :: Text -> Int64 -> Ordering compareLength t n = S.compareLengthI (stream t) n {-# INLINE [1] compareLength #-} -- We don't apply those otherwise appealing length-to-compareLength -- rewrite rules here, because they can change the strictness -- properties of code. -- | /O(n)/ 'map' @f@ @t@ is the 'Text' obtained by applying @f@ to -- each element of @t@. Subject to fusion. Performs replacement on -- invalid scalar values. map :: (Char -> Char) -> Text -> Text map f t = unstream (S.map (safe . f) (stream t)) {-# INLINE [1] map #-} -- | /O(n)/ The 'intercalate' function takes a 'Text' and a list of -- 'Text's and concatenates the list after interspersing the first -- argument between each element of the list. intercalate :: Text -> [Text] -> Text intercalate t = concat . (U.intersperse t) {-# INLINE intercalate #-} -- | /O(n)/ The 'intersperse' function takes a character and places it -- between the characters of a 'Text'. Subject to fusion. Performs -- replacement on invalid scalar values. 
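--
-- For example (illustrative):
--
-- > intersperse ',' "abc" == "a,b,c"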
intersperse :: Char -> Text -> Text intersperse c t = unstream (S.intersperse (safe c) (stream t)) {-# INLINE intersperse #-} -- | /O(n)/ Left-justify a string to the given length, using the -- specified fill character on the right. Subject to fusion. Performs -- replacement on invalid scalar values. -- -- Examples: -- -- > justifyLeft 7 'x' "foo" == "fooxxxx" -- > justifyLeft 3 'x' "foobar" == "foobar" justifyLeft :: Int64 -> Char -> Text -> Text justifyLeft k c t | len >= k = t | otherwise = t `append` replicateChar (k-len) c where len = length t {-# INLINE [1] justifyLeft #-} {-# RULES "LAZY TEXT justifyLeft -> fused" [~1] forall k c t. justifyLeft k c t = unstream (S.justifyLeftI k c (stream t)) "LAZY TEXT justifyLeft -> unfused" [1] forall k c t. unstream (S.justifyLeftI k c (stream t)) = justifyLeft k c t #-} -- | /O(n)/ Right-justify a string to the given length, using the -- specified fill character on the left. Performs replacement on -- invalid scalar values. -- -- Examples: -- -- > justifyRight 7 'x' "bar" == "xxxxbar" -- > justifyRight 3 'x' "foobar" == "foobar" justifyRight :: Int64 -> Char -> Text -> Text justifyRight k c t | len >= k = t | otherwise = replicateChar (k-len) c `append` t where len = length t {-# INLINE justifyRight #-} -- | /O(n)/ Center a string to the given length, using the specified -- fill character on either side. Performs replacement on invalid -- scalar values. -- -- Examples: -- -- > center 8 'x' "HS" = "xxxHSxxx" center :: Int64 -> Char -> Text -> Text center k c t | len >= k = t | otherwise = replicateChar l c `append` t `append` replicateChar r c where len = length t d = k - len r = d `quot` 2 l = d - r {-# INLINE center #-} -- | /O(n)/ The 'transpose' function transposes the rows and columns -- of its 'Text' argument. Note that this function uses 'pack', -- 'unpack', and the list version of transpose, and is thus not very -- efficient. transpose :: [Text] -> [Text] transpose ts = L.map (\ss -> Chunk (T.pack ss) Empty) (L.transpose (L.map unpack ts)) -- TODO: make this fast -- | /O(n)/ 'reverse' @t@ returns the elements of @t@ in reverse order. reverse :: Text -> Text reverse = rev Empty where rev a Empty = a rev a (Chunk t ts) = rev (Chunk (T.reverse t) a) ts -- | /O(m+n)/ Replace every occurrence of one substring with another. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. replace :: Text -- ^ Text to search for -> Text -- ^ Replacement text -> Text -- ^ Input text -> Text replace s d = intercalate d . splitOn s {-# INLINE replace #-} -- ---------------------------------------------------------------------------- -- ** Case conversions (folds) -- $case -- -- With Unicode text, it is incorrect to use combinators like @map -- toUpper@ to case convert each character of a string individually. -- Instead, use the whole-string case conversion functions from this -- module. For correctness in different writing systems, these -- functions may map one input character to two or three output -- characters. -- | /O(n)/ Convert a string to folded case. This function is mainly -- useful for performing caseless (or case insensitive) string -- comparisons. -- -- A string @x@ is a caseless match for a string @y@ if and only if: -- -- @toCaseFold x == toCaseFold y@ -- -- The result string may be longer than the input string, and may -- differ from applying 'toLower' to the input string. 
For instance, -- the Armenian small ligature men now (U+FB13) is case folded to the -- bigram men now (U+0574 U+0576), while the micro sign (U+00B5) is -- case folded to the Greek small letter mu (U+03BC) instead of -- itself. toCaseFold :: Text -> Text toCaseFold t = unstream (S.toCaseFold (stream t)) {-# INLINE [0] toCaseFold #-} -- | /O(n)/ Convert a string to lower case, using simple case -- conversion. The result string may be longer than the input string. -- For instance, the Latin capital letter I with dot above (U+0130) -- maps to the sequence Latin small letter i (U+0069) followed by -- combining dot above (U+0307). toLower :: Text -> Text toLower t = unstream (S.toLower (stream t)) {-# INLINE toLower #-} -- | /O(n)/ Convert a string to upper case, using simple case -- conversion. The result string may be longer than the input string. -- For instance, the German eszett (U+00DF) maps to the two-letter -- sequence SS. toUpper :: Text -> Text toUpper t = unstream (S.toUpper (stream t)) {-# INLINE toUpper #-} -- | /O(n)/ 'foldl', applied to a binary operator, a starting value -- (typically the left-identity of the operator), and a 'Text', -- reduces the 'Text' using the binary operator, from left to right. -- Subject to fusion. foldl :: (a -> Char -> a) -> a -> Text -> a foldl f z t = S.foldl f z (stream t) {-# INLINE foldl #-} -- | /O(n)/ A strict version of 'foldl'. -- Subject to fusion. foldl' :: (a -> Char -> a) -> a -> Text -> a foldl' f z t = S.foldl' f z (stream t) {-# INLINE foldl' #-} -- | /O(n)/ A variant of 'foldl' that has no starting value argument, -- and thus must be applied to a non-empty 'Text'. Subject to fusion. foldl1 :: (Char -> Char -> Char) -> Text -> Char foldl1 f t = S.foldl1 f (stream t) {-# INLINE foldl1 #-} -- | /O(n)/ A strict version of 'foldl1'. Subject to fusion. foldl1' :: (Char -> Char -> Char) -> Text -> Char foldl1' f t = S.foldl1' f (stream t) {-# INLINE foldl1' #-} -- | /O(n)/ 'foldr', applied to a binary operator, a starting value -- (typically the right-identity of the operator), and a 'Text', -- reduces the 'Text' using the binary operator, from right to left. -- Subject to fusion. foldr :: (Char -> a -> a) -> a -> Text -> a foldr f z t = S.foldr f z (stream t) {-# INLINE foldr #-} -- | /O(n)/ A variant of 'foldr' that has no starting value argument, -- and thus must be applied to a non-empty 'Text'. Subject to -- fusion. foldr1 :: (Char -> Char -> Char) -> Text -> Char foldr1 f t = S.foldr1 f (stream t) {-# INLINE foldr1 #-} -- | /O(n)/ Concatenate a list of 'Text's. concat :: [Text] -> Text concat = to where go Empty css = to css go (Chunk c cs) css = Chunk c (go cs css) to [] = Empty to (cs:css) = go cs css {-# INLINE concat #-} -- | /O(n)/ Map a function over a 'Text' that results in a 'Text', and -- concatenate the results. concatMap :: (Char -> Text) -> Text -> Text concatMap f = concat . foldr ((:) . f) [] {-# INLINE concatMap #-} -- | /O(n)/ 'any' @p@ @t@ determines whether any character in the -- 'Text' @t@ satisfies the predicate @p@. Subject to fusion. any :: (Char -> Bool) -> Text -> Bool any p t = S.any p (stream t) {-# INLINE any #-} -- | /O(n)/ 'all' @p@ @t@ determines whether all characters in the -- 'Text' @t@ satisfy the predicate @p@. Subject to fusion. all :: (Char -> Bool) -> Text -> Bool all p t = S.all p (stream t) {-# INLINE all #-} -- | /O(n)/ 'maximum' returns the maximum value from a 'Text', which -- must be non-empty. Subject to fusion.
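-- -- For example (a sketch, assuming @OverloadedStrings@): -- -- > maximum "qwerty" == 'y'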
maximum :: Text -> Char maximum t = S.maximum (stream t) {-# INLINE maximum #-} -- | /O(n)/ 'minimum' returns the minimum value from a 'Text', which -- must be non-empty. Subject to fusion. minimum :: Text -> Char minimum t = S.minimum (stream t) {-# INLINE minimum #-} -- | /O(n)/ 'scanl' is similar to 'foldl', but returns a list of -- successive reduced values from the left. Subject to fusion. -- Performs replacement on invalid scalar values. -- -- > scanl f z [x1, x2, ...] == [z, z `f` x1, (z `f` x1) `f` x2, ...] -- -- Note that -- -- > last (scanl f z xs) == foldl f z xs. scanl :: (Char -> Char -> Char) -> Char -> Text -> Text scanl f z t = unstream (S.scanl g z (stream t)) where g a b = safe (f a b) {-# INLINE scanl #-} -- | /O(n)/ 'scanl1' is a variant of 'scanl' that has no starting -- value argument. Subject to fusion. Performs replacement on -- invalid scalar values. -- -- > scanl1 f [x1, x2, ...] == [x1, x1 `f` x2, ...] scanl1 :: (Char -> Char -> Char) -> Text -> Text scanl1 f t0 = case uncons t0 of Nothing -> empty Just (t,ts) -> scanl f t ts {-# INLINE scanl1 #-} -- | /O(n)/ 'scanr' is the right-to-left dual of 'scanl'. Performs -- replacement on invalid scalar values. -- -- > scanr f v == reverse . scanl (flip f) v . reverse scanr :: (Char -> Char -> Char) -> Char -> Text -> Text scanr f v = reverse . scanl g v . reverse where g a b = safe (f b a) -- | /O(n)/ 'scanr1' is a variant of 'scanr' that has no starting -- value argument. Performs replacement on invalid scalar values. scanr1 :: (Char -> Char -> Char) -> Text -> Text scanr1 f t | null t = empty | otherwise = scanr f (last t) (init t) -- | /O(n)/ Like a combination of 'map' and 'foldl''. Applies a -- function to each element of a 'Text', passing an accumulating -- parameter from left to right, and returns a final 'Text'. Performs -- replacement on invalid scalar values. mapAccumL :: (a -> Char -> (a,Char)) -> a -> Text -> (a, Text) mapAccumL f = go where go z (Chunk c cs) = (z'', Chunk c' cs') where (z', c') = T.mapAccumL f z c (z'', cs') = go z' cs go z Empty = (z, Empty) {-# INLINE mapAccumL #-} -- | The 'mapAccumR' function behaves like a combination of 'map' and -- a strict 'foldr'; it applies a function to each element of a -- 'Text', passing an accumulating parameter from right to left, and -- returning a final value of this accumulator together with the new -- 'Text'. Performs replacement on invalid scalar values. mapAccumR :: (a -> Char -> (a,Char)) -> a -> Text -> (a, Text) mapAccumR f = go where go z (Chunk c cs) = (z'', Chunk c' cs') where (z'', c') = T.mapAccumR f z' c (z', cs') = go z cs go z Empty = (z, Empty) {-# INLINE mapAccumR #-} -- | /O(n*m)/ 'replicate' @n@ @t@ is a 'Text' consisting of the input -- @t@ repeated @n@ times. replicate :: Int64 -> Text -> Text replicate n t | null t || n <= 0 = empty | isSingleton t = replicateChar n (head t) | otherwise = concat (rep 0) where rep !i | i >= n = [] | otherwise = t : rep (i+1) {-# INLINE replicate #-} -- | /O(n)/ 'replicateChar' @n@ @c@ is a 'Text' of length @n@ with @c@ the -- value of every element. Subject to fusion. replicateChar :: Int64 -> Char -> Text replicateChar n c = unstream (S.replicateCharI n (safe c)) {-# INLINE replicateChar #-} {-# RULES "LAZY TEXT replicate/singleton -> replicateChar" [~1] forall n c. replicate n (singleton c) = replicateChar n c #-} -- | /O(n)/, where @n@ is the length of the result. The 'unfoldr' -- function is analogous to the List 'L.unfoldr'. 'unfoldr' builds a -- 'Text' from a seed value. 
The function takes the element and -- returns 'Nothing' if it is done producing the 'Text', otherwise -- 'Just' @(a,b)@. In this case, @a@ is the next 'Char' in the -- string, and @b@ is the seed value for further production. Performs -- replacement on invalid scalar values. unfoldr :: (a -> Maybe (Char,a)) -> a -> Text unfoldr f s = unstream (S.unfoldr (firstf safe . f) s) {-# INLINE unfoldr #-} -- | /O(n)/ Like 'unfoldr', 'unfoldrN' builds a 'Text' from a seed -- value. However, the length of the result should be limited by the -- first argument to 'unfoldrN'. This function is more efficient than -- 'unfoldr' when the maximum length of the result is known and -- correct, otherwise its performance is similar to 'unfoldr'. -- Performs replacement on invalid scalar values. unfoldrN :: Int64 -> (a -> Maybe (Char,a)) -> a -> Text unfoldrN n f s = unstream (S.unfoldrN n (firstf safe . f) s) {-# INLINE unfoldrN #-} -- | /O(n)/ 'take' @n@, applied to a 'Text', returns the prefix of the -- 'Text' of length @n@, or the 'Text' itself if @n@ is greater than -- the length of the Text. Subject to fusion. take :: Int64 -> Text -> Text take i _ | i <= 0 = Empty take i t0 = take' i t0 where take' 0 _ = Empty take' _ Empty = Empty take' n (Chunk t ts) | n < len = Chunk (T.take (fromIntegral n) t) Empty | otherwise = Chunk t (take' (n - len) ts) where len = fromIntegral (T.length t) {-# INLINE [1] take #-} {-# RULES "LAZY TEXT take -> fused" [~1] forall n t. take n t = unstream (S.take n (stream t)) "LAZY TEXT take -> unfused" [1] forall n t. unstream (S.take n (stream t)) = take n t #-} -- | /O(n)/ 'drop' @n@, applied to a 'Text', returns the suffix of the -- 'Text' after the first @n@ characters, or the empty 'Text' if @n@ -- is greater than the length of the 'Text'. Subject to fusion. drop :: Int64 -> Text -> Text drop i t0 | i <= 0 = t0 | otherwise = drop' i t0 where drop' 0 ts = ts drop' _ Empty = Empty drop' n (Chunk t ts) | n < len = Chunk (T.drop (fromIntegral n) t) ts | otherwise = drop' (n - len) ts where len = fromIntegral (T.length t) {-# INLINE [1] drop #-} {-# RULES "LAZY TEXT drop -> fused" [~1] forall n t. drop n t = unstream (S.drop n (stream t)) "LAZY TEXT drop -> unfused" [1] forall n t. unstream (S.drop n (stream t)) = drop n t #-} -- | /O(n)/ 'dropWords' @n@ returns the suffix with @n@ 'Word16' -- values dropped, or the empty 'Text' if @n@ is greater than the -- number of 'Word16' values present. dropWords :: Int64 -> Text -> Text dropWords i t0 | i <= 0 = t0 | otherwise = drop' i t0 where drop' 0 ts = ts drop' _ Empty = Empty drop' n (Chunk (T.Text arr off len) ts) | n < len' = chunk (textP arr (off+n') (len-n')) ts | otherwise = drop' (n - len') ts where len' = fromIntegral len n' = fromIntegral n -- | /O(n)/ 'takeWhile', applied to a predicate @p@ and a 'Text', -- returns the longest prefix (possibly empty) of elements that -- satisfy @p@. Subject to fusion. takeWhile :: (Char -> Bool) -> Text -> Text takeWhile p t0 = takeWhile' t0 where takeWhile' Empty = Empty takeWhile' (Chunk t ts) = case T.findIndex (not . p) t of Just n | n > 0 -> Chunk (T.take n t) Empty | otherwise -> Empty Nothing -> Chunk t (takeWhile' ts) {-# INLINE [1] takeWhile #-} {-# RULES "LAZY TEXT takeWhile -> fused" [~1] forall p t. takeWhile p t = unstream (S.takeWhile p (stream t)) "LAZY TEXT takeWhile -> unfused" [1] forall p t. unstream (S.takeWhile p (stream t)) = takeWhile p t #-} -- | /O(n)/ 'dropWhile' @p@ @t@ returns the suffix remaining after -- 'takeWhile' @p@ @t@. Subject to fusion. 
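-- -- For example (a sketch, assuming @OverloadedStrings@): -- -- > dropWhile (== 'a') "aabbcc" == "bbcc"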
dropWhile :: (Char -> Bool) -> Text -> Text dropWhile p t0 = dropWhile' t0 where dropWhile' Empty = Empty dropWhile' (Chunk t ts) = case T.findIndex (not . p) t of Just n -> Chunk (T.drop n t) ts Nothing -> dropWhile' ts {-# INLINE [1] dropWhile #-} {-# RULES "LAZY TEXT dropWhile -> fused" [~1] forall p t. dropWhile p t = unstream (S.dropWhile p (stream t)) "LAZY TEXT dropWhile -> unfused" [1] forall p t. unstream (S.dropWhile p (stream t)) = dropWhile p t #-} -- | /O(n)/ 'dropWhileEnd' @p@ @t@ returns the prefix remaining after -- dropping characters that fail the predicate @p@ from the end of -- @t@. -- Examples: -- -- > dropWhileEnd (=='.') "foo..." == "foo" dropWhileEnd :: (Char -> Bool) -> Text -> Text dropWhileEnd p = go where go Empty = Empty go (Chunk t Empty) = if T.null t' then Empty else Chunk t' Empty where t' = T.dropWhileEnd p t go (Chunk t ts) = case go ts of Empty -> go (Chunk t Empty) ts' -> Chunk t ts' {-# INLINE dropWhileEnd #-} -- | /O(n)/ 'dropAround' @p@ @t@ returns the substring remaining after -- dropping characters that fail the predicate @p@ from both the -- beginning and end of @t@. Subject to fusion. dropAround :: (Char -> Bool) -> Text -> Text dropAround p = dropWhile p . dropWhileEnd p {-# INLINE [1] dropAround #-} -- | /O(n)/ Remove leading white space from a string. Equivalent to: -- -- > dropWhile isSpace stripStart :: Text -> Text stripStart = dropWhile isSpace {-# INLINE [1] stripStart #-} -- | /O(n)/ Remove trailing white space from a string. Equivalent to: -- -- > dropWhileEnd isSpace stripEnd :: Text -> Text stripEnd = dropWhileEnd isSpace {-# INLINE [1] stripEnd #-} -- | /O(n)/ Remove leading and trailing white space from a string. -- Equivalent to: -- -- > dropAround isSpace strip :: Text -> Text strip = dropAround isSpace {-# INLINE [1] strip #-} -- | /O(n)/ 'splitAt' @n t@ returns a pair whose first element is a -- prefix of @t@ of length @n@, and whose second is the remainder of -- the string. It is equivalent to @('take' n t, 'drop' n t)@. splitAt :: Int64 -> Text -> (Text, Text) splitAt = loop where loop _ Empty = (empty, empty) loop n t | n <= 0 = (empty, t) loop n (Chunk t ts) | n < len = let (t',t'') = T.splitAt (fromIntegral n) t in (Chunk t' Empty, Chunk t'' ts) | otherwise = let (ts',ts'') = loop (n - len) ts in (Chunk t ts', ts'') where len = fromIntegral (T.length t) -- | /O(n)/ 'splitAtWord' @n t@ returns a strict pair whose first -- element is a prefix of @t@ whose chunks contain @n@ 'Word16' -- values, and whose second is the remainder of the string. splitAtWord :: Int64 -> Text -> PairS Text Text splitAtWord _ Empty = empty :*: empty splitAtWord x (Chunk c@(T.Text arr off len) cs) | y >= len = let h :*: t = splitAtWord (x-fromIntegral len) cs in Chunk c h :*: t | otherwise = chunk (textP arr off y) empty :*: chunk (textP arr (off+y) (len-y)) cs where y = fromIntegral x -- | /O(n+m)/ Find the first instance of @needle@ (which must be -- non-'null') in @haystack@. The first element of the returned tuple -- is the prefix of @haystack@ before @needle@ is matched. The second -- is the remainder of @haystack@, starting with the match. -- -- Examples: -- -- > breakOn "::" "a::b::c" ==> ("a", "::b::c") -- > breakOn "/" "foobar" ==> ("foobar", "") -- -- Laws: -- -- > append prefix match == haystack -- > where (prefix, match) = breakOn needle haystack -- -- If you need to break a string by a substring repeatedly (e.g. you -- want to break on every instance of a substring), use 'breakOnAll' -- instead, as it has lower startup overhead. 
-- -- This function is strict in its first argument, and lazy in its -- second. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. breakOn :: Text -> Text -> (Text, Text) breakOn pat src | null pat = emptyError "breakOn" | otherwise = case indices pat src of [] -> (src, empty) (x:_) -> let h :*: t = splitAtWord x src in (h, t) -- | /O(n+m)/ Similar to 'breakOn', but searches from the end of the string. -- -- The first element of the returned tuple is the prefix of @haystack@ -- up to and including the last match of @needle@. The second is the -- remainder of @haystack@, following the match. -- -- > breakOnEnd "::" "a::b::c" ==> ("a::b::", "c") breakOnEnd :: Text -> Text -> (Text, Text) breakOnEnd pat src = let (a,b) = breakOn (reverse pat) (reverse src) in (reverse b, reverse a) {-# INLINE breakOnEnd #-} -- | /O(n+m)/ Find all non-overlapping instances of @needle@ in -- @haystack@. Each element of the returned list consists of a pair: -- -- * The entire string prior to the /k/th match (i.e. the prefix) -- -- * The /k/th match, followed by the remainder of the string -- -- Examples: -- -- > breakOnAll "::" "" -- > ==> [] -- > breakOnAll "/" "a/b/c/" -- > ==> [("a", "/b/c/"), ("a/b", "/c/"), ("a/b/c", "/")] -- -- This function is strict in its first argument, and lazy in its -- second. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. -- -- The @needle@ parameter may not be empty. breakOnAll :: Text -- ^ @needle@ to search for -> Text -- ^ @haystack@ in which to search -> [(Text, Text)] breakOnAll pat src | null pat = emptyError "breakOnAll" | otherwise = go 0 empty src (indices pat src) where go !n p s (x:xs) = let h :*: t = splitAtWord (x-n) s h' = append p h in (h',t) : go x h' t xs go _ _ _ _ = [] -- | /O(n)/ 'break' is like 'span', but the prefix returned is over -- elements that fail the predicate @p@. break :: (Char -> Bool) -> Text -> (Text, Text) break p t0 = break' t0 where break' Empty = (empty, empty) break' c@(Chunk t ts) = case T.findIndex p t of Nothing -> let (ts', ts'') = break' ts in (Chunk t ts', ts'') Just n | n == 0 -> (Empty, c) | otherwise -> let (a,b) = T.splitAt n t in (Chunk a Empty, Chunk b ts) -- | /O(n)/ 'span', applied to a predicate @p@ and text @t@, returns -- a pair whose first element is the longest prefix (possibly empty) -- of @t@ of elements that satisfy @p@, and whose second is the -- remainder of the list. span :: (Char -> Bool) -> Text -> (Text, Text) span p = break (not . p) {-# INLINE span #-} -- | The 'group' function takes a 'Text' and returns a list of 'Text's -- such that the concatenation of the result is equal to the argument. -- Moreover, each sublist in the result contains only equal elements. -- For example, -- -- > group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"] -- -- It is a special case of 'groupBy', which allows the programmer to -- supply their own equality test. group :: Text -> [Text] group = groupBy (==) {-# INLINE group #-} -- | The 'groupBy' function is the non-overloaded version of 'group'. groupBy :: (Char -> Char -> Bool) -> Text -> [Text] groupBy _ Empty = [] groupBy eq (Chunk t ts) = cons x ys : groupBy eq zs where (ys,zs) = span (eq x) xs x = T.unsafeHead t xs = chunk (T.unsafeTail t) ts -- | /O(n)/ Return all initial segments of the given 'Text', -- shortest first. inits :: Text -> [Text] inits = (Empty :) . 
inits' where inits' Empty = [] inits' (Chunk t ts) = L.map (\t' -> Chunk t' Empty) (L.tail (T.inits t)) ++ L.map (Chunk t) (inits' ts) -- | /O(n)/ Return all final segments of the given 'Text', longest -- first. tails :: Text -> [Text] tails Empty = Empty : [] tails ts@(Chunk t ts') | T.length t == 1 = ts : tails ts' | otherwise = ts : tails (Chunk (T.unsafeTail t) ts') -- $split -- -- Splitting functions in this library do not perform character-wise -- copies to create substrings; they just construct new 'Text's that -- are slices of the original. -- | /O(m+n)/ Break a 'Text' into pieces separated by the first -- 'Text' argument, consuming the delimiter. An empty delimiter is -- invalid, and will cause an error to be raised. -- -- Examples: -- -- > splitOn "\r\n" "a\r\nb\r\nd\r\ne" == ["a","b","d","e"] -- > splitOn "aaa" "aaaXaaaXaaaXaaa" == ["","X","X","X",""] -- > splitOn "x" "x" == ["",""] -- -- and -- -- > intercalate s . splitOn s == id -- > splitOn (singleton c) == split (==c) -- -- This function is strict in its first argument, and lazy in its -- second. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. splitOn :: Text -- ^ Text to split on -> Text -- ^ Input text -> [Text] splitOn pat src | null pat = emptyError "splitOn" | isSingleton pat = split (== head pat) src | otherwise = go 0 (indices pat src) src where go _ [] cs = [cs] go !i (x:xs) cs = let h :*: t = splitAtWord (x-i) cs in h : go (x+l) xs (dropWords l t) l = foldlChunks (\a (T.Text _ _ b) -> a + fromIntegral b) 0 pat {-# INLINE [1] splitOn #-} {-# RULES "LAZY TEXT splitOn/singleton -> split/==" [~1] forall c t. splitOn (singleton c) t = split (==c) t #-} -- | /O(n)/ Splits a 'Text' into components delimited by separators, -- where the predicate returns True for a separator element. The -- resulting components do not contain the separators. Two adjacent -- separators result in an empty component in the output, e.g. -- -- > split (=='a') "aabbaca" == ["","","bb","c",""] -- > split (=='a') "" == [""] split :: (Char -> Bool) -> Text -> [Text] split _ Empty = [Empty] split p (Chunk t0 ts0) = comb [] (T.split p t0) ts0 where comb acc (s:[]) Empty = revChunks (s:acc) : [] comb acc (s:[]) (Chunk t ts) = comb (s:acc) (T.split p t) ts comb acc (s:ss) ts = revChunks (s:acc) : comb [] ss ts comb _ [] _ = impossibleError "split" {-# INLINE split #-} -- | /O(n)/ Splits a 'Text' into components of length @k@. The last -- element may be shorter than the other chunks, depending on the -- length of the input. Examples: -- -- > chunksOf 3 "foobarbaz" == ["foo","bar","baz"] -- > chunksOf 4 "haskell.org" == ["hask","ell.","org"] chunksOf :: Int64 -> Text -> [Text] chunksOf k = go where go t = case splitAt k t of (a,b) | null a -> [] | otherwise -> a : go b {-# INLINE chunksOf #-} -- | /O(n)/ Breaks a 'Text' up into a list of 'Text's at -- newline 'Char's. The resulting strings do not contain newlines. lines :: Text -> [Text] lines Empty = [] lines t = let (l,t') = break ((==) '\n') t in l : if null t' then [] else lines (tail t') -- | /O(n)/ Breaks a 'Text' up into a list of words, delimited by 'Char's -- representing white space. words :: Text -> [Text] words = L.filter (not . null) . split isSpace {-# INLINE words #-} -- | /O(n)/ Joins lines, after appending a terminating newline to -- each. unlines :: [Text] -> Text unlines = concat . L.map (`snoc` '\n') {-# INLINE unlines #-} -- | /O(n)/ Joins words using single space characters.
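-- -- For example (a sketch, assuming @OverloadedStrings@): -- -- > unwords ["foo", "bar", "baz"] == "foo bar baz"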
unwords :: [Text] -> Text unwords = intercalate (singleton ' ') {-# INLINE unwords #-} -- | /O(n)/ The 'isPrefixOf' function takes two 'Text's and returns -- 'True' iff the first is a prefix of the second. Subject to fusion. isPrefixOf :: Text -> Text -> Bool isPrefixOf Empty _ = True isPrefixOf _ Empty = False isPrefixOf (Chunk x xs) (Chunk y ys) | lx == ly = x == y && isPrefixOf xs ys | lx < ly = x == yh && isPrefixOf xs (Chunk yt ys) | otherwise = xh == y && isPrefixOf (Chunk xt xs) ys where (xh,xt) = T.splitAt ly x (yh,yt) = T.splitAt lx y lx = T.length x ly = T.length y {-# INLINE [1] isPrefixOf #-} {-# RULES "LAZY TEXT isPrefixOf -> fused" [~1] forall s t. isPrefixOf s t = S.isPrefixOf (stream s) (stream t) "LAZY TEXT isPrefixOf -> unfused" [1] forall s t. S.isPrefixOf (stream s) (stream t) = isPrefixOf s t #-} -- | /O(n)/ The 'isSuffixOf' function takes two 'Text's and returns -- 'True' iff the first is a suffix of the second. isSuffixOf :: Text -> Text -> Bool isSuffixOf x y = reverse x `isPrefixOf` reverse y {-# INLINE isSuffixOf #-} -- TODO: a better implementation -- | /O(n+m)/ The 'isInfixOf' function takes two 'Text's and returns -- 'True' iff the first is contained, wholly and intact, anywhere -- within the second. -- -- This function is strict in its first argument, and lazy in its -- second. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. isInfixOf :: Text -> Text -> Bool isInfixOf needle haystack | null needle = True | isSingleton needle = S.elem (head needle) . S.stream $ haystack | otherwise = not . L.null . indices needle $ haystack {-# INLINE [1] isInfixOf #-} {-# RULES "LAZY TEXT isInfixOf/singleton -> S.elem/S.stream" [~1] forall n h. isInfixOf (singleton n) h = S.elem n (S.stream h) #-} ------------------------------------------------------------------------------- -- * View patterns -- | /O(n)/ Return the suffix of the second string if its prefix -- matches the entire first string. -- -- Examples: -- -- > stripPrefix "foo" "foobar" == Just "bar" -- > stripPrefix "" "baz" == Just "baz" -- > stripPrefix "foo" "quux" == Nothing -- -- This is particularly useful with the @ViewPatterns@ extension to -- GHC, as follows: -- -- > {-# LANGUAGE ViewPatterns #-} -- > import Data.Text.Lazy as T -- > -- > fnordLength :: Text -> Int -- > fnordLength (stripPrefix "fnord" -> Just suf) = T.length suf -- > fnordLength _ = -1 stripPrefix :: Text -> Text -> Maybe Text stripPrefix p t | null p = Just t | otherwise = case commonPrefixes p t of Just (_,c,r) | null c -> Just r _ -> Nothing -- | /O(n)/ Find the longest non-empty common prefix of two strings -- and return it, along with the suffixes of each string at which they -- no longer match. -- -- If the strings do not have a common prefix or either one is empty, -- this function returns 'Nothing'. 
-- -- Examples: -- -- > commonPrefixes "foobar" "fooquux" == Just ("foo","bar","quux") -- > commonPrefixes "veeble" "fetzer" == Nothing -- > commonPrefixes "" "baz" == Nothing commonPrefixes :: Text -> Text -> Maybe (Text,Text,Text) commonPrefixes Empty _ = Nothing commonPrefixes _ Empty = Nothing commonPrefixes a0 b0 = Just (go a0 b0 []) where go t0@(Chunk x xs) t1@(Chunk y ys) ps = case T.commonPrefixes x y of Just (p,a,b) | T.null a -> go xs (chunk b ys) (p:ps) | T.null b -> go (chunk a xs) ys (p:ps) | otherwise -> (fromChunks (L.reverse (p:ps)),chunk a xs, chunk b ys) Nothing -> (fromChunks (L.reverse ps),t0,t1) go t0 t1 ps = (fromChunks (L.reverse ps),t0,t1) -- | /O(n)/ Return the prefix of the second string if its suffix -- matches the entire first string. -- -- Examples: -- -- > stripSuffix "bar" "foobar" == Just "foo" -- > stripSuffix "" "baz" == Just "baz" -- > stripSuffix "foo" "quux" == Nothing -- -- This is particularly useful with the @ViewPatterns@ extension to -- GHC, as follows: -- -- > {-# LANGUAGE ViewPatterns #-} -- > import Data.Text.Lazy as T -- > -- > quuxLength :: Text -> Int -- > quuxLength (stripSuffix "quux" -> Just pre) = T.length pre -- > quuxLength _ = -1 stripSuffix :: Text -> Text -> Maybe Text stripSuffix p t = reverse `fmap` stripPrefix (reverse p) (reverse t) -- | /O(n)/ 'filter', applied to a predicate and a 'Text', -- returns a 'Text' containing those characters that satisfy the -- predicate. filter :: (Char -> Bool) -> Text -> Text filter p t = unstream (S.filter p (stream t)) {-# INLINE filter #-} -- | /O(n)/ The 'find' function takes a predicate and a 'Text', and -- returns the first element matching the predicate, or 'Nothing' -- if there is no such element. find :: (Char -> Bool) -> Text -> Maybe Char find p t = S.findBy p (stream t) {-# INLINE find #-} -- | /O(n)/ The 'partition' function takes a predicate and a 'Text', -- and returns the pair of 'Text's with elements which do and do not -- satisfy the predicate, respectively; i.e. -- -- > partition p t == (filter p t, filter (not . p) t) partition :: (Char -> Bool) -> Text -> (Text, Text) partition p t = (filter p t, filter (not . p) t) {-# INLINE partition #-} -- | /O(n)/ 'Text' index (subscript) operator, starting from 0. index :: Text -> Int64 -> Char index t n = S.index (stream t) n {-# INLINE index #-} -- | /O(n+m)/ The 'count' function returns the number of times the -- query string appears in the given 'Text'. An empty query string is -- invalid, and will cause an error to be raised. -- -- In (unlikely) bad cases, this function's time complexity degrades -- towards /O(n*m)/. count :: Text -> Text -> Int64 count pat src | null pat = emptyError "count" | otherwise = go 0 (indices pat src) where go !n [] = n go !n (_:xs) = go (n+1) xs {-# INLINE [1] count #-} {-# RULES "LAZY TEXT count/singleton -> countChar" [~1] forall c t. count (singleton c) t = countChar c t #-} -- | /O(n)/ The 'countChar' function returns the number of times the -- query element appears in the given 'Text'. Subject to fusion. countChar :: Char -> Text -> Int64 countChar c t = S.countChar c (stream t) -- | /O(n)/ 'zip' takes two 'Text's and returns a list of -- corresponding pairs of characters. If one input 'Text' is short, -- excess elements of the longer 'Text' are discarded. This is -- equivalent to a pair of 'unpack' operations.
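-- -- For example (a sketch, assuming @OverloadedStrings@): -- -- > zip "abc" "de" == [('a','d'),('b','e')]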
zip :: Text -> Text -> [(Char,Char)] zip a b = S.unstreamList $ S.zipWith (,) (stream a) (stream b) {-# INLINE [0] zip #-} -- | /O(n)/ 'zipWith' generalises 'zip' by zipping with the function -- given as the first argument, instead of a tupling function. -- Performs replacement on invalid scalar values. zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text zipWith f t1 t2 = unstream (S.zipWith g (stream t1) (stream t2)) where g a b = safe (f a b) {-# INLINE [0] zipWith #-} revChunks :: [T.Text] -> Text revChunks = L.foldl' (flip chunk) Empty emptyError :: String -> a emptyError fun = P.error ("Data.Text.Lazy." ++ fun ++ ": empty input") impossibleError :: String -> a impossibleError fun = P.error ("Data.Text.Lazy." ++ fun ++ ": impossible case") text-0.11.3.1/Data/Text/Private.hs0000644000000000000000000000206712140332415014650 0ustar0000000000000000{-# LANGUAGE BangPatterns, Rank2Types, UnboxedTuples #-} -- | -- Module : Data.Text.Private -- Copyright : (c) 2011 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : GHC module Data.Text.Private ( runText , span_ ) where import Control.Monad.ST (ST, runST) import Data.Text.Internal (Text(..), textP) import Data.Text.Unsafe (Iter(..), iter) import qualified Data.Text.Array as A span_ :: (Char -> Bool) -> Text -> (# Text, Text #) span_ p t@(Text arr off len) = (# hd,tl #) where hd = textP arr off k tl = textP arr (off+k) (len-k) !k = loop 0 loop !i | i < len && p c = loop (i+d) | otherwise = i where Iter c d = iter t i {-# INLINE span_ #-} runText :: (forall s. (A.MArray s -> Int -> ST s Text) -> ST s Text) -> Text runText act = runST (act $ \ !marr !len -> do arr <- A.unsafeFreeze marr return $! textP arr 0 len) {-# INLINE runText #-} text-0.11.3.1/Data/Text/Read.hs0000644000000000000000000002031712140332415014107 0ustar0000000000000000{-# LANGUAGE OverloadedStrings, UnboxedTuples, CPP #-} #if __GLASGOW_HASKELL__ >= 702 {-# LANGUAGE Trustworthy #-} #endif -- | -- Module : Data.Text.Read -- Copyright : (c) 2010, 2011 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : GHC -- -- Functions used frequently when reading textual data. module Data.Text.Read ( Reader , decimal , hexadecimal , signed , rational , double ) where import Control.Monad (liftM) import Data.Char (isDigit, isHexDigit, ord) import Data.Int (Int8, Int16, Int32, Int64) import Data.Ratio ((%)) import Data.Text as T import Data.Text.Private (span_) import Data.Word (Word, Word8, Word16, Word32, Word64) -- | Read some text. If the read succeeds, return its value and the -- remaining text, otherwise an error message. type Reader a = Text -> Either String (a,Text) -- | Read a decimal integer. The input must begin with at least one -- decimal digit, and is consumed until a non-digit or end of string -- is reached. -- -- This function does not handle leading sign characters. If you need -- to handle signed input, use @'signed' 'decimal'@. -- -- /Note/: For fixed-width integer types, this function does not -- attempt to detect overflow, so a sufficiently long input may give -- incorrect results. If you are worried about overflow, use -- 'Integer' for your result type. 
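-- -- Examples (a sketch, assuming @OverloadedStrings@; the error message below -- is the one this function produces): -- -- > decimal "123abc" == Right (123, "abc") -- > decimal "abc" == Left "input does not start with a digit"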
decimal :: Integral a => Reader a {-# SPECIALIZE decimal :: Reader Int #-} {-# SPECIALIZE decimal :: Reader Int8 #-} {-# SPECIALIZE decimal :: Reader Int16 #-} {-# SPECIALIZE decimal :: Reader Int32 #-} {-# SPECIALIZE decimal :: Reader Int64 #-} {-# SPECIALIZE decimal :: Reader Integer #-} {-# SPECIALIZE decimal :: Reader Word #-} {-# SPECIALIZE decimal :: Reader Word8 #-} {-# SPECIALIZE decimal :: Reader Word16 #-} {-# SPECIALIZE decimal :: Reader Word32 #-} {-# SPECIALIZE decimal :: Reader Word64 #-} decimal txt | T.null h = Left "input does not start with a digit" | otherwise = Right (T.foldl' go 0 h, t) where (# h,t #) = span_ isDigit txt go n d = (n * 10 + fromIntegral (digitToInt d)) -- | Read a hexadecimal integer, consisting of an optional leading -- @\"0x\"@ followed by at least one hexadecimal digit. Input is consumed -- until a non-hex-digit or end of string is reached. This function -- is case insensitive. -- -- This function does not handle leading sign characters. If you need -- to handle signed input, use @'signed' 'hexadecimal'@. -- -- /Note/: For fixed-width integer types, this function does not -- attempt to detect overflow, so a sufficiently long input may give -- incorrect results. If you are worried about overflow, use -- 'Integer' for your result type. hexadecimal :: Integral a => Reader a {-# SPECIALIZE hexadecimal :: Reader Int #-} {-# SPECIALIZE hexadecimal :: Reader Int8 #-} {-# SPECIALIZE hexadecimal :: Reader Int16 #-} {-# SPECIALIZE hexadecimal :: Reader Int32 #-} {-# SPECIALIZE hexadecimal :: Reader Int64 #-} {-# SPECIALIZE hexadecimal :: Reader Integer #-} {-# SPECIALIZE hexadecimal :: Reader Word #-} {-# SPECIALIZE hexadecimal :: Reader Word8 #-} {-# SPECIALIZE hexadecimal :: Reader Word16 #-} {-# SPECIALIZE hexadecimal :: Reader Word32 #-} {-# SPECIALIZE hexadecimal :: Reader Word64 #-} hexadecimal txt | h == "0x" || h == "0X" = hex t | otherwise = hex txt where (h,t) = T.splitAt 2 txt hex :: Integral a => Reader a {-# SPECIALIZE hex :: Reader Int #-} {-# SPECIALIZE hex :: Reader Int8 #-} {-# SPECIALIZE hex :: Reader Int16 #-} {-# SPECIALIZE hex :: Reader Int32 #-} {-# SPECIALIZE hex :: Reader Int64 #-} {-# SPECIALIZE hex :: Reader Integer #-} {-# SPECIALIZE hex :: Reader Word #-} {-# SPECIALIZE hex :: Reader Word8 #-} {-# SPECIALIZE hex :: Reader Word16 #-} {-# SPECIALIZE hex :: Reader Word32 #-} {-# SPECIALIZE hex :: Reader Word64 #-} hex txt | T.null h = Left "input does not start with a hexadecimal digit" | otherwise = Right (T.foldl' go 0 h, t) where (# h,t #) = span_ isHexDigit txt go n d = (n * 16 + fromIntegral (hexDigitToInt d)) hexDigitToInt :: Char -> Int hexDigitToInt c | c >= '0' && c <= '9' = ord c - ord '0' | c >= 'a' && c <= 'f' = ord c - (ord 'a' - 10) | otherwise = ord c - (ord 'A' - 10) digitToInt :: Char -> Int digitToInt c = ord c - ord '0' -- | Read an optional leading sign character (@\'-\'@ or @\'+\'@) and -- apply it to the result of applying the given reader. signed :: Num a => Reader a -> Reader a {-# INLINE signed #-} signed f = runP (signa (P f)) -- | Read a rational number. -- -- This function accepts an optional leading sign character, followed -- by at least one decimal digit. The syntax is similar to that accepted -- by the 'read' function, with the exception that a trailing @\'.\'@ -- or @\'e\'@ /not/ followed by a number is not consumed.
-- -- Examples (with behaviour identical to 'read'): -- -- >rational "3" == Right (3.0, "") -- >rational "3.1" == Right (3.1, "") -- >rational "3e4" == Right (30000.0, "") -- >rational "3.1e4" == Right (31000.0, "") -- >rational ".3" == Left "input does not start with a digit" -- >rational "e3" == Left "input does not start with a digit" -- -- Examples of differences from 'read': -- -- >rational "3.foo" == Right (3.0, ".foo") -- >rational "3e" == Right (3.0, "e") rational :: Fractional a => Reader a {-# SPECIALIZE rational :: Reader Double #-} rational = floaty $ \real frac fracDenom -> fromRational $ real % 1 + frac % fracDenom -- | Read a rational number. -- -- The syntax accepted by this function is the same as for 'rational'. -- -- /Note/: This function is almost ten times faster than 'rational', -- but is slightly less accurate. -- -- The 'Double' type supports about 16 decimal places of accuracy. -- For 94.2% of numbers, this function and 'rational' give identical -- results, but for the remaining 5.8%, this function loses precision -- around the 15th decimal place. For 0.001% of numbers, this -- function will lose precision at the 13th or 14th decimal place. double :: Reader Double double = floaty $ \real frac fracDenom -> fromIntegral real + fromIntegral frac / fromIntegral fracDenom signa :: Num a => Parser a -> Parser a {-# SPECIALIZE signa :: Parser Int -> Parser Int #-} {-# SPECIALIZE signa :: Parser Int8 -> Parser Int8 #-} {-# SPECIALIZE signa :: Parser Int16 -> Parser Int16 #-} {-# SPECIALIZE signa :: Parser Int32 -> Parser Int32 #-} {-# SPECIALIZE signa :: Parser Int64 -> Parser Int64 #-} {-# SPECIALIZE signa :: Parser Integer -> Parser Integer #-} signa p = do sign <- perhaps '+' $ char (\c -> c == '-' || c == '+') if sign == '+' then p else negate `liftM` p newtype Parser a = P { runP :: Reader a } instance Monad Parser where return a = P $ \t -> Right (a,t) {-# INLINE return #-} m >>= k = P $ \t -> case runP m t of Left err -> Left err Right (a,t') -> runP (k a) t' {-# INLINE (>>=) #-} fail msg = P $ \_ -> Left msg perhaps :: a -> Parser a -> Parser a perhaps def m = P $ \t -> case runP m t of Left _ -> Right (def,t) r@(Right _) -> r char :: (Char -> Bool) -> Parser Char char p = P $ \t -> case T.uncons t of Just (c,t') | p c -> Right (c,t') _ -> Left "character does not match" data T = T !Integer !Int floaty :: Fractional a => (Integer -> Integer -> Integer -> a) -> Reader a {-# INLINE floaty #-} floaty f = runP $ do sign <- perhaps '+' $ char (\c -> c == '-' || c == '+') real <- P decimal T fraction fracDigits <- perhaps (T 0 0) $ do _ <- char (=='.') digits <- P $ \t -> Right (T.length $ T.takeWhile isDigit t, t) n <- P decimal return $ T n digits let e c = c == 'e' || c == 'E' power <- perhaps 0 (char e >> signa (P decimal) :: Parser Int) let n = if fracDigits == 0 then if power == 0 then fromIntegral real else fromIntegral real * (10 ^^ power) else if power == 0 then f real fraction (10 ^ fracDigits) else f real fraction (10 ^ fracDigits) * (10 ^^ power) return $! 
if sign == '+' then n else -n text-0.11.3.1/Data/Text/Search.hs0000644000000000000000000000630412140332415014441 0ustar0000000000000000{-# LANGUAGE BangPatterns, ScopedTypeVariables #-} -- | -- Module : Data.Text.Search -- Copyright : (c) Bryan O'Sullivan 2009 -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Fast substring search for 'Text', based on work by Boyer, Moore, -- Horspool, Sunday, and Lundh. -- -- References: -- -- * R. S. Boyer, J. S. Moore: A Fast String Searching Algorithm. -- Communications of the ACM, 20, 10, 762-772 (1977) -- -- * R. N. Horspool: Practical Fast Searching in Strings. Software - -- Practice and Experience 10, 501-506 (1980) -- -- * D. M. Sunday: A Very Fast Substring Search Algorithm. -- Communications of the ACM, 33, 8, 132-142 (1990) -- -- * F. Lundh: The Fast Search Algorithm. -- (2006) module Data.Text.Search ( indices ) where import qualified Data.Text.Array as A import Data.Word (Word64) import Data.Text.Internal (Text(..)) import Data.Bits ((.|.), (.&.)) import Data.Text.UnsafeShift (shiftL) data T = {-# UNPACK #-} !Word64 :* {-# UNPACK #-} !Int -- | /O(n+m)/ Find the offsets of all non-overlapping indices of -- @needle@ within @haystack@. The offsets returned represent -- locations in the low-level array. -- -- In (unlikely) bad cases, this algorithm's complexity degrades -- towards /O(n*m)/. indices :: Text -- ^ Substring to search for (@needle@) -> Text -- ^ Text to search in (@haystack@) -> [Int] indices _needle@(Text narr noff nlen) _haystack@(Text harr hoff hlen) | nlen == 1 = scanOne (nindex 0) | nlen <= 0 || ldiff < 0 = [] | otherwise = scan 0 where ldiff = hlen - nlen nlast = nlen - 1 z = nindex nlast nindex k = A.unsafeIndex narr (noff+k) hindex k = A.unsafeIndex harr (hoff+k) hindex' k | k == hlen = 0 | otherwise = A.unsafeIndex harr (hoff+k) buildTable !i !msk !skp | i >= nlast = (msk .|. swizzle z) :* skp | otherwise = buildTable (i+1) (msk .|. swizzle c) skp' where c = nindex i skp' | c == z = nlen - i - 2 | otherwise = skp swizzle k = 1 `shiftL` (fromIntegral k .&. 0x3f) scan !i | i > ldiff = [] | c == z && candidateMatch 0 = i : scan (i + nlen) | otherwise = scan (i + delta) where c = hindex (i + nlast) candidateMatch !j | j >= nlast = True | hindex (i+j) /= nindex j = False | otherwise = candidateMatch (j+1) delta | nextInPattern = nlen + 1 | c == z = skip + 1 | otherwise = 1 where nextInPattern = mask .&. swizzle (hindex' (i+nlen)) == 0 !(mask :* skip) = buildTable 0 0 (nlen-2) scanOne c = loop 0 where loop !i | i >= hlen = [] | hindex i == c = i : loop (i+1) | otherwise = loop (i+1) {-# INLINE indices #-} text-0.11.3.1/Data/Text/Unsafe.hs0000644000000000000000000000750312140332415014457 0ustar0000000000000000{-# LANGUAGE CPP, MagicHash, UnboxedTuples #-} -- | -- Module : Data.Text.Unsafe -- Copyright : (c) 2009, 2010, 2011 Bryan O'Sullivan -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- A module containing unsafe 'Text' operations, for very very careful -- use in heavily tested code. module Data.Text.Unsafe ( inlineInterleaveST , inlinePerformIO , unsafeDupablePerformIO , Iter(..) 
, iter , iter_ , reverseIter , unsafeHead , unsafeTail , lengthWord16 , takeWord16 , dropWord16 ) where #if defined(ASSERTS) import Control.Exception (assert) #endif import Data.Text.Encoding.Utf16 (chr2) import Data.Text.Internal (Text(..)) import Data.Text.Unsafe.Base (inlineInterleaveST, inlinePerformIO) import Data.Text.UnsafeChar (unsafeChr) import qualified Data.Text.Array as A #if __GLASGOW_HASKELL__ >= 611 import GHC.IO (unsafeDupablePerformIO) #else import GHC.IOBase (unsafeDupablePerformIO) #endif -- | /O(1)/ A variant of 'head' for non-empty 'Text'. 'unsafeHead' -- omits the check for the empty case, so there is an obligation on -- the programmer to provide a proof that the 'Text' is non-empty. unsafeHead :: Text -> Char unsafeHead (Text arr off _len) | m < 0xD800 || m > 0xDBFF = unsafeChr m | otherwise = chr2 m n where m = A.unsafeIndex arr off n = A.unsafeIndex arr (off+1) {-# INLINE unsafeHead #-} -- | /O(1)/ A variant of 'tail' for non-empty 'Text'. 'unsafeTail' -- omits the check for the empty case, so there is an obligation on -- the programmer to provide a proof that the 'Text' is non-empty. unsafeTail :: Text -> Text unsafeTail t@(Text arr off len) = #if defined(ASSERTS) assert (d <= len) $ #endif Text arr (off+d) (len-d) where d = iter_ t 0 {-# INLINE unsafeTail #-} data Iter = Iter {-# UNPACK #-} !Char {-# UNPACK #-} !Int -- | /O(1)/ Iterate (unsafely) one step forwards through a UTF-16 -- array, returning the current character and the delta to add to give -- the next offset to iterate at. iter :: Text -> Int -> Iter iter (Text arr off _len) i | m < 0xD800 || m > 0xDBFF = Iter (unsafeChr m) 1 | otherwise = Iter (chr2 m n) 2 where m = A.unsafeIndex arr j n = A.unsafeIndex arr k j = off + i k = j + 1 {-# INLINE iter #-} -- | /O(1)/ Iterate one step through a UTF-16 array, returning the -- delta to add to give the next offset to iterate at. iter_ :: Text -> Int -> Int iter_ (Text arr off _len) i | m < 0xD800 || m > 0xDBFF = 1 | otherwise = 2 where m = A.unsafeIndex arr (off+i) {-# INLINE iter_ #-} -- | /O(1)/ Iterate one step backwards through a UTF-16 array, -- returning the current character and the delta to add (i.e. a -- negative number) to give the next offset to iterate at. reverseIter :: Text -> Int -> (Char,Int) reverseIter (Text arr off _len) i | m < 0xDC00 || m > 0xDFFF = (unsafeChr m, -1) | otherwise = (chr2 n m, -2) where m = A.unsafeIndex arr j n = A.unsafeIndex arr k j = off + i k = j - 1 {-# INLINE reverseIter #-} -- | /O(1)/ Return the length of a 'Text' in units of 'Word16'. This -- is useful for sizing a target array appropriately before using -- 'unsafeCopyToPtr'. lengthWord16 :: Text -> Int lengthWord16 (Text _arr _off len) = len {-# INLINE lengthWord16 #-} -- | /O(1)/ Unchecked take of 'k' 'Word16's from the front of a 'Text'. takeWord16 :: Int -> Text -> Text takeWord16 k (Text arr off _len) = Text arr off k {-# INLINE takeWord16 #-} -- | /O(1)/ Unchecked drop of 'k' 'Word16's from the front of a 'Text'.
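-- -- For example (a sketch, assuming @OverloadedStrings@ and ASCII input, so -- that each character occupies exactly one 'Word16'): -- -- > dropWord16 1 "abc" == "bc" -- -- Dropping into the middle of a surrogate pair produces an invalid 'Text'.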
dropWord16 :: Int -> Text -> Text dropWord16 k (Text arr off len) = Text arr (off+k) (len-k) {-# INLINE dropWord16 #-} text-0.11.3.1/Data/Text/UnsafeChar.hs0000644000000000000000000000504112140332415015250 0ustar0000000000000000{-# LANGUAGE CPP, MagicHash #-} -- | -- Module : Data.Text.UnsafeChar -- Copyright : (c) 2008, 2009 Tom Harper, -- (c) 2009, 2010 Bryan O'Sullivan, -- (c) 2009 Duncan Coutts -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Fast character manipulation functions. module Data.Text.UnsafeChar ( ord , unsafeChr , unsafeChr8 , unsafeChr32 , unsafeWrite -- , unsafeWriteRev ) where #ifdef ASSERTS import Control.Exception (assert) #endif import Control.Monad.ST (ST) import Data.Bits ((.&.)) import Data.Text.UnsafeShift (shiftR) import GHC.Exts (Char(..), Int(..), chr#, ord#, word2Int#) import GHC.Word (Word8(..), Word16(..), Word32(..)) import qualified Data.Text.Array as A ord :: Char -> Int ord (C# c#) = I# (ord# c#) {-# INLINE ord #-} unsafeChr :: Word16 -> Char unsafeChr (W16# w#) = C# (chr# (word2Int# w#)) {-# INLINE unsafeChr #-} unsafeChr8 :: Word8 -> Char unsafeChr8 (W8# w#) = C# (chr# (word2Int# w#)) {-# INLINE unsafeChr8 #-} unsafeChr32 :: Word32 -> Char unsafeChr32 (W32# w#) = C# (chr# (word2Int# w#)) {-# INLINE unsafeChr32 #-} -- | Write a character into the array at the given offset. Returns -- the number of 'Word16's written. unsafeWrite :: A.MArray s -> Int -> Char -> ST s Int unsafeWrite marr i c | n < 0x10000 = do #if defined(ASSERTS) assert (i >= 0) . assert (i < A.length marr) $ return () #endif A.unsafeWrite marr i (fromIntegral n) return 1 | otherwise = do #if defined(ASSERTS) assert (i >= 0) . assert (i < A.length marr - 1) $ return () #endif A.unsafeWrite marr i lo A.unsafeWrite marr (i+1) hi return 2 where n = ord c m = n - 0x10000 lo = fromIntegral $ (m `shiftR` 10) + 0xD800 hi = fromIntegral $ (m .&. 0x3FF) + 0xDC00 {-# INLINE unsafeWrite #-} {- unsafeWriteRev :: A.MArray s Word16 -> Int -> Char -> ST s Int unsafeWriteRev marr i c | n < 0x10000 = do assert (i >= 0) . assert (i < A.length marr) $ A.unsafeWrite marr i (fromIntegral n) return (i-1) | otherwise = do assert (i >= 1) . assert (i < A.length marr) $ A.unsafeWrite marr (i-1) lo A.unsafeWrite marr i hi return (i-2) where n = ord c m = n - 0x10000 lo = fromIntegral $ (m `shiftR` 10) + 0xD800 hi = fromIntegral $ (m .&. 0x3FF) + 0xDC00 {-# INLINE unsafeWriteRev #-} -} text-0.11.3.1/Data/Text/UnsafeShift.hs0000644000000000000000000000355512140332415015460 0ustar0000000000000000{-# LANGUAGE MagicHash #-} -- | -- Module : Data.Text.UnsafeShift -- Copyright : (c) Bryan O'Sullivan 2009 -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Fast, unchecked bit shifting functions. module Data.Text.UnsafeShift ( UnsafeShift(..) ) where -- import qualified Data.Bits as Bits import GHC.Base import GHC.Word -- | This is a workaround for poor optimisation in GHC 6.8.2. It -- fails to notice constant-width shifts, and adds a test and branch -- to every shift. This imposes about a 10% performance hit. -- -- These functions are undefined when the amount being shifted by is -- greater than the size in bits of a machine Int#. 
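-- -- For example (a sketch; both shift counts are safely within range): -- -- > (0x100 :: Word32) `shiftR` 4 == 0x10 -- > (1 :: Int) `shiftL` 8 == 256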
class UnsafeShift a where shiftL :: a -> Int -> a shiftR :: a -> Int -> a instance UnsafeShift Word16 where {-# INLINE shiftL #-} shiftL (W16# x#) (I# i#) = W16# (narrow16Word# (x# `uncheckedShiftL#` i#)) {-# INLINE shiftR #-} shiftR (W16# x#) (I# i#) = W16# (x# `uncheckedShiftRL#` i#) instance UnsafeShift Word32 where {-# INLINE shiftL #-} shiftL (W32# x#) (I# i#) = W32# (narrow32Word# (x# `uncheckedShiftL#` i#)) {-# INLINE shiftR #-} shiftR (W32# x#) (I# i#) = W32# (x# `uncheckedShiftRL#` i#) instance UnsafeShift Word64 where {-# INLINE shiftL #-} shiftL (W64# x#) (I# i#) = W64# (x# `uncheckedShiftL64#` i#) {-# INLINE shiftR #-} shiftR (W64# x#) (I# i#) = W64# (x# `uncheckedShiftRL64#` i#) instance UnsafeShift Int where {-# INLINE shiftL #-} shiftL (I# x#) (I# i#) = I# (x# `iShiftL#` i#) {-# INLINE shiftR #-} shiftR (I# x#) (I# i#) = I# (x# `iShiftRA#` i#) {- instance UnsafeShift Integer where {-# INLINE shiftL #-} shiftL = Bits.shiftL {-# INLINE shiftR #-} shiftR = Bits.shiftR -} text-0.11.3.1/Data/Text/Util.hs0000644000000000000000000000111412140332415014143 0ustar0000000000000000{-# LANGUAGE CPP, DeriveDataTypeable #-} -- | -- Module : Data.Text.Util -- Copyright : 2010 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : GHC -- -- Useful functions. module Data.Text.Util ( intersperse ) where -- | A lazier version of Data.List.intersperse. The other version -- causes space leaks! intersperse :: a -> [a] -> [a] intersperse _ [] = [] intersperse sep (x:xs) = x : go xs where go [] = [] go (y:ys) = sep : y: go ys {-# INLINE intersperse #-} text-0.11.3.1/Data/Text/Encoding/0000755000000000000000000000000012140332415014423 5ustar0000000000000000text-0.11.3.1/Data/Text/Encoding/Error.hs0000644000000000000000000001021112140332415016043 0ustar0000000000000000{-# LANGUAGE CPP, DeriveDataTypeable #-} #if __GLASGOW_HASKELL__ >= 702 {-# LANGUAGE Trustworthy #-} #endif -- | -- Module : Data.Text.Encoding.Error -- Copyright : (c) Bryan O'Sullivan 2009 -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Types and functions for dealing with encoding and decoding errors -- in Unicode text. -- -- The standard functions for encoding and decoding text are strict, -- which is to say that they throw exceptions on invalid input. This -- is often unhelpful on real world input, so alternative functions -- exist that accept custom handlers for dealing with invalid inputs. -- These 'OnError' handlers are normal Haskell functions. You can use -- one of the presupplied functions in this module, or you can write a -- custom handler of your own. module Data.Text.Encoding.Error ( -- * Error handling types UnicodeException(..) , OnError , OnDecodeError , OnEncodeError -- * Useful error handling functions , lenientDecode , strictDecode , strictEncode , ignore , replace ) where import Control.DeepSeq (NFData (..)) #if __GLASGOW_HASKELL__ >= 610 import Control.Exception (Exception, throw) #else import Control.Exception.Extensible (Exception, throw) #endif import Data.Typeable (Typeable) import Data.Word (Word8) import Numeric (showHex) -- | Function type for handling a coding error. It is supplied with -- two inputs: -- -- * A 'String' that describes the error. -- -- * The input value that caused the error. If the error arose -- because the end of input was reached or could not be identified -- precisely, this value will be 'Nothing'. 
-- -- If the handler returns a value wrapped with 'Just', that value will -- be used in the output as the replacement for the invalid input. If -- it returns 'Nothing', no value will be used in the output. -- -- Should the handler need to abort processing, it should use 'error' -- or 'throw' an exception (preferably a 'UnicodeException'). It may -- use the description provided to construct a more helpful error -- report. type OnError a b = String -> Maybe a -> Maybe b -- | A handler for a decoding error. type OnDecodeError = OnError Word8 Char -- | A handler for an encoding error. type OnEncodeError = OnError Char Word8 -- | An exception type for representing Unicode encoding errors. data UnicodeException = DecodeError String (Maybe Word8) -- ^ Could not decode a byte sequence because it was invalid under -- the given encoding, or ran out of input in mid-decode. | EncodeError String (Maybe Char) -- ^ Tried to encode a character that could not be represented -- under the given encoding, or ran out of input in mid-encode. deriving (Eq, Typeable) showUnicodeException :: UnicodeException -> String showUnicodeException (DecodeError desc (Just w)) = "Cannot decode byte '\\x" ++ showHex w ("': " ++ desc) showUnicodeException (DecodeError desc Nothing) = "Cannot decode input: " ++ desc showUnicodeException (EncodeError desc (Just c)) = "Cannot encode character '\\x" ++ showHex (fromEnum c) ("': " ++ desc) showUnicodeException (EncodeError desc Nothing) = "Cannot encode input: " ++ desc instance Show UnicodeException where show = showUnicodeException instance Exception UnicodeException instance NFData UnicodeException where rnf (DecodeError desc w) = rnf desc `seq` rnf w `seq` () rnf (EncodeError desc c) = rnf desc `seq` rnf c `seq` () -- | Throw a 'UnicodeException' if decoding fails. strictDecode :: OnDecodeError strictDecode desc c = throw (DecodeError desc c) -- | Replace an invalid input byte with the Unicode replacement -- character U+FFFD. lenientDecode :: OnDecodeError lenientDecode _ _ = Just '\xfffd' -- | Throw a 'UnicodeException' if encoding fails. strictEncode :: OnEncodeError strictEncode desc c = throw (EncodeError desc c) -- | Ignore an invalid input, substituting nothing in the output. ignore :: OnError a b ignore _ _ = Nothing -- | Replace an invalid input with a valid output. replace :: b -> OnError a b replace c _ _ = Just c text-0.11.3.1/Data/Text/Encoding/Fusion.hs0000644000000000000000000001654112140332415016231 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, Rank2Types #-} -- | -- Module : Data.Text.Encoding.Fusion -- Copyright : (c) Tom Harper 2008-2009, -- (c) Bryan O'Sullivan 2009, -- (c) Duncan Coutts 2009 -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- Fusible 'Stream'-oriented functions for converting between 'Text' -- and several common encodings. 
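-- -- For example, a hypothetical lenient UTF-8 decoder could be sketched as -- follows (assuming 'Data.Text.Fusion.unstream' is imported qualified as -- @F.unstream@, to avoid clashing with this module's 'unstream'): -- -- > decodeLenient bs = F.unstream (streamUtf8 lenientDecode bs)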
module Data.Text.Encoding.Fusion ( -- * Streaming streamASCII , streamUtf8 , streamUtf16LE , streamUtf16BE , streamUtf32LE , streamUtf32BE -- * Unstreaming , unstream , module Data.Text.Encoding.Fusion.Common ) where #if defined(ASSERTS) import Control.Exception (assert) #endif import Data.ByteString.Internal (ByteString(..), mallocByteString, memcpy) import Data.Text.Fusion (Step(..), Stream(..)) import Data.Text.Fusion.Size import Data.Text.Encoding.Error import Data.Text.Encoding.Fusion.Common import Data.Text.UnsafeChar (unsafeChr, unsafeChr8, unsafeChr32) import Data.Text.UnsafeShift (shiftL, shiftR) import Data.Word (Word8, Word16, Word32) import Foreign.ForeignPtr (withForeignPtr, ForeignPtr) import Foreign.Storable (pokeByteOff) import qualified Data.ByteString as B import qualified Data.ByteString.Unsafe as B import qualified Data.Text.Encoding.Utf8 as U8 import qualified Data.Text.Encoding.Utf16 as U16 import qualified Data.Text.Encoding.Utf32 as U32 import Data.Text.Unsafe (unsafeDupablePerformIO) streamASCII :: ByteString -> Stream Char streamASCII bs = Stream next 0 (maxSize l) where l = B.length bs {-# INLINE next #-} next i | i >= l = Done | otherwise = Yield (unsafeChr8 x1) (i+1) where x1 = B.unsafeIndex bs i {-# DEPRECATED streamASCII "Do not use this function" #-} {-# INLINE [0] streamASCII #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using UTF-8 -- encoding. streamUtf8 :: OnDecodeError -> ByteString -> Stream Char streamUtf8 onErr bs = Stream next 0 (maxSize l) where l = B.length bs next i | i >= l = Done | U8.validate1 x1 = Yield (unsafeChr8 x1) (i+1) | i+1 < l && U8.validate2 x1 x2 = Yield (U8.chr2 x1 x2) (i+2) | i+2 < l && U8.validate3 x1 x2 x3 = Yield (U8.chr3 x1 x2 x3) (i+3) | i+3 < l && U8.validate4 x1 x2 x3 x4 = Yield (U8.chr4 x1 x2 x3 x4) (i+4) | otherwise = decodeError "streamUtf8" "UTF-8" onErr (Just x1) (i+1) where x1 = idx i x2 = idx (i + 1) x3 = idx (i + 2) x4 = idx (i + 3) idx = B.unsafeIndex bs {-# INLINE [0] streamUtf8 #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using little -- endian UTF-16 encoding. streamUtf16LE :: OnDecodeError -> ByteString -> Stream Char streamUtf16LE onErr bs = Stream next 0 (maxSize (l `shiftR` 1)) where l = B.length bs {-# INLINE next #-} next i | i >= l = Done | i+1 < l && U16.validate1 x1 = Yield (unsafeChr x1) (i+2) | i+3 < l && U16.validate2 x1 x2 = Yield (U16.chr2 x1 x2) (i+4) | otherwise = decodeError "streamUtf16LE" "UTF-16LE" onErr Nothing (i+1) where x1 = idx i + (idx (i + 1) `shiftL` 8) x2 = idx (i + 2) + (idx (i + 3) `shiftL` 8) idx = fromIntegral . B.unsafeIndex bs :: Int -> Word16 {-# INLINE [0] streamUtf16LE #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using big -- endian UTF-16 encoding. streamUtf16BE :: OnDecodeError -> ByteString -> Stream Char streamUtf16BE onErr bs = Stream next 0 (maxSize (l `shiftR` 1)) where l = B.length bs {-# INLINE next #-} next i | i >= l = Done | i+1 < l && U16.validate1 x1 = Yield (unsafeChr x1) (i+2) | i+3 < l && U16.validate2 x1 x2 = Yield (U16.chr2 x1 x2) (i+4) | otherwise = decodeError "streamUtf16BE" "UTF-16BE" onErr Nothing (i+1) where x1 = (idx i `shiftL` 8) + idx (i + 1) x2 = (idx (i + 2) `shiftL` 8) + idx (i + 3) idx = fromIntegral . B.unsafeIndex bs :: Int -> Word16 {-# INLINE [0] streamUtf16BE #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using big -- endian UTF-32 encoding. 
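-- -- For example, the big-endian byte sequence @0x00 0x00 0x00 0x41@ decodes -- to the single character @'A'@.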
streamUtf32BE :: OnDecodeError -> ByteString -> Stream Char streamUtf32BE onErr bs = Stream next 0 (maxSize (l `shiftR` 2)) where l = B.length bs {-# INLINE next #-} next i | i >= l = Done | i+3 < l && U32.validate x = Yield (unsafeChr32 x) (i+4) | otherwise = decodeError "streamUtf32BE" "UTF-32BE" onErr Nothing (i+1) where x = shiftL x1 24 + shiftL x2 16 + shiftL x3 8 + x4 x1 = idx i x2 = idx (i+1) x3 = idx (i+2) x4 = idx (i+3) idx = fromIntegral . B.unsafeIndex bs :: Int -> Word32 {-# INLINE [0] streamUtf32BE #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using little -- endian UTF-32 encoding. streamUtf32LE :: OnDecodeError -> ByteString -> Stream Char streamUtf32LE onErr bs = Stream next 0 (maxSize (l `shiftR` 2)) where l = B.length bs {-# INLINE next #-} next i | i >= l = Done | i+3 < l && U32.validate x = Yield (unsafeChr32 x) (i+4) | otherwise = decodeError "streamUtf32LE" "UTF-32LE" onErr Nothing (i+1) where x = shiftL x4 24 + shiftL x3 16 + shiftL x2 8 + x1 x1 = idx i x2 = idx $ i+1 x3 = idx $ i+2 x4 = idx $ i+3 idx = fromIntegral . B.unsafeIndex bs :: Int -> Word32 {-# INLINE [0] streamUtf32LE #-} -- | /O(n)/ Convert a 'Stream' 'Word8' to a 'ByteString'. unstream :: Stream Word8 -> ByteString unstream (Stream next s0 len) = unsafeDupablePerformIO $ do let mlen = upperBound 4 len mallocByteString mlen >>= loop mlen 0 s0 where loop !n !off !s fp = case next s of Done -> trimUp fp n off Skip s' -> loop n off s' fp Yield x s' | off == n -> realloc fp n off s' x | otherwise -> do withForeignPtr fp $ \p -> pokeByteOff p off x loop n (off+1) s' fp {-# NOINLINE realloc #-} realloc fp n off s x = do let n' = n+n fp' <- copy0 fp n n' withForeignPtr fp' $ \p -> pokeByteOff p off x loop n' (off+1) s fp' {-# NOINLINE trimUp #-} trimUp fp _ off = return $! PS fp 0 off copy0 :: ForeignPtr Word8 -> Int -> Int -> IO (ForeignPtr Word8) copy0 !src !srcLen !destLen = #if defined(ASSERTS) assert (srcLen <= destLen) $ #endif do dest <- mallocByteString destLen withForeignPtr src $ \src' -> withForeignPtr dest $ \dest' -> memcpy dest' src' (fromIntegral srcLen) return dest decodeError :: forall s. String -> String -> OnDecodeError -> Maybe Word8 -> s -> Step s Char decodeError func kind onErr mb i = case onErr desc mb of Nothing -> Skip i Just c -> Yield c i where desc = "Data.Text.Encoding.Fusion." ++ func ++ ": Invalid " ++ kind ++ " stream" text-0.11.3.1/Data/Text/Encoding/Utf16.hs0000644000000000000000000000215012140332415015662 0ustar0000000000000000{-# LANGUAGE MagicHash, BangPatterns #-} -- | -- Module : Data.Text.Encoding.Utf16 -- Copyright : (c) 2008, 2009 Tom Harper, -- (c) 2009 Bryan O'Sullivan, -- (c) 2009 Duncan Coutts -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Basic UTF-16 validation and character manipulation. 
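-- -- For example, U+1D11E (musical symbol G clef) is encoded in UTF-16 as the -- surrogate pair @0xD834 0xDD1E@, so a usage sketch of 'chr2' is: -- -- > chr2 0xD834 0xDD1E == '\x1D11E'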
module Data.Text.Encoding.Utf16 ( chr2 , validate1 , validate2 ) where import GHC.Exts import GHC.Word (Word16(..)) chr2 :: Word16 -> Word16 -> Char chr2 (W16# a#) (W16# b#) = C# (chr# (upper# +# lower# +# 0x10000#)) where !x# = word2Int# a# !y# = word2Int# b# !upper# = uncheckedIShiftL# (x# -# 0xD800#) 10# !lower# = y# -# 0xDC00# {-# INLINE chr2 #-} validate1 :: Word16 -> Bool validate1 x1 = x1 < 0xD800 || x1 > 0xDFFF {-# INLINE validate1 #-} validate2 :: Word16 -> Word16 -> Bool validate2 x1 x2 = x1 >= 0xD800 && x1 <= 0xDBFF && x2 >= 0xDC00 && x2 <= 0xDFFF {-# INLINE validate2 #-} text-0.11.3.1/Data/Text/Encoding/Utf32.hs0000644000000000000000000000113612140332415015663 0ustar0000000000000000-- | -- Module : Data.Text.Encoding.Utf32 -- Copyright : (c) 2008, 2009 Tom Harper, -- (c) 2009, 2010 Bryan O'Sullivan, -- (c) 2009 Duncan Coutts -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- Basic UTF-32 validation. module Data.Text.Encoding.Utf32 ( validate ) where import Data.Word (Word32) validate :: Word32 -> Bool validate x1 = x1 < 0xD800 || (x1 > 0xDFFF && x1 <= 0x10FFFF) {-# INLINE validate #-} text-0.11.3.1/Data/Text/Encoding/Utf8.hs0000644000000000000000000001073512140332415015613 0ustar0000000000000000{-# LANGUAGE CPP, MagicHash, BangPatterns #-} -- | -- Module : Data.Text.Encoding.Utf8 -- Copyright : (c) 2008, 2009 Tom Harper, -- (c) 2009, 2010 Bryan O'Sullivan, -- (c) 2009 Duncan Coutts -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Basic UTF-8 validation and character manipulation. module Data.Text.Encoding.Utf8 ( -- Decomposition ord2 , ord3 , ord4 -- Construction , chr2 , chr3 , chr4 -- * Validation , validate1 , validate2 , validate3 , validate4 ) where #if defined(ASSERTS) import Control.Exception (assert) #endif import Data.Bits ((.&.)) import Data.Text.UnsafeChar (ord) import Data.Text.UnsafeShift (shiftR) import GHC.Exts import GHC.Word (Word8(..)) default(Int) between :: Word8 -- ^ byte to check -> Word8 -- ^ lower bound -> Word8 -- ^ upper bound -> Bool between x y z = x >= y && x <= z {-# INLINE between #-} ord2 :: Char -> (Word8,Word8) ord2 c = #if defined(ASSERTS) assert (n >= 0x80 && n <= 0x07ff) #endif (x1,x2) where n = ord c x1 = fromIntegral $ (n `shiftR` 6) + 0xC0 x2 = fromIntegral $ (n .&. 0x3F) + 0x80 ord3 :: Char -> (Word8,Word8,Word8) ord3 c = #if defined(ASSERTS) assert (n >= 0x0800 && n <= 0xffff) #endif (x1,x2,x3) where n = ord c x1 = fromIntegral $ (n `shiftR` 12) + 0xE0 x2 = fromIntegral $ ((n `shiftR` 6) .&. 0x3F) + 0x80 x3 = fromIntegral $ (n .&. 0x3F) + 0x80 ord4 :: Char -> (Word8,Word8,Word8,Word8) ord4 c = #if defined(ASSERTS) assert (n >= 0x10000) #endif (x1,x2,x3,x4) where n = ord c x1 = fromIntegral $ (n `shiftR` 18) + 0xF0 x2 = fromIntegral $ ((n `shiftR` 12) .&. 0x3F) + 0x80 x3 = fromIntegral $ ((n `shiftR` 6) .&. 0x3F) + 0x80 x4 = fromIntegral $ (n .&. 
0x3F) + 0x80 chr2 :: Word8 -> Word8 -> Char chr2 (W8# x1#) (W8# x2#) = C# (chr# (z1# +# z2#)) where !y1# = word2Int# x1# !y2# = word2Int# x2# !z1# = uncheckedIShiftL# (y1# -# 0xC0#) 6# !z2# = y2# -# 0x80# {-# INLINE chr2 #-} chr3 :: Word8 -> Word8 -> Word8 -> Char chr3 (W8# x1#) (W8# x2#) (W8# x3#) = C# (chr# (z1# +# z2# +# z3#)) where !y1# = word2Int# x1# !y2# = word2Int# x2# !y3# = word2Int# x3# !z1# = uncheckedIShiftL# (y1# -# 0xE0#) 12# !z2# = uncheckedIShiftL# (y2# -# 0x80#) 6# !z3# = y3# -# 0x80# {-# INLINE chr3 #-} chr4 :: Word8 -> Word8 -> Word8 -> Word8 -> Char chr4 (W8# x1#) (W8# x2#) (W8# x3#) (W8# x4#) = C# (chr# (z1# +# z2# +# z3# +# z4#)) where !y1# = word2Int# x1# !y2# = word2Int# x2# !y3# = word2Int# x3# !y4# = word2Int# x4# !z1# = uncheckedIShiftL# (y1# -# 0xF0#) 18# !z2# = uncheckedIShiftL# (y2# -# 0x80#) 12# !z3# = uncheckedIShiftL# (y3# -# 0x80#) 6# !z4# = y4# -# 0x80# {-# INLINE chr4 #-} validate1 :: Word8 -> Bool validate1 x1 = x1 <= 0x7F {-# INLINE validate1 #-} validate2 :: Word8 -> Word8 -> Bool validate2 x1 x2 = between x1 0xC2 0xDF && between x2 0x80 0xBF {-# INLINE validate2 #-} validate3 :: Word8 -> Word8 -> Word8 -> Bool {-# INLINE validate3 #-} validate3 x1 x2 x3 = validate3_1 || validate3_2 || validate3_3 || validate3_4 where validate3_1 = (x1 == 0xE0) && between x2 0xA0 0xBF && between x3 0x80 0xBF validate3_2 = between x1 0xE1 0xEC && between x2 0x80 0xBF && between x3 0x80 0xBF validate3_3 = x1 == 0xED && between x2 0x80 0x9F && between x3 0x80 0xBF validate3_4 = between x1 0xEE 0xEF && between x2 0x80 0xBF && between x3 0x80 0xBF validate4 :: Word8 -> Word8 -> Word8 -> Word8 -> Bool {-# INLINE validate4 #-} validate4 x1 x2 x3 x4 = validate4_1 || validate4_2 || validate4_3 where validate4_1 = x1 == 0xF0 && between x2 0x90 0xBF && between x3 0x80 0xBF && between x4 0x80 0xBF validate4_2 = between x1 0xF1 0xF3 && between x2 0x80 0xBF && between x3 0x80 0xBF && between x4 0x80 0xBF validate4_3 = x1 == 0xF4 && between x2 0x80 0x8F && between x3 0x80 0xBF && between x4 0x80 0xBF text-0.11.3.1/Data/Text/Encoding/Fusion/0000755000000000000000000000000012140332415015666 5ustar0000000000000000text-0.11.3.1/Data/Text/Encoding/Fusion/Common.hs0000644000000000000000000001012012140332415017444 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} -- | -- Module : Data.Text.Encoding.Fusion.Common -- Copyright : (c) Tom Harper 2008-2009, -- (c) Bryan O'Sullivan 2009, -- (c) Duncan Coutts 2009, -- (c) Jasper Van der Jeugt 2011 -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- Fusible 'Stream'-oriented functions for converting between 'Text' -- and several common encodings. module Data.Text.Encoding.Fusion.Common ( -- * Restreaming -- Restreaming is the act of converting from one 'Stream' -- representation to another. 
restreamUtf16LE , restreamUtf16BE , restreamUtf32LE , restreamUtf32BE ) where import Data.Bits ((.&.)) import Data.Text.Fusion (Step(..), Stream(..)) import Data.Text.Fusion.Internal (RS(..)) import Data.Text.UnsafeChar (ord) import Data.Text.UnsafeShift (shiftR) import Data.Word (Word8) restreamUtf16BE :: Stream Char -> Stream Word8 restreamUtf16BE (Stream next0 s0 len) = Stream next (RS0 s0) (len * 2) where next (RS0 s) = case next0 s of Done -> Done Skip s' -> Skip (RS0 s') Yield x s' | n < 0x10000 -> Yield (fromIntegral $ n `shiftR` 8) $ RS1 s' (fromIntegral n) | otherwise -> Yield c1 $ RS3 s' c2 c3 c4 where n = ord x n1 = n - 0x10000 c1 = fromIntegral (n1 `shiftR` 18 + 0xD8) c2 = fromIntegral (n1 `shiftR` 10) n2 = n1 .&. 0x3FF c3 = fromIntegral (n2 `shiftR` 8 + 0xDC) c4 = fromIntegral n2 next (RS1 s x2) = Yield x2 (RS0 s) next (RS2 s x2 x3) = Yield x2 (RS1 s x3) next (RS3 s x2 x3 x4) = Yield x2 (RS2 s x3 x4) {-# INLINE next #-} {-# INLINE restreamUtf16BE #-} restreamUtf16LE :: Stream Char -> Stream Word8 restreamUtf16LE (Stream next0 s0 len) = Stream next (RS0 s0) (len * 2) where next (RS0 s) = case next0 s of Done -> Done Skip s' -> Skip (RS0 s') Yield x s' | n < 0x10000 -> Yield (fromIntegral n) $ RS1 s' (fromIntegral $ shiftR n 8) | otherwise -> Yield c1 $ RS3 s' c2 c3 c4 where n = ord x n1 = n - 0x10000 c2 = fromIntegral (shiftR n1 18 + 0xD8) c1 = fromIntegral (shiftR n1 10) n2 = n1 .&. 0x3FF c4 = fromIntegral (shiftR n2 8 + 0xDC) c3 = fromIntegral n2 next (RS1 s x2) = Yield x2 (RS0 s) next (RS2 s x2 x3) = Yield x2 (RS1 s x3) next (RS3 s x2 x3 x4) = Yield x2 (RS2 s x3 x4) {-# INLINE next #-} {-# INLINE restreamUtf16LE #-} restreamUtf32BE :: Stream Char -> Stream Word8 restreamUtf32BE (Stream next0 s0 len) = Stream next (RS0 s0) (len * 2) where next (RS0 s) = case next0 s of Done -> Done Skip s' -> Skip (RS0 s') Yield x s' -> Yield c1 (RS3 s' c2 c3 c4) where n = ord x c1 = fromIntegral $ shiftR n 24 c2 = fromIntegral $ shiftR n 16 c3 = fromIntegral $ shiftR n 8 c4 = fromIntegral n next (RS1 s x2) = Yield x2 (RS0 s) next (RS2 s x2 x3) = Yield x2 (RS1 s x3) next (RS3 s x2 x3 x4) = Yield x2 (RS2 s x3 x4) {-# INLINE next #-} {-# INLINE restreamUtf32BE #-} restreamUtf32LE :: Stream Char -> Stream Word8 restreamUtf32LE (Stream next0 s0 len) = Stream next (RS0 s0) (len * 2) where next (RS0 s) = case next0 s of Done -> Done Skip s' -> Skip (RS0 s') Yield x s' -> Yield c1 (RS3 s' c2 c3 c4) where n = ord x c4 = fromIntegral $ shiftR n 24 c3 = fromIntegral $ shiftR n 16 c2 = fromIntegral $ shiftR n 8 c1 = fromIntegral n next (RS1 s x2) = Yield x2 (RS0 s) next (RS2 s x2 x3) = Yield x2 (RS1 s x3) next (RS3 s x2 x3 x4) = Yield x2 (RS2 s x3 x4) {-# INLINE next #-} {-# INLINE restreamUtf32LE #-} text-0.11.3.1/Data/Text/Fusion/0000755000000000000000000000000012140332415014140 5ustar0000000000000000text-0.11.3.1/Data/Text/Fusion/CaseMapping.hs0000644000000000000000000006413612140332415016675 0ustar0000000000000000{-# LANGUAGE Rank2Types #-} -- AUTOMATICALLY GENERATED - DO NOT EDIT -- Generated by scripts/SpecialCasing.hs module Data.Text.Fusion.CaseMapping where import Data.Char import Data.Text.Fusion.Internal upperMapping :: forall s. 
Char -> s -> Step (CC s) Char {-# INLINE upperMapping #-} -- LATIN SMALL LETTER SHARP S upperMapping '\x00df' s = Yield '\x0053' (CC s '\x0053' '\x0000') -- LATIN SMALL LIGATURE FF upperMapping '\xfb00' s = Yield '\x0046' (CC s '\x0046' '\x0000') -- LATIN SMALL LIGATURE FI upperMapping '\xfb01' s = Yield '\x0046' (CC s '\x0049' '\x0000') -- LATIN SMALL LIGATURE FL upperMapping '\xfb02' s = Yield '\x0046' (CC s '\x004c' '\x0000') -- LATIN SMALL LIGATURE FFI upperMapping '\xfb03' s = Yield '\x0046' (CC s '\x0046' '\x0049') -- LATIN SMALL LIGATURE FFL upperMapping '\xfb04' s = Yield '\x0046' (CC s '\x0046' '\x004c') -- LATIN SMALL LIGATURE LONG S T upperMapping '\xfb05' s = Yield '\x0053' (CC s '\x0054' '\x0000') -- LATIN SMALL LIGATURE ST upperMapping '\xfb06' s = Yield '\x0053' (CC s '\x0054' '\x0000') -- ARMENIAN SMALL LIGATURE ECH YIWN upperMapping '\x0587' s = Yield '\x0535' (CC s '\x0552' '\x0000') -- ARMENIAN SMALL LIGATURE MEN NOW upperMapping '\xfb13' s = Yield '\x0544' (CC s '\x0546' '\x0000') -- ARMENIAN SMALL LIGATURE MEN ECH upperMapping '\xfb14' s = Yield '\x0544' (CC s '\x0535' '\x0000') -- ARMENIAN SMALL LIGATURE MEN INI upperMapping '\xfb15' s = Yield '\x0544' (CC s '\x053b' '\x0000') -- ARMENIAN SMALL LIGATURE VEW NOW upperMapping '\xfb16' s = Yield '\x054e' (CC s '\x0546' '\x0000') -- ARMENIAN SMALL LIGATURE MEN XEH upperMapping '\xfb17' s = Yield '\x0544' (CC s '\x053d' '\x0000') -- LATIN SMALL LETTER N PRECEDED BY APOSTROPHE upperMapping '\x0149' s = Yield '\x02bc' (CC s '\x004e' '\x0000') -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS upperMapping '\x0390' s = Yield '\x0399' (CC s '\x0308' '\x0301') -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS upperMapping '\x03b0' s = Yield '\x03a5' (CC s '\x0308' '\x0301') -- LATIN SMALL LETTER J WITH CARON upperMapping '\x01f0' s = Yield '\x004a' (CC s '\x030c' '\x0000') -- LATIN SMALL LETTER H WITH LINE BELOW upperMapping '\x1e96' s = Yield '\x0048' (CC s '\x0331' '\x0000') -- LATIN SMALL LETTER T WITH DIAERESIS upperMapping '\x1e97' s = Yield '\x0054' (CC s '\x0308' '\x0000') -- LATIN SMALL LETTER W WITH RING ABOVE upperMapping '\x1e98' s = Yield '\x0057' (CC s '\x030a' '\x0000') -- LATIN SMALL LETTER Y WITH RING ABOVE upperMapping '\x1e99' s = Yield '\x0059' (CC s '\x030a' '\x0000') -- LATIN SMALL LETTER A WITH RIGHT HALF RING upperMapping '\x1e9a' s = Yield '\x0041' (CC s '\x02be' '\x0000') -- GREEK SMALL LETTER UPSILON WITH PSILI upperMapping '\x1f50' s = Yield '\x03a5' (CC s '\x0313' '\x0000') -- GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA upperMapping '\x1f52' s = Yield '\x03a5' (CC s '\x0313' '\x0300') -- GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA upperMapping '\x1f54' s = Yield '\x03a5' (CC s '\x0313' '\x0301') -- GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI upperMapping '\x1f56' s = Yield '\x03a5' (CC s '\x0313' '\x0342') -- GREEK SMALL LETTER ALPHA WITH PERISPOMENI upperMapping '\x1fb6' s = Yield '\x0391' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER ETA WITH PERISPOMENI upperMapping '\x1fc6' s = Yield '\x0397' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND VARIA upperMapping '\x1fd2' s = Yield '\x0399' (CC s '\x0308' '\x0300') -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA upperMapping '\x1fd3' s = Yield '\x0399' (CC s '\x0308' '\x0301') -- GREEK SMALL LETTER IOTA WITH PERISPOMENI upperMapping '\x1fd6' s = Yield '\x0399' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI upperMapping '\x1fd7' s = Yield '\x0399' (CC s '\x0308' 
'\x0342') -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND VARIA upperMapping '\x1fe2' s = Yield '\x03a5' (CC s '\x0308' '\x0300') -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA upperMapping '\x1fe3' s = Yield '\x03a5' (CC s '\x0308' '\x0301') -- GREEK SMALL LETTER RHO WITH PSILI upperMapping '\x1fe4' s = Yield '\x03a1' (CC s '\x0313' '\x0000') -- GREEK SMALL LETTER UPSILON WITH PERISPOMENI upperMapping '\x1fe6' s = Yield '\x03a5' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI upperMapping '\x1fe7' s = Yield '\x03a5' (CC s '\x0308' '\x0342') -- GREEK SMALL LETTER OMEGA WITH PERISPOMENI upperMapping '\x1ff6' s = Yield '\x03a9' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI upperMapping '\x1f80' s = Yield '\x1f08' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI upperMapping '\x1f81' s = Yield '\x1f09' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI upperMapping '\x1f82' s = Yield '\x1f0a' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI upperMapping '\x1f83' s = Yield '\x1f0b' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI upperMapping '\x1f84' s = Yield '\x1f0c' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI upperMapping '\x1f85' s = Yield '\x1f0d' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1f86' s = Yield '\x1f0e' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1f87' s = Yield '\x1f0f' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI upperMapping '\x1f88' s = Yield '\x1f08' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI upperMapping '\x1f89' s = Yield '\x1f09' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI upperMapping '\x1f8a' s = Yield '\x1f0a' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI upperMapping '\x1f8b' s = Yield '\x1f0b' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI upperMapping '\x1f8c' s = Yield '\x1f0c' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI upperMapping '\x1f8d' s = Yield '\x1f0d' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI upperMapping '\x1f8e' s = Yield '\x1f0e' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI upperMapping '\x1f8f' s = Yield '\x1f0f' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI upperMapping '\x1f90' s = Yield '\x1f28' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH DASIA AND YPOGEGRAMMENI upperMapping '\x1f91' s = Yield '\x1f29' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH PSILI AND VARIA AND YPOGEGRAMMENI upperMapping '\x1f92' s = Yield '\x1f2a' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH DASIA AND VARIA AND YPOGEGRAMMENI upperMapping '\x1f93' s = Yield '\x1f2b' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH PSILI AND OXIA AND YPOGEGRAMMENI upperMapping '\x1f94' s = Yield '\x1f2c' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH DASIA AND OXIA AND YPOGEGRAMMENI upperMapping 
'\x1f95' s = Yield '\x1f2d' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1f96' s = Yield '\x1f2e' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1f97' s = Yield '\x1f2f' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI upperMapping '\x1f98' s = Yield '\x1f28' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI upperMapping '\x1f99' s = Yield '\x1f29' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI upperMapping '\x1f9a' s = Yield '\x1f2a' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI upperMapping '\x1f9b' s = Yield '\x1f2b' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI upperMapping '\x1f9c' s = Yield '\x1f2c' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI upperMapping '\x1f9d' s = Yield '\x1f2d' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI upperMapping '\x1f9e' s = Yield '\x1f2e' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI upperMapping '\x1f9f' s = Yield '\x1f2f' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI upperMapping '\x1fa0' s = Yield '\x1f68' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH DASIA AND YPOGEGRAMMENI upperMapping '\x1fa1' s = Yield '\x1f69' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI upperMapping '\x1fa2' s = Yield '\x1f6a' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH DASIA AND VARIA AND YPOGEGRAMMENI upperMapping '\x1fa3' s = Yield '\x1f6b' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI upperMapping '\x1fa4' s = Yield '\x1f6c' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH DASIA AND OXIA AND YPOGEGRAMMENI upperMapping '\x1fa5' s = Yield '\x1f6d' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1fa6' s = Yield '\x1f6e' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1fa7' s = Yield '\x1f6f' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI upperMapping '\x1fa8' s = Yield '\x1f68' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI upperMapping '\x1fa9' s = Yield '\x1f69' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI upperMapping '\x1faa' s = Yield '\x1f6a' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI upperMapping '\x1fab' s = Yield '\x1f6b' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI upperMapping '\x1fac' s = Yield '\x1f6c' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI upperMapping '\x1fad' s = Yield '\x1f6d' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI upperMapping '\x1fae' s = Yield '\x1f6e' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI upperMapping '\x1faf' s = Yield '\x1f6f' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH 
YPOGEGRAMMENI upperMapping '\x1fb3' s = Yield '\x0391' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI upperMapping '\x1fbc' s = Yield '\x0391' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI upperMapping '\x1fc3' s = Yield '\x0397' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI upperMapping '\x1fcc' s = Yield '\x0397' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI upperMapping '\x1ff3' s = Yield '\x03a9' (CC s '\x0399' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI upperMapping '\x1ffc' s = Yield '\x03a9' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI upperMapping '\x1fb2' s = Yield '\x1fba' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI upperMapping '\x1fb4' s = Yield '\x0386' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI upperMapping '\x1fc2' s = Yield '\x1fca' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI upperMapping '\x1fc4' s = Yield '\x0389' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI upperMapping '\x1ff2' s = Yield '\x1ffa' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI upperMapping '\x1ff4' s = Yield '\x038f' (CC s '\x0399' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1fb7' s = Yield '\x0391' (CC s '\x0342' '\x0399') -- GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1fc7' s = Yield '\x0397' (CC s '\x0342' '\x0399') -- GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI upperMapping '\x1ff7' s = Yield '\x03a9' (CC s '\x0342' '\x0399') upperMapping c s = Yield (toUpper c) (CC s '\0' '\0') lowerMapping :: forall s. Char -> s -> Step (CC s) Char {-# INLINE lowerMapping #-} -- LATIN CAPITAL LETTER I WITH DOT ABOVE lowerMapping '\x0130' s = Yield '\x0069' (CC s '\x0307' '\x0000') lowerMapping c s = Yield (toLower c) (CC s '\0' '\0') foldMapping :: forall s. 
Char -> s -> Step (CC s) Char {-# INLINE foldMapping #-} -- MICRO SIGN foldMapping '\x00b5' s = Yield '\x03bc' (CC s '\x0000' '\x0000') -- LATIN SMALL LETTER SHARP S foldMapping '\x00df' s = Yield '\x0073' (CC s '\x0073' '\x0000') -- LATIN CAPITAL LETTER I WITH DOT ABOVE foldMapping '\x0130' s = Yield '\x0069' (CC s '\x0307' '\x0000') -- LATIN SMALL LETTER N PRECEDED BY APOSTROPHE foldMapping '\x0149' s = Yield '\x02bc' (CC s '\x006e' '\x0000') -- LATIN SMALL LETTER LONG S foldMapping '\x017f' s = Yield '\x0073' (CC s '\x0000' '\x0000') -- LATIN SMALL LETTER J WITH CARON foldMapping '\x01f0' s = Yield '\x006a' (CC s '\x030c' '\x0000') -- COMBINING GREEK YPOGEGRAMMENI foldMapping '\x0345' s = Yield '\x03b9' (CC s '\x0000' '\x0000') -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS foldMapping '\x0390' s = Yield '\x03b9' (CC s '\x0308' '\x0301') -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS foldMapping '\x03b0' s = Yield '\x03c5' (CC s '\x0308' '\x0301') -- GREEK SMALL LETTER FINAL SIGMA foldMapping '\x03c2' s = Yield '\x03c3' (CC s '\x0000' '\x0000') -- GREEK BETA SYMBOL foldMapping '\x03d0' s = Yield '\x03b2' (CC s '\x0000' '\x0000') -- GREEK THETA SYMBOL foldMapping '\x03d1' s = Yield '\x03b8' (CC s '\x0000' '\x0000') -- GREEK PHI SYMBOL foldMapping '\x03d5' s = Yield '\x03c6' (CC s '\x0000' '\x0000') -- GREEK PI SYMBOL foldMapping '\x03d6' s = Yield '\x03c0' (CC s '\x0000' '\x0000') -- GREEK KAPPA SYMBOL foldMapping '\x03f0' s = Yield '\x03ba' (CC s '\x0000' '\x0000') -- GREEK RHO SYMBOL foldMapping '\x03f1' s = Yield '\x03c1' (CC s '\x0000' '\x0000') -- GREEK LUNATE EPSILON SYMBOL foldMapping '\x03f5' s = Yield '\x03b5' (CC s '\x0000' '\x0000') -- ARMENIAN SMALL LIGATURE ECH YIWN foldMapping '\x0587' s = Yield '\x0565' (CC s '\x0582' '\x0000') -- LATIN SMALL LETTER H WITH LINE BELOW foldMapping '\x1e96' s = Yield '\x0068' (CC s '\x0331' '\x0000') -- LATIN SMALL LETTER T WITH DIAERESIS foldMapping '\x1e97' s = Yield '\x0074' (CC s '\x0308' '\x0000') -- LATIN SMALL LETTER W WITH RING ABOVE foldMapping '\x1e98' s = Yield '\x0077' (CC s '\x030a' '\x0000') -- LATIN SMALL LETTER Y WITH RING ABOVE foldMapping '\x1e99' s = Yield '\x0079' (CC s '\x030a' '\x0000') -- LATIN SMALL LETTER A WITH RIGHT HALF RING foldMapping '\x1e9a' s = Yield '\x0061' (CC s '\x02be' '\x0000') -- LATIN SMALL LETTER LONG S WITH DOT ABOVE foldMapping '\x1e9b' s = Yield '\x1e61' (CC s '\x0000' '\x0000') -- LATIN CAPITAL LETTER SHARP S foldMapping '\x1e9e' s = Yield '\x0073' (CC s '\x0073' '\x0000') -- GREEK SMALL LETTER UPSILON WITH PSILI foldMapping '\x1f50' s = Yield '\x03c5' (CC s '\x0313' '\x0000') -- GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA foldMapping '\x1f52' s = Yield '\x03c5' (CC s '\x0313' '\x0300') -- GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA foldMapping '\x1f54' s = Yield '\x03c5' (CC s '\x0313' '\x0301') -- GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI foldMapping '\x1f56' s = Yield '\x03c5' (CC s '\x0313' '\x0342') -- GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI foldMapping '\x1f80' s = Yield '\x1f00' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI foldMapping '\x1f81' s = Yield '\x1f01' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI foldMapping '\x1f82' s = Yield '\x1f02' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI foldMapping '\x1f83' s = Yield '\x1f03' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND 
YPOGEGRAMMENI foldMapping '\x1f84' s = Yield '\x1f04' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI foldMapping '\x1f85' s = Yield '\x1f05' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1f86' s = Yield '\x1f06' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1f87' s = Yield '\x1f07' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI foldMapping '\x1f88' s = Yield '\x1f00' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI foldMapping '\x1f89' s = Yield '\x1f01' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI foldMapping '\x1f8a' s = Yield '\x1f02' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI foldMapping '\x1f8b' s = Yield '\x1f03' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI foldMapping '\x1f8c' s = Yield '\x1f04' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI foldMapping '\x1f8d' s = Yield '\x1f05' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI foldMapping '\x1f8e' s = Yield '\x1f06' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI foldMapping '\x1f8f' s = Yield '\x1f07' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI foldMapping '\x1f90' s = Yield '\x1f20' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH DASIA AND YPOGEGRAMMENI foldMapping '\x1f91' s = Yield '\x1f21' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH PSILI AND VARIA AND YPOGEGRAMMENI foldMapping '\x1f92' s = Yield '\x1f22' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH DASIA AND VARIA AND YPOGEGRAMMENI foldMapping '\x1f93' s = Yield '\x1f23' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH PSILI AND OXIA AND YPOGEGRAMMENI foldMapping '\x1f94' s = Yield '\x1f24' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH DASIA AND OXIA AND YPOGEGRAMMENI foldMapping '\x1f95' s = Yield '\x1f25' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1f96' s = Yield '\x1f26' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1f97' s = Yield '\x1f27' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI foldMapping '\x1f98' s = Yield '\x1f20' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI foldMapping '\x1f99' s = Yield '\x1f21' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI foldMapping '\x1f9a' s = Yield '\x1f22' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI foldMapping '\x1f9b' s = Yield '\x1f23' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI foldMapping '\x1f9c' s = Yield '\x1f24' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI foldMapping '\x1f9d' s = Yield '\x1f25' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI foldMapping '\x1f9e' s = Yield '\x1f26' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER ETA WITH DASIA AND 
PERISPOMENI AND PROSGEGRAMMENI foldMapping '\x1f9f' s = Yield '\x1f27' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI foldMapping '\x1fa0' s = Yield '\x1f60' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH DASIA AND YPOGEGRAMMENI foldMapping '\x1fa1' s = Yield '\x1f61' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI foldMapping '\x1fa2' s = Yield '\x1f62' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH DASIA AND VARIA AND YPOGEGRAMMENI foldMapping '\x1fa3' s = Yield '\x1f63' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI foldMapping '\x1fa4' s = Yield '\x1f64' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH DASIA AND OXIA AND YPOGEGRAMMENI foldMapping '\x1fa5' s = Yield '\x1f65' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1fa6' s = Yield '\x1f66' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1fa7' s = Yield '\x1f67' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI foldMapping '\x1fa8' s = Yield '\x1f60' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI foldMapping '\x1fa9' s = Yield '\x1f61' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI foldMapping '\x1faa' s = Yield '\x1f62' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI foldMapping '\x1fab' s = Yield '\x1f63' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI foldMapping '\x1fac' s = Yield '\x1f64' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI foldMapping '\x1fad' s = Yield '\x1f65' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI foldMapping '\x1fae' s = Yield '\x1f66' (CC s '\x03b9' '\x0000') -- GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI foldMapping '\x1faf' s = Yield '\x1f67' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI foldMapping '\x1fb2' s = Yield '\x1f70' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI foldMapping '\x1fb3' s = Yield '\x03b1' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI foldMapping '\x1fb4' s = Yield '\x03ac' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PERISPOMENI foldMapping '\x1fb6' s = Yield '\x03b1' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1fb7' s = Yield '\x03b1' (CC s '\x0342' '\x03b9') -- GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI foldMapping '\x1fbc' s = Yield '\x03b1' (CC s '\x03b9' '\x0000') -- GREEK PROSGEGRAMMENI foldMapping '\x1fbe' s = Yield '\x03b9' (CC s '\x0000' '\x0000') -- GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI foldMapping '\x1fc2' s = Yield '\x1f74' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI foldMapping '\x1fc3' s = Yield '\x03b7' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI foldMapping '\x1fc4' s = Yield '\x03ae' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER ETA WITH PERISPOMENI foldMapping '\x1fc6' s = Yield '\x03b7' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1fc7' s = 
Yield '\x03b7' (CC s '\x0342' '\x03b9') -- GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI foldMapping '\x1fcc' s = Yield '\x03b7' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND VARIA foldMapping '\x1fd2' s = Yield '\x03b9' (CC s '\x0308' '\x0300') -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA foldMapping '\x1fd3' s = Yield '\x03b9' (CC s '\x0308' '\x0301') -- GREEK SMALL LETTER IOTA WITH PERISPOMENI foldMapping '\x1fd6' s = Yield '\x03b9' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI foldMapping '\x1fd7' s = Yield '\x03b9' (CC s '\x0308' '\x0342') -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND VARIA foldMapping '\x1fe2' s = Yield '\x03c5' (CC s '\x0308' '\x0300') -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA foldMapping '\x1fe3' s = Yield '\x03c5' (CC s '\x0308' '\x0301') -- GREEK SMALL LETTER RHO WITH PSILI foldMapping '\x1fe4' s = Yield '\x03c1' (CC s '\x0313' '\x0000') -- GREEK SMALL LETTER UPSILON WITH PERISPOMENI foldMapping '\x1fe6' s = Yield '\x03c5' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI foldMapping '\x1fe7' s = Yield '\x03c5' (CC s '\x0308' '\x0342') -- GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI foldMapping '\x1ff2' s = Yield '\x1f7c' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI foldMapping '\x1ff3' s = Yield '\x03c9' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI foldMapping '\x1ff4' s = Yield '\x03ce' (CC s '\x03b9' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PERISPOMENI foldMapping '\x1ff6' s = Yield '\x03c9' (CC s '\x0342' '\x0000') -- GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI foldMapping '\x1ff7' s = Yield '\x03c9' (CC s '\x0342' '\x03b9') -- GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI foldMapping '\x1ffc' s = Yield '\x03c9' (CC s '\x03b9' '\x0000') -- LATIN SMALL LIGATURE FF foldMapping '\xfb00' s = Yield '\x0066' (CC s '\x0066' '\x0000') -- LATIN SMALL LIGATURE FI foldMapping '\xfb01' s = Yield '\x0066' (CC s '\x0069' '\x0000') -- LATIN SMALL LIGATURE FL foldMapping '\xfb02' s = Yield '\x0066' (CC s '\x006c' '\x0000') -- LATIN SMALL LIGATURE FFI foldMapping '\xfb03' s = Yield '\x0066' (CC s '\x0066' '\x0069') -- LATIN SMALL LIGATURE FFL foldMapping '\xfb04' s = Yield '\x0066' (CC s '\x0066' '\x006c') -- LATIN SMALL LIGATURE LONG S T foldMapping '\xfb05' s = Yield '\x0073' (CC s '\x0074' '\x0000') -- LATIN SMALL LIGATURE ST foldMapping '\xfb06' s = Yield '\x0073' (CC s '\x0074' '\x0000') -- ARMENIAN SMALL LIGATURE MEN NOW foldMapping '\xfb13' s = Yield '\x0574' (CC s '\x0576' '\x0000') -- ARMENIAN SMALL LIGATURE MEN ECH foldMapping '\xfb14' s = Yield '\x0574' (CC s '\x0565' '\x0000') -- ARMENIAN SMALL LIGATURE MEN INI foldMapping '\xfb15' s = Yield '\x0574' (CC s '\x056b' '\x0000') -- ARMENIAN SMALL LIGATURE VEW NOW foldMapping '\xfb16' s = Yield '\x057e' (CC s '\x0576' '\x0000') -- ARMENIAN SMALL LIGATURE MEN XEH foldMapping '\xfb17' s = Yield '\x0574' (CC s '\x056d' '\x0000') foldMapping c s = Yield (toLower c) (CC s '\0' '\0') text-0.11.3.1/Data/Text/Fusion/Common.hs0000644000000000000000000010104012140332415015720 0ustar0000000000000000{-# LANGUAGE BangPatterns, MagicHash, Rank2Types #-} -- | -- Module : Data.Text.Fusion.Common -- Copyright : (c) Bryan O'Sullivan 2009, 2012 -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Common stream fusion 
functionality for text. module Data.Text.Fusion.Common ( -- * Creation and elimination singleton , streamList , unstreamList , streamCString# -- * Basic interface , cons , snoc , append , head , uncons , last , tail , init , null , lengthI , compareLengthI , isSingleton -- * Transformations , map , intercalate , intersperse -- ** Case conversion -- $case , toCaseFold , toLower , toUpper -- ** Justification , justifyLeftI -- * Folds , foldl , foldl' , foldl1 , foldl1' , foldr , foldr1 -- ** Special folds , concat , concatMap , any , all , maximum , minimum -- * Construction -- ** Scans , scanl -- ** Accumulating maps -- , mapAccumL -- ** Generation and unfolding , replicateCharI , replicateI , unfoldr , unfoldrNI -- * Substrings -- ** Breaking strings , take , drop , takeWhile , dropWhile -- * Predicates , isPrefixOf -- * Searching , elem , filter -- * Indexing , findBy , indexI , findIndexI , countCharI -- * Zipping and unzipping , zipWith ) where import Prelude (Bool(..), Char, Eq(..), Int, Integral, Maybe(..), Ord(..), Ordering(..), String, (.), ($), (+), (-), (*), (++), (&&), fromIntegral, otherwise) import qualified Data.List as L import qualified Prelude as P import Data.Bits (shiftL) import Data.Int (Int64) import Data.Text.Fusion.Internal import Data.Text.Fusion.CaseMapping (foldMapping, lowerMapping, upperMapping) import Data.Text.Fusion.Size import GHC.Prim (Addr#, chr#, indexCharOffAddr#, ord#) import GHC.Types (Char(..), Int(..)) singleton :: Char -> Stream Char singleton c = Stream next False 1 where next False = Yield c True next True = Done {-# INLINE singleton #-} streamList :: [a] -> Stream a {-# INLINE [0] streamList #-} streamList s = Stream next s unknownSize where next [] = Done next (x:xs) = Yield x xs unstreamList :: Stream a -> [a] unstreamList (Stream next s0 _len) = unfold s0 where unfold !s = case next s of Done -> [] Skip s' -> unfold s' Yield x s' -> x : unfold s' {-# INLINE [0] unstreamList #-} {-# RULES "STREAM streamList/unstreamList fusion" forall s. streamList (unstreamList s) = s #-} -- | Stream the UTF-8-like packed encoding used by GHC to represent -- constant strings in generated code. -- -- This encoding uses the byte sequence "\xc0\x80" to represent NUL, -- and the string is NUL-terminated. streamCString# :: Addr# -> Stream Char streamCString# addr = Stream step 0 unknownSize where step !i | b == 0 = Done | b <= 0x7f = Yield (C# b#) (i+1) | b <= 0xdf = let !c = chr $ ((b-0xc0) `shiftL` 6) + next 1 in Yield c (i+2) | b <= 0xef = let !c = chr $ ((b-0xe0) `shiftL` 12) + (next 1 `shiftL` 6) + next 2 in Yield c (i+3) | otherwise = let !c = chr $ ((b-0xf0) `shiftL` 18) + (next 1 `shiftL` 12) + (next 2 `shiftL` 6) + next 3 in Yield c (i+4) where b = I# (ord# b#) next n = I# (ord# (at# (i+n))) - 0x80 !b# = at# i at# (I# i#) = indexCharOffAddr# addr i# chr (I# i#) = C# (chr# i#) {-# INLINE [0] streamCString# #-} -- ---------------------------------------------------------------------------- -- * Basic stream functions data C s = C0 !s | C1 !s -- | /O(n)/ Adds a character to the front of a Stream Char. cons :: Char -> Stream Char -> Stream Char cons !w (Stream next0 s0 len) = Stream next (C1 s0) (len+1) where next (C1 s) = Yield w (C0 s) next (C0 s) = case next0 s of Done -> Done Skip s' -> Skip (C0 s') Yield x s' -> Yield x (C0 s') {-# INLINE [0] cons #-} -- | /O(n)/ Adds a character to the end of a stream. 
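--
-- A hypothetical GHCi sketch of both ends (assuming this module's
-- internals are in scope; 'streamList' and 'unstreamList' are defined
-- above):
--
-- >>> unstreamList (cons 'a' (streamList "bc"))
-- "abc"
-- >>> unstreamList (snoc (streamList "ab") 'c')
-- "abc"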
snoc :: Stream Char -> Char -> Stream Char snoc (Stream next0 xs0 len) w = Stream next (J xs0) (len+1) where next (J xs) = case next0 xs of Done -> Yield w N Skip xs' -> Skip (J xs') Yield x xs' -> Yield x (J xs') next N = Done {-# INLINE [0] snoc #-} data E l r = L !l | R !r -- | /O(n)/ Appends one Stream to the other. append :: Stream Char -> Stream Char -> Stream Char append (Stream next0 s01 len1) (Stream next1 s02 len2) = Stream next (L s01) (len1 + len2) where next (L s1) = case next0 s1 of Done -> Skip (R s02) Skip s1' -> Skip (L s1') Yield x s1' -> Yield x (L s1') next (R s2) = case next1 s2 of Done -> Done Skip s2' -> Skip (R s2') Yield x s2' -> Yield x (R s2') {-# INLINE [0] append #-} -- | /O(1)/ Returns the first character of a Text, which must be non-empty. -- Subject to array fusion. head :: Stream Char -> Char head (Stream next s0 _len) = loop_head s0 where loop_head !s = case next s of Yield x _ -> x Skip s' -> loop_head s' Done -> head_empty {-# INLINE [0] head #-} head_empty :: a head_empty = streamError "head" "Empty stream" {-# NOINLINE head_empty #-} -- | /O(1)/ Returns the first character and remainder of a 'Stream -- Char', or 'Nothing' if empty. Subject to array fusion. uncons :: Stream Char -> Maybe (Char, Stream Char) uncons (Stream next s0 len) = loop_uncons s0 where loop_uncons !s = case next s of Yield x s1 -> Just (x, Stream next s1 (len-1)) Skip s' -> loop_uncons s' Done -> Nothing {-# INLINE [0] uncons #-} -- | /O(n)/ Returns the last character of a 'Stream Char', which must -- be non-empty. last :: Stream Char -> Char last (Stream next s0 _len) = loop0_last s0 where loop0_last !s = case next s of Done -> emptyError "last" Skip s' -> loop0_last s' Yield x s' -> loop_last x s' loop_last !x !s = case next s of Done -> x Skip s' -> loop_last x s' Yield x' s' -> loop_last x' s' {-# INLINE[0] last #-} -- | /O(1)/ Returns all characters after the head of a Stream Char, which must -- be non-empty. tail :: Stream Char -> Stream Char tail (Stream next0 s0 len) = Stream next (C0 s0) (len-1) where next (C0 s) = case next0 s of Done -> emptyError "tail" Skip s' -> Skip (C0 s') Yield _ s' -> Skip (C1 s') next (C1 s) = case next0 s of Done -> Done Skip s' -> Skip (C1 s') Yield x s' -> Yield x (C1 s') {-# INLINE [0] tail #-} data Init s = Init0 !s | Init1 {-# UNPACK #-} !Char !s -- | /O(1)/ Returns all but the last character of a Stream Char, which -- must be non-empty. init :: Stream Char -> Stream Char init (Stream next0 s0 len) = Stream next (Init0 s0) (len-1) where next (Init0 s) = case next0 s of Done -> emptyError "init" Skip s' -> Skip (Init0 s') Yield x s' -> Skip (Init1 x s') next (Init1 x s) = case next0 s of Done -> Done Skip s' -> Skip (Init1 x s') Yield x' s' -> Yield x (Init1 x' s') {-# INLINE [0] init #-} -- | /O(1)/ Tests whether a Stream Char is empty or not. null :: Stream Char -> Bool null (Stream next s0 _len) = loop_null s0 where loop_null !s = case next s of Done -> True Yield _ _ -> False Skip s' -> loop_null s' {-# INLINE[0] null #-} -- | /O(n)/ Returns the number of characters in a string. lengthI :: Integral a => Stream Char -> a lengthI (Stream next s0 _len) = loop_length 0 s0 where loop_length !z s = case next s of Done -> z Skip s' -> loop_length z s' Yield _ s' -> loop_length (z + 1) s' {-# INLINE[0] lengthI #-} -- | /O(n)/ Compares the count of characters in a string to a number. -- Subject to fusion. 
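--
-- For example (a hypothetical GHCi sketch with this module's internals
-- in scope; the infinite input terminates only because of the short
-- circuit described below):
--
-- >>> compareLengthI (streamList (P.repeat 'x')) (3 :: Int)
-- GT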
-- -- This function gives the same answer as comparing against the result -- of 'lengthI', but can short circuit if the count of characters is -- greater than the number, and hence be more efficient. compareLengthI :: Integral a => Stream Char -> a -> Ordering compareLengthI (Stream next s0 len) n = case exactly len of Nothing -> loop_cmp 0 s0 Just i -> compare (fromIntegral i) n where loop_cmp !z s = case next s of Done -> compare z n Skip s' -> loop_cmp z s' Yield _ s' | z > n -> GT | otherwise -> loop_cmp (z + 1) s' {-# INLINE[0] compareLengthI #-} -- | /O(n)/ Indicate whether a string contains exactly one element. isSingleton :: Stream Char -> Bool isSingleton (Stream next s0 _len) = loop 0 s0 where loop !z s = case next s of Done -> z == (1::Int) Skip s' -> loop z s' Yield _ s' | z >= 1 -> False | otherwise -> loop (z+1) s' {-# INLINE[0] isSingleton #-} -- ---------------------------------------------------------------------------- -- * Stream transformations -- | /O(n)/ 'map' @f@ @xs@ is the Stream Char obtained by applying @f@ -- to each element of @xs@. map :: (Char -> Char) -> Stream Char -> Stream Char map f (Stream next0 s0 len) = Stream next s0 len where next !s = case next0 s of Done -> Done Skip s' -> Skip s' Yield x s' -> Yield (f x) s' {-# INLINE [0] map #-} {-# RULES "STREAM map/map fusion" forall f g s. map f (map g s) = map (\x -> f (g x)) s #-} data I s = I1 !s | I2 !s {-# UNPACK #-} !Char | I3 !s -- | /O(n)/ Take a character and place it between each of the -- characters of a 'Stream Char'. intersperse :: Char -> Stream Char -> Stream Char intersperse c (Stream next0 s0 len) = Stream next (I1 s0) len where next (I1 s) = case next0 s of Done -> Done Skip s' -> Skip (I1 s') Yield x s' -> Skip (I2 s' x) next (I2 s x) = Yield x (I3 s) next (I3 s) = case next0 s of Done -> Done Skip s' -> Skip (I3 s') Yield x s' -> Yield c (I2 s' x) {-# INLINE [0] intersperse #-} -- ---------------------------------------------------------------------------- -- ** Case conversions (folds) -- $case -- -- With Unicode text, it is incorrect to use combinators like @map -- toUpper@ to case convert each character of a string individually. -- Instead, use the whole-string case conversion functions from this -- module. For correctness in different writing systems, these -- functions may map one input character to two or three output -- characters. caseConvert :: (forall s. Char -> s -> Step (CC s) Char) -> Stream Char -> Stream Char caseConvert remap (Stream next0 s0 len) = Stream next (CC s0 '\0' '\0') len where next (CC s '\0' _) = case next0 s of Done -> Done Skip s' -> Skip (CC s' '\0' '\0') Yield c s' -> remap c s' next (CC s a b) = Yield a (CC s b '\0') -- | /O(n)/ Convert a string to folded case. This function is mainly -- useful for performing caseless (or case insensitive) string -- comparisons. -- -- A string @x@ is a caseless match for a string @y@ if and only if: -- -- @toCaseFold x == toCaseFold y@ -- -- The result string may be longer than the input string, and may -- differ from applying 'toLower' to the input string. For instance, -- the Armenian small ligature men now (U+FB13) is case folded to the -- bigram men now (U+0574 U+0576), while the micro sign (U+00B5) is -- case folded to the Greek small letter mu (U+03BC) instead of -- itself. toCaseFold :: Stream Char -> Stream Char toCaseFold = caseConvert foldMapping {-# INLINE [0] toCaseFold #-} -- | /O(n)/ Convert a string to upper case, using simple case -- conversion. The result string may be longer than the input string.
-- For instance, the German eszett (U+00DF) maps to the two-letter -- sequence SS. toUpper :: Stream Char -> Stream Char toUpper = caseConvert upperMapping {-# INLINE [0] toUpper #-} -- | /O(n)/ Convert a string to lower case, using simple case -- conversion. The result string may be longer than the input string. -- For instance, the Latin capital letter I with dot above (U+0130) -- maps to the sequence Latin small letter i (U+0069) followed by -- combining dot above (U+0307). toLower :: Stream Char -> Stream Char toLower = caseConvert lowerMapping {-# INLINE [0] toLower #-} justifyLeftI :: Integral a => a -> Char -> Stream Char -> Stream Char justifyLeftI k c (Stream next0 s0 len) = Stream next (s0 :*: S1 :*: 0) (larger (fromIntegral k) len) where next (s :*: S1 :*: n) = case next0 s of Done -> next (s :*: S2 :*: n) Skip s' -> Skip (s' :*: S1 :*: n) Yield x s' -> Yield x (s' :*: S1 :*: n+1) next (s :*: S2 :*: n) | n < k = Yield c (s :*: S2 :*: n+1) | otherwise = Done {-# INLINE next #-} {-# INLINE [0] justifyLeftI #-} -- ---------------------------------------------------------------------------- -- * Reducing Streams (folds) -- | foldl, applied to a binary operator, a starting value (typically the -- left-identity of the operator), and a Stream, reduces the Stream using the -- binary operator, from left to right. foldl :: (b -> Char -> b) -> b -> Stream Char -> b foldl f z0 (Stream next s0 _len) = loop_foldl z0 s0 where loop_foldl z !s = case next s of Done -> z Skip s' -> loop_foldl z s' Yield x s' -> loop_foldl (f z x) s' {-# INLINE [0] foldl #-} -- | A strict version of foldl. foldl' :: (b -> Char -> b) -> b -> Stream Char -> b foldl' f z0 (Stream next s0 _len) = loop_foldl' z0 s0 where loop_foldl' !z !s = case next s of Done -> z Skip s' -> loop_foldl' z s' Yield x s' -> loop_foldl' (f z x) s' {-# INLINE [0] foldl' #-} -- | foldl1 is a variant of foldl that has no starting value argument, -- and thus must be applied to non-empty Streams. foldl1 :: (Char -> Char -> Char) -> Stream Char -> Char foldl1 f (Stream next s0 _len) = loop0_foldl1 s0 where loop0_foldl1 !s = case next s of Skip s' -> loop0_foldl1 s' Yield x s' -> loop_foldl1 x s' Done -> emptyError "foldl1" loop_foldl1 z !s = case next s of Done -> z Skip s' -> loop_foldl1 z s' Yield x s' -> loop_foldl1 (f z x) s' {-# INLINE [0] foldl1 #-} -- | A strict version of foldl1. foldl1' :: (Char -> Char -> Char) -> Stream Char -> Char foldl1' f (Stream next s0 _len) = loop0_foldl1' s0 where loop0_foldl1' !s = case next s of Skip s' -> loop0_foldl1' s' Yield x s' -> loop_foldl1' x s' Done -> emptyError "foldl1'" loop_foldl1' !z !s = case next s of Done -> z Skip s' -> loop_foldl1' z s' Yield x s' -> loop_foldl1' (f z x) s' {-# INLINE [0] foldl1' #-} -- | 'foldr', applied to a binary operator, a starting value (typically the -- right-identity of the operator), and a stream, reduces the stream using the -- binary operator, from right to left. foldr :: (Char -> b -> b) -> b -> Stream Char -> b foldr f z (Stream next s0 _len) = loop_foldr s0 where loop_foldr !s = case next s of Done -> z Skip s' -> loop_foldr s' Yield x s' -> f x (loop_foldr s') {-# INLINE [0] foldr #-} -- | foldr1 is a variant of 'foldr' that has no starting value argument, -- and thus must be applied to non-empty streams. -- Subject to array fusion.
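--
-- A hypothetical GHCi sketch (with this module's internals in scope):
--
-- >>> foldr1 max (streamList "bca")
-- 'c'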
foldr1 :: (Char -> Char -> Char) -> Stream Char -> Char foldr1 f (Stream next s0 _len) = loop0_foldr1 s0 where loop0_foldr1 !s = case next s of Done -> emptyError "foldr1" Skip s' -> loop0_foldr1 s' Yield x s' -> loop_foldr1 x s' loop_foldr1 x !s = case next s of Done -> x Skip s' -> loop_foldr1 x s' Yield x' s' -> f x (loop_foldr1 x' s') {-# INLINE [0] foldr1 #-} intercalate :: Stream Char -> [Stream Char] -> Stream Char intercalate s = concat . (L.intersperse s) {-# INLINE [0] intercalate #-} -- ---------------------------------------------------------------------------- -- ** Special folds -- | /O(n)/ Concatenate a list of streams. Subject to array fusion. concat :: [Stream Char] -> Stream Char concat = L.foldr append empty {-# INLINE [0] concat #-} -- | Map a function over a stream that results in a stream and concatenate the -- results. concatMap :: (Char -> Stream Char) -> Stream Char -> Stream Char concatMap f = foldr (append . f) empty {-# INLINE [0] concatMap #-} -- | /O(n)/ any @p@ @xs@ determines if any character in the stream -- @xs@ satisfies the predicate @p@. any :: (Char -> Bool) -> Stream Char -> Bool any p (Stream next0 s0 _len) = loop_any s0 where loop_any !s = case next0 s of Done -> False Skip s' -> loop_any s' Yield x s' | p x -> True | otherwise -> loop_any s' {-# INLINE [0] any #-} -- | /O(n)/ all @p@ @xs@ determines if all characters in the stream -- @xs@ satisfy the predicate @p@. all :: (Char -> Bool) -> Stream Char -> Bool all p (Stream next0 s0 _len) = loop_all s0 where loop_all !s = case next0 s of Done -> True Skip s' -> loop_all s' Yield x s' | p x -> loop_all s' | otherwise -> False {-# INLINE [0] all #-} -- | /O(n)/ maximum returns the maximum value from a stream, which must be -- non-empty. maximum :: Stream Char -> Char maximum (Stream next0 s0 _len) = loop0_maximum s0 where loop0_maximum !s = case next0 s of Done -> emptyError "maximum" Skip s' -> loop0_maximum s' Yield x s' -> loop_maximum x s' loop_maximum !z !s = case next0 s of Done -> z Skip s' -> loop_maximum z s' Yield x s' | x > z -> loop_maximum x s' | otherwise -> loop_maximum z s' {-# INLINE [0] maximum #-} -- | /O(n)/ minimum returns the minimum value from a stream, which must be -- non-empty. minimum :: Stream Char -> Char minimum (Stream next0 s0 _len) = loop0_minimum s0 where loop0_minimum !s = case next0 s of Done -> emptyError "minimum" Skip s' -> loop0_minimum s' Yield x s' -> loop_minimum x s' loop_minimum !z !s = case next0 s of Done -> z Skip s' -> loop_minimum z s' Yield x s' | x < z -> loop_minimum x s' | otherwise -> loop_minimum z s' {-# INLINE [0] minimum #-} -- ----------------------------------------------------------------------------- -- * Building streams scanl :: (Char -> Char -> Char) -> Char -> Stream Char -> Stream Char scanl f z0 (Stream next0 s0 len) = Stream next (S1 :*: z0 :*: s0) (len+1) -- HINT maybe too low where {-# INLINE next #-} next (S1 :*: z :*: s) = Yield z (S2 :*: z :*: s) next (S2 :*: z :*: s) = case next0 s of Yield x s' -> let !x' = f z x in Yield x' (S2 :*: x' :*: s') Skip s' -> Skip (S2 :*: z :*: s') Done -> Done {-# INLINE [0] scanl #-} -- ----------------------------------------------------------------------------- -- ** Accumulating maps {- -- | /O(n)/ Like a combination of 'map' and 'foldl'. Applies a -- function to each element of a stream, passing an accumulating -- parameter from left to right, and returns a final stream.
-- -- /Note/: Unlike the version over lists, this function does not -- return a final value for the accumulator, because the nature of -- streams precludes it. mapAccumL :: (a -> b -> (a,b)) -> a -> Stream b -> Stream b mapAccumL f z0 (Stream next0 s0 len) = Stream next (s0 :*: z0) len -- HINT depends on f where {-# INLINE next #-} next (s :*: z) = case next0 s of Yield x s' -> let (z',y) = f z x in Yield y (s' :*: z') Skip s' -> Skip (s' :*: z) Done -> Done {-# INLINE [0] mapAccumL #-} -} -- ----------------------------------------------------------------------------- -- ** Generating and unfolding streams replicateCharI :: Integral a => a -> Char -> Stream Char replicateCharI n c | n < 0 = empty | otherwise = Stream next 0 (fromIntegral n) -- HINT maybe too low where next i | i >= n = Done | otherwise = Yield c (i + 1) {-# INLINE [0] replicateCharI #-} data RI s = RI !s {-# UNPACK #-} !Int64 replicateI :: Int64 -> Stream Char -> Stream Char replicateI n (Stream next0 s0 len) = Stream next (RI s0 0) (fromIntegral (max 0 n) * len) where next (RI s k) | k >= n = Done | otherwise = case next0 s of Done -> Skip (RI s0 (k+1)) Skip s' -> Skip (RI s' k) Yield x s' -> Yield x (RI s' k) {-# INLINE [0] replicateI #-} -- | /O(n)/, where @n@ is the length of the result. The 'unfoldr' function -- is analogous to the List 'unfoldr'. 'unfoldr' builds a stream -- from a seed value. The function takes the seed and returns 'Nothing' -- if it is done producing the stream, or returns @Just (a,b)@, in which -- case @a@ is the next 'Char' in the stream and @b@ is the seed value -- for further production. unfoldr :: (a -> Maybe (Char,a)) -> a -> Stream Char unfoldr f s0 = Stream next s0 1 -- HINT maybe too low where {-# INLINE next #-} next !s = case f s of Nothing -> Done Just (w, s') -> Yield w s' {-# INLINE [0] unfoldr #-} -- | /O(n)/ Like 'unfoldr', 'unfoldrNI' builds a stream from a seed -- value. However, the length of the result is limited by the -- first argument to 'unfoldrNI'. This function is more efficient than -- 'unfoldr' when the length of the result is known. unfoldrNI :: Integral a => a -> (b -> Maybe (Char,b)) -> b -> Stream Char unfoldrNI n f s0 | n < 0 = empty | otherwise = Stream next (0 :*: s0) (fromIntegral (n*2)) -- HINT maybe too high where {-# INLINE next #-} next (z :*: s) = case f s of Nothing -> Done Just (w, s') | z >= n -> Done | otherwise -> Yield w ((z + 1) :*: s') {-# INLINE unfoldrNI #-} ------------------------------------------------------------------------------- -- * Substreams -- | /O(n)/ take n, applied to a stream, returns the prefix of the -- stream of length @n@, or the stream itself if @n@ is greater than the -- length of the stream. take :: Integral a => a -> Stream Char -> Stream Char take n0 (Stream next0 s0 len) = Stream next (n0 :*: s0) (smaller len (fromIntegral (max 0 n0))) where {-# INLINE next #-} next (n :*: s) | n <= 0 = Done | otherwise = case next0 s of Done -> Done Skip s' -> Skip (n :*: s') Yield x s' -> Yield x ((n-1) :*: s') {-# INLINE [0] take #-} -- | /O(n)/ drop n, applied to a stream, returns the suffix of the -- stream after the first @n@ characters, or the empty stream if @n@ -- is greater than the length of the stream.
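--
-- For example (a hedged GHCi sketch, assuming this module's internals
-- are in scope):
--
-- >>> unstreamList (drop (2 :: Int) (streamList "abcde"))
-- "cde"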
drop :: Integral a => a -> Stream Char -> Stream Char drop n0 (Stream next0 s0 len) = Stream next (J n0 :*: s0) (len - fromIntegral (max 0 n0)) where {-# INLINE next #-} next (J n :*: s) | n <= 0 = Skip (N :*: s) | otherwise = case next0 s of Done -> Done Skip s' -> Skip (J n :*: s') Yield _ s' -> Skip (J (n-1) :*: s') next (N :*: s) = case next0 s of Done -> Done Skip s' -> Skip (N :*: s') Yield x s' -> Yield x (N :*: s') {-# INLINE [0] drop #-} -- | takeWhile, applied to a predicate @p@ and a stream, returns the -- longest prefix (possibly empty) of elements that satisfy @p@. takeWhile :: (Char -> Bool) -> Stream Char -> Stream Char takeWhile p (Stream next0 s0 len) = Stream next s0 len -- HINT maybe too high where {-# INLINE next #-} next !s = case next0 s of Done -> Done Skip s' -> Skip s' Yield x s' | p x -> Yield x s' | otherwise -> Done {-# INLINE [0] takeWhile #-} -- | dropWhile @p@ @xs@ returns the suffix remaining after takeWhile @p@ @xs@. dropWhile :: (Char -> Bool) -> Stream Char -> Stream Char dropWhile p (Stream next0 s0 len) = Stream next (S1 :*: s0) len -- HINT maybe too high where {-# INLINE next #-} next (S1 :*: s) = case next0 s of Done -> Done Skip s' -> Skip (S1 :*: s') Yield x s' | p x -> Skip (S1 :*: s') | otherwise -> Yield x (S2 :*: s') next (S2 :*: s) = case next0 s of Done -> Done Skip s' -> Skip (S2 :*: s') Yield x s' -> Yield x (S2 :*: s') {-# INLINE [0] dropWhile #-} -- | /O(n)/ The 'isPrefixOf' function takes two 'Stream's and returns -- 'True' iff the first is a prefix of the second. isPrefixOf :: (Eq a) => Stream a -> Stream a -> Bool isPrefixOf (Stream next1 s1 _) (Stream next2 s2 _) = loop (next1 s1) (next2 s2) where loop Done _ = True loop _ Done = False loop (Skip s1') (Skip s2') = loop (next1 s1') (next2 s2') loop (Skip s1') x2 = loop (next1 s1') x2 loop x1 (Skip s2') = loop x1 (next2 s2') loop (Yield x1 s1') (Yield x2 s2') = x1 == x2 && loop (next1 s1') (next2 s2') {-# INLINE [0] isPrefixOf #-} -- ---------------------------------------------------------------------------- -- * Searching ------------------------------------------------------------------------------- -- ** Searching by equality -- | /O(n)/ elem is the stream membership predicate. elem :: Char -> Stream Char -> Bool elem w (Stream next s0 _len) = loop_elem s0 where loop_elem !s = case next s of Done -> False Skip s' -> loop_elem s' Yield x s' | x == w -> True | otherwise -> loop_elem s' {-# INLINE [0] elem #-} ------------------------------------------------------------------------------- -- ** Searching with a predicate -- | /O(n)/ The 'findBy' function takes a predicate and a stream, -- and returns the first element matching the predicate, or 'Nothing' -- if there is no such element. findBy :: (Char -> Bool) -> Stream Char -> Maybe Char findBy p (Stream next s0 _len) = loop_find s0 where loop_find !s = case next s of Done -> Nothing Skip s' -> loop_find s' Yield x s' | p x -> Just x | otherwise -> loop_find s' {-# INLINE [0] findBy #-} -- | /O(n)/ Stream index (subscript) operator, starting from 0. indexI :: Integral a => Stream Char -> a -> Char indexI (Stream next s0 _len) n0 | n0 < 0 = streamError "index" "Negative index" | otherwise = loop_index n0 s0 where loop_index !n !s = case next s of Done -> streamError "index" "Index too large" Skip s' -> loop_index n s' Yield x s' | n == 0 -> x | otherwise -> loop_index (n-1) s' {-# INLINE [0] indexI #-} -- | /O(n)/ 'filter', applied to a predicate and a stream, -- returns a stream containing those characters that satisfy the -- predicate.
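--
-- For example (another hedged GHCi sketch with the module's internals
-- in scope):
--
-- >>> unstreamList (filter (/= 'l') (streamList "hello"))
-- "heo"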
filter :: (Char -> Bool) -> Stream Char -> Stream Char filter p (Stream next0 s0 len) = Stream next s0 len -- HINT maybe too high where next !s = case next0 s of Done -> Done Skip s' -> Skip s' Yield x s' | p x -> Yield x s' | otherwise -> Skip s' {-# INLINE [0] filter #-} {-# RULES "STREAM filter/filter fusion" forall p q s. filter p (filter q s) = filter (\x -> q x && p x) s #-} -- | The 'findIndexI' function takes a predicate and a stream and -- returns the index of the first element in the stream satisfying the -- predicate. findIndexI :: Integral a => (Char -> Bool) -> Stream Char -> Maybe a findIndexI p s = case findIndicesI p s of (i:_) -> Just i _ -> Nothing {-# INLINE [0] findIndexI #-} -- | The 'findIndicesI' function takes a predicate and a stream and -- returns all indices of the elements in the stream satisfying the -- predicate. findIndicesI :: Integral a => (Char -> Bool) -> Stream Char -> [a] findIndicesI p (Stream next s0 _len) = loop_findIndex 0 s0 where loop_findIndex !i !s = case next s of Done -> [] Skip s' -> loop_findIndex i s' -- hmm. not caught by QC Yield x s' | p x -> i : loop_findIndex (i+1) s' | otherwise -> loop_findIndex (i+1) s' {-# INLINE [0] findIndicesI #-} ------------------------------------------------------------------------------- -- * Zipping -- | zipWith generalises 'zip' by zipping with the function given as -- the first argument, instead of a tupling function. zipWith :: (a -> a -> b) -> Stream a -> Stream a -> Stream b zipWith f (Stream next0 sa0 len1) (Stream next1 sb0 len2) = Stream next (sa0 :*: sb0 :*: N) (smaller len1 len2) where next (sa :*: sb :*: N) = case next0 sa of Done -> Done Skip sa' -> Skip (sa' :*: sb :*: N) Yield a sa' -> Skip (sa' :*: sb :*: J a) next (sa' :*: sb :*: J a) = case next1 sb of Done -> Done Skip sb' -> Skip (sa' :*: sb' :*: J a) Yield b sb' -> Yield (f a b) (sa' :*: sb' :*: N) {-# INLINE [0] zipWith #-} -- | /O(n)/ The 'countCharI' function returns the number of times the -- query element appears in the given stream. countCharI :: Integral a => Char -> Stream Char -> a countCharI a (Stream next s0 _len) = loop 0 s0 where loop !i !s = case next s of Done -> i Skip s' -> loop i s' Yield x s' | a == x -> loop (i+1) s' | otherwise -> loop i s' {-# INLINE [0] countCharI #-} streamError :: String -> String -> a streamError func msg = P.error $ "Data.Text.Fusion.Common." ++ func ++ ": " ++ msg emptyError :: String -> a emptyError func = internalError func "Empty input" internalError :: String -> a internalError func = streamError func "Internal error" text-0.11.3.1/Data/Text/Fusion/Internal.hs0000644000000000000000000000753412140332415016261 0ustar0000000000000000{-# LANGUAGE BangPatterns, ExistentialQuantification #-} -- | -- Module : Data.Text.Fusion.Internal -- Copyright : (c) Tom Harper 2008-2009, -- (c) Bryan O'Sullivan 2009, -- (c) Duncan Coutts 2009, -- (c) Jasper Van der Jeugt 2011 -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Core stream fusion functionality for text. module Data.Text.Fusion.Internal ( CC(..) , M(..) , M8 , PairS(..) , RS(..) , Step(..) , Stream(..) , Switch(..) , empty ) where import Data.Text.Fusion.Size import Data.Word (Word8) -- | Specialised tuple for case conversion. data CC s = CC !s {-# UNPACK #-} !Char {-# UNPACK #-} !Char -- | Specialised, strict Maybe-like type. data M a = N | J !a type M8 = M Word8 -- Restreaming state. 
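-- A sketch of the intent: when restreaming, e.g. re-encoding a stream
-- of 'Char' as UTF-8 bytes, one input step can produce up to four
-- bytes, but a stream may yield only one value per step.  @RS1@..@RS3@
-- therefore buffer the one to three bytes still owed to the consumer,
-- while @RS0@ means no bytes are pending.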
data RS s = RS0 !s | RS1 !s {-# UNPACK #-} !Word8 | RS2 !s {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 | RS3 !s {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 infixl 2 :*: data PairS a b = !a :*: !b -- deriving (Eq, Ord, Show) -- | Allow a function over a stream to switch between two states. data Switch = S1 | S2 data Step s a = Done | Skip !s | Yield !a !s {- instance (Show a) => Show (Step s a) where show Done = "Done" show (Skip _) = "Skip" show (Yield x _) = "Yield " ++ show x -} instance (Eq a) => Eq (Stream a) where (==) = eq instance (Ord a) => Ord (Stream a) where compare = cmp -- The length hint in a Stream has two roles. If its value is zero, -- we trust it, and treat the stream as empty. Otherwise, we treat it -- as a hint: it should usually be accurate, so we use it when -- unstreaming to decide what size array to allocate. However, the -- unstreaming functions must be able to cope with the hint being too -- small or too large. -- -- The size hint tries to track the UTF-16 code points in a stream, -- but often counts the number of characters instead. It can easily -- undercount if, for instance, a transformed stream contains astral -- plane characters (those above 0x10000). data Stream a = forall s. Stream (s -> Step s a) -- stepper function !s -- current state !Size -- size hint -- | /O(n)/ Determines if two streams are equal. eq :: (Eq a) => Stream a -> Stream a -> Bool eq (Stream next1 s1 _) (Stream next2 s2 _) = loop (next1 s1) (next2 s2) where loop Done Done = True loop (Skip s1') (Skip s2') = loop (next1 s1') (next2 s2') loop (Skip s1') x2 = loop (next1 s1') x2 loop x1 (Skip s2') = loop x1 (next2 s2') loop Done _ = False loop _ Done = False loop (Yield x1 s1') (Yield x2 s2') = x1 == x2 && loop (next1 s1') (next2 s2') {-# INLINE [0] eq #-} cmp :: (Ord a) => Stream a -> Stream a -> Ordering cmp (Stream next1 s1 _) (Stream next2 s2 _) = loop (next1 s1) (next2 s2) where loop Done Done = EQ loop (Skip s1') (Skip s2') = loop (next1 s1') (next2 s2') loop (Skip s1') x2 = loop (next1 s1') x2 loop x1 (Skip s2') = loop x1 (next2 s2') loop Done _ = LT loop _ Done = GT loop (Yield x1 s1') (Yield x2 s2') = case compare x1 x2 of EQ -> loop (next1 s1') (next2 s2') other -> other {-# INLINE [0] cmp #-} -- | The empty stream. empty :: Stream a empty = Stream next () 0 where next _ = Done {-# INLINE [0] empty #-} text-0.11.3.1/Data/Text/Fusion/Size.hs0000644000000000000000000000760012140332415015411 0ustar0000000000000000{-# LANGUAGE CPP #-} {-# OPTIONS_GHC -fno-warn-missing-methods #-} -- | -- Module : Data.Text.Fusion.Internal -- Copyright : (c) Roman Leshchinskiy 2008, -- (c) Bryan O'Sullivan 2009 -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- Size hints. module Data.Text.Fusion.Size ( Size , exactly , exactSize , maxSize , unknownSize , smaller , larger , upperBound , isEmpty ) where #if defined(ASSERTS) import Control.Exception (assert) #endif data Size = Exact {-# UNPACK #-} !Int -- ^ Exact size. | Max {-# UNPACK #-} !Int -- ^ Upper bound on size. | Unknown -- ^ Unknown size. 
deriving (Eq, Show)

exactly :: Size -> Maybe Int
exactly (Exact n) = Just n
exactly _         = Nothing
{-# INLINE exactly #-}

exactSize :: Int -> Size
exactSize n =
#if defined(ASSERTS)
    assert (n >= 0)
#endif
    Exact n
{-# INLINE exactSize #-}

maxSize :: Int -> Size
maxSize n =
#if defined(ASSERTS)
    assert (n >= 0)
#endif
    Max n
{-# INLINE maxSize #-}

unknownSize :: Size
unknownSize = Unknown
{-# INLINE unknownSize #-}

instance Num Size where
    (+) = addSize
    (-) = subtractSize
    (*) = mulSize

    fromInteger = f where f = Exact . fromInteger
                          {-# INLINE f #-}

add :: Int -> Int -> Int
add m n | mn >=   0 = mn
        | otherwise = overflowError
  where mn = m + n
{-# INLINE add #-}

addSize :: Size -> Size -> Size
addSize (Exact m) (Exact n) = Exact (add m n)
addSize (Exact m) (Max   n) = Max   (add m n)
addSize (Max   m) (Exact n) = Max   (add m n)
addSize (Max   m) (Max   n) = Max   (add m n)
addSize _         _         = Unknown
{-# INLINE addSize #-}

subtractSize :: Size -> Size -> Size
subtractSize   (Exact m) (Exact n) = Exact (max (m-n) 0)
subtractSize   (Exact m) (Max   _) = Max   m
subtractSize   (Max   m) (Exact n) = Max   (max (m-n) 0)
subtractSize a@(Max   _) (Max   _) = a
subtractSize a@(Max   _) Unknown   = a
subtractSize _           _         = Unknown
{-# INLINE subtractSize #-}

-- The guard against a zero multiplier avoids a division by zero in
-- the overflow check below; without it, @mul m 0@ would evaluate
-- @maxBound `quot` 0@.
mul :: Int -> Int -> Int
mul m n
    | n == 0                 = 0
    | m <= maxBound `quot` n = m * n
    | otherwise              = overflowError
{-# INLINE mul #-}

mulSize :: Size -> Size -> Size
mulSize (Exact m) (Exact n) = Exact (mul m n)
mulSize (Exact m) (Max   n) = Max   (mul m n)
mulSize (Max   m) (Exact n) = Max   (mul m n)
mulSize (Max   m) (Max   n) = Max   (mul m n)
mulSize _         _         = Unknown
{-# INLINE mulSize #-}

-- | Minimum of two size hints.
smaller :: Size -> Size -> Size
smaller   (Exact m) (Exact n) = Exact (m `min` n)
smaller   (Exact m) (Max   n) = Max   (m `min` n)
smaller   (Exact m) Unknown   = Max   m
smaller   (Max   m) (Exact n) = Max   (m `min` n)
smaller   (Max   m) (Max   n) = Max   (m `min` n)
smaller a@(Max   _) Unknown   = a
smaller   Unknown   (Exact n) = Max   n
smaller   Unknown   (Max   n) = Max   n
smaller   Unknown   Unknown   = Unknown
{-# INLINE smaller #-}

-- | Maximum of two size hints.
larger :: Size -> Size -> Size
larger   (Exact m)   (Exact n)             = Exact (m `max` n)
larger a@(Exact m) b@(Max   n) | m >= n    = a
                               | otherwise = b
larger a@(Max   m) b@(Exact n) | n >= m    = b
                               | otherwise = a
larger   (Max   m)   (Max   n)             = Max   (m `max` n)
larger _             _                     = Unknown
{-# INLINE larger #-}

-- | Compute the maximum size from a size hint, if possible.
upperBound :: Int -> Size -> Int
upperBound _ (Exact n) = n
upperBound _ (Max   n) = n
upperBound k _         = k
{-# INLINE upperBound #-}

isEmpty :: Size -> Bool
isEmpty (Exact n) = n <= 0
isEmpty (Max   n) = n <= 0
isEmpty _         = False
{-# INLINE isEmpty #-}

overflowError :: Int
overflowError = error "Data.Text.Fusion.Size: size overflow"
text-0.11.3.1/Data/Text/IO/0000755000000000000000000000000012140332415013204 5ustar0000000000000000text-0.11.3.1/Data/Text/IO/Internal.hs0000644000000000000000000001406612140332415015323 0ustar0000000000000000
{-# LANGUAGE BangPatterns, CPP, RecordWildCards #-}
-- |
-- Module      : Data.Text.IO.Internal
-- Copyright   : (c) 2009, 2010 Bryan O'Sullivan,
--               (c) 2009 Simon Marlow
-- License     : BSD-style
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : GHC
--
-- Low-level support for text I\/O.
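--
-- A sketch of typical use: 'hGetLineWith' is polymorphic in how the
-- accumulated chunks are assembled, so both the strict and the lazy
-- 'hGetLine' variants can be built from it, e.g.
--
-- > hGetLine = hGetLineWith T.concat       -- strict Data.Text.IO
-- > hGetLine = hGetLineWith L.fromChunks   -- lazy Data.Text.Lazy.IO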
module Data.Text.IO.Internal ( #if __GLASGOW_HASKELL__ >= 612 hGetLineWith , readChunk #endif ) where #if __GLASGOW_HASKELL__ >= 612 import qualified Control.Exception as E import Data.IORef (readIORef, writeIORef) import Data.Text (Text) import Data.Text.Fusion (unstream) import Data.Text.Fusion.Internal (Step(..), Stream(..)) import Data.Text.Fusion.Size (exactSize, maxSize) import Data.Text.Unsafe (inlinePerformIO) import Foreign.Storable (peekElemOff) import GHC.IO.Buffer (Buffer(..), CharBuffer, RawCharBuffer, bufferAdjustL, bufferElems, charSize, isEmptyBuffer, readCharBuf, withRawBuffer, writeCharBuf) import GHC.IO.Handle.Internals (ioe_EOF, readTextDevice, wantReadableHandle_) import GHC.IO.Handle.Types (Handle__(..), Newline(..)) import System.IO (Handle) import System.IO.Error (isEOFError) import qualified Data.Text as T -- | Read a single line of input from a handle, constructing a list of -- decoded chunks as we go. When we're done, transform them into the -- destination type. hGetLineWith :: ([Text] -> t) -> Handle -> IO t hGetLineWith f h = wantReadableHandle_ "hGetLine" h go where go hh@Handle__{..} = readIORef haCharBuffer >>= fmap f . hGetLineLoop hh [] hGetLineLoop :: Handle__ -> [Text] -> CharBuffer -> IO [Text] hGetLineLoop hh@Handle__{..} = go where go ts buf@Buffer{ bufL=r0, bufR=w, bufRaw=raw0 } = do let findEOL raw r | r == w = return (False, w) | otherwise = do (c,r') <- readCharBuf raw r if c == '\n' then return (True, r) else findEOL raw r' (eol, off) <- findEOL raw0 r0 (t,r') <- if haInputNL == CRLF then unpack_nl raw0 r0 off else do t <- unpack raw0 r0 off return (t,off) if eol then do writeIORef haCharBuffer (bufferAdjustL (off+1) buf) return $ reverse (t:ts) else do let buf1 = bufferAdjustL r' buf maybe_buf <- maybeFillReadBuffer hh buf1 case maybe_buf of -- Nothing indicates we caught an EOF, and we may have a -- partial line to return. Nothing -> do -- we reached EOF. There might be a lone \r left -- in the buffer, so check for that and -- append it to the line if necessary. let pre | isEmptyBuffer buf1 = T.empty | otherwise = T.singleton '\r' writeIORef haCharBuffer buf1{ bufL=0, bufR=0 } let str = reverse . filter (not . T.null) $ pre:t:ts if null str then ioe_EOF else return str Just new_buf -> go (t:ts) new_buf -- This function is lifted almost verbatim from GHC.IO.Handle.Text. maybeFillReadBuffer :: Handle__ -> CharBuffer -> IO (Maybe CharBuffer) maybeFillReadBuffer handle_ buf = E.catch (Just `fmap` getSomeCharacters handle_ buf) $ \e -> if isEOFError e then return Nothing else ioError e unpack :: RawCharBuffer -> Int -> Int -> IO Text unpack !buf !r !w | charSize /= 4 = sizeError "unpack" | r >= w = return T.empty | otherwise = withRawBuffer buf go where go pbuf = return $! unstream (Stream next r (exactSize (w-r))) where next !i | i >= w = Done | otherwise = Yield (ix i) (i+1) ix i = inlinePerformIO $ peekElemOff pbuf i unpack_nl :: RawCharBuffer -> Int -> Int -> IO (Text, Int) unpack_nl !buf !r !w | charSize /= 4 = sizeError "unpack_nl" | r >= w = return (T.empty, 0) | otherwise = withRawBuffer buf $ go where go pbuf = do let !t = unstream (Stream next r (maxSize (w-r))) w' = w - 1 return $ if ix w' == '\r' then (t,w') else (t,w) where next !i | i >= w = Done | c == '\r' = let i' = i + 1 in if i' < w then if ix i' == '\n' then Yield '\n' (i+2) else Yield '\n' i' else Done | otherwise = Yield c (i+1) where c = ix i ix i = inlinePerformIO $ peekElemOff pbuf i -- This function is completely lifted from GHC.IO.Handle.Text. 
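-- Its contract (as in the GHC original): return a buffer holding at
-- least one decoded character, reading from the underlying device if
-- the buffer is empty or holds only a lone '\r' under CRLF translation.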
getSomeCharacters :: Handle__ -> CharBuffer -> IO CharBuffer getSomeCharacters handle_@Handle__{..} buf@Buffer{..} = case bufferElems buf of -- buffer empty: read some more 0 -> {-# SCC "readTextDevice" #-} readTextDevice handle_ buf -- if the buffer has a single '\r' in it and we're doing newline -- translation: read some more 1 | haInputNL == CRLF -> do (c,_) <- readCharBuf bufRaw bufL if c == '\r' then do -- shuffle the '\r' to the beginning. This is only safe -- if we're about to call readTextDevice, otherwise it -- would mess up flushCharBuffer. -- See [note Buffer Flushing], GHC.IO.Handle.Types _ <- writeCharBuf bufRaw 0 '\r' let buf' = buf{ bufL=0, bufR=1 } readTextDevice handle_ buf' else do return buf -- buffer has some chars in it already: just return it _otherwise -> {-# SCC "otherwise" #-} return buf -- | Read a single chunk of strict text from a buffer. Used by both -- the strict and lazy implementations of hGetContents. readChunk :: Handle__ -> CharBuffer -> IO Text readChunk hh@Handle__{..} buf = do buf'@Buffer{..} <- getSomeCharacters hh buf (t,r) <- if haInputNL == CRLF then unpack_nl bufRaw bufL bufR else do t <- unpack bufRaw bufL bufR return (t,bufR) writeIORef haCharBuffer (bufferAdjustL r buf') return t sizeError :: String -> a sizeError loc = error $ "Data.Text.IO." ++ loc ++ ": bad internal buffer size" #endif text-0.11.3.1/Data/Text/Lazy/0000755000000000000000000000000012140332415013614 5ustar0000000000000000text-0.11.3.1/Data/Text/Lazy/Builder.hs0000644000000000000000000000256012140332415015541 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, Rank2Types #-} ----------------------------------------------------------------------------- -- | -- Module : Data.Text.Lazy.Builder -- Copyright : (c) 2013 Bryan O'Sullivan -- (c) 2010 Johan Tibell -- License : BSD3-style (see LICENSE) -- -- Maintainer : Johan Tibell -- Stability : experimental -- Portability : portable to Hugs and GHC -- -- Efficient construction of lazy @Text@ values. The principal -- operations on a @Builder@ are @singleton@, @fromText@, and -- @fromLazyText@, which construct new builders, and 'mappend', which -- concatenates two builders. -- -- To get maximum performance when building lazy @Text@ values using a -- builder, associate @mappend@ calls to the right. For example, -- prefer -- -- > singleton 'a' `mappend` (singleton 'b' `mappend` singleton 'c') -- -- to -- -- > singleton 'a' `mappend` singleton 'b' `mappend` singleton 'c' -- -- as the latter associates @mappend@ to the left. -- ----------------------------------------------------------------------------- module Data.Text.Lazy.Builder ( -- * The Builder type Builder , toLazyText , toLazyTextWith -- * Constructing Builders , singleton , fromText , fromLazyText , fromString -- * Flushing the buffer state , flush ) where import Data.Text.Lazy.Builder.Internal text-0.11.3.1/Data/Text/Lazy/Encoding.hs0000644000000000000000000002155012140332415015701 0ustar0000000000000000{-# LANGUAGE BangPatterns,CPP #-} #if __GLASGOW_HASKELL__ >= 702 {-# LANGUAGE Trustworthy #-} #endif -- | -- Module : Data.Text.Lazy.Encoding -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- Functions for converting lazy 'Text' values to and from lazy -- 'ByteString', using several standard encodings. 
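--
-- A quick usage sketch (names qualified for illustration):
--
-- > import qualified Data.ByteString.Lazy as BL
-- > import qualified Data.Text.Lazy.Encoding as E
-- >
-- > -- decode strict-mode UTF-8 input, then re-encode it as UTF-16LE
-- > recode :: BL.ByteString -> BL.ByteString
-- > recode = E.encodeUtf16LE . E.decodeUtf8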
--
-- To gain access to a much larger variety of encodings, use the
-- @text-icu@ package: <http://hackage.haskell.org/package/text-icu>

module Data.Text.Lazy.Encoding
    (
    -- * Decoding ByteStrings to Text
    -- $strict
      decodeASCII
    , decodeLatin1
    , decodeUtf8
    , decodeUtf16LE
    , decodeUtf16BE
    , decodeUtf32LE
    , decodeUtf32BE

    -- ** Catchable failure
    , decodeUtf8'

    -- ** Controllable error handling
    , decodeUtf8With
    , decodeUtf16LEWith
    , decodeUtf16BEWith
    , decodeUtf32LEWith
    , decodeUtf32BEWith

    -- * Encoding Text to ByteStrings
    , encodeUtf8
    , encodeUtf16LE
    , encodeUtf16BE
    , encodeUtf32LE
    , encodeUtf32BE
    ) where

import Control.Exception (evaluate, try)
import Data.Bits ((.&.))
import Data.Text.Encoding.Error (OnDecodeError, UnicodeException, strictDecode)
import Data.Text.Lazy.Internal (Text(..), chunk, empty, foldrChunks)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy.Internal as B
import qualified Data.ByteString.Unsafe as S
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy.Encoding.Fusion as E
import qualified Data.Text.Lazy.Fusion as F
import Data.Text.Unsafe (unsafeDupablePerformIO)

-- $strict
--
-- All of the single-parameter functions for decoding bytestrings
-- encoded in one of the Unicode Transformation Formats (UTF) operate
-- in a /strict/ mode: each will throw an exception if given invalid
-- input.
--
-- Each function has a variant, whose name is suffixed with -'With',
-- that gives greater control over the handling of decoding errors.
-- For instance, 'decodeUtf8' will throw an exception, but
-- 'decodeUtf8With' allows the programmer to determine what to do on a
-- decoding error.

-- | /Deprecated/.  Decode a 'ByteString' containing 7-bit ASCII
-- encoded text.
--
-- This function is deprecated.  Use 'decodeLatin1' instead.
decodeASCII :: B.ByteString -> Text
decodeASCII = decodeUtf8
{-# DEPRECATED decodeASCII "Use decodeLatin1 instead" #-}

-- | Decode a 'ByteString' containing Latin-1 (aka ISO-8859-1) encoded text.
decodeLatin1 :: B.ByteString -> Text
decodeLatin1 = foldr (chunk . TE.decodeLatin1) empty . B.toChunks

-- | Decode a 'ByteString' containing UTF-8 encoded text, using the
-- given error handler to deal with invalid data.
decodeUtf8With :: OnDecodeError -> B.ByteString -> Text
decodeUtf8With onErr bs0 = fast bs0
  where
    decode = TE.decodeUtf8With onErr
    fast (B.Chunk p ps) | isComplete p = chunk (decode p) (fast ps)
                        | otherwise    = chunk (decode h) (slow t ps)
      where (h,t) = S.splitAt pivot p
            pivot | at 1      = len-1
                  | at 2      = len-2
                  | otherwise = len-3
            len  = S.length p
            at n = len >= n && S.unsafeIndex p (len-n) .&. 0xc0 == 0xc0
    fast B.Empty = empty
    slow i bs = {-# SCC "decodeUtf8With'/slow" #-}
                case B.uncons bs of
                  Just (w,bs') | isComplete i' -> chunk (decode i') (fast bs')
                               | otherwise     -> slow i' bs'
                    where i' = S.snoc i w
                  Nothing -> case S.uncons i of
                               Just (j,i') ->
                                 case onErr desc (Just j) of
                                   Nothing -> slow i' bs
                                   Just c  -> Chunk (T.singleton c) (slow i' bs)
                               Nothing ->
                                 case onErr desc Nothing of
                                   Nothing -> empty
                                   Just c  -> Chunk (T.singleton c) empty
    isComplete bs = {-# SCC "decodeUtf8With'/isComplete" #-}
                    ix 1 .&. 0x80 == 0 ||
                    (len >= 2 && ix 2 .&. 0xe0 == 0xc0) ||
                    (len >= 3 && ix 3 .&. 0xf0 == 0xe0) ||
                    (len >= 4 && ix 4 .&. 0xf8 == 0xf0)
      where len  = S.length bs
            ix n = S.unsafeIndex bs (len-n)
    desc = "Data.Text.Lazy.Encoding.decodeUtf8With: Invalid UTF-8 stream"
{-# INLINE[0] decodeUtf8With #-}

-- | Decode a 'ByteString' containing UTF-8 encoded text that is known
-- to be valid.
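--
-- For example (a sketch; @pack@ here is 'Data.Text.Lazy.pack', which
-- this module does not re-export):
--
-- > decodeUtf8 (B.pack [0x68, 0x69]) == pack "hi"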
--
-- If the input contains any invalid UTF-8 data, an exception will be
-- thrown that cannot be caught in pure code.  For more control over
-- the handling of invalid data, use 'decodeUtf8'' or
-- 'decodeUtf8With'.
decodeUtf8 :: B.ByteString -> Text
decodeUtf8 = decodeUtf8With strictDecode
{-# INLINE[0] decodeUtf8 #-}

-- This rule seems to cause performance loss.
{- RULES "LAZY STREAM stream/decodeUtf8' fusion" [1]
   forall bs. F.stream (decodeUtf8' bs) = E.streamUtf8 strictDecode bs #-}

-- | Decode a 'ByteString' containing UTF-8 encoded text.
--
-- If the input contains any invalid UTF-8 data, the relevant
-- exception will be returned, otherwise the decoded text.
--
-- /Note/: this function is /not/ lazy, as it must decode its entire
-- input before it can return a result.  If you need lazy (streaming)
-- decoding, use 'decodeUtf8With' in lenient mode.
decodeUtf8' :: B.ByteString -> Either UnicodeException Text
decodeUtf8' bs = unsafeDupablePerformIO $ do
                   let t = decodeUtf8 bs
                   try (evaluate (rnf t `seq` t))
  where rnf Empty        = ()
        rnf (Chunk _ ts) = rnf ts
{-# INLINE decodeUtf8' #-}

-- | Encode text using UTF-8 encoding.
encodeUtf8 :: Text -> B.ByteString
encodeUtf8 (Chunk c cs) = B.Chunk (TE.encodeUtf8 c) (encodeUtf8 cs)
encodeUtf8 Empty        = B.Empty

-- | Decode text from little endian UTF-16 encoding.
decodeUtf16LEWith :: OnDecodeError -> B.ByteString -> Text
decodeUtf16LEWith onErr bs = F.unstream (E.streamUtf16LE onErr bs)
{-# INLINE decodeUtf16LEWith #-}

-- | Decode text from little endian UTF-16 encoding.
--
-- If the input contains any invalid little endian UTF-16 data, an
-- exception will be thrown.  For more control over the handling of
-- invalid data, use 'decodeUtf16LEWith'.
decodeUtf16LE :: B.ByteString -> Text
decodeUtf16LE = decodeUtf16LEWith strictDecode
{-# INLINE decodeUtf16LE #-}

-- | Decode text from big endian UTF-16 encoding.
decodeUtf16BEWith :: OnDecodeError -> B.ByteString -> Text
decodeUtf16BEWith onErr bs = F.unstream (E.streamUtf16BE onErr bs)
{-# INLINE decodeUtf16BEWith #-}

-- | Decode text from big endian UTF-16 encoding.
--
-- If the input contains any invalid big endian UTF-16 data, an
-- exception will be thrown.  For more control over the handling of
-- invalid data, use 'decodeUtf16BEWith'.
decodeUtf16BE :: B.ByteString -> Text
decodeUtf16BE = decodeUtf16BEWith strictDecode
{-# INLINE decodeUtf16BE #-}

-- | Encode text using little endian UTF-16 encoding.
encodeUtf16LE :: Text -> B.ByteString
encodeUtf16LE txt = B.fromChunks (foldrChunks ((:) . TE.encodeUtf16LE) [] txt)
{-# INLINE encodeUtf16LE #-}

-- | Encode text using big endian UTF-16 encoding.
encodeUtf16BE :: Text -> B.ByteString
encodeUtf16BE txt = B.fromChunks (foldrChunks ((:) . TE.encodeUtf16BE) [] txt)
{-# INLINE encodeUtf16BE #-}

-- | Decode text from little endian UTF-32 encoding.
decodeUtf32LEWith :: OnDecodeError -> B.ByteString -> Text
decodeUtf32LEWith onErr bs = F.unstream (E.streamUtf32LE onErr bs)
{-# INLINE decodeUtf32LEWith #-}

-- | Decode text from little endian UTF-32 encoding.
--
-- If the input contains any invalid little endian UTF-32 data, an
-- exception will be thrown.  For more control over the handling of
-- invalid data, use 'decodeUtf32LEWith'.
decodeUtf32LE :: B.ByteString -> Text
decodeUtf32LE = decodeUtf32LEWith strictDecode
{-# INLINE decodeUtf32LE #-}

-- | Decode text from big endian UTF-32 encoding.
decodeUtf32BEWith :: OnDecodeError -> B.ByteString -> Text
decodeUtf32BEWith onErr bs = F.unstream (E.streamUtf32BE onErr bs)
{-# INLINE decodeUtf32BEWith #-}

-- | Decode text from big endian UTF-32 encoding.
-- -- If the input contains any invalid big endian UTF-32 data, an -- exception will be thrown. For more control over the handling of -- invalid data, use 'decodeUtf32BEWith'. decodeUtf32BE :: B.ByteString -> Text decodeUtf32BE = decodeUtf32BEWith strictDecode {-# INLINE decodeUtf32BE #-} -- | Encode text using little endian UTF-32 encoding. encodeUtf32LE :: Text -> B.ByteString encodeUtf32LE txt = B.fromChunks (foldrChunks ((:) . TE.encodeUtf32LE) [] txt) {-# INLINE encodeUtf32LE #-} -- | Encode text using big endian UTF-32 encoding. encodeUtf32BE :: Text -> B.ByteString encodeUtf32BE txt = B.fromChunks (foldrChunks ((:) . TE.encodeUtf32BE) [] txt) {-# INLINE encodeUtf32BE #-} text-0.11.3.1/Data/Text/Lazy/Fusion.hs0000644000000000000000000000751112140332415015417 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} -- | -- Module : Data.Text.Lazy.Fusion -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Core stream fusion functionality for text. module Data.Text.Lazy.Fusion ( stream , unstream , unstreamChunks , length , unfoldrN , index , countChar ) where import Prelude hiding (length) import qualified Data.Text.Fusion.Common as S import Control.Monad.ST (runST) import Data.Text.Fusion.Internal import Data.Text.Fusion.Size (isEmpty, unknownSize) import Data.Text.Lazy.Internal import qualified Data.Text.Internal as I import qualified Data.Text.Array as A import Data.Text.UnsafeChar (unsafeWrite) import Data.Text.UnsafeShift (shiftL) import Data.Text.Unsafe (Iter(..), iter) import Data.Int (Int64) default(Int64) -- | /O(n)/ Convert a 'Text' into a 'Stream Char'. stream :: Text -> Stream Char stream text = Stream next (text :*: 0) unknownSize where next (Empty :*: _) = Done next (txt@(Chunk t@(I.Text _ _ len) ts) :*: i) | i >= len = next (ts :*: 0) | otherwise = Yield c (txt :*: i+d) where Iter c d = iter t i {-# INLINE [0] stream #-} -- | /O(n)/ Convert a 'Stream Char' into a 'Text', using the given -- chunk size. unstreamChunks :: Int -> Stream Char -> Text unstreamChunks !chunkSize (Stream next s0 len0) | isEmpty len0 = Empty | otherwise = outer s0 where outer so = {-# SCC "unstreamChunks/outer" #-} case next so of Done -> Empty Skip s' -> outer s' Yield x s' -> runST $ do a <- A.new unknownLength unsafeWrite a 0 x >>= inner a unknownLength s' where unknownLength = 4 where inner marr !len s !i | i + 1 >= chunkSize = finish marr i s | i + 1 >= len = {-# SCC "unstreamChunks/resize" #-} do let newLen = min (len `shiftL` 1) chunkSize marr' <- A.new newLen A.copyM marr' 0 marr 0 len inner marr' newLen s i | otherwise = {-# SCC "unstreamChunks/inner" #-} case next s of Done -> finish marr i s Skip s' -> inner marr len s' i Yield x s' -> do d <- unsafeWrite marr i x inner marr len s' (i+d) finish marr len s' = do arr <- A.unsafeFreeze marr return (I.Text arr 0 len `Chunk` outer s') {-# INLINE [0] unstreamChunks #-} -- | /O(n)/ Convert a 'Stream Char' into a 'Text', using -- 'defaultChunkSize'. unstream :: Stream Char -> Text unstream = unstreamChunks defaultChunkSize {-# INLINE [0] unstream #-} -- | /O(n)/ Returns the number of characters in a text. length :: Stream Char -> Int64 length = S.lengthI {-# INLINE[0] length #-} {-# RULES "LAZY STREAM stream/unstream fusion" forall s. stream (unstream s) = s #-} -- | /O(n)/ Like 'unfoldr', 'unfoldrN' builds a stream from a seed -- value. 
However, the length of the result is limited by the -- first argument to 'unfoldrN'. This function is more efficient than -- 'unfoldr' when the length of the result is known. unfoldrN :: Int64 -> (a -> Maybe (Char,a)) -> a -> Stream Char unfoldrN n = S.unfoldrNI n {-# INLINE [0] unfoldrN #-} -- | /O(n)/ stream index (subscript) operator, starting from 0. index :: Stream Char -> Int64 -> Char index = S.indexI {-# INLINE [0] index #-} -- | /O(n)/ The 'count' function returns the number of times the query -- element appears in the given stream. countChar :: Char -> Stream Char -> Int64 countChar = S.countCharI {-# INLINE [0] countChar #-} text-0.11.3.1/Data/Text/Lazy/Internal.hs0000644000000000000000000000722112140332415015726 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable #-} -- | -- Module : Data.Text.Lazy.Internal -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- A module containing private 'Text' internals. This exposes the -- 'Text' representation and low level construction functions. -- Modules which extend the 'Text' system may need to use this module. -- -- You should not use this module unless you are determined to monkey -- with the internals, as the functions here do just about nothing to -- preserve data invariants. You have been warned! module Data.Text.Lazy.Internal ( Text(..) , chunk , empty , foldrChunks , foldlChunks -- * Data type invariant and abstraction functions -- $invariant , strictInvariant , lazyInvariant , showStructure -- * Chunk allocation sizes , defaultChunkSize , smallChunkSize , chunkOverhead ) where import Data.Text () import Data.Text.UnsafeShift (shiftL) import Data.Typeable (Typeable) import Foreign.Storable (sizeOf) import qualified Data.Text.Internal as T data Text = Empty | Chunk {-# UNPACK #-} !T.Text Text deriving (Typeable) -- $invariant -- -- The data type invariant for lazy 'Text': Every 'Text' is either 'Empty' or -- consists of non-null 'T.Text's. All functions must preserve this, -- and the QC properties must check this. -- | Check the invariant strictly. strictInvariant :: Text -> Bool strictInvariant Empty = True strictInvariant x@(Chunk (T.Text _ _ len) cs) | len > 0 = strictInvariant cs | otherwise = error $ "Data.Text.Lazy: invariant violation: " ++ showStructure x -- | Check the invariant lazily. lazyInvariant :: Text -> Text lazyInvariant Empty = Empty lazyInvariant x@(Chunk c@(T.Text _ _ len) cs) | len > 0 = Chunk c (lazyInvariant cs) | otherwise = error $ "Data.Text.Lazy: invariant violation: " ++ showStructure x -- | Display the internal structure of a lazy 'Text'. showStructure :: Text -> String showStructure Empty = "Empty" showStructure (Chunk t Empty) = "Chunk " ++ show t ++ " Empty" showStructure (Chunk t ts) = "Chunk " ++ show t ++ " (" ++ showStructure ts ++ ")" -- | Smart constructor for 'Chunk'. Guarantees the data type invariant. chunk :: T.Text -> Text -> Text {-# INLINE chunk #-} chunk t@(T.Text _ _ len) ts | len == 0 = ts | otherwise = Chunk t ts -- | Smart constructor for 'Empty'. empty :: Text {-# INLINE [0] empty #-} empty = Empty -- | Consume the chunks of a lazy 'Text' with a natural right fold. 
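--
-- For example, the total length of a lazy 'Text' could be sketched as
-- (assuming @T.length@ from "Data.Text"):
--
-- > lazyLength :: Text -> Int
-- > lazyLength = foldrChunks (\c n -> T.length c + n) 0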
foldrChunks :: (T.Text -> a -> a) -> a -> Text -> a foldrChunks f z = go where go Empty = z go (Chunk c cs) = f c (go cs) {-# INLINE foldrChunks #-} -- | Consume the chunks of a lazy 'Text' with a strict, tail-recursive, -- accumulating left fold. foldlChunks :: (a -> T.Text -> a) -> a -> Text -> a foldlChunks f z = go z where go !a Empty = a go !a (Chunk c cs) = go (f a c) cs {-# INLINE foldlChunks #-} -- | Currently set to 16 KiB, less the memory management overhead. defaultChunkSize :: Int defaultChunkSize = 16384 - chunkOverhead {-# INLINE defaultChunkSize #-} -- | Currently set to 128 bytes, less the memory management overhead. smallChunkSize :: Int smallChunkSize = 128 - chunkOverhead {-# INLINE smallChunkSize #-} -- | The memory management overhead. Currently this is tuned for GHC only. chunkOverhead :: Int chunkOverhead = sizeOf (undefined :: Int) `shiftL` 1 {-# INLINE chunkOverhead #-} text-0.11.3.1/Data/Text/Lazy/IO.hs0000644000000000000000000001607212140332415014465 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, RecordWildCards #-} #if __GLASGOW_HASKELL__ >= 702 {-# LANGUAGE Trustworthy #-} #endif -- | -- Module : Data.Text.Lazy.IO -- Copyright : (c) 2009, 2010 Bryan O'Sullivan, -- (c) 2009 Simon Marlow -- License : BSD-style -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : GHC -- -- Efficient locale-sensitive support for lazy text I\/O. -- -- Skip past the synopsis for some important notes on performance and -- portability across different versions of GHC. module Data.Text.Lazy.IO ( -- * Performance -- $performance -- * Locale support -- $locale -- * File-at-a-time operations readFile , writeFile , appendFile -- * Operations on handles , hGetContents , hGetLine , hPutStr , hPutStrLn -- * Special cases for standard input and output , interact , getContents , getLine , putStr , putStrLn ) where import Data.Text.Lazy (Text) import Prelude hiding (appendFile, getContents, getLine, interact, putStr, putStrLn, readFile, writeFile) import System.IO (Handle, IOMode(..), hPutChar, openFile, stdin, stdout, withFile) import qualified Data.Text.IO as T import qualified Data.Text.Lazy as L #if __GLASGOW_HASKELL__ <= 610 import Data.Text.Lazy.Encoding (decodeUtf8) import qualified Data.ByteString.Char8 as S8 import qualified Data.ByteString.Lazy.Char8 as L8 #else import qualified Control.Exception as E import Control.Monad (when) import Data.IORef (readIORef) import Data.Text.IO.Internal (hGetLineWith, readChunk) import Data.Text.Lazy.Internal (chunk, empty) import GHC.IO.Buffer (isEmptyBuffer) import GHC.IO.Exception (IOException(..), IOErrorType(..), ioException) import GHC.IO.Handle.Internals (augmentIOError, hClose_help, wantReadableHandle, withHandle) import GHC.IO.Handle.Types (Handle__(..), HandleType(..)) import System.IO (BufferMode(..), hGetBuffering, hSetBuffering) import System.IO.Error (isEOFError) import System.IO.Unsafe (unsafeInterleaveIO) #endif -- $performance -- -- The functions in this module obey the runtime system's locale, -- character set encoding, and line ending conversion settings. -- -- If you know in advance that you will be working with data that has -- a specific encoding (e.g. UTF-8), and your application is highly -- performance sensitive, you may find that it is faster to perform -- I\/O with bytestrings and to encode and decode yourself than to use -- the functions in this module. 
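--
-- For example, a sketch of the bytestring-based alternative for
-- UTF-8 data:
--
-- > import qualified Data.ByteString.Lazy as BL
-- > import Data.Text.Lazy.Encoding (decodeUtf8)
-- >
-- > readUtf8File :: FilePath -> IO Text
-- > readUtf8File = fmap decodeUtf8 . BL.readFile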
--
-- Whether this will hold depends on the version of GHC you are using,
-- the platform you are working on, the data you are working with, and
-- the encodings you are using, so be sure to test for yourself.

-- | Read a file and return its contents as a string.  The file is
-- read lazily, as with 'getContents'.
readFile :: FilePath -> IO Text
readFile name = openFile name ReadMode >>= hGetContents

-- | Write a string to a file.  The file is truncated to zero length
-- before writing begins.
writeFile :: FilePath -> Text -> IO ()
writeFile p = withFile p WriteMode . flip hPutStr

-- | Write a string to the end of a file.
appendFile :: FilePath -> Text -> IO ()
appendFile p = withFile p AppendMode . flip hPutStr

-- | Lazily read the remaining contents of a 'Handle'.  The 'Handle'
-- will be closed after the read completes, or on error.
hGetContents :: Handle -> IO Text
#if __GLASGOW_HASKELL__ <= 610
hGetContents = fmap decodeUtf8 . L8.hGetContents
#else
hGetContents h = do
  chooseGoodBuffering h
  wantReadableHandle "hGetContents" h $ \hh -> do
    ts <- lazyRead h
    return (hh{haType=SemiClosedHandle}, ts)

-- | Use a more efficient buffer size if we're reading in
-- block-buffered mode with the default buffer size.
chooseGoodBuffering :: Handle -> IO ()
chooseGoodBuffering h = do
  bufMode <- hGetBuffering h
  when (bufMode == BlockBuffering Nothing) $
    hSetBuffering h (BlockBuffering (Just 16384))

lazyRead :: Handle -> IO Text
lazyRead h = unsafeInterleaveIO $
  withHandle "hGetContents" h $ \hh -> do
    case haType hh of
      ClosedHandle     -> return (hh, L.empty)
      SemiClosedHandle -> lazyReadBuffered h hh
      _                -> ioException
                          (IOError (Just h) IllegalOperation "hGetContents"
                           "illegal handle type" Nothing Nothing)

lazyReadBuffered :: Handle -> Handle__ -> IO (Handle__, Text)
lazyReadBuffered h hh@Handle__{..} = do
   buf <- readIORef haCharBuffer
   (do t <- readChunk hh buf
       ts <- lazyRead h
       return (hh, chunk t ts)) `E.catch` \e -> do
         (hh', _) <- hClose_help hh
         if isEOFError e
           then return $ if isEmptyBuffer buf
                         then (hh', empty)
                         else (hh', L.singleton '\r')
           else E.throwIO (augmentIOError e "hGetContents" h)
#endif

-- | Read a single line from a handle.
hGetLine :: Handle -> IO Text
#if __GLASGOW_HASKELL__ <= 610
hGetLine = fmap (decodeUtf8 . L8.fromChunks . (:[])) . S8.hGetLine
#else
hGetLine = hGetLineWith L.fromChunks
#endif

-- | Write a string to a handle.
hPutStr :: Handle -> Text -> IO ()
hPutStr h = mapM_ (T.hPutStr h) . L.toChunks

-- | Write a string to a handle, followed by a newline.
hPutStrLn :: Handle -> Text -> IO ()
hPutStrLn h t = hPutStr h t >> hPutChar h '\n'

-- | The 'interact' function takes a function of type @Text -> Text@
-- as its argument.  The entire input from the standard input device is
-- passed (lazily) to this function as its argument, and the resulting
-- string is output on the standard output device.
interact :: (Text -> Text) -> IO ()
interact f = putStr . f =<< getContents

-- | Lazily read all user input on 'stdin' as a single string.
getContents :: IO Text
getContents = hGetContents stdin

-- | Read a single line of user input from 'stdin'.
getLine :: IO Text
getLine = hGetLine stdin

-- | Write a string to 'stdout'.
putStr :: Text -> IO ()
putStr = hPutStr stdout

-- | Write a string to 'stdout', followed by a newline.
putStrLn :: Text -> IO ()
putStrLn = hPutStrLn stdout

-- $locale
--
-- /Note/: The behaviour of functions in this module depends on the
-- version of GHC you are using.
--
-- Beginning with GHC 6.12, text I\/O is performed using the system or
-- handle's current locale and line ending conventions.
--
-- Under GHC 6.10 and earlier, the system I\/O libraries /do not
-- support/ locale-sensitive I\/O or line ending conversion.  On these
-- versions of GHC, functions in this library all use UTF-8.  What
-- does this mean in practice?
--
-- * All data that is read will be decoded as UTF-8.
--
-- * Before data is written, it is first encoded as UTF-8.
--
-- * On both reading and writing, the platform's native newline
--   conversion is performed.
--
-- If you must use a non-UTF-8 locale on an older version of GHC, you
-- will have to perform the transcoding yourself, e.g. as follows:
--
-- > import qualified Data.ByteString.Lazy as B
-- > import Data.Text.Lazy (Text)
-- > import Data.Text.Lazy.Encoding (encodeUtf16LE)
-- >
-- > putStr_Utf16LE :: Text -> IO ()
-- > putStr_Utf16LE t = B.putStr (encodeUtf16LE t)
text-0.11.3.1/Data/Text/Lazy/Read.hs0000644000000000000000000001750212140332415015030 0ustar0000000000000000
{-# LANGUAGE OverloadedStrings, CPP #-}
#if __GLASGOW_HASKELL__ >= 702
{-# LANGUAGE Trustworthy #-}
#endif
-- |
-- Module      : Data.Text.Lazy.Read
-- Copyright   : (c) 2010, 2011 Bryan O'Sullivan
--
-- License     : BSD-style
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : GHC
--
-- Functions used frequently when reading textual data.
module Data.Text.Lazy.Read
    (
      Reader
    , decimal
    , hexadecimal
    , signed
    , rational
    , double
    ) where

import Control.Monad (liftM)
import Data.Char (isDigit, isHexDigit, ord)
import Data.Int (Int8, Int16, Int32, Int64)
import Data.Ratio ((%))
import Data.Text.Lazy as T
import Data.Word (Word, Word8, Word16, Word32, Word64)

-- | Read some text.  If the read succeeds, return its value and the
-- remaining text, otherwise an error message.
type Reader a = Text -> Either String (a,Text)

-- | Read a decimal integer.  The input must begin with at least one
-- decimal digit, and is consumed until a non-digit or end of string
-- is reached.
--
-- This function does not handle leading sign characters.  If you need
-- to handle signed input, use @'signed' 'decimal'@.
--
-- /Note/: For fixed-width integer types, this function does not
-- attempt to detect overflow, so a sufficiently long input may give
-- incorrect results.  If you are worried about overflow, use
-- 'Integer' for your result type.
decimal :: Integral a => Reader a
{-# SPECIALIZE decimal :: Reader Int #-}
{-# SPECIALIZE decimal :: Reader Int8 #-}
{-# SPECIALIZE decimal :: Reader Int16 #-}
{-# SPECIALIZE decimal :: Reader Int32 #-}
{-# SPECIALIZE decimal :: Reader Int64 #-}
{-# SPECIALIZE decimal :: Reader Integer #-}
{-# SPECIALIZE decimal :: Reader Word #-}
{-# SPECIALIZE decimal :: Reader Word8 #-}
{-# SPECIALIZE decimal :: Reader Word16 #-}
{-# SPECIALIZE decimal :: Reader Word32 #-}
{-# SPECIALIZE decimal :: Reader Word64 #-}
decimal txt
    | T.null h  = Left "input does not start with a digit"
    | otherwise = Right (T.foldl' go 0 h, t)
  where (h,t)  = T.span isDigit txt
        go n d = (n * 10 + fromIntegral (digitToInt d))

-- | Read a hexadecimal integer, consisting of an optional leading
-- @\"0x\"@ followed by at least one hexadecimal digit.  Input is
-- consumed until a non-hex-digit or end of string is reached.  This
-- function is case insensitive.
--
-- This function does not handle leading sign characters.  If you need
-- to handle signed input, use @'signed' 'hexadecimal'@.
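--
-- Examples (a sketch, in the style of the 'rational' examples below):
--
-- >hexadecimal "0x2f" == Right (47, "")
-- >hexadecimal "2f"   == Right (47, "")
-- >hexadecimal "2fgh" == Right (47, "gh")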
--
-- /Note/: For fixed-width integer types, this function does not
-- attempt to detect overflow, so a sufficiently long input may give
-- incorrect results.  If you are worried about overflow, use
-- 'Integer' for your result type.
hexadecimal :: Integral a => Reader a
{-# SPECIALIZE hexadecimal :: Reader Int #-}
{-# SPECIALIZE hexadecimal :: Reader Integer #-}
hexadecimal txt
    | h == "0x" || h == "0X" = hex t
    | otherwise              = hex txt
 where (h,t) = T.splitAt 2 txt

hex :: Integral a => Reader a
{-# SPECIALIZE hexadecimal :: Reader Int #-}
{-# SPECIALIZE hexadecimal :: Reader Int8 #-}
{-# SPECIALIZE hexadecimal :: Reader Int16 #-}
{-# SPECIALIZE hexadecimal :: Reader Int32 #-}
{-# SPECIALIZE hexadecimal :: Reader Int64 #-}
{-# SPECIALIZE hexadecimal :: Reader Integer #-}
{-# SPECIALIZE hexadecimal :: Reader Word #-}
{-# SPECIALIZE hexadecimal :: Reader Word8 #-}
{-# SPECIALIZE hexadecimal :: Reader Word16 #-}
{-# SPECIALIZE hexadecimal :: Reader Word32 #-}
{-# SPECIALIZE hexadecimal :: Reader Word64 #-}
hex txt
    | T.null h  = Left "input does not start with a hexadecimal digit"
    | otherwise = Right (T.foldl' go 0 h, t)
  where (h,t)  = T.span isHexDigit txt
        go n d = (n * 16 + fromIntegral (hexDigitToInt d))

hexDigitToInt :: Char -> Int
hexDigitToInt c
    | c >= '0' && c <= '9' = ord c - ord '0'
    | c >= 'a' && c <= 'f' = ord c - (ord 'a' - 10)
    | otherwise            = ord c - (ord 'A' - 10)

digitToInt :: Char -> Int
digitToInt c = ord c - ord '0'

-- | Read an optional leading sign character (@\'-\'@ or @\'+\'@) and
-- apply it to the result of applying the given reader.
signed :: Num a => Reader a -> Reader a
{-# INLINE signed #-}
signed f = runP (signa (P f))

-- | Read a rational number.
--
-- This function accepts an optional leading sign character, followed
-- by at least one decimal digit.  The syntax is similar to that
-- accepted by the 'read' function, with the exception that a trailing
-- @\'.\'@ or @\'e\'@ /not/ followed by a number is not consumed.
--
-- Examples:
--
-- >rational "3"     == Right (3.0, "")
-- >rational "3.1"   == Right (3.1, "")
-- >rational "3e4"   == Right (30000.0, "")
-- >rational "3.1e4" == Right (31000.0, "")
-- >rational ".3"    == Left "input does not start with a digit"
-- >rational "e3"    == Left "input does not start with a digit"
--
-- Examples of differences from 'read':
--
-- >rational "3.foo" == Right (3.0, ".foo")
-- >rational "3e"    == Right (3.0, "e")
rational :: Fractional a => Reader a
{-# SPECIALIZE rational :: Reader Double #-}
rational = floaty $ \real frac fracDenom -> fromRational $
                     real % 1 + frac % fracDenom

-- | Read a rational number.
--
-- The syntax accepted by this function is the same as for 'rational'.
--
-- /Note/: This function is almost ten times faster than 'rational',
-- but is slightly less accurate.
--
-- The 'Double' type supports about 16 decimal places of accuracy.
-- For 94.2% of numbers, this function and 'rational' give identical
-- results, but for the remaining 5.8%, this function loses precision
-- around the 15th decimal place.  For 0.001% of numbers, this
-- function will lose precision at the 13th or 14th decimal place.
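--
-- Examples (sketch):
--
-- >double "3.1"  == Right (3.1, "")
-- >double "-3e2" == Right (-300.0, "")
-- >double ".3"   == Left "input does not start with a digit"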
double :: Reader Double double = floaty $ \real frac fracDenom -> fromIntegral real + fromIntegral frac / fromIntegral fracDenom signa :: Num a => Parser a -> Parser a {-# SPECIALIZE signa :: Parser Int -> Parser Int #-} {-# SPECIALIZE signa :: Parser Int8 -> Parser Int8 #-} {-# SPECIALIZE signa :: Parser Int16 -> Parser Int16 #-} {-# SPECIALIZE signa :: Parser Int32 -> Parser Int32 #-} {-# SPECIALIZE signa :: Parser Int64 -> Parser Int64 #-} {-# SPECIALIZE signa :: Parser Integer -> Parser Integer #-} signa p = do sign <- perhaps '+' $ char (\c -> c == '-' || c == '+') if sign == '+' then p else negate `liftM` p newtype Parser a = P { runP :: Reader a } instance Monad Parser where return a = P $ \t -> Right (a,t) {-# INLINE return #-} m >>= k = P $ \t -> case runP m t of Left err -> Left err Right (a,t') -> runP (k a) t' {-# INLINE (>>=) #-} fail msg = P $ \_ -> Left msg perhaps :: a -> Parser a -> Parser a perhaps def m = P $ \t -> case runP m t of Left _ -> Right (def,t) r@(Right _) -> r char :: (Char -> Bool) -> Parser Char char p = P $ \t -> case T.uncons t of Just (c,t') | p c -> Right (c,t') _ -> Left "character does not match" data T = T !Integer !Int floaty :: Fractional a => (Integer -> Integer -> Integer -> a) -> Reader a {-# INLINE floaty #-} floaty f = runP $ do sign <- perhaps '+' $ char (\c -> c == '-' || c == '+') real <- P decimal T fraction fracDigits <- perhaps (T 0 0) $ do _ <- char (=='.') digits <- P $ \t -> Right (fromIntegral . T.length $ T.takeWhile isDigit t, t) n <- P decimal return $ T n digits let e c = c == 'e' || c == 'E' power <- perhaps 0 (char e >> signa (P decimal) :: Parser Int) let n = if fracDigits == 0 then if power == 0 then fromIntegral real else fromIntegral real * (10 ^^ power) else if power == 0 then f real fraction (10 ^ fracDigits) else f real fraction (10 ^ fracDigits) * (10 ^^ power) return $! if sign == '+' then n else -n text-0.11.3.1/Data/Text/Lazy/Search.hs0000644000000000000000000001171112140332415015356 0ustar0000000000000000{-# LANGUAGE BangPatterns, ScopedTypeVariables #-} -- | -- Module : Data.Text.Lazy.Search -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : GHC -- -- Fast substring search for lazy 'Text', based on work by Boyer, -- Moore, Horspool, Sunday, and Lundh. Adapted from the strict -- implementation. module Data.Text.Lazy.Search ( indices ) where import qualified Data.Text.Array as A import Data.Int (Int64) import Data.Word (Word16, Word64) import qualified Data.Text.Internal as T import Data.Text.Fusion.Internal (PairS(..)) import Data.Text.Lazy.Internal (Text(..), foldlChunks) import Data.Bits ((.|.), (.&.)) import Data.Text.UnsafeShift (shiftL) -- | /O(n+m)/ Find the offsets of all non-overlapping indices of -- @needle@ within @haystack@. -- -- This function is strict in @needle@, and lazy (as far as possible) -- in the chunks of @haystack@. -- -- In (unlikely) bad cases, this algorithm's complexity degrades -- towards /O(n*m)/. 
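--
-- For example (a sketch, assuming 'pack' from "Data.Text.Lazy"):
--
-- > indices (pack "na") (pack "banana") == [2, 4]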
indices :: Text -- ^ Substring to search for (@needle@) -> Text -- ^ Text to search in (@haystack@) -> [Int64] indices needle@(Chunk n ns) _haystack@(Chunk k ks) | nlen <= 0 = [] | nlen == 1 = indicesOne (nindex 0) 0 k ks | otherwise = advance k ks 0 0 where advance x@(T.Text _ _ l) xs = scan where scan !g !i | i >= m = case xs of Empty -> [] Chunk y ys -> advance y ys g (i-m) | lackingHay (i + nlen) x xs = [] | c == z && candidateMatch 0 = g : scan (g+nlen) (i+nlen) | otherwise = scan (g+delta) (i+delta) where m = fromIntegral l c = hindex (i + nlast) delta | nextInPattern = nlen + 1 | c == z = skip + 1 | otherwise = 1 nextInPattern = mask .&. swizzle (hindex (i+nlen)) == 0 candidateMatch !j | j >= nlast = True | hindex (i+j) /= nindex j = False | otherwise = candidateMatch (j+1) hindex = index x xs nlen = wordLength needle nlast = nlen - 1 nindex = index n ns z = foldlChunks fin 0 needle where fin _ (T.Text farr foff flen) = A.unsafeIndex farr (foff+flen-1) (mask :: Word64) :*: skip = buildTable n ns 0 0 0 (nlen-2) swizzle w = 1 `shiftL` (fromIntegral w .&. 0x3f) buildTable (T.Text xarr xoff xlen) xs = go where go !(g::Int64) !i !msk !skp | i >= xlast = case xs of Empty -> (msk .|. swizzle z) :*: skp Chunk y ys -> buildTable y ys g 0 msk' skp' | otherwise = go (g+1) (i+1) msk' skp' where c = A.unsafeIndex xarr (xoff+i) msk' = msk .|. swizzle c skp' | c == z = nlen - g - 2 | otherwise = skp xlast = xlen - 1 -- | Check whether an attempt to index into the haystack at the -- given offset would fail. lackingHay q = go 0 where go p (T.Text _ _ l) ps = p' < q && case ps of Empty -> True Chunk r rs -> go p' r rs where p' = p + fromIntegral l indices _ _ = [] -- | Fast index into a partly unpacked 'Text'. We take into account -- the possibility that the caller might try to access one element -- past the end. index :: T.Text -> Text -> Int64 -> Word16 index (T.Text arr off len) xs !i | j < len = A.unsafeIndex arr (off+j) | otherwise = case xs of Empty -- out of bounds, but legal | j == len -> 0 -- should never happen, due to lackingHay above | otherwise -> emptyError "index" Chunk c cs -> index c cs (i-fromIntegral len) where j = fromIntegral i -- | A variant of 'indices' that scans linearly for a single 'Word16'. indicesOne :: Word16 -> Int64 -> T.Text -> Text -> [Int64] indicesOne c = chunk where chunk !i (T.Text oarr ooff olen) os = go 0 where go h | h >= olen = case os of Empty -> [] Chunk y ys -> chunk (i+fromIntegral olen) y ys | on == c = i + fromIntegral h : go (h+1) | otherwise = go (h+1) where on = A.unsafeIndex oarr (ooff+h) -- | The number of 'Word16' values in a 'Text'. wordLength :: Text -> Int64 wordLength = foldlChunks sumLength 0 where sumLength i (T.Text _ _ l) = i + fromIntegral l emptyError :: String -> a emptyError fun = error ("Data.Text.Lazy.Search." ++ fun ++ ": empty input") text-0.11.3.1/Data/Text/Lazy/Builder/0000755000000000000000000000000012140332415015202 5ustar0000000000000000text-0.11.3.1/Data/Text/Lazy/Builder/Functions.hs0000644000000000000000000000134212140332415017506 0ustar0000000000000000{-# LANGUAGE MagicHash #-} -- | -- Module : Data.Text.Lazy.Builder.Functions -- Copyright : (c) 2011 MailRank, Inc. -- -- License : BSD-style -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : GHC -- -- Useful functions and combinators. module Data.Text.Lazy.Builder.Functions ( (<>) , i2d ) where import Data.Monoid (mappend) import Data.Text.Lazy.Builder (Builder) import GHC.Base -- | Unsafe conversion for decimal digits. 
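--
-- The argument must be in the range 0..9, e.g. @i2d 7 == \'7\'@;
-- values outside that range silently yield a meaningless character
-- (hence \"unsafe\").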
{-# INLINE i2d #-} i2d :: Int -> Char i2d (I# i#) = C# (chr# (ord# '0'# +# i#)) -- | The normal 'mappend' function with right associativity instead of -- left. (<>) :: Builder -> Builder -> Builder (<>) = mappend {-# INLINE (<>) #-} infixr 4 <> text-0.11.3.1/Data/Text/Lazy/Builder/Int.hs0000644000000000000000000002135612140332415016277 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, MagicHash, RankNTypes, UnboxedTuples #-} -- Module: Data.Text.Lazy.Builder.Int -- Copyright: (c) 2013 Bryan O'Sullivan -- (c) 2011 MailRank, Inc. -- License: BSD3 -- Maintainer: Bryan O'Sullivan -- Stability: experimental -- Portability: portable -- -- Efficiently write an integral value to a 'Builder'. module Data.Text.Lazy.Builder.Int ( decimal , hexadecimal ) where import Data.Int (Int8, Int16, Int32, Int64) import Data.Monoid (mempty) import qualified Data.ByteString.Unsafe as B import Data.Text.Lazy.Builder.Functions ((<>), i2d) import Data.Text.Lazy.Builder.Internal import Data.Text.Lazy.Builder.Int.Digits (digits) import Data.Text.Array import Data.Word (Word, Word8, Word16, Word32, Word64) import GHC.Base (quotInt, remInt) import GHC.Num (quotRemInteger) import GHC.Types (Int(..)) import Control.Monad.ST #ifdef __GLASGOW_HASKELL__ # if __GLASGOW_HASKELL__ < 611 import GHC.Integer.Internals # elif defined(INTEGER_GMP) import GHC.Integer.GMP.Internals # elif defined(INTEGER_SIMPLE) import GHC.Integer # else # error "You need to use either GMP or integer-simple." # endif #endif #if defined(INTEGER_GMP) || defined(INTEGER_SIMPLE) # define PAIR(a,b) (# a,b #) #else # define PAIR(a,b) (a,b) #endif decimal :: Integral a => a -> Builder {-# SPECIALIZE decimal :: Int8 -> Builder #-} {-# RULES "decimal/Int" decimal = boundedDecimal :: Int -> Builder #-} {-# RULES "decimal/Int16" decimal = boundedDecimal :: Int16 -> Builder #-} {-# RULES "decimal/Int32" decimal = boundedDecimal :: Int32 -> Builder #-} {-# RULES "decimal/Int64" decimal = boundedDecimal :: Int64 -> Builder #-} {-# RULES "decimal/Word" decimal = positive :: Word -> Builder #-} {-# RULES "decimal/Word8" decimal = positive :: Word8 -> Builder #-} {-# RULES "decimal/Word16" decimal = positive :: Word16 -> Builder #-} {-# RULES "decimal/Word32" decimal = positive :: Word32 -> Builder #-} {-# RULES "decimal/Word64" decimal = positive :: Word64 -> Builder #-} {-# RULES "decimal/Integer" decimal = integer 10 :: Integer -> Builder #-} decimal i = decimal' (<= -128) i boundedDecimal :: (Integral a, Bounded a) => a -> Builder {-# SPECIALIZE boundedDecimal :: Int -> Builder #-} {-# SPECIALIZE boundedDecimal :: Int8 -> Builder #-} {-# SPECIALIZE boundedDecimal :: Int16 -> Builder #-} {-# SPECIALIZE boundedDecimal :: Int32 -> Builder #-} {-# SPECIALIZE boundedDecimal :: Int64 -> Builder #-} boundedDecimal i = decimal' (== minBound) i decimal' :: (Integral a) => (a -> Bool) -> a -> Builder {-# INLINE decimal' #-} decimal' p i | i < 0 = if p i then let (q, r) = i `quotRem` 10 qq = -q !n = countDigits qq in writeN (n + 2) $ \marr off -> do unsafeWrite marr off minus posDecimal marr (off+1) n qq unsafeWrite marr (off+n+1) (i2w (-r)) else let j = -i !n = countDigits j in writeN (n + 1) $ \marr off -> unsafeWrite marr off minus >> posDecimal marr (off+1) n j | otherwise = positive i positive :: (Integral a) => a -> Builder {-# SPECIALIZE positive :: Int -> Builder #-} {-# SPECIALIZE positive :: Int8 -> Builder #-} {-# SPECIALIZE positive :: Int16 -> Builder #-} {-# SPECIALIZE positive :: Int32 -> Builder #-} {-# SPECIALIZE positive :: Int64 -> Builder #-} {-# 
SPECIALIZE positive :: Word -> Builder #-} {-# SPECIALIZE positive :: Word8 -> Builder #-} {-# SPECIALIZE positive :: Word16 -> Builder #-} {-# SPECIALIZE positive :: Word32 -> Builder #-} {-# SPECIALIZE positive :: Word64 -> Builder #-} positive i | i < 10 = writeN 1 $ \marr off -> unsafeWrite marr off (i2w i) | otherwise = let !n = countDigits i in writeN n $ \marr off -> posDecimal marr off n i posDecimal :: (Integral a) => forall s. MArray s -> Int -> Int -> a -> ST s () {-# INLINE posDecimal #-} posDecimal marr off0 ds v0 = go (off0 + ds - 1) v0 where go off v | v >= 100 = do let (q, r) = v `quotRem` 100 write2 off r go (off - 2) q | v < 10 = unsafeWrite marr off (i2w v) | otherwise = write2 off v write2 off i0 = do let i = fromIntegral i0; j = i + i unsafeWrite marr off $ get (j + 1) unsafeWrite marr (off - 1) $ get j get = fromIntegral . B.unsafeIndex digits minus, zero :: Word16 {-# INLINE minus #-} {-# INLINE zero #-} minus = 45 zero = 48 i2w :: (Integral a) => a -> Word16 {-# INLINE i2w #-} i2w v = zero + fromIntegral v countDigits :: (Integral a) => a -> Int {-# INLINE countDigits #-} countDigits v0 = go 1 (fromIntegral v0 :: Word64) where go !k v | v < 10 = k | v < 100 = k + 1 | v < 1000 = k + 2 | v < 1000000000000 = k + if v < 100000000 then if v < 1000000 then if v < 10000 then 3 else 4 + fin v 100000 else 6 + fin v 10000000 else if v < 10000000000 then 8 + fin v 1000000000 else 10 + fin v 100000000000 | otherwise = go (k + 12) (v `quot` 1000000000000) fin v n = if v >= n then 1 else 0 hexadecimal :: Integral a => a -> Builder {-# SPECIALIZE hexadecimal :: Int -> Builder #-} {-# SPECIALIZE hexadecimal :: Int8 -> Builder #-} {-# SPECIALIZE hexadecimal :: Int16 -> Builder #-} {-# SPECIALIZE hexadecimal :: Int32 -> Builder #-} {-# SPECIALIZE hexadecimal :: Int64 -> Builder #-} {-# SPECIALIZE hexadecimal :: Word -> Builder #-} {-# SPECIALIZE hexadecimal :: Word8 -> Builder #-} {-# SPECIALIZE hexadecimal :: Word16 -> Builder #-} {-# SPECIALIZE hexadecimal :: Word32 -> Builder #-} {-# SPECIALIZE hexadecimal :: Word64 -> Builder #-} {-# RULES "hexadecimal/Integer" hexadecimal = hexInteger :: Integer -> Builder #-} hexadecimal i | i < 0 = error hexErrMsg | otherwise = go i where go n | n < 16 = hexDigit n | otherwise = go (n `quot` 16) <> hexDigit (n `rem` 16) hexInteger :: Integer -> Builder hexInteger i | i < 0 = error hexErrMsg | otherwise = integer 16 i hexErrMsg :: String hexErrMsg = "Data.Text.Lazy.Builder.Int.hexadecimal: applied to negative number" hexDigit :: Integral a => a -> Builder hexDigit n | n <= 9 = singleton $! i2d (fromIntegral n) | otherwise = singleton $! toEnum (fromIntegral n + 87) {-# INLINE hexDigit #-} int :: Int -> Builder int = decimal {-# INLINE int #-} data T = T !Integer !Int integer :: Int -> Integer -> Builder #ifdef INTEGER_GMP integer 10 (S# i#) = decimal (I# i#) integer 16 (S# i#) = hexadecimal (I# i#) #else integer 10 i = decimal i integer 16 i = hexadecimal i #endif integer base i | i < 0 = singleton '-' <> go (-i) | otherwise = go i where go n | n < maxInt = int (fromInteger n) | otherwise = putH (splitf (maxInt * maxInt) n) splitf p n | p > n = [n] | otherwise = splith p (splitf (p*p) n) splith p (n:ns) = case n `quotRemInteger` p of PAIR(q,r) | q > 0 -> q : r : splitb p ns | otherwise -> r : splitb p ns splith _ _ = error "splith: the impossible happened." splitb p (n:ns) = case n `quotRemInteger` p of PAIR(q,r) -> q : r : splitb p ns splitb _ _ = [] T maxInt10 maxDigits10 = until ((>mi) . (*10) . 
fstT) (\(T n d) -> T (n*10) (d+1)) (T 10 1) where mi = fromIntegral (maxBound :: Int) T maxInt16 maxDigits16 = until ((>mi) . (*16) . fstT) (\(T n d) -> T (n*16) (d+1)) (T 16 1) where mi = fromIntegral (maxBound :: Int) fstT (T a _) = a maxInt | base == 10 = maxInt10 | otherwise = maxInt16 maxDigits | base == 10 = maxDigits10 | otherwise = maxDigits16 putH (n:ns) = case n `quotRemInteger` maxInt of PAIR(x,y) | q > 0 -> int q <> pblock r <> putB ns | otherwise -> int r <> putB ns where q = fromInteger x r = fromInteger y putH _ = error "putH: the impossible happened" putB (n:ns) = case n `quotRemInteger` maxInt of PAIR(x,y) -> pblock q <> pblock r <> putB ns where q = fromInteger x r = fromInteger y putB _ = mempty pblock = loop maxDigits where loop !d !n | d == 1 = hexDigit n | otherwise = loop (d-1) q <> hexDigit r where q = n `quotInt` base r = n `remInt` base text-0.11.3.1/Data/Text/Lazy/Builder/Internal.hs0000644000000000000000000002460712140332415017323 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, Rank2Types #-} ----------------------------------------------------------------------------- -- | -- Module : Data.Text.Lazy.Builder.Internal -- Copyright : (c) 2013 Bryan O'Sullivan -- (c) 2010 Johan Tibell -- License : BSD3-style (see LICENSE) -- -- Maintainer : Johan Tibell -- Stability : experimental -- Portability : portable to Hugs and GHC -- -- Efficient construction of lazy @Text@ values. The principal -- operations on a @Builder@ are @singleton@, @fromText@, and -- @fromLazyText@, which construct new builders, and 'mappend', which -- concatenates two builders. -- -- To get maximum performance when building lazy @Text@ values using a -- builder, associate @mappend@ calls to the right. For example, -- prefer -- -- > singleton 'a' `mappend` (singleton 'b' `mappend` singleton 'c') -- -- to -- -- > singleton 'a' `mappend` singleton 'b' `mappend` singleton 'c' -- -- as the latter associates @mappend@ to the left. -- ----------------------------------------------------------------------------- module Data.Text.Lazy.Builder.Internal ( -- * Public API -- ** The Builder type Builder , toLazyText , toLazyTextWith -- ** Constructing Builders , singleton , fromText , fromLazyText , fromString -- ** Flushing the buffer state , flush -- * Internal functions , append' , ensureFree , writeN ) where import Control.Monad.ST (ST, runST) import Data.Bits ((.&.)) import Data.Monoid (Monoid(..)) import Data.Text.Internal (Text(..)) import Data.Text.Lazy.Internal (smallChunkSize) import Data.Text.Unsafe (inlineInterleaveST) import Data.Text.UnsafeChar (ord, unsafeWrite) import Data.Text.UnsafeShift (shiftR) import Prelude hiding (map, putChar) import qualified Data.String as String import qualified Data.Text as S import qualified Data.Text.Array as A import qualified Data.Text.Lazy as L ------------------------------------------------------------------------ -- | A @Builder@ is an efficient way to build lazy @Text@ values. -- There are several functions for constructing builders, but only one -- to inspect them: to extract any data, you have to turn them into -- lazy @Text@ values using @toLazyText@. -- -- Internally, a builder constructs a lazy @Text@ by filling arrays -- piece by piece. As each buffer is filled, it is \'popped\' off, to -- become a new chunk of the resulting lazy @Text@. All this is -- hidden from the user of the @Builder@. newtype Builder = Builder { -- Invariant (from Data.Text.Lazy): -- The lists include no null Texts. runBuilder :: forall s. 
(Buffer s -> ST s [S.Text]) -> Buffer s -> ST s [S.Text] } instance Monoid Builder where mempty = empty {-# INLINE mempty #-} mappend = append {-# INLINE mappend #-} mconcat = foldr mappend mempty {-# INLINE mconcat #-} instance String.IsString Builder where fromString = fromString {-# INLINE fromString #-} instance Show Builder where show = show . toLazyText instance Eq Builder where a == b = toLazyText a == toLazyText b instance Ord Builder where a <= b = toLazyText a <= toLazyText b ------------------------------------------------------------------------ -- | /O(1)./ The empty @Builder@, satisfying -- -- * @'toLazyText' 'empty' = 'L.empty'@ -- empty :: Builder empty = Builder (\ k buf -> k buf) {-# INLINE empty #-} -- | /O(1)./ A @Builder@ taking a single character, satisfying -- -- * @'toLazyText' ('singleton' c) = 'L.singleton' c@ -- singleton :: Char -> Builder singleton c = writeAtMost 2 $ \ marr o -> if n < 0x10000 then A.unsafeWrite marr o (fromIntegral n) >> return 1 else do A.unsafeWrite marr o lo A.unsafeWrite marr (o+1) hi return 2 where n = ord c m = n - 0x10000 lo = fromIntegral $ (m `shiftR` 10) + 0xD800 hi = fromIntegral $ (m .&. 0x3FF) + 0xDC00 {-# INLINE singleton #-} ------------------------------------------------------------------------ -- | /O(1)./ The concatenation of two builders, an associative -- operation with identity 'empty', satisfying -- -- * @'toLazyText' ('append' x y) = 'L.append' ('toLazyText' x) ('toLazyText' y)@ -- append :: Builder -> Builder -> Builder append (Builder f) (Builder g) = Builder (f . g) {-# INLINE [0] append #-} -- TODO: Experiment to find the right threshold. copyLimit :: Int copyLimit = 128 -- This function attempts to merge small @Text@ values instead of -- treating each value as its own chunk. We may not always want this. -- | /O(1)./ A @Builder@ taking a 'S.Text', satisfying -- -- * @'toLazyText' ('fromText' t) = 'L.fromChunks' [t]@ -- fromText :: S.Text -> Builder fromText t@(Text arr off l) | S.null t = empty | l <= copyLimit = writeN l $ \marr o -> A.copyI marr o arr off (l+o) | otherwise = flush `append` mapBuilder (t :) {-# INLINE [1] fromText #-} {-# RULES "fromText/pack" forall s . fromText (S.pack s) = fromString s #-} -- | /O(1)./ A Builder taking a @String@, satisfying -- -- * @'toLazyText' ('fromString' s) = 'L.fromChunks' [S.pack s]@ -- fromString :: String -> Builder fromString str = Builder $ \k (Buffer p0 o0 u0 l0) -> let loop !marr !o !u !l [] = k (Buffer marr o u l) loop marr o u l s@(c:cs) | l <= 1 = do arr <- A.unsafeFreeze marr let !t = Text arr o u marr' <- A.new chunkSize ts <- inlineInterleaveST (loop marr' 0 0 chunkSize s) return $ t : ts | otherwise = do n <- unsafeWrite marr (o+u) c loop marr o (u+n) (l-n) cs in loop p0 o0 u0 l0 str where chunkSize = smallChunkSize {-# INLINE fromString #-} -- | /O(1)./ A @Builder@ taking a lazy @Text@, satisfying -- -- * @'toLazyText' ('fromLazyText' t) = t@ -- fromLazyText :: L.Text -> Builder fromLazyText ts = flush `append` mapBuilder (L.toChunks ts ++) {-# INLINE fromLazyText #-} ------------------------------------------------------------------------ -- Our internal buffer type data Buffer s = Buffer {-# UNPACK #-} !(A.MArray s) {-# UNPACK #-} !Int -- offset {-# UNPACK #-} !Int -- used units {-# UNPACK #-} !Int -- length left ------------------------------------------------------------------------ -- | /O(n)./ Extract a lazy @Text@ from a @Builder@ with a default -- buffer size. 
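-- (An illustrative sketch, editorial rather than part of the original
-- haddocks; it uses 'fromString' and 'singleton' from this module and
-- the qualified import @L@ for Data.Text.Lazy:)
--
-- > toLazyText (fromString "foo" `mappend` singleton '!') == L.pack "foo!"
--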
The construction work takes place if and when the -- relevant part of the lazy @Text@ is demanded. toLazyText :: Builder -> L.Text toLazyText = toLazyTextWith smallChunkSize -- | /O(n)/ Extract a lazy @Text@ from a @Builder@, using the given -- size for the initial buffer. The construction work takes place if -- and when the relevant part of the lazy @Text@ is demanded. -- -- If the initial buffer is too small to hold all data, subsequent -- buffers will be the default buffer size. toLazyTextWith :: Int -> Builder -> L.Text toLazyTextWith chunkSize m = L.fromChunks (runST $ newBuffer chunkSize >>= runBuilder (m `append` flush) (const (return []))) -- | /O(1)./ Pop the strict @Text@ we have constructed so far, if any, -- yielding a new chunk in the result lazy @Text@. flush :: Builder flush = Builder $ \ k buf@(Buffer p o u l) -> if u == 0 then k buf else do arr <- A.unsafeFreeze p let !b = Buffer p (o+u) 0 l !t = Text arr o u ts <- inlineInterleaveST (k b) return $! t : ts ------------------------------------------------------------------------ -- | Sequence an ST operation on the buffer withBuffer :: (forall s. Buffer s -> ST s (Buffer s)) -> Builder withBuffer f = Builder $ \k buf -> f buf >>= k {-# INLINE withBuffer #-} -- | Get the size of the buffer withSize :: (Int -> Builder) -> Builder withSize f = Builder $ \ k buf@(Buffer _ _ _ l) -> runBuilder (f l) k buf {-# INLINE withSize #-} -- | Map the resulting list of texts. mapBuilder :: ([S.Text] -> [S.Text]) -> Builder mapBuilder f = Builder (fmap f .) ------------------------------------------------------------------------ -- | Ensure that there are at least @n@ elements available. ensureFree :: Int -> Builder ensureFree !n = withSize $ \ l -> if n <= l then empty else flush `append'` withBuffer (const (newBuffer (max n smallChunkSize))) {-# INLINE [0] ensureFree #-} writeAtMost :: Int -> (forall s. A.MArray s -> Int -> ST s Int) -> Builder writeAtMost n f = ensureFree n `append'` withBuffer (writeBuffer f) {-# INLINE [0] writeAtMost #-} -- | Ensure that @n@ elements are available, and then use @f@ to -- write some elements into the memory. writeN :: Int -> (forall s. A.MArray s -> Int -> ST s ()) -> Builder writeN n f = writeAtMost n (\ p o -> f p o >> return n) {-# INLINE writeN #-} writeBuffer :: (A.MArray s -> Int -> ST s Int) -> Buffer s -> ST s (Buffer s) writeBuffer f (Buffer p o u l) = do n <- f p (o+u) return $! Buffer p o (u+n) (l-n) {-# INLINE writeBuffer #-} newBuffer :: Int -> ST s (Buffer s) newBuffer size = do arr <- A.new size return $! Buffer arr 0 0 size {-# INLINE newBuffer #-} ------------------------------------------------------------------------ -- Some nice rules for Builder -- This function makes GHC understand that 'writeN' and 'ensureFree' -- are *not* recursive in the presence of the rewrite rules below. -- This is not needed with GHC 7+. append' :: Builder -> Builder -> Builder append' (Builder f) (Builder g) = Builder (f . g) {-# INLINE append' #-} {-# RULES "append/writeAtMost" forall a b (f::forall s. A.MArray s -> Int -> ST s Int) (g::forall s. A.MArray s -> Int -> ST s Int) ws. append (writeAtMost a f) (append (writeAtMost b g) ws) = append (writeAtMost (a+b) (\marr o -> f marr o >>= \ n -> g marr (o+n) >>= \ m -> let s = n+m in s `seq` return s)) ws "writeAtMost/writeAtMost" forall a b (f::forall s. A.MArray s -> Int -> ST s Int) (g::forall s. A.MArray s -> Int -> ST s Int). 
append (writeAtMost a f) (writeAtMost b g) = writeAtMost (a+b) (\marr o -> f marr o >>= \ n -> g marr (o+n) >>= \ m -> let s = n+m in s `seq` return s) "ensureFree/ensureFree" forall a b . append (ensureFree a) (ensureFree b) = ensureFree (max a b) "flush/flush" append flush flush = flush #-} text-0.11.3.1/Data/Text/Lazy/Builder/RealFloat.hs0000644000000000000000000002025012140332415017406 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} -- | -- Module: Data.Text.Lazy.Builder.RealFloat -- Copyright: (c) The University of Glasgow 1994-2002 -- License: see libraries/base/LICENSE -- -- Write a floating point value to a 'Builder'. module Data.Text.Lazy.Builder.RealFloat ( FPFormat(..) , realFloat , formatRealFloat ) where import Data.Array.Base (unsafeAt) import Data.Array.IArray import Data.Text.Lazy.Builder.Functions ((<>), i2d) import Data.Text.Lazy.Builder.Int (decimal) import Data.Text.Lazy.Builder.RealFloat.Functions (roundTo) import Data.Text.Lazy.Builder import qualified Data.Text as T -- | Control the rendering of floating point numbers. data FPFormat = Exponent -- ^ Scientific notation (e.g. @2.3e123@). | Fixed -- ^ Standard decimal notation. | Generic -- ^ Use decimal notation for values between @0.1@ and -- @9,999,999@, and scientific notation otherwise. deriving (Enum, Read, Show) -- | Show a signed 'RealFloat' value to full precision, -- using standard decimal notation for arguments whose absolute value lies -- between @0.1@ and @9,999,999@, and scientific notation otherwise. realFloat :: (RealFloat a) => a -> Builder {-# SPECIALIZE realFloat :: Float -> Builder #-} {-# SPECIALIZE realFloat :: Double -> Builder #-} realFloat x = formatRealFloat Generic Nothing x formatRealFloat :: (RealFloat a) => FPFormat -> Maybe Int -- ^ Number of decimal places to render. -> a -> Builder {-# SPECIALIZE formatRealFloat :: FPFormat -> Maybe Int -> Float -> Builder #-} {-# SPECIALIZE formatRealFloat :: FPFormat -> Maybe Int -> Double -> Builder #-} formatRealFloat fmt decs x | isNaN x = "NaN" | isInfinite x = if x < 0 then "-Infinity" else "Infinity" | x < 0 || isNegativeZero x = singleton '-' <> doFmt fmt (floatToDigits (-x)) | otherwise = doFmt fmt (floatToDigits x) where doFmt format (is, e) = let ds = map i2d is in case format of Generic -> doFmt (if e < 0 || e > 7 then Exponent else Fixed) (is,e) Exponent -> case decs of Nothing -> let show_e' = decimal (e-1) in case ds of "0" -> "0.0e0" [d] -> singleton d <> ".0e" <> show_e' (d:ds') -> singleton d <> singleton '.' <> fromString ds' <> singleton 'e' <> show_e' [] -> error "formatRealFloat/doFmt/Exponent: []" Just dec -> let dec' = max dec 1 in case is of [0] -> "0." <> fromText (T.replicate dec' "0") <> "e0" _ -> let (ei,is') = roundTo (dec'+1) is (d:ds') = map i2d (if ei > 0 then init is' else is') in singleton d <> singleton '.' <> fromString ds' <> singleton 'e' <> decimal (e-1+ei) Fixed -> let mk0 ls = case ls of { "" -> "0" ; _ -> fromString ls} in case decs of Nothing | e <= 0 -> "0." <> fromText (T.replicate (-e) "0") <> fromString ds | otherwise -> let f 0 s rs = mk0 (reverse s) <> singleton '.' <> mk0 rs f n s "" = f (n-1) ('0':s) "" f n s (r:rs) = f (n-1) (r:s) rs in f e "" ds Just dec -> let dec' = max dec 0 in if e >= 0 then let (ei,is') = roundTo (dec' + e) is (ls,rs) = splitAt (e+ei) (map i2d is') in mk0 ls <> (if null rs then "" else singleton '.' 
<> fromString rs) else let (ei,is') = roundTo dec' (replicate (-e) 0 ++ is) d:ds' = map i2d (if ei > 0 then is' else 0:is') in singleton d <> (if null ds' then "" else singleton '.' <> fromString ds') -- Based on "Printing Floating-Point Numbers Quickly and Accurately" -- by R.G. Burger and R.K. Dybvig in PLDI 96. -- This version uses a much slower logarithm estimator. It should be improved. -- | 'floatToDigits' takes a base and a non-negative 'RealFloat' number, -- and returns a list of digits and an exponent. -- In particular, if @x>=0@, and -- -- > floatToDigits base x = ([d1,d2,...,dn], e) -- -- then -- -- (1) @n >= 1@ -- -- (2) @x = 0.d1d2...dn * (base**e)@ -- -- (3) @0 <= di <= base-1@ floatToDigits :: (RealFloat a) => a -> ([Int], Int) {-# SPECIALIZE floatToDigits :: Float -> ([Int], Int) #-} {-# SPECIALIZE floatToDigits :: Double -> ([Int], Int) #-} floatToDigits 0 = ([0], 0) floatToDigits x = let (f0, e0) = decodeFloat x (minExp0, _) = floatRange x p = floatDigits x b = floatRadix x minExp = minExp0 - p -- the real minimum exponent -- Haskell requires that f be adjusted so denormalized numbers -- will have an impossibly low exponent. Adjust for this. (f, e) = let n = minExp - e0 in if n > 0 then (f0 `quot` (expt b n), e0+n) else (f0, e0) (r, s, mUp, mDn) = if e >= 0 then let be = expt b e in if f == expt b (p-1) then (f*be*b*2, 2*b, be*b, be) -- according to Burger and Dybvig else (f*be*2, 2, be, be) else if e > minExp && f == expt b (p-1) then (f*b*2, expt b (-e+1)*2, b, 1) else (f*2, expt b (-e)*2, 1, 1) k :: Int k = let k0 :: Int k0 = if b == 2 then -- logBase 10 2 is very slightly larger than 8651/28738 -- (about 5.3558e-10), so if log x >= 0, the approximation -- k1 is too small, hence we add one and need one fixup step less. -- If log x < 0, the approximation errs rather on the high side. -- That is usually more than compensated for by ignoring the -- fractional part of logBase 2 x, but when x is a power of 1/2 -- or slightly larger and the exponent is a multiple of the -- denominator of the rational approximation to logBase 10 2, -- k1 is larger than logBase 10 x. If k1 > 1 + logBase 10 x, -- we get a leading zero-digit we don't want. -- With the approximation 3/10, this happened for -- 0.5^1030, 0.5^1040, ..., 0.5^1070 and values close above. -- The approximation 8651/28738 guarantees k1 < 1 + logBase 10 x -- for IEEE-ish floating point types with exponent fields -- <= 17 bits and mantissae of several thousand bits, earlier -- convergents to logBase 10 2 would fail for long double. -- Using quot instead of div is a little faster and requires -- fewer fixup steps for negative lx. 
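-- As an editorial sanity check (not part of the original comment):
-- logBase 10 2 ~ 0.3010299957 while 8651/28738 ~ 0.3010299951, an
-- underestimate of roughly 5.4e-10, which matches the figure quoted
-- above. Worked through for the Double 1.0: decodeFloat gives
-- e0 = -52 and p = 53, so below lx = 0 and k1 = 0; since lx >= 0 the
-- result is k0 = 1, i.e. 1.0 renders as digits [1] scaled by 10^1.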
let lx = p - 1 + e0 k1 = (lx * 8651) `quot` 28738 in if lx >= 0 then k1 + 1 else k1 else -- f :: Integer, log :: Float -> Float, -- ceiling :: Float -> Int ceiling ((log (fromInteger (f+1) :: Float) + fromIntegral e * log (fromInteger b)) / log 10) --WAS: fromInt e * log (fromInteger b)) fixup n = if n >= 0 then if r + mUp <= expt 10 n * s then n else fixup (n+1) else if expt 10 (-n) * (r + mUp) <= s then n else fixup (n+1) in fixup k0 gen ds rn sN mUpN mDnN = let (dn, rn') = (rn * 10) `quotRem` sN mUpN' = mUpN * 10 mDnN' = mDnN * 10 in case (rn' < mDnN', rn' + mUpN' > sN) of (True, False) -> dn : ds (False, True) -> dn+1 : ds (True, True) -> if rn' * 2 < sN then dn : ds else dn+1 : ds (False, False) -> gen (dn:ds) rn' sN mUpN' mDnN' rds = if k >= 0 then gen [] r (s * expt 10 k) mUp mDn else let bk = expt 10 (-k) in gen [] (r * bk) s (mUp * bk) (mDn * bk) in (map fromIntegral (reverse rds), k) -- Exponentiation with a cache for the most common numbers. minExpt, maxExpt :: Int minExpt = 0 maxExpt = 1100 expt :: Integer -> Int -> Integer expt base n | base == 2 && n >= minExpt && n <= maxExpt = expts `unsafeAt` n | base == 10 && n <= maxExpt10 = expts10 `unsafeAt` n | otherwise = base^n expts :: Array Int Integer expts = array (minExpt,maxExpt) [(n,2^n) | n <- [minExpt .. maxExpt]] maxExpt10 :: Int maxExpt10 = 324 expts10 :: Array Int Integer expts10 = array (minExpt,maxExpt10) [(n,10^n) | n <- [minExpt .. maxExpt10]] text-0.11.3.1/Data/Text/Lazy/Builder/Int/0000755000000000000000000000000012140332415015734 5ustar0000000000000000text-0.11.3.1/Data/Text/Lazy/Builder/Int/Digits.hs0000644000000000000000000000141512140332415017514 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} -- Module: Data.Text.Lazy.Builder.Int.Digits -- Copyright: (c) 2013 Bryan O'Sullivan -- License: BSD3 -- Maintainer: Bryan O'Sullivan -- Stability: experimental -- Portability: portable -- -- This module exists because the C preprocessor does things that we -- shall not speak of when confronted with Haskell multiline strings. 
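-- An editorial sketch of how the table below is consumed ('pairAt' is
-- a hypothetical helper; 'posDecimal' in Data.Text.Lazy.Builder.Int
-- does the equivalent with unchecked indexing):
--
-- > import qualified Data.ByteString.Char8 as B8
-- >
-- > -- The two decimal digits of n, valid for 0 <= n <= 99.
-- > pairAt :: Int -> (Char, Char)
-- > pairAt n = (B8.index digits (2*n), B8.index digits (2*n + 1))
-- >
-- > -- pairAt 57 == ('5','7')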
module Data.Text.Lazy.Builder.Int.Digits (digits) where import Data.ByteString.Char8 (ByteString) digits :: ByteString digits = "0001020304050607080910111213141516171819\ \2021222324252627282930313233343536373839\ \4041424344454647484950515253545556575859\ \6061626364656667686970717273747576777879\ \8081828384858687888990919293949596979899" text-0.11.3.1/Data/Text/Lazy/Builder/RealFloat/0000755000000000000000000000000012140332415017053 5ustar0000000000000000text-0.11.3.1/Data/Text/Lazy/Builder/RealFloat/Functions.hs0000644000000000000000000000114512140332415021360 0ustar0000000000000000-- | -- Module: Data.Text.Lazy.Builder.RealFloat.Functions -- Copyright: (c) The University of Glasgow 1994-2002 -- License: see libraries/base/LICENSE module Data.Text.Lazy.Builder.RealFloat.Functions ( roundTo ) where roundTo :: Int -> [Int] -> (Int,[Int]) roundTo d is = case f d is of x@(0,_) -> x (1,xs) -> (1, 1:xs) _ -> error "roundTo: bad Value" where f n [] = (0, replicate n 0) f 0 (x:_) = (if x >= 5 then 1 else 0, []) f n (i:xs) | i' == 10 = (1,0:ds) | otherwise = (0,i':ds) where (c,ds) = f (n-1) xs i' = c + i text-0.11.3.1/Data/Text/Lazy/Encoding/0000755000000000000000000000000012140332415015342 5ustar0000000000000000text-0.11.3.1/Data/Text/Lazy/Encoding/Fusion.hs0000644000000000000000000003200012140332415017134 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, Rank2Types #-} -- | -- Module : Data.Text.Lazy.Encoding.Fusion -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- Fusible 'Stream'-oriented functions for converting between lazy -- 'Text' and several common encodings. module Data.Text.Lazy.Encoding.Fusion ( -- * Streaming -- streamASCII streamUtf8 , streamUtf16LE , streamUtf16BE , streamUtf32LE , streamUtf32BE -- * Unstreaming , unstream , module Data.Text.Encoding.Fusion.Common ) where import Data.ByteString.Lazy.Internal (ByteString(..), defaultChunkSize) import qualified Data.ByteString as B import qualified Data.ByteString.Unsafe as B import Data.Text.Encoding.Fusion.Common import Data.Text.Encoding.Error import Data.Text.Fusion (Step(..), Stream(..)) import Data.Text.Fusion.Size import Data.Text.UnsafeChar (unsafeChr, unsafeChr8, unsafeChr32) import Data.Text.UnsafeShift (shiftL) import Data.Word (Word8, Word16, Word32) import qualified Data.Text.Encoding.Utf8 as U8 import qualified Data.Text.Encoding.Utf16 as U16 import qualified Data.Text.Encoding.Utf32 as U32 import Data.Text.Unsafe (unsafeDupablePerformIO) import Foreign.ForeignPtr (withForeignPtr, ForeignPtr) import Foreign.Storable (pokeByteOff) import Data.ByteString.Internal (mallocByteString, memcpy) #if defined(ASSERTS) import Control.Exception (assert) #endif import qualified Data.ByteString.Internal as B data S = S0 | S1 {-# UNPACK #-} !Word8 | S2 {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 | S3 {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 | S4 {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 {-# UNPACK #-} !Word8 data T = T !ByteString !S {-# UNPACK #-} !Int -- | /O(n)/ Convert a lazy 'ByteString' into a 'Stream Char', using -- UTF-8 encoding. 
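-- An illustrative sketch (editorial, not from the original haddocks):
-- thanks to the incremental state 'S' above, a multi-byte sequence may
-- straddle a chunk boundary, e.g. the two UTF-8 bytes of U+00E9:
--
-- > import qualified Data.ByteString as B
-- > import qualified Data.ByteString.Lazy as BL
-- > import Data.Text.Encoding.Error (lenientDecode)
-- >
-- > streamUtf8 lenientDecode (BL.fromChunks [B.pack [0xc3], B.pack [0xa9]])
-- > -- yields exactly the character '\xe9'
--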
streamUtf8 :: OnDecodeError -> ByteString -> Stream Char streamUtf8 onErr bs0 = Stream next (T bs0 S0 0) unknownSize where next (T bs@(Chunk ps _) S0 i) | i < len && U8.validate1 a = Yield (unsafeChr8 a) (T bs S0 (i+1)) | i + 1 < len && U8.validate2 a b = Yield (U8.chr2 a b) (T bs S0 (i+2)) | i + 2 < len && U8.validate3 a b c = Yield (U8.chr3 a b c) (T bs S0 (i+3)) | i + 3 < len && U8.validate4 a b c d = Yield (U8.chr4 a b c d) (T bs S0 (i+4)) where len = B.length ps a = B.unsafeIndex ps i b = B.unsafeIndex ps (i+1) c = B.unsafeIndex ps (i+2) d = B.unsafeIndex ps (i+3) next st@(T bs s i) = case s of S1 a | U8.validate1 a -> Yield (unsafeChr8 a) es S2 a b | U8.validate2 a b -> Yield (U8.chr2 a b) es S3 a b c | U8.validate3 a b c -> Yield (U8.chr3 a b c) es S4 a b c d | U8.validate4 a b c d -> Yield (U8.chr4 a b c d) es _ -> consume st where es = T bs S0 i consume (T bs@(Chunk ps rest) s i) | i >= B.length ps = consume (T rest s 0) | otherwise = case s of S0 -> next (T bs (S1 x) (i+1)) S1 a -> next (T bs (S2 a x) (i+1)) S2 a b -> next (T bs (S3 a b x) (i+1)) S3 a b c -> next (T bs (S4 a b c x) (i+1)) S4 a b c d -> decodeError "streamUtf8" "UTF-8" onErr (Just a) (T bs (S3 b c d) (i+1)) where x = B.unsafeIndex ps i consume (T Empty S0 _) = Done consume st = decodeError "streamUtf8" "UTF-8" onErr Nothing st {-# INLINE [0] streamUtf8 #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using little -- endian UTF-16 encoding. streamUtf16LE :: OnDecodeError -> ByteString -> Stream Char streamUtf16LE onErr bs0 = Stream next (T bs0 S0 0) unknownSize where next (T bs@(Chunk ps _) S0 i) | i + 1 < len && U16.validate1 x1 = Yield (unsafeChr x1) (T bs S0 (i+2)) | i + 3 < len && U16.validate2 x1 x2 = Yield (U16.chr2 x1 x2) (T bs S0 (i+4)) where len = B.length ps x1 = c (idx i) (idx (i + 1)) x2 = c (idx (i + 2)) (idx (i + 3)) c w1 w2 = w1 + (w2 `shiftL` 8) idx = fromIntegral . B.unsafeIndex ps :: Int -> Word16 next st@(T bs s i) = case s of S2 w1 w2 | U16.validate1 (c w1 w2) -> Yield (unsafeChr (c w1 w2)) es S4 w1 w2 w3 w4 | U16.validate2 (c w1 w2) (c w3 w4) -> Yield (U16.chr2 (c w1 w2) (c w3 w4)) es _ -> consume st where es = T bs S0 i c :: Word8 -> Word8 -> Word16 c w1 w2 = fromIntegral w1 + (fromIntegral w2 `shiftL` 8) consume (T bs@(Chunk ps rest) s i) | i >= B.length ps = consume (T rest s 0) | otherwise = case s of S0 -> next (T bs (S1 x) (i+1)) S1 w1 -> next (T bs (S2 w1 x) (i+1)) S2 w1 w2 -> next (T bs (S3 w1 w2 x) (i+1)) S3 w1 w2 w3 -> next (T bs (S4 w1 w2 w3 x) (i+1)) S4 w1 w2 w3 w4 -> decodeError "streamUtf16LE" "UTF-16LE" onErr (Just w1) (T bs (S3 w2 w3 w4) (i+1)) where x = B.unsafeIndex ps i consume (T Empty S0 _) = Done consume st = decodeError "streamUtf16LE" "UTF-16LE" onErr Nothing st {-# INLINE [0] streamUtf16LE #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using big -- endian UTF-16 encoding. streamUtf16BE :: OnDecodeError -> ByteString -> Stream Char streamUtf16BE onErr bs0 = Stream next (T bs0 S0 0) unknownSize where next (T bs@(Chunk ps _) S0 i) | i + 1 < len && U16.validate1 x1 = Yield (unsafeChr x1) (T bs S0 (i+2)) | i + 3 < len && U16.validate2 x1 x2 = Yield (U16.chr2 x1 x2) (T bs S0 (i+4)) where len = B.length ps x1 = c (idx i) (idx (i + 1)) x2 = c (idx (i + 2)) (idx (i + 3)) c w1 w2 = (w1 `shiftL` 8) + w2 idx = fromIntegral . 
B.unsafeIndex ps :: Int -> Word16 next st@(T bs s i) = case s of S2 w1 w2 | U16.validate1 (c w1 w2) -> Yield (unsafeChr (c w1 w2)) es S4 w1 w2 w3 w4 | U16.validate2 (c w1 w2) (c w3 w4) -> Yield (U16.chr2 (c w1 w2) (c w3 w4)) es _ -> consume st where es = T bs S0 i c :: Word8 -> Word8 -> Word16 c w1 w2 = (fromIntegral w1 `shiftL` 8) + fromIntegral w2 consume (T bs@(Chunk ps rest) s i) | i >= B.length ps = consume (T rest s 0) | otherwise = case s of S0 -> next (T bs (S1 x) (i+1)) S1 w1 -> next (T bs (S2 w1 x) (i+1)) S2 w1 w2 -> next (T bs (S3 w1 w2 x) (i+1)) S3 w1 w2 w3 -> next (T bs (S4 w1 w2 w3 x) (i+1)) S4 w1 w2 w3 w4 -> decodeError "streamUtf16BE" "UTF-16BE" onErr (Just w1) (T bs (S3 w2 w3 w4) (i+1)) where x = B.unsafeIndex ps i consume (T Empty S0 _) = Done consume st = decodeError "streamUtf16BE" "UTF-16BE" onErr Nothing st {-# INLINE [0] streamUtf16BE #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using big -- endian UTF-32 encoding. streamUtf32BE :: OnDecodeError -> ByteString -> Stream Char streamUtf32BE onErr bs0 = Stream next (T bs0 S0 0) unknownSize where next (T bs@(Chunk ps _) S0 i) | i + 3 < len && U32.validate x = Yield (unsafeChr32 x) (T bs S0 (i+4)) where len = B.length ps x = shiftL x1 24 + shiftL x2 16 + shiftL x3 8 + x4 x1 = idx i x2 = idx (i+1) x3 = idx (i+2) x4 = idx (i+3) idx = fromIntegral . B.unsafeIndex ps :: Int -> Word32 next st@(T bs s i) = case s of S4 w1 w2 w3 w4 | U32.validate (c w1 w2 w3 w4) -> Yield (unsafeChr32 (c w1 w2 w3 w4)) es _ -> consume st where es = T bs S0 i c :: Word8 -> Word8 -> Word8 -> Word8 -> Word32 c w1 w2 w3 w4 = shifted where shifted = shiftL x1 24 + shiftL x2 16 + shiftL x3 8 + x4 x1 = fromIntegral w1 x2 = fromIntegral w2 x3 = fromIntegral w3 x4 = fromIntegral w4 consume (T bs@(Chunk ps rest) s i) | i >= B.length ps = consume (T rest s 0) | otherwise = case s of S0 -> next (T bs (S1 x) (i+1)) S1 w1 -> next (T bs (S2 w1 x) (i+1)) S2 w1 w2 -> next (T bs (S3 w1 w2 x) (i+1)) S3 w1 w2 w3 -> next (T bs (S4 w1 w2 w3 x) (i+1)) S4 w1 w2 w3 w4 -> decodeError "streamUtf32BE" "UTF-32BE" onErr (Just w1) (T bs (S3 w2 w3 w4) (i+1)) where x = B.unsafeIndex ps i consume (T Empty S0 _) = Done consume st = decodeError "streamUtf32BE" "UTF-32BE" onErr Nothing st {-# INLINE [0] streamUtf32BE #-} -- | /O(n)/ Convert a 'ByteString' into a 'Stream Char', using little -- endian UTF-32 encoding. streamUtf32LE :: OnDecodeError -> ByteString -> Stream Char streamUtf32LE onErr bs0 = Stream next (T bs0 S0 0) unknownSize where next (T bs@(Chunk ps _) S0 i) | i + 3 < len && U32.validate x = Yield (unsafeChr32 x) (T bs S0 (i+4)) where len = B.length ps x = shiftL x4 24 + shiftL x3 16 + shiftL x2 8 + x1 x1 = idx i x2 = idx (i+1) x3 = idx (i+2) x4 = idx (i+3) idx = fromIntegral . 
B.unsafeIndex ps :: Int -> Word32 next st@(T bs s i) = case s of S4 w1 w2 w3 w4 | U32.validate (c w1 w2 w3 w4) -> Yield (unsafeChr32 (c w1 w2 w3 w4)) es _ -> consume st where es = T bs S0 i c :: Word8 -> Word8 -> Word8 -> Word8 -> Word32 c w1 w2 w3 w4 = shifted where shifted = shiftL x4 24 + shiftL x3 16 + shiftL x2 8 + x1 x1 = fromIntegral w1 x2 = fromIntegral w2 x3 = fromIntegral w3 x4 = fromIntegral w4 consume (T bs@(Chunk ps rest) s i) | i >= B.length ps = consume (T rest s 0) | otherwise = case s of S0 -> next (T bs (S1 x) (i+1)) S1 w1 -> next (T bs (S2 w1 x) (i+1)) S2 w1 w2 -> next (T bs (S3 w1 w2 x) (i+1)) S3 w1 w2 w3 -> next (T bs (S4 w1 w2 w3 x) (i+1)) S4 w1 w2 w3 w4 -> decodeError "streamUtf32LE" "UTF-32LE" onErr (Just w1) (T bs (S3 w2 w3 w4) (i+1)) where x = B.unsafeIndex ps i consume (T Empty S0 _) = Done consume st = decodeError "streamUtf32LE" "UTF-32LE" onErr Nothing st {-# INLINE [0] streamUtf32LE #-} -- | /O(n)/ Convert a 'Stream' 'Word8' to a lazy 'ByteString'. unstreamChunks :: Int -> Stream Word8 -> ByteString unstreamChunks chunkSize (Stream next s0 len0) = chunk s0 (upperBound 4 len0) where chunk s1 len1 = unsafeDupablePerformIO $ do let len = max 4 (min len1 chunkSize) mallocByteString len >>= loop len 0 s1 where loop !n !off !s fp = case next s of Done | off == 0 -> return Empty | otherwise -> return $! Chunk (trimUp fp off) Empty Skip s' -> loop n off s' fp Yield x s' | off == chunkSize -> do let !newLen = n - off return $! Chunk (trimUp fp off) (chunk s newLen) | off == n -> realloc fp n off s' x | otherwise -> do withForeignPtr fp $ \p -> pokeByteOff p off x loop n (off+1) s' fp {-# NOINLINE realloc #-} realloc fp n off s x = do let n' = min (n+n) chunkSize fp' <- copy0 fp n n' withForeignPtr fp' $ \p -> pokeByteOff p off x loop n' (off+1) s fp' trimUp fp off = B.PS fp 0 off copy0 :: ForeignPtr Word8 -> Int -> Int -> IO (ForeignPtr Word8) copy0 !src !srcLen !destLen = #if defined(ASSERTS) assert (srcLen <= destLen) $ #endif do dest <- mallocByteString destLen withForeignPtr src $ \src' -> withForeignPtr dest $ \dest' -> memcpy dest' src' (fromIntegral srcLen) return dest -- | /O(n)/ Convert a 'Stream' 'Word8' to a lazy 'ByteString'. unstream :: Stream Word8 -> ByteString unstream = unstreamChunks defaultChunkSize decodeError :: forall s. String -> String -> OnDecodeError -> Maybe Word8 -> s -> Step s Char decodeError func kind onErr mb i = case onErr desc mb of Nothing -> Skip i Just c -> Yield c i where desc = "Data.Text.Lazy.Encoding.Fusion." ++ func ++ ": Invalid " ++ kind ++ " stream" text-0.11.3.1/Data/Text/Unsafe/0000755000000000000000000000000012140332415014116 5ustar0000000000000000text-0.11.3.1/Data/Text/Unsafe/Base.hs0000644000000000000000000000334312140332415015327 0ustar0000000000000000{-# LANGUAGE CPP, MagicHash, UnboxedTuples #-} -- | -- Module : Data.Text.Unsafe.Base -- Copyright : (c) 2009, 2010, 2011 Bryan O'Sullivan -- License : BSD-style -- Maintainer : bos@serpentine.com, rtomharper@googlemail.com, -- duncan@haskell.org -- Stability : experimental -- Portability : portable -- -- A module containing unsafe operations, for very very careful use in -- heavily tested code. module Data.Text.Unsafe.Base ( inlineInterleaveST , inlinePerformIO ) where import GHC.ST (ST(..)) #if defined(__GLASGOW_HASKELL__) # if __GLASGOW_HASKELL__ >= 611 import GHC.IO (IO(IO)) # else import GHC.IOBase (IO(IO)) # endif import GHC.Base (realWorld#) #endif -- | Just like unsafePerformIO, but we inline it. 
Big performance gains as -- it exposes lots of things to further inlining. /Very unsafe/. In -- particular, you should do no memory allocation inside an -- 'inlinePerformIO' block. On Hugs this is just @unsafePerformIO@. -- {-# INLINE inlinePerformIO #-} inlinePerformIO :: IO a -> a #if defined(__GLASGOW_HASKELL__) inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r #else inlinePerformIO = unsafePerformIO #endif -- | Allow an 'ST' computation to be deferred lazily. When passed an -- action of type 'ST' @s@ @a@, the action will only be performed when -- the value of @a@ is demanded. -- -- This function is identical to the normal unsafeInterleaveST, but is -- inlined and hence faster. -- -- /Note/: This operation is highly unsafe, as it can introduce -- externally visible non-determinism into an 'ST' action. inlineInterleaveST :: ST s a -> ST s a inlineInterleaveST (ST m) = ST $ \ s -> let r = case m s of (# _, res #) -> res in (# s, r #) {-# INLINE inlineInterleaveST #-} text-0.11.3.1/scripts/0000755000000000000000000000000012140332415012567 5ustar0000000000000000text-0.11.3.1/scripts/ApiCompare.hs0000644000000000000000000000155512140332415015151 0ustar0000000000000000-- This script compares the strict and lazy Text APIs to ensure that -- they're reasonably in sync. {-# LANGUAGE OverloadedStrings #-} import qualified Data.Set as S import qualified Data.Text as T import System.Process main = do let tidy pkg = (S.fromList . filter (T.isInfixOf "::") . T.lines . T.replace "GHC.Int.Int64" "Int" . T.replace "\n " "" . T.replace (T.append (T.pack pkg) ".") "" . T.pack) `fmap` readProcess "ghci" [] (":browse " ++ pkg) let diff a b = mapM_ (putStrLn . (" "++) . T.unpack) . S.toList $ S.difference a b text <- tidy "Data.Text" lazy <- tidy "Data.Text.Lazy" list <- tidy "Data.List" putStrLn "Text \\ List:" diff text list putStrLn "" putStrLn "Text \\ Lazy:" diff text lazy putStrLn "" putStrLn "Lazy \\ Text:" diff lazy text text-0.11.3.1/scripts/Arsec.hs0000644000000000000000000000241412140332415014161 0ustar0000000000000000module Arsec ( comment , semi , showC , unichar , unichars , module Control.Applicative , module Control.Monad , module Data.Char , module Text.ParserCombinators.Parsec.Char , module Text.ParserCombinators.Parsec.Combinator , module Text.ParserCombinators.Parsec.Error , module Text.ParserCombinators.Parsec.Prim ) where import Control.Monad import Control.Applicative import Data.Char import Numeric import Text.ParserCombinators.Parsec.Char hiding (lower, upper) import Text.ParserCombinators.Parsec.Combinator hiding (optional) import Text.ParserCombinators.Parsec.Error import Text.ParserCombinators.Parsec.Prim hiding ((<|>), many) instance Applicative (GenParser s a) where pure = return (<*>) = ap instance Alternative (GenParser s a) where empty = mzero (<|>) = mplus unichar :: Parser Char unichar = chr . fst . head . readHex <$> many1 hexDigit unichars :: Parser [Char] unichars = manyTill (unichar <* spaces) semi semi :: Parser () semi = char ';' *> spaces *> pure () comment :: Parser String comment = (char '#' *> manyTill anyToken (char '\n')) <|> string "\n" showC :: Char -> String showC c = "'\\x" ++ d ++ "'" where h = showHex (ord c) "" d = replicate (4 - length h) '0' ++ h text-0.11.3.1/scripts/CaseFolding.hs0000644000000000000000000000237112140332415015304 0ustar0000000000000000-- This script processes the following source file: -- -- http://unicode.org/Public/UNIDATA/CaseFolding.txt module CaseFolding ( Fold(..) 
, parseCF , mapCF ) where import Arsec data Fold = Fold { code :: Char , status :: Char , mapping :: [Char] , name :: String } deriving (Eq, Ord, Show) entries :: Parser [Fold] entries = many comment *> many (entry <* many comment) where entry = Fold <$> unichar <* semi <*> oneOf "CFST" <* semi <*> unichars <*> (string "# " *> manyTill anyToken (char '\n')) parseCF :: FilePath -> IO (Either ParseError [Fold]) parseCF name = parse entries name <$> readFile name mapCF :: [Fold] -> [String] mapCF ms = typ ++ (map nice . filter p $ ms) ++ [last] where typ = ["foldMapping :: forall s. Char -> s -> Step (CC s) Char" ,"{-# INLINE foldMapping #-}"] last = "foldMapping c s = Yield (toLower c) (CC s '\\0' '\\0')" nice c = "-- " ++ name c ++ "\n" ++ "foldMapping " ++ showC (code c) ++ " s = Yield " ++ x ++ " (CC s " ++ y ++ " " ++ z ++ ")" where [x,y,z] = (map showC . take 3) (mapping c ++ repeat '\0') p f = status f `elem` "CF" && mapping f /= [toLower (code f)] text-0.11.3.1/scripts/CaseMapping.hs0000644000000000000000000000210312140332415015306 0ustar0000000000000000import System.Environment import System.IO import Arsec import CaseFolding import SpecialCasing main = do args <- getArgs let oname = case args of [] -> "../Data/Text/Fusion/CaseMapping.hs" [o] -> o psc <- parseSC "SpecialCasing.txt" pcf <- parseCF "CaseFolding.txt" scs <- case psc of Left err -> print err >> return undefined Right ms -> return ms cfs <- case pcf of Left err -> print err >> return undefined Right ms -> return ms h <- openFile oname WriteMode mapM_ (hPutStrLn h) ["{-# LANGUAGE Rank2Types #-}" ,"-- AUTOMATICALLY GENERATED - DO NOT EDIT" ,"-- Generated by scripts/SpecialCasing.hs" ,"module Data.Text.Fusion.CaseMapping where" ,"import Data.Char" ,"import Data.Text.Fusion.Internal" ,""] mapM_ (hPutStrLn h) (mapSC "upper" upper toUpper scs) mapM_ (hPutStrLn h) (mapSC "lower" lower toLower scs) mapM_ (hPutStrLn h) (mapCF cfs) hClose h text-0.11.3.1/scripts/SpecialCasing.hs0000644000000000000000000000305212140332415015630 0ustar0000000000000000-- This script processes the following source file: -- -- http://unicode.org/Public/UNIDATA/SpecialCasing.txt module SpecialCasing ( Case(..) , parseSC , mapSC ) where import Arsec data Case = Case { code :: Char , lower :: [Char] , title :: [Char] , upper :: [Char] , conditions :: String , name :: String } deriving (Eq, Ord, Show) entries :: Parser [Case] entries = many comment *> many (entry <* many comment) where entry = Case <$> unichar <* semi <*> unichars <*> unichars <*> unichars <*> manyTill anyToken (string "# ") <*> manyTill anyToken (char '\n') parseSC :: FilePath -> IO (Either ParseError [Case]) parseSC name = parse entries name <$> readFile name mapSC :: String -> (Case -> String) -> (Char -> Char) -> [Case] -> [String] mapSC which access twiddle ms = typ ++ (map nice . filter p $ ms) ++ [last] where typ = [which ++ "Mapping :: forall s. Char -> s -> Step (CC s) Char" ,"{-# INLINE " ++ which ++ "Mapping #-}"] last = which ++ "Mapping c s = Yield (to" ++ ucFirst which ++ " c) (CC s '\\0' '\\0')" nice c = "-- " ++ name c ++ "\n" ++ which ++ "Mapping " ++ showC (code c) ++ " s = Yield " ++ x ++ " (CC s " ++ y ++ " " ++ z ++ ")" where [x,y,z] = (map showC . 
take 3) (access c ++ repeat '\0') p c = [k] /= a && a /= [twiddle k] && null (conditions c) where a = access c k = code c ucFirst (c:cs) = toUpper c : cs ucFirst [] = [] text-0.11.3.1/tests/0000755000000000000000000000000012140332415012242 5ustar0000000000000000text-0.11.3.1/tests/.ghci0000644000000000000000000000004212140332415013151 0ustar0000000000000000:set -DHAVE_DEEPSEQ -isrc -i../.. text-0.11.3.1/tests/Makefile0000644000000000000000000000150712140332415013705 0ustar0000000000000000count = 1000 coverage: build coverage/hpc_index.html build: text-test-data cabal configure -fhpc cabal build text-test-data: hg clone https://bitbucket.org/bos/text-test-data $(MAKE) -C text-test-data coverage/text-tests.tix: -mkdir -p coverage ./dist/build/text-tests/text-tests -a $(count) mv text-tests.tix $@ coverage/text-tests-stdio.tix: -mkdir -p coverage ./scripts/cover-stdio.sh ./dist/build/text-tests-stdio/text-tests-stdio mv text-tests-stdio.tix $@ coverage/coverage.tix: coverage/text-tests.tix coverage/text-tests-stdio.tix hpc combine --output=$@ \ --exclude=Main \ coverage/text-tests.tix \ coverage/text-tests-stdio.tix coverage/hpc_index.html: coverage/coverage.tix hpc markup --destdir=coverage coverage/coverage.tix clean: rm -rf dist coverage .hpc .PHONY: build coverage text-0.11.3.1/tests/Tests.hs0000644000000000000000000000046412140332415013704 0ustar0000000000000000-- | Provides a simple main function which runs all the tests -- module Main ( main ) where import Test.Framework (defaultMain) import qualified Tests.Properties as Properties import qualified Tests.Regressions as Regressions main :: IO () main = defaultMain [Properties.tests, Regressions.tests] text-0.11.3.1/tests/text-tests.cabal0000644000000000000000000000525312140332415015357 0ustar0000000000000000name: text-tests version: 0.0.0.0 synopsis: Functional tests for the text package description: Functional tests for the text package homepage: https://bitbucket.org/bos/text license: BSD3 license-file: ../LICENSE author: Jasper Van der Jeugt , Bryan O'Sullivan , Tom Harper , Duncan Coutts maintainer: Bryan O'Sullivan category: Text build-type: Simple cabal-version: >=1.8 flag hpc description: Enable HPC to generate coverage reports default: False executable text-tests main-is: Tests.hs ghc-options: -Wall -threaded -O0 -rtsopts if flag(hpc) ghc-options: -fhpc cpp-options: -DASSERTS -DHAVE_DEEPSEQ build-depends: HUnit >= 1.2, QuickCheck >= 2.4, base == 4.*, bytestring, deepseq, directory, random, test-framework >= 0.4, test-framework-hunit >= 0.2, test-framework-quickcheck2 >= 0.2, text-tests executable text-tests-stdio main-is: Tests/IO.hs ghc-options: -Wall -threaded -rtsopts -- Optional HPC support if flag(hpc) ghc-options: -fhpc build-depends: text-tests, base >= 4 && < 5 library hs-source-dirs: .. 
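  -- Editorial note: this stanza rebuilds the library from the parent
  -- directory (hs-source-dirs: ..) so the internal modules listed
  -- below are visible to the test executables. Per the Makefile
  -- above, a typical coverage run is:
  --
  --   cabal configure -fhpc
  --   cabal build
  --   ./dist/build/text-tests/text-tests -a 1000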
c-sources: ../cbits/cbits.c exposed-modules: Data.Text Data.Text.Array Data.Text.Encoding Data.Text.Encoding.Error Data.Text.Encoding.Fusion Data.Text.Encoding.Fusion.Common Data.Text.Encoding.Utf16 Data.Text.Encoding.Utf32 Data.Text.Encoding.Utf8 Data.Text.Foreign Data.Text.Fusion Data.Text.Fusion.CaseMapping Data.Text.Fusion.Common Data.Text.Fusion.Internal Data.Text.Fusion.Size Data.Text.IO Data.Text.IO.Internal Data.Text.Internal Data.Text.Lazy Data.Text.Lazy.Builder Data.Text.Lazy.Builder.Functions Data.Text.Lazy.Builder.Int Data.Text.Lazy.Builder.Int.Digits Data.Text.Lazy.Builder.Internal Data.Text.Lazy.Builder.RealFloat Data.Text.Lazy.Builder.RealFloat.Functions Data.Text.Lazy.Encoding Data.Text.Lazy.Encoding.Fusion Data.Text.Lazy.Fusion Data.Text.Lazy.IO Data.Text.Lazy.Internal Data.Text.Lazy.Read Data.Text.Lazy.Search Data.Text.Private Data.Text.Read Data.Text.Search Data.Text.Unsafe Data.Text.Unsafe.Base Data.Text.UnsafeChar Data.Text.UnsafeShift Data.Text.Util if flag(hpc) ghc-options: -fhpc cpp-options: -DHAVE_DEEPSEQ -DASSERTS -DINTEGER_GMP build-depends: array, base == 4.*, bytestring, deepseq, ghc-prim, integer-gmp text-0.11.3.1/tests/scripts/0000755000000000000000000000000012140332415013731 5ustar0000000000000000text-0.11.3.1/tests/scripts/cover-stdio.sh0000755000000000000000000000217712140332415016533 0ustar0000000000000000#!/bin/bash if [[ $# -lt 1 ]]; then echo "Usage: $0 <exe>" exit 1 fi exe=$1 rm -f $exe.tix f=$(mktemp stdio-f.XXXXXX) g=$(mktemp stdio-g.XXXXXX) for t in T TL; do echo $t.readFile > $f $exe $t.readFile $f > $g if ! diff -u $f $g; then errs=$((errs+1)) echo FAIL: $t.readFile 1>&2 fi $exe $t.writeFile $f $t.writeFile echo -n $t.writeFile > $g if ! diff -u $f $g; then errs=$((errs+1)) echo FAIL: $t.writeFile 1>&2 fi echo -n quux > $f $exe $t.appendFile $f $t.appendFile echo -n quux$t.appendFile > $g if ! diff -u $f $g; then errs=$((errs+1)) echo FAIL: $t.appendFile 1>&2 fi echo $t.interact | $exe $t.interact > $f echo $t.interact > $g if ! diff -u $f $g; then errs=$((errs+1)) echo FAIL: $t.interact 1>&2 fi echo $t.getContents | $exe $t.getContents > $f echo $t.getContents > $g if ! diff -u $f $g; then errs=$((errs+1)) echo FAIL: $t.getContents 1>&2 fi echo $t.getLine | $exe $t.getLine > $f echo $t.getLine > $g if ! diff -u $f $g; then errs=$((errs+1)) echo FAIL: $t.getLine 1>&2 fi done rm -f $f $g exit $errs text-0.11.3.1/tests/Tests/0000755000000000000000000000000012140332415013344 5ustar0000000000000000text-0.11.3.1/tests/Tests/IO.hs0000644000000000000000000000227712140332415014217 0ustar0000000000000000-- | Program which exposes some Haskell functions as an executable. The results -- and coverage of this module are meant to be checked using a shell script. 
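-- An editorial example of the directive protocol (the file name and
-- payload are hypothetical; the binary path comes from
-- text-tests.cabal and the Makefile above):
--
-- > $ ./dist/build/text-tests-stdio/text-tests-stdio T.writeFile out.txt hello
-- > $ ./dist/build/text-tests-stdio/text-tests-stdio T.readFile out.txt
-- > hello
--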
-- module Main ( main ) where import System.Environment (getArgs) import System.Exit (exitFailure) import System.IO (hPutStrLn, stderr) import qualified Data.Text as T import qualified Data.Text.IO as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.IO as TL main :: IO () main = do args <- getArgs case args of ["T.readFile", name] -> T.putStr =<< T.readFile name ["T.writeFile", name, t] -> T.writeFile name (T.pack t) ["T.appendFile", name, t] -> T.appendFile name (T.pack t) ["T.interact"] -> T.interact id ["T.getContents"] -> T.putStr =<< T.getContents ["T.getLine"] -> T.putStrLn =<< T.getLine ["TL.readFile", name] -> TL.putStr =<< TL.readFile name ["TL.writeFile", name, t] -> TL.writeFile name (TL.pack t) ["TL.appendFile", name, t] -> TL.appendFile name (TL.pack t) ["TL.interact"] -> TL.interact id ["TL.getContents"] -> TL.putStr =<< TL.getContents ["TL.getLine"] -> TL.putStrLn =<< TL.getLine _ -> hPutStrLn stderr "invalid directive!" >> exitFailure text-0.11.3.1/tests/Tests/Properties.hs0000644000000000000000000014766612140332415016060 0ustar0000000000000000-- | General quicktest properties for the text library -- {-# LANGUAGE BangPatterns, FlexibleInstances, OverloadedStrings, ScopedTypeVariables, TypeSynonymInstances, CPP #-} {-# OPTIONS_GHC -fno-enable-rewrite-rules #-} {-# OPTIONS_GHC -fno-warn-missing-signatures #-} module Tests.Properties ( tests ) where import Test.QuickCheck import Test.QuickCheck.Monadic import Text.Show.Functions () import Control.Arrow ((***), second) import Control.Exception (catch) import Data.Char (chr, isDigit, isHexDigit, isLower, isSpace, isUpper, ord) import Data.Int (Int8, Int16, Int32, Int64) import Data.Monoid (Monoid(..)) import Data.String (fromString) import Data.Text.Encoding.Error import Data.Text.Foreign import Data.Text.Fusion.Size import Data.Text.Lazy.Read as TL import Data.Text.Read as T import Data.Text.Search (indices) import Data.Word (Word, Word8, Word16, Word32, Word64) import Numeric (showHex) import Prelude hiding (catch, replicate) import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import qualified Data.Bits as Bits (shiftL, shiftR) import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as BL import qualified Data.List as L import qualified Data.Text as T import qualified Data.Text.Encoding as E import qualified Data.Text.Fusion as S import qualified Data.Text.Fusion.Common as S import qualified Data.Text.IO as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Builder as TB import qualified Data.Text.Lazy.Builder.Int as TB import qualified Data.Text.Lazy.Builder.RealFloat as TB import qualified Data.Text.Lazy.Encoding as EL import qualified Data.Text.Lazy.Fusion as SL import qualified Data.Text.Lazy.IO as TL import qualified Data.Text.Lazy.Search as S (indices) import qualified Data.Text.UnsafeShift as U import qualified System.IO as IO import Tests.QuickCheckUtils import Tests.Utils import qualified Tests.SlowFunctions as Slow t_pack_unpack = (T.unpack . T.pack) `eq` id tl_pack_unpack = (TL.unpack . TL.pack) `eq` id t_stream_unstream = (S.unstream . S.stream) `eq` id tl_stream_unstream = (SL.unstream . SL.stream) `eq` id t_reverse_stream t = (S.reverse . S.reverseStream) t == t t_singleton c = [c] == (T.unpack . T.singleton) c tl_singleton c = [c] == (TL.unpack . TL.singleton) c tl_unstreamChunks x = f 11 x == f 1000 x where f n = SL.unstreamChunks n . S.streamList tl_chunk_unchunk = (TL.fromChunks . 
TL.toChunks) `eq` id tl_from_to_strict = (TL.fromStrict . TL.toStrict) `eq` id -- Note: this silently truncates code-points > 255 to 8-bit due to 'B.pack' encodeL1 :: T.Text -> B.ByteString encodeL1 = B.pack . map (fromIntegral . fromEnum) . T.unpack encodeLazyL1 :: TL.Text -> BL.ByteString encodeLazyL1 = BL.fromChunks . map encodeL1 . TL.toChunks t_ascii t = E.decodeASCII (E.encodeUtf8 a) == a where a = T.map (\c -> chr (ord c `mod` 128)) t tl_ascii t = EL.decodeASCII (EL.encodeUtf8 a) == a where a = TL.map (\c -> chr (ord c `mod` 128)) t t_latin1 t = E.decodeLatin1 (encodeL1 a) == a where a = T.map (\c -> chr (ord c `mod` 256)) t tl_latin1 t = EL.decodeLatin1 (encodeLazyL1 a) == a where a = TL.map (\c -> chr (ord c `mod` 256)) t t_utf8 = forAll genUnicode $ (E.decodeUtf8 . E.encodeUtf8) `eq` id t_utf8' = forAll genUnicode $ (E.decodeUtf8' . E.encodeUtf8) `eq` (id . Right) tl_utf8 = forAll genUnicode $ (EL.decodeUtf8 . EL.encodeUtf8) `eq` id tl_utf8' = forAll genUnicode $ (EL.decodeUtf8' . EL.encodeUtf8) `eq` (id . Right) t_utf16LE = forAll genUnicode $ (E.decodeUtf16LE . E.encodeUtf16LE) `eq` id tl_utf16LE = forAll genUnicode $ (EL.decodeUtf16LE . EL.encodeUtf16LE) `eq` id t_utf16BE = forAll genUnicode $ (E.decodeUtf16BE . E.encodeUtf16BE) `eq` id tl_utf16BE = forAll genUnicode $ (EL.decodeUtf16BE . EL.encodeUtf16BE) `eq` id t_utf32LE = forAll genUnicode $ (E.decodeUtf32LE . E.encodeUtf32LE) `eq` id tl_utf32LE = forAll genUnicode $ (EL.decodeUtf32LE . EL.encodeUtf32LE) `eq` id t_utf32BE = forAll genUnicode $ (E.decodeUtf32BE . E.encodeUtf32BE) `eq` id tl_utf32BE = forAll genUnicode $ (EL.decodeUtf32BE . EL.encodeUtf32BE) `eq` id -- This is a poor attempt to ensure that the error handling paths on -- decode are exercised in some way. Proper testing would be rather -- more involved. t_utf8_err :: DecodeErr -> B.ByteString -> Property t_utf8_err (DE _ de) bs = monadicIO $ do l <- run $ let len = T.length (E.decodeUtf8With de bs) in (len `seq` return (Right len)) `catch` (\(e::UnicodeException) -> return (Left e)) case l of Left err -> assert $ length (show err) >= 0 Right n -> assert $ n >= 0 t_utf8_err' :: B.ByteString -> Property t_utf8_err' bs = monadicIO . assert $ case E.decodeUtf8' bs of Left err -> length (show err) >= 0 Right t -> T.length t >= 0 s_Eq s = (s==) `eq` ((S.streamList s==) . S.streamList) where _types = s :: String sf_Eq p s = ((L.filter p s==) . L.filter p) `eq` (((S.filter p $ S.streamList s)==) . S.filter p . S.streamList) t_Eq s = (s==) `eq` ((T.pack s==) . T.pack) tl_Eq s = (s==) `eq` ((TL.pack s==) . TL.pack) s_Ord s = (compare s) `eq` (compare (S.streamList s) . S.streamList) where _types = s :: String sf_Ord p s = ((compare $ L.filter p s) . L.filter p) `eq` (compare (S.filter p $ S.streamList s) . S.filter p . S.streamList) t_Ord s = (compare s) `eq` (compare (T.pack s) . T.pack) tl_Ord s = (compare s) `eq` (compare (TL.pack s) . TL.pack) t_Read = id `eq` (T.unpack . read . show) tl_Read = id `eq` (TL.unpack . read . show) t_Show = show `eq` (show . T.pack) tl_Show = show `eq` (show . TL.pack) t_mappend s = mappend s`eqP` (unpackS . mappend (T.pack s)) tl_mappend s = mappend s`eqP` (unpackS . mappend (TL.pack s)) t_mconcat = mconcat `eq` (unpackS . mconcat . L.map T.pack) tl_mconcat = mconcat `eq` (unpackS . mconcat . L.map TL.pack) t_mempty = mempty == (unpackS (mempty :: T.Text)) tl_mempty = mempty == (unpackS (mempty :: TL.Text)) t_IsString = fromString `eqP` (T.unpack . fromString) tl_IsString = fromString `eqP` (TL.unpack . 
fromString) s_cons x = (x:) `eqP` (unpackS . S.cons x) s_cons_s x = (x:) `eqP` (unpackS . S.unstream . S.cons x) sf_cons p x = ((x:) . L.filter p) `eqP` (unpackS . S.cons x . S.filter p) t_cons x = (x:) `eqP` (unpackS . T.cons x) tl_cons x = (x:) `eqP` (unpackS . TL.cons x) s_snoc x = (++ [x]) `eqP` (unpackS . (flip S.snoc) x) t_snoc x = (++ [x]) `eqP` (unpackS . (flip T.snoc) x) tl_snoc x = (++ [x]) `eqP` (unpackS . (flip TL.snoc) x) s_append s = (s++) `eqP` (unpackS . S.append (S.streamList s)) s_append_s s = (s++) `eqP` (unpackS . S.unstream . S.append (S.streamList s)) sf_append p s = (L.filter p s++) `eqP` (unpackS . S.append (S.filter p $ S.streamList s)) t_append s = (s++) `eqP` (unpackS . T.append (packS s)) uncons (x:xs) = Just (x,xs) uncons _ = Nothing s_uncons = uncons `eqP` (fmap (second unpackS) . S.uncons) sf_uncons p = (uncons . L.filter p) `eqP` (fmap (second unpackS) . S.uncons . S.filter p) t_uncons = uncons `eqP` (fmap (second unpackS) . T.uncons) tl_uncons = uncons `eqP` (fmap (second unpackS) . TL.uncons) s_head = head `eqP` S.head sf_head p = (head . L.filter p) `eqP` (S.head . S.filter p) t_head = head `eqP` T.head tl_head = head `eqP` TL.head s_last = last `eqP` S.last sf_last p = (last . L.filter p) `eqP` (S.last . S.filter p) t_last = last `eqP` T.last tl_last = last `eqP` TL.last s_tail = tail `eqP` (unpackS . S.tail) s_tail_s = tail `eqP` (unpackS . S.unstream . S.tail) sf_tail p = (tail . L.filter p) `eqP` (unpackS . S.tail . S.filter p) t_tail = tail `eqP` (unpackS . T.tail) tl_tail = tail `eqP` (unpackS . TL.tail) s_init = init `eqP` (unpackS . S.init) s_init_s = init `eqP` (unpackS . S.unstream . S.init) sf_init p = (init . L.filter p) `eqP` (unpackS . S.init . S.filter p) t_init = init `eqP` (unpackS . T.init) tl_init = init `eqP` (unpackS . TL.init) s_null = null `eqP` S.null sf_null p = (null . L.filter p) `eqP` (S.null . S.filter p) t_null = null `eqP` T.null tl_null = null `eqP` TL.null s_length = length `eqP` S.length sf_length p = (length . L.filter p) `eqP` (S.length . S.filter p) sl_length = (fromIntegral . length) `eqP` SL.length t_length = length `eqP` T.length tl_length = L.genericLength `eqP` TL.length t_compareLength t = (compare (T.length t)) `eq` T.compareLength t tl_compareLength t= (compare (TL.length t)) `eq` TL.compareLength t s_map f = map f `eqP` (unpackS . S.map f) s_map_s f = map f `eqP` (unpackS . S.unstream . S.map f) sf_map p f = (map f . L.filter p) `eqP` (unpackS . S.map f . S.filter p) t_map f = map f `eqP` (unpackS . T.map f) tl_map f = map f `eqP` (unpackS . TL.map f) s_intercalate c = L.intercalate c `eq` (unpackS . S.intercalate (packS c) . map packS) t_intercalate c = L.intercalate c `eq` (unpackS . T.intercalate (packS c) . map packS) tl_intercalate c = L.intercalate c `eq` (unpackS . TL.intercalate (TL.pack c) . map TL.pack) s_intersperse c = L.intersperse c `eqP` (unpackS . S.intersperse c) s_intersperse_s c = L.intersperse c `eqP` (unpackS . S.unstream . S.intersperse c) sf_intersperse p c= (L.intersperse c . L.filter p) `eqP` (unpackS . S.intersperse c . S.filter p) t_intersperse c = L.intersperse c `eqP` (unpackS . T.intersperse c) tl_intersperse c = L.intersperse c `eqP` (unpackS . TL.intersperse c) t_transpose = L.transpose `eq` (map unpackS . T.transpose . map packS) tl_transpose = L.transpose `eq` (map unpackS . TL.transpose . map TL.pack) t_reverse = L.reverse `eqP` (unpackS . T.reverse) tl_reverse = L.reverse `eqP` (unpackS . TL.reverse) t_reverse_short n = L.reverse `eqP` (unpackS . S.reverse . shorten n . 
S.stream) t_replace s d = (L.intercalate d . splitOn s) `eqP` (unpackS . T.replace (T.pack s) (T.pack d)) tl_replace s d = (L.intercalate d . splitOn s) `eqP` (unpackS . TL.replace (TL.pack s) (TL.pack d)) splitOn :: (Eq a) => [a] -> [a] -> [[a]] splitOn pat src0 | l == 0 = error "empty" | otherwise = go src0 where l = length pat go src = search 0 src where search _ [] = [src] search !n s@(_:s') | pat `L.isPrefixOf` s = take n src : go (drop l s) | otherwise = search (n+1) s' s_toCaseFold_length xs = S.length (S.toCaseFold s) >= length xs where s = S.streamList xs sf_toCaseFold_length p xs = (S.length . S.toCaseFold . S.filter p $ s) >= (length . L.filter p $ xs) where s = S.streamList xs t_toCaseFold_length t = T.length (T.toCaseFold t) >= T.length t tl_toCaseFold_length t = TL.length (TL.toCaseFold t) >= TL.length t t_toLower_length t = T.length (T.toLower t) >= T.length t t_toLower_lower t = p (T.toLower t) >= p t where p = T.length . T.filter isLower tl_toLower_lower t = p (TL.toLower t) >= p t where p = TL.length . TL.filter isLower t_toUpper_length t = T.length (T.toUpper t) >= T.length t t_toUpper_upper t = p (T.toUpper t) >= p t where p = T.length . T.filter isUpper tl_toUpper_upper t = p (TL.toUpper t) >= p t where p = TL.length . TL.filter isUpper justifyLeft k c xs = xs ++ L.replicate (k - length xs) c justifyRight m n xs = L.replicate (m - length xs) n ++ xs center k c xs | len >= k = xs | otherwise = L.replicate l c ++ xs ++ L.replicate r c where len = length xs d = k - len r = d `div` 2 l = d - r s_justifyLeft k c = justifyLeft j c `eqP` (unpackS . S.justifyLeftI j c) where j = fromIntegral (k :: Word8) s_justifyLeft_s k c = justifyLeft j c `eqP` (unpackS . S.unstream . S.justifyLeftI j c) where j = fromIntegral (k :: Word8) sf_justifyLeft p k c = (justifyLeft j c . L.filter p) `eqP` (unpackS . S.justifyLeftI j c . S.filter p) where j = fromIntegral (k :: Word8) t_justifyLeft k c = justifyLeft j c `eqP` (unpackS . T.justifyLeft j c) where j = fromIntegral (k :: Word8) tl_justifyLeft k c = justifyLeft j c `eqP` (unpackS . TL.justifyLeft (fromIntegral j) c) where j = fromIntegral (k :: Word8) t_justifyRight k c = justifyRight j c `eqP` (unpackS . T.justifyRight j c) where j = fromIntegral (k :: Word8) tl_justifyRight k c = justifyRight j c `eqP` (unpackS . TL.justifyRight (fromIntegral j) c) where j = fromIntegral (k :: Word8) t_center k c = center j c `eqP` (unpackS . T.center j c) where j = fromIntegral (k :: Word8) tl_center k c = center j c `eqP` (unpackS . TL.center (fromIntegral j) c) where j = fromIntegral (k :: Word8) sf_foldl p f z = (L.foldl f z . L.filter p) `eqP` (S.foldl f z . S.filter p) where _types = f :: Char -> Char -> Char t_foldl f z = L.foldl f z `eqP` (T.foldl f z) where _types = f :: Char -> Char -> Char tl_foldl f z = L.foldl f z `eqP` (TL.foldl f z) where _types = f :: Char -> Char -> Char sf_foldl' p f z = (L.foldl' f z . L.filter p) `eqP` (S.foldl' f z . S.filter p) where _types = f :: Char -> Char -> Char t_foldl' f z = L.foldl' f z `eqP` T.foldl' f z where _types = f :: Char -> Char -> Char tl_foldl' f z = L.foldl' f z `eqP` TL.foldl' f z where _types = f :: Char -> Char -> Char sf_foldl1 p f = (L.foldl1 f . L.filter p) `eqP` (S.foldl1 f . S.filter p) t_foldl1 f = L.foldl1 f `eqP` T.foldl1 f tl_foldl1 f = L.foldl1 f `eqP` TL.foldl1 f sf_foldl1' p f = (L.foldl1' f . L.filter p) `eqP` (S.foldl1' f . S.filter p) t_foldl1' f = L.foldl1' f `eqP` T.foldl1' f tl_foldl1' f = L.foldl1' f `eqP` TL.foldl1' f sf_foldr p f z = (L.foldr f z . 
L.filter p) `eqP` (S.foldr f z . S.filter p) where _types = f :: Char -> Char -> Char t_foldr f z = L.foldr f z `eqP` T.foldr f z where _types = f :: Char -> Char -> Char tl_foldr f z = L.foldr f z `eqP` TL.foldr f z where _types = f :: Char -> Char -> Char sf_foldr1 p f = (L.foldr1 f . L.filter p) `eqP` (S.foldr1 f . S.filter p) t_foldr1 f = L.foldr1 f `eqP` T.foldr1 f tl_foldr1 f = L.foldr1 f `eqP` TL.foldr1 f s_concat_s = L.concat `eq` (unpackS . S.unstream . S.concat . map packS) sf_concat p = (L.concat . map (L.filter p)) `eq` (unpackS . S.concat . map (S.filter p . packS)) t_concat = L.concat `eq` (unpackS . T.concat . map packS) tl_concat = L.concat `eq` (unpackS . TL.concat . map TL.pack) sf_concatMap p f = unsquare $ (L.concatMap f . L.filter p) `eqP` (unpackS . S.concatMap (packS . f) . S.filter p) t_concatMap f = unsquare $ L.concatMap f `eqP` (unpackS . T.concatMap (packS . f)) tl_concatMap f = unsquare $ L.concatMap f `eqP` (unpackS . TL.concatMap (TL.pack . f)) sf_any q p = (L.any p . L.filter q) `eqP` (S.any p . S.filter q) t_any p = L.any p `eqP` T.any p tl_any p = L.any p `eqP` TL.any p sf_all q p = (L.all p . L.filter q) `eqP` (S.all p . S.filter q) t_all p = L.all p `eqP` T.all p tl_all p = L.all p `eqP` TL.all p sf_maximum p = (L.maximum . L.filter p) `eqP` (S.maximum . S.filter p) t_maximum = L.maximum `eqP` T.maximum tl_maximum = L.maximum `eqP` TL.maximum sf_minimum p = (L.minimum . L.filter p) `eqP` (S.minimum . S.filter p) t_minimum = L.minimum `eqP` T.minimum tl_minimum = L.minimum `eqP` TL.minimum sf_scanl p f z = (L.scanl f z . L.filter p) `eqP` (unpackS . S.scanl f z . S.filter p) t_scanl f z = L.scanl f z `eqP` (unpackS . T.scanl f z) tl_scanl f z = L.scanl f z `eqP` (unpackS . TL.scanl f z) t_scanl1 f = L.scanl1 f `eqP` (unpackS . T.scanl1 f) tl_scanl1 f = L.scanl1 f `eqP` (unpackS . TL.scanl1 f) t_scanr f z = L.scanr f z `eqP` (unpackS . T.scanr f z) tl_scanr f z = L.scanr f z `eqP` (unpackS . TL.scanr f z) t_scanr1 f = L.scanr1 f `eqP` (unpackS . T.scanr1 f) tl_scanr1 f = L.scanr1 f `eqP` (unpackS . TL.scanr1 f) t_mapAccumL f z = L.mapAccumL f z `eqP` (second unpackS . T.mapAccumL f z) where _types = f :: Int -> Char -> (Int,Char) tl_mapAccumL f z = L.mapAccumL f z `eqP` (second unpackS . TL.mapAccumL f z) where _types = f :: Int -> Char -> (Int,Char) t_mapAccumR f z = L.mapAccumR f z `eqP` (second unpackS . T.mapAccumR f z) where _types = f :: Int -> Char -> (Int,Char) tl_mapAccumR f z = L.mapAccumR f z `eqP` (second unpackS . TL.mapAccumR f z) where _types = f :: Int -> Char -> (Int,Char) replicate n l = concat (L.replicate n l) s_replicate n = replicate m `eq` (unpackS . S.replicateI (fromIntegral m) . packS) where m = fromIntegral (n :: Word8) t_replicate n = replicate m `eq` (unpackS . T.replicate m . packS) where m = fromIntegral (n :: Word8) tl_replicate n = replicate m `eq` (unpackS . TL.replicate (fromIntegral m) . packS) where m = fromIntegral (n :: Word8) unf :: Int -> Char -> Maybe (Char, Char) unf n c | fromEnum c * 100 > n = Nothing | otherwise = Just (c, succ c) t_unfoldr n = L.unfoldr (unf m) `eq` (unpackS . T.unfoldr (unf m)) where m = fromIntegral (n :: Word16) tl_unfoldr n = L.unfoldr (unf m) `eq` (unpackS . TL.unfoldr (unf m)) where m = fromIntegral (n :: Word16) t_unfoldrN n m = (L.take i . L.unfoldr (unf j)) `eq` (unpackS . T.unfoldrN i (unf j)) where i = fromIntegral (n :: Word16) j = fromIntegral (m :: Word16) tl_unfoldrN n m = (L.take i . L.unfoldr (unf j)) `eq` (unpackS . 
TL.unfoldrN (fromIntegral i) (unf j)) where i = fromIntegral (n :: Word16) j = fromIntegral (m :: Word16) unpack2 :: (Stringy s) => (s,s) -> (String,String) unpack2 = unpackS *** unpackS s_take n = L.take n `eqP` (unpackS . S.take n) s_take_s m = L.take n `eqP` (unpackS . S.unstream . S.take n) where n = small m sf_take p n = (L.take n . L.filter p) `eqP` (unpackS . S.take n . S.filter p) t_take n = L.take n `eqP` (unpackS . T.take n) tl_take n = L.take n `eqP` (unpackS . TL.take (fromIntegral n)) s_drop n = L.drop n `eqP` (unpackS . S.drop n) s_drop_s m = L.drop n `eqP` (unpackS . S.unstream . S.drop n) where n = small m sf_drop p n = (L.drop n . L.filter p) `eqP` (unpackS . S.drop n . S.filter p) t_drop n = L.drop n `eqP` (unpackS . T.drop n) tl_drop n = L.drop n `eqP` (unpackS . TL.drop (fromIntegral n)) s_take_drop m = (L.take n . L.drop n) `eqP` (unpackS . S.take n . S.drop n) where n = small m s_take_drop_s m = (L.take n . L.drop n) `eqP` (unpackS . S.unstream . S.take n . S.drop n) where n = small m s_takeWhile p = L.takeWhile p `eqP` (unpackS . S.takeWhile p) s_takeWhile_s p = L.takeWhile p `eqP` (unpackS . S.unstream . S.takeWhile p) sf_takeWhile q p = (L.takeWhile p . L.filter q) `eqP` (unpackS . S.takeWhile p . S.filter q) t_takeWhile p = L.takeWhile p `eqP` (unpackS . T.takeWhile p) tl_takeWhile p = L.takeWhile p `eqP` (unpackS . TL.takeWhile p) s_dropWhile p = L.dropWhile p `eqP` (unpackS . S.dropWhile p) s_dropWhile_s p = L.dropWhile p `eqP` (unpackS . S.unstream . S.dropWhile p) sf_dropWhile q p = (L.dropWhile p . L.filter q) `eqP` (unpackS . S.dropWhile p . S.filter q) t_dropWhile p = L.dropWhile p `eqP` (unpackS . T.dropWhile p) tl_dropWhile p = L.dropWhile p `eqP` (unpackS . TL.dropWhile p) t_dropWhileEnd p = (L.reverse . L.dropWhile p . L.reverse) `eqP` (unpackS . T.dropWhileEnd p) tl_dropWhileEnd p = (L.reverse . L.dropWhile p . L.reverse) `eqP` (unpackS . TL.dropWhileEnd p) t_dropAround p = (L.dropWhile p . L.reverse . L.dropWhile p . L.reverse) `eqP` (unpackS . T.dropAround p) tl_dropAround p = (L.dropWhile p . L.reverse . L.dropWhile p . L.reverse) `eqP` (unpackS . TL.dropAround p) t_stripStart = T.dropWhile isSpace `eq` T.stripStart tl_stripStart = TL.dropWhile isSpace `eq` TL.stripStart t_stripEnd = T.dropWhileEnd isSpace `eq` T.stripEnd tl_stripEnd = TL.dropWhileEnd isSpace `eq` TL.stripEnd t_strip = T.dropAround isSpace `eq` T.strip tl_strip = TL.dropAround isSpace `eq` TL.strip t_splitAt n = L.splitAt n `eqP` (unpack2 . T.splitAt n) tl_splitAt n = L.splitAt n `eqP` (unpack2 . TL.splitAt (fromIntegral n)) t_span p = L.span p `eqP` (unpack2 . T.span p) tl_span p = L.span p `eqP` (unpack2 . TL.span p) t_breakOn_id s = squid `eq` (uncurry T.append . T.breakOn s) where squid t | T.null s = error "empty" | otherwise = t tl_breakOn_id s = squid `eq` (uncurry TL.append . TL.breakOn s) where squid t | TL.null s = error "empty" | otherwise = t t_breakOn_start (NotEmpty s) t = let (k,m) = T.breakOn s t in k `T.isPrefixOf` t && (T.null m || s `T.isPrefixOf` m) tl_breakOn_start (NotEmpty s) t = let (k,m) = TL.breakOn s t in k `TL.isPrefixOf` t && (TL.null m || s `TL.isPrefixOf` m) t_breakOnEnd_end (NotEmpty s) t = let (m,k) = T.breakOnEnd s t in k `T.isSuffixOf` t && (T.null m || s `T.isSuffixOf` m) tl_breakOnEnd_end (NotEmpty s) t = let (m,k) = TL.breakOnEnd s t in k `TL.isSuffixOf` t && (TL.null m || s `TL.isSuffixOf` m) t_break p = L.break p `eqP` (unpack2 . T.break p) tl_break p = L.break p `eqP` (unpack2 . TL.break p) t_group = L.group `eqP` (map unpackS . 
T.group) tl_group = L.group `eqP` (map unpackS . TL.group) t_groupBy p = L.groupBy p `eqP` (map unpackS . T.groupBy p) tl_groupBy p = L.groupBy p `eqP` (map unpackS . TL.groupBy p) t_inits = L.inits `eqP` (map unpackS . T.inits) tl_inits = L.inits `eqP` (map unpackS . TL.inits) t_tails = L.tails `eqP` (map unpackS . T.tails) tl_tails = L.tails `eqP` (map unpackS . TL.tails) t_findAppendId (NotEmpty s) = unsquare $ \ts -> let t = T.intercalate s ts in all (==t) $ map (uncurry T.append) (T.breakOnAll s t) tl_findAppendId (NotEmpty s) = unsquare $ \ts -> let t = TL.intercalate s ts in all (==t) $ map (uncurry TL.append) (TL.breakOnAll s t) t_findContains (NotEmpty s) = all (T.isPrefixOf s . snd) . T.breakOnAll s . T.intercalate s tl_findContains (NotEmpty s) = all (TL.isPrefixOf s . snd) . TL.breakOnAll s . TL.intercalate s sl_filterCount c = (L.genericLength . L.filter (==c)) `eqP` SL.countChar c t_findCount s = (L.length . T.breakOnAll s) `eq` T.count s tl_findCount s = (L.genericLength . TL.breakOnAll s) `eq` TL.count s t_splitOn_split s = (T.splitOn s `eq` Slow.splitOn s) . T.intercalate s tl_splitOn_split s = ((TL.splitOn (TL.fromStrict s) . TL.fromStrict) `eq` (map TL.fromStrict . T.splitOn s)) . T.intercalate s t_splitOn_i (NotEmpty t) = id `eq` (T.intercalate t . T.splitOn t) tl_splitOn_i (NotEmpty t) = id `eq` (TL.intercalate t . TL.splitOn t) t_split p = split p `eqP` (map unpackS . T.split p) t_split_count c = (L.length . T.split (==c)) `eq` ((1+) . T.count (T.singleton c)) t_split_splitOn c = T.split (==c) `eq` T.splitOn (T.singleton c) tl_split p = split p `eqP` (map unpackS . TL.split p) split :: (a -> Bool) -> [a] -> [[a]] split _ [] = [[]] split p xs = loop xs where loop s | null s' = [l] | otherwise = l : loop (tail s') where (l, s') = break p s t_chunksOf_same_lengths k = all ((==k) . T.length) . ini . T.chunksOf k where ini [] = [] ini xs = init xs t_chunksOf_length k t = len == T.length t || (k <= 0 && len == 0) where len = L.sum . L.map T.length $ T.chunksOf k t tl_chunksOf k = T.chunksOf k `eq` (map (T.concat . TL.toChunks) . TL.chunksOf (fromIntegral k) . TL.fromStrict) t_lines = L.lines `eqP` (map unpackS . T.lines) tl_lines = L.lines `eqP` (map unpackS . TL.lines) {- t_lines' = lines' `eqP` (map unpackS . T.lines') where lines' "" = [] lines' s = let (l, s') = break eol s in l : case s' of [] -> [] ('\r':'\n':s'') -> lines' s'' (_:s'') -> lines' s'' eol c = c == '\r' || c == '\n' -} t_words = L.words `eqP` (map unpackS . T.words) tl_words = L.words `eqP` (map unpackS . TL.words) t_unlines = L.unlines `eq` (unpackS . T.unlines . map packS) tl_unlines = L.unlines `eq` (unpackS . TL.unlines . map packS) t_unwords = L.unwords `eq` (unpackS . T.unwords . map packS) tl_unwords = L.unwords `eq` (unpackS . TL.unwords . map packS) s_isPrefixOf s = L.isPrefixOf s `eqP` (S.isPrefixOf (S.stream $ packS s) . S.stream) sf_isPrefixOf p s = (L.isPrefixOf s . L.filter p) `eqP` (S.isPrefixOf (S.stream $ packS s) . S.filter p . S.stream) t_isPrefixOf s = L.isPrefixOf s`eqP` T.isPrefixOf (packS s) tl_isPrefixOf s = L.isPrefixOf s`eqP` TL.isPrefixOf (packS s) t_isSuffixOf s = L.isSuffixOf s`eqP` T.isSuffixOf (packS s) tl_isSuffixOf s = L.isSuffixOf s`eqP` TL.isSuffixOf (packS s) t_isInfixOf s = L.isInfixOf s `eqP` T.isInfixOf (packS s) tl_isInfixOf s = L.isInfixOf s `eqP` TL.isInfixOf (packS s) t_stripPrefix s = (fmap packS . L.stripPrefix s) `eqP` T.stripPrefix (packS s) tl_stripPrefix s = (fmap packS . 
L.stripPrefix s) `eqP` TL.stripPrefix (packS s) stripSuffix p t = reverse `fmap` L.stripPrefix (reverse p) (reverse t) t_stripSuffix s = (fmap packS . stripSuffix s) `eqP` T.stripSuffix (packS s) tl_stripSuffix s = (fmap packS . stripSuffix s) `eqP` TL.stripSuffix (packS s) commonPrefixes a0@(_:_) b0@(_:_) = Just (go a0 b0 []) where go (a:as) (b:bs) ps | a == b = go as bs (a:ps) go as bs ps = (reverse ps,as,bs) commonPrefixes _ _ = Nothing t_commonPrefixes a b (NonEmpty p) = commonPrefixes pa pb == repack `fmap` T.commonPrefixes (packS pa) (packS pb) where repack (x,y,z) = (unpackS x,unpackS y,unpackS z) pa = p ++ a pb = p ++ b tl_commonPrefixes a b (NonEmpty p) = commonPrefixes pa pb == repack `fmap` TL.commonPrefixes (packS pa) (packS pb) where repack (x,y,z) = (unpackS x,unpackS y,unpackS z) pa = p ++ a pb = p ++ b sf_elem p c = (L.elem c . L.filter p) `eqP` (S.elem c . S.filter p) sf_filter q p = (L.filter p . L.filter q) `eqP` (unpackS . S.filter p . S.filter q) t_filter p = L.filter p `eqP` (unpackS . T.filter p) tl_filter p = L.filter p `eqP` (unpackS . TL.filter p) sf_findBy q p = (L.find p . L.filter q) `eqP` (S.findBy p . S.filter q) t_find p = L.find p `eqP` T.find p tl_find p = L.find p `eqP` TL.find p t_partition p = L.partition p `eqP` (unpack2 . T.partition p) tl_partition p = L.partition p `eqP` (unpack2 . TL.partition p) sf_index p s = forAll (choose (-l,l*2)) ((L.filter p s L.!!) `eq` S.index (S.filter p $ packS s)) where l = L.length s t_index s = forAll (choose (-l,l*2)) ((s L.!!) `eq` T.index (packS s)) where l = L.length s tl_index s = forAll (choose (-l,l*2)) ((s L.!!) `eq` (TL.index (packS s) . fromIntegral)) where l = L.length s t_findIndex p = L.findIndex p `eqP` T.findIndex p t_count (NotEmpty t) = (subtract 1 . L.length . T.splitOn t) `eq` T.count t tl_count (NotEmpty t) = (subtract 1 . L.genericLength . TL.splitOn t) `eq` TL.count t t_zip s = L.zip s `eqP` T.zip (packS s) tl_zip s = L.zip s `eqP` TL.zip (packS s) sf_zipWith p c s = (L.zipWith c (L.filter p s) . L.filter p) `eqP` (unpackS . S.zipWith c (S.filter p $ packS s) . S.filter p) t_zipWith c s = L.zipWith c s `eqP` (unpackS . T.zipWith c (packS s)) tl_zipWith c s = L.zipWith c s `eqP` (unpackS . TL.zipWith c (packS s)) t_indices (NotEmpty s) = Slow.indices s `eq` indices s tl_indices (NotEmpty s) = lazyIndices s `eq` S.indices s where lazyIndices ss t = map fromIntegral $ Slow.indices (conc ss) (conc t) conc = T.concat . TL.toChunks t_indices_occurs (NotEmpty t) ts = let s = T.intercalate t ts in Slow.indices t s == indices t s -- Bit shifts. shiftL w = forAll (choose (0,width-1)) $ \k -> Bits.shiftL w k == U.shiftL w k where width = round (log (fromIntegral m) / log 2 :: Double) (m,_) = (maxBound, m == w) shiftR w = forAll (choose (0,width-1)) $ \k -> Bits.shiftR w k == U.shiftR w k where width = round (log (fromIntegral m) / log 2 :: Double) (m,_) = (maxBound, m == w) shiftL_Int = shiftL :: Int -> Property shiftL_Word16 = shiftL :: Word16 -> Property shiftL_Word32 = shiftL :: Word32 -> Property shiftR_Int = shiftR :: Int -> Property shiftR_Word16 = shiftR :: Word16 -> Property shiftR_Word32 = shiftR :: Word32 -> Property -- Builder. tb_singleton = id `eqP` (unpackS . TB.toLazyText . mconcat . map TB.singleton) tb_fromText = L.concat `eq` (unpackS . TB.toLazyText . mconcat . map (TB.fromText . 
packS)) tb_associative s1 s2 s3 = TB.toLazyText (b1 `mappend` (b2 `mappend` b3)) == TB.toLazyText ((b1 `mappend` b2) `mappend` b3) where b1 = TB.fromText (packS s1) b2 = TB.fromText (packS s2) b3 = TB.fromText (packS s3) -- Numeric builder stuff. tb_decimal :: (Integral a, Show a) => a -> Bool tb_decimal = (TB.toLazyText . TB.decimal) `eq` (TL.pack . show) tb_decimal_integer (a::Integer) = tb_decimal a tb_decimal_int (a::Int) = tb_decimal a tb_decimal_int8 (a::Int8) = tb_decimal a tb_decimal_int16 (a::Int16) = tb_decimal a tb_decimal_int32 (a::Int32) = tb_decimal a tb_decimal_int64 (a::Int64) = tb_decimal a tb_decimal_word (a::Word) = tb_decimal a tb_decimal_word8 (a::Word8) = tb_decimal a tb_decimal_word16 (a::Word16) = tb_decimal a tb_decimal_word32 (a::Word32) = tb_decimal a tb_decimal_word64 (a::Word64) = tb_decimal a tb_hex :: (Integral a, Show a) => a -> Bool tb_hex = (TB.toLazyText . TB.hexadecimal) `eq` (TL.pack . flip showHex "") tb_hexadecimal_integer (a::Integer) = tb_hex a tb_hexadecimal_int (a::Int) = tb_hex a tb_hexadecimal_int8 (a::Int8) = tb_hex a tb_hexadecimal_int16 (a::Int16) = tb_hex a tb_hexadecimal_int32 (a::Int32) = tb_hex a tb_hexadecimal_int64 (a::Int64) = tb_hex a tb_hexadecimal_word (a::Word) = tb_hex a tb_hexadecimal_word8 (a::Word8) = tb_hex a tb_hexadecimal_word16 (a::Word16) = tb_hex a tb_hexadecimal_word32 (a::Word32) = tb_hex a tb_hexadecimal_word64 (a::Word64) = tb_hex a tb_realfloat :: (RealFloat a, Show a) => a -> Bool tb_realfloat = (TB.toLazyText . TB.realFloat) `eq` (TL.pack . show) tb_realfloat_float (a::Float) = tb_realfloat a tb_realfloat_double (a::Double) = tb_realfloat a -- Reading. t_decimal (n::Int) s = T.signed T.decimal (T.pack (show n) `T.append` t) == Right (n,t) where t = T.dropWhile isDigit s tl_decimal (n::Int) s = TL.signed TL.decimal (TL.pack (show n) `TL.append` t) == Right (n,t) where t = TL.dropWhile isDigit s t_hexadecimal (n::Positive Int) s ox = T.hexadecimal (T.concat [p, T.pack (showHex n ""), t]) == Right (n,t) where t = T.dropWhile isHexDigit s p = if ox then "0x" else "" tl_hexadecimal (n::Positive Int) s ox = TL.hexadecimal (TL.concat [p, TL.pack (showHex n ""), t]) == Right (n,t) where t = TL.dropWhile isHexDigit s p = if ox then "0x" else "" isFloaty c = c `elem` "+-.0123456789eE" t_read_rational p tol (n::Double) s = case p (T.pack (show n) `T.append` t) of Left _err -> False Right (n',t') -> t == t' && abs (n-n') <= tol where t = T.dropWhile isFloaty s tl_read_rational p tol (n::Double) s = case p (TL.pack (show n) `TL.append` t) of Left _err -> False Right (n',t') -> t == t' && abs (n-n') <= tol where t = TL.dropWhile isFloaty s t_double = t_read_rational T.double 1e-13 tl_double = tl_read_rational TL.double 1e-13 t_rational = t_read_rational T.rational 1e-16 tl_rational = tl_read_rational TL.rational 1e-16 -- Input and output. t_put_get = write_read T.unlines T.filter put get where put h = withRedirect h IO.stdout . T.putStr get h = withRedirect h IO.stdin T.getContents tl_put_get = write_read TL.unlines TL.filter put get where put h = withRedirect h IO.stdout . TL.putStr get h = withRedirect h IO.stdin TL.getContents t_write_read = write_read T.unlines T.filter T.hPutStr T.hGetContents tl_write_read = write_read TL.unlines TL.filter TL.hPutStr TL.hGetContents t_write_read_line e m b t = write_read head T.filter T.hPutStrLn T.hGetLine e m b [t] tl_write_read_line e m b t = write_read head TL.filter TL.hPutStrLn TL.hGetLine e m b [t] -- Low-level. 
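-- dropWord16 and takeWord16 index by Word16 code units of the
-- internal UTF-16 representation rather than by character, so a cut
-- may fall inside a surrogate pair.  The properties below therefore
-- assert only prefix/suffix containment and that take/drop reassemble
-- the input, never character counts.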
t_dropWord16 m t = dropWord16 m t `T.isSuffixOf` t t_takeWord16 m t = takeWord16 m t `T.isPrefixOf` t t_take_drop_16 m t = T.append (takeWord16 n t) (dropWord16 n t) == t where n = small m t_use_from t = monadicIO $ assert . (==t) =<< run (useAsPtr t fromPtr) t_copy t = T.copy t == t -- Regression tests. s_filter_eq s = S.filter p t == S.streamList (filter p s) where p = (/= S.last t) t = S.streamList s -- Make a stream appear shorter than it really is, to ensure that -- functions that consume inaccurately sized streams behave -- themselves. shorten :: Int -> S.Stream a -> S.Stream a shorten n t@(S.Stream arr off len) | n > 0 = S.Stream arr off (smaller (exactSize n) len) | otherwise = t tests :: Test tests = testGroup "Properties" [ testGroup "creation/elimination" [ testProperty "t_pack_unpack" t_pack_unpack, testProperty "tl_pack_unpack" tl_pack_unpack, testProperty "t_stream_unstream" t_stream_unstream, testProperty "tl_stream_unstream" tl_stream_unstream, testProperty "t_reverse_stream" t_reverse_stream, testProperty "t_singleton" t_singleton, testProperty "tl_singleton" tl_singleton, testProperty "tl_unstreamChunks" tl_unstreamChunks, testProperty "tl_chunk_unchunk" tl_chunk_unchunk, testProperty "tl_from_to_strict" tl_from_to_strict ], testGroup "transcoding" [ testProperty "t_ascii" t_ascii, testProperty "tl_ascii" tl_ascii, testProperty "t_latin1" t_latin1, testProperty "tl_latin1" tl_latin1, testProperty "t_utf8" t_utf8, testProperty "t_utf8'" t_utf8', testProperty "tl_utf8" tl_utf8, testProperty "tl_utf8'" tl_utf8', testProperty "t_utf16LE" t_utf16LE, testProperty "tl_utf16LE" tl_utf16LE, testProperty "t_utf16BE" t_utf16BE, testProperty "tl_utf16BE" tl_utf16BE, testProperty "t_utf32LE" t_utf32LE, testProperty "tl_utf32LE" tl_utf32LE, testProperty "t_utf32BE" t_utf32BE, testProperty "tl_utf32BE" tl_utf32BE, testGroup "errors" [ testProperty "t_utf8_err" t_utf8_err, testProperty "t_utf8_err'" t_utf8_err' ] ], testGroup "instances" [ testProperty "s_Eq" s_Eq, testProperty "sf_Eq" sf_Eq, testProperty "t_Eq" t_Eq, testProperty "tl_Eq" tl_Eq, testProperty "s_Ord" s_Ord, testProperty "sf_Ord" sf_Ord, testProperty "t_Ord" t_Ord, testProperty "tl_Ord" tl_Ord, testProperty "t_Read" t_Read, testProperty "tl_Read" tl_Read, testProperty "t_Show" t_Show, testProperty "tl_Show" tl_Show, testProperty "t_mappend" t_mappend, testProperty "tl_mappend" tl_mappend, testProperty "t_mconcat" t_mconcat, testProperty "tl_mconcat" tl_mconcat, testProperty "t_mempty" t_mempty, testProperty "tl_mempty" tl_mempty, testProperty "t_IsString" t_IsString, testProperty "tl_IsString" tl_IsString ], testGroup "basics" [ testProperty "s_cons" s_cons, testProperty "s_cons_s" s_cons_s, testProperty "sf_cons" sf_cons, testProperty "t_cons" t_cons, testProperty "tl_cons" tl_cons, testProperty "s_snoc" s_snoc, testProperty "t_snoc" t_snoc, testProperty "tl_snoc" tl_snoc, testProperty "s_append" s_append, testProperty "s_append_s" s_append_s, testProperty "sf_append" sf_append, testProperty "t_append" t_append, testProperty "s_uncons" s_uncons, testProperty "sf_uncons" sf_uncons, testProperty "t_uncons" t_uncons, testProperty "tl_uncons" tl_uncons, testProperty "s_head" s_head, testProperty "sf_head" sf_head, testProperty "t_head" t_head, testProperty "tl_head" tl_head, testProperty "s_last" s_last, testProperty "sf_last" sf_last, testProperty "t_last" t_last, testProperty "tl_last" tl_last, testProperty "s_tail" s_tail, testProperty "s_tail_s" s_tail_s, testProperty "sf_tail" sf_tail, testProperty "t_tail" t_tail, 
testProperty "tl_tail" tl_tail, testProperty "s_init" s_init, testProperty "s_init_s" s_init_s, testProperty "sf_init" sf_init, testProperty "t_init" t_init, testProperty "tl_init" tl_init, testProperty "s_null" s_null, testProperty "sf_null" sf_null, testProperty "t_null" t_null, testProperty "tl_null" tl_null, testProperty "s_length" s_length, testProperty "sf_length" sf_length, testProperty "sl_length" sl_length, testProperty "t_length" t_length, testProperty "tl_length" tl_length, testProperty "t_compareLength" t_compareLength, testProperty "tl_compareLength" tl_compareLength ], testGroup "transformations" [ testProperty "s_map" s_map, testProperty "s_map_s" s_map_s, testProperty "sf_map" sf_map, testProperty "t_map" t_map, testProperty "tl_map" tl_map, testProperty "s_intercalate" s_intercalate, testProperty "t_intercalate" t_intercalate, testProperty "tl_intercalate" tl_intercalate, testProperty "s_intersperse" s_intersperse, testProperty "s_intersperse_s" s_intersperse_s, testProperty "sf_intersperse" sf_intersperse, testProperty "t_intersperse" t_intersperse, testProperty "tl_intersperse" tl_intersperse, testProperty "t_transpose" t_transpose, testProperty "tl_transpose" tl_transpose, testProperty "t_reverse" t_reverse, testProperty "tl_reverse" tl_reverse, testProperty "t_reverse_short" t_reverse_short, testProperty "t_replace" t_replace, testProperty "tl_replace" tl_replace, testGroup "case conversion" [ testProperty "s_toCaseFold_length" s_toCaseFold_length, testProperty "sf_toCaseFold_length" sf_toCaseFold_length, testProperty "t_toCaseFold_length" t_toCaseFold_length, testProperty "tl_toCaseFold_length" tl_toCaseFold_length, testProperty "t_toLower_length" t_toLower_length, testProperty "t_toLower_lower" t_toLower_lower, testProperty "tl_toLower_lower" tl_toLower_lower, testProperty "t_toUpper_length" t_toUpper_length, testProperty "t_toUpper_upper" t_toUpper_upper, testProperty "tl_toUpper_upper" tl_toUpper_upper ], testGroup "justification" [ testProperty "s_justifyLeft" s_justifyLeft, testProperty "s_justifyLeft_s" s_justifyLeft_s, testProperty "sf_justifyLeft" sf_justifyLeft, testProperty "t_justifyLeft" t_justifyLeft, testProperty "tl_justifyLeft" tl_justifyLeft, testProperty "t_justifyRight" t_justifyRight, testProperty "tl_justifyRight" tl_justifyRight, testProperty "t_center" t_center, testProperty "tl_center" tl_center ] ], testGroup "folds" [ testProperty "sf_foldl" sf_foldl, testProperty "t_foldl" t_foldl, testProperty "tl_foldl" tl_foldl, testProperty "sf_foldl'" sf_foldl', testProperty "t_foldl'" t_foldl', testProperty "tl_foldl'" tl_foldl', testProperty "sf_foldl1" sf_foldl1, testProperty "t_foldl1" t_foldl1, testProperty "tl_foldl1" tl_foldl1, testProperty "t_foldl1'" t_foldl1', testProperty "sf_foldl1'" sf_foldl1', testProperty "tl_foldl1'" tl_foldl1', testProperty "sf_foldr" sf_foldr, testProperty "t_foldr" t_foldr, testProperty "tl_foldr" tl_foldr, testProperty "sf_foldr1" sf_foldr1, testProperty "t_foldr1" t_foldr1, testProperty "tl_foldr1" tl_foldr1, testGroup "special" [ testProperty "s_concat_s" s_concat_s, testProperty "sf_concat" sf_concat, testProperty "t_concat" t_concat, testProperty "tl_concat" tl_concat, testProperty "sf_concatMap" sf_concatMap, testProperty "t_concatMap" t_concatMap, testProperty "tl_concatMap" tl_concatMap, testProperty "sf_any" sf_any, testProperty "t_any" t_any, testProperty "tl_any" tl_any, testProperty "sf_all" sf_all, testProperty "t_all" t_all, testProperty "tl_all" tl_all, testProperty "sf_maximum" sf_maximum, testProperty 
"t_maximum" t_maximum, testProperty "tl_maximum" tl_maximum, testProperty "sf_minimum" sf_minimum, testProperty "t_minimum" t_minimum, testProperty "tl_minimum" tl_minimum ] ], testGroup "construction" [ testGroup "scans" [ testProperty "sf_scanl" sf_scanl, testProperty "t_scanl" t_scanl, testProperty "tl_scanl" tl_scanl, testProperty "t_scanl1" t_scanl1, testProperty "tl_scanl1" tl_scanl1, testProperty "t_scanr" t_scanr, testProperty "tl_scanr" tl_scanr, testProperty "t_scanr1" t_scanr1, testProperty "tl_scanr1" tl_scanr1 ], testGroup "mapAccum" [ testProperty "t_mapAccumL" t_mapAccumL, testProperty "tl_mapAccumL" tl_mapAccumL, testProperty "t_mapAccumR" t_mapAccumR, testProperty "tl_mapAccumR" tl_mapAccumR ], testGroup "unfolds" [ testProperty "s_replicate" s_replicate, testProperty "t_replicate" t_replicate, testProperty "tl_replicate" tl_replicate, testProperty "t_unfoldr" t_unfoldr, testProperty "tl_unfoldr" tl_unfoldr, testProperty "t_unfoldrN" t_unfoldrN, testProperty "tl_unfoldrN" tl_unfoldrN ] ], testGroup "substrings" [ testGroup "breaking" [ testProperty "s_take" s_take, testProperty "s_take_s" s_take_s, testProperty "sf_take" sf_take, testProperty "t_take" t_take, testProperty "tl_take" tl_take, testProperty "s_drop" s_drop, testProperty "s_drop_s" s_drop_s, testProperty "sf_drop" sf_drop, testProperty "t_drop" t_drop, testProperty "tl_drop" tl_drop, testProperty "s_take_drop" s_take_drop, testProperty "s_take_drop_s" s_take_drop_s, testProperty "s_takeWhile" s_takeWhile, testProperty "s_takeWhile_s" s_takeWhile_s, testProperty "sf_takeWhile" sf_takeWhile, testProperty "t_takeWhile" t_takeWhile, testProperty "tl_takeWhile" tl_takeWhile, testProperty "sf_dropWhile" sf_dropWhile, testProperty "s_dropWhile" s_dropWhile, testProperty "s_dropWhile_s" s_dropWhile_s, testProperty "t_dropWhile" t_dropWhile, testProperty "tl_dropWhile" tl_dropWhile, testProperty "t_dropWhileEnd" t_dropWhileEnd, testProperty "tl_dropWhileEnd" tl_dropWhileEnd, testProperty "t_dropAround" t_dropAround, testProperty "tl_dropAround" tl_dropAround, testProperty "t_stripStart" t_stripStart, testProperty "tl_stripStart" tl_stripStart, testProperty "t_stripEnd" t_stripEnd, testProperty "tl_stripEnd" tl_stripEnd, testProperty "t_strip" t_strip, testProperty "tl_strip" tl_strip, testProperty "t_splitAt" t_splitAt, testProperty "tl_splitAt" tl_splitAt, testProperty "t_span" t_span, testProperty "tl_span" tl_span, testProperty "t_breakOn_id" t_breakOn_id, testProperty "tl_breakOn_id" tl_breakOn_id, testProperty "t_breakOn_start" t_breakOn_start, testProperty "tl_breakOn_start" tl_breakOn_start, testProperty "t_breakOnEnd_end" t_breakOnEnd_end, testProperty "tl_breakOnEnd_end" tl_breakOnEnd_end, testProperty "t_break" t_break, testProperty "tl_break" tl_break, testProperty "t_group" t_group, testProperty "tl_group" tl_group, testProperty "t_groupBy" t_groupBy, testProperty "tl_groupBy" tl_groupBy, testProperty "t_inits" t_inits, testProperty "tl_inits" tl_inits, testProperty "t_tails" t_tails, testProperty "tl_tails" tl_tails ], testGroup "breaking many" [ testProperty "t_findAppendId" t_findAppendId, testProperty "tl_findAppendId" tl_findAppendId, testProperty "t_findContains" t_findContains, testProperty "tl_findContains" tl_findContains, testProperty "sl_filterCount" sl_filterCount, testProperty "t_findCount" t_findCount, testProperty "tl_findCount" tl_findCount, testProperty "t_splitOn_split" t_splitOn_split, testProperty "tl_splitOn_split" tl_splitOn_split, testProperty "t_splitOn_i" t_splitOn_i, testProperty 
"tl_splitOn_i" tl_splitOn_i, testProperty "t_split" t_split, testProperty "t_split_count" t_split_count, testProperty "t_split_splitOn" t_split_splitOn, testProperty "tl_split" tl_split, testProperty "t_chunksOf_same_lengths" t_chunksOf_same_lengths, testProperty "t_chunksOf_length" t_chunksOf_length, testProperty "tl_chunksOf" tl_chunksOf ], testGroup "lines and words" [ testProperty "t_lines" t_lines, testProperty "tl_lines" tl_lines, --testProperty "t_lines'" t_lines', testProperty "t_words" t_words, testProperty "tl_words" tl_words, testProperty "t_unlines" t_unlines, testProperty "tl_unlines" tl_unlines, testProperty "t_unwords" t_unwords, testProperty "tl_unwords" tl_unwords ] ], testGroup "predicates" [ testProperty "s_isPrefixOf" s_isPrefixOf, testProperty "sf_isPrefixOf" sf_isPrefixOf, testProperty "t_isPrefixOf" t_isPrefixOf, testProperty "tl_isPrefixOf" tl_isPrefixOf, testProperty "t_isSuffixOf" t_isSuffixOf, testProperty "tl_isSuffixOf" tl_isSuffixOf, testProperty "t_isInfixOf" t_isInfixOf, testProperty "tl_isInfixOf" tl_isInfixOf, testGroup "view" [ testProperty "t_stripPrefix" t_stripPrefix, testProperty "tl_stripPrefix" tl_stripPrefix, testProperty "t_stripSuffix" t_stripSuffix, testProperty "tl_stripSuffix" tl_stripSuffix, testProperty "t_commonPrefixes" t_commonPrefixes, testProperty "tl_commonPrefixes" tl_commonPrefixes ] ], testGroup "searching" [ testProperty "sf_elem" sf_elem, testProperty "sf_filter" sf_filter, testProperty "t_filter" t_filter, testProperty "tl_filter" tl_filter, testProperty "sf_findBy" sf_findBy, testProperty "t_find" t_find, testProperty "tl_find" tl_find, testProperty "t_partition" t_partition, testProperty "tl_partition" tl_partition ], testGroup "indexing" [ testProperty "sf_index" sf_index, testProperty "t_index" t_index, testProperty "tl_index" tl_index, testProperty "t_findIndex" t_findIndex, testProperty "t_count" t_count, testProperty "tl_count" tl_count, testProperty "t_indices" t_indices, testProperty "tl_indices" tl_indices, testProperty "t_indices_occurs" t_indices_occurs ], testGroup "zips" [ testProperty "t_zip" t_zip, testProperty "tl_zip" tl_zip, testProperty "sf_zipWith" sf_zipWith, testProperty "t_zipWith" t_zipWith, testProperty "tl_zipWith" tl_zipWith ], testGroup "regressions" [ testProperty "s_filter_eq" s_filter_eq ], testGroup "shifts" [ testProperty "shiftL_Int" shiftL_Int, testProperty "shiftL_Word16" shiftL_Word16, testProperty "shiftL_Word32" shiftL_Word32, testProperty "shiftR_Int" shiftR_Int, testProperty "shiftR_Word16" shiftR_Word16, testProperty "shiftR_Word32" shiftR_Word32 ], testGroup "builder" [ testProperty "tb_associative" tb_associative, testGroup "decimal" [ testProperty "tb_decimal_int" tb_decimal_int, testProperty "tb_decimal_int8" tb_decimal_int8, testProperty "tb_decimal_int16" tb_decimal_int16, testProperty "tb_decimal_int32" tb_decimal_int32, testProperty "tb_decimal_int64" tb_decimal_int64, testProperty "tb_decimal_integer" tb_decimal_integer, testProperty "tb_decimal_word" tb_decimal_word, testProperty "tb_decimal_word8" tb_decimal_word8, testProperty "tb_decimal_word16" tb_decimal_word16, testProperty "tb_decimal_word32" tb_decimal_word32, testProperty "tb_decimal_word64" tb_decimal_word64 ], testGroup "hexadecimal" [ testProperty "tb_hexadecimal_int" tb_hexadecimal_int, testProperty "tb_hexadecimal_int8" tb_hexadecimal_int8, testProperty "tb_hexadecimal_int16" tb_hexadecimal_int16, testProperty "tb_hexadecimal_int32" tb_hexadecimal_int32, testProperty "tb_hexadecimal_int64" tb_hexadecimal_int64, 
testProperty "tb_hexadecimal_integer" tb_hexadecimal_integer, testProperty "tb_hexadecimal_word" tb_hexadecimal_word, testProperty "tb_hexadecimal_word8" tb_hexadecimal_word8, testProperty "tb_hexadecimal_word16" tb_hexadecimal_word16, testProperty "tb_hexadecimal_word32" tb_hexadecimal_word32, testProperty "tb_hexadecimal_word64" tb_hexadecimal_word64 ], testGroup "realfloat" [ testProperty "tb_realfloat_double" tb_realfloat_double, testProperty "tb_realfloat_float" tb_realfloat_float ], testProperty "tb_fromText" tb_fromText, testProperty "tb_singleton" tb_singleton ], testGroup "read" [ testProperty "t_decimal" t_decimal, testProperty "tl_decimal" tl_decimal, testProperty "t_hexadecimal" t_hexadecimal, testProperty "tl_hexadecimal" tl_hexadecimal, testProperty "t_double" t_double, testProperty "tl_double" tl_double, testProperty "t_rational" t_rational, testProperty "tl_rational" tl_rational ], {- testGroup "input-output" [ testProperty "t_write_read" t_write_read, testProperty "tl_write_read" tl_write_read, testProperty "t_write_read_line" t_write_read_line, testProperty "tl_write_read_line" tl_write_read_line -- These tests are subject to I/O race conditions when run under -- test-framework-quickcheck2. -- testProperty "t_put_get" t_put_get -- testProperty "tl_put_get" tl_put_get ], -} testGroup "lowlevel" [ testProperty "t_dropWord16" t_dropWord16, testProperty "t_takeWord16" t_takeWord16, testProperty "t_take_drop_16" t_take_drop_16, testProperty "t_use_from" t_use_from, testProperty "t_copy" t_copy ] ] text-0.11.3.1/tests/Tests/QuickCheckUtils.hs0000644000000000000000000002523412140332415016741 0ustar0000000000000000-- | This module provides quickcheck utilities, e.g. arbitrary and show -- instances, and comparison functions, so we can focus on the actual properties -- in the 'Tests.Properties' module. -- {-# LANGUAGE CPP, FlexibleInstances, TypeSynonymInstances #-} {-# OPTIONS_GHC -fno-warn-orphans #-} module Tests.QuickCheckUtils ( genUnicode , unsquare , smallArbitrary , NotEmpty (..) , Small (..) , small , integralRandomR , DecodeErr (..) , Stringy (..) , eq , eqP , Encoding (..) 
, write_read ) where import Control.Arrow (first, (***)) import Control.DeepSeq (NFData (..), deepseq) import Control.Exception (bracket) import Data.Bits ((.&.)) import Data.Char (chr) import Data.Word (Word8, Word16) import Data.String (IsString, fromString) import Data.Text.Foreign (I16) import Debug.Trace (trace) import System.Random (Random (..), RandomGen) import Test.QuickCheck hiding ((.&.)) import Test.QuickCheck.Monadic (assert, monadicIO, run) import qualified Data.ByteString as B import qualified Data.Text as T import qualified Data.Text.Encoding.Error as T import qualified Data.Text.Fusion as TF import qualified Data.Text.Fusion.Common as TF import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Fusion as TLF import qualified Data.Text.Lazy.Internal as TL import qualified System.IO as IO import Tests.Utils instance Random I16 where randomR = integralRandomR random = randomR (minBound,maxBound) instance Arbitrary I16 where arbitrary = choose (minBound,maxBound) instance Arbitrary B.ByteString where arbitrary = B.pack `fmap` arbitrary genUnicode :: IsString a => Gen a genUnicode = fmap fromString string where string = sized $ \n -> do k <- choose (0,n) sequence [ char | _ <- [1..k] ] excluding :: [a -> Bool] -> Gen a -> Gen a excluding bad gen = loop where loop = do x <- gen if or (map ($ x) bad) then loop else return x reserved = [lowSurrogate, highSurrogate, noncharacter] lowSurrogate c = c >= 0xDC00 && c <= 0xDFFF highSurrogate c = c >= 0xD800 && c <= 0xDBFF noncharacter c = masked == 0xFFFE || masked == 0xFFFF where masked = c .&. 0xFFFF ascii = choose (0,0x7F) plane0 = choose (0xF0, 0xFFFF) plane1 = oneof [ choose (0x10000, 0x10FFF) , choose (0x11000, 0x11FFF) , choose (0x12000, 0x12FFF) , choose (0x13000, 0x13FFF) , choose (0x1D000, 0x1DFFF) , choose (0x1F000, 0x1FFFF) ] plane2 = oneof [ choose (0x20000, 0x20FFF) , choose (0x21000, 0x21FFF) , choose (0x22000, 0x22FFF) , choose (0x23000, 0x23FFF) , choose (0x24000, 0x24FFF) , choose (0x25000, 0x25FFF) , choose (0x26000, 0x26FFF) , choose (0x27000, 0x27FFF) , choose (0x28000, 0x28FFF) , choose (0x29000, 0x29FFF) , choose (0x2A000, 0x2AFFF) , choose (0x2B000, 0x2BFFF) , choose (0x2F000, 0x2FFFF) ] plane14 = choose (0xE0000, 0xE0FFF) planes = [ascii, plane0, plane1, plane2, plane14] char = chr `fmap` excluding reserved (oneof planes) -- For tests that have O(n^2) running times or input sizes, resize -- their inputs to the square root of the originals. unsquare :: (Arbitrary a, Show a, Testable b) => (a -> b) -> Property unsquare = forAll smallArbitrary smallArbitrary :: (Arbitrary a, Show a) => Gen a smallArbitrary = sized $ \n -> resize (smallish n) arbitrary where smallish = round . (sqrt :: Double -> Double) . fromIntegral . abs instance Arbitrary T.Text where arbitrary = T.pack `fmap` arbitrary instance Arbitrary TL.Text where arbitrary = (TL.fromChunks . 
map notEmpty) `fmap` smallArbitrary newtype NotEmpty a = NotEmpty { notEmpty :: a } deriving (Eq, Ord) instance Show a => Show (NotEmpty a) where show (NotEmpty a) = show a instance Functor NotEmpty where fmap f (NotEmpty a) = NotEmpty (f a) instance Arbitrary a => Arbitrary (NotEmpty [a]) where arbitrary = sized (\n -> NotEmpty `fmap` (choose (1,n+1) >>= vector)) instance Arbitrary (NotEmpty T.Text) where arbitrary = (fmap T.pack) `fmap` arbitrary instance Arbitrary (NotEmpty TL.Text) where arbitrary = (fmap TL.pack) `fmap` arbitrary instance Arbitrary (NotEmpty B.ByteString) where arbitrary = (fmap B.pack) `fmap` arbitrary data Small = S0 | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 | S16 | S17 | S18 | S19 | S20 | S21 | S22 | S23 | S24 | S25 | S26 | S27 | S28 | S29 | S30 | S31 deriving (Eq, Ord, Enum, Bounded) small :: Integral a => Small -> a small = fromIntegral . fromEnum intf :: (Int -> Int -> Int) -> Small -> Small -> Small intf f a b = toEnum ((fromEnum a `f` fromEnum b) `mod` 32) instance Show Small where show = show . fromEnum instance Read Small where readsPrec n = map (first toEnum) . readsPrec n instance Num Small where fromInteger = toEnum . fromIntegral signum _ = 1 abs = id (+) = intf (+) (-) = intf (-) (*) = intf (*) instance Real Small where toRational = toRational . fromEnum instance Integral Small where toInteger = toInteger . fromEnum quotRem a b = (toEnum x, toEnum y) where (x, y) = fromEnum a `quotRem` fromEnum b instance Random Small where randomR = integralRandomR random = randomR (minBound,maxBound) instance Arbitrary Small where arbitrary = choose (minBound,maxBound) integralRandomR :: (Integral a, RandomGen g) => (a,a) -> g -> (a,g) integralRandomR (a,b) g = case randomR (fromIntegral a :: Integer, fromIntegral b :: Integer) g of (x,h) -> (fromIntegral x, h) data DecodeErr = DE String T.OnDecodeError instance Show DecodeErr where show (DE d _) = "DE " ++ d instance Arbitrary DecodeErr where arbitrary = oneof [ return $ DE "lenient" T.lenientDecode , return $ DE "ignore" T.ignore , return $ DE "strict" T.strictDecode , DE "replace" `fmap` arbitrary ] class Stringy s where packS :: String -> s unpackS :: s -> String splitAtS :: Int -> s -> (s,s) packSChunkSize :: Int -> String -> s packSChunkSize _ = packS instance Stringy String where packS = id unpackS = id splitAtS = splitAt instance Stringy (TF.Stream Char) where packS = TF.streamList unpackS = TF.unstreamList splitAtS n s = (TF.take n s, TF.drop n s) instance Stringy T.Text where packS = T.pack unpackS = T.unpack splitAtS = T.splitAt instance Stringy TL.Text where packSChunkSize k = TLF.unstreamChunks k . TF.streamList packS = TL.pack unpackS = TL.unpack splitAtS = ((TL.lazyInvariant *** TL.lazyInvariant) .) . TL.splitAt . fromIntegral -- Do two functions give the same answer? eq :: (Eq a, Show a) => (t -> a) -> (t -> a) -> t -> Bool eq a b s = a s =^= b s -- What about with the RHS packed? eqP :: (Eq a, Show a, Stringy s) => (String -> a) -> (s -> a) -> String -> Word8 -> Bool eqP f g s w = eql "orig" (f s) (g t) && eql "mini" (f s) (g mini) && eql "head" (f sa) (g ta) && eql "tail" (f sb) (g tb) where t = packS s mini = packSChunkSize 10 s (sa,sb) = splitAt m s (ta,tb) = splitAtS m t l = length s m | l == 0 = n | otherwise = n `mod` l n = fromIntegral w eql d a b | a =^= b = True | otherwise = trace (d ++ ": " ++ show a ++ " /= " ++ show b) False -- Work around lack of Show instance for TextEncoding. 
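-- The E constructor pairs each TextEncoding with a short name so that
-- a failing case can still be printed: Show uses only the name, since
-- the encoding itself has no Show instance.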
data Encoding = E String IO.TextEncoding instance Show Encoding where show (E n _) = "utf" ++ n instance Arbitrary Encoding where arbitrary = oneof . map return $ [ E "8" IO.utf8, E "8_bom" IO.utf8_bom, E "16" IO.utf16 , E "16le" IO.utf16le, E "16be" IO.utf16be, E "32" IO.utf32 , E "32le" IO.utf32le, E "32be" IO.utf32be ] windowsNewlineMode :: IO.NewlineMode windowsNewlineMode = IO.NewlineMode { IO.inputNL = IO.CRLF, IO.outputNL = IO.CRLF } -- Newline and NewlineMode have standard Show instance from GHC 7 onwards #if __GLASGOW_HASKELL__ < 700 instance Show IO.Newline where show IO.CRLF = "CRLF" show IO.LF = "LF" instance Show IO.NewlineMode where show (IO.NewlineMode i o) = "NewlineMode { inputNL = " ++ show i ++ ", outputNL = " ++ show o ++ " }" # endif instance Arbitrary IO.NewlineMode where arbitrary = oneof . map return $ [ IO.noNewlineTranslation, IO.universalNewlineMode, IO.nativeNewlineMode , windowsNewlineMode ] instance Arbitrary IO.BufferMode where arbitrary = oneof [ return IO.NoBuffering, return IO.LineBuffering, return (IO.BlockBuffering Nothing), (IO.BlockBuffering . Just . (+1) . fromIntegral) `fmap` (arbitrary :: Gen Word16) ] -- This test harness is complex! What property are we checking? -- -- Reading after writing a multi-line file should give the same -- results as were written. -- -- What do we vary while checking this property? -- * The lines themselves, scrubbed to contain neither CR nor LF. (By -- working with a list of lines, we ensure that the data will -- sometimes contain line endings.) -- * Encoding. -- * Newline translation mode. -- * Buffering. write_read :: (NFData a, Eq a) => ([b] -> a) -> ((Char -> Bool) -> a -> b) -> (IO.Handle -> a -> IO ()) -> (IO.Handle -> IO a) -> Encoding -> IO.NewlineMode -> IO.BufferMode -> [a] -> Property write_read unline filt writer reader (E _ _) nl buf ts = monadicIO $ assert . (==t) =<< run act where t = unline . map (filt (not . (`elem` "\r\n"))) $ ts act = withTempFile $ \path h -> do -- hSetEncoding h enc IO.hSetNewlineMode h nl IO.hSetBuffering h buf () <- writer h t IO.hClose h bracket (IO.openFile path IO.ReadMode) IO.hClose $ \h' -> do -- hSetEncoding h' enc IO.hSetNewlineMode h' nl IO.hSetBuffering h' buf r <- reader h' r `deepseq` return r text-0.11.3.1/tests/Tests/Regressions.hs0000644000000000000000000000474612140332415016216 0ustar0000000000000000-- | Regression tests for specific bugs. -- {-# LANGUAGE OverloadedStrings, ScopedTypeVariables #-} module Tests.Regressions ( tests ) where import Control.Exception (SomeException, handle) import System.IO import Test.HUnit (assertBool, assertFailure) import qualified Data.ByteString as B import Data.ByteString.Char8 () import qualified Data.ByteString.Lazy as LB import qualified Data.Text as T import qualified Data.Text.Encoding as TE import qualified Data.Text.IO as T import qualified Data.Text.Lazy as LT import qualified Data.Text.Lazy.Encoding as LE import qualified Test.Framework as F import qualified Test.Framework.Providers.HUnit as F import Tests.Utils (withTempFile) -- Reported by Michael Snoyman: UTF-8 encoding a large lazy bytestring -- caused either a segfault or attempt to allocate a negative number -- of bytes. lazy_encode_crash :: IO () lazy_encode_crash = withTempFile $ \ _ h -> LB.hPut h . LE.encodeUtf8 . LT.pack . replicate 100000 $ 'a' -- Reported by Pieter Laeremans: attempting to read an incorrectly -- encoded file can result in a crash in the RTS (i.e. not merely an -- exception). 
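-- The bytes written below are 'x' (0x78) followed by 0xC4, the start
-- of a two-byte UTF-8 sequence that is cut short by the newline
-- (0x0A), so decoding must fail; the test passes if that failure
-- surfaces as an ordinary exception rather than an RTS crash.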
hGetContents_crash :: IO () hGetContents_crash = withTempFile $ \ path h -> do B.hPut h (B.pack [0x78, 0xc4 ,0x0a]) >> hClose h h' <- openFile path ReadMode hSetEncoding h' utf8 handle (\(_::SomeException) -> return ()) $ T.hGetContents h' >> assertFailure "T.hGetContents should crash" -- Reported by Ian Lynagh: attempting to allocate a sufficiently large -- string (via either Array.new or Text.replicate) could result in an -- integer overflow. replicate_crash :: IO () replicate_crash = handle (\(_::SomeException) -> return ()) $ T.replicate (2^power) "0123456789abcdef" `seq` assertFailure "T.replicate should crash" where power | maxBound == (2147483647::Int) = 28 | otherwise = 60 :: Int -- Reported by John Millikin: a UTF-8 decode error handler could -- return a bogus substitution character, which we would write without -- checking. utf8_decode_unsafe :: IO () utf8_decode_unsafe = do let t = TE.decodeUtf8With (\_ _ -> Just '\xdc00') "\x80" assertBool "broken error recovery shouldn't break us" (t == "\xfffd") tests :: F.Test tests = F.testGroup "Regressions" [ F.testCase "hGetContents_crash" hGetContents_crash , F.testCase "lazy_encode_crash" lazy_encode_crash , F.testCase "replicate_crash" replicate_crash , F.testCase "utf8_decode_unsafe" utf8_decode_unsafe ] text-0.11.3.1/tests/Tests/SlowFunctions.hs0000644000000000000000000000237612140332415016525 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} module Tests.SlowFunctions ( indices , splitOn ) where import qualified Data.Text as T import Data.Text.Internal (Text(..)) import Data.Text.Unsafe (iter_, unsafeHead, unsafeTail) indices :: T.Text -- ^ Substring to search for (@needle@) -> T.Text -- ^ Text to search in (@haystack@) -> [Int] indices needle@(Text _narr _noff nlen) haystack@(Text harr hoff hlen) | T.null needle = [] | otherwise = scan 0 where scan i | i >= hlen = [] | needle `T.isPrefixOf` t = i : scan (i+nlen) | otherwise = scan (i+d) where t = Text harr (hoff+i) (hlen-i) d = iter_ haystack i splitOn :: T.Text -- ^ Text to split on -> T.Text -- ^ Input text -> [T.Text] splitOn pat src0 | T.null pat = error "splitOn: empty" | l == 1 = T.split (== (unsafeHead pat)) src0 | otherwise = go src0 where l = T.length pat go src = search 0 src where search !n !s | T.null s = [src] -- not found | pat `T.isPrefixOf` s = T.take n src : go (T.drop l s) | otherwise = search (n+1) (unsafeTail s) text-0.11.3.1/tests/Tests/Utils.hs0000644000000000000000000000326312140332415015004 0ustar0000000000000000-- | Miscellaneous testing utilities -- {-# LANGUAGE ScopedTypeVariables #-} module Tests.Utils ( (=^=) , withRedirect , withTempFile ) where import Control.Exception (SomeException, bracket, bracket_, evaluate, try) import Control.Monad (when) import Debug.Trace (trace) import GHC.IO.Handle.Internals (withHandle) import System.Directory (removeFile) import System.IO (Handle, hClose, hFlush, hIsOpen, hIsWritable, openTempFile) import System.IO.Unsafe (unsafePerformIO) -- Ensure that two potentially bottom values (in the sense of crashing -- for some inputs, not looping infinitely) either both crash, or both -- give comparable results for some input. 
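-- For example, @head "" =^= head ""@ holds because both sides throw,
-- and @1 =^= 1@ holds because both sides yield the same value; the
-- comparison fails when exactly one side diverges or when the two
-- results differ.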
(=^=) :: (Eq a, Show a) => a -> a -> Bool i =^= j = unsafePerformIO $ do x <- try (evaluate i) y <- try (evaluate j) case (x,y) of (Left (_ :: SomeException), Left (_ :: SomeException)) -> return True (Right a, Right b) -> return (a == b) e -> trace ("*** Divergence: " ++ show e) return False infix 4 =^= {-# NOINLINE (=^=) #-} withTempFile :: (FilePath -> Handle -> IO a) -> IO a withTempFile = bracket (openTempFile "." "crashy.txt") cleanupTemp . uncurry where cleanupTemp (path,h) = do open <- hIsOpen h when open (hClose h) removeFile path withRedirect :: Handle -> Handle -> IO a -> IO a withRedirect tmp h = bracket_ swap swap where whenM p a = p >>= (`when` a) swap = do whenM (hIsOpen tmp) $ whenM (hIsWritable tmp) $ hFlush tmp whenM (hIsOpen h) $ whenM (hIsWritable h) $ hFlush h withHandle "spam" tmp $ \tmph -> do hh <- withHandle "spam" h $ \hh -> return (tmph,hh) return (hh,())
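-- A minimal usage sketch for withTempFile (hypothetical, not part of
-- the suite; assumes additional imports of Data.Text.IO qualified as T
-- and of openFile/ReadMode from System.IO).  It mirrors how the I/O
-- properties drive this helper: write through the fresh handle, close
-- it, then reopen the path for reading before cleanup removes it.
--
-- > roundTrip :: T.Text -> IO T.Text
-- > roundTrip t = withTempFile $ \path h -> do
-- >   T.hPutStr h t  -- write through the handle withTempFile opened
-- >   hClose h       -- cleanupTemp tolerates an already-closed handle
-- >   bracket (openFile path ReadMode) hClose T.hGetContents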