hashtables-1.0.1.8/0000755000000000000000000000000012032120320012144 5ustar0000000000000000hashtables-1.0.1.8/haddock.sh0000755000000000000000000000026712032120320014105 0ustar0000000000000000#!/bin/sh set -x rm -Rf dist/doc HADDOCK_OPTS='--html-location=http://hackage.haskell.org/packages/archive/$pkg/latest/doc/html' cabal haddock $HADDOCK_OPTS --hyperlink-source $@ hashtables-1.0.1.8/hashtables.cabal0000644000000000000000000001720712032120320015255 0ustar0000000000000000Name: hashtables Version: 1.0.1.8 Synopsis: Mutable hash tables in the ST monad Homepage: http://github.com/gregorycollins/hashtables License: BSD3 License-file: LICENSE Author: Gregory Collins Maintainer: greg@gregorycollins.net Copyright: (c) 2011-2012, Google, Inc. Category: Data Build-type: Simple Cabal-version: >= 1.8 Description: This package provides a couple of different implementations of mutable hash tables in the ST monad, as well as a typeclass abstracting their common operations, and a set of wrappers to use the hash tables in the IO monad. . /QUICK START/: documentation for the hash table operations is provided in the "Data.HashTable.Class" module, and the IO wrappers (which most users will probably prefer) are located in the "Data.HashTable.IO" module. . This package currently contains three hash table implementations: . 1. "Data.HashTable.ST.Basic" contains a basic open-addressing hash table using linear probing as the collision strategy. On a pure speed basis it should currently be the fastest available Haskell hash table implementation for lookups, although it has a higher memory overhead than the other tables and can suffer from long delays when the table is resized because all of the elements in the table need to be rehashed. . 2. "Data.HashTable.ST.Cuckoo" contains an implementation of \"cuckoo hashing\" as introduced by Pagh and Rodler in 2001 (see <http://en.wikipedia.org/wiki/Cuckoo_hashing>). 
Cuckoo hashing has worst-case /O(1)/ lookups and can reach a high \"load factor\", meaning the table can perform acceptably well even when more than 90% full. Randomized testing shows this implementation of cuckoo hashing to be slightly faster on insert and slightly slower on lookup than "Data.HashTable.ST.Basic", while being more space-efficient by about a half-word per key-value mapping. Cuckoo hashing, like the basic hash table implementation using linear probing, can suffer from long delays when the table is resized. . 3. "Data.HashTable.ST.Linear" contains a linear hash table (see <http://en.wikipedia.org/wiki/Linear_hashing>), which trades some insert and lookup performance for higher space efficiency and much shorter delays when expanding the table. In most cases, benchmarks show this table to be currently slightly faster than @Data.HashTable@ from the Haskell base library. . It is recommended to create a concrete type alias in your code when using this package, i.e.: . > import qualified Data.HashTable.IO as H > > type HashTable k v = H.BasicHashTable k v > > foo :: IO (HashTable Int Int) > foo = do > ht <- H.new > H.insert ht 1 1 > return ht . Firstly, this makes it easy to switch to a different hash table implementation, and secondly, using a concrete type rather than leaving your functions abstract in the HashTable class should allow GHC to optimize away the typeclass dictionaries. . This package accepts a couple of different cabal flags: . * @unsafe-tricks@, default /ON/. If this flag is enabled, we use some unsafe GHC-specific tricks to save indirections (namely @unsafeCoerce#@ and @reallyUnsafePtrEquality#@). These techniques rely on assumptions about the behaviour of the GHC runtime system and, although they've been tested and should be safe under normal conditions, are slightly dangerous. Caveat emptor. In particular, these techniques are incompatible with HPC code coverage reports. . * @sse41@, default /OFF/. 
If this flag is enabled, we use some SSE 4.1 instructions (see <http://en.wikipedia.org/wiki/SSE4>, first available on Intel Core 2 processors) to speed up cache-line searches for cuckoo hashing. . * @bounds-checking@, default /OFF/. If this flag is enabled, array accesses are bounds-checked. . * @debug@, default /OFF/. If turned on, we'll rudely spew debug output to stdout. . * @portable@, default /OFF/. If this flag is enabled, we use only pure Haskell code and try not to use unportable GHC extensions. Turning this flag on forces @unsafe-tricks@ and @sse41@ /OFF/. . This package has been tested with GHC 7.0.3, on: . * a MacBook Pro running Snow Leopard with an Intel Core i5 processor, running GHC 7.0.3 in 64-bit mode. . * an Arch Linux desktop with an AMD Phenom II X4 940 quad-core processor. . * a MacBook Pro running Snow Leopard with an Intel Core 2 Duo processor, running GHC 6.12.3 in 32-bit mode. . Please send bug reports to . Extra-Source-Files: README.md, haddock.sh, benchmark/hashtable-benchmark.cabal, benchmark/LICENSE, benchmark/src/Criterion/Collection/Internal/Types.hs, benchmark/src/Criterion/Collection/Chart.hs, benchmark/src/Criterion/Collection/Main.hs, benchmark/src/Criterion/Collection/Types.hs, benchmark/src/Criterion/Collection/Sample.hs, benchmark/src/Main.hs, benchmark/src/Data/Vector/Algorithms/Shuffle.hs, benchmark/src/Data/Benchmarks/UnorderedCollections/Distributions.hs, benchmark/src/Data/Benchmarks/UnorderedCollections/Types.hs, test/compute-overhead/ComputeOverhead.hs, test/hashtables-test.cabal, test/runTestsAndCoverage.sh, test/runTestsNoCoverage.sh, test/suite/Data/HashTable/Test/Common.hs, test/suite/TestSuite.hs ------------------------------------------------------------------------------ Flag unsafe-tricks Description: turn on unsafe GHC tricks Default: True Flag bounds-checking Description: if on, use bounds-checking array accesses Default: False Flag debug Description: if on, spew debugging output to stdout Default: False Flag sse41 Description: if on, use SSE 
4.1 extensions to search cache lines very efficiently. The portable flag forces this off. Default: False Flag portable Description: if on, use only pure Haskell code and no GHC extensions. Default: False Library hs-source-dirs: src if !flag(portable) C-sources: cbits/cfuncs.c Exposed-modules: Data.HashTable.Class, Data.HashTable.IO, Data.HashTable.ST.Basic, Data.HashTable.ST.Cuckoo, Data.HashTable.ST.Linear Other-modules: Data.HashTable.Internal.Array, Data.HashTable.Internal.IntArray, Data.HashTable.Internal.CacheLine, Data.HashTable.Internal.CheapPseudoRandomBitStream, Data.HashTable.Internal.UnsafeTricks, Data.HashTable.Internal.Utils, Data.HashTable.Internal.Linear.Bucket Build-depends: base >= 4 && <5, hashable >= 1.1 && <2, primitive >= 0.4 && <0.6, vector >= 0.7 && <0.11 if flag(portable) cpp-options: -DNO_C_SEARCH -DPORTABLE if !flag(portable) && flag(unsafe-tricks) && impl(ghc) build-depends: ghc-prim cpp-options: -DUNSAFETRICKS if flag(debug) cpp-options: -DDEBUG if flag(bounds-checking) cpp-options: -DBOUNDS_CHECKING if flag(sse41) && !flag(portable) cc-options: -DUSE_SSE_4_1 -msse4.1 cpp-options: -DUSE_SSE_4_1 ghc-prof-options: -prof -auto-all if impl(ghc >= 6.12.0) ghc-options: -Wall -fwarn-tabs -funbox-strict-fields -O2 -fno-warn-unused-do-bind else ghc-options: -Wall -fwarn-tabs -funbox-strict-fields -O2 hashtables-1.0.1.8/LICENSE0000644000000000000000000000276112032120320013157 0ustar0000000000000000Copyright (c) 2011-2012, Google, Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 
* Neither the name of Google, Inc. nor the names of other contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. hashtables-1.0.1.8/README.md0000644000000000000000000000750512032120320013432 0ustar0000000000000000This package provides a couple of different implementations of mutable hash tables in the ST monad, as well as a typeclass abstracting their common operations, and a set of wrappers to use the hash tables in the IO monad. **Quick start**: documentation for the hash table operations is provided in the `Data.HashTable.Class` module, and the IO wrappers are located in the `Data.HashTable.IO` module. This package currently contains three hash table implementations: 1. `Data.HashTable.ST.Basic` contains a basic open-addressing hash table using linear probing as the collision strategy. On a pure speed basis it should currently be the fastest available Haskell hash table implementation for lookups, although it has a higher memory overhead than the other tables and can suffer from long delays when the table is resized because all of the elements in the table need to be rehashed. 2. 
`Data.HashTable.ST.Cuckoo` contains an implementation of "cuckoo hashing" as introduced by Pagh and Rodler in 2001 (see [http://en.wikipedia.org/wiki/Cuckoo\_hashing](http://en.wikipedia.org/wiki/Cuckoo_hashing)). Cuckoo hashing has worst-case *O(1)* lookups and can reach a high "load factor", meaning the table can perform acceptably well even when more than 90% full. Randomized testing shows this implementation of cuckoo hashing to be slightly faster on insert and slightly slower on lookup than `Data.HashTable.ST.Basic`, while being more space-efficient by about a half-word per key-value mapping. Cuckoo hashing, like the basic hash table implementation using linear probing, can suffer from long delays when the table is resized. 3. `Data.HashTable.ST.Linear` contains a linear hash table (see [http://en.wikipedia.org/wiki/Linear\_hashing](http://en.wikipedia.org/wiki/Linear_hashing)), which trades some insert and lookup performance for higher space efficiency and much shorter delays when expanding the table. In most cases, benchmarks show this table to be currently slightly faster than `Data.HashTable` from the Haskell base library. It is recommended to create a concrete type alias in your code when using this package, i.e.: import qualified Data.HashTable.IO as H type HashTable k v = H.BasicHashTable k v foo :: IO (HashTable Int Int) foo = do ht <- H.new H.insert ht 1 1 return ht Firstly, this makes it easy to switch to a different hash table implementation, and secondly, using a concrete type rather than leaving your functions abstract in the HashTable class should allow GHC to optimize away the typeclass dictionaries. This package accepts a couple of different cabal flags: * `unsafe-tricks`, default **on**. If this flag is enabled, we use some unsafe GHC-specific tricks to save indirections (namely `unsafeCoerce#` and `reallyUnsafePtrEquality#`). 
These techniques rely on assumptions about the behaviour of the GHC runtime system and, although they've been tested and should be safe under normal conditions, are slightly dangerous. Caveat emptor. In particular, these techniques are incompatible with HPC code coverage reports. * `sse41`, default **off**. If this flag is enabled, we use some SSE 4.1 instructions (see [http://en.wikipedia.org/wiki/SSE4](http://en.wikipedia.org/wiki/SSE4), first available on Intel Core 2 processors) to speed up cache-line searches for cuckoo hashing. * `bounds-checking`, default **off**. If this flag is enabled, array accesses are bounds-checked. * `debug`, default **off**. If turned on, we'll rudely spew debug output to stdout. * `portable`, default **off**. If this flag is enabled, we use only pure Haskell code and try not to use unportable GHC extensions. Turning this flag on forces `unsafe-tricks` and `sse41` **off**. hashtables-1.0.1.8/Setup.hs0000644000000000000000000000005612032120320013601 0ustar0000000000000000import Distribution.Simple main = defaultMain hashtables-1.0.1.8/benchmark/0000755000000000000000000000000012032120320014076 5ustar0000000000000000hashtables-1.0.1.8/benchmark/hashtable-benchmark.cabal0000644000000000000000000000356612032120320020757 0ustar0000000000000000Name: hashtable-benchmark Version: 0.2 Copyright: (c) 2011-2012, Google, Inc. 
Synopsis: Benchmarks for hashtables License: BSD3 License-file: LICENSE Author: Gregory Collins Maintainer: greg@gregorycollins.net Category: Data Build-type: Simple Cabal-version: >=1.2 Flag chart Default: False Executable hashtable-benchmark main-is: Main.hs hs-source-dirs: src ../src build-depends: base == 4.*, base16-bytestring == 0.1.*, bytestring >= 0.9 && <0.11, containers >= 0.4 && <0.6, criterion >= 0.5 && <0.7, csv == 0.1.*, deepseq >= 1.1 && <1.4, filepath == 1.*, hashable >= 1.1 && <2, hashtables >= 1.0.1.3 && <1.1, mtl == 2.*, mwc-random >= 0.8 && <0.13, primitive, statistics >= 0.8 && <0.11, threads >= 0.4 && <0.6, unordered-containers >= 0.2 && <0.3, vector >= 0.7 && <0.10, vector-algorithms >= 0.5 && <0.6 if flag(chart) Build-depends: Chart == 0.14.*, colour == 2.3.*, data-accessor == 0.2.* Cpp-options: -DCHART ghc-options: -O2 -Wall -fwarn-tabs -funbox-strict-fields -threaded -fno-warn-unused-do-bind -rtsopts -with-rtsopts=-H2G -with-rtsopts=-A4M hashtables-1.0.1.8/benchmark/LICENSE0000644000000000000000000000276112032120320015111 0ustar0000000000000000Copyright (c) 2011-2012, Google, Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Google, Inc. nor the names of other contributors may be used to endorse or promote products derived from this software without specific prior written permission. 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. hashtables-1.0.1.8/benchmark/src/0000755000000000000000000000000012032120320014665 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Main.hs0000644000000000000000000002155012032120320016110 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} {-# LANGUAGE CPP #-} {-# LANGUAGE GeneralizedNewtypeDeriving #-} module Main (main) where import Data.Bits import qualified Data.ByteString as B import Data.ByteString (ByteString) import qualified Data.ByteString.Base16 as B16 import Data.Hashable import Data.IORef import Control.DeepSeq import Control.Monad import Control.Monad.ST import Control.Monad.Trans import qualified Data.HashMap.Strict as UC import qualified Data.HashTable as H import qualified Data.Map as Map import qualified Data.HashTable.IO as IOH import Data.Benchmarks.UnorderedCollections.Distributions import Data.Benchmarks.UnorderedCollections.Types import System.Environment import System.FilePath import System.Random.MWC import Criterion.Collection.Main import Criterion.Collection.Sample import Criterion.Collection.Types ------------------------------------------------------------------------------ dataMap :: (Ord k, Eq k) => DataStructure (Operation k) dataMap = setupData Map.empty f where f !m !op = case op of 
(Insert k v) -> let !m' = Map.insert k v m in m' (Lookup k) -> let !_ = Map.lookup k m in m (Delete k) -> let !m' = Map.delete k m in m' {-# INLINE dataMap #-} ------------------------------------------------------------------------------ hashMap :: (Hashable k, Eq k) => DataStructure (Operation k) hashMap = setupData UC.empty f where f !m !op = case op of (Insert k v) -> let !m' = UC.insert k v m in m' (Lookup k) -> let !_ = UC.lookup k m in m (Delete k) -> let !m' = UC.delete k m in m' {-# INLINE hashMap #-} ------------------------------------------------------------------------------ hashTable :: (Hashable k, Eq k) => DataStructure (Operation k) hashTable = setupDataIO (const (H.new (==) (toEnum . (.&. 0x7fffffff) . hash))) f where f !m !op = case op of (Insert k v) -> H.update m k v >> return m (Lookup k) -> do !_ <- H.lookup m k return m (Delete k) -> do !_ <- H.delete m k return m {-# INLINE hashTable #-} ------------------------------------------------------------------------------ basicHashTable :: (Hashable k, Eq k) => DataStructure (Operation k) basicHashTable = setupDataIO (IOH.newSized :: Int -> IO (IOH.BasicHashTable k v)) f where f !m !op = case op of (Insert k v) -> IOH.insert m k v >> return m (Lookup k) -> do !_ <- IOH.lookup m k return m (Delete k) -> IOH.delete m k >> return m {-# INLINE basicHashTable #-} ------------------------------------------------------------------------------ cuckooHashTable :: (Hashable k, Eq k) => DataStructure (Operation k) cuckooHashTable = setupDataIO (IOH.newSized :: Int -> IO (IOH.CuckooHashTable k v)) f where f !m !op = case op of (Insert k v) -> IOH.insert m k v >> return m (Lookup k) -> do !_ <- IOH.lookup m k return m (Delete k) -> IOH.delete m k >> return m {-# INLINE cuckooHashTable #-} ------------------------------------------------------------------------------ linearHashTable :: (Hashable k, Eq k) => DataStructure (Operation k) linearHashTable = setupDataIO (IOH.newSized :: Int -> IO (IOH.LinearHashTable 
k v)) f where f !m !op = case op of (Insert k v) -> IOH.insert m k v >> return m (Lookup k) -> do !_ <- IOH.lookup m k return m (Delete k) -> IOH.delete m k >> return m {-# INLINE linearHashTable #-} ------------------------------------------------------------------------------ mkByteString :: GenIO -> IO ByteString mkByteString rng = do n <- uniformR (4,16) rng xs <- replicateM n (uniform rng) let !s = B.pack xs return $! B16.encode s instance NFData ByteString where rnf s = rnf $! B.unpack s ------------------------------------------------------------------------------ mkConsecutiveIntegers :: IORef Int -> GenIO -> IO Int mkConsecutiveIntegers ref _ = do !i <- atomicModifyIORef ref f return $! i where f !i = let !j = i+1 in (j,j) ------------------------------------------------------------------------------ newtype IntMix = IntMix Int deriving (Num, Read, Show, Ord, Eq, NFData) ------------------------------------------------------------------------------ instance Hashable IntMix where hash (IntMix a) = hashWithSalt 1102951999 a hashWithSalt salt (IntMix a) = hashWithSalt salt a ------------------------------------------------------------------------------ loadConsecutiveIntegersWorkload :: WorkloadGenerator (Operation Int) loadConsecutiveIntegersWorkload size = do ref <- liftIO $ newIORef 0 loadOnly (mkConsecutiveIntegers ref) size ------------------------------------------------------------------------------ loadConsecutiveIntegersWorkload' :: WorkloadGenerator (Operation IntMix) loadConsecutiveIntegersWorkload' size = do ref <- liftIO $ newIORef 0 loadOnly (\rng -> IntMix `fmap` mkConsecutiveIntegers ref rng) size ------------------------------------------------------------------------------ testStructures = [ ("Data.Map" , dataMap ) , ("Data.Hashtable" , hashTable ) , ("Data.HashMap" , hashMap ) , ("Data.BasicHashTable" , basicHashTable ) , ("Data.LinearHashTable" , linearHashTable) , ("Data.CuckooHashTable" , cuckooHashTable) ] intStructures = [ ("Data.Map" 
, dataMap ) , ("Data.Hashtable" , hashTable ) , ("Data.HashMap" , hashMap ) , ("Data.BasicHashTable" , basicHashTable ) , ("Data.CuckooHashTable", cuckooHashTable) ] intStructures' = [ ("Data.Map" , dataMap ) , ("Data.Hashtable" , hashTable ) , ("Data.HashMap" , hashMap ) , ("Data.BasicHashTable" , basicHashTable ) , ("Data.CuckooHashTable", cuckooHashTable) ] testSizes = [ 250 , 500 , 1000 , 2000 , 4000 , 8000 , 16000 , 32000 , 64000 , 128000 , 256000 , 512000 , 1024000 , 2048000 ] ------------------------------------------------------------------------------ lookupBenchmark :: Benchmark (Operation ByteString) lookupBenchmark = Benchmark "Lookup Performance" testStructures testSizes (loadAndUniformLookup mkByteString) ------------------------------------------------------------------------------ insertBenchmark :: Benchmark (Operation ByteString) insertBenchmark = Benchmark "Insert Performance" testStructures testSizes (loadOnly mkByteString) ------------------------------------------------------------------------------ consecutiveIntBenchmark :: Benchmark (Operation Int) consecutiveIntBenchmark = Benchmark "Insert consecutive ints" intStructures testSizes loadConsecutiveIntegersWorkload ------------------------------------------------------------------------------ consecutiveIntWithMixBenchmark :: Benchmark (Operation IntMix) consecutiveIntWithMixBenchmark = Benchmark "Insert consecutive ints (mixed)" intStructures' testSizes loadConsecutiveIntegersWorkload' ------------------------------------------------------------------------------ main :: IO () main = do args <- getArgs let fn = case args of [] -> Nothing (x:_) -> Just (dropExtensions x) let cfg = defaultCriterionCollectionConfig runBenchmark PerBatch Mutating insertBenchmark cfg (fmap (++".insert") fn) runBenchmark PerBatch Pure lookupBenchmark cfg (fmap (++".lookup") fn) runBenchmark PerBatch Mutating consecutiveIntBenchmark cfg (fmap (++".int") fn) runBenchmark PerBatch Mutating 
consecutiveIntWithMixBenchmark cfg (fmap (++".intmix") fn) hashtables-1.0.1.8/benchmark/src/Criterion/0000755000000000000000000000000012032120320016623 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Criterion/Collection/0000755000000000000000000000000012032120320020716 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Criterion/Collection/Chart.hs0000644000000000000000000000562212032120320022320 0ustar0000000000000000module Criterion.Collection.Chart ( errBarChart , defaultColors ) where import Criterion.Measurement import Data.Accessor import Data.Colour import Data.Colour.Names import Graphics.Rendering.Chart hiding (Vector) import Criterion.Collection.Sample defaultColors :: [AlphaColour Double] defaultColors = cycle $ map opaque [ blue, red, brown, black, darkgoldenrod, coral, cyan, darkcyan, darkkhaki, darkmagenta, darkslategrey ] plotErrBars :: String -> CairoLineStyle -> [SampleData] -> Plot Double Double plotErrBars name lineStyle samples = toPlot plot where value sd = symErrPoint size m 0 s where size = fromIntegral $ sdInputSize sd (m,s) = computeMeanAndStddev sd plot = plot_errbars_values ^= map value samples $ plot_errbars_line_style ^= lineStyle $ plot_errbars_title ^= name $ defaultPlotErrBars plotPoints :: String -> CairoPointStyle -> [SampleData] -> Plot Double Double plotPoints name pointStyle samples = toPlot plot where value sd = (fromIntegral size, m) where size = sdInputSize sd (m,_) = computeMeanAndStddev sd plot = plot_points_values ^= map value samples $ plot_points_style ^= pointStyle $ plot_points_title ^= name $ defaultPlotPoints errBarChart :: Bool -> Double -> String -> [(AlphaColour Double, String, [SampleData])] -> Renderable () errBarChart logPlot lineWidth plotTitle plotData = toRenderable layout where mkPlot (colour, plotName, samples) = joinPlot eb pts where lStyle = line_width ^= lineWidth $ line_color ^= colour $ defaultPlotErrBars ^. 
plot_errbars_line_style pStyle = filledCircles (1.5 * lineWidth) colour eb = plotErrBars plotName lStyle samples pts = plotPoints plotName pStyle samples remapLabels = axis_labels ^: f where f labels = map (map g) labels g (x,_) = (x, secs x) axisfn = if logPlot then autoScaledLogAxis defaultLogAxis else autoScaledAxis defaultLinearAxis layout = layout1_title ^= plotTitle $ layout1_background ^= solidFillStyle (opaque white) $ layout1_left_axis ^: laxis_generate ^= axisfn $ layout1_left_axis ^: laxis_override ^= remapLabels $ layout1_left_axis ^: laxis_title ^= "Time (seconds)" $ layout1_bottom_axis ^: laxis_generate ^= axisfn $ layout1_bottom_axis ^: laxis_title ^= "# of items in collection" $ layout1_plots ^= (map (Left . mkPlot) plotData) $ defaultLayout1 hashtables-1.0.1.8/benchmark/src/Criterion/Collection/Main.hs0000644000000000000000000001267212032120320022146 0ustar0000000000000000{-# LANGUAGE CPP #-} module Criterion.Collection.Main ( CriterionCollectionConfig , defaultCriterionCollectionConfig , runBenchmark ) where import Control.DeepSeq import Control.Monad.Trans import Criterion.Collection.Sample import Criterion.Config import Criterion.Environment import Criterion.Measurement (secs) import Criterion.Monad import Data.List import System.IO import System.Random.MWC (GenIO) import qualified System.Random.MWC as R import Text.CSV #ifdef CHART import Criterion.Collection.Chart #endif data CriterionCollectionConfig = Cfg { _criterionConfig :: Config , _logPlot :: Bool -- todo: more here } defaultCriterionCollectionConfig :: CriterionCollectionConfig defaultCriterionCollectionConfig = Cfg defaultConfig False -- Fixme: fold chart output into config and generalize to other post-processing -- functions (like alternative chart types and CSV output) runBenchmark :: (NFData op) => MeasurementMode -> WorkloadMode -> Benchmark op -> CriterionCollectionConfig -> Maybe FilePath -> IO () runBenchmark mMode wMode benchmark (Cfg cfg logPlot) fp = withConfig cfg $ do rng 
<- liftIO $ R.withSystemRandom (\r -> return r :: IO GenIO) env <- measureEnvironment plotData <- takeSamples mMode wMode benchmark env rng liftIO $ mkChart logPlot (benchmarkName benchmark) fp plotData liftIO $ mkCSV (benchmarkName benchmark) fp plotData ------------------------------------------------------------------------------ mkCSV :: String -> Maybe FilePath -> [(String, [SampleData])] -> IO () mkCSV chartTitle output plotData = do h <- maybe (return stdout) (\f -> openFile (f ++ ".csv") WriteMode) output hPutStr h $ printCSV allRows maybe (return ()) (\_ -> hClose h) output where header = [ "Data Structure" , "Input Size" , "Mean (secs)" , "Stddev (secs)" , "95% (secs)" , "Max (secs)" ] allRows = header : sampleRows sampleRows = concatMap samplesToRows plotData samplesToRows (name, sds) = map (sampleToRow name) sds sampleToRow name sd = [ name , show inputSize , show mean , show stddev , show ninetyFifth , show maxVal ] where (mean, stddev) = computeMeanAndStddev sd ninetyFifth = compute95thPercentile sd maxVal = computeMax sd inputSize = sdInputSize sd ------------------------------------------------------------------------------ mkChart :: Bool -> String -> Maybe FilePath -> [(String, [SampleData])] -> IO () #ifdef CHART mkChart logPlot chartTitle output plotData' = do go output printChartResults chartTitle plotData' where plotData = map (\(a,(b,c)) -> (a,b,c)) (defaultColors `zip` plotData') go Nothing = do let chart = errBarChart logPlot 2.5 chartTitle plotData _ <- renderableToWindow chart 1024 768 return () go (Just fn) = do let chart = errBarChart logPlot 1.5 chartTitle plotData _ <- renderableToPNGFile chart 800 600 fn return () #else -- FIXME mkChart _ chartTitle _ plotData = printChartResults chartTitle plotData #endif ------------------------------------------------------------------------------ printChartResults :: String -> [(String, [SampleData])] -> IO () printChartResults chartTitle plotData = do -- fixme putStrLn $ "Results for " ++ 
chartTitle dashes crlf mapM_ printOne plotData where dashes = putStrLn $ replicate 78 '-' crlf = putStrLn "" fieldSize = 14 rpad s = if n > fieldSize then (take (fieldSize-2) s) ++ ".." else replicate nsp ' ' ++ s where n = length s nsp = fieldSize-n lpad s = if n > fieldSize then (take (fieldSize-2) s) ++ ".." else s ++ replicate nsp ' ' where n = length s nsp = fieldSize-n printHeader = do putStrLn $ concat [ lpad "Input Sz", " " , lpad "Mean (secs)", " " , lpad "Stddev (secs)", " " , lpad "95% (secs)", " " , lpad "Max (secs)"] putStrLn $ concat [ replicate fieldSize '-', " " , replicate fieldSize '-', " " , replicate fieldSize '-', " " , replicate fieldSize '-', " " , replicate fieldSize '-' ] printOne (name, sampledata) = do putStrLn $ "Data structure " ++ name crlf printHeader mapM_ printSample sampledata crlf printSample sd = do --putStrLn $ "fixme: sample length is " ++ show sd let (mean,stddev) = computeMeanAndStddev sd let ninetyFifth = compute95thPercentile sd let maxVal = computeMax sd let inputSize = sdInputSize sd let f1 = rpad $ show inputSize let f2 = rpad $ secs mean let f3 = rpad $ secs stddev let f4 = rpad $ secs ninetyFifth let f5 = rpad $ secs maxVal putStrLn $ intercalate " " [f1, f2, f3, f4, f5] hashtables-1.0.1.8/benchmark/src/Criterion/Collection/Sample.hs0000644000000000000000000002640312032120320022500 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} {-# LANGUAGE RankNTypes #-} module Criterion.Collection.Sample ( Benchmark(..) , SampleData(..) , MeasurementMode(..) , WorkloadMode(..) 
, computeMeanAndStddev , compute95thPercentile , computeMax , takeSample , takeSamples ) where import Control.DeepSeq import Control.Monad import Control.Monad.Trans import Criterion hiding (Benchmark) import Criterion.Config import Criterion.Environment import Criterion.IO import Criterion.Measurement import Criterion.Monad import Criterion.Collection.Internal.Types import Data.IORef import Data.List (foldl') import Data.Monoid import qualified Data.Vector as V import qualified Data.Vector.Unboxed as U import Statistics.Quantile (continuousBy, cadpw) import Statistics.Sample import System.Mem (performGC) import System.Random.MWC import Text.Printf (printf) ------------------------------------------------------------------------------ data MeasurementMode = PerBatch | PerOperation data WorkloadMode = Pure | Mutating ------------------------------------------------------------------------------ data SampleData = SampleData { sdInputSize :: !Int -- ^ what was the size of the input for this -- sample? , sdNumOps :: !Int -- ^ how many operations are covered by this -- sample? For a per-operation measurement, -- this value would be \"1\", and for a batch -- measurement this value would be the number -- of items in the batch. , sdData :: !Sample -- ^ sample data. 
} instance Show SampleData where show (SampleData is nop da) = "" ------------------------------------------------------------------------------ data Benchmark op = Benchmark { benchmarkName :: String , dataStructures :: [(String, DataStructure op)] , inputSizes :: [Int] , workloadGenerator :: WorkloadGenerator op } ------------------------------------------------------------------------------ -- | Given some sample data, compute the mean time per operation (in seconds) -- and standard deviation computeMeanAndStddev :: SampleData -> (Double, Double) computeMeanAndStddev (SampleData _ nops sample) = (v,s) where nopsReal = fromIntegral nops (meanValue, var) = meanVarianceUnb sample stddev = sqrt $ abs var !v = meanValue / nopsReal !s = stddev / nopsReal ------------------------------------------------------------------------------ -- | Given some sample data, compute the 95th percentile. compute95thPercentile :: SampleData -> Double compute95thPercentile (SampleData _ nops sample) = v where nopsReal = fromIntegral nops quantile = continuousBy cadpw 19 20 sample v = quantile / nopsReal ------------------------------------------------------------------------------ -- | Given some sample data, compute the maximum value computeMax :: SampleData -> Double computeMax (SampleData _ nops sample) = v where nopsReal = fromIntegral nops maxval = U.foldl' max 0 sample v = maxval / nopsReal ------------------------------------------------------------------------------ takeSample :: (NFData op) => MeasurementMode -> WorkloadMode -> Benchmark op -> Environment -> GenIO -> Int -> Criterion [SampleData] takeSample !mMode !wMode !benchmark !env !rng !inputSize = do workload <- liftIO $ runWorkloadMonad (workGen inputSize) rng let setupOperations = setupWork workload let genWorkData = genWorkload workload case mMode of PerBatch -> batch setupOperations genWorkData PerOperation -> perOp setupOperations genWorkData where 
-------------------------------------------------------------------------- dss = dataStructures benchmark workGen = workloadGenerator benchmark -------------------------------------------------------------------------- batch setupOperations genWorkData = do workData <- liftIO $ runWorkloadMonad (genWorkData $ inputSize `div` 2) rng let nOps = V.length workData mapM (batchOne setupOperations workData nOps) dss -------------------------------------------------------------------------- mkRunOp runOpMutating = let runOpPure = \m op -> do m' <- runOpMutating m op return $! m' `seq` m in case wMode of Pure -> runOpPure Mutating -> runOpMutating -------------------------------------------------------------------------- runWorkData workData chunkSize runOp start i val = go i val where go !i !val | i >= chunkSize = return val | otherwise = do let op = V.unsafeIndex workData (start+i) !val' <- runOp val op go (i+1) val' -------------------------------------------------------------------------- batchOne setupOperations workData nOps (name, (DataStructure emptyValue runOpMutating)) = do note $ "running batch benchmark on " ++ name ++ "\n" let minTime = envClockResolution env * 1000 cfg <- getConfig let proc = V.foldM' runOpMutating let mkStartValue = emptyValue inputSize >>= flip proc setupOperations startValue1 <- liftIO mkStartValue liftIO performGC let tProc = runWorkData workData nOps runOpMutating 0 0 prolix $ "running test batch with " ++ show nOps ++ " work items\n" (tm,_) <- liftIO $ time (tProc startValue1) prolix $ "running initial timing on " ++ show nOps ++ " work items\n" let iters = max 5 (ceiling (minTime / tm)) prolix $ "running benchmark on " ++ show nOps ++ " work items, " ++ show iters ++ " iterations\n" sample <- liftIO $ U.generateM iters $ \_ -> do sv <- mkStartValue performGC (!tm,_) <- time (tProc sv) return $ tm - clockCost prolix $ "finished batch benchmark on " ++ name ++ "\n" return (SampleData inputSize nOps sample) 
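    -- Standalone sketch (not part of this module) of the iteration-count
    -- rule used in batchOne above: each timed sample must span well over
    -- the clock's resolution, and we never run fewer than 5 iterations.
    -- The names `batchIterations`, `clockResolution` and `oneRunTime` are
    -- invented for illustration.

```haskell
-- | How many times to repeat a batch so that one timed sample takes at
-- least 1000 clock ticks, with a floor of 5 iterations so the resulting
-- statistics have a minimum of support.
batchIterations :: Double  -- ^ measured clock resolution, in seconds
                -> Double  -- ^ observed time for one run of the batch
                -> Int
batchIterations clockResolution oneRunTime =
    max 5 (ceiling (minTime / oneRunTime))
  where
    minTime = clockResolution * 1000

main :: IO ()
main = do
    -- a 1 microsecond clock and a 50 microsecond batch: 20 iterations
    -- are needed to fill the 1 millisecond minimum sample time
    print (batchIterations 1.0e-6 5.0e-5)
    -- a batch that already exceeds the minimum still gets the floor of 5
    print (batchIterations 1.0e-6 1.0e-3)
```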
-------------------------------------------------------------------------- perOp setupOperations genWorkData = do -- FIXME: lifted this code from criterion, is there some way to merge -- them? _ <- prolix "generating seed workload" seedData <- liftIO $ runWorkloadMonad (genWorkData 1000) rng _ <- prolix "seed workload generated" workData <- liftIO $ runWorkloadMonad (genWorkData inputSize) rng mapM (perOpOne setupOperations workData seedData) dss -------------------------------------------------------------------------- perOpOne setupOperations workData seedData (name, (DataStructure emptyValue runOpMutating)) = do let runOp = mkRunOp runOpMutating let proc = V.foldM' runOpMutating note $ "running per-op benchmark on " ++ name ++ "\n" startValue <- liftIO (emptyValue inputSize >>= flip proc setupOperations) liftIO performGC -- warm up clock _ <- liftIO $ runForAtLeast 0.1 10000 (`replicateM_` getTime) let minTime = envClockResolution env * 1000 (testTime, testIters, startValue') <- liftIO $ timeSeed (min minTime 0.1) seedData runOp startValue _ <- note "ran %d iterations in %s\n" testIters (secs testTime) cfg <- getConfig let testItersD = fromIntegral testIters let sampleCount = fromLJ cfgSamples cfg let timePer = (testTime - testItersD * clockCost) / testItersD let chunkSizeD = minTime / timePer let chunkSize = min (V.length workData) (ceiling chunkSizeD) let nSamples1 = min (chunkSize * sampleCount) (V.length workData) let numItersD = fromIntegral nSamples1 / fromIntegral chunkSize let nSamples = max 1 (floor numItersD * chunkSize) _ <- note "collecting %d samples (in chunks of %d) in estimated %s\n" nSamples chunkSize (secs ((chunkSizeD * timePer + clockCost)*numItersD)) (sample,_) <- mkSample chunkSize nSamples workData startValue' runOp liftIO performGC return (SampleData inputSize chunkSize sample) -------------------------------------------------------------------------- mkSample chunkSize nSamples workData startValue runOp = liftIO $ do valRef <- newIORef 
startValue let numItersD = fromIntegral nSamples / fromIntegral chunkSize -- make sure nSamples is an integral multiple of chunkSize let numIters = max 1 (floor (numItersD :: Double)) sample <- U.generateM numIters $ \chunk -> do !val <- readIORef valRef (!tm, val') <- time (runWorkData workData chunkSize runOp (chunk*chunkSize) 0 val) writeIORef valRef val' return $ tm - clockCost val <- readIORef valRef return (sample :: U.Vector Double, val) -------------------------------------------------------------------------- clockCost = envClockCost env -------------------------------------------------------------------------- timeSeed howLong seedData runOp startValue = loop startValue seedData (0::Int) =<< getTime where loop sv seed iters initTime = do now <- getTime let n = V.length seed when (now - initTime > howLong * 10) $ fail (printf "took too long to run: seed %d, iters %d" (V.length seed) iters) (elapsed, (_,sv')) <- time (mkSample 1 n seed sv runOp) if elapsed < howLong then loop sv' (seed `mappend` seed) (iters+1) initTime else return (elapsed, n, sv') ------------------------------------------------------------------------------ takeSamples :: (NFData op) => MeasurementMode -> WorkloadMode -> Benchmark op -> Environment -> GenIO -> Criterion [(String, [SampleData])] takeSamples !mMode !wMode !benchmark !env !rng = do let szs = inputSizes benchmark when (null szs) $ fail "No input sizes defined" ssamples <- mapM (takeSample mMode wMode benchmark env rng) szs let names = map fst $ dataStructures benchmark let inputs = foldl' combine (map (const []) names) (reverse ssamples) return $ names `zip` inputs where combine :: [[SampleData]] -> [SampleData] -> [[SampleData]] combine int samples = map (uncurry (flip (:))) (int `zip` samples) hashtables-1.0.1.8/benchmark/src/Criterion/Collection/Types.hs0000644000000000000000000000640612032120320022364 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} {-# LANGUAGE ExistentialQuantification #-} {-# LANGUAGE 
GeneralizedNewtypeDeriving #-} -- | The criterion collection is a set of utilities for benchmarking data -- structures using criterion -- (). -- -- The criterion collection allows you to test the /per-operation/ asymptotic -- performance of your data structures under a variety of simulated -- workloads. For testing a hash table, for example, you might be interested -- in: -- -- * how lookup and insert performance changes as the number of elements in -- your hash table grows -- -- * how lookup performance changes depending on the distribution of the -- lookup keys; you might expect a heavily skewed lookup distribution, where -- most of the requests are for a small subset of the keys, to have -- different performance characteristics than a set of lookups for keys -- uniformly distributed in the keyspace. -- -- * how the hashtable performs under a mixed workload of inserts, deletes, -- and lookups. -- -- Whereas "Criterion" allows you to run a single benchmark a number of times -- to see how fast it runs, @criterion-collection@ makes performance-testing -- data structures easier by decoupling benchmarking from workload generation, -- allowing you to see in-depth how performance changes as the input size -- varies. -- -- To test your data structure using @criterion-collection@, you provide the -- following: -- -- 1. A datatype which models the set of data structure operations you're -- interested in testing. For instance, for our hashtable example, your -- datatype might look like: -- -- > data Operation k = -- > -- | Insert a k-v pair into the collection. If k existed, we -- > -- should update the mapping. -- > Insert {-# UNPACK #-} !k -- > {-# UNPACK #-} !Int -- > -- | Lookup a key in the mapping. -- > | Lookup {-# UNPACK #-} !k -- > -- | Delete a key from the mapping. 
-- > | Delete {-# UNPACK #-} !k -- > deriving (Show) -- > -- > -- > instance (NFData k) => NFData (Operation k) where -- > rnf (Insert k v) = rnf k `seq` rnf v -- > rnf (Lookup k) = rnf k -- > rnf (Delete k) = rnf k -- -- 2. A function which, given an operation, runs it on your datastructure. -- -- 3. A \"ground state\" for your datastructure, usually \"empty\". You can -- test both pure data structures and data structures in 'IO'. -- -- 4. One or more \"workload simulators\" which, given a random number -- generator and an input size, give you back some functions to generate -- workloads: -- -- a) to prepopulate the data structure prior to the test -- -- b) to test the data structure with. -- -- (Side note: the reason @criterion-collection@ asks you to reify the -- operation type instead of just generating a list of mutation functions of -- type @[m -> m]@ is so you can test multiple datastructures under the same -- workload.) module Criterion.Collection.Types ( Workload(..) , WorkloadGenerator , WorkloadMonad , runWorkloadMonad , getRNG , DataStructure , emptyData , runOperation , setupData , setupDataIO ) where ------------------------------------------------------------------------------ import Criterion.Collection.Internal.Types hashtables-1.0.1.8/benchmark/src/Criterion/Collection/Internal/0000755000000000000000000000000012032120320022472 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Criterion/Collection/Internal/Types.hs0000644000000000000000000000753412032120320024143 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} {-# LANGUAGE ExistentialQuantification #-} {-# LANGUAGE GeneralizedNewtypeDeriving #-} module Criterion.Collection.Internal.Types ( Workload(..) , WorkloadGenerator , WorkloadMonad(..) , runWorkloadMonad , getRNG , DataStructure(..) 
, setupData , setupDataIO ) where ------------------------------------------------------------------------------ import Control.DeepSeq import Control.Monad.Reader import Data.Vector (Vector) import System.Random.MWC ------------------------------------------------------------------------------ -- Some thoughts on benchmarking modes -- -- * pre-fill data structure, test an operation workload without modifying the -- data structure, measure time for each operation -- -- ---> allows you to get fine-grained per-operation times with distributions -- -- * pre-fill data structure, get a bunch of work to do (cumulatively modifying -- the data structure), measure time per-operation OR for the whole batch and -- divide out -- -- -- Maybe it will look like this? -- > data MeasurementMode = PerBatch | PerOperation -- > data WorkloadMode = Pure | Mutating ------------------------------------------------------------------------------ newtype WorkloadMonad a = WM (ReaderT GenIO IO a) deriving (Monad, MonadIO) ------------------------------------------------------------------------------ runWorkloadMonad :: WorkloadMonad a -> GenIO -> IO a runWorkloadMonad (WM m) gen = runReaderT m gen ------------------------------------------------------------------------------ getRNG :: WorkloadMonad GenIO getRNG = WM ask ------------------------------------------------------------------------------ -- | Given an 'Int' representing \"input size\", a 'WorkloadGenerator' makes a -- 'Workload'. @Workload@s generate operations to prepopulate data structures -- with /O(n)/ data items, then generate operations on-demand to benchmark your -- data structure according to some interesting distribution. type WorkloadGenerator op = Int -> WorkloadMonad (Workload op) ------------------------------------------------------------------------------ data Workload op = Workload { -- | \"Setup work\" is work that you do to prepopulate a data structure -- to a certain size before testing begins. 
setupWork :: !(Vector op) -- | Given the number of operations to produce, 'genWorkload' spits out a -- randomly-distributed workload simulation to be used in the benchmark. -- -- | Some kinds of skewed workload distributions (the canonical example -- being \"frequent lookups for a small set of keys and infrequent -- lookups for the others\") need a certain minimum number of operations -- to be generated to be statistically valid, which only the -- 'WorkloadGenerator' would know how to decide. In these cases, you are -- free to return more than @N@ samples from 'genWorkload', and -- @criterion-collection@ will run them all for you. -- -- Otherwise, @criterion-collection@ is free to bootstrap your benchmark -- using as many sample points as it would take to make the results -- statistically relevant. , genWorkload :: !(Int -> WorkloadMonad (Vector op)) } ------------------------------------------------------------------------------ data DataStructure op = forall m . DataStructure { emptyData :: !(Int -> IO m) , runOperation :: !(m -> op -> IO m) } ------------------------------------------------------------------------------ setupData :: m -> (m -> op -> m) -> DataStructure op setupData e r = DataStructure (const $ return e) (\m o -> return $ r m o) ------------------------------------------------------------------------------ setupDataIO :: (Int -> IO m) -> (m -> op -> IO m) -> DataStructure op setupDataIO = DataStructure hashtables-1.0.1.8/benchmark/src/Data/0000755000000000000000000000000012032120320015536 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Data/Benchmarks/0000755000000000000000000000000012032120320017613 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Data/Benchmarks/UnorderedCollections/0000755000000000000000000000000012032120320023741 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Data/Benchmarks/UnorderedCollections/Distributions.hs0000644000000000000000000001764012032120320027147 0ustar0000000000000000{-# LANGUAGE 
BangPatterns #-} module Data.Benchmarks.UnorderedCollections.Distributions ( makeRandomData , makeRandomVariateData -- * Workloads , insertWorkload , deleteWorkload , uniformLookupWorkload , exponentialLookupWorkload , loadOnly , loadAndUniformLookup , loadAndSkewedLookup , loadAndDeleteAll , loadAndDeleteSome , uniformlyMixed ) where import qualified Control.Concurrent.Thread as Th import Control.DeepSeq import Control.Monad import Control.Monad.Reader import Control.Monad.Trans (liftIO) import Data.Benchmarks.UnorderedCollections.Types import qualified Data.Vector as V import qualified Data.Vector.Mutable as MV import qualified Data.Vector.Unboxed as VU import Data.Vector (Vector) import qualified Data.Vector.Algorithms.Shuffle as V import GHC.Conc (numCapabilities) import Statistics.Distribution import Statistics.Distribution.Exponential import System.Random.MWC import Criterion.Collection.Types ------------------------------------------------------------------------------ debug :: (MonadIO m) => String -> m () debug s = liftIO $ putStrLn s ------------------------------------------------------------------------------ makeRandomData :: (NFData k) => (GenIO -> IO k) -> Int -> WorkloadMonad (Vector (k,Int)) makeRandomData !genFunc !n = do rng <- getRNG debug $ "making " ++ show n ++ " data items" keys <- liftIO $ vreplicateM n rng genFunc let !v = keys `V.zip` vals let !_ = forceVector v debug $ "made " ++ show n ++ " data items" return $! 
v where vals = V.enumFromN 0 n ------------------------------------------------------------------------------ makeRandomVariateData :: (Ord k, NFData k, Variate k) => Int -> WorkloadMonad (Vector (k,Int)) makeRandomVariateData = makeRandomData uniform ------------------------------------------------------------------------------ insertWorkload :: (NFData k) => Vector (k,Int) -> Vector (Operation k) insertWorkload = mapForce $ \(k,v) -> Insert k v ------------------------------------------------------------------------------ deleteWorkload :: (NFData k) => Vector (k,Int) -> Vector (Operation k) deleteWorkload = mapForce $ \(k,_) -> Delete k ------------------------------------------------------------------------------ uniformLookupWorkload :: (NFData k) => Vector (k,Int) -> Int -> WorkloadMonad (Vector (Operation k)) uniformLookupWorkload !vec !ntimes = do rng <- getRNG debug $ "uniformLookupWorkload: generating " ++ show ntimes ++ " lookups" v <- liftIO $ vreplicateM ntimes rng f debug $ "uniformLookupWorkload: done" return v where !n = V.length vec f r = do idx <- pick let (k,_) = V.unsafeIndex vec idx return $ Lookup k where pick = uniformR (0,n-1) r ------------------------------------------------------------------------------ exponentialLookupWorkload :: (NFData k) => Double -> Vector (k,Int) -> Int -> WorkloadMonad (Vector (Operation k)) exponentialLookupWorkload !lambda !vec !ntimes = do rng <- getRNG liftIO $ vreplicateM ntimes rng f where !dist = exponential lambda !n = V.length vec !n1 = n-1 !nd = fromIntegral n f r = do x <- uniformR (0.1, 7.0) r let idx = max 0 . min n1 . round $ nd * density dist x let (k,_) = V.unsafeIndex vec idx return $! 
Lookup k ------------------------------------------------------------------------------ loadOnly :: (NFData k) => (GenIO -> IO k) -- ^ rng for keys -> WorkloadGenerator (Operation k) loadOnly !genFunc !n = return $ Workload V.empty f where f _ = liftM insertWorkload $ makeRandomData genFunc n ------------------------------------------------------------------------------ loadAndUniformLookup :: (NFData k) => (GenIO -> IO k) -- ^ rng for keys -> WorkloadGenerator (Operation k) loadAndUniformLookup !genFunc !n = do !vals <- makeRandomData genFunc n let !inserts = insertWorkload vals return $! Workload inserts $ uniformLookupWorkload vals ------------------------------------------------------------------------------ loadAndSkewedLookup :: (NFData k) => (GenIO -> IO k) -- ^ rng for keys -> WorkloadGenerator (Operation k) loadAndSkewedLookup !genFunc !n = do !vals <- makeRandomData genFunc n let !inserts = insertWorkload vals return $! Workload inserts $ exponentialLookupWorkload 1.5 vals ------------------------------------------------------------------------------ loadAndDeleteAll :: (NFData k) => (GenIO -> IO k) -- ^ key generator -> WorkloadGenerator (Operation k) loadAndDeleteAll !genFunc !n = do rng <- getRNG !vals <- makeRandomData genFunc n let !inserts = insertWorkload vals let !deletes = deleteWorkload $ V.shuffle rng vals return $ Workload inserts (const $ return deletes) ------------------------------------------------------------------------------ loadAndDeleteSome :: (NFData k) => (GenIO -> IO k) -> WorkloadGenerator (Operation k) loadAndDeleteSome !genFunc !n = do !vals <- makeRandomData genFunc n let !inserts = insertWorkload vals return $ Workload inserts $ f vals where f vals k = do rng <- getRNG return $ deleteWorkload $ V.take k $ V.shuffle rng vals ------------------------------------------------------------------------------ uniformlyMixed :: (NFData k) => (GenIO -> IO k) -> Double -> Double -> WorkloadGenerator (Operation k) uniformlyMixed !genFunc 
!lookupPercentage !deletePercentage !n = do let !numLookups = ceiling (fromIntegral n * lookupPercentage) let !numDeletes = ceiling (fromIntegral n * deletePercentage) !vals <- makeRandomData genFunc n let !inserts = insertWorkload vals !lookups <- uniformLookupWorkload vals numLookups rng <- getRNG let !deletes = deleteWorkload $ V.take numDeletes $ V.shuffle rng vals let !out = V.shuffle rng $ V.concat [inserts, lookups, deletes] return $! Workload V.empty $ const $ return $ forceVector out ------------------------------------------------------------------------------ -- utilities ------------------------------------------------------------------------------ forceVector :: (NFData k) => Vector k -> Vector k forceVector !vec = V.foldl' force () vec `seq` vec where force x v = x `deepseq` v `deepseq` () mapForce :: (NFData b) => (a -> b) -> Vector a -> Vector b mapForce !f !vIn = let !vOut = V.map f vIn in forceVector vOut -- split a GenIO splitGenIO :: GenIO -> IO GenIO splitGenIO rng = VU.replicateM 256 (uniform rng) >>= initialize -- vector replicateM is slow as dogshit. vreplicateM :: Int -> GenIO -> (GenIO -> IO a) -> IO (Vector a) vreplicateM n origRng act = do rngs <- replicateM numCapabilities (splitGenIO origRng) mv <- MV.new n let actions = map (f mv) (parts `zip` rngs) results <- liftM (map snd) $ mapM Th.forkIO actions _ <- sequence results V.unsafeFreeze mv where parts = partition (n-1) numCapabilities f mv ((low,high),rng) = do f' low where f' !idx | idx > high = return () | otherwise = do x <- act rng MV.unsafeWrite mv idx x f' (idx+1) partition :: Int -> Int -> [(Int,Int)] partition n k = ys `zip` xs where xs = map f [1..k] ys = 0:(map (+1) xs) f i = (i * n) `div` k hashtables-1.0.1.8/benchmark/src/Data/Benchmarks/UnorderedCollections/Types.hs0000644000000000000000000000144312032120320025403 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} module Data.Benchmarks.UnorderedCollections.Types ( Operation(..) 
) where import Control.DeepSeq ------------------------------------------------------------------------------ data Operation k = -- | Insert a k-v pair into the collection. If k existed, we should -- update the mapping. Insert {-# UNPACK #-} !k {-# UNPACK #-} !Int -- | Lookup a key in the mapping. | Lookup {-# UNPACK #-} !k -- | Delete a key from the mapping. | Delete {-# UNPACK #-} !k deriving (Show) ------------------------------------------------------------------------------ instance (NFData k) => NFData (Operation k) where rnf (Insert k v) = rnf k `seq` rnf v rnf (Lookup k) = rnf k rnf (Delete k) = rnf k hashtables-1.0.1.8/benchmark/src/Data/Vector/0000755000000000000000000000000012032120320017000 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Data/Vector/Algorithms/0000755000000000000000000000000012032120320021111 5ustar0000000000000000hashtables-1.0.1.8/benchmark/src/Data/Vector/Algorithms/Shuffle.hs0000644000000000000000000000131212032120320023036 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} module Data.Vector.Algorithms.Shuffle ( shuffle ) where import Control.Monad.ST (unsafeIOToST) import Data.Vector (Vector) import qualified Data.Vector as V import qualified Data.Vector.Mutable as MV import System.Random.MWC shuffle :: GenIO -> Vector k -> Vector k shuffle rng v = V.modify go v where !n = V.length v go mv = f (n-1) where -- note: inclusive pick b = unsafeIOToST $ uniformR (0,b) rng swap = MV.unsafeSwap mv f 0 = return () f !k = do idx <- pick k swap k idx f (k-1) hashtables-1.0.1.8/cbits/0000755000000000000000000000000012032120320013250 5ustar0000000000000000hashtables-1.0.1.8/cbits/cfuncs.c0000644000000000000000000003512412032120320014702 0ustar0000000000000000#include #include #include #ifdef WIN32 #include #else #include #endif #if defined(USE_SSE_4_1) #include #endif #if defined(__GNUC__) #define PREFETCH_READ(x) (__builtin_prefetch(x, 0, 3)) #define PREFETCH_WRITE(x) (__builtin_prefetch(x, 1, 3)) #else #define PREFETCH_READ(x) 
#define PREFETCH_WRITE(x) #endif void prefetchCacheLine32_write(uint32_t* line, int start) { PREFETCH_WRITE((void*)(&line[start])); } void prefetchCacheLine64_write(uint64_t* line, int start) { PREFETCH_WRITE((void*)(&line[start])); } void prefetchCacheLine32_read(uint32_t* line, int start) { PREFETCH_READ((void*)(&line[start])); } void prefetchCacheLine64_read(uint64_t* line, int start) { PREFETCH_READ((void*)(&line[start])); } int forwardSearch32_2(uint32_t* array, int start, int end, uint32_t x1, uint32_t x2) { uint32_t* ep = array + end; uint32_t* p = array + start; int wrapped = 0; while (1) { if (p == ep) { if (wrapped) return -1; ep = array + start; p = array; wrapped = 1; } if (*p == x1 || *p == x2) return p - array; ++p; } } int forwardSearch32_3(uint32_t* array, int start, int end, uint32_t x1, uint32_t x2, uint32_t x3) { uint32_t* ep = array + end; uint32_t* p = array + start; int wrapped = 0; while (1) { if (p == ep) { if (wrapped) return -1; ep = array + start; p = array; wrapped = 1; } if (*p == x1 || *p == x2 || *p == x3) return p - array; ++p; } } int forwardSearch64_2(uint64_t* array, int start, int end, uint64_t x1, uint64_t x2) { uint64_t* ep = array + end; uint64_t* p = array + start; int wrapped = 0; while (1) { if (p == ep) { if (wrapped) return -1; ep = array + start; p = array; wrapped = 1; } if (*p == x1 || *p == x2) return p - array; ++p; } } int forwardSearch64_3(uint64_t* array, int start, int end, uint64_t x1, uint64_t x2, uint64_t x3) { uint64_t* ep = array + end; uint64_t* p = array + start; int wrapped = 0; while (1) { if (p == ep) { if (wrapped) return -1; ep = array + start; p = array; wrapped = 1; } if (*p == x1 || *p == x2 || *p == x3) return p - array; ++p; } } //---------------------------------------------------------------------------- // cache line search functions // First: 32 bit inline int mask(int a, int b) { return -(a == b); } uint8_t deBruijnBitPositions[] = { 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9 }; int firstBitSet(int a) { int zeroCase = mask(0, a); uint32_t x = (uint32_t) (a & -a); x *= 0x077CB531; x >>= 27; return zeroCase | deBruijnBitPositions[x]; } int32_t lineResult32(int m, int start) { int p = firstBitSet(m); int32_t mm = mask(p, -1); return mm | (~mm & (start + p)); } uint32_t lineMask32(uint32_t* array, int start, uint32_t value) { uint32_t* p = array + start; uint32_t m = 0; int offset = start & 0xf; switch (offset) { case 0: m |= mask(*p++, value) & 0x1; case 1: m |= mask(*p++, value) & 0x2; case 2: m |= mask(*p++, value) & 0x4; case 3: m |= mask(*p++, value) & 0x8; case 4: m |= mask(*p++, value) & 0x10; case 5: m |= mask(*p++, value) & 0x20; case 6: m |= mask(*p++, value) & 0x40; case 7: m |= mask(*p++, value) & 0x80; case 8: m |= mask(*p++, value) & 0x100; case 9: m |= mask(*p++, value) & 0x200; case 10: m |= mask(*p++, value) & 0x400; case 11: m |= mask(*p++, value) & 0x800; case 12: m |= mask(*p++, value) & 0x1000; case 13: m |= mask(*p++, value) & 0x2000; case 14: m |= mask(*p++, value) & 0x4000; case 15: m |= mask(*p++, value) & 0x8000; } return m >> offset; } int lineSearch32(uint32_t* array, int start, uint32_t value) { uint32_t m = lineMask32(array, start, value); return lineResult32((int)m, start); } uint32_t lineMask32_2(uint32_t* array, int start, uint32_t x1, uint32_t x2) { uint32_t* p = array + start; uint32_t m = 0; int offset = start & 0xf; switch (offset) { case 0: m |= (mask(*p, x1) | mask(*p, x2)) & 0x1; ++p; case 1: m |= (mask(*p, x1) | mask(*p, x2)) & 0x2; ++p; case 2: m |= (mask(*p, x1) | mask(*p, x2)) & 0x4; ++p; case 3: m |= (mask(*p, x1) | mask(*p, x2)) & 0x8; ++p; case 4: m |= (mask(*p, x1) | mask(*p, x2)) & 0x10; ++p; case 5: m |= (mask(*p, x1) | mask(*p, x2)) & 0x20; ++p; case 6: m |= (mask(*p, x1) | mask(*p, x2)) & 0x40; ++p; case 7: m |= (mask(*p, x1) | mask(*p, x2)) & 0x80; ++p; case 8: m |= (mask(*p, x1) | mask(*p, x2)) & 0x100; ++p; case 9: m |= 
(mask(*p, x1) | mask(*p, x2)) & 0x200; ++p; case 10: m |= (mask(*p, x1) | mask(*p, x2)) & 0x400; ++p; case 11: m |= (mask(*p, x1) | mask(*p, x2)) & 0x800; ++p; case 12: m |= (mask(*p, x1) | mask(*p, x2)) & 0x1000; ++p; case 13: m |= (mask(*p, x1) | mask(*p, x2)) & 0x2000; ++p; case 14: m |= (mask(*p, x1) | mask(*p, x2)) & 0x4000; ++p; case 15: m |= (mask(*p, x1) | mask(*p, x2)) & 0x8000; ++p; } return m >> offset; } int lineSearch32_2(uint32_t* array, int start, uint32_t x1, uint32_t x2) { uint32_t m = lineMask32_2(array, start, x1, x2); return lineResult32((int)m, start); } uint32_t lineMask32_3(uint32_t* array, int start, uint32_t x1, uint32_t x2, uint32_t x3) { uint32_t* p = array + start; uint32_t m = 0; int offset = start & 0xf; switch (offset) { case 0: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x1; ++p; case 1: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x2; ++p; case 2: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x4; ++p; case 3: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x8; ++p; case 4: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x10; ++p; case 5: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x20; ++p; case 6: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x40; ++p; case 7: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x80; ++p; case 8: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x100; ++p; case 9: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x200; ++p; case 10: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x400; ++p; case 11: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x800; ++p; case 12: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x1000; ++p; case 13: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x2000; ++p; case 14: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x4000; ++p; case 15: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x8000; ++p; } return m >> offset; } int lineSearch32_3(uint32_t* array, int start, uint32_t x1, 
uint32_t x2, uint32_t x3) {
    uint32_t m = lineMask32_3(array, start, x1, x2, x3);
    return lineResult32((int)m, start);
}


//----------------------------------------------------------------------------
// Now: 64-bit. If USE_SSE_4_1 is on, we will use SSE4.1 SIMD instructions to
// search the cache line super-efficiently.

#if defined(USE_SSE_4_1)

inline uint64_t mask_to_mask2(__m128i m) {
    int mask16 = _mm_movemask_epi8(m);
    // output of _mm_movemask_epi8 is a 16-bit word where bit i is 1 iff the
    // most significant bit of byte i of the mask is 1
    int m1 = mask16 & 0x1;
    int m2 = (mask16 & 0x100) >> 7;
    return (uint64_t) (m1 | m2);
}

inline uint64_t cmp_and_mask(__m128i val, __m128i x0) {
    __m128i mask1 = _mm_cmpeq_epi64(val, x0);
    return mask_to_mask2(mask1);
}

inline uint64_t cmp_and_mask_2(__m128i val, __m128i x0, __m128i x1) {
    __m128i mask1 = _mm_cmpeq_epi64(val, x0);
    __m128i mask2 = _mm_cmpeq_epi64(val, x1);
    mask1 = _mm_or_si128(mask1, mask2);
    return mask_to_mask2(mask1);
}

inline uint64_t cmp_and_mask_3(__m128i val, __m128i x0, __m128i x1,
                               __m128i x2) {
    __m128i mask1 = _mm_cmpeq_epi64(val, x0);
    __m128i mask2 = _mm_cmpeq_epi64(val, x1);
    __m128i mask3 = _mm_cmpeq_epi64(val, x2);
    mask1 = _mm_or_si128(mask1, mask2);
    mask1 = _mm_or_si128(mask1, mask3);
    return mask_to_mask2(mask1);
}

uint64_t lineMask64(uint64_t* array, int start0, uint64_t v1) {
    int offset = start0 & 0x7;
    int start  = start0 & ~0x7;
    __m128i* p = (__m128i*) (&array[start]);
    __m128i x1 = _mm_cvtsi32_si128(0);
    x1 = _mm_insert_epi64(x1, v1, 0);
    x1 = _mm_insert_epi64(x1, v1, 1);
    uint64_t dest_mask = 0;

    // x1 contains two 64-bit copies of the value to look for

    // words 0, 1
    __m128i x = _mm_load_si128(p);
    dest_mask = cmp_and_mask(x, x1);

    p = (__m128i*) (&array[start+2]);
    // words 2, 3
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask(x, x1) << 2);

    p = (__m128i*) (&array[start+4]);
    // words 4, 5
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask(x, x1) << 4);

    p = (__m128i*) (&array[start+6]);
    // words 6, 7
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask(x, x1) << 6);

    return dest_mask >> offset;
}

uint64_t lineMask64_2(uint64_t* array, int start0, uint64_t v1, uint64_t v2) {
    int offset = start0 & 0x7;
    int start  = start0 & ~0x7;
    __m128i* p = (__m128i*) (&array[start]);
    __m128i x1 = _mm_cvtsi32_si128(0);
    x1 = _mm_insert_epi64(x1, v1, 0);
    x1 = _mm_insert_epi64(x1, v1, 1);
    __m128i x2 = _mm_cvtsi32_si128(0);
    x2 = _mm_insert_epi64(x2, v2, 0);
    x2 = _mm_insert_epi64(x2, v2, 1);
    uint64_t dest_mask = 0;

    // words 0, 1
    __m128i x = _mm_load_si128(p);
    dest_mask = cmp_and_mask_2(x, x1, x2);

    p = (__m128i*) (&array[start+2]);
    // words 2, 3
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask_2(x, x1, x2) << 2);

    p = (__m128i*) (&array[start+4]);
    // words 4, 5
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask_2(x, x1, x2) << 4);

    p = (__m128i*) (&array[start+6]);
    // words 6, 7
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask_2(x, x1, x2) << 6);

    return dest_mask >> offset;
}

uint64_t lineMask64_3(uint64_t* array, int start0, uint64_t v1, uint64_t v2,
                      uint64_t v3) {
    int offset = start0 & 0x7;
    int start  = start0 & ~0x7;
    __m128i* p = (__m128i*) (&array[start]);
    __m128i x1 = _mm_cvtsi32_si128(0);
    x1 = _mm_insert_epi64(x1, v1, 0);
    x1 = _mm_insert_epi64(x1, v1, 1);
    __m128i x2 = _mm_cvtsi32_si128(0);
    x2 = _mm_insert_epi64(x2, v2, 0);
    x2 = _mm_insert_epi64(x2, v2, 1);
    __m128i x3 = _mm_cvtsi32_si128(0);
    x3 = _mm_insert_epi64(x3, v3, 0);
    x3 = _mm_insert_epi64(x3, v3, 1);
    uint64_t dest_mask = 0;

    // words 0, 1
    __m128i x = _mm_load_si128(p);
    dest_mask = cmp_and_mask_3(x, x1, x2, x3);

    p = (__m128i*) (&array[start+2]);
    // words 2, 3
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask_3(x, x1, x2, x3) << 2);

    p = (__m128i*) (&array[start+4]);
    // words 4, 5
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask_3(x, x1, x2, x3) << 4);

    p = (__m128i*) (&array[start+6]);
    // words 6, 7
    x = _mm_load_si128(p);
    dest_mask |= (cmp_and_mask_3(x, x1, x2, x3) << 6);

    return dest_mask >> offset;
}

#else

uint64_t lineMask64(uint64_t* array, int start, uint64_t value) {
    uint64_t* p = array + start;
    uint64_t m = 0;
    int offset = start & 0x7;

    switch (offset) {
    case 0: m |= mask(*p++, value) & 0x1;
    case 1: m |= mask(*p++, value) & 0x2;
    case 2: m |= mask(*p++, value) & 0x4;
    case 3: m |= mask(*p++, value) & 0x8;
    case 4: m |= mask(*p++, value) & 0x10;
    case 5: m |= mask(*p++, value) & 0x20;
    case 6: m |= mask(*p++, value) & 0x40;
    case 7: m |= mask(*p++, value) & 0x80;
    }

    return m >> offset;
}

uint64_t lineMask64_2(uint64_t* array, int start, uint64_t x1, uint64_t x2) {
    uint64_t* p = array + start;
    uint64_t m = 0;
    int offset = start & 0x7;

    switch (offset) {
    case 0: m |= (mask(*p, x1) | mask(*p, x2)) & 0x1;  ++p;
    case 1: m |= (mask(*p, x1) | mask(*p, x2)) & 0x2;  ++p;
    case 2: m |= (mask(*p, x1) | mask(*p, x2)) & 0x4;  ++p;
    case 3: m |= (mask(*p, x1) | mask(*p, x2)) & 0x8;  ++p;
    case 4: m |= (mask(*p, x1) | mask(*p, x2)) & 0x10; ++p;
    case 5: m |= (mask(*p, x1) | mask(*p, x2)) & 0x20; ++p;
    case 6: m |= (mask(*p, x1) | mask(*p, x2)) & 0x40; ++p;
    case 7: m |= (mask(*p, x1) | mask(*p, x2)) & 0x80; ++p;
    }

    return m >> offset;
}

uint64_t lineMask64_3(uint64_t* array, int start, uint64_t x1, uint64_t x2,
                      uint64_t x3) {
    uint64_t* p = array + start;
    uint64_t m = 0;
    int offset = start & 0x7;

    switch (offset) {
    case 0: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x1;  ++p;
    case 1: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x2;  ++p;
    case 2: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x4;  ++p;
    case 3: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x8;  ++p;
    case 4: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x10; ++p;
    case 5: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x20; ++p;
    case 6: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x40; ++p;
    case 7: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x80; ++p;
    }

    return m >> offset;
}
#endif  // USE_SSE_4_1

int64_t lineResult64(int64_t m, int64_t start) {
    int p = firstBitSet((int)m);
    int64_t mm = (int64_t) mask(p, -1);
    return mm | (~mm & (start + p));
}

int lineSearch64(uint64_t* array, int start, uint64_t value) {
    uint64_t m = lineMask64(array, start, value);
    return lineResult64((int)m, start);
}

int lineSearch64_2(uint64_t* array, int start, uint64_t x1, uint64_t x2) {
    uint64_t m = lineMask64_2(array, start, x1, x2);
    return lineResult64((int)m, start);
}

int lineSearch64_3(uint64_t* array, int start, uint64_t x1, uint64_t x2,
                   uint64_t x3) {
    uint64_t m = lineMask64_3(array, start, x1, x2, x3);
    return lineResult64((int)m, start);
}

void suicide(volatile int* check, int t) {
    int secs = (3*t + 999999) / 1000000;
    if (secs < 1) secs = 1;
#ifdef WIN32
    Sleep(secs * 1000);
#else
    sleep(secs);
#endif

    if (*check) {
        printf("timeout expired, dying!!\n");
#ifdef WIN32
        abort();
#else
        raise(SIGKILL);
#endif
    }
}
hashtables-1.0.1.8/src/0000755000000000000000000000000012032120320012733 5ustar0000000000000000
hashtables-1.0.1.8/src/Data/0000755000000000000000000000000012032120320013604 5ustar0000000000000000
hashtables-1.0.1.8/src/Data/HashTable/0000755000000000000000000000000012032120320015437 5ustar0000000000000000
hashtables-1.0.1.8/src/Data/HashTable/Class.hs0000644000000000000000000000715412032120320017047 0ustar0000000000000000
{-# LANGUAGE BangPatterns #-}

-- | This module contains a 'HashTable' typeclass for the hash table
-- implementations in this package. This allows you to provide functions which
-- will work for any hash table implementation in this collection.
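For instance, the package's 'IO' wrappers can be used through a concrete type alias like this (an illustrative sketch, not part of the package's source; the name `histogram` is ours):

```haskell
import qualified Data.HashTable.IO as H
import           Data.List         (sort)

-- A concrete alias, as the module documentation recommends.
type HashTable k v = H.BasicHashTable k v

-- Count occurrences of each key using lookup followed by insert.
histogram :: [Int] -> IO [(Int, Int)]
histogram ks = do
    ht <- H.new :: IO (HashTable Int Int)
    mapM_ (\k -> do
              mv <- H.lookup ht k
              H.insert ht k (maybe 1 (+ 1) mv))
          ks
    -- toList returns pairs in no particular order, so sort for a stable result
    sort <$> H.toList ht
```

Switching the alias to `H.CuckooHashTable k v` or `H.LinearHashTable k v` changes the implementation without touching `histogram`.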
--
-- It is recommended to create a concrete type alias in your code when using
-- this package, i.e.:
--
-- > import qualified Data.HashTable.IO as H
-- >
-- > type HashTable k v = H.BasicHashTable k v
-- >
-- > foo :: IO (HashTable Int Int)
-- > foo = do
-- >   ht <- H.new
-- >   H.insert ht 1 1
-- >   return ht
--
-- or
--
-- > import qualified Data.HashTable.ST.Cuckoo as C
-- > import qualified Data.HashTable.Class as H
-- >
-- > type HashTable s k v = C.HashTable s k v
-- >
-- > foo :: ST s (HashTable s k v)
-- > foo = do
-- >   ht <- H.new
-- >   H.insert ht 1 1
-- >   return ht
--
-- Firstly, this makes it easy to switch to a different hash table
-- implementation, and secondly, using a concrete type rather than leaving your
-- functions abstract in the 'HashTable' class should allow GHC to optimize
-- away the typeclass dictionaries.
--
-- Note that the functions in this typeclass are in the 'ST' monad; if you want
-- hash tables in 'IO', use the convenience wrappers in "Data.HashTable.IO".
--
module Data.HashTable.Class
  ( HashTable(..)
  , fromList
  , toList
  ) where

import           Control.Monad.ST
import           Data.Hashable
import           Prelude hiding (mapM_)

-- | A typeclass for hash tables in the 'ST' monad. The operations on these
-- hash tables are typically both key- and value-strict.
class HashTable h where
    -- | Creates a new, default-sized hash table. /O(1)/.
    new :: ST s (h s k v)

    -- | Creates a new hash table sized to hold @n@ elements. /O(n)/.
    newSized :: Int -> ST s (h s k v)

    -- | Inserts a key/value mapping into a hash table, replacing any existing
    -- mapping for that key.
    --
    -- /O(n)/ worst case, /O(1)/ amortized.
    insert :: (Eq k, Hashable k) => h s k v -> k -> v -> ST s ()

    -- | Deletes a key-value mapping from a hash table. /O(n)/ worst case,
    -- /O(1)/ amortized.
    delete :: (Eq k, Hashable k) => h s k v -> k -> ST s ()

    -- | Looks up a key-value mapping in a hash table. /O(n)/ worst case,
    -- (/O(1)/ for cuckoo hash), /O(1)/ amortized.
    lookup :: (Eq k, Hashable k) => h s k v -> k -> ST s (Maybe v)

    -- | A strict fold over the key-value records of a hash table in the 'ST'
    -- monad. /O(n)/.
    foldM :: (a -> (k,v) -> ST s a) -> a -> h s k v -> ST s a

    -- | A side-effecting map over the key-value records of a hash
    -- table. /O(n)/.
    mapM_ :: ((k,v) -> ST s b) -> h s k v -> ST s ()

    -- | Computes the overhead (in words) per key-value mapping. Used for
    -- debugging, etc; time complexity depends on the underlying hash table
    -- implementation. /O(n)/.
    computeOverhead :: h s k v -> ST s Double


------------------------------------------------------------------------------
-- | Create a hash table from a list of key-value pairs. /O(n)/.
fromList :: (HashTable h, Eq k, Hashable k) => [(k,v)] -> ST s (h s k v)
fromList l = do
    ht <- newSized (length l)
    go ht l
  where
    go ht = go'
      where
        go' []            = return ht
        go' ((!k,!v):xs)  = do
            insert ht k v
            go' xs
{-# INLINE fromList #-}


------------------------------------------------------------------------------
-- | Given a hash table, produce a list of key-value pairs. /O(n)/.
toList :: (HashTable h) => h s k v -> ST s [(k,v)]
toList ht = do
    l <- foldM f [] ht
    return l
  where
    f !l !t = return (t:l)
{-# INLINE toList #-}
hashtables-1.0.1.8/src/Data/HashTable/IO.hs0000644000000000000000000002142312032120320016304 0ustar0000000000000000
{-# LANGUAGE BangPatterns   #-}
{-# LANGUAGE EmptyDataDecls #-}

-- | This module provides wrappers in 'IO' around the functions from
-- "Data.HashTable.Class".
--
-- This module exports three concrete hash table types, one for each hash table
-- implementation in this package:
--
-- > type BasicHashTable  k v = IOHashTable (B.HashTable)  k v
-- > type CuckooHashTable k v = IOHashTable (Cu.HashTable) k v
-- > type LinearHashTable k v = IOHashTable (L.HashTable)  k v
--
-- The 'IOHashTable' type can be thought of as a wrapper around a concrete
-- hashtable type, which sets the 'ST' monad state type to 'PrimState' 'IO',
-- a.k.a. 'RealWorld':
--
-- > type IOHashTable tabletype k v = tabletype (PrimState IO) k v
--
-- This module provides 'stToIO' wrappers around the hashtable functions (which
-- are in 'ST') to make it convenient to use them in 'IO'. It is intended to be
-- imported qualified and used with a user-defined type alias, i.e.:
--
-- > import qualified Data.HashTable.IO as H
-- >
-- > type HashTable k v = H.CuckooHashTable k v
-- >
-- > foo :: IO (HashTable Int Int)
-- > foo = do
-- >   ht <- H.new
-- >   H.insert ht 1 1
-- >   return ht
--
-- Essentially, anywhere you see @'IOHashTable' h k v@ in the type signatures
-- below, you can plug in any of @'BasicHashTable' k v@, @'CuckooHashTable' k
-- v@, or @'LinearHashTable' k v@.
--
module Data.HashTable.IO
  ( BasicHashTable
  , CuckooHashTable
  , LinearHashTable
  , IOHashTable
  , new
  , newSized
  , insert
  , delete
  , lookup
  , fromList
  , toList
  , mapM_
  , foldM
  , computeOverhead
  ) where


------------------------------------------------------------------------------
import           Control.Monad.Primitive  (PrimState)
import           Control.Monad.ST
import           Data.Hashable            (Hashable)
import qualified Data.HashTable.Class     as C
import           Prelude                  hiding (lookup, mapM_)
------------------------------------------------------------------------------
import qualified Data.HashTable.ST.Basic  as B
import qualified Data.HashTable.ST.Cuckoo as Cu
import qualified Data.HashTable.ST.Linear as L


------------------------------------------------------------------------------
-- | A type alias for a basic open addressing hash table using linear
-- probing. See "Data.HashTable.ST.Basic".
type BasicHashTable k v = IOHashTable (B.HashTable) k v

-- | A type alias for the cuckoo hash table. See "Data.HashTable.ST.Cuckoo".
type CuckooHashTable k v = IOHashTable (Cu.HashTable) k v

-- | A type alias for the linear hash table. See "Data.HashTable.ST.Linear".
type LinearHashTable k v = IOHashTable (L.HashTable) k v


------------------------------------------------------------------------------
-- | A type alias for our hash tables, which run in 'ST', to set the state
-- token type to 'PrimState' 'IO' (aka 'RealWorld') so that we can use them in
-- 'IO'.
type IOHashTable tabletype k v = tabletype (PrimState IO) k v


------------------------------------------------------------------------------
-- | See the documentation for this function in "Data.HashTable.Class#v:new".
new :: C.HashTable h => IO (IOHashTable h k v)
new = stToIO C.new
{-# INLINE new #-}
{-# SPECIALIZE INLINE new :: IO (BasicHashTable k v) #-}
{-# SPECIALIZE INLINE new :: IO (LinearHashTable k v) #-}
{-# SPECIALIZE INLINE new :: IO (CuckooHashTable k v) #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:newSized".
newSized :: C.HashTable h => Int -> IO (IOHashTable h k v)
newSized = stToIO . C.newSized
{-# INLINE newSized #-}
{-# SPECIALIZE INLINE newSized :: Int -> IO (BasicHashTable k v) #-}
{-# SPECIALIZE INLINE newSized :: Int -> IO (LinearHashTable k v) #-}
{-# SPECIALIZE INLINE newSized :: Int -> IO (CuckooHashTable k v) #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:insert".
insert :: (C.HashTable h, Eq k, Hashable k) =>
          IOHashTable h k v -> k -> v -> IO ()
insert h k v = stToIO $ C.insert h k v
{-# INLINE insert #-}
{-# SPECIALIZE INLINE insert :: (Eq k, Hashable k) =>
                                BasicHashTable k v -> k -> v -> IO () #-}
{-# SPECIALIZE INLINE insert :: (Eq k, Hashable k) =>
                                LinearHashTable k v -> k -> v -> IO () #-}
{-# SPECIALIZE INLINE insert :: (Eq k, Hashable k) =>
                                CuckooHashTable k v -> k -> v -> IO () #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:delete".
delete :: (C.HashTable h, Eq k, Hashable k) =>
          IOHashTable h k v -> k -> IO ()
delete h k = stToIO $ C.delete h k
{-# INLINE delete #-}
{-# SPECIALIZE INLINE delete :: (Eq k, Hashable k) =>
                                BasicHashTable k v -> k -> IO () #-}
{-# SPECIALIZE INLINE delete :: (Eq k, Hashable k) =>
                                LinearHashTable k v -> k -> IO () #-}
{-# SPECIALIZE INLINE delete :: (Eq k, Hashable k) =>
                                CuckooHashTable k v -> k -> IO () #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:lookup".
lookup :: (C.HashTable h, Eq k, Hashable k) =>
          IOHashTable h k v -> k -> IO (Maybe v)
lookup h k = stToIO $ C.lookup h k
{-# INLINE lookup #-}
{-# SPECIALIZE INLINE lookup :: (Eq k, Hashable k) =>
                                BasicHashTable k v -> k -> IO (Maybe v) #-}
{-# SPECIALIZE INLINE lookup :: (Eq k, Hashable k) =>
                                LinearHashTable k v -> k -> IO (Maybe v) #-}
{-# SPECIALIZE INLINE lookup :: (Eq k, Hashable k) =>
                                CuckooHashTable k v -> k -> IO (Maybe v) #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:fromList".
fromList :: (C.HashTable h, Eq k, Hashable k) =>
            [(k,v)] -> IO (IOHashTable h k v)
fromList = stToIO . C.fromList
{-# INLINE fromList #-}
{-# SPECIALIZE INLINE fromList :: (Eq k, Hashable k) =>
                                  [(k,v)] -> IO (BasicHashTable k v) #-}
{-# SPECIALIZE INLINE fromList :: (Eq k, Hashable k) =>
                                  [(k,v)] -> IO (LinearHashTable k v) #-}
{-# SPECIALIZE INLINE fromList :: (Eq k, Hashable k) =>
                                  [(k,v)] -> IO (CuckooHashTable k v) #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:toList".
toList :: (C.HashTable h, Eq k, Hashable k) =>
          IOHashTable h k v -> IO [(k,v)]
toList = stToIO . C.toList
{-# INLINE toList #-}
{-# SPECIALIZE INLINE toList :: (Eq k, Hashable k) =>
                                BasicHashTable k v -> IO [(k,v)] #-}
{-# SPECIALIZE INLINE toList :: (Eq k, Hashable k) =>
                                LinearHashTable k v -> IO [(k,v)] #-}
{-# SPECIALIZE INLINE toList :: (Eq k, Hashable k) =>
                                CuckooHashTable k v -> IO [(k,v)] #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:foldM".
foldM :: (C.HashTable h) =>
         (a -> (k,v) -> IO a) -> a -> IOHashTable h k v -> IO a
foldM f seed ht = stToIO $ C.foldM f' seed ht
  where
    f' !i !t = unsafeIOToST $ f i t
{-# INLINE foldM #-}
{-# SPECIALIZE INLINE foldM :: (a -> (k,v) -> IO a) -> a
                            -> BasicHashTable k v -> IO a #-}
{-# SPECIALIZE INLINE foldM :: (a -> (k,v) -> IO a) -> a
                            -> LinearHashTable k v -> IO a #-}
{-# SPECIALIZE INLINE foldM :: (a -> (k,v) -> IO a) -> a
                            -> CuckooHashTable k v -> IO a #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:mapM_".
mapM_ :: (C.HashTable h) => ((k,v) -> IO a) -> IOHashTable h k v -> IO ()
mapM_ f ht = stToIO $ C.mapM_ f' ht
  where
    f' = unsafeIOToST . f
{-# INLINE mapM_ #-}
{-# SPECIALIZE INLINE mapM_ :: ((k,v) -> IO a)
                            -> BasicHashTable k v -> IO () #-}
{-# SPECIALIZE INLINE mapM_ :: ((k,v) -> IO a)
                            -> LinearHashTable k v -> IO () #-}
{-# SPECIALIZE INLINE mapM_ :: ((k,v) -> IO a)
                            -> CuckooHashTable k v -> IO () #-}


------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:computeOverhead".
computeOverhead :: (C.HashTable h) => IOHashTable h k v -> IO Double
computeOverhead = stToIO . C.computeOverhead
{-# INLINE computeOverhead #-}
hashtables-1.0.1.8/src/Data/HashTable/Internal/0000755000000000000000000000000012032120320017213 5ustar0000000000000000
hashtables-1.0.1.8/src/Data/HashTable/Internal/Array.hs0000644000000000000000000000160512032120320020627 0ustar0000000000000000
{-# LANGUAGE CPP #-}

module Data.HashTable.Internal.Array
  ( MutableArray
  , newArray
  , readArray
  , writeArray
  ) where


import           Control.Monad.ST

#ifdef BOUNDS_CHECKING
import qualified Data.Vector.Mutable as M
import           Data.Vector.Mutable (MVector)
#else
import qualified Data.Primitive.Array as M
import           Data.Primitive.Array (MutableArray)
#endif


#ifdef BOUNDS_CHECKING
type MutableArray s a = MVector s a

newArray :: Int -> a -> ST s (MutableArray s a)
newArray = M.replicate

readArray :: MutableArray s a -> Int -> ST s a
readArray = M.read

writeArray :: MutableArray s a -> Int -> a -> ST s ()
writeArray = M.write

#else
newArray :: Int -> a -> ST s (MutableArray s a)
newArray = M.newArray

readArray :: MutableArray s a -> Int -> ST s a
readArray = M.readArray

writeArray :: MutableArray s a -> Int -> a -> ST s ()
writeArray = M.writeArray
#endif
hashtables-1.0.1.8/src/Data/HashTable/Internal/CacheLine.hs0000644000000000000000000007441212032120320021372 0ustar0000000000000000
{-# LANGUAGE BangPatterns             #-}
{-# LANGUAGE CPP                      #-}
{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE MagicHash                #-}

module Data.HashTable.Internal.CacheLine
  ( cacheLineSearch
  ,
cacheLineSearch2
  , cacheLineSearch3
  , forwardSearch2
  , forwardSearch3
  , isCacheLineAligned
  , advanceByCacheLineSize
  , prefetchRead
  , prefetchWrite
  , bl_abs#
  , sign#
  , mask#
  , maskw#
  ) where

import           Control.Monad.ST
import           Data.HashTable.Internal.IntArray (IntArray)
import qualified Data.HashTable.Internal.IntArray as M

#ifndef NO_C_SEARCH
import           Control.Monad
import           Foreign.C.Types
#else
import           Data.Bits
import           Data.Int
import qualified Data.Vector.Unboxed as U
import           GHC.Int
#endif

import           Data.HashTable.Internal.Utils
import           GHC.Exts

{-# INLINE prefetchRead  #-}
{-# INLINE prefetchWrite #-}
prefetchRead  :: IntArray s -> Int -> ST s ()
prefetchWrite :: IntArray s -> Int -> ST s ()

#ifndef NO_C_SEARCH
foreign import ccall unsafe "lineSearch32"
  c_lineSearch32 :: Ptr a -> CInt -> CUInt -> IO CInt

foreign import ccall unsafe "lineSearch64"
  c_lineSearch64 :: Ptr a -> CInt -> CULong -> IO CInt

foreign import ccall unsafe "lineSearch32_2"
  c_lineSearch32_2 :: Ptr a -> CInt -> CUInt -> CUInt -> IO CInt

foreign import ccall unsafe "lineSearch64_2"
  c_lineSearch64_2 :: Ptr a -> CInt -> CULong -> CULong -> IO CInt

foreign import ccall unsafe "lineSearch32_3"
  c_lineSearch32_3 :: Ptr a -> CInt -> CUInt -> CUInt -> CUInt -> IO CInt

foreign import ccall unsafe "lineSearch64_3"
  c_lineSearch64_3 :: Ptr a -> CInt -> CULong -> CULong -> CULong -> IO CInt

foreign import ccall unsafe "forwardSearch32_2"
  c_forwardSearch32_2 :: Ptr a -> CInt -> CInt -> CUInt -> CUInt -> IO CInt

foreign import ccall unsafe "forwardSearch32_3"
  c_forwardSearch32_3 :: Ptr a -> CInt -> CInt -> CUInt -> CUInt -> CUInt
                      -> IO CInt

foreign import ccall unsafe "forwardSearch64_2"
  c_forwardSearch64_2 :: Ptr a -> CInt -> CInt -> CULong -> CULong -> IO CInt

foreign import ccall unsafe "forwardSearch64_3"
  c_forwardSearch64_3 :: Ptr a -> CInt -> CInt -> CULong -> CULong -> CULong
                      -> IO CInt

foreign import ccall unsafe "prefetchCacheLine32_read"
  prefetchCacheLine32_read :: Ptr a -> CInt -> IO ()

foreign import ccall unsafe "prefetchCacheLine64_read"
  prefetchCacheLine64_read :: Ptr a -> CInt -> IO ()

foreign import ccall unsafe "prefetchCacheLine32_write"
  prefetchCacheLine32_write :: Ptr a -> CInt -> IO ()

foreign import ccall unsafe "prefetchCacheLine64_write"
  prefetchCacheLine64_write :: Ptr a -> CInt -> IO ()


fI :: (Num b, Integral a) => a -> b
fI = fromIntegral

prefetchRead a i = unsafeIOToST c
  where
    v   = M.toPtr a
    x   = fI i
    c32 = prefetchCacheLine32_read v x
    c64 = prefetchCacheLine64_read v x
    c   = if wordSize == 32 then c32 else c64

prefetchWrite a i = unsafeIOToST c
  where
    v   = M.toPtr a
    x   = fI i
    c32 = prefetchCacheLine32_write v x
    c64 = prefetchCacheLine64_write v x
    c   = if wordSize == 32 then c32 else c64

{-# INLINE forwardSearch2 #-}
forwardSearch2 :: IntArray s -> Int -> Int -> Int -> Int -> ST s Int
forwardSearch2 !vec !start !end !x1 !x2 =
    liftM fromEnum $! unsafeIOToST c
  where
    c32 = c_forwardSearch32_2 (M.toPtr vec) (fI start) (fI end) (fI x1) (fI x2)
    c64 = c_forwardSearch64_2 (M.toPtr vec) (fI start) (fI end) (fI x1) (fI x2)
    c   = if wordSize == 32 then c32 else c64

{-# INLINE forwardSearch3 #-}
forwardSearch3 :: IntArray s -> Int -> Int -> Int -> Int -> Int -> ST s Int
forwardSearch3 !vec !start !end !x1 !x2 !x3 =
    liftM fromEnum $! unsafeIOToST c
  where
    c32 = c_forwardSearch32_3 (M.toPtr vec) (fI start) (fI end)
                              (fI x1) (fI x2) (fI x3)
    c64 = c_forwardSearch64_3 (M.toPtr vec) (fI start) (fI end)
                              (fI x1) (fI x2) (fI x3)
    c   = if wordSize == 32 then c32 else c64

{-# INLINE lineSearch #-}
lineSearch :: IntArray s -> Int -> Int -> ST s Int
lineSearch !vec !start !value =
    liftM fromEnum $! unsafeIOToST c
  where
    c32 = c_lineSearch32 (M.toPtr vec) (fI start) (fI value)
    c64 = c_lineSearch64 (M.toPtr vec) (fI start) (fI value)
    c   = if wordSize == 32 then c32 else c64

{-# INLINE lineSearch2 #-}
lineSearch2 :: IntArray s -> Int -> Int -> Int -> ST s Int
lineSearch2 !vec !start !x1 !x2 =
    liftM fromEnum $! unsafeIOToST c
  where
    c32 = c_lineSearch32_2 (M.toPtr vec) (fI start) (fI x1) (fI x2)
    c64 = c_lineSearch64_2 (M.toPtr vec) (fI start) (fI x1) (fI x2)
    c   = if wordSize == 32 then c32 else c64

{-# INLINE lineSearch3 #-}
lineSearch3 :: IntArray s -> Int -> Int -> Int -> Int -> ST s Int
lineSearch3 !vec !start !x1 !x2 !x3 =
    liftM fromEnum $! unsafeIOToST c
  where
    c32 = c_lineSearch32_3 (M.toPtr vec) (fI start) (fI x1) (fI x2) (fI x3)
    c64 = c_lineSearch64_3 (M.toPtr vec) (fI start) (fI x1) (fI x2) (fI x3)
    c   = if wordSize == 32 then c32 else c64
#endif

{-# INLINE advanceByCacheLineSize #-}
advanceByCacheLineSize :: Int -> Int -> Int
advanceByCacheLineSize !(I# start0#) !(I# vecSize#) = out
  where
    !(I# clm#) = cacheLineIntMask
    !clmm#     = not# (int2Word# clm#)
    !start#    = word2Int# (clmm# `and#` int2Word# start0#)
    !(I# nw#)  = numWordsInCacheLine
    !start'#   = start# +# nw#
    !s#        = sign# (vecSize# -# start'# -# 1#)
    !m#        = not# (int2Word# s#)
    !r#        = int2Word# start'# `and#` m#
    !out       = I# (word2Int# r#)

{-# INLINE isCacheLineAligned #-}
isCacheLineAligned :: Int -> Bool
isCacheLineAligned (I# x#) = r# ==# 0#
  where
    !(I# m#) = cacheLineIntMask
    !mw#     = int2Word# m#
    !w#      = int2Word# x#
    !r#      = word2Int# (mw# `and#` w#)

{-# INLINE sign# #-}
-- | Returns 0 if x is positive, -1 otherwise
sign# :: Int# -> Int#
sign# !x# = x# `uncheckedIShiftRA#` wordSizeMinus1#
  where
    !(I# wordSizeMinus1#) = wordSize-1

{-# INLINE bl_abs# #-}
-- | Abs of an integer, branchless
bl_abs# :: Int# -> Int#
bl_abs# !x# = word2Int# r#
  where
    !m# = sign# x#
    !r# = (int2Word# (m# +# x#)) `xor#` int2Word# m#

{-# INLINE mask# #-}
-- | Returns 0xfff..fff (aka -1) if a# == b#, 0 otherwise.
mask# :: Int# -> Int# -> Int#
mask# !a# !b# = dest#
  where
    !d#    = a# -# b#
    !r#    = bl_abs# d# -# 1#
    !dest# = sign# r#

{- note: this code should be:

mask# :: Int# -> Int# -> Int#
mask# !a# !b# = let !(I# z#) = fromEnum (a# ==# b#)
                    !q#      = negateInt# z#
                in q#

but GHC doesn't properly optimize this as straight-line code at the moment.
-}

{-# INLINE maskw# #-}
maskw# :: Int# -> Int# -> Word#
maskw# !a# !b# = int2Word# (mask# a# b#)

#ifdef NO_C_SEARCH
prefetchRead _ _  = return ()
prefetchWrite _ _ = return ()

{-# INLINE forwardSearch2 #-}
forwardSearch2 :: IntArray s -> Int -> Int -> Int -> Int -> ST s Int
forwardSearch2 !vec !start !end !x1 !x2 = go start end False
  where
    next !i !e !b = let !j = i+1
                    in if j == e
                         then (if b then (-1,e,True) else (0,start,True))
                         else (j,e,b)

    go !i !e !b = do
        h <- M.readArray vec i
        if h == x1 || h == x2
          then return i
          else do
              let (!i',!e',!b') = next i e b
              if (i' < 0) then return (-1) else go i' e' b'

{-# INLINE forwardSearch3 #-}
forwardSearch3 :: IntArray s -> Int -> Int -> Int -> Int -> Int -> ST s Int
forwardSearch3 !vec !start !end !x1 !x2 !x3 = go start end False
  where
    next !i !e !b = let !j = i+1
                    in if j == e
                         then (if b then (-1,e,True) else (0,start,True))
                         else (j,e,b)

    go !i !e !b = do
        h <- M.readArray vec i
        if h == x1 || h == x2 || h == x3
          then return i
          else do
              let (!i',!e',!b') = next i e b
              if (i' < 0) then return (-1) else go i' e' b'

deBruijnBitPositions :: U.Vector Int8
deBruijnBitPositions =
    U.fromList [ 0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
                 31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9 ]

{-# INLINE firstBitSet# #-}
-- only works with 32-bit values -- ok for us here
firstBitSet# :: Int# -> Int#
firstBitSet# i# = word2Int# ((or# zeroCase# posw#))
  where
    !zeroCase#   = int2Word# (mask# 0# i#)
    !w#          = int2Word# i#
    !iLowest#    = word2Int# (and# w# (int2Word# (negateInt# i#)))
    !idxW#       = uncheckedShiftRL#
                       (narrow32Word# (timesWord# (int2Word# iLowest#)
                                                  (int2Word# 0x077CB531#)))
                       27#
    !idx         = I# (word2Int# idxW#)
    !(I8# pos8#) = U.unsafeIndex deBruijnBitPositions idx
    !posw#       = int2Word# pos8#
#endif


------------------------------------------------------------------------------
-- | Search through a mutable vector for a given int value, cache-line aligned.
-- If the start index is cache-line aligned, and there is more than a
-- cache-line's room between the start index and the end of the vector, we will
-- search the cache-line all at once using an efficient branchless
-- bit-twiddling technique. Otherwise, we will use a typical loop.
--
cacheLineSearch :: IntArray s   -- ^ vector to search
                -> Int          -- ^ start index
                -> Int          -- ^ value to search for
                -> ST s Int     -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
cacheLineSearch !vec !start !value = do
#ifdef NO_C_SEARCH
    let !vlen  = M.length vec
    let !st1   = vlen - start
    let !nvlen = numWordsInCacheLine - st1
    let adv = (start + cacheLineIntMask) .&. complement cacheLineIntMask
    let st2 = adv - start
    if nvlen > 0 || not (isCacheLineAligned start)
      then naiveSearch vec start (min st1 st2) value
      else lineSearch vec start value
#else
    lineSearch vec start value
#endif
{-# INLINE cacheLineSearch #-}

#ifdef NO_C_SEARCH
-- | Search through a mutable vector for a given int value. The number of
-- things to search for must be at most the number of things remaining in the
-- vector.
naiveSearch :: IntArray s       -- ^ vector to search
            -> Int              -- ^ start index
            -> Int              -- ^ number of things to search
            -> Int              -- ^ value to search for
            -> ST s Int         -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
naiveSearch !vec !start !nThings !value = go start
  where
    !doneIdx = start + nThings

    go !i | i >= doneIdx = return (-1)
          | otherwise = do
              x <- M.readArray vec i
              if x == value then return i else go (i+1)
{-# INLINE naiveSearch #-}

lineResult# :: Word#            -- ^ mask
            -> Int              -- ^ start value
            -> Int
lineResult# bitmask# (I# start#) = I# (word2Int# rv#)
  where
    !p#   = firstBitSet# (word2Int# bitmask#)
    !mm#  = maskw# p# (-1#)
    !nmm# = not# mm#
    !rv#  = mm# `or#` (nmm# `and#` (int2Word# (start# +# p#)))
{-# INLINE lineResult# #-}

lineSearch :: IntArray s        -- ^ vector to search
           -> Int               -- ^ start index
           -> Int               -- ^ value to search for
           -> ST s Int          -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
lineSearch | wordSize == 32 = lineSearch32
           | otherwise      = lineSearch64
{-# INLINE lineSearch #-}

lineSearch64 :: IntArray s      -- ^ vector to search
             -> Int             -- ^ start index
             -> Int             -- ^ value to search for
             -> ST s Int        -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
lineSearch64 !vec !start !(I# v#) = do
    (I# x1#) <- M.readArray vec $! start + 0
    let !p1# = maskw# x1# v# `and#` int2Word# 0x1#
    (I# x2#) <- M.readArray vec $! start + 1
    let !p2# = p1# `or#` (maskw# x2# v# `and#` int2Word# 0x2#)
    (I# x3#) <- M.readArray vec $! start + 2
    let !p3# = p2# `or#` (maskw# x3# v# `and#` int2Word# 0x4#)
    (I# x4#) <- M.readArray vec $! start + 3
    let !p4# = p3# `or#` (maskw# x4# v# `and#` int2Word# 0x8#)
    (I# x5#) <- M.readArray vec $! start + 4
    let !p5# = p4# `or#` (maskw# x5# v# `and#` int2Word# 0x10#)
    (I# x6#) <- M.readArray vec $! start + 5
    let !p6# = p5# `or#` (maskw# x6# v# `and#` int2Word# 0x20#)
    (I# x7#) <- M.readArray vec $! start + 6
    let !p7# = p6# `or#` (maskw# x7# v# `and#` int2Word# 0x40#)
    (I# x8#) <- M.readArray vec $! start + 7
    let !p8# = p7# `or#` (maskw# x8# v# `and#` int2Word# 0x80#)
    return $! lineResult# p8# start
{-# INLINE lineSearch64 #-}

lineSearch32 :: IntArray s      -- ^ vector to search
             -> Int             -- ^ start index
             -> Int             -- ^ value to search for
             -> ST s Int        -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
lineSearch32 !vec !start !(I# v#) = do
    (I# x1#) <- M.readArray vec $! start + 0
    let !p1# = maskw# x1# v# `and#` int2Word# 0x1#
    (I# x2#) <- M.readArray vec $! start + 1
    let !p2# = p1# `or#` (maskw# x2# v# `and#` int2Word# 0x2#)
    (I# x3#) <- M.readArray vec $! start + 2
    let !p3# = p2# `or#` (maskw# x3# v# `and#` int2Word# 0x4#)
    (I# x4#) <- M.readArray vec $! start + 3
    let !p4# = p3# `or#` (maskw# x4# v# `and#` int2Word# 0x8#)
    (I# x5#) <- M.readArray vec $! start + 4
    let !p5# = p4# `or#` (maskw# x5# v# `and#` int2Word# 0x10#)
    (I# x6#) <- M.readArray vec $! start + 5
    let !p6# = p5# `or#` (maskw# x6# v# `and#` int2Word# 0x20#)
    (I# x7#) <- M.readArray vec $! start + 6
    let !p7# = p6# `or#` (maskw# x7# v# `and#` int2Word# 0x40#)
    (I# x8#) <- M.readArray vec $! start + 7
    let !p8# = p7# `or#` (maskw# x8# v# `and#` int2Word# 0x80#)
    (I# x9#) <- M.readArray vec $! start + 8
    let !p9# = p8# `or#` (maskw# x9# v# `and#` int2Word# 0x100#)
    (I# x10#) <- M.readArray vec $! start + 9
    let !p10# = p9# `or#` (maskw# x10# v# `and#` int2Word# 0x200#)
    (I# x11#) <- M.readArray vec $! start + 10
    let !p11# = p10# `or#` (maskw# x11# v# `and#` int2Word# 0x400#)
    (I# x12#) <- M.readArray vec $! start + 11
    let !p12# = p11# `or#` (maskw# x12# v# `and#` int2Word# 0x800#)
    (I# x13#) <- M.readArray vec $! start + 12
    let !p13# = p12# `or#` (maskw# x13# v# `and#` int2Word# 0x1000#)
    (I# x14#) <- M.readArray vec $! start + 13
    let !p14# = p13# `or#` (maskw# x14# v# `and#` int2Word# 0x2000#)
    (I# x15#) <- M.readArray vec $! start + 14
    let !p15# = p14# `or#` (maskw# x15# v# `and#` int2Word# 0x4000#)
    (I# x16#) <- M.readArray vec $! start + 15
    let !p16# = p15# `or#` (maskw# x16# v# `and#` int2Word# 0x8000#)
    return $! lineResult# p16# start
{-# INLINE lineSearch32 #-}
#endif


------------------------------------------------------------------------------
-- | Search through a mutable vector for one of two given int values,
-- cache-line aligned. If the start index is cache-line aligned, and there is
-- more than a cache-line's room between the start index and the end of the
-- vector, we will search the cache-line all at once using an efficient
-- branchless bit-twiddling technique. Otherwise, we will use a typical loop.
--
cacheLineSearch2 :: IntArray s  -- ^ vector to search
                 -> Int         -- ^ start index
                 -> Int         -- ^ value to search for
                 -> Int         -- ^ value 2 to search for
                 -> ST s Int    -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
cacheLineSearch2 !vec !start !value !value2 = do
#ifdef NO_C_SEARCH
    let !vlen  = M.length vec
    let !st1   = vlen - start
    let !nvlen = numWordsInCacheLine - st1
    let adv = (start + cacheLineIntMask) .&. complement cacheLineIntMask
    let st2 = adv - start
    if nvlen > 0 || not (isCacheLineAligned start)
      then naiveSearch2 vec start (min st1 st2) value value2
      else lineSearch2 vec start value value2
#else
    lineSearch2 vec start value value2
#endif
{-# INLINE cacheLineSearch2 #-}

#ifdef NO_C_SEARCH
naiveSearch2 :: IntArray s      -- ^ vector to search
             -> Int             -- ^ start index
             -> Int             -- ^ number of things to search
             -> Int             -- ^ value to search for
             -> Int             -- ^ value 2 to search for
             -> ST s Int        -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
naiveSearch2 !vec !start !nThings !value1 !value2 = go start
  where
    !doneIdx = start + nThings

    go !i | i >= doneIdx = return (-1)
          | otherwise = do
              x <- M.readArray vec i
              if x == value1 || x == value2 then return i else go (i+1)
{-# INLINE naiveSearch2 #-}

lineSearch2 :: IntArray s       -- ^ vector to search
            -> Int              -- ^ start index
            -> Int              -- ^ value to search for
            -> Int              -- ^ value 2 to search for
            -> ST s Int         -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
lineSearch2 | wordSize == 32 = lineSearch32_2
            | otherwise      = lineSearch64_2

lineSearch64_2 :: IntArray s    -- ^ vector to search
               -> Int           -- ^ start index
               -> Int           -- ^ value to search for
               -> Int           -- ^ value 2 to search for
               -> ST s Int      -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
lineSearch64_2 !vec !start !(I# v#) !(I# v2#) = do
    (I# x1#) <- M.readArray vec $! start + 0
    let !p1# = (maskw# x1# v# `or#` maskw# x1# v2#) `and#` int2Word# 0x1#
    (I# x2#) <- M.readArray vec $! start + 1
    let !p2# = p1# `or#` ((maskw# x2# v# `or#` maskw# x2# v2#)
                          `and#` int2Word# 0x2#)
    (I# x3#) <- M.readArray vec $! start + 2
    let !p3# = p2# `or#` ((maskw# x3# v# `or#` maskw# x3# v2#)
                          `and#` int2Word# 0x4#)
    (I# x4#) <- M.readArray vec $! start + 3
    let !p4# = p3# `or#` ((maskw# x4# v# `or#` maskw# x4# v2#)
                          `and#` int2Word# 0x8#)
    (I# x5#) <- M.readArray vec $! start + 4
    let !p5# = p4# `or#` ((maskw# x5# v# `or#` maskw# x5# v2#)
                          `and#` int2Word# 0x10#)
    (I# x6#) <- M.readArray vec $! start + 5
    let !p6# = p5# `or#` ((maskw# x6# v# `or#` maskw# x6# v2#)
                          `and#` int2Word# 0x20#)
    (I# x7#) <- M.readArray vec $! start + 6
    let !p7# = p6# `or#` ((maskw# x7# v# `or#` maskw# x7# v2#)
                          `and#` int2Word# 0x40#)
    (I# x8#) <- M.readArray vec $! start + 7
    let !p8# = p7# `or#` ((maskw# x8# v# `or#` maskw# x8# v2#)
                          `and#` int2Word# 0x80#)
    return $! lineResult# p8# start
{-# INLINE lineSearch64_2 #-}

lineSearch32_2 :: IntArray s    -- ^ vector to search
               -> Int           -- ^ start index
               -> Int           -- ^ value to search for
               -> Int           -- ^ value 2 to search for
               -> ST s Int      -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
lineSearch32_2 !vec !start !(I# v#) !(I# v2#) = do
    (I# x1#) <- M.readArray vec $! start + 0
    let !p1# = (maskw# x1# v# `or#` maskw# x1# v2#) `and#` int2Word# 0x1#
    (I# x2#) <- M.readArray vec $! start + 1
    let !p2# = p1# `or#` ((maskw# x2# v# `or#` maskw# x2# v2#)
                          `and#` int2Word# 0x2#)
    (I# x3#) <- M.readArray vec $! start + 2
    let !p3# = p2# `or#` ((maskw# x3# v# `or#` maskw# x3# v2#)
                          `and#` int2Word# 0x4#)
    (I# x4#) <- M.readArray vec $! start + 3
    let !p4# = p3# `or#` ((maskw# x4# v# `or#` maskw# x4# v2#)
                          `and#` int2Word# 0x8#)
    (I# x5#) <- M.readArray vec $! start + 4
    let !p5# = p4# `or#` ((maskw# x5# v# `or#` maskw# x5# v2#)
                          `and#` int2Word# 0x10#)
    (I# x6#) <- M.readArray vec $! start + 5
    let !p6# = p5# `or#` ((maskw# x6# v# `or#` maskw# x6# v2#)
                          `and#` int2Word# 0x20#)
    (I# x7#) <- M.readArray vec $! start + 6
    let !p7# = p6# `or#` ((maskw# x7# v# `or#` maskw# x7# v2#)
                          `and#` int2Word# 0x40#)
    (I# x8#) <- M.readArray vec $! start + 7
    let !p8# = p7# `or#` ((maskw# x8# v# `or#` maskw# x8# v2#)
                          `and#` int2Word# 0x80#)
    (I# x9#) <- M.readArray vec $! start + 8
    let !p9# = p8# `or#` ((maskw# x9# v# `or#` maskw# x9# v2#)
                          `and#` int2Word# 0x100#)
    (I# x10#) <- M.readArray vec $! start + 9
    let !p10# = p9# `or#` ((maskw# x10# v# `or#` maskw# x10# v2#)
                           `and#` int2Word# 0x200#)
    (I# x11#) <- M.readArray vec $! start + 10
    let !p11# = p10# `or#` ((maskw# x11# v# `or#` maskw# x11# v2#)
                            `and#` int2Word# 0x400#)
    (I# x12#) <- M.readArray vec $! start + 11
    let !p12# = p11# `or#` ((maskw# x12# v# `or#` maskw# x12# v2#)
                            `and#` int2Word# 0x800#)
    (I# x13#) <- M.readArray vec $! start + 12
    let !p13# = p12# `or#` ((maskw# x13# v# `or#` maskw# x13# v2#)
                            `and#` int2Word# 0x1000#)
    (I# x14#) <- M.readArray vec $! start + 13
    let !p14# = p13# `or#` ((maskw# x14# v# `or#` maskw# x14# v2#)
                            `and#` int2Word# 0x2000#)
    (I# x15#) <- M.readArray vec $! start + 14
    let !p15# = p14# `or#` ((maskw# x15# v# `or#` maskw# x15# v2#)
                            `and#` int2Word# 0x4000#)
    (I# x16#) <- M.readArray vec $! start + 15
    let !p16# = p15# `or#` ((maskw# x16# v# `or#` maskw# x16# v2#)
                            `and#` int2Word# 0x8000#)
    return $! lineResult# p16# start
{-# INLINE lineSearch32_2 #-}
#endif


------------------------------------------------------------------------------
-- | Search through a mutable vector for one of three given int values,
-- cache-line aligned. If the start index is cache-line aligned, and there is
-- more than a cache-line's room between the start index and the end of the
-- vector, we will search the cache-line all at once using an efficient
-- branchless bit-twiddling technique. Otherwise, we will use a typical loop.
--
cacheLineSearch3 :: IntArray s  -- ^ vector to search
                 -> Int         -- ^ start index
                 -> Int         -- ^ value to search for
                 -> Int         -- ^ value 2 to search for
                 -> Int         -- ^ value 3 to search for
                 -> ST s Int    -- ^ dest index where it can be found, or
                                -- \"-1\" if not found
cacheLineSearch3 !vec !start !value !value2 !value3 = do
#ifdef NO_C_SEARCH
    let !vlen  = M.length vec
    let !st1   = vlen - start
    let !nvlen = numWordsInCacheLine - st1
    let adv = (start + cacheLineIntMask) .&.
complement cacheLineIntMask let st2 = adv - start if nvlen > 0 || not (isCacheLineAligned start) then naiveSearch3 vec start (min st1 st2) value value2 value3 else lineSearch3 vec start value value2 value3 #else lineSearch3 vec start value value2 value3 #endif {-# INLINE cacheLineSearch3 #-} #ifdef NO_C_SEARCH naiveSearch3 :: IntArray s -- ^ vector to search -> Int -- ^ start index -> Int -- ^ number of things to search -> Int -- ^ value to search for -> Int -- ^ value 2 to search for -> Int -- ^ value 3 to search for -> ST s Int -- ^ dest index where it can be found, or -- \"-1\" if not found naiveSearch3 !vec !start !nThings !value1 !value2 !value3 = go start where !doneIdx = start + nThings go !i | i >= doneIdx = return (-1) | otherwise = do x <- M.readArray vec i if x == value1 || x == value2 || x == value3 then return i else go (i+1) {-# INLINE naiveSearch3 #-} lineSearch3 :: IntArray s -- ^ vector to search -> Int -- ^ start index -> Int -- ^ value to search for -> Int -- ^ value 2 to search for -> Int -- ^ value 3 to search for -> ST s Int -- ^ dest index where it can be found, or -- \"-1\" if not found lineSearch3 | wordSize == 32 = lineSearch32_3 | otherwise = lineSearch64_3 lineSearch64_3 :: IntArray s -- ^ vector to search -> Int -- ^ start index -> Int -- ^ value to search for -> Int -- ^ value 2 to search for -> Int -- ^ value 3 to search for -> ST s Int -- ^ dest index where it can be found, or -- \"-1\" if not found lineSearch64_3 !vec !start !(I# v#) !(I# v2#) !(I# v3#) = do (I# x1#) <- M.readArray vec $! start + 0 let !p1# = (maskw# x1# v# `or#` maskw# x1# v2# `or#` maskw# x1# v3#) `and#` int2Word# 0x1# (I# x2#) <- M.readArray vec $! start + 1 let !p2# = p1# `or#` ((maskw# x2# v# `or#` maskw# x2# v2# `or#` maskw# x2# v3#) `and#` int2Word# 0x2#) (I# x3#) <- M.readArray vec $! start + 2 let !p3# = p2# `or#` ((maskw# x3# v# `or#` maskw# x3# v2# `or#` maskw# x3# v3#) `and#` int2Word# 0x4#) (I# x4#) <- M.readArray vec $! 
start + 3 let !p4# = p3# `or#` ((maskw# x4# v# `or#` maskw# x4# v2# `or#` maskw# x4# v3#) `and#` int2Word# 0x8#) (I# x5#) <- M.readArray vec $! start + 4 let !p5# = p4# `or#` ((maskw# x5# v# `or#` maskw# x5# v2# `or#` maskw# x5# v3#) `and#` int2Word# 0x10#) (I# x6#) <- M.readArray vec $! start + 5 let !p6# = p5# `or#` ((maskw# x6# v# `or#` maskw# x6# v2# `or#` maskw# x6# v3#) `and#` int2Word# 0x20#) (I# x7#) <- M.readArray vec $! start + 6 let !p7# = p6# `or#` ((maskw# x7# v# `or#` maskw# x7# v2# `or#` maskw# x7# v3#) `and#` int2Word# 0x40#) (I# x8#) <- M.readArray vec $! start + 7 let !p8# = p7# `or#` ((maskw# x8# v# `or#` maskw# x8# v2# `or#` maskw# x8# v3#) `and#` int2Word# 0x80#) return $! lineResult# p8# start {-# INLINE lineSearch64_3 #-} lineSearch32_3 :: IntArray s -- ^ vector to search -> Int -- ^ start index -> Int -- ^ value to search for -> Int -- ^ value 2 to search for -> Int -- ^ value 3 to search for -> ST s Int -- ^ dest index where it can be found, or -- \"-1\" if not found lineSearch32_3 !vec !start !(I# v#) !(I# v2#) !(I# v3#) = do (I# x1#) <- M.readArray vec $! start + 0 let !p1# = (maskw# x1# v# `or#` maskw# x1# v2# `or#` maskw# x1# v3#) `and#` int2Word# 0x1# (I# x2#) <- M.readArray vec $! start + 1 let !p2# = p1# `or#` ((maskw# x2# v# `or#` maskw# x2# v2# `or#` maskw# x2# v3#) `and#` int2Word# 0x2#) (I# x3#) <- M.readArray vec $! start + 2 let !p3# = p2# `or#` ((maskw# x3# v# `or#` maskw# x3# v2# `or#` maskw# x3# v3#) `and#` int2Word# 0x4#) (I# x4#) <- M.readArray vec $! start + 3 let !p4# = p3# `or#` ((maskw# x4# v# `or#` maskw# x4# v2# `or#` maskw# x4# v3#) `and#` int2Word# 0x8#) (I# x5#) <- M.readArray vec $! start + 4 let !p5# = p4# `or#` ((maskw# x5# v# `or#` maskw# x5# v2# `or#` maskw# x5# v3#) `and#` int2Word# 0x10#) (I# x6#) <- M.readArray vec $! start + 5 let !p6# = p5# `or#` ((maskw# x6# v# `or#` maskw# x6# v2# `or#` maskw# x6# v3#) `and#` int2Word# 0x20#) (I# x7#) <- M.readArray vec $! 
start + 6 let !p7# = p6# `or#` ((maskw# x7# v# `or#` maskw# x7# v2# `or#` maskw# x7# v3#) `and#` int2Word# 0x40#) (I# x8#) <- M.readArray vec $! start + 7 let !p8# = p7# `or#` ((maskw# x8# v# `or#` maskw# x8# v2# `or#` maskw# x8# v3#) `and#` int2Word# 0x80#) (I# x9#) <- M.readArray vec $! start + 8 let !p9# = p8# `or#` ((maskw# x9# v# `or#` maskw# x9# v2# `or#` maskw# x9# v3#) `and#` int2Word# 0x100#) (I# x10#) <- M.readArray vec $! start + 9 let !p10# = p9# `or#` ((maskw# x10# v# `or#` maskw# x10# v2# `or#` maskw# x10# v3#) `and#` int2Word# 0x200#) (I# x11#) <- M.readArray vec $! start + 10 let !p11# = p10# `or#` ((maskw# x11# v# `or#` maskw# x11# v2# `or#` maskw# x11# v3#) `and#` int2Word# 0x400#) (I# x12#) <- M.readArray vec $! start + 11 let !p12# = p11# `or#` ((maskw# x12# v# `or#` maskw# x12# v2# `or#` maskw# x12# v3#) `and#` int2Word# 0x800#) (I# x13#) <- M.readArray vec $! start + 12 let !p13# = p12# `or#` ((maskw# x13# v# `or#` maskw# x13# v2# `or#` maskw# x13# v3#) `and#` int2Word# 0x1000#) (I# x14#) <- M.readArray vec $! start + 13 let !p14# = p13# `or#` ((maskw# x14# v# `or#` maskw# x14# v2# `or#` maskw# x14# v3#) `and#` int2Word# 0x2000#) (I# x15#) <- M.readArray vec $! start + 14 let !p15# = p14# `or#` ((maskw# x15# v# `or#` maskw# x15# v2# `or#` maskw# x15# v3#) `and#` int2Word# 0x4000#) (I# x16#) <- M.readArray vec $! start + 15 let !p16# = p15# `or#` ((maskw# x16# v# `or#` maskw# x16# v2# `or#` maskw# x16# v3#) `and#` int2Word# 0x8000#) return $! 
lineResult# p16# start
{-# INLINE lineSearch32_3 #-}
#endif


hashtables-1.0.1.8/src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs

{-# LANGUAGE BangPatterns #-}

module Data.HashTable.Internal.CheapPseudoRandomBitStream
  ( BitStream
  , newBitStream
  , getNextBit
  , getNBits
  ) where

import           Control.Applicative
import           Control.Monad.ST
import           Data.Bits
import           Data.Int
import           Data.STRef
import           Data.Vector.Unboxed           (Vector)
import qualified Data.Vector.Unboxed           as V

import           Data.HashTable.Internal.Utils


------------------------------------------------------------------------------
-- Chosen by fair dice roll. Guaranteed random. More importantly, there are an
-- equal number of 0 and 1 bits in both of these vectors.
random32s :: Vector Int32
random32s = V.fromList [ 0xe293c315
                       , 0x82e2ff62
                       , 0xcb1ef9ae
                       , 0x78850172
                       , 0x551ee1ce
                       , 0x59d6bfd1
                       , 0xb717ec44
                       , 0xe7a3024e
                       , 0x02bb8976
                       , 0x87e2f94f
                       , 0xfa156372
                       , 0xe1325b17
                       , 0xe005642a
                       , 0xc8d02eb3
                       , 0xe90c0a87
                       , 0x4cb9e6e2
                       ]


------------------------------------------------------------------------------
random64s :: Vector Int64
random64s = V.fromList [ 0x62ef447e007e8732
                       , 0x149d6acb499feef8
                       , 0xca7725f9b404fbf8
                       , 0x4b5dfad194e626a9
                       , 0x6d76f2868359491b
                       , 0x6b2284e3645dcc87
                       , 0x5b89b485013eaa16
                       , 0x6e2d4308250c435b
                       , 0xc31e641a659e0013
                       , 0xe237b85e9dc7276d
                       , 0x0b3bb7fa40d94f3f
                       , 0x4da446874d4ca023
                       , 0x69240623fedbd26b
                       , 0x76fb6810dcf894d3
                       , 0xa0da4e0ce57c8ea7
                       , 0xeb76b84453dc3873
                       ]


------------------------------------------------------------------------------
numRandoms :: Int
numRandoms = 16


------------------------------------------------------------------------------
randoms :: Vector Int
randoms | wordSize == 32 = V.map fromEnum random32s
        | otherwise      = V.map fromEnum random64s


------------------------------------------------------------------------------
data BitStream s = BitStream { _curRandom :: !(STRef s Int)
                             , _bitsLeft  :: !(STRef s Int)
                             , _randomPos
                                           :: !(STRef s Int)
                             }


------------------------------------------------------------------------------
newBitStream :: ST s (BitStream s)
newBitStream = unwrapMonad $
               BitStream <$> (WrapMonad $ newSTRef $ V.unsafeIndex randoms 0)
                         <*> (WrapMonad $ newSTRef wordSize)
                         <*> (WrapMonad $ newSTRef 1)


------------------------------------------------------------------------------
getNextBit :: BitStream s -> ST s Int
getNextBit = getNBits 1


------------------------------------------------------------------------------
getNBits :: Int -> BitStream s -> ST s Int
getNBits nbits (BitStream crRef blRef rpRef) = do
    !bl <- readSTRef blRef
    if bl < nbits
      then newWord
      else nextBits bl

  where
    newWord = do
        !rp <- readSTRef rpRef
        let r = V.unsafeIndex randoms rp
        writeSTRef blRef $! wordSize - nbits
        writeSTRef rpRef $! if rp == (numRandoms-1) then 0 else rp + 1
        extractBits r

    extractBits r = do
        let !b = r .&. ((1 `iShiftL` nbits) - 1)
        writeSTRef crRef $! (r `iShiftRL` nbits)
        return b

    nextBits bl = do
        !r <- readSTRef crRef
        writeSTRef blRef $!
            bl - nbits
        extractBits r


hashtables-1.0.1.8/src/Data/HashTable/Internal/IntArray.hs

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP          #-}
{-# LANGUAGE MagicHash    #-}

module Data.HashTable.Internal.IntArray
  ( IntArray
  , newArray
  , readArray
  , writeArray
  , length
  , toPtr
  ) where

import           Control.Monad.ST
import           Data.Bits
import qualified Data.Primitive.ByteArray as A
import           Data.Primitive.Types     (Addr(..))
import           GHC.Exts
import           Prelude                  hiding (length)

#ifdef BOUNDS_CHECKING
#define BOUNDS_MSG(sz,i) concat [ "[", __FILE__, ":",         \
                                  show (__LINE__ :: Int),     \
                                  "] bounds check exceeded: ",\
                                  "size was ", show (sz), " i was ", show (i) ]

#define BOUNDS_CHECK(arr,i) let sz = (A.sizeofMutableByteArray (arr) \
                                      `div` wordSizeInBytes) in      \
                            if (i) < 0 || (i) >= sz                  \
                              then error (BOUNDS_MSG(sz,(i)))        \
                              else return ()
#else
#define BOUNDS_CHECK(arr,i)
#endif


newtype IntArray s = IA (A.MutableByteArray s)


wordSizeInBytes :: Int
wordSizeInBytes = bitSize (0::Int) `div` 8


-- | Cache line size, in bytes
cacheLineSize :: Int
cacheLineSize = 64


newArray :: Int -> ST s (IntArray s)
newArray n = do
    let !sz = n * wordSizeInBytes
    !arr <- A.newAlignedPinnedByteArray sz cacheLineSize
    A.fillByteArray arr 0 sz 0
    return $!
        IA arr


readArray :: IntArray s -> Int -> ST s Int
readArray (IA a) idx = do
    BOUNDS_CHECK(a,idx)
    A.readByteArray a idx


writeArray :: IntArray s -> Int -> Int -> ST s ()
writeArray (IA a) idx val = do
    BOUNDS_CHECK(a,idx)
    A.writeByteArray a idx val


length :: IntArray s -> Int
length (IA a) = A.sizeofMutableByteArray a `div` wordSizeInBytes


toPtr :: IntArray s -> Ptr a
toPtr (IA a) = Ptr a#
  where
    !(Addr !a#) = A.mutableByteArrayContents a


hashtables-1.0.1.8/src/Data/HashTable/Internal/UnsafeTricks.hs

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP          #-}
#ifdef UNSAFETRICKS
{-# LANGUAGE MagicHash    #-}
#endif

module Data.HashTable.Internal.UnsafeTricks
  ( Key
  , toKey
  , fromKey
  , emptyRecord
  , deletedRecord
  , keyIsEmpty
  , keyIsDeleted
  , writeDeletedElement
  , makeEmptyVector
  ) where

import           Control.Monad.Primitive
import           Data.Vector.Mutable     (MVector)
import qualified Data.Vector.Mutable     as M

#ifdef UNSAFETRICKS
import           GHC.Exts
import           Unsafe.Coerce
#endif

------------------------------------------------------------------------------
#ifdef UNSAFETRICKS
type Key a = Any
#else
data Key a = Key !a
           | EmptyElement
           | DeletedElement
  deriving (Show)
#endif


------------------------------------------------------------------------------
-- Type signatures
emptyRecord :: Key a

deletedRecord :: Key a

keyIsEmpty :: Key a -> Bool

keyIsDeleted :: Key a -> Bool

makeEmptyVector :: PrimMonad m => Int -> m (MVector (PrimState m) (Key a))

writeDeletedElement :: PrimMonad m =>
                       MVector (PrimState m) (Key a) -> Int -> m ()

toKey :: a -> Key a

fromKey :: Key a -> a


#ifdef UNSAFETRICKS
data TombStone = EmptyElement
               | DeletedElement

{-# NOINLINE emptyRecord #-}
emptyRecord = unsafeCoerce EmptyElement

{-# NOINLINE deletedRecord #-}
deletedRecord = unsafeCoerce DeletedElement

{-# INLINE keyIsEmpty #-}
keyIsEmpty a = x# ==# 1#
  where
    !x# = reallyUnsafePtrEquality# a emptyRecord

{-# INLINE keyIsDeleted #-}
keyIsDeleted a = x# ==# 1#
  where
    !x# = reallyUnsafePtrEquality# a deletedRecord

{-# INLINE toKey #-}
toKey = unsafeCoerce

{-# INLINE fromKey #-}
fromKey = unsafeCoerce

#else
emptyRecord = EmptyElement

deletedRecord = DeletedElement

keyIsEmpty EmptyElement = True
keyIsEmpty _            = False

keyIsDeleted DeletedElement = True
keyIsDeleted _              = False

toKey = Key

fromKey (Key x) = x
fromKey _       = error "impossible"
#endif


------------------------------------------------------------------------------
{-# INLINE makeEmptyVector #-}
makeEmptyVector m = M.replicate m emptyRecord


------------------------------------------------------------------------------
{-# INLINE writeDeletedElement #-}
writeDeletedElement v i = M.unsafeWrite v i deletedRecord


hashtables-1.0.1.8/src/Data/HashTable/Internal/Utils.hs

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP          #-}
{-# LANGUAGE MagicHash    #-}

module Data.HashTable.Internal.Utils
  ( whichBucket
  , nextBestPrime
  , bumpSize
  , shiftL
  , shiftRL
  , iShiftL
  , iShiftRL
  , nextHighestPowerOf2
  , log2
  , highestBitMask
  , wordSize
  , cacheLineSize
  , numWordsInCacheLine
  , cacheLineIntMask
  , cacheLineIntBits
  , forceSameType
  ) where

import           Data.Bits   hiding (shiftL)
import           Data.Vector (Vector)
import qualified Data.Vector as V

#if __GLASGOW_HASKELL__ >= 503
import           GHC.Exts
#else
import qualified Data.Bits
import           Data.Word
#endif


------------------------------------------------------------------------------
wordSize :: Int
wordSize = bitSize (0::Int)


cacheLineSize :: Int
cacheLineSize = 64


numWordsInCacheLine :: Int
numWordsInCacheLine = z
  where
    !z = cacheLineSize `div` (wordSize `div` 8)


-- | What you have to mask an integer index by to tell if it's
-- cacheline-aligned
cacheLineIntMask :: Int
cacheLineIntMask = z
  where
    !z = numWordsInCacheLine - 1


cacheLineIntBits :: Int
cacheLineIntBits = log2 $ toEnum numWordsInCacheLine


------------------------------------------------------------------------------
{-# INLINE whichBucket #-}
whichBucket :: Int -> Int -> Int whichBucket !h !sz = o where !o = h `mod` sz ------------------------------------------------------------------------------ binarySearch :: (Ord e) => Vector e -> e -> Int binarySearch = binarySearchBy compare {-# INLINE binarySearch #-} ------------------------------------------------------------------------------ binarySearchBy :: (e -> e -> Ordering) -> Vector e -> e -> Int binarySearchBy cmp vec e = binarySearchByBounds cmp vec e 0 (V.length vec) {-# INLINE binarySearchBy #-} ------------------------------------------------------------------------------ binarySearchByBounds :: (e -> e -> Ordering) -> Vector e -> e -> Int -> Int -> Int binarySearchByBounds cmp vec e = loop where loop !l !u | u <= l = l | otherwise = let e' = V.unsafeIndex vec k in case cmp e' e of LT -> loop (k+1) u EQ -> k GT -> loop l k where k = (u + l) `shiftR` 1 {-# INLINE binarySearchByBounds #-} ------------------------------------------------------------------------------ primeSizes :: Vector Integer primeSizes = V.fromList [ 19 , 31 , 37 , 43 , 47 , 53 , 61 , 67 , 79 , 89 , 97 , 107 , 113 , 127 , 137 , 149 , 157 , 167 , 181 , 193 , 211 , 233 , 257 , 281 , 307 , 331 , 353 , 389 , 409 , 421 , 443 , 467 , 503 , 523 , 563 , 593 , 631 , 653 , 673 , 701 , 733 , 769 , 811 , 877 , 937 , 1039 , 1117 , 1229 , 1367 , 1543 , 1637 , 1747 , 1873 , 2003 , 2153 , 2311 , 2503 , 2777 , 3079 , 3343 , 3697 , 5281 , 6151 , 7411 , 9901 , 12289 , 18397 , 24593 , 34651 , 49157 , 66569 , 73009 , 98317 , 118081 , 151051 , 196613 , 246011 , 393241 , 600011 , 786433 , 1050013 , 1572869 , 2203657 , 3145739 , 4000813 , 6291469 , 7801379 , 10004947 , 12582917 , 19004989 , 22752641 , 25165843 , 39351667 , 50331653 , 69004951 , 83004629 , 100663319 , 133004881 , 173850851 , 201326611 , 293954587 , 402653189 , 550001761 , 702952391 , 805306457 , 1102951999 , 1402951337 , 1610612741 , 1902802801 , 2147483647 , 3002954501 , 3902954959 , 4294967291 , 5002902979 , 6402754181 , 8589934583 , 
17179869143 , 34359738337 , 68719476731 , 137438953447 , 274877906899 ] ------------------------------------------------------------------------------ nextBestPrime :: Int -> Int nextBestPrime x = fromEnum yi where xi = toEnum x idx = binarySearch primeSizes xi yi = V.unsafeIndex primeSizes idx ------------------------------------------------------------------------------ bumpSize :: Int -> Int bumpSize !s = nextBestPrime s' where -- double at small sizes, then 3/2 thereafter s' = if s < 24593 then 2*s else (s `div` 2) * 3 ------------------------------------------------------------------------------ shiftL :: Word -> Int -> Word shiftRL :: Word -> Int -> Word iShiftL :: Int -> Int -> Int iShiftRL :: Int -> Int -> Int #if __GLASGOW_HASKELL__ {-------------------------------------------------------------------- GHC: use unboxing to get @shiftRL@ inlined. --------------------------------------------------------------------} {-# INLINE shiftL #-} shiftL (W# x) (I# i) = W# (shiftL# x i) {-# INLINE shiftRL #-} shiftRL (W# x) (I# i) = W# (shiftRL# x i) {-# INLINE iShiftL #-} iShiftL (I# x) (I# i) = I# (iShiftL# x i) {-# INLINE iShiftRL #-} iShiftRL (I# x) (I# i) = I# (iShiftRL# x i) #else shiftL x i = Data.Bits.shiftL x i shiftRL x i = shiftR x i iShiftL x i = shiftL x i iShiftRL x i = shiftRL x i #endif ------------------------------------------------------------------------------ {-# INLINE nextHighestPowerOf2 #-} nextHighestPowerOf2 :: Word -> Word nextHighestPowerOf2 w = highestBitMask (w-1) + 1 ------------------------------------------------------------------------------ log2 :: Word -> Int log2 w = go (nextHighestPowerOf2 w) 0 where go 0 !i = i-1 go !n !i = go (shiftRL n 1) (i+1) ------------------------------------------------------------------------------ {-# INLINE highestBitMask #-} highestBitMask :: Word -> Word highestBitMask !x0 = case (x0 .|. shiftRL x0 1) of x1 -> case (x1 .|. shiftRL x1 2) of x2 -> case (x2 .|. shiftRL x2 4) of x3 -> case (x3 .|. 
                              shiftRL x3 8) of
              x4 -> case (x4 .|. shiftRL x4 16) of
                x5 -> x5 .|. shiftRL x5 32


------------------------------------------------------------------------------
forceSameType :: Monad m => a -> a -> m ()
forceSameType _ _ = return ()
{-# INLINE forceSameType #-}


hashtables-1.0.1.8/src/Data/HashTable/Internal/Linear/Bucket.hs

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP          #-}

module Data.HashTable.Internal.Linear.Bucket
  ( Bucket
  , newBucketArray
  , newBucketSize
  , emptyWithSize
  , growBucketTo
  , snoc
  , size
  , lookup
  , delete
  , toList
  , fromList
  , mapM_
  , foldM
  , expandBucketArray
  , expandArray
  , nelemsAndOverheadInWords
  , bucketSplitSize
  ) where

------------------------------------------------------------------------------
import qualified Control.Monad
import           Control.Monad                        hiding (mapM_, foldM)
import           Control.Monad.ST
import           Data.Maybe                           (fromMaybe)
import           Data.HashTable.Internal.Array
import           Data.STRef
import           Prelude                              hiding (lookup, mapM_)
------------------------------------------------------------------------------
import           Data.HashTable.Internal.UnsafeTricks

#ifdef DEBUG
import           System.IO
#endif


type Bucket s k v = Key (Bucket_ s k v)


------------------------------------------------------------------------------
data Bucket_ s k v = Bucket { _bucketSize :: {-# UNPACK #-} !Int
                            , _highwater  :: {-# UNPACK #-} !(STRef s Int)
                            , _keys       :: {-# UNPACK #-} !(MutableArray s k)
                            , _values     :: {-# UNPACK #-} !(MutableArray s v)
                            }


------------------------------------------------------------------------------
bucketSplitSize :: Int
bucketSplitSize = 16


------------------------------------------------------------------------------
newBucketArray :: Int -> ST s (MutableArray s (Bucket s k v))
newBucketArray k = newArray k emptyRecord
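

------------------------------------------------------------------------------
-- A minimal, self-contained model of the portable tombstone encoding that
-- the @unsafe-tricks@ cabal flag replaces (see UnsafeTricks above).  This
-- sketch is illustrative only and is not part of the package; the names are
-- hypothetical.  Empty and deleted slots are ordinary constructors, so a
-- freshly allocated bucket array (filled with emptyRecord, as in
-- newBucketArray) starts out all-empty:
--
-- > data Slot a = SlotEmpty | SlotDeleted | SlotFull !a
-- >
-- > slotIsEmpty :: Slot a -> Bool
-- > slotIsEmpty SlotEmpty = True
-- > slotIsEmpty _         = False
-- >
-- > slotIsDeleted :: Slot a -> Bool
-- > slotIsDeleted SlotDeleted = True
-- > slotIsDeleted _           = False
--
-- The unsafe-tricks version collapses this into pointer comparisons
-- (reallyUnsafePtrEquality#) against two shared sentinel closures, saving an
-- indirection per slot.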
------------------------------------------------------------------------------
nelemsAndOverheadInWords :: Bucket s k v -> ST s (Int,Int)
nelemsAndOverheadInWords bKey = do
    if (not $ keyIsEmpty bKey)
      then do
          !hw <- readSTRef hwRef
          let !w = sz - hw
          return (hw, constOverhead + 2*w)
      else
          return (0, 0)

  where
    constOverhead = 8
    b             = fromKey bKey
    sz            = _bucketSize b
    hwRef         = _highwater b


------------------------------------------------------------------------------
emptyWithSize :: Int -> ST s (Bucket s k v)
emptyWithSize !sz = do
    !keys   <- newArray sz undefined
    !values <- newArray sz undefined
    !ref    <- newSTRef 0
    return $ toKey $ Bucket sz ref keys values


------------------------------------------------------------------------------
newBucketSize :: Int
newBucketSize = 4


------------------------------------------------------------------------------
expandArray :: a                  -- ^ default value
            -> Int                -- ^ new size
            -> Int                -- ^ number of elements to copy
            -> MutableArray s a   -- ^ old array
            -> ST s (MutableArray s a)
expandArray def !sz !hw !arr = do
    newArr <- newArray sz def
    cp newArr

  where
    cp !newArr = go 0
      where
        go !i | i >= hw   = return newArr
              | otherwise = do
                  readArray arr i >>= writeArray newArr i
                  go (i+1)


------------------------------------------------------------------------------
expandBucketArray :: Int
                  -> Int
                  -> MutableArray s (Bucket s k v)
                  -> ST s (MutableArray s (Bucket s k v))
expandBucketArray = expandArray emptyRecord


------------------------------------------------------------------------------
growBucketTo :: Int -> Bucket s k v -> ST s (Bucket s k v)
growBucketTo !sz bk | keyIsEmpty bk = emptyWithSize sz
                    | otherwise = do
    if osz >= sz
      then return bk
      else do
          hw <- readSTRef hwRef
          k' <- expandArray undefined sz hw keys
          v' <- expandArray undefined sz hw values

          return $ toKey $ Bucket sz hwRef k' v'

  where
    bucket = fromKey bk
    osz    = _bucketSize bucket
    hwRef  = _highwater bucket
    keys   = _keys bucket
    values = _values bucket
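

------------------------------------------------------------------------------
-- A pure model of the copy step performed by expandArray/growBucketTo above,
-- illustrative only (not part of the package): grow to the requested size,
-- copying just the first hw (high-water mark) live elements and padding the
-- remainder with a default value.
--
-- > expandModel :: a -> Int -> Int -> [a] -> [a]
-- > expandModel def sz hw xs = take hw xs ++ replicate (sz - hw) def
-- >
-- > -- e.g. expandModel 0 6 2 [10,20,99,99] == [10,20,0,0,0,0]
--
-- Only the live prefix is copied because slots past the high-water mark have
-- never been written and hold undefined.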
------------------------------------------------------------------------------ {-# INLINE snoc #-} -- Just return == new bucket object snoc :: Bucket s k v -> k -> v -> ST s (Int, Maybe (Bucket s k v)) snoc bucket | keyIsEmpty bucket = mkNew | otherwise = snoc' (fromKey bucket) where mkNew !k !v = do debug "Bucket.snoc: mkNew" keys <- newArray newBucketSize undefined values <- newArray newBucketSize undefined writeArray keys 0 k writeArray values 0 v ref <- newSTRef 1 return (1, Just $ toKey $ Bucket newBucketSize ref keys values) snoc' (Bucket bsz hwRef keys values) !k !v = readSTRef hwRef >>= check where check !hw | hw < bsz = bump hw | otherwise = spill hw bump hw = do debug $ "Bucket.snoc: bumping hw, bsz=" ++ show bsz ++ ", hw=" ++ show hw writeArray keys hw k writeArray values hw v let !hw' = hw + 1 writeSTRef hwRef hw' debug "Bucket.snoc: finished" return (hw', Nothing) doublingThreshold = bucketSplitSize `div` 2 growFactor = 1.5 :: Double newSize z | z == 0 = newBucketSize | z < doublingThreshold = z * 2 | otherwise = ceiling $ growFactor * fromIntegral z spill !hw = do let sz = newSize bsz debug $ "Bucket.snoc: spilling, old size=" ++ show bsz ++ ", new size=" ++ show sz bk <- growBucketTo sz bucket debug "Bucket.snoc: spill finished, snoccing element" let (Bucket _ hwRef' keys' values') = fromKey bk let !hw' = hw+1 writeArray keys' hw k writeArray values' hw v writeSTRef hwRef' hw' return (hw', Just bk) ------------------------------------------------------------------------------ {-# INLINE size #-} size :: Bucket s k v -> ST s Int size b | keyIsEmpty b = return 0 | otherwise = readSTRef $ _highwater $ fromKey b ------------------------------------------------------------------------------ -- note: search in reverse order! We prefer recently snoc'd keys. 
lookup :: (Eq k) => Bucket s k v -> k -> ST s (Maybe v) lookup bucketKey !k | keyIsEmpty bucketKey = return Nothing | otherwise = lookup' $ fromKey bucketKey where lookup' (Bucket _ hwRef keys values) = do hw <- readSTRef hwRef go (hw-1) where go !i | i < 0 = return Nothing | otherwise = do k' <- readArray keys i if k == k' then do !v <- readArray values i return $! Just v else go (i-1) ------------------------------------------------------------------------------ {-# INLINE toList #-} toList :: Bucket s k v -> ST s [(k,v)] toList bucketKey | keyIsEmpty bucketKey = return [] | otherwise = toList' $ fromKey bucketKey where toList' (Bucket _ hwRef keys values) = do hw <- readSTRef hwRef go [] hw 0 where go !l !hw !i | i >= hw = return l | otherwise = do k <- readArray keys i v <- readArray values i go ((k,v):l) hw $ i+1 ------------------------------------------------------------------------------ -- fromList needs to reverse the input in order to make fromList . toList == id {-# INLINE fromList #-} fromList :: [(k,v)] -> ST s (Bucket s k v) fromList l = Control.Monad.foldM f emptyRecord (reverse l) where f bucket (k,v) = do (_,m) <- snoc bucket k v return $ fromMaybe bucket m ------------------------------------------------------------------------------ delete :: (Eq k) => Bucket s k v -> k -> ST s Bool delete bucketKey !k | keyIsEmpty bucketKey = do debug $ "Bucket.delete: empty bucket" return False | otherwise = do debug "Bucket.delete: start" del $ fromKey bucketKey where del (Bucket sz hwRef keys values) = do hw <- readSTRef hwRef debug $ "Bucket.delete: hw=" ++ show hw ++ ", sz=" ++ show sz go hw $ hw - 1 where go !hw !i | i < 0 = return False | otherwise = do k' <- readArray keys i if k == k' then do debug $ "found entry to delete at " ++ show i move (hw-1) i keys move (hw-1) i values let !hw' = hw-1 writeSTRef hwRef hw' return True else go hw (i-1) ------------------------------------------------------------------------------ {-# INLINE mapM_ #-} mapM_ :: 
((k,v) -> ST s a) -> Bucket s k v -> ST s () mapM_ f bucketKey | keyIsEmpty bucketKey = do debug $ "Bucket.mapM_: bucket was empty" return () | otherwise = doMap $ fromKey bucketKey where doMap (Bucket sz hwRef keys values) = do hw <- readSTRef hwRef debug $ "Bucket.mapM_: hw was " ++ show hw ++ ", sz was " ++ show sz go hw 0 where go !hw !i | i >= hw = return () | otherwise = do k <- readArray keys i v <- readArray values i _ <- f (k,v) go hw $ i+1 ------------------------------------------------------------------------------ {-# INLINE foldM #-} foldM :: (a -> (k,v) -> ST s a) -> a -> Bucket s k v -> ST s a foldM f !seed0 bucketKey | keyIsEmpty bucketKey = return seed0 | otherwise = doMap $ fromKey bucketKey where doMap (Bucket _ hwRef keys values) = do hw <- readSTRef hwRef go hw seed0 0 where go !hw !seed !i | i >= hw = return seed | otherwise = do k <- readArray keys i v <- readArray values i seed' <- f seed (k,v) go hw seed' (i+1) ------------------------------------------------------------------------------ -- move i into j move :: Int -> Int -> MutableArray s a -> ST s () move i j arr | i == j = do debug $ "move " ++ show i ++ " into " ++ show j return () | otherwise = do debug $ "move " ++ show i ++ " into " ++ show j readArray arr i >>= writeArray arr j {-# INLINE debug #-} debug :: String -> ST s () #ifdef DEBUG debug s = unsafeIOToST $ do putStrLn s hFlush stdout #else #ifdef TESTSUITE debug !s = do let !_ = length s return $! () #else debug _ = return () #endif #endif hashtables-1.0.1.8/src/Data/HashTable/ST/0000755000000000000000000000000012032120320015765 5ustar0000000000000000hashtables-1.0.1.8/src/Data/HashTable/ST/Basic.hs0000644000000000000000000004732012032120320017350 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} {-# LANGUAGE CPP #-} {-# LANGUAGE MagicHash #-} {-| A basic open-addressing hash table using linear probing. Use this hash table if you... * want the fastest possible lookups, and very fast inserts. 
* don't care about wasting a little bit of memory to get it.

* don't care that a table resize might pause for a long time to rehash all of
  the key-value mappings.

* have a workload which is not heavy with deletes; deletes clutter the table
  with deleted markers and force the table to be completely rehashed fairly
  often.

/Details:/

Of the hash tables in this collection, this hash table has the best insert and
lookup performance, with the following caveats.

/Space overhead/

This table is not especially memory-efficient; firstly, the table has a
maximum load factor of 0.83 and will be resized if load exceeds this value.
Secondly, to improve insert and lookup performance, we store the hash code for
each key in the table.

Each hash table entry requires three words, two for the pointers to the key
and value and one for the hash code. We don't count key and value pointers as
overhead, because they have to be there -- so the overhead for a full slot is
one word -- but empty slots in the hash table count for a full three words of
overhead.

Define @m@ as the number of slots in the table and @n@ as the number of key
value mappings. If the load factor is @k=n\/m@, the amount of space wasted is:

@
w(n) = 1*n + 3(m-n)
@

Since @m=n\/k@,

@
w(n) = n + 3(n\/k - n) = n (3\/k-2)
@

Solving for @k=0.83@, the maximum load factor, gives a /minimum/ overhead of 2
words per mapping. If @k=0.5@, under normal usage the /maximum/ overhead
situation, then the overhead would be 4 words per mapping.

/Space overhead: experimental results/

In randomized testing (see @test\/compute-overhead\/ComputeOverhead.hs@ in the
source distribution), mean overhead (that is, the number of words needed to
store the key-value mapping over and above the two words necessary for the key
and the value pointers) is approximately 2.29 machine words per key-value
mapping with a standard deviation of about 0.44 words, and 3.14 words per
mapping at the 95th percentile.
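
As a concrete check, plugging the two load factors above into the formula
@w(n) = n (3\/k-2)@ gives:

@
k = 0.83:  w(n) = n (3\/0.83 - 2) ~ 1.6 n   -- roughly 2 words per mapping
k = 0.5:   w(n) = n (3\/0.5  - 2) = 4 n     -- 4 words per mapping
@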
/Expensive resizes/ If enough elements are inserted into the table to make it exceed the maximum load factor, the table is resized. A resize involves a complete rehash of all the elements in the table, which means that any given call to 'insert' might take /O(n)/ time in the size of the table, with a large constant factor. If a long pause waiting for the table to resize is unacceptable for your application, you should choose the included linear hash table instead. /References:/ * Knuth, Donald E. /The Art of Computer Programming/, vol. 3 Sorting and Searching. Addison-Wesley Publishing Company, 1973. -} module Data.HashTable.ST.Basic ( HashTable , new , newSized , delete , lookup , insert , mapM_ , foldM , computeOverhead ) where ------------------------------------------------------------------------------ import Control.Exception (assert) import Control.Monad hiding (mapM_, foldM) import Control.Monad.ST import Data.Hashable (Hashable) import qualified Data.Hashable as H import Data.Maybe import Data.Monoid import Data.STRef import GHC.Exts import Prelude hiding (lookup, read, mapM_) ------------------------------------------------------------------------------ import Data.HashTable.Internal.Array import qualified Data.HashTable.Internal.IntArray as U import Data.HashTable.Internal.CacheLine import Data.HashTable.Internal.Utils import qualified Data.HashTable.Class as C ------------------------------------------------------------------------------ -- | An open addressing hash table using linear probing. newtype HashTable s k v = HT (STRef s (HashTable_ s k v)) data HashTable_ s k v = HashTable { _size :: {-# UNPACK #-} !Int , _load :: !(U.IntArray s) -- ^ How many entries in the table? Prefer -- unboxed vector here to STRef because I -- know it will be appropriately strict , _delLoad :: !(U.IntArray s) -- ^ How many deleted entries in the table? 
, _hashes :: !(U.IntArray s) , _keys :: {-# UNPACK #-} !(MutableArray s k) , _values :: {-# UNPACK #-} !(MutableArray s v) } ------------------------------------------------------------------------------ instance C.HashTable HashTable where new = new newSized = newSized insert = insert delete = delete lookup = lookup foldM = foldM mapM_ = mapM_ computeOverhead = computeOverhead ------------------------------------------------------------------------------ instance Show (HashTable s k v) where show _ = "" ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:new". new :: ST s (HashTable s k v) new = newSized 30 {-# INLINE new #-} ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:newSized". newSized :: Int -> ST s (HashTable s k v) newSized n = do let m = nextBestPrime $ ceiling (fromIntegral n / maxLoad) ht <- newSizedReal m newRef ht {-# INLINE newSized #-} ------------------------------------------------------------------------------ newSizedReal :: Int -> ST s (HashTable_ s k v) newSizedReal m = do -- make sure the hash array is a multiple of cache-line sized so we can -- always search a whole cache line at once let m' = ((m + numWordsInCacheLine - 1) `div` numWordsInCacheLine) * numWordsInCacheLine h <- U.newArray m' k <- newArray m undefined v <- newArray m undefined ld <- U.newArray 1 dl <- U.newArray 1 return $! HashTable m ld dl h k v ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:delete". 
delete :: (Hashable k, Eq k) => (HashTable s k v) -> k -> ST s () delete htRef k = do ht <- readRef htRef _ <- delete' ht True k h return () where !h = hash k {-# INLINE delete #-} ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:lookup". lookup :: (Eq k, Hashable k) => (HashTable s k v) -> k -> ST s (Maybe v) lookup htRef !k = do ht <- readRef htRef lookup' ht where lookup' (HashTable sz _ _ hashes keys values) = do let !b = whichBucket h sz debug $ "lookup sz=" ++ show sz ++ " h=" ++ show h ++ " b=" ++ show b go b 0 sz where !h = hash k go !b !start !end = {-# SCC "lookup/go" #-} do idx <- forwardSearch2 hashes b end h emptyMarker debug $ "forwardSearch2 returned " ++ show idx if (idx < 0 || idx < start || idx >= end) then return Nothing else do h0 <- U.readArray hashes idx debug $ "h0 was " ++ show h0 if recordIsEmpty h0 then return Nothing else do k' <- readArray keys idx if k == k' then do debug $ "value found at " ++ show idx v <- readArray values idx return $! Just v else if idx < b then go (idx + 1) (idx + 1) b else go (idx + 1) start end {-# INLINE lookup #-} ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:insert". insert :: (Eq k, Hashable k) => (HashTable s k v) -> k -> v -> ST s () insert htRef !k !v = do ht <- readRef htRef !ht' <- insert' ht writeRef htRef ht' where insert' ht = do debug "insert': calling delete'" b <- delete' ht False k h debug $ "insert': writing h=" ++ show h ++ " b=" ++ show b U.writeArray hashes b h writeArray keys b k writeArray values b v checkOverflow ht where !h = hash k hashes = _hashes ht keys = _keys ht values = _values ht {-# INLINE insert #-} ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:foldM". 
foldM :: (a -> (k,v) -> ST s a) -> a -> HashTable s k v -> ST s a foldM f seed0 htRef = readRef htRef >>= work where work (HashTable sz _ _ hashes keys values) = go 0 seed0 where go !i !seed | i >= sz = return seed | otherwise = do h <- U.readArray hashes i if recordIsEmpty h || recordIsDeleted h then go (i+1) seed else do k <- readArray keys i v <- readArray values i !seed' <- f seed (k, v) go (i+1) seed' ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:mapM_". mapM_ :: ((k,v) -> ST s b) -> HashTable s k v -> ST s () mapM_ f htRef = readRef htRef >>= work where work (HashTable sz _ _ hashes keys values) = go 0 where go !i | i >= sz = return () | otherwise = do h <- U.readArray hashes i if recordIsEmpty h || recordIsDeleted h then go (i+1) else do k <- readArray keys i v <- readArray values i _ <- f (k, v) go (i+1) ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:computeOverhead". 
computeOverhead :: HashTable s k v -> ST s Double computeOverhead htRef = readRef htRef >>= work where work (HashTable sz' loadRef _ _ _ _) = do !ld <- U.readArray loadRef 0 let k = fromIntegral ld / sz return $ constOverhead / sz + overhead k where sz = fromIntegral sz' -- Change these if you change the representation constOverhead = 14 overhead k = 3 / k - 2 ------------------------------ -- Private functions follow -- ------------------------------ ------------------------------------------------------------------------------ {-# INLINE insertRecord #-} insertRecord :: Int -> U.IntArray s -> MutableArray s k -> MutableArray s v -> Int -> k -> v -> ST s () insertRecord !sz !hashes !keys !values !h !key !value = do let !b = whichBucket h sz debug $ "insertRecord sz=" ++ show sz ++ " h=" ++ show h ++ " b=" ++ show b probe b where probe !i = {-# SCC "insertRecord/probe" #-} do !idx <- forwardSearch2 hashes i sz emptyMarker deletedMarker debug $ "forwardSearch2 returned " ++ show idx assert (idx >= 0) $ do U.writeArray hashes idx h writeArray keys idx key writeArray values idx value ------------------------------------------------------------------------------ checkOverflow :: (Eq k, Hashable k) => (HashTable_ s k v) -> ST s (HashTable_ s k v) checkOverflow ht@(HashTable sz ldRef delRef _ _ _) = do !ld <- U.readArray ldRef 0 let !ld' = ld + 1 U.writeArray ldRef 0 ld' !dl <- U.readArray delRef 0 debug $ concat [ "checkOverflow: sz=" , show sz , " entries=" , show ld , " deleted=" , show dl ] if fromIntegral (ld + dl) / fromIntegral sz > maxLoad then if dl > ld `div` 2 then rehashAll ht sz else growTable ht else return ht ------------------------------------------------------------------------------ rehashAll :: Hashable k => HashTable_ s k v -> Int -> ST s (HashTable_ s k v) rehashAll (HashTable sz loadRef _ hashes keys values) sz' = do debug $ "rehashing: old size " ++ show sz ++ ", new size " ++ show sz' ht' <- newSizedReal sz' let (HashTable _ loadRef' _ newHashes 
newKeys newValues) = ht' U.readArray loadRef 0 >>= U.writeArray loadRef' 0 rehash newHashes newKeys newValues return ht' where rehash newHashes newKeys newValues = go 0 where go !i | i >= sz = return () | otherwise = {-# SCC "growTable/rehash" #-} do h0 <- U.readArray hashes i when (not (recordIsEmpty h0 || recordIsDeleted h0)) $ do k <- readArray keys i v <- readArray values i insertRecord sz' newHashes newKeys newValues (hash k) k v go $ i+1 ------------------------------------------------------------------------------ growTable :: Hashable k => HashTable_ s k v -> ST s (HashTable_ s k v) growTable ht@(HashTable sz _ _ _ _ _) = do let !sz' = bumpSize sz rehashAll ht sz' ------------------------------------------------------------------------------ -- Helper data structure for delete' data Slot = Slot { _slot :: {-# UNPACK #-} !Int , _wasDeleted :: {-# UNPACK #-} !Int -- we use Int because Bool won't -- unpack } deriving (Show) ------------------------------------------------------------------------------ instance Monoid Slot where mempty = Slot maxBound 0 (Slot x1 b1) `mappend` (Slot x2 b2) = if x1 == maxBound then Slot x2 b2 else Slot x1 b1 ------------------------------------------------------------------------------ -- Returns the slot in the array where it would be safe to write the given key. delete' :: (Hashable k, Eq k) => (HashTable_ s k v) -> Bool -> k -> Int -> ST s Int delete' (HashTable sz loadRef delRef hashes keys values) clearOut k h = do debug $ "delete': sz=" ++ show sz ++ " h=" ++ show h ++ " b0=" ++ show b0 (found, slot) <- go mempty b0 False let !b' = _slot slot when found $ bump loadRef (-1) -- bump the delRef lower if we're writing over a deleted marker when (not clearOut && _wasDeleted slot == 1) $ bump delRef (-1) return b' where bump ref i = do !ld <- U.readArray ref 0 U.writeArray ref 0 $! 
ld + i !b0 = whichBucket h sz haveWrapped !(Slot fp _) !b = if fp == maxBound then False else b <= fp -- arguments: -- * fp maintains the slot in the array where it would be safe to -- write the given key -- * b search the buckets array starting at this index. -- * wrap True if we've wrapped around, False otherwise go !fp !b !wrap = do debug $ "go: fp=" ++ show fp ++ " b=" ++ show b ++ ", wrap=" ++ show wrap !idx <- forwardSearch3 hashes b sz h emptyMarker deletedMarker debug $ "forwardSearch3 returned " ++ show idx if wrap && idx >= b0 -- we wrapped around in the search and didn't find our hash code; -- this means that the table is full of deleted elements. Just return -- the first place we'd be allowed to insert. -- -- TODO: if we get in this situation we should probably just rehash -- the table, because every insert is going to be O(n). then return $! (False, fp `mappend` (Slot (error "impossible") 0)) else do -- because the table isn't full, we know that there must be either -- an empty or a deleted marker somewhere in the table. Assert this -- here. assert (idx >= 0) $ return () h0 <- U.readArray hashes idx debug $ "h0 was " ++ show h0 if recordIsEmpty h0 then do let pl = fp `mappend` (Slot idx 0) debug $ "empty, returning " ++ show pl return (False, pl) else do let !wrap' = haveWrapped fp idx if recordIsDeleted h0 then do let pl = fp `mappend` (Slot idx 1) debug $ "deleted, cont with pl=" ++ show pl go pl (idx + 1) wrap' else if h == h0 then do k' <- readArray keys idx if k == k' then do let samePlace = _slot fp == idx debug $ "found at " ++ show idx debug $ "clearout=" ++ show clearOut debug $ "sp? " ++ show samePlace -- "clearOut" is set if we intend to write a new -- element into the slot. If we're doing an update -- and we found the old key, instead of writing -- "deleted" and then re-writing the new element -- there, we can just write the new element. This -- only works if we were planning on writing the -- new element here. 
when (clearOut || not samePlace) $ do bump delRef 1 U.writeArray hashes idx 1 writeArray keys idx undefined writeArray values idx undefined return (True, fp `mappend` (Slot idx 0)) else go fp (idx + 1) wrap' else go fp (idx + 1) wrap' ------------------------------------------------------------------------------ maxLoad :: Double maxLoad = 0.82 ------------------------------------------------------------------------------ emptyMarker :: Int emptyMarker = 0 ------------------------------------------------------------------------------ deletedMarker :: Int deletedMarker = 1 ------------------------------------------------------------------------------ {-# INLINE recordIsEmpty #-} recordIsEmpty :: Int -> Bool recordIsEmpty = (== emptyMarker) ------------------------------------------------------------------------------ {-# INLINE recordIsDeleted #-} recordIsDeleted :: Int -> Bool recordIsDeleted = (== deletedMarker) ------------------------------------------------------------------------------ {-# INLINE hash #-} hash :: (Hashable k) => k -> Int hash k = out where !(I# h#) = H.hash k !m# = maskw# h# 0# `or#` maskw# h# 1# !nm# = not# m# !r# = ((int2Word# 2#) `and#` m#) `or#` (int2Word# h# `and#` nm#) !out = I# (word2Int# r#) ------------------------------------------------------------------------------ newRef :: HashTable_ s k v -> ST s (HashTable s k v) newRef = liftM HT . 
    newSTRef
{-# INLINE newRef #-}

writeRef :: HashTable s k v -> HashTable_ s k v -> ST s ()
writeRef (HT ref) ht = writeSTRef ref ht
{-# INLINE writeRef #-}

readRef :: HashTable s k v -> ST s (HashTable_ s k v)
readRef (HT ref) = readSTRef ref
{-# INLINE readRef #-}


------------------------------------------------------------------------------
{-# INLINE debug #-}
debug :: String -> ST s ()
#ifdef DEBUG
debug s = unsafeIOToST (putStrLn s)
#else
debug _ = return ()
#endif

hashtables-1.0.1.8/src/Data/HashTable/ST/Cuckoo.hs

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP          #-}
{-# LANGUAGE MagicHash    #-}

{-|

A hash table using the cuckoo strategy. (See ). Use this hash table if you...

  * want the fastest possible inserts, and very fast lookups.

  * are conscious of memory usage; this table has less space overhead than
    "Data.HashTable.ST.Basic", but more than "Data.HashTable.ST.Linear".

  * don't care that a table resize might pause for a long time to rehash all
    of the key-value mappings.

/Details:/

The basic idea of cuckoo hashing, first introduced by Pagh and Rodler in 2001,
is to use /d/ hash functions instead of only one; in this implementation d=2
and the strategy we use is to split up a flat array of slots into @k@ buckets,
each cache-line-sized:

@
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+
|x0|x1|x2|x3|x4|x5|x6|x7|y0|y1|y2|y3|y4|y5|y6|y7|z0|z1|z2........|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+
[ ^^^ bucket 0 ^^^ ][ ^^^ bucket 1 ^^^ ]...
@

There are actually three parallel arrays: one unboxed array of 'Int's for hash
codes, one boxed array for keys, and one boxed array for values. When looking
up a key-value mapping, we hash the key using two hash functions and look in
both buckets in the hash code array for the key. Each bucket is cache-line
sized, with its keys in no particular order.
Because the hash code array is unboxed, we can search it for the key using a
highly-efficient branchless strategy in C code, using SSE instructions if
available.

On insert, if both buckets are full, we knock out a randomly-selected entry
from one of the buckets (using a random walk ensures that \"key cycles\" are
broken with maximum probability) and try to repeat the insert procedure. This
process may not succeed; if all items have not successfully found a home after
some number of tries, we give up and rehash all of the elements into a larger
table.

/Space overhead: experimental results/

The implementation of cuckoo hash given here is almost as fast for lookups as
the basic open-addressing hash table using linear probing, and on average is
more space-efficient: in randomized testing on my 64-bit machine (see
@test\/compute-overhead\/ComputeOverhead.hs@ in the source distribution), mean
overhead is 1.71 machine words per key-value mapping, with a standard
deviation of 0.30 words, and 2.46 words per mapping at the 95th percentile.

/References:/

  * A. Pagh and F. Rodler. Cuckoo hashing. In /Proceedings of the 9th Annual
    European Symposium on Algorithms/, pp. 121-133, 2001.
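As with the other tables in this package, the API exported below (@new@,
@insert@, @lookup@) runs in the ST monad; a minimal usage sketch, assuming
this module is in scope:

```haskell
import           Control.Monad.ST (runST)
import qualified Data.HashTable.ST.Cuckoo as Cuckoo

-- Build a table, insert one mapping, and look it up, all within runST.
cuckooExample :: Maybe Int
cuckooExample = runST $ do
    ht <- Cuckoo.new
    Cuckoo.insert ht ("answer" :: String) (42 :: Int)
    Cuckoo.lookup ht "answer"
-- cuckooExample == Just 42
```

Most users will prefer the equivalent IO wrappers in "Data.HashTable.IO", as
shown in the package description.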
-} module Data.HashTable.ST.Cuckoo ( HashTable , new , newSized , delete , lookup , insert , mapM_ , foldM ) where ------------------------------------------------------------------------------ import Control.Monad hiding (foldM, mapM_) import Control.Monad.ST import Data.Hashable hiding (hash) import qualified Data.Hashable as H import Data.Int import Data.Maybe import Data.Primitive.Array import Data.STRef import GHC.Exts import Prelude hiding ( lookup, read, mapM_ ) ------------------------------------------------------------------------------ import qualified Data.HashTable.Class as C import Data.HashTable.Internal.CheapPseudoRandomBitStream import Data.HashTable.Internal.CacheLine import qualified Data.HashTable.Internal.IntArray as U import Data.HashTable.Internal.Utils #ifdef DEBUG import System.IO #endif ------------------------------------------------------------------------------ -- | A cuckoo hash table. newtype HashTable s k v = HT (STRef s (HashTable_ s k v)) data HashTable_ s k v = HashTable { _size :: {-# UNPACK #-} !Int -- ^ in buckets, total size is -- numWordsInCacheLine * _size , _rng :: {-# UNPACK #-} !(BitStream s) , _hashes :: {-# UNPACK #-} !(U.IntArray s) , _keys :: {-# UNPACK #-} !(MutableArray s k) , _values :: {-# UNPACK #-} !(MutableArray s v) , _maxAttempts :: {-# UNPACK #-} !Int } ------------------------------------------------------------------------------ instance C.HashTable HashTable where new = new newSized = newSized insert = insert delete = delete lookup = lookup foldM = foldM mapM_ = mapM_ computeOverhead = computeOverhead ------------------------------------------------------------------------------ instance Show (HashTable s k v) where show _ = "" ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:new". 
new :: ST s (HashTable s k v) new = newSizedReal 2 >>= newRef {-# INLINE new #-} ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:newSized". newSized :: Int -> ST s (HashTable s k v) newSized n = do let n' = (n + numWordsInCacheLine - 1) `div` numWordsInCacheLine let k = nextBestPrime $ ceiling $ fromIntegral n' / maxLoad newSizedReal k >>= newRef {-# INLINE newSized #-} ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:insert". insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s () insert ht !k !v = readRef ht >>= \h -> insert' h k v >>= writeRef ht ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:computeOverhead". computeOverhead :: HashTable s k v -> ST s Double computeOverhead htRef = readRef htRef >>= work where work (HashTable sz _ _ _ _ _) = do nFilled <- foldM f 0 htRef let oh = totSz -- one word per element in hashes + 2 * (totSz - nFilled) -- two words per non-filled entry + 12 -- fixed overhead return $! fromIntegral (oh::Int) / fromIntegral nFilled where totSz = numWordsInCacheLine * sz f !a _ = return $! a+1 ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:delete". delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s () delete htRef k = readRef htRef >>= go where go ht@(HashTable sz _ _ _ _ _) = do _ <- delete' ht False k b1 b2 h1 h2 return () where h1 = hash1 k h2 = hash2 k b1 = whichLine h1 sz b2 = whichLine h2 sz ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:lookup". 
lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v) lookup htRef k = do ht <- readRef htRef lookup' ht k {-# INLINE lookup #-} ------------------------------------------------------------------------------ lookup' :: (Eq k, Hashable k) => HashTable_ s k v -> k -> ST s (Maybe v) lookup' (HashTable sz _ hashes keys values _) !k = do -- Unlike the write case, prefetch doesn't seem to help here for lookup. -- prefetchRead hashes b2 idx1 <- searchOne keys hashes k b1 h1 if idx1 >= 0 then do v <- readArray values idx1 return $! Just v else do idx2 <- searchOne keys hashes k b2 h2 if idx2 >= 0 then do v <- readArray values idx2 return $! Just v else return Nothing where h1 = hash1 k h2 = hash2 k b1 = whichLine h1 sz b2 = whichLine h2 sz {-# INLINE lookup' #-} ------------------------------------------------------------------------------ searchOne :: (Eq k) => MutableArray s k -> U.IntArray s -> k -> Int -> Int -> ST s Int searchOne !keys !hashes !k = go where go !b !h = do debug $ "searchOne: go " ++ show b ++ " " ++ show h idx <- cacheLineSearch hashes b h debug $ "searchOne: cacheLineSearch returned " ++ show idx case idx of -1 -> return (-1) _ -> do k' <- readArray keys idx if k == k' then return idx else do let !idx' = idx + 1 if isCacheLineAligned idx' then return (-1) else go idx' h {-# INLINE searchOne #-} ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:foldM". 
foldM :: (a -> (k,v) -> ST s a) -> a -> HashTable s k v -> ST s a foldM f seed0 htRef = readRef htRef >>= foldMWork f seed0 {-# INLINE foldM #-} ------------------------------------------------------------------------------ foldMWork :: (a -> (k,v) -> ST s a) -> a -> HashTable_ s k v -> ST s a foldMWork f seed0 (HashTable sz _ hashes keys values _) = go 0 seed0 where totSz = numWordsInCacheLine * sz go !i !seed | i >= totSz = return seed | otherwise = do h <- U.readArray hashes i if h /= emptyMarker then do k <- readArray keys i v <- readArray values i !seed' <- f seed (k,v) go (i+1) seed' else go (i+1) seed {-# INLINE foldMWork #-} ------------------------------------------------------------------------------ -- | See the documentation for this function in -- "Data.HashTable.Class#v:mapM_". mapM_ :: ((k,v) -> ST s a) -> HashTable s k v -> ST s () mapM_ f htRef = readRef htRef >>= mapMWork f {-# INLINE mapM_ #-} ------------------------------------------------------------------------------ mapMWork :: ((k,v) -> ST s a) -> HashTable_ s k v -> ST s () mapMWork f (HashTable sz _ hashes keys values _) = go 0 where totSz = numWordsInCacheLine * sz go !i | i >= totSz = return () | otherwise = do h <- U.readArray hashes i if h /= emptyMarker then do k <- readArray keys i v <- readArray values i _ <- f (k,v) go (i+1) else go (i+1) {-# INLINE mapMWork #-} --------------------------------- -- Private declarations follow -- --------------------------------- ------------------------------------------------------------------------------ newSizedReal :: Int -> ST s (HashTable_ s k v) newSizedReal nbuckets = do let !ntotal = nbuckets * numWordsInCacheLine let !maxAttempts = 12 + (log2 $ toEnum nbuckets) debug $ "creating cuckoo hash table with " ++ show nbuckets ++ " buckets having " ++ show ntotal ++ " total slots" rng <- newBitStream hashes <- U.newArray ntotal keys <- newArray ntotal undefined values <- newArray ntotal undefined return $! 
HashTable nbuckets rng hashes keys values maxAttempts insert' :: (Eq k, Hashable k) => HashTable_ s k v -> k -> v -> ST s (HashTable_ s k v) insert' ht k v = do debug "insert': begin" mbX <- updateOrFail ht k v z <- maybe (return ht) (\(k',v') -> grow ht k' v') mbX debug "insert': end" return z {-# INLINE insert #-} ------------------------------------------------------------------------------ updateOrFail :: (Eq k, Hashable k) => HashTable_ s k v -> k -> v -> ST s (Maybe (k,v)) updateOrFail ht@(HashTable sz _ hashes keys values _) k v = do debug $ "updateOrFail: begin: sz = " ++ show sz debug $ " h1=" ++ show h1 ++ ", h2=" ++ show h2 ++ ", b1=" ++ show b1 ++ ", b2=" ++ show b2 (didx, hashCode) <- delete' ht True k b1 b2 h1 h2 debug $ "delete' returned (" ++ show didx ++ "," ++ show hashCode ++ ")" if didx >= 0 then do U.writeArray hashes didx hashCode writeArray keys didx k writeArray values didx v return Nothing else cuckoo where h1 = hash1 k h2 = hash2 k b1 = whichLine h1 sz b2 = whichLine h2 sz cuckoo = do debug "cuckoo: calling cuckooOrFail" result <- cuckooOrFail ht h1 h2 b1 b2 k v debug $ "cuckoo: cuckooOrFail returned " ++ (if isJust result then "Just _" else "Nothing") -- if cuckoo failed we need to grow the table. maybe (return Nothing) (return . Just) result {-# INLINE updateOrFail #-} ------------------------------------------------------------------------------ -- Returns either (-1,-1) (not found, and both buckets full ==> trigger -- cuckoo), or the slot in the array where it would be safe to write the given -- key, and the hashcode to use there delete' :: (Hashable k, Eq k) => HashTable_ s k v -- ^ hash table -> Bool -- ^ are we updating? 
        -> k       -- ^ key
        -> Int     -- ^ cache line start address 1
        -> Int     -- ^ cache line start address 2
        -> Int     -- ^ hash1
        -> Int     -- ^ hash2
        -> ST s (Int, Int)
delete' (HashTable _ _ hashes keys values _) !updating !k b1 b2 h1 h2 = do
    debug $ "delete' b1=" ++ show b1 ++ " b2=" ++ show b2
              ++ " h1=" ++ show h1 ++ " h2=" ++ show h2
    prefetchWrite hashes b2
    idx1 <- searchOne keys hashes k b1 h1
    if idx1 < 0
      then do
        idx2 <- searchOne keys hashes k b2 h2
        if idx2 < 0
          then if updating
                 then do
                   debug $ "delete': looking for empty element"
                   -- if we're updating, we look for an empty element
                   idxE1 <- cacheLineSearch hashes b1 emptyMarker
                   debug $ "delete': idxE1 was " ++ show idxE1
                   if idxE1 >= 0
                     then return (idxE1, h1)
                     else do
                       idxE2 <- cacheLineSearch hashes b2 emptyMarker
                       debug $ "delete': idxE2 was " ++ show idxE2
                       if idxE2 >= 0
                         then return (idxE2, h2)
                         else return (-1, -1)
                 else return (-1,-1)
          else deleteIt idx2 h2
      else deleteIt idx1 h1
  where
    deleteIt !idx !h = do
        if not updating
          then do
            U.writeArray hashes idx emptyMarker
            writeArray keys idx undefined
            writeArray values idx undefined
          else return ()
        return $! (idx, h)
{-# INLINE delete' #-}


------------------------------------------------------------------------------
cuckooOrFail :: (Hashable k, Eq k) =>
                HashTable_ s k v   -- ^ hash table
             -> Int                -- ^ hash code 1
             -> Int                -- ^ hash code 2
             -> Int                -- ^ cache line 1
             -> Int                -- ^ cache line 2
             -> k                  -- ^ key
             -> v                  -- ^ value
             -> ST s (Maybe (k,v))
cuckooOrFail (HashTable sz rng hashes keys values maxAttempts0)
             !h1_0 !h2_0 !b1_0 !b2_0 !k0 !v0 = do
    -- at this point we know:
    --
    -- * there is no empty slot in either cache line
    --
    -- * the key doesn't already exist in the table
    --
    -- next things to do:
    --
    -- * decide which element to bump
    --
    -- * read that element, and write (k,v) in there
    --
    -- * attempt to write the bumped element into its other cache slot
    --
    -- * if it fails, recurse.
debug $ "cuckooOrFail h1_0=" ++ show h1_0 ++ " h2_0=" ++ show h2_0 ++ " b1_0=" ++ show b1_0 ++ " b2_0=" ++ show b2_0 !lineChoice <- getNextBit rng debug $ "chose line " ++ show lineChoice let (!b, !h) = if lineChoice == 0 then (b1_0, h1_0) else (b2_0, h2_0) go b h k0 v0 maxAttempts0 where randomIdx !b = do !z <- getNBits cacheLineIntBits rng return $! b + z bumpIdx !idx !h !k !v = do debug $ "bumpIdx idx=" ++ show idx ++ " h=" ++ show h !h' <- U.readArray hashes idx debug $ "bumpIdx: h' was " ++ show h' !k' <- readArray keys idx v' <- readArray values idx U.writeArray hashes idx h writeArray keys idx k writeArray values idx v debug $ "bumped key with h'=" ++ show h' return $! (h', k', v') otherHash h k = if h2 == h then h1 else h2 where h1 = hash1 k h2 = hash2 k tryWrite !b !h k v maxAttempts = do debug $ "tryWrite b=" ++ show b ++ " h=" ++ show h idx <- cacheLineSearch hashes b emptyMarker debug $ "cacheLineSearch returned " ++ show idx if idx >= 0 then do U.writeArray hashes idx h writeArray keys idx k writeArray values idx v return Nothing else go b h k v $! maxAttempts - 1 go !b !h !k v !maxAttempts | maxAttempts == 0 = return $! Just (k,v) | otherwise = do idx <- randomIdx b (!h0', !k', v') <- bumpIdx idx h k v let !h' = otherHash h0' k' let !b' = whichLine h' sz tryWrite b' h' k' v' maxAttempts ------------------------------------------------------------------------------ grow :: (Eq k, Hashable k) => HashTable_ s k v -> k -> v -> ST s (HashTable_ s k v) grow (HashTable sz _ hashes keys values _) k0 v0 = do newHt <- grow' $! 
bumpSize sz mbR <- updateOrFail newHt k0 v0 maybe (return newHt) (\_ -> grow' $ bumpSize $ _size newHt) mbR where grow' newSz = do debug $ "growing table, oldsz = " ++ show sz ++ ", newsz=" ++ show newSz newHt <- newSizedReal newSz rehash newSz newHt rehash !newSz !newHt = go 0 where totSz = numWordsInCacheLine * sz go !i | i >= totSz = return newHt | otherwise = do h <- U.readArray hashes i if (h /= emptyMarker) then do k <- readArray keys i v <- readArray values i mbR <- updateOrFail newHt k v maybe (go $ i + 1) (\_ -> grow' $ bumpSize newSz) mbR else go $ i + 1 ------------------------------------------------------------------------------ hashPrime :: Int hashPrime = if wordSize == 32 then hashPrime32 else hashPrime64 where hashPrime32 = 0xedf2a025 hashPrime64 = 0x3971ca9c8b3722e9 ------------------------------------------------------------------------------ hash1 :: Hashable k => k -> Int hash1 = hashF H.hash {-# INLINE hash1 #-} hash2 :: Hashable k => k -> Int hash2 = hashF (H.hashWithSalt hashPrime) {-# INLINE hash2 #-} hashF :: (k -> Int) -> k -> Int hashF f k = out where !(I# h#) = f k !m# = maskw# h# 0# !nm# = not# m# !r# = ((int2Word# 1#) `and#` m#) `or#` (int2Word# h# `and#` nm#) !out = I# (word2Int# r#) {-# INLINE hashF #-} ------------------------------------------------------------------------------ emptyMarker :: Int emptyMarker = 0 ------------------------------------------------------------------------------ maxLoad :: Double maxLoad = 0.88 ------------------------------------------------------------------------------ debug :: String -> ST s () #ifdef DEBUG debug s = unsafeIOToST (putStrLn s >> hFlush stdout) #else debug _ = return () #endif {-# INLINE debug #-} ------------------------------------------------------------------------------ whichLine :: Int -> Int -> Int whichLine !h !sz = whichBucket h sz `iShiftL` cacheLineIntBits {-# INLINE whichLine #-} ------------------------------------------------------------------------------ newRef :: 
          HashTable_ s k v -> ST s (HashTable s k v)
newRef = liftM HT . newSTRef
{-# INLINE newRef #-}

writeRef :: HashTable s k v -> HashTable_ s k v -> ST s ()
writeRef (HT ref) ht = writeSTRef ref ht
{-# INLINE writeRef #-}

readRef :: HashTable s k v -> ST s (HashTable_ s k v)
readRef (HT ref) = readSTRef ref
{-# INLINE readRef #-}

hashtables-1.0.1.8/src/Data/HashTable/ST/Linear.hs

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP          #-}
{-# LANGUAGE MagicHash    #-}
{-# LANGUAGE RankNTypes   #-}

{-|

An implementation of linear hash tables. (See ). Use this hash table if you...

  * care a lot about fitting your data set into memory; of the hash tables
    included in this collection, this one has the lowest space overhead

  * don't care that inserts and lookups are slower than the other hash table
    implementations in this collection (this one is slightly faster than
    @Data.HashTable@ from the base library in most cases)

  * have a soft real-time or interactive application for which the risk of
    introducing a long pause on insert while all of the keys are rehashed is
    unacceptable.

/Details:/

Linear hashing allows for the expansion of the hash table one slot at a time,
by moving a \"split\" pointer across an array of pointers to buckets. The
number of buckets is always a power of two, and the bucket to look in is
defined as:

@
bucket(level,key) = hash(key) mod (2^level)
@

The \"split pointer\" controls the expansion of the hash table. If the hash
table is at level @k@ (i.e. @2^k@ buckets have been allocated), we first
calculate @b=bucket(level-1,key)@. If @b < splitptr@, the destination bucket
is calculated as @b'=bucket(level,key)@, otherwise the original value @b@ is
used.

The split pointer is incremented once an insert causes some bucket to become
fuller than some predetermined threshold; the bucket at the split pointer
(*not* the bucket which triggered the split!)
is then rehashed, and half of its keys can be expected to be rehashed into
the upper half of the table.

When the split pointer reaches the middle of the bucket array, the size of
the bucket array is doubled, the level increases, and the split pointer is
reset to zero.

Linear hashing, although not quite as fast for inserts or lookups as the
implementation of linear probing included in this package, is well suited
for interactive applications because it has much better worst-case
behaviour on inserts. Other hash table implementations can suffer from
long pauses, because it is occasionally necessary to rehash all of the
keys when the table grows. Linear hashing, on the other hand, only ever
rehashes a bounded (effectively constant) number of keys when an insert
forces a bucket split.

/Space overhead: experimental results/

In randomized testing (see @test\/compute-overhead\/ComputeOverhead.hs@ in
the source distribution), mean overhead is approximately 1.51 machine
words per key-value mapping, with a very low standard deviation of about
0.06 words, and 1.60 words per mapping at the 95th percentile.

/Unsafe tricks/

If the @unsafe-tricks@ flag is on when this package is built (and it is on
by default), we use some unsafe tricks (namely 'unsafeCoerce#' and
'reallyUnsafePtrEquality#') to save indirections in this table. These
techniques rely on assumptions about the behaviour of the GHC runtime
system and, although they've been tested and should be safe under normal
conditions, are slightly dangerous. Caveat emptor. In particular, these
techniques are incompatible with HPC code coverage reports.

References:

  * W. Litwin. Linear hashing: a new tool for file and table addressing.
    In /Proc. 6th International Conference on Very Large Data Bases,
    Volume 6/, pp. 212-223, 1980.

  * P-A. Larson. Dynamic hash tables. /Communications of the ACM/ 31:
    446-457, 1988.
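The destination-bucket rule described above can be sketched as a standalone
function. This is an illustrative sketch, not the package's own code:
@destBucket@ is a hypothetical name, it takes the key's precomputed hash
value directly, and it implements @mod 2^level@ by bit-masking, as the
package's internal addressing does.

```haskell
import Data.Bits (bit, (.&.))

-- Hypothetical sketch of linear-hashing bucket selection.  'h' is the
-- key's precomputed hash value; 'level' means 2^level buckets have been
-- allocated, and the 'splitptr' lowest buckets have already been split
-- at this level.
destBucket :: Int -> Int -> Int -> Int
destBucket level splitptr h
    | b < splitptr = h .&. (bit level - 1)  -- bucket(level,key): already split
    | otherwise    = b                      -- bucket(level-1,key): not yet split
  where
    b = h .&. (bit (level - 1) - 1)         -- low (level-1) bits == mod 2^(level-1)
```

For example, with @level = 2@ and @splitptr = 1@, hashes 4 and 6 both fall
in bucket 0 under level-1 addressing; since bucket 0 has been split, the
full level-2 mask applies, so 4 stays in bucket 0 while 6 is directed to
bucket 2.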
-}

module Data.HashTable.ST.Linear
  ( HashTable
  , new
  , newSized
  , delete
  , lookup
  , insert
  , mapM_
  , foldM
  , computeOverhead
  ) where

------------------------------------------------------------------------------
import           Control.Monad hiding (mapM_, foldM)
import           Control.Monad.ST
import           Data.Bits
import           Data.Hashable
import           Data.STRef
import           Prelude hiding (mapM_, lookup)
------------------------------------------------------------------------------
import qualified Data.HashTable.Class as C
import           Data.HashTable.Internal.Array
import qualified Data.HashTable.Internal.Linear.Bucket as Bucket
import           Data.HashTable.Internal.Linear.Bucket (Bucket)
import           Data.HashTable.Internal.Utils

#ifdef DEBUG
import           System.IO
#endif

------------------------------------------------------------------------------
-- | A linear hash table.
newtype HashTable s k v = HT (STRef s (HashTable_ s k v))

data HashTable_ s k v = HashTable
    { _level    :: {-# UNPACK #-} !Int
    , _splitptr :: {-# UNPACK #-} !Int
    , _buckets  :: {-# UNPACK #-} !(MutableArray s (Bucket s k v))
    }

------------------------------------------------------------------------------
instance C.HashTable HashTable where
    new             = new
    newSized        = newSized
    insert          = insert
    delete          = delete
    lookup          = lookup
    foldM           = foldM
    mapM_           = mapM_
    computeOverhead = computeOverhead

------------------------------------------------------------------------------
instance Show (HashTable s k v) where
    show _ = ""

------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:new".
new :: ST s (HashTable s k v)
new = do
    v <- Bucket.newBucketArray 2
    newRef $ HashTable 1 0 v

------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:newSized".
newSized :: Int -> ST s (HashTable s k v)
newSized n = do
    v <- Bucket.newBucketArray sz
    newRef $ HashTable lvl 0 v
  where
    k   = ceiling (fromIntegral n * fillFactor / fromIntegral bucketSplitSize)
    lvl = max 1 (fromEnum $ log2 k)
    sz  = power2 lvl

------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:delete".
delete :: (Hashable k, Eq k) => (HashTable s k v) -> k -> ST s ()
delete htRef !k = readRef htRef >>= work
  where
    work (HashTable lvl splitptr buckets) = do
        let !h0 = hashKey lvl splitptr k
        debug $ "delete: size=" ++ show (power2 lvl) ++ ", h0=" ++ show h0
                ++ "splitptr: " ++ show splitptr
        delete' buckets h0 k
{-# INLINE delete #-}

------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:lookup".
lookup :: (Eq k, Hashable k) => (HashTable s k v) -> k -> ST s (Maybe v)
lookup htRef !k = readRef htRef >>= work
  where
    work (HashTable lvl splitptr buckets) = do
        let h0 = hashKey lvl splitptr k
        bucket <- readArray buckets h0
        Bucket.lookup bucket k
{-# INLINE lookup #-}

------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:insert".
insert :: (Eq k, Hashable k) => (HashTable s k v) -> k -> v -> ST s ()
insert htRef k v = do
    ht' <- readRef htRef >>= work
    writeRef htRef ht'
  where
    work ht@(HashTable lvl splitptr buckets) = do
        let !h0 = hashKey lvl splitptr k
        delete' buckets h0 k
        bsz <- primitiveInsert' buckets h0 k v

        if checkOverflow bsz
          then do
              debug $ "insert: splitting"
              h <- split ht
              debug $ "insert: done splitting"
              return h
          else do
              debug $ "insert: done"
              return ht
{-# INLINE insert #-}

------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:mapM_".
mapM_ :: ((k,v) -> ST s b) -> HashTable s k v -> ST s ()
mapM_ f htRef = readRef htRef >>= work
  where
    work (HashTable lvl _ buckets) = go 0
      where
        !sz = power2 lvl

        go !i | i >= sz   = return ()
              | otherwise = do
            b <- readArray buckets i
            Bucket.mapM_ f b
            go $ i+1

------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:foldM".
foldM :: (a -> (k,v) -> ST s a) -> a -> HashTable s k v -> ST s a
foldM f seed0 htRef = readRef htRef >>= work
  where
    work (HashTable lvl _ buckets) = go seed0 0
      where
        !sz = power2 lvl

        go !seed !i | i >= sz   = return seed
                    | otherwise = do
            b <- readArray buckets i
            !seed' <- Bucket.foldM f seed b
            go seed' $ i+1

------------------------------------------------------------------------------
-- | See the documentation for this function in
-- "Data.HashTable.Class#v:computeOverhead".
computeOverhead :: HashTable s k v -> ST s Double
computeOverhead htRef = readRef htRef >>= work
  where
    work (HashTable lvl _ buckets) = do
        (totElems, overhead) <- go 0 0 0
        let n = fromIntegral totElems
        let o = fromIntegral overhead
        return $ (fromIntegral sz + constOverhead + o) / n
      where
        constOverhead = 5.0
        !sz = power2 lvl

        go !nelems !overhead !i
          | i >= sz   = return (nelems, overhead)
          | otherwise = do
              b <- readArray buckets i
              (!n,!o) <- Bucket.nelemsAndOverheadInWords b
              let !n' = n + nelems
              let !o' = o + overhead
              go n' o' (i+1)

------------------------------
-- Private functions follow --
------------------------------

------------------------------------------------------------------------------
delete' :: Eq k => MutableArray s (Bucket s k v) -> Int -> k -> ST s ()
delete' buckets h0 k = do
    bucket <- readArray buckets h0
    _ <- Bucket.delete bucket k
    return ()

------------------------------------------------------------------------------
split :: (Hashable k) => (HashTable_ s k v) -> ST s (HashTable_ s k v)
split ht@(HashTable lvl splitptr buckets) = do
    debug $ "split: start: nbuck=" ++ show (power2 lvl)
            ++ ", splitptr=" ++ show splitptr

    -- grab bucket at splitPtr
    oldBucket <- readArray buckets splitptr
    nelems <- Bucket.size oldBucket
    let !bsz = max Bucket.newBucketSize $
               ceiling $ (0.625 :: Double) * fromIntegral nelems

    -- write an empty bucket there
    dbucket1 <- Bucket.emptyWithSize bsz
    writeArray buckets splitptr dbucket1

    -- grow the buckets?
    let lvl2 = power2 lvl
    let lvl1 = power2 $ lvl-1

    (!buckets',!lvl',!sp') <-
        if splitptr+1 >= lvl1
          then do
              debug $ "split: resizing bucket array"
              let lvl3 = 2*lvl2
              b <- Bucket.expandBucketArray lvl3 lvl2 buckets
              debug $ "split: resizing bucket array: done"
              return (b,lvl+1,0)
          else return (buckets,lvl,splitptr+1)

    let ht' = HashTable lvl' sp' buckets'

    -- make sure the other split bucket has enough room in it also
    let splitOffs = splitptr + lvl1
    db2 <- readArray buckets' splitOffs
    db2sz <- Bucket.size db2
    let db2sz' = db2sz + bsz
    db2' <- Bucket.growBucketTo db2sz' db2
    debug $ "growing bucket at " ++ show splitOffs ++ " to size "
            ++ show db2sz'
    writeArray buckets' splitOffs db2'

    -- rehash old bucket
    debug $ "split: rehashing bucket"
    let f = uncurry $ primitiveInsert ht'
    forceSameType f (uncurry $ primitiveInsert ht)

    Bucket.mapM_ f oldBucket
    debug $ "split: done"
    return ht'

------------------------------------------------------------------------------
checkOverflow :: Int -> Bool
checkOverflow sz = sz > bucketSplitSize

------------------------------------------------------------------------------
-- insert w/o splitting
primitiveInsert :: (Hashable k) => (HashTable_ s k v) -> k -> v -> ST s Int
primitiveInsert (HashTable lvl splitptr buckets) k v = do
    debug $ "primitiveInsert start: nbuckets=" ++ show (power2 lvl)
    let h0 = hashKey lvl splitptr k
    primitiveInsert' buckets h0 k v

------------------------------------------------------------------------------
primitiveInsert' :: MutableArray s (Bucket s k v) -> Int -> k -> v -> ST s Int
primitiveInsert' buckets !h0 !k !v = do
    debug $ "primitiveInsert': bucket number=" ++ show h0
    bucket <- readArray buckets h0
    debug $ "primitiveInsert': snoccing bucket"
    (!hw,m) <- Bucket.snoc bucket k v
    debug $ "primitiveInsert': bucket snoc'd"
    maybe (return ()) (writeArray buckets h0) m
    return hw

------------------------------------------------------------------------------
fillFactor :: Double
fillFactor = 1.3

------------------------------------------------------------------------------
bucketSplitSize :: Int
bucketSplitSize = Bucket.bucketSplitSize

------------------------------------------------------------------------------
{-# INLINE power2 #-}
power2 :: Int -> Int
power2 i = 1 `iShiftL` i

------------------------------------------------------------------------------
{-# INLINE hashKey #-}
hashKey :: (Hashable k) => Int -> Int -> k -> Int
hashKey !lvl !splitptr !k = h1
  where
    !h0 = hashAtLvl (lvl-1) k
    !h1 = if (h0 < splitptr) then hashAtLvl lvl k else h0

------------------------------------------------------------------------------
{-# INLINE hashAtLvl #-}
hashAtLvl :: (Hashable k) => Int -> k -> Int
hashAtLvl !lvl !k = h
  where
    !h        = hashcode .&. mask
    !hashcode = hash k
    !mask     = power2 lvl - 1

------------------------------------------------------------------------------
newRef :: HashTable_ s k v -> ST s (HashTable s k v)
newRef = liftM HT . newSTRef

writeRef :: HashTable s k v -> HashTable_ s k v -> ST s ()
writeRef (HT ref) ht = writeSTRef ref ht

readRef :: HashTable s k v -> ST s (HashTable_ s k v)
readRef (HT ref) = readSTRef ref

------------------------------------------------------------------------------
{-# INLINE debug #-}
debug :: String -> ST s ()
#ifdef DEBUG
debug s = unsafeIOToST $ do
    putStrLn s
    hFlush stdout
#else
#ifdef TESTSUITE
debug !s = do
    let !_ = length s
    return $!
() #else debug _ = return () #endif #endif hashtables-1.0.1.8/test/0000755000000000000000000000000012032120320013123 5ustar0000000000000000hashtables-1.0.1.8/test/hashtables-test.cabal0000644000000000000000000000755112032120320017212 0ustar0000000000000000Name: hashtables-test Version: 0.2 Author: Gregory Collins Maintainer: greg@gregorycollins.net Copyright: (c) 2011-2012, Google, Inc. Category: Data Build-type: Simple Cabal-version: >= 1.8 ------------------------------------------------------------------------------ Flag debug Description: if on, spew debugging output to stdout Default: False Flag unsafe-tricks Description: turn on unsafe GHC tricks Default: False Flag bounds-checking Description: if on, use bounds-checking array accesses Default: False Flag sse41 Description: if on, use SSE 4.1 extensions to search cache lines very efficiently Default: False Flag portable Description: if on, use only pure Haskell code and no GHC extensions. Default: False Executable testsuite hs-source-dirs: ../src suite main-is: TestSuite.hs if !flag(portable) C-sources: ../cbits/cfuncs.c ghc-prof-options: -prof -auto-all if flag(portable) || !flag(unsafe-tricks) ghc-options: -fhpc if flag(portable) cpp-options: -DNO_C_SEARCH -DPORTABLE if !flag(portable) && flag(unsafe-tricks) && impl(ghc) cpp-options: -DUNSAFETRICKS build-depends: ghc-prim if flag(debug) cpp-options: -DDEBUG if flag(bounds-checking) cpp-options: -DBOUNDS_CHECKING if !flag(portable) && flag(sse41) cc-options: -DUSE_SSE_4_1 -msse4.1 cpp-options: -DUSE_SSE_4_1 Build-depends: base >= 4 && <5, hashable >= 1.1 && <2, mwc-random >= 0.8 && <0.13, primitive, QuickCheck >= 2.3.0.2, HUnit >= 1.2 && <2, test-framework >= 0.3.1 && <0.7, test-framework-quickcheck2 >= 0.2.6 && <0.3, test-framework-hunit >= 0.2.6 && <3, vector >= 0.7 cpp-options: -DTESTSUITE if impl(ghc >= 7) ghc-options: -rtsopts if impl(ghc >= 6.12.0) ghc-options: -Wall -fwarn-tabs -funbox-strict-fields -O2 -fno-warn-unused-do-bind -threaded else 
ghc-options: -Wall -fwarn-tabs -funbox-strict-fields -O2 -threaded Executable compute-overhead hs-source-dirs: ../src suite compute-overhead main-is: ComputeOverhead.hs C-sources: ../cbits/cfuncs.c ghc-prof-options: -prof -auto-all if flag(portable) cpp-options: -DNO_C_SEARCH -DPORTABLE if !flag(portable) && flag(unsafe-tricks) && impl(ghc) cpp-options: -DUNSAFETRICKS build-depends: ghc-prim if flag(debug) cpp-options: -DDEBUG if flag(bounds-checking) cpp-options: -DBOUNDS_CHECKING if !flag(portable) && flag(sse41) cc-options: -DUSE_SSE_4_1 -msse4.1 cpp-options: -DUSE_SSE_4_1 Build-depends: base >= 4 && <5, hashable >= 1.1 && <2, mwc-random >= 0.8 && <0.13, QuickCheck >= 2.3.0.2 && <3, HUnit >= 1.2 && <2, test-framework >= 0.3.1 && <0.7, test-framework-quickcheck2 >= 0.2.6 && <0.3, test-framework-hunit >= 0.2.6 && <3, statistics >= 0.8 && <0.11, primitive, vector >= 0.7 if impl(ghc >= 7) ghc-options: -rtsopts if impl(ghc >= 6.12.0) ghc-options: -Wall -fwarn-tabs -funbox-strict-fields -O2 -fno-warn-unused-do-bind -threaded else ghc-options: -Wall -fwarn-tabs -funbox-strict-fields -O2 -threaded hashtables-1.0.1.8/test/runTestsAndCoverage.sh0000755000000000000000000000112312032120320017405 0ustar0000000000000000#!/bin/sh set -e SUITE=./dist/build/testsuite/testsuite export LC_ALL=C export LANG=C rm -f testsuite.tix if [ ! 
-f $SUITE ]; then cat </dev/null 2>&1 rm -f testsuite.tix cat < FixedTableType h -> GenIO -> IO Double overhead dummy rng = do size <- uniformR (1000,50000) rng !v <- replicateM' size $ uniform rng let _ = v :: [(Int,Int)] !ht <- fromList v forceType dummy ht x <- computeOverhead ht return x where replicateM' :: Int -> IO a -> IO [a] replicateM' !sz f = go sz [] where go !i !l | i == 0 = return l | otherwise = do !x <- f go (i-1) (x:l) -- Returns mean / stddev runTrials :: C.HashTable h => FixedTableType h -> GenIO -> Int -> IO (Double, Double, Double, Double) runTrials dummy rng ntrials = do sample <- rep ntrials $ overhead dummy rng let (m, v) = meanVarianceUnb sample return (m, sqrt v, p95 sample, pMax sample) where p95 sample = continuousBy cadpw 19 20 sample pMax sample = V.foldl' max (-1) sample rep !n !f = do mv <- VM.new n go mv where go !mv = go' 0 where go' !i | i >= n = V.unsafeFreeze mv | otherwise = do !d <- f VM.unsafeWrite mv i d go' $ i+1 main :: IO () main = do rng <- do args <- getArgs if null args then withSystemRandom (\x -> (return x) :: IO GenIO) else initialize $ V.fromList [read $ head args] runTrials dummyLinearTable rng nTrials >>= report "linear hash table" runTrials dummyBasicTable rng nTrials >>= report "basic hash table" runTrials dummyCuckooTable rng nTrials >>= report "cuckoo hash table" where nTrials = 200 report name md = putStrLn msg where msg = concat [ "\n(Mean,StdDev,95%,Max) for overhead of " , name , " (" , show nTrials , " trials): " , show md , "\n" ] dummyBasicTable = dummyTable :: forall k v . BasicHashTable k v dummyLinearTable = dummyTable :: forall k v . LinearHashTable k v dummyCuckooTable = dummyTable :: forall k v . 
CuckooHashTable k v hashtables-1.0.1.8/test/suite/0000755000000000000000000000000012032120320014254 5ustar0000000000000000hashtables-1.0.1.8/test/suite/TestSuite.hs0000644000000000000000000000207612032120320016546 0ustar0000000000000000{-# LANGUAGE RankNTypes #-} module Main where import Test.Framework (defaultMain) ------------------------------------------------------------------------------ import qualified Data.HashTable.Test.Common as Common import qualified Data.HashTable.ST.Basic as B import qualified Data.HashTable.ST.Cuckoo as C import qualified Data.HashTable.ST.Linear as L import qualified Data.HashTable.IO as IO ------------------------------------------------------------------------------ main :: IO () main = defaultMain tests where dummyBasicTable = Common.dummyTable :: forall k v . IO.IOHashTable (B.HashTable) k v dummyCuckooTable = Common.dummyTable :: forall k v . IO.IOHashTable (C.HashTable) k v dummyLinearTable = Common.dummyTable :: forall k v . IO.IOHashTable (L.HashTable) k v basicTests = Common.tests "basic" dummyBasicTable cuckooTests = Common.tests "cuckoo" dummyCuckooTable linearTests = Common.tests "linear" dummyLinearTable tests = [basicTests, linearTests, cuckooTests] hashtables-1.0.1.8/test/suite/Data/0000755000000000000000000000000012032120320015125 5ustar0000000000000000hashtables-1.0.1.8/test/suite/Data/HashTable/0000755000000000000000000000000012032120320016760 5ustar0000000000000000hashtables-1.0.1.8/test/suite/Data/HashTable/Test/0000755000000000000000000000000012032120320017677 5ustar0000000000000000hashtables-1.0.1.8/test/suite/Data/HashTable/Test/Common.hs0000644000000000000000000003147312032120320021473 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} {-# LANGUAGE CPP #-} {-# LANGUAGE ForeignFunctionInterface #-} {-# LANGUAGE RankNTypes #-} module Data.HashTable.Test.Common ( FixedTableType , dummyTable , forceType , tests ) where ------------------------------------------------------------------------------ import 
Control.Monad (foldM_, liftM, when) import Control.Monad.ST (unsafeIOToST) import Data.IORef import Data.List hiding ( insert , delete , lookup ) import Data.Vector (Vector) import qualified Data.Vector as V import qualified Data.Vector.Mutable as MV import Prelude hiding (lookup, mapM_) import System.Random.MWC import System.Timeout import Test.Framework import Test.Framework.Providers.HUnit import Test.Framework.Providers.QuickCheck2 import Test.HUnit (assertFailure) import Test.QuickCheck import Test.QuickCheck.Monadic ------------------------------------------------------------------------------ import qualified Data.HashTable.Class as C import Data.HashTable.IO #ifndef PORTABLE import Control.Concurrent import Foreign (malloc, free, poke, Ptr) import Foreign.C.Types (CInt) #endif ------------------------------------------------------------------------------ type FixedTableType h = forall k v . IOHashTable h k v type HashTest = forall h . C.HashTable h => String -> FixedTableType h -> Test data SomeTest = SomeTest HashTest ------------------------------------------------------------------------------ assertEq :: (Eq a, Show a) => String -> a -> a -> PropertyM IO () assertEq s expected got = when (expected /= got) $ do fail $ s ++ ": expected '" ++ show expected ++ "', got '" ++ show got ++ "'" ------------------------------------------------------------------------------ forceType :: forall m h k1 k2 v1 v2 . (Monad m, C.HashTable h) => IOHashTable h k1 v1 -> IOHashTable h k2 v2 -> m () forceType _ _ = return () ------------------------------------------------------------------------------ dummyTable :: forall k v h . 
C.HashTable h => IOHashTable h k v dummyTable = undefined ------------------------------------------------------------------------------ tests :: C.HashTable h => String -> FixedTableType h -> Test tests prefix dummyArg = testGroup prefix $ map f ts where f (SomeTest ht) = ht prefix dummyArg ts = [ SomeTest testFromListToList , SomeTest testInsert , SomeTest testInsert2 , SomeTest testNewAndInsert , SomeTest testGrowTable , SomeTest testDelete , SomeTest testNastyFullLookup , SomeTest testForwardSearch3 ] ------------------------------------------------------------------------------ testFromListToList :: HashTest testFromListToList prefix dummyArg = testProperty (prefix ++ "/fromListToList") $ monadicIO $ do rng <- initializeRNG forAllM arbitrary $ prop rng where prop :: GenIO -> [(Int, Int)] -> PropertyM IO () prop rng origL = do let l = V.toList $ shuffle rng $ V.fromList $ dedupe origL ht <- run $ fromList l l' <- run $ toList ht assertEq "fromList . toList == id" (sort l) (sort l') forceType dummyArg ht ------------------------------------------------------------------------------ testInsert :: HashTest testInsert prefix dummyArg = testProperty (prefix ++ "/insert") $ monadicIO $ do rng <- initializeRNG forAllM arbitrary $ prop rng where prop :: GenIO -> ([(Int, Int)], (Int,Int)) -> PropertyM IO () prop rng (origL, (k,v)) = do let l = V.toList $ shuffle rng $ V.fromList $ remove k $ dedupe origL assert $ all (\t -> fst t /= k) l ht <- run $ fromList l nothing <- run $ lookup ht k assertEq ("lookup " ++ show k) Nothing nothing run $ insert ht k v r <- run $ lookup ht k assertEq ("lookup2 " ++ show k) (Just v) r forceType dummyArg ht ------------------------------------------------------------------------------ testInsert2 :: HashTest testInsert2 prefix dummyArg = testProperty (prefix ++ "/insert2") $ monadicIO $ do rng <- initializeRNG forAllM arbitrary $ prop rng where prop :: GenIO -> ([(Int, Int)], (Int,Int,Int)) -> PropertyM IO () prop rng (origL, (k,v,v2)) 
= do let l = V.toList $ shuffle rng $ V.fromList $ dedupe origL ht <- run $ fromList l run $ insert ht k v r <- run $ lookup ht k assertEq ("lookup1 " ++ show k) (Just v) r run $ insert ht k v2 r' <- run $ lookup ht k assertEq ("lookup2 " ++ show k) (Just v2) r' forceType dummyArg ht ------------------------------------------------------------------------------ testNewAndInsert :: HashTest testNewAndInsert prefix dummyArg = testProperty (prefix ++ "/newAndInsert") $ monadicIO $ forAllM arbitrary prop where prop :: (Int,Int,Int) -> PropertyM IO () prop (k,v,v2) = do ht <- run new nothing <- run $ lookup ht k assertEq ("lookup " ++ show k) Nothing nothing run $ insert ht k v r <- run $ lookup ht k assertEq ("lookup2 " ++ show k) (Just v) r run $ insert ht k v2 r' <- run $ lookup ht k assertEq ("lookup3 " ++ show k) (Just v2) r' ctRef <- run $ newIORef (0::Int) run $ mapM_ (const $ modifyIORef ctRef (+1)) ht ct <- run $ readIORef ctRef assertEq "count = 1" 1 ct ct' <- run $ foldM (\i _ -> return $! i+1) (0::Int) ht assertEq "count2 = 1" 1 ct' forceType dummyArg ht ------------------------------------------------------------------------------ testGrowTable :: HashTest testGrowTable prefix dummyArg = testProperty (prefix ++ "/growTable") $ monadicIO $ forAllM generator prop where generator = choose (32,2048) go n = new >>= go' (0::Int) where go' !i !ht | i >= n = return ht | otherwise = do insert ht i i go' (i+1) ht f (!m,!s) (!k,!v) = return $! 
(max m k, v `seq` s+1) prop :: Int -> PropertyM IO () prop n = do ht <- run $ go n i <- liftM head $ run $ sample' $ choose (0,n-1) v <- run $ lookup ht i assertEq ("lookup " ++ show i) (Just i) v ct <- run $ foldM f (0::Int, 0::Int) ht assertEq "max + count" (n-1,n) ct forceType dummyArg ht ------------------------------------------------------------------------------ testDelete :: HashTest testDelete prefix dummyArg = testProperty (prefix ++ "/delete") $ monadicIO $ forAllM generator prop where generator = choose (32,2048) go n = new >>= go' (0::Int) where go' !i !ht | i >= n = return ht | otherwise = do insert ht i i case i of 3 -> do delete ht 2 delete ht 3 insert ht 2 2 _ -> if i `mod` 2 == 0 then do delete ht i insert ht i i else return () go' (i+1) ht f (!m,!s) (!k,!v) = return $! (max m k, v `seq` s+1) prop :: Int -> PropertyM IO () prop n = do ht <- run $ go n i <- liftM head $ run $ sample' $ choose (4,n-1) v <- run $ lookup ht i assertEq ("lookup " ++ show i) (Just i) v v3 <- run $ lookup ht 3 assertEq ("lookup 3") Nothing v3 ct <- run $ foldM f (0::Int, 0::Int) ht assertEq "max + count" (n-1,n-1) ct forceType dummyArg ht ------------------------------------------------------------------------------ data Action = Lookup Int | Insert Int | Delete Int deriving Show timeout_ :: Int -> IO a -> IO () #ifdef PORTABLE timeout_ t m = timeout t m >>= maybe (assertFailure "timeout") (const $ return ()) #else foreign import ccall safe "suicide" c_suicide :: Ptr CInt -> CInt -> IO () -- Foreign thread can get blocked here, stalling progress. We'll make damned -- sure we bomb out. timeout_ t m = do ptr <- malloc poke ptr 1 forkOS $ suicide ptr threadDelay 1000 r <- timeout t m poke ptr 0 maybe (assertFailure "timeout") (const $ return ()) r where suicide ptr = do c_suicide ptr $ toEnum t free ptr #endif applyAction :: forall h . 
C.HashTable h => IOHashTable h Int () -> Action -> IO () applyAction tbl (Lookup key) = lookup tbl key >> return () applyAction tbl (Insert key) = insert tbl key () applyAction tbl (Delete key) = delete tbl key testForwardSearch3 :: HashTest testForwardSearch3 prefix dummyArg = testCase (prefix ++ "/forwardSearch3") go where go = do tbl <- new forceType tbl dummyArg timeout_ 3000000 $ foldM_ (\t k -> applyAction t k >> return t) tbl testData testData = [ Insert 65 , Insert 66 , Insert 67 , Insert 74 , Insert 75 , Insert 76 , Insert 77 , Insert 79 , Insert 80 , Insert 81 , Insert 82 , Insert 83 , Insert 84 , Delete 81 , Delete 82 , Insert 85 , Insert 86 , Insert 87 , Insert 88 , Insert 89 , Insert 90 , Insert 78 , Insert 93 , Insert 94 , Insert 95 , Insert 96 , Insert 97 , Insert 92 , Delete 93 , Delete 94 , Delete 95 , Delete 96 , Insert 99 , Insert 100 , Insert 101 , Insert 102 , Insert 103 , Insert 104 , Insert 98 , Insert 91 , Insert 108 , Insert 109 , Insert 110 , Insert 111 ] testNastyFullLookup :: HashTest testNastyFullLookup prefix dummyArg = testCase (prefix ++ "/nastyFullLookup") go where go = do tbl <- new forceType tbl dummyArg timeout_ 3000000 $ foldM_ (\t k -> applyAction t k >> return t) tbl testData testData = [ Insert 28 , Insert 27 , Insert 30 , Insert 31 , Insert 32 , Insert 33 , Insert 34 , Insert 29 , Insert 36 , Insert 37 , Delete 34 , Delete 29 , Insert 38 , Insert 39 , Insert 40 , Insert 35 , Delete 39 , Insert 42 , Insert 43 , Delete 40 , Delete 35 , Insert 44 , Insert 45 , Insert 41 , Insert 48 , Insert 47 , Insert 50 , Insert 51 , Insert 52 , Insert 49 , Insert 54 , Insert 53 , Insert 56 , Insert 55 , Insert 58 , Insert 57 , Insert 60 , Insert 59 , Delete 60 , Insert 62 , Insert 61 , Insert 63 , Insert 46 , Lookup 66 ] ------------------------------------------------------------------------------ initializeRNG :: PropertyM IO GenIO initializeRNG = run $ withSystemRandom (return :: GenIO -> IO GenIO) 
------------------------------------------------------------------------------
dedupe :: (Ord k, Ord v, Eq k) => [(k,v)] -> [(k,v)]
dedupe l = go0 $ sort l
  where
    go0 []     = []
    go0 (x:xs) = go id x xs

    go !dl !lastOne [] = (dl . (lastOne:)) []
    go !dl !lastOne@(!lx,_) ((x,v):xs) =
        if lx == x
          then go dl lastOne xs
          else go (dl . (lastOne:)) (x,v) xs

------------------------------------------------------------------------------
-- assumption: list is sorted.
remove :: (Ord k, Eq k) => k -> [(k,v)] -> [(k,v)]
remove m l = go id l
  where
    go !dl [] = dl []
    go !dl ll@((k,v):xs) =
        case compare k m of
          LT -> go (dl . ((k,v):)) xs
          EQ -> go dl xs
          GT -> dl ll

------------------------------------------------------------------------------
shuffle :: GenIO -> Vector k -> Vector k
shuffle rng v = if V.null v
                  then v
                  else V.modify go v
  where
    !n = V.length v

    go mv = f (n-1)
      where
        -- note: inclusive
        pickOne b = unsafeIOToST $ uniformR (0,b) rng

        swap = MV.unsafeSwap mv

        f 0  = return ()
        f !k = do
            idx <- pickOne k
            swap k idx
            f (k-1)