statistics-0.13.3.0/0000755000000000000000000000000012724050603012313 5ustar0000000000000000statistics-0.13.3.0/statistics.cabal0000644000000000000000000001105612724050603015474 0ustar0000000000000000name: statistics version: 0.13.3.0 synopsis: A library of statistical types, data, and functions description: This library provides a number of common functions and types useful in statistics. We focus on high performance, numerical robustness, and use of good algorithms. Where possible, we provide references to the statistical literature. . The library's facilities can be divided into four broad categories: . * Working with widely used discrete and continuous probability distributions. (There are dozens of exotic distributions in use; we focus on the most common.) . * Computing with sample data: quantile estimation, kernel density estimation, histograms, bootstrap methods, significance testing, and regression and autocorrelation analysis. . * Random variate generation under several different distributions. . * Common statistical tests for significant differences between samples. license: BSD3 license-file: LICENSE homepage: https://github.com/bos/statistics bug-reports: https://github.com/bos/statistics/issues author: Bryan O'Sullivan maintainer: Bryan O'Sullivan copyright: 2009-2014 Bryan O'Sullivan category: Math, Statistics build-type: Simple cabal-version: >= 1.8 extra-source-files: README.markdown benchmark/bench.hs changelog.md examples/kde/KDE.hs examples/kde/data/faithful.csv examples/kde/kde.html examples/kde/kde.tpl tests/Tests/Math/Tables.hs tests/Tests/Math/gen.py tests/utils/Makefile tests/utils/fftw.c library exposed-modules: Statistics.Autocorrelation Statistics.Constants Statistics.Correlation Statistics.Correlation.Kendall Statistics.Distribution Statistics.Distribution.Beta Statistics.Distribution.Binomial Statistics.Distribution.CauchyLorentz Statistics.Distribution.ChiSquared Statistics.Distribution.Exponential Statistics.Distribution.FDistribution Statistics.Distribution.Gamma Statistics.Distribution.Geometric Statistics.Distribution.Hypergeometric Statistics.Distribution.Laplace Statistics.Distribution.Normal Statistics.Distribution.Poisson Statistics.Distribution.StudentT Statistics.Distribution.Transform Statistics.Distribution.Uniform Statistics.Function Statistics.Math.RootFinding Statistics.Matrix Statistics.Matrix.Algorithms Statistics.Matrix.Mutable Statistics.Matrix.Types Statistics.Quantile Statistics.Regression Statistics.Resampling Statistics.Resampling.Bootstrap Statistics.Sample Statistics.Sample.Histogram Statistics.Sample.KernelDensity Statistics.Sample.KernelDensity.Simple Statistics.Sample.Powers Statistics.Test.ChiSquared Statistics.Test.KolmogorovSmirnov Statistics.Test.KruskalWallis Statistics.Test.MannWhitneyU Statistics.Test.Types Statistics.Test.WilcoxonT Statistics.Transform Statistics.Types other-modules: Statistics.Distribution.Poisson.Internal Statistics.Function.Comparison Statistics.Internal Statistics.Sample.Internal Statistics.Test.Internal build-depends: aeson >= 0.6.0.0, base >= 4.4 && < 5, binary >= 0.5.1.0, deepseq >= 1.1.0.2, erf, math-functions >= 0.1.5.2, monad-par >= 0.3.4, mwc-random >= 0.13.0.0, primitive >= 0.3, vector >= 0.10, vector-algorithms >= 0.4, vector-binary-instances >= 0.2.1 if impl(ghc < 7.6) build-depends: ghc-prim -- gather extensive profiling data for now ghc-prof-options: -auto-all ghc-options: -O2 -Wall -fwarn-tabs -funbox-strict-fields test-suite tests type: exitcode-stdio-1.0 hs-source-dirs: tests main-is: 
tests.hs
  other-modules:
    Tests.ApproxEq
    Tests.Correlation
    Tests.Distribution
    Tests.Function
    Tests.Helpers
    Tests.KDE
    Tests.Matrix
    Tests.Matrix.Types
    Tests.NonParametric
    Tests.NonParametric.Table
    Tests.Transform
  ghc-options:
    -Wall -threaded -rtsopts -fsimpl-tick-factor=500
  build-depends:
    HUnit,
    QuickCheck >= 2.7.5,
    base,
    binary,
    erf,
    ieee754 >= 0.7.3,
    math-functions,
    mwc-random,
    primitive,
    statistics,
    test-framework,
    test-framework-hunit,
    test-framework-quickcheck2,
    vector,
    vector-algorithms

source-repository head
  type:     git
  location: https://github.com/bos/statistics

source-repository head
  type:     mercurial
  location: https://bitbucket.org/bos/statistics

statistics-0.13.3.0/changelog.md

Changes in 0.13.0.0

* All types now support JSON encoding and decoding.

Changes in 0.12.0.0

* The `Statistics.Math` module has been removed, after being
  deprecated for several years.  Use the
  [math-functions](http://hackage.haskell.org/package/math-functions)
  package instead.

* The `Statistics.Test.NonParametric` module has been removed, after
  being deprecated for several years.

* Added support for Kendall's tau.

* Added support for OLS regression.

* Added basic 2D matrix support.

* Added the Kruskal-Wallis test.

Changes in 0.11.0.3

* Fixed a subtle bug in the calculation of the jackknifed unbiased
  variance.

* The test suite now requires QuickCheck 2.7.

* We now calculate quantiles for the normal distribution in a more
  numerically stable way (bug #64).

Changes in 0.10.6.0

* The Estimator type has become an algebraic data type.  This allows
  the jackknife function to potentially use more efficient jackknife
  implementations.

* jackknifeMean, jackknifeStdDev, jackknifeVariance,
  jackknifeVarianceUnb: new functions.  These have O(n) cost instead
  of the O(n^2) cost of the standard jackknife.

* The mean function has been renamed to welfordMean; a new
  implementation of mean has better numerical accuracy in almost all
  cases.

Changes in 0.10.5.2

* histogram correctly chooses the range when all elements in the
  sample are the same (bug #57)

Changes in 0.10.5.1

* Bug fix for S.Distributions.Normal.standard introduced in 0.10.5.0
  (bug #56)

Changes in 0.10.5.0

* Entropy type class for distributions is added.

* Probability and probability density of a distribution are now also
  given in the log domain.

Changes in 0.10.4.0

* Support for versions of GHC older than 7.2 is discontinued.

* All datatypes now support 'Data.Binary' and 'GHC.Generics'.

Changes in 0.10.3.0

* Bug fixes

Changes in 0.10.2.0

* Bugs in DCT and IDCT are fixed.

* Accessors for the uniform distribution are added.

* ContGen instances for all continuous distributions are added.

* Beta distribution is added.

* Constructor for the improper gamma distribution is added.

* Binomial distribution allows zero trials.

* Poisson distribution now accepts a zero parameter.

* Integer overflow in the calculation of the Wilcoxon-T test is fixed.

* Bug in the 'ContGen' instance for the normal distribution is fixed.

Changes in 0.10.1.0

* Kolmogorov-Smirnov nonparametric test added.

* Pearson chi-squared test added.

* Type class for generating random variates for a given distribution
  is added.

* Modules 'Statistics.Math' and 'Statistics.Constants' are moved to
  the @math-functions@ package.  They are still available but marked
  as deprecated.

Changes in 0.10.0.1

* @dct@ and @idct@ now have type @Vector Double -> Vector Double@

Changes in 0.10.0.0

* The type classes Mean and Variance are split in two.  This is
  required for distributions which do not have finite variance or
  mean.
* The S.Sample.KernelDensity module has been renamed, and completely
  rewritten to be much more robust.  The older module oversmoothed
  multi-modal data.  (The older module is still available under the
  name S.Sample.KernelDensity.Simple.)

* Histogram computation is added, in S.Sample.Histogram.

* Discrete Fourier transform is added, in S.Transform.

* Root finding is added, in S.Math.RootFinding.

* The complCumulative function is added to the Distribution class in
  order to accurately assess probabilities P(X>x), which are used in
  one-tailed tests.

* A stdDev function is added to the Variance class for distributions.

* The constructor S.Distribution.normalDistr now takes the standard
  deviation instead of the variance as its parameter.

* A bug in S.Quantile.weightedAvg is fixed.  It produced a wrong
  answer if a sample contained only one element.

* Bugs in quantile estimations for the chi-square and gamma
  distributions are fixed.

* Integer overflow in mannWhitneyUCriticalValue is fixed.  It
  produced incorrect critical values for moderately large samples:
  something around 20 for 32-bit machines and 40 for 64-bit ones.

* A bug in mannWhitneyUSignificant is fixed.  If either sample was
  larger than 20, it produced a completely incorrect answer.

* One- and two-tailed tests in S.Tests.NonParametric are selected
  with sum types instead of Bool.

* Test results are returned as an enumeration instead of @Bool@.

* Performance improvements for the Mann-Whitney U and Wilcoxon tests.

* Module @S.Tests.NonParametric@ is split into @S.Tests.MannWhitneyU@
  and @S.Tests.WilcoxonT@.

* sortBy is added to S.Function.

* Mean and variance for the gamma distribution are fixed.

* Much faster cumulative probability functions for the Poisson and
  hypergeometric distributions.

* Better density functions for the gamma and Poisson distributions.

* Student-T, Fisher-Snedecor F, and Cauchy-Lorentz distributions are
  added.

* The function S.Function.create is removed.  Use generateM from the
  vector package instead.

* A function to perform approximate comparison of doubles is added to
  S.Function.Comparison.

* The regularized incomplete beta function and its inverse are added
  to S.Function.

statistics-0.13.3.0/Setup.lhs

#!/usr/bin/env runhaskell
> import Distribution.Simple
> main = defaultMain

statistics-0.13.3.0/README.markdown

# Statistics: efficient, general purpose statistics

This package provides the Statistics module, a Haskell library for
working with statistical data in a space- and time-efficient way.

Where possible, we give citations and computational complexity
estimates for the algorithms used.

# Performance

This library has been carefully optimised for high performance.  To
obtain the best runtime efficiency, it is imperative to compile
libraries and applications that use this library using a high level
of optimisation.

# Get involved!

Please report bugs via the
[github issue tracker](https://github.com/bos/statistics/issues).

Master [git mirror](https://github.com/bos/statistics):

* `git clone git://github.com/bos/statistics.git`

There's also a [Mercurial mirror](https://bitbucket.org/bos/statistics):

* `hg clone https://bitbucket.org/bos/statistics`

(You can create and contribute changes using either Mercurial or git.)

# Authors

This library is written and maintained by Bryan O'Sullivan,
<bos@serpentine.com>.
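As a concrete illustration of the advice in the Performance section
above, a downstream project would typically enable optimisation in its
own cabal file.  The stanza below is a hypothetical example, not part
of this package:

    executable my-stats-app
      main-is:       Main.hs
      build-depends: base, statistics, vector
      ghc-options:   -O2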
statistics-0.13.3.0/LICENSE0000644000000000000000000000246112724050603013323 0ustar0000000000000000Copyright (c) 2009, 2010 Bryan O'Sullivan All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. statistics-0.13.3.0/Statistics/0000755000000000000000000000000012724050603014445 5ustar0000000000000000statistics-0.13.3.0/Statistics/Constants.hs0000644000000000000000000000105212724050603016753 0ustar0000000000000000-- | -- Module : Statistics.Constants -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Constant values common to much statistics code. -- -- DEPRECATED: use module 'Numeric.MathFunctions.Constants' from -- math-functions. module Statistics.Constants {-# DEPRECATED "use module Numeric.MathFunctions.Constants from math-functions" #-} ( module Numeric.MathFunctions.Constants ) where import Numeric.MathFunctions.Constants statistics-0.13.3.0/Statistics/Matrix.hs0000644000000000000000000001757212724050603016261 0ustar0000000000000000{-# LANGUAGE PatternGuards #-} -- | -- Module : Statistics.Matrix -- Copyright : 2011 Aleksey Khudyakov, 2014 Bryan O'Sullivan -- License : BSD3 -- -- Basic matrix operations. -- -- There isn't a widely used matrix package for Haskell yet, so -- we implement the necessary minimum here. module Statistics.Matrix ( -- * Data types Matrix(..) 
    , Vector
    -- * Conversion from/to lists/vectors
    , fromVector
    , fromList
    , fromRowLists
    , fromRows
    , fromColumns
    , toVector
    , toList
    , toRows
    , toColumns
    , toRowLists
    -- * Other
    , generate
    , generateSym
    , ident
    , diag
    , dimension
    , center
    , multiply
    , multiplyV
    , transpose
    , power
    , norm
    , column
    , row
    , map
    , for
    , unsafeIndex
    , hasNaN
    , bounds
    , unsafeBounds
    ) where

import Prelude hiding (exponent, map, sum)
import Control.Applicative ((<$>))
import Control.Monad.ST
import qualified Data.Vector.Unboxed as U
import Data.Vector.Unboxed ((!))
import qualified Data.Vector.Unboxed.Mutable as UM
import Statistics.Function (for, square)
import Statistics.Matrix.Types
import Statistics.Matrix.Mutable (unsafeNew,unsafeWrite,unsafeFreeze)
import Statistics.Sample.Internal (sum)

----------------------------------------------------------------
-- Conversion to/from vectors/lists
----------------------------------------------------------------

-- | Convert from a row-major list.
fromList :: Int                 -- ^ Number of rows.
         -> Int                 -- ^ Number of columns.
         -> [Double]            -- ^ Flat list of values, in row-major order.
         -> Matrix
fromList r c = fromVector r c . U.fromList

-- | Create a matrix from a list of lists, as rows.
fromRowLists :: [[Double]] -> Matrix
fromRowLists = fromRows . fmap U.fromList

-- | Convert from a row-major vector.
fromVector :: Int               -- ^ Number of rows.
           -> Int               -- ^ Number of columns.
           -> U.Vector Double   -- ^ Flat list of values, in row-major order.
           -> Matrix
fromVector r c v
  | r*c /= len = error "input size mismatch"
  | otherwise  = Matrix r c 0 v
  where len    = U.length v

-- | Create a matrix from a list of vectors, as rows.
fromRows :: [Vector] -> Matrix
fromRows xs
  | [] <- xs        = error "Statistics.Matrix.fromRows: empty list of rows!"
  | any (/=nCol) ns = error "Statistics.Matrix.fromRows: row sizes do not match"
  | nCol == 0       = error "Statistics.Matrix.fromRows: zero columns in matrix"
  | otherwise       = fromVector nRow nCol (U.concat xs)
  where
    nCol:ns = U.length <$> xs
    nRow    = length xs

-- | Create a matrix from a list of vectors, as columns.
fromColumns :: [Vector] -> Matrix
fromColumns = transpose . fromRows

-- | Convert to a row-major flat vector.
toVector :: Matrix -> U.Vector Double
toVector (Matrix _ _ _ v) = v

-- | Convert to a row-major flat list.
toList :: Matrix -> [Double]
toList = U.toList . toVector

-- | Convert to a list of lists, as rows.
toRowLists :: Matrix -> [[Double]]
toRowLists (Matrix _ nCol _ v) = chunks $ U.toList v
  where
    chunks [] = []
    chunks xs = case splitAt nCol xs of
                  (rowE,rest) -> rowE : chunks rest

-- | Convert to a list of vectors, as rows.
toRows :: Matrix -> [Vector]
toRows (Matrix _ nCol _ v) = chunks v
  where
    chunks xs
      | U.null xs = []
      | otherwise = case U.splitAt nCol xs of
                      (rowE,rest) -> rowE : chunks rest

-- | Convert to a list of vectors, as columns.
toColumns :: Matrix -> [Vector]
toColumns = toRows . transpose

----------------------------------------------------------------
-- Other
----------------------------------------------------------------

-- | Generate a matrix using a function.
generate :: Int                 -- ^ Number of rows
         -> Int                 -- ^ Number of columns
         -> (Int -> Int -> Double)
            -- ^ Function which takes /row/ and /column/ as argument.
         -> Matrix
generate nRow nCol f
  = Matrix nRow nCol 0 $ U.generate (nRow*nCol) $ \i ->
      let (r,c) = i `quotRem` nCol in f r c

-- | Generate a symmetric square matrix using a function.
generateSym :: Int              -- ^ Number of rows and columns
            -> (Int -> Int -> Double)
               -- ^ Function which takes /row/ and /column/ as argument.
               --   It must be symmetric in arguments: @f i j == f j i@
            -> Matrix
generateSym n f = runST $ do
  m <- unsafeNew n n
  for 0 n $ \r -> do
    unsafeWrite m r r (f r r)
    for (r+1) n $ \c -> do
      let x = f r c
      unsafeWrite m r c x
      unsafeWrite m c r x
  unsafeFreeze m

-- | Create the square identity matrix with given dimensions.
ident :: Int -> Matrix
ident n = diag $ U.replicate n 1.0

-- | Create a square matrix with the given diagonal; other entries
-- default to 0.
diag :: Vector -> Matrix
diag v = Matrix n n 0 $ U.create $ do
  arr <- UM.replicate (n*n) 0
  for 0 n $ \i ->
    UM.unsafeWrite arr (i*n + i) (v ! i)
  return arr
  where
    n = U.length v

-- | Return the dimensions of this matrix, as a (row,column) pair.
dimension :: Matrix -> (Int, Int)
dimension (Matrix r c _ _) = (r, c)

-- | Avoid overflow in the matrix by rescaling its elements and
-- tracking the change in the stored exponent.
avoidOverflow :: Matrix -> Matrix
avoidOverflow m@(Matrix r c e v)
  | center m > 1e140 = Matrix r c (e + 140) (U.map (* 1e-140) v)
  | otherwise        = m

-- | Matrix-matrix multiplication.  Matrices must be of compatible
-- sizes (/note: not checked/).
multiply :: Matrix -> Matrix -> Matrix
multiply m1@(Matrix r1 _ e1 _) m2@(Matrix _ c2 e2 _) =
  Matrix r1 c2 (e1 + e2) $ U.generate (r1*c2) go
  where
    go t = sum $ U.zipWith (*) (row m1 i) (column m2 j)
      where (i,j) = t `quotRem` c2

-- | Matrix-vector multiplication.
multiplyV :: Matrix -> Vector -> Vector
multiplyV m v
  | cols m == c = U.generate (rows m) (sum . U.zipWith (*) v . row m)
  | otherwise   = error $ "matrix/vector unconformable " ++ show (cols m,c)
  where c = U.length v

-- | Raise a matrix to the /n/th power.  The power must be positive
-- (/note: not checked/).
power :: Matrix -> Int -> Matrix
power mat 1 = mat
power mat n = avoidOverflow res
  where
    mat2 = power mat (n `quot` 2)
    pow  = multiply mat2 mat2
    res | odd n     = multiply pow mat
        | otherwise = pow

-- | Element in the center of the matrix (not corrected for exponent).
center :: Matrix -> Double
center mat@(Matrix r c _ _) =
  unsafeBounds U.unsafeIndex mat (r `quot` 2) (c `quot` 2)

-- | Calculate the Euclidean norm of a vector.
norm :: Vector -> Double
norm = sqrt . sum . U.map square

-- | Return the given column.
column :: Matrix -> Int -> Vector
column (Matrix r c _ v) i = U.backpermute v $ U.enumFromStepN i c r
{-# INLINE column #-}

-- | Return the given row.
row :: Matrix -> Int -> Vector
row (Matrix _ c _ v) i = U.slice (c*i) c v

unsafeIndex :: Matrix
            -> Int              -- ^ Row.
            -> Int              -- ^ Column.
            -> Double
unsafeIndex = unsafeBounds U.unsafeIndex

-- | Apply a function to every element of a matrix.
map :: (Double -> Double) -> Matrix -> Matrix
map f (Matrix r c e v) = Matrix r c e (U.map f v)

-- | Indicate whether any element of the matrix is @NaN@.
hasNaN :: Matrix -> Bool
hasNaN = U.any isNaN . toVector

-- | Given row and column numbers, calculate the offset into the flat
-- row-major vector.
bounds :: (Vector -> Int -> r) -> Matrix -> Int -> Int -> r
bounds k (Matrix rs cs _ v) r c
  | r < 0 || r >= rs = error "row out of bounds"
  | c < 0 || c >= cs = error "column out of bounds"
  | otherwise        = k v $! r * cs + c
{-# INLINE bounds #-}

-- | Given row and column numbers, calculate the offset into the flat
-- row-major vector, without checking.
unsafeBounds :: (Vector -> Int -> r) -> Matrix -> Int -> Int -> r
unsafeBounds k (Matrix _ cs _ v) r c = k v $! r * cs + c
{-# INLINE unsafeBounds #-}

transpose :: Matrix -> Matrix
transpose m@(Matrix r0 c0 e _) = Matrix c0 r0 e . U.generate (r0*c0) $ \i ->
  let (r,c) = i `quotRem` r0
  in unsafeIndex m c r
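-- The following usage sketch is an editorial addition, not part of the
-- original module; it exercises only functions exported above:
--
-- > let m = fromRowLists [[1,2],[3,4]]
-- > dimension m                        ==> (2,2)
-- > toRowLists (multiply m (ident 2))  ==> [[1.0,2.0],[3.0,4.0]]
-- > norm (row m 0)                     ==> sqrt 5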
statistics-0.13.3.0/Statistics/Types.hs

-- |
-- Module    : Statistics.Types
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Types for working with statistics.

module Statistics.Types
    (
      Estimator(..)
    , Sample
    , WeightedSample
    , Weights
    ) where

import qualified Data.Vector.Unboxed as U (Vector)

-- | Sample data.
type Sample = U.Vector Double

-- | Sample with weights.  The first element of each pair is the data
-- point, the second is its weight.
type WeightedSample = U.Vector (Double,Double)

-- | An estimator of a property of a sample, such as its 'mean'.
--
-- The use of an algebraic data type here allows functions such as
-- 'jackknife' and 'bootstrapBCA' to use more efficient algorithms
-- when possible.
data Estimator = Mean
               | Variance
               | VarianceUnbiased
               | StdDev
               | Function (Sample -> Double)

-- | Weights for affecting the importance of elements of a sample.
type Weights = U.Vector Double

statistics-0.13.3.0/Statistics/Quantile.hs

{-# LANGUAGE FlexibleContexts #-}

-- |
-- Module    : Statistics.Quantile
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Functions for approximating quantiles, i.e. points taken at regular
-- intervals from the cumulative distribution function of a random
-- variable.
--
-- The number of quantiles is described below by the variable /q/, so
-- with /q/=4, a 4-quantile (also known as a /quartile/) has 4
-- intervals, and contains 5 points.  The parameter /k/ describes the
-- desired point, where 0 ≤ /k/ ≤ /q/.

module Statistics.Quantile
    (
    -- * Quantile estimation functions
      weightedAvg
    , ContParam(..)
    , continuousBy
    , midspread

    -- * Parameters for the continuous sample method
    , cadpw
    , hazen
    , s
    , spss
    , medianUnbiased
    , normalUnbiased

    -- * References
    -- $references
    ) where

import Data.Vector.Generic ((!))
import Numeric.MathFunctions.Constants (m_epsilon)
import Statistics.Function (partialSort)
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U

-- | O(/n/ log /n/). Estimate the /k/th /q/-quantile of a sample,
-- using the weighted average method.
weightedAvg :: G.Vector v Double =>
               Int        -- ^ /k/, the desired quantile.
            -> Int        -- ^ /q/, the number of quantiles.
            -> v Double   -- ^ /x/, the sample data.
            -> Double
weightedAvg k q x
  | G.any isNaN x   = modErr "weightedAvg" "Sample contains NaNs"
  | n == 1          = G.head x
  | q < 2           = modErr "weightedAvg" "At least 2 quantiles are needed"
  | k < 0 || k >= q = modErr "weightedAvg" "Wrong quantile number"
  | otherwise       = xj + g * (xj1 - xj)
  where
    j   = floor idx
    idx = fromIntegral (n - 1) * fromIntegral k / fromIntegral q
    g   = idx - fromIntegral j
    xj  = sx ! j
    xj1 = sx ! (j+1)
    sx  = partialSort (j+2) x
    n   = G.length x
{-# SPECIALIZE weightedAvg :: Int -> Int -> U.Vector Double -> Double #-}
{-# SPECIALIZE weightedAvg :: Int -> Int -> V.Vector Double -> Double #-}

-- | Parameters /a/ and /b/ to the 'continuousBy' function.
data ContParam = ContParam {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- | O(/n/ log /n/). Estimate the /k/th /q/-quantile of a sample /x/,
-- using the continuous sample method with the given parameters.
-- This is the method used by most statistical software, such as R,
-- Mathematica, SPSS, and S.
continuousBy :: G.Vector v Double =>
                ContParam  -- ^ Parameters /a/ and /b/.
             -> Int        -- ^ /k/, the desired quantile.
             -> Int        -- ^ /q/, the number of quantiles.
             -> v Double   -- ^ /x/, the sample data.
             -> Double
continuousBy (ContParam a b) k q x
  | q < 2          = modErr "continuousBy" "At least 2 quantiles are needed"
  | k < 0 || k > q = modErr "continuousBy" "Wrong quantile number"
  | G.any isNaN x  = modErr "continuousBy" "Sample contains NaNs"
  | otherwise      = (1-h) * item (j-1) + h * item j
  where
    j   = floor (t + eps)
    t   = a + p * (fromIntegral n + 1 - a - b)
    p   = fromIntegral k / fromIntegral q
    h | abs r < eps = 0
      | otherwise   = r
      where r = t - fromIntegral j
    eps  = m_epsilon * 4
    n    = G.length x
    item = (sx !) . bracket
    sx   = partialSort (bracket j + 1) x
    bracket m = min (max m 0) (n - 1)
{-# SPECIALIZE continuousBy :: ContParam -> Int -> Int -> U.Vector Double -> Double #-}
{-# SPECIALIZE continuousBy :: ContParam -> Int -> Int -> V.Vector Double -> Double #-}

-- | O(/n/ log /n/). Estimate the range between /q/-quantiles 1 and
-- /q/-1 of a sample /x/, using the continuous sample method with the
-- given parameters.
--
-- For instance, the interquartile range (IQR) can be estimated as
-- follows:
--
-- > midspread medianUnbiased 4 (U.fromList [1,1,2,2,3])
-- > ==> 1.333333
midspread :: G.Vector v Double =>
             ContParam  -- ^ Parameters /a/ and /b/.
          -> Int        -- ^ /q/, the number of quantiles.
          -> v Double   -- ^ /x/, the sample data.
          -> Double
midspread (ContParam a b) k x
  | G.any isNaN x = modErr "midspread" "Sample contains NaNs"
  | k <= 0        = modErr "midspread" "Nonpositive number of quantiles"
  | otherwise     = quantile (1-frac) - quantile frac
  where
    quantile i = (1-h i) * item (j i-1) + h i * item (j i)
    j i        = floor (t i + eps) :: Int
    t i        = a + i * (fromIntegral n + 1 - a - b)
    h i | abs r < eps = 0
        | otherwise   = r
        where r = t i - fromIntegral (j i)
    eps  = m_epsilon * 4
    n    = G.length x
    item = (sx !) . bracket
    sx   = partialSort (bracket (j (1-frac)) + 1) x
    bracket m = min (max m 0) (n - 1)
    frac = 1 / fromIntegral k
{-# SPECIALIZE midspread :: ContParam -> Int -> U.Vector Double -> Double #-}
{-# SPECIALIZE midspread :: ContParam -> Int -> V.Vector Double -> Double #-}

-- | California Department of Public Works definition, /a/=0, /b/=1.
-- Gives a linear interpolation of the empirical CDF.  This
-- corresponds to method 4 in R and Mathematica.
cadpw :: ContParam
cadpw = ContParam 0 1

-- | Hazen's definition, /a/=0.5, /b/=0.5.  This is claimed to be
-- popular among hydrologists.  This corresponds to method 5 in R and
-- Mathematica.
hazen :: ContParam
hazen = ContParam 0.5 0.5

-- | Definition used by the SPSS statistics application, with /a/=0,
-- /b/=0 (also known as Weibull's definition).  This corresponds to
-- method 6 in R and Mathematica.
spss :: ContParam
spss = ContParam 0 0

-- | Definition used by the S statistics application, with /a/=1,
-- /b/=1.  The interpolation points divide the sample range into @n-1@
-- intervals.  This corresponds to method 7 in R and Mathematica.
s :: ContParam
s = ContParam 1 1

-- | Median unbiased definition, /a/=1\/3, /b/=1\/3.  The resulting
-- quantile estimates are approximately median unbiased regardless of
-- the distribution of /x/.  This corresponds to method 8 in R and
-- Mathematica.
medianUnbiased :: ContParam
medianUnbiased = ContParam third third
  where third = 1/3

-- | Normal unbiased definition, /a/=3\/8, /b/=3\/8.  An approximately
-- unbiased estimate if the empirical distribution approximates the
-- normal distribution.  This corresponds to method 9 in R and
-- Mathematica.
normalUnbiased :: ContParam
normalUnbiased = ContParam ta ta
  where ta = 3/8
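-- As an editorial usage sketch (not from the original module): the
-- median is the 2nd 4-quantile, so with the definitions above:
--
-- > continuousBy medianUnbiased 2 4 (U.fromList [1,2,3,4,5])
-- > ==> 3.0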
modErr :: String -> String -> a
modErr f err = error $ "Statistics.Quantile." ++ f ++ ": " ++ err

-- $references
--
-- * Weisstein, E.W. Quantile. /MathWorld/.
--   <http://mathworld.wolfram.com/Quantile.html>
--
-- * Hyndman, R.J.; Fan, Y. (1996) Sample quantiles in statistical
--   packages. /American Statistician/ 50(4):361–365.

statistics-0.13.3.0/Statistics/Correlation.hs

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE BangPatterns #-}
-- |
-- Module    : Statistics.Correlation
--
module Statistics.Correlation
    ( -- * Pearson correlation
      pearson
    , pearsonMatByRow
      -- * Spearman correlation
    , spearman
    , spearmanMatByRow
    ) where

import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U
import Statistics.Matrix
import Statistics.Sample
import Statistics.Test.Internal (rankUnsorted)

----------------------------------------------------------------
-- Pearson
----------------------------------------------------------------

-- | Pearson correlation for a sample of pairs.
pearson :: (G.Vector v (Double, Double), G.Vector v Double)
        => v (Double, Double)
        -> Double
pearson = correlation
{-# INLINE pearson #-}

-- | Compute pairwise Pearson correlation between rows of a matrix.
pearsonMatByRow :: Matrix -> Matrix
pearsonMatByRow m
  = generateSym (rows m)
      (\i j -> pearson $ row m i `U.zip` row m j)
{-# INLINE pearsonMatByRow #-}

----------------------------------------------------------------
-- Spearman
----------------------------------------------------------------

-- | Compute the Spearman correlation between two samples.
spearman :: ( Ord a
            , Ord b
            , G.Vector v a
            , G.Vector v b
            , G.Vector v (a, b)
            , G.Vector v Int
            , G.Vector v Double
            , G.Vector v (Double, Double)
            , G.Vector v (Int, a)
            , G.Vector v (Int, b)
            )
         => v (a, b)
         -> Double
spearman xy
  = pearson
  $ G.zip (rankUnsorted x) (rankUnsorted y)
  where
    (x, y) = G.unzip xy
{-# INLINE spearman #-}

-- | Compute pairwise Spearman correlation between rows of a matrix.
spearmanMatByRow :: Matrix -> Matrix
spearmanMatByRow
  = pearsonMatByRow . fromRows . fmap rankUnsorted . toRows
{-# INLINE spearmanMatByRow #-}

statistics-0.13.3.0/Statistics/Internal.hs

{-# LANGUAGE CPP, MagicHash, UnboxedTuples #-}
-- |
-- Module    : Statistics.Internal
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Scary internal functions.

module Statistics.Internal
    (
      inlinePerformIO
    ) where

#if __GLASGOW_HASKELL__ >= 611
import GHC.IO (IO(IO))
#else
import GHC.IOBase (IO(IO))
#endif
import GHC.Base (realWorld#)
#if !defined(__GLASGOW_HASKELL__)
import System.IO.Unsafe (unsafePerformIO)
#endif

-- Lifted from Data.ByteString.Internal so we don't introduce an
-- otherwise unnecessary dependency on the bytestring package.

-- | Just like unsafePerformIO, but we inline it.  Big performance
-- gains as it exposes lots of things to further inlining.  /Very
-- unsafe/.  In particular, you should do no memory allocation inside
-- an 'inlinePerformIO' block.  On Hugs this is just @unsafePerformIO@.
{-# INLINE inlinePerformIO #-}
inlinePerformIO :: IO a -> a
#if defined(__GLASGOW_HASKELL__)
inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r
#else
inlinePerformIO = unsafePerformIO
#endif

statistics-0.13.3.0/Statistics/Regression.hs

-- |
-- Module    : Statistics.Regression
-- Copyright : 2014 Bryan O'Sullivan
-- License   : BSD3
--
-- Functions for regression analysis.

module Statistics.Regression
    (
      olsRegress
    , ols
    , rSquare
    , bootstrapRegress
    ) where

import Control.Applicative ((<$>))
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.DeepSeq (rnf)
import Control.Monad (forM_, replicateM)
import GHC.Conc (getNumCapabilities)
import Prelude hiding (pred, sum)
import Statistics.Function as F
import Statistics.Matrix hiding (map)
import Statistics.Matrix.Algorithms (qr)
import Statistics.Resampling (splitGen)
import Statistics.Resampling.Bootstrap (Estimate(..))
import Statistics.Sample (mean)
import Statistics.Sample.Internal (sum)
import System.Random.MWC (GenIO, uniformR)
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- | Perform an ordinary least-squares regression on a set of
-- predictors, and calculate the goodness-of-fit of the regression.
--
-- The returned pair consists of:
--
-- * A vector of regression coefficients.  This vector has /one more/
--   element than the list of predictors; the last element is the
--   /y/-intercept value.
--
-- * /R²/, the coefficient of determination (see 'rSquare' for
--   details).
olsRegress :: [Vector]
              -- ^ Non-empty list of predictor vectors.  Must all have
              -- the same length.  These will become the columns of
              -- the matrix /A/ solved by 'ols'.
           -> Vector
              -- ^ Responder vector.  Must have the same length as the
              -- predictor vectors.
           -> (Vector, Double)
olsRegress preds@(_:_) resps
  | any (/=n) ls        = error $ "predictor vector length mismatch " ++
                                  show lss
  | G.length resps /= n = error $ "responder/predictor length mismatch " ++
                                  show (G.length resps, n)
  | otherwise           = (coeffs, rSquare mxpreds resps coeffs)
  where
    coeffs     = ols mxpreds resps
    mxpreds    = transpose .
                 fromVector (length lss + 1) n .
                 G.concat $ preds ++ [G.replicate n 1]
    lss@(n:ls) = map G.length preds
olsRegress _ _ = error "no predictors given"

-- | Compute the ordinary least-squares solution to /A x = b/.
ols :: Matrix     -- ^ /A/ has at least as many rows as columns.
    -> Vector     -- ^ /b/ has the same length as /A/ has rows.
    -> Vector
ols a b
  | rs < cs   = error $ "fewer rows than columns " ++ show d
  | otherwise = solve r (transpose q `multiplyV` b)
  where
    d@(rs,cs) = dimension a
    (q,r)     = qr a

-- | Solve the equation /R x = b/.
solve :: Matrix     -- ^ /R/ is an upper-triangular square matrix.
      -> Vector     -- ^ /b/ is of the same length as rows\/columns in /R/.
      -> Vector
solve r b
  | n /= l    = error $ "row/vector mismatch " ++ show (n,l)
  | otherwise = U.create $ do
  s <- U.thaw b
  rfor n 0 $ \i -> do
    si <- (/ unsafeIndex r i i) <$> M.unsafeRead s i
    M.unsafeWrite s i si
    for 0 i $ \j -> F.unsafeModify s j $ subtract ((unsafeIndex r j i) * si)
  return s
  where n = rows r
        l = U.length b
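-- An editorial usage sketch of 'olsRegress' (hypothetical data, not
-- from the original module): fitting y ≈ 2x + 1 recovers a slope near
-- 2, an intercept near 1, and an /R²/ close to 1.
--
-- > let x = U.fromList [1,2,3,4,5]
-- >     y = U.fromList [3.1,4.9,7.2,8.8,11.0]
-- > olsRegress [x] y
-- > -- ==> ([slope, intercept] ≈ [1.97, 1.09], R² ≈ 0.998)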
-- | Compute /R²/, the coefficient of determination that
-- indicates goodness-of-fit of a regression.
--
-- This value will be 1 if the predictors fit perfectly, dropping to 0
-- if they have no explanatory power.
rSquare :: Matrix     -- ^ Predictors (regressors).
        -> Vector     -- ^ Responders.
        -> Vector     -- ^ Regression coefficients.
        -> Double
rSquare pred resp coeff = 1 - r / t
  where
    r   = sum $ flip U.imap resp $ \i x -> square (x - p i)
    t   = sum $ flip U.map resp $ \x -> square (x - mean resp)
    p i = sum . flip U.imap coeff $ \j -> (* unsafeIndex pred i j)

-- | Bootstrap a regression function.  Returns both the results of the
-- regression and the requested confidence interval values.
bootstrapRegress :: GenIO
                 -> Int         -- ^ Number of resamples to compute.
                 -> Double      -- ^ Confidence interval.
                 -> ([Vector] -> Vector -> (Vector, Double))
                    -- ^ Regression function.
                 -> [Vector]    -- ^ Predictor vectors.
                 -> Vector      -- ^ Responder vector.
                 -> IO (V.Vector Estimate, Estimate)
bootstrapRegress gen0 numResamples ci rgrss preds0 resp0
  | numResamples < 1   = error $ "bootstrapRegress: number of resamples " ++
                                 "must be positive"
  | ci <= 0 || ci >= 1 = error $ "bootstrapRegress: confidence interval " ++
                                 "must lie between 0 and 1"
  | otherwise = do
  caps <- getNumCapabilities
  gens <- splitGen caps gen0
  done <- newChan
  forM_ (zip gens (balance caps numResamples)) $ \(gen,count) -> do
    forkIO $ do
      v <- V.replicateM count $ do
           let n = U.length resp0
           ixs <- U.replicateM n $ uniformR (0,n-1) gen
           let resp  = U.backpermute resp0 ixs
               preds = map (flip U.backpermute ixs) preds0
           return $ rgrss preds resp
      rnf v `seq` writeChan done v
  (coeffsv, r2v) <- (G.unzip . V.concat) <$> replicateM caps (readChan done)
  let coeffs  = flip G.imap (G.convert coeffss) $ \i x ->
                est x . U.generate numResamples $ \k -> ((coeffsv G.! k) G.! i)
      r2      = est r2s (G.convert r2v)
      (coeffss, r2s) = rgrss preds0 resp0
      est s v = Estimate s (w G.! lo) (w G.! hi) ci
        where w  = F.sort v
              lo = round c
              hi = truncate (n - c)
              n  = fromIntegral numResamples
              c  = n * ((1 - ci) / 2)
  return (coeffs, r2)

-- | Balance units of work across workers.
balance :: Int -> Int -> [Int]
balance numSlices numItems = zipWith (+) (replicate numSlices q)
                                         (replicate r 1 ++ repeat 0)
 where (q,r) = numItems `quotRem` numSlices

statistics-0.13.3.0/Statistics/Autocorrelation.hs

{-# LANGUAGE FlexibleContexts #-}
-- |
-- Module    : Statistics.Autocorrelation
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Functions for computing autocovariance and autocorrelation of a
-- sample.

module Statistics.Autocorrelation
    (
      autocovariance
    , autocorrelation
    ) where

import Prelude hiding (sum)
import Statistics.Function (square)
import Statistics.Sample (mean)
import Statistics.Sample.Internal (sum)
import qualified Data.Vector.Generic as G

-- | Compute the autocovariance of a sample, i.e. the covariance of
-- the sample against a shifted version of itself.
autocovariance :: (G.Vector v Double, G.Vector v Int) => v Double -> v Double
autocovariance a = G.map f . G.enumFromTo 0 $ l-2
  where
    f k = sum (G.zipWith (*) (G.take (l-k) c) (G.slice k (l-k) c))
          / fromIntegral l
    c   = G.map (subtract (mean a)) a
    l   = G.length a

-- | Compute the autocorrelation function of a sample, and the upper
-- and lower bounds of confidence intervals for each element.
--
-- /Note/: The calculation of the 95% confidence interval assumes a
-- stationary Gaussian process.
autocorrelation :: (G.Vector v Double, G.Vector v Int)
                => v Double -> (v Double, v Double, v Double)
autocorrelation a = (r, ci (-), ci (+))
  where
    r         = G.map (/ G.head c) c
      where c = autocovariance a
    dllse     = G.map f . G.scanl1 (+) .
                G.map square $ r
      where f v = 1.96 * sqrt ((v * 2 + 1) / l)
    l         = fromIntegral (G.length a)
    ci f      = G.cons 1 . G.tail . G.map (f (-1/l)) $ dllse

statistics-0.13.3.0/Statistics/Transform.hs

{-# LANGUAGE BangPatterns, FlexibleContexts #-}
-- |
-- Module    : Statistics.Transform
-- Copyright : (c) 2011 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Fourier-related transformations of mathematical functions.
--
-- These functions are written for simplicity and correctness, not
-- speed.  If you need a fast FFT implementation for your application,
-- you should strongly consider using a library of FFTW bindings
-- instead.

module Statistics.Transform
    (
    -- * Type synonyms
      CD
    -- * Discrete cosine transform
    , dct
    , dct_
    , idct
    , idct_
    -- * Fast Fourier transform
    , fft
    , ifft
    ) where

import Control.Monad (when)
import Control.Monad.ST (ST)
import Data.Bits (shiftL, shiftR)
import Data.Complex (Complex(..), conjugate, realPart)
import Numeric.SpecFunctions (log2)
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Generic.Mutable as M
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector as V

type CD = Complex Double

-- | Discrete cosine transform (DCT-II).
dct :: (G.Vector v CD, G.Vector v Double, G.Vector v Int) => v Double -> v Double
dct = dctWorker . G.map (:+0)
{-# INLINABLE dct #-}
{-# SPECIALIZE dct :: U.Vector Double -> U.Vector Double #-}
{-# SPECIALIZE dct :: V.Vector Double -> V.Vector Double #-}

-- | Discrete cosine transform (DCT-II).  Only the real part of the
-- vector is transformed; the imaginary part is ignored.
dct_ :: (G.Vector v CD, G.Vector v Double, G.Vector v Int) => v CD -> v Double
dct_ = dctWorker . G.map (\(i :+ _) -> i :+ 0)
{-# INLINABLE dct_ #-}
{-# SPECIALIZE dct_ :: U.Vector CD -> U.Vector Double #-}
{-# SPECIALIZE dct_ :: V.Vector CD -> V.Vector Double #-}

dctWorker :: (G.Vector v CD, G.Vector v Double, G.Vector v Int) => v CD -> v Double
{-# INLINE dctWorker #-}
dctWorker xs
  -- length 1 is special cased because shuffle algorithms fail for it.
  | G.length xs == 1 = G.map ((2*) . realPart) xs
  | vectorOK xs      = G.map realPart $ G.zipWith (*) weights (fft interleaved)
  | otherwise        = error "Statistics.Transform.dct: bad vector length"
  where
    interleaved = G.backpermute xs $ G.enumFromThenTo 0 2 (len-2) G.++
                                     G.enumFromThenTo (len-1) (len-3) 1
    weights = G.cons 2 . G.generate (len-1) $ \x ->
              2 * exp ((0:+(-1))*fi (x+1)*pi/(2*n))
      where n = fi len
    len = G.length xs

-- | Inverse discrete cosine transform (DCT-III).  It is the inverse
-- of 'dct' only up to a scale factor:
--
-- > (idct . dct) x = (* length x)
idct :: (G.Vector v CD, G.Vector v Double) => v Double -> v Double
idct = idctWorker . G.map (:+0)
{-# INLINABLE idct #-}
{-# SPECIALIZE idct :: U.Vector Double -> U.Vector Double #-}
{-# SPECIALIZE idct :: V.Vector Double -> V.Vector Double #-}
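-- An editorial sketch of the round-trip properties implied by the
-- documentation above (@≈@ meaning equality up to rounding), for
-- vectors whose length is a power of two:
--
-- > ifft (fft v) ≈ v
-- > idct (dct v) ≈ U.map (* fromIntegral (U.length v)) v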
-- | Inverse discrete cosine transform (DCT-III).  Only the real part
-- of the vector is transformed; the imaginary part is ignored.
idct_ :: (G.Vector v CD, G.Vector v Double) => v CD -> v Double
idct_ = idctWorker . G.map (\(i :+ _) -> i :+ 0)
{-# INLINABLE idct_ #-}
{-# SPECIALIZE idct_ :: U.Vector CD -> U.Vector Double #-}
{-# SPECIALIZE idct_ :: V.Vector CD -> V.Vector Double #-}

idctWorker :: (G.Vector v CD, G.Vector v Double) => v CD -> v Double
{-# INLINE idctWorker #-}
idctWorker xs
  | vectorOK xs = G.generate len interleave
  | otherwise   = error "Statistics.Transform.idct: bad vector length"
  where
    interleave z | even z    = vals `G.unsafeIndex` halve z
                 | otherwise = vals `G.unsafeIndex` (len - halve z - 1)
    vals = G.map realPart . ifft $ G.zipWith (*) weights xs
    weights
      = G.cons n
      $ G.generate (len - 1) $ \x -> 2 * n * exp ((0:+1) * fi (x+1) * pi/(2*n))
      where n = fi len
    len = G.length xs

-- | Inverse fast Fourier transform.
ifft :: G.Vector v CD => v CD -> v CD
ifft xs
  | vectorOK xs = G.map ((/fi (G.length xs)) . conjugate) . fft .
                  G.map conjugate $ xs
  | otherwise   = error "Statistics.Transform.ifft: bad vector length"
{-# INLINABLE ifft #-}
{-# SPECIALIZE ifft :: U.Vector CD -> U.Vector CD #-}
{-# SPECIALIZE ifft :: V.Vector CD -> V.Vector CD #-}

-- | Radix-2 decimation-in-time fast Fourier transform.
fft :: G.Vector v CD => v CD -> v CD
fft v | vectorOK v = G.create $ do
          mv <- G.thaw v
          mfft mv
          return mv
      | otherwise  = error "Statistics.Transform.fft: bad vector length"
{-# INLINABLE fft #-}
{-# SPECIALIZE fft :: U.Vector CD -> U.Vector CD #-}
{-# SPECIALIZE fft :: V.Vector CD -> V.Vector CD #-}

-- The vector length must be a power of two.  It is not checked.
mfft :: (M.MVector v CD) => v s CD -> ST s ()
{-# INLINE mfft #-}
mfft vec = bitReverse 0 0
 where
  bitReverse i j | i == len-1 = stage 0 1
                 | otherwise  = do
    when (i < j) $ M.swap vec i j
    let inner k l | k <= l    = inner (k `shiftR` 1) (l-k)
                  | otherwise = bitReverse (i+1) (l+k)
    inner (len `shiftR` 1) j
  stage l !l1 | l == m    = return ()
              | otherwise = do
    let !l2 = l1 `shiftL` 1
        !e  = -6.283185307179586/fromIntegral l2
        flight j !a | j == l1   = stage (l+1) l2
                    | otherwise = do
          let butterfly i | i >= len  = flight (j+1) (a+e)
                          | otherwise = do
                let i1 = i + l1
                xi1 :+ yi1 <- M.read vec i1
                let !c = cos a
                    !s = sin a
                    d  = (c*xi1 - s*yi1) :+ (s*xi1 + c*yi1)
                ci <- M.read vec i
                M.write vec i1 (ci - d)
                M.write vec i (ci + d)
                butterfly (i+l2)
          butterfly j
    flight 0 0
  len = M.length vec
  m   = log2 len

----------------------------------------------------------------
-- Helpers
----------------------------------------------------------------

fi :: Int -> CD
fi = fromIntegral

halve :: Int -> Int
halve = (`shiftR` 1)

vectorOK :: G.Vector v a => v a -> Bool
{-# INLINE vectorOK #-}
vectorOK v = (1 `shiftL` log2 n) == n where n = G.length v

statistics-0.13.3.0/Statistics/Function.hs

{-# LANGUAGE BangPatterns, CPP, FlexibleContexts, Rank2Types #-}
#if __GLASGOW_HASKELL__ >= 704
{-# OPTIONS_GHC -fsimpl-tick-factor=200 #-}
#endif
-- |
-- Module    : Statistics.Function
-- Copyright : (c) 2009, 2010, 2011 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Useful functions.
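-- A couple of editorial usage sketches for the exports below (not
-- part of the original documentation; @U@ is Data.Vector.Unboxed):
--
-- > minMax (U.fromList [2,1,3])   ==> (1.0, 3.0)
-- > nextHighestPowerOfTwo 100     ==> 128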
module Statistics.Function
    (
    -- * Scanning
      minMax
    -- * Sorting
    , sort
    , gsort
    , sortBy
    , partialSort
    -- * Indexing
    , indexed
    , indices
    -- * Bit twiddling
    , nextHighestPowerOfTwo
    -- * Comparison
    , within
    -- * Arithmetic
    , square
    -- * Vectors
    , unsafeModify
    -- * Combinators
    , for
    , rfor
    ) where

#include "MachDeps.h"

import Control.Monad.ST (ST)
import Data.Bits ((.|.), shiftR)
import qualified Data.Vector.Algorithms.Intro as I
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M
import Statistics.Function.Comparison (within)

-- | Sort a vector.
sort :: U.Vector Double -> U.Vector Double
sort = G.modify I.sort
{-# NOINLINE sort #-}

-- | Sort a vector of orderable elements.
gsort :: (Ord e, G.Vector v e) => v e -> v e
gsort = G.modify I.sort
{-# INLINE gsort #-}

-- | Sort a vector using a custom ordering.
sortBy :: (G.Vector v e) => I.Comparison e -> v e -> v e
sortBy f = G.modify $ I.sortBy f
{-# INLINE sortBy #-}

-- | Partially sort a vector, such that the least /k/ elements will be
-- at the front.
partialSort :: (G.Vector v e, Ord e) =>
               Int              -- ^ The number /k/ of least elements.
            -> v e
            -> v e
partialSort k = G.modify (`I.partialSort` k)
{-# SPECIALIZE partialSort :: Int -> U.Vector Double -> U.Vector Double #-}

-- | Return the indices of a vector.
indices :: (G.Vector v a, G.Vector v Int) => v a -> v Int
indices a = G.enumFromTo 0 (G.length a - 1)
{-# INLINE indices #-}

-- | Zip a vector with its indices.
indexed :: (G.Vector v e, G.Vector v Int, G.Vector v (Int,e)) => v e -> v (Int,e)
indexed a = G.zip (indices a) a
{-# INLINE indexed #-}

data MM = MM {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- | Compute the minimum and maximum of a vector in one pass.
minMax :: (G.Vector v Double) => v Double -> (Double, Double)
minMax = fini . G.foldl' go (MM (1/0) (-1/0))
  where
    go (MM lo hi) k = MM (min lo k) (max hi k)
    fini (MM lo hi) = (lo, hi)
{-# INLINE minMax #-}

-- | Efficiently compute the next highest power of two for a
-- non-negative integer.  If the given value is already a power of
-- two, it is returned unchanged.  If negative, zero is returned.
nextHighestPowerOfTwo :: Int -> Int
nextHighestPowerOfTwo n
#if WORD_SIZE_IN_BITS == 64
  = 1 + _i32
#else
  = 1 + i16
#endif
  where
    i0   = n - 1
    i1   = i0 .|. i0 `shiftR` 1
    i2   = i1 .|. i1 `shiftR` 2
    i4   = i2 .|. i2 `shiftR` 4
    i8   = i4 .|. i4 `shiftR` 8
    i16  = i8 .|. i8 `shiftR` 16
    _i32 = i16 .|. i16 `shiftR` 32
-- It could be implemented as
--
-- > nextHighestPowerOfTwo n = 1 + foldl' go (n-1) [1, 2, 4, 8, 16, 32]
--     where go m i = m .|. m `shiftR` i
--
-- But GHC does not inline foldl (probably because it's recursive), and
-- as a result the function walks a list of boxed ints.  The hand-rolled
-- version uses unboxed arithmetic.

-- | Multiply a number by itself.
square :: Double -> Double
square x = x * x

-- | Simple for loop.  Counts from /start/ to /end/-1.
for :: Monad m => Int -> Int -> (Int -> m ()) -> m ()
for n0 !n f = loop n0
  where
    loop i | i == n    = return ()
           | otherwise = f i >> loop (i+1)
{-# INLINE for #-}

-- | Simple reverse-for loop.  Counts from /start/-1 to /end/ (which
-- must be less than /start/).
rfor :: Monad m => Int -> Int -> (Int -> m ()) -> m () rfor n0 !n f = loop n0 where loop i | i == n = return () | otherwise = let i' = i-1 in f i' >> loop i' {-# INLINE rfor #-} unsafeModify :: M.MVector s Double -> Int -> (Double -> Double) -> ST s () unsafeModify v i f = do k <- M.unsafeRead v i M.unsafeWrite v i (f k) {-# INLINE unsafeModify #-} statistics-0.13.3.0/Statistics/Resampling.hs0000644000000000000000000001476112724050603017113 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Resampling -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Resampling statistics. module Statistics.Resampling ( Resample(..) , jackknife , jackknifeMean , jackknifeVariance , jackknifeVarianceUnb , jackknifeStdDev , resample , estimate , splitGen ) where import Data.Aeson (FromJSON, ToJSON) import Control.Concurrent (forkIO, newChan, readChan, writeChan) import Control.Monad (forM_, liftM, replicateM, replicateM_) import Data.Binary (Binary(..)) import Data.Data (Data, Typeable) import Data.Vector.Algorithms.Intro (sort) import Data.Vector.Binary () import Data.Vector.Generic (unsafeFreeze) import Data.Word (Word32) import GHC.Conc (numCapabilities) import GHC.Generics (Generic) import Numeric.Sum (Summation(..), kbn) import Statistics.Function (indices) import Statistics.Sample (mean, stdDev, variance, varianceUnbiased) import Statistics.Types (Estimator(..), Sample) import System.Random.MWC (GenIO, initialize, uniform, uniformVector) import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as MU -- | A resample drawn randomly, with replacement, from a set of data -- points. Distinct from a normal array to make it harder for your -- humble author's brain to go wrong. newtype Resample = Resample { fromResample :: U.Vector Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON Resample instance ToJSON Resample instance Binary Resample where put = put . fromResample get = fmap Resample get -- | /O(e*r*s)/ Resample a data set repeatedly, with replacement, -- computing each estimate over the resampled data. -- -- This function is expensive; it has to do work proportional to -- /e*r*s/, where /e/ is the number of estimation functions, /r/ is -- the number of resamples to compute, and /s/ is the number of -- original samples. -- -- To improve performance, this function will make use of all -- available CPUs. At least with GHC 7.0, parallel performance seems -- best if the parallel garbage collector is disabled (RTS option -- @-qg@). resample :: GenIO -> [Estimator] -- ^ Estimation functions. -> Int -- ^ Number of resamples to compute. -> Sample -- ^ Original sample. -> IO [Resample] resample gen ests numResamples samples = do let !numSamples = U.length samples ixs = scanl (+) 0 $ zipWith (+) (replicate numCapabilities q) (replicate r 1 ++ repeat 0) where (q,r) = numResamples `quotRem` numCapabilities results <- mapM (const (MU.new numResamples)) ests done <- newChan gens <- splitGen numCapabilities gen forM_ (zip3 ixs (tail ixs) gens) $ \ (start,!end,gen') -> do forkIO $ do let loop k ers | k >= end = writeChan done () | otherwise = do re <- U.replicateM numSamples $ do r <- uniform gen' return (U.unsafeIndex samples (r `mod` numSamples)) forM_ ers $ \(est,arr) -> MU.write arr k . 
est $ re loop (k+1) ers loop start (zip ests' results) replicateM_ numCapabilities $ readChan done mapM_ sort results mapM (liftM Resample . unsafeFreeze) results where ests' = map estimate ests -- | Run an 'Estimator' over a sample. estimate :: Estimator -> Sample -> Double estimate Mean = mean estimate Variance = variance estimate VarianceUnbiased = varianceUnbiased estimate StdDev = stdDev estimate (Function est) = est -- | /O(n) or O(n^2)/ Compute a statistical estimate repeatedly over a -- sample, each time omitting a successive element. jackknife :: Estimator -> Sample -> U.Vector Double jackknife Mean sample = jackknifeMean sample jackknife Variance sample = jackknifeVariance sample jackknife VarianceUnbiased sample = jackknifeVarianceUnb sample jackknife StdDev sample = jackknifeStdDev sample jackknife (Function est) sample | G.length sample == 1 = singletonErr "jackknife" | otherwise = U.map f . indices $ sample where f i = est (dropAt i sample) -- | /O(n)/ Compute the jackknife mean of a sample. jackknifeMean :: Sample -> U.Vector Double jackknifeMean samp | len == 1 = singletonErr "jackknifeMean" | otherwise = G.map (/l) $ G.zipWith (+) (pfxSumL samp) (pfxSumR samp) where l = fromIntegral (len - 1) len = G.length samp -- | /O(n)/ Compute the jackknife variance of a sample with a -- correction factor @c@, so we can get either the regular or -- \"unbiased\" variance. jackknifeVariance_ :: Double -> Sample -> U.Vector Double jackknifeVariance_ c samp | len == 1 = singletonErr "jackknifeVariance" | otherwise = G.zipWith4 go als ars bls brs where als = pfxSumL . G.map goa $ samp ars = pfxSumR . G.map goa $ samp goa x = v * v where v = x - m bls = pfxSumL . G.map (subtract m) $ samp brs = pfxSumR . G.map (subtract m) $ samp m = mean samp n = fromIntegral len go al ar bl br = (al + ar - (b * b) / q) / (q - c) where b = bl + br q = n - 1 len = G.length samp -- | /O(n)/ Compute the unbiased jackknife variance of a sample. jackknifeVarianceUnb :: Sample -> U.Vector Double jackknifeVarianceUnb = jackknifeVariance_ 1 -- | /O(n)/ Compute the jackknife variance of a sample. jackknifeVariance :: Sample -> U.Vector Double jackknifeVariance = jackknifeVariance_ 0 -- | /O(n)/ Compute the jackknife standard deviation of a sample. jackknifeStdDev :: Sample -> U.Vector Double jackknifeStdDev = G.map sqrt . jackknifeVarianceUnb pfxSumL :: U.Vector Double -> U.Vector Double pfxSumL = G.map kbn . G.scanl add zero pfxSumR :: U.Vector Double -> U.Vector Double pfxSumR = G.tail . G.map kbn . G.scanr (flip add) zero -- | Drop the /k/th element of a vector. dropAt :: U.Unbox e => Int -> U.Vector e -> U.Vector e dropAt n v = U.slice 0 n v U.++ U.slice (n+1) (U.length v - n - 1) v singletonErr :: String -> a singletonErr func = error $ "Statistics.Resampling." ++ func ++ ": singleton input" -- | Split a generator into several that can run independently. splitGen :: Int -> GenIO -> IO [GenIO] splitGen n gen | n <= 0 = return [] | otherwise = fmap (gen:) . replicateM (n-1) $ initialize =<< (uniformVector gen 256 :: IO (U.Vector Word32)) statistics-0.13.3.0/Statistics/Distribution.hs0000644000000000000000000001715412724050603017470 0ustar0000000000000000{-# LANGUAGE BangPatterns, ScopedTypeVariables #-} -- | -- Module : Statistics.Distribution -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Type classes for probability distributions module Statistics.Distribution ( -- * Type classes Distribution(..) 
    , DiscreteDistr(..)
    , ContDistr(..)
      -- ** Distribution statistics
    , MaybeMean(..)
    , Mean(..)
    , MaybeVariance(..)
    , Variance(..)
    , MaybeEntropy(..)
    , Entropy(..)
      -- ** Random number generation
    , ContGen(..)
    , DiscreteGen(..)
    , genContinous
      -- * Helper functions
    , findRoot
    , sumProbabilities
    ) where

import Control.Applicative ((<$>), Applicative(..))
import Control.Monad.Primitive (PrimMonad,PrimState)
import Prelude hiding (sum)
import Statistics.Function (square)
import Statistics.Sample.Internal (sum)
import System.Random.MWC (Gen, uniform)
import qualified Data.Vector.Unboxed as U

-- | Type class common to all distributions.  Only the c.d.f. can be
-- defined for both discrete and continuous distributions.
class Distribution d where
    -- | Cumulative distribution function.  The probability that a
    -- random variable /X/ is less than or equal to /x/,
    -- i.e. P(/X/≤/x/).  The c.d.f. should be defined for
    -- infinities as well:
    --
    -- > cumulative d +∞ = 1
    -- > cumulative d -∞ = 0
    cumulative :: d -> Double -> Double

    -- | One's complement of the cumulative distribution:
    --
    -- > complCumulative d x = 1 - cumulative d x
    --
    -- It's useful when one is interested in P(/X/>/x/) and the
    -- expression on the right-hand side begins to lose precision.
    -- This function has a default implementation, but implementors
    -- are encouraged to provide a more precise one.
    complCumulative :: d -> Double -> Double
    complCumulative d x = 1 - cumulative d x

-- | Discrete probability distribution.
class Distribution d => DiscreteDistr d where
    -- | Probability of the n-th outcome.
    probability :: d -> Int -> Double
    probability d = exp . logProbability d
    -- | Logarithm of the probability of the n-th outcome.
    logProbability :: d -> Int -> Double
    logProbability d = log . probability d

-- | Continuous probability distribution.
--
-- Minimal complete definition is 'quantile' and either 'density' or
-- 'logDensity'.
class Distribution d => ContDistr d where
    -- | Probability density function.  The probability that a random
    -- variable /X/ lies in the infinitesimal interval
    -- [/x/,/x+/δ/x/) is equal to /density(x)/⋅δ/x/.
    density :: d -> Double -> Double
    density d = exp . logDensity d
    -- | Inverse of the cumulative distribution function.  The value
    -- /x/ for which P(/X/≤/x/) = /p/.  If the probability is
    -- outside the [0,1] range, the function should call 'error'.
    quantile :: d -> Double -> Double
    -- | Natural logarithm of the density.
    logDensity :: d -> Double -> Double
    logDensity d = log . density d

-- | Type class for distributions with a mean.  'maybeMean' should
-- return 'Nothing' if the mean is undefined for the current parameter
-- values.
class Distribution d => MaybeMean d where
    maybeMean :: d -> Maybe Double

-- | Type class for distributions with a mean.  If a distribution has
-- a finite mean for all valid parameter values, it should be an
-- instance of this type class.
class MaybeMean d => Mean d where
    mean :: d -> Double

-- | Type class for distributions with variance.  If the variance is
-- undefined for some parameter values, both 'maybeVariance' and
-- 'maybeStdDev' should return Nothing.
--
-- Minimal complete definition is 'maybeVariance' or 'maybeStdDev'.
class MaybeMean d => MaybeVariance d where
    maybeVariance :: d -> Maybe Double
    maybeVariance d = (*) <$> x <*> x where x = maybeStdDev d
    maybeStdDev :: d -> Maybe Double
    maybeStdDev = fmap sqrt . maybeVariance

-- | Type class for distributions with variance.  If a distribution
-- has a finite variance for all valid parameter values, it should be
-- an instance of this type class.
--
-- Minimal complete definition is 'variance' or 'stdDev'.
class (Mean d, MaybeVariance d) => Variance d where
    variance :: d -> Double
    variance d = square (stdDev d)
    stdDev :: d -> Double
    stdDev = sqrt . variance
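-- As an editorial usage sketch: with @normalDistr 0 1@ (the standard
-- normal distribution from Statistics.Distribution.Normal, not
-- imported here), the classes above give:
--
-- > mean       (normalDistr 0 1)    ==> 0.0
-- > stdDev     (normalDistr 0 1)    ==> 1.0
-- > cumulative (normalDistr 0 1) 0  ==> 0.5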
--
-- Minimal complete definition is 'variance' or 'stdDev'
class (Mean d, MaybeVariance d) => Variance d where
    variance :: d -> Double
    variance d = square (stdDev d)
    stdDev   :: d -> Double
    stdDev   = sqrt . variance

-- | Type class for distributions with entropy, meaning Shannon entropy
-- in the case of a discrete distribution, or differential entropy in the
-- case of a continuous one.  'maybeEntropy' should return 'Nothing' if
-- entropy is undefined for the chosen parameter values.
class (Distribution d) => MaybeEntropy d where
  -- | Returns the entropy of a distribution, in nats, if such is defined.
  maybeEntropy :: d -> Maybe Double

-- | Type class for distributions with entropy, meaning Shannon
-- entropy in the case of a discrete distribution, or differential
-- entropy in the case of a continuous one. If the distribution has
-- well-defined entropy for all valid parameter values then it
-- should be an instance of this type class.
class (MaybeEntropy d) => Entropy d where
  -- | Returns the entropy of a distribution, in nats.
  entropy :: d -> Double

-- | Generate continuous random variates which have the given
-- distribution.
class Distribution d => ContGen d where
  genContVar :: PrimMonad m => d -> Gen (PrimState m) -> m Double

-- | Generate discrete random variates which have the given
-- distribution. 'ContGen' is a superclass because it's always possible
-- to generate real-valued variates from integer-valued ones.
class (DiscreteDistr d, ContGen d) => DiscreteGen d where
  genDiscreteVar :: PrimMonad m => d -> Gen (PrimState m) -> m Int

-- | Generate variates from a continuous distribution using the
-- inverse transform rule.
genContinous :: (ContDistr d, PrimMonad m) => d -> Gen (PrimState m) -> m Double
genContinous d gen = do
  x <- uniform gen
  return $! quantile d x

data P = P {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- | Approximate the value of /X/ for which P(/x/>/X/)=/p/.
--
-- This method uses a combination of Newton-Raphson iteration and
-- bisection with the given guess as a starting point.  The upper and
-- lower bounds specify the interval in which the probability
-- distribution reaches the value /p/.
findRoot :: ContDistr d =>
            d                   -- ^ Distribution
         -> Double              -- ^ Probability /p/
         -> Double              -- ^ Initial guess
         -> Double              -- ^ Lower bound on interval
         -> Double              -- ^ Upper bound on interval
         -> Double
findRoot d prob = loop 0 1
  where
    loop !(i::Int) !dx !x !lo !hi
      | abs dx <= accuracy || i >= maxIters = x
      | otherwise                           = loop (i+1) dx'' x'' lo' hi'
      where
        err                   = cumulative d x - prob
        P lo' hi' | err < 0   = P x hi
                  | otherwise = P lo x
        pdf                   = density d x
        P dx' x' | pdf /= 0   = P (err / pdf) (x - dx')
                 | otherwise  = P dx x
        P dx'' x''
            | x' < lo' || x' > hi' || pdf == 0 = let y = (lo' + hi') / 2 in P (y-x) y
            | otherwise                        = P dx' x'
    accuracy = 1e-15
    maxIters = 150

-- | Sum probabilities in an inclusive interval.
sumProbabilities :: DiscreteDistr d => d -> Int -> Int -> Double
sumProbabilities d low hi =
  -- Return value is forced to be less than 1 to guard against roundoff errors.
  -- ATTENTION! This check should be removed for testing or it could mask bugs.
  min 1 . sum .
U.map (probability d) $ U.enumFromTo low hi statistics-0.13.3.0/Statistics/Sample.hs0000644000000000000000000003565712724050603016242 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Sample -- Copyright : (c) 2008 Don Stewart, 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Commonly used sample statistics, also known as descriptive -- statistics. module Statistics.Sample ( -- * Types Sample , WeightedSample -- * Descriptive functions , range -- * Statistics of location , mean , welfordMean , meanWeighted , harmonicMean , geometricMean -- * Statistics of dispersion -- $variance -- ** Functions over central moments , centralMoment , centralMoments , skewness , kurtosis -- ** Two-pass functions (numerically robust) -- $robust , variance , varianceUnbiased , meanVariance , meanVarianceUnb , stdDev , varianceWeighted -- ** Single-pass functions (faster, less safe) -- $cancellation , fastVariance , fastVarianceUnbiased , fastStdDev -- * Joint distirbutions , covariance , correlation , pair -- * References -- $references ) where import Statistics.Function (minMax) import Statistics.Sample.Internal (robustSumVar, sum) import Statistics.Types (Sample,WeightedSample) import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- Operator ^ will be overriden import Prelude hiding ((^), sum) -- | /O(n)/ Range. The difference between the largest and smallest -- elements of a sample. range :: (G.Vector v Double) => v Double -> Double range s = hi - lo where (lo , hi) = minMax s {-# INLINE range #-} -- | /O(n)/ Arithmetic mean. This uses Kahan-Babuška-Neumaier -- summation, so is more accurate than 'welfordMean' unless the input -- values are very large. mean :: (G.Vector v Double) => v Double -> Double mean xs = sum xs / fromIntegral (G.length xs) {-# SPECIALIZE mean :: U.Vector Double -> Double #-} {-# SPECIALIZE mean :: V.Vector Double -> Double #-} -- | /O(n)/ Arithmetic mean. This uses Welford's algorithm to provide -- numerical stability, using a single pass over the sample data. -- -- Compared to 'mean', this loses a surprising amount of precision -- unless the inputs are very large. welfordMean :: (G.Vector v Double) => v Double -> Double welfordMean = fini . G.foldl' go (T 0 0) where fini (T a _) = a go (T m n) x = T m' n' where m' = m + (x - m) / fromIntegral n' n' = n + 1 {-# SPECIALIZE welfordMean :: U.Vector Double -> Double #-} {-# SPECIALIZE welfordMean :: V.Vector Double -> Double #-} -- | /O(n)/ Arithmetic mean for weighted sample. It uses a single-pass -- algorithm analogous to the one used by 'welfordMean'. meanWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> Double meanWeighted = fini . G.foldl' go (V 0 0) where fini (V a _) = a go (V m w) (x,xw) = V m' w' where m' | w' == 0 = 0 | otherwise = m + xw * (x - m) / w' w' = w + xw {-# INLINE meanWeighted #-} -- | /O(n)/ Harmonic mean. This algorithm performs a single pass over -- the sample. harmonicMean :: (G.Vector v Double) => v Double -> Double harmonicMean = fini . G.foldl' go (T 0 0) where fini (T b a) = fromIntegral a / b go (T x y) n = T (x + (1/n)) (y+1) {-# INLINE harmonicMean #-} -- | /O(n)/ Geometric mean of a sample containing no negative values. geometricMean :: (G.Vector v Double) => v Double -> Double geometricMean = exp . mean . G.map log {-# INLINE geometricMean #-} -- | Compute the /k/th central moment of a sample. 
The central moment
-- is also known as the moment about the mean.
--
-- This function performs two passes over the sample, so is not subject
-- to stream fusion.
--
-- For samples containing many values very close to the mean, this
-- function is subject to inaccuracy due to catastrophic cancellation.
centralMoment :: (G.Vector v Double) => Int -> v Double -> Double
centralMoment a xs
  | a < 0     = error "Statistics.Sample.centralMoment: negative input"
  | a == 0    = 1
  | a == 1    = 0
  | otherwise = sum (G.map go xs) / fromIntegral (G.length xs)
  where
    go x = (x-m) ^ a
    m    = mean xs
{-# SPECIALIZE centralMoment :: Int -> U.Vector Double -> Double #-}
{-# SPECIALIZE centralMoment :: Int -> V.Vector Double -> Double #-}

-- | Compute the /k/th and /j/th central moments of a sample.
--
-- This function performs two passes over the sample, so is not subject
-- to stream fusion.
--
-- For samples containing many values very close to the mean, this
-- function is subject to inaccuracy due to catastrophic cancellation.
centralMoments :: (G.Vector v Double) => Int -> Int -> v Double -> (Double, Double)
centralMoments a b xs
  | a < 2 || b < 2 = (centralMoment a xs , centralMoment b xs)
  | otherwise      = fini . G.foldl' go (V 0 0) $ xs
  where go (V i j) x = V (i + d^a) (j + d^b)
          where d = x - m
        fini (V i j) = (i / n , j / n)
        m = mean xs
        n = fromIntegral (G.length xs)
{-# SPECIALIZE centralMoments :: Int -> Int -> U.Vector Double -> (Double, Double) #-}
{-# SPECIALIZE centralMoments :: Int -> Int -> V.Vector Double -> (Double, Double) #-}

-- | Compute the skewness of a sample. This is a measure of the
-- asymmetry of its distribution.
--
-- A sample with negative skew is said to be /left-skewed/. Most of
-- its mass is on the right of the distribution, with the tail on the
-- left.
--
-- > skewness $ U.fromList [1,100,101,102,103]
-- > ==> -1.497681449918257
--
-- A sample with positive skew is said to be /right-skewed/.
--
-- > skewness $ U.fromList [1,2,3,4,100]
-- > ==> 1.4975367033335198
--
-- A sample's skewness is not defined if its 'variance' is zero.
--
-- This function performs two passes over the sample, so is not subject
-- to stream fusion.
--
-- For samples containing many values very close to the mean, this
-- function is subject to inaccuracy due to catastrophic cancellation.
skewness :: (G.Vector v Double) => v Double -> Double
skewness xs = c3 * c2 ** (-1.5)
  where (c3 , c2) = centralMoments 3 2 xs
{-# SPECIALIZE skewness :: U.Vector Double -> Double #-}
{-# SPECIALIZE skewness :: V.Vector Double -> Double #-}

-- | Compute the excess kurtosis of a sample.  This is a measure of
-- the \"peakedness\" of its distribution.  A high kurtosis indicates
-- that more of the sample's variance is due to infrequent severe
-- deviations, rather than more frequent modest deviations.
--
-- A sample's excess kurtosis is not defined if its 'variance' is
-- zero.
--
-- This function performs two passes over the sample, so is not subject
-- to stream fusion.
--
-- For samples containing many values very close to the mean, this
-- function is subject to inaccuracy due to catastrophic cancellation.
kurtosis :: (G.Vector v Double) => v Double -> Double
kurtosis xs = c4 / (c2 * c2) - 3
  where (c4 , c2) = centralMoments 4 2 xs
{-# SPECIALIZE kurtosis :: U.Vector Double -> Double #-}
{-# SPECIALIZE kurtosis :: V.Vector Double -> Double #-}

-- $variance
--
-- The variance—and hence the standard deviation—of a
-- sample of fewer than two elements are both defined to be zero.
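--
-- A small worked sketch of the two estimators (here @U@ is
-- "Data.Vector.Unboxed"); the only difference is the denominator,
-- /n/ versus /n/-1:
--
-- > variance         (U.fromList [1,2,3,4]) == 1.25
-- > varianceUnbiased (U.fromList [1,2,3,4]) == 1.6666666666666667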
-- $robust -- -- These functions use the compensated summation algorithm of Chan et -- al. for numerical robustness, but require two passes over the -- sample data as a result. -- -- Because of the need for two passes, these functions are /not/ -- subject to stream fusion. data V = V {-# UNPACK #-} !Double {-# UNPACK #-} !Double -- | Maximum likelihood estimate of a sample's variance. Also known -- as the population variance, where the denominator is /n/. variance :: (G.Vector v Double) => v Double -> Double variance samp | n > 1 = robustSumVar (mean samp) samp / fromIntegral n | otherwise = 0 where n = G.length samp {-# SPECIALIZE variance :: U.Vector Double -> Double #-} {-# SPECIALIZE variance :: V.Vector Double -> Double #-} -- | Unbiased estimate of a sample's variance. Also known as the -- sample variance, where the denominator is /n/-1. varianceUnbiased :: (G.Vector v Double) => v Double -> Double varianceUnbiased samp | n > 1 = robustSumVar (mean samp) samp / fromIntegral (n-1) | otherwise = 0 where n = G.length samp {-# SPECIALIZE varianceUnbiased :: U.Vector Double -> Double #-} {-# SPECIALIZE varianceUnbiased :: V.Vector Double -> Double #-} -- | Calculate mean and maximum likelihood estimate of variance. This -- function should be used if both mean and variance are required -- since it will calculate mean only once. meanVariance :: (G.Vector v Double) => v Double -> (Double,Double) meanVariance samp | n > 1 = (m, robustSumVar m samp / fromIntegral n) | otherwise = (m, 0) where n = G.length samp m = mean samp {-# SPECIALIZE meanVariance :: U.Vector Double -> (Double,Double) #-} {-# SPECIALIZE meanVariance :: V.Vector Double -> (Double,Double) #-} -- | Calculate mean and unbiased estimate of variance. This -- function should be used if both mean and variance are required -- since it will calculate mean only once. meanVarianceUnb :: (G.Vector v Double) => v Double -> (Double,Double) meanVarianceUnb samp | n > 1 = (m, robustSumVar m samp / fromIntegral (n-1)) | otherwise = (m, 0) where n = G.length samp m = mean samp {-# SPECIALIZE meanVarianceUnb :: U.Vector Double -> (Double,Double) #-} {-# SPECIALIZE meanVarianceUnb :: V.Vector Double -> (Double,Double) #-} -- | Standard deviation. This is simply the square root of the -- unbiased estimate of the variance. stdDev :: (G.Vector v Double) => v Double -> Double stdDev = sqrt . varianceUnbiased {-# SPECIALIZE stdDev :: U.Vector Double -> Double #-} {-# SPECIALIZE stdDev :: V.Vector Double -> Double #-} robustSumVarWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> V robustSumVarWeighted samp = G.foldl' go (V 0 0) samp where go (V s w) (x,xw) = V (s + xw*d*d) (w + xw) where d = x - m m = meanWeighted samp {-# INLINE robustSumVarWeighted #-} -- | Weighted variance. This is biased estimation. varianceWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> Double varianceWeighted samp | G.length samp > 1 = fini $ robustSumVarWeighted samp | otherwise = 0 where fini (V s w) = s / w {-# SPECIALIZE varianceWeighted :: U.Vector (Double,Double) -> Double #-} {-# SPECIALIZE varianceWeighted :: V.Vector (Double,Double) -> Double #-} -- $cancellation -- -- The functions prefixed with the name @fast@ below perform a single -- pass over the sample data using Knuth's algorithm. They usually -- work well, but see below for caveats. These functions are subject -- to array fusion. 
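--
-- On well-conditioned data the single-pass and two-pass estimators
-- agree exactly; a tiny sketch:
--
-- > fastVariance (U.fromList [1,2,3,4]) == variance (U.fromList [1,2,3,4])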
-- -- /Note/: in cases where most sample data is close to the sample's -- mean, Knuth's algorithm gives inaccurate results due to -- catastrophic cancellation. fastVar :: (G.Vector v Double) => v Double -> T1 fastVar = G.foldl' go (T1 0 0 0) where go (T1 n m s) x = T1 n' m' s' where n' = n + 1 m' = m + d / fromIntegral n' s' = s + d * (x - m') d = x - m -- | Maximum likelihood estimate of a sample's variance. fastVariance :: (G.Vector v Double) => v Double -> Double fastVariance = fini . fastVar where fini (T1 n _m s) | n > 1 = s / fromIntegral n | otherwise = 0 {-# INLINE fastVariance #-} -- | Unbiased estimate of a sample's variance. fastVarianceUnbiased :: (G.Vector v Double) => v Double -> Double fastVarianceUnbiased = fini . fastVar where fini (T1 n _m s) | n > 1 = s / fromIntegral (n - 1) | otherwise = 0 {-# INLINE fastVarianceUnbiased #-} -- | Standard deviation. This is simply the square root of the -- maximum likelihood estimate of the variance. fastStdDev :: (G.Vector v Double) => v Double -> Double fastStdDev = sqrt . fastVariance {-# INLINE fastStdDev #-} -- | Covariance of sample of pairs. For empty sample it's set to -- zero covariance :: (G.Vector v (Double,Double), G.Vector v Double) => v (Double,Double) -> Double covariance xy | n == 0 = 0 | otherwise = mean $ G.zipWith (*) (G.map (\x -> x - muX) xs) (G.map (\y -> y - muY) ys) where n = G.length xy (xs,ys) = G.unzip xy muX = mean xs muY = mean ys {-# SPECIALIZE covariance :: U.Vector (Double,Double) -> Double #-} {-# SPECIALIZE covariance :: V.Vector (Double,Double) -> Double #-} -- | Correlation coefficient for sample of pairs. Also known as -- Pearson's correlation. For empty sample it's set to zero. correlation :: (G.Vector v (Double,Double), G.Vector v Double) => v (Double,Double) -> Double correlation xy | n == 0 = 0 | otherwise = cov / sqrt (varX * varY) where n = G.length xy (xs,ys) = G.unzip xy (muX,varX) = meanVariance xs (muY,varY) = meanVariance ys cov = mean $ G.zipWith (*) (G.map (\x -> x - muX) xs) (G.map (\y -> y - muY) ys) {-# SPECIALIZE correlation :: U.Vector (Double,Double) -> Double #-} {-# SPECIALIZE correlation :: V.Vector (Double,Double) -> Double #-} -- | Pair two samples. It's like 'G.zip' but requires that both -- samples have equal size. pair :: (G.Vector v a, G.Vector v b, G.Vector v (a,b)) => v a -> v b -> v (a,b) pair va vb | G.length va == G.length vb = G.zip va vb | otherwise = error "Statistics.Sample.pair: vector must have same length" {-# INLINE pair #-} ------------------------------------------------------------------------ -- Helper code. Monomorphic unpacked accumulators. -- (^) operator from Prelude is just slow. (^) :: Double -> Int -> Double x ^ 1 = x x ^ n = x * (x ^ (n-1)) {-# INLINE (^) #-} -- don't support polymorphism, as we can't get unboxed returns if we use it. data T = T {-# UNPACK #-}!Double {-# UNPACK #-}!Int data T1 = T1 {-# UNPACK #-}!Int {-# UNPACK #-}!Double {-# UNPACK #-}!Double {- Consider this core: with data T a = T !a !Int $wfold :: Double# -> Int# -> Int# -> (# Double, Int# #) and without, $wfold :: Double# -> Int# -> Int# -> (# Double#, Int# #) yielding to boxed returns and heap checks. -} -- $references -- -- * Chan, T. F.; Golub, G.H.; LeVeque, R.J. (1979) Updating formulae -- and a pairwise algorithm for computing sample -- variances. Technical Report STAN-CS-79-773, Department of -- Computer Science, Stanford -- University. -- -- * Knuth, D.E. (1998) The art of computer programming, volume 2: -- seminumerical algorithms, 3rd ed., p. 232. 
-- -- * Welford, B.P. (1962) Note on a method for calculating corrected -- sums of squares and products. /Technometrics/ -- 4(3):419–420. -- -- * West, D.H.D. (1979) Updating mean and variance estimates: an -- improved method. /Communications of the ACM/ -- 22(9):532–535. statistics-0.13.3.0/Statistics/Function/0000755000000000000000000000000012724050603016232 5ustar0000000000000000statistics-0.13.3.0/Statistics/Function/Comparison.hs0000644000000000000000000000250112724050603020676 0ustar0000000000000000-- | -- Module : Statistics.Function.Comparison -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Approximate floating point comparison, based on Bruce Dawson's -- \"Comparing floating point numbers\": -- module Statistics.Function.Comparison ( within ) where import Control.Monad.ST (runST) import Data.Primitive.ByteArray (newByteArray, readByteArray, writeByteArray) import Data.Word (Word64) -- | Compare two 'Double' values for approximate equality, using -- Dawson's method. -- -- The required accuracy is specified in ULPs (units of least -- precision). If the two numbers differ by the given number of ULPs -- or less, this function returns @True@. within :: Int -- ^ Number of ULPs of accuracy desired. -> Double -> Double -> Bool within ulps a b = runST $ do buf <- newByteArray 8 ai0 <- writeByteArray buf 0 a >> readByteArray buf 0 bi0 <- writeByteArray buf 0 b >> readByteArray buf 0 let big = 0x8000000000000000 :: Word64 ai | ai0 < 0 = big - ai0 | otherwise = ai0 bi | bi0 < 0 = big - bi0 | otherwise = bi0 return $ abs (ai - bi) <= fromIntegral ulps statistics-0.13.3.0/Statistics/Correlation/0000755000000000000000000000000012724050603016726 5ustar0000000000000000statistics-0.13.3.0/Statistics/Correlation/Kendall.hs0000644000000000000000000001225612724050603020642 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, FlexibleContexts #-} -- | -- Module : Statistics.Correlation.Kendall -- -- Fast O(NlogN) implementation of -- . -- -- This module implementes Kendall's tau form b which allows ties in the data. -- This is the same formula used by other statistical packages, e.g., R, matlab. -- -- > \tau = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}} -- -- where n_0 = n(n-1)\/2, n_1 = number of pairs tied for the first quantify, -- n_2 = number of pairs tied for the second quantify, -- n_c = number of concordant pairs$, n_d = number of discordant pairs. module Statistics.Correlation.Kendall ( kendall -- * References -- $references ) where import Control.Monad.ST (ST, runST) import Data.Bits (shiftR) import Data.Function (on) import Data.STRef import qualified Data.Vector.Algorithms.Intro as I import qualified Data.Vector.Generic as G import qualified Data.Vector.Generic.Mutable as GM -- | /O(nlogn)/ Compute the Kendall's tau from a vector of paired data. -- Return NaN when number of pairs <= 1. 
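--
-- A usage sketch (assuming "Data.Vector.Unboxed" is imported as @U@):
--
-- > kendall (U.fromList [(1,1),(2,2),(3,3)]) ==  1.0
-- > kendall (U.fromList [(1,3),(2,2),(3,1)]) == -1.0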
kendall :: (Ord a, Ord b, G.Vector v (a, b)) => v (a, b) -> Double kendall xy' | G.length xy' <= 1 = 0/0 | otherwise = runST $ do xy <- G.thaw xy' let n = GM.length xy n_dRef <- newSTRef 0 I.sort xy tieX <- numOfTiesBy ((==) `on` fst) xy tieXY <- numOfTiesBy (==) xy tmp <- GM.new n mergeSort (compare `on` snd) xy tmp n_dRef tieY <- numOfTiesBy ((==) `on` snd) xy n_d <- readSTRef n_dRef let n_0 = (fromIntegral n * (fromIntegral n-1)) `shiftR` 1 :: Integer n_c = n_0 - n_d - tieX - tieY + tieXY return $ fromIntegral (n_c - n_d) / (sqrt.fromIntegral) ((n_0 - tieX) * (n_0 - tieY)) {-# INLINE kendall #-} -- calculate number of tied pairs in a sorted vector numOfTiesBy :: GM.MVector v a => (a -> a -> Bool) -> v s a -> ST s Integer numOfTiesBy f xs = do count <- newSTRef (0::Integer) loop count (1::Int) (0::Int) readSTRef count where n = GM.length xs loop c !acc !i | i >= n - 1 = modifySTRef' c (+ g acc) | otherwise = do x1 <- GM.unsafeRead xs i x2 <- GM.unsafeRead xs (i+1) if f x1 x2 then loop c (acc+1) (i+1) else modifySTRef' c (+ g acc) >> loop c 1 (i+1) g x = fromIntegral ((x * (x - 1)) `shiftR` 1) {-# INLINE numOfTiesBy #-} -- Implementation of Knight's merge sort (adapted from vector-algorithm). This -- function is used to count the number of discordant pairs. mergeSort :: GM.MVector v e => (e -> e -> Ordering) -> v s e -> v s e -> STRef s Integer -> ST s () mergeSort cmp src buf count = loop 0 (GM.length src - 1) where loop l u | u == l = return () | u - l == 1 = do eL <- GM.unsafeRead src l eU <- GM.unsafeRead src u case cmp eL eU of GT -> do GM.unsafeWrite src l eU GM.unsafeWrite src u eL modifySTRef' count (+1) _ -> return () | otherwise = do let mid = (u + l) `shiftR` 1 loop l mid loop mid u merge cmp (GM.unsafeSlice l (u-l+1) src) buf (mid - l) count {-# INLINE mergeSort #-} merge :: GM.MVector v e => (e -> e -> Ordering) -> v s e -> v s e -> Int -> STRef s Integer -> ST s () merge cmp src buf mid count = do GM.unsafeCopy tmp lower eTmp <- GM.unsafeRead tmp 0 eUpp <- GM.unsafeRead upper 0 loop tmp 0 eTmp upper 0 eUpp 0 where lower = GM.unsafeSlice 0 mid src upper = GM.unsafeSlice mid (GM.length src - mid) src tmp = GM.unsafeSlice 0 mid buf wroteHigh low iLow eLow high iHigh iIns | iHigh >= GM.length high = GM.unsafeCopy (GM.unsafeSlice iIns (GM.length low - iLow) src) (GM.unsafeSlice iLow (GM.length low - iLow) low) | otherwise = do eHigh <- GM.unsafeRead high iHigh loop low iLow eLow high iHigh eHigh iIns wroteLow low iLow high iHigh eHigh iIns | iLow >= GM.length low = return () | otherwise = do eLow <- GM.unsafeRead low iLow loop low iLow eLow high iHigh eHigh iIns loop !low !iLow !eLow !high !iHigh !eHigh !iIns = case cmp eHigh eLow of LT -> do GM.unsafeWrite src iIns eHigh modifySTRef' count (+ fromIntegral (GM.length low - iLow)) wroteHigh low iLow eLow high (iHigh+1) (iIns+1) _ -> do GM.unsafeWrite src iIns eLow wroteLow low (iLow+1) high iHigh eHigh (iIns+1) {-# INLINE merge #-} #if !MIN_VERSION_base(4,6,0) modifySTRef' :: STRef s a -> (a -> a) -> ST s () modifySTRef' = modifySTRef #endif -- $references -- -- * William R. Knight. (1966) A computer method for calculating Kendall's Tau -- with ungrouped data. /Journal of the American Statistical Association/, -- Vol. 61, No. 314, Part 1, pp. 436-439. 
-- statistics-0.13.3.0/Statistics/Math/0000755000000000000000000000000012724050603015336 5ustar0000000000000000statistics-0.13.3.0/Statistics/Math/RootFinding.hs0000644000000000000000000001110312724050603020110 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Math.RootFinding -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Haskell functions for finding the roots of mathematical functions. module Statistics.Math.RootFinding ( Root(..) , fromRoot , ridders -- * References -- $references ) where import Data.Aeson (FromJSON, ToJSON) import Control.Applicative (Alternative(..), Applicative(..)) import Control.Monad (MonadPlus(..), ap) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Binary.Get (getWord8) import Data.Binary.Put (putWord8) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Statistics.Function.Comparison (within) -- | The result of searching for a root of a mathematical function. data Root a = NotBracketed -- ^ The function does not have opposite signs when -- evaluated at the lower and upper bounds of the search. | SearchFailed -- ^ The search failed to converge to within the given -- error tolerance after the given number of iterations. | Root a -- ^ A root was successfully found. deriving (Eq, Read, Show, Typeable, Data, Generic) instance (FromJSON a) => FromJSON (Root a) instance (ToJSON a) => ToJSON (Root a) instance (Binary a) => Binary (Root a) where put NotBracketed = putWord8 0 put SearchFailed = putWord8 1 put (Root a) = putWord8 2 >> put a get = do i <- getWord8 case i of 0 -> return NotBracketed 1 -> return SearchFailed 2 -> fmap Root get _ -> fail $ "Root.get: Invalid value: " ++ show i instance Functor Root where fmap _ NotBracketed = NotBracketed fmap _ SearchFailed = SearchFailed fmap f (Root a) = Root (f a) instance Monad Root where NotBracketed >>= _ = NotBracketed SearchFailed >>= _ = SearchFailed Root a >>= m = m a return = Root instance MonadPlus Root where mzero = SearchFailed r@(Root _) `mplus` _ = r _ `mplus` p = p instance Applicative Root where pure = Root (<*>) = ap instance Alternative Root where empty = SearchFailed r@(Root _) <|> _ = r _ <|> p = p -- | Returns either the result of a search for a root, or the default -- value if the search failed. fromRoot :: a -- ^ Default value. -> Root a -- ^ Result of search for a root. -> a fromRoot _ (Root a) = a fromRoot a _ = a -- | Use the method of Ridders to compute a root of a function. -- -- The function must have opposite signs when evaluated at the lower -- and upper bounds of the search (i.e. the root must be bracketed). ridders :: Double -- ^ Absolute error tolerance. -> (Double,Double) -- ^ Lower and upper bounds for the search. -> (Double -> Double) -- ^ Function to find the roots of. -> Root Double ridders tol (lo,hi) f | flo == 0 = Root lo | fhi == 0 = Root hi | flo*fhi > 0 = NotBracketed -- root is not bracketed | otherwise = go lo flo hi fhi 0 where go !a !fa !b !fb !i -- Root is bracketed within 1 ulp. No improvement could be made | within 1 a b = Root a -- Root is found. Check that f(m) == 0 is nessesary to ensure -- that root is never passed to 'go' | fm == 0 = Root m | fn == 0 = Root n | d < tol = Root n -- Too many iterations performed. Fail | i >= (100 :: Int) = SearchFailed -- Ridder's approximation coincide with one of old -- bounds. 
Revert to bisection | n == a || n == b = case () of _| fm*fa < 0 -> go a fa m fm (i+1) | otherwise -> go m fm b fb (i+1) -- Proceed as usual | fn*fm < 0 = go n fn m fm (i+1) | fn*fa < 0 = go a fa n fn (i+1) | otherwise = go n fn b fb (i+1) where d = abs (b - a) dm = (b - a) * 0.5 !m = a + dm !fm = f m !dn = signum (fb - fa) * dm * fm / sqrt(fm*fm - fa*fb) !n = m - signum dn * min (abs dn) (abs dm - 0.5 * tol) !fn = f n !flo = f lo !fhi = f hi -- $references -- -- * Ridders, C.F.J. (1979) A new algorithm for computing a single -- root of a real continuous function. -- /IEEE Transactions on Circuits and Systems/ 26:979–980. statistics-0.13.3.0/Statistics/Resampling/0000755000000000000000000000000012724050603016546 5ustar0000000000000000statistics-0.13.3.0/Statistics/Resampling/Bootstrap.hs0000644000000000000000000001117212724050603021061 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, OverloadedStrings, RecordWildCards #-} -- | -- Module : Statistics.Resampling.Bootstrap -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The bootstrap method for statistical inference. module Statistics.Resampling.Bootstrap ( Estimate(..) , bootstrapBCA , scale -- * References -- $references ) where import Control.Applicative ((<$>), (<*>)) import Control.DeepSeq (NFData) import Control.Exception (assert) import Control.Monad.Par (parMap, runPar) import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Data (Data) import Data.Typeable (Typeable) import Data.Vector.Unboxed ((!)) import GHC.Generics (Generic) import Statistics.Distribution (cumulative, quantile) import Statistics.Distribution.Normal import Statistics.Resampling (Resample(..), jackknife) import Statistics.Sample (mean) import Statistics.Types (Estimator, Sample) import qualified Data.Vector.Unboxed as U import qualified Statistics.Resampling as R -- | A point and interval estimate computed via an 'Estimator'. data Estimate = Estimate { estPoint :: {-# UNPACK #-} !Double -- ^ Point estimate. , estLowerBound :: {-# UNPACK #-} !Double -- ^ Lower bound of the estimate interval (i.e. the lower bound of -- the confidence interval). , estUpperBound :: {-# UNPACK #-} !Double -- ^ Upper bound of the estimate interval (i.e. the upper bound of -- the confidence interval). , estConfidenceLevel :: {-# UNPACK #-} !Double -- ^ Confidence level of the confidence intervals. } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON Estimate instance ToJSON Estimate instance Binary Estimate where put (Estimate w x y z) = put w >> put x >> put y >> put z get = Estimate <$> get <*> get <*> get <*> get instance NFData Estimate -- | Multiply the point, lower bound, and upper bound in an 'Estimate' -- by the given value. scale :: Double -- ^ Value to multiply by. -> Estimate -> Estimate scale f e@Estimate{..} = e { estPoint = f * estPoint , estLowerBound = f * estLowerBound , estUpperBound = f * estUpperBound } estimate :: Double -> Double -> Double -> Double -> Estimate estimate pt lb ub cl = assert (lb <= ub) . assert (cl > 0 && cl < 1) $ Estimate { estPoint = pt , estLowerBound = lb , estUpperBound = ub , estConfidenceLevel = cl } data T = {-# UNPACK #-} !Double :< {-# UNPACK #-} !Double infixl 2 :< -- | Bias-corrected accelerated (BCA) bootstrap. This adjusts for both -- bias and skewness in the resampled distribution. 
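--
-- A sketch of intended use (the generator, resample count, and the
-- @sample@ binding are illustrative assumptions; 'resample' lives in
-- "Statistics.Resampling" and a 'GenIO' can be obtained from
-- @mwc-random@):
--
-- > gen       <- createSystemRandom
-- > resamples <- resample gen [Mean] 10000 sample
-- > let [est] = bootstrapBCA 0.95 sample [Mean] resamples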
bootstrapBCA :: Double          -- ^ Confidence level
             -> Sample          -- ^ Sample data
             -> [Estimator]     -- ^ Estimators
             -> [Resample]      -- ^ Resampled data
             -> [Estimate]
bootstrapBCA confidenceLevel sample estimators resamples
  | confidenceLevel > 0 && confidenceLevel < 1
      = runPar $ parMap (uncurry e) (zip estimators resamples)
  | otherwise = error "Statistics.Resampling.Bootstrap.bootstrapBCA: confidence level outside (0,1) range"
  where
    e est (Resample resample)
      | U.length sample == 1 || isInfinite bias =
          estimate pt pt pt confidenceLevel
      | otherwise =
          estimate pt (resample ! lo) (resample ! hi) confidenceLevel
      where
        pt    = R.estimate est sample
        lo    = max (cumn a1) 0
          where a1 = bias + b1 / (1 - accel * b1)
                b1 = bias + z1
        hi    = min (cumn a2) (ni - 1)
          where a2 = bias + b2 / (1 - accel * b2)
                b2 = bias - z1
        z1    = quantile standard ((1 - confidenceLevel) / 2)
        cumn  = round . (*n) . cumulative standard
        bias  = quantile standard (probN / n)
          where probN = fromIntegral . U.length . U.filter (<pt) $ resample
        ni    = U.length resample
        n     = fromIntegral ni
        accel = sumCubes / (6 * (sumSquares ** 1.5))
          where (sumSquares :< sumCubes) = U.foldl' f (0 :< 0) jack
                f (s :< c) j = s + d2 :< c + d2 * d
                  where d  = jackMean - j
                        d2 = d * d
                jackMean = mean jack
        jack  = jackknife est sample

-- $references
--
-- * Davison, A.C; Hinkley, D.V. (1997) Bootstrap methods and their
--   application.
statistics-0.13.3.0/Statistics/Distribution/0000755000000000000000000000000012724050603017124 5ustar0000000000000000statistics-0.13.3.0/Statistics/Distribution/Beta.hs0000644000000000000000000000654112724050603020341 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
-----------------------------------------------------------------------------
-- |
-- Module      :  Statistics.Distribution.Beta
-- Copyright   :  (C) 2012 Edward Kmett,
-- License     :  BSD-style (see the file LICENSE)
--
-- Maintainer  :  Edward Kmett
-- Stability   :  provisional
-- Portability :  DeriveDataTypeable
--
----------------------------------------------------------------------------
module Statistics.Distribution.Beta
  ( BetaDistribution
    -- * Constructor
  , betaDistr
  , improperBetaDistr
    -- * Accessors
  , bdAlpha
  , bdBeta
  ) where

import Data.Aeson (FromJSON, ToJSON)
import Data.Binary (Binary)
import Data.Data (Data, Typeable)
import GHC.Generics (Generic)
import Numeric.SpecFunctions (incompleteBeta, invIncompleteBeta, logBeta, digamma)
import Numeric.MathFunctions.Constants (m_NaN)
import qualified Statistics.Distribution as D
import Data.Binary (put, get)
import Control.Applicative ((<$>), (<*>))

-- | The beta distribution
data BetaDistribution = BD
 { bdAlpha :: {-# UNPACK #-} !Double
   -- ^ Alpha shape parameter
 , bdBeta  :: {-# UNPACK #-} !Double
   -- ^ Beta shape parameter
 } deriving (Eq, Read, Show, Typeable, Data, Generic)

instance FromJSON BetaDistribution
instance ToJSON BetaDistribution

instance Binary BetaDistribution where
  put (BD x y) = put x >> put y
  get = BD <$> get <*> get

-- | Create beta distribution. Both shape parameters must be positive.
betaDistr :: Double             -- ^ Shape parameter alpha
          -> Double             -- ^ Shape parameter beta
          -> BetaDistribution
betaDistr a b
  | a > 0 && b > 0 = improperBetaDistr a b
  | otherwise      =
      error $  "Statistics.Distribution.Beta.betaDistr: "
            ++ "shape parameters must be positive. Got a = "
            ++ show a
            ++ " b = "
            ++ show b

-- | Create beta distribution. This constructor doesn't check parameters.
improperBetaDistr :: Double     -- ^ Shape parameter alpha
                  -> Double     -- ^ Shape parameter beta
                  -> BetaDistribution
improperBetaDistr = BD

instance D.Distribution BetaDistribution where
  cumulative (BD a b) x
    | x <= 0    = 0
    | x >= 1    = 1
    | otherwise = incompleteBeta a b x

instance D.Mean BetaDistribution where
  mean (BD a b) = a / (a + b)

instance D.MaybeMean BetaDistribution where
  maybeMean = Just .
D.mean

instance D.Variance BetaDistribution where
  variance (BD a b) = a*b / (apb*apb*(apb+1))
    where apb = a + b

instance D.MaybeVariance BetaDistribution where
  maybeVariance = Just . D.variance

instance D.Entropy BetaDistribution where
  entropy (BD a b) =
    logBeta a b
    - (a-1) * digamma a
    - (b-1) * digamma b
    + (a+b-2) * digamma (a+b)

instance D.MaybeEntropy BetaDistribution where
  maybeEntropy = Just . D.entropy

instance D.ContDistr BetaDistribution where
  density (BD a b) x
    | a <= 0 || b <= 0 = m_NaN
    | x <= 0    = 0
    | x >= 1    = 0
    | otherwise = exp $ (a-1)*log x + (b-1)*log (1-x) - logBeta a b
  quantile (BD a b) p
    | p == 0         = 0
    | p == 1         = 1
    | p > 0 && p < 1 = invIncompleteBeta a b p
    | otherwise      =
        error $ "Statistics.Distribution.Beta.quantile: p must be in [0,1] range. Got: "++show p

instance D.ContGen BetaDistribution where
  genContVar = D.genContinous
statistics-0.13.3.0/Statistics/Distribution/FDistribution.hs0000644000000000000000000000647512724050603022255 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
-- |
-- Module    : Statistics.Distribution.FDistribution
-- Copyright : (c) 2011 Aleksey Khudyakov
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Fisher F distribution
module Statistics.Distribution.FDistribution (
    FDistribution
  , fDistribution
  , fDistributionNDF1
  , fDistributionNDF2
  ) where

import Data.Aeson (FromJSON, ToJSON)
import Data.Binary (Binary)
import Data.Data (Data, Typeable)
import Numeric.MathFunctions.Constants (m_neg_inf)
import GHC.Generics (Generic)
import qualified Statistics.Distribution as D
import Statistics.Function (square)
import Numeric.SpecFunctions (logBeta, incompleteBeta, invIncompleteBeta, digamma)
import Data.Binary (put, get)
import Control.Applicative ((<$>), (<*>))

-- | F distribution
data FDistribution = F { fDistributionNDF1 :: {-# UNPACK #-} !Double
                       , fDistributionNDF2 :: {-# UNPACK #-} !Double
                       , _pdfFactor        :: {-# UNPACK #-} !Double
                       }
                   deriving (Eq, Show, Read, Typeable, Data, Generic)

instance FromJSON FDistribution
instance ToJSON FDistribution

instance Binary FDistribution where
  get = F <$> get <*> get <*> get
  put (F x y z) = put x >> put y >> put z

fDistribution :: Int -> Int -> FDistribution
fDistribution n m
  | n > 0 && m > 0 =
    let n' = fromIntegral n
        m' = fromIntegral m
        f' = 0.5 * (log m' * m' + log n' * n') - logBeta (0.5*n') (0.5*m')
    in F n' m' f'
  | otherwise =
    error "Statistics.Distribution.FDistribution.fDistribution: non-positive number of degrees of freedom"

instance D.Distribution FDistribution where
  cumulative = cumulative

instance D.ContDistr FDistribution where
  density d x
    | x <= 0    = 0
    | otherwise = exp $ logDensity d x
  logDensity d x
    | x <= 0    = m_neg_inf
    | otherwise = logDensity d x
  quantile = quantile

cumulative :: FDistribution -> Double -> Double
cumulative (F n m _) x
  | x <= 0       = 0
  | isInfinite x = 1            -- Only matches +∞
  | otherwise    = let y = n*x in incompleteBeta (0.5 * n) (0.5 * m) (y / (m + y))

logDensity :: FDistribution -> Double -> Double
logDensity (F n m fac) x
  = fac + log x * (0.5 * n - 1) - log(m + n*x) * 0.5 * (n + m)

quantile :: FDistribution -> Double -> Double
quantile (F n m _) p
  | p >= 0 && p <= 1 =
    let x = invIncompleteBeta (0.5 * n) (0.5 * m) p
    in m * x / (n * (1 - x))
  | otherwise =
    error $ "Statistics.Distribution.FDistribution.quantile: p must be in [0,1] range.
Got: "++show p instance D.MaybeMean FDistribution where maybeMean (F _ m _) | m > 2 = Just $ m / (m - 2) | otherwise = Nothing instance D.MaybeVariance FDistribution where maybeStdDev (F n m _) | m > 4 = Just $ 2 * square m * (m + n - 2) / (n * square (m - 2) * (m - 4)) | otherwise = Nothing instance D.Entropy FDistribution where entropy (F n m _) = let nHalf = 0.5 * n mHalf = 0.5 * m in log (n/m) + logBeta nHalf mHalf + (1 - nHalf) * digamma nHalf - (1 + mHalf) * digamma mHalf + (nHalf + mHalf) * digamma (nHalf + mHalf) instance D.MaybeEntropy FDistribution where maybeEntropy = Just . D.entropy instance D.ContGen FDistribution where genContVar = D.genContinous statistics-0.13.3.0/Statistics/Distribution/Poisson.hs0000644000000000000000000000570212724050603021116 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Poisson -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Poisson distribution. This is the discrete probability -- distribution of a number of events occurring in a fixed interval if -- these events occur with a known average rate, and occur -- independently from each other within that interval. module Statistics.Distribution.Poisson ( PoissonDistribution -- * Constructors , poisson -- * Accessors , poissonLambda -- * References -- $references ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import qualified Statistics.Distribution.Poisson.Internal as I import Numeric.SpecFunctions (incompleteGamma,logFactorial) import Numeric.MathFunctions.Constants (m_neg_inf) import Data.Binary (put, get) newtype PoissonDistribution = PD { poissonLambda :: Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON PoissonDistribution instance ToJSON PoissonDistribution instance Binary PoissonDistribution where get = fmap PD get put = put . poissonLambda instance D.Distribution PoissonDistribution where cumulative (PD lambda) x | x < 0 = 0 | isInfinite x = 1 | isNaN x = error "Statistics.Distribution.Poisson.cumulative: NaN input" | otherwise = 1 - incompleteGamma (fromIntegral (floor x + 1 :: Int)) lambda instance D.DiscreteDistr PoissonDistribution where probability (PD lambda) x = I.probability lambda (fromIntegral x) logProbability (PD lambda) i | i < 0 = m_neg_inf | otherwise = log lambda * fromIntegral i - logFactorial i - lambda instance D.Variance PoissonDistribution where variance = poissonLambda instance D.Mean PoissonDistribution where mean = poissonLambda instance D.MaybeMean PoissonDistribution where maybeMean = Just . D.mean instance D.MaybeVariance PoissonDistribution where maybeStdDev = Just . D.stdDev instance D.Entropy PoissonDistribution where entropy (PD lambda) = I.poissonEntropy lambda instance D.MaybeEntropy PoissonDistribution where maybeEntropy = Just . D.entropy -- | Create Poisson distribution. poisson :: Double -> PoissonDistribution poisson l | l >= 0 = PD l | otherwise = error $ "Statistics.Distribution.Poisson.poisson: lambda must be non-negative. Got " ++ show l -- $references -- -- * Loader, C. (2000) Fast and Accurate Computation of Binomial -- Probabilities. -- * Adell, J., Lekuona, A., and Yu, Y. 
(2010) Sharp Bounds on the -- Entropy of the Poisson Law and Related Quantities -- statistics-0.13.3.0/Statistics/Distribution/StudentT.hs0000644000000000000000000000663312724050603021242 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.StudentT -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Student-T distribution module Statistics.Distribution.StudentT ( StudentT , studentT , studentTndf , studentTUnstandardized ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import Statistics.Distribution.Transform (LinearTransform (..)) import Numeric.SpecFunctions ( logBeta, incompleteBeta, invIncompleteBeta, digamma) import Data.Binary (put, get) -- | Student-T distribution newtype StudentT = StudentT { studentTndf :: Double } deriving (Eq, Show, Read, Typeable, Data, Generic) instance FromJSON StudentT instance ToJSON StudentT instance Binary StudentT where put = put . studentTndf get = fmap StudentT get -- | Create Student-T distribution. Number of parameters must be positive. studentT :: Double -> StudentT studentT ndf | ndf > 0 = StudentT ndf | otherwise = modErr "studentT" "non-positive number of degrees of freedom" instance D.Distribution StudentT where cumulative = cumulative instance D.ContDistr StudentT where density d@(StudentT ndf) x = exp (logDensityUnscaled d x) / sqrt ndf logDensity d@(StudentT ndf) x = logDensityUnscaled d x - log (sqrt ndf) quantile = quantile cumulative :: StudentT -> Double -> Double cumulative (StudentT ndf) x | x > 0 = 1 - 0.5 * ibeta | otherwise = 0.5 * ibeta where ibeta = incompleteBeta (0.5 * ndf) 0.5 (ndf / (ndf + x*x)) logDensityUnscaled :: StudentT -> Double -> Double logDensityUnscaled (StudentT ndf) x = log (ndf / (ndf + x*x)) * (0.5 * (1 + ndf)) - logBeta 0.5 (0.5 * ndf) quantile :: StudentT -> Double -> Double quantile (StudentT ndf) p | p >= 0 && p <= 1 = let x = invIncompleteBeta (0.5 * ndf) 0.5 (2 * min p (1 - p)) in case sqrt $ ndf * (1 - x) / x of r | p < 0.5 -> -r | otherwise -> r | otherwise = modErr "quantile" $ "p must be in [0,1] range. Got: "++show p instance D.MaybeMean StudentT where maybeMean (StudentT ndf) | ndf > 1 = Just 0 | otherwise = Nothing instance D.MaybeVariance StudentT where maybeVariance (StudentT ndf) | ndf > 2 = Just $! ndf / (ndf - 2) | otherwise = Nothing instance D.Entropy StudentT where entropy (StudentT ndf) = 0.5 * (ndf+1) * (digamma ((1+ndf)/2) - digamma(ndf/2)) + log (sqrt ndf) + logBeta (ndf/2) 0.5 instance D.MaybeEntropy StudentT where maybeEntropy = Just . D.entropy instance D.ContGen StudentT where genContVar = D.genContinous -- | Create an unstandardized Student-t distribution. studentTUnstandardized :: Double -- ^ Number of degrees of freedom -> Double -- ^ Central value (0 for standard Student T distribution) -> Double -- ^ Scale parameter -> LinearTransform StudentT studentTUnstandardized ndf mu sigma | sigma > 0 = LinearTransform mu sigma $ studentT ndf | otherwise = modErr "studentTUnstandardized" $ "sigma must be > 0. Got: " ++ show sigma modErr :: String -> String -> a modErr fun msg = error $ "Statistics.Distribution.StudentT." 
++ fun ++ ": " ++ msg statistics-0.13.3.0/Statistics/Distribution/Uniform.hs0000644000000000000000000000567012724050603021107 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Uniform -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Variate distributed uniformly in the interval. module Statistics.Distribution.Uniform ( UniformDistribution -- * Constructors , uniformDistr -- ** Accessors , uniformA , uniformB ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import qualified System.Random.MWC as MWC import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) -- | Uniform distribution from A to B data UniformDistribution = UniformDistribution { uniformA :: {-# UNPACK #-} !Double -- ^ Low boundary of distribution , uniformB :: {-# UNPACK #-} !Double -- ^ Upper boundary of distribution } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON UniformDistribution instance ToJSON UniformDistribution instance Binary UniformDistribution where put (UniformDistribution x y) = put x >> put y get = UniformDistribution <$> get <*> get -- | Create uniform distribution. uniformDistr :: Double -> Double -> UniformDistribution uniformDistr a b | b < a = uniformDistr b a | a < b = UniformDistribution a b | otherwise = error "Statistics.Distribution.Uniform.uniform: wrong parameters" -- NOTE: failure is in default branch to guard againist NaNs. instance D.Distribution UniformDistribution where cumulative (UniformDistribution a b) x | x < a = 0 | x > b = 1 | otherwise = (x - a) / (b - a) instance D.ContDistr UniformDistribution where density (UniformDistribution a b) x | x < a = 0 | x > b = 0 | otherwise = 1 / (b - a) quantile (UniformDistribution a b) p | p >= 0 && p <= 1 = a + (b - a) * p | otherwise = error $ "Statistics.Distribution.Uniform.quantile: p must be in [0,1] range. Got: "++show p instance D.Mean UniformDistribution where mean (UniformDistribution a b) = 0.5 * (a + b) instance D.Variance UniformDistribution where -- NOTE: 1/sqrt 12 is not constant folded (#4101) so it's written as -- numerical constant. (Also FIXME!) stdDev (UniformDistribution a b) = 0.2886751345948129 * (b - a) variance (UniformDistribution a b) = d * d / 12 where d = b - a instance D.MaybeMean UniformDistribution where maybeMean = Just . D.mean instance D.MaybeVariance UniformDistribution where maybeStdDev = Just . D.stdDev instance D.Entropy UniformDistribution where entropy (UniformDistribution a b) = log (b - a) instance D.MaybeEntropy UniformDistribution where maybeEntropy = Just . D.entropy instance D.ContGen UniformDistribution where genContVar (UniformDistribution a b) gen = MWC.uniformR (a,b) gen statistics-0.13.3.0/Statistics/Distribution/Laplace.hs0000644000000000000000000000760512724050603021031 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Laplace -- Copyright : (c) 2015 Mihai Maruseac -- License : BSD3 -- -- Maintainer : mihai.maruseac@maruseac.com -- Stability : experimental -- Portability : portable -- -- The Laplace distribution. This is the continuous probability -- defined as the difference of two iid exponential random variables -- or a Brownian motion evaluated as exponentially distributed times. 
-- It is used in differential privacy (Laplace Method), speech -- recognition and least absolute deviations method (Laplace's first -- law of errors, giving a robust regression method) -- module Statistics.Distribution.Laplace ( LaplaceDistribution -- * Constructors , laplace , laplaceFromSample -- * Accessors , ldLocation , ldScale ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary(..)) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Data.Vector.Generic as G import qualified Statistics.Distribution as D import qualified Statistics.Quantile as Q import qualified Statistics.Sample as S import Statistics.Types (Sample) import Control.Applicative ((<$>), (<*>)) data LaplaceDistribution = LD { ldLocation :: {-# UNPACK #-} !Double -- ^ Location. , ldScale :: {-# UNPACK #-} !Double -- ^ Scale. } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON LaplaceDistribution instance ToJSON LaplaceDistribution instance Binary LaplaceDistribution where put (LD l s) = put l >> put s get = LD <$> get <*> get instance D.Distribution LaplaceDistribution where cumulative = cumulative complCumulative = complCumulative instance D.ContDistr LaplaceDistribution where density (LD l s) x = exp (- abs (x - l) / s) / (2 * s) logDensity (LD l s) x = - abs (x - l) / s - log 2 - log s quantile = quantile instance D.Mean LaplaceDistribution where mean (LD l _) = l instance D.Variance LaplaceDistribution where variance (LD _ s) = 2 * s * s instance D.MaybeMean LaplaceDistribution where maybeMean = Just . D.mean instance D.MaybeVariance LaplaceDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Entropy LaplaceDistribution where entropy (LD _ s) = 1 + log (2 * s) instance D.MaybeEntropy LaplaceDistribution where maybeEntropy = Just . D.entropy instance D.ContGen LaplaceDistribution where genContVar = D.genContinous cumulative :: LaplaceDistribution -> Double -> Double cumulative (LD l s) x | x <= l = 0.5 * exp ( (x - l) / s) | otherwise = 1 - 0.5 * exp ( - (x - l) / s ) complCumulative :: LaplaceDistribution -> Double -> Double complCumulative (LD l s) x | x <= l = 1 - 0.5 * exp ( (x - l) / s) | otherwise = 0.5 * exp ( - (x - l) / s ) quantile :: LaplaceDistribution -> Double -> Double quantile (LD l s) p | p == 0 = -inf | p == 1 = inf | p == 0.5 = l | p > 0 && p < 0.5 = l + s * log (2 * p) | p > 0.5 && p < 1 = l - s * log (2 - 2 * p) | otherwise = error $ "Statistics.Distribution.Laplace.quantile: p must be in [0,1] range. Got: "++show p where inf = 1 / 0 -- | Create an Laplace distribution. laplace :: Double -- ^ Location -> Double -- ^ Scale -> LaplaceDistribution laplace l s | s <= 0 = error $ "Statistics.Distribution.Laplace.laplace: scale parameter must be positive. Got " ++ show s | otherwise = LD l s -- | Create Laplace distribution from sample. No tests are made to -- check whether it truly is Laplace. Location of distribution -- estimated as median of sample. 
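--
-- A small sketch of the estimators involved (location = sample median,
-- scale = mean absolute deviation from that median; @U@ here is
-- "Data.Vector.Unboxed"):
--
-- > laplaceFromSample (U.fromList [0, 0, 0, 4])
-- > -- ==> LD {ldLocation = 0.0, ldScale = 1.0}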
laplaceFromSample :: Sample -> LaplaceDistribution laplaceFromSample xs = LD s l where s = Q.continuousBy Q.medianUnbiased 1 2 xs l = S.mean $ G.map (\x -> abs $ x - s) xs statistics-0.13.3.0/Statistics/Distribution/CauchyLorentz.hs0000644000000000000000000000525212724050603022256 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.CauchyLorentz -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Cauchy-Lorentz distribution. It's also known as Lorentz -- distribution or Breit–Wigner distribution. -- -- It doesn't have mean and variance. module Statistics.Distribution.CauchyLorentz ( CauchyDistribution , cauchyDistribMedian , cauchyDistribScale -- * Constructors , cauchyDistribution , standardCauchy ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) -- | Cauchy-Lorentz distribution. data CauchyDistribution = CD { -- | Central value of Cauchy-Lorentz distribution which is its -- mode and median. Distribution doesn't have mean so function -- is named after median. cauchyDistribMedian :: {-# UNPACK #-} !Double -- | Scale parameter of Cauchy-Lorentz distribution. It's -- different from variance and specify half width at half -- maximum (HWHM). , cauchyDistribScale :: {-# UNPACK #-} !Double } deriving (Eq, Show, Read, Typeable, Data, Generic) instance FromJSON CauchyDistribution instance ToJSON CauchyDistribution instance Binary CauchyDistribution where put (CD x y) = put x >> put y get = CD <$> get <*> get -- | Cauchy distribution cauchyDistribution :: Double -- ^ Central point -> Double -- ^ Scale parameter (FWHM) -> CauchyDistribution cauchyDistribution m s | s > 0 = CD m s | otherwise = error $ "Statistics.Distribution.CauchyLorentz.cauchyDistribution: FWHM must be positive. Got " ++ show s standardCauchy :: CauchyDistribution standardCauchy = CD 0 1 instance D.Distribution CauchyDistribution where cumulative (CD m s) x = 0.5 + atan( (x - m) / s ) / pi instance D.ContDistr CauchyDistribution where density (CD m s) x = (1 / pi) / (s * (1 + y*y)) where y = (x - m) / s quantile (CD m s) p | p > 0 && p < 1 = m + s * tan( pi * (p - 0.5) ) | p == 0 = -1 / 0 | p == 1 = 1 / 0 | otherwise = error $ "Statistics.Distribution.CauchyLorentz..quantile: p must be in [0,1] range. Got: "++show p instance D.ContGen CauchyDistribution where genContVar = D.genContinous instance D.Entropy CauchyDistribution where entropy (CD _ s) = log s + log (4*pi) instance D.MaybeEntropy CauchyDistribution where maybeEntropy = Just . D.entropy statistics-0.13.3.0/Statistics/Distribution/Transform.hs0000644000000000000000000000705212724050603021437 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, FlexibleContexts, FlexibleInstances, UndecidableInstances #-} -- | -- Module : Statistics.Distribution.Transform -- Copyright : (c) 2013 John McDonnell; -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Transformations over distributions module Statistics.Distribution.Transform ( LinearTransform (..) 
, linTransFixedPoint , scaleAround ) where import Data.Aeson (FromJSON, ToJSON) import Control.Applicative ((<*>)) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Data (Data, Typeable) import Data.Functor ((<$>)) import GHC.Generics (Generic) import qualified Statistics.Distribution as D -- | Linear transformation applied to distribution. -- -- > LinearTransform μ σ _ -- > x' = μ + σ·x data LinearTransform d = LinearTransform { linTransLocation :: {-# UNPACK #-} !Double -- ^ Location parameter. , linTransScale :: {-# UNPACK #-} !Double -- ^ Scale parameter. , linTransDistr :: d -- ^ Distribution being transformed. } deriving (Eq, Show, Read, Typeable, Data, Generic) instance (FromJSON d) => FromJSON (LinearTransform d) instance (ToJSON d) => ToJSON (LinearTransform d) instance (Binary d) => Binary (LinearTransform d) where get = LinearTransform <$> get <*> get <*> get put (LinearTransform x y z) = put x >> put y >> put z -- | Apply linear transformation to distribution. scaleAround :: Double -- ^ Fixed point -> Double -- ^ Scale parameter -> d -- ^ Distribution -> LinearTransform d scaleAround x0 sc = LinearTransform (x0 * (1 - sc)) sc -- | Get fixed point of linear transformation linTransFixedPoint :: LinearTransform d -> Double linTransFixedPoint (LinearTransform loc sc _) = loc / (1 - sc) instance Functor LinearTransform where fmap f (LinearTransform loc sc dist) = LinearTransform loc sc (f dist) instance D.Distribution d => D.Distribution (LinearTransform d) where cumulative (LinearTransform loc sc dist) x = D.cumulative dist $ (x-loc) / sc instance D.ContDistr d => D.ContDistr (LinearTransform d) where density (LinearTransform loc sc dist) x = D.density dist ((x-loc) / sc) / sc logDensity (LinearTransform loc sc dist) x = D.logDensity dist ((x-loc) / sc) - log sc quantile (LinearTransform loc sc dist) p = loc + sc * D.quantile dist p instance D.MaybeMean d => D.MaybeMean (LinearTransform d) where maybeMean (LinearTransform loc _ dist) = (+loc) <$> D.maybeMean dist instance (D.Mean d) => D.Mean (LinearTransform d) where mean (LinearTransform loc _ dist) = loc + D.mean dist instance D.MaybeVariance d => D.MaybeVariance (LinearTransform d) where maybeVariance (LinearTransform _ sc dist) = (*(sc*sc)) <$> D.maybeVariance dist maybeStdDev (LinearTransform _ sc dist) = (*sc) <$> D.maybeStdDev dist instance (D.Variance d) => D.Variance (LinearTransform d) where variance (LinearTransform _ sc dist) = sc * sc * D.variance dist stdDev (LinearTransform _ sc dist) = sc * D.stdDev dist instance (D.MaybeEntropy d, D.DiscreteDistr d) => D.MaybeEntropy (LinearTransform d) where maybeEntropy (LinearTransform _ _ dist) = D.maybeEntropy dist instance (D.Entropy d, D.DiscreteDistr d) => D.Entropy (LinearTransform d) where entropy (LinearTransform _ _ dist) = D.entropy dist instance D.ContGen d => D.ContGen (LinearTransform d) where genContVar (LinearTransform loc sc d) g = do x <- D.genContVar d g return $! loc + sc * x statistics-0.13.3.0/Statistics/Distribution/Exponential.hs0000644000000000000000000000661012724050603021751 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Exponential -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The exponential distribution. 
This is the continuous probability
-- distribution of the times between events in a Poisson process, in
-- which events occur continuously and independently at a constant
-- average rate.
module Statistics.Distribution.Exponential
    (
      ExponentialDistribution
    -- * Constructors
    , exponential
    , exponentialFromSample
    -- * Accessors
    , edLambda
    ) where

import Data.Aeson (FromJSON, ToJSON)
import Data.Binary (Binary)
import Data.Data (Data, Typeable)
import GHC.Generics (Generic)
import Numeric.MathFunctions.Constants (m_neg_inf)
import qualified Statistics.Distribution as D
import qualified Statistics.Sample as S
import qualified System.Random.MWC.Distributions as MWC
import Statistics.Types (Sample)
import Data.Binary (put, get)

newtype ExponentialDistribution = ED {
      edLambda :: Double
    } deriving (Eq, Read, Show, Typeable, Data, Generic)

instance FromJSON ExponentialDistribution
instance ToJSON ExponentialDistribution

instance Binary ExponentialDistribution where
    put = put . edLambda
    get = fmap ED get

instance D.Distribution ExponentialDistribution where
    cumulative      = cumulative
    complCumulative = complCumulative

instance D.ContDistr ExponentialDistribution where
    density (ED l) x
      | x < 0     = 0
      | otherwise = l * exp (-l * x)
    logDensity (ED l) x
      | x < 0     = m_neg_inf
      | otherwise = log l + (-l * x)
    quantile = quantile

instance D.Mean ExponentialDistribution where
    mean (ED l) = 1 / l

instance D.Variance ExponentialDistribution where
    variance (ED l) = 1 / (l * l)

instance D.MaybeMean ExponentialDistribution where
    maybeMean = Just . D.mean

instance D.MaybeVariance ExponentialDistribution where
    maybeStdDev   = Just . D.stdDev
    maybeVariance = Just . D.variance

instance D.Entropy ExponentialDistribution where
  entropy (ED l) = 1 - log l

instance D.MaybeEntropy ExponentialDistribution where
  maybeEntropy = Just . D.entropy

instance D.ContGen ExponentialDistribution where
    genContVar = MWC.exponential . edLambda

cumulative :: ExponentialDistribution -> Double -> Double
cumulative (ED l) x | x <= 0    = 0
                    | otherwise = 1 - exp (-l * x)

complCumulative :: ExponentialDistribution -> Double -> Double
complCumulative (ED l) x | x <= 0    = 1
                         | otherwise = exp (-l * x)

quantile :: ExponentialDistribution -> Double -> Double
quantile (ED l) p
  | p == 1          = 1 / 0
  | p >= 0 && p < 1 = -log (1 - p) / l
  | otherwise       =
    error $ "Statistics.Distribution.Exponential.quantile: p must be in [0,1] range. Got: "++show p

-- | Create an exponential distribution.
exponential :: Double            -- ^ Rate parameter.
            -> ExponentialDistribution
exponential l
  | l <= 0    =
    error $ "Statistics.Distribution.Exponential.exponential: rate parameter must be positive. Got " ++ show l
  | otherwise = ED l

-- | Create exponential distribution from sample. The rate is the
-- maximum likelihood estimate, i.e. the reciprocal of the sample
-- mean. No tests are made to check whether it truly is exponential.
exponentialFromSample :: Sample -> ExponentialDistribution
exponentialFromSample = ED . recip . S.mean
statistics-0.13.3.0/Statistics/Distribution/ChiSquared.hs0000644000000000000000000000623212724050603021513 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
-- |
-- Module    : Statistics.Distribution.ChiSquared
-- Copyright : (c) 2010 Alexey Khudyakov
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The chi-squared distribution. This is a continuous probability
-- distribution of the sum of squares of /k/ independent standard
-- normal variables.
-- It's commonly used in statistical tests.
module Statistics.Distribution.ChiSquared (
          ChiSquared
          -- * Constructors
        , chiSquared
        , chiSquaredNDF
        ) where

import Data.Aeson (FromJSON, ToJSON)
import Data.Binary (Binary)
import Data.Data (Data, Typeable)
import GHC.Generics (Generic)
import Numeric.SpecFunctions ( incompleteGamma,invIncompleteGamma,logGamma,digamma)

import qualified Statistics.Distribution as D
import qualified System.Random.MWC.Distributions as MWC
import Data.Binary (put, get)

-- | Chi-squared distribution
newtype ChiSquared = ChiSquared Int
                     deriving (Eq, Read, Show, Typeable, Data, Generic)

instance FromJSON ChiSquared
instance ToJSON ChiSquared

instance Binary ChiSquared where
    get = fmap ChiSquared get
    put (ChiSquared x) = put x

-- | Get the number of degrees of freedom
chiSquaredNDF :: ChiSquared -> Int
chiSquaredNDF (ChiSquared ndf) = ndf

-- | Construct chi-squared distribution. Number of degrees of freedom
--   must be positive.
chiSquared :: Int -> ChiSquared
chiSquared n
  | n <= 0    =
    error $ "Statistics.Distribution.ChiSquared.chiSquared: N.D.F. must be positive. Got " ++ show n
  | otherwise = ChiSquared n

instance D.Distribution ChiSquared where
  cumulative = cumulative

instance D.ContDistr ChiSquared where
  density  = density
  quantile = quantile

instance D.Mean ChiSquared where
    mean (ChiSquared ndf) = fromIntegral ndf

instance D.Variance ChiSquared where
    variance (ChiSquared ndf) = fromIntegral (2*ndf)

instance D.MaybeMean ChiSquared where
    maybeMean = Just . D.mean

instance D.MaybeVariance ChiSquared where
    maybeStdDev   = Just . D.stdDev
    maybeVariance = Just . D.variance

instance D.Entropy ChiSquared where
  entropy (ChiSquared ndf) =
    let kHalf = 0.5 * fromIntegral ndf in
    kHalf
    + log 2
    + logGamma kHalf
    + (1-kHalf) * digamma kHalf

instance D.MaybeEntropy ChiSquared where
  maybeEntropy = Just . D.entropy

instance D.ContGen ChiSquared where
    genContVar (ChiSquared n) = MWC.chiSquare n

cumulative :: ChiSquared -> Double -> Double
cumulative chi x
  | x <= 0    = 0
  | otherwise = incompleteGamma (ndf/2) (x/2)
  where
    ndf = fromIntegral $ chiSquaredNDF chi

density :: ChiSquared -> Double -> Double
density chi x
  | x <= 0    = 0
  | otherwise = exp $ log x * (ndf2 - 1) - x2 - logGamma ndf2 - log 2 * ndf2
  where
    ndf  = fromIntegral $ chiSquaredNDF chi
    ndf2 = ndf/2
    x2   = x/2

quantile :: ChiSquared -> Double -> Double
quantile (ChiSquared ndf) p
  | p == 0         = 0
  | p == 1         = 1/0
  | p > 0 && p < 1 = 2 * invIncompleteGamma (fromIntegral ndf / 2) p
  | otherwise      =
    error $ "Statistics.Distribution.ChiSquared.quantile: p must be in [0,1] range. Got: "++show p
statistics-0.13.3.0/Statistics/Distribution/Normal.hs0000644000000000000000000001035212724050603020711 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable, DeriveGeneric #-}
-- |
-- Module    : Statistics.Distribution.Normal
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The normal distribution. This is a continuous probability
-- distribution that describes data that cluster around a mean.
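--
-- A minimal usage sketch, assuming a GHCi session with this module in
-- scope (the rendering of results may differ in the last digits):
--
-- >>> import qualified Statistics.Distribution as D
-- >>> let d = normalDistr 10 2   -- mean 10, standard deviation 2
-- >>> D.cumulative d 10          -- half of the mass lies below the mean
-- 0.5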
module Statistics.Distribution.Normal
    (
      NormalDistribution
    -- * Constructors
    , normalDistr
    , normalFromSample
    , standard
    ) where

import Data.Aeson (FromJSON, ToJSON)
import Control.Applicative ((<$>), (<*>))
import Data.Binary (Binary)
import Data.Binary (put, get)
import Data.Data (Data, Typeable)
import GHC.Generics (Generic)
import Numeric.MathFunctions.Constants (m_sqrt_2, m_sqrt_2_pi)
import Numeric.SpecFunctions (erfc, invErfc)
import qualified Statistics.Distribution as D
import qualified Statistics.Sample as S
import qualified System.Random.MWC.Distributions as MWC

-- | The normal distribution.
data NormalDistribution = ND {
      mean       :: {-# UNPACK #-} !Double
    , stdDev     :: {-# UNPACK #-} !Double
    , ndPdfDenom :: {-# UNPACK #-} !Double
    , ndCdfDenom :: {-# UNPACK #-} !Double
    } deriving (Eq, Read, Show, Typeable, Data, Generic)

instance FromJSON NormalDistribution
instance ToJSON NormalDistribution

instance Binary NormalDistribution where
    put (ND w x y z) = put w >> put x >> put y >> put z
    get = ND <$> get <*> get <*> get <*> get

instance D.Distribution NormalDistribution where
    cumulative      = cumulative
    complCumulative = complCumulative

instance D.ContDistr NormalDistribution where
    logDensity = logDensity
    quantile   = quantile

instance D.MaybeMean NormalDistribution where
    maybeMean = Just . D.mean

instance D.Mean NormalDistribution where
    mean = mean

instance D.MaybeVariance NormalDistribution where
    maybeStdDev   = Just . D.stdDev
    maybeVariance = Just . D.variance

instance D.Variance NormalDistribution where
    stdDev = stdDev

instance D.Entropy NormalDistribution where
  entropy d = 0.5 * log (2 * pi * exp 1 * D.variance d)

instance D.MaybeEntropy NormalDistribution where
  maybeEntropy = Just . D.entropy

instance D.ContGen NormalDistribution where
    genContVar d = MWC.normal (mean d) (stdDev d)

-- | Standard normal distribution with mean equal to 0 and variance
-- equal to 1.
standard :: NormalDistribution
standard = ND { mean       = 0.0
              , stdDev     = 1.0
              , ndPdfDenom = log m_sqrt_2_pi
              , ndCdfDenom = m_sqrt_2
              }

-- | Create normal distribution from parameters.
--
-- IMPORTANT: prior to the 0.10 release the second parameter was the
-- variance, not the standard deviation.
normalDistr :: Double            -- ^ Mean of distribution
            -> Double            -- ^ Standard deviation of distribution
            -> NormalDistribution
normalDistr m sd
  | sd > 0    = ND { mean       = m
                   , stdDev     = sd
                   , ndPdfDenom = log $ m_sqrt_2_pi * sd
                   , ndCdfDenom = m_sqrt_2 * sd
                   }
  | otherwise =
    error $ "Statistics.Distribution.Normal.normalDistr: standard deviation must be positive. Got " ++ show sd

-- | Create a distribution using parameters estimated from a
-- sample. The variance is estimated using the maximum likelihood
-- method (biased estimation).
normalFromSample :: S.Sample -> NormalDistribution
normalFromSample xs = normalDistr m (sqrt v)
  where (m,v) = S.meanVariance xs

logDensity :: NormalDistribution -> Double -> Double
logDensity d x = (-xm * xm / (2 * sd * sd)) - ndPdfDenom d
    where xm = x - mean d
          sd = stdDev d

cumulative :: NormalDistribution -> Double -> Double
cumulative d x = erfc ((mean d - x) / ndCdfDenom d) / 2

complCumulative :: NormalDistribution -> Double -> Double
complCumulative d x = erfc ((x - mean d) / ndCdfDenom d) / 2

quantile :: NormalDistribution -> Double -> Double
quantile d p
  | p == 0         = -inf
  | p == 1         = inf
  | p == 0.5       = mean d
  | p > 0 && p < 1 = x * ndCdfDenom d + mean d
  | otherwise      =
    error $ "Statistics.Distribution.Normal.quantile: p must be in [0,1] range. Got: "++show p
  where x   = - invErfc (2 * p)
        inf = 1/0
statistics-0.13.3.0/Statistics/Distribution/Gamma.hs0000644000000000000000000001020712724050603020502 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
-- |
-- Module    : Statistics.Distribution.Gamma
-- Copyright : (c) 2009, 2011 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The gamma distribution. This is a continuous probability
-- distribution with two parameters, /k/ and ϑ. If /k/ is
-- integral, the distribution represents the sum of /k/ independent
-- exponentially distributed random variables, each of which has a
-- mean of ϑ.
module Statistics.Distribution.Gamma
    (
      GammaDistribution
    -- * Constructors
    , gammaDistr
    , improperGammaDistr
    -- * Accessors
    , gdShape
    , gdScale
    ) where

import Data.Aeson (FromJSON, ToJSON)
import Control.Applicative ((<$>), (<*>))
import Data.Binary (Binary)
import Data.Binary (put, get)
import Data.Data (Data, Typeable)
import GHC.Generics (Generic)
import Numeric.MathFunctions.Constants (m_pos_inf, m_NaN, m_neg_inf)
import Numeric.SpecFunctions (incompleteGamma, invIncompleteGamma, logGamma, digamma)
import Statistics.Distribution.Poisson.Internal as Poisson
import qualified Statistics.Distribution as D
import qualified System.Random.MWC.Distributions as MWC

-- | The gamma distribution.
data GammaDistribution = GD {
      gdShape :: {-# UNPACK #-} !Double -- ^ Shape parameter, /k/.
    , gdScale :: {-# UNPACK #-} !Double -- ^ Scale parameter, ϑ.
    } deriving (Eq, Read, Show, Typeable, Data, Generic)

instance FromJSON GammaDistribution
instance ToJSON GammaDistribution

instance Binary GammaDistribution where
    put (GD x y) = put x >> put y
    get = GD <$> get <*> get

-- | Create gamma distribution. Both the shape and the scale
--   parameters must be positive.
gammaDistr :: Double            -- ^ Shape parameter, /k/.
           -> Double            -- ^ Scale parameter, ϑ.
           -> GammaDistribution
gammaDistr k theta
  | k <= 0     = error $ msg ++ "shape must be positive. Got " ++ show k
  | theta <= 0 = error $ msg ++ "scale must be positive. Got " ++ show theta
  | otherwise  = improperGammaDistr k theta
  where msg = "Statistics.Distribution.Gamma.gammaDistr: "

-- | Create gamma distribution. This constructor does not check
--   whether the parameters are valid.
improperGammaDistr :: Double            -- ^ Shape parameter, /k/.
                   -> Double            -- ^ Scale parameter, ϑ.
                   -> GammaDistribution
improperGammaDistr = GD

instance D.Distribution GammaDistribution where
    cumulative = cumulative

instance D.ContDistr GammaDistribution where
    density = density
    logDensity (GD k theta) x
      | x <= 0    = m_neg_inf
      | otherwise = log x * (k - 1) - (x / theta) - logGamma k - log theta * k
    quantile = quantile

instance D.Variance GammaDistribution where
    variance (GD a l) = a * l * l

instance D.Mean GammaDistribution where
    mean (GD a l) = a * l

instance D.MaybeMean GammaDistribution where
    maybeMean = Just . D.mean

instance D.MaybeVariance GammaDistribution where
    maybeStdDev   = Just . D.stdDev
    maybeVariance = Just . D.variance
instance D.MaybeEntropy GammaDistribution where
  maybeEntropy (GD a l)
    | a > 0 && l > 0 =
      Just $
        a
        + log l
        + logGamma a
        + (1-a) * digamma a
    | otherwise = Nothing

instance D.ContGen GammaDistribution where
    genContVar (GD a l) = MWC.gamma a l

density :: GammaDistribution -> Double -> Double
density (GD a l) x
  | a < 0 || l <= 0 = m_NaN
  | x <= 0          = 0
  | a == 0          = if x == 0 then m_pos_inf else 0
  | x == 0          = if a < 1 then m_pos_inf else if a > 1 then 0 else 1/l
  | a < 1           = Poisson.probability (x/l) a * a / x
  | otherwise       = Poisson.probability (x/l) (a-1) / l

cumulative :: GammaDistribution -> Double -> Double
cumulative (GD k l) x
  | x <= 0    = 0
  | otherwise = incompleteGamma k (x/l)

quantile :: GammaDistribution -> Double -> Double
quantile (GD k l) p
  | p == 0         = 0
  | p == 1         = 1/0
  | p > 0 && p < 1 = l * invIncompleteGamma k p
  | otherwise      =
    error $ "Statistics.Distribution.Gamma.quantile: p must be in [0,1] range. Got: "++show p
statistics-0.13.3.0/Statistics/Distribution/Geometric.hs0000644000000000000000000001216712724050603021405 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
-- |
-- Module    : Statistics.Distribution.Geometric
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The Geometric distribution. There are two variants of the
-- distribution. The first is the probability distribution of the
-- number of Bernoulli trials needed to get one success, supported on
-- the set [1,2..] ('GeometricDistribution'). It's sometimes referred
-- to as the /shifted/ geometric distribution to distinguish it from
-- the second variant.
--
-- The second variant is the probability distribution of the number
-- of failures before the first success, defined over the set [0,1..]
-- ('GeometricDistribution0').
module Statistics.Distribution.Geometric
    (
      GeometricDistribution
    , GeometricDistribution0
    -- * Constructors
    , geometric
    , geometric0
    -- ** Accessors
    , gdSuccess
    , gdSuccess0
    ) where

import Data.Aeson (FromJSON, ToJSON)
import Control.Applicative ((<$>))
import Control.Monad (liftM)
import Data.Binary (Binary)
import Data.Binary (put, get)
import Data.Data (Data, Typeable)
import GHC.Generics (Generic)
import Numeric.MathFunctions.Constants (m_pos_inf, m_neg_inf)
import qualified Statistics.Distribution as D
import qualified System.Random.MWC.Distributions as MWC

----------------------------------------------------------------
-- Distribution over [1..]

newtype GeometricDistribution = GD {
      gdSuccess :: Double
    } deriving (Eq, Read, Show, Typeable, Data, Generic)

instance FromJSON GeometricDistribution
instance ToJSON GeometricDistribution

instance Binary GeometricDistribution where
    get = GD <$> get
    put (GD x) = put x

instance D.Distribution GeometricDistribution where
    cumulative = cumulative

instance D.DiscreteDistr GeometricDistribution where
    probability (GD s) n
      | n < 1     = 0
      | otherwise = s * (1-s) ** (fromIntegral n - 1)
    logProbability (GD s) n
      | n < 1     = m_neg_inf
      | otherwise = log s + log (1-s) * (fromIntegral n - 1)

instance D.Mean GeometricDistribution where
    mean (GD s) = 1 / s

instance D.Variance GeometricDistribution where
    variance (GD s) = (1 - s) / (s * s)

instance D.MaybeMean GeometricDistribution where
    maybeMean = Just . D.mean

instance D.MaybeVariance GeometricDistribution where
    maybeStdDev   = Just . D.stdDev
    maybeVariance = Just . D.variance
instance D.Entropy GeometricDistribution where
  entropy (GD s)
    | s == 0 = m_pos_inf
    | s == 1 = 0
    | otherwise = negate $ (s * log s + (1-s) * log (1-s)) / s

instance D.MaybeEntropy GeometricDistribution where
  maybeEntropy = Just . D.entropy

instance D.DiscreteGen GeometricDistribution where
  genDiscreteVar (GD s) g = MWC.geometric1 s g

instance D.ContGen GeometricDistribution where
  genContVar d g = fromIntegral `liftM` D.genDiscreteVar d g

-- | Create geometric distribution.
geometric :: Double                -- ^ Success rate
          -> GeometricDistribution
geometric x
  | x >= 0 && x <= 1 = GD x
  | otherwise        =
    error $ "Statistics.Distribution.Geometric.geometric: probability must be in [0,1] range. Got " ++ show x

cumulative :: GeometricDistribution -> Double -> Double
cumulative (GD s) x
  | x < 1        = 0
  | isInfinite x = 1
  | isNaN x      = error "Statistics.Distribution.Geometric.cumulative: NaN input"
  | otherwise    = 1 - (1-s) ^ (floor x :: Int)

----------------------------------------------------------------
-- Distribution over [0..]

newtype GeometricDistribution0 = GD0 {
      gdSuccess0 :: Double
    } deriving (Eq, Read, Show, Typeable, Data, Generic)

instance FromJSON GeometricDistribution0
instance ToJSON GeometricDistribution0

instance Binary GeometricDistribution0 where
    get = GD0 <$> get
    put (GD0 x) = put x

instance D.Distribution GeometricDistribution0 where
    cumulative (GD0 s) x = cumulative (GD s) (x + 1)

instance D.DiscreteDistr GeometricDistribution0 where
    probability    (GD0 s) n = D.probability    (GD s) (n + 1)
    logProbability (GD0 s) n = D.logProbability (GD s) (n + 1)

instance D.Mean GeometricDistribution0 where
    mean (GD0 s) = 1 / s - 1

instance D.Variance GeometricDistribution0 where
    variance (GD0 s) = D.variance (GD s)

instance D.MaybeMean GeometricDistribution0 where
    maybeMean = Just . D.mean

instance D.MaybeVariance GeometricDistribution0 where
    maybeStdDev   = Just . D.stdDev
    maybeVariance = Just . D.variance

instance D.Entropy GeometricDistribution0 where
  entropy (GD0 s) = D.entropy (GD s)

instance D.MaybeEntropy GeometricDistribution0 where
  maybeEntropy = Just . D.entropy

instance D.DiscreteGen GeometricDistribution0 where
  genDiscreteVar (GD0 s) g = MWC.geometric0 s g

instance D.ContGen GeometricDistribution0 where
  genContVar d g = fromIntegral `liftM` D.genDiscreteVar d g

-- | Create geometric distribution.
geometric0 :: Double                 -- ^ Success rate
           -> GeometricDistribution0
geometric0 x
  | x >= 0 && x <= 1 = GD0 x
  | otherwise        =
    error $ "Statistics.Distribution.Geometric.geometric0: probability must be in [0,1] range. Got " ++ show x
statistics-0.13.3.0/Statistics/Distribution/Binomial.hs0000644000000000000000000000742112724050603021216 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
-- |
-- Module    : Statistics.Distribution.Binomial
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The binomial distribution. This is the discrete probability
-- distribution of the number of successes in a sequence of /n/
-- independent yes\/no experiments, each of which yields success with
-- probability /p/.
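--
-- A minimal usage sketch, assuming a GHCi session with this module in
-- scope (the rendering of results may differ in the last digits):
--
-- >>> import qualified Statistics.Distribution as D
-- >>> D.probability (binomial 4 0.5) 2   -- choose 4 2 * 0.5^4
-- 0.375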
module Statistics.Distribution.Binomial ( BinomialDistribution -- * Constructors , binomial -- * Accessors , bdTrials , bdProbability ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import qualified Statistics.Distribution.Poisson.Internal as I import Numeric.SpecFunctions (choose,incompleteBeta) import Numeric.MathFunctions.Constants (m_epsilon) import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) -- | The binomial distribution. data BinomialDistribution = BD { bdTrials :: {-# UNPACK #-} !Int -- ^ Number of trials. , bdProbability :: {-# UNPACK #-} !Double -- ^ Probability. } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON BinomialDistribution instance ToJSON BinomialDistribution instance Binary BinomialDistribution where put (BD x y) = put x >> put y get = BD <$> get <*> get instance D.Distribution BinomialDistribution where cumulative = cumulative instance D.DiscreteDistr BinomialDistribution where probability = probability instance D.Mean BinomialDistribution where mean = mean instance D.Variance BinomialDistribution where variance = variance instance D.MaybeMean BinomialDistribution where maybeMean = Just . D.mean instance D.MaybeVariance BinomialDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Entropy BinomialDistribution where entropy (BD n p) | n == 0 = 0 | n <= 100 = directEntropy (BD n p) | otherwise = I.poissonEntropy (fromIntegral n * p) instance D.MaybeEntropy BinomialDistribution where maybeEntropy = Just . D.entropy -- This could be slow for big n probability :: BinomialDistribution -> Int -> Double probability (BD n p) k | k < 0 || k > n = 0 | n == 0 = 1 | otherwise = choose n k * p^k * (1-p)^(n-k) -- Summation from different sides required to reduce roundoff errors cumulative :: BinomialDistribution -> Double -> Double cumulative (BD n p) x | isNaN x = error "Statistics.Distribution.Binomial.cumulative: NaN input" | isInfinite x = if x > 0 then 1 else 0 | k < 0 = 0 | k >= n = 1 | otherwise = incompleteBeta (fromIntegral (n-k)) (fromIntegral (k+1)) (1 - p) where k = floor x mean :: BinomialDistribution -> Double mean (BD n p) = fromIntegral n * p variance :: BinomialDistribution -> Double variance (BD n p) = fromIntegral n * p * (1 - p) directEntropy :: BinomialDistribution -> Double directEntropy d@(BD n _) = negate . sum $ takeWhile (< negate m_epsilon) $ dropWhile (not . (< negate m_epsilon)) $ [ let x = probability d k in x * log x | k <- [0..n]] -- | Construct binomial distribution. Number of trials must be -- non-negative and probability must be in [0,1] range binomial :: Int -- ^ Number of trials. -> Double -- ^ Probability. -> BinomialDistribution binomial n p | n < 0 = error $ msg ++ "number of trials must be non-negative. Got " ++ show n | p < 0 || p > 1 = error $ msg++"probability must be in [0,1] range. Got " ++ show p | otherwise = BD n p where msg = "Statistics.Distribution.Binomial.binomial: " statistics-0.13.3.0/Statistics/Distribution/Hypergeometric.hs0000644000000000000000000000766312724050603022462 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Hypergeometric -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Hypergeometric distribution. 
This is the discrete probability -- distribution that measures the probability of /k/ successes in /l/ -- trials, without replacement, from a finite population. -- -- The parameters of the distribution describe /k/ elements chosen -- from a population of /l/, with /m/ elements of one type, and -- /l/-/m/ of the other (all are positive integers). module Statistics.Distribution.Hypergeometric ( HypergeometricDistribution -- * Constructors , hypergeometric -- ** Accessors , hdM , hdL , hdK ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Numeric.MathFunctions.Constants (m_epsilon) import Numeric.SpecFunctions (choose) import qualified Statistics.Distribution as D import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) data HypergeometricDistribution = HD { hdM :: {-# UNPACK #-} !Int , hdL :: {-# UNPACK #-} !Int , hdK :: {-# UNPACK #-} !Int } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON HypergeometricDistribution instance ToJSON HypergeometricDistribution instance Binary HypergeometricDistribution where get = HD <$> get <*> get <*> get put (HD x y z) = put x >> put y >> put z instance D.Distribution HypergeometricDistribution where cumulative = cumulative instance D.DiscreteDistr HypergeometricDistribution where probability = probability instance D.Mean HypergeometricDistribution where mean = mean instance D.Variance HypergeometricDistribution where variance = variance instance D.MaybeMean HypergeometricDistribution where maybeMean = Just . D.mean instance D.MaybeVariance HypergeometricDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Entropy HypergeometricDistribution where entropy = directEntropy instance D.MaybeEntropy HypergeometricDistribution where maybeEntropy = Just . D.entropy variance :: HypergeometricDistribution -> Double variance (HD m l k) = (k' * ml) * (1 - ml) * (l' - k') / (l' - 1) where m' = fromIntegral m l' = fromIntegral l k' = fromIntegral k ml = m' / l' mean :: HypergeometricDistribution -> Double mean (HD m l k) = fromIntegral k * fromIntegral m / fromIntegral l directEntropy :: HypergeometricDistribution -> Double directEntropy d@(HD m _ _) = negate . sum $ takeWhile (< negate m_epsilon) $ dropWhile (not . 
(< negate m_epsilon)) $
  [ let x = probability d n in x * log x | n <- [0..m]]

-- | Create hypergeometric distribution.
hypergeometric :: Int               -- ^ /m/
               -> Int               -- ^ /l/
               -> Int               -- ^ /k/
               -> HypergeometricDistribution
hypergeometric m l k
  | not (l > 0)            = error $ msg ++ "l must be positive"
  | not (m >= 0 && m <= l) = error $ msg ++ "m must lie in [0,l] range"
  | not (k > 0 && k <= l)  = error $ msg ++ "k must lie in (0,l] range"
  | otherwise              = HD m l k
    where msg = "Statistics.Distribution.Hypergeometric.hypergeometric: "

-- Naive implementation
probability :: HypergeometricDistribution -> Int -> Double
probability (HD mi li ki) n
  | n < max 0 (mi+ki-li) || n > min mi ki = 0
  | otherwise = choose mi n * choose (li - mi) (ki - n) / choose li ki

cumulative :: HypergeometricDistribution -> Double -> Double
cumulative d@(HD mi li ki) x
  | isNaN x      = error "Statistics.Distribution.Hypergeometric.cumulative: NaN argument"
  | isInfinite x = if x > 0 then 1 else 0
  | n <  minN    = 0
  | n >= maxN    = 1
  | otherwise    = D.sumProbabilities d minN n
  where
    n    = floor x
    minN = max 0 (mi+ki-li)
    maxN = min mi ki
statistics-0.13.3.0/Statistics/Distribution/Poisson/0000755000000000000000000000000012724050603020556 5ustar0000000000000000statistics-0.13.3.0/Statistics/Distribution/Poisson/Internal.hs0000644000000000000000000001541512724050603022674 0ustar0000000000000000-- |
-- Module    : Statistics.Distribution.Poisson.Internal
-- Copyright : (c) 2011 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Internal code for the Poisson distribution.
module Statistics.Distribution.Poisson.Internal
    (
      probability, poissonEntropy
    ) where

import Data.List (unfoldr)
import Numeric.MathFunctions.Constants (m_sqrt_2_pi, m_tiny, m_epsilon)
import Numeric.SpecFunctions (logGamma, stirlingError, choose, logFactorial)
import Numeric.SpecFunctions.Extra (bd0)

-- | An unchecked, non-integer-valued version of Loader's saddle point
-- algorithm.
probability :: Double -> Double -> Double
probability 0      0 = 1
probability 0      1 = 0
probability lambda x
  | isInfinite lambda    = 0
  | x < 0                = 0
  | x <= lambda * m_tiny = exp (-lambda)
  | lambda < x * m_tiny  = exp (-lambda + x * log lambda - logGamma (x+1))
  | otherwise            = exp (-(stirlingError x) - bd0 x lambda) /
                           (m_sqrt_2_pi * sqrt x)

-- | Compute entropy using Theorem 1 from "Sharp Bounds on the Entropy
-- of the Poisson Law". This function is unused because 'directEntropy'
-- is just as accurate and is faster by about a factor of 4.
alyThm1 :: Double -> Double
alyThm1 lambda =
  sum (takeWhile (\x -> abs x >= m_epsilon * lll) alySeries) + lll
  where lll = lambda * (1 - log lambda)
        alySeries =
          [ alyc k * exp (fromIntegral k * log lambda - logFactorial k)
          | k <- [2..] ]

alyc :: Int -> Double
alyc k =
  sum [ parity j * choose (k-1) j * log (fromIntegral j+1) | j <- [0..k-1] ]
  where parity j
          | even (k-j) = -1
          | otherwise  = 1

-- | Returns [x, x^2, x^3, x^4, ...]
powers :: Double -> [Double]
powers x = unfoldr (\y -> Just (y*x,y*x)) 1

-- | Returns an upper bound according to theorem 2 of "Sharp Bounds on
-- the Entropy of the Poisson Law"
alyThm2Upper :: Double -> [Double] -> Double
alyThm2Upper lambda coefficients =
  1.4189385332046727 + 0.5 * log lambda + zipCoefficients lambda coefficients

-- | Returns the average of the upper and lower bounds according to
-- theorem 2.
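--
-- Per the Mathematica derivation quoted below, the lower bound equals
-- the upper bound plus a correction series, so averaging the two
-- bounds amounts to adding half of that correction to the upper
-- bound, which is what this function computes.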
alyThm2 :: Double -> [Double] -> [Double] -> Double
alyThm2 lambda upper lower =
  alyThm2Upper lambda upper + 0.5 * (zipCoefficients lambda lower)

zipCoefficients :: Double -> [Double] -> Double
zipCoefficients lambda coefficients =
  (sum $ map (uncurry (*)) (zip (powers $ recip lambda) coefficients))

-- Mathematica code deriving the coefficients below:
--
-- poissonMoment[0, s_] := 1
-- poissonMoment[1, s_] := 0
-- poissonMoment[k_, s_] :=
--   Sum[s * Binomial[k - 1, j] * poissonMoment[j, s], {j, 0, k - 2}]
--
-- upperSeries[m_] :=
--   Distribute[Integrate[
--     Sum[(-1)^(j - 1) *
--        poissonMoment[j, \[Lambda]] / (j * (j - 1)* \[Lambda]^j),
--        {j, 3, 2 m - 1}],
--     \[Lambda]]]
--
-- lowerSeries[m_] :=
--   Distribute[Integrate[
--     poissonMoment[
--       2 m + 2, \[Lambda]] / ((2 m +
--          1)*\[Lambda]^(2 m + 2)), \[Lambda]]]
--
-- upperBound[m_] := upperSeries[m] + (Log[2*Pi*\[Lambda]] + 1)/2
--
-- lowerBound[m_] := upperBound[m] + lowerSeries[m]

upperCoefficients4 :: [Double]
upperCoefficients4 = [1/12, 1/24, -103/180, -13/40, -1/210]

lowerCoefficients4 :: [Double]
lowerCoefficients4 = [0,0,0, -105/4, -210, -2275/18, -167/21, -1/72]

upperCoefficients6 :: [Double]
upperCoefficients6 = [1/12, 1/24, 19/360, 9/80, -38827/2520, -74855/1008,
                      -73061/2520, -827/720, -1/990]

lowerCoefficients6 :: [Double]
lowerCoefficients6 = [0,0,0,0,0, -3465/2, -45045, -466235/4, -531916/9,
                      -56287/10, -629/11, -1/156]

upperCoefficients8 :: [Double]
upperCoefficients8 = [1/12, 1/24, 19/360, 9/80, 863/2520, 1375/1008,
                      -3023561/2520, -15174047/720, -231835511/5940,
                      -18927611/1320, -58315591/60060, -23641/3640,
                      -1/2730]

lowerCoefficients8 :: [Double]
lowerCoefficients8 = [0,0,0,0,0,0,0, -2027025/8, -15315300, -105252147,
                      -178127950, -343908565/4, -10929270, -3721149/14,
                      -7709/15, -1/272]

upperCoefficients10 :: [Double]
upperCoefficients10 = [1/12, 1/24, 19/360, 9/80, 863/2520, 1375/1008,
                       33953/5040, 57281/1440, -2271071617/11880,
                       -1483674219/176, -31714406276557/720720,
                       -7531072742237/131040, -1405507544003/65520,
                       -21001919627/10080, -1365808297/36720,
                       -26059/544, -1/5814]

lowerCoefficients10 :: [Double]
lowerCoefficients10 = [0,0,0,0,0,0,0,0,0, -130945815/2, -7638505875,
                       -438256243425/4, -435477637540, -3552526473925/6,
                       -857611717105/3, -545654955967/12, -5794690528/3,
                       -578334559/42, -699043/133, -1/420]

upperCoefficients12 :: [Double]
upperCoefficients12 = [1/12, 1/24, 19/360, 9/80, 863/2520, 1375/1008,
                       33953/5040, 57281/1440, 3250433/11880, 378351/176,
                       -37521922090657/720720, -612415657466657/131040,
                       -3476857538815223/65520, -243882174660761/1440,
                       -34160796727900637/183600, -39453820646687/544,
                       -750984629069237/81396, -2934056300989/9576,
                       -20394527513/12540, -3829559/9240, -1/10626]

lowerCoefficients12 :: [Double]
lowerCoefficients12 = [0,0,0,0,0,0,0,0,0,0,0, -105411381075/4,
                       -5270569053750, -272908057767345/2,
                       -1051953238104769, -24557168490009155/8,
                       -3683261873403112, -5461918738302026/3,
                       -347362037754732, -2205885452434521/100,
                       -12237195698286/35, -16926981721/22,
                       -6710881/155, -1/600]

-- | Compute entropy directly from its definition. This is just as accurate
-- as 'alyThm1' for lambda <= 1 and is faster, but is slow for large lambda,
-- and produces some underestimation due to accumulation of floating point
-- error.
directEntropy :: Double -> Double
directEntropy lambda =
  negate . sum $
  takeWhile (< negate m_epsilon * lambda) $
  dropWhile (not . (< negate m_epsilon * lambda)) $
  [ let x = probability lambda k in x * log x | k <- [0..]]

-- | Compute the entropy of a Poisson distribution using the best available
-- method.
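--
-- The thresholds below reflect a speed/accuracy trade-off: the series
-- are asymptotic in powers of 1/lambda, so the longer expansions only
-- pay off as lambda grows, while for small lambda direct summation is
-- both fast and accurate.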
poissonEntropy :: Double -> Double poissonEntropy lambda | lambda == 0 = 0 | lambda <= 10 = directEntropy lambda | lambda <= 12 = alyThm2 lambda upperCoefficients4 lowerCoefficients4 | lambda <= 18 = alyThm2 lambda upperCoefficients6 lowerCoefficients6 | lambda <= 24 = alyThm2 lambda upperCoefficients8 lowerCoefficients8 | lambda <= 30 = alyThm2 lambda upperCoefficients10 lowerCoefficients10 | otherwise = alyThm2 lambda upperCoefficients12 lowerCoefficients12 statistics-0.13.3.0/Statistics/Matrix/0000755000000000000000000000000012724050603015711 5ustar0000000000000000statistics-0.13.3.0/Statistics/Matrix/Types.hs0000644000000000000000000000417312724050603017356 0ustar0000000000000000-- | -- Module : Statistics.Matrix.Types -- Copyright : 2014 Bryan O'Sullivan -- License : BSD3 -- -- Basic matrix operations. -- -- There isn't a widely used matrix package for Haskell yet, so -- we implement the necessary minimum here. module Statistics.Matrix.Types ( Vector , MVector , Matrix(..) , MMatrix(..) , debug ) where import Data.Char (isSpace) import Numeric (showFFloat) import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as M type Vector = U.Vector Double type MVector s = M.MVector s Double -- | Two-dimensional matrix, stored in row-major order. data Matrix = Matrix { rows :: {-# UNPACK #-} !Int -- ^ Rows of matrix. , cols :: {-# UNPACK #-} !Int -- ^ Columns of matrix. , exponent :: {-# UNPACK #-} !Int -- ^ In order to avoid overflows during matrix multiplication, a -- large exponent is stored separately. , _vector :: !Vector -- ^ Matrix data. } deriving (Eq) -- | Two-dimensional mutable matrix, stored in row-major order. data MMatrix s = MMatrix {-# UNPACK #-} !Int {-# UNPACK #-} !Int {-# UNPACK #-} !Int !(MVector s) -- The Show instance is useful only for debugging. instance Show Matrix where show = debug debug :: Matrix -> String debug (Matrix r c _ vs) = unlines $ zipWith (++) (hdr0 : repeat hdr) rrows where rrows = map (cleanEnd . unwords) . split $ zipWith (++) ldone tdone hdr0 = show (r,c) ++ " " hdr = replicate (length hdr0) ' ' pad plus k xs = replicate (k - length xs) ' ' `plus` xs ldone = map (pad (++) (longest lstr)) lstr tdone = map (pad (flip (++)) (longest tstr)) tstr (lstr, tstr) = unzip . map (break (=='.') . render) . U.toList $ vs longest = maximum . map length render k = reverse . dropWhile (=='.') . dropWhile (=='0') . reverse . showFFloat (Just 4) k $ "" split [] = [] split xs = i : split rest where (i, rest) = splitAt c xs cleanEnd = reverse . dropWhile isSpace . reverse statistics-0.13.3.0/Statistics/Matrix/Algorithms.hs0000644000000000000000000000242312724050603020357 0ustar0000000000000000-- | -- Module : Statistics.Matrix.Algorithms -- Copyright : 2014 Bryan O'Sullivan -- License : BSD3 -- -- Useful matrix functions. module Statistics.Matrix.Algorithms ( qr ) where import Control.Applicative ((<$>), (<*>)) import Control.Monad.ST (ST, runST) import Prelude hiding (sum, replicate) import Statistics.Matrix (Matrix, column, dimension, for, norm) import qualified Statistics.Matrix.Mutable as M import Statistics.Sample.Internal (sum) import qualified Data.Vector.Unboxed as U -- | /O(r*c)/ Compute the QR decomposition of a matrix. -- The result returned is the matrices (/q/,/r/). 
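-- The implementation below is an in-place Gram-Schmidt
-- orthogonalisation, so for inputs of full column rank the columns of
-- /q/ are orthonormal and the product of /q/ and /r/ reconstructs the
-- input up to rounding error.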
qr :: Matrix -> (Matrix, Matrix)
qr mat = runST $ do
  let (m,n) = dimension mat
  r <- M.replicate n n 0
  a <- M.thaw mat
  for 0 n $ \j -> do
    cn <- M.immutably a $ \aa -> norm (column aa j)
    M.unsafeWrite r j j cn
    for 0 m $ \i -> M.unsafeModify a i j (/ cn)
    for (j+1) n $ \jj -> do
      p <- innerProduct a j jj
      M.unsafeWrite r j jj p
      for 0 m $ \i -> do
        aij <- M.unsafeRead a i j
        M.unsafeModify a i jj $ subtract (p * aij)
  (,) <$> M.unsafeFreeze a <*> M.unsafeFreeze r

innerProduct :: M.MMatrix s -> Int -> Int -> ST s Double
innerProduct mmat j k = M.immutably mmat $ \mat ->
  sum $ U.zipWith (*) (column mat j) (column mat k)
statistics-0.13.3.0/Statistics/Matrix/Mutable.hs0000644000000000000000000000534512724050603017645 0ustar0000000000000000-- |
-- Module    : Statistics.Matrix.Mutable
-- Copyright : (c) 2014 Bryan O'Sullivan
-- License   : BSD3
--
-- Basic mutable matrix operations.
module Statistics.Matrix.Mutable
    (
      MMatrix(..)
    , MVector
    , replicate
    , thaw
    , bounds
    , unsafeNew
    , unsafeFreeze
    , unsafeRead
    , unsafeWrite
    , unsafeModify
    , immutably
    , unsafeBounds
    ) where

import Control.Applicative ((<$>))
import Control.DeepSeq (NFData(..))
import Control.Monad.ST (ST)
import Statistics.Matrix.Types (Matrix(..), MMatrix(..), MVector)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M
import Prelude hiding (replicate)

replicate :: Int -> Int -> Double -> ST s (MMatrix s)
replicate r c k = MMatrix r c 0 <$> M.replicate (r*c) k

thaw :: Matrix -> ST s (MMatrix s)
thaw (Matrix r c e v) = MMatrix r c e <$> U.thaw v

unsafeFreeze :: MMatrix s -> ST s Matrix
unsafeFreeze (MMatrix r c e mv) = Matrix r c e <$> U.unsafeFreeze mv

-- | Allocate a new matrix. The matrix content is not initialized,
--   hence unsafe.
unsafeNew :: Int                -- ^ Number of rows
          -> Int                -- ^ Number of columns
          -> ST s (MMatrix s)
unsafeNew r c
  | r < 0     = error "Statistics.Matrix.Mutable.unsafeNew: negative number of rows"
  | c < 0     = error "Statistics.Matrix.Mutable.unsafeNew: negative number of columns"
  | otherwise = do
      vec <- M.new (r*c)
      return $ MMatrix r c 0 vec

unsafeRead :: MMatrix s -> Int -> Int -> ST s Double
unsafeRead mat r c = unsafeBounds mat r c M.unsafeRead
{-# INLINE unsafeRead #-}

unsafeWrite :: MMatrix s -> Int -> Int -> Double -> ST s ()
unsafeWrite mat row col k = unsafeBounds mat row col $ \v i ->
  M.unsafeWrite v i k
{-# INLINE unsafeWrite #-}

unsafeModify :: MMatrix s -> Int -> Int -> (Double -> Double) -> ST s ()
unsafeModify mat row col f = unsafeBounds mat row col $ \v i -> do
  k <- M.unsafeRead v i
  M.unsafeWrite v i (f k)
{-# INLINE unsafeModify #-}

-- | Given row and column numbers, calculate the offset into the flat
-- row-major vector, checking that the indices are in bounds.
bounds :: MMatrix s -> Int -> Int -> (MVector s -> Int -> r) -> r
bounds (MMatrix rs cs _ mv) r c k
  | r < 0 || r >= rs = error "row out of bounds"
  | c < 0 || c >= cs = error "column out of bounds"
  | otherwise = k mv $! r * cs + c
{-# INLINE bounds #-}

-- | Given row and column numbers, calculate the offset into the flat
-- row-major vector, without checking.
unsafeBounds :: MMatrix s -> Int -> Int -> (MVector s -> Int -> r) -> r
unsafeBounds (MMatrix _ cs _ mv) r c k = k mv $! r * cs + c
{-# INLINE unsafeBounds #-}

immutably :: NFData a => MMatrix s -> (Matrix -> a) -> ST s a
immutably mmat f = do
  k <- f <$> unsafeFreeze mmat
  rnf k `seq` return k
{-# INLINE immutably #-}
statistics-0.13.3.0/Statistics/Test/0000755000000000000000000000000012724050603015364 5ustar0000000000000000statistics-0.13.3.0/Statistics/Test/Types.hs0000644000000000000000000000206112724050603017023 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
module Statistics.Test.Types (
    TestType(..)
  , TestResult(..)
  , significant
  ) where

import Data.Aeson (FromJSON, ToJSON)
import Data.Data (Typeable, Data)
import GHC.Generics

-- | Test type. Exact meaning depends on a specific test. But
-- generally it's tested whether some statistic is too big (small)
-- for 'OneTailed' or whether it's too big or too small for 'TwoTailed'
data TestType = OneTailed
              | TwoTailed
              deriving (Eq,Ord,Show,Typeable,Data,Generic)

instance FromJSON TestType
instance ToJSON TestType

-- | Result of hypothesis testing
data TestResult = Significant    -- ^ Null hypothesis should be rejected
                | NotSignificant -- ^ Data is compatible with hypothesis
                  deriving (Eq,Ord,Show,Typeable,Data,Generic)

instance FromJSON TestResult
instance ToJSON TestResult

-- | Significant if the parameter is 'True', not significant otherwise
significant :: Bool -> TestResult
significant True  = Significant
significant False = NotSignificant
statistics-0.13.3.0/Statistics/Test/MannWhitneyU.hs0000644000000000000000000002321612724050603020312 0ustar0000000000000000-- |
-- Module    : Statistics.Test.MannWhitneyU
-- Copyright : (c) 2010 Neil Brown
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The Mann-Whitney U test (also known as the Mann-Whitney-Wilcoxon and
-- Wilcoxon rank sum test) is a non-parametric test for assessing
-- whether two samples of independent observations have different
-- means.
module Statistics.Test.MannWhitneyU (
    -- * Mann-Whitney U test
    mannWhitneyUtest
  , mannWhitneyU
  , mannWhitneyUCriticalValue
  , mannWhitneyUSignificant
    -- ** Wilcoxon rank sum test
  , wilcoxonRankSums
    -- * Data types
  , TestType(..)
  , TestResult(..)
    -- * References
    -- $references
  ) where

import Control.Applicative ((<$>))
import Data.List (findIndex)
import Data.Ord (comparing)
import Numeric.SpecFunctions (choose)
import Prelude hiding (sum)
import Statistics.Distribution (quantile)
import Statistics.Distribution.Normal (standard)
import Statistics.Function (sortBy)
import Statistics.Sample.Internal (sum)
import Statistics.Test.Internal (rank, splitByTags)
import Statistics.Test.Types (TestResult(..), TestType(..), significant)
import Statistics.Types (Sample)
import qualified Data.Vector.Unboxed as U

-- | The Wilcoxon Rank Sums Test.
--
-- This test calculates the sum of ranks for the given two samples. The samples
-- are ordered, and assigned ranks (ties are given their average rank), then these
-- ranks are summed for each sample.
--
-- The return value is (W₁, W₂) where W₁ is the sum of ranks of the first sample
-- and W₂ is the sum of ranks of the second sample. This test is trivially transformed
-- into the Mann-Whitney U test. You will probably want to use 'mannWhitneyU'
-- and the related functions for testing significance, but this function is exposed
-- for completeness.
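--
-- A small worked example (assuming @U@ is @Data.Vector.Unboxed@): the
-- combined sample ranks as 1,2,3,4, giving rank sums 1+2 and 3+4:
--
-- >>> wilcoxonRankSums (U.fromList [1,2]) (U.fromList [3,4])
-- (3.0,7.0)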
wilcoxonRankSums :: Sample -> Sample -> (Double, Double)
wilcoxonRankSums xs1 xs2 = (sum ranks1, sum ranks2)
  where
    -- Ranks for each sample
    (ranks1,ranks2) = splitByTags
                    $ U.zip tags (rank (==) joinSample)
    -- Sorted and tagged sample
    (tags,joinSample) = U.unzip
                      $ sortBy (comparing snd)
                      $ tagSample True xs1 U.++ tagSample False xs2
    -- Add tag to a sample
    tagSample t = U.map ((,) t)

-- | The Mann-Whitney U Test.
--
-- This is sometimes known as the Mann-Whitney-Wilcoxon U test, and
-- confusingly many sources state that the Mann-Whitney U test is the same as
-- the Wilcoxon's rank sum test (which is provided as 'wilcoxonRankSums').
-- The Mann-Whitney U is a simple transform of Wilcoxon's rank sum test.
--
-- Again confusingly, different sources state reversed definitions for U₁
-- and U₂, so it is worth being explicit about what this function returns.
-- Given two samples, the first, xs₁, of size n₁ and the second, xs₂,
-- of size n₂, this function returns (U₁, U₂)
-- where U₁ = W₁ - (n₁(n₁+1))\/2
-- and U₂ = W₂ - (n₂(n₂+1))\/2,
-- where (W₁, W₂) is the return value of @wilcoxonRankSums xs1 xs2@.
--
-- Some sources instead state that U₁ and U₂ should be the other way round, often
-- expressing this using U₁' = n₁n₂ - U₁ (since U₁ + U₂ = n₁n₂).
--
-- All of which you probably don't care about if you just feed this into
-- 'mannWhitneyUSignificant'.
mannWhitneyU :: Sample -> Sample -> (Double, Double)
mannWhitneyU xs1 xs2
  = (fst summedRanks - (n1*(n1 + 1))/2
    ,snd summedRanks - (n2*(n2 + 1))/2)
  where
    n1 = fromIntegral $ U.length xs1
    n2 = fromIntegral $ U.length xs2
    summedRanks = wilcoxonRankSums xs1 xs2

-- | Calculates the critical value of Mann-Whitney U for the given sample
-- sizes and significance level.
--
-- This function returns the exact calculated value of U for all sample sizes;
-- it does not use the normal approximation at all. Above sample size 20 it is
-- generally recommended to use the normal approximation instead, but this function
-- will calculate the higher critical values if you need them.
--
-- The algorithm to generate these values is a faster, memoised version of the
-- simple unoptimised generating function given in section 2 of \"The Mann Whitney
-- Wilcoxon Distribution Using Linked Lists\"
mannWhitneyUCriticalValue
  :: (Int, Int)      -- ^ The sample size
  -> Double          -- ^ The p-value (e.g. 0.05) for which you want the critical value.
  -> Maybe Int       -- ^ The critical value (of U).
mannWhitneyUCriticalValue (m, n) p
  | m < 1 || n < 1 = Nothing  -- Sample must be nonempty
  | p >= 1         = Nothing  -- Nonsensical p-value
  | p' <= 1        = Nothing  -- p-value is too small. Null hypothesis couldn't be disproved
  | otherwise      = findIndex (>= p')
                   $ take (m*n)
                   $ tail
                   $ alookup !! (m+n-2) !! (min m n - 1)
  where
    mnCn = (m+n) `choose` n
    p'   = mnCn * p

{-
-- Original function, without memoisation, from Cheung and Klotz:
-- Double is needed to avoid integer overflows.
a :: Int -> Int -> Int -> Double
a u bigN m
  | u < 0            = 0
  | u >= m * n       = bigN `choose` m
  | m == 1 || n == 1 = fromIntegral (u + 1)
  | otherwise        = a  u      (bigN - 1)  m
                     + a (u - n) (bigN - 1) (m-1)
  where n = bigN - m
-}

-- Memoised version of the original a function, above.
--
-- Doubles are stored to avoid integer overflow. 32-bit Ints begin to
-- overflow for bigN as small as 33 (64-bit ones at 66), while Doubles
-- do not reach infinity until bigN = 1029.
--
--
-- outer list is indexed by big N - 2
-- inner list by (m-1) (we know m < bigN)
-- innermost list by u
--
-- So: (alookup !! (bigN - 2) !! (m - 1) !! u) == a u bigN m
alookup :: [[[Double]]]
alookup = gen 2 [1 : repeat 2]
  where
    gen bigN predBigNList =
      let bigNlist = [ [ amemoed u m | u <- [0 .. m*(bigN-m)] ]
                       ++ repeat (bigN `choose` m)
                     | m <- [1 .. (bigN-1)]] -- has bigN-1 elements
      in bigNlist : gen (bigN+1) bigNlist
      where
        amemoed :: Int -> Int -> Double
        amemoed u m
          | m == 1 || n == 1 = fromIntegral (u + 1)
          | otherwise        = mList !! u
                             + if u < n then 0 else predmList !! (u-n)
          where
            n = bigN - m
            (predmList : mList : _) = drop (m-2) predBigNList
            -- Lists for m-1 and m respectively. i-th list corresponds to m=i+1
            --
            -- We know that predBigNList has bigN - 2 elements
            -- (and we know that n > 1, therefore bigN > m + 1)
            -- So bigN - 2 >= m, i.e. predBigNList must have at least m
            -- elements, so dropping (m-2) must leave at least 2

-- | Calculates whether the Mann Whitney U test is significant.
--
-- If both sample sizes are less than or equal to 20, the exact U critical value
-- (as calculated by 'mannWhitneyUCriticalValue') is used. If either sample is
-- larger than 20, the normal approximation is used instead.
--
-- If you use a one-tailed test, the test indicates whether the first sample is
-- significantly larger than the second. If you want the opposite, simply reverse
-- the order in both the sample size and the (U₁, U₂) pairs.
mannWhitneyUSignificant
  :: TestType          -- ^ Perform one-tailed test (see description above).
  -> (Int, Int)        -- ^ The samples' size from which the (U₁,U₂) values were derived.
  -> Double            -- ^ The p-value at which to test (e.g. 0.05)
  -> (Double, Double)  -- ^ The (U₁, U₂) values from 'mannWhitneyU'.
  -> Maybe TestResult  -- ^ Return 'Nothing' if the sample was too
                       --   small to make a decision.
mannWhitneyUSignificant test (in1, in2) p (u1, u2)
  -- Use the normal approximation
  | in1 > 20 || in2 > 20 =
    let mean  = n1 * n2 / 2
        sigma = sqrt $ n1*n2*(n1 + n2 + 1) / 12
        z     = (mean - u1) / sigma
    in Just $ case test of
                OneTailed -> significant $ z     < quantile standard  p
                TwoTailed -> significant $ abs z > abs (quantile standard (p/2))
  -- Use the exact critical value
  | otherwise = do
      crit <- fromIntegral <$> mannWhitneyUCriticalValue (in1, in2) p
      return $ case test of
                 OneTailed -> significant $ u2 <= crit
                 TwoTailed -> significant $ min u1 u2 <= crit
  where
    n1 = fromIntegral in1
    n2 = fromIntegral in2

-- | Perform Mann-Whitney U Test for two samples and required
-- significance. For additional information check documentation of
-- 'mannWhitneyU' and 'mannWhitneyUSignificant'. This is just a helper
-- function.
--
-- One-tailed test checks whether first sample is significantly larger
-- than second. Two-tailed whether they are significantly different.
mannWhitneyUtest
  :: TestType          -- ^ Perform one-tailed test (see description above).
  -> Double            -- ^ The p-value at which to test (e.g. 0.05)
  -> Sample            -- ^ First sample
  -> Sample            -- ^ Second sample
  -> Maybe TestResult  -- ^ Return 'Nothing' if the sample was too small to
                       --   make a decision.
mannWhitneyUtest oneTail p smp1 smp2 =
  mannWhitneyUSignificant oneTail (n1,n2) p $ mannWhitneyU smp1 smp2
    where
      n1 = U.length smp1
      n2 = U.length smp2

-- $references
--
-- * Cheung, Y.K.; Klotz, J.H. (1997) The Mann Whitney Wilcoxon
--   distribution using linked lists. /Statistica Sinica/
--   7:805–813.
--   .
statistics-0.13.3.0/Statistics/Test/WilcoxonT.hs0000644000000000000000000002152012724050603017646 0ustar0000000000000000-- |
-- Module    : Statistics.Test.WilcoxonT
-- Copyright : (c) 2010 Neil Brown
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The Wilcoxon matched-pairs signed-rank test is a non-parametric test
-- which can be used to test whether two related samples have different
-- means.
--
-- WARNING: the current implementation contains a serious bug and cannot
-- be used with samples larger than 1023 (see bug #18).
module Statistics.Test.WilcoxonT (
    -- * Wilcoxon signed-rank matched-pair test
    wilcoxonMatchedPairTest
  , wilcoxonMatchedPairSignedRank
  , wilcoxonMatchedPairSignificant
  , wilcoxonMatchedPairSignificance
  , wilcoxonMatchedPairCriticalValue
    -- * Data types
  , TestType(..)
  , TestResult(..)
  ) where

-- Note that: wilcoxonMatchedPairSignedRank == (\(x, y) -> (y, x)) . flip wilcoxonMatchedPairSignedRank
--
-- The samples are zipped together: if one is longer than the other,
-- both are truncated to the length of the shorter sample.
--
-- The value returned is the pair (T+, T-). T+ is the sum of positive
-- ranks (the ranks of the differences where the first parameter is
-- higher) whereas T- is the sum of negative ranks (the ranks of the
-- differences where the second parameter is higher). These values
-- mean little by themselves, and should be combined with the
-- 'wilcoxonMatchedPairSignificant' function in this module to get a
-- meaningful result.

import Control.Applicative ((<$>))
import Data.Function (on)
import Data.List (findIndex)
import Data.Ord (comparing)
import Prelude hiding (sum)
import Statistics.Function (sortBy)
import Statistics.Sample.Internal (sum)
import Statistics.Test.Internal (rank, splitByTags)
import Statistics.Test.Types (TestResult(..), TestType(..), significant)
import Statistics.Types (Sample)
import qualified Data.Vector.Unboxed as U

wilcoxonMatchedPairSignedRank :: Sample -> Sample -> (Double, Double)
wilcoxonMatchedPairSignedRank a b = (sum ranks1, negate (sum ranks2))
  where
    (ranks1, ranks2) = splitByTags
                     $ U.zip tags (rank ((==) `on` abs) diffs)
    (tags,diffs) = U.unzip
                 $ U.map (\x -> (x>0 , x))   -- Attach tags to distribution elements
                 $ U.filter  (/= 0.0)        -- Remove equal elements
                 $ sortBy (comparing abs)    -- Sort the differences by absolute difference
                 $ U.zipWith (-) a b         -- Work out differences

-- | The coefficients for x^0, x^1, x^2, etc, in the expression
-- \prod_{r=1}^s (1 + x^r). See the Mitic paper for details.
--
-- We can define:
-- f(1) = 1 + x
-- f(r) = (1 + x^r)*f(r-1)
--      = f(r-1) + x^r * f(r-1)
-- The effect of multiplying the equation by x^r is to shift
-- all the coefficients by r down the list.
--
-- This list will be processed lazily from the head.
coefficients :: Int -> [Integer]
coefficients 1 = [1, 1] -- 1 + x
coefficients r = let coeffs = coefficients (r-1)
                     (firstR, rest) = splitAt r coeffs
                 in firstR ++ add rest coeffs
  where
    add (x:xs) (y:ys) = x + y : add xs ys
    add xs [] = xs
    add [] ys = ys

-- This list will be processed lazily from the head.
summedCoefficients :: Int -> [Double]
summedCoefficients n
  | n < 1     = error "Statistics.Test.WilcoxonT.summedCoefficients: nonpositive sample size"
  | n > 1023  = error "Statistics.Test.WilcoxonT.summedCoefficients: sample is too large (see bug #18)"
  | otherwise = map fromIntegral $ scanl1 (+) $ coefficients n

-- | Tests whether a given result from a Wilcoxon signed-rank matched-pairs test
-- is significant at the given level.
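--
-- For very small samples no critical value may exist at the requested
-- level, in which case the test abstains. For example (illustrative
-- values only):
--
-- >>> wilcoxonMatchedPairSignificant TwoTailed 3 0.05 (6, 0)
-- Nothing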
--
-- This function can perform a one-tailed or two-tailed test. If the first
-- parameter to this function is 'TwoTailed', the test is performed two-tailed to
-- check if the two samples differ significantly. If the first parameter is
-- 'OneTailed', the check is performed one-tailed to decide whether the first sample
-- (i.e. the first sample you passed to 'wilcoxonMatchedPairSignedRank') is
-- greater than the second sample (i.e. the second sample you passed to
-- 'wilcoxonMatchedPairSignedRank'). If you wish to perform a one-tailed test
-- in the opposite direction, you can either pass the parameters in a different
-- order to 'wilcoxonMatchedPairSignedRank', or simply swap the values in the resulting
-- pair before passing them to this function.
wilcoxonMatchedPairSignificant
  :: TestType            -- ^ Perform one- or two-tailed test (see description above).
  -> Int                 -- ^ The sample size from which the (T+,T-) values were derived.
  -> Double              -- ^ The p-value at which to test (e.g. 0.05)
  -> (Double, Double)    -- ^ The (T+, T-) values from 'wilcoxonMatchedPairSignedRank'.
  -> Maybe TestResult    -- ^ Return 'Nothing' if the sample was too
                         --   small to make a decision.
wilcoxonMatchedPairSignificant test sampleSize p (tPlus, tMinus) =
  case test of
    -- According to my nearest book (Understanding Research Methods and Statistics
    -- by Gary W. Heiman, p590), to check that the first sample is bigger you must
    -- use the absolute value of T- for a one-tailed check:
    OneTailed -> (significant . (abs tMinus <=) . fromIntegral) <$>
                 wilcoxonMatchedPairCriticalValue sampleSize p
    -- Otherwise you must use the value of T+ and T- with the smallest absolute value:
    TwoTailed -> (significant . (t <=) . fromIntegral) <$>
                 wilcoxonMatchedPairCriticalValue sampleSize (p/2)
  where t = min (abs tPlus) (abs tMinus)

-- | Obtains the critical value of T to compare against, given a sample size
-- and a p-value (significance level). Your T value must be less than or
-- equal to the return of this function in order for the test to be
-- significant. If there is a Nothing return, the sample size is too small to
-- make a decision.
--
-- 'wilcoxonMatchedPairSignificant' tests the return value of
-- 'wilcoxonMatchedPairSignedRank' for you, so you should use
-- 'wilcoxonMatchedPairSignificant' for determining test results.
-- However, this function is useful, for example, for generating lookup tables
-- for Wilcoxon signed rank critical values.
--
-- The return values of this function are generated using the method detailed in
-- the paper \"Critical Values for the Wilcoxon Signed Rank Statistic\", Peter
-- Mitic, The Mathematica Journal, volume 6, issue 3, 1996, which can be found
-- here: .
-- According to that paper, the results may differ from other published lookup tables, but
-- (Mitic claims) the values obtained by this function will be the correct ones.
wilcoxonMatchedPairCriticalValue
  :: Int                -- ^ The sample size
  -> Double             -- ^ The p-value (e.g. 0.05) for which you want the critical value.
  -> Maybe Int          -- ^ The critical value (of T), or Nothing if
                        --   the sample is too small to make a decision.
wilcoxonMatchedPairCriticalValue sampleSize p =
  case critical of
    Just n | n < 0     -> Nothing
           | otherwise -> Just n
    Nothing            -> Just maxBound -- shouldn't happen: beyond end of list
  where
    m = (2 ** fromIntegral sampleSize) * p
    critical = subtract 1 <$> findIndex (> m) (summedCoefficients sampleSize)

-- | Works out the significance level (p-value) of a T value, given a sample
-- size and a T value from the Wilcoxon signed-rank matched-pairs test.
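--
-- For example (an illustrative value): with 5 pairs and T = 0, only
-- one of the 2^5 equally likely sign assignments gives so small a
-- rank sum, so the significance is 1\/32:
--
-- >>> wilcoxonMatchedPairSignificance 5 0
-- 3.125e-2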
--
-- See the notes on 'wilcoxonMatchedPairCriticalValue' for how this is calculated.
wilcoxonMatchedPairSignificance
  :: Int     -- ^ The sample size
  -> Double  -- ^ The value of T for which you want the significance.
  -> Double  -- ^ The significance (p-value).
wilcoxonMatchedPairSignificance sampleSize rnk =
  (summedCoefficients sampleSize !! floor rnk) / 2 ** fromIntegral sampleSize

-- | The Wilcoxon matched-pairs signed-rank test. The samples are
-- zipped together: if one is longer than the other, both are
-- truncated to the length of the shorter sample.
--
-- For a one-tailed test it tests whether the first sample is
-- significantly greater than the second. For a two-tailed test it
-- checks whether they differ significantly.
--
-- Check 'wilcoxonMatchedPairSignedRank' and
-- 'wilcoxonMatchedPairSignificant' for additional information.
wilcoxonMatchedPairTest
  :: TestType          -- ^ Perform one-tailed test.
  -> Double            -- ^ The p-value at which to test (e.g. 0.05)
  -> Sample            -- ^ First sample
  -> Sample            -- ^ Second sample
  -> Maybe TestResult  -- ^ Return 'Nothing' if the sample was too
                       --   small to make a decision.
wilcoxonMatchedPairTest test p smp1 smp2 =
  wilcoxonMatchedPairSignificant test (min n1 n2) p
  $ wilcoxonMatchedPairSignedRank smp1 smp2
  where
    n1 = U.length smp1
    n2 = U.length smp2
statistics-0.13.3.0/Statistics/Test/Internal.hs0000644000000000000000000000526312724050603017502 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-}
module Statistics.Test.Internal (
    rank
  , rankUnsorted
  , splitByTags
  ) where

import Data.Ord
import Data.Vector.Generic ((!))
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Generic.Mutable as M
import Statistics.Function

-- Private data type for unfolding
data Rank v a = Rank {
      rankCnt :: {-# UNPACK #-} !Int        -- Number of ranks to return
    , rankVal :: {-# UNPACK #-} !Double     -- Rank to return
    , rankNum :: {-# UNPACK #-} !Double     -- Current rank
    , rankVec :: v a                        -- Remaining vector
    }

-- | Calculate the rank of every element of a sample. In case of ties,
-- ranks are averaged. The sample should already be sorted in
-- ascending order.
--
-- >>> rank (==) (fromList [10,20,30::Int])
-- > fromList [1.0,2.0,3.0]
--
-- >>> rank (==) (fromList [10,10,10,30::Int])
-- > fromList [2.0,2.0,2.0,4.0]
rank :: (G.Vector v a, G.Vector v Double)
     => (a -> a -> Bool)        -- ^ Equivalence relation
     -> v a                     -- ^ Vector to rank
     -> v Double
rank eq vec = G.unfoldr go (Rank 0 (-1) 1 vec)
  where
    go (Rank 0 _ r v)
      | G.null v  = Nothing
      | otherwise =
          case G.length h of
            1 -> Just (r, Rank 0 0 (r+1) rest)
            n -> go Rank { rankCnt = n
                         , rankVal = 0.5 * (r*2 + fromIntegral (n-1))
                         , rankNum = r + fromIntegral n
                         , rankVec = rest
                         }
      where
        (h,rest) = G.span (eq $ G.head v) v
    go (Rank n val r v) = Just (val, Rank (n-1) val r v)
{-# INLINE rank #-}

-- | Compute the rank of every element of a vector. Unlike 'rank' it
-- doesn't require the sample to be sorted.
rankUnsorted :: ( Ord a
                , G.Vector v a
                , G.Vector v Int
                , G.Vector v Double
                , G.Vector v (Int, a)
                )
             => v a
             -> v Double
rankUnsorted xs = G.create $ do
    -- Put ranks into their original positions
    -- NOTE: backpermute will do the wrong thing
    vec <- M.new n
    for 0 n $ \i ->
      M.unsafeWrite vec (index ! i) (ranks ! i)
i) return vec where n = G.length xs -- Calculate ranks for sorted array ranks = rank (==) sorted -- Sort vector and retain original indices of elements (index, sorted) = G.unzip $ sortBy (comparing snd) $ indexed xs {-# INLINE rankUnsorted #-} -- | Split a tagged vector splitByTags :: (G.Vector v a, G.Vector v (Bool,a)) => v (Bool,a) -> (v a, v a) splitByTags vs = (G.map snd a, G.map snd b) where (a,b) = G.unstablePartition fst vs {-# INLINE splitByTags #-} statistics-0.13.3.0/Statistics/Test/KruskalWallis.hs0000644000000000000000000000670212724050603020515 0ustar0000000000000000-- | -- Module : Statistics.Test.KruskalWallis -- Copyright : (c) 2014 Danny Navarro -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- module Statistics.Test.KruskalWallis ( kruskalWallisRank , kruskalWallis , kruskalWallisSignificant , kruskalWallisTest ) where import Data.Ord (comparing) import Data.Foldable (foldMap) import qualified Data.Vector.Unboxed as U import Statistics.Function (sort, sortBy, square) import Statistics.Distribution (quantile) import Statistics.Distribution.ChiSquared (chiSquared) import Statistics.Test.Types (TestResult(..), significant) import Statistics.Test.Internal (rank) import Statistics.Sample import qualified Statistics.Sample.Internal as Sample(sum) -- | Kruskal-Wallis ranking. -- -- All values are replaced by their absolute rank in the combined samples. -- -- The samples and values need not be ordered, but the values in the result -- are. Ties are assigned their average rank. kruskalWallisRank :: [Sample] -> [Sample] kruskalWallisRank samples = groupByTags . sortBy (comparing fst) . U.zip tags $ rank (==) joinSample where (tags,joinSample) = U.unzip . sortBy (comparing snd) $ foldMap (uncurry tagSample) $ zip [(1::Int)..] samples tagSample t = U.map (\x -> (t,x)) groupByTags xs | U.null xs = [] | otherwise = sort (U.map snd ys) : groupByTags zs where (ys,zs) = U.span ((==) (fst $ U.head xs) . fst) xs -- | The Kruskal-Wallis Test. -- -- In textbooks the output value is usually represented by 'K' or 'H'. This -- function already does the ranking. kruskalWallis :: [Sample] -> Double kruskalWallis samples = (nTot - 1) * numerator / denominator where -- Total number of elements in all samples nTot = fromIntegral $ sumWith rsamples U.length -- Average rank of all samples avgRank = (nTot + 1) / 2 -- numerator = sumWith rsamples $ \sample -> let n = fromIntegral $ U.length sample in n * square (mean sample - avgRank) denominator = sumWith rsamples $ \sample -> Sample.sum $ U.map (\r -> square (r - avgRank)) sample rsamples = kruskalWallisRank samples -- | Calculates whether the Kruskal-Wallis test is significant. -- -- It uses the /Chi-Squared/ distribution for approximation as long as all -- sample sizes are larger than 4. Otherwise the test returns 'Nothing'. kruskalWallisSignificant :: [Int] -- ^ The samples' sizes -> Double -- ^ The p-value at which to test (e.g. 0.05) -> Double -- ^ K value from 'kruskalWallis' -> Maybe TestResult kruskalWallisSignificant ns p k -- Use chi-squared approximation | all (>4) ns = Just . significant $ k > x -- TODO: Implement critical value calculation: kruskalWallisCriticalValue | otherwise = Nothing where x = quantile (chiSquared (length ns - 1)) (1 - p) -- | Perform the Kruskal-Wallis test for the given samples and required -- significance. For additional information check 'kruskalWallis'. This is just -- a helper function.
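-- -- A minimal usage sketch (sample data invented purely for illustration; -- assumes @import qualified Data.Vector.Unboxed as U@): -- -- > kruskalWallisTest 0.05 $ map U.fromList [[68,93,123,83], [119,116,101,103], [70,68,54,73]]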
kruskalWallisTest :: Double -> [Sample] -> Maybe TestResult kruskalWallisTest p samples = kruskalWallisSignificant (map U.length samples) p $ kruskalWallis samples -- * Helper functions sumWith :: Num a => [Sample] -> (Sample -> a) -> a sumWith samples f = Prelude.sum $ fmap f samples {-# INLINE sumWith #-} statistics-0.13.3.0/Statistics/Test/ChiSquared.hs0000644000000000000000000000354512724050603017757 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | Pearson's chi squared test. module Statistics.Test.ChiSquared ( chi2test -- * Data types , TestType(..) , TestResult(..) ) where import Prelude hiding (sum) import Statistics.Distribution import Statistics.Distribution.ChiSquared import Statistics.Function (square) import Statistics.Sample.Internal (sum) import Statistics.Test.Types import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- | Generic form of Pearson's chi squared test for binned data. The data -- sample is supplied as a vector of tuples (observed quantity, -- expected number of events). Both must be positive. chi2test :: (G.Vector v (Int,Double), G.Vector v Double) => Double -- ^ p-value -> Int -- ^ Number of additional degrees of -- freedom. One degree of freedom -- is due to the fact that there are -- N observations in total, which is -- accounted for automatically. -> v (Int,Double) -- ^ Observation and expectation. -> TestResult chi2test p ndf vec | ndf < 0 = error $ "Statistics.Test.ChiSquare.chi2test: negative NDF " ++ show ndf | n < 0 = error $ "Statistics.Test.ChiSquare.chi2test: too short data sample" | p > 0 && p < 1 = significant $ complCumulative d chi2 < p | otherwise = error $ "Statistics.Test.ChiSquare.chi2test: bad p-value: " ++ show p where n = G.length vec - ndf - 1 chi2 = sum $ G.map (\(o,e) -> square (fromIntegral o - e) / e) vec d = chiSquared n {-# SPECIALIZE chi2test :: Double -> Int -> U.Vector (Int,Double) -> TestResult #-} {-# SPECIALIZE chi2test :: Double -> Int -> V.Vector (Int,Double) -> TestResult #-} statistics-0.13.3.0/Statistics/Test/KolmogorovSmirnov.hs0000644000000000000000000002077712724050603021451 0ustar0000000000000000-- | -- Module : Statistics.Test.KolmogorovSmirnov -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kolmogorov-Smirnov tests are non-parametric tests for assessing -- whether a given sample could be described by a distribution, or whether -- two samples have the same distribution. They are only applicable to -- continuous distributions. module Statistics.Test.KolmogorovSmirnov ( -- * Kolmogorov-Smirnov test kolmogorovSmirnovTest , kolmogorovSmirnovTestCdf , kolmogorovSmirnovTest2 -- * Evaluate statistics , kolmogorovSmirnovCdfD , kolmogorovSmirnovD , kolmogorovSmirnov2D -- * Probabilities , kolmogorovSmirnovProbability -- * Data types , TestType(..) , TestResult(..)
-- * References -- $references ) where import Control.Monad (when) import Prelude hiding (exponent, sum) import Statistics.Distribution (Distribution(..)) import Statistics.Function (sort, unsafeModify) import Statistics.Matrix (center, exponent, for, fromVector, power) import Statistics.Test.Types (TestResult(..), TestType(..), significant) import Statistics.Types (Sample) import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as M ---------------------------------------------------------------- -- Test ---------------------------------------------------------------- -- | Check that a sample could be described by a -- distribution. 'Significant' means the distribution is not compatible -- with the data for the given p-value. -- -- This test uses the Marsaglia-Tsang-Wang exact algorithm for -- calculation of the p-value. kolmogorovSmirnovTest :: Distribution d => d -- ^ Distribution -> Double -- ^ p-value -> Sample -- ^ Data sample -> TestResult kolmogorovSmirnovTest d = kolmogorovSmirnovTestCdf (cumulative d) -- | Variant of 'kolmogorovSmirnovTest' which takes the CDF in the form of -- a function. kolmogorovSmirnovTestCdf :: (Double -> Double) -- ^ CDF of distribution -> Double -- ^ p-value -> Sample -- ^ Data sample -> TestResult kolmogorovSmirnovTestCdf cdf p sample | p > 0 && p < 1 = significant $ 1 - prob < p | otherwise = error "Statistics.Test.KolmogorovSmirnov.kolmogorovSmirnovTestCdf:bad p-value" where d = kolmogorovSmirnovCdfD cdf sample prob = kolmogorovSmirnovProbability (U.length sample) d -- | Two sample Kolmogorov-Smirnov test. It tests whether two data -- samples could be described by the same distribution without -- making any assumptions about it. -- -- This test uses an approximate formula for computing the p-value. kolmogorovSmirnovTest2 :: Double -- ^ p-value -> Sample -- ^ Sample 1 -> Sample -- ^ Sample 2 -> TestResult kolmogorovSmirnovTest2 p xs1 xs2 | p > 0 && p < 1 = significant $ 1 - prob( d*(en + 0.12 + 0.11/en) ) < p | otherwise = error "Statistics.Test.KolmogorovSmirnov.kolmogorovSmirnovTest2:bad p-value" where d = kolmogorovSmirnov2D xs1 xs2 -- Effective number of data points n1 = fromIntegral (U.length xs1) n2 = fromIntegral (U.length xs2) en = sqrt $ n1 * n2 / (n1 + n2) -- prob z | z < 0 = error "kolmogorovSmirnov2D: internal error" | z == 0 = 1 | z < 1.18 = let y = exp( -1.23370055013616983 / (z*z) ) in 2.25675833419102515 * sqrt( -log(y) ) * (y + y**9 + y**25 + y**49) | otherwise = let x = exp(-2 * z * z) in 1 - 2*(x - x**4 + x**9) -- FIXME: Find source for approximation for D ---------------------------------------------------------------- -- Kolmogorov's statistic ---------------------------------------------------------------- -- | Calculate Kolmogorov's statistic /D/ for a given cumulative -- distribution function (CDF) and data sample. If the sample is empty, -- returns 0. kolmogorovSmirnovCdfD :: (Double -> Double) -- ^ CDF function -> Sample -- ^ Sample -> Double kolmogorovSmirnovCdfD cdf sample | U.null sample = 0 | otherwise = U.maximum $ U.zipWith3 (\p a b -> abs (p-a) `max` abs (p-b)) ps steps (U.tail steps) where xs = sort sample n = U.length xs -- ps = U.map cdf xs steps = U.map ((/ fromIntegral n) . fromIntegral) $ U.generate (n+1) id -- | Calculate Kolmogorov's statistic /D/ for a given cumulative -- distribution function (CDF) and data sample. If the sample is empty, -- returns 0.
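-- -- For example, to measure how far a sample strays from the standard normal -- distribution (a sketch; assumes @standard@ from -- "Statistics.Distribution.Normal" and a vector built with @U.fromList@): -- -- > kolmogorovSmirnovD standard (U.fromList [-0.2, 0.1, 0.4, 1.3])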
kolmogorovSmirnovD :: (Distribution d) => d -- ^ Distribution -> Sample -- ^ Sample -> Double kolmogorovSmirnovD d = kolmogorovSmirnovCdfD (cumulative d) -- | Calculate Kolmogorov's statistic /D/ for two data samples. If -- either of the samples is empty, returns 0. kolmogorovSmirnov2D :: Sample -- ^ First sample -> Sample -- ^ Second sample -> Double kolmogorovSmirnov2D sample1 sample2 | U.null sample1 || U.null sample2 = 0 | otherwise = worker 0 0 0 where xs1 = sort sample1 xs2 = sort sample2 n1 = U.length xs1 n2 = U.length xs2 en1 = fromIntegral n1 en2 = fromIntegral n2 -- Find new index skip x i xs = go (i+1) where go n | n >= U.length xs = n | xs U.! n == x = go (n+1) | otherwise = n -- Main loop worker d i1 i2 | i1 >= n1 || i2 >= n2 = d | otherwise = worker d' i1' i2' where d1 = xs1 U.! i1 d2 = xs2 U.! i2 i1' | d1 <= d2 = skip d1 i1 xs1 | otherwise = i1 i2' | d2 <= d1 = skip d2 i2 xs2 | otherwise = i2 d' = max d (abs $ fromIntegral i1' / en1 - fromIntegral i2' / en2) -- | Calculate the cumulative probability function for Kolmogorov's -- distribution with parameter /n/, i.e. the probability of getting a value -- smaller than /d/ with an /n/-element sample. -- -- It uses the algorithm by Marsaglia et al. and provides at least -- 7-digit accuracy. kolmogorovSmirnovProbability :: Int -- ^ Size of the sample -> Double -- ^ D value -> Double kolmogorovSmirnovProbability n d -- Avoid potentially lengthy calculations for large N and D > 0.999 | s > 7.24 || (s > 3.76 && n > 99) = 1 - 2 * exp( -(2.000071 + 0.331 / sqrt n' + 1.409 / n') * s) -- Exact computation | otherwise = fini $ matrix `power` n where s = n' * d * d n' = fromIntegral n size = 2*k - 1 k = floor (n' * d) + 1 h = fromIntegral k - n' * d -- Calculate initial matrix matrix = let m = U.create $ do mat <- M.new (size*size) -- Fill matrix with 0 and 1s for 0 size $ \row -> for 0 size $ \col -> do let val | row + 1 >= col = 1 | otherwise = 0 :: Double M.write mat (row * size + col) val -- Correct left column/bottom row for 0 size $ \i -> do let delta = h ^^ (i + 1) unsafeModify mat (i * size) (subtract delta) unsafeModify mat (size * size - 1 - i) (subtract delta) -- Correct corner element if needed when (2*h > 1) $ do unsafeModify mat ((size - 1) * size) (+ ((2*h - 1) ^ size)) -- Divide diagonals by factorial let divide g num | num == size = return () | otherwise = do for num size $ \i -> unsafeModify mat (i * (size + 1) - num) (/ g) divide (g * fromIntegral (num+2)) (num+1) divide 2 1 return mat in fromVector size size m -- Last calculation fini m = loop 1 (center m) (exponent m) where loop i ss eQ | i > n = ss * 10 ^^ eQ | ss' < 1e-140 = loop (i+1) (ss' * 1e140) (eQ - 140) | otherwise = loop (i+1) ss' eQ where ss' = ss * fromIntegral i / fromIntegral n ---------------------------------------------------------------- -- $references -- -- * G. Marsaglia, W. W. Tsang, J. Wang (2003) Evaluating Kolmogorov's -- distribution, Journal of Statistical Software, American -- Statistical Association, vol. 8(i18). statistics-0.13.3.0/Statistics/Sample/0000755000000000000000000000000012724050603015666 5ustar0000000000000000statistics-0.13.3.0/Statistics/Sample/KernelDensity.hs0000644000000000000000000001136012724050603021003 0ustar0000000000000000{-# LANGUAGE BangPatterns, FlexibleContexts, UnboxedTuples #-} -- | -- Module : Statistics.Sample.KernelDensity -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kernel density estimation.
This module provides a fast, robust, -- non-parametric way to estimate the probability density function of -- a sample. -- -- This estimator does not use the commonly employed \"Gaussian rule -- of thumb\". As a result, it outperforms many plug-in methods on -- multimodal samples with widely separated modes. module Statistics.Sample.KernelDensity ( -- * Estimation functions kde , kde_ -- * References -- $references ) where import Numeric.MathFunctions.Constants (m_sqrt_2_pi) import Prelude hiding (const, min, max, sum) import Statistics.Function (minMax, nextHighestPowerOfTwo) import Statistics.Math.RootFinding (fromRoot, ridders) import Statistics.Sample.Histogram (histogram_) import Statistics.Sample.Internal (sum) import Statistics.Transform (CD, dct, idct) import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Data.Vector as V -- | Gaussian kernel density estimator for one-dimensional data, using -- the method of Botev et al. -- -- The result is a pair of vectors, containing: -- -- * The coordinates of each mesh point. The mesh interval is chosen -- to be 20% larger than the range of the sample. (To specify the -- mesh interval, use 'kde_'.) -- -- * Density estimates at each mesh point. kde :: (G.Vector v CD, G.Vector v Double, G.Vector v Int) => Int -- ^ The number of mesh points to use in the uniform discretization -- of the interval @(min,max)@. If this value is not a power of -- two, then it is rounded up to the next power of two. -> v Double -> (v Double, v Double) kde n0 xs = kde_ n0 (lo - range / 10) (hi + range / 10) xs where (lo,hi) = minMax xs range | G.length xs <= 1 = 1 -- Unreasonable guess | lo == hi = 1 -- All elements are equal | otherwise = hi - lo {-# INLINABLE kde #-} {-# SPECIAlIZE kde :: Int -> U.Vector Double -> (U.Vector Double, U.Vector Double) #-} {-# SPECIAlIZE kde :: Int -> V.Vector Double -> (V.Vector Double, V.Vector Double) #-} -- | Gaussian kernel density estimator for one-dimensional data, using -- the method of Botev et al. -- -- The result is a pair of vectors, containing: -- -- * The coordinates of each mesh point. -- -- * Density estimates at each mesh point. kde_ :: (G.Vector v CD, G.Vector v Double, G.Vector v Int) => Int -- ^ The number of mesh points to use in the uniform discretization -- of the interval @(min,max)@. If this value is not a power of -- two, then it is rounded up to the next power of two. -> Double -- ^ Lower bound (@min@) of the mesh range. -> Double -- ^ Upper bound (@max@) of the mesh range. -> v Double -> (v Double, v Double) kde_ n0 min max xs | G.null xs = error "Statistics.KernelDensity.kde: empty sample" | n0 <= 1 = error "Statistics.KernelDensity.kde: invalid number of points" | otherwise = (mesh, density) where mesh = G.generate ni $ \z -> min + (d * fromIntegral z) where d = r / (n-1) density = G.map (/(2 * r)) . idct $ G.zipWith f a (G.enumFromTo 0 (n-1)) where f b z = b * exp (sqr z * sqr pi * t_star * (-0.5)) !n = fromIntegral ni !ni = nextHighestPowerOfTwo n0 !r = max - min a = dct . G.map (/ sum h) $ h where h = G.map (/ len) $ histogram_ ni min max xs !len = fromIntegral (G.length xs) !t_star = fromRoot (0.28 * len ** (-0.4)) . ridders 1e-14 (0,0.1) $ \x -> x - (len * (2 * sqrt pi) * go 6 (f 7 x)) ** (-0.4) where f q t = 2 * pi ** (q*2) * sum (G.zipWith g iv a2v) where g i a2 = i ** q * a2 * exp ((-i) * sqr pi * t) a2v = G.map (sqr . 
(*0.5)) $ G.tail a iv = G.map sqr $ G.enumFromTo 1 (n-1) go s !h | s == 1 = h | otherwise = go (s-1) (f s time) where time = (2 * const * k0 / len / h) ** (2 / (3 + 2 * s)) const = (1 + 0.5 ** (s+0.5)) / 3 k0 = U.product (G.enumFromThenTo 1 3 (2*s-1)) / m_sqrt_2_pi sqr x = x * x {-# INLINABLE kde_ #-} {-# SPECIAlIZE kde_ :: Int -> Double -> Double -> U.Vector Double -> (U.Vector Double, U.Vector Double) #-} {-# SPECIAlIZE kde_ :: Int -> Double -> Double -> V.Vector Double -> (V.Vector Double, V.Vector Double) #-} -- $references -- -- Botev. Z.I., Grotowski J.F., Kroese D.P. (2010). Kernel density -- estimation via diffusion. /Annals of Statistics/ -- 38(5):2916–2957. statistics-0.13.3.0/Statistics/Sample/Internal.hs0000644000000000000000000000136012724050603017776 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Sample.Internal -- Copyright : (c) 2013 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Internal functions for computing over samples. module Statistics.Sample.Internal ( robustSumVar , sum ) where import Numeric.Sum (kbn, sumVector) import Prelude hiding (sum) import Statistics.Function (square) import qualified Data.Vector.Generic as G robustSumVar :: (G.Vector v Double) => Double -> v Double -> Double robustSumVar m = sum . G.map (square . subtract m) {-# INLINE robustSumVar #-} sum :: (G.Vector v Double) => v Double -> Double sum = sumVector kbn {-# INLINE sum #-} statistics-0.13.3.0/Statistics/Sample/Powers.hs0000644000000000000000000001567012724050603017512 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable, DeriveGeneric, FlexibleContexts #-} -- | -- Module : Statistics.Sample.Powers -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Very fast statistics over simple powers of a sample. These can all -- be computed efficiently in just a single pass over a sample, with -- that pass subject to stream fusion. -- -- The tradeoff is that some of these functions are less numerically -- robust than their counterparts in the 'Statistics.Sample' module. -- Where this is the case, the alternatives are noted. 
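-- -- A typical single pass might look like this (a sketch; assumes -- @import qualified Data.Vector.Unboxed as U@): -- -- > let p = powers 4 (U.fromList [1, 2, 3, 4, 5 :: Double]) -- > (mean p, variance p, skewness p, kurtosis p)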
module Statistics.Sample.Powers ( -- * Types Powers -- * Constructor , powers -- * Descriptive functions , order , count , sum -- * Statistics of location , mean -- * Statistics of dispersion , variance , stdDev , varianceUnbiased -- * Functions over central moments , centralMoment , skewness , kurtosis -- * References -- $references ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary(..)) import Data.Data (Data, Typeable) import Data.Vector.Binary () import Data.Vector.Generic (unsafeFreeze) import Data.Vector.Unboxed ((!)) import GHC.Generics (Generic) import Numeric.SpecFunctions (choose) import Prelude hiding (sum) import Statistics.Function (indexed) import Statistics.Internal (inlinePerformIO) import System.IO.Unsafe (unsafePerformIO) import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as MU import qualified Statistics.Sample.Internal as S newtype Powers = Powers (U.Vector Double) deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON Powers instance ToJSON Powers instance Binary Powers where put (Powers v) = put v get = fmap Powers get -- | O(/n/) Collect the /n/ simple powers of a sample. -- -- Functions computed over a sample's simple powers require at least a -- certain number (or /order/) of powers to be collected. -- -- * To compute the /k/th 'centralMoment', at least /k/ simple powers -- must be collected. -- -- * For the 'variance', at least 2 simple powers are needed. -- -- * For 'skewness', we need at least 3 simple powers. -- -- * For 'kurtosis', at least 4 simple powers are required. -- -- This function is subject to stream fusion. powers :: G.Vector v Double => Int -- ^ /n/, the number of powers, where /n/ >= 2. -> v Double -> Powers powers k | k < 2 = error "Statistics.Sample.powers: too few powers" | otherwise = fini . G.foldl' go (unsafePerformIO $ MU.replicate l 0) where go ms x = inlinePerformIO $ loop 0 1 where loop !i !xk | i == l = return ms | otherwise = do MU.read ms i >>= MU.write ms i . (+ xk) loop (i+1) (xk*x) fini = Powers . unsafePerformIO . unsafeFreeze l = k + 1 {-# SPECIALIZE powers :: Int -> U.Vector Double -> Powers #-} {-# SPECIALIZE powers :: Int -> V.Vector Double -> Powers #-} -- | The order (number) of simple powers collected from a 'sample'. order :: Powers -> Int order (Powers pa) = U.length pa - 1 -- | Compute the /k/th central moment of a sample. The central -- moment is also known as the moment about the mean. centralMoment :: Int -> Powers -> Double centralMoment k p@(Powers pa) | k < 0 || k > order p = error ("Statistics.Sample.Powers.centralMoment: " ++ "invalid argument") | k == 0 = 1 | otherwise = (/n) . S.sum . U.map go . indexed . U.take (k+1) $ pa where go (i , e) = (k `choose` i) * ((-m) ^ (k-i)) * e n = U.head pa m = mean p -- | Maximum likelihood estimate of a sample's variance. Also known -- as the population variance, where the denominator is /n/. This is -- the second central moment of the sample. -- -- This is less numerically robust than the variance function in the -- 'Statistics.Sample' module, but the number is essentially free to -- compute if you have already collected a sample's simple powers. -- -- Requires 'Powers' with 'order' at least 2. variance :: Powers -> Double variance = centralMoment 2 -- | Standard deviation. This is simply the square root of the -- maximum likelihood estimate of the variance. stdDev :: Powers -> Double stdDev = sqrt . 
variance -- | Unbiased estimate of a sample's variance. Also known as the -- sample variance, where the denominator is /n/-1. -- -- Requires 'Powers' with 'order' at least 2. varianceUnbiased :: Powers -> Double varianceUnbiased p@(Powers pa) | n > 1 = variance p * n / (n-1) | otherwise = 0 where n = U.head pa -- | Compute the skewness of a sample. This is a measure of the -- asymmetry of its distribution. -- -- A sample with negative skew is said to be /left-skewed/. Most of -- its mass is on the right of the distribution, with the tail on the -- left. -- -- > skewness . powers 3 $ U.to [1,100,101,102,103] -- > ==> -1.497681449918257 -- -- A sample with positive skew is said to be /right-skewed/. -- -- > skewness . powers 3 $ U.to [1,2,3,4,100] -- > ==> 1.4975367033335198 -- -- A sample's skewness is not defined if its 'variance' is zero. -- -- Requires 'Powers' with 'order' at least 3. skewness :: Powers -> Double skewness p = centralMoment 3 p * variance p ** (-1.5) -- | Compute the excess kurtosis of a sample. This is a measure of -- the \"peakedness\" of its distribution. A high kurtosis indicates -- that the sample's variance is due more to infrequent severe -- deviations than to frequent modest deviations. -- -- A sample's excess kurtosis is not defined if its 'variance' is -- zero. -- -- Requires 'Powers' with 'order' at least 4. kurtosis :: Powers -> Double kurtosis p = centralMoment 4 p / (v * v) - 3 where v = variance p -- | The number of elements in the original 'Sample'. This is the -- sample's zeroth simple power. count :: Powers -> Int count (Powers pa) = floor $ U.head pa -- | The sum of elements in the original 'Sample'. This is the -- sample's first simple power. sum :: Powers -> Double sum (Powers pa) = pa ! 1 -- | The arithmetic mean of elements in the original 'Sample'. -- -- This is less numerically robust than the mean function in the -- 'Statistics.Sample' module, but the number is essentially free to -- compute if you have already collected a sample's simple powers. mean :: Powers -> Double mean p@(Powers pa) | n == 0 = 0 | otherwise = sum p / n where n = U.head pa -- $references -- -- * Besset, D.H. (2000) Elements of statistics. -- /Object-oriented implementation of numerical methods/ -- ch. 9, pp. 311–331. -- -- -- * Anderson, G. (2009) Compute /k/th central moments in one -- pass. /quantblog/. statistics-0.13.3.0/Statistics/Sample/Histogram.hs0000644000000000000000000000751312724050603020165 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Sample.Histogram -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Functions for computing histograms of sample data. module Statistics.Sample.Histogram ( histogram -- * Building blocks , histogram_ , range ) where import Numeric.MathFunctions.Constants (m_epsilon,m_tiny) import Statistics.Function (minMax) import qualified Data.Vector.Generic as G import qualified Data.Vector.Generic.Mutable as GM -- | /O(n)/ Compute a histogram over a data set. -- -- The result consists of a pair of vectors: -- -- * The lower bound of each interval. -- -- * The number of samples within the interval. -- -- Interval (bin) sizes are uniform, and the upper and lower bounds -- are chosen automatically using the 'range' function. To specify -- these parameters directly, use the 'histogram_' function. 
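-- -- For example, five uniform bins over a small sample (a sketch; assumes -- @import qualified Data.Vector.Unboxed as U@): -- -- > histogram 5 (U.fromList [1,2,2,3,3,3,4,5 :: Double]) :: (U.Vector Double, U.Vector Int)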
histogram :: (G.Vector v0 Double, G.Vector v1 Double, Num b, G.Vector v1 b) => Int -- ^ Number of bins (must be positive). -> v0 Double -- ^ Sample data (cannot be empty). -> (v1 Double, v1 b) histogram numBins xs = (G.generate numBins step, histogram_ numBins lo hi xs) where (lo,hi) = range numBins xs step i = lo + d * fromIntegral i d = (hi - lo) / fromIntegral numBins {-# INLINE histogram #-} -- | /O(n)/ Compute a histogram over a data set. -- -- Interval (bin) sizes are uniform, based on the supplied upper -- and lower bounds. histogram_ :: (Num b, RealFrac a, G.Vector v0 a, G.Vector v1 b) => Int -- ^ Number of bins. This value must be positive. A zero -- or negative value will cause an error. -> a -- ^ Lower bound on interval range. Sample data less than -- this will cause an error. -> a -- ^ Upper bound on interval range. This value must not be -- less than the lower bound. Sample data that falls above -- the upper bound will cause an error. -> v0 a -- ^ Sample data. -> v1 b histogram_ numBins lo hi xs0 = G.create (GM.replicate numBins 0 >>= bin xs0) where bin xs bins = go 0 where go i | i >= len = return bins | otherwise = do let x = xs `G.unsafeIndex` i b = truncate $ (x - lo) / d GM.write bins b . (+1) =<< GM.read bins b go (i+1) len = G.length xs d = ((hi - lo) * (1 + realToFrac m_epsilon)) / fromIntegral numBins {-# INLINE histogram_ #-} -- | /O(n)/ Compute decent defaults for the lower and upper bounds of -- a histogram, based on the desired number of bins and the range of -- the sample data. -- -- The upper and lower bounds used are @(lo-d, hi+d)@, where -- -- @d = (maximum sample - minimum sample) / ((bins - 1) * 2)@ -- -- If all elements in the sample are the same and equal to @x@ range -- is set to @(x - |x|/10, x + |x|/10)@. And if @x@ is equal to 0 range -- is set to @(-1,1)@. This is needed to avoid creating histogram with -- zero bin size. range :: (G.Vector v Double) => Int -- ^ Number of bins (must be positive). -> v Double -- ^ Sample data (cannot be empty). -> (Double, Double) range numBins xs | numBins < 1 = error "Statistics.Histogram.range: invalid bin count" | G.null xs = error "Statistics.Histogram.range: empty sample" | lo == hi = case abs lo / 10 of a | a < m_tiny -> (-1,1) | otherwise -> (lo - a, lo + a) | otherwise = (lo-d, hi+d) where d | numBins == 1 = 0 | otherwise = (hi - lo) / ((fromIntegral numBins - 1) * 2) (lo,hi) = minMax xs {-# INLINE range #-} statistics-0.13.3.0/Statistics/Sample/KernelDensity/0000755000000000000000000000000012724050603020446 5ustar0000000000000000statistics-0.13.3.0/Statistics/Sample/KernelDensity/Simple.hs0000644000000000000000000001602612724050603022240 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, FlexibleContexts #-} -- | -- Module : Statistics.Sample.KernelDensity.Simple -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kernel density estimation code, providing non-parametric ways to -- estimate the probability density function of a sample. -- -- The techniques used by functions in this module are relatively -- fast, but they generally give inferior results to the KDE function -- in the main 'Statistics.KernelDensity' module (due to the -- oversmoothing documented for 'bandwidth' below). module Statistics.Sample.KernelDensity.Simple {-# DEPRECATED "Use Statistics.Sample.KernelDensity instead." 
#-} ( -- * Simple entry points epanechnikovPDF , gaussianPDF -- * Building blocks -- These functions may be useful if you need to construct a kernel -- density function estimator other than the ones provided in this -- module. -- ** Choosing points from a sample , Points(..) , choosePoints -- ** Bandwidth estimation , Bandwidth , bandwidth , epanechnikovBW , gaussianBW -- ** Kernels , Kernel , epanechnikovKernel , gaussianKernel -- ** Low-level estimation , estimatePDF , simplePDF -- * References -- $references ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary(..)) import Data.Data (Data, Typeable) import Data.Vector.Binary () import GHC.Generics (Generic) import Numeric.MathFunctions.Constants (m_1_sqrt_2, m_2_sqrt_pi) import Prelude hiding (sum) import Statistics.Function (minMax) import Statistics.Sample (stdDev) import Statistics.Sample.Internal (sum) import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- | Points from the range of a 'Sample'. newtype Points = Points { fromPoints :: U.Vector Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON Points instance ToJSON Points instance Binary Points where get = fmap Points get put = put . fromPoints -- | Bandwidth estimator for an Epanechnikov kernel. epanechnikovBW :: Double -> Bandwidth epanechnikovBW n = (80 / (n * m_2_sqrt_pi)) ** 0.2 -- | Bandwidth estimator for a Gaussian kernel. gaussianBW :: Double -> Bandwidth gaussianBW n = (4 / (n * 3)) ** 0.2 -- | The width of the convolution kernel used. type Bandwidth = Double -- | Compute the optimal bandwidth from the observed data for the -- given kernel. -- -- This function uses an estimate based on the standard deviation of a -- sample (due to Deheuvels), which performs reasonably well for -- unimodal distributions but leads to oversmoothing for more complex -- ones. bandwidth :: G.Vector v Double => (Double -> Bandwidth) -> v Double -> Bandwidth bandwidth kern values = stdDev values * kern (fromIntegral $ G.length values) -- | Choose a uniform range of points at which to estimate a sample's -- probability density function. -- -- If you are using a Gaussian kernel, multiply the sample's bandwidth -- by 3 before passing it to this function. -- -- If this function is passed an empty vector, it returns values of -- positive and negative infinity. choosePoints :: G.Vector v Double => Int -- ^ Number of points to select, /n/ -> Double -- ^ Sample bandwidth, /h/ -> v Double -- ^ Input data -> Points choosePoints n h sample = Points . U.map f $ U.enumFromTo 0 n' where lo = a - h hi = z + h (a, z) = minMax sample d = (hi - lo) / fromIntegral n' f i = lo + fromIntegral i * d n' = n - 1 -- | The convolution kernel. Its parameters are as follows: -- -- * Scaling factor, 1\//nh/ -- -- * Bandwidth, /h/ -- -- * A point at which to sample the input, /p/ -- -- * One sample value, /v/ type Kernel = Double -> Double -> Double -> Double -> Double -- | Epanechnikov kernel for probability density function estimation. epanechnikovKernel :: Kernel epanechnikovKernel f h p v | abs u <= 1 = f * (1 - u * u) | otherwise = 0 where u = (v - p) / (h * 0.75) -- | Gaussian kernel for probability density function estimation. gaussianKernel :: Kernel gaussianKernel f h p v = exp (-0.5 * u * u) * g where u = (v - p) / h g = f * 0.5 * m_2_sqrt_pi * m_1_sqrt_2 -- | Kernel density estimator, providing a non-parametric way of -- estimating the PDF of a random variable. 
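-- -- For example, an Epanechnikov estimate over automatically chosen points -- (a sketch; @xs@ stands for your sample vector): -- -- > let h = bandwidth epanechnikovBW xs -- > estimatePDF epanechnikovKernel h xs (choosePoints 64 h xs)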
estimatePDF :: G.Vector v Double => Kernel -- ^ Kernel function -> Bandwidth -- ^ Bandwidth, /h/ -> v Double -- ^ Sample data -> Points -- ^ Points at which to estimate -> U.Vector Double estimatePDF kernel h sample | n < 2 = errorShort "estimatePDF" | otherwise = U.map k . fromPoints where k p = sum . G.map (kernel f h p) $ sample f = 1 / (h * fromIntegral n) n = G.length sample {-# INLINE estimatePDF #-} -- | A helper for creating a simple kernel density estimation function -- with automatically chosen bandwidth and estimation points. simplePDF :: G.Vector v Double => (Double -> Double) -- ^ Bandwidth function -> Kernel -- ^ Kernel function -> Double -- ^ Bandwidth scaling factor (3 for a Gaussian kernel, 1 for all others) -> Int -- ^ Number of points at which to estimate -> v Double -- ^ sample data -> (Points, U.Vector Double) simplePDF fbw fpdf k numPoints sample = (points, estimatePDF fpdf bw sample points) where points = choosePoints numPoints (bw*k) sample bw = bandwidth fbw sample {-# INLINE simplePDF #-} -- | Simple Epanechnikov kernel density estimator. Returns the -- uniformly spaced points from the sample range at which the density -- function was estimated, and the estimates at those points. epanechnikovPDF :: G.Vector v Double => Int -- ^ Number of points at which to estimate -> v Double -- ^ Data sample -> (Points, U.Vector Double) epanechnikovPDF = simplePDF epanechnikovBW epanechnikovKernel 1 -- | Simple Gaussian kernel density estimator. Returns the uniformly -- spaced points from the sample range at which the density function -- was estimated, and the estimates at those points. gaussianPDF :: G.Vector v Double => Int -- ^ Number of points at which to estimate -> v Double -- ^ Data sample -> (Points, U.Vector Double) gaussianPDF = simplePDF gaussianBW gaussianKernel 3 errorShort :: String -> a errorShort func = error ("Statistics.KernelDensity." ++ func ++ ": at least two points required") -- $references -- -- * Deheuvels, P. (1977) Estimation non paramétrique de la densité -- par histogrammes -- généralisés. 
<http://archive.numdam.org/article/RSA_1977__25_3_5_0.pdf> statistics-0.13.3.0/tests/0000755000000000000000000000000012724050603013455 5ustar0000000000000000statistics-0.13.3.0/tests/tests.hs0000644000000000000000000000121212724050603015147 0ustar0000000000000000import Test.Framework (defaultMain) import qualified Tests.Distribution as Distribution import qualified Tests.Function as Function import qualified Tests.KDE as KDE import qualified Tests.Matrix as Matrix import qualified Tests.NonParametric as NonParametric import qualified Tests.Transform as Transform import qualified Tests.Correlation as Correlation main :: IO () main = defaultMain [ Distribution.tests , Function.tests , KDE.tests , Matrix.tests , NonParametric.tests , Transform.tests , Correlation.tests ] statistics-0.13.3.0/tests/utils/0000755000000000000000000000000012724050603014615 5ustar0000000000000000statistics-0.13.3.0/tests/utils/fftw.c0000644000000000000000000000200612724050603015725 0ustar0000000000000000/* Generate some test cases using fftw3 */ #include <stdio.h> #include <stdlib.h> #include <fftw3.h> void dump_vector(int n, double* vec) { for(int i = 0; i < n; i++) printf("%20.15f ", vec[i]); printf("\n"); } void dct(int flag, int n) { double* in = malloc( n * sizeof(double)); double* out = malloc( n * sizeof(double)); // fftw_plan plan = fftw_plan_r2r_1d(n, in, out, flag, FFTW_ESTIMATE); for( int k = 0; k < n; k++) { // Init input vector for( int i = 0; i < n; i++) in[i] = 0; in[k] = 1; // Perform DFT fftw_execute(plan); // Print results dump_vector(n, in ); dump_vector(n, out); printf("\n"); } // free(in); free(out); fftw_destroy_plan(plan); } int main(void) { printf("DCT II (the DCT)\n"); dct( FFTW_REDFT10, 2); dct( FFTW_REDFT10, 4); printf("DCT III (Inverse DCT)\n"); dct( FFTW_REDFT01, 2); dct( FFTW_REDFT01, 4); return 0; } statistics-0.13.3.0/tests/utils/Makefile0000644000000000000000000000017012724050603016253 0ustar0000000000000000CC = gcc CFLAGS = -W -Wall -O2 -std=c99 LDFLAGS = -lfftw3 .PHONY: all clean all : fftw clean : rm -rf fftw *.o statistics-0.13.3.0/tests/Tests/0000755000000000000000000000000012724050603014557 5ustar0000000000000000statistics-0.13.3.0/tests/Tests/KDE.hs0000644000000000000000000000213512724050603015517 0ustar0000000000000000-- | Tests for Kernel density estimates. module Tests.KDE ( tests ) where import Data.Vector.Unboxed ((!)) import Numeric.Sum (kbn, sumVector) import Statistics.Sample.KernelDensity import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck (Property, (==>), counterexample) import Text.Printf (printf) import qualified Data.Vector.Unboxed as U tests :: Test tests = testGroup "KDE" [ testProperty "integral(PDF) == 1" t_densityIsPDF ] t_densityIsPDF :: [Double] -> Property t_densityIsPDF vec = not (null vec) ==> test where (xs,ys) = kde 4096 (U.fromList vec) step = (xs ! 1) - (xs !
0) integral = integratePDF step ys -- test = counterexample (printf "Integral %f" integral) $ abs (1 - integral) <= 1e-3 integratePDF :: Double -> U.Vector Double -> Double integratePDF step vec = step * sumVector kbn (U.zipWith (*) vec weights) where n = U.length vec weights = U.generate n go where go i | i == 0 = 0.5 | i == n-1 = 0.5 | otherwise = 1 statistics-0.13.3.0/tests/Tests/Matrix.hs0000644000000000000000000000227712724050603016367 0ustar0000000000000000module Tests.Matrix (tests) where import Statistics.Matrix hiding (map) import Statistics.Matrix.Algorithms import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck import Tests.ApproxEq (ApproxEq(..)) import Tests.Matrix.Types import qualified Data.Vector.Unboxed as U t_row :: Mat Double -> Gen Property t_row ms@(Mat r _ xs) = do i <- choose (0,r-1) return $ row (fromMat ms) i === U.fromList (xs !! i) t_column :: Mat Double -> Gen Property t_column ms@(Mat _ c xs) = do i <- choose (0,c-1) return $ column (fromMat ms) i === U.fromList (map (!! i) xs) t_center :: Mat Double -> Property t_center ms@(Mat r c xs) = (xs !! (r `quot` 2)) !! (c `quot` 2) === center (fromMat ms) t_transpose :: Matrix -> Property t_transpose m = U.concat (map (column n) [0..rows m-1]) === toVector m where n = transpose m t_qr :: Matrix -> Property t_qr a = hasNaN p .||. eql 1e-10 a p where p = uncurry multiply (qr a) tests :: Test tests = testGroup "Matrix" [ testProperty "t_row" t_row , testProperty "t_column" t_column , testProperty "t_center" t_center , testProperty "t_transpose" t_transpose , testProperty "t_qr" t_qr ] statistics-0.13.3.0/tests/Tests/Correlation.hs0000644000000000000000000001261512724050603017401 0ustar0000000000000000{-#LANGUAGE BangPatterns #-} module Tests.Correlation ( tests ) where import Control.Arrow (Arrow(..)) import qualified Data.Vector as V import Statistics.Matrix hiding (map) import Statistics.Correlation import Statistics.Correlation.Kendall import Test.QuickCheck ((==>),Property,counterexample) import Test.Framework import Test.Framework.Providers.QuickCheck2 import Test.Framework.Providers.HUnit import Test.HUnit (Assertion, (@=?), assertBool) import Tests.ApproxEq ---------------------------------------------------------------- -- Tests list ---------------------------------------------------------------- tests :: Test tests = testGroup "Correlation" [ testProperty "Pearson correlation" testPearson , testProperty "Spearman correlation is scale invariant" testSpearmanScale , testProperty "Spearman correlation, nonlinear" testSpearmanNonlinear , testProperty "Kendall test -- general" testKendall , testCase "Kendall test -- special cases" testKendallSpecial ] ---------------------------------------------------------------- -- Pearson's correlation ---------------------------------------------------------------- testPearson :: [(Double,Double)] -> Property testPearson sample = (length sample > 1) ==> (exact ~= fast) where (~=) = eql 1e-12 exact = exactPearson $ map (realToFrac *** realToFrac) sample fast = pearson $ V.fromList sample exactPearson :: [(Rational,Rational)] -> Double exactPearson sample = realToFrac cov / sqrt (realToFrac (varX * varY)) where (xs,ys) = unzip sample n = fromIntegral $ length sample -- Mean muX = sum xs / n muY = sum ys / n -- Mean of squares muX2 = sum (map (\x->x*x) xs) / n muY2 = sum (map (\x->x*x) ys) / n -- Covariance cov = sum (zipWith (*) [x - muX | x<-xs] [y - muY | y<-ys]) / n varX = muX2 - muX*muX varY = muY2 - muY*muY 
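-- One more property that would fit here (a sketch, not part of the original -- suite): any sample with nonzero variance should have Pearson correlation 1 -- with itself, up to rounding. testPearsonSelf :: [Double] -> Property testPearsonSelf xs = (length xs > 1 && minimum xs < maximum xs) ==> eql 1e-12 (1::Double) (pearson $ V.fromList $ zip xs xs)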
---------------------------------------------------------------- -- Spearman's correlation ---------------------------------------------------------------- -- Test that Spearman correlation is scale invariant testSpearmanScale :: [(Double,Double)] -> Double -> Property testSpearmanScale xs a = and [ length xs > 1 -- Enough to calculate underflow , a /= 0 , not (isNaN c1) , not (isNaN c2) , not (isNaN c3) , not (isNaN c4) ] ==> ( counterexample (show xs2) $ counterexample (show xs3) $ counterexample (show xs4) $ counterexample (show (c1,c2,c3,c4)) $ and [ c1 == c4 , c1 == signum a * c2 , c1 == signum a * c3 ] ) where xs2 = map ((*a) *** id ) xs xs3 = map (id *** (*a)) xs xs4 = map ((*a) *** (*a)) xs c1 = spearman $ V.fromList xs c2 = spearman $ V.fromList xs2 c3 = spearman $ V.fromList xs3 c4 = spearman $ V.fromList xs4 -- Test that Spearman correlation allows to transform sample with testSpearmanNonlinear :: [(Double,Double)] -> Property testSpearmanNonlinear sample0 = and [ length sample0 > 1 , not (isNaN c1) , not (isNaN c2) , not (isNaN c3) , not (isNaN c4) ] ==> ( counterexample (show sample0) $ counterexample (show sample1) $ counterexample (show sample2) $ counterexample (show sample3) $ counterexample (show sample4) $ counterexample (show (c1,c2,c3,c4)) $ and [ c1 == c2 , c1 == c3 , c1 == c4 ] ) where -- We need to stretch sample into [-10 .. 10] range to avoid -- problems with under/overflows etc. stretch xs | a == b = xs | otherwise = [ (x - a - 10) * 20 / (a - b) | x <- xs ] where a = minimum xs b = maximum xs sample1 = uncurry zip $ (stretch *** stretch) $ unzip sample0 sample2 = map (exp *** id ) sample1 sample3 = map (id *** exp) sample1 sample4 = map (exp *** exp) sample1 c1 = spearman $ V.fromList sample1 c2 = spearman $ V.fromList sample2 c3 = spearman $ V.fromList sample3 c4 = spearman $ V.fromList sample4 ---------------------------------------------------------------- -- Kendall's correlation ---------------------------------------------------------------- testKendall :: [(Double, Double)] -> Bool testKendall xy | isNaN r1 = isNaN r2 | otherwise = r1 == r2 where r1 = kendallBruteForce xy r2 = kendall $ V.fromList xy testKendallSpecial :: Assertion testKendallSpecial = vs @=? map (\(xs, ys) -> kendall $ V.fromList $ zip xs ys) d where (d, vs) = unzip testData testData :: [(([Double], [Double]), Double)] testData = [ (([1, 2, 3, 1, 2], [1, 2, 1, 5, 2]), -0.375) , (([1, 1, 1, 3, 3], [3, 3, 3, 2, 5]), 0) ] kendallBruteForce :: [(Double, Double)] -> Double kendallBruteForce xy = (n_c - n_d) / sqrt ((n_0 - n_1) * (n_0 - n_2)) where allPairs = f xy (n_c, n_d, n_1, n_2) = foldl g (0,0,0,0) allPairs n_0 = fromIntegral.length $ allPairs g (!nc, !nd, !n1, !n2) ((x1, y1), (x2, y2)) | (x2 - x1) * (y2 - y1) > 0 = (nc+1, nd, n1, n2) | (x2 - x1) * (y2 - y1) < 0 = (nc, nd+1, n1, n2) | otherwise = if x1 == x2 then if y1 == y2 then (nc, nd, n1+1, n2+1) else (nc, nd, n1+1, n2) else (nc, nd, n1, n2+1) f (x:xs) = zip (repeat x) xs ++ f xs f _ = [] statistics-0.13.3.0/tests/Tests/Helpers.hs0000644000000000000000000000456012724050603016522 0ustar0000000000000000-- | Helpers for testing module Tests.Helpers ( -- * helpers T(..) 
, typeName -- * Generic QC tests , monotonicallyIncreases , monotonicallyIncreasesIEEE -- * HUnit helpers , testAssertion , testEquality -- * QC helpers , small , unsquare , shrinkFixedList ) where import Data.Typeable import Test.Framework import Test.Framework.Providers.HUnit import Test.QuickCheck import qualified Numeric.IEEE as IEEE import qualified Test.HUnit as HU -- | Phantom typed value used to select the right instance in QC tests data T a = T -- | String representation of type name typeName :: Typeable a => T a -> String typeName = show . typeOf . typeParam where typeParam :: T a -> a typeParam _ = undefined ---------------------------------------------------------------- -- Generic QC ---------------------------------------------------------------- -- Check that a function is nondecreasing monotonicallyIncreases :: (Ord a, Ord b) => (a -> b) -> a -> a -> Bool monotonicallyIncreases f x1 x2 = f (min x1 x2) <= f (max x1 x2) -- Check that a function is nondecreasing, taking rounding errors into -- account. -- -- In fact the function is allowed to decrease by less than one ulp in order -- to guard against problems with excess precision. On x86 the FPU works -- with 80-bit numbers but doubles are 64-bit, so rounding happens -- whenever values are moved from registers to memory monotonicallyIncreasesIEEE :: (Ord a, IEEE.IEEE b) => (a -> b) -> a -> a -> Bool monotonicallyIncreasesIEEE f x1 x2 = y1 <= y2 || (y1 - y2) < y2 * IEEE.epsilon where y1 = f (min x1 x2) y2 = f (max x1 x2) ---------------------------------------------------------------- -- HUnit helpers ---------------------------------------------------------------- testAssertion :: String -> Bool -> Test testAssertion str cont = testCase str $ HU.assertBool str cont testEquality :: (Show a, Eq a) => String -> a -> a -> Test testEquality msg a b = testCase msg $ HU.assertEqual msg a b unsquare :: (Arbitrary a, Show a, Testable b) => (a -> b) -> Property unsquare = forAll (small arbitrary) small :: Gen a -> Gen a small act = sized $ \n -> resize (smallish n) act where smallish = round . (sqrt :: Double -> Double) . fromIntegral . abs shrinkFixedList :: (a -> [a]) -> [a] -> [[a]] shrinkFixedList shr (x:xs) = map (:xs) (shr x) ++ map (x:) (shrinkFixedList shr xs) shrinkFixedList _ [] = [] statistics-0.13.3.0/tests/Tests/NonParametric.hs0000644000000000000000000002761412724050603017667 0ustar0000000000000000-- Tests for Statistics.Test.NonParametric module Tests.NonParametric (tests) where import Statistics.Distribution.Normal (standard) import Statistics.Test.KolmogorovSmirnov import Statistics.Test.MannWhitneyU import Statistics.Test.KruskalWallis import Statistics.Test.WilcoxonT import Test.Framework (Test, testGroup) import Test.Framework.Providers.HUnit import Test.HUnit (assertEqual) import Tests.ApproxEq (eq) import Tests.Helpers (testAssertion, testEquality) import Tests.NonParametric.Table (tableKSD, tableKS2D) import qualified Data.Vector.Unboxed as U tests :: Test tests = testGroup "Nonparametric tests" $ concat [ mannWhitneyTests , wilcoxonSumTests , wilcoxonPairTests , kruskalWallisRankTests , kruskalWallisTests , kolmogorovSmirnovDTest ] ---------------------------------------------------------------- mannWhitneyTests :: [Test] mannWhitneyTests = zipWith test [(0::Int)..]
testData ++ [ testEquality "Mann-Whitney U Critical Values, m=1" (replicate (20*3) Nothing) [mannWhitneyUCriticalValue (1,x) p | x <- [1..20], p <- [0.005,0.01,0.025]] , testEquality "Mann-Whitney U Critical Values, m=2, p=0.025" (replicate 7 Nothing ++ map Just [0,0,0,0,1,1,1,1,1,2,2,2,2]) [mannWhitneyUCriticalValue (2,x) 0.025 | x <- [1..20]] , testEquality "Mann-Whitney U Critical Values, m=6, p=0.05" (replicate 1 Nothing ++ map Just [0, 2,3,5,7,8,10,12,14,16,17,19,21,23,25,26,28,30,32]) [mannWhitneyUCriticalValue (6,x) 0.05 | x <- [1..20]] , testEquality "Mann-Whitney U Critical Values, m=20, p=0.025" (replicate 1 Nothing ++ map Just [2,8,14,20,27,34,41,48,55,62,69,76,83,90,98,105,112,119,127]) [mannWhitneyUCriticalValue (20,x) 0.025 | x <- [1..20]] ] where test n (a, b, c, d) = testCase "Mann-Whitney" $ do assertEqual ("Mann-Whitney U " ++ show n) c us assertEqual ("Mann-Whitney U Sig " ++ show n) d ss where us = mannWhitneyU (U.fromList a) (U.fromList b) ss = mannWhitneyUSignificant TwoTailed (length a, length b) 0.05 us -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double), Maybe TestResult)] testData = [ ( [3,4,2,6,2,5] , [9,7,5,10,6,8] , (2, 34) , Just Significant ) , ( [540,480,600,590,605] , [760,890,1105,595,940] , (2, 23) , Just Significant ) , ( [19,22,16,29,24] , [20,11,17,12] , (17, 3) , Just NotSignificant ) , ( [126,148,85,61, 179,93, 45,189,85,93] , [194,128,69,135,171,149,89,248,79,137] , (35,65) , Just NotSignificant ) , ( [1..30] , [1..30] , (450,450) , Just NotSignificant ) , ( [1 .. 30] , [11.5 .. 40 ] , (190.0,710.0) , Just Significant ) ] wilcoxonSumTests :: [Test] wilcoxonSumTests = zipWith test [(0::Int)..] testData where test n (a, b, c) = testCase "Wilcoxon Sum" $ assertEqual ("Wilcoxon Sum " ++ show n) c (wilcoxonRankSums (U.fromList a) (U.fromList b)) -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double))] testData = [ ( [8.50,9.48,8.65,8.16,8.83,7.76,8.63] , [8.27,8.20,8.25,8.14,9.00,8.10,7.20,8.32,7.70] , (75, 61) ) , ( [0.45,0.50,0.61,0.63,0.75,0.85,0.93] , [0.44,0.45,0.52,0.53,0.56,0.58,0.58,0.65,0.79] , (71.5, 64.5) ) ] wilcoxonPairTests :: [Test] wilcoxonPairTests = zipWith test [(0::Int)..] 
testData ++ -- Taken from the Mitic paper: [ testAssertion "Sig 16, 35" (to4dp 0.0467 $ wilcoxonMatchedPairSignificance 16 35) , testAssertion "Sig 16, 36" (to4dp 0.0523 $ wilcoxonMatchedPairSignificance 16 36) , testEquality "Wilcoxon critical values, p=0.05" (replicate 4 Nothing ++ map Just [0,2,3,5,8,10,13,17,21,25,30,35,41,47,53,60,67,75,83,91,100,110,119]) [wilcoxonMatchedPairCriticalValue x 0.05 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.025" (replicate 5 Nothing ++ map Just [0,2,3,5,8,10,13,17,21,25,29,34,40,46,52,58,65,73,81,89,98,107]) [wilcoxonMatchedPairCriticalValue x 0.025 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.01" (replicate 6 Nothing ++ map Just [0,1,3,5,7,9,12,15,19,23,27,32,37,43,49,55,62,69,76,84,92]) [wilcoxonMatchedPairCriticalValue x 0.01 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.005" (replicate 7 Nothing ++ map Just [0,1,3,5,7,9,12,15,19,23,27,32,37,42,48,54,61,68,75,83]) [wilcoxonMatchedPairCriticalValue x 0.005 | x <- [1..27]] ] where test n (a, b, c) = testEquality ("Wilcoxon Paired " ++ show n) c res where res = (wilcoxonMatchedPairSignedRank (U.fromList a) (U.fromList b)) -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double))] testData = [ ([1..10], [1..10], (0, 0 )) , ([1..5], [6..10], (0, 5*(-3))) -- Worked example from the Internet: , ( [125,115,130,140,140,115,140,125,140,135] , [110,122,125,120,140,124,123,137,135,145] , ( sum $ filter (> 0) [7,-3,1.5,9,0,-4,8,-6,1.5,-5] , sum $ filter (< 0) [7,-3,1.5,9,0,-4,8,-6,1.5,-5] ) ) -- Worked examples from books/papers: , ( [2.4,1.9,2.3,1.9,2.4,2.5] , [2.0,2.1,2.0,2.0,1.8,2.0] , (18, -3) ) , ( [130,170,125,170,130,130,145,160] , [120,163,120,135,143,136,144,120] , (27, -9) ) , ( [540,580,600,680,430,740,600,690,605,520] , [760,710,1105,880,500,990,1050,640,595,520] , (3, -42) ) ] to4dp tgt x = x >= tgt - 0.00005 && x < tgt + 0.00005 ---------------------------------------------------------------- kruskalWallisRankTests :: [Test] kruskalWallisRankTests = zipWith test [(0::Int)..] testData where test n (a, b) = testCase "Kruskal-Wallis Ranking" $ assertEqual ("Kruskal-Wallis " ++ show n) (map U.fromList b) (kruskalWallisRank $ map U.fromList a) testData = [ ( [ [68,93,123,83,108,122] , [119,116,101,103,113,84] , [70,68,54,73,81,68] , [61,54,59,67,59,70] ] , [ [8.0,14.0,16.0,19.0,23.0,24.0] , [15.0,17.0,18.0,20.0,21.0,22.0] , [1.5,8.0,8.0,10.5,12.0,13.0] , [1.5,3.5,3.5,5.0,6.0,10.5] ] ) ] kruskalWallisTests :: [Test] kruskalWallisTests = zipWith test [(0::Int)..] testData where test n (a, b, c) = testCase "Kruskal-Wallis" $ do assertEqual ("Kruskal-Wallis " ++ show n) (round100 b) (round100 kw) assertEqual ("Kruskal-Wallis Sig " ++ show n) c kwt where kw = kruskalWallis $ map U.fromList a kwt = kruskalWallisTest 0.05 $ map U.fromList a round100 :: Double -> Integer round100 = round . 
(*100) testData = [ ( [ [68,93,123,83,108,122] , [119,116,101,103,113,84] , [70,68,54,73,81,68] , [61,54,59,67,59,70] ] , 16.03 , Just Significant ) , ( [ [5,5,3,5,5,5,5] , [5,5,5,5,7,5,5] , [5,5,6,5,5,5,5] , [4,5,5,5,6,5,5] ] , 2.24 , Just NotSignificant ) , ( [ [36,48,5,67,53] , [49,33,60,2,55] , [71,31,140,59,42] ] , 1.22 , Just NotSignificant ) , ( [ [6,38,3,17,11,30,15,16,25,5] , [34,28,42,13,40,31,9,32,39,27] , [13,35,19,4,29,0,7,33,18,24] ] , 6.10 , Just Significant ) ] ---------------------------------------------------------------- -- K-S test ---------------------------------------------------------------- kolmogorovSmirnovDTest :: [Test] kolmogorovSmirnovDTest = [ testAssertion "K-S D statistics" $ and [ eq 1e-6 (kolmogorovSmirnovD standard (toU sample)) reference | (reference,sample) <- tableKSD ] , testAssertion "K-S 2-sample statistics" $ and [ eq 1e-6 (kolmogorovSmirnov2D (toU xs) (toU ys)) reference | (reference,xs,ys) <- tableKS2D ] , testAssertion "K-S probability" $ and [ eq 1e-14 (kolmogorovSmirnovProbability n d) p | (d,n,p) <- testData ] ] where toU = U.fromList -- Test data for the calculation of cumulative probability -- P(D[n] < d). -- -- Test data is: -- (D[n], n, p) -- Table is generated using sample program from paper testData :: [(Double,Int,Double)] testData = [ (0.09 , 3, 0 ) , (0.2 , 3, 0.00177777777777778 ) , (0.301 , 3, 0.116357025777778 ) , (0.392 , 3, 0.383127210666667 ) , (0.5003 , 3, 0.667366306558667 ) , (0.604 , 3, 0.861569877333333 ) , (0.699 , 3, 0.945458198 ) , (0.802 , 3, 0.984475216 ) , (0.9 , 3, 0.998 ) , (0.09 , 5, 0 ) , (0.2 , 5, 0.0384 ) , (0.301 , 5, 0.33993786080016 ) , (0.392 , 5, 0.66931908083712 ) , (0.5003 , 5, 0.888397260183794 ) , (0.604 , 5, 0.971609957879808 ) , (0.699 , 5, 0.994331075994008 ) , (0.802 , 5, 0.999391366368064 ) , (0.9 , 5, 0.99998 ) , (0.09 , 8, 3.37615237575e-06 ) , (0.2 , 8, 0.151622071801758 ) , (0.301 , 8, 0.613891042670582 ) , (0.392 , 8, 0.871491561427005 ) , (0.5003 , 8, 0.977534089199071 ) , (0.604 , 8, 0.997473116268255 ) , (0.699 , 8, 0.999806082005123 ) , (0.802 , 8, 0.999995133786947 ) , (0.9 , 8, 0.99999998 ) , (0.09 , 10, 3.89639433093119e-05) , (0.2 , 10, 0.25128096 ) , (0.301 , 10, 0.732913126355935 ) , (0.392 , 10, 0.932185254518767 ) , (0.5003 , 10, 0.992276179340446 ) , (0.604 , 10, 0.999495533516769 ) , (0.699 , 10, 0.999979691783985 ) , (0.802 , 10, 0.999999801409237 ) , (0.09 , 20, 0.00794502217168886 ) , (0.2 , 20, 0.647279826376584 ) , (0.301 , 20, 0.958017466965765 ) , (0.392 , 20, 0.997206424843499 ) , (0.5003 , 20, 0.999962641414228 ) , (0.09 , 30, 0.0498147538075168 ) , (0.2 , 30, 0.842030838984526 ) , (0.301 , 30, 0.993403560017612 ) , (0.392 , 30, 0.99988478803318 ) , (0.09 , 100, 0.629367974413669 ) ] statistics-0.13.3.0/tests/Tests/Transform.hs0000644000000000000000000001315612724050603017074 0ustar0000000000000000{-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE ViewPatterns #-} module Tests.Transform ( tests ) where import Data.Bits ((.&.), shiftL) import Data.Complex (Complex((:+))) import Data.Functor ((<$>)) import Numeric.Sum (kbn, sumVector) import Statistics.Function (within) import Statistics.Transform (CD, dct, fft, idct, ifft) import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck (Positive(..), Arbitrary(..), Gen, choose, vectorOf, counterexample) import Test.QuickCheck.Property (Property(..)) import Tests.Helpers (testAssertion) import Text.Printf (printf) import qualified 
Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U tests :: Test tests = testGroup "fft" [ testProperty "t_impulse" t_impulse , testProperty "t_impulse_offset" t_impulse_offset , testProperty "ifft . fft = id" (t_fftInverse $ ifft . fft) , testProperty "fft . ifft = id" (t_fftInverse $ fft . ifft) , testProperty "idct . dct = id [up to scale]" (t_fftInverse (\v -> U.map (/ (2 * fromIntegral (U.length v))) $ idct $ dct v)) , testProperty "dct . idct = id [up to scale]" (t_fftInverse (\v -> U.map (/ (2 * fromIntegral (U.length v))) $ dct $ idct v)) -- Exact small size DCT -- 1 , testDCT [1] $ [2] -- 2 , testDCT [1,0] $ map (*2) [1, cos (pi/4) ] , testDCT [0,1] $ map (*2) [1, cos (3*pi/4) ] -- 4 , testDCT [1,0,0,0] $ map (*2) [1, cos( pi/8), cos( 2*pi/8), cos( 3*pi/8)] , testDCT [0,1,0,0] $ map (*2) [1, cos(3*pi/8), cos( 6*pi/8), cos( 9*pi/8)] , testDCT [0,0,1,0] $ map (*2) [1, cos(5*pi/8), cos(10*pi/8), cos(15*pi/8)] , testDCT [0,0,0,1] $ map (*2) [1, cos(7*pi/8), cos(14*pi/8), cos(21*pi/8)] -- Exact small size IDCT -- 1 , testIDCT [1] [1] -- 2 , testIDCT [1,0] [1, 1 ] , testIDCT [0,1] $ map (*2) [cos(pi/4), cos(3*pi/4)] -- 4 , testIDCT [1,0,0,0] [1, 1, 1, 1 ] , testIDCT [0,1,0,0] $ map (*2) [cos( pi/8), cos( 3*pi/8), cos( 5*pi/8), cos( 7*pi/8) ] , testIDCT [0,0,1,0] $ map (*2) [cos( 2*pi/8), cos( 6*pi/8), cos(10*pi/8), cos(14*pi/8) ] , testIDCT [0,0,0,1] $ map (*2) [cos( 3*pi/8), cos( 9*pi/8), cos(15*pi/8), cos(21*pi/8) ] ] -- A single real-valued impulse at the beginning of an otherwise zero -- vector should be replicated in every real component of the result, -- and all the imaginary components should be zero. t_impulse :: Double -> Positive Int -> Bool t_impulse k (Positive m) = U.all (c_near i) (fft v) where v = i `G.cons` G.replicate (n-1) 0 i = k :+ 0 n = 1 `shiftL` (m .&. 6) -- If a real-valued impulse is offset from the beginning of an -- otherwise zero vector, the sum-of-squares of each component of the -- result should equal the square of the impulse. t_impulse_offset :: Double -> Positive Int -> Positive Int -> Bool t_impulse_offset k (Positive x) (Positive m) = U.all ok (fft v) where v = G.concat [G.replicate xn 0, G.singleton i, G.replicate (n-xn-1) 0] ok (re :+ im) = within ulps (re*re + im*im) (k*k) i = k :+ 0 xn = x `rem` n n = 1 `shiftL` (m .&. 6) -- Test that (ifft . fft ≈ id) -- -- Approximate equality here is tricky. Smaller values in the vector tend -- to have large relative error, so we should test that the vectors are -- approximately equal as a whole.
t_fftInverse :: (HasNorm (U.Vector a), U.Unbox a, Num a, Show a, Arbitrary a)
             => (U.Vector a -> U.Vector a) -> Property
t_fftInverse roundtrip = MkProperty $ do
  x <- genFftVector
  let n  = G.length x
      x' = roundtrip x
      d  = G.zipWith (-) x x'
      nd = vectorNorm d
      nx = vectorNorm x
  unProperty $ counterexample "Original vector"
             $ counterexample (show x )
             $ counterexample "Transformed one"
             $ counterexample (show x')
             $ counterexample (printf "Length = %i" n)
             $ counterexample (printf "|x - x'| / |x| = %.6g" (nd / nx))
             $ nd <= 3e-14 * nx

-- Test discrete cosine transform
testDCT :: [Double] -> [Double] -> Test
testDCT (U.fromList -> vec) (U.fromList -> res)
  = testAssertion ("DCT test for " ++ show vec)
  $ vecEqual 3e-14 (dct vec) res

-- Test inverse discrete cosine transform
testIDCT :: [Double] -> [Double] -> Test
testIDCT (U.fromList -> vec) (U.fromList -> res)
  = testAssertion ("IDCT test for " ++ show vec)
  $ vecEqual 3e-14 (idct vec) res

----------------------------------------------------------------

-- With an error tolerance of 8 ULPs, a million QuickCheck tests are
-- likely to all succeed.  With a tolerance of 7, we fail around the
-- half-million mark.
ulps :: Int
ulps = 8

c_near :: CD -> CD -> Bool
c_near (a :+ b) (c :+ d) = within ulps a c && within ulps b d

-- Arbitrary vector for FFT or DCT tests
genFftVector :: (U.Unbox a, Arbitrary a) => Gen (U.Vector a)
genFftVector = do
  n <- (2^) <$> choose (1,9::Int)      -- Size of vector
  G.fromList <$> vectorOf n arbitrary  -- Vector to transform

-- Ad-hoc type class for calculating vector norms
class HasNorm a where
  vectorNorm :: a -> Double

instance HasNorm (U.Vector Double) where
  vectorNorm = sqrt . sumVector kbn . U.map (\x -> x*x)

instance HasNorm (U.Vector CD) where
  vectorNorm = sqrt . sumVector kbn . U.map (\(x :+ y) -> x*x + y*y)

-- Approximate equality for vectors
vecEqual :: Double -> U.Vector Double -> U.Vector Double -> Bool
vecEqual ε v u = vectorNorm (U.zipWith (-) v u) < ε * vectorNorm v

statistics-0.13.3.0/tests/Tests/Function.hs0000644000000000000000000000141512724050603016701 0ustar0000000000000000module Tests.Function ( tests ) where

import Statistics.Function
import Test.Framework
import Test.Framework.Providers.QuickCheck2
import Test.QuickCheck
import Tests.Helpers

import qualified Data.Vector.Unboxed as U

tests :: Test
tests = testGroup "S.Function"
  [ testProperty  "Sort is sort"                p_sort
  , testAssertion "nextHighestPowerOfTwo is OK" p_nextHighestPowerOfTwo
  ]

p_sort :: [Double] -> Property
p_sort xs = not (null xs) ==> U.all (uncurry (<=)) (U.zip v $ U.tail v)
  where v = sort $ U.fromList xs

p_nextHighestPowerOfTwo :: Bool
p_nextHighestPowerOfTwo
  = all (\(good, is) -> all ((==good) . nextHighestPowerOfTwo) is) lists
  where
    pows  = [1 .. 17 :: Int]
    lists = [ (2^m, [2^n+1 .. 2^m]) | (n,m) <- pows `zip` tail pows ]
statistics-0.13.3.0/tests/Tests/Distribution.hs0000644000000000000000000003670412724050603017604 0ustar0000000000000000{-# OPTIONS_GHC -fno-warn-orphans #-}
{-# LANGUAGE FlexibleInstances, OverlappingInstances, ScopedTypeVariables,
             ViewPatterns #-}
module Tests.Distribution (tests) where

import Control.Applicative ((<$), (<$>), (<*>))
import Data.Binary (Binary, decode, encode)
import Data.List (find)
import Data.Typeable (Typeable)
import Statistics.Distribution
import Statistics.Distribution.Beta (BetaDistribution, betaDistr)
import Statistics.Distribution.Binomial (BinomialDistribution, binomial)
import Statistics.Distribution.CauchyLorentz
import Statistics.Distribution.ChiSquared (ChiSquared, chiSquared)
import Statistics.Distribution.Exponential (ExponentialDistribution, exponential)
import Statistics.Distribution.FDistribution (FDistribution, fDistribution)
import Statistics.Distribution.Gamma (GammaDistribution, gammaDistr)
import Statistics.Distribution.Geometric
import Statistics.Distribution.Hypergeometric
import Statistics.Distribution.Laplace (LaplaceDistribution, laplace)
import Statistics.Distribution.Normal (NormalDistribution, normalDistr)
import Statistics.Distribution.Poisson (PoissonDistribution, poisson)
import Statistics.Distribution.StudentT
import Statistics.Distribution.Transform (LinearTransform, linTransDistr)
import Statistics.Distribution.Uniform (UniformDistribution, uniformDistr)
import Test.Framework (Test, testGroup)
import Test.Framework.Providers.QuickCheck2 (testProperty)
import Test.QuickCheck as QC
import Test.QuickCheck.Monadic as QC
import Tests.ApproxEq (ApproxEq(..))
import Tests.Helpers (T(..), testAssertion, typeName)
import Tests.Helpers (monotonicallyIncreasesIEEE)
import Text.Printf (printf)
import qualified Control.Exception as E
import qualified Numeric.IEEE as IEEE

-- | Tests for all distributions
tests :: Test
tests = testGroup "Tests for all distributions"
  [ contDistrTests (T :: T BetaDistribution        )
  , contDistrTests (T :: T CauchyDistribution      )
  , contDistrTests (T :: T ChiSquared              )
  , contDistrTests (T :: T ExponentialDistribution )
  , contDistrTests (T :: T GammaDistribution       )
  , contDistrTests (T :: T LaplaceDistribution     )
  , contDistrTests (T :: T NormalDistribution      )
  , contDistrTests (T :: T UniformDistribution     )
  , contDistrTests (T :: T StudentT                )
  , contDistrTests (T :: T (LinearTransform StudentT) )
  , contDistrTests (T :: T FDistribution           )
  , discreteDistrTests (T :: T BinomialDistribution       )
  , discreteDistrTests (T :: T GeometricDistribution      )
  , discreteDistrTests (T :: T GeometricDistribution0     )
  , discreteDistrTests (T :: T HypergeometricDistribution )
  , discreteDistrTests (T :: T PoissonDistribution        )
  , unitTests
  ]

----------------------------------------------------------------
-- Tests
----------------------------------------------------------------

-- Tests for continuous distributions
contDistrTests :: (Param d, ContDistr d, QC.Arbitrary d, Typeable d, Show d, Binary d, Eq d)
               => T d -> Test
contDistrTests t = testGroup ("Tests for: " ++ typeName t) $
  cdfTests t ++
  [ testProperty "PDF sanity"              $ pdfSanityCheck     t
  , testProperty "Quantile is CDF inverse" $ quantileIsInvCDF   t
  , testProperty "quantile fails p<0||p>1" $ quantileShouldFail t
  , testProperty "log density check"       $ logDensityCheck    t
  ]

-- Tests for discrete distributions
discreteDistrTests :: (Param d, DiscreteDistr d, QC.Arbitrary d, Typeable d, Show d, Binary d, Eq d)
                   => T d -> Test
discreteDistrTests t = testGroup ("Tests for: " ++ typeName t) $
  cdfTests t ++
  [ testProperty "Prob. sanity"          $ probSanityCheck      t
  , testProperty "CDF is sum of prob."   $ discreteCDFcorrect   t
  , testProperty "Discrete CDF is OK"    $ cdfDiscreteIsCorrect t
  , testProperty "log probability check" $ logProbabilityCheck  t
  ]

-- Tests for distributions that have a CDF
cdfTests :: (Param d, Distribution d, QC.Arbitrary d, Show d, Binary d, Eq d)
         => T d -> [Test]
cdfTests t =
  [ testProperty "C.D.F. sanity"        $ cdfSanityCheck         t
  , testProperty "CDF limit at +inf"    $ cdfLimitAtPosInfinity  t
  , testProperty "CDF limit at -inf"    $ cdfLimitAtNegInfinity  t
  , testProperty "CDF at +inf = 1"      $ cdfAtPosInfinity       t
  , testProperty "CDF at -inf = 0"      $ cdfAtNegInfinity       t
  , testProperty "CDF is nondecreasing" $ cdfIsNondecreasing     t
  , testProperty "1-CDF is correct"     $ cdfComplementIsCorrect t
  , testProperty "Binary OK"            $ p_binary               t
  ]

----------------------------------------------------------------

-- CDF is in the [0,1] range
cdfSanityCheck :: (Distribution d) => T d -> d -> Double -> Bool
cdfSanityCheck _ d x = c >= 0 && c <= 1
  where c = cumulative d x

-- CDF never decreases
cdfIsNondecreasing :: (Distribution d) => T d -> d -> Double -> Double -> Bool
cdfIsNondecreasing _ d = monotonicallyIncreasesIEEE $ cumulative d

-- cumulative d +∞ = 1
cdfAtPosInfinity :: (Param d, Distribution d) => T d -> d -> Bool
cdfAtPosInfinity _ d = cumulative d (1/0) == 1

-- cumulative d -∞ = 0
cdfAtNegInfinity :: (Param d, Distribution d) => T d -> d -> Bool
cdfAtNegInfinity _ d = cumulative d (-1/0) == 0

-- CDF limit at +∞ is 1
cdfLimitAtPosInfinity :: (Param d, Distribution d) => T d -> d -> Property
cdfLimitAtPosInfinity _ d = okForInfLimit d ==>
  counterexample ("Last elements: " ++ show (drop 990 probs)) $
  Just 1.0 == (find (>=1) probs)
  where
    probs = take 1000 $ map (cumulative d) $ iterate (*1.4) 1000

-- CDF limit at -∞ is 0
cdfLimitAtNegInfinity :: (Param d, Distribution d) => T d -> d -> Property
cdfLimitAtNegInfinity _ d = okForInfLimit d ==>
  counterexample ("Last elements: " ++ show (drop 990 probs)) $
  case find (< IEEE.epsilon) probs of
    Nothing -> False
    Just p  -> p >= 0
  where
    probs = take 1000 $ map (cumulative d) $ iterate (*1.4) (-1)

-- CDF's complement is implemented correctly
cdfComplementIsCorrect :: (Distribution d) => T d -> d -> Double -> Bool
cdfComplementIsCorrect _ d x = (eq 1e-14) 1 (cumulative d x + complCumulative d x)

-- CDF for a discrete distribution uses <= for comparison
cdfDiscreteIsCorrect :: (DiscreteDistr d) => T d -> d -> Property
cdfDiscreteIsCorrect _ d = counterexample (unlines badN) $ null badN
  where
    -- We are checking that:
    --
    -- > CDF(i) - CDF(i-e) = P(i)
    --
    -- Approximate equality is tricky here.  The scale is set by the
    -- maximum of the CDF value and the probability.  The case when all
    -- probabilities are zero must be treated specially.
    badN = [ printf "N=%3i p[i]=%g\tp[i+1]=%g\tdP=%g\trelerr=%g" i p p1 dp relerr
           | i <- [0 .. 100]
           , let p      = cumulative d $ fromIntegral i - 1e-6
                 p1     = cumulative d $ fromIntegral i
                 dp     = probability d i
                 relerr = ((p1 - p) - dp) / max p1 dp
           , not (p == 0 && p1 == 0 && dp == 0) && relerr > 1e-14
           ]

logDensityCheck :: (ContDistr d) => T d -> d -> Double -> Property
logDensityCheck _ d x
  = counterexample (printf "density    = %g" p)
  $ counterexample (printf "logDensity = %g" logP)
  $ counterexample (printf "log p      = %g" (log p))
  $ counterexample (printf "eps        = %g" (abs (logP - log p) / max (abs (log p)) (abs logP)))
  $ or [ p == 0     && logP == (-1/0)
       , p < 1e-308 && logP < 609
       , eq 1e-14 (log p) logP
       ]
  where
    p    = density d x
    logP = logDensity d x

-- PDF is nonnegative
pdfSanityCheck :: (ContDistr d) => T d -> d -> Double -> Bool
pdfSanityCheck _ d x = p >= 0
  where p = density d x

-- Quantile is the inverse of the CDF
quantileIsInvCDF :: (Param d, ContDistr d) => T d -> d -> Double -> Property
quantileIsInvCDF _ d (snd . properFraction -> p) =
  p > 0 && p < 1 ==>
    ( counterexample (printf "Quantile     = %g" q )
    $ counterexample (printf "Probability  = %g" p )
    $ counterexample (printf "Probability' = %g" p')
    $ counterexample (printf "Error        = %e" (abs $ p - p'))
    $ abs (p - p') < invQuantilePrec d
    )
  where
    q  = quantile d p
    p' = cumulative d q

-- Test that quantile fails if p < 0 or p > 1
quantileShouldFail :: (ContDistr d) => T d -> d -> Double -> Property
quantileShouldFail _ d p =
  p < 0 || p > 1 ==> QC.monadicIO $ do
    r <- QC.run $ E.catch (False <$ (return $! quantile d p))
                          (\(_ :: E.SomeException) -> return True)
    QC.assert r

-- Probability is in the [0,1] range
probSanityCheck :: (DiscreteDistr d) => T d -> d -> Int -> Bool
probSanityCheck _ d x = p >= 0 && p <= 1
  where p = probability d x

-- Check that the discrete CDF is correct
discreteCDFcorrect :: (DiscreteDistr d) => T d -> d -> Int -> Int -> Property
discreteCDFcorrect _ d a b
  = counterexample (printf "CDF   = %g" p1)
  $ counterexample (printf "Sum   = %g" p2)
  $ counterexample (printf "Delta = %g" (abs (p1 - p2)))
  $ abs (p1 - p2) < 3e-10
  -- Avoid too-large differences; otherwise there is too much to sum.
  --
  -- The absolute difference is used to guard against precision loss
  -- when close values of the CDF are subtracted.
  where
    n  = min a b
    m  = n + (abs (a - b) `mod` 100)
    p1 = cumulative d (fromIntegral m + 0.5) - cumulative d (fromIntegral n - 0.5)
    p2 = sum $ map (probability d) [n .. m]

logProbabilityCheck :: (DiscreteDistr d) => T d -> d -> Int -> Property
logProbabilityCheck _ d x
  = counterexample (printf "probability    = %g" p)
  $ counterexample (printf "logProbability = %g" logP)
  $ counterexample (printf "log p          = %g" (log p))
  $ counterexample (printf "eps            = %g" (abs (logP - log p) / max (abs (log p)) (abs logP)))
  $ or [ p == 0     && logP == (-1/0)
       , p < 1e-308 && logP < 609
       , eq 1e-14 (log p) logP
       ]
  where
    p    = probability d x
    logP = logProbability d x

p_binary :: (Eq a, Show a, Binary a) => T a -> a -> Bool
p_binary _ a = a == (decode . encode) a

----------------------------------------------------------------
-- Arbitrary instances for distributions
----------------------------------------------------------------

instance QC.Arbitrary BinomialDistribution where
  arbitrary = binomial <$> QC.choose (1,100) <*> QC.choose (0,1)

instance QC.Arbitrary ExponentialDistribution where
  arbitrary = exponential <$> QC.choose (0,100)

instance QC.Arbitrary LaplaceDistribution where
  arbitrary = laplace <$> QC.choose (-10,10) <*> QC.choose (0, 2)

instance QC.Arbitrary GammaDistribution where
  arbitrary = gammaDistr <$> QC.choose (0.1,10) <*> QC.choose (0.1,10)

instance QC.Arbitrary BetaDistribution where
  arbitrary = betaDistr <$> QC.choose (1e-3,10) <*> QC.choose (1e-3,10)

instance QC.Arbitrary GeometricDistribution where
  arbitrary = geometric <$> QC.choose (0,1)

instance QC.Arbitrary GeometricDistribution0 where
  arbitrary = geometric0 <$> QC.choose (0,1)

instance QC.Arbitrary HypergeometricDistribution where
  arbitrary = do
    l <- QC.choose (1,20)
    m <- QC.choose (0,l)
    k <- QC.choose (1,l)
    return $ hypergeometric m l k

instance QC.Arbitrary NormalDistribution where
  arbitrary = normalDistr <$> QC.choose (-100,100) <*> QC.choose (1e-3, 1e3)

instance QC.Arbitrary PoissonDistribution where
  arbitrary = poisson <$> QC.choose (0,1)

instance QC.Arbitrary ChiSquared where
  arbitrary = chiSquared <$> QC.choose (1,100)

instance QC.Arbitrary UniformDistribution where
  arbitrary = do
    a <- QC.arbitrary
    b <- QC.arbitrary `suchThat` (/= a)
    return $ uniformDistr a b

instance QC.Arbitrary CauchyDistribution where
  arbitrary = cauchyDistribution
                <$> arbitrary
                <*> ((abs <$> arbitrary) `suchThat` (> 0))

instance QC.Arbitrary StudentT where
  arbitrary = studentT <$> ((abs <$> arbitrary) `suchThat` (>0))

instance QC.Arbitrary (LinearTransform StudentT) where
  arbitrary = studentTUnstandardized
                <$> ((abs <$> arbitrary) `suchThat` (>0))
                <*> ((abs <$> arbitrary))
                <*> ((abs <$> arbitrary) `suchThat` (>0))

instance QC.Arbitrary FDistribution where
  arbitrary = fDistribution
                <$> ((abs <$> arbitrary) `suchThat` (>0))
                <*> ((abs <$> arbitrary) `suchThat` (>0))

-- Parameters for distribution testing.  Some distributions require
-- slightly relaxed test parameters.
class Param a where
  -- Precision for quantileIsInvCDF
  invQuantilePrec :: a -> Double
  invQuantilePrec _ = 1e-14
  -- Whether the distribution is OK for testing limits
  okForInfLimit :: a -> Bool
  okForInfLimit _ = True

instance Param a

instance Param StudentT where
  invQuantilePrec _ = 1e-13
  okForInfLimit   d = studentTndf d > 0.75

instance Param (LinearTransform StudentT) where
  invQuantilePrec _ = 1e-13
  okForInfLimit   d = (studentTndf .
linTransDistr) d > 0.75 instance Param FDistribution where invQuantilePrec _ = 1e-12 ---------------------------------------------------------------- -- Unit tests ---------------------------------------------------------------- unitTests :: Test unitTests = testGroup "Unit tests" [ testAssertion "density (gammaDistr 150 1/150) 1 == 4.883311" $ 4.883311418525483 =~ (density (gammaDistr 150 (1/150)) 1) -- Student-T , testStudentPDF 0.3 1.34 0.0648215 -- PDF , testStudentPDF 1 0.42 0.27058 , testStudentPDF 4.4 0.33 0.352994 , testStudentCDF 0.3 3.34 0.757146 -- CDF , testStudentCDF 1 0.42 0.626569 , testStudentCDF 4.4 0.33 0.621739 -- Student-T General , testStudentUnstandardizedPDF 0.3 1.2 4 0.45 0.0533456 -- PDF , testStudentUnstandardizedPDF 4.3 (-2.4) 3.22 (-0.6) 0.0971141 , testStudentUnstandardizedPDF 3.8 0.22 7.62 0.14 0.0490523 , testStudentUnstandardizedCDF 0.3 1.2 4 0.45 0.458035 -- CDF , testStudentUnstandardizedCDF 4.3 (-2.4) 3.22 (-0.6) 0.698001 , testStudentUnstandardizedCDF 3.8 0.22 7.62 0.14 0.496076 -- F-distribution , testFdistrPDF 1 3 3 (1/(6 * pi)) -- PDF , testFdistrPDF 2 2 1.2 0.206612 , testFdistrPDF 10 12 8 0.000385613179281892790166 , testFdistrCDF 1 3 3 0.81830988618379067153 -- CDF , testFdistrCDF 2 2 1.2 0.545455 , testFdistrCDF 10 12 8 0.99935509863451408041 ] where -- Student-T testStudentPDF ndf x exact = testAssertion (printf "density (studentT %f) %f ~ %f" ndf x exact) $ eq 1e-5 exact (density (studentT ndf) x) testStudentCDF ndf x exact = testAssertion (printf "cumulative (studentT %f) %f ~ %f" ndf x exact) $ eq 1e-5 exact (cumulative (studentT ndf) x) -- Student-T General testStudentUnstandardizedPDF ndf mu sigma x exact = testAssertion (printf "density (studentTUnstandardized %f %f %f) %f ~ %f" ndf mu sigma x exact) $ eq 1e-5 exact (density (studentTUnstandardized ndf mu sigma) x) testStudentUnstandardizedCDF ndf mu sigma x exact = testAssertion (printf "cumulative (studentTUnstandardized %f %f %f) %f ~ %f" ndf mu sigma x exact) $ eq 1e-5 exact (cumulative (studentTUnstandardized ndf mu sigma) x) -- F-distribution testFdistrPDF n m x exact = testAssertion (printf "density (fDistribution %i %i) %f ~ %f [got %f]" n m x exact d) $ eq 1e-5 exact d where d = density (fDistribution n m) x testFdistrCDF n m x exact = testAssertion (printf "cumulative (fDistribution %i %i) %f ~ %f [got %f]" n m x exact d) $ eq 1e-5 exact d where d = cumulative (fDistribution n m) x statistics-0.13.3.0/tests/Tests/ApproxEq.hs0000644000000000000000000000655712724050603016667 0ustar0000000000000000{-# LANGUAGE FlexibleContexts, FlexibleInstances, TypeFamilies #-} module Tests.ApproxEq ( ApproxEq(..) 
) where import Data.Complex (Complex(..), realPart) import Data.List (intersperse) import Data.Maybe (catMaybes) import Numeric.MathFunctions.Constants (m_epsilon) import Statistics.Matrix hiding (map, toList) import Test.QuickCheck import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Statistics.Matrix as M class (Eq a, Show a) => ApproxEq a where type Bounds a eq :: Bounds a -> a -> a -> Bool eql :: Bounds a -> a -> a -> Property eql eps a b = counterexample (show a ++ " /=~ " ++ show b) (eq eps a b) (=~) :: a -> a -> Bool (==~) :: ApproxEq a => a -> a -> Property a ==~ b = counterexample (show a ++ " /=~ " ++ show b) (a =~ b) instance ApproxEq Double where type Bounds Double = Double eq eps a b | a == 0 && b == 0 = True | otherwise = abs (a - b) <= eps * max (abs a) (abs b) (=~) = eq m_epsilon instance ApproxEq (Complex Double) where type Bounds (Complex Double) = Double eq eps a@(ar :+ ai) b@(br :+ bi) | a == 0 && b == 0 = True | otherwise = abs (ar - br) <= eps * d && abs (ai - bi) <= eps * d where d = max (realPart $ abs a) (realPart $ abs b) (=~) = eq m_epsilon instance ApproxEq [Double] where type Bounds [Double] = Double eq eps (x:xs) (y:ys) = eq eps x y && eq eps xs ys eq _ [] [] = True eq _ _ _ = False eql = eqll length id id (=~) = eq m_epsilon (==~) = eql m_epsilon instance ApproxEq (U.Vector Double) where type Bounds (U.Vector Double) = Double eq = eqv (=~) = eq m_epsilon eql = eqlv (==~) = eqlv m_epsilon instance ApproxEq (V.Vector Double) where type Bounds (V.Vector Double) = Double eq = eqv (=~) = eq m_epsilon eql = eqlv (==~) = eqlv m_epsilon instance ApproxEq Matrix where type Bounds Matrix = Double eq eps (Matrix r1 c1 e1 v1) (Matrix r2 c2 e2 v2) = (r1,c1,e1) == (r2,c2,e2) && eq eps v1 v2 (=~) = eq m_epsilon eql eps a b = eqll dimension M.toList (`quotRem` cols a) eps a b (==~) = eql m_epsilon eqv :: (ApproxEq a, G.Vector v Bool, G.Vector v a) => Bounds a -> v a -> v a -> Bool eqv eps a b = G.length a == G.length b && G.and (G.zipWith (eq eps) a b) eqlv :: (ApproxEq [a], G.Vector v a) => Bounds [a] -> v a -> v a -> Property eqlv eps a b = eql eps (G.toList a) (G.toList b) eqll :: (ApproxEq l, ApproxEq a, Show c, Show d, Eq d, Bounds l ~ Bounds a) => (l -> d) -> (l -> [a]) -> (Int -> c) -> Bounds l -> l -> l -> Property eqll dim toList coord eps a b = counterexample fancy $ eq eps a b where fancy | la /= lb = "size mismatch: " ++ show la ++ " /= " ++ show lb | length summary < length full = summary | otherwise = full summary = concat . intersperse ", " . catMaybes $ zipWith3 whee (map coord [(0::Int)..]) xs ys full | '\n' `elem` sa = sa ++ " /=~\n" ++ sb | otherwise = sa ++ " /=~" ++ sb (sa, sb) = (show a, show b) (xs, ys) = (toList a, toList b) (la, lb) = (dim a, dim b) whee i x y | eq eps x y = Nothing | otherwise = Just $ show i ++ ": " ++ show x ++ " /=~ " ++ show y statistics-0.13.3.0/tests/Tests/NonParametric/0000755000000000000000000000000012724050603017321 5ustar0000000000000000statistics-0.13.3.0/tests/Tests/NonParametric/Table.hs0000644000000000000000000001332712724050603020712 0ustar0000000000000000module Tests.NonParametric.Table ( tableKSD , tableKS2D ) where -- Table for Kolmogorov-Smirnov statistics for standard normal -- distribution. Generated using R. -- -- First element of tuple is D second is sample for which it was -- calculated. 
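--
-- Each entry should be reproducible from the library itself; a sketch
-- of checking the first tuple in GHCi (assumes this package is
-- installed and in scope):
--
-- >>> import qualified Data.Vector.Unboxed as U
-- >>> import Statistics.Distribution.Normal (standard)
-- >>> import Statistics.Test.KolmogorovSmirnov (kolmogorovSmirnovD)
-- >>> kolmogorovSmirnovD standard $ U.fromList [1.360645,-0.3151904,-1.245443,0.1741977,-0.1421206,-1.798246,1.171594,-1.335844,-5.050093e-2,1.030063,-1.849005,0.6491455,-0.7028004]
-- 0.2012078 (up to display precision; compare the first entry below)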
tableKSD :: [(Double,[Double])] tableKSD = [ (0.2012078,[1.360645,-0.3151904,-1.245443,0.1741977,-0.1421206,-1.798246,1.171594,-1.335844,-5.050093e-2,1.030063,-1.849005,0.6491455,-0.7028004]) , (0.2569956,[0.3884734,-1.227821,-0.4166262,0.429118,-0.9280124,0.8025867,-0.6703089,-0.2124872,0.1224496,0.1087734,-4.285284e-2,-1.039936,-0.7071956]) , (0.1960356,[-1.814745,-0.6327167,0.7082493,0.6264716,1.02061,-0.4094635,0.821026,-0.4255047,-0.4820728,-0.2239833,0.648517,1.114283,0.3610216]) , (0.2095746,[0.187011,0.1805498,0.4448389,0.6065506,0.2308673,0.5292549,-1.489902,-1.455191,0.5449396,-0.1436403,-0.7977073,-0.2693545,0.8260888,-1.474473,-2.158696e-2,-0.1455387]) , (0.1922603,[0.5772317,-1.255561,1.605823,0.4923361,0.2470848,1.176101,-0.3767689,-0.6896885,0.4509345,-0.5048447,0.9436534,1.025599,0.2998393,-3.415219e-2,1.264315,-1.44433,-1.646449e-2]) , (0.2173401,[1.812807,-0.8687497,-0.5710508,1.003647,1.142621,0.6546577,-6.083323e-3,1.628574e-2,1.067499,-1.953143,-0.6060077,1.90859,-0.7480553,0.6715162,-0.928759,1.862,1.604621,-0.2171044,-0.1835918]) , (0.2510541,[-0.4769572,1.062319,0.9952284,1.198086,1.015589,-0.4154523,-0.6711762,1.202902,0.2217098,5.381759e-2,0.6679715,0.2551287,-0.1371492]) , (0.1996022,[1.158607,-0.7354863,1.526559,-0.7855418,-2.82999,-0.6045106,-0.1830228,0.3306812,-0.819657,-1.223715,0.2536423,-0.4155781,1.447042]) , (0.2284761,[1.239965,0.8187093,0.5199788,1.172072,0.748259,1.869376e-2,0.1625921,-1.712065,0.7043582,-1.702702,-0.4792806,-0.1023351,0.1187189]) , (0.2337866,[0.9417261,-0.1024297,-0.7354359,1.099991,0.801984,-0.3745397,-1.749564,1.795771,1.099963,-0.605557,-2.035897,1.893603,-0.3468928,-0.2593938,2.100988,0.9665698,0.8757091,0.7696328,0.8730729,-0.3990352,2.04361,-0.4617864,-0.155021,2.15774,0.2687795,-0.9853512,-0.3264898,1.260026,4.267695,-0.5571145,0.6307067,-0.1691405,-1.730686]) , (0.3389167,[2.025542,-1.542641,-1.090238,3.99027,9.949129e-2,-0.8974433,-2.508418,6.390346,-2.675515,1.154459,1.688072,2.220727,-0.4743102]) , (0.4920231,[0.5192906,-3.260813,-1.245185,1.693084,3.561318,4.058924,2.27063,0.9446943,4.794159,-3.423733,0.8240817,0.644059,0.900175,1.932513,1.024586,2.82823,2.072192,-0.353231,-0.4319673,1.505952,1.0199,4.555054,2.364929,5.531467,3.279415,3.19821,2.726925,1.680027,-0.9041334,-0.8246765,-1.343979,8.454955,1.354581]) , 
(0.6727408,[-6.705672,-3.193988,-4.612611,-3.207994,-5.070172,-6.141169,-0.397149,-4.093359,-1.204801,-3.986585,-2.724662,0.9868107,-6.295266,-5.95839,-6.35114,-1.679555,-2.635889,-4.050329,1.557428,-2.548465,-0.9073924,-1.502018,-4.535688,-4.158818,-8.833434,-5.944697,-1.569672,-4.70399,-7.832059,-4.093708,-8.393417,-2.085432,-7.06495,-0.4230419,-3.046822,-3.23895,-0.9265873,-9.227822,3.293713,-5.593577,-5.942398,-4.358421,2.660044,-4.301572,-1.258879,0.1499903,3.572833,-3.19844,0.8652432,-0.3025793,-1.576673,-7.666265,-6.751463,-1.398944,-2.690656,-1.429654,7.508364e-2,0.7998344,-3.562074,-1.021431,1.342968,2.110244,-7.561497,-2.372083,-3.649193,-5.7723,-1.068083,0.7537809,-4.569546,-1.198005,-5.638384,-1.227226,-1.195852,-1.118175,-9.130527,0.9675821,-2.497391,0.5988562,-1.965783,-4.25741,-6.547006,-1.459294,-2.380556,-3.977307,-7.809006,-4.276819,-4.028746,-9.055546,-3.599239,-1.470512,-8.253329,-1.351687,-4.269324,-6.140353,-6.30808,-1.834091,-3.135146,-9.391791,3.117815,-5.554733,-2.556769,-3.287376,-2.064013,-5.741995,-5.047918,-4.808841,-1.488526,-0.2351115,-5.760833,-2.722929,-7.012353,2.281171,-3.890514,-1.516824,-1.41011,-1.828457,-5.561244,-3.472142,-10.16919,-0.4369042,-5.698953,-4.587462,-4.897086]) ] -- Table for 2-sample Kolmogorov-Smirnov statistics. Generated using R -- -- First element is D, second and third are samples tableKS2D :: [(Double,[Double],[Double])] tableKS2D = [ (0.2820513,[-0.4212928,2.146532,0.7585263,-0.5086105,-0.7725486,6.235548e-2,-0.1849861,0.861972,-0.1958534,-3.379697e-2,-1.316854,0.6701269],[0.4957582,0.4241167,0.9822869,0.4504248,-0.1749617,1.178098,-1.117222,-0.859273,0.3073736,0.4344583,-0.4761338,-1.332374,1.487291]) , (0.2820513,[-0.712252,0.7990333,-0.7968473,1.443609,1.163096,-1.349071,-0.1553941,-2.003104,-0.3400618,-0.7019282,0.183293,-0.2352167],[-0.4622455,-0.8132221,0.1161614,-1.472115e-2,1.001454,-6.557789e-2,-0.2531216,-1.032432,0.4105478,1.749614,0.9722899,5.850942e-2,-0.3352746]) , (0.2564103,[0.3509882,-0.2982833,1.314731,1.264223,-0.8156374,0.3734029,-3.288915e-2,0.6766016,0.9786335,0.1079949,-0.4211722,1.58656],[0.8024675,7.464538e-2,0.2739861,-2.334255e-2,0.5611802,0.6683374,0.4358206,0.349843,1.207834,1.402578,-0.4049183,0.4286042,1.665129]) , (0.1833333,[1.376196,9.926384e-2,2.199292,-2.04993,0.5585353,-0.4812132,0.1041527,2.084774,0.71194,-1.398245,-4.458574e-2,1.484945,-1.473182,1.020076,-0.7019646,0.2182066,-1.702963,-0.3522622,-0.8129267,-0.6338972],[-1.020371,0.3323861,1.513288,0.1958708,-1.0723,5.323446e-2,-0.9993713,-0.7046356,-0.6781067,-0.4471603,1.512042,-0.2650665,-4.765228e-2,-1.501205,1.228664,0.5332935,-0.2960315,-0.1509683]) , (0.5666667,[0.7145305,0.1255674,2.001531,0.1419216],[2.113474,-0.3352839,-0.4962429,-1.386079,0.6404667,-0.7145304,0.1084008,-0.9821421,-2.270472,-1.003846,-0.5644588,2.699695,-1.296494,-0.1538839,1.319094,-1.127544,0.3568889,0.2004726,-1.313291,0.3581084,0.3313498,0.9336278,0.9850203,-1.309506,1.170459,-0.7517466,-1.771269,0.7156381,-1.129691,0.877729]) , (0.5,[0.6950626,0.1643805,-0.3102472,0.4810762,0.1844602,1.338836,-0.8083386,-0.5482141,0.9532421,-0.2644837],[7.527945,-1.95654,1.513725,-1.318431,2.453895,0.2078194,0.7371092,2.834245,-2.134794,3.938259]) ] statistics-0.13.3.0/tests/Tests/Math/0000755000000000000000000000000012724050603015450 5ustar0000000000000000statistics-0.13.3.0/tests/Tests/Math/Tables.hs0000644000000000000000000000454512724050603017226 0ustar0000000000000000module Tests.Math.Tables where tableLogGamma :: [(Double,Double)] tableLogGamma = [(0.000001250000000, 
13.592366285131769033) , (0.000068200000000, 9.5930266308318756785) , (0.000246000000000, 8.3100370767447966358) , (0.000880000000000, 7.03508133735248542) , (0.003120000000000, 5.768129358365567505) , (0.026700000000000, 3.6082588918892977148) , (0.077700000000000, 2.5148371858768232556) , (0.234000000000000, 1.3579557559432759994) , (0.860000000000000, 0.098146578027685615897) , (1.340000000000000, -0.11404757557207759189) , (1.890000000000000, -0.0425116422978701336) , (2.450000000000000, 0.25014296569217625565) , (3.650000000000000, 1.3701041997380685178) , (4.560000000000000, 2.5375143317949580002) , (6.660000000000000, 5.9515377269550207018) , (8.250000000000000, 9.0331869196051233217) , (11.300000000000001, 15.814180681373947834) , (25.600000000000001, 56.711261598328121636) , (50.399999999999999, 146.12815158702164808) , (123.299999999999997, 468.85500075897556371) , (487.399999999999977, 2526.9846647543727158) , (853.399999999999977, 4903.9359135978220365) , (2923.300000000000182, 20402.93198938705973) , (8764.299999999999272, 70798.268343590112636) , (12630.000000000000000, 106641.77264982508495) , (34500.000000000000000, 325976.34838781820145) , (82340.000000000000000, 849629.79603036714252) , (234800.000000000000000, 2668846.4390507959761) , (834300.000000000000000, 10540830.912557534873) , (1230000.000000000000000, 16017699.322315014899) ] tableIncompleteBeta :: [(Double,Double,Double,Double)] tableIncompleteBeta = [(2.000000000000000, 3.000000000000000, 0.030000000000000, 0.0051864299999999996862) , (2.000000000000000, 3.000000000000000, 0.230000000000000, 0.22845923000000001313) , (2.000000000000000, 3.000000000000000, 0.760000000000000, 0.95465728000000005249) , (4.000000000000000, 2.300000000000000, 0.890000000000000, 0.93829812158347802864) , (1.000000000000000, 1.000000000000000, 0.550000000000000, 0.55000000000000004441) , (0.300000000000000, 12.199999999999999, 0.110000000000000, 0.95063000053947077639) , (13.100000000000000, 9.800000000000001, 0.120000000000000, 1.3483109941962659385e-07) , (13.100000000000000, 9.800000000000001, 0.420000000000000, 0.071321857831804780226) , (13.100000000000000, 9.800000000000001, 0.920000000000000, 0.99999578339197081611) ] statistics-0.13.3.0/tests/Tests/Math/gen.py0000644000000000000000000000264412724050603016601 0ustar0000000000000000#!/usr/bin/python """ """ from mpmath import * def printListLiteral(lines) : print " [" + "\n , ".join(lines) + "\n ]" ################################################################ # Generate header print "module Tests.Math.Tables where" print ################################################################ ## Generate table for logGamma print "tableLogGamma :: [(Double,Double)]" print "tableLogGamma =" gammaArg = [ 1.25e-6, 6.82e-5, 2.46e-4, 8.8e-4, 3.12e-3, 2.67e-2, 7.77e-2, 0.234, 0.86, 1.34, 1.89, 2.45, 3.65, 4.56, 6.66, 8.25, 11.3, 25.6, 50.4, 123.3, 487.4, 853.4, 2923.3, 8764.3, 1.263e4, 3.45e4, 8.234e4, 2.348e5, 8.343e5, 1.23e6, ] printListLiteral( [ '(%.15f, %.20g)' % (x, log(gamma(x))) for x in gammaArg ] ) ################################################################ ## Generate table for incompleteBeta print "tableIncompleteBeta :: [(Double,Double,Double,Double)]" print "tableIncompleteBeta =" incompleteBetaArg = [ (2, 3, 0.03), (2, 3, 0.23), (2, 3, 0.76), (4, 2.3, 0.89), (1, 1, 0.55), (0.3, 12.2, 0.11), (13.1, 9.8, 0.12), (13.1, 9.8, 0.42), (13.1, 9.8, 0.92), ] printListLiteral( [ '(%.15f, %.15f, %.15f, %.20g)' % (p,q,x, betainc(p,q,0,x, regularized=True)) for (p,q,x) in 
incompleteBetaArg ]) statistics-0.13.3.0/tests/Tests/Matrix/0000755000000000000000000000000012724050603016023 5ustar0000000000000000statistics-0.13.3.0/tests/Tests/Matrix/Types.hs0000644000000000000000000000243112724050603017463 0ustar0000000000000000{-# LANGUAGE DeriveFunctor #-} {-# OPTIONS_GHC -fno-warn-orphans #-} module Tests.Matrix.Types ( Mat(..) , fromMat , toMat ) where import Control.Monad (join) import Control.Applicative ((<$>), (<*>)) import Statistics.Matrix (Matrix(..), fromList) import Test.QuickCheck import Tests.Helpers (shrinkFixedList, small) import qualified Data.Vector.Unboxed as U data Mat a = Mat { mrows :: Int , mcols :: Int , asList :: [[a]] } deriving (Eq, Ord, Show, Functor) fromMat :: Mat Double -> Matrix fromMat (Mat r c xs) = fromList r c (concat xs) toMat :: Matrix -> Mat Double toMat (Matrix r c _ v) = Mat r c . split . U.toList $ v where split xs@(_:_) = let (h,t) = splitAt c xs in h : split t split [] = [] instance (Arbitrary a) => Arbitrary (Mat a) where arbitrary = small $ join (arbMat <$> arbitrary <*> arbitrary) shrink (Mat r c xs) = Mat r c <$> shrinkFixedList (shrinkFixedList shrink) xs arbMat :: (Arbitrary a) => Positive (Small Int) -> Positive (Small Int) -> Gen (Mat a) arbMat (Positive (Small r)) (Positive (Small c)) = Mat r c <$> vectorOf r (vector c) instance Arbitrary Matrix where arbitrary = fromMat <$> arbitrary -- shrink = map fromMat . shrink . toMat statistics-0.13.3.0/benchmark/0000755000000000000000000000000012724050603014245 5ustar0000000000000000statistics-0.13.3.0/benchmark/bench.hs0000644000000000000000000000521512724050603015663 0ustar0000000000000000import Control.Monad.ST (runST) import Criterion.Main import Data.Complex import Statistics.Sample import Statistics.Transform import Statistics.Correlation.Pearson import System.Random.MWC import qualified Data.Vector.Unboxed as U -- Test sample sample :: U.Vector Double sample = runST $ flip uniformVector 10000 =<< create -- Weighted test sample sampleW :: U.Vector (Double,Double) sampleW = U.zip sample (U.reverse sample) -- Comlex vector for FFT tests sampleC :: U.Vector (Complex Double) sampleC = U.zipWith (:+) sample (U.reverse sample) -- Simple benchmark for functions from Statistics.Sample main :: IO () main = defaultMain [ bgroup "sample" [ bench "range" $ nf (\x -> range x) sample -- Mean , bench "mean" $ nf (\x -> mean x) sample , bench "meanWeighted" $ nf (\x -> meanWeighted x) sampleW , bench "harmonicMean" $ nf (\x -> harmonicMean x) sample , bench "geometricMean" $ nf (\x -> geometricMean x) sample -- Variance , bench "variance" $ nf (\x -> variance x) sample , bench "varianceUnbiased" $ nf (\x -> varianceUnbiased x) sample , bench "varianceWeighted" $ nf (\x -> varianceWeighted x) sampleW -- Correlation , bench "pearson" $ nf (\x -> pearson (U.reverse sample) x) sample , bench "pearson'" $ nf (\x -> pearson' (U.reverse sample) x) sample , bench "pearsonFast" $ nf (\x -> pearsonFast (U.reverse sample) x) sample -- Other , bench "stdDev" $ nf (\x -> stdDev x) sample , bench "skewness" $ nf (\x -> skewness x) sample , bench "kurtosis" $ nf (\x -> kurtosis x) sample -- Central moments , bench "C.M. 2" $ nf (\x -> centralMoment 2 x) sample , bench "C.M. 3" $ nf (\x -> centralMoment 3 x) sample , bench "C.M. 4" $ nf (\x -> centralMoment 4 x) sample , bench "C.M. 
5" $ nf (\x -> centralMoment 5 x) sample ] , bgroup "FFT" [ bgroup "fft" [ bench (show n) $ whnf fft (U.take n sampleC) | n <- fftSizes ] , bgroup "ifft" [ bench (show n) $ whnf ifft (U.take n sampleC) | n <- fftSizes ] , bgroup "dct" [ bench (show n) $ whnf dct (U.take n sample) | n <- fftSizes ] , bgroup "dct_" [ bench (show n) $ whnf dct_ (U.take n sampleC) | n <- fftSizes ] , bgroup "idct" [ bench (show n) $ whnf idct (U.take n sample) | n <- fftSizes ] , bgroup "idct_" [ bench (show n) $ whnf idct_ (U.take n sampleC) | n <- fftSizes ] ] ] fftSizes :: [Int] fftSizes = [32,128,512,2048] statistics-0.13.3.0/examples/0000755000000000000000000000000012724050603014131 5ustar0000000000000000statistics-0.13.3.0/examples/kde/0000755000000000000000000000000012724050603014674 5ustar0000000000000000statistics-0.13.3.0/examples/kde/KDE.hs0000644000000000000000000000166412724050603015642 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} import Control.Applicative ((<$>)) import Statistics.Sample.KernelDensity (kde) import Text.Hastache (MuType(..), defaultConfig, hastacheFile) import Text.Hastache.Context (mkStrContext) import qualified Data.Attoparsec.ByteString as B import qualified Data.Attoparsec.ByteString.Char8 as A import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as L import qualified Data.Vector.Unboxed as U import qualified Data.Text.Lazy.IO as TL csv = do B.takeTill A.isEndOfLine (A.double `A.sepBy` A.char ',') `A.sepBy` A.endOfLine main = do waits <- (either error (U.fromList . map last . filter (not.null)) . A.parseOnly csv) <$> B.readFile "data/faithful.csv" let xs = map (\(a,b) -> [a,b]) . U.toList . uncurry U.zip . kde 64 $ waits context "data" = MuVariable . show $ xs s <- hastacheFile defaultConfig "kde.tpl" (mkStrContext context) TL.writeFile "kde.html" s statistics-0.13.3.0/examples/kde/kde.tpl0000644000000000000000000000221112724050603016154 0ustar0000000000000000 Kernel density

[kde.tpl is an HTML template whose markup was lost in extraction. Recoverable text: a page titled "Kernel density" with the caption "This is a 64-point kernel density estimate of wait times between eruptions of the Old Faithful geyser."]
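For reference, the essential computation in KDE.hs above is the single call to kde from Statistics.Sample.KernelDensity, which returns a pair of vectors: evaluation points and density estimates. A minimal sketch of the same computation with the attoparsec and hastache plumbing removed (the hard-coded values are the first few wait times from data/faithful.csv below; this program is illustrative and not part of the package):

import qualified Data.Vector.Unboxed as U
import Statistics.Sample.KernelDensity (kde)

main :: IO ()
main = do
  -- First ten wait times from data/faithful.csv, hard-coded for brevity.
  let waits = U.fromList [79, 54, 74, 62, 85, 55, 88, 85, 51, 85 :: Double]
      -- kde n evaluates the estimate at n mesh points (rounded up to a
      -- power of two); KDE.hs uses 64 points over the full data set.
      (points, densities) = kde 64 waits
  mapM_ print $ U.toList $ U.zip points densities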

statistics-0.13.3.0/examples/kde/kde.html0000644000000000000000000000727112724050603016334 0ustar0000000000000000 Kernel density

[kde.html is the pre-rendered output of kde.tpl; its surviving text is a verbatim duplicate of the template above.]
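The data file below has a one-line header ("eruption,wait") and two comma-separated numeric columns. A dependency-free sketch of loading the wait column into a vector (waitTimes is a hypothetical helper; it assumes every data row is well formed):

import qualified Data.Vector.Unboxed as U

-- Drop the header, then keep the text after the first comma of each row.
waitTimes :: String -> U.Vector Double
waitTimes = U.fromList . map (read . drop 1 . dropWhile (/= ',')) . drop 1 . lines

main :: IO ()
main = do
  csv <- readFile "data/faithful.csv"
  print . U.length . waitTimes $ csv  -- 272 observations expected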

statistics-0.13.3.0/examples/kde/data/0000755000000000000000000000000012724050603015605 5ustar0000000000000000statistics-0.13.3.0/examples/kde/data/faithful.csv0000644000000000000000000000433312724050603020127 0ustar0000000000000000eruption,wait 3.6,79 1.8,54 3.333,74 2.283,62 4.533,85 2.883,55 4.7,88 3.6,85 1.95,51 4.35,85 1.833,54 3.917,84 4.2,78 1.75,47 4.7,83 2.167,52 1.75,62 4.8,84 1.6,52 4.25,79 1.8,51 1.75,47 3.45,78 3.067,69 4.533,74 3.6,83 1.967,55 4.083,76 3.85,78 4.433,79 4.3,73 4.467,77 3.367,66 4.033,80 3.833,74 2.017,52 1.867,48 4.833,80 1.833,59 4.783,90 4.35,80 1.883,58 4.567,84 1.75,58 4.533,73 3.317,83 3.833,64 2.1,53 4.633,82 2,59 4.8,75 4.716,90 1.833,54 4.833,80 1.733,54 4.883,83 3.717,71 1.667,64 4.567,77 4.317,81 2.233,59 4.5,84 1.75,48 4.8,82 1.817,60 4.4,92 4.167,78 4.7,78 2.067,65 4.7,73 4.033,82 1.967,56 4.5,79 4,71 1.983,62 5.067,76 2.017,60 4.567,78 3.883,76 3.6,83 4.133,75 4.333,82 4.1,70 2.633,65 4.067,73 4.933,88 3.95,76 4.517,80 2.167,48 4,86 2.2,60 4.333,90 1.867,50 4.817,78 1.833,63 4.3,72 4.667,84 3.75,75 1.867,51 4.9,82 2.483,62 4.367,88 2.1,49 4.5,83 4.05,81 1.867,47 4.7,84 1.783,52 4.85,86 3.683,81 4.733,75 2.3,59 4.9,89 4.417,79 1.7,59 4.633,81 2.317,50 4.6,85 1.817,59 4.417,87 2.617,53 4.067,69 4.25,77 1.967,56 4.6,88 3.767,81 1.917,45 4.5,82 2.267,55 4.65,90 1.867,45 4.167,83 2.8,56 4.333,89 1.833,46 4.383,82 1.883,51 4.933,86 2.033,53 3.733,79 4.233,81 2.233,60 4.533,82 4.817,77 4.333,76 1.983,59 4.633,80 2.017,49 5.1,96 1.8,53 5.033,77 4,77 2.4,65 4.6,81 3.567,71 4,70 4.5,81 4.083,93 1.8,53 3.967,89 2.2,45 4.15,86 2,58 3.833,78 3.5,66 4.583,76 2.367,63 5,88 1.933,52 4.617,93 1.917,49 2.083,57 4.583,77 3.333,68 4.167,81 4.333,81 4.5,73 2.417,50 4,85 4.167,74 1.883,55 4.583,77 4.25,83 3.767,83 2.033,51 4.433,78 4.083,84 1.833,46 4.417,83 2.183,55 4.8,81 1.833,57 4.8,76 4.1,84 3.966,77 4.233,81 3.5,87 4.366,77 2.25,51 4.667,78 2.1,60 4.35,82 4.133,91 1.867,53 4.6,78 1.783,46 4.367,77 3.85,84 1.933,49 4.5,83 2.383,71 4.7,80 1.867,49 3.833,75 3.417,64 4.233,76 2.4,53 4.8,94 2,55 4.15,76 1.867,50 4.267,82 1.75,54 4.483,75 4,78 4.117,79 4.083,78 4.267,78 3.917,70 4.55,79 4.083,70 2.417,54 4.183,86 2.217,50 4.45,90 1.883,54 1.85,54 4.283,77 3.95,79 2.333,64 4.15,75 2.35,47 4.933,86 2.9,63 4.583,85 3.833,82 2.083,57 4.367,82 2.133,67 4.35,74 2.2,54 4.45,83 3.567,73 4.5,73 4.15,88 3.817,80 3.917,71 4.45,83 2,56 4.283,79 4.767,78 4.533,84 1.85,58 4.25,83 1.983,43 2.25,60 4.75,75 4.117,81 2.15,46 4.417,90 1.817,46 4.467,74