statistics-0.10.2.0/0000755000000000000000000000000012016036043012303 5ustar0000000000000000statistics-0.10.2.0/LICENSE0000644000000000000000000000246112016036043013313 0ustar0000000000000000Copyright (c) 2009, 2010 Bryan O'Sullivan
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

* Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
statistics-0.10.2.0/README.markdown0000644000000000000000000000200712016036043015003 0ustar0000000000000000# Statistics: efficient, general purpose statistics

This package provides the Statistics module, a Haskell library for
working with statistical data in a space- and time-efficient way.

Where possible, we give citations and computational complexity
estimates for the algorithms used.

# Performance

This library has been carefully optimised for high performance. To
obtain the best runtime efficiency, it is imperative to compile
libraries and applications that use this library using a high level
of optimisation.

# Get involved!

Please report bugs via the
[github issue tracker](https://github.com/bos/statistics/issues).

Master [git mirror](https://github.com/bos/statistics):

* `git clone git://github.com/bos/statistics.git`

There's also a [Mercurial mirror](https://bitbucket.org/bos/statistics):

* `hg clone https://bitbucket.org/bos/statistics`

(You can create and contribute changes using either Mercurial or git.)

# Authors

This library is written and maintained by Bryan O'Sullivan,
<bos@serpentine.com>.
statistics-0.10.2.0/Setup.lhs0000644000000000000000000000011412016036043014107 0ustar0000000000000000#!/usr/bin/env runhaskell

> import Distribution.Simple
> main = defaultMain
statistics-0.10.2.0/statistics.cabal0000644000000000000000000001701112016036043015461 0ustar0000000000000000name:           statistics
version:        0.10.2.0
synopsis:       A library of statistical types, data, and functions
description:
  This library provides a number of common functions and types useful
  in statistics.  We focus on high performance, numerical robustness,
  and use of good algorithms.  Where possible, we provide references
  to the statistical literature.
  .
  The library's facilities can be divided into four broad categories:
  .
  * Working with widely used discrete and continuous probability
    distributions.  (There are dozens of exotic distributions in use;
    we focus on the most common.)
  .
  * Computing with sample data: quantile estimation, kernel density
    estimation, histograms, bootstrap methods, significance testing,
    and autocorrelation analysis.
  .
  * Random variate generation under several different distributions.
  .
  * Common statistical tests for significant differences between
    samples.
  .
  Changes in 0.10.2.0
  .
  * Bugs in DCT and IDCT are fixed.
  .
  * Accessors for the uniform distribution are added.
  .
  * 'ContGen' instances for all continuous distributions are added.
  .
  * The beta distribution is added.
  .
  * A constructor for the improper gamma distribution is added.
  .
  * The binomial distribution allows zero trials.
  .
  * The Poisson distribution now accepts a zero parameter.
  .
  * Integer overflow in the calculation of the Wilcoxon-T test is fixed.
  .
  * A bug in the 'ContGen' instance for the normal distribution is fixed.
  .
  Changes in 0.10.1.0
  .
  * The Kolmogorov-Smirnov nonparametric test is added.
  .
  * Pearson's chi-squared test is added.
  .
  * A type class for generating random variates for a given
    distribution is added.
  .
  * The modules 'Statistics.Math' and 'Statistics.Constants' are
    moved to the @math-functions@ package.  They are still available,
    but marked as deprecated.
  .
  Changes in 0.10.0.1
  .
  * @dct@ and @idct@ now have type @Vector Double -> Vector Double@
  .
  Changes in 0.10.0.0:
  .
  * The type classes @Mean@ and @Variance@ are split in two.  This is
    required for distributions which do not have finite variance or
    mean.
  .
  * The @S.Sample.KernelDensity@ module has been renamed, and
    completely rewritten to be much more robust.  The older module
    oversmoothed multi-modal data.  (The older module is still
    available under the name @S.Sample.KernelDensity.Simple@).
  .
  * Histogram computation is added, in @S.Sample.Histogram@.
  .
  * Forward and inverse discrete Fourier and cosine transforms are
    added, in @S.Transform@.
  .
  * Root finding is added, in @S.Math.RootFinding@.
  .
  * The @complCumulative@ function is added to the @Distribution@
    class in order to accurately assess probabilities P(X>x), which
    are used in one-tailed tests.
  .
  * A @stdDev@ function is added to the @Variance@ class for
    distributions.
  .
  * The constructor @S.Distribution.normalDistr@ now takes the
    standard deviation instead of the variance as its parameter.
  .
  * A bug in @S.Quantile.weightedAvg@ is fixed.  It produced a wrong
    answer if a sample contained only one element.
  .
  * Bugs in quantile estimation for the chi-square and gamma
    distributions are fixed.
  .
  * Integer overflow in @mannWhitneyUCriticalValue@ is fixed.  It
    produced incorrect critical values for moderately large samples:
    somewhere around 20 for 32-bit machines and 40 for 64-bit ones.
  .
  * A bug in @mannWhitneyUSignificant@ is fixed.  If either sample was
    larger than 20, it produced a completely incorrect answer.
  .
  * One- and two-tailed tests in @S.Tests.NonParametric@ are selected
    with sum types instead of @Bool@.
  .
  * Test results are returned as an enumeration instead of @Bool@.
  .
  * Performance improvements for the Mann-Whitney U and Wilcoxon tests.
  .
  * The module @S.Tests.NonParametric@ is split into
    @S.Tests.MannWhitneyU@ and @S.Tests.WilcoxonT@.
  .
  * @sortBy@ is added to @S.Function@.
  .
  * The mean and variance of the gamma distribution are fixed.
  .
  * Much faster cumulative probability functions for the Poisson and
    hypergeometric distributions.
  .
  * Better density functions for the gamma and Poisson distributions.
  .
  * The Student-T, Fisher-Snedecor F, and Cauchy-Lorentz
    distributions are added.
  .
  * The function @S.Function.create@ is removed.  Use @generateM@
    from the @vector@ package instead.
  .
  * A function to perform approximate comparison of doubles is added
    to @S.Function.Comparison@.
  .
  * The regularized incomplete beta function and its inverse are
    added to @S.Function@.

license:        BSD3
license-file:   LICENSE
homepage:       https://github.com/bos/statistics
bug-reports:    https://github.com/bos/statistics/issues
author:         Bryan O'Sullivan <bos@serpentine.com>
maintainer:     Bryan O'Sullivan <bos@serpentine.com>
copyright:      2009, 2010, 2011 Bryan O'Sullivan
category:       Math, Statistics
build-type:     Simple
cabal-version:  >= 1.8
extra-source-files:
  README.markdown
  examples/kde/KDE.hs
  examples/kde/data/faithful.csv
  examples/kde/kde.html
  examples/kde/kde.tpl
  tests/Tests/Math/gen.py

library
  exposed-modules:
    Statistics.Autocorrelation
    Statistics.Constants
    Statistics.Distribution
    Statistics.Distribution.Beta
    Statistics.Distribution.Binomial
    Statistics.Distribution.CauchyLorentz
    Statistics.Distribution.ChiSquared
    Statistics.Distribution.Exponential
    Statistics.Distribution.FDistribution
    Statistics.Distribution.Gamma
    Statistics.Distribution.Geometric
    Statistics.Distribution.Hypergeometric
    Statistics.Distribution.Normal
    Statistics.Distribution.Poisson
    Statistics.Distribution.StudentT
    Statistics.Distribution.Uniform
    Statistics.Function
    Statistics.Math
    Statistics.Math.RootFinding
    Statistics.Quantile
    Statistics.Resampling
    Statistics.Resampling.Bootstrap
    Statistics.Sample
    Statistics.Sample.Histogram
    Statistics.Sample.KernelDensity
    Statistics.Sample.KernelDensity.Simple
    Statistics.Sample.Powers
    Statistics.Test.NonParametric
    Statistics.Test.ChiSquared
    Statistics.Test.KolmogorovSmirnov
    Statistics.Test.MannWhitneyU
    Statistics.Test.WilcoxonT
    Statistics.Test.Types
    Statistics.Transform
    Statistics.Types
  other-modules:
    Statistics.Distribution.Poisson.Internal
    Statistics.Function.Comparison
    Statistics.Internal
    Statistics.Test.Internal
  build-depends:
    base < 5,
    deepseq >= 1.1.0.2,
    erf,
    monad-par >= 0.1.0.1,
    mwc-random >= 0.11.0.0,
    math-functions >= 0.1.1,
    primitive >= 0.3,
    vector >= 0.7.1,
    vector-algorithms >= 0.4
  if impl(ghc >= 6.10)
    build-depends:
      base >= 4

  -- gather extensive profiling data for now
  ghc-prof-options: -auto-all

  ghc-options: -O2 -Wall -funbox-strict-fields
  if impl(ghc >= 6.8)
    ghc-options: -fwarn-tabs

test-suite tests
  type:           exitcode-stdio-1.0
  hs-source-dirs: tests
  main-is:        tests.hs
  other-modules:
    Tests.Distribution
    Tests.Helpers
    Tests.Function
    Tests.NonparametricTest
    Tests.NonparametricTest.Table
    Tests.Transform
    Tests.KDE
  ghc-options:
    -Wall -threaded -rtsopts
  build-depends:
    base,
    ieee754 >= 0.7.3,
    HUnit,
    QuickCheck >= 2,
    test-framework,
    test-framework-quickcheck2,
    test-framework-hunit,
    math-functions,
    statistics,
    primitive,
    vector,
    vector-algorithms,
    erf

source-repository head
  type:     git
  location: https://github.com/bos/statistics

source-repository head
  type:     mercurial
  location: https://bitbucket.org/bos/statistics
statistics-0.10.2.0/examples/0000755000000000000000000000000012016036043014121 5ustar0000000000000000statistics-0.10.2.0/examples/kde/0000755000000000000000000000000012016036043014664 5ustar0000000000000000statistics-0.10.2.0/examples/kde/KDE.hs0000644000000000000000000000156412016036043015631 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<$>))
import Statistics.Sample.KernelDensity (kde)
import Text.Hastache (MuType(..), defaultConfig, hastacheFile)
import Text.Hastache.Context (mkStrContext)
import qualified Data.Attoparsec as B
import qualified Data.Attoparsec.Char8 as A
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import qualified Data.Vector.Unboxed as U

csv = do
  B.takeTill A.isEndOfLine
  (A.double `A.sepBy` A.char ',') `A.sepBy` A.endOfLine

main = do
  waits <- (either error (U.fromList . map last . filter (not.null)) .
            A.parseOnly csv) <$> B.readFile "data/faithful.csv"
  let xs = map (\(a,b) -> [a,b]) . U.toList . uncurry U.zip . kde 64 $ waits
      context "data" = MuVariable . show $ xs
  s <- hastacheFile defaultConfig "kde.tpl" (mkStrContext context)
  L.writeFile "kde.html" s
statistics-0.10.2.0/examples/kde/kde.html0000644000000000000000000000727112016036043016324 0ustar0000000000000000 Kernel density

Kernel density

This is a 64-point kernel density estimate of wait times between eruptions of the Old Faithful geyser.

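The page above is produced by KDE.hs.  For reference, a minimal sketch of computing the same 64-point estimate directly, assuming only the statistics and vector packages (the sample values below are invented stand-ins for the wait column of faithful.csv):

import Statistics.Sample.KernelDensity (kde)
import qualified Data.Vector.Unboxed as U

main :: IO ()
main = do
  let waits = U.fromList [79, 54, 74, 62, 85, 55, 88, 85, 51, 85 :: Double]
      -- 'kde 64' returns a pair of 64-element vectors: the mesh points
      -- and the estimated density at each point.
      (mesh, density) = kde 64 waits
  print $ U.take 4 $ U.zip mesh density

Zipping the two vectors, as KDE.hs does with 'uncurry U.zip', yields the [x,y] pairs that the page plots.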
statistics-0.10.2.0/examples/kde/kde.tpl0000644000000000000000000000221112016036043016144 0ustar0000000000000000 Kernel density

Kernel density

This is a 64-point kernel density estimate of wait times between eruptions of the Old Faithful geyser.

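kde.tpl above is a Hastache (Mustache) template: KDE.hs substitutes the rendered list of [x,y] pairs for its {{data}} placeholder.  A minimal sketch of that substitution step, assuming the hastache API of this era ('hastacheStr' and 'encodeStr'; names may differ in other versions, and the template string and data points here are made up):

import Text.Hastache (MuType(..), defaultConfig, encodeStr, hastacheStr)
import Text.Hastache.Context (mkStrContext)
import qualified Data.ByteString.Lazy.Char8 as L

main :: IO ()
main = do
  -- Invented data points standing in for the KDE output.
  let xs = [[54.0,1.0e-2],[55.0,1.2e-2]] :: [[Double]]
      context "data" = MuVariable (show xs)
      context _      = MuNothing
  out <- hastacheStr defaultConfig (encodeStr "var data = {{data}};")
                     (mkStrContext context)
  L.putStrLn out   -- var data = [[54.0,1.0e-2],[55.0,1.2e-2]];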
statistics-0.10.2.0/examples/kde/data/0000755000000000000000000000000012016036043015575 5ustar0000000000000000statistics-0.10.2.0/examples/kde/data/faithful.csv0000644000000000000000000000433312016036043020117 0ustar0000000000000000eruption,wait 3.6,79 1.8,54 3.333,74 2.283,62 4.533,85 2.883,55 4.7,88 3.6,85 1.95,51 4.35,85 1.833,54 3.917,84 4.2,78 1.75,47 4.7,83 2.167,52 1.75,62 4.8,84 1.6,52 4.25,79 1.8,51 1.75,47 3.45,78 3.067,69 4.533,74 3.6,83 1.967,55 4.083,76 3.85,78 4.433,79 4.3,73 4.467,77 3.367,66 4.033,80 3.833,74 2.017,52 1.867,48 4.833,80 1.833,59 4.783,90 4.35,80 1.883,58 4.567,84 1.75,58 4.533,73 3.317,83 3.833,64 2.1,53 4.633,82 2,59 4.8,75 4.716,90 1.833,54 4.833,80 1.733,54 4.883,83 3.717,71 1.667,64 4.567,77 4.317,81 2.233,59 4.5,84 1.75,48 4.8,82 1.817,60 4.4,92 4.167,78 4.7,78 2.067,65 4.7,73 4.033,82 1.967,56 4.5,79 4,71 1.983,62 5.067,76 2.017,60 4.567,78 3.883,76 3.6,83 4.133,75 4.333,82 4.1,70 2.633,65 4.067,73 4.933,88 3.95,76 4.517,80 2.167,48 4,86 2.2,60 4.333,90 1.867,50 4.817,78 1.833,63 4.3,72 4.667,84 3.75,75 1.867,51 4.9,82 2.483,62 4.367,88 2.1,49 4.5,83 4.05,81 1.867,47 4.7,84 1.783,52 4.85,86 3.683,81 4.733,75 2.3,59 4.9,89 4.417,79 1.7,59 4.633,81 2.317,50 4.6,85 1.817,59 4.417,87 2.617,53 4.067,69 4.25,77 1.967,56 4.6,88 3.767,81 1.917,45 4.5,82 2.267,55 4.65,90 1.867,45 4.167,83 2.8,56 4.333,89 1.833,46 4.383,82 1.883,51 4.933,86 2.033,53 3.733,79 4.233,81 2.233,60 4.533,82 4.817,77 4.333,76 1.983,59 4.633,80 2.017,49 5.1,96 1.8,53 5.033,77 4,77 2.4,65 4.6,81 3.567,71 4,70 4.5,81 4.083,93 1.8,53 3.967,89 2.2,45 4.15,86 2,58 3.833,78 3.5,66 4.583,76 2.367,63 5,88 1.933,52 4.617,93 1.917,49 2.083,57 4.583,77 3.333,68 4.167,81 4.333,81 4.5,73 2.417,50 4,85 4.167,74 1.883,55 4.583,77 4.25,83 3.767,83 2.033,51 4.433,78 4.083,84 1.833,46 4.417,83 2.183,55 4.8,81 1.833,57 4.8,76 4.1,84 3.966,77 4.233,81 3.5,87 4.366,77 2.25,51 4.667,78 2.1,60 4.35,82 4.133,91 1.867,53 4.6,78 1.783,46 4.367,77 3.85,84 1.933,49 4.5,83 2.383,71 4.7,80 1.867,49 3.833,75 3.417,64 4.233,76 2.4,53 4.8,94 2,55 4.15,76 1.867,50 4.267,82 1.75,54 4.483,75 4,78 4.117,79 4.083,78 4.267,78 3.917,70 4.55,79 4.083,70 2.417,54 4.183,86 2.217,50 4.45,90 1.883,54 1.85,54 4.283,77 3.95,79 2.333,64 4.15,75 2.35,47 4.933,86 2.9,63 4.583,85 3.833,82 2.083,57 4.367,82 2.133,67 4.35,74 2.2,54 4.45,83 3.567,73 4.5,73 4.15,88 3.817,80 3.917,71 4.45,83 2,56 4.283,79 4.767,78 4.533,84 1.85,58 4.25,83 1.983,43 2.25,60 4.75,75 4.117,81 2.15,46 4.417,90 1.817,46 4.467,74 statistics-0.10.2.0/Statistics/0000755000000000000000000000000012016036043014435 5ustar0000000000000000statistics-0.10.2.0/Statistics/Autocorrelation.hs0000644000000000000000000000307512016036043020150 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Autocorrelation -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Functions for computing autocovariance and autocorrelation of a -- sample. module Statistics.Autocorrelation ( autocovariance , autocorrelation ) where import Statistics.Sample (mean) import qualified Data.Vector.Generic as G -- | Compute the autocovariance of a sample, i.e. the covariance of -- the sample against a shifted version of itself. autocovariance :: (G.Vector v Double, G.Vector v Int) => v Double -> v Double autocovariance a = G.map f . 
G.enumFromTo 0 $ l-2
  where
    f k = G.sum (G.zipWith (*) (G.take (l-k) c) (G.slice k (l-k) c))
          / fromIntegral l
    c   = G.map (subtract (mean a)) a
    l   = G.length a

-- | Compute the autocorrelation function of a sample, and the upper
-- and lower bounds of confidence intervals for each element.
--
-- /Note/: The calculation of the 95% confidence interval assumes a
-- stationary Gaussian process.
autocorrelation :: (G.Vector v Double, G.Vector v Int) => v Double -> (v Double, v Double, v Double)
autocorrelation a = (r, ci (-), ci (+))
  where
    r           = G.map (/ G.head c) c
      where c   = autocovariance a
    dllse       = G.map f . G.scanl1 (+) . G.map square $ r
      where f v = 1.96 * sqrt ((v * 2 + 1) / l)
    l           = fromIntegral (G.length a)
    ci f        = G.cons 1 . G.tail . G.map (f (-1/l)) $ dllse
    square x    = x * x
statistics-0.10.2.0/Statistics/Constants.hs0000644000000000000000000000105212016036043016743 0ustar0000000000000000-- |
-- Module    : Statistics.Constants
-- Copyright : (c) 2009, 2011 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Constant values common to much statistics code.
--
-- DEPRECATED: use module 'Numeric.MathFunctions.Constants' from
-- math-functions.

module Statistics.Constants
    {-# DEPRECATED "use module Numeric.MathFunctions.Constants from math-functions" #-}
    (
      module Numeric.MathFunctions.Constants
    ) where

import Numeric.MathFunctions.Constants
statistics-0.10.2.0/Statistics/Distribution.hs0000644000000000000000000001421312016036043017451 0ustar0000000000000000{-# LANGUAGE BangPatterns, ScopedTypeVariables #-}
-- |
-- Module    : Statistics.Distribution
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Type classes for probability distributions

module Statistics.Distribution
    (
    -- * Type classes
      Distribution(..)
    , DiscreteDistr(..)
    , ContDistr(..)
    -- ** Distribution statistics
    , MaybeMean(..)
    , Mean(..)
    , MaybeVariance(..)
    , Variance(..)
    -- ** Random number generation
    , ContGen(..)
    , DiscreteGen(..)
    , genContinous
    -- * Helper functions
    , findRoot
    , sumProbabilities
    ) where

import Control.Applicative ((<$>), Applicative(..))
import Control.Monad.Primitive (PrimMonad,PrimState)
import qualified Data.Vector.Unboxed as U
import System.Random.MWC

-- | Type class common to all distributions.  Only the c.d.f. can be
-- defined for both discrete and continuous distributions.
class Distribution d where
    -- | Cumulative distribution function.  The probability that a
    -- random variable /X/ is less than or equal to /x/,
    -- i.e. P(/X/≤/x/).
    cumulative :: d -> Double -> Double

    -- | One's complement of the cumulative distribution:
    --
    -- > complCumulative d x = 1 - cumulative d x
    --
    -- It's useful when one is interested in P(/X/≥/x/) and the
    -- expression on the right side begins to lose precision.  This
    -- function has a default implementation, but implementors are
    -- encouraged to provide a more precise one.
    complCumulative :: d -> Double -> Double
    complCumulative d x = 1 - cumulative d x

-- | Discrete probability distribution.
class Distribution d => DiscreteDistr d where
    -- | Probability of n-th outcome.
    probability :: d -> Int -> Double

-- | Continuous probability distribution.
class Distribution d => ContDistr d where
    -- | Probability density function.  The probability that a random
    -- variable /X/ lies in the infinitesimal interval
    -- [/x/,/x+/δ/x/) is equal to /density(x)/⋅δ/x/.
    density :: d -> Double -> Double

    -- | Inverse of the cumulative distribution function.  The value
    -- /x/ for which P(/X/≤/x/) = /p/.  If the probability is outside
    -- of the [0,1] range, the function should call 'error'.
    quantile :: d -> Double -> Double

-- | Type class for distributions with a mean.  'maybeMean' should
-- return 'Nothing' if the mean is undefined for the current parameter
-- values.
class Distribution d => MaybeMean d where
    maybeMean :: d -> Maybe Double

-- | Type class for distributions with a mean.  If a distribution has
-- a finite mean for all valid parameter values, it should be an
-- instance of this type class.
class MaybeMean d => Mean d where
    mean :: d -> Double

-- | Type class for distributions with variance.  If the variance is
-- undefined for some parameter values, both 'maybeVariance' and
-- 'maybeStdDev' should return Nothing.
--
-- Minimal complete definition is 'maybeVariance' or 'maybeStdDev'.
class MaybeMean d => MaybeVariance d where
    maybeVariance :: d -> Maybe Double
    maybeVariance d = (*) <$> x <*> x where x = maybeStdDev d
    maybeStdDev   :: d -> Maybe Double
    maybeStdDev = fmap sqrt . maybeVariance

-- | Type class for distributions with variance.  If a distribution
-- has a finite variance for all valid parameter values, it should be
-- an instance of this type class.
--
-- Minimal complete definition is 'variance' or 'stdDev'.
class (Mean d, MaybeVariance d) => Variance d where
    variance :: d -> Double
    variance d = x * x where x = stdDev d
    stdDev   :: d -> Double
    stdDev = sqrt . variance

-- | Generate continuous random variates which have the given
-- distribution.
class Distribution d => ContGen d where
    genContVar :: PrimMonad m => d -> Gen (PrimState m) -> m Double

-- | Generate discrete random variates which have the given
-- distribution.  'ContGen' is a superclass because it's always
-- possible to generate real-valued variates from integer values.
class (DiscreteDistr d, ContGen d) => DiscreteGen d where
    genDiscreteVar :: PrimMonad m => d -> Gen (PrimState m) -> m Int

-- | Generate variates from a continuous distribution using the
-- inverse transform rule.
genContinous :: (ContDistr d, PrimMonad m) => d -> Gen (PrimState m) -> m Double
genContinous d gen = do
  x <- uniform gen
  return $! quantile d x
{-# INLINE genContinous #-}

data P = P {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- | Approximate the value of /X/ for which P(/x/>/X/)=/p/.
--
-- This method uses a combination of Newton-Raphson iteration and
-- bisection with the given guess as a starting point.  The upper and
-- lower bounds specify the interval in which the probability
-- distribution reaches the value /p/.
findRoot :: ContDistr d =>
            d                   -- ^ Distribution
         -> Double              -- ^ Probability /p/
         -> Double              -- ^ Initial guess
         -> Double              -- ^ Lower bound on interval
         -> Double              -- ^ Upper bound on interval
         -> Double
findRoot d prob = loop 0 1
  where
    loop !(i::Int) !dx !x !lo !hi
      | abs dx <= accuracy || i >= maxIters = x
      | otherwise                           = loop (i+1) dx'' x'' lo' hi'
      where
        err                   = cumulative d x - prob
        P lo' hi' | err < 0   = P x hi
                  | otherwise = P lo x
        pdf                   = density d x
        P dx' x' | pdf /= 0   = P (err / pdf) (x - dx)
                 | otherwise  = P dx x
        P dx'' x''
            | x' < lo' || x' > hi' || pdf == 0 = let y = (lo' + hi') / 2 in
                                                 P (y-x) y
            | otherwise                        = P dx' x'
    accuracy = 1e-15
    maxIters = 150

-- | Sum probabilities in an inclusive interval.
sumProbabilities :: DiscreteDistr d => d -> Int -> Int -> Double
sumProbabilities d low hi =
  -- Return value is forced to be less than 1 to guard against roundoff errors.
  -- ATTENTION! This check should be removed for testing or it could mask bugs.
  min 1 . U.sum . U.map (probability d) $ U.enumFromTo low hi
{-# INLINE sumProbabilities #-}
statistics-0.10.2.0/Statistics/Function.hs0000644000000000000000000000476412016036043016567 0ustar0000000000000000{-# LANGUAGE CPP, FlexibleContexts, Rank2Types #-}
-- |
-- Module    : Statistics.Function
-- Copyright : (c) 2009, 2010, 2011 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Useful functions.

module Statistics.Function
    (
    -- * Scanning
      minMax
    -- * Sorting
    , sort
    , sortBy
    , partialSort
    -- * Indexing
    , indexed
    , indices
    -- * Bit twiddling
    , nextHighestPowerOfTwo
    -- * Comparison
    , within
    ) where

#include "MachDeps.h"

import Data.Bits ((.|.), shiftR)
import qualified Data.Vector.Algorithms.Intro as I
import qualified Data.Vector.Generic as G
import Statistics.Function.Comparison (within)

-- | Sort a vector.
sort :: (Ord e, G.Vector v e) => v e -> v e
sort = G.modify I.sort
{-# INLINE sort #-}

-- | Sort a vector using a custom ordering.
sortBy :: (G.Vector v e) => I.Comparison e -> v e -> v e
sortBy f = G.modify $ I.sortBy f
{-# INLINE sortBy #-}

-- | Partially sort a vector, such that the least /k/ elements will be
-- at the front.
partialSort :: (G.Vector v e, Ord e) =>
               Int              -- ^ The number /k/ of least elements.
            -> v e
            -> v e
partialSort k = G.modify (`I.partialSort` k)
{-# INLINE partialSort #-}

-- | Return the indices of a vector.
indices :: (G.Vector v a, G.Vector v Int) => v a -> v Int
indices a = G.enumFromTo 0 (G.length a - 1)
{-# INLINE indices #-}

-- | Zip a vector with its indices.
indexed :: (G.Vector v e, G.Vector v Int, G.Vector v (Int,e)) => v e -> v (Int,e)
indexed a = G.zip (indices a) a
{-# INLINE indexed #-}

data MM = MM {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- | Compute the minimum and maximum of a vector in one pass.
minMax :: (G.Vector v Double) => v Double -> (Double, Double)
minMax = fini . G.foldl' go (MM (1/0) (-1/0))
  where
    go (MM lo hi) k = MM (min lo k) (max hi k)
    fini (MM lo hi) = (lo, hi)
{-# INLINE minMax #-}

-- | Efficiently compute the next highest power of two for a
-- non-negative integer.  If the given value is already a power of
-- two, it is returned unchanged.  If negative, zero is returned.
nextHighestPowerOfTwo :: Int -> Int
nextHighestPowerOfTwo n = o + 1
    where m = n - 1
          o = m
              .|. (m `shiftR` 1)
              .|. (m `shiftR` 2)
              .|. (m `shiftR` 4)
              .|. (m `shiftR` 8)
              .|. (m `shiftR` 16)
#if WORD_SIZE_IN_BITS == 64
              .|. (m `shiftR` 32)
#endif
statistics-0.10.2.0/Statistics/Internal.hs0000644000000000000000000000220112016036043016540 0ustar0000000000000000{-# LANGUAGE CPP, MagicHash, UnboxedTuples #-}
-- |
-- Module    : Statistics.Internal
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Scary internal functions.

module Statistics.Internal
    (
      inlinePerformIO
    ) where

#if __GLASGOW_HASKELL__ >= 611
import GHC.IO (IO(IO))
#else
import GHC.IOBase (IO(IO))
#endif
import GHC.Base (realWorld#)
#if !defined(__GLASGOW_HASKELL__)
import System.IO.Unsafe (unsafePerformIO)
#endif

-- Lifted from Data.ByteString.Internal so we don't introduce an
-- otherwise unnecessary dependency on the bytestring package.
-- | Just like unsafePerformIO, but we inline it.  Big performance
-- gains as it exposes lots of things to further inlining.  /Very
-- unsafe/.  In particular, you should do no memory allocation inside
-- an 'inlinePerformIO' block.  On Hugs this is just @unsafePerformIO@.
{-# INLINE inlinePerformIO #-}
inlinePerformIO :: IO a -> a
#if defined(__GLASGOW_HASKELL__)
inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r
#else
inlinePerformIO = unsafePerformIO
#endif
statistics-0.10.2.0/Statistics/Math.hs0000644000000000000000000000142712016036043015666 0ustar0000000000000000{-# LANGUAGE BangPatterns, FlexibleContexts #-}
-- |
-- Module    : Statistics.Math
-- Copyright : (c) 2009, 2011 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Mathematical functions for statistics.
--
-- DEPRECATED.  Use the math-functions package instead.  This module
-- just re-exports functions from 'Numeric.SpecFunctions',
-- 'Numeric.SpecFunctions.Extra' and 'Numeric.Polynomial.Chebyshev'.

module Statistics.Math
    {-# DEPRECATED "Use package math-functions" #-}
    ( module Numeric.Polynomial.Chebyshev
    , module Numeric.SpecFunctions
    , module Numeric.SpecFunctions.Extra
    ) where

import Numeric.Polynomial.Chebyshev
import Numeric.SpecFunctions
import Numeric.SpecFunctions.Extra
statistics-0.10.2.0/Statistics/Quantile.hs0000644000000000000000000001440312016036043016555 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-}
-- |
-- Module    : Statistics.Quantile
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Functions for approximating quantiles, i.e. points taken at regular
-- intervals from the cumulative distribution function of a random
-- variable.
--
-- The number of quantiles is described below by the variable /q/, so
-- with /q/=4, a 4-quantile (also known as a /quartile/) has 4
-- intervals, and contains 5 points.  The parameter /k/ describes the
-- desired point, where 0 ≤ /k/ ≤ /q/.

module Statistics.Quantile
    (
    -- * Quantile estimation functions
      weightedAvg
    , ContParam(..)
    , continuousBy
    , midspread
    -- * Parameters for the continuous sample method
    , cadpw
    , hazen
    , s
    , spss
    , medianUnbiased
    , normalUnbiased
    -- * References
    -- $references
    ) where

import Control.Exception (assert)
import Data.Vector.Generic ((!))
import Numeric.MathFunctions.Constants (m_epsilon)
import Statistics.Function (partialSort)
import qualified Data.Vector.Generic as G

-- | O(/n/ log /n/). Estimate the /k/th /q/-quantile of a sample,
-- using the weighted average method.
weightedAvg :: G.Vector v Double =>
               Int        -- ^ /k/, the desired quantile.
            -> Int        -- ^ /q/, the number of quantiles.
            -> v Double   -- ^ /x/, the sample data.
            -> Double
weightedAvg k q x
  | n == 1    = G.head x
  | otherwise =
    assert (q >= 2) .
    assert (k >= 0) .
    assert (k < q) .
    assert (G.all (not . isNaN) x) $
    xj + g * (xj1 - xj)
  where
    j   = floor idx
    idx = fromIntegral (n - 1) * fromIntegral k / fromIntegral q
    g   = idx - fromIntegral j
    xj  = sx ! j
    xj1 = sx ! (j+1)
    sx  = partialSort (j+2) x
    n   = G.length x
{-# INLINE weightedAvg #-}

-- | Parameters /a/ and /b/ to the 'continuousBy' function.
data ContParam = ContParam {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- | O(/n/ log /n/). Estimate the /k/th /q/-quantile of a sample /x/,
-- using the continuous sample method with the given parameters.  This
-- is the method used by most statistical software, such as R,
-- Mathematica, SPSS, and S.
continuousBy :: G.Vector v Double => ContParam -- ^ Parameters /a/ and /b/. -> Int -- ^ /k/, the desired quantile. -> Int -- ^ /q/, the number of quantiles. -> v Double -- ^ /x/, the sample data. -> Double continuousBy (ContParam a b) k q x = assert (q >= 2) . assert (k >= 0) . assert (k <= q) . assert (G.all (not . isNaN) x) $ (1-h) * item (j-1) + h * item j where j = floor (t + eps) t = a + p * (fromIntegral n + 1 - a - b) p = fromIntegral k / fromIntegral q h | abs r < eps = 0 | otherwise = r where r = t - fromIntegral j eps = m_epsilon * 4 n = G.length x item = (sx !) . bracket sx = partialSort (bracket j + 1) x bracket m = min (max m 0) (n - 1) {-# INLINE continuousBy #-} -- | O(/n/ log /n/). Estimate the range between /q/-quantiles 1 and -- /q/-1 of a sample /x/, using the continuous sample method with the -- given parameters. -- -- For instance, the interquartile range (IQR) can be estimated as -- follows: -- -- > midspread medianUnbiased 4 (U.fromList [1,1,2,2,3]) -- > ==> 1.333333 midspread :: G.Vector v Double => ContParam -- ^ Parameters /a/ and /b/. -> Int -- ^ /q/, the number of quantiles. -> v Double -- ^ /x/, the sample data. -> Double midspread (ContParam a b) k x = assert (G.all (not . isNaN) x) . assert (k > 0) $ quantile (1-frac) - quantile frac where quantile i = (1-h i) * item (j i-1) + h i * item (j i) j i = floor (t i + eps) :: Int t i = a + i * (fromIntegral n + 1 - a - b) h i | abs r < eps = 0 | otherwise = r where r = t i - fromIntegral (j i) eps = m_epsilon * 4 n = G.length x item = (sx !) . bracket sx = partialSort (bracket (j (1-frac)) + 1) x bracket m = min (max m 0) (n - 1) frac = 1 / fromIntegral k {-# INLINE midspread #-} -- | California Department of Public Works definition, /a/=0, /b/=1. -- Gives a linear interpolation of the empirical CDF. This -- corresponds to method 4 in R and Mathematica. cadpw :: ContParam cadpw = ContParam 0 1 {-# INLINE cadpw #-} -- | Hazen's definition, /a/=0.5, /b/=0.5. This is claimed to be -- popular among hydrologists. This corresponds to method 5 in R and -- Mathematica. hazen :: ContParam hazen = ContParam 0.5 0.5 {-# INLINE hazen #-} -- | Definition used by the SPSS statistics application, with /a/=0, -- /b/=0 (also known as Weibull's definition). This corresponds to -- method 6 in R and Mathematica. spss :: ContParam spss = ContParam 0 0 {-# INLINE spss #-} -- | Definition used by the S statistics application, with /a/=1, -- /b/=1. The interpolation points divide the sample range into @n-1@ -- intervals. This corresponds to method 7 in R and Mathematica. s :: ContParam s = ContParam 1 1 {-# INLINE s #-} -- | Median unbiased definition, /a/=1\/3, /b/=1\/3. The resulting -- quantile estimates are approximately median unbiased regardless of -- the distribution of /x/. This corresponds to method 8 in R and -- Mathematica. medianUnbiased :: ContParam medianUnbiased = ContParam third third where third = 1/3 {-# INLINE medianUnbiased #-} -- | Normal unbiased definition, /a/=3\/8, /b/=3\/8. An approximately -- unbiased estimate if the empirical distribution approximates the -- normal distribution. This corresponds to method 9 in R and -- Mathematica. normalUnbiased :: ContParam normalUnbiased = ContParam ta ta where ta = 3/8 {-# INLINE normalUnbiased #-} -- $references -- -- * Weisstein, E.W. Quantile. /MathWorld/. -- -- -- * Hyndman, R.J.; Fan, Y. (1996) Sample quantiles in statistical -- packages. /American Statistician/ -- 50(4):361–365. 
statistics-0.10.2.0/Statistics/Resampling.hs0000644000000000000000000000652112016036043017076 0ustar0000000000000000{-# LANGUAGE BangPatterns #-}
-- |
-- Module    : Statistics.Resampling
-- Copyright : (c) 2009, 2010 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Resampling statistics.

module Statistics.Resampling
    (
      Resample(..)
    , jackknife
    , resample
    ) where

import Control.Concurrent (forkIO, newChan, readChan, writeChan)
import Control.Monad (forM_, liftM, replicateM_)
import Control.Monad.Primitive (PrimMonad, PrimState)
import Data.Vector.Algorithms.Intro (sort)
import Data.Vector.Generic (unsafeFreeze)
import Data.Word (Word32)
import GHC.Conc (numCapabilities)
import Statistics.Function (indices)
import Statistics.Types (Estimator, Sample)
import System.Random.MWC (Gen, initialize, uniform, uniformVector)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- | A resample drawn randomly, with replacement, from a set of data
-- points.  Distinct from a normal array to make it harder for your
-- humble author's brain to go wrong.
newtype Resample = Resample {
      fromResample :: U.Vector Double
    } deriving (Eq, Show)

-- | /O(e*r*s)/ Resample a data set repeatedly, with replacement,
-- computing each estimate over the resampled data.
--
-- This function is expensive; it has to do work proportional to
-- /e*r*s/, where /e/ is the number of estimation functions, /r/ is
-- the number of resamples to compute, and /s/ is the number of
-- original samples.
--
-- To improve performance, this function will make use of all
-- available CPUs.  At least with GHC 7.0, parallel performance seems
-- best if the parallel garbage collector is disabled (RTS option
-- @-qg@).
resample :: Gen (PrimState IO)
         -> [Estimator]         -- ^ Estimation functions.
         -> Int                 -- ^ Number of resamples to compute.
         -> Sample              -- ^ Original sample.
         -> IO [Resample]
resample gen ests numResamples samples = do
  let !numSamples = U.length samples
      ixs = scanl (+) 0 $
            zipWith (+) (replicate numCapabilities q)
                        (replicate r 1 ++ repeat 0)
          where (q,r) = numResamples `quotRem` numCapabilities
  results <- mapM (const (MU.new numResamples)) ests
  done <- newChan
  forM_ (zip ixs (tail ixs)) $ \ (start,!end) -> do
    gen' <- initialize =<< (uniformVector gen 256 :: IO (U.Vector Word32))
    forkIO $ do
      let loop k ers | k >= end  = writeChan done ()
                     | otherwise = do
            re <- U.replicateM numSamples $ do
                    r <- uniform gen'
                    return (U.unsafeIndex samples (r `mod` numSamples))
            forM_ ers $ \(est,arr) ->
                MU.write arr k . est $ re
            loop (k+1) ers
      loop start (zip ests results)
  replicateM_ numCapabilities $ readChan done
  mapM_ sort results
  mapM (liftM Resample . unsafeFreeze) results

-- | Compute a statistical estimate repeatedly over a sample, each
-- time omitting a successive element.
jackknife :: Estimator -> Sample -> U.Vector Double
jackknife est sample = U.map f . indices $ sample
  where f i = est (dropAt i sample)
{-# INLINE jackknife #-}

-- | Drop the /k/th element of a vector.
dropAt :: U.Unbox e => Int -> U.Vector e -> U.Vector e
dropAt n v = U.slice 0 n v U.++ U.slice (n+1) (U.length v - n - 1) v
statistics-0.10.2.0/Statistics/Sample.hs0000644000000000000000000002743512016036043016225 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-}
-- |
-- Module    : Statistics.Sample
-- Copyright : (c) 2008 Don Stewart, 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Commonly used sample statistics, also known as descriptive
-- statistics.

module Statistics.Sample
    (
    -- * Types
      Sample
    , WeightedSample
    -- * Descriptive functions
    , range
    -- * Statistics of location
    , mean
    , meanWeighted
    , harmonicMean
    , geometricMean
    -- * Statistics of dispersion
    -- $variance
    -- ** Functions over central moments
    , centralMoment
    , centralMoments
    , skewness
    , kurtosis
    -- ** Two-pass functions (numerically robust)
    -- $robust
    , variance
    , varianceUnbiased
    , meanVariance
    , meanVarianceUnb
    , stdDev
    , varianceWeighted
    -- ** Single-pass functions (faster, less safe)
    -- $cancellation
    , fastVariance
    , fastVarianceUnbiased
    , fastStdDev
    -- * References
    -- $references
    ) where

import Statistics.Function (minMax)
import Statistics.Types (Sample,WeightedSample)
import qualified Data.Vector.Generic as G

-- Operator ^ will be overridden
import Prelude hiding ((^))

-- | /O(n)/ Range.  The difference between the largest and smallest
-- elements of a sample.
range :: (G.Vector v Double) => v Double -> Double
range s = hi - lo
    where (lo , hi) = minMax s
{-# INLINE range #-}

-- | /O(n)/ Arithmetic mean.  This uses Welford's algorithm to provide
-- numerical stability, using a single pass over the sample data.
mean :: (G.Vector v Double) => v Double -> Double
mean = fini . G.foldl' go (T 0 0)
  where
    fini (T a _) = a
    go (T m n) x = T m' n'
        where m' = m + (x - m) / fromIntegral n'
              n' = n + 1
{-# INLINE mean #-}

-- | /O(n)/ Arithmetic mean for weighted sample.  It uses a single-pass
-- algorithm analogous to the one used by 'mean'.
meanWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> Double
meanWeighted = fini . G.foldl' go (V 0 0)
    where
      fini (V a _) = a
      go (V m w) (x,xw) = V m' w'
          where m' | w' == 0   = 0
                   | otherwise = m + xw * (x - m) / w'
                w' = w + xw
{-# INLINE meanWeighted #-}

-- | /O(n)/ Harmonic mean.  This algorithm performs a single pass over
-- the sample.
harmonicMean :: (G.Vector v Double) => v Double -> Double
harmonicMean = fini . G.foldl' go (T 0 0)
  where
    fini (T b a) = fromIntegral a / b
    go (T x y) n = T (x + (1/n)) (y+1)
{-# INLINE harmonicMean #-}

-- | /O(n)/ Geometric mean of a sample containing no negative values.
geometricMean :: (G.Vector v Double) => v Double -> Double
geometricMean = fini . G.foldl' go (T 1 0)
  where
    fini (T p n) = p ** (1 / fromIntegral n)
    go (T p n) a = T (p * a) (n + 1)
{-# INLINE geometricMean #-}

-- | Compute the /k/th central moment of a sample.  The central moment
-- is also known as the moment about the mean.
--
-- This function performs two passes over the sample, so is not subject
-- to stream fusion.
--
-- For samples containing many values very close to the mean, this
-- function is subject to inaccuracy due to catastrophic cancellation.
centralMoment :: (G.Vector v Double) => Int -> v Double -> Double
centralMoment a xs
    | a < 0     = error "Statistics.Sample.centralMoment: negative input"
    | a == 0    = 1
    | a == 1    = 0
    | otherwise = G.sum (G.map go xs) / fromIntegral (G.length xs)
  where
    go x = (x-m) ^ a
    m    = mean xs
{-# INLINE centralMoment #-}

-- | Compute the /k/th and /j/th central moments of a sample.
--
-- This function performs two passes over the sample, so is not subject
-- to stream fusion.
--
-- For samples containing many values very close to the mean, this
-- function is subject to inaccuracy due to catastrophic cancellation.
centralMoments :: (G.Vector v Double) => Int -> Int -> v Double -> (Double, Double)
centralMoments a b xs
    | a < 2 || b < 2 = (centralMoment a xs , centralMoment b xs)
    | otherwise      = fini . G.foldl' go (V 0 0) $ xs
  where
    go (V i j) x = V (i + d^a) (j + d^b)
        where d  = x - m
    fini (V i j) = (i / n , j / n)
    m            = mean xs
    n            = fromIntegral (G.length xs)
{-# INLINE centralMoments #-}

-- | Compute the skewness of a sample.  This is a measure of the
-- asymmetry of its distribution.
--
-- A sample with negative skew is said to be /left-skewed/.  Most of
-- its mass is on the right of the distribution, with the tail on the
-- left.
--
-- > skewness $ U.fromList [1,100,101,102,103]
-- > ==> -1.497681449918257
--
-- A sample with positive skew is said to be /right-skewed/.
--
-- > skewness $ U.fromList [1,2,3,4,100]
-- > ==> 1.4975367033335198
--
-- A sample's skewness is not defined if its 'variance' is zero.
--
-- This function performs two passes over the sample, so is not subject
-- to stream fusion.
--
-- For samples containing many values very close to the mean, this
-- function is subject to inaccuracy due to catastrophic cancellation.
skewness :: (G.Vector v Double) => v Double -> Double
skewness xs = c3 * c2 ** (-1.5)
    where (c3 , c2) = centralMoments 3 2 xs
{-# INLINE skewness #-}

-- | Compute the excess kurtosis of a sample.  This is a measure of
-- the \"peakedness\" of its distribution.  A high kurtosis indicates
-- that more of the sample's variance is due to infrequent severe
-- deviations, rather than more frequent modest deviations.
--
-- A sample's excess kurtosis is not defined if its 'variance' is
-- zero.
--
-- This function performs two passes over the sample, so is not subject
-- to stream fusion.
--
-- For samples containing many values very close to the mean, this
-- function is subject to inaccuracy due to catastrophic cancellation.
kurtosis :: (G.Vector v Double) => v Double -> Double
kurtosis xs = c4 / (c2 * c2) - 3
    where (c4 , c2) = centralMoments 4 2 xs
{-# INLINE kurtosis #-}

-- $variance
--
-- The variance—and hence the standard deviation—of a
-- sample of fewer than two elements are both defined to be zero.

-- $robust
--
-- These functions use the compensated summation algorithm of Chan et
-- al. for numerical robustness, but require two passes over the
-- sample data as a result.
--
-- Because of the need for two passes, these functions are /not/
-- subject to stream fusion.

data V = V {-# UNPACK #-} !Double {-# UNPACK #-} !Double

robustSumVar :: (G.Vector v Double) => Double -> v Double -> Double
robustSumVar m samp = G.sum . G.map (square . subtract m) $ samp
  where square x = x * x
{-# INLINE robustSumVar #-}

-- | Maximum likelihood estimate of a sample's variance.  Also known
-- as the population variance, where the denominator is /n/.
variance :: (G.Vector v Double) => v Double -> Double variance samp | n > 1 = robustSumVar (mean samp) samp / fromIntegral n | otherwise = 0 where n = G.length samp {-# INLINE variance #-} -- | Unbiased estimate of a sample's variance. Also known as the -- sample variance, where the denominator is /n/-1. varianceUnbiased :: (G.Vector v Double) => v Double -> Double varianceUnbiased samp | n > 1 = robustSumVar (mean samp) samp / fromIntegral (n-1) | otherwise = 0 where n = G.length samp {-# INLINE varianceUnbiased #-} -- | Calculate mean and maximum likelihood estimate of variance. This -- function should be used if both mean and variance are required -- since it will calculate mean only once. meanVariance :: (G.Vector v Double) => v Double -> (Double,Double) meanVariance samp | n > 1 = (m, robustSumVar m samp / fromIntegral n) | otherwise = (m, 0) where n = G.length samp m = mean samp {-# INLINE meanVariance #-} -- | Calculate mean and unbiased estimate of variance. This -- function should be used if both mean and variance are required -- since it will calculate mean only once. meanVarianceUnb :: (G.Vector v Double) => v Double -> (Double,Double) meanVarianceUnb samp | n > 1 = (m, robustSumVar m samp / fromIntegral (n-1)) | otherwise = (m, 0) where n = G.length samp m = mean samp {-# INLINE meanVarianceUnb #-} -- | Standard deviation. This is simply the square root of the -- unbiased estimate of the variance. stdDev :: (G.Vector v Double) => v Double -> Double stdDev = sqrt . varianceUnbiased {-# INLINE stdDev #-} robustSumVarWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> V robustSumVarWeighted samp = G.foldl' go (V 0 0) samp where go (V s w) (x,xw) = V (s + xw*d*d) (w + xw) where d = x - m m = meanWeighted samp {-# INLINE robustSumVarWeighted #-} -- | Weighted variance. This is biased estimation. varianceWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> Double varianceWeighted samp | G.length samp > 1 = fini $ robustSumVarWeighted samp | otherwise = 0 where fini (V s w) = s / w {-# INLINE varianceWeighted #-} -- $cancellation -- -- The functions prefixed with the name @fast@ below perform a single -- pass over the sample data using Knuth's algorithm. They usually -- work well, but see below for caveats. These functions are subject -- to array fusion. -- -- /Note/: in cases where most sample data is close to the sample's -- mean, Knuth's algorithm gives inaccurate results due to -- catastrophic cancellation. fastVar :: (G.Vector v Double) => v Double -> T1 fastVar = G.foldl' go (T1 0 0 0) where go (T1 n m s) x = T1 n' m' s' where n' = n + 1 m' = m + d / fromIntegral n' s' = s + d * (x - m') d = x - m -- | Maximum likelihood estimate of a sample's variance. fastVariance :: (G.Vector v Double) => v Double -> Double fastVariance = fini . fastVar where fini (T1 n _m s) | n > 1 = s / fromIntegral n | otherwise = 0 {-# INLINE fastVariance #-} -- | Unbiased estimate of a sample's variance. fastVarianceUnbiased :: (G.Vector v Double) => v Double -> Double fastVarianceUnbiased = fini . fastVar where fini (T1 n _m s) | n > 1 = s / fromIntegral (n - 1) | otherwise = 0 {-# INLINE fastVarianceUnbiased #-} -- | Standard deviation. This is simply the square root of the -- maximum likelihood estimate of the variance. fastStdDev :: (G.Vector v Double) => v Double -> Double fastStdDev = sqrt . fastVariance {-# INLINE fastStdDev #-} ------------------------------------------------------------------------ -- Helper code. Monomorphic unpacked accumulators. 
-- (^) operator from Prelude is just slow.
(^) :: Double -> Int -> Double
x ^ 1 = x
x ^ n = x * (x ^ (n-1))
{-# INLINE (^) #-}

-- don't support polymorphism, as we can't get unboxed returns if we use it.
data T = T {-# UNPACK #-}!Double {-# UNPACK #-}!Int

data T1 = T1 {-# UNPACK #-}!Int {-# UNPACK #-}!Double {-# UNPACK #-}!Double

{-
Consider this core:

with data T a = T !a !Int

$wfold :: Double#
       -> Int#
       -> Int#
       -> (# Double, Int# #)

and without,

$wfold :: Double#
       -> Int#
       -> Int#
       -> (# Double#, Int# #)

yielding to boxed returns and heap checks.
-}

-- $references
--
-- * Chan, T. F.; Golub, G.H.; LeVeque, R.J. (1979) Updating formulae
--   and a pairwise algorithm for computing sample variances.
--   Technical Report STAN-CS-79-773, Department of Computer Science,
--   Stanford University.
--
-- * Knuth, D.E. (1998) The art of computer programming, volume 2:
--   seminumerical algorithms, 3rd ed., p. 232.
--
-- * Welford, B.P. (1962) Note on a method for calculating corrected
--   sums of squares and products. /Technometrics/ 4(3):419–420.
--
-- * West, D.H.D. (1979) Updating mean and variance estimates: an
--   improved method. /Communications of the ACM/ 22(9):532–535.
statistics-0.10.2.0/Statistics/Transform.hs0000644000000000000000000001036512016036043016751 0ustar0000000000000000{-# LANGUAGE BangPatterns, FlexibleContexts #-}
-- |
-- Module    : Statistics.Transform
-- Copyright : (c) 2011 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- Fourier-related transformations of mathematical functions.
--
-- These functions are written for simplicity and correctness, not
-- speed.  If you need a fast FFT implementation for your application,
-- you should strongly consider using a library of FFTW bindings
-- instead.

module Statistics.Transform
    (
    -- * Type synonyms
      CD
    -- * Discrete cosine transform
    , dct
    , dct_
    , idct
    , idct_
    -- * Fast Fourier transform
    , fft
    , ifft
    ) where

import Control.Monad (when)
import Control.Monad.ST (ST)
import Data.Bits (shiftL, shiftR)
import Data.Complex (Complex(..), conjugate, realPart)
import Numeric.SpecFunctions (log2)
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Generic.Mutable as M
import qualified Data.Vector.Unboxed as U

type CD = Complex Double

-- | Discrete cosine transform (DCT-II).
dct :: U.Vector Double -> U.Vector Double
dct = dctWorker . G.map (:+0)

-- | Discrete cosine transform (DCT-II).  Only the real part of the
-- vector is transformed; the imaginary part is ignored.
dct_ :: U.Vector CD -> U.Vector Double
dct_ = dctWorker . G.map (\(i :+ _) -> i :+ 0)

dctWorker :: U.Vector CD -> U.Vector Double
dctWorker xs
          = G.map realPart $ G.zipWith (*) weights (fft interleaved)
  where
    interleaved = G.backpermute xs $ G.enumFromThenTo 0 2 (len-2) G.++
                                     G.enumFromThenTo (len-1) (len-3) 1
    weights = G.cons 2 . G.generate (len-1) $ \x ->
              2 * exp ((0:+(-1))*fi (x+1)*pi/(2*n))
      where n = fi len
    len = G.length xs

-- | Inverse discrete cosine transform (DCT-III).  It's the inverse of
-- 'dct' only up to a scale factor:
--
-- > (idct . dct) x = map (* length x) x
idct :: U.Vector Double -> U.Vector Double
idct = idctWorker . G.map (:+0)

-- | Inverse discrete cosine transform (DCT-III).  Only the real part
-- of the vector is transformed; the imaginary part is ignored.
idct_ :: U.Vector CD -> U.Vector Double
idct_ = idctWorker .
G.map (\(i :+ _) -> i :+ 0) idctWorker :: U.Vector CD -> U.Vector Double idctWorker xs = G.generate len interleave where interleave z | even z = vals `G.unsafeIndex` halve z | otherwise = vals `G.unsafeIndex` (len - halve z - 1) vals = G.map realPart . ifft $ G.zipWith (*) weights xs weights = G.cons n $ G.generate (len - 1) $ \x -> 2 * n * exp ((0:+1) * fi (x+1) * pi/(2*n)) where n = fi len len = G.length xs -- | Inverse fast Fourier transform. ifft :: U.Vector CD -> U.Vector CD ifft xs = G.map ((/fi (G.length xs)) . conjugate) . fft . G.map conjugate $ xs -- | Radix-2 decimation-in-time fast Fourier transform. fft :: U.Vector CD -> U.Vector CD fft v = G.create $ do mv <- G.thaw v mfft mv return mv mfft :: (M.MVector v CD) => v s CD -> ST s () mfft vec | 1 `shiftL` m /= len = error "Statistics.Transform.fft: bad vector size" | otherwise = bitReverse 0 0 where bitReverse i j | i == len-1 = stage 0 1 | otherwise = do when (i < j) $ M.swap vec i j let inner k l | k <= l = inner (k `shiftR` 1) (l-k) | otherwise = bitReverse (i+1) (l+k) inner (len `shiftR` 1) j stage l !l1 | l == m = return () | otherwise = do let !l2 = l1 `shiftL` 1 !e = -6.283185307179586/fromIntegral l2 flight j !a | j == l1 = stage (l+1) l2 | otherwise = do let butterfly i | i >= len = flight (j+1) (a+e) | otherwise = do let i1 = i + l1 xi1 :+ yi1 <- M.read vec i1 let !c = cos a !s = sin a d = (c*xi1 - s*yi1) :+ (s*xi1 + c*yi1) ci <- M.read vec i M.write vec i1 (ci - d) M.write vec i (ci + d) butterfly (i+l2) butterfly j flight 0 0 len = M.length vec m = log2 len fi :: Int -> CD fi = fromIntegral halve :: Int -> Int halve = (`shiftR` 1) statistics-0.10.2.0/Statistics/Types.hs0000644000000000000000000000141212016036043016073 0ustar0000000000000000-- | -- Module : Statistics.Types -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Types for working with statistics. module Statistics.Types ( Estimator , Sample , WeightedSample , Weights ) where import qualified Data.Vector.Unboxed as U (Vector) -- | Sample data. type Sample = U.Vector Double -- | Sample with weights. First element of sample is data, second is weight type WeightedSample = U.Vector (Double,Double) -- | A function that estimates a property of a sample, such as its -- 'mean'. type Estimator = Sample -> Double -- | Weights for affecting the importance of elements of a sample. 
type Weights = U.Vector Double
statistics-0.10.2.0/Statistics/Distribution/0000755000000000000000000000000012016036043017114 5ustar0000000000000000statistics-0.10.2.0/Statistics/Distribution/Beta.hs0000644000000000000000000000574112016036043020332 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-}
-----------------------------------------------------------------------------
-- |
-- Module      :  Statistics.Distribution.Beta
-- Copyright   :  (C) 2012 Edward Kmett,
-- License     :  BSD-style (see the file LICENSE)
--
-- Maintainer  :  Edward Kmett
-- Stability   :  provisional
-- Portability :  DeriveDataTypeable
--
----------------------------------------------------------------------------
module Statistics.Distribution.Beta
  ( BetaDistribution
    -- * Constructor
  , betaDistr
  , improperBetaDistr
    -- * Accessors
  , bdAlpha
  , bdBeta
  ) where

import Numeric.SpecFunctions (incompleteBeta, invIncompleteBeta, logBeta)
import Numeric.MathFunctions.Constants (m_NaN)
import qualified Statistics.Distribution as D
import Data.Typeable

-- | The beta distribution
data BetaDistribution = BD
 { bdAlpha :: {-# UNPACK #-} !Double
   -- ^ Alpha shape parameter
 , bdBeta  :: {-# UNPACK #-} !Double
   -- ^ Beta shape parameter
 } deriving (Eq,Read,Show,Typeable)

-- | Create a beta distribution.  Both shape parameters must be
-- positive.
betaDistr :: Double             -- ^ Shape parameter alpha
          -> Double             -- ^ Shape parameter beta
          -> BetaDistribution
betaDistr a b
  | a > 0 && b > 0 = improperBetaDistr a b
  | otherwise      =
      error $  "Statistics.Distribution.Beta.betaDistr: "
            ++ "shape parameters must be positive. Got a = "
            ++ show a
            ++ " b = "
            ++ show b
{-# INLINE betaDistr #-}

-- | Create a beta distribution.  This constructor doesn't check its
-- parameters.
improperBetaDistr :: Double     -- ^ Shape parameter alpha
                  -> Double     -- ^ Shape parameter beta
                  -> BetaDistribution
improperBetaDistr = BD
{-# INLINE improperBetaDistr #-}

instance D.Distribution BetaDistribution where
  cumulative (BD a b) x
    | x <= 0    = 0
    | x >= 1    = 1
    | otherwise = incompleteBeta a b x
  {-# INLINE cumulative #-}

instance D.Mean BetaDistribution where
  mean (BD a b) = a / (a + b)
  {-# INLINE mean #-}

instance D.MaybeMean BetaDistribution where
  maybeMean = Just . D.mean
  {-# INLINE maybeMean #-}

instance D.Variance BetaDistribution where
  variance (BD a b) = a*b / (apb*apb*(apb+1))
    where apb = a + b
  {-# INLINE variance #-}

instance D.MaybeVariance BetaDistribution where
  maybeVariance = Just . D.variance
  {-# INLINE maybeVariance #-}

instance D.ContDistr BetaDistribution where
  density (BD a b) x
    | a <= 0 || b <= 0 = m_NaN
    | x <= 0    = 0
    | x >= 1    = 0
    | otherwise = exp $ (a-1)*log x + (b-1)*log (1-x) - logBeta a b
  {-# INLINE density #-}
  quantile (BD a b) p
    | p == 0         = 0
    | p == 1         = 1
    | p > 0 && p < 1 = invIncompleteBeta a b p
    | otherwise      =
        error $ "Statistics.Distribution.Beta.quantile: p must be in [0,1] range. Got: " ++ show p
  {-# INLINE quantile #-}

instance D.ContGen BetaDistribution where
  genContVar = D.genContinous
statistics-0.10.2.0/Statistics/Distribution/Binomial.hs0000644000000000000000000000550512016036043021207 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-}
-- |
-- Module    : Statistics.Distribution.Binomial
-- Copyright : (c) 2009 Bryan O'Sullivan
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The binomial distribution.  This is the discrete probability
-- distribution of the number of successes in a sequence of /n/
-- independent yes\/no experiments, each of which yields success with
-- probability /p/.
module Statistics.Distribution.Binomial
    (
      BinomialDistribution
    -- * Constructors
    , binomial
    -- * Accessors
    , bdTrials
    , bdProbability
    ) where

import Data.Typeable (Typeable)
import qualified Statistics.Distribution as D
import Numeric.SpecFunctions (choose)

-- | The binomial distribution.
data BinomialDistribution = BD {
      bdTrials      :: {-# UNPACK #-} !Int
    -- ^ Number of trials.
    , bdProbability :: {-# UNPACK #-} !Double
    -- ^ Probability.
    } deriving (Eq, Read, Show, Typeable)

instance D.Distribution BinomialDistribution where
    cumulative = cumulative

instance D.DiscreteDistr BinomialDistribution where
    probability = probability

instance D.Mean BinomialDistribution where
    mean = mean

instance D.Variance BinomialDistribution where
    variance = variance

instance D.MaybeMean BinomialDistribution where
    maybeMean = Just . D.mean

instance D.MaybeVariance BinomialDistribution where
    maybeStdDev   = Just . D.stdDev
    maybeVariance = Just . D.variance

-- This could be slow for big n
probability :: BinomialDistribution -> Int -> Double
probability (BD n p) k
  | k < 0 || k > n = 0
  | n == 0         = 1
  | otherwise      = choose n k * p^k * (1-p)^(n-k)
{-# INLINE probability #-}

-- Summation from different sides required to reduce roundoff errors
cumulative :: BinomialDistribution -> Double -> Double
cumulative d@(BD n _) x
  | k < 0     = 0
  | k >= n    = 1
  | k <  m    = D.sumProbabilities d 0 k
  | otherwise = 1 - D.sumProbabilities d (k+1) n
  where
    m = floor (mean d)
    k = floor x
{-# INLINE cumulative #-}

mean :: BinomialDistribution -> Double
mean (BD n p) = fromIntegral n * p
{-# INLINE mean #-}

variance :: BinomialDistribution -> Double
variance (BD n p) = fromIntegral n * p * (1 - p)
{-# INLINE variance #-}

-- | Construct a binomial distribution.  The number of trials must be
-- non-negative, and the probability must be in the [0,1] range.
binomial :: Int                 -- ^ Number of trials.
         -> Double              -- ^ Probability.
         -> BinomialDistribution
binomial n p
  | n < 0          =
    error $ msg ++ "number of trials must be non-negative. Got " ++ show n
  | p < 0 || p > 1 =
    error $ msg ++ "probability must be in [0,1] range. Got " ++ show p
  | otherwise      = BD n p
  where msg = "Statistics.Distribution.Binomial.binomial: "
{-# INLINE binomial #-}
statistics-0.10.2.0/Statistics/Distribution/CauchyLorentz.hs0000644000000000000000000000420612016036043022244 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-}
-- |
-- Module    : Statistics.Distribution.CauchyLorentz
-- Copyright : (c) 2011 Aleksey Khudyakov
-- License   : BSD3
--
-- Maintainer  : bos@serpentine.com
-- Stability   : experimental
-- Portability : portable
--
-- The Cauchy-Lorentz distribution.  It's also known as the Lorentz
-- distribution or the Breit–Wigner distribution.
--
-- It has neither mean nor variance.
module Statistics.Distribution.CauchyLorentz (
    CauchyDistribution
  , cauchyDistribMedian
  , cauchyDistribScale
    -- * Constructors
  , cauchyDistribution
  , standardCauchy
  ) where

import Data.Typeable (Typeable)
import qualified Statistics.Distribution as D

-- | Cauchy-Lorentz distribution.
data CauchyDistribution = CD {
    -- | Central value of the Cauchy-Lorentz distribution, which is
    --   its mode and median.  The distribution has no mean, so the
    --   accessor is named after the median.
    cauchyDistribMedian :: {-# UNPACK #-} !Double
    -- | Scale parameter of the Cauchy-Lorentz distribution.  It's
    --   different from the variance, and specifies the half width at
    --   half maximum (HWHM).
, cauchyDistribScale :: {-# UNPACK #-} !Double } deriving (Eq,Show,Read,Typeable) -- | Cauchy distribution cauchyDistribution :: Double -- ^ Central point -> Double -- ^ Scale parameter (HWHM) -> CauchyDistribution cauchyDistribution m s | s > 0 = CD m s | otherwise = error $ "Statistics.Distribution.CauchyLorentz.cauchyDistribution: scale parameter must be positive. Got " ++ show s standardCauchy :: CauchyDistribution standardCauchy = CD 0 1 instance D.Distribution CauchyDistribution where cumulative (CD m s) x = 0.5 + atan( (x - m) / s ) / pi instance D.ContDistr CauchyDistribution where density (CD m s) x = (1 / pi) / (s * (1 + y*y)) where y = (x - m) / s quantile (CD m s) p | p > 0 && p < 1 = m + s * tan( pi * (p - 0.5) ) | p == 0 = -1 / 0 | p == 1 = 1 / 0 | otherwise = error $ "Statistics.Distribution.CauchyLorentz.quantile: p must be in [0,1] range. Got: "++show p instance D.ContGen CauchyDistribution where genContVar = D.genContinous statistics-0.10.2.0/Statistics/Distribution/ChiSquared.hs0000644000000000000000000000541112016036043021501 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.ChiSquared -- Copyright : (c) 2010 Alexey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The chi-squared distribution. This is a continuous probability -- distribution of the sum of squares of /k/ independent standard normal -- distributions. It's commonly used in statistical tests. module Statistics.Distribution.ChiSquared ( ChiSquared -- Constructors , chiSquared , chiSquaredNDF ) where import Data.Typeable (Typeable) import Numeric.SpecFunctions (incompleteGamma,invIncompleteGamma,logGamma) import qualified Statistics.Distribution as D import qualified System.Random.MWC.Distributions as MWC -- | Chi-squared distribution newtype ChiSquared = ChiSquared Int deriving (Show,Typeable) -- | Get number of degrees of freedom chiSquaredNDF :: ChiSquared -> Int chiSquaredNDF (ChiSquared ndf) = ndf {-# INLINE chiSquaredNDF #-} -- | Construct chi-squared distribution. Number of degrees of freedom -- must be positive. chiSquared :: Int -> ChiSquared chiSquared n | n <= 0 = error $ "Statistics.Distribution.ChiSquared.chiSquared: N.D.F. must be positive. Got " ++ show n | otherwise = ChiSquared n {-# INLINE chiSquared #-} instance D.Distribution ChiSquared where cumulative = cumulative instance D.ContDistr ChiSquared where density = density quantile = quantile instance D.Mean ChiSquared where mean (ChiSquared ndf) = fromIntegral ndf {-# INLINE mean #-} instance D.Variance ChiSquared where variance (ChiSquared ndf) = fromIntegral (2*ndf) {-# INLINE variance #-} instance D.MaybeMean ChiSquared where maybeMean = Just . D.mean instance D.MaybeVariance ChiSquared where maybeStdDev = Just . D.stdDev maybeVariance = Just . 
D.variance instance D.ContGen ChiSquared where genContVar (ChiSquared n) = MWC.chiSquare n cumulative :: ChiSquared -> Double -> Double cumulative chi x | x <= 0 = 0 | otherwise = incompleteGamma (ndf/2) (x/2) where ndf = fromIntegral $ chiSquaredNDF chi {-# INLINE cumulative #-} density :: ChiSquared -> Double -> Double density chi x | x <= 0 = 0 | otherwise = exp $ log x * (ndf2 - 1) - x2 - logGamma ndf2 - log 2 * ndf2 where ndf = fromIntegral $ chiSquaredNDF chi ndf2 = ndf/2 x2 = x/2 {-# INLINE density #-} quantile :: ChiSquared -> Double -> Double quantile (ChiSquared ndf) p | p == 0 = 0 | p == 1 = 1/0 | p > 0 && p < 1 = 2 * invIncompleteGamma (fromIntegral ndf / 2) p | otherwise = error $ "Statistics.Distribution.ChiSquared.quantile: p must be in [0,1] range. Got: "++show p {-# INLINE quantile #-} statistics-0.10.2.0/Statistics/Distribution/Exponential.hs0000644000000000000000000000607112016036043021742 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.Exponential -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The exponential distribution. This is the continuous probability -- distribution of the times between events in a Poisson process, in -- which events occur continuously and independently at a constant -- average rate. module Statistics.Distribution.Exponential ( ExponentialDistribution -- * Constructors , exponential , exponentialFromSample -- * Accessors , edLambda ) where import Data.Typeable (Typeable) import qualified Statistics.Distribution as D import qualified Statistics.Sample as S import qualified System.Random.MWC.Distributions as MWC import Statistics.Types (Sample) newtype ExponentialDistribution = ED { edLambda :: Double } deriving (Eq, Read, Show, Typeable) instance D.Distribution ExponentialDistribution where cumulative = cumulative complCumulative = complCumulative instance D.ContDistr ExponentialDistribution where density = density quantile = quantile instance D.Mean ExponentialDistribution where mean (ED l) = 1 / l {-# INLINE mean #-} instance D.Variance ExponentialDistribution where variance (ED l) = 1 / (l * l) {-# INLINE variance #-} instance D.MaybeMean ExponentialDistribution where maybeMean = Just . D.mean instance D.MaybeVariance ExponentialDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.ContGen ExponentialDistribution where genContVar = MWC.exponential . edLambda cumulative :: ExponentialDistribution -> Double -> Double cumulative (ED l) x | x <= 0 = 0 | otherwise = 1 - exp (-l * x) {-# INLINE cumulative #-} complCumulative :: ExponentialDistribution -> Double -> Double complCumulative (ED l) x | x <= 0 = 1 | otherwise = exp (-l * x) {-# INLINE complCumulative #-} density :: ExponentialDistribution -> Double -> Double density (ED l) x | x < 0 = 0 | otherwise = l * exp (-l * x) {-# INLINE density #-} quantile :: ExponentialDistribution -> Double -> Double quantile (ED l) p | p == 1 = 1 / 0 | p >= 0 && p < 1 = -log (1 - p) / l | otherwise = error $ "Statistics.Distribution.Exponential.quantile: p must be in [0,1] range. Got: "++show p {-# INLINE quantile #-} -- | Create an exponential distribution. exponential :: Double -- ^ λ (rate) parameter. -> ExponentialDistribution exponential l | l <= 0 = error $ "Statistics.Distribution.Exponential.exponential: rate parameter must be positive. 
Got " ++ show l | otherwise = ED l {-# INLINE exponential #-} -- | Create exponential distribution from sample. No tests are made to -- check whether it truly is exponential. The rate is estimated as the -- reciprocal of the sample mean. exponentialFromSample :: Sample -> ExponentialDistribution exponentialFromSample = ED . recip . S.mean {-# INLINE exponentialFromSample #-} statistics-0.10.2.0/Statistics/Distribution/FDistribution.hs0000644000000000000000000000465012016036043022242 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.FDistribution -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Fisher F distribution module Statistics.Distribution.FDistribution ( FDistribution , fDistribution , fDistributionNDF1 , fDistributionNDF2 ) where import qualified Statistics.Distribution as D import Data.Typeable (Typeable) import Numeric.SpecFunctions (logBeta, incompleteBeta, invIncompleteBeta) -- | F distribution data FDistribution = F { fDistributionNDF1 :: {-# UNPACK #-} !Double , fDistributionNDF2 :: {-# UNPACK #-} !Double , _pdfFactor :: {-# UNPACK #-} !Double } deriving (Eq,Show,Read,Typeable) fDistribution :: Int -> Int -> FDistribution fDistribution n m | n > 0 && m > 0 = let n' = fromIntegral n m' = fromIntegral m f' = 0.5 * (log m' * m' + log n' * n') - logBeta (0.5*n') (0.5*m') in F n' m' f' | otherwise = error "Statistics.Distribution.FDistribution.fDistribution: non-positive number of degrees of freedom" instance D.Distribution FDistribution where cumulative = cumulative instance D.ContDistr FDistribution where density = density quantile = quantile cumulative :: FDistribution -> Double -> Double cumulative (F n m _) x | x > 0 = let y = n*x in incompleteBeta (0.5 * n) (0.5 * m) (y / (m + y)) | otherwise = 0 density :: FDistribution -> Double -> Double density (F n m fac) x | x > 0 = exp $ fac + log x * (0.5 * n - 1) - log(m + n*x) * 0.5 * (n + m) | otherwise = 0 quantile :: FDistribution -> Double -> Double quantile (F n m _) p | p >= 0 && p <= 1 = let x = invIncompleteBeta (0.5 * n) (0.5 * m) p in m * x / (n * (1 - x)) | otherwise = error $ "Statistics.Distribution.FDistribution.quantile: p must be in [0,1] range. Got: "++show p instance D.MaybeMean FDistribution where maybeMean (F _ m _) | m > 2 = Just $ m / (m - 2) | otherwise = Nothing instance D.MaybeVariance FDistribution where maybeVariance (F n m _) | m > 4 = Just $ 2 * sqr m * (m + n - 2) / (n * sqr (m - 2) * (m - 4)) | otherwise = Nothing instance D.ContGen FDistribution where genContVar = D.genContinous sqr :: Double -> Double sqr x = x * x {-# INLINE sqr #-} statistics-0.10.2.0/Statistics/Distribution/Gamma.hs0000644000000000000000000000715212016036043020477 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.Gamma -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The gamma distribution. This is a continuous probability -- distribution with two parameters, /k/ and ϑ. If /k/ is -- integral, the distribution represents the sum of /k/ independent -- exponentially distributed random variables, each of which has a -- mean of ϑ.
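--
-- As a usage sketch (assuming this module and "Statistics.Distribution"
-- are in scope), a gamma distribution with shape /k/=2 and scale ϑ=3
-- has mean /k/ϑ = 6 and variance /k/ϑ² = 18:
--
-- > let d = gammaDistr 2 3
-- > mean d       -- 6.0
-- > variance d   -- 18.0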
module Statistics.Distribution.Gamma ( GammaDistribution -- * Constructors , gammaDistr , improperGammaDistr -- * Accessors , gdShape , gdScale ) where import Data.Typeable (Typeable) import Numeric.MathFunctions.Constants (m_pos_inf, m_NaN) import Numeric.SpecFunctions (incompleteGamma, invIncompleteGamma) import Statistics.Distribution.Poisson.Internal as Poisson import qualified Statistics.Distribution as D import qualified System.Random.MWC.Distributions as MWC -- | The gamma distribution. data GammaDistribution = GD { gdShape :: {-# UNPACK #-} !Double -- ^ Shape parameter, /k/. , gdScale :: {-# UNPACK #-} !Double -- ^ Scale parameter, ϑ. } deriving (Eq, Read, Show, Typeable) -- | Create gamma distribution. Both shape and scale parameters must -- be positive. gammaDistr :: Double -- ^ Shape parameter, /k/. -> Double -- ^ Scale parameter, ϑ. -> GammaDistribution gammaDistr k theta | k <= 0 = error $ msg ++ "shape must be positive. Got " ++ show k | theta <= 0 = error $ msg ++ "scale must be positive. Got " ++ show theta | otherwise = improperGammaDistr k theta where msg = "Statistics.Distribution.Gamma.gammaDistr: " {-# INLINE gammaDistr #-} -- | Create gamma distribution. This constructor does not check whether -- parameters are valid improperGammaDistr :: Double -- ^ Shape parameter, /k/. -> Double -- ^ Scale parameter, ϑ. -> GammaDistribution improperGammaDistr = GD {-# INLINE improperGammaDistr #-} instance D.Distribution GammaDistribution where cumulative = cumulative instance D.ContDistr GammaDistribution where density = density quantile = quantile instance D.Variance GammaDistribution where variance (GD a l) = a * l * l {-# INLINE variance #-} instance D.Mean GammaDistribution where mean (GD a l) = a * l {-# INLINE mean #-} instance D.MaybeMean GammaDistribution where maybeMean = Just . D.mean instance D.MaybeVariance GammaDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.ContGen GammaDistribution where genContVar (GD a l) = MWC.gamma a l density :: GammaDistribution -> Double -> Double density (GD a l) x | a < 0 || l <= 0 = m_NaN | x <= 0 = 0 | a == 0 = if x == 0 then m_pos_inf else 0 | x == 0 = if a < 1 then m_pos_inf else if a > 1 then 0 else 1/l | a < 1 = Poisson.probability (x/l) a * a / x | otherwise = Poisson.probability (x/l) (a-1) / l {-# INLINE density #-} cumulative :: GammaDistribution -> Double -> Double cumulative (GD k l) x | x <= 0 = 0 | otherwise = incompleteGamma k (x/l) {-# INLINE cumulative #-} quantile :: GammaDistribution -> Double -> Double quantile (GD k l) p | p == 0 = 0 | p == 1 = 1/0 | p > 0 && p < 1 = l * invIncompleteGamma k p | otherwise = error $ "Statistics.Distribution.Gamma.quantile: p must be in [0,1] range. Got: "++show p {-# INLINE quantile #-} statistics-0.10.2.0/Statistics/Distribution/Geometric.hs0000644000000000000000000000432312016036043021370 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.Geometric -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Geometric distribution. This is the probability distribution of -- the number of Bernoulli trials needed to get one success, supported -- on the set [1,2..]. -- -- This distribution is sometimes referred to as the /shifted/ -- geometric distribution, to distinguish it from a variant measuring -- the number of failures before the first success, defined over the -- set [0,1..].
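--
-- As a usage sketch (assuming this module and "Statistics.Distribution"
-- are in scope): with success rate 0.25, the probability that the
-- first success occurs on the third trial is (1-0.25)^2 * 0.25:
--
-- > let d = geometric 0.25
-- > probability d 3   -- 0.75^2 * 0.25 = 0.140625
-- > mean d            -- 1/0.25 = 4.0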
module Statistics.Distribution.Geometric ( GeometricDistribution -- * Constructors , geometric -- ** Accessors , gdSuccess ) where import Data.Typeable (Typeable) import qualified Statistics.Distribution as D newtype GeometricDistribution = GD { gdSuccess :: Double } deriving (Eq, Read, Show, Typeable) instance D.Distribution GeometricDistribution where cumulative = cumulative instance D.DiscreteDistr GeometricDistribution where probability = probability instance D.Mean GeometricDistribution where mean (GD s) = 1 / s {-# INLINE mean #-} instance D.Variance GeometricDistribution where variance (GD s) = (1 - s) / (s * s) {-# INLINE variance #-} instance D.MaybeMean GeometricDistribution where maybeMean = Just . D.mean instance D.MaybeVariance GeometricDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance -- | Create geometric distribution. geometric :: Double -- ^ Success rate -> GeometricDistribution geometric x | x >= 0 && x <= 1 = GD x | otherwise = error $ "Statistics.Distribution.Geometric.geometric: probability must be in [0,1] range. Got " ++ show x {-# INLINE geometric #-} probability :: GeometricDistribution -> Int -> Double probability (GD s) n | n < 1 = 0 | otherwise = s * (1-s) ** (fromIntegral n - 1) {-# INLINE probability #-} cumulative :: GeometricDistribution -> Double -> Double cumulative (GD s) x | x < 1 = 0 | otherwise = 1 - (1-s) ^ (floor x :: Int) {-# INLINE cumulative #-} statistics-0.10.2.0/Statistics/Distribution/Hypergeometric.hs0000644000000000000000000000605412016036043022443 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.Hypergeometric -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Hypergeometric distribution. This is the discrete probability -- distribution that measures the probability of /k/ successes in /l/ -- trials, without replacement, from a finite population. -- -- The parameters of the distribution describe /k/ elements chosen -- from a population of /l/, with /m/ elements of one type, and -- /l/-/m/ of the other (all are positive integers). module Statistics.Distribution.Hypergeometric ( HypergeometricDistribution -- * Constructors , hypergeometric -- ** Accessors , hdM , hdL , hdK ) where import Data.Typeable (Typeable) import Numeric.SpecFunctions (choose) import qualified Statistics.Distribution as D data HypergeometricDistribution = HD { hdM :: {-# UNPACK #-} !Int , hdL :: {-# UNPACK #-} !Int , hdK :: {-# UNPACK #-} !Int } deriving (Eq, Read, Show, Typeable) instance D.Distribution HypergeometricDistribution where cumulative = cumulative instance D.DiscreteDistr HypergeometricDistribution where probability = probability instance D.Mean HypergeometricDistribution where mean = mean instance D.Variance HypergeometricDistribution where variance = variance instance D.MaybeMean HypergeometricDistribution where maybeMean = Just . D.mean instance D.MaybeVariance HypergeometricDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . 
D.variance variance :: HypergeometricDistribution -> Double variance (HD m l k) = (k' * ml) * (1 - ml) * (l' - k') / (l' - 1) where m' = fromIntegral m l' = fromIntegral l k' = fromIntegral k ml = m' / l' {-# INLINE variance #-} mean :: HypergeometricDistribution -> Double mean (HD m l k) = fromIntegral k * fromIntegral m / fromIntegral l {-# INLINE mean #-} hypergeometric :: Int -- ^ /m/ -> Int -- ^ /l/ -> Int -- ^ /k/ -> HypergeometricDistribution hypergeometric m l k | not (l > 0) = error $ msg ++ "l must be positive" | not (m >= 0 && m <= l) = error $ msg ++ "m must lie in [0,l] range" | not (k > 0 && k <= l) = error $ msg ++ "k must lie in (0,l] range" | otherwise = HD m l k where msg = "Statistics.Distribution.Hypergeometric.hypergeometric: " {-# INLINE hypergeometric #-} -- Naive implementation probability :: HypergeometricDistribution -> Int -> Double probability (HD mi li ki) n | n < max 0 (mi+ki-li) || n > min mi ki = 0 | otherwise = choose mi n * choose (li - mi) (ki - n) / choose li ki {-# INLINE probability #-} cumulative :: HypergeometricDistribution -> Double -> Double cumulative d@(HD mi li ki) x | n < minN = 0 | n >= maxN = 1 | otherwise = D.sumProbabilities d minN n where n = floor x minN = max 0 (mi+ki-li) maxN = min mi kistatistics-0.10.2.0/Statistics/Distribution/Normal.hs0000644000000000000000000000716612016036043020712 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.Normal -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The normal distribution. This is a continuous probability -- distribution that describes data that cluster around a mean. module Statistics.Distribution.Normal ( NormalDistribution -- * Constructors , normalDistr , normalFromSample , standard ) where import Data.Number.Erf (erfc) import Data.Typeable (Typeable) import Numeric.MathFunctions.Constants (m_sqrt_2, m_sqrt_2_pi) import qualified Statistics.Distribution as D import qualified Statistics.Sample as S import qualified System.Random.MWC.Distributions as MWC -- | The normal distribution. data NormalDistribution = ND { mean :: {-# UNPACK #-} !Double , stdDev :: {-# UNPACK #-} !Double , ndPdfDenom :: {-# UNPACK #-} !Double , ndCdfDenom :: {-# UNPACK #-} !Double } deriving (Eq, Read, Show, Typeable) instance D.Distribution NormalDistribution where cumulative = cumulative complCumulative = complCumulative instance D.ContDistr NormalDistribution where density = density quantile = quantile instance D.MaybeMean NormalDistribution where maybeMean = Just . D.mean instance D.Mean NormalDistribution where mean = mean instance D.MaybeVariance NormalDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Variance NormalDistribution where stdDev = stdDev instance D.ContGen NormalDistribution where genContVar d = MWC.normal (mean d) (stdDev d) {-# INLINE genContVar #-} -- | Standard normal distribution with mean equal to 0 and variance equal to 1 standard :: NormalDistribution standard = ND { mean = 0.0 , stdDev = 1.0 , ndPdfDenom = m_sqrt_2_pi , ndCdfDenom = m_sqrt_2 } -- | Create normal distribution from parameters. -- -- IMPORTANT: prior to 0.10 release second parameter was variance not -- standard deviation. 
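--
-- A usage sketch (assuming "Statistics.Distribution" is in scope):
--
-- > let d = normalDistr 5 2   -- mean 5, standard deviation 2
-- > mean d                    -- 5.0
-- > stdDev d                  -- 2.0
-- > cumulative d 5            -- 0.5, by symmetry about the mean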
normalDistr :: Double -- ^ Mean of distribution -> Double -- ^ Standard deviation of distribution -> NormalDistribution normalDistr m sd | sd > 0 = ND { mean = m , stdDev = sd , ndPdfDenom = m_sqrt_2_pi * sd , ndCdfDenom = m_sqrt_2 * sd } | otherwise = error $ "Statistics.Distribution.Normal.normalDistr: standard deviation must be positive. Got " ++ show sd -- | Create distribution using parameters estimated from -- sample. Variance is estimated using maximum likelihood method -- (biased estimation). normalFromSample :: S.Sample -> NormalDistribution normalFromSample a = normalDistr (S.mean a) (S.stdDev a) density :: NormalDistribution -> Double -> Double density d x = exp (-xm * xm / (2 * sd * sd)) / ndPdfDenom d where xm = x - mean d sd = stdDev d cumulative :: NormalDistribution -> Double -> Double cumulative d x = erfc ((mean d - x) / ndCdfDenom d) / 2 complCumulative :: NormalDistribution -> Double -> Double complCumulative d x = erfc ((x - mean d) / ndCdfDenom d) / 2 quantile :: NormalDistribution -> Double -> Double quantile d p | p == 0 = -inf | p == 1 = inf | p == 0.5 = mean d | p > 0 && p < 1 = x * stdDev d + mean d | otherwise = error $ "Statistics.Distribution.Normal.quantile: p must be in [0,1] range. Got: "++show p where x = D.findRoot standard p 0 (-100) 100 inf = 1/0 statistics-0.10.2.0/Statistics/Distribution/Poisson.hs0000644000000000000000000000420412016036043021102 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.Poisson -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Poisson distribution. This is the discrete probability -- distribution of a number of events occurring in a fixed interval if -- these events occur with a known average rate, and occur -- independently from each other within that interval. module Statistics.Distribution.Poisson ( PoissonDistribution -- * Constructors , poisson -- * Accessors , poissonLambda -- * References -- $references ) where import Data.Typeable (Typeable) import qualified Statistics.Distribution as D import qualified Statistics.Distribution.Poisson.Internal as I import Numeric.SpecFunctions (incompleteGamma) newtype PoissonDistribution = PD { poissonLambda :: Double } deriving (Eq, Read, Show, Typeable) instance D.Distribution PoissonDistribution where cumulative (PD lambda) x | x < 0 = 0 | otherwise = 1 - incompleteGamma (fromIntegral (floor x + 1 :: Int)) lambda {-# INLINE cumulative #-} instance D.DiscreteDistr PoissonDistribution where probability (PD lambda) x = I.probability lambda (fromIntegral x) {-# INLINE probability #-} instance D.Variance PoissonDistribution where variance = poissonLambda {-# INLINE variance #-} instance D.Mean PoissonDistribution where mean = poissonLambda {-# INLINE mean #-} instance D.MaybeMean PoissonDistribution where maybeMean = Just . D.mean instance D.MaybeVariance PoissonDistribution where maybeStdDev = Just . D.stdDev -- | Create Poisson distribution. poisson :: Double -> PoissonDistribution poisson l | l >= 0 = PD l | otherwise = error $ "Statistics.Distribution.Poisson.poisson:\ \ lambda must be non-negative. Got " ++ show l {-# INLINE poisson #-} -- $references -- -- * Loader, C. (2000) Fast and Accurate Computation of Binomial -- Probabilities. 
statistics-0.10.2.0/Statistics/Distribution/StudentT.hs0000644000000000000000000000417012016036043021224 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.StudentT -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Student-T distribution module Statistics.Distribution.StudentT ( StudentT , studentT , studentTndf ) where import qualified Statistics.Distribution as D import Data.Typeable (Typeable) import Numeric.SpecFunctions (logBeta, incompleteBeta, invIncompleteBeta) -- | Student-T distribution newtype StudentT = StudentT { studentTndf :: Double } deriving (Eq,Show,Read,Typeable) -- | Create Student-T distribution. Number of degrees of freedom must be positive. studentT :: Double -> StudentT studentT ndf | ndf > 0 = StudentT ndf | otherwise = error "Statistics.Distribution.StudentT.studentT: non-positive number of degrees of freedom" instance D.Distribution StudentT where cumulative = cumulative instance D.ContDistr StudentT where density = density quantile = quantile cumulative :: StudentT -> Double -> Double cumulative (StudentT ndf) x | x > 0 = 1 - 0.5 * ibeta | otherwise = 0.5 * ibeta where ibeta = incompleteBeta (0.5 * ndf) 0.5 (ndf / (ndf + x*x)) density :: StudentT -> Double -> Double density (StudentT ndf) x = exp( log (ndf / (ndf + x*x)) * (0.5 * (1 + ndf)) - logBeta 0.5 (0.5 * ndf) ) / sqrt ndf quantile :: StudentT -> Double -> Double quantile (StudentT ndf) p | p >= 0 && p <= 1 = let x = invIncompleteBeta (0.5 * ndf) 0.5 (2 * min p (1 - p)) in case sqrt $ ndf * (1 - x) / x of r | p < 0.5 -> -r | otherwise -> r | otherwise = error $ "Statistics.Distribution.StudentT.quantile: p must be in [0,1] range. Got: "++show p instance D.MaybeMean StudentT where maybeMean (StudentT ndf) | ndf > 1 = Just 0 | otherwise = Nothing instance D.MaybeVariance StudentT where maybeVariance (StudentT ndf) | ndf > 2 = Just $ ndf / (ndf - 2) | otherwise = Nothing instance D.ContGen StudentT where genContVar = D.genContinous statistics-0.10.2.0/Statistics/Distribution/Uniform.hs0000644000000000000000000000456512016036043021077 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} -- | -- Module : Statistics.Distribution.Uniform -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Variate distributed uniformly in the interval. module Statistics.Distribution.Uniform ( UniformDistribution -- * Constructors , uniformDistr -- ** Accessors , uniformA , uniformB ) where import Data.Typeable (Typeable) import qualified Statistics.Distribution as D import qualified System.Random.MWC as MWC -- | Uniform distribution from A to B data UniformDistribution = UniformDistribution { uniformA :: {-# UNPACK #-} !Double -- ^ Low boundary of distribution , uniformB :: {-# UNPACK #-} !Double -- ^ Upper boundary of distribution } deriving (Eq, Read, Show, Typeable) -- | Create uniform distribution. uniformDistr :: Double -> Double -> UniformDistribution uniformDistr a b | b < a = uniformDistr b a | a < b = UniformDistribution a b | otherwise = error "Statistics.Distribution.Uniform.uniformDistr: wrong parameters" -- NOTE: failure is in default branch to guard against NaNs.
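-- As a usage sketch (assuming "Statistics.Distribution" is in scope),
-- the CDF of the uniform distribution on [0,10] rises linearly and
-- its density is the constant 1/(b-a):
--
-- > let d = uniformDistr 0 10
-- > cumulative d 2.5   -- 0.25
-- > density d 3        -- 0.1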
instance D.Distribution UniformDistribution where cumulative (UniformDistribution a b) x | x < a = 0 | x > b = 1 | otherwise = (x - a) / (b - a) instance D.ContDistr UniformDistribution where density (UniformDistribution a b) x | x < a = 0 | x > b = 0 | otherwise = 1 / (b - a) quantile (UniformDistribution a b) p | p >= 0 && p <= 1 = a + (b - a) * p | otherwise = error $ "Statistics.Distribution.Uniform.quantile: p must be in [0,1] range. Got: "++show p instance D.Mean UniformDistribution where mean (UniformDistribution a b) = 0.5 * (a + b) instance D.Variance UniformDistribution where -- NOTE: 1/sqrt 12 is not constant folded (#4101) so it's written as -- numerical constant. (Also FIXME!) stdDev (UniformDistribution a b) = 0.2886751345948129 * (b - a) variance (UniformDistribution a b) = d * d / 12 where d = b - a instance D.MaybeMean UniformDistribution where maybeMean = Just . D.mean instance D.MaybeVariance UniformDistribution where maybeStdDev = Just . D.stdDev instance D.ContGen UniformDistribution where genContVar (UniformDistribution a b) gen = MWC.uniformR (a,b) gen statistics-0.10.2.0/Statistics/Distribution/Poisson/0000755000000000000000000000000012016036043020546 5ustar0000000000000000statistics-0.10.2.0/Statistics/Distribution/Poisson/Internal.hs0000644000000000000000000000203312016036043022654 0ustar0000000000000000-- | -- Module : Statistics.Distribution.Poisson.Internal -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Internal code for the Poisson distribution. module Statistics.Distribution.Poisson.Internal ( probability ) where import Numeric.MathFunctions.Constants (m_sqrt_2_pi, m_tiny) import Numeric.SpecFunctions (logGamma, stirlingError) import Numeric.SpecFunctions.Extra (bd0) -- | An unchecked, non-integer-valued version of Loader's saddle point -- algorithm. probability :: Double -> Double -> Double probability 0 0 = 1 probability 0 1 = 0 probability lambda x | isInfinite lambda = 0 | x < 0 = 0 | x <= lambda * m_tiny = exp (-lambda) | lambda < x * m_tiny = exp (-lambda + x * log lambda - logGamma (x+1)) | otherwise = exp (-(stirlingError x) - bd0 x lambda) / (m_sqrt_2_pi * sqrt x) {-# INLINE probability #-} statistics-0.10.2.0/Statistics/Function/0000755000000000000000000000000012016036043016222 5ustar0000000000000000statistics-0.10.2.0/Statistics/Function/Comparison.hs0000644000000000000000000000252612016036043020675 0ustar0000000000000000-- | -- Module : Statistics.Function.Comparison -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Approximate floating point comparison, based on Bruce Dawson's -- \"Comparing floating point numbers\": -- module Statistics.Function.Comparison ( within ) where import Control.Monad.ST (runST) import Data.Primitive.ByteArray (newByteArray, readByteArray, writeByteArray) import Data.Int (Int64) -- | Compare two 'Double' values for approximate equality, using -- Dawson's method. -- -- The required accuracy is specified in ULPs (units of least -- precision). If the two numbers differ by the given number of ULPs -- or less, this function returns @True@. within :: Int -- ^ Number of ULPs of accuracy desired. 
-> Double -> Double -> Bool within ulps a b = runST $ do buf <- newByteArray 8 ai0 <- writeByteArray buf 0 a >> readByteArray buf 0 bi0 <- writeByteArray buf 0 b >> readByteArray buf 0 let big = 0x8000000000000000 :: Int64 ai | ai0 < 0 = big - ai0 | otherwise = ai0 bi | bi0 < 0 = big - bi0 | otherwise = bi0 return $ abs (ai - bi) <= fromIntegral ulps statistics-0.10.2.0/Statistics/Math/0000755000000000000000000000000012016036043015326 5ustar0000000000000000statistics-0.10.2.0/Statistics/Math/RootFinding.hs0000644000000000000000000000757212016036043020115 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable #-} -- | -- Module : Statistics.Math.RootFinding -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Haskell functions for finding the roots of mathematical functions. module Statistics.Math.RootFinding ( Root(..) , fromRoot , ridders -- * References -- $references ) where import Statistics.Function.Comparison import Control.Applicative import Control.Monad (MonadPlus(..), ap) import Data.Typeable (Typeable) -- | The result of searching for a root of a mathematical function. data Root a = NotBracketed -- ^ The function does not have opposite signs when -- evaluated at the lower and upper bounds of the search. | SearchFailed -- ^ The search failed to converge to within the given -- error tolerance after the given number of iterations. | Root a -- ^ A root was successfully found. deriving (Eq, Read, Show, Typeable) instance Functor Root where fmap _ NotBracketed = NotBracketed fmap _ SearchFailed = SearchFailed fmap f (Root a) = Root (f a) instance Monad Root where NotBracketed >>= _ = NotBracketed SearchFailed >>= _ = SearchFailed Root a >>= m = m a return = Root instance MonadPlus Root where mzero = SearchFailed r@(Root _) `mplus` _ = r _ `mplus` p = p instance Applicative Root where pure = Root (<*>) = ap instance Alternative Root where empty = SearchFailed r@(Root _) <|> _ = r _ <|> p = p -- | Returns either the result of a search for a root, or the default -- value if the search failed. fromRoot :: a -- ^ Default value. -> Root a -- ^ Result of search for a root. -> a fromRoot _ (Root a) = a fromRoot a _ = a -- | Use the method of Ridders to compute a root of a function. -- -- The function must have opposite signs when evaluated at the lower -- and upper bounds of the search (i.e. the root must be bracketed). ridders :: Double -- ^ Absolute error tolerance. -> (Double,Double) -- ^ Lower and upper bounds for the search. -> (Double -> Double) -- ^ Function to find the roots of. -> Root Double ridders tol (lo,hi) f | flo == 0 = Root lo | fhi == 0 = Root hi | flo*fhi > 0 = NotBracketed -- root is not bracketed | otherwise = go lo flo hi fhi 0 where go !a !fa !b !fb !i -- Root is bracketed within 1 ulp. No improvement can be made. | within 1 a b = Root a -- Root is found. Checking that f(m) == 0 is necessary to ensure -- that a root is never passed to 'go' | fm == 0 = Root m | fn == 0 = Root n | d < tol = Root n -- Too many iterations performed. Fail | i >= (100 :: Int) = SearchFailed -- Ridders' approximation coincides with one of the old -- bounds. 
Revert to bisection | n == a || n == b = case () of _| fm*fa < 0 -> go a fa m fm (i+1) | otherwise -> go m fm b fb (i+1) -- Proceed as usual | fn*fm < 0 = go n fn m fm (i+1) | fn*fa < 0 = go a fa n fn (i+1) | otherwise = go n fn b fb (i+1) where d = abs (b - a) dm = (b - a) * 0.5 !m = a + dm !fm = f m !dn = signum (fb - fa) * dm * fm / sqrt(fm*fm - fa*fb) !n = m - signum dn * min (abs dn) (abs dm - 0.5 * tol) !fn = f n !flo = f lo !fhi = f hi -- $references -- -- * Ridders, C.F.J. (1979) A new algorithm for computing a single -- root of a real continuous function. -- /IEEE Transactions on Circuits and Systems/ 26:979–980. statistics-0.10.2.0/Statistics/Resampling/0000755000000000000000000000000012016036043016536 5ustar0000000000000000statistics-0.10.2.0/Statistics/Resampling/Bootstrap.hs0000644000000000000000000001006512016036043021051 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, OverloadedStrings, RecordWildCards #-} -- | -- Module : Statistics.Resampling.Bootstrap -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The bootstrap method for statistical inference. module Statistics.Resampling.Bootstrap ( Estimate(..) , bootstrapBCA , scale -- * References -- $references ) where import Control.DeepSeq (NFData) import Control.Exception (assert) import Control.Monad.Par (runPar, parMap) import Data.Data (Data) import Data.Typeable (Typeable) import Data.Vector.Unboxed ((!)) import Statistics.Distribution (cumulative, quantile) import Statistics.Distribution.Normal import Statistics.Resampling (Resample(..), jackknife) import Statistics.Sample (mean) import Statistics.Types (Estimator, Sample) import qualified Data.Vector.Unboxed as U -- | A point and interval estimate computed via an 'Estimator'. data Estimate = Estimate { estPoint :: {-# UNPACK #-} !Double -- ^ Point estimate. , estLowerBound :: {-# UNPACK #-} !Double -- ^ Lower bound of the estimate interval (i.e. the lower bound of -- the confidence interval). , estUpperBound :: {-# UNPACK #-} !Double -- ^ Upper bound of the estimate interval (i.e. the upper bound of -- the confidence interval). , estConfidenceLevel :: {-# UNPACK #-} !Double -- ^ Confidence level of the confidence intervals. } deriving (Eq, Show, Typeable, Data) instance NFData Estimate -- | Multiply the point, lower bound, and upper bound in an 'Estimate' -- by the given value. scale :: Double -- ^ Value to multiply by. -> Estimate -> Estimate scale f e@Estimate{..} = e { estPoint = f * estPoint , estLowerBound = f * estLowerBound , estUpperBound = f * estUpperBound } estimate :: Double -> Double -> Double -> Double -> Estimate estimate pt lb ub cl = assert (lb <= ub) . assert (cl > 0 && cl < 1) $ Estimate { estPoint = pt , estLowerBound = lb , estUpperBound = ub , estConfidenceLevel = cl } data T = {-# UNPACK #-} !Double :< {-# UNPACK #-} !Double infixl 2 :< -- | Bias-corrected accelerated (BCA) bootstrap. This adjusts for both -- bias and skewness in the resampled distribution. bootstrapBCA :: Double -- ^ Confidence level -> Sample -- ^ Sample data -> [Estimator] -- ^ Estimators -> [Resample] -- ^ Resampled data -> [Estimate] bootstrapBCA confidenceLevel sample estimators resamples = assert (confidenceLevel > 0 && confidenceLevel < 1) runPar $ parMap (uncurry e) (zip estimators resamples) where e est (Resample resample) | U.length sample == 1 = estimate pt pt pt confidenceLevel | otherwise = estimate pt (resample ! lo) (resample ! 
hi) confidenceLevel where pt = est sample lo = max (cumn a1) 0 where a1 = bias + b1 / (1 - accel * b1) b1 = bias + z1 hi = min (cumn a2) (ni - 1) where a2 = bias + b2 / (1 - accel * b2) b2 = bias - z1 z1 = quantile standard ((1 - confidenceLevel) / 2) cumn = round . (*n) . cumulative standard bias = quantile standard (probN / n) -- probN counts the resamples that fall below the point estimate where probN = fromIntegral . U.length . U.filter (<pt) $ resample -- Acceleration is estimated from the skewness of the jackknifed estimates accel = sumCubes / (6 * (sumSquares ** 1.5)) where (sumSquares :< sumCubes) = U.foldl' f (0 :< 0) jack f (s :< c) j = s + d2 :< c + d2 * d where d = jackMean - j d2 = d * d jackMean = mean jack jack = jackknife est sample ni = U.length resample n = fromIntegral ni statistics-0.10.2.0/Statistics/Sample/0000755000000000000000000000000012016036043015656 5ustar0000000000000000statistics-0.10.2.0/Statistics/Sample/Histogram.hs0000644000000000000000000000671712016036043020160 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Sample.Histogram -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Functions for computing histograms of sample data. module Statistics.Sample.Histogram ( histogram -- * Building blocks , histogram_ , range ) where import Numeric.MathFunctions.Constants (m_epsilon) import Statistics.Function (minMax) import qualified Data.Vector.Generic as G import qualified Data.Vector.Generic.Mutable as GM -- | /O(n)/ Compute a histogram over a data set. -- -- The result consists of a pair of vectors: -- -- * The lower bound of each interval. -- * The number of samples within the interval. -- -- Interval (bin) sizes are uniform, and the upper and lower bounds -- are chosen automatically using the 'range' function. To specify -- these parameters directly, use the 'histogram_' function. histogram :: (G.Vector v0 Double, G.Vector v1 Double, Num b, G.Vector v1 b) => Int -- ^ Number of bins (must be positive). -> v0 Double -- ^ Sample data (cannot be empty). -> (v1 Double, v1 b) histogram numBins xs = (G.generate numBins step, histogram_ numBins lo hi xs) where (lo,hi) = range numBins xs step i = lo + d * fromIntegral i d = (hi - lo) / fromIntegral numBins {-# INLINE histogram #-} -- | /O(n)/ Compute a histogram over a data set. -- -- Interval (bin) sizes are uniform, based on the supplied upper -- and lower bounds. histogram_ :: (Num b, RealFrac a, G.Vector v0 a, G.Vector v1 b) => Int -- ^ Number of bins. This value must be positive. A zero -- or negative value will cause an error. -> a -- ^ Lower bound on interval range. Sample data less than -- this will cause an error. -> a -- ^ Upper bound on interval range. This value must not be -- less than the lower bound. Sample data that falls above -- the upper bound will cause an error. -> v0 a -- ^ Sample data. -> v1 b histogram_ numBins lo hi xs0 = G.create (GM.replicate numBins 0 >>= bin xs0) where bin xs bins = go 0 where go i | i >= len = return bins | otherwise = do let x = xs `G.unsafeIndex` i b = truncate $ (x - lo) / d GM.write bins b . (+1) =<< GM.read bins b go (i+1) len = G.length xs d = ((hi - lo) * (1 + realToFrac m_epsilon)) / fromIntegral numBins {-# INLINE histogram_ #-} -- | /O(n)/ Compute decent defaults for the lower and upper bounds of -- a histogram, based on the desired number of bins and the range of -- the sample data. -- -- The upper and lower bounds used are @(lo-d, hi+d)@, where -- -- @d = (maximum sample - minimum sample) / ((bins - 1) * 2)@ range :: (G.Vector v Double) => Int -- ^ Number of bins (must be positive). -> v Double -- ^ Sample data (cannot be empty). 
-> (Double, Double) range numBins xs | numBins < 1 = error "Statistics.Histogram.range: invalid bin count" | G.null xs = error "Statistics.Histogram.range: empty sample" | otherwise = (lo-d, hi+d) where d | numBins == 1 = 0 | otherwise = (hi - lo) / ((fromIntegral numBins - 1) * 2) (lo,hi) = minMax xs {-# INLINE range #-} statistics-0.10.2.0/Statistics/Sample/KernelDensity.hs0000644000000000000000000001015012016036043020767 0ustar0000000000000000{-# LANGUAGE BangPatterns, FlexibleContexts, UnboxedTuples #-} -- | -- Module : Statistics.Sample.KernelDensity -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kernel density estimation. This module provides a fast, robust, -- non-parametric way to estimate the probability density function of -- a sample. -- -- This estimator does not use the commonly employed \"Gaussian rule -- of thumb\". As a result, it outperforms many plug-in methods on -- multimodal samples with widely separated modes. module Statistics.Sample.KernelDensity ( -- * Estimation functions kde , kde_ -- * References -- $references ) where import Prelude hiding (const,min,max) import Numeric.MathFunctions.Constants (m_sqrt_2_pi) import Statistics.Function (minMax, nextHighestPowerOfTwo) import Statistics.Math.RootFinding (fromRoot, ridders) import Statistics.Sample.Histogram (histogram_) import Statistics.Transform (dct, idct) import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- | Gaussian kernel density estimator for one-dimensional data, using -- the method of Botev et al. -- -- The result is a pair of vectors, containing: -- -- * The coordinates of each mesh point. The mesh interval is chosen -- to be 20% larger than the range of the sample. (To specify the -- mesh interval, use 'kde_'.) -- -- * Density estimates at each mesh point. kde :: Int -- ^ The number of mesh points to use in the uniform discretization -- of the interval @(min,max)@. If this value is not a power of -- two, then it is rounded up to the next power of two. -> U.Vector Double -> (U.Vector Double, U.Vector Double) kde n0 xs = kde_ n0 (lo - range / 10) (hi + range / 10) xs where (lo,hi) = minMax xs range | U.length xs <= 1 = 1 -- Unreasonable guess | otherwise = hi - lo -- | Gaussian kernel density estimator for one-dimensional data, using -- the method of Botev et al. -- -- The result is a pair of vectors, containing: -- -- * The coordinates of each mesh point. -- -- * Density estimates at each mesh point. kde_ :: Int -- ^ The number of mesh points to use in the uniform discretization -- of the interval @(min,max)@. If this value is not a power of -- two, then it is rounded up to the next power of two. -> Double -- ^ Lower bound (@min@) of the mesh range. -> Double -- ^ Upper bound (@max@) of the mesh range. -> U.Vector Double -> (U.Vector Double, U.Vector Double) kde_ n0 min max xs | U.null xs = error "Statistics.KernelDensity.kde: empty sample" | n0 < 1 = error "Statistics.KernelDensity.kde: invalid number of points" | otherwise = (mesh, density) where mesh = G.generate ni $ \z -> min + (d * fromIntegral z) where d = r / (n-1) density = G.map (/(2 * r)) . idct $ G.zipWith f a (G.enumFromTo 0 (n-1)) where f b z = b * exp (sqr z * sqr pi * t_star * (-0.5)) !n = fromIntegral ni !ni = nextHighestPowerOfTwo n0 !r = max - min a = dct . 
G.map (/ G.sum h) $ h where h = G.map (/ len) $ histogram_ ni min max xs !len = fromIntegral (G.length xs) !t_star = fromRoot (0.28 * len ** (-0.4)) . ridders 1e-14 (0,0.1) $ \x -> x - (len * (2 * sqrt pi) * go 6 (f 7 x)) ** (-0.4) where f q t = 2 * pi ** (q*2) * G.sum (G.zipWith g iv a2v) where g i a2 = i ** q * a2 * exp ((-i) * sqr pi * t) a2v = G.map (sqr . (*0.5)) $ G.tail a iv = G.map sqr $ G.enumFromTo 1 (n-1) go s !h | s == 1 = h | otherwise = go (s-1) (f s time) where time = (2 * const * k0 / len / h) ** (2 / (3 + 2 * s)) const = (1 + 0.5 ** (s+0.5)) / 3 k0 = U.product (G.enumFromThenTo 1 3 (2*s-1)) / m_sqrt_2_pi sqr x = x * x -- $references -- -- Botev. Z.I., Grotowski J.F., Kroese D.P. (2010). Kernel density -- estimation via diffusion. /Annals of Statistics/ -- 38(5):2916–2957. statistics-0.10.2.0/Statistics/Sample/Powers.hs0000644000000000000000000001524012016036043017473 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE BangPatterns #-} -- | -- Module : Statistics.Sample.Powers -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Very fast statistics over simple powers of a sample. These can all -- be computed efficiently in just a single pass over a sample, with -- that pass subject to stream fusion. -- -- The tradeoff is that some of these functions are less numerically -- robust than their counterparts in the 'Statistics.Sample' module. -- Where this is the case, the alternatives are noted. module Statistics.Sample.Powers ( -- * Types Powers -- * Constructor , powers -- * Descriptive functions , order , count , sum -- * Statistics of location , mean -- * Statistics of dispersion , variance , stdDev , varianceUnbiased -- * Functions over central moments , centralMoment , skewness , kurtosis -- * References -- $references ) where import Data.Vector.Generic (unsafeFreeze) import Data.Vector.Unboxed ((!)) import Prelude hiding (sum) import Statistics.Function (indexed) import Statistics.Internal (inlinePerformIO) import Numeric.SpecFunctions (choose) import System.IO.Unsafe (unsafePerformIO) import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed.Mutable as MU newtype Powers = Powers (U.Vector Double) deriving (Eq, Show) -- | O(/n/) Collect the /n/ simple powers of a sample. -- -- Functions computed over a sample's simple powers require at least a -- certain number (or /order/) of powers to be collected. -- -- * To compute the /k/th 'centralMoment', at least /k/ simple powers -- must be collected. -- -- * For the 'variance', at least 2 simple powers are needed. -- -- * For 'skewness', we need at least 3 simple powers. -- -- * For 'kurtosis', at least 4 simple powers are required. -- -- This function is subject to stream fusion. powers :: G.Vector v Double => Int -- ^ /n/, the number of powers, where /n/ >= 2. -> v Double -> Powers powers k | k < 2 = error "Statistics.Sample.powers: too few powers" | otherwise = fini . G.foldl' go (unsafePerformIO $ MU.replicate l 0) where go ms x = inlinePerformIO $ loop 0 1 where loop !i !xk | i == l = return ms | otherwise = do MU.read ms i >>= MU.write ms i . (+ xk) loop (i+1) (xk*x) fini = Powers . unsafePerformIO . unsafeFreeze l = k + 1 {-# INLINE powers #-} -- | The order (number) of simple powers collected from a 'sample'. 
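--
-- For example, the order is exactly the /n/ passed to 'powers':
--
-- > order (powers 4 xs)   -- 4, for any sample xs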
order :: Powers -> Int order (Powers pa) = U.length pa - 1 {-# INLINE order #-} -- | Compute the /k/th central moment of a sample. The central -- moment is also known as the moment about the mean. centralMoment :: Int -> Powers -> Double centralMoment k p@(Powers pa) | k < 0 || k > order p = error ("Statistics.Sample.Powers.centralMoment: " ++ "invalid argument") | k == 0 = 1 | otherwise = (/n) . U.sum . U.map go . indexed . U.take (k+1) $ pa where go (i , e) = (k `choose` i) * ((-m) ^ (k-i)) * e n = U.head pa m = mean p {-# INLINE centralMoment #-} -- | Maximum likelihood estimate of a sample's variance. Also known -- as the population variance, where the denominator is /n/. This is -- the second central moment of the sample. -- -- This is less numerically robust than the variance function in the -- 'Statistics.Sample' module, but the number is essentially free to -- compute if you have already collected a sample's simple powers. -- -- Requires 'Powers' with 'order' at least 2. variance :: Powers -> Double variance = centralMoment 2 {-# INLINE variance #-} -- | Standard deviation. This is simply the square root of the -- maximum likelihood estimate of the variance. stdDev :: Powers -> Double stdDev = sqrt . variance {-# INLINE stdDev #-} -- | Unbiased estimate of a sample's variance. Also known as the -- sample variance, where the denominator is /n/-1. -- -- Requires 'Powers' with 'order' at least 2. varianceUnbiased :: Powers -> Double varianceUnbiased p@(Powers pa) | n > 1 = variance p * n / (n-1) | otherwise = 0 where n = U.head pa {-# INLINE varianceUnbiased #-} -- | Compute the skewness of a sample. This is a measure of the -- asymmetry of its distribution. -- -- A sample with negative skew is said to be /left-skewed/. Most of -- its mass is on the right of the distribution, with the tail on the -- left. -- -- > skewness . powers 3 $ U.to [1,100,101,102,103] -- > ==> -1.497681449918257 -- -- A sample with positive skew is said to be /right-skewed/. -- -- > skewness . powers 3 $ U.to [1,2,3,4,100] -- > ==> 1.4975367033335198 -- -- A sample's skewness is not defined if its 'variance' is zero. -- -- Requires 'Powers' with 'order' at least 3. skewness :: Powers -> Double skewness p = centralMoment 3 p * variance p ** (-1.5) {-# INLINE skewness #-} -- | Compute the excess kurtosis of a sample. This is a measure of -- the \"peakedness\" of its distribution. A high kurtosis indicates -- that the sample's variance is due more to infrequent severe -- deviations than to frequent modest deviations. -- -- A sample's excess kurtosis is not defined if its 'variance' is -- zero. -- -- Requires 'Powers' with 'order' at least 4. kurtosis :: Powers -> Double kurtosis p = centralMoment 4 p / (v * v) - 3 where v = variance p {-# INLINE kurtosis #-} -- | The number of elements in the original 'Sample'. This is the -- sample's zeroth simple power. count :: Powers -> Int count (Powers pa) = floor $ U.head pa {-# INLINE count #-} -- | The sum of elements in the original 'Sample'. This is the -- sample's first simple power. sum :: Powers -> Double sum (Powers pa) = pa ! 1 {-# INLINE sum #-} -- | The arithmetic mean of elements in the original 'Sample'. -- -- This is less numerically robust than the mean function in the -- 'Statistics.Sample' module, but the number is essentially free to -- compute if you have already collected a sample's simple powers. mean :: Powers -> Double mean p@(Powers pa) | n == 0 = 0 | otherwise = sum p / n where n = U.head pa {-# INLINE mean #-} -- $references -- -- * Besset, D.H. 
(2000) Elements of statistics. -- /Object-oriented implementation of numerical methods/ -- ch. 9, pp. 311–331. -- -- -- * Anderson, G. (2009) Compute /k/th central moments in one -- pass. /quantblog/. statistics-0.10.2.0/Statistics/Sample/KernelDensity/0000755000000000000000000000000012016036043020436 5ustar0000000000000000statistics-0.10.2.0/Statistics/Sample/KernelDensity/Simple.hs0000644000000000000000000001515612016036043022233 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Sample.KernelDensity.Simple -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kernel density estimation code, providing non-parametric ways to -- estimate the probability density function of a sample. -- -- The techniques used by functions in this module are relatively -- fast, but they generally give inferior results to the KDE function -- in the main 'Statistics.KernelDensity' module (due to the -- oversmoothing documented for 'bandwidth' below). module Statistics.Sample.KernelDensity.Simple {-# DEPRECATED "Use Statistics.Sample.KernelDensity instead." #-} ( -- * Simple entry points epanechnikovPDF , gaussianPDF -- * Building blocks -- These functions may be useful if you need to construct a kernel -- density function estimator other than the ones provided in this -- module. -- ** Choosing points from a sample , Points(..) , choosePoints -- ** Bandwidth estimation , Bandwidth , bandwidth , epanechnikovBW , gaussianBW -- ** Kernels , Kernel , epanechnikovKernel , gaussianKernel -- ** Low-level estimation , estimatePDF , simplePDF -- * References -- $references ) where import Numeric.MathFunctions.Constants (m_1_sqrt_2, m_2_sqrt_pi) import Statistics.Function (minMax) import Statistics.Sample (stdDev) import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Generic as G -- | Points from the range of a 'Sample'. newtype Points = Points { fromPoints :: U.Vector Double } deriving (Eq, Show) -- | Bandwidth estimator for an Epanechnikov kernel. epanechnikovBW :: Double -> Bandwidth epanechnikovBW n = (80 / (n * m_2_sqrt_pi)) ** 0.2 -- | Bandwidth estimator for a Gaussian kernel. gaussianBW :: Double -> Bandwidth gaussianBW n = (4 / (n * 3)) ** 0.2 -- | The width of the convolution kernel used. type Bandwidth = Double -- | Compute the optimal bandwidth from the observed data for the -- given kernel. -- -- This function uses an estimate based on the standard deviation of a -- sample (due to Deheuvels), which performs reasonably well for -- unimodal distributions but leads to oversmoothing for more complex -- ones. bandwidth :: G.Vector v Double => (Double -> Bandwidth) -> v Double -> Bandwidth bandwidth kern values = stdDev values * kern (fromIntegral $ G.length values) -- | Choose a uniform range of points at which to estimate a sample's -- probability density function. -- -- If you are using a Gaussian kernel, multiply the sample's bandwidth -- by 3 before passing it to this function. -- -- If this function is passed an empty vector, it returns values of -- positive and negative infinity. choosePoints :: G.Vector v Double => Int -- ^ Number of points to select, /n/ -> Double -- ^ Sample bandwidth, /h/ -> v Double -- ^ Input data -> Points choosePoints n h sample = Points . U.map f $ U.enumFromTo 0 n' where lo = a - h hi = z + h (a, z) = minMax sample d = (hi - lo) / fromIntegral n' f i = lo + fromIntegral i * d n' = n - 1 -- | The convolution kernel. 
Its parameters are as follows: -- -- * Scaling factor, 1\//nh/ -- -- * Bandwidth, /h/ -- -- * A point at which to sample the input, /p/ -- -- * One sample value, /v/ type Kernel = Double -> Double -> Double -> Double -> Double -- | Epanechnikov kernel for probability density function estimation. epanechnikovKernel :: Kernel epanechnikovKernel f h p v | abs u <= 1 = f * (1 - u * u) | otherwise = 0 where u = (v - p) / (h * 0.75) -- | Gaussian kernel for probability density function estimation. gaussianKernel :: Kernel gaussianKernel f h p v = exp (-0.5 * u * u) * g where u = (v - p) / h g = f * 0.5 * m_2_sqrt_pi * m_1_sqrt_2 -- | Kernel density estimator, providing a non-parametric way of -- estimating the PDF of a random variable. estimatePDF :: G.Vector v Double => Kernel -- ^ Kernel function -> Bandwidth -- ^ Bandwidth, /h/ -> v Double -- ^ Sample data -> Points -- ^ Points at which to estimate -> U.Vector Double estimatePDF kernel h sample | n < 2 = errorShort "estimatePDF" | otherwise = U.map k . fromPoints where k p = G.sum . G.map (kernel f h p) $ sample f = 1 / (h * fromIntegral n) n = G.length sample {-# INLINE estimatePDF #-} -- | A helper for creating a simple kernel density estimation function -- with automatically chosen bandwidth and estimation points. simplePDF :: G.Vector v Double => (Double -> Double) -- ^ Bandwidth function -> Kernel -- ^ Kernel function -> Double -- ^ Bandwidth scaling factor (3 for a Gaussian kernel, 1 for all others) -> Int -- ^ Number of points at which to estimate -> v Double -- ^ sample data -> (Points, U.Vector Double) simplePDF fbw fpdf k numPoints sample = (points, estimatePDF fpdf bw sample points) where points = choosePoints numPoints (bw*k) sample bw = bandwidth fbw sample {-# INLINE simplePDF #-} -- | Simple Epanechnikov kernel density estimator. Returns the -- uniformly spaced points from the sample range at which the density -- function was estimated, and the estimates at those points. epanechnikovPDF :: G.Vector v Double => Int -- ^ Number of points at which to estimate -> v Double -- ^ Data sample -> (Points, U.Vector Double) epanechnikovPDF = simplePDF epanechnikovBW epanechnikovKernel 1 -- | Simple Gaussian kernel density estimator. Returns the uniformly -- spaced points from the sample range at which the density function -- was estimated, and the estimates at those points. gaussianPDF :: G.Vector v Double => Int -- ^ Number of points at which to estimate -> v Double -- ^ Data sample -> (Points, U.Vector Double) gaussianPDF = simplePDF gaussianBW gaussianKernel 3 errorShort :: String -> a errorShort func = error ("Statistics.KernelDensity." ++ func ++ ": at least two points required") -- $references -- -- * Deheuvels, P. (1977) Estimation non paramétrique de la densité -- par histogrammes -- généralisés. Mhttp://archive.numdam.org/article/RSA_1977__25_3_5_0.pdf> statistics-0.10.2.0/Statistics/Test/0000755000000000000000000000000012016036043015354 5ustar0000000000000000statistics-0.10.2.0/Statistics/Test/ChiSquared.hs0000644000000000000000000000305312016036043017741 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | Pearson's chi squared test. module Statistics.Test.ChiSquared ( chi2test -- * Data types , TestType(..) , TestResult(..) ) where import qualified Data.Vector.Generic as G import Statistics.Distribution import Statistics.Distribution.ChiSquared import Statistics.Test.Types -- | Generic form of Pearson chi squared tests for binned data. 
The data -- sample is supplied in the form of tuples (observed count, -- expected number of events). Both must be positive. chi2test :: (G.Vector v (Int,Double), G.Vector v Double) => Double -- ^ p-value -> Int -- ^ Number of additional degrees of -- freedom. One degree of freedom, -- due to the fact that there are -- N observations in total, is -- accounted for automatically. -> v (Int,Double) -- ^ Observation and expectation. -> TestResult chi2test p ndf vec | ndf < 0 = error $ "Statistics.Test.ChiSquare.chi2test: negative NDF " ++ show ndf | n < 0 = error $ "Statistics.Test.ChiSquare.chi2test: data sample is too short" | p > 0 && p < 1 = significant $ complCumulative d chi2 < p | otherwise = error $ "Statistics.Test.ChiSquare.chi2test: bad p-value: " ++ show p where n = G.length vec - ndf - 1 chi2 = G.sum $ G.map (\(o,e) -> sqr (fromIntegral o - e) / e) vec d = chiSquared n sqr x = x * x {-# INLINE chi2test #-} statistics-0.10.2.0/Statistics/Test/Internal.hs0000644000000000000000000000274312016036043017472 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} module Statistics.Test.Internal ( rank , splitByTags ) where import qualified Data.Vector.Generic as G -- Private data type for unfolding data Rank v a = Rank { rankCnt :: {-# UNPACK #-} !Int -- Number of ranks to return , rankVal :: {-# UNPACK #-} !Double -- Rank to return , rankNum :: {-# UNPACK #-} !Double -- Current rank , rankVec :: v a -- Remaining vector } -- | Calculate rank of sample. Sample should be already sorted. rank :: (G.Vector v a, G.Vector v Double) => (a -> a -> Bool) -- ^ Equivalence relation -> v a -- ^ Vector to rank -> v Double rank eq vec = G.unfoldr go (Rank 0 (-1) 1 vec) where go (Rank 0 _ r v) | G.null v = Nothing | otherwise = case G.length h of 1 -> Just (r, Rank 0 0 (r+1) rest) n -> go Rank { rankCnt = n , rankVal = 0.5 * (r*2 + fromIntegral (n-1)) , rankNum = r + fromIntegral n , rankVec = rest } where (h,rest) = G.span (eq $ G.head v) v go (Rank n val r v) = Just (val, Rank (n-1) val r v) {-# INLINE rank #-} -- | Split tagged vector splitByTags :: (G.Vector v a, G.Vector v (Bool,a)) => v (Bool,a) -> (v a, v a) splitByTags vs = (G.map snd a, G.map snd b) where (a,b) = G.unstablePartition fst vs {-# INLINE splitByTags #-} statistics-0.10.2.0/Statistics/Test/KolmogorovSmirnov.hs0000644000000000000000000002526512016036043021434 0ustar0000000000000000-- | -- Module : Statistics.Test.KolmogorovSmirnov -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kolmogorov-Smirnov tests are non-parametric tests for assessing -- whether a given sample could have been drawn from a given -- distribution, or whether two samples were drawn from the same -- distribution. module Statistics.Test.KolmogorovSmirnov ( -- * Kolmogorov-Smirnov test kolmogorovSmirnovTest , kolmogorovSmirnovTestCdf , kolmogorovSmirnovTest2 -- * Evaluate statistics , kolmogorovSmirnovCdfD , kolmogorovSmirnovD , kolmogorovSmirnov2D -- * Probabilities , kolmogorovSmirnovProbability -- * Data types , TestType(..) , TestResult(..)
-- * References -- $references ) where import Control.Monad import Control.Monad.ST (ST) import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as M import Statistics.Distribution (Distribution(..)) import Statistics.Types (Sample) import Statistics.Function (sort) import Statistics.Test.Types import Text.Printf ---------------------------------------------------------------- -- Test ---------------------------------------------------------------- -- | Check whether a sample could have been drawn from the given -- distribution. 'Significant' means the distribution is not -- compatible with the data for the given p-value. -- -- This test uses the Marsaglia-Tsang-Wang exact algorithm for -- calculation of the p-value. kolmogorovSmirnovTest :: Distribution d => d -- ^ Distribution -> Double -- ^ p-value -> Sample -- ^ Data sample -> TestResult kolmogorovSmirnovTest d = kolmogorovSmirnovTestCdf (cumulative d) {-# INLINE kolmogorovSmirnovTest #-} -- | Variant of 'kolmogorovSmirnovTest' which takes the CDF supplied -- as a function. kolmogorovSmirnovTestCdf :: (Double -> Double) -- ^ CDF of distribution -> Double -- ^ p-value -> Sample -- ^ Data sample -> TestResult kolmogorovSmirnovTestCdf cdf p sample | p > 0 && p < 1 = significant $ 1 - prob < p | otherwise = error "Statistics.Test.KolmogorovSmirnov.kolmogorovSmirnovTestCdf: bad p-value" where d = kolmogorovSmirnovCdfD cdf sample prob = kolmogorovSmirnovProbability (U.length sample) d -- | Two sample Kolmogorov-Smirnov test. It tests whether two data -- samples could be described by the same distribution without -- making any assumptions about it. -- -- This test uses an approximate formula for computing the p-value. kolmogorovSmirnovTest2 :: Double -- ^ p-value -> Sample -- ^ Sample 1 -> Sample -- ^ Sample 2 -> TestResult kolmogorovSmirnovTest2 p xs1 xs2 | p > 0 && p < 1 = significant $ 1 - prob( d*(en + 0.12 + 0.11/en) ) < p | otherwise = error "Statistics.Test.KolmogorovSmirnov.kolmogorovSmirnovTest2: bad p-value" where d = kolmogorovSmirnov2D xs1 xs2 -- Effective number of data points n1 = fromIntegral (U.length xs1) n2 = fromIntegral (U.length xs2) en = sqrt $ n1 * n2 / (n1 + n2) -- prob z | z < 0 = error "kolmogorovSmirnov2D: internal error" | z == 0 = 1 | z < 1.18 = let y = exp( -1.23370055013616983 / (z*z) ) in 2.25675833419102515 * sqrt( -log(y) ) * (y + y**9 + y**25 + y**49) | otherwise = let x = exp(-2 * z * z) in 1 - 2*(x - x**4 + x**9) -- FIXME: Find source for approximation for D ---------------------------------------------------------------- -- Kolmogorov's statistic ---------------------------------------------------------------- -- | Calculate Kolmogorov's statistic /D/ for the given cumulative -- distribution function (CDF) and data sample. Returns 0 if the -- sample is empty. kolmogorovSmirnovCdfD :: (Double -> Double) -- ^ CDF function -> Sample -- ^ Sample -> Double kolmogorovSmirnovCdfD cdf sample | U.null xs = 0 | otherwise = U.maximum $ U.zipWith3 (\p a b -> abs (p-a) `max` abs (p-b)) ps steps (U.tail steps) where xs = sort sample n = U.length xs -- ps = U.map cdf xs steps = U.map ((/ fromIntegral n) . fromIntegral) $ U.generate (n+1) id -- | Calculate Kolmogorov's statistic /D/ for the given distribution -- and data sample. Returns 0 if the sample is empty. kolmogorovSmirnovD :: (Distribution d) => d -- ^ Distribution -> Sample -- ^ Sample -> Double kolmogorovSmirnovD d = kolmogorovSmirnovCdfD (cumulative d) {-# INLINE kolmogorovSmirnovD #-} -- | Calculate Kolmogorov's statistic /D/ for two data samples.
Returns -- 0 if either sample is empty. kolmogorovSmirnov2D :: Sample -- ^ First sample -> Sample -- ^ Second sample -> Double kolmogorovSmirnov2D sample1 sample2 | U.null sample1 || U.null sample2 = 0 | otherwise = worker 0 0 0 where xs1 = sort sample1 xs2 = sort sample2 n1 = U.length xs1 n2 = U.length xs2 en1 = fromIntegral n1 en2 = fromIntegral n2 -- Find new index skip x i xs = go (i+1) where go n | n >= U.length xs = n | xs U.! n == x = go (n+1) | otherwise = n -- Main loop worker d i1 i2 | i1 >= n1 || i2 >= n2 = d | otherwise = worker d' i1' i2' where d1 = xs1 U.! i1 d2 = xs2 U.! i2 i1' | d1 <= d2 = skip d1 i1 xs1 | otherwise = i1 i2' | d2 <= d1 = skip d2 i2 xs2 | otherwise = i2 d' = max d (abs $ fromIntegral i1' / en1 - fromIntegral i2' / en2) -- | Calculate the cumulative probability function for Kolmogorov's -- distribution with parameter /n/, i.e. the probability of getting a -- value smaller than /d/ from an /n/-element sample. -- -- It uses the algorithm by Marsaglia et al. and provides at least -- 7-digit accuracy. kolmogorovSmirnovProbability :: Int -- ^ Size of the sample -> Double -- ^ D value -> Double kolmogorovSmirnovProbability n d -- Avoid potentially lengthy calculations for large N and D > 0.999 | s > 7.24 || (s > 3.76 && n > 99) = 1 - 2 * exp( -(2.000071 + 0.331 / sqrt n' + 1.409 / n') * s) -- Exact computation | otherwise = fini $ matrixPower matrix n where s = n' * d * d n' = fromIntegral n size = 2*k - 1 k = floor (n' * d) + 1 h = fromIntegral k - n' * d -- Calculate initial matrix matrix = let m = U.create $ do mat <- M.new (size*size) -- Fill matrix with 0 and 1s for 0 size $ \row -> for 0 size $ \col -> do let val | row + 1 >= col = 1 | otherwise = 0 :: Double M.write mat (row * size + col) val -- Correct left column/bottom row for 0 size $ \i -> do let delta = h ^^ (i + 1) modify mat (i * size) (subtract delta) modify mat (size * size - 1 - i) (subtract delta) -- Correct corner element if needed when (2*h > 1) $ do modify mat ((size - 1) * size) (+ ((2*h - 1) ^ size)) -- Divide diagonals by factorial let divide g num | num == size = return () | otherwise = do for num size $ \i -> modify mat (i * (size + 1) - num) (/ g) divide (g * fromIntegral (num+2)) (num+1) divide 2 1 return mat in Matrix size m 0 -- Last calculation fini m@(Matrix _ _ e) = loop 1 (matrixCenter m) e where loop i ss eQ | i > n = ss * 10 ^^ eQ | ss' < 1e-140 = loop (i+1) (ss' * 1e140) (eQ - 140) | otherwise = loop (i+1) ss' eQ where ss' = ss * fromIntegral i / fromIntegral n ---------------------------------------------------------------- -- Matrix operations. -- -- There is no matrix package for Haskell yet, so the necessary -- minimum is implemented here. -- Square matrix stored in row-major order data Matrix = Matrix {-# UNPACK #-} !Int -- Size of matrix !(U.Vector Double) -- Matrix data {-# UNPACK #-} !Int -- In order to avoid overflows -- during matrix multiplication a large -- exponent is stored separately -- Show instance useful mostly for debugging instance Show Matrix where show (Matrix n vs _) = unlines $ map (unwords . map (printf "%.4f")) $ split $ U.toList vs where split [] = [] split xs = row : split rest where (row, rest) = splitAt n xs -- Avoid overflow in the matrix avoidOverflow :: Matrix -> Matrix avoidOverflow m@(Matrix n xs e) | matrixCenter m > 1e140 = Matrix n (U.map (* 1e-140) xs) (e + 140) | otherwise = m -- Unsafe matrix-matrix multiplication. Matrices must be of the same -- size. This is not checked.
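-- -- As an illustrative sketch (not part of the library): squaring the -- 2x2 row-major matrix [[1,2],[3,4]] with a zero stored exponent, -- -- > matrixMultiply (Matrix 2 (U.fromList [1,2,3,4]) 0) -- >                (Matrix 2 (U.fromList [1,2,3,4]) 0) -- -- should produce the row-major data [7,10,15,22] with exponent 0; -- matrixPower below builds m^n out of such products by repeated -- squaring.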
matrixMultiply :: Matrix -> Matrix -> Matrix matrixMultiply (Matrix n xs e1) (Matrix _ ys e2) = Matrix n (U.generate (n*n) go) (e1 + e2) where go i = U.sum $ U.zipWith (*) row col where nCol = i `rem` n row = U.slice (i - nCol) n xs col = U.backpermute ys $ U.enumFromStepN nCol n n -- Raise matrix to power N. The power must be positive; this is not -- checked. matrixPower :: Matrix -> Int -> Matrix matrixPower mat 1 = mat matrixPower mat n = avoidOverflow res where mat2 = matrixPower mat (n `quot` 2) pow = matrixMultiply mat2 mat2 res | odd n = matrixMultiply pow mat | otherwise = pow -- Element in the center of the matrix (not corrected for exponent) matrixCenter :: Matrix -> Double matrixCenter (Matrix n xs _) = (U.!) xs (k*n + k) where k = n `quot` 2 -- Simple for loop for :: Monad m => Int -> Int -> (Int -> m ()) -> m () for n0 n f = loop n0 where loop i | i == n = return () | otherwise = f i >> loop (i+1) -- Modify element in the vector modify :: U.Unbox a => M.MVector s a -> Int -> (a -> a) -> ST s () modify arr i f = do x <- M.read arr i M.write arr i (f x) {-# INLINE modify #-} ---------------------------------------------------------------- -- $references -- -- * G. Marsaglia, W. W. Tsang, J. Wang (2003) Evaluating Kolmogorov's -- distribution, Journal of Statistical Software, American -- Statistical Association, vol. 8(i18). statistics-0.10.2.0/Statistics/Test/MannWhitneyU.hs0000644000000000000000000002316012016036043020300 0ustar0000000000000000-- | -- Module : Statistics.Test.MannWhitneyU -- Copyright : (c) 2010 Neil Brown -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Mann-Whitney U test (also known as the Mann-Whitney-Wilcoxon and -- Wilcoxon rank sum test) is a non-parametric test for assessing -- whether two samples of independent observations have different -- means. module Statistics.Test.MannWhitneyU ( -- * Mann-Whitney U test mannWhitneyUtest , mannWhitneyU , mannWhitneyUCriticalValue , mannWhitneyUSignificant -- ** Wilcoxon rank sum test , wilcoxonRankSums -- * Data types , TestType(..) , TestResult(..) -- * References -- $references ) where import Control.Applicative ((<$>)) import Data.List (findIndex) import Data.Ord (comparing) import qualified Data.Vector.Unboxed as U import Numeric.SpecFunctions (choose) import Statistics.Distribution (quantile) import Statistics.Distribution.Normal (standard) import Statistics.Types (Sample) import Statistics.Function (sortBy) import Statistics.Test.Types import Statistics.Test.Internal -- | The Wilcoxon Rank Sums Test. -- -- This test calculates the sum of ranks for the given two samples. The samples -- are ordered, and assigned ranks (ties are given their average rank), then these -- ranks are summed for each sample. -- -- The return value is (W₁, W₂) where W₁ is the sum of ranks of the first sample -- and W₂ is the sum of ranks of the second sample. This test is trivially transformed -- into the Mann-Whitney U test. You will probably want to use 'mannWhitneyU' -- and the related functions for testing significance, but this function is exposed -- for completeness. wilcoxonRankSums :: Sample -> Sample -> (Double, Double) wilcoxonRankSums xs1 xs2 = ( U.sum ranks1 , U.sum ranks2 ) where -- Ranks for each sample (ranks1,ranks2) = splitByTags $ U.zip tags (rank (==) joinSample) -- Sorted and tagged sample (tags,joinSample) = U.unzip $ sortBy (comparing snd) $ tagSample True xs1 U.++ tagSample False xs2 -- Add tag to a sample tagSample t = U.map ((,) t) -- | The Mann-Whitney U Test.
-- -- This is sometimes known as the Mann-Whitney-Wilcoxon U test, and -- confusingly many sources state that the Mann-Whitney U test is the same as -- the Wilcoxon's rank sum test (which is provided as 'wilcoxonRankSums'). -- The Mann-Whitney U is a simple transform of Wilcoxon's rank sum test. -- -- Again confusingly, different sources state reversed definitions for U₁ -- and U₂, so it is worth being explicit about what this function returns. -- Given two samples, the first, xs₁, of size n₁ and the second, xs₂, -- of size n₂, this function returns (U₁, U₂) -- where U₁ = W₁ - (n₁(n₁+1))\/2 -- and U₂ = W₂ - (n₂(n₂+1))\/2, -- where (W₁, W₂) is the return value of @wilcoxonRankSums xs1 xs2@. -- -- Some sources instead state that U₁ and U₂ should be the other way round, often -- expressing this using U₁' = n₁n₂ - U₁ (since U₁ + U₂ = n₁n₂). -- -- All of which you probably don't care about if you just feed this into 'mannWhitneyUSignificant'. mannWhitneyU :: Sample -> Sample -> (Double, Double) mannWhitneyU xs1 xs2 = (fst summedRanks - (n1*(n1 + 1))/2 ,snd summedRanks - (n2*(n2 + 1))/2) where n1 = fromIntegral $ U.length xs1 n2 = fromIntegral $ U.length xs2 summedRanks = wilcoxonRankSums xs1 xs2 -- | Calculates the critical value of Mann-Whitney U for the given sample -- sizes and significance level. -- -- This function returns the exact calculated value of U for all sample sizes; -- it does not use the normal approximation at all. Above sample size 20 it is -- generally recommended to use the normal approximation instead, but this function -- will calculate the higher critical values if you need them. -- -- The algorithm to generate these values is a faster, memoised version of the -- simple unoptimised generating function given in section 2 of \"The Mann Whitney -- Wilcoxon Distribution Using Linked Lists\" mannWhitneyUCriticalValue :: (Int, Int) -- ^ The sample size -> Double -- ^ The p-value (e.g. 0.05) for which you want the critical value. -> Maybe Int -- ^ The critical value (of U). mannWhitneyUCriticalValue (m, n) p | m < 1 || n < 1 = Nothing -- Sample must be nonempty | p >= 1 = Nothing -- Nonsensical p-value | p' <= 1 = Nothing -- p-value is too small. Null hypothesis couldn't be disproved | otherwise = findIndex (>= p') $ take (m*n) $ tail $ alookup !! (m+n-2) !! (min m n - 1) where mnCn = (m+n) `choose` n p' = mnCn * p {- -- Original function, without memoisation, from Cheung and Klotz: -- Double is needed to avoid integer overflows. a :: Int -> Int -> Int -> Double a u bigN m | u < 0 = 0 | u >= m * n = bigN `choose` m | m == 1 || n == 1 = fromIntegral (u + 1) | otherwise = a u (bigN - 1) m + a (u - n) (bigN - 1) (m-1) where n = bigN - m -} -- Memoised version of the original a function, above. -- -- Doubles are stored to avoid integer overflow: 32-bit Ints begin to -- overflow for bigN as small as 33 (64-bit ones at 66), while Doubles -- do not overflow to infinity until bigN=1029 -- -- -- outer list is indexed by big N - 2 -- inner list by (m-1) (we know m < bigN) -- innermost list by u -- -- So: (alookup !! (bigN - 2) !! (m - 1) ! u) == a u bigN m alookup :: [[[Double]]] alookup = gen 2 [1 : repeat 2] where gen bigN predBigNList = let bigNlist = [ [ amemoed u m | u <- [0 .. m*(bigN-m)] ] ++ repeat (bigN `choose` m) | m <- [1 .. (bigN-1)]] -- has bigN-1 elements in bigNlist : gen (bigN+1) bigNlist where amemoed :: Int -> Int -> Double amemoed u m | m == 1 || n == 1 = fromIntegral (u + 1) | otherwise = mList !! u + if u < n then 0 else predmList !!
(u-n) where n = bigN - m (predmList : mList : _) = drop (m-2) predBigNList -- Lists for m-1 and m respectively. i-th list corresponds to m=i+1 -- -- We know that predBigNList has bigN - 2 elements -- (and we know that n > 1 therefore bigN > m + 1) -- So bigN - 2 >= m, i.e. predBigNList must have at least m -- elements, so dropping (m-2) must leave at least 2 -- | Calculates whether the Mann Whitney U test is significant. -- -- If both sample sizes are less than or equal to 20, the exact U critical value -- (as calculated by 'mannWhitneyUCriticalValue') is used. If either sample is -- larger than 20, the normal approximation is used instead. -- -- If you use a one-tailed test, the test indicates whether the first sample is -- significantly larger than the second. If you want the opposite, simply reverse -- the order in both the sample size and the (U₁, U₂) pairs. mannWhitneyUSignificant :: TestType -- ^ Perform one-tailed test (see description above). -> (Int, Int) -- ^ The samples' size from which the (U₁,U₂) values were derived. -> Double -- ^ The p-value at which to test (e.g. 0.05) -> (Double, Double) -- ^ The (U₁, U₂) values from 'mannWhitneyU'. -> Maybe TestResult -- ^ Return 'Nothing' if the sample was too -- small to make a decision. mannWhitneyUSignificant test (in1, in2) p (u1, u2) -- Use normal approximation | in1 > 20 || in2 > 20 = let mean = n1 * n2 / 2 sigma = sqrt $ n1*n2*(n1 + n2 + 1) / 12 z = (mean - u1) / sigma in Just $ case test of OneTailed -> significant $ z < quantile standard p TwoTailed -> significant $ abs z > abs (quantile standard (p/2)) -- Use exact critical value | otherwise = do crit <- fromIntegral <$> mannWhitneyUCriticalValue (in1, in2) p return $ case test of OneTailed -> significant $ u2 <= crit TwoTailed -> significant $ min u1 u2 <= crit where n1 = fromIntegral in1 n2 = fromIntegral in2 -- | Perform the Mann-Whitney U test for two samples at the required -- significance level. For additional information check the documentation of -- 'mannWhitneyU' and 'mannWhitneyUSignificant'. This is just a helper -- function. -- -- The one-tailed test checks whether the first sample is significantly larger -- than the second; the two-tailed test checks whether they are significantly different. mannWhitneyUtest :: TestType -- ^ Perform one-tailed test (see description above). -> Double -- ^ The p-value at which to test (e.g. 0.05) -> Sample -- ^ First sample -> Sample -- ^ Second sample -> Maybe TestResult -- ^ Return 'Nothing' if the sample was too small to -- make a decision. mannWhitneyUtest oneTail p smp1 smp2 = mannWhitneyUSignificant oneTail (n1,n2) p $ mannWhitneyU smp1 smp2 where n1 = U.length smp1 n2 = U.length smp2 -- $references -- -- * Cheung, Y.K.; Klotz, J.H. (1997) The Mann Whitney Wilcoxon -- distribution using linked lists. /Statistica Sinica/ -- 7:805–813. -- . statistics-0.10.2.0/Statistics/Test/NonParametric.hs0000644000000000000000000000126512016036043020456 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Test.NonParametric -- Copyright : (c) 2010 Neil Brown -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Functions for performing non-parametric tests (i.e. tests without an assumption -- of underlying distribution).
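-- -- This module merely re-exports its replacements. For new code, a -- minimal sketch of the direct imports would be: -- -- > import Statistics.Test.MannWhitneyU (mannWhitneyUtest) -- > import Statistics.Test.WilcoxonT (wilcoxonMatchedPairTest)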
-- HADDOCK NOTE -- ₁ is 1 subscript -- ₂ is 2 subscript module Statistics.Test.NonParametric {-# DEPRECATED "Use S.Test.MannWhitneyU and S.Test.WilcoxonT instead" #-} ( module Statistics.Test.MannWhitneyU , module Statistics.Test.WilcoxonT ) where import Statistics.Test.MannWhitneyU import Statistics.Test.WilcoxonT statistics-0.10.2.0/Statistics/Test/Types.hs0000644000000000000000000000157212016036043017021 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} module Statistics.Test.Types ( TestType(..) , TestResult(..) , significant ) where import Data.Typeable (Typeable) -- | Test type. Exact meaning depends on a specific test. But -- generally it tests whether some statistic is too big (small) -- for 'OneTailed' or whether it is either too big or too small for 'TwoTailed' data TestType = OneTailed | TwoTailed deriving (Eq,Ord,Show,Typeable) -- | Result of hypothesis testing data TestResult = Significant -- ^ Null hypothesis should be rejected | NotSignificant -- ^ Data is compatible with hypothesis deriving (Eq,Ord,Show,Typeable) -- | Significant if parameter is 'True', not significant otherwise significant :: Bool -> TestResult significant True = Significant significant False = NotSignificant {-# INLINE significant #-} statistics-0.10.2.0/Statistics/Test/WilcoxonT.hs0000644000000000000000000002152412016036043017642 0ustar0000000000000000-- | -- Module : Statistics.Test.WilcoxonT -- Copyright : (c) 2010 Neil Brown -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Wilcoxon matched-pairs signed-rank test is a non-parametric test -- which can be used to test whether two related samples have different -- means. -- -- WARNING: the current implementation contains critical bugs -- module Statistics.Test.WilcoxonT ( -- * Wilcoxon signed-rank matched-pair test wilcoxonMatchedPairTest , wilcoxonMatchedPairSignedRank , wilcoxonMatchedPairSignificant , wilcoxonMatchedPairSignificance , wilcoxonMatchedPairCriticalValue -- * Data types , TestType(..) , TestResult(..) ) where import Control.Applicative ((<$>)) import Data.Function (on) import Data.List (findIndex) import Data.Ord (comparing) import qualified Data.Vector.Unboxed as U import Statistics.Types (Sample) import Statistics.Function (sortBy) import Statistics.Test.Types import Statistics.Test.Internal -- | The Wilcoxon matched-pairs signed-rank test. -- -- The value returned is the pair (T+, T-). T+ is the sum of positive ranks (the -- ranks of the differences where the first parameter is higher) whereas T- is -- the sum of negative ranks (the ranks of the differences where the second parameter is higher). -- These values mean little by themselves, and should be combined with the -- 'wilcoxonMatchedPairSignificant' function in this module to get a meaningful result. -- -- The samples are zipped together: if one is longer than the other, both are truncated -- to the length of the shorter sample. -- -- Note that: wilcoxonMatchedPairSignedRank == (\(x, y) -> (y, x)) .
flip wilcoxonMatchedPairSignedRank wilcoxonMatchedPairSignedRank :: Sample -> Sample -> (Double, Double) wilcoxonMatchedPairSignedRank a b = ( U.sum ranks1 , negate $ U.sum ranks2 ) where (ranks1, ranks2) = splitByTags $ U.zip tags (rank ((==) `on` abs) diffs) (tags,diffs) = U.unzip $ U.map (\x -> (x>0 , x)) -- Attach tags to distribution elements $ U.filter (/= 0.0) -- Remove equal elements $ sortBy (comparing abs) -- Sort the differences by absolute difference $ U.zipWith (-) a b -- Work out differences -- | The coefficients for x^0, x^1, x^2, etc, in the expression -- \prod_{r=1}^s (1 + x^r). See the Mitic paper for details. -- -- We can define: -- f(1) = 1 + x -- f(r) = (1 + x^r)*f(r-1) -- = f(r-1) + x^r * f(r-1) -- The effect of multiplying the equation by x^r is to shift -- all the coefficients by r down the list. -- -- This list will be processed lazily from the head. coefficients :: Int -> [Integer] coefficients 1 = [1, 1] -- 1 + x coefficients r = let coeffs = coefficients (r-1) (firstR, rest) = splitAt r coeffs in firstR ++ add rest coeffs where add (x:xs) (y:ys) = x + y : add xs ys add xs [] = xs add [] ys = ys -- This list will be processed lazily from the head. summedCoefficients :: Int -> [Double] summedCoefficients n | n < 1 = error "Statistics.Test.WilcoxonT.summedCoefficients: nonpositive sample size" | n > 1023 = error "Statistics.Test.WilcoxonT.summedCoefficients: sample is too large (see bug #18)" | otherwise = map fromIntegral $ scanl1 (+) $ coefficients n -- | Tests whether a given result from a Wilcoxon signed-rank matched-pairs test -- is significant at the given level. -- -- This function can perform a one-tailed or two-tailed test. If the first -- parameter to this function is 'TwoTailed', the test is performed two-tailed to -- check if the two samples differ significantly. If the first parameter is -- 'OneTailed', the check is performed one-tailed to decide whether the first sample -- (i.e. the first sample you passed to 'wilcoxonMatchedPairSignedRank') is -- greater than the second sample (i.e. the second sample you passed to -- 'wilcoxonMatchedPairSignedRank'). If you wish to perform a one-tailed test -- in the opposite direction, you can either pass the parameters in a different -- order to 'wilcoxonMatchedPairSignedRank', or simply swap the values in the resulting -- pair before passing them to this function. wilcoxonMatchedPairSignificant :: TestType -- ^ Perform one- or two-tailed test (see description above). -> Int -- ^ The sample size from which the (T+,T-) values were derived. -> Double -- ^ The p-value at which to test (e.g. 0.05) -> (Double, Double) -- ^ The (T+, T-) values from 'wilcoxonMatchedPairSignedRank'. -> Maybe TestResult -- ^ Return 'Nothing' if the sample was too -- small to make a decision. wilcoxonMatchedPairSignificant test sampleSize p (tPlus, tMinus) = case test of -- According to my nearest book (Understanding Research Methods and Statistics -- by Gary W. Heiman, p590), to check that the first sample is bigger you must -- use the absolute value of T- for a one-tailed check: OneTailed -> (significant . (abs tMinus <=) . fromIntegral) <$> wilcoxonMatchedPairCriticalValue sampleSize p -- Otherwise you must use the value of T+ and T- with the smallest absolute value: TwoTailed -> (significant . (t <=) . fromIntegral) <$> wilcoxonMatchedPairCriticalValue sampleSize (p/2) where t = min (abs tPlus) (abs tMinus) -- | Obtains the critical value of T to compare against, given a sample size -- and a p-value (significance level).
Your T value must be less than or -- equal to the return of this function in order for the test to work out -- significant. If there is a Nothing return, the sample size is too small to -- make a decision. -- -- 'wilcoxonMatchedPairSignificant' tests the return value of -- 'wilcoxonMatchedPairSignedRank' for you, so you should use -- 'wilcoxonMatchedPairSignificant' for determining test results. -- However, this function is useful, for example, for generating lookup tables -- for Wilcoxon signed rank critical values. -- -- The return values of this function are generated using the method detailed in -- the paper \"Critical Values for the Wilcoxon Signed Rank Statistic\", Peter -- Mitic, The Mathematica Journal, volume 6, issue 3, 1996, which can be found -- here: . -- According to that paper, the results may differ from other published lookup tables, but -- (Mitic claims) the values obtained by this function will be the correct ones. wilcoxonMatchedPairCriticalValue :: Int -- ^ The sample size -> Double -- ^ The p-value (e.g. 0.05) for which you want the critical value. -> Maybe Int -- ^ The critical value (of T), or Nothing if -- the sample is too small to make a decision. wilcoxonMatchedPairCriticalValue sampleSize p = case critical of Just n | n < 0 -> Nothing | otherwise -> Just n Nothing -> Just maxBound -- shouldn't happen: beyond end of list where m = (2 ** fromIntegral sampleSize) * p critical = subtract 1 <$> findIndex (> m) (summedCoefficients sampleSize) -- | Works out the significance level (p-value) of a T value, given a sample -- size and a T value from the Wilcoxon signed-rank matched-pairs test. -- -- See the notes on 'wilcoxonMatchedPairCriticalValue' for how this is calculated. wilcoxonMatchedPairSignificance :: Int -- ^ The sample size -> Double -- ^ The value of T for which you want the significance. -> Double -- ^ The significance (p-value). wilcoxonMatchedPairSignificance sampleSize rnk = (summedCoefficients sampleSize !! floor rnk) / 2 ** fromIntegral sampleSize -- | The Wilcoxon matched-pairs signed-rank test. The samples are -- zipped together: if one is longer than the other, both are -- truncated to the length of the shorter sample. -- -- For a one-tailed test it tests whether the first sample is -- significantly greater than the second. For a two-tailed test it -- checks whether they differ significantly. -- -- Check 'wilcoxonMatchedPairSignedRank' and -- 'wilcoxonMatchedPairSignificant' for additional information. wilcoxonMatchedPairTest :: TestType -- ^ Perform one- or two-tailed test. -> Double -- ^ The p-value at which to test (e.g. 0.05) -> Sample -- ^ First sample -> Sample -- ^ Second sample -> Maybe TestResult -- ^ Return 'Nothing' if the sample was too -- small to make a decision.
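-- -- A hedged usage sketch (the sample values below are invented for -- illustration): -- -- > import qualified Data.Vector.Unboxed as U -- > wilcoxonMatchedPairTest TwoTailed 0.05 -- >   (U.fromList [125,115,130,140]) -- >   (U.fromList [110,122,125,120])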
wilcoxonMatchedPairTest test p smp1 smp2 = wilcoxonMatchedPairSignificant test (min n1 n2) p $ wilcoxonMatchedPairSignedRank smp1 smp2 where n1 = U.length smp1 n2 = U.length smp2 statistics-0.10.2.0/tests/0000755000000000000000000000000012016036043013445 5ustar0000000000000000statistics-0.10.2.0/tests/tests.hs0000644000000000000000000000065612016036043015152 0ustar0000000000000000import Test.Framework (defaultMain) import Tests.Distribution import Tests.NonparametricTest import qualified Tests.Transform import qualified Tests.Function import qualified Tests.KDE main :: IO () main = defaultMain [ distributionTests , nonparametricTests , Tests.Transform.tests , Tests.Function.tests , Tests.KDE.tests ] statistics-0.10.2.0/tests/Tests/0000755000000000000000000000000012016036043014547 5ustar0000000000000000statistics-0.10.2.0/tests/Tests/Distribution.hs0000644000000000000000000002523212016036043017566 0ustar0000000000000000{-# OPTIONS_GHC -fno-warn-orphans #-} {-# LANGUAGE ScopedTypeVariables #-} -- Required for Param {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE OverlappingInstances #-} {-# LANGUAGE ViewPatterns #-} module Tests.Distribution ( distributionTests ) where import Control.Applicative import Control.Exception import Data.List (find) import Data.Typeable (Typeable) import qualified Numeric.IEEE as IEEE import Test.Framework (Test,testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck as QC import Test.QuickCheck.Monadic as QC import Text.Printf import Statistics.Distribution import Statistics.Distribution.Beta import Statistics.Distribution.Binomial import Statistics.Distribution.ChiSquared import Statistics.Distribution.CauchyLorentz import Statistics.Distribution.Exponential import Statistics.Distribution.FDistribution import Statistics.Distribution.Gamma import Statistics.Distribution.Geometric import Statistics.Distribution.Hypergeometric import Statistics.Distribution.Normal import Statistics.Distribution.Poisson import Statistics.Distribution.StudentT import Statistics.Distribution.Uniform import Prelude hiding (catch) import Tests.Helpers -- | Tests for all distributions distributionTests :: Test distributionTests = testGroup "Tests for all distributions" [ contDistrTests (T :: T BetaDistribution ) , contDistrTests (T :: T CauchyDistribution ) , contDistrTests (T :: T ChiSquared ) , contDistrTests (T :: T ExponentialDistribution ) , contDistrTests (T :: T GammaDistribution ) , contDistrTests (T :: T NormalDistribution ) , contDistrTests (T :: T UniformDistribution ) , contDistrTests (T :: T StudentT ) , contDistrTests (T :: T FDistribution ) , discreteDistrTests (T :: T BinomialDistribution ) , discreteDistrTests (T :: T GeometricDistribution ) , discreteDistrTests (T :: T HypergeometricDistribution ) , discreteDistrTests (T :: T PoissonDistribution ) , unitTests ] ---------------------------------------------------------------- -- Tests ---------------------------------------------------------------- -- Tests for continuous distributions contDistrTests :: (Param d, ContDistr d, QC.Arbitrary d, Typeable d, Show d) => T d -> Test contDistrTests t = testGroup ("Tests for: " ++ typeName t) $ cdfTests t ++ [ testProperty "PDF sanity" $ pdfSanityCheck t , testProperty "Quantile is CDF inverse" $ quantileIsInvCDF t , testProperty "quantile fails p<0||p>1" $ quantileShouldFail t ] -- Tests for discrete distributions discreteDistrTests :: (Param d, DiscreteDistr d, QC.Arbitrary d, Typeable d, Show d) => T d -> Test discreteDistrTests t = testGroup
("Tests for: " ++ typeName t) $ cdfTests t ++ [ testProperty "Prob. sanity" $ probSanityCheck t , testProperty "CDF is sum of prob." $ discreteCDFcorrect t ] -- Tests for distributions which have CDF cdfTests :: (Param d, Distribution d, QC.Arbitrary d, Show d) => T d -> [Test] cdfTests t = [ testProperty "C.D.F. sanity" $ cdfSanityCheck t , testProperty "CDF limit at +∞" $ cdfLimitAtPosInfinity t , testProperty "CDF limit at -∞" $ cdfLimitAtNegInfinity t , testProperty "CDF is nondecreasing" $ cdfIsNondecreasing t , testProperty "1-CDF is correct" $ cdfComplementIsCorrect t ] ---------------------------------------------------------------- -- CDF is in [0,1] range cdfSanityCheck :: (Distribution d) => T d -> d -> Double -> Bool cdfSanityCheck _ d x = c >= 0 && c <= 1 where c = cumulative d x -- CDF never decreases cdfIsNondecreasing :: (Distribution d) => T d -> d -> Double -> Double -> Bool cdfIsNondecreasing _ d = monotonicallyIncreasesIEEE $ cumulative d -- CDF limit at +∞ is 1 cdfLimitAtPosInfinity :: (Param d, Distribution d) => T d -> d -> Property cdfLimitAtPosInfinity _ d = okForInfLimit d ==> printTestCase ("Last elements: " ++ show (drop 990 probs)) $ Just 1.0 == (find (>=1) probs) where probs = take 1000 $ map (cumulative d) $ iterate (*1.4) 1 -- CDF limit at -∞ is 0 cdfLimitAtNegInfinity :: (Param d, Distribution d) => T d -> d -> Property cdfLimitAtNegInfinity _ d = okForInfLimit d ==> printTestCase ("Last elements: " ++ show (drop 990 probs)) $ case find (< IEEE.epsilon) probs of Nothing -> False Just p -> p >= 0 where probs = take 1000 $ map (cumulative d) $ iterate (*1.4) (-1) -- CDF's complement is implemented correctly cdfComplementIsCorrect :: (Distribution d) => T d -> d -> Double -> Bool cdfComplementIsCorrect _ d x = (eq 1e-14) 1 (cumulative d x + complCumulative d x) -- PDF is positive pdfSanityCheck :: (ContDistr d) => T d -> d -> Double -> Bool pdfSanityCheck _ d x = p >= 0 where p = density d x -- Quantile is inverse of CDF quantileIsInvCDF :: (Param d, ContDistr d) => T d -> d -> Double -> Property quantileIsInvCDF _ d (snd . properFraction -> p) = p > 0 && p < 1 ==> ( printTestCase (printf "Quantile = %g" q ) $ printTestCase (printf "Probability = %g" p ) $ printTestCase (printf "Probability' = %g" p') $ printTestCase (printf "Error = %e" (abs $ p - p')) $ abs (p - p') < invQuantilePrec d ) where q = quantile d p p' = cumulative d q -- Test that quantile fails if p<0 or p>1 quantileShouldFail :: (ContDistr d) => T d -> d -> Double -> Property quantileShouldFail _ d p = p < 0 || p > 1 ==> QC.monadicIO $ do r <- QC.run $ catch (do { return $! quantile d p; return False }) (\(e :: SomeException) -> return True) QC.assert r -- Probability is in [0,1] range probSanityCheck :: (DiscreteDistr d) => T d -> d -> Int -> Bool probSanityCheck _ d x = p >= 0 && p <= 1 where p = probability d x -- Check that discrete CDF is correct discreteCDFcorrect :: (DiscreteDistr d) => T d -> d -> Int -> Int -> Property discreteCDFcorrect _ d a b = printTestCase (printf "CDF = %g" p1) $ printTestCase (printf "Sum = %g" p2) $ printTestCase (printf "Δ = %g" (abs (p1 - p2))) $ abs (p1 - p2) < 3e-10 -- Avoid too large differeneces. Otherwise there is to much to sum -- -- Absolute difference is used guard againist precision loss when -- close values of CDF are subtracted where n = min a b m = n + (abs (a - b) `mod` 100) p1 = cumulative d (fromIntegral m + 0.5) - cumulative d (fromIntegral n - 0.5) p2 = sum $ map (probability d) [n .. 
m] ---------------------------------------------------------------- -- Arbitrary instances for distributions ---------------------------------------------------------------- instance QC.Arbitrary BinomialDistribution where arbitrary = binomial <$> QC.choose (1,100) <*> QC.choose (0,1) instance QC.Arbitrary ExponentialDistribution where arbitrary = exponential <$> QC.choose (0,100) instance QC.Arbitrary GammaDistribution where arbitrary = gammaDistr <$> QC.choose (0.1,10) <*> QC.choose (0.1,10) instance QC.Arbitrary BetaDistribution where arbitrary = betaDistr <$> QC.choose (1e-3,10) <*> QC.choose (1e-3,10) instance QC.Arbitrary GeometricDistribution where arbitrary = geometric <$> QC.choose (0,1) instance QC.Arbitrary HypergeometricDistribution where arbitrary = do l <- QC.choose (1,20) m <- QC.choose (0,l) k <- QC.choose (1,l) return $ hypergeometric m l k instance QC.Arbitrary NormalDistribution where arbitrary = normalDistr <$> QC.choose (-100,100) <*> QC.choose (1e-3, 1e3) instance QC.Arbitrary PoissonDistribution where arbitrary = poisson <$> QC.choose (0,1) instance QC.Arbitrary ChiSquared where arbitrary = chiSquared <$> QC.choose (1,100) instance QC.Arbitrary UniformDistribution where arbitrary = do a <- QC.arbitrary b <- QC.arbitrary `suchThat` (/= a) return $ uniformDistr a b instance QC.Arbitrary CauchyDistribution where arbitrary = cauchyDistribution <$> arbitrary <*> ((abs <$> arbitrary) `suchThat` (> 0)) instance QC.Arbitrary StudentT where arbitrary = studentT <$> ((abs <$> arbitrary) `suchThat` (>0)) instance QC.Arbitrary FDistribution where arbitrary = fDistribution <$> ((abs <$> arbitrary) `suchThat` (>0)) <*> ((abs <$> arbitrary) `suchThat` (>0)) -- Parameters for distribution testing. Some distributions require -- relaxing the parameters a bit class Param a where -- Precision for quantileIsInvCDF invQuantilePrec :: a -> Double invQuantilePrec _ = 1e-14 -- Distribution is OK for testing limits okForInfLimit :: a -> Bool okForInfLimit _ = True instance Param a instance Param StudentT where invQuantilePrec _ = 1e-13 okForInfLimit d = studentTndf d > 0.75 instance Param FDistribution where invQuantilePrec _ = 1e-12 ---------------------------------------------------------------- -- Unit tests ---------------------------------------------------------------- unitTests :: Test unitTests = testGroup "Unit tests" [ testAssertion "density (gammaDistr 150 1/150) 1 == 4.883311" $ 4.883311418525483 =~ (density (gammaDistr 150 (1/150)) 1) -- Student-T , testStudentPDF 0.3 1.34 0.0648215 -- PDF , testStudentPDF 1 0.42 0.27058 , testStudentPDF 4.4 0.33 0.352994 , testStudentCDF 0.3 3.34 0.757146 -- CDF , testStudentCDF 1 0.42 0.626569 , testStudentCDF 4.4 0.33 0.621739 -- F-distribution , testFdistrPDF 1 3 3 (1/(6 * pi)) -- PDF , testFdistrPDF 2 2 1.2 0.206612 , testFdistrPDF 10 12 8 0.000385613179281892790166 , testFdistrCDF 1 3 3 0.81830988618379067153 -- CDF , testFdistrCDF 2 2 1.2 0.545455 , testFdistrCDF 10 12 8 0.99935509863451408041 ] where -- Student-T testStudentPDF ndf x exact = testAssertion (printf "density (studentT %f) %f ≈ %f" ndf x exact) $ eq 1e-5 exact (density (studentT ndf) x) testStudentCDF ndf x exact = testAssertion (printf "cumulative (studentT %f) %f ≈ %f" ndf x exact) $ eq 1e-5 exact (cumulative (studentT ndf) x) -- F-distribution testFdistrPDF n m x exact = testAssertion (printf "density (fDistribution %i %i) %f ≈ %f [got %f]" n m x exact d) $ eq 1e-5 exact d where d = density (fDistribution n m) x testFdistrCDF n m x exact = testAssertion (printf "cumulative
(fDistribution %i %i) %f ≈ %f [got %f]" n m x exact d) $ eq 1e-5 exact d where d = cumulative (fDistribution n m) x statistics-0.10.2.0/tests/Tests/Function.hs0000644000000000000000000000075212016036043016674 0ustar0000000000000000module Tests.Function ( tests ) where import qualified Data.Vector.Unboxed as U import Data.Vector.Unboxed ((!)) import Test.QuickCheck import Test.Framework import Test.Framework.Providers.QuickCheck2 import Statistics.Function tests :: Test tests = testGroup "S.Function" [ testProperty "Sort is sort" p_sort ] p_sort :: [Double] -> Property p_sort xs = not (null xs) ==> U.all (uncurry (<=)) (U.zip v $ U.tail v) where v = sort $ U.fromList xs statistics-0.10.2.0/tests/Tests/Helpers.hs0000644000000000000000000000552312016036043016512 0ustar0000000000000000-- | Helpers for testing module Tests.Helpers ( -- * helpers T(..) , typeName , eq , eqC , (=~) -- * Generic QC tests , monotonicallyIncreases , monotonicallyIncreasesIEEE -- * HUnit helpers , testAssertion , testEquality ) where import Data.Complex import Data.Typeable import qualified Numeric.IEEE as IEEE import qualified Test.HUnit as HU import Test.Framework import Test.Framework.Providers.HUnit import Numeric.MathFunctions.Constants ---------------------------------------------------------------- -- Helpers ---------------------------------------------------------------- -- | Phantom typed value used to select right instance in QC tests data T a = T -- | String representation of type name typeName :: Typeable a => T a -> String typeName = show . typeOf . typeParam where typeParam :: T a -> a typeParam _ = undefined -- | Approximate equality for 'Double'. Doesn't work well for numbers -- which are almost zero. eq :: Double -- ^ Relative error -> Double -> Double -> Bool eq eps a b | a == 0 && b == 0 = True | otherwise = abs (a - b) <= eps * max (abs a) (abs b) -- | Approximate equality for 'Complex Double' eqC :: Double -- ^ Relative error -> Complex Double -> Complex Double -> Bool eqC eps a@(ar :+ ai) b@(br :+ bi) | a == 0 && b == 0 = True | otherwise = abs (ar - br) <= eps * d && abs (ai - bi) <= eps * d where d = max (realPart $ abs a) (realPart $ abs b) -- | Approximately equal up to 1 ulp (=~) :: Double -> Double -> Bool (=~) = eq m_epsilon ---------------------------------------------------------------- -- Generic QC ---------------------------------------------------------------- -- Check that function is nondecreasing monotonicallyIncreases :: (Ord a, Ord b) => (a -> b) -> a -> a -> Bool monotonicallyIncreases f x1 x2 = f (min x1 x2) <= f (max x1 x2) -- Check that function is nondecreasing taking rounding errors into -- account. -- -- In fact the function is allowed to decrease by less than one ulp in -- order to guard against problems with excess precision.
On x86 the FPU works -- with 80-bit numbers but doubles are 64-bit, so rounding happens -- whenever values are moved from registers to memory monotonicallyIncreasesIEEE :: (Ord a, IEEE.IEEE b) => (a -> b) -> a -> a -> Bool monotonicallyIncreasesIEEE f x1 x2 = y1 <= y2 || (y1 - y2) < y2 * IEEE.epsilon where y1 = f (min x1 x2) y2 = f (max x1 x2) ---------------------------------------------------------------- -- HUnit helpers ---------------------------------------------------------------- testAssertion :: String -> Bool -> Test testAssertion str cont = testCase str $ HU.assertBool str cont testEquality :: (Show a, Eq a) => String -> a -> a -> Test testEquality msg a b = testCase msg $ HU.assertEqual msg a b statistics-0.10.2.0/tests/Tests/KDE.hs0000644000000000000000000000210112016036043015500 0ustar0000000000000000-- | Tests for Kernel density estimates. module Tests.KDE ( tests )where import Data.Vector.Unboxed ((!)) import qualified Data.Vector.Unboxed as U import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck import Text.Printf import Statistics.Sample.KernelDensity tests :: Test tests = testGroup "KDE" [ testProperty "integral(PDF) == 1" t_densityIsPDF ] t_densityIsPDF :: [Double] -> Property t_densityIsPDF vec = not (null vec) ==> test where (xs,ys) = kde 4096 (U.fromList vec) step = (xs ! 1) - (xs ! 0) integral = integratePDF step ys -- test = printTestCase (printf "Integral %f" integral) $ abs (1 - integral) <= 1e-3 integratePDF :: Double -> U.Vector Double -> Double integratePDF step vec = step * U.sum (U.zipWith (*) vec weights) where n = U.length vec weights = U.generate n go where go i | i == 0 = 0.5 | i == n-1 = 0.5 | otherwise = 1statistics-0.10.2.0/tests/Tests/NonparametricTest.hs0000644000000000000000000002314412016036043020551 0ustar0000000000000000-- Tests for Statistics.Test.NonParametric module Tests.NonparametricTest ( nonparametricTests ) where import qualified Data.Vector.Unboxed as U import Test.HUnit (assertEqual) import Test.Framework import Test.Framework.Providers.HUnit import Statistics.Test.MannWhitneyU import Statistics.Test.WilcoxonT import Tests.Helpers import Tests.NonparametricTest.Table import Statistics.Test.KolmogorovSmirnov import Statistics.Distribution.Normal (standard) nonparametricTests :: Test nonparametricTests = testGroup "Nonparametric tests" $ concat [ mannWhitneyTests , wilcoxonSumTests , wilcoxonPairTests , kolmogorovSmirnovDTest ] ---------------------------------------------------------------- mannWhitneyTests :: [Test] mannWhitneyTests = zipWith test [(0::Int)..]
testData ++ [ testEquality "Mann-Whitney U Critical Values, m=1" (replicate (20*3) Nothing) [mannWhitneyUCriticalValue (1,x) p | x <- [1..20], p <- [0.005,0.01,0.025]] , testEquality "Mann-Whitney U Critical Values, m=2, p=0.025" (replicate 7 Nothing ++ map Just [0,0,0,0,1,1,1,1,1,2,2,2,2]) [mannWhitneyUCriticalValue (2,x) 0.025 | x <- [1..20]] , testEquality "Mann-Whitney U Critical Values, m=6, p=0.05" (replicate 1 Nothing ++ map Just [0, 2,3,5,7,8,10,12,14,16,17,19,21,23,25,26,28,30,32]) [mannWhitneyUCriticalValue (6,x) 0.05 | x <- [1..20]] , testEquality "Mann-Whitney U Critical Values, m=20, p=0.025" (replicate 1 Nothing ++ map Just [2,8,14,20,27,34,41,48,55,62,69,76,83,90,98,105,112,119,127]) [mannWhitneyUCriticalValue (20,x) 0.025 | x <- [1..20]] ] where test n (a, b, c, d) = testCase "Mann-Whitney" $ do assertEqual ("Mann-Whitney U " ++ show n) c us assertEqual ("Mann-Whitney U Sig " ++ show n) d ss where us = mannWhitneyU (U.fromList a) (U.fromList b) ss = mannWhitneyUSignificant TwoTailed (length a, length b) 0.05 us -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double), Maybe TestResult)] testData = [ ( [3,4,2,6,2,5] , [9,7,5,10,6,8] , (2, 34) , Just Significant ) , ( [540,480,600,590,605] , [760,890,1105,595,940] , (2, 23) , Just Significant ) , ( [19,22,16,29,24] , [20,11,17,12] , (17, 3) , Just NotSignificant ) , ( [126,148,85,61, 179,93, 45,189,85,93] , [194,128,69,135,171,149,89,248,79,137] , (35,65) , Just NotSignificant ) , ( [1..30] , [1..30] , (450,450) , Just NotSignificant ) , ( [1 .. 30] , [11.5 .. 40 ] , (190.0,710.0) , Just Significant ) ] wilcoxonSumTests :: [Test] wilcoxonSumTests = zipWith test [(0::Int)..] testData where test n (a, b, c) = testCase "Wilcoxon Sum" $ assertEqual ("Wilcoxon Sum " ++ show n) c (wilcoxonRankSums (U.fromList a) (U.fromList b)) -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double))] testData = [ ( [8.50,9.48,8.65,8.16,8.83,7.76,8.63] , [8.27,8.20,8.25,8.14,9.00,8.10,7.20,8.32,7.70] , (75, 61) ) , ( [0.45,0.50,0.61,0.63,0.75,0.85,0.93] , [0.44,0.45,0.52,0.53,0.56,0.58,0.58,0.65,0.79] , (71.5, 64.5) ) ] wilcoxonPairTests :: [Test] wilcoxonPairTests = zipWith test [(0::Int)..] 
testData ++ -- Taken from the Mitic paper: [ testAssertion "Sig 16, 35" (to4dp 0.0467 $ wilcoxonMatchedPairSignificance 16 35) , testAssertion "Sig 16, 36" (to4dp 0.0523 $ wilcoxonMatchedPairSignificance 16 36) , testEquality "Wilcoxon critical values, p=0.05" (replicate 4 Nothing ++ map Just [0,2,3,5,8,10,13,17,21,25,30,35,41,47,53,60,67,75,83,91,100,110,119]) [wilcoxonMatchedPairCriticalValue x 0.05 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.025" (replicate 5 Nothing ++ map Just [0,2,3,5,8,10,13,17,21,25,29,34,40,46,52,58,65,73,81,89,98,107]) [wilcoxonMatchedPairCriticalValue x 0.025 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.01" (replicate 6 Nothing ++ map Just [0,1,3,5,7,9,12,15,19,23,27,32,37,43,49,55,62,69,76,84,92]) [wilcoxonMatchedPairCriticalValue x 0.01 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.005" (replicate 7 Nothing ++ map Just [0,1,3,5,7,9,12,15,19,23,27,32,37,42,48,54,61,68,75,83]) [wilcoxonMatchedPairCriticalValue x 0.005 | x <- [1..27]] ] where test n (a, b, c) = testEquality ("Wilcoxon Paired " ++ show n) c res where res = (wilcoxonMatchedPairSignedRank (U.fromList a) (U.fromList b)) -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double))] testData = [ ([1..10], [1..10], (0, 0 )) , ([1..5], [6..10], (0, 5*(-3))) -- Worked example from the Internet: , ( [125,115,130,140,140,115,140,125,140,135] , [110,122,125,120,140,124,123,137,135,145] , ( sum $ filter (> 0) [7,-3,1.5,9,0,-4,8,-6,1.5,-5] , sum $ filter (< 0) [7,-3,1.5,9,0,-4,8,-6,1.5,-5] ) ) -- Worked examples from books/papers: , ( [2.4,1.9,2.3,1.9,2.4,2.5] , [2.0,2.1,2.0,2.0,1.8,2.0] , (18, -3) ) , ( [130,170,125,170,130,130,145,160] , [120,163,120,135,143,136,144,120] , (27, -9) ) , ( [540,580,600,680,430,740,600,690,605,520] , [760,710,1105,880,500,990,1050,640,595,520] , (3, -42) ) ] to4dp tgt x = x >= tgt - 0.00005 && x < tgt + 0.00005 ---------------------------------------------------------------- -- K-S test ---------------------------------------------------------------- kolmogorovSmirnovDTest :: [Test] kolmogorovSmirnovDTest = [ testAssertion "K-S D statistics" $ and [ eq 1e-6 (kolmogorovSmirnovD standard (toU sample)) reference | (reference,sample) <- tableKSD ] , testAssertion "K-S 2-sample statistics" $ and [ eq 1e-6 (kolmogorovSmirnov2D (toU xs) (toU ys)) reference | (reference,xs,ys) <- tableKS2D ] , testAssertion "K-S probability" $ and [ eq 1e-14 (kolmogorovSmirnovProbability n d) p | (d,n,p) <- testData ] ] where toU = U.fromList -- Test data for the calculation of cumulative probability -- P(D[n] < d). 
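-- -- For instance, the row (0.2, 3, 0.00177777777777778) below records -- that kolmogorovSmirnovProbability 3 0.2 should be roughly 1.78e-3; -- the "K-S probability" assertion above checks every row at 1e-14 -- relative accuracy.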
-- -- Test data is: -- (D[n], n, p) -- Table is generated using sample program from paper testData :: [(Double,Int,Double)] testData = [ (0.09 , 3, 0 ) , (0.2 , 3, 0.00177777777777778 ) , (0.301 , 3, 0.116357025777778 ) , (0.392 , 3, 0.383127210666667 ) , (0.5003 , 3, 0.667366306558667 ) , (0.604 , 3, 0.861569877333333 ) , (0.699 , 3, 0.945458198 ) , (0.802 , 3, 0.984475216 ) , (0.9 , 3, 0.998 ) , (0.09 , 5, 0 ) , (0.2 , 5, 0.0384 ) , (0.301 , 5, 0.33993786080016 ) , (0.392 , 5, 0.66931908083712 ) , (0.5003 , 5, 0.888397260183794 ) , (0.604 , 5, 0.971609957879808 ) , (0.699 , 5, 0.994331075994008 ) , (0.802 , 5, 0.999391366368064 ) , (0.9 , 5, 0.99998 ) , (0.09 , 8, 3.37615237575e-06 ) , (0.2 , 8, 0.151622071801758 ) , (0.301 , 8, 0.613891042670582 ) , (0.392 , 8, 0.871491561427005 ) , (0.5003 , 8, 0.977534089199071 ) , (0.604 , 8, 0.997473116268255 ) , (0.699 , 8, 0.999806082005123 ) , (0.802 , 8, 0.999995133786947 ) , (0.9 , 8, 0.99999998 ) , (0.09 , 10, 3.89639433093119e-05) , (0.2 , 10, 0.25128096 ) , (0.301 , 10, 0.732913126355935 ) , (0.392 , 10, 0.932185254518767 ) , (0.5003 , 10, 0.992276179340446 ) , (0.604 , 10, 0.999495533516769 ) , (0.699 , 10, 0.999979691783985 ) , (0.802 , 10, 0.999999801409237 ) , (0.09 , 20, 0.00794502217168886 ) , (0.2 , 20, 0.647279826376584 ) , (0.301 , 20, 0.958017466965765 ) , (0.392 , 20, 0.997206424843499 ) , (0.5003 , 20, 0.999962641414228 ) , (0.09 , 30, 0.0498147538075168 ) , (0.2 , 30, 0.842030838984526 ) , (0.301 , 30, 0.993403560017612 ) , (0.392 , 30, 0.99988478803318 ) , (0.09 , 100, 0.629367974413669 ) ] statistics-0.10.2.0/tests/Tests/Transform.hs0000644000000000000000000001275712016036043017072 0ustar0000000000000000{-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE ViewPatterns #-} module Tests.Transform ( tests ) where import Data.Bits ((.&.), shiftL) import Data.Complex (Complex((:+))) import Data.Functor ((<$>)) import Statistics.Function (within) import Statistics.Transform import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck (Positive(..),Property,Arbitrary(..),Gen,choose,vectorOf, printTestCase, quickCheck) import Text.Printf import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import Tests.Helpers tests :: Test tests = testGroup "fft" [ testProperty "t_impulse" t_impulse , testProperty "t_impulse_offset" t_impulse_offset , testProperty "ifft . fft = id" (t_fftInverse $ ifft . fft) , testProperty "fft . ifft = id" (t_fftInverse $ fft . ifft) , testProperty "idct . dct = id [up to scale]" (t_fftInverse (\v -> U.map (/ (2 * fromIntegral (U.length v))) $ idct $ dct v)) , testProperty "dct . 
idct = id [up to scale]" (t_fftInverse (\v -> U.map (/ (2 * fromIntegral (U.length v))) $ dct $ idct v)) -- Exact small size DCT -- 2 , testDCT [1,0] $ map (*2) [1, cos (pi/4) ] , testDCT [0,1] $ map (*2) [1, cos (3*pi/4) ] -- 4 , testDCT [1,0,0,0] $ map (*2) [1, cos( pi/8), cos( 2*pi/8), cos( 3*pi/8)] , testDCT [0,1,0,0] $ map (*2) [1, cos(3*pi/8), cos( 6*pi/8), cos( 9*pi/8)] , testDCT [0,0,1,0] $ map (*2) [1, cos(5*pi/8), cos(10*pi/8), cos(15*pi/8)] , testDCT [0,0,0,1] $ map (*2) [1, cos(7*pi/8), cos(14*pi/8), cos(21*pi/8)] -- Exact small size IDCT -- 2 , testIDCT [1,0] [1, 1 ] , testIDCT [0,1] $ map (*2) [cos(pi/4), cos(3*pi/4)] -- 4 , testIDCT [1,0,0,0] [1, 1, 1, 1 ] , testIDCT [0,1,0,0] $ map (*2) [cos( pi/8), cos( 3*pi/8), cos( 5*pi/8), cos( 7*pi/8) ] , testIDCT [0,0,1,0] $ map (*2) [cos( 2*pi/8), cos( 6*pi/8), cos(10*pi/8), cos(14*pi/8) ] , testIDCT [0,0,0,1] $ map (*2) [cos( 3*pi/8), cos( 9*pi/8), cos(15*pi/8), cos(21*pi/8) ] ] -- A single real-valued impulse at the beginning of an otherwise zero -- vector should be replicated in every real component of the result, -- and all the imaginary components should be zero. t_impulse :: Double -> Positive Int -> Bool t_impulse k (Positive m) = G.all (c_near i) (fft v) where v = i `G.cons` G.replicate (n-1) 0 i = k :+ 0 n = 1 `shiftL` (m .&. 6) -- If a real-valued impulse is offset from the beginning of an -- otherwise zero vector, the sum-of-squares of each component of the -- result should equal the square of the impulse. t_impulse_offset :: Double -> Positive Int -> Positive Int -> Bool t_impulse_offset k (Positive x) (Positive m) = G.all ok (fft v) where v = G.concat [G.replicate xn 0, G.singleton i, G.replicate (n-xn-1) 0] ok (re :+ im) = within ulps (re*re + im*im) (k*k) i = k :+ 0 xn = x `rem` n n = 1 `shiftL` (m .&. 6) -- Test that (ifft . fft ≈ id) -- -- Approximate equality here is tricky. Smaller values of the vector -- tend to have large relative error, so we test that the vectors are -- approximately equal as a whole. t_fftInverse :: (HasNorm (U.Vector a), U.Unbox a, Num a, Show a, Arbitrary a) => (U.Vector a -> U.Vector a) -> Property t_fftInverse roundtrip = do x <- genFftVector let n = G.length x x' = roundtrip x d = G.zipWith (-) x x' nd = vectorNorm d nx = vectorNorm x id $ printTestCase "Original vector" $ printTestCase (show x ) $ printTestCase "Transformed one" $ printTestCase (show x') $ printTestCase (printf "Length = %i" n) $ printTestCase (printf "|x - x'| / |x| = %.6g" (nd / nx)) $ nd <= 3e-14 * nx -- Test discrete cosine transform testDCT :: [Double] -> [Double] -> Test testDCT (U.fromList -> vec) (U.fromList -> res) = testAssertion ("DCT test for " ++ show vec) $ vecEqual 3e-14 (dct vec) res -- Test inverse discrete cosine transform testIDCT :: [Double] -> [Double] -> Test testIDCT (U.fromList -> vec) (U.fromList -> res) = testAssertion ("IDCT test for " ++ show vec) $ vecEqual 3e-14 (idct vec) res ---------------------------------------------------------------- -- With an error tolerance of 8 ULPs, a million QuickCheck tests are -- likely to all succeed. With a tolerance of 7, we fail around the -- half million mark.
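-- -- A rough sketch of what this tolerance means (the exact semantics of -- 'within' are assumed from Statistics.Function): adjacent doubles -- near 1.0 differ by about 2.2e-16, so an 8-ULP tolerance treats -- numbers within roughly 8 * 2.2e-16 of 1.0 as equal to it.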
ulps :: Int ulps = 8 c_near :: CD -> CD -> Bool c_near (a :+ b) (c :+ d) = within ulps a c && within ulps b d -- Arbitrary vector for FFT or DCT genFftVector :: (U.Unbox a, Arbitrary a) => Gen (U.Vector a) genFftVector = do n <- (2^) <$> choose (1,9::Int) -- Size of vector G.fromList <$> vectorOf n arbitrary -- Vector to transform -- Ad-hoc type class for calculation of vector norm class HasNorm a where vectorNorm :: a -> Double instance HasNorm (U.Vector Double) where vectorNorm = sqrt . U.sum . U.map (\x -> x*x) instance HasNorm (U.Vector CD) where vectorNorm = sqrt . U.sum . U.map (\(x :+ y) -> x*x + y*y) -- Approximate equality for vectors vecEqual :: Double -> U.Vector Double -> U.Vector Double -> Bool vecEqual ε v u = vectorNorm (U.zipWith (-) v u) < ε * vectorNorm v statistics-0.10.2.0/tests/Tests/Math/0000755000000000000000000000000012016036043015440 5ustar0000000000000000statistics-0.10.2.0/tests/Tests/Math/gen.py0000644000000000000000000000264412016036043016571 0ustar0000000000000000#!/usr/bin/python """ """ from mpmath import * def printListLiteral(lines) : print " [" + "\n , ".join(lines) + "\n ]" ################################################################ # Generate header print "module Tests.Math.Tables where" print ################################################################ ## Generate table for logGamma print "tableLogGamma :: [(Double,Double)]" print "tableLogGamma =" gammaArg = [ 1.25e-6, 6.82e-5, 2.46e-4, 8.8e-4, 3.12e-3, 2.67e-2, 7.77e-2, 0.234, 0.86, 1.34, 1.89, 2.45, 3.65, 4.56, 6.66, 8.25, 11.3, 25.6, 50.4, 123.3, 487.4, 853.4, 2923.3, 8764.3, 1.263e4, 3.45e4, 8.234e4, 2.348e5, 8.343e5, 1.23e6, ] printListLiteral( [ '(%.15f, %.20g)' % (x, log(gamma(x))) for x in gammaArg ] ) ################################################################ ## Generate table for incompleteBeta print "tableIncompleteBeta :: [(Double,Double,Double,Double)]" print "tableIncompleteBeta =" incompleteBetaArg = [ (2, 3, 0.03), (2, 3, 0.23), (2, 3, 0.76), (4, 2.3, 0.89), (1, 1, 0.55), (0.3, 12.2, 0.11), (13.1, 9.8, 0.12), (13.1, 9.8, 0.42), (13.1, 9.8, 0.92), ] printListLiteral( [ '(%.15f, %.15f, %.15f, %.20g)' % (p,q,x, betainc(p,q,0,x, regularized=True)) for (p,q,x) in incompleteBetaArg ]) statistics-0.10.2.0/tests/Tests/NonparametricTest/0000755000000000000000000000000012016036043020211 5ustar0000000000000000statistics-0.10.2.0/tests/Tests/NonparametricTest/Table.hs0000644000000000000000000001326712016036043021601 0ustar0000000000000000module Tests.NonparametricTest.Table where -- Table for Kolmogorov-Smirnov statistics for the standard normal -- distribution. Generated using R. -- -- The first element of each tuple is D; the second is the sample for -- which it was calculated.
statistics-0.10.2.0/tests/Tests/NonparametricTest/0000755000000000000000000000000012016036043020211 5ustar0000000000000000
statistics-0.10.2.0/tests/Tests/NonparametricTest/Table.hs0000644000000000000000000001326712016036043021605 0ustar0000000000000000
module Tests.NonparametricTest.Table where

-- Table of Kolmogorov-Smirnov statistics for the standard normal
-- distribution, generated using R.
--
-- The first element of each tuple is the D statistic; the second is
-- the sample for which it was calculated.
tableKSD :: [(Double,[Double])]
tableKSD =
  [ (0.2012078, [1.360645,-0.3151904,-1.245443,0.1741977,-0.1421206,-1.798246,1.171594,-1.335844,-5.050093e-2,1.030063,-1.849005,0.6491455,-0.7028004])
  , (0.2569956, [0.3884734,-1.227821,-0.4166262,0.429118,-0.9280124,0.8025867,-0.6703089,-0.2124872,0.1224496,0.1087734,-4.285284e-2,-1.039936,-0.7071956])
  , (0.1960356, [-1.814745,-0.6327167,0.7082493,0.6264716,1.02061,-0.4094635,0.821026,-0.4255047,-0.4820728,-0.2239833,0.648517,1.114283,0.3610216])
  , (0.2095746, [0.187011,0.1805498,0.4448389,0.6065506,0.2308673,0.5292549,-1.489902,-1.455191,0.5449396,-0.1436403,-0.7977073,-0.2693545,0.8260888,-1.474473,-2.158696e-2,-0.1455387])
  , (0.1922603, [0.5772317,-1.255561,1.605823,0.4923361,0.2470848,1.176101,-0.3767689,-0.6896885,0.4509345,-0.5048447,0.9436534,1.025599,0.2998393,-3.415219e-2,1.264315,-1.44433,-1.646449e-2])
  , (0.2173401, [1.812807,-0.8687497,-0.5710508,1.003647,1.142621,0.6546577,-6.083323e-3,1.628574e-2,1.067499,-1.953143,-0.6060077,1.90859,-0.7480553,0.6715162,-0.928759,1.862,1.604621,-0.2171044,-0.1835918])
  , (0.2510541, [-0.4769572,1.062319,0.9952284,1.198086,1.015589,-0.4154523,-0.6711762,1.202902,0.2217098,5.381759e-2,0.6679715,0.2551287,-0.1371492])
  , (0.1996022, [1.158607,-0.7354863,1.526559,-0.7855418,-2.82999,-0.6045106,-0.1830228,0.3306812,-0.819657,-1.223715,0.2536423,-0.4155781,1.447042])
  , (0.2284761, [1.239965,0.8187093,0.5199788,1.172072,0.748259,1.869376e-2,0.1625921,-1.712065,0.7043582,-1.702702,-0.4792806,-0.1023351,0.1187189])
  , (0.2337866, [0.9417261,-0.1024297,-0.7354359,1.099991,0.801984,-0.3745397,-1.749564,1.795771,1.099963,-0.605557,-2.035897,1.893603,-0.3468928,-0.2593938,2.100988,0.9665698,0.8757091,0.7696328,0.8730729,-0.3990352,2.04361,-0.4617864,-0.155021,2.15774,0.2687795,-0.9853512,-0.3264898,1.260026,4.267695,-0.5571145,0.6307067,-0.1691405,-1.730686])
  , (0.3389167, [2.025542,-1.542641,-1.090238,3.99027,9.949129e-2,-0.8974433,-2.508418,6.390346,-2.675515,1.154459,1.688072,2.220727,-0.4743102])
  , (0.4920231, [0.5192906,-3.260813,-1.245185,1.693084,3.561318,4.058924,2.27063,0.9446943,4.794159,-3.423733,0.8240817,0.644059,0.900175,1.932513,1.024586,2.82823,2.072192,-0.353231,-0.4319673,1.505952,1.0199,4.555054,2.364929,5.531467,3.279415,3.19821,2.726925,1.680027,-0.9041334,-0.8246765,-1.343979,8.454955,1.354581])
  , (0.6727408, [-6.705672,-3.193988,-4.612611,-3.207994,-5.070172,-6.141169,-0.397149,-4.093359,-1.204801,-3.986585,-2.724662,0.9868107,-6.295266,-5.95839,-6.35114,-1.679555,-2.635889,-4.050329,1.557428,-2.548465,-0.9073924,-1.502018,-4.535688,-4.158818,-8.833434,-5.944697,-1.569672,-4.70399,-7.832059,-4.093708,-8.393417,-2.085432,-7.06495,-0.4230419,-3.046822,-3.23895,-0.9265873,-9.227822,3.293713,-5.593577,-5.942398,-4.358421,2.660044,-4.301572,-1.258879,0.1499903,3.572833,-3.19844,0.8652432,-0.3025793,-1.576673,-7.666265,-6.751463,-1.398944,-2.690656,-1.429654,7.508364e-2,0.7998344,-3.562074,-1.021431,1.342968,2.110244,-7.561497,-2.372083,-3.649193,-5.7723,-1.068083,0.7537809,-4.569546,-1.198005,-5.638384,-1.227226,-1.195852,-1.118175,-9.130527,0.9675821,-2.497391,0.5988562,-1.965783,-4.25741,-6.547006,-1.459294,-2.380556,-3.977307,-7.809006,-4.276819,-4.028746,-9.055546,-3.599239,-1.470512,-8.253329,-1.351687,-4.269324,-6.140353,-6.30808,-1.834091,-3.135146,-9.391791,3.117815,-5.554733,-2.556769,-3.287376,-2.064013,-5.741995,-5.047918,-4.808841,-1.488526,-0.2351115,-5.760833,-2.722929,-7.012353,2.281171,-3.890514,-1.516824,-1.41011,-1.828457,-5.561244,-3.472142,-10.16919,-0.4369042,-5.698953,-4.587462,-4.897086])
  ]
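-- A minimal sketch of a consumer for this table (it would live in a
-- module with the listed imports, not in this data-only module;
-- kolmogorovSmirnovD from Statistics.Test.KolmogorovSmirnov and
-- standard from Statistics.Distribution.Normal are assumed, and the
-- 1e-6 tolerance matches the seven significant digits printed by R):
--
--   import qualified Data.Vector.Unboxed as U
--   import Statistics.Distribution.Normal (standard)
--   import Statistics.Test.KolmogorovSmirnov (kolmogorovSmirnovD)
--
--   checkTableKSD :: Bool
--   checkTableKSD =
--     all (\(d, xs) ->
--            abs (kolmogorovSmirnovD standard (U.fromList xs) - d) < 1e-6)
--         tableKSD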
-- Table of two-sample Kolmogorov-Smirnov statistics, generated using
-- R.
--
-- The first element of each tuple is the D statistic; the second and
-- third are the two samples for which it was calculated.
tableKS2D :: [(Double,[Double],[Double])]
tableKS2D =
  [ (0.2820513, [-0.4212928,2.146532,0.7585263,-0.5086105,-0.7725486,6.235548e-2,-0.1849861,0.861972,-0.1958534,-3.379697e-2,-1.316854,0.6701269], [0.4957582,0.4241167,0.9822869,0.4504248,-0.1749617,1.178098,-1.117222,-0.859273,0.3073736,0.4344583,-0.4761338,-1.332374,1.487291])
  , (0.2820513, [-0.712252,0.7990333,-0.7968473,1.443609,1.163096,-1.349071,-0.1553941,-2.003104,-0.3400618,-0.7019282,0.183293,-0.2352167], [-0.4622455,-0.8132221,0.1161614,-1.472115e-2,1.001454,-6.557789e-2,-0.2531216,-1.032432,0.4105478,1.749614,0.9722899,5.850942e-2,-0.3352746])
  , (0.2564103, [0.3509882,-0.2982833,1.314731,1.264223,-0.8156374,0.3734029,-3.288915e-2,0.6766016,0.9786335,0.1079949,-0.4211722,1.58656], [0.8024675,7.464538e-2,0.2739861,-2.334255e-2,0.5611802,0.6683374,0.4358206,0.349843,1.207834,1.402578,-0.4049183,0.4286042,1.665129])
  , (0.1833333, [1.376196,9.926384e-2,2.199292,-2.04993,0.5585353,-0.4812132,0.1041527,2.084774,0.71194,-1.398245,-4.458574e-2,1.484945,-1.473182,1.020076,-0.7019646,0.2182066,-1.702963,-0.3522622,-0.8129267,-0.6338972], [-1.020371,0.3323861,1.513288,0.1958708,-1.0723,5.323446e-2,-0.9993713,-0.7046356,-0.6781067,-0.4471603,1.512042,-0.2650665,-4.765228e-2,-1.501205,1.228664,0.5332935,-0.2960315,-0.1509683])
  , (0.5666667, [0.7145305,0.1255674,2.001531,0.1419216], [2.113474,-0.3352839,-0.4962429,-1.386079,0.6404667,-0.7145304,0.1084008,-0.9821421,-2.270472,-1.003846,-0.5644588,2.699695,-1.296494,-0.1538839,1.319094,-1.127544,0.3568889,0.2004726,-1.313291,0.3581084,0.3313498,0.9336278,0.9850203,-1.309506,1.170459,-0.7517466,-1.771269,0.7156381,-1.129691,0.877729])
  , (0.5, [0.6950626,0.1643805,-0.3102472,0.4810762,0.1844602,1.338836,-0.8083386,-0.5482141,0.9532421,-0.2644837], [7.527945,-1.95654,1.513725,-1.318431,2.453895,0.2078194,0.7371092,2.834245,-2.134794,3.938259])
  ]
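-- A matching sketch for the two-sample statistic (again illustrative,
-- assuming kolmogorovSmirnov2D from Statistics.Test.KolmogorovSmirnov
-- and the same imports as above):
--
--   checkTableKS2D :: Bool
--   checkTableKS2D =
--     all (\(d, xs, ys) ->
--            abs (kolmogorovSmirnov2D (U.fromList xs) (U.fromList ys) - d) < 1e-6)
--         tableKS2D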