statistics-0.13.2.3/0000755000000000000000000000000012504370360012316 5ustar0000000000000000statistics-0.13.2.3/changelog.md0000644000000000000000000001223312504370360014570 0ustar0000000000000000Changes in 0.13.0.0

* All types now support JSON encoding and decoding.

Changes in 0.12.0.0

* The `Statistics.Math` module has been removed, after being deprecated for several years. Use the [math-functions](http://hackage.haskell.org/package/math-functions) package instead.

* The `Statistics.Test.NonParametric` module has been removed, after being deprecated for several years.

* Added support for Kendall's tau.

* Added support for OLS regression.

* Added basic 2D matrix support.

* Added the Kruskal-Wallis test.

Changes in 0.11.0.3

* Fixed a subtle bug in the calculation of the jackknifed unbiased variance.

* The test suite now requires QuickCheck 2.7.

* We now calculate quantiles for the normal distribution in a more numerically stable way (bug #64).

Changes in 0.10.6.0

* The Estimator type has become an algebraic data type. This allows the jackknife function to potentially use more efficient jackknife implementations.

* jackknifeMean, jackknifeStdDev, jackknifeVariance, jackknifeVarianceUnb: new functions. These have O(n) cost instead of the O(n^2) cost of the standard jackknife.

* The mean function has been renamed to welfordMean; a new implementation of mean has better numerical accuracy in almost all cases.

Changes in 0.10.5.2

* histogram correctly chooses the range when all elements in the sample are the same (bug #57).

Changes in 0.10.5.1

* Bug fix for S.Distributions.Normal.standard introduced in 0.10.5.0 (bug #56).

Changes in 0.10.5.0

* An Entropy type class for distributions is added.

* Probability and probability density of distributions are given in the log domain too.

Changes in 0.10.4.0

* Support for versions of GHC older than 7.2 is discontinued.

* All datatypes now support 'Data.Binary' and 'GHC.Generics'.

Changes in 0.10.3.0

* Bug fixes.

Changes in 0.10.2.0

* Bugs in DCT and IDCT are fixed.

* Accessors for the uniform distribution are added.

* ContGen instances for all continuous distributions are added.

* The beta distribution is added.

* A constructor for the improper gamma distribution is added.

* The binomial distribution allows zero trials.

* The Poisson distribution now accepts a zero parameter.

* An integer overflow in the calculation of the Wilcoxon-T test is fixed.

* A bug in the 'ContGen' instance for the normal distribution is fixed.

Changes in 0.10.1.0

* The Kolmogorov-Smirnov nonparametric test is added.

* The Pearson chi-squared test is added.

* A type class for generating random variates for a given distribution is added.

* The modules 'Statistics.Math' and 'Statistics.Constants' are moved to the @math-functions@ package. They are still available, but marked as deprecated.

Changes in 0.10.0.1

* @dct@ and @idct@ now have type @Vector Double -> Vector Double@.

Changes in 0.10.0.0

* The type classes Mean and Variance are split in two. This is required for distributions which do not have a finite variance or mean.

* The S.Sample.KernelDensity module has been renamed, and completely rewritten to be much more robust. The older module oversmoothed multi-modal data. (The older module is still available under the name S.Sample.KernelDensity.Simple.)

* Histogram computation is added, in S.Sample.Histogram.

* The discrete Fourier transform is added, in S.Transform.

* Root finding is added, in S.Math.RootFinding.

* The complCumulative function is added to the Distribution class in order to accurately assess probabilities P(X>x), which are used in one-tailed tests.
* A stdDev function is added to the Variance class for distributions.

* The constructor S.Distribution.normalDistr now takes the standard deviation instead of the variance as its parameter.

* A bug in S.Quantile.weightedAvg is fixed. It produced a wrong answer if a sample contained only one element.

* Bugs in quantile estimation for the chi-square and gamma distributions are fixed.

* An integer overflow in mannWhitneyUCriticalValue is fixed. It produced incorrect critical values for moderately large samples: around 20 elements on 32-bit machines and 40 on 64-bit ones.

* A bug in mannWhitneyUSignificant is fixed. If either sample was larger than 20, it produced a completely incorrect answer.

* One- and two-tailed tests in S.Tests.NonParametric are selected with sum types instead of Bool.

* Test results are returned as an enumeration instead of @Bool@.

* Performance improvements for the Mann-Whitney U and Wilcoxon tests.

* The module @S.Tests.NonParametric@ is split into @S.Tests.MannWhitneyU@ and @S.Tests.WilcoxonT@.

* sortBy is added to S.Function.

* The mean and variance of the gamma distribution are fixed.

* Much faster cumulative probability functions for the Poisson and hypergeometric distributions.

* Better density functions for the gamma and Poisson distributions.

* The Student-T, Fisher-Snedecor F, and Cauchy-Lorentz distributions are added.

* The function S.Function.create is removed. Use generateM from the vector package instead.

* A function to perform approximate comparison of doubles is added to S.Function.Comparison.

* The regularized incomplete beta function and its inverse are added to S.Function.
statistics-0.13.2.3/LICENSE0000644000000000000000000000246112504370360013326 0ustar0000000000000000Copyright (c) 2009, 2010 Bryan O'Sullivan
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
statistics-0.13.2.3/README.markdown0000644000000000000000000000200712504370360015016 0ustar0000000000000000# Statistics: efficient, general purpose statistics

This package provides the Statistics module, a Haskell library for working with statistical data in a space- and time-efficient way.

Where possible, we give citations and computational complexity estimates for the algorithms used.

# Performance

This library has been carefully optimised for high performance. To obtain the best runtime efficiency, it is imperative to compile libraries and applications that use this library with a high level of optimisation.
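For example, a minimal sketch of the relevant stanza in an application's `.cabal` file, mirroring the `ghc-options` this package itself builds with (the executable name here is hypothetical):

    executable my-stats-app
      ghc-options: -O2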
# Get involved!

Please report bugs via the [github issue tracker](https://github.com/bos/statistics/issues).

Master [git mirror](https://github.com/bos/statistics):

* `git clone git://github.com/bos/statistics.git`

There's also a [Mercurial mirror](https://bitbucket.org/bos/statistics):

* `hg clone https://bitbucket.org/bos/statistics`

(You can create and contribute changes using either Mercurial or git.)

# Authors

This library is written and maintained by Bryan O'Sullivan, <bos@serpentine.com>.
statistics-0.13.2.3/Setup.lhs0000644000000000000000000000011412504370360014122 0ustar0000000000000000#!/usr/bin/env runhaskell
> import Distribution.Simple
> main = defaultMain
statistics-0.13.2.3/statistics.cabal0000644000000000000000000001072712504370360015503 0ustar0000000000000000name:           statistics
version:        0.13.2.3
synopsis:       A library of statistical types, data, and functions
description:
  This library provides a number of common functions and types useful
  in statistics.  We focus on high performance, numerical robustness,
  and use of good algorithms.  Where possible, we provide references
  to the statistical literature.
  .
  The library's facilities can be divided into four broad categories:
  .
  * Working with widely used discrete and continuous probability
    distributions.  (There are dozens of exotic distributions in use;
    we focus on the most common.)
  .
  * Computing with sample data: quantile estimation, kernel density
    estimation, histograms, bootstrap methods, significance testing,
    and regression and autocorrelation analysis.
  .
  * Random variate generation under several different distributions.
  .
  * Common statistical tests for significant differences between
    samples.
license: BSD3 license-file: LICENSE homepage: https://github.com/bos/statistics bug-reports: https://github.com/bos/statistics/issues author: Bryan O'Sullivan maintainer: Bryan O'Sullivan copyright: 2009-2014 Bryan O'Sullivan category: Math, Statistics build-type: Simple cabal-version: >= 1.8 extra-source-files: README.markdown benchmark/bench.hs changelog.md examples/kde/KDE.hs examples/kde/data/faithful.csv examples/kde/kde.html examples/kde/kde.tpl tests/Tests/Math/Tables.hs tests/Tests/Math/gen.py tests/utils/Makefile tests/utils/fftw.c library exposed-modules: Statistics.Autocorrelation Statistics.Constants Statistics.Correlation.Kendall Statistics.Distribution Statistics.Distribution.Beta Statistics.Distribution.Binomial Statistics.Distribution.CauchyLorentz Statistics.Distribution.ChiSquared Statistics.Distribution.Exponential Statistics.Distribution.FDistribution Statistics.Distribution.Gamma Statistics.Distribution.Geometric Statistics.Distribution.Hypergeometric Statistics.Distribution.Normal Statistics.Distribution.Poisson Statistics.Distribution.StudentT Statistics.Distribution.Transform Statistics.Distribution.Uniform Statistics.Function Statistics.Math.RootFinding Statistics.Matrix Statistics.Matrix.Algorithms Statistics.Matrix.Mutable Statistics.Matrix.Types Statistics.Quantile Statistics.Regression Statistics.Resampling Statistics.Resampling.Bootstrap Statistics.Sample Statistics.Sample.Histogram Statistics.Sample.KernelDensity Statistics.Sample.KernelDensity.Simple Statistics.Sample.Powers Statistics.Test.ChiSquared Statistics.Test.KolmogorovSmirnov Statistics.Test.KruskalWallis Statistics.Test.MannWhitneyU Statistics.Test.Types Statistics.Test.WilcoxonT Statistics.Transform Statistics.Types other-modules: Statistics.Distribution.Poisson.Internal Statistics.Function.Comparison Statistics.Internal Statistics.Sample.Internal Statistics.Test.Internal build-depends: aeson >= 0.6.0.0, base >= 4.4 && < 5, binary >= 0.5.1.0, deepseq >= 1.1.0.2, erf, math-functions >= 0.1.5.2, monad-par >= 0.3.4, mwc-random >= 0.13.0.0, primitive >= 0.3, vector >= 0.10, vector-algorithms >= 0.4, vector-binary-instances >= 0.2.1 if impl(ghc < 7.6) build-depends: ghc-prim -- gather extensive profiling data for now ghc-prof-options: -auto-all ghc-options: -O2 -Wall -fwarn-tabs -funbox-strict-fields test-suite tests type: exitcode-stdio-1.0 hs-source-dirs: tests main-is: tests.hs other-modules: Tests.ApproxEq Tests.Correlation Tests.Distribution Tests.Function Tests.Helpers Tests.KDE Tests.Matrix Tests.Matrix.Types Tests.NonParametric Tests.NonParametric.Table Tests.Transform ghc-options: -Wall -threaded -rtsopts build-depends: HUnit, QuickCheck >= 2.7.5, base, binary, erf, ieee754 >= 0.7.3, math-functions, mwc-random, primitive, statistics, test-framework, test-framework-hunit, test-framework-quickcheck2, vector, vector-algorithms source-repository head type: git location: https://github.com/bos/statistics source-repository head type: mercurial location: https://bitbucket.org/bos/statistics statistics-0.13.2.3/benchmark/0000755000000000000000000000000012504370360014250 5ustar0000000000000000statistics-0.13.2.3/benchmark/bench.hs0000644000000000000000000000453412504370360015671 0ustar0000000000000000import Control.Monad.ST (runST) import Criterion.Main import Data.Complex import Statistics.Sample import Statistics.Transform import System.Random.MWC import qualified Data.Vector.Unboxed as U -- Test sample sample :: U.Vector Double sample = runST $ flip uniformVector 10000 =<< create -- Weighted test 
sample sampleW :: U.Vector (Double,Double) sampleW = U.zip sample (U.reverse sample) -- Complex vector for FFT tests sampleC :: U.Vector (Complex Double) sampleC = U.zipWith (:+) sample (U.reverse sample) -- Simple benchmark for functions from Statistics.Sample main :: IO () main = defaultMain [ bgroup "sample" [ bench "range" $ nf (\x -> range x) sample -- Mean , bench "mean" $ nf (\x -> mean x) sample , bench "meanWeighted" $ nf (\x -> meanWeighted x) sampleW , bench "harmonicMean" $ nf (\x -> harmonicMean x) sample , bench "geometricMean" $ nf (\x -> geometricMean x) sample -- Variance , bench "variance" $ nf (\x -> variance x) sample , bench "varianceUnbiased" $ nf (\x -> varianceUnbiased x) sample , bench "varianceWeighted" $ nf (\x -> varianceWeighted x) sampleW -- Other , bench "stdDev" $ nf (\x -> stdDev x) sample , bench "skewness" $ nf (\x -> skewness x) sample , bench "kurtosis" $ nf (\x -> kurtosis x) sample -- Central moments , bench "C.M. 2" $ nf (\x -> centralMoment 2 x) sample , bench "C.M. 3" $ nf (\x -> centralMoment 3 x) sample , bench "C.M. 4" $ nf (\x -> centralMoment 4 x) sample , bench "C.M. 5" $ nf (\x -> centralMoment 5 x) sample ] , bgroup "FFT" [ bgroup "fft" [ bench (show n) $ whnf fft (U.take n sampleC) | n <- fftSizes ] , bgroup "ifft" [ bench (show n) $ whnf ifft (U.take n sampleC) | n <- fftSizes ] , bgroup "dct" [ bench (show n) $ whnf dct (U.take n sample) | n <- fftSizes ] , bgroup "dct_" [ bench (show n) $ whnf dct_ (U.take n sampleC) | n <- fftSizes ] , bgroup "idct" [ bench (show n) $ whnf idct (U.take n sample) | n <- fftSizes ] , bgroup "idct_" [ bench (show n) $ whnf idct_ (U.take n sampleC) | n <- fftSizes ] ] ] fftSizes :: [Int] fftSizes = [32,128,512,2048] statistics-0.13.2.3/examples/0000755000000000000000000000000012504370360014134 5ustar0000000000000000statistics-0.13.2.3/examples/kde/0000755000000000000000000000000012504370360014677 5ustar0000000000000000statistics-0.13.2.3/examples/kde/KDE.hs0000644000000000000000000000156412504370360015644 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} import Control.Applicative ((<$>)) import Statistics.Sample.KernelDensity (kde) import Text.Hastache (MuType(..), defaultConfig, hastacheFile) import Text.Hastache.Context (mkStrContext) import qualified Data.Attoparsec as B import qualified Data.Attoparsec.Char8 as A import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as L import qualified Data.Vector.Unboxed as U csv = do B.takeTill A.isEndOfLine (A.double `A.sepBy` A.char ',') `A.sepBy` A.endOfLine main = do waits <- (either error (U.fromList . map last . filter (not.null)) . A.parseOnly csv) <$> B.readFile "data/faithful.csv" let xs = map (\(a,b) -> [a,b]) . U.toList . uncurry U.zip . kde 64 $ waits context "data" = MuVariable . show $ xs s <- hastacheFile defaultConfig "kde.tpl" (mkStrContext context) L.writeFile "kde.html" s statistics-0.13.2.3/examples/kde/kde.html0000644000000000000000000000727112504370360016337 0ustar0000000000000000 Kernel density
[kde.html is a rendered HTML chart page; only its text content survives here. Caption: "This is a 64-point kernel density estimate of wait times between eruptions of the Old Faithful geyser."]
statistics-0.13.2.3/examples/kde/kde.tpl0000644000000000000000000000221112504370360016157 0ustar0000000000000000 Kernel density
[kde.tpl is the Hastache template behind kde.html, with the same "Kernel density" title and caption; KDE.hs fills in the chart's points via the "data" context key. Only its text content survives here.]
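The KDE example above reduces to a short pipeline once the sample is in hand. A minimal sketch of the same computation without the CSV parsing and templating (the function name is ours):

import Statistics.Sample.KernelDensity (kde)
import qualified Data.Vector.Unboxed as U

-- 'kde 64' returns a 64-point mesh over the sample's range together with
-- the density estimate at each mesh point; zipping the two gives (x,y) pairs.
densityPoints :: U.Vector Double -> [(Double, Double)]
densityPoints waits = U.toList . uncurry U.zip . kde 64 $ waits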
statistics-0.13.2.3/examples/kde/data/0000755000000000000000000000000012504370360015610 5ustar0000000000000000statistics-0.13.2.3/examples/kde/data/faithful.csv0000644000000000000000000000433312504370360020132 0ustar0000000000000000eruption,wait 3.6,79 1.8,54 3.333,74 2.283,62 4.533,85 2.883,55 4.7,88 3.6,85 1.95,51 4.35,85 1.833,54 3.917,84 4.2,78 1.75,47 4.7,83 2.167,52 1.75,62 4.8,84 1.6,52 4.25,79 1.8,51 1.75,47 3.45,78 3.067,69 4.533,74 3.6,83 1.967,55 4.083,76 3.85,78 4.433,79 4.3,73 4.467,77 3.367,66 4.033,80 3.833,74 2.017,52 1.867,48 4.833,80 1.833,59 4.783,90 4.35,80 1.883,58 4.567,84 1.75,58 4.533,73 3.317,83 3.833,64 2.1,53 4.633,82 2,59 4.8,75 4.716,90 1.833,54 4.833,80 1.733,54 4.883,83 3.717,71 1.667,64 4.567,77 4.317,81 2.233,59 4.5,84 1.75,48 4.8,82 1.817,60 4.4,92 4.167,78 4.7,78 2.067,65 4.7,73 4.033,82 1.967,56 4.5,79 4,71 1.983,62 5.067,76 2.017,60 4.567,78 3.883,76 3.6,83 4.133,75 4.333,82 4.1,70 2.633,65 4.067,73 4.933,88 3.95,76 4.517,80 2.167,48 4,86 2.2,60 4.333,90 1.867,50 4.817,78 1.833,63 4.3,72 4.667,84 3.75,75 1.867,51 4.9,82 2.483,62 4.367,88 2.1,49 4.5,83 4.05,81 1.867,47 4.7,84 1.783,52 4.85,86 3.683,81 4.733,75 2.3,59 4.9,89 4.417,79 1.7,59 4.633,81 2.317,50 4.6,85 1.817,59 4.417,87 2.617,53 4.067,69 4.25,77 1.967,56 4.6,88 3.767,81 1.917,45 4.5,82 2.267,55 4.65,90 1.867,45 4.167,83 2.8,56 4.333,89 1.833,46 4.383,82 1.883,51 4.933,86 2.033,53 3.733,79 4.233,81 2.233,60 4.533,82 4.817,77 4.333,76 1.983,59 4.633,80 2.017,49 5.1,96 1.8,53 5.033,77 4,77 2.4,65 4.6,81 3.567,71 4,70 4.5,81 4.083,93 1.8,53 3.967,89 2.2,45 4.15,86 2,58 3.833,78 3.5,66 4.583,76 2.367,63 5,88 1.933,52 4.617,93 1.917,49 2.083,57 4.583,77 3.333,68 4.167,81 4.333,81 4.5,73 2.417,50 4,85 4.167,74 1.883,55 4.583,77 4.25,83 3.767,83 2.033,51 4.433,78 4.083,84 1.833,46 4.417,83 2.183,55 4.8,81 1.833,57 4.8,76 4.1,84 3.966,77 4.233,81 3.5,87 4.366,77 2.25,51 4.667,78 2.1,60 4.35,82 4.133,91 1.867,53 4.6,78 1.783,46 4.367,77 3.85,84 1.933,49 4.5,83 2.383,71 4.7,80 1.867,49 3.833,75 3.417,64 4.233,76 2.4,53 4.8,94 2,55 4.15,76 1.867,50 4.267,82 1.75,54 4.483,75 4,78 4.117,79 4.083,78 4.267,78 3.917,70 4.55,79 4.083,70 2.417,54 4.183,86 2.217,50 4.45,90 1.883,54 1.85,54 4.283,77 3.95,79 2.333,64 4.15,75 2.35,47 4.933,86 2.9,63 4.583,85 3.833,82 2.083,57 4.367,82 2.133,67 4.35,74 2.2,54 4.45,83 3.567,73 4.5,73 4.15,88 3.817,80 3.917,71 4.45,83 2,56 4.283,79 4.767,78 4.533,84 1.85,58 4.25,83 1.983,43 2.25,60 4.75,75 4.117,81 2.15,46 4.417,90 1.817,46 4.467,74 statistics-0.13.2.3/Statistics/0000755000000000000000000000000012504370360014450 5ustar0000000000000000statistics-0.13.2.3/Statistics/Autocorrelation.hs0000644000000000000000000000321312504370360020155 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Autocorrelation -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Functions for computing autocovariance and autocorrelation of a -- sample. module Statistics.Autocorrelation ( autocovariance , autocorrelation ) where import Prelude hiding (sum) import Statistics.Function (square) import Statistics.Sample (mean) import Statistics.Sample.Internal (sum) import qualified Data.Vector.Generic as G -- | Compute the autocovariance of a sample, i.e. the covariance of -- the sample against a shifted version of itself. autocovariance :: (G.Vector v Double, G.Vector v Int) => v Double -> v Double autocovariance a = G.map f . 
G.enumFromTo 0 $ l-2 where f k = sum (G.zipWith (*) (G.take (l-k) c) (G.slice k (l-k) c)) / fromIntegral l c = G.map (subtract (mean a)) a l = G.length a -- | Compute the autocorrelation function of a sample, and the upper -- and lower bounds of confidence intervals for each element. -- -- /Note/: The calculation of the 95% confidence interval assumes a -- stationary Gaussian process. autocorrelation :: (G.Vector v Double, G.Vector v Int) => v Double -> (v Double, v Double, v Double) autocorrelation a = (r, ci (-), ci (+)) where r = G.map (/ G.head c) c where c = autocovariance a dllse = G.map f . G.scanl1 (+) . G.map square $ r where f v = 1.96 * sqrt ((v * 2 + 1) / l) l = fromIntegral (G.length a) ci f = G.cons 1 . G.tail . G.map (f (-1/l)) $ dllse statistics-0.13.2.3/Statistics/Constants.hs0000644000000000000000000000105212504370360016756 0ustar0000000000000000-- | -- Module : Statistics.Constants -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Constant values common to much statistics code. -- -- DEPRECATED: use module 'Numeric.MathFunctions.Constants' from -- math-functions. module Statistics.Constants {-# DEPRECATED "use module Numeric.MathFunctions.Constants from math-functions" #-} ( module Numeric.MathFunctions.Constants ) where import Numeric.MathFunctions.Constants statistics-0.13.2.3/Statistics/Distribution.hs0000644000000000000000000001715512504370360017474 0ustar0000000000000000{-# LANGUAGE BangPatterns, ScopedTypeVariables #-} -- | -- Module : Statistics.Distribution -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Type classes for probability distributions module Statistics.Distribution ( -- * Type classes Distribution(..) , DiscreteDistr(..) , ContDistr(..) -- ** Distribution statistics , MaybeMean(..) , Mean(..) , MaybeVariance(..) , Variance(..) , MaybeEntropy(..) , Entropy(..) -- ** Random number generation , ContGen(..) , DiscreteGen(..) , genContinous -- * Helper functions , findRoot , sumProbabilities ) where import Control.Applicative ((<$>), Applicative(..)) import Control.Monad.Primitive (PrimMonad,PrimState) import Prelude hiding (sum) import Statistics.Function (square) import Statistics.Sample.Internal (sum) import System.Random.MWC (Gen, uniform) import qualified Data.Vector.Unboxed as U -- | Type class common to all distributions. Only the c.d.f. can be -- defined for both discrete and continuous distributions. class Distribution d where -- | Cumulative distribution function. The probability that a -- random variable /X/ is less than or equal to /x/, -- i.e. P(/X/≤/x/). Cumulative should be defined for -- infinities as well: -- -- > cumulative d +∞ = 1 -- > cumulative d -∞ = 0 cumulative :: d -> Double -> Double -- | One's complement of the cumulative distribution: -- -- > complCumulative d x = 1 - cumulative d x -- -- It's useful when one is interested in P(/X/>/x/). complCumulative :: d -> Double -> Double complCumulative d x = 1 - cumulative d x -- | Discrete probability distribution. class Distribution d => DiscreteDistr d where -- | Probability of the n-th outcome. probability :: d -> Int -> Double probability d = exp . logProbability d -- | Logarithm of the probability of the n-th outcome. logProbability :: d -> Int -> Double logProbability d = log . probability d -- | Continuous probability distribution.
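--
-- A short sketch of the interface, using the standard normal
-- distribution from 'Statistics.Distribution.Normal' (the values shown
-- are approximate):
--
-- > density (normalDistr 0 1) 0    ==> 0.3989
-- > quantile (normalDistr 0 1) 0.5 ==> 0.0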
-- -- Minimal complete definition is 'quantile' and either 'density' or -- 'logDensity'. class Distribution d => ContDistr d where -- | Probability density function. The probability that a random -- variable /X/ lies in the infinitesimal interval -- [/x/,/x/+δ/x/) is equal to /density(x)/⋅δ/x/. density :: d -> Double -> Double density d = exp . logDensity d -- | Inverse of the cumulative distribution function. The value -- /x/ for which P(/X/≤/x/) = /p/. If the probability is outside -- the [0,1] range, the function should call 'error'. quantile :: d -> Double -> Double -- | Natural logarithm of the density. logDensity :: d -> Double -> Double logDensity d = log . density d -- | Type class for distributions with a mean. 'maybeMean' should return -- 'Nothing' if the mean is undefined for the current parameter values. class Distribution d => MaybeMean d where maybeMean :: d -> Maybe Double -- | Type class for distributions with a mean. If a distribution has a -- finite mean for all valid parameter values, it should be an -- instance of this type class. class MaybeMean d => Mean d where mean :: d -> Double -- | Type class for distributions with variance. If the variance is -- undefined for some parameter values, both 'maybeVariance' and -- 'maybeStdDev' should return Nothing. -- -- Minimal complete definition is 'maybeVariance' or 'maybeStdDev'. class MaybeMean d => MaybeVariance d where maybeVariance :: d -> Maybe Double maybeVariance d = (*) <$> x <*> x where x = maybeStdDev d maybeStdDev :: d -> Maybe Double maybeStdDev = fmap sqrt . maybeVariance -- | Type class for distributions with variance. If a distribution has -- finite variance for all valid parameter values, it should be an -- instance of this type class. -- -- Minimal complete definition is 'variance' or 'stdDev'. class (Mean d, MaybeVariance d) => Variance d where variance :: d -> Double variance d = square (stdDev d) stdDev :: d -> Double stdDev = sqrt . variance -- | Type class for distributions with entropy, meaning Shannon entropy -- in the case of a discrete distribution, or differential entropy in the -- case of a continuous one. 'maybeEntropy' should return 'Nothing' if -- entropy is undefined for the chosen parameter values. class (Distribution d) => MaybeEntropy d where -- | Returns the entropy of a distribution, in nats, if such is defined. maybeEntropy :: d -> Maybe Double -- | Type class for distributions with entropy, meaning Shannon -- entropy in the case of a discrete distribution, or differential -- entropy in the case of a continuous one. If the distribution has -- well-defined entropy for all valid parameter values then it -- should be an instance of this type class. class (MaybeEntropy d) => Entropy d where -- | Returns the entropy of a distribution, in nats. entropy :: d -> Double -- | Generate continuous random variates which have the given -- distribution. class Distribution d => ContGen d where genContVar :: PrimMonad m => d -> Gen (PrimState m) -> m Double -- | Generate discrete random variates which have the given -- distribution. 'ContGen' is a superclass because it's always possible -- to generate real-valued variates from integer values. class (DiscreteDistr d, ContGen d) => DiscreteGen d where genDiscreteVar :: PrimMonad m => d -> Gen (PrimState m) -> m Int -- | Generate variates from a continuous distribution using the inverse -- transform rule. genContinous :: (ContDistr d, PrimMonad m) => d -> Gen (PrimState m) -> m Double genContinous d gen = do x <- uniform gen return $! quantile d x
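-- A hedged usage sketch for 'genContinous' (it assumes the exponential
-- distribution from 'Statistics.Distribution.Exponential' and mwc-random's
-- 'withSystemRandom'; neither is defined in this module):
--
-- > import System.Random.MWC (withSystemRandom, asGenIO)
-- > import Statistics.Distribution.Exponential (exponential)
-- >
-- > sampleExp :: IO Double
-- > sampleExp = withSystemRandom . asGenIO $ genContinous (exponential 1)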
data P = P {-# UNPACK #-} !Double {-# UNPACK #-} !Double -- | Approximate the value of /X/ for which P(/x/>/X/)=/p/. -- -- This method uses a combination of Newton-Raphson iteration and -- bisection with the given guess as a starting point. The upper and -- lower bounds specify the interval in which the probability -- distribution reaches the value /p/. findRoot :: ContDistr d => d -- ^ Distribution -> Double -- ^ Probability /p/ -> Double -- ^ Initial guess -> Double -- ^ Lower bound on interval -> Double -- ^ Upper bound on interval -> Double findRoot d prob = loop 0 1 where loop !(i::Int) !dx !x !lo !hi | abs dx <= accuracy || i >= maxIters = x | otherwise = loop (i+1) dx'' x'' lo' hi' where err = cumulative d x - prob P lo' hi' | err < 0 = P x hi | otherwise = P lo x pdf = density d x P dx' x' | pdf /= 0 = P (err / pdf) (x - dx) | otherwise = P dx x P dx'' x'' | x' < lo' || x' > hi' || pdf == 0 = let y = (lo' + hi') / 2 in P (y-x) y | otherwise = P dx' x' accuracy = 1e-15 maxIters = 150 -- | Sum probabilities in the inclusive interval. sumProbabilities :: DiscreteDistr d => d -> Int -> Int -> Double sumProbabilities d low hi = -- Return value is forced to be less than 1 to guard against roundoff errors. -- ATTENTION! This check should be removed for testing or it could mask bugs. min 1 . sum . U.map (probability d) $ U.enumFromTo low hi statistics-0.13.2.3/Statistics/Function.hs0000644000000000000000000001006112504370360016567 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, FlexibleContexts, Rank2Types #-} #if __GLASGOW_HASKELL__ >= 704 {-# OPTIONS_GHC -fsimpl-tick-factor=200 #-} #endif -- | -- Module : Statistics.Function -- Copyright : (c) 2009, 2010, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Useful functions. module Statistics.Function ( -- * Scanning minMax -- * Sorting , sort , gsort , sortBy , partialSort -- * Indexing , indexed , indices -- * Bit twiddling , nextHighestPowerOfTwo -- * Comparison , within -- * Arithmetic , square -- * Vectors , unsafeModify -- * Combinators , for , rfor ) where #include "MachDeps.h" import Control.Monad.ST (ST) import Data.Bits ((.|.), shiftR) import qualified Data.Vector.Algorithms.Intro as I import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as M import Statistics.Function.Comparison (within) -- | Sort a vector. sort :: U.Vector Double -> U.Vector Double sort = G.modify I.sort {-# NOINLINE sort #-} -- | Sort a vector. gsort :: (Ord e, G.Vector v e) => v e -> v e gsort = G.modify I.sort {-# INLINE gsort #-} -- | Sort a vector using a custom ordering. sortBy :: (G.Vector v e) => I.Comparison e -> v e -> v e sortBy f = G.modify $ I.sortBy f {-# INLINE sortBy #-} -- | Partially sort a vector, such that the least /k/ elements will be -- at the front. partialSort :: (G.Vector v e, Ord e) => Int -- ^ The number /k/ of least elements. -> v e -> v e partialSort k = G.modify (`I.partialSort` k) {-# SPECIALIZE partialSort :: Int -> U.Vector Double -> U.Vector Double #-} -- | Return the indices of a vector. indices :: (G.Vector v a, G.Vector v Int) => v a -> v Int indices a = G.enumFromTo 0 (G.length a - 1) {-# INLINE indices #-}
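-- A quick sketch of 'indices' above and 'indexed' below (the sample
-- values are made up):
--
-- > indices (U.fromList [10,20,30]) ==> fromList [0,1,2]
-- > indexed (U.fromList [10,20,30]) ==> fromList [(0,10),(1,20),(2,30)]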
-- | Zip a vector with its indices. indexed :: (G.Vector v e, G.Vector v Int, G.Vector v (Int,e)) => v e -> v (Int,e) indexed a = G.zip (indices a) a {-# INLINE indexed #-} data MM = MM {-# UNPACK #-} !Double {-# UNPACK #-} !Double -- | Compute the minimum and maximum of a vector in one pass. minMax :: (G.Vector v Double) => v Double -> (Double, Double) minMax = fini . G.foldl' go (MM (1/0) (-1/0)) where go (MM lo hi) k = MM (min lo k) (max hi k) fini (MM lo hi) = (lo, hi) {-# INLINE minMax #-} -- | Efficiently compute the next highest power of two for a -- non-negative integer. If the given value is already a power of -- two, it is returned unchanged. If negative, zero is returned. nextHighestPowerOfTwo :: Int -> Int nextHighestPowerOfTwo n #if WORD_SIZE_IN_BITS == 64 = 1 + _i32 #else = 1 + i16 #endif where i0 = n - 1 i1 = i0 .|. i0 `shiftR` 1 i2 = i1 .|. i1 `shiftR` 2 i4 = i2 .|. i2 `shiftR` 4 i8 = i4 .|. i4 `shiftR` 8 i16 = i8 .|. i8 `shiftR` 16 _i32 = i16 .|. i16 `shiftR` 32 -- It could be implemented as -- -- > nextHighestPowerOfTwo n = 1 + foldl' go (n-1) [1, 2, 4, 8, 16, 32] -- >   where go m i = m .|. m `shiftR` i -- -- But GHC does not inline foldl' (probably because it's recursive), and -- as a result the function walks a list of boxed Ints. The hand-rolled -- version uses unboxed arithmetic. -- | Multiply a number by itself. square :: Double -> Double square x = x * x -- | Simple for loop. Counts from /start/ to /end/-1. for :: Monad m => Int -> Int -> (Int -> m ()) -> m () for n0 !n f = loop n0 where loop i | i == n = return () | otherwise = f i >> loop (i+1) {-# INLINE for #-} -- | Simple reverse-for loop. Counts from /start/-1 to /end/ (which -- must be less than /start/). rfor :: Monad m => Int -> Int -> (Int -> m ()) -> m () rfor n0 !n f = loop n0 where loop i | i == n = return () | otherwise = let i' = i-1 in f i' >> loop i' {-# INLINE rfor #-} unsafeModify :: M.MVector s Double -> Int -> (Double -> Double) -> ST s () unsafeModify v i f = do k <- M.unsafeRead v i M.unsafeWrite v i (f k) {-# INLINE unsafeModify #-} statistics-0.13.2.3/Statistics/Internal.hs0000644000000000000000000000220112504370360016553 0ustar0000000000000000{-# LANGUAGE CPP, MagicHash, UnboxedTuples #-} -- | -- Module : Statistics.Internal -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Scary internal functions. module Statistics.Internal ( inlinePerformIO ) where #if __GLASGOW_HASKELL__ >= 611 import GHC.IO (IO(IO)) #else import GHC.IOBase (IO(IO)) #endif import GHC.Base (realWorld#) #if !defined(__GLASGOW_HASKELL__) import System.IO.Unsafe (unsafePerformIO) #endif -- Lifted from Data.ByteString.Internal so we don't introduce an -- otherwise unnecessary dependency on the bytestring package. -- | Just like unsafePerformIO, but we inline it. Big performance -- gains as it exposes lots of things to further inlining. /Very -- unsafe/. In particular, you should do no memory allocation inside -- an 'inlinePerformIO' block. On Hugs this is just @unsafePerformIO@. {-# INLINE inlinePerformIO #-} inlinePerformIO :: IO a -> a #if defined(__GLASGOW_HASKELL__) inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r #else inlinePerformIO = unsafePerformIO #endif statistics-0.13.2.3/Statistics/Matrix.hs0000644000000000000000000001076712504370360016251 0ustar0000000000000000-- | -- Module : Statistics.Matrix -- Copyright : 2011 Aleksey Khudyakov, 2014 Bryan O'Sullivan -- License : BSD3 -- -- Basic matrix operations.
-- -- There isn't a widely used matrix package for Haskell yet, so -- we implement the necessary minimum here. module Statistics.Matrix ( Matrix(..) , Vector , fromList , fromVector , toVector , toList , dimension , center , multiply , multiplyV , transpose , power , norm , column , row , map , for , unsafeIndex , hasNaN , bounds , unsafeBounds ) where import Prelude hiding (exponent, map, sum) import Statistics.Function (for, square) import Statistics.Matrix.Types import Statistics.Sample.Internal (sum) import qualified Data.Vector.Unboxed as U -- | Convert from a row-major list. fromList :: Int -- ^ Number of rows. -> Int -- ^ Number of columns. -> [Double] -- ^ Flat list of values, in row-major order. -> Matrix fromList r c = fromVector r c . U.fromList -- | Convert from a row-major vector. fromVector :: Int -- ^ Number of rows. -> Int -- ^ Number of columns. -> U.Vector Double -- ^ Flat vector of values, in row-major order. -> Matrix fromVector r c v | r*c /= len = error "input size mismatch" | otherwise = Matrix r c 0 v where len = U.length v -- | Convert to a row-major flat vector. toVector :: Matrix -> U.Vector Double toVector (Matrix _ _ _ v) = v -- | Convert to a row-major flat list. toList :: Matrix -> [Double] toList = U.toList . toVector -- | Return the dimensions of this matrix, as a (row,column) pair. dimension :: Matrix -> (Int, Int) dimension (Matrix r c _ _) = (r, c) -- | Avoid overflow in the matrix. avoidOverflow :: Matrix -> Matrix avoidOverflow m@(Matrix r c e v) | center m > 1e140 = Matrix r c (e + 140) (U.map (* 1e-140) v) | otherwise = m -- | Matrix-matrix multiplication. Matrices must be of compatible -- sizes (/note: not checked/). multiply :: Matrix -> Matrix -> Matrix multiply m1@(Matrix r1 _ e1 _) m2@(Matrix _ c2 e2 _) = Matrix r1 c2 (e1 + e2) $ U.generate (r1*c2) go where go t = sum $ U.zipWith (*) (row m1 i) (column m2 j) where (i,j) = t `quotRem` c2 -- | Matrix-vector multiplication. multiplyV :: Matrix -> Vector -> Vector multiplyV m v | cols m == c = U.generate (rows m) (sum . U.zipWith (*) v . row m) | otherwise = error $ "matrix/vector unconformable " ++ show (cols m,c) where c = U.length v -- | Raise matrix to /n/th power. Power must be positive -- (/note: not checked/). power :: Matrix -> Int -> Matrix power mat 1 = mat power mat n = avoidOverflow res where mat2 = power mat (n `quot` 2) pow = multiply mat2 mat2 res | odd n = multiply pow mat | otherwise = pow -- | Element in the center of the matrix (not corrected for exponent). center :: Matrix -> Double center mat@(Matrix r c _ _) = unsafeBounds U.unsafeIndex mat (r `quot` 2) (c `quot` 2) -- | Calculate the Euclidean norm of a vector. norm :: Vector -> Double norm = sqrt . sum . U.map square -- | Return the given column. column :: Matrix -> Int -> Vector column (Matrix r c _ v) i = U.backpermute v $ U.enumFromStepN i c r {-# INLINE column #-} -- | Return the given row. row :: Matrix -> Int -> Vector row (Matrix _ c _ v) i = U.slice (c*i) c v unsafeIndex :: Matrix -> Int -- ^ Row. -> Int -- ^ Column. -> Double unsafeIndex = unsafeBounds U.unsafeIndex map :: (Double -> Double) -> Matrix -> Matrix map f (Matrix r c e v) = Matrix r c e (U.map f v) -- | Indicate whether any element of the matrix is @NaN@. hasNaN :: Matrix -> Bool hasNaN = U.any isNaN . toVector
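-- A tiny sketch of the API above (multiplying by an identity matrix;
-- the values are made up):
--
-- > toList (multiply (fromList 2 2 [1,2,3,4]) (fromList 2 2 [1,0,0,1]))
-- > ==> [1.0,2.0,3.0,4.0]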
-- | Given row and column numbers, calculate the offset into the flat -- row-major vector. bounds :: (Vector -> Int -> r) -> Matrix -> Int -> Int -> r bounds k (Matrix rs cs _ v) r c | r < 0 || r >= rs = error "row out of bounds" | c < 0 || c >= cs = error "column out of bounds" | otherwise = k v $! r * cs + c {-# INLINE bounds #-} -- | Given row and column numbers, calculate the offset into the flat -- row-major vector, without checking. unsafeBounds :: (Vector -> Int -> r) -> Matrix -> Int -> Int -> r unsafeBounds k (Matrix _ cs _ v) r c = k v $! r * cs + c {-# INLINE unsafeBounds #-} transpose :: Matrix -> Matrix transpose m@(Matrix r0 c0 e _) = Matrix c0 r0 e . U.generate (r0*c0) $ \i -> let (r,c) = i `quotRem` r0 in unsafeIndex m c r statistics-0.13.2.3/Statistics/Quantile.hs0000644000000000000000000001573712504370360016573 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Quantile -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Functions for approximating quantiles, i.e. points taken at regular -- intervals from the cumulative distribution function of a random -- variable. -- -- The number of quantiles is described below by the variable /q/, so -- with /q/=4, a 4-quantile (also known as a /quartile/) has 4 -- intervals, and contains 5 points. The parameter /k/ describes the -- desired point, where 0 ≤ /k/ ≤ /q/. module Statistics.Quantile ( -- * Quantile estimation functions weightedAvg , ContParam(..) , continuousBy , midspread -- * Parameters for the continuous sample method , cadpw , hazen , s , spss , medianUnbiased , normalUnbiased -- * References -- $references ) where import Data.Vector.Generic ((!)) import Numeric.MathFunctions.Constants (m_epsilon) import Statistics.Function (partialSort) import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- | O(/n/ log /n/). Estimate the /k/th /q/-quantile of a sample, -- using the weighted average method. weightedAvg :: G.Vector v Double => Int -- ^ /k/, the desired quantile. -> Int -- ^ /q/, the number of quantiles. -> v Double -- ^ /x/, the sample data. -> Double weightedAvg k q x | G.any isNaN x = modErr "weightedAvg" "Sample contains NaNs" | n == 1 = G.head x | q < 2 = modErr "weightedAvg" "At least 2 quantiles are needed" | k < 0 || k >= q = modErr "weightedAvg" "Wrong quantile number" | otherwise = xj + g * (xj1 - xj) where j = floor idx idx = fromIntegral (n - 1) * fromIntegral k / fromIntegral q g = idx - fromIntegral j xj = sx ! j xj1 = sx ! (j+1) sx = partialSort (j+2) x n = G.length x {-# SPECIALIZE weightedAvg :: Int -> Int -> U.Vector Double -> Double #-} {-# SPECIALIZE weightedAvg :: Int -> Int -> V.Vector Double -> Double #-} -- | Parameters /a/ and /b/ to the 'continuousBy' function. data ContParam = ContParam {-# UNPACK #-} !Double {-# UNPACK #-} !Double
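-- A small usage sketch for 'continuousBy' below ('s' selects the
-- parameters of R's method 7; the sample values are made up):
--
-- > continuousBy s 1 2 (U.fromList [1,2,3,4])
-- > ==> 2.5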
-- | O(/n/ log /n/). Estimate the /k/th /q/-quantile of a sample /x/, -- using the continuous sample method with the given parameters. This -- is the method used by most statistical software, such as R, -- Mathematica, SPSS, and S. continuousBy :: G.Vector v Double => ContParam -- ^ Parameters /a/ and /b/. -> Int -- ^ /k/, the desired quantile. -> Int -- ^ /q/, the number of quantiles. -> v Double -- ^ /x/, the sample data. -> Double continuousBy (ContParam a b) k q x | q < 2 = modErr "continuousBy" "At least 2 quantiles are needed" | k < 0 || k > q = modErr "continuousBy" "Wrong quantile number" | G.any isNaN x = modErr "continuousBy" "Sample contains NaNs" | otherwise = (1-h) * item (j-1) + h * item j where j = floor (t + eps) t = a + p * (fromIntegral n + 1 - a - b) p = fromIntegral k / fromIntegral q h | abs r < eps = 0 | otherwise = r where r = t - fromIntegral j eps = m_epsilon * 4 n = G.length x item = (sx !) . bracket sx = partialSort (bracket j + 1) x bracket m = min (max m 0) (n - 1) {-# SPECIALIZE continuousBy :: ContParam -> Int -> Int -> U.Vector Double -> Double #-} {-# SPECIALIZE continuousBy :: ContParam -> Int -> Int -> V.Vector Double -> Double #-} -- | O(/n/ log /n/). Estimate the range between /q/-quantiles 1 and -- /q/-1 of a sample /x/, using the continuous sample method with the -- given parameters. -- -- For instance, the interquartile range (IQR) can be estimated as -- follows: -- -- > midspread medianUnbiased 4 (U.fromList [1,1,2,2,3]) -- > ==> 1.333333 midspread :: G.Vector v Double => ContParam -- ^ Parameters /a/ and /b/. -> Int -- ^ /q/, the number of quantiles. -> v Double -- ^ /x/, the sample data. -> Double midspread (ContParam a b) k x | G.any isNaN x = modErr "midspread" "Sample contains NaNs" | k <= 0 = modErr "midspread" "Nonpositive number of quantiles" | otherwise = quantile (1-frac) - quantile frac where quantile i = (1-h i) * item (j i-1) + h i * item (j i) j i = floor (t i + eps) :: Int t i = a + i * (fromIntegral n + 1 - a - b) h i | abs r < eps = 0 | otherwise = r where r = t i - fromIntegral (j i) eps = m_epsilon * 4 n = G.length x item = (sx !) . bracket sx = partialSort (bracket (j (1-frac)) + 1) x bracket m = min (max m 0) (n - 1) frac = 1 / fromIntegral k {-# SPECIALIZE midspread :: ContParam -> Int -> U.Vector Double -> Double #-} {-# SPECIALIZE midspread :: ContParam -> Int -> V.Vector Double -> Double #-} -- | California Department of Public Works definition, /a/=0, /b/=1. -- Gives a linear interpolation of the empirical CDF. This -- corresponds to method 4 in R and Mathematica. cadpw :: ContParam cadpw = ContParam 0 1 -- | Hazen's definition, /a/=0.5, /b/=0.5. This is claimed to be -- popular among hydrologists. This corresponds to method 5 in R and -- Mathematica. hazen :: ContParam hazen = ContParam 0.5 0.5 -- | Definition used by the SPSS statistics application, with /a/=0, -- /b/=0 (also known as Weibull's definition). This corresponds to -- method 6 in R and Mathematica. spss :: ContParam spss = ContParam 0 0 -- | Definition used by the S statistics application, with /a/=1, -- /b/=1. The interpolation points divide the sample range into @n-1@ -- intervals. This corresponds to method 7 in R and Mathematica. s :: ContParam s = ContParam 1 1 -- | Median unbiased definition, /a/=1\/3, /b/=1\/3. The resulting -- quantile estimates are approximately median unbiased regardless of -- the distribution of /x/. This corresponds to method 8 in R and -- Mathematica. medianUnbiased :: ContParam medianUnbiased = ContParam third third where third = 1/3 -- | Normal unbiased definition, /a/=3\/8, /b/=3\/8. An approximately -- unbiased estimate if the empirical distribution approximates the -- normal distribution. This corresponds to method 9 in R and -- Mathematica. normalUnbiased :: ContParam normalUnbiased = ContParam ta ta where ta = 3/8 modErr :: String -> String -> a modErr f err = error $ "Statistics.Quantile."
++ f ++ ": " ++ err -- $references -- -- * Weisstein, E.W. Quantile. /MathWorld/. -- -- -- * Hyndman, R.J.; Fan, Y. (1996) Sample quantiles in statistical -- packages. /American Statistician/ -- 50(4):361–365. statistics-0.13.2.3/Statistics/Regression.hs0000644000000000000000000001376612504370360017141 0ustar0000000000000000-- | -- Module : Statistics.Regression -- Copyright : 2014 Bryan O'Sullivan -- License : BSD3 -- -- Functions for regression analysis. module Statistics.Regression ( olsRegress , ols , rSquare , bootstrapRegress ) where import Control.Applicative ((<$>)) import Control.Concurrent (forkIO) import Control.Concurrent.Chan (newChan, readChan, writeChan) import Control.DeepSeq (rnf) import Control.Monad (forM_, replicateM) import GHC.Conc (getNumCapabilities) import Prelude hiding (pred, sum) import Statistics.Function as F import Statistics.Matrix hiding (map) import Statistics.Matrix.Algorithms (qr) import Statistics.Resampling (splitGen) import Statistics.Resampling.Bootstrap (Estimate(..)) import Statistics.Sample (mean) import Statistics.Sample.Internal (sum) import System.Random.MWC (GenIO, uniformR) import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as M -- | Perform an ordinary least-squares regression on a set of -- predictors, and calculate the goodness-of-fit of the regression. -- -- The returned pair consists of: -- -- * A vector of regression coefficients. This vector has /one more/ -- element than the list of predictors; the last element is the -- /y/-intercept value. -- -- * /R²/, the coefficient of determination (see 'rSquare' for -- details). olsRegress :: [Vector] -- ^ Non-empty list of predictor vectors. Must all have -- the same length. These will become the columns of -- the matrix /A/ solved by 'ols'. -> Vector -- ^ Responder vector. Must have the same length as the -- predictor vectors. -> (Vector, Double) olsRegress preds@(_:_) resps | any (/=n) ls = error $ "predictor vector length mismatch " ++ show lss | G.length resps /= n = error $ "responder/predictor length mismatch " ++ show (G.length resps, n) | otherwise = (coeffs, rSquare mxpreds resps coeffs) where coeffs = ols mxpreds resps mxpreds = transpose . fromVector (length lss + 1) n . G.concat $ preds ++ [G.replicate n 1] lss@(n:ls) = map G.length preds olsRegress _ _ = error "no predictors given" -- | Compute the ordinary least-squares solution to /A x = b/. ols :: Matrix -- ^ /A/ has at least as many rows as columns. -> Vector -- ^ /b/ has the same length as columns in /A/. -> Vector ols a b | rs < cs = error $ "fewer rows than columns " ++ show d | otherwise = solve r (transpose q `multiplyV` b) where d@(rs,cs) = dimension a (q,r) = qr a -- | Solve the equation /R x = b/. solve :: Matrix -- ^ /R/ is an upper-triangular square matrix. -> Vector -- ^ /b/ is of the same length as rows\/columns in /R/. -> Vector solve r b | n /= l = error $ "row/vector mismatch " ++ show (n,l) | otherwise = U.create $ do s <- U.thaw b rfor n 0 $ \i -> do si <- (/ unsafeIndex r i i) <$> M.unsafeRead s i M.unsafeWrite s i si for 0 i $ \j -> F.unsafeModify s j $ subtract ((unsafeIndex r j i) * si) return s where n = rows r l = U.length b -- | Compute /R²/, the coefficient of determination that -- indicates goodness-of-fit of a regression. -- -- This value will be 1 if the predictors fit perfectly, dropping to 0 -- if they have no explanatory power. rSquare :: Matrix -- ^ Predictors (regressors). 
-> Vector -- ^ Responders. -> Vector -- ^ Regression coefficients. -> Double rSquare pred resp coeff = 1 - r / t where r = sum $ flip U.imap resp $ \i x -> square (x - p i) t = sum $ flip U.map resp $ \x -> square (x - mean resp) p i = sum . flip U.imap coeff $ \j -> (* unsafeIndex pred i j) -- | Bootstrap a regression function. Returns both the results of the -- regression and the requested confidence interval values. bootstrapRegress :: GenIO -> Int -- ^ Number of resamples to compute. -> Double -- ^ Confidence interval. -> ([Vector] -> Vector -> (Vector, Double)) -- ^ Regression function. -> [Vector] -- ^ Predictor vectors. -> Vector -- ^ Responder vector. -> IO (V.Vector Estimate, Estimate) bootstrapRegress gen0 numResamples ci rgrss preds0 resp0 | numResamples < 1 = error $ "bootstrapRegress: number of resamples " ++ "must be positive" | ci <= 0 || ci >= 1 = error $ "bootstrapRegress: confidence interval " ++ "must lie between 0 and 1" | otherwise = do caps <- getNumCapabilities gens <- splitGen caps gen0 done <- newChan forM_ (zip gens (balance caps numResamples)) $ \(gen,count) -> do forkIO $ do v <- V.replicateM count $ do let n = U.length resp0 ixs <- U.replicateM n $ uniformR (0,n-1) gen let resp = U.backpermute resp0 ixs preds = map (flip U.backpermute ixs) preds0 return $ rgrss preds resp rnf v `seq` writeChan done v (coeffsv, r2v) <- (G.unzip . V.concat) <$> replicateM caps (readChan done) let coeffs = flip G.imap (G.convert coeffss) $ \i x -> est x . U.generate numResamples $ \k -> ((coeffsv G.! k) G.! i) r2 = est r2s (G.convert r2v) (coeffss, r2s) = rgrss preds0 resp0 est s v = Estimate s (w G.! lo) (w G.! hi) ci where w = F.sort v lo = round c hi = truncate (n - c) n = fromIntegral numResamples c = n * ((1 - ci) / 2) return (coeffs, r2) -- | Balance units of work across workers. balance :: Int -> Int -> [Int] balance numSlices numItems = zipWith (+) (replicate numSlices q) (replicate r 1 ++ repeat 0) where (q,r) = numItems `quotRem` numSlices statistics-0.13.2.3/Statistics/Resampling.hs0000644000000000000000000001476112504370360017116 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Resampling -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Resampling statistics. module Statistics.Resampling ( Resample(..) , jackknife , jackknifeMean , jackknifeVariance , jackknifeVarianceUnb , jackknifeStdDev , resample , estimate , splitGen ) where import Data.Aeson (FromJSON, ToJSON) import Control.Concurrent (forkIO, newChan, readChan, writeChan) import Control.Monad (forM_, liftM, replicateM, replicateM_) import Data.Binary (Binary(..)) import Data.Data (Data, Typeable) import Data.Vector.Algorithms.Intro (sort) import Data.Vector.Binary () import Data.Vector.Generic (unsafeFreeze) import Data.Word (Word32) import GHC.Conc (numCapabilities) import GHC.Generics (Generic) import Numeric.Sum (Summation(..), kbn) import Statistics.Function (indices) import Statistics.Sample (mean, stdDev, variance, varianceUnbiased) import Statistics.Types (Estimator(..), Sample) import System.Random.MWC (GenIO, initialize, uniform, uniformVector) import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as MU -- | A resample drawn randomly, with replacement, from a set of data -- points. 
Distinct from a normal array to make it harder for your -- humble author's brain to go wrong. newtype Resample = Resample { fromResample :: U.Vector Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON Resample instance ToJSON Resample instance Binary Resample where put = put . fromResample get = fmap Resample get -- | /O(e*r*s)/ Resample a data set repeatedly, with replacement, -- computing each estimate over the resampled data. -- -- This function is expensive; it has to do work proportional to -- /e*r*s/, where /e/ is the number of estimation functions, /r/ is -- the number of resamples to compute, and /s/ is the number of -- original samples. -- -- To improve performance, this function will make use of all -- available CPUs. At least with GHC 7.0, parallel performance seems -- best if the parallel garbage collector is disabled (RTS option -- @-qg@). resample :: GenIO -> [Estimator] -- ^ Estimation functions. -> Int -- ^ Number of resamples to compute. -> Sample -- ^ Original sample. -> IO [Resample] resample gen ests numResamples samples = do let !numSamples = U.length samples ixs = scanl (+) 0 $ zipWith (+) (replicate numCapabilities q) (replicate r 1 ++ repeat 0) where (q,r) = numResamples `quotRem` numCapabilities results <- mapM (const (MU.new numResamples)) ests done <- newChan gens <- splitGen numCapabilities gen forM_ (zip3 ixs (tail ixs) gens) $ \ (start,!end,gen') -> do forkIO $ do let loop k ers | k >= end = writeChan done () | otherwise = do re <- U.replicateM numSamples $ do r <- uniform gen' return (U.unsafeIndex samples (r `mod` numSamples)) forM_ ers $ \(est,arr) -> MU.write arr k . est $ re loop (k+1) ers loop start (zip ests' results) replicateM_ numCapabilities $ readChan done mapM_ sort results mapM (liftM Resample . unsafeFreeze) results where ests' = map estimate ests -- | Run an 'Estimator' over a sample. estimate :: Estimator -> Sample -> Double estimate Mean = mean estimate Variance = variance estimate VarianceUnbiased = varianceUnbiased estimate StdDev = stdDev estimate (Function est) = est -- | /O(n) or O(n^2)/ Compute a statistical estimate repeatedly over a -- sample, each time omitting a successive element. jackknife :: Estimator -> Sample -> U.Vector Double jackknife Mean sample = jackknifeMean sample jackknife Variance sample = jackknifeVariance sample jackknife VarianceUnbiased sample = jackknifeVarianceUnb sample jackknife StdDev sample = jackknifeStdDev sample jackknife (Function est) sample | G.length sample == 1 = singletonErr "jackknife" | otherwise = U.map f . indices $ sample where f i = est (dropAt i sample) -- | /O(n)/ Compute the jackknife mean of a sample. jackknifeMean :: Sample -> U.Vector Double jackknifeMean samp | len == 1 = singletonErr "jackknifeMean" | otherwise = G.map (/l) $ G.zipWith (+) (pfxSumL samp) (pfxSumR samp) where l = fromIntegral (len - 1) len = G.length samp -- | /O(n)/ Compute the jackknife variance of a sample with a -- correction factor @c@, so we can get either the regular or -- \"unbiased\" variance. jackknifeVariance_ :: Double -> Sample -> U.Vector Double jackknifeVariance_ c samp | len == 1 = singletonErr "jackknifeVariance" | otherwise = G.zipWith4 go als ars bls brs where als = pfxSumL . G.map goa $ samp ars = pfxSumR . G.map goa $ samp goa x = v * v where v = x - m bls = pfxSumL . G.map (subtract m) $ samp brs = pfxSumR . 
G.map (subtract m) $ samp m = mean samp n = fromIntegral len go al ar bl br = (al + ar - (b * b) / q) / (q - c) where b = bl + br q = n - 1 len = G.length samp -- | /O(n)/ Compute the unbiased jackknife variance of a sample. jackknifeVarianceUnb :: Sample -> U.Vector Double jackknifeVarianceUnb = jackknifeVariance_ 1 -- | /O(n)/ Compute the jackknife variance of a sample. jackknifeVariance :: Sample -> U.Vector Double jackknifeVariance = jackknifeVariance_ 0 -- | /O(n)/ Compute the jackknife standard deviation of a sample. jackknifeStdDev :: Sample -> U.Vector Double jackknifeStdDev = G.map sqrt . jackknifeVarianceUnb pfxSumL :: U.Vector Double -> U.Vector Double pfxSumL = G.map kbn . G.scanl add zero pfxSumR :: U.Vector Double -> U.Vector Double pfxSumR = G.tail . G.map kbn . G.scanr (flip add) zero -- | Drop the /k/th element of a vector. dropAt :: U.Unbox e => Int -> U.Vector e -> U.Vector e dropAt n v = U.slice 0 n v U.++ U.slice (n+1) (U.length v - n - 1) v singletonErr :: String -> a singletonErr func = error $ "Statistics.Resampling." ++ func ++ ": singleton input" -- | Split a generator into several that can run independently. splitGen :: Int -> GenIO -> IO [GenIO] splitGen n gen | n <= 0 = return [] | otherwise = fmap (gen:) . replicateM (n-1) $ initialize =<< (uniformVector gen 256 :: IO (U.Vector Word32)) statistics-0.13.2.3/Statistics/Sample.hs0000644000000000000000000003241612504370360016233 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Sample -- Copyright : (c) 2008 Don Stewart, 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Commonly used sample statistics, also known as descriptive -- statistics. module Statistics.Sample ( -- * Types Sample , WeightedSample -- * Descriptive functions , range -- * Statistics of location , mean , welfordMean , meanWeighted , harmonicMean , geometricMean -- * Statistics of dispersion -- $variance -- ** Functions over central moments , centralMoment , centralMoments , skewness , kurtosis -- ** Two-pass functions (numerically robust) -- $robust , variance , varianceUnbiased , meanVariance , meanVarianceUnb , stdDev , varianceWeighted -- ** Single-pass functions (faster, less safe) -- $cancellation , fastVariance , fastVarianceUnbiased , fastStdDev -- * References -- $references ) where import Statistics.Function (minMax) import Statistics.Sample.Internal (robustSumVar, sum) import Statistics.Types (Sample,WeightedSample) import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- Operator ^ will be overridden import Prelude hiding ((^), sum) -- | /O(n)/ Range. The difference between the largest and smallest -- elements of a sample. range :: (G.Vector v Double) => v Double -> Double range s = hi - lo where (lo , hi) = minMax s {-# INLINE range #-} -- | /O(n)/ Arithmetic mean. This uses Kahan-Babuška-Neumaier -- summation, so is more accurate than 'welfordMean' unless the input -- values are very large. mean :: (G.Vector v Double) => v Double -> Double mean xs = sum xs / fromIntegral (G.length xs) {-# SPECIALIZE mean :: U.Vector Double -> Double #-} {-# SPECIALIZE mean :: V.Vector Double -> Double #-}
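-- A quick usage sketch for 'mean' (the sample values are made up):
--
-- > mean (U.fromList [1,2,3,4])
-- > ==> 2.5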
welfordMean :: (G.Vector v Double) => v Double -> Double welfordMean = fini . G.foldl' go (T 0 0) where fini (T a _) = a go (T m n) x = T m' n' where m' = m + (x - m) / fromIntegral n' n' = n + 1 {-# SPECIALIZE welfordMean :: U.Vector Double -> Double #-} {-# SPECIALIZE welfordMean :: V.Vector Double -> Double #-} -- | /O(n)/ Arithmetic mean for weighted sample. It uses a single-pass -- algorithm analogous to the one used by 'welfordMean'. meanWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> Double meanWeighted = fini . G.foldl' go (V 0 0) where fini (V a _) = a go (V m w) (x,xw) = V m' w' where m' | w' == 0 = 0 | otherwise = m + xw * (x - m) / w' w' = w + xw {-# INLINE meanWeighted #-} -- | /O(n)/ Harmonic mean. This algorithm performs a single pass over -- the sample. harmonicMean :: (G.Vector v Double) => v Double -> Double harmonicMean = fini . G.foldl' go (T 0 0) where fini (T b a) = fromIntegral a / b go (T x y) n = T (x + (1/n)) (y+1) {-# INLINE harmonicMean #-} -- | /O(n)/ Geometric mean of a sample containing no negative values. geometricMean :: (G.Vector v Double) => v Double -> Double geometricMean = exp . mean . G.map log {-# INLINE geometricMean #-} -- | Compute the /k/th central moment of a sample. The central moment -- is also known as the moment about the mean. -- -- This function performs two passes over the sample, so is not subject -- to stream fusion. -- -- For samples containing many values very close to the mean, this -- function is subject to inaccuracy due to catastrophic cancellation. centralMoment :: (G.Vector v Double) => Int -> v Double -> Double centralMoment a xs | a < 0 = error "Statistics.Sample.centralMoment: negative input" | a == 0 = 1 | a == 1 = 0 | otherwise = sum (G.map go xs) / fromIntegral (G.length xs) where go x = (x-m) ^ a m = mean xs {-# SPECIALIZE centralMoment :: Int -> U.Vector Double -> Double #-} {-# SPECIALIZE centralMoment :: Int -> V.Vector Double -> Double #-} -- | Compute the /k/th and /j/th central moments of a sample. -- -- This function performs two passes over the sample, so is not subject -- to stream fusion. -- -- For samples containing many values very close to the mean, this -- function is subject to inaccuracy due to catastrophic cancellation. centralMoments :: (G.Vector v Double) => Int -> Int -> v Double -> (Double, Double) centralMoments a b xs | a < 2 || b < 2 = (centralMoment a xs , centralMoment b xs) | otherwise = fini . G.foldl' go (V 0 0) $ xs where go (V i j) x = V (i + d^a) (j + d^b) where d = x - m fini (V i j) = (i / n , j / n) m = mean xs n = fromIntegral (G.length xs) {-# SPECIALIZE centralMoments :: Int -> Int -> U.Vector Double -> (Double, Double) #-} {-# SPECIALIZE centralMoments :: Int -> Int -> V.Vector Double -> (Double, Double) #-} -- | Compute the skewness of a sample. This is a measure of the -- asymmetry of its distribution. -- -- A sample with negative skew is said to be /left-skewed/. Most of -- its mass is on the right of the distribution, with the tail on the -- left. -- -- > skewness $ U.fromList [1,100,101,102,103] -- > ==> -1.497681449918257 -- -- A sample with positive skew is said to be /right-skewed/. -- -- > skewness $ U.fromList [1,2,3,4,100] -- > ==> 1.4975367033335198 -- -- A sample's skewness is not defined if its 'variance' is zero. -- -- This function performs two passes over the sample, so is not subject -- to stream fusion. -- -- For samples containing many values very close to the mean, this -- function is subject to inaccuracy due to catastrophic cancellation.
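--
-- As a sanity check (illustrative): a symmetric sample has zero skew,
-- since its odd central moments vanish:
--
-- > skewness $ U.fromList [1,2,3,4,5]
-- > ==> 0.0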
skewness :: (G.Vector v Double) => v Double -> Double skewness xs = c3 * c2 ** (-1.5) where (c3 , c2) = centralMoments 3 2 xs {-# SPECIALIZE skewness :: U.Vector Double -> Double #-} {-# SPECIALIZE skewness :: V.Vector Double -> Double #-} -- | Compute the excess kurtosis of a sample. This is a measure of -- the \"peakedness\" of its distribution. A high kurtosis indicates -- that more of the sample's variance is due to infrequent severe -- deviations, rather than more frequent modest deviations. -- -- A sample's excess kurtosis is not defined if its 'variance' is -- zero. -- -- This function performs two passes over the sample, so is not subject -- to stream fusion. -- -- For samples containing many values very close to the mean, this -- function is subject to inaccuracy due to catastrophic cancellation. kurtosis :: (G.Vector v Double) => v Double -> Double kurtosis xs = c4 / (c2 * c2) - 3 where (c4 , c2) = centralMoments 4 2 xs {-# SPECIALIZE kurtosis :: U.Vector Double -> Double #-} {-# SPECIALIZE kurtosis :: V.Vector Double -> Double #-} -- $variance -- -- The variance—and hence the standard deviation—of a -- sample of fewer than two elements are both defined to be zero. -- $robust -- -- These functions use the compensated summation algorithm of Chan et -- al. for numerical robustness, but require two passes over the -- sample data as a result. -- -- Because of the need for two passes, these functions are /not/ -- subject to stream fusion. data V = V {-# UNPACK #-} !Double {-# UNPACK #-} !Double -- | Maximum likelihood estimate of a sample's variance. Also known -- as the population variance, where the denominator is /n/. variance :: (G.Vector v Double) => v Double -> Double variance samp | n > 1 = robustSumVar (mean samp) samp / fromIntegral n | otherwise = 0 where n = G.length samp {-# SPECIALIZE variance :: U.Vector Double -> Double #-} {-# SPECIALIZE variance :: V.Vector Double -> Double #-} -- | Unbiased estimate of a sample's variance. Also known as the -- sample variance, where the denominator is /n/-1. varianceUnbiased :: (G.Vector v Double) => v Double -> Double varianceUnbiased samp | n > 1 = robustSumVar (mean samp) samp / fromIntegral (n-1) | otherwise = 0 where n = G.length samp {-# SPECIALIZE varianceUnbiased :: U.Vector Double -> Double #-} {-# SPECIALIZE varianceUnbiased :: V.Vector Double -> Double #-} -- | Calculate mean and maximum likelihood estimate of variance. This -- function should be used if both mean and variance are required -- since it will calculate mean only once. meanVariance :: (G.Vector v Double) => v Double -> (Double,Double) meanVariance samp | n > 1 = (m, robustSumVar m samp / fromIntegral n) | otherwise = (m, 0) where n = G.length samp m = mean samp {-# SPECIALIZE meanVariance :: U.Vector Double -> (Double,Double) #-} {-# SPECIALIZE meanVariance :: V.Vector Double -> (Double,Double) #-} -- | Calculate mean and unbiased estimate of variance. This -- function should be used if both mean and variance are required -- since it will calculate mean only once. meanVarianceUnb :: (G.Vector v Double) => v Double -> (Double,Double) meanVarianceUnb samp | n > 1 = (m, robustSumVar m samp / fromIntegral (n-1)) | otherwise = (m, 0) where n = G.length samp m = mean samp {-# SPECIALIZE meanVarianceUnb :: U.Vector Double -> (Double,Double) #-} {-# SPECIALIZE meanVarianceUnb :: V.Vector Double -> (Double,Double) #-} -- | Standard deviation. This is simply the square root of the -- unbiased estimate of the variance. 
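--
-- A short sketch (illustrative): for the two-element sample [1,3] the
-- unbiased variance is 2, so:
--
-- > stdDev $ U.fromList [1,3]
-- > ==> 1.4142135623730951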
stdDev :: (G.Vector v Double) => v Double -> Double stdDev = sqrt . varianceUnbiased {-# SPECIALIZE stdDev :: U.Vector Double -> Double #-} {-# SPECIALIZE stdDev :: V.Vector Double -> Double #-} robustSumVarWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> V robustSumVarWeighted samp = G.foldl' go (V 0 0) samp where go (V s w) (x,xw) = V (s + xw*d*d) (w + xw) where d = x - m m = meanWeighted samp {-# INLINE robustSumVarWeighted #-} -- | Weighted variance. This is biased estimation. varianceWeighted :: (G.Vector v (Double,Double)) => v (Double,Double) -> Double varianceWeighted samp | G.length samp > 1 = fini $ robustSumVarWeighted samp | otherwise = 0 where fini (V s w) = s / w {-# SPECIALIZE varianceWeighted :: U.Vector (Double,Double) -> Double #-} {-# SPECIALIZE varianceWeighted :: V.Vector (Double,Double) -> Double #-} -- $cancellation -- -- The functions prefixed with the name @fast@ below perform a single -- pass over the sample data using Knuth's algorithm. They usually -- work well, but see below for caveats. These functions are subject -- to array fusion. -- -- /Note/: in cases where most sample data is close to the sample's -- mean, Knuth's algorithm gives inaccurate results due to -- catastrophic cancellation. fastVar :: (G.Vector v Double) => v Double -> T1 fastVar = G.foldl' go (T1 0 0 0) where go (T1 n m s) x = T1 n' m' s' where n' = n + 1 m' = m + d / fromIntegral n' s' = s + d * (x - m') d = x - m -- | Maximum likelihood estimate of a sample's variance. fastVariance :: (G.Vector v Double) => v Double -> Double fastVariance = fini . fastVar where fini (T1 n _m s) | n > 1 = s / fromIntegral n | otherwise = 0 {-# INLINE fastVariance #-} -- | Unbiased estimate of a sample's variance. fastVarianceUnbiased :: (G.Vector v Double) => v Double -> Double fastVarianceUnbiased = fini . fastVar where fini (T1 n _m s) | n > 1 = s / fromIntegral (n - 1) | otherwise = 0 {-# INLINE fastVarianceUnbiased #-} -- | Standard deviation. This is simply the square root of the -- maximum likelihood estimate of the variance. fastStdDev :: (G.Vector v Double) => v Double -> Double fastStdDev = sqrt . fastVariance {-# INLINE fastStdDev #-} ------------------------------------------------------------------------ -- Helper code. Monomorphic unpacked accumulators. -- (^) operator from Prelude is just slow. (^) :: Double -> Int -> Double x ^ 1 = x x ^ n = x * (x ^ (n-1)) {-# INLINE (^) #-} -- don't support polymorphism, as we can't get unboxed returns if we use it. data T = T {-# UNPACK #-}!Double {-# UNPACK #-}!Int data T1 = T1 {-# UNPACK #-}!Int {-# UNPACK #-}!Double {-# UNPACK #-}!Double {- Consider this core: with data T a = T !a !Int $wfold :: Double# -> Int# -> Int# -> (# Double, Int# #) and without, $wfold :: Double# -> Int# -> Int# -> (# Double#, Int# #) yielding to boxed returns and heap checks. -} -- $references -- -- * Chan, T. F.; Golub, G.H.; LeVeque, R.J. (1979) Updating formulae -- and a pairwise algorithm for computing sample -- variances. Technical Report STAN-CS-79-773, Department of -- Computer Science, Stanford -- University. -- -- * Knuth, D.E. (1998) The art of computer programming, volume 2: -- seminumerical algorithms, 3rd ed., p. 232. -- -- * Welford, B.P. (1962) Note on a method for calculating corrected -- sums of squares and products. /Technometrics/ -- 4(3):419–420. -- -- * West, D.H.D. (1979) Updating mean and variance estimates: an -- improved method. /Communications of the ACM/ -- 22(9):532–535. 
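--
-- A combined usage sketch (illustrative values): 'meanVariance' and
-- 'meanVarianceUnb' compute the mean only once, so prefer them when
-- both statistics are needed:
--
-- > import qualified Data.Vector.Unboxed as U
-- > meanVariance $ U.fromList [1,2,3,4]
-- > ==> (2.5,1.25)
-- > meanVarianceUnb $ U.fromList [1,2,3,4]
-- > ==> (2.5,1.6666666666666667)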
statistics-0.13.2.3/Statistics/Transform.hs0000644000000000000000000001143012504370360016756 0ustar0000000000000000{-# LANGUAGE BangPatterns, FlexibleContexts #-} -- | -- Module : Statistics.Transform -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Fourier-related transformations of mathematical functions. -- -- These functions are written for simplicity and correctness, not -- speed. If you need a fast FFT implementation for your application, -- you should strongly consider using a library of FFTW bindings -- instead. module Statistics.Transform ( -- * Type synonyms CD -- * Discrete cosine transform , dct , dct_ , idct , idct_ -- * Fast Fourier transform , fft , ifft ) where import Control.Monad (when) import Control.Monad.ST (ST) import Data.Bits (shiftL, shiftR) import Data.Complex (Complex(..), conjugate, realPart) import Numeric.SpecFunctions (log2) import qualified Data.Vector.Generic as G import qualified Data.Vector.Generic.Mutable as M import qualified Data.Vector.Unboxed as U type CD = Complex Double -- | Discrete cosine transform (DCT-II). dct :: U.Vector Double -> U.Vector Double dct = dctWorker . G.map (:+0) -- | Discrete cosine transform (DCT-II). Only the real part of the vector -- is transformed; the imaginary part is ignored. dct_ :: U.Vector CD -> U.Vector Double dct_ = dctWorker . G.map (\(i :+ _) -> i :+ 0) dctWorker :: U.Vector CD -> U.Vector Double dctWorker xs -- length 1 is special cased because shuffle algorithms fail for it. | G.length xs == 1 = G.map ((2*) . realPart) xs | vectorOK xs = G.map realPart $ G.zipWith (*) weights (fft interleaved) | otherwise = error "Statistics.Transform.dct: bad vector length" where interleaved = G.backpermute xs $ G.enumFromThenTo 0 2 (len-2) G.++ G.enumFromThenTo (len-1) (len-3) 1 weights = G.cons 2 . G.generate (len-1) $ \x -> 2 * exp ((0:+(-1))*fi (x+1)*pi/(2*n)) where n = fi len len = G.length xs -- | Inverse discrete cosine transform (DCT-III). It's the inverse of -- 'dct' only up to a scale factor: -- -- > (idct . dct) x = (* length x) idct :: U.Vector Double -> U.Vector Double idct = idctWorker . G.map (:+0) -- | Inverse discrete cosine transform (DCT-III). Only the real part of the -- vector is transformed; the imaginary part is ignored. idct_ :: U.Vector CD -> U.Vector Double idct_ = idctWorker . G.map (\(i :+ _) -> i :+ 0) idctWorker :: U.Vector CD -> U.Vector Double idctWorker xs | vectorOK xs = G.generate len interleave | otherwise = error "Statistics.Transform.idct: bad vector length" where interleave z | even z = vals `G.unsafeIndex` halve z | otherwise = vals `G.unsafeIndex` (len - halve z - 1) vals = G.map realPart . ifft $ G.zipWith (*) weights xs weights = G.cons n $ G.generate (len - 1) $ \x -> 2 * n * exp ((0:+1) * fi (x+1) * pi/(2*n)) where n = fi len len = G.length xs -- | Inverse fast Fourier transform. ifft :: U.Vector CD -> U.Vector CD ifft xs | vectorOK xs = G.map ((/fi (G.length xs)) . conjugate) . fft . G.map conjugate $ xs | otherwise = error "Statistics.Transform.ifft: bad vector length" -- | Radix-2 decimation-in-time fast Fourier transform. fft :: U.Vector CD -> U.Vector CD fft v | vectorOK v = G.create $ do mv <- G.thaw v mfft mv return mv | otherwise = error "Statistics.Transform.fft: bad vector length" -- Vector length must be power of two.
This is not checked. mfft :: (M.MVector v CD) => v s CD -> ST s () mfft vec = bitReverse 0 0 where bitReverse i j | i == len-1 = stage 0 1 | otherwise = do when (i < j) $ M.swap vec i j let inner k l | k <= l = inner (k `shiftR` 1) (l-k) | otherwise = bitReverse (i+1) (l+k) inner (len `shiftR` 1) j stage l !l1 | l == m = return () | otherwise = do let !l2 = l1 `shiftL` 1 !e = -6.283185307179586/fromIntegral l2 flight j !a | j == l1 = stage (l+1) l2 | otherwise = do let butterfly i | i >= len = flight (j+1) (a+e) | otherwise = do let i1 = i + l1 xi1 :+ yi1 <- M.read vec i1 let !c = cos a !s = sin a d = (c*xi1 - s*yi1) :+ (s*xi1 + c*yi1) ci <- M.read vec i M.write vec i1 (ci - d) M.write vec i (ci + d) butterfly (i+l2) butterfly j flight 0 0 len = M.length vec m = log2 len fi :: Int -> CD fi = fromIntegral halve :: Int -> Int halve = (`shiftR` 1) vectorOK :: U.Unbox a => U.Vector a -> Bool {-# INLINE vectorOK #-} vectorOK v = (1 `shiftL` log2 n) == n where n = G.length v statistics-0.13.2.3/Statistics/Types.hs0000644000000000000000000000202112504370360016103 0ustar0000000000000000-- | -- Module : Statistics.Types -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Types for working with statistics. module Statistics.Types ( Estimator(..) , Sample , WeightedSample , Weights ) where import qualified Data.Vector.Unboxed as U (Vector) -- | Sample data. type Sample = U.Vector Double -- | Sample with weights. First element of sample is data, second is weight type WeightedSample = U.Vector (Double,Double) -- | An estimator of a property of a sample, such as its 'mean'. -- -- The use of an algebraic data type here allows functions such as -- 'jackknife' and 'bootstrapBCA' to use more efficient algorithms -- when possible. data Estimator = Mean | Variance | VarianceUnbiased | StdDev | Function (Sample -> Double) -- | Weights for affecting the importance of elements of a sample. type Weights = U.Vector Double statistics-0.13.2.3/Statistics/Correlation/0000755000000000000000000000000012504370360016731 5ustar0000000000000000statistics-0.13.2.3/Statistics/Correlation/Kendall.hs0000644000000000000000000001227012504370360020641 0ustar0000000000000000{-# LANGUAGE BangPatterns, CPP, FlexibleContexts #-} -- | -- Module : Statistics.Correlation.Kendall -- -- Fast /O(n log n)/ implementation of Kendall's rank correlation -- coefficient (tau). -- -- This module implements Kendall's tau-b, which allows ties in the -- data. This is the same formula used by other statistical packages, -- e.g. R and MATLAB. -- -- $$\tau = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}$$ -- -- where $n_0 = n(n-1)/2$, $n_1$ = number of pairs tied on the first quantity, -- $n_2$ = number of pairs tied on the second quantity, -- $n_c$ = number of concordant pairs, and $n_d$ = number of discordant pairs. module Statistics.Correlation.Kendall ( kendall -- * References -- $references ) where import Control.Monad.ST (ST, runST) import Data.Bits (shiftR) import Data.Function (on) import Data.STRef import qualified Data.Vector.Algorithms.Intro as I import qualified Data.Vector.Generic as G import qualified Data.Vector.Generic.Mutable as GM -- | /O(n log n)/ Compute Kendall's tau from a vector of paired data. -- Returns NaN when the number of pairs is <= 1.
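--
-- A small usage sketch (illustrative; the annotation just fixes an
-- unboxed pair type):
--
-- > import qualified Data.Vector.Unboxed as U
-- > kendall (U.fromList [(1,1),(2,2),(3,3)] :: U.Vector (Double,Double))
-- > ==> 1.0
-- > kendall (U.fromList [(1,3),(2,2),(3,1)] :: U.Vector (Double,Double))
-- > ==> -1.0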
kendall :: (Ord a, Ord b, G.Vector v (a, b)) => v (a, b) -> Double kendall xy' | G.length xy' <= 1 = 0/0 | otherwise = runST $ do xy <- G.thaw xy' let n = GM.length xy n_dRef <- newSTRef 0 I.sort xy tieX <- numOfTiesBy ((==) `on` fst) xy tieXY <- numOfTiesBy (==) xy tmp <- GM.new n mergeSort (compare `on` snd) xy tmp n_dRef tieY <- numOfTiesBy ((==) `on` snd) xy n_d <- readSTRef n_dRef let n_0 = (fromIntegral n * (fromIntegral n-1)) `shiftR` 1 :: Integer n_c = n_0 - n_d - tieX - tieY + tieXY return $ fromIntegral (n_c - n_d) / (sqrt.fromIntegral) ((n_0 - tieX) * (n_0 - tieY)) {-# INLINE kendall #-} -- calculate number of tied pairs in a sorted vector numOfTiesBy :: GM.MVector v a => (a -> a -> Bool) -> v s a -> ST s Integer numOfTiesBy f xs = do count <- newSTRef (0::Integer) loop count (1::Int) (0::Int) readSTRef count where n = GM.length xs loop c !acc !i | i >= n - 1 = modifySTRef' c (+ g acc) | otherwise = do x1 <- GM.unsafeRead xs i x2 <- GM.unsafeRead xs (i+1) if f x1 x2 then loop c (acc+1) (i+1) else modifySTRef' c (+ g acc) >> loop c 1 (i+1) g x = fromIntegral ((x * (x - 1)) `shiftR` 1) {-# INLINE numOfTiesBy #-} -- Implementation of Knight's merge sort (adapted from vector-algorithm). This -- function is used to count the number of discordant pairs. mergeSort :: GM.MVector v e => (e -> e -> Ordering) -> v s e -> v s e -> STRef s Integer -> ST s () mergeSort cmp src buf count = loop 0 (GM.length src - 1) where loop l u | u == l = return () | u - l == 1 = do eL <- GM.unsafeRead src l eU <- GM.unsafeRead src u case cmp eL eU of GT -> do GM.unsafeWrite src l eU GM.unsafeWrite src u eL modifySTRef' count (+1) _ -> return () | otherwise = do let mid = (u + l) `shiftR` 1 loop l mid loop mid u merge cmp (GM.unsafeSlice l (u-l+1) src) buf (mid - l) count {-# INLINE mergeSort #-} merge :: GM.MVector v e => (e -> e -> Ordering) -> v s e -> v s e -> Int -> STRef s Integer -> ST s () merge cmp src buf mid count = do GM.unsafeCopy tmp lower eTmp <- GM.unsafeRead tmp 0 eUpp <- GM.unsafeRead upper 0 loop tmp 0 eTmp upper 0 eUpp 0 where lower = GM.unsafeSlice 0 mid src upper = GM.unsafeSlice mid (GM.length src - mid) src tmp = GM.unsafeSlice 0 mid buf wroteHigh low iLow eLow high iHigh iIns | iHigh >= GM.length high = GM.unsafeCopy (GM.unsafeSlice iIns (GM.length low - iLow) src) (GM.unsafeSlice iLow (GM.length low - iLow) low) | otherwise = do eHigh <- GM.unsafeRead high iHigh loop low iLow eLow high iHigh eHigh iIns wroteLow low iLow high iHigh eHigh iIns | iLow >= GM.length low = return () | otherwise = do eLow <- GM.unsafeRead low iLow loop low iLow eLow high iHigh eHigh iIns loop !low !iLow !eLow !high !iHigh !eHigh !iIns = case cmp eHigh eLow of LT -> do GM.unsafeWrite src iIns eHigh modifySTRef' count (+ fromIntegral (GM.length low - iLow)) wroteHigh low iLow eLow high (iHigh+1) (iIns+1) _ -> do GM.unsafeWrite src iIns eLow wroteLow low (iLow+1) high iHigh eHigh (iIns+1) {-# INLINE merge #-} #if !MIN_VERSION_base(4,6,0) modifySTRef' :: STRef s a -> (a -> a) -> ST s () modifySTRef' = modifySTRef #endif -- $references -- -- * William R. Knight. (1966) A computer method for calculating Kendall's Tau -- with ungrouped data. /Journal of the American Statistical Association/, -- Vol. 61, No. 314, Part 1, pp. 436-439. 
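--
-- A worked tie example (illustrative): in [(1,1),(1,2),(2,3)] one pair is
-- tied on the first coordinate and the remaining two pairs are concordant,
-- so tau_b = (2 - 0) / sqrt ((3-1)*(3-0)):
--
-- > kendall (U.fromList [(1,1),(1,2),(2,3)] :: U.Vector (Double,Double))
-- > ==> 2 / sqrt 6 (~ 0.8165)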
-- statistics-0.13.2.3/Statistics/Distribution/0000755000000000000000000000000012504370360017127 5ustar0000000000000000statistics-0.13.2.3/Statistics/Distribution/Beta.hs0000644000000000000000000000654112504370360020344 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} ----------------------------------------------------------------------------- -- | -- Module : Statistics.Distribution.Beta -- Copyright : (C) 2012 Edward Kmett, -- License : BSD-style (see the file LICENSE) -- -- Maintainer : Edward Kmett -- Stability : provisional -- Portability : DeriveDataTypeable -- ---------------------------------------------------------------------------- module Statistics.Distribution.Beta ( BetaDistribution -- * Constructor , betaDistr , improperBetaDistr -- * Accessors , bdAlpha , bdBeta ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Numeric.SpecFunctions ( incompleteBeta, invIncompleteBeta, logBeta, digamma) import Numeric.MathFunctions.Constants (m_NaN) import qualified Statistics.Distribution as D import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) -- | The beta distribution data BetaDistribution = BD { bdAlpha :: {-# UNPACK #-} !Double -- ^ Alpha shape parameter , bdBeta :: {-# UNPACK #-} !Double -- ^ Beta shape parameter } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON BetaDistribution instance ToJSON BetaDistribution instance Binary BetaDistribution where put (BD x y) = put x >> put y get = BD <$> get <*> get -- | Create beta distribution. Both shape parameters must be positive. betaDistr :: Double -- ^ Shape parameter alpha -> Double -- ^ Shape parameter beta -> BetaDistribution betaDistr a b | a > 0 && b > 0 = improperBetaDistr a b | otherwise = error $ "Statistics.Distribution.Beta.betaDistr: " ++ "shape parameters must be positive. Got a = " ++ show a ++ " b = " ++ show b -- | Create beta distribution. This constructor doesn't check parameters. improperBetaDistr :: Double -- ^ Shape parameter alpha -> Double -- ^ Shape parameter beta -> BetaDistribution improperBetaDistr = BD instance D.Distribution BetaDistribution where cumulative (BD a b) x | x <= 0 = 0 | x >= 1 = 1 | otherwise = incompleteBeta a b x instance D.Mean BetaDistribution where mean (BD a b) = a / (a + b) instance D.MaybeMean BetaDistribution where maybeMean = Just . D.mean instance D.Variance BetaDistribution where variance (BD a b) = a*b / (apb*apb*(apb+1)) where apb = a + b instance D.MaybeVariance BetaDistribution where maybeVariance = Just . D.variance instance D.Entropy BetaDistribution where entropy (BD a b) = logBeta a b - (a-1) * digamma a - (b-1) * digamma b + (a+b-2) * digamma (a+b) instance D.MaybeEntropy BetaDistribution where maybeEntropy = Just . D.entropy instance D.ContDistr BetaDistribution where density (BD a b) x | a <= 0 || b <= 0 = m_NaN | x <= 0 = 0 | x >= 1 = 0 | otherwise = exp $ (a-1)*log x + (b-1)*log (1-x) - logBeta a b quantile (BD a b) p | p == 0 = 0 | p == 1 = 1 | p > 0 && p < 1 = invIncompleteBeta a b p | otherwise = error $ "Statistics.Distribution.Beta.quantile: p must be in [0,1] range.
Got: "++show p instance D.ContGen BetaDistribution where genContVar = D.genContinous statistics-0.13.2.3/Statistics/Distribution/Binomial.hs0000644000000000000000000000742112504370360021221 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Binomial -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The binomial distribution. This is the discrete probability -- distribution of the number of successes in a sequence of /n/ -- independent yes\/no experiments, each of which yields success with -- probability /p/. module Statistics.Distribution.Binomial ( BinomialDistribution -- * Constructors , binomial -- * Accessors , bdTrials , bdProbability ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import qualified Statistics.Distribution.Poisson.Internal as I import Numeric.SpecFunctions (choose,incompleteBeta) import Numeric.MathFunctions.Constants (m_epsilon) import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) -- | The binomial distribution. data BinomialDistribution = BD { bdTrials :: {-# UNPACK #-} !Int -- ^ Number of trials. , bdProbability :: {-# UNPACK #-} !Double -- ^ Probability. } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON BinomialDistribution instance ToJSON BinomialDistribution instance Binary BinomialDistribution where put (BD x y) = put x >> put y get = BD <$> get <*> get instance D.Distribution BinomialDistribution where cumulative = cumulative instance D.DiscreteDistr BinomialDistribution where probability = probability instance D.Mean BinomialDistribution where mean = mean instance D.Variance BinomialDistribution where variance = variance instance D.MaybeMean BinomialDistribution where maybeMean = Just . D.mean instance D.MaybeVariance BinomialDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Entropy BinomialDistribution where entropy (BD n p) | n == 0 = 0 | n <= 100 = directEntropy (BD n p) | otherwise = I.poissonEntropy (fromIntegral n * p) instance D.MaybeEntropy BinomialDistribution where maybeEntropy = Just . D.entropy -- This could be slow for big n probability :: BinomialDistribution -> Int -> Double probability (BD n p) k | k < 0 || k > n = 0 | n == 0 = 1 | otherwise = choose n k * p^k * (1-p)^(n-k) -- Summation from different sides required to reduce roundoff errors cumulative :: BinomialDistribution -> Double -> Double cumulative (BD n p) x | isNaN x = error "Statistics.Distribution.Binomial.cumulative: NaN input" | isInfinite x = if x > 0 then 1 else 0 | k < 0 = 0 | k >= n = 1 | otherwise = incompleteBeta (fromIntegral (n-k)) (fromIntegral (k+1)) (1 - p) where k = floor x mean :: BinomialDistribution -> Double mean (BD n p) = fromIntegral n * p variance :: BinomialDistribution -> Double variance (BD n p) = fromIntegral n * p * (1 - p) directEntropy :: BinomialDistribution -> Double directEntropy d@(BD n _) = negate . sum $ takeWhile (< negate m_epsilon) $ dropWhile (not . (< negate m_epsilon)) $ [ let x = probability d k in x * log x | k <- [0..n]] -- | Construct binomial distribution. Number of trials must be -- non-negative and probability must be in [0,1] range binomial :: Int -- ^ Number of trials. -> Double -- ^ Probability. 
-> BinomialDistribution binomial n p | n < 0 = error $ msg ++ "number of trials must be non-negative. Got " ++ show n | p < 0 || p > 1 = error $ msg++"probability must be in [0,1] range. Got " ++ show p | otherwise = BD n p where msg = "Statistics.Distribution.Binomial.binomial: " statistics-0.13.2.3/Statistics/Distribution/CauchyLorentz.hs0000644000000000000000000000525212504370360022261 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.CauchyLorentz -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Cauchy-Lorentz distribution. It's also known as the Lorentz -- distribution or the Breit–Wigner distribution. -- -- It has neither mean nor variance. module Statistics.Distribution.CauchyLorentz ( CauchyDistribution , cauchyDistribMedian , cauchyDistribScale -- * Constructors , cauchyDistribution , standardCauchy ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) -- | Cauchy-Lorentz distribution. data CauchyDistribution = CD { -- | Central value of the Cauchy-Lorentz distribution, which is its -- mode and median. The distribution doesn't have a mean, so the -- accessor is named after the median. cauchyDistribMedian :: {-# UNPACK #-} !Double -- | Scale parameter of the Cauchy-Lorentz distribution. It's -- different from the variance and specifies the half width at half -- maximum (HWHM). , cauchyDistribScale :: {-# UNPACK #-} !Double } deriving (Eq, Show, Read, Typeable, Data, Generic) instance FromJSON CauchyDistribution instance ToJSON CauchyDistribution instance Binary CauchyDistribution where put (CD x y) = put x >> put y get = CD <$> get <*> get -- | Cauchy distribution cauchyDistribution :: Double -- ^ Central point -> Double -- ^ Scale parameter (HWHM) -> CauchyDistribution cauchyDistribution m s | s > 0 = CD m s | otherwise = error $ "Statistics.Distribution.CauchyLorentz.cauchyDistribution: HWHM must be positive. Got " ++ show s standardCauchy :: CauchyDistribution standardCauchy = CD 0 1 instance D.Distribution CauchyDistribution where cumulative (CD m s) x = 0.5 + atan( (x - m) / s ) / pi instance D.ContDistr CauchyDistribution where density (CD m s) x = (1 / pi) / (s * (1 + y*y)) where y = (x - m) / s quantile (CD m s) p | p > 0 && p < 1 = m + s * tan( pi * (p - 0.5) ) | p == 0 = -1 / 0 | p == 1 = 1 / 0 | otherwise = error $ "Statistics.Distribution.CauchyLorentz.quantile: p must be in [0,1] range. Got: "++show p instance D.ContGen CauchyDistribution where genContVar = D.genContinous instance D.Entropy CauchyDistribution where entropy (CD _ s) = log s + log (4*pi) instance D.MaybeEntropy CauchyDistribution where maybeEntropy = Just . D.entropy statistics-0.13.2.3/Statistics/Distribution/ChiSquared.hs0000644000000000000000000000623212504370360021516 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.ChiSquared -- Copyright : (c) 2010 Alexey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The chi-squared distribution. This is a continuous probability -- distribution of the sum of squares of /k/ independent standard normal -- distributions.
It's commonly used in statistical tests. module Statistics.Distribution.ChiSquared ( ChiSquared -- Constructors , chiSquared , chiSquaredNDF ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Numeric.SpecFunctions ( incompleteGamma,invIncompleteGamma,logGamma,digamma) import qualified Statistics.Distribution as D import qualified System.Random.MWC.Distributions as MWC import Data.Binary (put, get) -- | Chi-squared distribution newtype ChiSquared = ChiSquared Int deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON ChiSquared instance ToJSON ChiSquared instance Binary ChiSquared where get = fmap ChiSquared get put (ChiSquared x) = put x -- | Get number of degrees of freedom chiSquaredNDF :: ChiSquared -> Int chiSquaredNDF (ChiSquared ndf) = ndf -- | Construct chi-squared distribution. Number of degrees of freedom -- must be positive. chiSquared :: Int -> ChiSquared chiSquared n | n <= 0 = error $ "Statistics.Distribution.ChiSquared.chiSquared: N.D.F. must be positive. Got " ++ show n | otherwise = ChiSquared n instance D.Distribution ChiSquared where cumulative = cumulative instance D.ContDistr ChiSquared where density = density quantile = quantile instance D.Mean ChiSquared where mean (ChiSquared ndf) = fromIntegral ndf instance D.Variance ChiSquared where variance (ChiSquared ndf) = fromIntegral (2*ndf) instance D.MaybeMean ChiSquared where maybeMean = Just . D.mean instance D.MaybeVariance ChiSquared where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Entropy ChiSquared where entropy (ChiSquared ndf) = let kHalf = 0.5 * fromIntegral ndf in kHalf + log 2 + logGamma kHalf + (1-kHalf) * digamma kHalf instance D.MaybeEntropy ChiSquared where maybeEntropy = Just . D.entropy instance D.ContGen ChiSquared where genContVar (ChiSquared n) = MWC.chiSquare n cumulative :: ChiSquared -> Double -> Double cumulative chi x | x <= 0 = 0 | otherwise = incompleteGamma (ndf/2) (x/2) where ndf = fromIntegral $ chiSquaredNDF chi density :: ChiSquared -> Double -> Double density chi x | x <= 0 = 0 | otherwise = exp $ log x * (ndf2 - 1) - x2 - logGamma ndf2 - log 2 * ndf2 where ndf = fromIntegral $ chiSquaredNDF chi ndf2 = ndf/2 x2 = x/2 quantile :: ChiSquared -> Double -> Double quantile (ChiSquared ndf) p | p == 0 = 0 | p == 1 = 1/0 | p > 0 && p < 1 = 2 * invIncompleteGamma (fromIntegral ndf / 2) p | otherwise = error $ "Statistics.Distribution.ChiSquared.quantile: p must be in [0,1] range. Got: "++show p statistics-0.13.2.3/Statistics/Distribution/Exponential.hs0000644000000000000000000000662212504370360021757 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Exponential -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The exponential distribution. This is the continuous probability -- distribution of the times between events in a Poisson process, in -- which events occur continuously and independently at a constant -- average rate.
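--
-- A short usage sketch (illustrative): for λ = 2 the density at 0 is λ
-- itself, and the CDF at 1 is 1 - exp (-2):
--
-- > import Statistics.Distribution (density, cumulative)
-- > let d = exponential 2
-- > density d 0
-- > ==> 2.0
-- > cumulative d 1
-- > ==> 0.8646647167633873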
module Statistics.Distribution.Exponential ( ExponentialDistribution -- * Constructors , exponential , exponentialFromSample -- * Accessors , edLambda ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Numeric.MathFunctions.Constants (m_neg_inf) import qualified Statistics.Distribution as D import qualified Statistics.Sample as S import qualified System.Random.MWC.Distributions as MWC import Statistics.Types (Sample) import Data.Binary (put, get) newtype ExponentialDistribution = ED { edLambda :: Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON ExponentialDistribution instance ToJSON ExponentialDistribution instance Binary ExponentialDistribution where put = put . edLambda get = fmap ED get instance D.Distribution ExponentialDistribution where cumulative = cumulative complCumulative = complCumulative instance D.ContDistr ExponentialDistribution where density (ED l) x | x < 0 = 0 | otherwise = l * exp (-l * x) logDensity (ED l) x | x < 0 = m_neg_inf | otherwise = log l + (-l * x) quantile = quantile instance D.Mean ExponentialDistribution where mean (ED l) = 1 / l instance D.Variance ExponentialDistribution where variance (ED l) = 1 / (l * l) instance D.MaybeMean ExponentialDistribution where maybeMean = Just . D.mean instance D.MaybeVariance ExponentialDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Entropy ExponentialDistribution where entropy (ED l) = 1 - log l instance D.MaybeEntropy ExponentialDistribution where maybeEntropy = Just . D.entropy instance D.ContGen ExponentialDistribution where genContVar = MWC.exponential . edLambda cumulative :: ExponentialDistribution -> Double -> Double cumulative (ED l) x | x <= 0 = 0 | otherwise = 1 - exp (-l * x) complCumulative :: ExponentialDistribution -> Double -> Double complCumulative (ED l) x | x <= 0 = 1 | otherwise = exp (-l * x) quantile :: ExponentialDistribution -> Double -> Double quantile (ED l) p | p == 1 = 1 / 0 | p >= 0 && p < 1 = -log (1 - p) / l | otherwise = error $ "Statistics.Distribution.Exponential.quantile: p must be in [0,1] range. Got: "++show p -- | Create an exponential distribution. exponential :: Double -- ^ λ (rate) parameter. -> ExponentialDistribution exponential l | l <= 0 = error $ "Statistics.Distribution.Exponential.exponential: rate parameter must be positive. Got " ++ show l | otherwise = ED l -- | Create exponential distribution from sample. The rate λ is -- estimated as the reciprocal of the sample mean (its maximum -- likelihood estimate). No tests are made to check whether it truly -- is exponential. exponentialFromSample :: Sample -> ExponentialDistribution exponentialFromSample = ED . recip .
S.mean statistics-0.13.2.3/Statistics/Distribution/FDistribution.hs0000644000000000000000000000647512504370360022264 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.FDistribution -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Fisher F distribution module Statistics.Distribution.FDistribution ( FDistribution , fDistribution , fDistributionNDF1 , fDistributionNDF2 ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import Numeric.MathFunctions.Constants (m_neg_inf) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import Statistics.Function (square) import Numeric.SpecFunctions ( logBeta, incompleteBeta, invIncompleteBeta, digamma) import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) -- | F distribution data FDistribution = F { fDistributionNDF1 :: {-# UNPACK #-} !Double , fDistributionNDF2 :: {-# UNPACK #-} !Double , _pdfFactor :: {-# UNPACK #-} !Double } deriving (Eq, Show, Read, Typeable, Data, Generic) instance FromJSON FDistribution instance ToJSON FDistribution instance Binary FDistribution where get = F <$> get <*> get <*> get put (F x y z) = put x >> put y >> put z fDistribution :: Int -> Int -> FDistribution fDistribution n m | n > 0 && m > 0 = let n' = fromIntegral n m' = fromIntegral m f' = 0.5 * (log m' * m' + log n' * n') - logBeta (0.5*n') (0.5*m') in F n' m' f' | otherwise = error "Statistics.Distribution.FDistribution.fDistribution: non-positive number of degrees of freedom" instance D.Distribution FDistribution where cumulative = cumulative instance D.ContDistr FDistribution where density d x | x <= 0 = 0 | otherwise = exp $ logDensity d x logDensity d x | x <= 0 = m_neg_inf | otherwise = logDensity d x quantile = quantile cumulative :: FDistribution -> Double -> Double cumulative (F n m _) x | x <= 0 = 0 | isInfinite x = 1 -- Only matches +∞ | otherwise = let y = n*x in incompleteBeta (0.5 * n) (0.5 * m) (y / (m + y)) logDensity :: FDistribution -> Double -> Double logDensity (F n m fac) x = fac + log x * (0.5 * n - 1) - log(m + n*x) * 0.5 * (n + m) quantile :: FDistribution -> Double -> Double quantile (F n m _) p | p >= 0 && p <= 1 = let x = invIncompleteBeta (0.5 * n) (0.5 * m) p in m * x / (n * (1 - x)) | otherwise = error $ "Statistics.Distribution.FDistribution.quantile: p must be in [0,1] range. Got: "++show p instance D.MaybeMean FDistribution where maybeMean (F _ m _) | m > 2 = Just $ m / (m - 2) | otherwise = Nothing instance D.MaybeVariance FDistribution where -- This formula is the variance of the F distribution, so it is bound -- to 'maybeVariance'; the default 'maybeStdDev' takes its square root. maybeVariance (F n m _) | m > 4 = Just $ 2 * square m * (m + n - 2) / (n * square (m - 2) * (m - 4)) | otherwise = Nothing instance D.Entropy FDistribution where entropy (F n m _) = let nHalf = 0.5 * n mHalf = 0.5 * m in log (n/m) + logBeta nHalf mHalf + (1 - nHalf) * digamma nHalf - (1 + mHalf) * digamma mHalf + (nHalf + mHalf) * digamma (nHalf + mHalf) instance D.MaybeEntropy FDistribution where maybeEntropy = Just .
D.entropy instance D.ContGen FDistribution where genContVar = D.genContinous statistics-0.13.2.3/Statistics/Distribution/Gamma.hs0000644000000000000000000001020712504370360020505 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Gamma -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The gamma distribution. This is a continuous probability -- distribution with two parameters, /k/ and ϑ. If /k/ is -- integral, the distribution represents the sum of /k/ independent -- exponentially distributed random variables, each of which has a -- mean of ϑ. module Statistics.Distribution.Gamma ( GammaDistribution -- * Constructors , gammaDistr , improperGammaDistr -- * Accessors , gdShape , gdScale ) where import Data.Aeson (FromJSON, ToJSON) import Control.Applicative ((<$>), (<*>)) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Numeric.MathFunctions.Constants (m_pos_inf, m_NaN, m_neg_inf) import Numeric.SpecFunctions (incompleteGamma, invIncompleteGamma, logGamma, digamma) import Statistics.Distribution.Poisson.Internal as Poisson import qualified Statistics.Distribution as D import qualified System.Random.MWC.Distributions as MWC -- | The gamma distribution. data GammaDistribution = GD { gdShape :: {-# UNPACK #-} !Double -- ^ Shape parameter, /k/. , gdScale :: {-# UNPACK #-} !Double -- ^ Scale parameter, ϑ. } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON GammaDistribution instance ToJSON GammaDistribution instance Binary GammaDistribution where put (GD x y) = put x >> put y get = GD <$> get <*> get -- | Create gamma distribution. Both shape and scale parameters must -- be positive. gammaDistr :: Double -- ^ Shape parameter, /k/. -> Double -- ^ Scale parameter, ϑ. -> GammaDistribution gammaDistr k theta | k <= 0 = error $ msg ++ "shape must be positive. Got " ++ show k | theta <= 0 = error $ msg ++ "scale must be positive. Got " ++ show theta | otherwise = improperGammaDistr k theta where msg = "Statistics.Distribution.Gamma.gammaDistr: " -- | Create gamma distribution. This constructor does not check whether -- parameters are valid. improperGammaDistr :: Double -- ^ Shape parameter, /k/. -> Double -- ^ Scale parameter, ϑ. -> GammaDistribution improperGammaDistr = GD instance D.Distribution GammaDistribution where cumulative = cumulative instance D.ContDistr GammaDistribution where density = density logDensity (GD k theta) x | x <= 0 = m_neg_inf | otherwise = log x * (k - 1) - (x / theta) - logGamma k - log theta * k quantile = quantile instance D.Variance GammaDistribution where variance (GD a l) = a * l * l instance D.Mean GammaDistribution where mean (GD a l) = a * l instance D.MaybeMean GammaDistribution where maybeMean = Just . D.mean instance D.MaybeVariance GammaDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just .
D.variance instance D.MaybeEntropy GammaDistribution where maybeEntropy (GD a l) | a > 0 && l > 0 = Just $ a + log l + logGamma a + (1-a) * digamma a | otherwise = Nothing instance D.ContGen GammaDistribution where genContVar (GD a l) = MWC.gamma a l density :: GammaDistribution -> Double -> Double density (GD a l) x | a < 0 || l <= 0 = m_NaN | x <= 0 = 0 | a == 0 = if x == 0 then m_pos_inf else 0 | x == 0 = if a < 1 then m_pos_inf else if a > 1 then 0 else 1/l | a < 1 = Poisson.probability (x/l) a * a / x | otherwise = Poisson.probability (x/l) (a-1) / l cumulative :: GammaDistribution -> Double -> Double cumulative (GD k l) x | x <= 0 = 0 | otherwise = incompleteGamma k (x/l) quantile :: GammaDistribution -> Double -> Double quantile (GD k l) p | p == 0 = 0 | p == 1 = 1/0 | p > 0 && p < 1 = l * invIncompleteGamma k p | otherwise = error $ "Statistics.Distribution.Gamma.quantile: p must be in [0,1] range. Got: "++show p statistics-0.13.2.3/Statistics/Distribution/Geometric.hs0000644000000000000000000001216712504370360021410 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Geometric -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Geometric distribution. There are two variants of -- distribution. First is the probability distribution of the number -- of Bernoulli trials needed to get one success, supported on the set -- [1,2..] ('GeometricDistribution'). Sometimes it's referred to as -- the /shifted/ geometric distribution to distinguish from another -- one. -- -- Second variant is probability distribution of the number of -- failures before first success, defined over the set [0,1..] -- ('GeometricDistribution0'). module Statistics.Distribution.Geometric ( GeometricDistribution , GeometricDistribution0 -- * Constructors , geometric , geometric0 -- ** Accessors , gdSuccess , gdSuccess0 ) where import Data.Aeson (FromJSON, ToJSON) import Control.Applicative ((<$>)) import Control.Monad (liftM) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Numeric.MathFunctions.Constants (m_pos_inf, m_neg_inf) import qualified Statistics.Distribution as D import qualified System.Random.MWC.Distributions as MWC ---------------------------------------------------------------- -- Distribution over [1..] newtype GeometricDistribution = GD { gdSuccess :: Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON GeometricDistribution instance ToJSON GeometricDistribution instance Binary GeometricDistribution where get = GD <$> get put (GD x) = put x instance D.Distribution GeometricDistribution where cumulative = cumulative instance D.DiscreteDistr GeometricDistribution where probability (GD s) n | n < 1 = 0 | otherwise = s * (1-s) ** (fromIntegral n - 1) logProbability (GD s) n | n < 1 = m_neg_inf | otherwise = log s + log (1-s) * (fromIntegral n - 1) instance D.Mean GeometricDistribution where mean (GD s) = 1 / s instance D.Variance GeometricDistribution where variance (GD s) = (1 - s) / (s * s) instance D.MaybeMean GeometricDistribution where maybeMean = Just . D.mean instance D.MaybeVariance GeometricDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . 
D.variance instance D.Entropy GeometricDistribution where entropy (GD s) | s == 0 = m_pos_inf | s == 1 = 0 | otherwise = negate $ (s * log s + (1-s) * log (1-s)) / s instance D.MaybeEntropy GeometricDistribution where maybeEntropy = Just . D.entropy instance D.DiscreteGen GeometricDistribution where genDiscreteVar (GD s) g = MWC.geometric1 s g instance D.ContGen GeometricDistribution where genContVar d g = fromIntegral `liftM` D.genDiscreteVar d g -- | Create geometric distribution. geometric :: Double -- ^ Success rate -> GeometricDistribution geometric x | x >= 0 && x <= 1 = GD x | otherwise = error $ "Statistics.Distribution.Geometric.geometric: probability must be in [0,1] range. Got " ++ show x cumulative :: GeometricDistribution -> Double -> Double cumulative (GD s) x | x < 1 = 0 | isInfinite x = 1 | isNaN x = error "Statistics.Distribution.Geometric.cumulative: NaN input" | otherwise = 1 - (1-s) ^ (floor x :: Int) ---------------------------------------------------------------- -- Distribution over [0..] newtype GeometricDistribution0 = GD0 { gdSuccess0 :: Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON GeometricDistribution0 instance ToJSON GeometricDistribution0 instance Binary GeometricDistribution0 where get = GD0 <$> get put (GD0 x) = put x instance D.Distribution GeometricDistribution0 where cumulative (GD0 s) x = cumulative (GD s) (x + 1) instance D.DiscreteDistr GeometricDistribution0 where probability (GD0 s) n = D.probability (GD s) (n + 1) logProbability (GD0 s) n = D.logProbability (GD s) (n + 1) instance D.Mean GeometricDistribution0 where mean (GD0 s) = 1 / s - 1 instance D.Variance GeometricDistribution0 where variance (GD0 s) = D.variance (GD s) instance D.MaybeMean GeometricDistribution0 where maybeMean = Just . D.mean instance D.MaybeVariance GeometricDistribution0 where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Entropy GeometricDistribution0 where entropy (GD0 s) = D.entropy (GD s) instance D.MaybeEntropy GeometricDistribution0 where maybeEntropy = Just . D.entropy instance D.DiscreteGen GeometricDistribution0 where genDiscreteVar (GD0 s) g = MWC.geometric0 s g instance D.ContGen GeometricDistribution0 where genContVar d g = fromIntegral `liftM` D.genDiscreteVar d g -- | Create geometric distribution. geometric0 :: Double -- ^ Success rate -> GeometricDistribution0 geometric0 x | x >= 0 && x <= 1 = GD0 x | otherwise = error $ "Statistics.Distribution.Geometric.geometric0: probability must be in [0,1] range. Got " ++ show x statistics-0.13.2.3/Statistics/Distribution/Hypergeometric.hs0000644000000000000000000000766312504370360022463 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Hypergeometric -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Hypergeometric distribution. This is the discrete probability -- distribution that measures the probability of /k/ successes in /l/ -- trials, without replacement, from a finite population. -- -- The parameters of the distribution describe /k/ elements chosen -- from a population of /l/, with /m/ elements of one type, and -- /l/-/m/ of the other (all are positive integers).
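--
-- A short usage sketch (illustrative): drawing /k/ = 10 elements from a
-- population of /l/ = 50 that contains /m/ = 5 marked elements; the
-- expected number of marked elements drawn is k*m/l:
--
-- > import Statistics.Distribution (mean)
-- > mean (hypergeometric 5 50 10)
-- > ==> 1.0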
module Statistics.Distribution.Hypergeometric ( HypergeometricDistribution -- * Constructors , hypergeometric -- ** Accessors , hdM , hdL , hdK ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Numeric.MathFunctions.Constants (m_epsilon) import Numeric.SpecFunctions (choose) import qualified Statistics.Distribution as D import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) data HypergeometricDistribution = HD { hdM :: {-# UNPACK #-} !Int , hdL :: {-# UNPACK #-} !Int , hdK :: {-# UNPACK #-} !Int } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON HypergeometricDistribution instance ToJSON HypergeometricDistribution instance Binary HypergeometricDistribution where get = HD <$> get <*> get <*> get put (HD x y z) = put x >> put y >> put z instance D.Distribution HypergeometricDistribution where cumulative = cumulative instance D.DiscreteDistr HypergeometricDistribution where probability = probability instance D.Mean HypergeometricDistribution where mean = mean instance D.Variance HypergeometricDistribution where variance = variance instance D.MaybeMean HypergeometricDistribution where maybeMean = Just . D.mean instance D.MaybeVariance HypergeometricDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Entropy HypergeometricDistribution where entropy = directEntropy instance D.MaybeEntropy HypergeometricDistribution where maybeEntropy = Just . D.entropy variance :: HypergeometricDistribution -> Double variance (HD m l k) = (k' * ml) * (1 - ml) * (l' - k') / (l' - 1) where m' = fromIntegral m l' = fromIntegral l k' = fromIntegral k ml = m' / l' mean :: HypergeometricDistribution -> Double mean (HD m l k) = fromIntegral k * fromIntegral m / fromIntegral l directEntropy :: HypergeometricDistribution -> Double directEntropy d@(HD m _ _) = negate . sum $ takeWhile (< negate m_epsilon) $ dropWhile (not . (< negate m_epsilon)) $ [ let x = probability d n in x * log x | n <- [0..m]] hypergeometric :: Int -- ^ /m/ -> Int -- ^ /l/ -> Int -- ^ /k/ -> HypergeometricDistribution hypergeometric m l k | not (l > 0) = error $ msg ++ "l must be positive" | not (m >= 0 && m <= l) = error $ msg ++ "m must lie in [0,l] range" | not (k > 0 && k <= l) = error $ msg ++ "k must lie in (0,l] range" | otherwise = HD m l k where msg = "Statistics.Distribution.Hypergeometric.hypergeometric: " -- Naive implementation probability :: HypergeometricDistribution -> Int -> Double probability (HD mi li ki) n | n < max 0 (mi+ki-li) || n > min mi ki = 0 | otherwise = choose mi n * choose (li - mi) (ki - n) / choose li ki cumulative :: HypergeometricDistribution -> Double -> Double cumulative d@(HD mi li ki) x | isNaN x = error "Statistics.Distribution.Hypergeometric.cumulative: NaN argument" | isInfinite x = if x > 0 then 1 else 0 | n < minN = 0 | n >= maxN = 1 | otherwise = D.sumProbabilities d minN n where n = floor x minN = max 0 (mi+ki-li) maxN = min mi ki statistics-0.13.2.3/Statistics/Distribution/Normal.hs0000644000000000000000000001035212504370360020714 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Normal -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The normal distribution. 
This is a continuous probability -- distribution that describes data that cluster around a mean. module Statistics.Distribution.Normal ( NormalDistribution -- * Constructors , normalDistr , normalFromSample , standard ) where import Data.Aeson (FromJSON, ToJSON) import Control.Applicative ((<$>), (<*>)) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Numeric.MathFunctions.Constants (m_sqrt_2, m_sqrt_2_pi) import Numeric.SpecFunctions (erfc, invErfc) import qualified Statistics.Distribution as D import qualified Statistics.Sample as S import qualified System.Random.MWC.Distributions as MWC -- | The normal distribution. data NormalDistribution = ND { mean :: {-# UNPACK #-} !Double , stdDev :: {-# UNPACK #-} !Double , ndPdfDenom :: {-# UNPACK #-} !Double , ndCdfDenom :: {-# UNPACK #-} !Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON NormalDistribution instance ToJSON NormalDistribution instance Binary NormalDistribution where put (ND w x y z) = put w >> put x >> put y >> put z get = ND <$> get <*> get <*> get <*> get instance D.Distribution NormalDistribution where cumulative = cumulative complCumulative = complCumulative instance D.ContDistr NormalDistribution where logDensity = logDensity quantile = quantile instance D.MaybeMean NormalDistribution where maybeMean = Just . D.mean instance D.Mean NormalDistribution where mean = mean instance D.MaybeVariance NormalDistribution where maybeStdDev = Just . D.stdDev maybeVariance = Just . D.variance instance D.Variance NormalDistribution where stdDev = stdDev instance D.Entropy NormalDistribution where entropy d = 0.5 * log (2 * pi * exp 1 * D.variance d) instance D.MaybeEntropy NormalDistribution where maybeEntropy = Just . D.entropy instance D.ContGen NormalDistribution where genContVar d = MWC.normal (mean d) (stdDev d) -- | Standard normal distribution with mean equal to 0 and variance equal to 1 standard :: NormalDistribution standard = ND { mean = 0.0 , stdDev = 1.0 , ndPdfDenom = log m_sqrt_2_pi , ndCdfDenom = m_sqrt_2 } -- | Create normal distribution from parameters. -- -- IMPORTANT: prior to 0.10 release second parameter was variance not -- standard deviation. normalDistr :: Double -- ^ Mean of distribution -> Double -- ^ Standard deviation of distribution -> NormalDistribution normalDistr m sd | sd > 0 = ND { mean = m , stdDev = sd , ndPdfDenom = log $ m_sqrt_2_pi * sd , ndCdfDenom = m_sqrt_2 * sd } | otherwise = error $ "Statistics.Distribution.Normal.normalDistr: standard deviation must be positive. Got " ++ show sd -- | Create distribution using parameters estimated from -- sample. Variance is estimated using maximum likelihood method -- (biased estimation). normalFromSample :: S.Sample -> NormalDistribution normalFromSample xs = normalDistr m (sqrt v) where (m,v) = S.meanVariance xs logDensity :: NormalDistribution -> Double -> Double logDensity d x = (-xm * xm / (2 * sd * sd)) - ndPdfDenom d where xm = x - mean d sd = stdDev d cumulative :: NormalDistribution -> Double -> Double cumulative d x = erfc ((mean d - x) / ndCdfDenom d) / 2 complCumulative :: NormalDistribution -> Double -> Double complCumulative d x = erfc ((x - mean d) / ndCdfDenom d) / 2 quantile :: NormalDistribution -> Double -> Double quantile d p | p == 0 = -inf | p == 1 = inf | p == 0.5 = mean d | p > 0 && p < 1 = x * ndCdfDenom d + mean d | otherwise = error $ "Statistics.Distribution.Normal.quantile: p must be in [0,1] range. 
Got: "++show p where x = - invErfc (2 * p) inf = 1/0 statistics-0.13.2.3/Statistics/Distribution/Poisson.hs0000644000000000000000000000570212504370360021121 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Poisson -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Poisson distribution. This is the discrete probability -- distribution of a number of events occurring in a fixed interval if -- these events occur with a known average rate, and occur -- independently from each other within that interval. module Statistics.Distribution.Poisson ( PoissonDistribution -- * Constructors , poisson -- * Accessors , poissonLambda -- * References -- $references ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import qualified Statistics.Distribution.Poisson.Internal as I import Numeric.SpecFunctions (incompleteGamma,logFactorial) import Numeric.MathFunctions.Constants (m_neg_inf) import Data.Binary (put, get) newtype PoissonDistribution = PD { poissonLambda :: Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON PoissonDistribution instance ToJSON PoissonDistribution instance Binary PoissonDistribution where get = fmap PD get put = put . poissonLambda instance D.Distribution PoissonDistribution where cumulative (PD lambda) x | x < 0 = 0 | isInfinite x = 1 | isNaN x = error "Statistics.Distribution.Poisson.cumulative: NaN input" | otherwise = 1 - incompleteGamma (fromIntegral (floor x + 1 :: Int)) lambda instance D.DiscreteDistr PoissonDistribution where probability (PD lambda) x = I.probability lambda (fromIntegral x) logProbability (PD lambda) i | i < 0 = m_neg_inf | otherwise = log lambda * fromIntegral i - logFactorial i - lambda instance D.Variance PoissonDistribution where variance = poissonLambda instance D.Mean PoissonDistribution where mean = poissonLambda instance D.MaybeMean PoissonDistribution where maybeMean = Just . D.mean instance D.MaybeVariance PoissonDistribution where maybeStdDev = Just . D.stdDev instance D.Entropy PoissonDistribution where entropy (PD lambda) = I.poissonEntropy lambda instance D.MaybeEntropy PoissonDistribution where maybeEntropy = Just . D.entropy -- | Create Poisson distribution. poisson :: Double -> PoissonDistribution poisson l | l >= 0 = PD l | otherwise = error $ "Statistics.Distribution.Poisson.poisson: lambda must be non-negative. Got " ++ show l -- $references -- -- * Loader, C. (2000) Fast and Accurate Computation of Binomial -- Probabilities. -- * Adell, J., Lekuona, A., and Yu, Y. 
(2010) Sharp Bounds on the -- Entropy of the Poisson Law and Related Quantities -- statistics-0.13.2.3/Statistics/Distribution/StudentT.hs0000644000000000000000000000663312504370360021245 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.StudentT -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Student-T distribution module Statistics.Distribution.StudentT ( StudentT , studentT , studentTndf , studentTUnstandardized ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import Statistics.Distribution.Transform (LinearTransform (..)) import Numeric.SpecFunctions ( logBeta, incompleteBeta, invIncompleteBeta, digamma) import Data.Binary (put, get) -- | Student-T distribution newtype StudentT = StudentT { studentTndf :: Double } deriving (Eq, Show, Read, Typeable, Data, Generic) instance FromJSON StudentT instance ToJSON StudentT instance Binary StudentT where put = put . studentTndf get = fmap StudentT get -- | Create a Student-T distribution. The number of degrees of freedom must be positive. studentT :: Double -> StudentT studentT ndf | ndf > 0 = StudentT ndf | otherwise = modErr "studentT" "non-positive number of degrees of freedom" instance D.Distribution StudentT where cumulative = cumulative instance D.ContDistr StudentT where density d@(StudentT ndf) x = exp (logDensityUnscaled d x) / sqrt ndf logDensity d@(StudentT ndf) x = logDensityUnscaled d x - log (sqrt ndf) quantile = quantile cumulative :: StudentT -> Double -> Double cumulative (StudentT ndf) x | x > 0 = 1 - 0.5 * ibeta | otherwise = 0.5 * ibeta where ibeta = incompleteBeta (0.5 * ndf) 0.5 (ndf / (ndf + x*x)) logDensityUnscaled :: StudentT -> Double -> Double logDensityUnscaled (StudentT ndf) x = log (ndf / (ndf + x*x)) * (0.5 * (1 + ndf)) - logBeta 0.5 (0.5 * ndf) quantile :: StudentT -> Double -> Double quantile (StudentT ndf) p | p >= 0 && p <= 1 = let x = invIncompleteBeta (0.5 * ndf) 0.5 (2 * min p (1 - p)) in case sqrt $ ndf * (1 - x) / x of r | p < 0.5 -> -r | otherwise -> r | otherwise = modErr "quantile" $ "p must be in [0,1] range. Got: "++show p instance D.MaybeMean StudentT where maybeMean (StudentT ndf) | ndf > 1 = Just 0 | otherwise = Nothing instance D.MaybeVariance StudentT where maybeVariance (StudentT ndf) | ndf > 2 = Just $! ndf / (ndf - 2) | otherwise = Nothing instance D.Entropy StudentT where entropy (StudentT ndf) = 0.5 * (ndf+1) * (digamma ((1+ndf)/2) - digamma(ndf/2)) + log (sqrt ndf) + logBeta (ndf/2) 0.5 instance D.MaybeEntropy StudentT where maybeEntropy = Just . D.entropy instance D.ContGen StudentT where genContVar = D.genContinous -- | Create an unstandardized Student-t distribution. studentTUnstandardized :: Double -- ^ Number of degrees of freedom -> Double -- ^ Central value (0 for standard Student T distribution) -> Double -- ^ Scale parameter -> LinearTransform StudentT studentTUnstandardized ndf mu sigma | sigma > 0 = LinearTransform mu sigma $ studentT ndf | otherwise = modErr "studentTUnstandardized" $ "sigma must be > 0. Got: " ++ show sigma modErr :: String -> String -> a modErr fun msg = error $ "Statistics.Distribution.StudentT." 
++ fun ++ ": " ++ msg statistics-0.13.2.3/Statistics/Distribution/Transform.hs0000644000000000000000000000705212504370360021442 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, FlexibleContexts, FlexibleInstances, UndecidableInstances #-} -- | -- Module : Statistics.Distribution.Transform -- Copyright : (c) 2013 John McDonnell; -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Transformations over distributions module Statistics.Distribution.Transform ( LinearTransform (..) , linTransFixedPoint , scaleAround ) where import Data.Aeson (FromJSON, ToJSON) import Control.Applicative ((<*>)) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Data (Data, Typeable) import Data.Functor ((<$>)) import GHC.Generics (Generic) import qualified Statistics.Distribution as D -- | Linear transformation applied to distribution. -- -- > LinearTransform μ σ _ -- > x' = μ + σ·x data LinearTransform d = LinearTransform { linTransLocation :: {-# UNPACK #-} !Double -- ^ Location parameter. , linTransScale :: {-# UNPACK #-} !Double -- ^ Scale parameter. , linTransDistr :: d -- ^ Distribution being transformed. } deriving (Eq, Show, Read, Typeable, Data, Generic) instance (FromJSON d) => FromJSON (LinearTransform d) instance (ToJSON d) => ToJSON (LinearTransform d) instance (Binary d) => Binary (LinearTransform d) where get = LinearTransform <$> get <*> get <*> get put (LinearTransform x y z) = put x >> put y >> put z -- | Apply linear transformation to distribution. scaleAround :: Double -- ^ Fixed point -> Double -- ^ Scale parameter -> d -- ^ Distribution -> LinearTransform d scaleAround x0 sc = LinearTransform (x0 * (1 - sc)) sc -- | Get fixed point of linear transformation linTransFixedPoint :: LinearTransform d -> Double linTransFixedPoint (LinearTransform loc sc _) = loc / (1 - sc) instance Functor LinearTransform where fmap f (LinearTransform loc sc dist) = LinearTransform loc sc (f dist) instance D.Distribution d => D.Distribution (LinearTransform d) where cumulative (LinearTransform loc sc dist) x = D.cumulative dist $ (x-loc) / sc instance D.ContDistr d => D.ContDistr (LinearTransform d) where density (LinearTransform loc sc dist) x = D.density dist ((x-loc) / sc) / sc logDensity (LinearTransform loc sc dist) x = D.logDensity dist ((x-loc) / sc) - log sc quantile (LinearTransform loc sc dist) p = loc + sc * D.quantile dist p instance D.MaybeMean d => D.MaybeMean (LinearTransform d) where maybeMean (LinearTransform loc _ dist) = (+loc) <$> D.maybeMean dist instance (D.Mean d) => D.Mean (LinearTransform d) where mean (LinearTransform loc _ dist) = loc + D.mean dist instance D.MaybeVariance d => D.MaybeVariance (LinearTransform d) where maybeVariance (LinearTransform _ sc dist) = (*(sc*sc)) <$> D.maybeVariance dist maybeStdDev (LinearTransform _ sc dist) = (*sc) <$> D.maybeStdDev dist instance (D.Variance d) => D.Variance (LinearTransform d) where variance (LinearTransform _ sc dist) = sc * sc * D.variance dist stdDev (LinearTransform _ sc dist) = sc * D.stdDev dist instance (D.MaybeEntropy d, D.DiscreteDistr d) => D.MaybeEntropy (LinearTransform d) where maybeEntropy (LinearTransform _ _ dist) = D.maybeEntropy dist instance (D.Entropy d, D.DiscreteDistr d) => D.Entropy (LinearTransform d) where entropy (LinearTransform _ _ dist) = D.entropy dist instance D.ContGen d => D.ContGen (LinearTransform d) where genContVar (LinearTransform loc sc d) g = do x <- D.genContVar d g return $! 
loc + sc * x statistics-0.13.2.3/Statistics/Distribution/Uniform.hs0000644000000000000000000000567012504370360021110 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Distribution.Uniform -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Variate distributed uniformly in the interval. module Statistics.Distribution.Uniform ( UniformDistribution -- * Constructors , uniformDistr -- ** Accessors , uniformA , uniformB ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import qualified Statistics.Distribution as D import qualified System.Random.MWC as MWC import Data.Binary (put, get) import Control.Applicative ((<$>), (<*>)) -- | Uniform distribution from A to B data UniformDistribution = UniformDistribution { uniformA :: {-# UNPACK #-} !Double -- ^ Lower boundary of distribution , uniformB :: {-# UNPACK #-} !Double -- ^ Upper boundary of distribution } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON UniformDistribution instance ToJSON UniformDistribution instance Binary UniformDistribution where put (UniformDistribution x y) = put x >> put y get = UniformDistribution <$> get <*> get -- | Create a uniform distribution. uniformDistr :: Double -> Double -> UniformDistribution uniformDistr a b | b < a = uniformDistr b a | a < b = UniformDistribution a b | otherwise = error "Statistics.Distribution.Uniform.uniform: wrong parameters" -- NOTE: failure is in the default branch to guard against NaNs. instance D.Distribution UniformDistribution where cumulative (UniformDistribution a b) x | x < a = 0 | x > b = 1 | otherwise = (x - a) / (b - a) instance D.ContDistr UniformDistribution where density (UniformDistribution a b) x | x < a = 0 | x > b = 0 | otherwise = 1 / (b - a) quantile (UniformDistribution a b) p | p >= 0 && p <= 1 = a + (b - a) * p | otherwise = error $ "Statistics.Distribution.Uniform.quantile: p must be in [0,1] range. Got: "++show p instance D.Mean UniformDistribution where mean (UniformDistribution a b) = 0.5 * (a + b) instance D.Variance UniformDistribution where -- NOTE: 1/sqrt 12 is not constant folded (#4101) so it's written as a -- numerical constant. (Also FIXME!) stdDev (UniformDistribution a b) = 0.2886751345948129 * (b - a) variance (UniformDistribution a b) = d * d / 12 where d = b - a instance D.MaybeMean UniformDistribution where maybeMean = Just . D.mean instance D.MaybeVariance UniformDistribution where maybeStdDev = Just . D.stdDev instance D.Entropy UniformDistribution where entropy (UniformDistribution a b) = log (b - a) instance D.MaybeEntropy UniformDistribution where maybeEntropy = Just . D.entropy instance D.ContGen UniformDistribution where genContVar (UniformDistribution a b) gen = MWC.uniformR (a,b) gen statistics-0.13.2.3/Statistics/Distribution/Poisson/0000755000000000000000000000000012504370360020561 5ustar0000000000000000statistics-0.13.2.3/Statistics/Distribution/Poisson/Internal.hs0000644000000000000000000001541512504370360022677 0ustar0000000000000000-- | -- Module : Statistics.Distribution.Poisson.Internal -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Internal code for the Poisson distribution. 
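--
-- A hedged usage note (added for illustration, not part of the original
-- documentation): 'probability' below evaluates the Poisson probability
-- mass via Loader's saddle-point method, taking the rate and a possibly
-- non-integral count. For example, P(X = 2) at rate 4:
--
-- > probability 4.0 2.0   -- exp (-4) * 4^2 / 2!, roughly 0.1465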
module Statistics.Distribution.Poisson.Internal ( probability, poissonEntropy ) where import Data.List (unfoldr) import Numeric.MathFunctions.Constants (m_sqrt_2_pi, m_tiny, m_epsilon) import Numeric.SpecFunctions (logGamma, stirlingError, choose, logFactorial) import Numeric.SpecFunctions.Extra (bd0) -- | An unchecked, non-integer-valued version of Loader's saddle point -- algorithm. probability :: Double -> Double -> Double probability 0 0 = 1 probability 0 1 = 0 probability lambda x | isInfinite lambda = 0 | x < 0 = 0 | x <= lambda * m_tiny = exp (-lambda) | lambda < x * m_tiny = exp (-lambda + x * log lambda - logGamma (x+1)) | otherwise = exp (-(stirlingError x) - bd0 x lambda) / (m_sqrt_2_pi * sqrt x) -- | Compute entropy using Theorem 1 from "Sharp Bounds on the Entropy -- of the Poisson Law". This function is unused because 'directEntropy' -- is just as accurate and is faster by about a factor of 4. alyThm1 :: Double -> Double alyThm1 lambda = sum (takeWhile (\x -> abs x >= m_epsilon * lll) alySeries) + lll where lll = lambda * (1 - log lambda) alySeries = [ alyc k * exp (fromIntegral k * log lambda - logFactorial k) | k <- [2..] ] alyc :: Int -> Double alyc k = sum [ parity j * choose (k-1) j * log (fromIntegral j+1) | j <- [0..k-1] ] where parity j | even (k-j) = -1 | otherwise = 1 -- | Returns [x, x^2, x^3, x^4, ...] powers :: Double -> [Double] powers x = unfoldr (\y -> Just (y*x,y*x)) 1 -- | Returns an upper bound according to theorem 2 of "Sharp Bounds on -- the Entropy of the Poisson Law" alyThm2Upper :: Double -> [Double] -> Double alyThm2Upper lambda coefficients = 1.4189385332046727 + 0.5 * log lambda + zipCoefficients lambda coefficients -- | Returns the average of the upper and lower bounds according to -- theorem 2. alyThm2 :: Double -> [Double] -> [Double] -> Double alyThm2 lambda upper lower = alyThm2Upper lambda upper + 0.5 * (zipCoefficients lambda lower) zipCoefficients :: Double -> [Double] -> Double zipCoefficients lambda coefficients = (sum $ map (uncurry (*)) (zip (powers $ recip lambda) coefficients)) -- Mathematica code deriving the coefficients below: -- -- poissonMoment[0, s_] := 1 -- poissonMoment[1, s_] := 0 -- poissonMoment[k_, s_] := -- Sum[s * Binomial[k - 1, j] * poissonMoment[j, s], {j, 0, k - 2}] -- -- upperSeries[m_] := -- Distribute[Integrate[ -- Sum[(-1)^(j - 1) * -- poissonMoment[j, \[Lambda]] / (j * (j - 1)* \[Lambda]^j), -- {j, 3, 2 m - 1}], -- \[Lambda]]] -- -- lowerSeries[m_] := -- Distribute[Integrate[ -- poissonMoment[ -- 2 m + 2, \[Lambda]] / ((2 m + -- 1)*\[Lambda]^(2 m + 2)), \[Lambda]]] -- -- upperBound[m_] := upperSeries[m] + (Log[2*Pi*\[Lambda]] + 1)/2 -- -- lowerBound[m_] := upperBound[m] + lowerSeries[m] upperCoefficients4 :: [Double] upperCoefficients4 = [1/12, 1/24, -103/180, -13/40, -1/210] lowerCoefficients4 :: [Double] lowerCoefficients4 = [0,0,0, -105/4, -210, -2275/18, -167/21, -1/72] upperCoefficients6 :: [Double] upperCoefficients6 = [1/12, 1/24, 19/360, 9/80, -38827/2520, -74855/1008, -73061/2520, -827/720, -1/990] lowerCoefficients6 :: [Double] lowerCoefficients6 = [0,0,0,0,0, -3465/2, -45045, -466235/4, -531916/9, -56287/10, -629/11, -1/156] upperCoefficients8 :: [Double] upperCoefficients8 = [1/12, 1/24, 19/360, 9/80, 863/2520, 1375/1008, -3023561/2520, -15174047/720, -231835511/5940, -18927611/1320, -58315591/60060, -23641/3640, -1/2730] lowerCoefficients8 :: [Double] lowerCoefficients8 = [0,0,0,0,0,0,0, -2027025/8, -15315300, -105252147, -178127950, -343908565/4, -10929270, -3721149/14, -7709/15, -1/272] 
upperCoefficients10 :: [Double] upperCoefficients10 = [1/12, 1/24, 19/360, 9/80, 863/2520, 1375/1008, 33953/5040, 57281/1440, -2271071617/11880, -1483674219/176, -31714406276557/720720, -7531072742237/131040, -1405507544003/65520, -21001919627/10080, -1365808297/36720, -26059/544, -1/5814] lowerCoefficients10 :: [Double] lowerCoefficients10 = [0,0,0,0,0,0,0,0,0,-130945815/2, -7638505875, -438256243425/4, -435477637540, -3552526473925/6, -857611717105/3, -545654955967/12, -5794690528/3, -578334559/42, -699043/133, -1/420] upperCoefficients12 :: [Double] upperCoefficients12 = [1/12, 1/24, 19/360, 9/80, 863/2520, 1375/1008, 33953/5040, 57281/1440, 3250433/11880, 378351/176, -37521922090657/720720, -612415657466657/131040, -3476857538815223/65520, -243882174660761/1440, -34160796727900637/183600, -39453820646687/544, -750984629069237/81396, -2934056300989/9576, -20394527513/12540, -3829559/9240, -1/10626] lowerCoefficients12 :: [Double] lowerCoefficients12 = [0,0,0,0,0,0,0,0,0,0,0, -105411381075/4, -5270569053750, -272908057767345/2, -1051953238104769, -24557168490009155/8, -3683261873403112, -5461918738302026/3, -347362037754732, -2205885452434521/100, -12237195698286/35, -16926981721/22, -6710881/155, -1/600] -- | Compute entropy directly from its definition. This is just as accurate -- as 'alyThm1' for lambda <= 1 and is faster, but is slow for large lambda, -- and produces some underestimation due to accumulation of floating point -- error. directEntropy :: Double -> Double directEntropy lambda = negate . sum $ takeWhile (< negate m_epsilon * lambda) $ dropWhile (not . (< negate m_epsilon * lambda)) $ [ let x = probability lambda k in x * log x | k <- [0..]] -- | Compute the entropy of a Poisson distribution using the best available -- method. poissonEntropy :: Double -> Double poissonEntropy lambda | lambda == 0 = 0 | lambda <= 10 = directEntropy lambda | lambda <= 12 = alyThm2 lambda upperCoefficients4 lowerCoefficients4 | lambda <= 18 = alyThm2 lambda upperCoefficients6 lowerCoefficients6 | lambda <= 24 = alyThm2 lambda upperCoefficients8 lowerCoefficients8 | lambda <= 30 = alyThm2 lambda upperCoefficients10 lowerCoefficients10 | otherwise = alyThm2 lambda upperCoefficients12 lowerCoefficients12 statistics-0.13.2.3/Statistics/Function/0000755000000000000000000000000012504370360016235 5ustar0000000000000000statistics-0.13.2.3/Statistics/Function/Comparison.hs0000644000000000000000000000250112504370360020701 0ustar0000000000000000-- | -- Module : Statistics.Function.Comparison -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Approximate floating point comparison, based on Bruce Dawson's -- \"Comparing floating point numbers\": -- module Statistics.Function.Comparison ( within ) where import Control.Monad.ST (runST) import Data.Primitive.ByteArray (newByteArray, readByteArray, writeByteArray) import Data.Word (Word64) -- | Compare two 'Double' values for approximate equality, using -- Dawson's method. -- -- The required accuracy is specified in ULPs (units of least -- precision). If the two numbers differ by the given number of ULPs -- or less, this function returns @True@. within :: Int -- ^ Number of ULPs of accuracy desired. 
-> Double -> Double -> Bool within ulps a b = runST $ do buf <- newByteArray 8 ai0 <- writeByteArray buf 0 a >> readByteArray buf 0 bi0 <- writeByteArray buf 0 b >> readByteArray buf 0 let big = 0x8000000000000000 :: Word64 ai | ai0 < 0 = big - ai0 | otherwise = ai0 bi | bi0 < 0 = big - bi0 | otherwise = bi0 return $ abs (ai - bi) <= fromIntegral ulps statistics-0.13.2.3/Statistics/Math/0000755000000000000000000000000012504370360015341 5ustar0000000000000000statistics-0.13.2.3/Statistics/Math/RootFinding.hs0000644000000000000000000001110312504370360020113 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable, DeriveGeneric #-} -- | -- Module : Statistics.Math.RootFinding -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Haskell functions for finding the roots of mathematical functions. module Statistics.Math.RootFinding ( Root(..) , fromRoot , ridders -- * References -- $references ) where import Data.Aeson (FromJSON, ToJSON) import Control.Applicative (Alternative(..), Applicative(..)) import Control.Monad (MonadPlus(..), ap) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Binary.Get (getWord8) import Data.Binary.Put (putWord8) import Data.Data (Data, Typeable) import GHC.Generics (Generic) import Statistics.Function.Comparison (within) -- | The result of searching for a root of a mathematical function. data Root a = NotBracketed -- ^ The function does not have opposite signs when -- evaluated at the lower and upper bounds of the search. | SearchFailed -- ^ The search failed to converge to within the given -- error tolerance after the given number of iterations. | Root a -- ^ A root was successfully found. deriving (Eq, Read, Show, Typeable, Data, Generic) instance (FromJSON a) => FromJSON (Root a) instance (ToJSON a) => ToJSON (Root a) instance (Binary a) => Binary (Root a) where put NotBracketed = putWord8 0 put SearchFailed = putWord8 1 put (Root a) = putWord8 2 >> put a get = do i <- getWord8 case i of 0 -> return NotBracketed 1 -> return SearchFailed 2 -> fmap Root get _ -> fail $ "Root.get: Invalid value: " ++ show i instance Functor Root where fmap _ NotBracketed = NotBracketed fmap _ SearchFailed = SearchFailed fmap f (Root a) = Root (f a) instance Monad Root where NotBracketed >>= _ = NotBracketed SearchFailed >>= _ = SearchFailed Root a >>= m = m a return = Root instance MonadPlus Root where mzero = SearchFailed r@(Root _) `mplus` _ = r _ `mplus` p = p instance Applicative Root where pure = Root (<*>) = ap instance Alternative Root where empty = SearchFailed r@(Root _) <|> _ = r _ <|> p = p -- | Returns either the result of a search for a root, or the default -- value if the search failed. fromRoot :: a -- ^ Default value. -> Root a -- ^ Result of search for a root. -> a fromRoot _ (Root a) = a fromRoot a _ = a -- | Use the method of Ridders to compute a root of a function. -- -- The function must have opposite signs when evaluated at the lower -- and upper bounds of the search (i.e. the root must be bracketed). ridders :: Double -- ^ Absolute error tolerance. -> (Double,Double) -- ^ Lower and upper bounds for the search. -> (Double -> Double) -- ^ Function to find the roots of. -> Root Double ridders tol (lo,hi) f | flo == 0 = Root lo | fhi == 0 = Root hi | flo*fhi > 0 = NotBracketed -- root is not bracketed | otherwise = go lo flo hi fhi 0 where go !a !fa !b !fb !i -- Root is bracketed within 1 ulp. 
No improvement could be made | within 1 a b = Root a -- Root is found. Checking that f(m) == 0 is necessary to ensure -- that a root is never passed to 'go' | fm == 0 = Root m | fn == 0 = Root n | d < tol = Root n -- Too many iterations performed. Fail | i >= (100 :: Int) = SearchFailed -- Ridders' approximation coincides with one of the old -- bounds. Revert to bisection | n == a || n == b = case () of _| fm*fa < 0 -> go a fa m fm (i+1) | otherwise -> go m fm b fb (i+1) -- Proceed as usual | fn*fm < 0 = go n fn m fm (i+1) | fn*fa < 0 = go a fa n fn (i+1) | otherwise = go n fn b fb (i+1) where d = abs (b - a) dm = (b - a) * 0.5 !m = a + dm !fm = f m !dn = signum (fb - fa) * dm * fm / sqrt(fm*fm - fa*fb) !n = m - signum dn * min (abs dn) (abs dm - 0.5 * tol) !fn = f n !flo = f lo !fhi = f hi -- $references -- -- * Ridders, C.F.J. (1979) A new algorithm for computing a single -- root of a real continuous function. -- /IEEE Transactions on Circuits and Systems/ 26:979–980. statistics-0.13.2.3/Statistics/Matrix/0000755000000000000000000000000012504370360015714 5ustar0000000000000000statistics-0.13.2.3/Statistics/Matrix/Algorithms.hs0000644000000000000000000000242312504370360020362 0ustar0000000000000000-- | -- Module : Statistics.Matrix.Algorithms -- Copyright : 2014 Bryan O'Sullivan -- License : BSD3 -- -- Useful matrix functions. module Statistics.Matrix.Algorithms ( qr ) where import Control.Applicative ((<$>), (<*>)) import Control.Monad.ST (ST, runST) import Prelude hiding (sum, replicate) import Statistics.Matrix (Matrix, column, dimension, for, norm) import qualified Statistics.Matrix.Mutable as M import Statistics.Sample.Internal (sum) import qualified Data.Vector.Unboxed as U -- | /O(r*c)/ Compute the QR decomposition of a matrix. -- The result returned is the pair of matrices (/q/,/r/). qr :: Matrix -> (Matrix, Matrix) qr mat = runST $ do let (m,n) = dimension mat r <- M.replicate n n 0 a <- M.thaw mat for 0 n $ \j -> do cn <- M.immutably a $ \aa -> norm (column aa j) M.unsafeWrite r j j cn for 0 m $ \i -> M.unsafeModify a i j (/ cn) for (j+1) n $ \jj -> do p <- innerProduct a j jj M.unsafeWrite r j jj p for 0 m $ \i -> do aij <- M.unsafeRead a i j M.unsafeModify a i jj $ subtract (p * aij) (,) <$> M.unsafeFreeze a <*> M.unsafeFreeze r innerProduct :: M.MMatrix s -> Int -> Int -> ST s Double innerProduct mmat j k = M.immutably mmat $ \mat -> sum $ U.zipWith (*) (column mat j) (column mat k) statistics-0.13.2.3/Statistics/Matrix/Mutable.hs0000644000000000000000000000437212504370360017647 0ustar0000000000000000-- | -- Module : Statistics.Matrix.Mutable -- Copyright : (c) 2014 Bryan O'Sullivan -- License : BSD3 -- -- Basic mutable matrix operations. module Statistics.Matrix.Mutable ( MMatrix(..) 
, MVector , replicate , thaw , bounds , unsafeFreeze , unsafeRead , unsafeWrite , unsafeModify , immutably , unsafeBounds ) where import Control.Applicative ((<$>)) import Control.DeepSeq (NFData(..)) import Control.Monad.ST (ST) import Statistics.Matrix.Types (Matrix(..), MMatrix(..), MVector) import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as M import Prelude hiding (replicate) replicate :: Int -> Int -> Double -> ST s (MMatrix s) replicate r c k = MMatrix r c 0 <$> M.replicate (r*c) k thaw :: Matrix -> ST s (MMatrix s) thaw (Matrix r c e v) = MMatrix r c e <$> U.thaw v unsafeFreeze :: MMatrix s -> ST s Matrix unsafeFreeze (MMatrix r c e mv) = Matrix r c e <$> U.unsafeFreeze mv unsafeRead :: MMatrix s -> Int -> Int -> ST s Double unsafeRead mat r c = unsafeBounds mat r c M.unsafeRead {-# INLINE unsafeRead #-} unsafeWrite :: MMatrix s -> Int -> Int -> Double -> ST s () unsafeWrite mat row col k = unsafeBounds mat row col $ \v i -> M.unsafeWrite v i k {-# INLINE unsafeWrite #-} unsafeModify :: MMatrix s -> Int -> Int -> (Double -> Double) -> ST s () unsafeModify mat row col f = unsafeBounds mat row col $ \v i -> do k <- M.unsafeRead v i M.unsafeWrite v i (f k) {-# INLINE unsafeModify #-} -- | Given row and column numbers, calculate the offset into the flat -- row-major vector. bounds :: MMatrix s -> Int -> Int -> (MVector s -> Int -> r) -> r bounds (MMatrix rs cs _ mv) r c k | r < 0 || r >= rs = error "row out of bounds" | c < 0 || c >= cs = error "column out of bounds" | otherwise = k mv $! r * cs + c {-# INLINE bounds #-} -- | Given row and column numbers, calculate the offset into the flat -- row-major vector, without checking. unsafeBounds :: MMatrix s -> Int -> Int -> (MVector s -> Int -> r) -> r unsafeBounds (MMatrix _ cs _ mv) r c k = k mv $! r * cs + c {-# INLINE unsafeBounds #-} immutably :: NFData a => MMatrix s -> (Matrix -> a) -> ST s a immutably mmat f = do k <- f <$> unsafeFreeze mmat rnf k `seq` return k {-# INLINE immutably #-} statistics-0.13.2.3/Statistics/Matrix/Types.hs0000644000000000000000000000417312504370360017361 0ustar0000000000000000-- | -- Module : Statistics.Matrix.Types -- Copyright : 2014 Bryan O'Sullivan -- License : BSD3 -- -- Basic matrix operations. -- -- There isn't a widely used matrix package for Haskell yet, so -- we implement the necessary minimum here. module Statistics.Matrix.Types ( Vector , MVector , Matrix(..) , MMatrix(..) , debug ) where import Data.Char (isSpace) import Numeric (showFFloat) import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as M type Vector = U.Vector Double type MVector s = M.MVector s Double -- | Two-dimensional matrix, stored in row-major order. data Matrix = Matrix { rows :: {-# UNPACK #-} !Int -- ^ Rows of matrix. , cols :: {-# UNPACK #-} !Int -- ^ Columns of matrix. , exponent :: {-# UNPACK #-} !Int -- ^ In order to avoid overflows during matrix multiplication, a -- large exponent is stored separately. , _vector :: !Vector -- ^ Matrix data. } deriving (Eq) -- | Two-dimensional mutable matrix, stored in row-major order. data MMatrix s = MMatrix {-# UNPACK #-} !Int {-# UNPACK #-} !Int {-# UNPACK #-} !Int !(MVector s) -- The Show instance is useful only for debugging. instance Show Matrix where show = debug debug :: Matrix -> String debug (Matrix r c _ vs) = unlines $ zipWith (++) (hdr0 : repeat hdr) rrows where rrows = map (cleanEnd . unwords) . 
split $ zipWith (++) ldone tdone hdr0 = show (r,c) ++ " " hdr = replicate (length hdr0) ' ' pad plus k xs = replicate (k - length xs) ' ' `plus` xs ldone = map (pad (++) (longest lstr)) lstr tdone = map (pad (flip (++)) (longest tstr)) tstr (lstr, tstr) = unzip . map (break (=='.') . render) . U.toList $ vs longest = maximum . map length render k = reverse . dropWhile (=='.') . dropWhile (=='0') . reverse . showFFloat (Just 4) k $ "" split [] = [] split xs = i : split rest where (i, rest) = splitAt c xs cleanEnd = reverse . dropWhile isSpace . reverse statistics-0.13.2.3/Statistics/Resampling/0000755000000000000000000000000012504370360016551 5ustar0000000000000000statistics-0.13.2.3/Statistics/Resampling/Bootstrap.hs0000644000000000000000000001117212504370360021064 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, OverloadedStrings, RecordWildCards #-} -- | -- Module : Statistics.Resampling.Bootstrap -- Copyright : (c) 2009, 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The bootstrap method for statistical inference. module Statistics.Resampling.Bootstrap ( Estimate(..) , bootstrapBCA , scale -- * References -- $references ) where import Control.Applicative ((<$>), (<*>)) import Control.DeepSeq (NFData) import Control.Exception (assert) import Control.Monad.Par (parMap, runPar) import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary) import Data.Binary (put, get) import Data.Data (Data) import Data.Typeable (Typeable) import Data.Vector.Unboxed ((!)) import GHC.Generics (Generic) import Statistics.Distribution (cumulative, quantile) import Statistics.Distribution.Normal import Statistics.Resampling (Resample(..), jackknife) import Statistics.Sample (mean) import Statistics.Types (Estimator, Sample) import qualified Data.Vector.Unboxed as U import qualified Statistics.Resampling as R -- | A point and interval estimate computed via an 'Estimator'. data Estimate = Estimate { estPoint :: {-# UNPACK #-} !Double -- ^ Point estimate. , estLowerBound :: {-# UNPACK #-} !Double -- ^ Lower bound of the estimate interval (i.e. the lower bound of -- the confidence interval). , estUpperBound :: {-# UNPACK #-} !Double -- ^ Upper bound of the estimate interval (i.e. the upper bound of -- the confidence interval). , estConfidenceLevel :: {-# UNPACK #-} !Double -- ^ Confidence level of the confidence intervals. } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON Estimate instance ToJSON Estimate instance Binary Estimate where put (Estimate w x y z) = put w >> put x >> put y >> put z get = Estimate <$> get <*> get <*> get <*> get instance NFData Estimate -- | Multiply the point, lower bound, and upper bound in an 'Estimate' -- by the given value. scale :: Double -- ^ Value to multiply by. -> Estimate -> Estimate scale f e@Estimate{..} = e { estPoint = f * estPoint , estLowerBound = f * estLowerBound , estUpperBound = f * estUpperBound } estimate :: Double -> Double -> Double -> Double -> Estimate estimate pt lb ub cl = assert (lb <= ub) . assert (cl > 0 && cl < 1) $ Estimate { estPoint = pt , estLowerBound = lb , estUpperBound = ub , estConfidenceLevel = cl } data T = {-# UNPACK #-} !Double :< {-# UNPACK #-} !Double infixl 2 :< -- | Bias-corrected accelerated (BCA) bootstrap. This adjusts for both -- bias and skewness in the resampled distribution. 
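--
-- A hedged usage sketch (an assumed workflow, not taken from the original
-- documentation): resampled data is normally produced first with
-- 'Statistics.Resampling.resample' and then passed to 'bootstrapBCA'
-- together with the same sample and estimators, along these lines:
--
-- > -- gen :: GenIO, ests :: [Estimator], sample :: Sample
-- > -- resamples <- resample gen ests 1000 sample
-- > -- bootstrapBCA 0.95 sample ests resamples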
bootstrapBCA :: Double -- ^ Confidence level -> Sample -- ^ Sample data -> [Estimator] -- ^ Estimators -> [Resample] -- ^ Resampled data -> [Estimate] bootstrapBCA confidenceLevel sample estimators resamples | confidenceLevel > 0 && confidenceLevel < 1 = runPar $ parMap (uncurry e) (zip estimators resamples) | otherwise = error "Statistics.Resampling.Bootstrap.bootstrapBCA: confidence level outside (0,1) range" where e est (Resample resample) | U.length sample == 1 || isInfinite bias = estimate pt pt pt confidenceLevel | otherwise = estimate pt (resample ! lo) (resample ! hi) confidenceLevel where pt = R.estimate est sample lo = max (cumn a1) 0 where a1 = bias + b1 / (1 - accel * b1) b1 = bias + z1 hi = min (cumn a2) (ni - 1) where a2 = bias + b2 / (1 - accel * b2) b2 = bias - z1 z1 = quantile standard ((1 - confidenceLevel) / 2) cumn = round . (*n) . cumulative standard bias = quantile standard (probN / n) where probN = fromIntegral . U.length . U.filter ( statistics-0.13.2.3/Statistics/Sample/0000755000000000000000000000000012504370360015671 5ustar0000000000000000statistics-0.13.2.3/Statistics/Sample/Histogram.hs0000644000000000000000000000751312504370360020170 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Sample.Histogram -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Functions for computing histograms of sample data. module Statistics.Sample.Histogram ( histogram -- * Building blocks , histogram_ , range ) where import Numeric.MathFunctions.Constants (m_epsilon,m_tiny) import Statistics.Function (minMax) import qualified Data.Vector.Generic as G import qualified Data.Vector.Generic.Mutable as GM -- | /O(n)/ Compute a histogram over a data set. -- -- The result consists of a pair of vectors: -- -- * The lower bound of each interval. -- -- * The number of samples within the interval. -- -- Interval (bin) sizes are uniform, and the upper and lower bounds -- are chosen automatically using the 'range' function. To specify -- these parameters directly, use the 'histogram_' function. histogram :: (G.Vector v0 Double, G.Vector v1 Double, Num b, G.Vector v1 b) => Int -- ^ Number of bins (must be positive). -> v0 Double -- ^ Sample data (cannot be empty). -> (v1 Double, v1 b) histogram numBins xs = (G.generate numBins step, histogram_ numBins lo hi xs) where (lo,hi) = range numBins xs step i = lo + d * fromIntegral i d = (hi - lo) / fromIntegral numBins {-# INLINE histogram #-} -- | /O(n)/ Compute a histogram over a data set. -- -- Interval (bin) sizes are uniform, based on the supplied upper -- and lower bounds. histogram_ :: (Num b, RealFrac a, G.Vector v0 a, G.Vector v1 b) => Int -- ^ Number of bins. This value must be positive. A zero -- or negative value will cause an error. -> a -- ^ Lower bound on interval range. Sample data less than -- this will cause an error. -> a -- ^ Upper bound on interval range. This value must not be -- less than the lower bound. Sample data that falls above -- the upper bound will cause an error. -> v0 a -- ^ Sample data. -> v1 b histogram_ numBins lo hi xs0 = G.create (GM.replicate numBins 0 >>= bin xs0) where bin xs bins = go 0 where go i | i >= len = return bins | otherwise = do let x = xs `G.unsafeIndex` i b = truncate $ (x - lo) / d GM.write bins b . 
(+1) =<< GM.read bins b go (i+1) len = G.length xs d = ((hi - lo) * (1 + realToFrac m_epsilon)) / fromIntegral numBins {-# INLINE histogram_ #-} -- | /O(n)/ Compute decent defaults for the lower and upper bounds of -- a histogram, based on the desired number of bins and the range of -- the sample data. -- -- The upper and lower bounds used are @(lo-d, hi+d)@, where -- -- @d = (maximum sample - minimum sample) / ((bins - 1) * 2)@ -- -- If all elements in the sample are the same and equal to @x@ range -- is set to @(x - |x|/10, x + |x|/10)@. And if @x@ is equal to 0 range -- is set to @(-1,1)@. This is needed to avoid creating histogram with -- zero bin size. range :: (G.Vector v Double) => Int -- ^ Number of bins (must be positive). -> v Double -- ^ Sample data (cannot be empty). -> (Double, Double) range numBins xs | numBins < 1 = error "Statistics.Histogram.range: invalid bin count" | G.null xs = error "Statistics.Histogram.range: empty sample" | lo == hi = case abs lo / 10 of a | a < m_tiny -> (-1,1) | otherwise -> (lo - a, lo + a) | otherwise = (lo-d, hi+d) where d | numBins == 1 = 0 | otherwise = (hi - lo) / ((fromIntegral numBins - 1) * 2) (lo,hi) = minMax xs {-# INLINE range #-} statistics-0.13.2.3/Statistics/Sample/Internal.hs0000644000000000000000000000136012504370360020001 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | -- Module : Statistics.Sample.Internal -- Copyright : (c) 2013 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Internal functions for computing over samples. module Statistics.Sample.Internal ( robustSumVar , sum ) where import Numeric.Sum (kbn, sumVector) import Prelude hiding (sum) import Statistics.Function (square) import qualified Data.Vector.Generic as G robustSumVar :: (G.Vector v Double) => Double -> v Double -> Double robustSumVar m = sum . G.map (square . subtract m) {-# INLINE robustSumVar #-} sum :: (G.Vector v Double) => v Double -> Double sum = sumVector kbn {-# INLINE sum #-} statistics-0.13.2.3/Statistics/Sample/KernelDensity.hs0000644000000000000000000001027012504370360021005 0ustar0000000000000000{-# LANGUAGE BangPatterns, FlexibleContexts, UnboxedTuples #-} -- | -- Module : Statistics.Sample.KernelDensity -- Copyright : (c) 2011 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kernel density estimation. This module provides a fast, robust, -- non-parametric way to estimate the probability density function of -- a sample. -- -- This estimator does not use the commonly employed \"Gaussian rule -- of thumb\". As a result, it outperforms many plug-in methods on -- multimodal samples with widely separated modes. module Statistics.Sample.KernelDensity ( -- * Estimation functions kde , kde_ -- * References -- $references ) where import Numeric.MathFunctions.Constants (m_sqrt_2_pi) import Prelude hiding (const, min, max, sum) import Statistics.Function (minMax, nextHighestPowerOfTwo) import Statistics.Math.RootFinding (fromRoot, ridders) import Statistics.Sample.Histogram (histogram_) import Statistics.Sample.Internal (sum) import Statistics.Transform (dct, idct) import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- | Gaussian kernel density estimator for one-dimensional data, using -- the method of Botev et al. -- -- The result is a pair of vectors, containing: -- -- * The coordinates of each mesh point. 
The mesh interval is chosen -- to be 20% larger than the range of the sample. (To specify the -- mesh interval, use 'kde_'.) -- -- * Density estimates at each mesh point. kde :: Int -- ^ The number of mesh points to use in the uniform discretization -- of the interval @(min,max)@. If this value is not a power of -- two, then it is rounded up to the next power of two. -> U.Vector Double -> (U.Vector Double, U.Vector Double) kde n0 xs = kde_ n0 (lo - range / 10) (hi + range / 10) xs where (lo,hi) = minMax xs range | U.length xs <= 1 = 1 -- Unreasonable guess | lo == hi = 1 -- All elements are equal | otherwise = hi - lo -- | Gaussian kernel density estimator for one-dimensional data, using -- the method of Botev et al. -- -- The result is a pair of vectors, containing: -- -- * The coordinates of each mesh point. -- -- * Density estimates at each mesh point. kde_ :: Int -- ^ The number of mesh points to use in the uniform discretization -- of the interval @(min,max)@. If this value is not a power of -- two, then it is rounded up to the next power of two. -> Double -- ^ Lower bound (@min@) of the mesh range. -> Double -- ^ Upper bound (@max@) of the mesh range. -> U.Vector Double -> (U.Vector Double, U.Vector Double) kde_ n0 min max xs | U.null xs = error "Statistics.KernelDensity.kde: empty sample" | n0 <= 1 = error "Statistics.KernelDensity.kde: invalid number of points" | otherwise = (mesh, density) where mesh = G.generate ni $ \z -> min + (d * fromIntegral z) where d = r / (n-1) density = G.map (/(2 * r)) . idct $ G.zipWith f a (G.enumFromTo 0 (n-1)) where f b z = b * exp (sqr z * sqr pi * t_star * (-0.5)) !n = fromIntegral ni !ni = nextHighestPowerOfTwo n0 !r = max - min a = dct . G.map (/ sum h) $ h where h = G.map (/ len) $ histogram_ ni min max xs !len = fromIntegral (G.length xs) !t_star = fromRoot (0.28 * len ** (-0.4)) . ridders 1e-14 (0,0.1) $ \x -> x - (len * (2 * sqrt pi) * go 6 (f 7 x)) ** (-0.4) where f q t = 2 * pi ** (q*2) * sum (G.zipWith g iv a2v) where g i a2 = i ** q * a2 * exp ((-i) * sqr pi * t) a2v = G.map (sqr . (*0.5)) $ G.tail a iv = G.map sqr $ G.enumFromTo 1 (n-1) go s !h | s == 1 = h | otherwise = go (s-1) (f s time) where time = (2 * const * k0 / len / h) ** (2 / (3 + 2 * s)) const = (1 + 0.5 ** (s+0.5)) / 3 k0 = U.product (G.enumFromThenTo 1 3 (2*s-1)) / m_sqrt_2_pi sqr x = x * x -- $references -- -- Botev. Z.I., Grotowski J.F., Kroese D.P. (2010). Kernel density -- estimation via diffusion. /Annals of Statistics/ -- 38(5):2916–2957. statistics-0.13.2.3/Statistics/Sample/Powers.hs0000644000000000000000000001567012504370360017515 0ustar0000000000000000{-# LANGUAGE BangPatterns, DeriveDataTypeable, DeriveGeneric, FlexibleContexts #-} -- | -- Module : Statistics.Sample.Powers -- Copyright : (c) 2009, 2010 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Very fast statistics over simple powers of a sample. These can all -- be computed efficiently in just a single pass over a sample, with -- that pass subject to stream fusion. -- -- The tradeoff is that some of these functions are less numerically -- robust than their counterparts in the 'Statistics.Sample' module. -- Where this is the case, the alternatives are noted. 
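--
-- A hedged example of the single-pass workflow described above
-- (illustrative only; the input vector is made up): collect the simple
-- powers once, then read several statistics off the same summary.
--
-- > let p = powers 4 (U.fromList [1,2,3,4,100 :: Double])
-- > in (mean p, variance p, skewness p, kurtosis p)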
module Statistics.Sample.Powers ( -- * Types Powers -- * Constructor , powers -- * Descriptive functions , order , count , sum -- * Statistics of location , mean -- * Statistics of dispersion , variance , stdDev , varianceUnbiased -- * Functions over central moments , centralMoment , skewness , kurtosis -- * References -- $references ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary(..)) import Data.Data (Data, Typeable) import Data.Vector.Binary () import Data.Vector.Generic (unsafeFreeze) import Data.Vector.Unboxed ((!)) import GHC.Generics (Generic) import Numeric.SpecFunctions (choose) import Prelude hiding (sum) import Statistics.Function (indexed) import Statistics.Internal (inlinePerformIO) import System.IO.Unsafe (unsafePerformIO) import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as MU import qualified Statistics.Sample.Internal as S newtype Powers = Powers (U.Vector Double) deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON Powers instance ToJSON Powers instance Binary Powers where put (Powers v) = put v get = fmap Powers get -- | O(/n/) Collect the /n/ simple powers of a sample. -- -- Functions computed over a sample's simple powers require at least a -- certain number (or /order/) of powers to be collected. -- -- * To compute the /k/th 'centralMoment', at least /k/ simple powers -- must be collected. -- -- * For the 'variance', at least 2 simple powers are needed. -- -- * For 'skewness', we need at least 3 simple powers. -- -- * For 'kurtosis', at least 4 simple powers are required. -- -- This function is subject to stream fusion. powers :: G.Vector v Double => Int -- ^ /n/, the number of powers, where /n/ >= 2. -> v Double -> Powers powers k | k < 2 = error "Statistics.Sample.powers: too few powers" | otherwise = fini . G.foldl' go (unsafePerformIO $ MU.replicate l 0) where go ms x = inlinePerformIO $ loop 0 1 where loop !i !xk | i == l = return ms | otherwise = do MU.read ms i >>= MU.write ms i . (+ xk) loop (i+1) (xk*x) fini = Powers . unsafePerformIO . unsafeFreeze l = k + 1 {-# SPECIALIZE powers :: Int -> U.Vector Double -> Powers #-} {-# SPECIALIZE powers :: Int -> V.Vector Double -> Powers #-} -- | The order (number) of simple powers collected from a 'sample'. order :: Powers -> Int order (Powers pa) = U.length pa - 1 -- | Compute the /k/th central moment of a sample. The central -- moment is also known as the moment about the mean. centralMoment :: Int -> Powers -> Double centralMoment k p@(Powers pa) | k < 0 || k > order p = error ("Statistics.Sample.Powers.centralMoment: " ++ "invalid argument") | k == 0 = 1 | otherwise = (/n) . S.sum . U.map go . indexed . U.take (k+1) $ pa where go (i , e) = (k `choose` i) * ((-m) ^ (k-i)) * e n = U.head pa m = mean p -- | Maximum likelihood estimate of a sample's variance. Also known -- as the population variance, where the denominator is /n/. This is -- the second central moment of the sample. -- -- This is less numerically robust than the variance function in the -- 'Statistics.Sample' module, but the number is essentially free to -- compute if you have already collected a sample's simple powers. -- -- Requires 'Powers' with 'order' at least 2. variance :: Powers -> Double variance = centralMoment 2 -- | Standard deviation. This is simply the square root of the -- maximum likelihood estimate of the variance. stdDev :: Powers -> Double stdDev = sqrt . 
variance -- | Unbiased estimate of a sample's variance. Also known as the -- sample variance, where the denominator is /n/-1. -- -- Requires 'Powers' with 'order' at least 2. varianceUnbiased :: Powers -> Double varianceUnbiased p@(Powers pa) | n > 1 = variance p * n / (n-1) | otherwise = 0 where n = U.head pa -- | Compute the skewness of a sample. This is a measure of the -- asymmetry of its distribution. -- -- A sample with negative skew is said to be /left-skewed/. Most of -- its mass is on the right of the distribution, with the tail on the -- left. -- -- > skewness . powers 3 $ U.fromList [1,100,101,102,103] -- > ==> -1.497681449918257 -- -- A sample with positive skew is said to be /right-skewed/. -- -- > skewness . powers 3 $ U.fromList [1,2,3,4,100] -- > ==> 1.4975367033335198 -- -- A sample's skewness is not defined if its 'variance' is zero. -- -- Requires 'Powers' with 'order' at least 3. skewness :: Powers -> Double skewness p = centralMoment 3 p * variance p ** (-1.5) -- | Compute the excess kurtosis of a sample. This is a measure of -- the \"peakedness\" of its distribution. A high kurtosis indicates -- that the sample's variance is due more to infrequent severe -- deviations than to frequent modest deviations. -- -- A sample's excess kurtosis is not defined if its 'variance' is -- zero. -- -- Requires 'Powers' with 'order' at least 4. kurtosis :: Powers -> Double kurtosis p = centralMoment 4 p / (v * v) - 3 where v = variance p -- | The number of elements in the original 'Sample'. This is the -- sample's zeroth simple power. count :: Powers -> Int count (Powers pa) = floor $ U.head pa -- | The sum of elements in the original 'Sample'. This is the -- sample's first simple power. sum :: Powers -> Double sum (Powers pa) = pa ! 1 -- | The arithmetic mean of elements in the original 'Sample'. -- -- This is less numerically robust than the mean function in the -- 'Statistics.Sample' module, but the number is essentially free to -- compute if you have already collected a sample's simple powers. mean :: Powers -> Double mean p@(Powers pa) | n == 0 = 0 | otherwise = sum p / n where n = U.head pa -- $references -- -- * Besset, D.H. (2000) Elements of statistics. -- /Object-oriented implementation of numerical methods/ -- ch. 9, pp. 311–331. -- -- -- * Anderson, G. (2009) Compute /k/th central moments in one -- pass. /quantblog/. statistics-0.13.2.3/Statistics/Sample/KernelDensity/0000755000000000000000000000000012504370360020451 5ustar0000000000000000statistics-0.13.2.3/Statistics/Sample/KernelDensity/Simple.hs0000644000000000000000000001602612504370360022243 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, FlexibleContexts #-} -- | -- Module : Statistics.Sample.KernelDensity.Simple -- Copyright : (c) 2009 Bryan O'Sullivan -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kernel density estimation code, providing non-parametric ways to -- estimate the probability density function of a sample. -- -- The techniques used by functions in this module are relatively -- fast, but they generally give inferior results to the KDE function -- in the main 'Statistics.KernelDensity' module (due to the -- oversmoothing documented for 'bandwidth' below). module Statistics.Sample.KernelDensity.Simple {-# DEPRECATED "Use Statistics.Sample.KernelDensity instead." 
#-} ( -- * Simple entry points epanechnikovPDF , gaussianPDF -- * Building blocks -- These functions may be useful if you need to construct a kernel -- density function estimator other than the ones provided in this -- module. -- ** Choosing points from a sample , Points(..) , choosePoints -- ** Bandwidth estimation , Bandwidth , bandwidth , epanechnikovBW , gaussianBW -- ** Kernels , Kernel , epanechnikovKernel , gaussianKernel -- ** Low-level estimation , estimatePDF , simplePDF -- * References -- $references ) where import Data.Aeson (FromJSON, ToJSON) import Data.Binary (Binary(..)) import Data.Data (Data, Typeable) import Data.Vector.Binary () import GHC.Generics (Generic) import Numeric.MathFunctions.Constants (m_1_sqrt_2, m_2_sqrt_pi) import Prelude hiding (sum) import Statistics.Function (minMax) import Statistics.Sample (stdDev) import Statistics.Sample.Internal (sum) import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- | Points from the range of a 'Sample'. newtype Points = Points { fromPoints :: U.Vector Double } deriving (Eq, Read, Show, Typeable, Data, Generic) instance FromJSON Points instance ToJSON Points instance Binary Points where get = fmap Points get put = put . fromPoints -- | Bandwidth estimator for an Epanechnikov kernel. epanechnikovBW :: Double -> Bandwidth epanechnikovBW n = (80 / (n * m_2_sqrt_pi)) ** 0.2 -- | Bandwidth estimator for a Gaussian kernel. gaussianBW :: Double -> Bandwidth gaussianBW n = (4 / (n * 3)) ** 0.2 -- | The width of the convolution kernel used. type Bandwidth = Double -- | Compute the optimal bandwidth from the observed data for the -- given kernel. -- -- This function uses an estimate based on the standard deviation of a -- sample (due to Deheuvels), which performs reasonably well for -- unimodal distributions but leads to oversmoothing for more complex -- ones. bandwidth :: G.Vector v Double => (Double -> Bandwidth) -> v Double -> Bandwidth bandwidth kern values = stdDev values * kern (fromIntegral $ G.length values) -- | Choose a uniform range of points at which to estimate a sample's -- probability density function. -- -- If you are using a Gaussian kernel, multiply the sample's bandwidth -- by 3 before passing it to this function. -- -- If this function is passed an empty vector, it returns values of -- positive and negative infinity. choosePoints :: G.Vector v Double => Int -- ^ Number of points to select, /n/ -> Double -- ^ Sample bandwidth, /h/ -> v Double -- ^ Input data -> Points choosePoints n h sample = Points . U.map f $ U.enumFromTo 0 n' where lo = a - h hi = z + h (a, z) = minMax sample d = (hi - lo) / fromIntegral n' f i = lo + fromIntegral i * d n' = n - 1 -- | The convolution kernel. Its parameters are as follows: -- -- * Scaling factor, 1\//nh/ -- -- * Bandwidth, /h/ -- -- * A point at which to sample the input, /p/ -- -- * One sample value, /v/ type Kernel = Double -> Double -> Double -> Double -> Double -- | Epanechnikov kernel for probability density function estimation. epanechnikovKernel :: Kernel epanechnikovKernel f h p v | abs u <= 1 = f * (1 - u * u) | otherwise = 0 where u = (v - p) / (h * 0.75) -- | Gaussian kernel for probability density function estimation. gaussianKernel :: Kernel gaussianKernel f h p v = exp (-0.5 * u * u) * g where u = (v - p) / h g = f * 0.5 * m_2_sqrt_pi * m_1_sqrt_2 -- | Kernel density estimator, providing a non-parametric way of -- estimating the PDF of a random variable. 
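--
-- A hedged usage sketch (assumed, not from the original documentation),
-- following the advice above to scale a Gaussian bandwidth by 3 when
-- choosing points:
--
-- > let h   = bandwidth gaussianBW xs
-- >     pts = choosePoints 128 (h * 3) xs
-- > in (pts, estimatePDF gaussianKernel h xs pts)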
estimatePDF :: G.Vector v Double => Kernel -- ^ Kernel function -> Bandwidth -- ^ Bandwidth, /h/ -> v Double -- ^ Sample data -> Points -- ^ Points at which to estimate -> U.Vector Double estimatePDF kernel h sample | n < 2 = errorShort "estimatePDF" | otherwise = U.map k . fromPoints where k p = sum . G.map (kernel f h p) $ sample f = 1 / (h * fromIntegral n) n = G.length sample {-# INLINE estimatePDF #-} -- | A helper for creating a simple kernel density estimation function -- with automatically chosen bandwidth and estimation points. simplePDF :: G.Vector v Double => (Double -> Double) -- ^ Bandwidth function -> Kernel -- ^ Kernel function -> Double -- ^ Bandwidth scaling factor (3 for a Gaussian kernel, 1 for all others) -> Int -- ^ Number of points at which to estimate -> v Double -- ^ sample data -> (Points, U.Vector Double) simplePDF fbw fpdf k numPoints sample = (points, estimatePDF fpdf bw sample points) where points = choosePoints numPoints (bw*k) sample bw = bandwidth fbw sample {-# INLINE simplePDF #-} -- | Simple Epanechnikov kernel density estimator. Returns the -- uniformly spaced points from the sample range at which the density -- function was estimated, and the estimates at those points. epanechnikovPDF :: G.Vector v Double => Int -- ^ Number of points at which to estimate -> v Double -- ^ Data sample -> (Points, U.Vector Double) epanechnikovPDF = simplePDF epanechnikovBW epanechnikovKernel 1 -- | Simple Gaussian kernel density estimator. Returns the uniformly -- spaced points from the sample range at which the density function -- was estimated, and the estimates at those points. gaussianPDF :: G.Vector v Double => Int -- ^ Number of points at which to estimate -> v Double -- ^ Data sample -> (Points, U.Vector Double) gaussianPDF = simplePDF gaussianBW gaussianKernel 3 errorShort :: String -> a errorShort func = error ("Statistics.KernelDensity." ++ func ++ ": at least two points required") -- $references -- -- * Deheuvels, P. (1977) Estimation non paramétrique de la densité -- par histogrammes -- généralisés. <http://archive.numdam.org/article/RSA_1977__25_3_5_0.pdf> statistics-0.13.2.3/Statistics/Test/0000755000000000000000000000000012504370360015367 5ustar0000000000000000statistics-0.13.2.3/Statistics/Test/ChiSquared.hs0000644000000000000000000000354512504370360017762 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} -- | Pearson's chi squared test. module Statistics.Test.ChiSquared ( chi2test -- * Data types , TestType(..) , TestResult(..) ) where import Prelude hiding (sum) import Statistics.Distribution import Statistics.Distribution.ChiSquared import Statistics.Function (square) import Statistics.Sample.Internal (sum) import Statistics.Test.Types import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U -- | Generic form of Pearson chi squared tests for binned data. Data -- sample is supplied in the form of tuples (observed quantity, -- expected number of events). Both must be positive. chi2test :: (G.Vector v (Int,Double), G.Vector v Double) => Double -- ^ p-value -> Int -- ^ Number of additional degrees of -- freedom. One degree of freedom -- is due to the fact that there are -- N observations in total, which is -- accounted for automatically. -> v (Int,Double) -- ^ Observation and expectation. -> TestResult
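-- A hedged example (illustrative only; the bin counts and expectations
-- are made up): testing three bins of observed counts against
-- expectations at p = 0.05, with no extra fitted parameters beyond the
-- total count:
--
-- > chi2test 0.05 0 (U.fromList [(10,11.5), (15,14.2), (20,19.3)])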
chi2test p ndf vec | ndf < 0 = error $ "Statistics.Test.ChiSquare.chi2test: negative NDF " ++ show ndf | n < 0 = error $ "Statistics.Test.ChiSquare.chi2test: too short data sample" | p > 0 && p < 1 = significant $ complCumulative d chi2 < p | otherwise = error $ "Statistics.Test.ChiSquare.chi2test: bad p-value: " ++ show p where n = G.length vec - ndf - 1 chi2 = sum $ G.map (\(o,e) -> square (fromIntegral o - e) / e) vec d = chiSquared n {-# SPECIALIZE chi2test :: Double -> Int -> U.Vector (Int,Double) -> TestResult #-} {-# SPECIALIZE chi2test :: Double -> Int -> V.Vector (Int,Double) -> TestResult #-} statistics-0.13.2.3/Statistics/Test/Internal.hs0000644000000000000000000000274312504370360017505 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} module Statistics.Test.Internal ( rank , splitByTags ) where import qualified Data.Vector.Generic as G -- Private data type for unfolding data Rank v a = Rank { rankCnt :: {-# UNPACK #-} !Int -- Number of ranks to return , rankVal :: {-# UNPACK #-} !Double -- Rank to return , rankNum :: {-# UNPACK #-} !Double -- Current rank , rankVec :: v a -- Remaining vector } -- | Calculate the rank of a sample. The sample must already be sorted. rank :: (G.Vector v a, G.Vector v Double) => (a -> a -> Bool) -- ^ Equivalence relation -> v a -- ^ Vector to rank -> v Double rank eq vec = G.unfoldr go (Rank 0 (-1) 1 vec) where go (Rank 0 _ r v) | G.null v = Nothing | otherwise = case G.length h of 1 -> Just (r, Rank 0 0 (r+1) rest) n -> go Rank { rankCnt = n , rankVal = 0.5 * (r*2 + fromIntegral (n-1)) , rankNum = r + fromIntegral n , rankVec = rest } where (h,rest) = G.span (eq $ G.head v) v go (Rank n val r v) = Just (val, Rank (n-1) val r v) {-# INLINE rank #-} -- | Split a tagged vector splitByTags :: (G.Vector v a, G.Vector v (Bool,a)) => v (Bool,a) -> (v a, v a) splitByTags vs = (G.map snd a, G.map snd b) where (a,b) = G.unstablePartition fst vs {-# INLINE splitByTags #-} statistics-0.13.2.3/Statistics/Test/KolmogorovSmirnov.hs0000644000000000000000000002077712504370360021444 0ustar0000000000000000-- | -- Module : Statistics.Test.KolmogorovSmirnov -- Copyright : (c) 2011 Aleksey Khudyakov -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- Kolmogorov-Smirnov tests are non-parametric tests for assessing -- whether a given sample could be described by a distribution or whether -- two samples have the same distribution. They are only applicable to -- continuous distributions. module Statistics.Test.KolmogorovSmirnov ( -- * Kolmogorov-Smirnov test kolmogorovSmirnovTest , kolmogorovSmirnovTestCdf , kolmogorovSmirnovTest2 -- * Evaluate statistics , kolmogorovSmirnovCdfD , kolmogorovSmirnovD , kolmogorovSmirnov2D -- * Probabilities , kolmogorovSmirnovProbability -- * Data types , TestType(..) , TestResult(..) -- * References -- $references ) where import Control.Monad (when) import Prelude hiding (exponent, sum) import Statistics.Distribution (Distribution(..)) import Statistics.Function (sort, unsafeModify) import Statistics.Matrix (center, exponent, for, fromVector, power) import Statistics.Test.Types (TestResult(..), TestType(..), significant) import Statistics.Types (Sample) import qualified Data.Vector.Unboxed as U import qualified Data.Vector.Unboxed.Mutable as M ---------------------------------------------------------------- -- Test ---------------------------------------------------------------- -- | Check that a sample could be described by a -- distribution. 
'Significant' means the distribution is not compatible -- with the data for the given p-value. -- -- This test uses the Marsaglia-Tsang-Wang exact algorithm for -- calculation of the p-value. kolmogorovSmirnovTest :: Distribution d => d -- ^ Distribution -> Double -- ^ p-value -> Sample -- ^ Data sample -> TestResult kolmogorovSmirnovTest d = kolmogorovSmirnovTestCdf (cumulative d) -- | Variant of 'kolmogorovSmirnovTest' which uses the CDF in the form of a -- function. kolmogorovSmirnovTestCdf :: (Double -> Double) -- ^ CDF of distribution -> Double -- ^ p-value -> Sample -- ^ Data sample -> TestResult kolmogorovSmirnovTestCdf cdf p sample | p > 0 && p < 1 = significant $ 1 - prob < p | otherwise = error "Statistics.Test.KolmogorovSmirnov.kolmogorovSmirnovTestCdf: bad p-value" where d = kolmogorovSmirnovCdfD cdf sample prob = kolmogorovSmirnovProbability (U.length sample) d -- | Two sample Kolmogorov-Smirnov test. It tests whether two data -- samples could be described by the same distribution without -- making any assumptions about it. -- -- This test uses an approximate formula for computing the p-value. kolmogorovSmirnovTest2 :: Double -- ^ p-value -> Sample -- ^ Sample 1 -> Sample -- ^ Sample 2 -> TestResult kolmogorovSmirnovTest2 p xs1 xs2 | p > 0 && p < 1 = significant $ 1 - prob( d*(en + 0.12 + 0.11/en) ) < p | otherwise = error "Statistics.Test.KolmogorovSmirnov.kolmogorovSmirnovTest2: bad p-value" where d = kolmogorovSmirnov2D xs1 xs2 -- Effective number of data points n1 = fromIntegral (U.length xs1) n2 = fromIntegral (U.length xs2) en = sqrt $ n1 * n2 / (n1 + n2) -- prob z | z < 0 = error "kolmogorovSmirnov2D: internal error" | z == 0 = 1 | z < 1.18 = let y = exp( -1.23370055013616983 / (z*z) ) in 2.25675833419102515 * sqrt( -log(y) ) * (y + y**9 + y**25 + y**49) | otherwise = let x = exp(-2 * z * z) in 1 - 2*(x - x**4 + x**9) -- FIXME: Find source for approximation for D ---------------------------------------------------------------- -- Kolmogorov's statistic ---------------------------------------------------------------- -- | Calculate Kolmogorov's statistic /D/ for a given cumulative -- distribution function (CDF) and data sample. If the sample is empty, -- returns 0. kolmogorovSmirnovCdfD :: (Double -> Double) -- ^ CDF function -> Sample -- ^ Sample -> Double kolmogorovSmirnovCdfD cdf sample | U.null sample = 0 | otherwise = U.maximum $ U.zipWith3 (\p a b -> abs (p-a) `max` abs (p-b)) ps steps (U.tail steps) where xs = sort sample n = U.length xs -- ps = U.map cdf xs steps = U.map ((/ fromIntegral n) . fromIntegral) $ U.generate (n+1) id -- | Calculate Kolmogorov's statistic /D/ for a given cumulative -- distribution function (CDF) and data sample. If the sample is empty, -- returns 0. kolmogorovSmirnovD :: (Distribution d) => d -- ^ Distribution -> Sample -- ^ Sample -> Double kolmogorovSmirnovD d = kolmogorovSmirnovCdfD (cumulative d) -- | Calculate Kolmogorov's statistic /D/ for two data samples. If -- either sample is empty, returns 0. kolmogorovSmirnov2D :: Sample -- ^ First sample -> Sample -- ^ Second sample -> Double kolmogorovSmirnov2D sample1 sample2 | U.null sample1 || U.null sample2 = 0 | otherwise = worker 0 0 0 where xs1 = sort sample1 xs2 = sort sample2 n1 = U.length xs1 n2 = U.length xs2 en1 = fromIntegral n1 en2 = fromIntegral n2 -- Find new index skip x i xs = go (i+1) where go n | n >= U.length xs = n | xs U.! n == x = go (n+1) | otherwise = n -- Main loop worker d i1 i2 | i1 >= n1 || i2 >= n2 = d | otherwise = worker d' i1' i2' where d1 = xs1 U.! i1 d2 = xs2 U.!
i2 i1' | d1 <= d2 = skip d1 i1 xs1 | otherwise = i1 i2' | d2 <= d1 = skip d2 i2 xs2 | otherwise = i2 d' = max d (abs $ fromIntegral i1' / en1 - fromIntegral i2' / en2) -- | Calculate the cumulative probability function for Kolmogorov's -- distribution with parameter /n/, i.e. the probability of getting a value -- smaller than /d/ with an n-element sample. -- -- It uses the algorithm by Marsaglia et al. and provides at least -- 7-digit accuracy. kolmogorovSmirnovProbability :: Int -- ^ Size of the sample -> Double -- ^ D value -> Double kolmogorovSmirnovProbability n d -- Avoid potentially lengthy calculations for large N and D > 0.999 | s > 7.24 || (s > 3.76 && n > 99) = 1 - 2 * exp( -(2.000071 + 0.331 / sqrt n' + 1.409 / n') * s) -- Exact computation | otherwise = fini $ matrix `power` n where s = n' * d * d n' = fromIntegral n size = 2*k - 1 k = floor (n' * d) + 1 h = fromIntegral k - n' * d -- Calculate initial matrix matrix = let m = U.create $ do mat <- M.new (size*size) -- Fill matrix with 0 and 1s for 0 size $ \row -> for 0 size $ \col -> do let val | row + 1 >= col = 1 | otherwise = 0 :: Double M.write mat (row * size + col) val -- Correct left column/bottom row for 0 size $ \i -> do let delta = h ^^ (i + 1) unsafeModify mat (i * size) (subtract delta) unsafeModify mat (size * size - 1 - i) (subtract delta) -- Correct corner element if needed when (2*h > 1) $ do unsafeModify mat ((size - 1) * size) (+ ((2*h - 1) ^ size)) -- Divide diagonals by factorial let divide g num | num == size = return () | otherwise = do for num size $ \i -> unsafeModify mat (i * (size + 1) - num) (/ g) divide (g * fromIntegral (num+2)) (num+1) divide 2 1 return mat in fromVector size size m -- Last calculation fini m = loop 1 (center m) (exponent m) where loop i ss eQ | i > n = ss * 10 ^^ eQ | ss' < 1e-140 = loop (i+1) (ss' * 1e140) (eQ - 140) | otherwise = loop (i+1) ss' eQ where ss' = ss * fromIntegral i / fromIntegral n ---------------------------------------------------------------- -- $references -- -- * G. Marsaglia, W. W. Tsang, J. Wang (2003) Evaluating Kolmogorov's -- distribution, Journal of Statistical Software, American -- Statistical Association, vol. 8(i18). statistics-0.13.2.3/Statistics/Test/KruskalWallis.hs0000644000000000000000000000670212504370360020520 0ustar0000000000000000-- | -- Module : Statistics.Test.KruskalWallis -- Copyright : (c) 2014 Danny Navarro -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- module Statistics.Test.KruskalWallis ( kruskalWallisRank , kruskalWallis , kruskalWallisSignificant , kruskalWallisTest ) where import Data.Ord (comparing) import Data.Foldable (foldMap) import qualified Data.Vector.Unboxed as U import Statistics.Function (sort, sortBy, square) import Statistics.Distribution (quantile) import Statistics.Distribution.ChiSquared (chiSquared) import Statistics.Test.Types (TestResult(..), significant) import Statistics.Test.Internal (rank) import Statistics.Sample import qualified Statistics.Sample.Internal as Sample(sum) -- | Kruskal-Wallis ranking. -- -- All values are replaced by the absolute rank in the combined samples. -- -- The samples and values need not be ordered, but the values in the result -- are ordered and assigned ranks (ties are given their average rank). kruskalWallisRank :: [Sample] -> [Sample] kruskalWallisRank samples = groupByTags . sortBy (comparing fst) . U.zip tags $ rank (==) joinSample where (tags,joinSample) = U.unzip .
sortBy (comparing snd) $ foldMap (uncurry tagSample) $ zip [(1::Int)..] samples tagSample t = U.map (\x -> (t,x)) groupByTags xs | U.null xs = [] | otherwise = sort (U.map snd ys) : groupByTags zs where (ys,zs) = U.span ((==) (fst $ U.head xs) . fst) xs -- | The Kruskal-Wallis Test. -- -- In textbooks the output value is usually represented by 'K' or 'H'. This -- function already does the ranking. kruskalWallis :: [Sample] -> Double kruskalWallis samples = (nTot - 1) * numerator / denominator where -- Total number of elements in all samples nTot = fromIntegral $ sumWith rsamples U.length -- Average rank of all samples avgRank = (nTot + 1) / 2 -- numerator = sumWith rsamples $ \sample -> let n = fromIntegral $ U.length sample in n * square (mean sample - avgRank) denominator = sumWith rsamples $ \sample -> Sample.sum $ U.map (\r -> square (r - avgRank)) sample rsamples = kruskalWallisRank samples -- | Calculates whether the Kruskal-Wallis test is significant. -- -- It uses the /Chi-Squared/ distribution for approximation as long as the sizes are -- at least 5. Otherwise the test returns 'Nothing'. kruskalWallisSignificant :: [Int] -- ^ The samples' sizes -> Double -- ^ The p-value at which to test (e.g. 0.05) -> Double -- ^ K value from 'kruskalWallis' -> Maybe TestResult kruskalWallisSignificant ns p k -- Use chi-squared approximation | all (>4) ns = Just . significant $ k > x -- TODO: Implement critical value calculation: kruskalWallisCriticalValue | otherwise = Nothing where x = quantile (chiSquared (length ns - 1)) (1 - p) -- | Perform the Kruskal-Wallis test for the given samples and required -- significance. For additional information check 'kruskalWallis'. This is just -- a helper function. kruskalWallisTest :: Double -> [Sample] -> Maybe TestResult kruskalWallisTest p samples = kruskalWallisSignificant (map U.length samples) p $ kruskalWallis samples -- * Helper functions sumWith :: Num a => [Sample] -> (Sample -> a) -> a sumWith samples f = Prelude.sum $ fmap f samples {-# INLINE sumWith #-} statistics-0.13.2.3/Statistics/Test/MannWhitneyU.hs0000644000000000000000000002321612504370360020315 0ustar0000000000000000-- | -- Module : Statistics.Test.MannWhitneyU -- Copyright : (c) 2010 Neil Brown -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Mann-Whitney U test (also known as the Mann-Whitney-Wilcoxon and -- Wilcoxon rank sum test) is a non-parametric test for assessing -- whether two samples of independent observations have different -- means. module Statistics.Test.MannWhitneyU ( -- * Mann-Whitney U test mannWhitneyUtest , mannWhitneyU , mannWhitneyUCriticalValue , mannWhitneyUSignificant -- ** Wilcoxon rank sum test , wilcoxonRankSums -- * Data types , TestType(..) , TestResult(..) -- * References -- $references ) where import Control.Applicative ((<$>)) import Data.List (findIndex) import Data.Ord (comparing) import Numeric.SpecFunctions (choose) import Prelude hiding (sum) import Statistics.Distribution (quantile) import Statistics.Distribution.Normal (standard) import Statistics.Function (sortBy) import Statistics.Sample.Internal (sum) import Statistics.Test.Internal (rank, splitByTags) import Statistics.Test.Types (TestResult(..), TestType(..), significant) import Statistics.Types (Sample) import qualified Data.Vector.Unboxed as U -- | The Wilcoxon Rank Sums Test. -- -- This test calculates the sum of ranks for the given two samples.
The samples -- are ordered, and assigned ranks (ties are given their average rank), then these -- ranks are summed for each sample. -- -- The return value is (W₁, W₂) where W₁ is the sum of ranks of the first sample -- and W₂ is the sum of ranks of the second sample. This test is trivially transformed -- into the Mann-Whitney U test. You will probably want to use 'mannWhitneyU' -- and the related functions for testing significance, but this function is exposed -- for completeness. wilcoxonRankSums :: Sample -> Sample -> (Double, Double) wilcoxonRankSums xs1 xs2 = (sum ranks1, sum ranks2) where -- Ranks for each sample (ranks1,ranks2) = splitByTags $ U.zip tags (rank (==) joinSample) -- Sorted and tagged sample (tags,joinSample) = U.unzip $ sortBy (comparing snd) $ tagSample True xs1 U.++ tagSample False xs2 -- Add tag to a sample tagSample t = U.map ((,) t) -- | The Mann-Whitney U Test. -- -- This is sometimes known as the Mann-Whitney-Wilcoxon U test, and -- confusingly many sources state that the Mann-Whitney U test is the same as -- Wilcoxon's rank sum test (which is provided as 'wilcoxonRankSums'). -- The Mann-Whitney U is a simple transform of Wilcoxon's rank sum test. -- -- Again confusingly, different sources state reversed definitions for U₁ -- and U₂, so it is worth being explicit about what this function returns. -- Given two samples, the first, xs₁, of size n₁ and the second, xs₂, -- of size n₂, this function returns (U₁, U₂) -- where U₁ = W₁ - (n₁(n₁+1))\/2 -- and U₂ = W₂ - (n₂(n₂+1))\/2, -- where (W₁, W₂) is the return value of @wilcoxonRankSums xs1 xs2@. -- -- Some sources instead state that U₁ and U₂ should be the other way round, often -- expressing this using U₁' = n₁n₂ - U₁ (since U₁ + U₂ = n₁n₂). -- -- All of which you probably don't care about if you just feed this into 'mannWhitneyUSignificant'. mannWhitneyU :: Sample -> Sample -> (Double, Double) mannWhitneyU xs1 xs2 = (fst summedRanks - (n1*(n1 + 1))/2 ,snd summedRanks - (n2*(n2 + 1))/2) where n1 = fromIntegral $ U.length xs1 n2 = fromIntegral $ U.length xs2 summedRanks = wilcoxonRankSums xs1 xs2 -- | Calculates the critical value of Mann-Whitney U for the given sample -- sizes and significance level. -- -- This function returns the exact calculated value of U for all sample sizes; -- it does not use the normal approximation at all. Above sample size 20 it is -- generally recommended to use the normal approximation instead, but this function -- will calculate the higher critical values if you need them. -- -- The algorithm to generate these values is a faster, memoised version of the -- simple unoptimised generating function given in section 2 of \"The Mann Whitney -- Wilcoxon Distribution Using Linked Lists\" mannWhitneyUCriticalValue :: (Int, Int) -- ^ The sample sizes -> Double -- ^ The p-value (e.g. 0.05) for which you want the critical value. -> Maybe Int -- ^ The critical value (of U). mannWhitneyUCriticalValue (m, n) p | m < 1 || n < 1 = Nothing -- Sample must be nonempty | p >= 1 = Nothing -- Nonsensical p-value | p' <= 1 = Nothing -- p-value is too small. Null hypothesis couldn't be disproved | otherwise = findIndex (>= p') $ take (m*n) $ tail $ alookup !! (m+n-2) !! (min m n - 1) where mnCn = (m+n) `choose` n p' = mnCn * p {- -- Original function, without memoisation, from Cheung and Klotz: -- Double is needed to avoid integer overflows.
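-- (Roughly: @a u bigN m@ counts the arrangements of the m first-sample
-- observations among all bigN observations for which the U statistic is
-- at most u; the recurrence below conditions on whether the largest of
-- the bigN observations comes from the first sample, contributing n to
-- U, or from the second, contributing nothing.)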
a :: Int -> Int -> Int -> Double a u bigN m | u < 0 = 0 | u >= m * n = bigN `choose` m | m == 1 || n == 1 = fromIntegral (u + 1) | otherwise = a u (bigN - 1) m + a (u - n) (bigN - 1) (m-1) where n = bigN - m -} -- Memoised version of the original a function, above. -- -- Doubles are stored to avoid integer overflow. 32-bit Ints begin to -- overflow for bigN as small as 33 (64-bit ones at 66), while Doubles do -- not reach infinity until bigN=1029 -- -- -- outer list is indexed by big N - 2 -- inner list by (m-1) (we know m < bigN) -- innermost list by u -- -- So: (alookup !! (bigN - 2) !! (m - 1) ! u) == a u bigN m alookup :: [[[Double]]] alookup = gen 2 [1 : repeat 2] where gen bigN predBigNList = let bigNlist = [ [ amemoed u m | u <- [0 .. m*(bigN-m)] ] ++ repeat (bigN `choose` m) | m <- [1 .. (bigN-1)]] -- has bigN-1 elements in bigNlist : gen (bigN+1) bigNlist where amemoed :: Int -> Int -> Double amemoed u m | m == 1 || n == 1 = fromIntegral (u + 1) | otherwise = mList !! u + if u < n then 0 else predmList !! (u-n) where n = bigN - m (predmList : mList : _) = drop (m-2) predBigNList -- Lists for m-1 and m respectively. i-th list corresponds to m=i+1 -- -- We know that predBigNList has bigN - 2 elements -- (and we know that n > 1 therefore bigN > m + 1) -- So bigN - 2 >= m, i.e. predBigNList must have at least m -- elements, so dropping (m-2) must leave at least 2 -- | Calculates whether the Mann Whitney U test is significant. -- -- If both sample sizes are less than or equal to 20, the exact U critical value -- (as calculated by 'mannWhitneyUCriticalValue') is used. If either sample is -- larger than 20, the normal approximation is used instead. -- -- If you use a one-tailed test, the test indicates whether the first sample is -- significantly larger than the second. If you want the opposite, simply reverse -- the order in both the sample size and the (U₁, U₂) pairs. mannWhitneyUSignificant :: TestType -- ^ Perform one-tailed test (see description above). -> (Int, Int) -- ^ The samples' sizes from which the (U₁,U₂) values were derived. -> Double -- ^ The p-value at which to test (e.g. 0.05) -> (Double, Double) -- ^ The (U₁, U₂) values from 'mannWhitneyU'. -> Maybe TestResult -- ^ Return 'Nothing' if the sample was too -- small to make a decision. mannWhitneyUSignificant test (in1, in2) p (u1, u2) -- Use normal approximation | in1 > 20 || in2 > 20 = let mean = n1 * n2 / 2 sigma = sqrt $ n1*n2*(n1 + n2 + 1) / 12 z = (mean - u1) / sigma in Just $ case test of OneTailed -> significant $ z < quantile standard p TwoTailed -> significant $ abs z > abs (quantile standard (p/2)) -- Use exact critical value | otherwise = do crit <- fromIntegral <$> mannWhitneyUCriticalValue (in1, in2) p return $ case test of OneTailed -> significant $ u2 <= crit TwoTailed -> significant $ min u1 u2 <= crit where n1 = fromIntegral in1 n2 = fromIntegral in2 -- | Perform the Mann-Whitney U test for two samples and required -- significance. For additional information check documentation of -- 'mannWhitneyU' and 'mannWhitneyUSignificant'. This is just a helper -- function. -- -- A one-tailed test checks whether the first sample is significantly larger -- than the second. A two-tailed test checks whether they are significantly different. mannWhitneyUtest :: TestType -- ^ Perform one-tailed test (see description above). -> Double -- ^ The p-value at which to test (e.g. 0.05) -> Sample -- ^ First sample -> Sample -- ^ Second sample -> Maybe TestResult -- ^ Return 'Nothing' if the sample was too small to -- make a decision.
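-- For example, with one of the sample pairs from this package's test
-- suite (two-tailed, at the 0.05 level):
--
-- > mannWhitneyUtest TwoTailed 0.05 (U.fromList [3,4,2,6,2,5])
-- >                                 (U.fromList [9,7,5,10,6,8])
--
-- evaluates to @Just Significant@.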
mannWhitneyUtest oneTail p smp1 smp2 = mannWhitneyUSignificant oneTail (n1,n2) p $ mannWhitneyU smp1 smp2 where n1 = U.length smp1 n2 = U.length smp2 -- $references -- -- * Cheung, Y.K.; Klotz, J.H. (1997) The Mann Whitney Wilcoxon -- distribution using linked lists. /Statistica Sinica/ -- 7:805–813. -- . statistics-0.13.2.3/Statistics/Test/Types.hs0000644000000000000000000000206112504370360017026 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-} module Statistics.Test.Types ( TestType(..) , TestResult(..) , significant ) where import Data.Aeson (FromJSON, ToJSON) import Data.Data (Typeable, Data) import GHC.Generics -- | Test type. Exact meaning depends on a specific test. But -- generally it's tested whether some statistic is too big (small) -- for 'OneTailed' or whether it is too big or too small for 'TwoTailed' data TestType = OneTailed | TwoTailed deriving (Eq,Ord,Show,Typeable,Data,Generic) instance FromJSON TestType instance ToJSON TestType -- | Result of hypothesis testing data TestResult = Significant -- ^ Null hypothesis should be rejected | NotSignificant -- ^ Data is compatible with hypothesis deriving (Eq,Ord,Show,Typeable,Data,Generic) instance FromJSON TestResult instance ToJSON TestResult -- | Significant if parameter is 'True', not significant otherwise significant :: Bool -> TestResult significant True = Significant significant False = NotSignificant statistics-0.13.2.3/Statistics/Test/WilcoxonT.hs0000644000000000000000000002152012504370360017651 0ustar0000000000000000-- | -- Module : Statistics.Test.WilcoxonT -- Copyright : (c) 2010 Neil Brown -- License : BSD3 -- -- Maintainer : bos@serpentine.com -- Stability : experimental -- Portability : portable -- -- The Wilcoxon matched-pairs signed-rank test is a non-parametric test -- which can be used to test whether two related samples have different -- means. -- -- WARNING: the current implementation contains a serious bug and cannot be -- used with samples larger than 1023. -- module Statistics.Test.WilcoxonT ( -- * Wilcoxon signed-rank matched-pair test wilcoxonMatchedPairTest , wilcoxonMatchedPairSignedRank , wilcoxonMatchedPairSignificant , wilcoxonMatchedPairSignificance , wilcoxonMatchedPairCriticalValue -- * Data types , TestType(..) , TestResult(..) ) where -- -- Note that: wilcoxonMatchedPairSignedRank == (\(x, y) -> (y, x)) . flip wilcoxonMatchedPairSignedRank -- -- The samples are zipped together: if one is longer than the other, both are truncated -- to the length of the shorter sample. -- -- The value returned is the pair (T+, T-). T+ is the sum of positive ranks (the -- ranks of the differences where the first parameter is higher) whereas T- is -- the sum of negative ranks (the ranks of the differences where the second parameter is higher). -- These values mean little by themselves, and should be combined with the 'wilcoxonMatchedPairSignificant' -- function in this module to get a meaningful result.
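-- A typical use, shown as a sketch (the sample vectors are whatever
-- paired data you have; the names @before@ and @after@ are illustrative):
--
-- > wilcoxonMatchedPairTest TwoTailed 0.05 (U.fromList before)
-- >                                        (U.fromList after)
--
-- which yields @Just Significant@, @Just NotSignificant@, or @Nothing@
-- when the sample is too small to make a decision.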
import Control.Applicative ((<$>)) import Data.Function (on) import Data.List (findIndex) import Data.Ord (comparing) import Prelude hiding (sum) import Statistics.Function (sortBy) import Statistics.Sample.Internal (sum) import Statistics.Test.Internal (rank, splitByTags) import Statistics.Test.Types (TestResult(..), TestType(..), significant) import Statistics.Types (Sample) import qualified Data.Vector.Unboxed as U wilcoxonMatchedPairSignedRank :: Sample -> Sample -> (Double, Double) wilcoxonMatchedPairSignedRank a b = (sum ranks1, negate (sum ranks2)) where (ranks1, ranks2) = splitByTags $ U.zip tags (rank ((==) `on` abs) diffs) (tags,diffs) = U.unzip $ U.map (\x -> (x>0 , x)) -- Attach tags to distribution elements $ U.filter (/= 0.0) -- Remove equal elements $ sortBy (comparing abs) -- Sort the differences by absolute difference $ U.zipWith (-) a b -- Work out differences -- | The coefficients for x^0, x^1, x^2, etc, in the expression -- \prod_{r=1}^s (1 + x^r). See the Mitic paper for details. -- -- We can define: -- f(1) = 1 + x -- f(r) = (1 + x^r)*f(r-1) -- = f(r-1) + x^r * f(r-1) -- The effect of multiplying the equation by x^r is to shift -- all the coefficients by r down the list. -- -- This list will be processed lazily from the head. coefficients :: Int -> [Integer] coefficients 1 = [1, 1] -- 1 + x coefficients r = let coeffs = coefficients (r-1) (firstR, rest) = splitAt r coeffs in firstR ++ add rest coeffs where add (x:xs) (y:ys) = x + y : add xs ys add xs [] = xs add [] ys = ys -- This list will be processed lazily from the head. summedCoefficients :: Int -> [Double] summedCoefficients n | n < 1 = error "Statistics.Test.WilcoxonT.summedCoefficients: nonpositive sample size" | n > 1023 = error "Statistics.Test.WilcoxonT.summedCoefficients: sample is too large (see bug #18)" | otherwise = map fromIntegral $ scanl1 (+) $ coefficients n -- | Tests whether a given result from a Wilcoxon signed-rank matched-pairs test -- is significant at the given level. -- -- This function can perform a one-tailed or two-tailed test. If the first -- parameter to this function is 'TwoTailed', the test is performed two-tailed to -- check if the two samples differ significantly. If the first parameter is -- 'OneTailed', the check is performed one-tailed to decide whether the first sample -- (i.e. the first sample you passed to 'wilcoxonMatchedPairSignedRank') is -- greater than the second sample (i.e. the second sample you passed to -- 'wilcoxonMatchedPairSignedRank'). If you wish to perform a one-tailed test -- in the opposite direction, you can either pass the parameters in a different -- order to 'wilcoxonMatchedPairSignedRank', or simply swap the values in the resulting -- pair before passing them to this function. wilcoxonMatchedPairSignificant :: TestType -- ^ Perform one- or two-tailed test (see description above). -> Int -- ^ The sample size from which the (T+,T-) values were derived. -> Double -- ^ The p-value at which to test (e.g. 0.05) -> (Double, Double) -- ^ The (T+, T-) values from 'wilcoxonMatchedPairSignedRank'. -> Maybe TestResult -- ^ Return 'Nothing' if the sample was too -- small to make a decision. wilcoxonMatchedPairSignificant test sampleSize p (tPlus, tMinus) = case test of -- According to my nearest book (Understanding Research Methods and Statistics -- by Gary W. Heiman, p590), to check that the first sample is bigger you must -- use the absolute value of T- for a one-tailed check: OneTailed -> (significant . (abs tMinus <=) .
fromIntegral) <$> wilcoxonMatchedPairCriticalValue sampleSize p -- Otherwise you must use the value of T+ and T- with the smallest absolute value: TwoTailed -> (significant . (t <=) . fromIntegral) <$> wilcoxonMatchedPairCriticalValue sampleSize (p/2) where t = min (abs tPlus) (abs tMinus) -- | Obtains the critical value of T to compare against, given a sample size -- and a p-value (significance level). Your T value must be less than or -- equal to the return of this function for the result to be -- significant. If there is a Nothing return, the sample size is too small to -- make a decision. -- -- 'wilcoxonMatchedPairSignificant' tests the return value of 'wilcoxonMatchedPairSignedRank' -- for you, so you should use 'wilcoxonMatchedPairSignificant' for determining test results. -- However, this function is useful, for example, for generating lookup tables -- for Wilcoxon signed rank critical values. -- -- The return values of this function are generated using the method detailed in -- the paper \"Critical Values for the Wilcoxon Signed Rank Statistic\", Peter -- Mitic, The Mathematica Journal, volume 6, issue 3, 1996, which can be found -- here: . -- According to that paper, the results may differ from other published lookup tables, but -- (Mitic claims) the values obtained by this function will be the correct ones. wilcoxonMatchedPairCriticalValue :: Int -- ^ The sample size -> Double -- ^ The p-value (e.g. 0.05) for which you want the critical value. -> Maybe Int -- ^ The critical value (of T), or Nothing if -- the sample is too small to make a decision. wilcoxonMatchedPairCriticalValue sampleSize p = case critical of Just n | n < 0 -> Nothing | otherwise -> Just n Nothing -> Just maxBound -- shouldn't happen: beyond end of list where m = (2 ** fromIntegral sampleSize) * p critical = subtract 1 <$> findIndex (> m) (summedCoefficients sampleSize) -- | Works out the significance level (p-value) of a T value, given a sample -- size and a T value from the Wilcoxon signed-rank matched-pairs test. -- -- See the notes on 'wilcoxonMatchedPairCriticalValue' for how this is calculated. wilcoxonMatchedPairSignificance :: Int -- ^ The sample size -> Double -- ^ The value of T for which you want the significance. -> Double -- ^ The significance (p-value). wilcoxonMatchedPairSignificance sampleSize rnk = (summedCoefficients sampleSize !! floor rnk) / 2 ** fromIntegral sampleSize -- | The Wilcoxon matched-pairs signed-rank test. The samples are -- zipped together: if one is longer than the other, both are -- truncated to the length of the shorter sample. -- -- For a one-tailed test it tests whether the first sample is significantly -- greater than the second. For a two-tailed test it checks whether they -- differ significantly. -- -- Check 'wilcoxonMatchedPairSignedRank' and -- 'wilcoxonMatchedPairSignificant' for additional information. wilcoxonMatchedPairTest :: TestType -- ^ Perform one- or two-tailed test. -> Double -- ^ The p-value at which to test (e.g. 0.05) -> Sample -- ^ First sample -> Sample -- ^ Second sample -> Maybe TestResult -- ^ Return 'Nothing' if the sample was too -- small to make a decision.
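-- (Note: the effective sample size passed on below is the length of
-- the shorter input, matching the truncating zip described above.)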
wilcoxonMatchedPairTest test p smp1 smp2 = wilcoxonMatchedPairSignificant test (min n1 n2) p $ wilcoxonMatchedPairSignedRank smp1 smp2 where n1 = U.length smp1 n2 = U.length smp2 statistics-0.13.2.3/tests/0000755000000000000000000000000012504370360013460 5ustar0000000000000000statistics-0.13.2.3/tests/tests.hs0000644000000000000000000000121212504370360015152 0ustar0000000000000000import Test.Framework (defaultMain) import qualified Tests.Distribution as Distribution import qualified Tests.Function as Function import qualified Tests.KDE as KDE import qualified Tests.Matrix as Matrix import qualified Tests.NonParametric as NonParametric import qualified Tests.Transform as Transform import qualified Tests.Correlation as Correlation main :: IO () main = defaultMain [ Distribution.tests , Function.tests , KDE.tests , Matrix.tests , NonParametric.tests , Transform.tests , Correlation.tests ] statistics-0.13.2.3/tests/Tests/0000755000000000000000000000000012504370360014562 5ustar0000000000000000statistics-0.13.2.3/tests/Tests/ApproxEq.hs0000644000000000000000000000655712504370360016672 0ustar0000000000000000{-# LANGUAGE FlexibleContexts, FlexibleInstances, TypeFamilies #-} module Tests.ApproxEq ( ApproxEq(..) ) where import Data.Complex (Complex(..), realPart) import Data.List (intersperse) import Data.Maybe (catMaybes) import Numeric.MathFunctions.Constants (m_epsilon) import Statistics.Matrix hiding (map, toList) import Test.QuickCheck import qualified Data.Vector as V import qualified Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U import qualified Statistics.Matrix as M class (Eq a, Show a) => ApproxEq a where type Bounds a eq :: Bounds a -> a -> a -> Bool eql :: Bounds a -> a -> a -> Property eql eps a b = counterexample (show a ++ " /=~ " ++ show b) (eq eps a b) (=~) :: a -> a -> Bool (==~) :: ApproxEq a => a -> a -> Property a ==~ b = counterexample (show a ++ " /=~ " ++ show b) (a =~ b) instance ApproxEq Double where type Bounds Double = Double eq eps a b | a == 0 && b == 0 = True | otherwise = abs (a - b) <= eps * max (abs a) (abs b) (=~) = eq m_epsilon instance ApproxEq (Complex Double) where type Bounds (Complex Double) = Double eq eps a@(ar :+ ai) b@(br :+ bi) | a == 0 && b == 0 = True | otherwise = abs (ar - br) <= eps * d && abs (ai - bi) <= eps * d where d = max (realPart $ abs a) (realPart $ abs b) (=~) = eq m_epsilon instance ApproxEq [Double] where type Bounds [Double] = Double eq eps (x:xs) (y:ys) = eq eps x y && eq eps xs ys eq _ [] [] = True eq _ _ _ = False eql = eqll length id id (=~) = eq m_epsilon (==~) = eql m_epsilon instance ApproxEq (U.Vector Double) where type Bounds (U.Vector Double) = Double eq = eqv (=~) = eq m_epsilon eql = eqlv (==~) = eqlv m_epsilon instance ApproxEq (V.Vector Double) where type Bounds (V.Vector Double) = Double eq = eqv (=~) = eq m_epsilon eql = eqlv (==~) = eqlv m_epsilon instance ApproxEq Matrix where type Bounds Matrix = Double eq eps (Matrix r1 c1 e1 v1) (Matrix r2 c2 e2 v2) = (r1,c1,e1) == (r2,c2,e2) && eq eps v1 v2 (=~) = eq m_epsilon eql eps a b = eqll dimension M.toList (`quotRem` cols a) eps a b (==~) = eql m_epsilon eqv :: (ApproxEq a, G.Vector v Bool, G.Vector v a) => Bounds a -> v a -> v a -> Bool eqv eps a b = G.length a == G.length b && G.and (G.zipWith (eq eps) a b) eqlv :: (ApproxEq [a], G.Vector v a) => Bounds [a] -> v a -> v a -> Property eqlv eps a b = eql eps (G.toList a) (G.toList b) eqll :: (ApproxEq l, ApproxEq a, Show c, Show d, Eq d, Bounds l ~ Bounds a) => (l -> d) -> (l -> [a]) -> (Int -> c) -> 
Bounds l -> l -> l -> Property eqll dim toList coord eps a b = counterexample fancy $ eq eps a b where fancy | la /= lb = "size mismatch: " ++ show la ++ " /= " ++ show lb | length summary < length full = summary | otherwise = full summary = concat . intersperse ", " . catMaybes $ zipWith3 whee (map coord [(0::Int)..]) xs ys full | '\n' `elem` sa = sa ++ " /=~\n" ++ sb | otherwise = sa ++ " /=~" ++ sb (sa, sb) = (show a, show b) (xs, ys) = (toList a, toList b) (la, lb) = (dim a, dim b) whee i x y | eq eps x y = Nothing | otherwise = Just $ show i ++ ": " ++ show x ++ " /=~ " ++ show y statistics-0.13.2.3/tests/Tests/Correlation.hs0000644000000000000000000000321312504370360017376 0ustar0000000000000000{-#LANGUAGE BangPatterns #-} module Tests.Correlation ( tests ) where import Test.Framework import Test.Framework.Providers.QuickCheck2 import Test.Framework.Providers.HUnit import Test.HUnit (Assertion, (@=?)) import qualified Data.Vector as V import Statistics.Correlation.Kendall tests :: Test tests = testGroup "Correlation" [ testProperty "Kendall test -- general" testKendall , testCase "Kendall test -- special cases" testKendallSpecial ] testKendall :: [(Double, Double)] -> Bool testKendall xy | isNaN r1 = isNaN r2 | otherwise = r1 == r2 where r1 = kendallBruteForce xy r2 = kendall $ V.fromList xy testKendallSpecial :: Assertion testKendallSpecial = ys @=? map (kendall.V.fromList) xs where (xs, ys) = unzip testData testData :: [([(Double, Double)], Double)] testData = [ ( [(1,1), (2,2), (3,1), (1,5), (2,2)], -0.375 ) , ( [(1,3), (1,3), (1,3), (3,2), (3,5)], 0) ] kendallBruteForce :: [(Double, Double)] -> Double kendallBruteForce xy = (n_c - n_d) / sqrt ((n_0 - n_1) * (n_0 - n_2)) where allPairs = f xy (n_c, n_d, n_1, n_2) = foldl g (0,0,0,0) allPairs n_0 = fromIntegral.length $ allPairs g (!nc, !nd, !n1, !n2) ((x1, y1), (x2, y2)) | (x2 - x1) * (y2 - y1) > 0 = (nc+1, nd, n1, n2) | (x2 - x1) * (y2 - y1) < 0 = (nc, nd+1, n1, n2) | otherwise = if x1 == x2 then if y1 == y2 then (nc, nd, n1+1, n2+1) else (nc, nd, n1+1, n2) else (nc, nd, n1, n2+1) f (x:xs) = zip (repeat x) xs ++ f xs f _ = [] statistics-0.13.2.3/tests/Tests/Distribution.hs0000644000000000000000000003632712504370360017610 0ustar0000000000000000{-# OPTIONS_GHC -fno-warn-orphans #-} {-# LANGUAGE FlexibleInstances, OverlappingInstances, ScopedTypeVariables, ViewPatterns #-} module Tests.Distribution (tests) where import Control.Applicative ((<$), (<$>), (<*>)) import Data.Binary (Binary, decode, encode) import Data.List (find) import Data.Typeable (Typeable) import Statistics.Distribution import Statistics.Distribution.Beta (BetaDistribution, betaDistr) import Statistics.Distribution.Binomial (BinomialDistribution, binomial) import Statistics.Distribution.CauchyLorentz import Statistics.Distribution.ChiSquared (ChiSquared, chiSquared) import Statistics.Distribution.Exponential (ExponentialDistribution, exponential) import Statistics.Distribution.FDistribution (FDistribution, fDistribution) import Statistics.Distribution.Gamma (GammaDistribution, gammaDistr) import Statistics.Distribution.Geometric import Statistics.Distribution.Hypergeometric import Statistics.Distribution.Normal (NormalDistribution, normalDistr) import Statistics.Distribution.Poisson (PoissonDistribution, poisson) import Statistics.Distribution.StudentT import Statistics.Distribution.Transform (LinearTransform, linTransDistr) import Statistics.Distribution.Uniform (UniformDistribution, uniformDistr) import Test.Framework (Test, testGroup) import 
Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck as QC import Test.QuickCheck.Monadic as QC import Tests.ApproxEq (ApproxEq(..)) import Tests.Helpers (T(..), testAssertion, typeName) import Tests.Helpers (monotonicallyIncreasesIEEE) import Text.Printf (printf) import qualified Control.Exception as E import qualified Numeric.IEEE as IEEE -- | Tests for all distributions tests :: Test tests = testGroup "Tests for all distributions" [ contDistrTests (T :: T BetaDistribution ) , contDistrTests (T :: T CauchyDistribution ) , contDistrTests (T :: T ChiSquared ) , contDistrTests (T :: T ExponentialDistribution ) , contDistrTests (T :: T GammaDistribution ) , contDistrTests (T :: T NormalDistribution ) , contDistrTests (T :: T UniformDistribution ) , contDistrTests (T :: T StudentT ) , contDistrTests (T :: T (LinearTransform StudentT) ) , contDistrTests (T :: T FDistribution ) , discreteDistrTests (T :: T BinomialDistribution ) , discreteDistrTests (T :: T GeometricDistribution ) , discreteDistrTests (T :: T GeometricDistribution0 ) , discreteDistrTests (T :: T HypergeometricDistribution ) , discreteDistrTests (T :: T PoissonDistribution ) , unitTests ] ---------------------------------------------------------------- -- Tests ---------------------------------------------------------------- -- Tests for continuous distributions contDistrTests :: (Param d, ContDistr d, QC.Arbitrary d, Typeable d, Show d, Binary d, Eq d) => T d -> Test contDistrTests t = testGroup ("Tests for: " ++ typeName t) $ cdfTests t ++ [ testProperty "PDF sanity" $ pdfSanityCheck t , testProperty "Quantile is CDF inverse" $ quantileIsInvCDF t , testProperty "quantile fails p<0||p>1" $ quantileShouldFail t , testProperty "log density check" $ logDensityCheck t ] -- Tests for discrete distributions discreteDistrTests :: (Param d, DiscreteDistr d, QC.Arbitrary d, Typeable d, Show d, Binary d, Eq d) => T d -> Test discreteDistrTests t = testGroup ("Tests for: " ++ typeName t) $ cdfTests t ++ [ testProperty "Prob. sanity" $ probSanityCheck t , testProperty "CDF is sum of prob." $ discreteCDFcorrect t , testProperty "Discrete CDF is OK" $ cdfDiscreteIsCorrect t , testProperty "log probability check" $ logProbabilityCheck t ] -- Tests for distributions which have CDF cdfTests :: (Param d, Distribution d, QC.Arbitrary d, Show d, Binary d, Eq d) => T d -> [Test] cdfTests t = [ testProperty "C.D.F.
sanity" $ cdfSanityCheck t , testProperty "CDF limit at +inf" $ cdfLimitAtPosInfinity t , testProperty "CDF limit at -inf" $ cdfLimitAtNegInfinity t , testProperty "CDF at +inf = 1" $ cdfAtPosInfinity t , testProperty "CDF at -inf = 1" $ cdfAtNegInfinity t , testProperty "CDF is nondecreasing" $ cdfIsNondecreasing t , testProperty "1-CDF is correct" $ cdfComplementIsCorrect t , testProperty "Binary OK" $ p_binary t ] ---------------------------------------------------------------- -- CDF is in [0,1] range cdfSanityCheck :: (Distribution d) => T d -> d -> Double -> Bool cdfSanityCheck _ d x = c >= 0 && c <= 1 where c = cumulative d x -- CDF never decreases cdfIsNondecreasing :: (Distribution d) => T d -> d -> Double -> Double -> Bool cdfIsNondecreasing _ d = monotonicallyIncreasesIEEE $ cumulative d -- cumulative d +∞ = 1 cdfAtPosInfinity :: (Param d, Distribution d) => T d -> d -> Bool cdfAtPosInfinity _ d = cumulative d (1/0) == 1 -- cumulative d - ∞ = 0 cdfAtNegInfinity :: (Param d, Distribution d) => T d -> d -> Bool cdfAtNegInfinity _ d = cumulative d (-1/0) == 0 -- CDF limit at +∞ is 1 cdfLimitAtPosInfinity :: (Param d, Distribution d) => T d -> d -> Property cdfLimitAtPosInfinity _ d = okForInfLimit d ==> counterexample ("Last elements: " ++ show (drop 990 probs)) $ Just 1.0 == (find (>=1) probs) where probs = take 1000 $ map (cumulative d) $ iterate (*1.4) 1000 -- CDF limit at -∞ is 0 cdfLimitAtNegInfinity :: (Param d, Distribution d) => T d -> d -> Property cdfLimitAtNegInfinity _ d = okForInfLimit d ==> counterexample ("Last elements: " ++ show (drop 990 probs)) $ case find (< IEEE.epsilon) probs of Nothing -> False Just p -> p >= 0 where probs = take 1000 $ map (cumulative d) $ iterate (*1.4) (-1) -- CDF's complement is implemented correctly cdfComplementIsCorrect :: (Distribution d) => T d -> d -> Double -> Bool cdfComplementIsCorrect _ d x = (eq 1e-14) 1 (cumulative d x + complCumulative d x) -- CDF for discrete distribution uses <= for comparison cdfDiscreteIsCorrect :: (DiscreteDistr d) => T d -> d -> Property cdfDiscreteIsCorrect _ d = counterexample (unlines badN) $ null badN where -- We are checking that: -- -- > CDF(i) - CDF(i-e) = P(i) -- -- Apporixmate equality is tricky here. Scale is set by maximum -- value of CDF and probability. Case when all proabilities are -- zero should be trated specially. badN = [ printf "N=%3i p[i]=%g\tp[i+1]=%g\tdP=%g\trelerr=%g" i p p1 dp ((p1-p-dp) / max p1 dp) | i <- [0 .. 100] , let p = cumulative d $ fromIntegral i - 1e-6 p1 = cumulative d $ fromIntegral i dp = probability d i relerr = ((p1 - p) - dp) / max p1 dp , not (p == 0 && p1 == 0 && dp == 0) && relerr > 1e-14 ] logDensityCheck :: (ContDistr d) => T d -> d -> Double -> Property logDensityCheck _ d x = counterexample (printf "density = %g" p) $ counterexample (printf "logDensity = %g" logP) $ counterexample (printf "log p = %g" (log p)) $ counterexample (printf "eps = %g" (abs (logP - log p) / max (abs (log p)) (abs logP))) $ or [ p == 0 && logP == (-1/0) , p < 1e-308 && logP < 609 , eq 1e-14 (log p) logP ] where p = density d x logP = logDensity d x -- PDF is positive pdfSanityCheck :: (ContDistr d) => T d -> d -> Double -> Bool pdfSanityCheck _ d x = p >= 0 where p = density d x -- Quantile is inverse of CDF quantileIsInvCDF :: (Param d, ContDistr d) => T d -> d -> Double -> Property quantileIsInvCDF _ d (snd . 
properFraction -> p) = p > 0 && p < 1 ==> ( counterexample (printf "Quantile = %g" q ) $ counterexample (printf "Probability = %g" p ) $ counterexample (printf "Probability' = %g" p') $ counterexample (printf "Error = %e" (abs $ p - p')) $ abs (p - p') < invQuantilePrec d ) where q = quantile d p p' = cumulative d q -- Test that quantile fails if p<0 or p>1 quantileShouldFail :: (ContDistr d) => T d -> d -> Double -> Property quantileShouldFail _ d p = p < 0 || p > 1 ==> QC.monadicIO $ do r <- QC.run $ E.catch (False <$ (return $! quantile d p)) (\(_ :: E.SomeException) -> return True) QC.assert r -- Probability is in [0,1] range probSanityCheck :: (DiscreteDistr d) => T d -> d -> Int -> Bool probSanityCheck _ d x = p >= 0 && p <= 1 where p = probability d x -- Check that discrete CDF is correct discreteCDFcorrect :: (DiscreteDistr d) => T d -> d -> Int -> Int -> Property discreteCDFcorrect _ d a b = counterexample (printf "CDF = %g" p1) $ counterexample (printf "Sum = %g" p2) $ counterexample (printf "Delta = %g" (abs (p1 - p2))) $ abs (p1 - p2) < 3e-10 -- Avoid too large differences. Otherwise there is too much to sum -- -- Absolute difference is used to guard against precision loss when -- close values of the CDF are subtracted where n = min a b m = n + (abs (a - b) `mod` 100) p1 = cumulative d (fromIntegral m + 0.5) - cumulative d (fromIntegral n - 0.5) p2 = sum $ map (probability d) [n .. m] logProbabilityCheck :: (DiscreteDistr d) => T d -> d -> Int -> Property logProbabilityCheck _ d x = counterexample (printf "probability = %g" p) $ counterexample (printf "logProbability = %g" logP) $ counterexample (printf "log p = %g" (log p)) $ counterexample (printf "eps = %g" (abs (logP - log p) / max (abs (log p)) (abs logP))) $ or [ p == 0 && logP == (-1/0) , p < 1e-308 && logP < 609 , eq 1e-14 (log p) logP ] where p = probability d x logP = logProbability d x p_binary :: (Eq a, Show a, Binary a) => T a -> a -> Bool p_binary _ a = a == (decode .
encode) a ---------------------------------------------------------------- -- Arbitrary instances for distributions ---------------------------------------------------------------- instance QC.Arbitrary BinomialDistribution where arbitrary = binomial <$> QC.choose (1,100) <*> QC.choose (0,1) instance QC.Arbitrary ExponentialDistribution where arbitrary = exponential <$> QC.choose (0,100) instance QC.Arbitrary GammaDistribution where arbitrary = gammaDistr <$> QC.choose (0.1,10) <*> QC.choose (0.1,10) instance QC.Arbitrary BetaDistribution where arbitrary = betaDistr <$> QC.choose (1e-3,10) <*> QC.choose (1e-3,10) instance QC.Arbitrary GeometricDistribution where arbitrary = geometric <$> QC.choose (0,1) instance QC.Arbitrary GeometricDistribution0 where arbitrary = geometric0 <$> QC.choose (0,1) instance QC.Arbitrary HypergeometricDistribution where arbitrary = do l <- QC.choose (1,20) m <- QC.choose (0,l) k <- QC.choose (1,l) return $ hypergeometric m l k instance QC.Arbitrary NormalDistribution where arbitrary = normalDistr <$> QC.choose (-100,100) <*> QC.choose (1e-3, 1e3) instance QC.Arbitrary PoissonDistribution where arbitrary = poisson <$> QC.choose (0,1) instance QC.Arbitrary ChiSquared where arbitrary = chiSquared <$> QC.choose (1,100) instance QC.Arbitrary UniformDistribution where arbitrary = do a <- QC.arbitrary b <- QC.arbitrary `suchThat` (/= a) return $ uniformDistr a b instance QC.Arbitrary CauchyDistribution where arbitrary = cauchyDistribution <$> arbitrary <*> ((abs <$> arbitrary) `suchThat` (> 0)) instance QC.Arbitrary StudentT where arbitrary = studentT <$> ((abs <$> arbitrary) `suchThat` (>0)) instance QC.Arbitrary (LinearTransform StudentT) where arbitrary = studentTUnstandardized <$> ((abs <$> arbitrary) `suchThat` (>0)) <*> ((abs <$> arbitrary)) <*> ((abs <$> arbitrary) `suchThat` (>0)) instance QC.Arbitrary FDistribution where arbitrary = fDistribution <$> ((abs <$> arbitrary) `suchThat` (>0)) <*> ((abs <$> arbitrary) `suchThat` (>0)) -- Parameters for distribution testing. Some distributions require -- relaxing parameters a bit class Param a where -- Precision for quantileIsInvCDF invQuantilePrec :: a -> Double invQuantilePrec _ = 1e-14 -- Distribution is OK for testing limits okForInfLimit :: a -> Bool okForInfLimit _ = True instance Param a instance Param StudentT where invQuantilePrec _ = 1e-13 okForInfLimit d = studentTndf d > 0.75 instance Param (LinearTransform StudentT) where invQuantilePrec _ = 1e-13 okForInfLimit d = (studentTndf .
linTransDistr) d > 0.75 instance Param FDistribution where invQuantilePrec _ = 1e-12 ---------------------------------------------------------------- -- Unit tests ---------------------------------------------------------------- unitTests :: Test unitTests = testGroup "Unit tests" [ testAssertion "density (gammaDistr 150 1/150) 1 == 4.883311" $ 4.883311418525483 =~ (density (gammaDistr 150 (1/150)) 1) -- Student-T , testStudentPDF 0.3 1.34 0.0648215 -- PDF , testStudentPDF 1 0.42 0.27058 , testStudentPDF 4.4 0.33 0.352994 , testStudentCDF 0.3 3.34 0.757146 -- CDF , testStudentCDF 1 0.42 0.626569 , testStudentCDF 4.4 0.33 0.621739 -- Student-T General , testStudentUnstandardizedPDF 0.3 1.2 4 0.45 0.0533456 -- PDF , testStudentUnstandardizedPDF 4.3 (-2.4) 3.22 (-0.6) 0.0971141 , testStudentUnstandardizedPDF 3.8 0.22 7.62 0.14 0.0490523 , testStudentUnstandardizedCDF 0.3 1.2 4 0.45 0.458035 -- CDF , testStudentUnstandardizedCDF 4.3 (-2.4) 3.22 (-0.6) 0.698001 , testStudentUnstandardizedCDF 3.8 0.22 7.62 0.14 0.496076 -- F-distribution , testFdistrPDF 1 3 3 (1/(6 * pi)) -- PDF , testFdistrPDF 2 2 1.2 0.206612 , testFdistrPDF 10 12 8 0.000385613179281892790166 , testFdistrCDF 1 3 3 0.81830988618379067153 -- CDF , testFdistrCDF 2 2 1.2 0.545455 , testFdistrCDF 10 12 8 0.99935509863451408041 ] where -- Student-T testStudentPDF ndf x exact = testAssertion (printf "density (studentT %f) %f ~ %f" ndf x exact) $ eq 1e-5 exact (density (studentT ndf) x) testStudentCDF ndf x exact = testAssertion (printf "cumulative (studentT %f) %f ~ %f" ndf x exact) $ eq 1e-5 exact (cumulative (studentT ndf) x) -- Student-T General testStudentUnstandardizedPDF ndf mu sigma x exact = testAssertion (printf "density (studentTUnstandardized %f %f %f) %f ~ %f" ndf mu sigma x exact) $ eq 1e-5 exact (density (studentTUnstandardized ndf mu sigma) x) testStudentUnstandardizedCDF ndf mu sigma x exact = testAssertion (printf "cumulative (studentTUnstandardized %f %f %f) %f ~ %f" ndf mu sigma x exact) $ eq 1e-5 exact (cumulative (studentTUnstandardized ndf mu sigma) x) -- F-distribution testFdistrPDF n m x exact = testAssertion (printf "density (fDistribution %i %i) %f ~ %f [got %f]" n m x exact d) $ eq 1e-5 exact d where d = density (fDistribution n m) x testFdistrCDF n m x exact = testAssertion (printf "cumulative (fDistribution %i %i) %f ~ %f [got %f]" n m x exact d) $ eq 1e-5 exact d where d = cumulative (fDistribution n m) x statistics-0.13.2.3/tests/Tests/Function.hs0000644000000000000000000000141512504370360016704 0ustar0000000000000000module Tests.Function ( tests ) where import Statistics.Function import Test.Framework import Test.Framework.Providers.QuickCheck2 import Test.QuickCheck import Tests.Helpers import qualified Data.Vector.Unboxed as U tests :: Test tests = testGroup "S.Function" [ testProperty "Sort is sort" p_sort , testAssertion "nextHighestPowerOfTwo is OK" p_nextHighestPowerOfTwo ] p_sort :: [Double] -> Property p_sort xs = not (null xs) ==> U.all (uncurry (<=)) (U.zip v $ U.tail v) where v = sort $ U.fromList xs p_nextHighestPowerOfTwo :: Bool p_nextHighestPowerOfTwo = all (\(good, is) -> all ((==good) . nextHighestPowerOfTwo) is) lists where pows = [1 .. 17 :: Int] lists = [ (2^m, [2^n+1 .. 2^m]) | (n,m) <- pows `zip` tail pows ] statistics-0.13.2.3/tests/Tests/Helpers.hs0000644000000000000000000000456012504370360016525 0ustar0000000000000000-- | Helpers for testing module Tests.Helpers ( -- * helpers T(..) 
, typeName -- * Generic QC tests , monotonicallyIncreases , monotonicallyIncreasesIEEE -- * HUnit helpers , testAssertion , testEquality -- * QC helpers , small , unsquare , shrinkFixedList ) where import Data.Typeable import Test.Framework import Test.Framework.Providers.HUnit import Test.QuickCheck import qualified Numeric.IEEE as IEEE import qualified Test.HUnit as HU -- | Phantom typed value used to select the right instance in QC tests data T a = T -- | String representation of type name typeName :: Typeable a => T a -> String typeName = show . typeOf . typeParam where typeParam :: T a -> a typeParam _ = undefined ---------------------------------------------------------------- -- Generic QC ---------------------------------------------------------------- -- Check that function is nondecreasing monotonicallyIncreases :: (Ord a, Ord b) => (a -> b) -> a -> a -> Bool monotonicallyIncreases f x1 x2 = f (min x1 x2) <= f (max x1 x2) -- Check that function is nondecreasing taking rounding errors into -- account. -- -- In fact the function is allowed to decrease by less than one ulp in order -- to guard against problems with excess precision. On x86 the FPU works -- with 80-bit numbers but doubles are 64-bit, so rounding happens -- whenever values are moved from registers to memory monotonicallyIncreasesIEEE :: (Ord a, IEEE.IEEE b) => (a -> b) -> a -> a -> Bool monotonicallyIncreasesIEEE f x1 x2 = y1 <= y2 || (y1 - y2) < y2 * IEEE.epsilon where y1 = f (min x1 x2) y2 = f (max x1 x2) ---------------------------------------------------------------- -- HUnit helpers ---------------------------------------------------------------- testAssertion :: String -> Bool -> Test testAssertion str cont = testCase str $ HU.assertBool str cont testEquality :: (Show a, Eq a) => String -> a -> a -> Test testEquality msg a b = testCase msg $ HU.assertEqual msg a b unsquare :: (Arbitrary a, Show a, Testable b) => (a -> b) -> Property unsquare = forAll (small arbitrary) small :: Gen a -> Gen a small act = sized $ \n -> resize (smallish n) act where smallish = round . (sqrt :: Double -> Double) . fromIntegral . abs shrinkFixedList :: (a -> [a]) -> [a] -> [[a]] shrinkFixedList shr (x:xs) = map (:xs) (shr x) ++ map (x:) (shrinkFixedList shr xs) shrinkFixedList _ [] = [] statistics-0.13.2.3/tests/Tests/KDE.hs0000644000000000000000000000213512504370360015522 0ustar0000000000000000-- | Tests for Kernel density estimates. module Tests.KDE ( tests ) where import Data.Vector.Unboxed ((!)) import Numeric.Sum (kbn, sumVector) import Statistics.Sample.KernelDensity import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck (Property, (==>), counterexample) import Text.Printf (printf) import qualified Data.Vector.Unboxed as U tests :: Test tests = testGroup "KDE" [ testProperty "integral(PDF) == 1" t_densityIsPDF ] t_densityIsPDF :: [Double] -> Property t_densityIsPDF vec = not (null vec) ==> test where (xs,ys) = kde 4096 (U.fromList vec) step = (xs ! 1) - (xs !
0) integral = integratePDF step ys -- test = counterexample (printf "Integral %f" integral) $ abs (1 - integral) <= 1e-3 integratePDF :: Double -> U.Vector Double -> Double integratePDF step vec = step * sumVector kbn (U.zipWith (*) vec weights) where n = U.length vec weights = U.generate n go where go i | i == 0 = 0.5 | i == n-1 = 0.5 | otherwise = 1 statistics-0.13.2.3/tests/Tests/Matrix.hs0000644000000000000000000000227712504370360016372 0ustar0000000000000000module Tests.Matrix (tests) where import Statistics.Matrix hiding (map) import Statistics.Matrix.Algorithms import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck import Tests.ApproxEq (ApproxEq(..)) import Tests.Matrix.Types import qualified Data.Vector.Unboxed as U t_row :: Mat Double -> Gen Property t_row ms@(Mat r _ xs) = do i <- choose (0,r-1) return $ row (fromMat ms) i === U.fromList (xs !! i) t_column :: Mat Double -> Gen Property t_column ms@(Mat _ c xs) = do i <- choose (0,c-1) return $ column (fromMat ms) i === U.fromList (map (!! i) xs) t_center :: Mat Double -> Property t_center ms@(Mat r c xs) = (xs !! (r `quot` 2)) !! (c `quot` 2) === center (fromMat ms) t_transpose :: Matrix -> Property t_transpose m = U.concat (map (column n) [0..rows m-1]) === toVector m where n = transpose m t_qr :: Matrix -> Property t_qr a = hasNaN p .||. eql 1e-10 a p where p = uncurry multiply (qr a) tests :: Test tests = testGroup "Matrix" [ testProperty "t_row" t_row , testProperty "t_column" t_column , testProperty "t_center" t_center , testProperty "t_transpose" t_transpose , testProperty "t_qr" t_qr ] statistics-0.13.2.3/tests/Tests/NonParametric.hs0000644000000000000000000002761412504370360017672 0ustar0000000000000000-- Tests for Statistics.Test.NonParametric module Tests.NonParametric (tests) where import Statistics.Distribution.Normal (standard) import Statistics.Test.KolmogorovSmirnov import Statistics.Test.MannWhitneyU import Statistics.Test.KruskalWallis import Statistics.Test.WilcoxonT import Test.Framework (Test, testGroup) import Test.Framework.Providers.HUnit import Test.HUnit (assertEqual) import Tests.ApproxEq (eq) import Tests.Helpers (testAssertion, testEquality) import Tests.NonParametric.Table (tableKSD, tableKS2D) import qualified Data.Vector.Unboxed as U tests :: Test tests = testGroup "Nonparametric tests" $ concat [ mannWhitneyTests , wilcoxonSumTests , wilcoxonPairTests , kruskalWallisRankTests , kruskalWallisTests , kolmogorovSmirnovDTest ] ---------------------------------------------------------------- mannWhitneyTests :: [Test] mannWhitneyTests = zipWith test [(0::Int)..] 
testData ++ [ testEquality "Mann-Whitney U Critical Values, m=1" (replicate (20*3) Nothing) [mannWhitneyUCriticalValue (1,x) p | x <- [1..20], p <- [0.005,0.01,0.025]] , testEquality "Mann-Whitney U Critical Values, m=2, p=0.025" (replicate 7 Nothing ++ map Just [0,0,0,0,1,1,1,1,1,2,2,2,2]) [mannWhitneyUCriticalValue (2,x) 0.025 | x <- [1..20]] , testEquality "Mann-Whitney U Critical Values, m=6, p=0.05" (replicate 1 Nothing ++ map Just [0, 2,3,5,7,8,10,12,14,16,17,19,21,23,25,26,28,30,32]) [mannWhitneyUCriticalValue (6,x) 0.05 | x <- [1..20]] , testEquality "Mann-Whitney U Critical Values, m=20, p=0.025" (replicate 1 Nothing ++ map Just [2,8,14,20,27,34,41,48,55,62,69,76,83,90,98,105,112,119,127]) [mannWhitneyUCriticalValue (20,x) 0.025 | x <- [1..20]] ] where test n (a, b, c, d) = testCase "Mann-Whitney" $ do assertEqual ("Mann-Whitney U " ++ show n) c us assertEqual ("Mann-Whitney U Sig " ++ show n) d ss where us = mannWhitneyU (U.fromList a) (U.fromList b) ss = mannWhitneyUSignificant TwoTailed (length a, length b) 0.05 us -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double), Maybe TestResult)] testData = [ ( [3,4,2,6,2,5] , [9,7,5,10,6,8] , (2, 34) , Just Significant ) , ( [540,480,600,590,605] , [760,890,1105,595,940] , (2, 23) , Just Significant ) , ( [19,22,16,29,24] , [20,11,17,12] , (17, 3) , Just NotSignificant ) , ( [126,148,85,61, 179,93, 45,189,85,93] , [194,128,69,135,171,149,89,248,79,137] , (35,65) , Just NotSignificant ) , ( [1..30] , [1..30] , (450,450) , Just NotSignificant ) , ( [1 .. 30] , [11.5 .. 40 ] , (190.0,710.0) , Just Significant ) ] wilcoxonSumTests :: [Test] wilcoxonSumTests = zipWith test [(0::Int)..] testData where test n (a, b, c) = testCase "Wilcoxon Sum" $ assertEqual ("Wilcoxon Sum " ++ show n) c (wilcoxonRankSums (U.fromList a) (U.fromList b)) -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double))] testData = [ ( [8.50,9.48,8.65,8.16,8.83,7.76,8.63] , [8.27,8.20,8.25,8.14,9.00,8.10,7.20,8.32,7.70] , (75, 61) ) , ( [0.45,0.50,0.61,0.63,0.75,0.85,0.93] , [0.44,0.45,0.52,0.53,0.56,0.58,0.58,0.65,0.79] , (71.5, 64.5) ) ] wilcoxonPairTests :: [Test] wilcoxonPairTests = zipWith test [(0::Int)..] 
testData ++ -- Taken from the Mitic paper: [ testAssertion "Sig 16, 35" (to4dp 0.0467 $ wilcoxonMatchedPairSignificance 16 35) , testAssertion "Sig 16, 36" (to4dp 0.0523 $ wilcoxonMatchedPairSignificance 16 36) , testEquality "Wilcoxon critical values, p=0.05" (replicate 4 Nothing ++ map Just [0,2,3,5,8,10,13,17,21,25,30,35,41,47,53,60,67,75,83,91,100,110,119]) [wilcoxonMatchedPairCriticalValue x 0.05 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.025" (replicate 5 Nothing ++ map Just [0,2,3,5,8,10,13,17,21,25,29,34,40,46,52,58,65,73,81,89,98,107]) [wilcoxonMatchedPairCriticalValue x 0.025 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.01" (replicate 6 Nothing ++ map Just [0,1,3,5,7,9,12,15,19,23,27,32,37,43,49,55,62,69,76,84,92]) [wilcoxonMatchedPairCriticalValue x 0.01 | x <- [1..27]] , testEquality "Wilcoxon critical values, p=0.005" (replicate 7 Nothing ++ map Just [0,1,3,5,7,9,12,15,19,23,27,32,37,42,48,54,61,68,75,83]) [wilcoxonMatchedPairCriticalValue x 0.005 | x <- [1..27]] ] where test n (a, b, c) = testEquality ("Wilcoxon Paired " ++ show n) c res where res = (wilcoxonMatchedPairSignedRank (U.fromList a) (U.fromList b)) -- List of (Sample A, Sample B, (Positive Rank, Negative Rank)) testData :: [([Double], [Double], (Double, Double))] testData = [ ([1..10], [1..10], (0, 0 )) , ([1..5], [6..10], (0, 5*(-3))) -- Worked example from the Internet: , ( [125,115,130,140,140,115,140,125,140,135] , [110,122,125,120,140,124,123,137,135,145] , ( sum $ filter (> 0) [7,-3,1.5,9,0,-4,8,-6,1.5,-5] , sum $ filter (< 0) [7,-3,1.5,9,0,-4,8,-6,1.5,-5] ) ) -- Worked examples from books/papers: , ( [2.4,1.9,2.3,1.9,2.4,2.5] , [2.0,2.1,2.0,2.0,1.8,2.0] , (18, -3) ) , ( [130,170,125,170,130,130,145,160] , [120,163,120,135,143,136,144,120] , (27, -9) ) , ( [540,580,600,680,430,740,600,690,605,520] , [760,710,1105,880,500,990,1050,640,595,520] , (3, -42) ) ] to4dp tgt x = x >= tgt - 0.00005 && x < tgt + 0.00005 ---------------------------------------------------------------- kruskalWallisRankTests :: [Test] kruskalWallisRankTests = zipWith test [(0::Int)..] testData where test n (a, b) = testCase "Kruskal-Wallis Ranking" $ assertEqual ("Kruskal-Wallis " ++ show n) (map U.fromList b) (kruskalWallisRank $ map U.fromList a) testData = [ ( [ [68,93,123,83,108,122] , [119,116,101,103,113,84] , [70,68,54,73,81,68] , [61,54,59,67,59,70] ] , [ [8.0,14.0,16.0,19.0,23.0,24.0] , [15.0,17.0,18.0,20.0,21.0,22.0] , [1.5,8.0,8.0,10.5,12.0,13.0] , [1.5,3.5,3.5,5.0,6.0,10.5] ] ) ] kruskalWallisTests :: [Test] kruskalWallisTests = zipWith test [(0::Int)..] testData where test n (a, b, c) = testCase "Kruskal-Wallis" $ do assertEqual ("Kruskal-Wallis " ++ show n) (round100 b) (round100 kw) assertEqual ("Kruskal-Wallis Sig " ++ show n) c kwt where kw = kruskalWallis $ map U.fromList a kwt = kruskalWallisTest 0.05 $ map U.fromList a round100 :: Double -> Integer round100 = round . 
(*100) testData = [ ( [ [68,93,123,83,108,122] , [119,116,101,103,113,84] , [70,68,54,73,81,68] , [61,54,59,67,59,70] ] , 16.03 , Just Significant ) , ( [ [5,5,3,5,5,5,5] , [5,5,5,5,7,5,5] , [5,5,6,5,5,5,5] , [4,5,5,5,6,5,5] ] , 2.24 , Just NotSignificant ) , ( [ [36,48,5,67,53] , [49,33,60,2,55] , [71,31,140,59,42] ] , 1.22 , Just NotSignificant ) , ( [ [6,38,3,17,11,30,15,16,25,5] , [34,28,42,13,40,31,9,32,39,27] , [13,35,19,4,29,0,7,33,18,24] ] , 6.10 , Just Significant ) ] ---------------------------------------------------------------- -- K-S test ---------------------------------------------------------------- kolmogorovSmirnovDTest :: [Test] kolmogorovSmirnovDTest = [ testAssertion "K-S D statistics" $ and [ eq 1e-6 (kolmogorovSmirnovD standard (toU sample)) reference | (reference,sample) <- tableKSD ] , testAssertion "K-S 2-sample statistics" $ and [ eq 1e-6 (kolmogorovSmirnov2D (toU xs) (toU ys)) reference | (reference,xs,ys) <- tableKS2D ] , testAssertion "K-S probability" $ and [ eq 1e-14 (kolmogorovSmirnovProbability n d) p | (d,n,p) <- testData ] ] where toU = U.fromList -- Test data for the calculation of cumulative probability -- P(D[n] < d). -- -- Test data is: -- (D[n], n, p) -- Table is generated using sample program from paper testData :: [(Double,Int,Double)] testData = [ (0.09 , 3, 0 ) , (0.2 , 3, 0.00177777777777778 ) , (0.301 , 3, 0.116357025777778 ) , (0.392 , 3, 0.383127210666667 ) , (0.5003 , 3, 0.667366306558667 ) , (0.604 , 3, 0.861569877333333 ) , (0.699 , 3, 0.945458198 ) , (0.802 , 3, 0.984475216 ) , (0.9 , 3, 0.998 ) , (0.09 , 5, 0 ) , (0.2 , 5, 0.0384 ) , (0.301 , 5, 0.33993786080016 ) , (0.392 , 5, 0.66931908083712 ) , (0.5003 , 5, 0.888397260183794 ) , (0.604 , 5, 0.971609957879808 ) , (0.699 , 5, 0.994331075994008 ) , (0.802 , 5, 0.999391366368064 ) , (0.9 , 5, 0.99998 ) , (0.09 , 8, 3.37615237575e-06 ) , (0.2 , 8, 0.151622071801758 ) , (0.301 , 8, 0.613891042670582 ) , (0.392 , 8, 0.871491561427005 ) , (0.5003 , 8, 0.977534089199071 ) , (0.604 , 8, 0.997473116268255 ) , (0.699 , 8, 0.999806082005123 ) , (0.802 , 8, 0.999995133786947 ) , (0.9 , 8, 0.99999998 ) , (0.09 , 10, 3.89639433093119e-05) , (0.2 , 10, 0.25128096 ) , (0.301 , 10, 0.732913126355935 ) , (0.392 , 10, 0.932185254518767 ) , (0.5003 , 10, 0.992276179340446 ) , (0.604 , 10, 0.999495533516769 ) , (0.699 , 10, 0.999979691783985 ) , (0.802 , 10, 0.999999801409237 ) , (0.09 , 20, 0.00794502217168886 ) , (0.2 , 20, 0.647279826376584 ) , (0.301 , 20, 0.958017466965765 ) , (0.392 , 20, 0.997206424843499 ) , (0.5003 , 20, 0.999962641414228 ) , (0.09 , 30, 0.0498147538075168 ) , (0.2 , 30, 0.842030838984526 ) , (0.301 , 30, 0.993403560017612 ) , (0.392 , 30, 0.99988478803318 ) , (0.09 , 100, 0.629367974413669 ) ] statistics-0.13.2.3/tests/Tests/Transform.hs0000644000000000000000000001315612504370360017077 0ustar0000000000000000{-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE ViewPatterns #-} module Tests.Transform ( tests ) where import Data.Bits ((.&.), shiftL) import Data.Complex (Complex((:+))) import Data.Functor ((<$>)) import Numeric.Sum (kbn, sumVector) import Statistics.Function (within) import Statistics.Transform (CD, dct, fft, idct, ifft) import Test.Framework (Test, testGroup) import Test.Framework.Providers.QuickCheck2 (testProperty) import Test.QuickCheck (Positive(..), Arbitrary(..), Gen, choose, vectorOf, counterexample) import Test.QuickCheck.Property (Property(..)) import Tests.Helpers (testAssertion) import Text.Printf (printf) import qualified 
Data.Vector.Generic as G import qualified Data.Vector.Unboxed as U tests :: Test tests = testGroup "fft" [ testProperty "t_impulse" t_impulse , testProperty "t_impulse_offset" t_impulse_offset , testProperty "ifft . fft = id" (t_fftInverse $ ifft . fft) , testProperty "fft . ifft = id" (t_fftInverse $ fft . ifft) , testProperty "idct . dct = id [up to scale]" (t_fftInverse (\v -> U.map (/ (2 * fromIntegral (U.length v))) $ idct $ dct v)) , testProperty "dct . idct = id [up to scale]" (t_fftInverse (\v -> U.map (/ (2 * fromIntegral (U.length v))) $ dct $ idct v)) -- Exact small size DCT -- 1 , testDCT [1] [2] -- 2 , testDCT [1,0] $ map (*2) [1, cos (pi/4) ] , testDCT [0,1] $ map (*2) [1, cos (3*pi/4) ] -- 4 , testDCT [1,0,0,0] $ map (*2) [1, cos( pi/8), cos( 2*pi/8), cos( 3*pi/8)] , testDCT [0,1,0,0] $ map (*2) [1, cos(3*pi/8), cos( 6*pi/8), cos( 9*pi/8)] , testDCT [0,0,1,0] $ map (*2) [1, cos(5*pi/8), cos(10*pi/8), cos(15*pi/8)] , testDCT [0,0,0,1] $ map (*2) [1, cos(7*pi/8), cos(14*pi/8), cos(21*pi/8)] -- Exact small size IDCT -- 1 , testIDCT [1] [1] -- 2 , testIDCT [1,0] [1, 1 ] , testIDCT [0,1] $ map (*2) [cos(pi/4), cos(3*pi/4)] -- 4 , testIDCT [1,0,0,0] [1, 1, 1, 1 ] , testIDCT [0,1,0,0] $ map (*2) [cos( pi/8), cos( 3*pi/8), cos( 5*pi/8), cos( 7*pi/8) ] , testIDCT [0,0,1,0] $ map (*2) [cos( 2*pi/8), cos( 6*pi/8), cos(10*pi/8), cos(14*pi/8) ] , testIDCT [0,0,0,1] $ map (*2) [cos( 3*pi/8), cos( 9*pi/8), cos(15*pi/8), cos(21*pi/8) ] ] -- A single real-valued impulse at the beginning of an otherwise zero -- vector should be replicated in every real component of the result, -- and all the imaginary components should be zero. t_impulse :: Double -> Positive Int -> Bool t_impulse k (Positive m) = G.all (c_near i) (fft v) where v = i `G.cons` G.replicate (n-1) 0 i = k :+ 0 n = 1 `shiftL` (m .&. 6) -- If a real-valued impulse is offset from the beginning of an -- otherwise zero vector, the sum-of-squares of each component of the -- result should equal the square of the impulse. t_impulse_offset :: Double -> Positive Int -> Positive Int -> Bool t_impulse_offset k (Positive x) (Positive m) = G.all ok (fft v) where v = G.concat [G.replicate xn 0, G.singleton i, G.replicate (n-xn-1) 0] ok (re :+ im) = within ulps (re*re + im*im) (k*k) i = k :+ 0 xn = x `rem` n n = 1 `shiftL` (m .&. 6) -- Test that (ifft . fft ≈ id) -- -- Approximate equality here is tricky: small components of the vector tend to have large relative error, so instead of comparing componentwise we compare the vectors as a whole.
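-- Concretely, t_fftInverse below accepts a round trip x' of x whenever
--
--   ||x - x'|| <= 3e-14 * ||x||
--
-- in the Euclidean norm, i.e. it bounds the relative error of the vector
-- as a whole rather than of each component.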
t_fftInverse :: (HasNorm (U.Vector a), U.Unbox a, Num a, Show a, Arbitrary a) => (U.Vector a -> U.Vector a) -> Property t_fftInverse roundtrip = MkProperty $ do x <- genFftVector let n = G.length x x' = roundtrip x d = G.zipWith (-) x x' nd = vectorNorm d nx = vectorNorm x unProperty $ counterexample "Original vector" $ counterexample (show x ) $ counterexample "Transformed one" $ counterexample (show x') $ counterexample (printf "Length = %i" n) $ counterexample (printf "|x - x'| / |x| = %.6g" (nd / nx)) $ nd <= 3e-14 * nx -- Test discrete cosine transform testDCT :: [Double] -> [Double] -> Test testDCT (U.fromList -> vec) (U.fromList -> res) = testAssertion ("DCT test for " ++ show vec) $ vecEqual 3e-14 (dct vec) res -- Test inverse discrete cosine transform testIDCT :: [Double] -> [Double] -> Test testIDCT (U.fromList -> vec) (U.fromList -> res) = testAssertion ("IDCT test for " ++ show vec) $ vecEqual 3e-14 (idct vec) res ---------------------------------------------------------------- -- With an error tolerance of 8 ULPs, a million QuickCheck tests are -- likely to all succeed. With a tolerance of 7, we fail around the -- half million mark. ulps :: Int ulps = 8 c_near :: CD -> CD -> Bool c_near (a :+ b) (c :+ d) = within ulps a c && within ulps b d -- Arbitrary vector for FFT or DCT genFftVector :: (U.Unbox a, Arbitrary a) => Gen (U.Vector a) genFftVector = do n <- (2^) <$> choose (1,9::Int) -- Size of vector G.fromList <$> vectorOf n arbitrary -- Vector to transform -- Ad-hoc type class for calculation of vector norm class HasNorm a where vectorNorm :: a -> Double instance HasNorm (U.Vector Double) where vectorNorm = sqrt . sumVector kbn . U.map (\x -> x*x) instance HasNorm (U.Vector CD) where vectorNorm = sqrt . sumVector kbn . U.map (\(x :+ y) -> x*x + y*y) -- Approximate equality for vectors vecEqual :: Double -> U.Vector Double -> U.Vector Double -> Bool vecEqual ε v u = vectorNorm (U.zipWith (-) v u) < ε * vectorNorm v statistics-0.13.2.3/tests/Tests/Math/0000755000000000000000000000000012504370360015453 5ustar0000000000000000statistics-0.13.2.3/tests/Tests/Math/gen.py0000644000000000000000000000264412504370360016604 0ustar0000000000000000#!/usr/bin/python """Generate Tests.Math.Tables from reference values computed with mpmath (Python 2).""" from mpmath import * def printListLiteral(lines) : print " [" + "\n , ".join(lines) + "\n ]" ################################################################ # Generate header print "module Tests.Math.Tables where" print ################################################################ ## Generate table for logGamma print "tableLogGamma :: [(Double,Double)]" print "tableLogGamma =" gammaArg = [ 1.25e-6, 6.82e-5, 2.46e-4, 8.8e-4, 3.12e-3, 2.67e-2, 7.77e-2, 0.234, 0.86, 1.34, 1.89, 2.45, 3.65, 4.56, 6.66, 8.25, 11.3, 25.6, 50.4, 123.3, 487.4, 853.4, 2923.3, 8764.3, 1.263e4, 3.45e4, 8.234e4, 2.348e5, 8.343e5, 1.23e6, ] printListLiteral( [ '(%.15f, %.20g)' % (x, log(gamma(x))) for x in gammaArg ] ) ################################################################ ## Generate table for incompleteBeta print "tableIncompleteBeta :: [(Double,Double,Double,Double)]" print "tableIncompleteBeta =" incompleteBetaArg = [ (2, 3, 0.03), (2, 3, 0.23), (2, 3, 0.76), (4, 2.3, 0.89), (1, 1, 0.55), (0.3, 12.2, 0.11), (13.1, 9.8, 0.12), (13.1, 9.8, 0.42), (13.1, 9.8, 0.92), ] printListLiteral( [ '(%.15f, %.15f, %.15f, %.20g)' % (p,q,x, betainc(p,q,0,x, regularized=True)) for (p,q,x) in incompleteBetaArg ]) statistics-0.13.2.3/tests/Tests/Math/Tables.hs0000644000000000000000000000454512504370360017231 0ustar0000000000000000module 
Tests.Math.Tables where tableLogGamma :: [(Double,Double)] tableLogGamma = [(0.000001250000000, 13.592366285131769033) , (0.000068200000000, 9.5930266308318756785) , (0.000246000000000, 8.3100370767447966358) , (0.000880000000000, 7.03508133735248542) , (0.003120000000000, 5.768129358365567505) , (0.026700000000000, 3.6082588918892977148) , (0.077700000000000, 2.5148371858768232556) , (0.234000000000000, 1.3579557559432759994) , (0.860000000000000, 0.098146578027685615897) , (1.340000000000000, -0.11404757557207759189) , (1.890000000000000, -0.0425116422978701336) , (2.450000000000000, 0.25014296569217625565) , (3.650000000000000, 1.3701041997380685178) , (4.560000000000000, 2.5375143317949580002) , (6.660000000000000, 5.9515377269550207018) , (8.250000000000000, 9.0331869196051233217) , (11.300000000000001, 15.814180681373947834) , (25.600000000000001, 56.711261598328121636) , (50.399999999999999, 146.12815158702164808) , (123.299999999999997, 468.85500075897556371) , (487.399999999999977, 2526.9846647543727158) , (853.399999999999977, 4903.9359135978220365) , (2923.300000000000182, 20402.93198938705973) , (8764.299999999999272, 70798.268343590112636) , (12630.000000000000000, 106641.77264982508495) , (34500.000000000000000, 325976.34838781820145) , (82340.000000000000000, 849629.79603036714252) , (234800.000000000000000, 2668846.4390507959761) , (834300.000000000000000, 10540830.912557534873) , (1230000.000000000000000, 16017699.322315014899) ] tableIncompleteBeta :: [(Double,Double,Double,Double)] tableIncompleteBeta = [(2.000000000000000, 3.000000000000000, 0.030000000000000, 0.0051864299999999996862) , (2.000000000000000, 3.000000000000000, 0.230000000000000, 0.22845923000000001313) , (2.000000000000000, 3.000000000000000, 0.760000000000000, 0.95465728000000005249) , (4.000000000000000, 2.300000000000000, 0.890000000000000, 0.93829812158347802864) , (1.000000000000000, 1.000000000000000, 0.550000000000000, 0.55000000000000004441) , (0.300000000000000, 12.199999999999999, 0.110000000000000, 0.95063000053947077639) , (13.100000000000000, 9.800000000000001, 0.120000000000000, 1.3483109941962659385e-07) , (13.100000000000000, 9.800000000000001, 0.420000000000000, 0.071321857831804780226) , (13.100000000000000, 9.800000000000001, 0.920000000000000, 0.99999578339197081611) ] statistics-0.13.2.3/tests/Tests/Matrix/0000755000000000000000000000000012504370360016026 5ustar0000000000000000statistics-0.13.2.3/tests/Tests/Matrix/Types.hs0000644000000000000000000000243112504370360017466 0ustar0000000000000000{-# LANGUAGE DeriveFunctor #-} {-# OPTIONS_GHC -fno-warn-orphans #-} module Tests.Matrix.Types ( Mat(..) , fromMat , toMat ) where import Control.Monad (join) import Control.Applicative ((<$>), (<*>)) import Statistics.Matrix (Matrix(..), fromList) import Test.QuickCheck import Tests.Helpers (shrinkFixedList, small) import qualified Data.Vector.Unboxed as U data Mat a = Mat { mrows :: Int , mcols :: Int , asList :: [[a]] } deriving (Eq, Ord, Show, Functor) fromMat :: Mat Double -> Matrix fromMat (Mat r c xs) = fromList r c (concat xs) toMat :: Matrix -> Mat Double toMat (Matrix r c _ v) = Mat r c . split . 
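-- ('toMat' rebuilds the nested-list form by chunking the flat row-major
-- vector into rows of length c; see 'split' in the where clause below.)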
U.toList $ v where split xs@(_:_) = let (h,t) = splitAt c xs in h : split t split [] = [] instance (Arbitrary a) => Arbitrary (Mat a) where arbitrary = small $ join (arbMat <$> arbitrary <*> arbitrary) shrink (Mat r c xs) = Mat r c <$> shrinkFixedList (shrinkFixedList shrink) xs arbMat :: (Arbitrary a) => Positive (Small Int) -> Positive (Small Int) -> Gen (Mat a) arbMat (Positive (Small r)) (Positive (Small c)) = Mat r c <$> vectorOf r (vector c) instance Arbitrary Matrix where arbitrary = fromMat <$> arbitrary -- shrink = map fromMat . shrink . toMat statistics-0.13.2.3/tests/Tests/NonParametric/0000755000000000000000000000000012504370360017324 5ustar0000000000000000statistics-0.13.2.3/tests/Tests/NonParametric/Table.hs0000644000000000000000000001332712504370360020715 0ustar0000000000000000module Tests.NonParametric.Table ( tableKSD , tableKS2D ) where -- Table for Kolmogorov-Smirnov statistics for standard normal -- distribution. Generated using R. -- -- First element of tuple is D second is sample for which it was -- calculated. tableKSD :: [(Double,[Double])] tableKSD = [ (0.2012078,[1.360645,-0.3151904,-1.245443,0.1741977,-0.1421206,-1.798246,1.171594,-1.335844,-5.050093e-2,1.030063,-1.849005,0.6491455,-0.7028004]) , (0.2569956,[0.3884734,-1.227821,-0.4166262,0.429118,-0.9280124,0.8025867,-0.6703089,-0.2124872,0.1224496,0.1087734,-4.285284e-2,-1.039936,-0.7071956]) , (0.1960356,[-1.814745,-0.6327167,0.7082493,0.6264716,1.02061,-0.4094635,0.821026,-0.4255047,-0.4820728,-0.2239833,0.648517,1.114283,0.3610216]) , (0.2095746,[0.187011,0.1805498,0.4448389,0.6065506,0.2308673,0.5292549,-1.489902,-1.455191,0.5449396,-0.1436403,-0.7977073,-0.2693545,0.8260888,-1.474473,-2.158696e-2,-0.1455387]) , (0.1922603,[0.5772317,-1.255561,1.605823,0.4923361,0.2470848,1.176101,-0.3767689,-0.6896885,0.4509345,-0.5048447,0.9436534,1.025599,0.2998393,-3.415219e-2,1.264315,-1.44433,-1.646449e-2]) , (0.2173401,[1.812807,-0.8687497,-0.5710508,1.003647,1.142621,0.6546577,-6.083323e-3,1.628574e-2,1.067499,-1.953143,-0.6060077,1.90859,-0.7480553,0.6715162,-0.928759,1.862,1.604621,-0.2171044,-0.1835918]) , (0.2510541,[-0.4769572,1.062319,0.9952284,1.198086,1.015589,-0.4154523,-0.6711762,1.202902,0.2217098,5.381759e-2,0.6679715,0.2551287,-0.1371492]) , (0.1996022,[1.158607,-0.7354863,1.526559,-0.7855418,-2.82999,-0.6045106,-0.1830228,0.3306812,-0.819657,-1.223715,0.2536423,-0.4155781,1.447042]) , (0.2284761,[1.239965,0.8187093,0.5199788,1.172072,0.748259,1.869376e-2,0.1625921,-1.712065,0.7043582,-1.702702,-0.4792806,-0.1023351,0.1187189]) , (0.2337866,[0.9417261,-0.1024297,-0.7354359,1.099991,0.801984,-0.3745397,-1.749564,1.795771,1.099963,-0.605557,-2.035897,1.893603,-0.3468928,-0.2593938,2.100988,0.9665698,0.8757091,0.7696328,0.8730729,-0.3990352,2.04361,-0.4617864,-0.155021,2.15774,0.2687795,-0.9853512,-0.3264898,1.260026,4.267695,-0.5571145,0.6307067,-0.1691405,-1.730686]) , (0.3389167,[2.025542,-1.542641,-1.090238,3.99027,9.949129e-2,-0.8974433,-2.508418,6.390346,-2.675515,1.154459,1.688072,2.220727,-0.4743102]) , (0.4920231,[0.5192906,-3.260813,-1.245185,1.693084,3.561318,4.058924,2.27063,0.9446943,4.794159,-3.423733,0.8240817,0.644059,0.900175,1.932513,1.024586,2.82823,2.072192,-0.353231,-0.4319673,1.505952,1.0199,4.555054,2.364929,5.531467,3.279415,3.19821,2.726925,1.680027,-0.9041334,-0.8246765,-1.343979,8.454955,1.354581]) , 
(0.6727408,[-6.705672,-3.193988,-4.612611,-3.207994,-5.070172,-6.141169,-0.397149,-4.093359,-1.204801,-3.986585,-2.724662,0.9868107,-6.295266,-5.95839,-6.35114,-1.679555,-2.635889,-4.050329,1.557428,-2.548465,-0.9073924,-1.502018,-4.535688,-4.158818,-8.833434,-5.944697,-1.569672,-4.70399,-7.832059,-4.093708,-8.393417,-2.085432,-7.06495,-0.4230419,-3.046822,-3.23895,-0.9265873,-9.227822,3.293713,-5.593577,-5.942398,-4.358421,2.660044,-4.301572,-1.258879,0.1499903,3.572833,-3.19844,0.8652432,-0.3025793,-1.576673,-7.666265,-6.751463,-1.398944,-2.690656,-1.429654,7.508364e-2,0.7998344,-3.562074,-1.021431,1.342968,2.110244,-7.561497,-2.372083,-3.649193,-5.7723,-1.068083,0.7537809,-4.569546,-1.198005,-5.638384,-1.227226,-1.195852,-1.118175,-9.130527,0.9675821,-2.497391,0.5988562,-1.965783,-4.25741,-6.547006,-1.459294,-2.380556,-3.977307,-7.809006,-4.276819,-4.028746,-9.055546,-3.599239,-1.470512,-8.253329,-1.351687,-4.269324,-6.140353,-6.30808,-1.834091,-3.135146,-9.391791,3.117815,-5.554733,-2.556769,-3.287376,-2.064013,-5.741995,-5.047918,-4.808841,-1.488526,-0.2351115,-5.760833,-2.722929,-7.012353,2.281171,-3.890514,-1.516824,-1.41011,-1.828457,-5.561244,-3.472142,-10.16919,-0.4369042,-5.698953,-4.587462,-4.897086]) ] -- Table for 2-sample Kolmogorov-Smirnov statistics. Generated using R -- -- First element is D, second and third are samples tableKS2D :: [(Double,[Double],[Double])] tableKS2D = [ (0.2820513,[-0.4212928,2.146532,0.7585263,-0.5086105,-0.7725486,6.235548e-2,-0.1849861,0.861972,-0.1958534,-3.379697e-2,-1.316854,0.6701269],[0.4957582,0.4241167,0.9822869,0.4504248,-0.1749617,1.178098,-1.117222,-0.859273,0.3073736,0.4344583,-0.4761338,-1.332374,1.487291]) , (0.2820513,[-0.712252,0.7990333,-0.7968473,1.443609,1.163096,-1.349071,-0.1553941,-2.003104,-0.3400618,-0.7019282,0.183293,-0.2352167],[-0.4622455,-0.8132221,0.1161614,-1.472115e-2,1.001454,-6.557789e-2,-0.2531216,-1.032432,0.4105478,1.749614,0.9722899,5.850942e-2,-0.3352746]) , (0.2564103,[0.3509882,-0.2982833,1.314731,1.264223,-0.8156374,0.3734029,-3.288915e-2,0.6766016,0.9786335,0.1079949,-0.4211722,1.58656],[0.8024675,7.464538e-2,0.2739861,-2.334255e-2,0.5611802,0.6683374,0.4358206,0.349843,1.207834,1.402578,-0.4049183,0.4286042,1.665129]) , (0.1833333,[1.376196,9.926384e-2,2.199292,-2.04993,0.5585353,-0.4812132,0.1041527,2.084774,0.71194,-1.398245,-4.458574e-2,1.484945,-1.473182,1.020076,-0.7019646,0.2182066,-1.702963,-0.3522622,-0.8129267,-0.6338972],[-1.020371,0.3323861,1.513288,0.1958708,-1.0723,5.323446e-2,-0.9993713,-0.7046356,-0.6781067,-0.4471603,1.512042,-0.2650665,-4.765228e-2,-1.501205,1.228664,0.5332935,-0.2960315,-0.1509683]) , (0.5666667,[0.7145305,0.1255674,2.001531,0.1419216],[2.113474,-0.3352839,-0.4962429,-1.386079,0.6404667,-0.7145304,0.1084008,-0.9821421,-2.270472,-1.003846,-0.5644588,2.699695,-1.296494,-0.1538839,1.319094,-1.127544,0.3568889,0.2004726,-1.313291,0.3581084,0.3313498,0.9336278,0.9850203,-1.309506,1.170459,-0.7517466,-1.771269,0.7156381,-1.129691,0.877729]) , (0.5,[0.6950626,0.1643805,-0.3102472,0.4810762,0.1844602,1.338836,-0.8083386,-0.5482141,0.9532421,-0.2644837],[7.527945,-1.95654,1.513725,-1.318431,2.453895,0.2078194,0.7371092,2.834245,-2.134794,3.938259]) ] statistics-0.13.2.3/tests/utils/0000755000000000000000000000000012504370360014620 5ustar0000000000000000statistics-0.13.2.3/tests/utils/fftw.c0000644000000000000000000000200612504370360015730 0ustar0000000000000000/* Generate some test cases using fftw3 */ #include <stdio.h> #include <stdlib.h> #include <fftw3.h> void dump_vector(int n, double* vec) { for(int i = 
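/* Note: FFTW_REDFT10 (used in main below) is FFTW's DCT-II, what the
   test suite calls simply "the DCT", and FFTW_REDFT01 is DCT-III, its
   unnormalised inverse; the impulse responses printed by this program
   are the expected vectors hard-coded in tests/Tests/Transform.hs. */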
0; i < n; i++) printf("%20.15f ", vec[i]); printf("\n"); } void dct(int flag, int n) { double* in = malloc( n * sizeof(double)); double* out = malloc( n * sizeof(double)); // fftw_plan plan = fftw_plan_r2r_1d(n, in, out, flag, FFTW_ESTIMATE); for( int k = 0; k < n; k++) { // Init input vector for( int i = 0; i < n; i++) in[i] = 0; in[k] = 1; // Perform DCT fftw_execute(plan); // Print results dump_vector(n, in ); dump_vector(n, out); printf("\n"); } // free(in); free(out); fftw_destroy_plan(plan); } int main(void) { printf("DCT II (the DCT)\n"); dct( FFTW_REDFT10, 2); dct( FFTW_REDFT10, 4); printf("DCT III (Inverse DCT)\n"); dct( FFTW_REDFT01, 2); dct( FFTW_REDFT01, 4); return 0; } statistics-0.13.2.3/tests/utils/Makefile0000644000000000000000000000017012504370360016256 0ustar0000000000000000CC = gcc CFLAGS = -W -Wall -O2 -std=c99 LDLIBS = -lfftw3 .PHONY: all clean all : fftw clean : rm -rf fftw *.o
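# Usage sketch (assuming gcc and the fftw3 development files are
# installed): a plain `make` builds the generator through make's
# built-in C rules, and `./fftw` prints the DCT-II/DCT-III impulse
# tables to stdout.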