statistics/0000755000175000017500000000000012271566224012615 5ustar asneltasneltstatistics/COPYING0000644000175000017500000000722112174323412013642 0ustar asneltasneltinst/private/tbl_delim.m GPLv3+ inst/anderson_darling_cdf.m public domain inst/anderson_darling_test.m public domain inst/anovan.m GPLv3+ inst/betastat.m GPLv3+ inst/binostat.m GPLv3+ inst/boxplot.m GPLv3+ inst/caseread.m GPLv3+ inst/casewrite.m GPLv3+ inst/chi2stat.m GPLv3+ inst/cl_multinom.m GPLv3+ inst/combnk.m GPLv3+ inst/copulacdf.m GPLv3+ inst/copulapdf.m GPLv3+ inst/copularnd.m GPLv3+ inst/dendogram.m GPLv3+ inst/expstat.m GPLv3+ inst/ff2n.m public domain inst/fstat.m GPLv3+ inst/fullfact.m public domain inst/gamfit.m public domain inst/gamlike.m public domain inst/gamstat.m GPLv3+ inst/geomean.m GPLv3+ inst/geostat.m GPLv3+ inst/gevcdf.m GPLv3+ inst/gevfit_lmom.m GPLv3+ inst/gevfit.m GPLv3+ inst/gevinv.m GPLv3+ inst/gevlike.m GPLv3+ inst/gevpdf.m GPLv3+ inst/gevrnd.m GPLv3+ inst/gevstat.m GPLv3+ inst/harmmean.m GPLv3+ inst/hist3.m GPLv3+ inst/histfit.m GPLv3+ inst/hmmestimate.m GPLv3+ inst/hmmgenerate.m GPLv3+ inst/hmmviterbi.m GPLv3+ inst/hygestat.m GPLv3+ inst/jackknife.m GPLv3+ inst/jsucdf.m GPLv3+ inst/jsupdf.m GPLv3+ inst/kmeans.m GPLv3+ inst/linkage.m GPLv3+ inst/lognstat.m GPLv3+ inst/mad.m GPLv3+ inst/mnpdf.m GPLv3+ inst/mnrnd.m GPLv3+ inst/monotone_smooth.m GPLv3+ inst/mvncdf.m GPLv3+ inst/mvnpdf.m public domain inst/mvnrnd.m GPLv3+ inst/mvtcdf.m GPLv3+ inst/mvtrnd.m GPLv3+ inst/nanmax.m GPLv3+ inst/nanmean.m GPLv3+ inst/nanmedian.m GPLv3+ inst/nanmin.m GPLv3+ inst/nanstd.m GPLv3+ inst/nansum.m GPLv3+ inst/nanvar.m GPLv3+ inst/nbinstat.m GPLv3+ inst/normalise_distribution.m GPLv3+ inst/normplot.m public domain inst/normstat.m GPLv3+ inst/pcacov.m GPLv3+ inst/pcares.m GPLv3+ inst/pdist.m GPLv3+ inst/plsregress.m GPLv3+ inst/poisstat.m GPLv3+ inst/princomp.m public domain inst/random.m GPLv3+ inst/raylcdf.m GPLv3+ inst/raylinv.m GPLv3+ inst/raylpdf.m GPLv3+ inst/raylrnd.m GPLv3+ 
inst/raylstat.m GPLv3+ inst/regress_gp.m GPLv3+ inst/regress.m GPLv3+ inst/repanova.m GPLv3+ inst/runstest.m GPLv3+ inst/squareform.m GPLv3+ inst/stepwisefit.m GPLv3+ inst/tabulate.m GPLv3+ inst/tblread.m GPLv3+ inst/tblwrite.m GPLv3+ inst/trimmean.m GPLv3+ inst/tstat.m GPLv3+ inst/unidstat.m GPLv3+ inst/unifstat.m GPLv3+ inst/vmpdf.m GPLv3+ inst/vmrnd.m GPLv3+ inst/wblstat.m GPLv3+ statistics/NEWS0000644000175000017500000001052112271566110013305 0ustar asneltasneltSummary of important user-visible changes for statistics 1.2.3: ------------------------------------------------------------------- ** Made sure that the output of nanstd is real. ** Fixed second output of nanmax and nanmin. ** Corrected handle for outliers in boxplot. ** Bug fix and enhanced functionality for mvnrnd. ** The following functions are new: wishrnd iwishrnd wishpdf iwishpdf cmdscale Summary of important user-visible changes for statistics 1.2.2: ------------------------------------------------------------------- ** Fixed documentation of dendogram and hist3 to work with Texinfo 5. Summary of important user-visible changes for statistics 1.2.1: ------------------------------------------------------------------- ** The following functions are new: pcares pcacov runstest stepwisefit hist3 ** dendogram now returns the leaf node numbers and the order in which the nodes were displayed. ** New faster implementation of princomp. Summary of important user-visible changes for statistics 1.2.0: ------------------------------------------------------------------- ** The following functions are new: regress_gp dendogram plsregress ** New functions for the generalized extreme value (GEV) distribution: gevcdf gevfit gevfit_lmom gevinv gevlike gevpdf gevrnd gevstat ** The interface of the following functions has been modified: mvnrnd ** `kmeans' has been fixed to deal with clusters that contain only one element. ** `normplot' has been fixed to avoid use of functions that have been removed from Octave core. 
Also, the plot produced should now display some aesthetic elements and appropriate legends. ** The help text of `mvtrnd' has been improved. ** Package is no longer autoloaded. Summary of important user-visible changes for statistics 1.1.3: ------------------------------------------------------------------- ** The following functions are new in 1.1.3: copularnd mvtrnd ** The functions mnpdf and mnrnd are now also usable for greater numbers of categories for which the rows do not exactly sum to 1. Summary of important user-visible changes for statistics 1.1.2: ------------------------------------------------------------------- ** The following functions are new in 1.1.2: mnpdf mnrnd ** The package is now dependent on the io package (version 1.0.18 or later) since the functions that it depended on from the miscellaneous package have been moved to io. ** The function `kmeans' now accepts the 'emptyaction' property with the 'singleton' value. This allows the kmeans algorithm to handle empty clusters better. It also throws an error if the user does not request empty-cluster handling and an empty cluster occurs. Plus, the returned items are now a closer match to Matlab. Summary of important user-visible changes for statistics 1.1.1: ------------------------------------------------------------------- ** The following functions are new in 1.1.1: monotone_smooth kmeans jackknife ** Bug fixes on the functions: normalise_distribution combnk repanova ** The following functions were removed since equivalents are now part of GNU Octave core: zscore ** boxplot.m now returns a structure with handles to the plot elements. Summary of important user-visible changes for statistics 1.1.0: ------------------------------------------------------------------- ** IMPORTANT note about `fstat' shadowing core library function: GNU Octave's 3.2 release added a new function `fstat' that returns information about a file. Statistics' `fstat' computes the mean and variance of the F distribution. 
Since Matlab's `fstat' is equivalent to statistics' `fstat' (not to core's `fstat'), and to avoid problems with the statistics package, core's `fstat' has been deprecated in Octave 3.4 and will be removed in Octave 3.8. In the meantime, please ignore this warning when installing the package. ** The following functions are new in 1.1.0: normalise_distribution repanova combnk ** The following functions were removed since equivalents are now part of GNU Octave core: prctile ** The __tbl_delim__ function is now private. ** The function `boxplot' now accepts named arguments. ** Bug fixes on the functions: harmmean nanmax nanmin regress ** Small improvements to help text. statistics/test/0000755000175000017500000000000012271566224013574 5ustar asneltasneltstatistics/test/caseread.dat0000644000175000017500000000001211014046367016021 0ustar asneltasnelta bcd efstatistics/test/tblread-space.dat0000644000175000017500000000002411014040653016753 0ustar asneltasnelt a bc de 1 2 f 3 4statistics/test/tblread-tab.dat0000644000175000017500000000007011014040653016427 0ustar asneltasnelt a bc de 1 2 f 3 4statistics/inst/0000755000175000017500000000000012271566224013572 5ustar asneltasneltstatistics/inst/squareform.m0000644000175000017500000000542611741556364016150 0ustar asneltasnelt## Copyright (C) 2006, 2008 Bill Denney ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . 
## -*- texinfo -*- ## @deftypefn {Function File} {@var{y} =} squareform (@var{x}) ## @deftypefnx {Function File} {@var{y} =} squareform (@var{x}, @ ## "tovector") ## @deftypefnx {Function File} {@var{y} =} squareform (@var{x}, @ ## "tomatrix") ## Convert a vector from the pdist function into a square matrix or from ## a square matrix back to the vector form. ## ## The second argument is used to specify the output type in case there ## is a single element. ## @seealso{pdist} ## @end deftypefn ## Author: Bill Denney function y = squareform (x, method) if nargin < 1 print_usage (); elseif nargin < 2 if isscalar (x) || isvector (x) method = "tomatrix"; elseif issquare (x) method = "tovector"; else error ("squareform: cannot deal with a nonsquare, nonvector \ input"); endif endif method = lower (method); if ! strcmp ({"tovector" "tomatrix"}, method) error ("squareform: method must be either \"tovector\" or \ \"tomatrix\""); endif if strcmp ("tovector", method) if ! issquare (x) error ("squareform: x is not a square matrix"); endif sx = size (x, 1); y = zeros ((sx-1)*sx/2, 1); idx = 1; for i = 2:sx newidx = idx + sx - i; y(idx:newidx) = x(i:sx,i-1); idx = newidx + 1; endfor else ## we're converting to a matrix ## make sure that x is a column x = x(:); ## the dimensions of y are the solution to the quadratic formula ## for: ## length(x) = (sy-1)*(sy/2) sy = (1 + sqrt (1+ 8*length (x)))/2; y = zeros (sy); for i = 1:sy-1 step = sy - i; y((sy-step+1):sy,i) = x(1:step); x(1:step) = []; endfor y = y + y'; endif endfunction ## make sure that it can go both directions automatically %!assert(squareform(1:6), [0 1 2 3;1 0 4 5;2 4 0 6;3 5 6 0]) %!assert(squareform([0 1 2 3;1 0 4 5;2 4 0 6;3 5 6 0]), [1:6]') ## make sure that the command arguments force the correct behavior %!assert(squareform(1), [0 1;1 0]) %!assert(squareform(1, "tomatrix"), [0 1;1 0]) %!assert(squareform(1, "tovector"), zeros(0,1)) statistics/inst/gevfit_lmom.m0000644000175000017500000000676212070115105016254 
0ustar asneltasnelt## Copyright (C) 2012 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{paramhat}, @var{paramci}] =} gevfit_lmom (@var{data}) ## Find an estimator (@var{paramhat}) of the parameters of the generalized extreme value (GEV) distribution fitted to @var{data} using the method of L-moments. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{data} is the vector of given values. ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{paramhat} is the 3-parameter L-moment estimate of the parameter vector [@var{k}; @var{sigma}; @var{mu}], where @var{k} is the shape parameter of the GEV distribution, @var{sigma} is the scale parameter of the GEV distribution, and @var{mu} is the location parameter of the GEV distribution. ## @item ## @var{paramci} has the approximate 95% confidence intervals of the parameter values (currently not implemented; NaN is returned). ## ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## data = gevrnd (0.1, 1, 0, 100, 1); ## [pfit, pci] = gevfit_lmom (data); ## p1 = gevcdf (data,pfit(1),pfit(2),pfit(3)); ## [f, x] = ecdf (data); ## plot(data, p1, 's', x, f) ## @end group ## @end example ## @seealso{gevfit} ## @subheading References ## ## @enumerate ## @item ## Ailliot, P.; Thompson, C. & Thomson, P. 
Mixed methods for fitting the GEV distribution, Water Resources Research, 2011, 47, W05551 ## ## @end enumerate ## @end deftypefn ## Author: Nir Krakauer ## Description: L-moments parameter estimation for the generalized extreme value distribution function [paramhat, paramci] = gevfit_lmom (data) # Check arguments if (nargin < 1) print_usage; endif # find the L-moments data = sort (data(:))'; n = numel(data); L1 = mean(data); L2 = sum(data .* (2*(1:n) - n - 1)) / (2*nchoosek(n, 2)); # or mean(triu(data' - data, 1, 'pack')) / 2; b = bincoeff((1:n) - 1, 2); L3 = sum(data .* (b - 2 * ((1:n) - 1) .* (n - (1:n)) + fliplr(b))) / (3*nchoosek(n, 3)); #match the moments to the GEV distribution #first find k based on L3/L2 f = @(k) (L3/L2 + 3)/2 - limdiv((1 - 3^(k)), (1 - 2^(k))); k = fzero(f, 0); #next find sigma and mu given k if abs(k) < 1E-8 sigma = L2 / log(2); eg = 0.57721566490153286; %Euler-Mascheroni constant mu = L1 - sigma * eg; else sigma = -k*L2 / (gamma(1 - k) * (1 - 2^(k))); mu = L1 - sigma * ((gamma(1 - k) - 1) / k); endif paramhat = [k; sigma; mu]; if nargout > 1 paramci = NaN; endif endfunction #internal function to accurately evaluate (1 - 3^k)/(1 - 2^k) in the limit as k --> 0 function c = limdiv(a, b) # c = ifelse (abs(b) < 1E-8, log(3)/log(2), a ./ b); if abs(b) < 1E-8 c = log(3)/log(2); else c = a / b; endif endfunction %!test %! data = 1:50; %! [pfit, pci] = gevfit_lmom (data); %! expected_p = [-0.28 15.01 20.22]'; %! assert (pfit, expected_p, 0.1); statistics/inst/anovan.m0000644000175000017500000002745711741556364015256 0ustar asneltasnelt## Copyright (C) 2003-2005 Andy Adler ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. 
## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{pval}, @var{f}, @var{df_b}, @var{df_e}] =} anovan (@var{data}, @var{grps}) ## @deftypefnx {Function File} {[@var{pval}, @var{f}, @var{df_b}, @var{df_e}] =} anovan (@var{data}, @var{grps}, 'param1', @var{value1}) ## Perform a multi-way analysis of variance (ANOVA). The goal is to test ## whether the population means of data taken from @var{k} different ## groups are all equal. ## ## The data are given as a single vector @var{data} with groups specified by ## a corresponding matrix of group labels @var{grps}, where @var{grps} ## has the same number of rows as @var{data}. For example, if ## @var{data} = [1.1;1.2]; @var{grps}= [1,2,1; 1,5,2]; ## then data point 1.1 was measured under conditions 1,2,1 and ## data point 1.2 was measured under conditions 1,5,2. ## Note that groups do not need to be sequentially numbered. ## ## By default, a 'linear' model is used, computing the N main effects ## with no interactions. This may be modified with the 'model' parameter: ## ## p= anovan(data,groups, 'model', modeltype) ## - modeltype = 'linear': compute N main effects ## - modeltype = 'interaction': compute N effects and ## N*(N-1)/2 two-factor interactions ## - modeltype = 'full': compute interactions at all levels ## ## Under the null hypothesis of constant means, the statistic @var{f} follows an F ## distribution with @var{df_b} and @var{df_e} degrees of freedom. ## ## The p-value (1 minus the CDF of this distribution at @var{f}) is ## returned in @var{pval}. ## ## If no output argument is given, the ANOVA table is ## printed. 
## ## BUG: DFE is incorrect for modeltypes != full ## @end deftypefn ## Author: Andy Adler ## Based on code by: KH ## $Id: anovan.m 10203 2012-04-12 13:47:32Z carandraug $ ## ## TESTING RESULTS: ## 1. ANOVA ACCURACY: www.itl.nist.gov/div898/strd/anova/anova.html ## Passes 'easy' test. Comes close on 'Average'. Fails 'Higher'. ## This could be fixed with higher precision arithmetic ## 2. Matlab anova2 test ## www.mathworks.com/access/helpdesk/help/toolbox/stats/anova2.html ## % From web site: ## popcorn= [ 5.5 4.5 3.5; 5.5 4.5 4.0; 6.0 4.0 3.0; ## 6.5 5.0 4.0; 7.0 5.5 5.0; 7.0 5.0 4.5]; ## % Define groups so reps = 3 ## groups = [ 1 1;1 2;1 3;1 1;1 2;1 3;1 1;1 2;1 3; ## 2 1;2 2;2 3;2 1;2 2;2 3;2 1;2 2;2 3 ]; ## anovan( vec(popcorn'), groups, 'model', 'full') ## % Results same as Matlab output ## 3. Matlab anovan test ## www.mathworks.com/access/helpdesk/help/toolbox/stats/anovan.html ## % From web site ## y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]'; ## g1 = [1 2 1 2 1 2 1 2]; ## g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'}; ## g3 = {'may'; 'may'; 'may'; 'may'; 'june'; 'june'; 'june'; 'june'}; ## anovan( y', [g1',g2',g3']) ## % Fails because we always do interactions function [PVAL, FSTAT, DF_B, DFE] = anovan (data, grps, varargin) if nargin <= 1 print_usage (); end # test supplied parameters modeltype= 'linear'; for idx= 3:2:nargin param= varargin{idx-2}; value= varargin{idx-1}; if strcmp(param, 'model') modeltype= value; # elseif strcmp(param # add other parameters here else error(sprintf('parameter %s is not supported', param)); end end if ~isvector (data) error ("anovan: for `anovan (data, grps)', data must be a vector"); endif nd = size (grps,1); # number of data points nw = size (grps,2); # number of anova "ways" if (~ isvector (data) || (length(data) ~= nd)) error ("anovan: grps must be a matrix with the same number of rows as data"); endif [g,grp_map] = relabel_groups (grps); if strcmp(modeltype, 'linear') max_interact = 1; elseif 
strcmp(modeltype,'interaction') max_interact = 2; elseif strcmp(modeltype,'full') max_interact = rows(grps); else error(sprintf('modeltype %s is not supported', modeltype)); end ng = length(grp_map); int_tbl = interact_tbl (nw, ng, max_interact ); [gn, gs, gss] = raw_sums(data, g, ng, int_tbl); stats_tbl = int_tbl(2:size(int_tbl,1),:)>0; nstats= size(stats_tbl,1); stats= zeros( nstats+1, 5); # SS, DF, MS, F, p for i= 1:nstats [SS, DF, MS]= factor_sums( gn, gs, gss, stats_tbl(i,:), ng, nw); stats(i,1:3)= [SS, DF, MS]; end # The Mean squared error is the data - avg for each possible measurement # This calculation doesn't work unless there is replication for all grps # SSE= sum( gss(sel) ) - sum( gs(sel).^2 ./ gn(sel) ); SST= gss(1) - gs(1)^2/gn(1); SSE= SST - sum(stats(:,1)); sel = select_pat( ones(1,nw), ng, nw); %incorrect for modeltypes != full DFE= sum( (gn(sel)-1).*(gn(sel)>0) ); MSE= SSE/DFE; stats(nstats+1,1:3)= [SSE, DFE, MSE]; for i= 1:nstats MS= stats(i,3); DF= stats(i,2); F= MS/MSE; pval = 1 - fcdf (F, DF, DFE); stats(i,4:5)= [F, pval]; end if nargout==0; printout( stats, stats_tbl ); else PVAL= stats(1:nstats,5); FSTAT=stats(1:nstats,4); DF_B= stats(1:nstats,2); DF_E= DFE; end endfunction # relabel groups to a mapping from 1 to ng # Input # grps input grouping # Output # g relabelled grouping # grp_map map from output to input grouping function [g,grp_map] = relabel_groups(grps) grp_vec= vec(grps); s= sort (grp_vec); uniq = 1+[0;find(diff(s))]; # mapping from new grps to old groups grp_map = s(uniq); # create new group g ngroups= length(uniq); g= zeros(size(grp_vec)); for i = 1:ngroups g( find( grp_vec== grp_map(i) ) ) = i; end g= reshape(g, size(grps)); endfunction # Create interaction table # # Input: # nw number of "ways" # ng number of ANOVA groups # max_interact maximum number of interactions to consider # default is nw function int_tbl =interact_tbl(nw, ng, max_interact) combin= 2^nw; inter_tbl= zeros( combin, nw); idx= (0:combin-1)'; for i=1:nw; 
inter_tbl(:,i) = ( rem(idx,2^i) >= 2^(i-1) ); end # find elements with more than max_interact 1's idx = ( sum(inter_tbl',1) > max_interact ); inter_tbl(idx,:) =[]; combin= size(inter_tbl,1); # update value #scale inter_tbl # use ng+1 to map combinations of groups to integers # this would be lots easier with a hash data structure int_tbl = inter_tbl .* (ones(combin,1) * (ng+1).^(0:nw-1) ); endfunction # Calculate sums for each combination # # Input: # data data vector # g relabelled grouping matrix # ng number of ANOVA groups # int_tbl interaction table from interact_tbl # # Output (virtual (ng+1)x(nw) matrices): # gn number of data points in each group # gs sum of data in each group # gss sumsqr of data in each group function [gn, gs, gss] = raw_sums(data, g, ng, int_tbl); nw= size(g,2); ndata= size(g,1); gn= gs= gss= zeros((ng+1)^nw, 1); for i=1:ndata # need offset by one for indexing datapt= data(i); idx = 1+ int_tbl*g(i,:)'; gn(idx) +=1; gs(idx) +=datapt; gss(idx) +=datapt^2; end endfunction # Calculate the various factor sums # Input: # gn number of data points in each group # gs sum of data in each group # gss sumsqr of data in each group # select binary vector of factor for this "way"? 
# ng number of ANOVA groups # nw number of ways function [SS,DF]= raw_factor_sums( gn, gs, gss, select, ng, nw); sel= select_pat( select, ng, nw); ss_raw= gs(sel).^2 ./ gn(sel); SS= sum( ss_raw( ~isnan(ss_raw) )); if length(find(select>0))==1 DF= sum(gn(sel)>0)-1; else DF= 1; #this isn't the real DF, but needed to multiply end endfunction function [SS, DF, MS]= factor_sums( gn, gs, gss, select, ng, nw); SS=0; DF=1; ff = find(select); lff= length(ff); # zero terms added, one term subtracted, two added, etc for i= 0:2^lff-1 remove= find( rem( floor( i * 2.^(-lff+1:0) ), 2) ); sel1= select; if ~isempty(remove) sel1( ff( remove ) )=0; end [raw_sum,raw_df]= raw_factor_sums(gn,gs,gss,sel1,ng,nw); add_sub= (-1)^length(remove); SS+= add_sub*raw_sum; DF*= raw_df; end MS= SS/DF; endfunction # Find the indices matching a factor selection pattern # Input: # select binary vector of factor for this "way"? # ng number of ANOVA groups # nw number of ways function sel= select_pat( select, ng, nw); # if select(i) is zero, remove nonzeros # if select(i) is nonzero, remove zero terms for i field=[]; if length(select) ~= nw; error("length of select must be = nw"); end ng1= ng+1; if isempty(field) # expand 0:(ng+1)^nw in base ng+1 field= (0:(ng1)^nw-1)'* ng1.^(-nw+1:0); field= rem( floor( field), ng1); # select zero or non-zero elements field= field>0; end sel= find( all( field == ones(ng1^nw,1)*select(:)', 2) ); endfunction function printout( stats, stats_tbl ); nw= size( stats_tbl,2); [jnk,order]= sort( sum(stats_tbl,2) ); printf('\n%d-way ANOVA Table (Factors A%s):\n\n', nw, ... sprintf(',%c',toascii('A')+(1:nw-1)) ); printf('Source of Variation Sum Sqr df MeanSS Fval p-value\n'); printf('*********************************************************************\n'); printf('Error %10.2f %4d %10.2f\n', stats( size(stats,1),1:3)); for i= order(:)' str= sprintf(' %c x',toascii('A')+find(stats_tbl(i,:)>0)-1 ); str= str(1:length(str)-2); # remove x printf('Factor %15s %10.2f %4d %10.2f %7.3f %7.6f\n', ... 
str, stats(i,:) ); end printf('\n'); endfunction #{ # Test Data from http://maths.sci.shu.ac.uk/distance/stats/14.shtml data=[7 9 9 8 12 10 ... 9 8 10 11 13 13 ... 9 10 10 12 10 12]'; grp = [1,1; 1,1; 1,2; 1,2; 1,3; 1,3; 2,1; 2,1; 2,2; 2,2; 2,3; 2,3; 3,1; 3,1; 3,2; 3,2; 3,3; 3,3]; data=[7 9 9 8 12 10 9 8 ... 9 8 10 11 13 13 10 11 ... 9 10 10 12 10 12 10 12]'; grp = [1,4; 1,4; 1,5; 1,5; 1,6; 1,6; 1,7; 1,7; 2,4; 2,4; 2,5; 2,5; 2,6; 2,6; 2,7; 2,7; 3,4; 3,4; 3,5; 3,5; 3,6; 3,6; 3,7; 3,7]; # Test Data from http://maths.sci.shu.ac.uk/distance/stats/9.shtml data=[9.5 11.1 11.9 12.8 ... 10.9 10.0 11.0 11.9 ... 11.2 10.4 10.8 13.4]'; grp= [1:4,1:4,1:4]'; # Test Data from http://maths.sci.shu.ac.uk/distance/stats/13.shtml data=[7.56 9.68 11.65 ... 9.98 9.69 10.69 ... 7.23 10.49 11.77 ... 8.22 8.55 10.72 ... 7.59 8.30 12.36]'; grp = [1,1;1,2;1,3; 2,1;2,2;2,3; 3,1;3,2;3,3; 4,1;4,2;4,3; 5,1;5,2;5,3]; # Test Data from www.mathworks.com/ # access/helpdesk/help/toolbox/stats/linear10.shtml data=[23 27 43 41 15 17 3 9 20 63 55 90]; grp= [ 1 1 1 1 2 2 2 2 3 3 3 3; 1 1 2 2 1 1 2 2 1 1 2 2]'; #} statistics/inst/gevlike.m0000644000175000017500000002722012071127341015370 0ustar asneltasnelt## Copyright (C) 2012 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . 
## -*- texinfo -*- ## @deftypefn {Function File} {@var{nlogL}, @var{Grad}, @var{ACOV} =} gevlike (@var{params}, @var{data}) ## Compute the negative log-likelihood of data under the generalized extreme value (GEV) distribution with given parameter values. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{params} is the 3-parameter vector [@var{k}, @var{sigma}, @var{mu}], where @var{k} is the shape parameter of the GEV distribution, @var{sigma} is the scale parameter of the GEV distribution, and @var{mu} is the location parameter of the GEV distribution. ## @item ## @var{data} is the vector of given values. ## ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{nlogL} is the negative log-likelihood. ## @item ## @var{Grad} is the 3 by 1 gradient vector (first derivative of the negative log likelihood with respect to the parameter values) ## @item ## @var{ACOV} is the 3 by 3 Fisher information matrix (second derivative of the negative log likelihood with respect to the parameter values) ## ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## x = -5:-1; ## k = -0.2; ## sigma = 0.3; ## mu = 0.5; ## [L, ~, C] = gevlike ([k sigma mu], x); ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Rolf-Dieter Reiss and Michael Thomas. @cite{Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields}. Chapter 1, pages 16-17, Springer, 2007. 
## ## @end enumerate ## @seealso{gevcdf, gevfit, gevinv, gevpdf, gevrnd, gevstat} ## @end deftypefn ## Author: Nir Krakauer ## Description: Negative log-likelihood for the generalized extreme value distribution function [nlogL, Grad, ACOV] = gevlike (params, data) # Check arguments if (nargin != 2) print_usage; endif k = params(1); sigma = params(2); mu = params(3); #calculate negative log likelihood [nll, k_terms] = gevnll (data, k, sigma, mu); nlogL = sum(nll(:)); #optionally calculate the first and second derivatives of the negative log likelihood with respect to parameters if nargout > 1 [Grad, kk_terms] = gevgrad (data, k, sigma, mu, k_terms); if nargout > 2 ACOV = gevfim (data, k, sigma, mu, k_terms, kk_terms); endif endif endfunction function [nlogL, k_terms] = gevnll (x, k, sigma, mu) #internal function to calculate negative log likelihood for gevlike #no input checking done k_terms = []; a = (x - mu) ./ sigma; if all(k == 0) nlogL = exp(-a) + a + log(sigma); else aa = k .* a; if min(abs(aa)) < 1E-3 && max(abs(aa)) < 0.5 #use a series expansion to find the log likelihood more accurately when k is small k_terms = 1; sgn = 1; i = 0; while 1 sgn = -sgn; i++; newterm = (sgn / (i + 1)) * (aa .^ i); k_terms = k_terms + newterm; if max(abs(newterm)) <= eps break endif endwhile nlogL = exp(-a .* k_terms) + a .* (k + 1) .* k_terms + log(sigma); else b = 1 + aa; nlogL = b .^ (-1 ./ k) + (1 + 1 ./ k) .* log(b) + log(sigma); nlogL(b <= 0) = Inf; endif endif endfunction function [G, kk_terms] = gevgrad (x, k, sigma, mu, k_terms) #calculate the gradient of the negative log likelihood of data x with respect to the parameters of the generalized extreme value distribution for gevlike #no input checking done kk_terms = []; G = ones(3, 1); if k == 0 ##use the expressions for first derivatives that are the limits as k --> 0 a = (x - mu) ./ sigma; f = exp(-a) - 1; #k #g = -(2 * x .* (mu .* (1 - f) - sigma .* f) + 2 .* sigma .* mu .* f + (x.^2 + mu.^2).*(f - 1)) ./ (2 * f .* 
sigma .^ 2); g = a .* (1 + a .* f / 2); G(1) = sum(g(:)); #sigma g = (a .* f + 1) ./ sigma; G(2) = sum(g(:)); #mu g = f ./ sigma; G(3) = sum(g(:)); return endif a = (x - mu) ./ sigma; b = 1 + k .* a; if any (b <= 0) G(:) = 0; #negative log likelihood is locally infinite return endif #k c = log(b); d = 1 ./ k + 1; if nargin > 4 && ~isempty(k_terms) #use a series expansion to find the gradient more accurately when k is small aa = k .* a; f = exp(-a .* k_terms); kk_terms = 0.5; sgn = 1; i = 0; while 1 sgn = -sgn; i++; newterm = (sgn * (i + 1) / (i + 2)) * (aa .^ i); kk_terms = kk_terms + newterm; if max(abs(newterm)) <= eps break endif endwhile g = a .* ((a .* kk_terms) .* (f - 1 - k) + k_terms); else g = (c ./ k - a ./ b) ./ (k .* b .^ (1/k)) - c ./ (k .^ 2) + a .* d ./ b; endif %keyboard G(1) = sum(g(:)); #sigma if nargin > 4 && ~isempty(k_terms) #use a series expansion to find the gradient more accurately when k is small g = (1 - a .* (a .* k .* kk_terms - k_terms) .* (f - k - 1)) ./ sigma; else #g = (a .* b .^ (-d) - d .* k .* a ./ b + 1) ./ sigma; g = (a .* b .^ (-d) - (k + 1) .* a ./ b + 1) ./ sigma; endif G(2) = sum(g(:)); #mu if nargin > 4 && ~isempty(k_terms) #use a series expansion to find the gradient more accurately when k is small g = -(a .* k .* kk_terms - k_terms) .* (f - k - 1) ./ sigma; else #g = (b .^ (-d) - d .* k ./ b) ./ sigma; g = (b .^ (-d) - (k + 1) ./ b) ./ sigma; end G(3) = sum(g(:)); endfunction function ACOV = gevfim (x, k, sigma, mu, k_terms, kk_terms) #internal function to calculate the Fisher information matrix for gevlike #no input checking done #find the various second derivatives (used Maxima to help find the expressions) ACOV = ones(3); if k == 0 ##use the expressions for second derivatives that are the limits as k --> 0 #k, k a = (x - mu) ./ sigma; f = exp(-a); #der = (x .* (24 * mu .^ 2 .* sigma .* (f - 1) + 24 * mu .* sigma .^ 2 .* f - 12 * mu .^ 3) + x .^ 3 .* (8 * sigma .* (f - 1) - 12*mu) + x .^ 2 .* (-12 * sigma .^ 2 .* f + 24 
* mu .* sigma .* (1 - f) + 18 * mu .^ 2) - 12 * mu .^ 2 .* sigma .^ 2 .* f + 8 * mu .^ 3 .* sigma .* (1 - f) + 3 * (x .^ 4 + mu .^ 4)) ./ (12 .* f .* sigma .^ 4); der = (a .^ 2) .* (a .* (a/4 - 2/3) .* f + 2/3 * a - 1); ACOV(1, 1) = sum(der(:)); #sigma, sigma der = (sigma .^ -2) .* (a .* ((a - 2) .* f + 2) - 1); ACOV(2, 2) = sum(der(:)); #mu, mu der = (sigma .^ -2) .* f; ACOV(3, 3) = sum(der(:)); #k, sigma #der = (x .^2 .* (2*sigma .* (f - 1) - 3*mu) + x .* (-2 * sigma .^ 2 .* f + 4 * mu .* sigma .* (1 - f) + 3 .* mu .^ 2) + 2 * mu .^ 2 .* sigma .* (f - 1) + 2 * mu * sigma .^ 2 * f + x .^ 3 - mu .^ 3) ./ (2 .* f .* sigma .^ 4); der = (-a ./ sigma) .* (a .* (1 - a/2) .* f - a + 1); ACOV(1, 2) = ACOV(2, 1) = sum(der(:)); #k, mu #der = (x .* (2*sigma .* (f - 1) - 2*mu) - 2 * f .* sigma .^ 2 + 2 .* mu .* sigma .* (1 - f) + x .^ 2 + mu .^ 2)./ (2 .* f .* sigma .^ 3); der = (-1 ./ sigma) .* (a .* (1 - a/2) .* f - a + 1); ACOV(1, 3) = ACOV(3, 1) = sum(der(:)); #sigma, mu der = (1 + (a - 1) .* f) ./ (sigma .^ 2); ACOV(2, 3) = ACOV(3, 2) = sum(der(:)); return endif #general case z = 1 + k .* (x - mu) ./ sigma; #k, k a = (x - mu) ./ sigma; b = k .* a + 1; c = log(b); d = 1 ./ k + 1; if nargin > 5 && ~isempty(kk_terms) #use a series expansion to find the derivatives more accurately when k is small aa = k .* a; f = exp(-a .* k_terms); kkk_terms = 2/3; sgn = 1; i = 0; while 1 sgn = -sgn; i++; newterm = (sgn * (i + 1) * (i + 2) / (i + 3)) * (aa .^ i); kkk_terms = kkk_terms + newterm; if max(abs(newterm)) <= eps break endif endwhile der = (a .^ 2) .* (a .* (a .* kk_terms .^ 2 - kkk_terms) .* f + a .* (1 + k) .* kkk_terms - 2 * kk_terms); else der = ((((c ./ k.^2) - (a ./ (k .* b))) .^ 2) ./ (b .^ (1 ./ k))) + ... ((-2*c ./ k.^3) + (2*a ./ (k.^2 .* b)) + ((a ./ b) .^ 2 ./ k)) ./ (b .^ (1 ./ k)) + ... 2*c ./ k.^3 - ... 
(2*a ./ (k.^2 .* b)) - (d .* (a ./ b) .^ 2); endif der(z <= 0) = 0; %no probability mass in this region ACOV(1, 1) = sum(der(:)); #sigma, sigma if nargin > 5 && ~isempty(kk_terms) #use a series expansion to find the derivatives more accurately when k is small der = ((-2*a .* k_terms + 4 * a .^ 2 .* k .* kk_terms - a .^ 3 .* (k .^ 2) .* kkk_terms) .* (f - k - 1) + f .* ((a .* (k_terms - a .* k .* kk_terms)) .^ 2) - 1) ./ (sigma .^ 2); else der = (sigma .^ -2) .* (... -2*a .* b .^ (-d) + ... d .* k .* a .^ 2 .* (b .^ (-d-1)) + ... 2 .* d .* k .* a ./ b - ... d .* (k .* a ./ b) .^ 2 - 1); end der(z <= 0) = 0; %no probability mass in this region ACOV(2, 2) = sum(der(:)); #mu, mu if nargin > 5 && ~isempty(kk_terms) #use a series expansion to find the derivatives involving k more accurately when k is small der = (f .* (a .* k .* kk_terms - k_terms) .^ 2 - a .* k .^ 2 .* kkk_terms .* (f - k - 1)) ./ (sigma .^ 2); else der = (d .* (sigma .^ -2)) .* (... k .* (b .^ (-d-1)) - ... (k ./ b) .^ 2); endif der(z <= 0) = 0; %no probability mass in this region ACOV(3, 3) = sum(der(:)); #k, mu if nargin > 5 && ~isempty(kk_terms) #use a series expansion to find the derivatives involving k more accurately when k is small der = 2 * a .* kk_terms .* (f - 1 - k) - a .^ 2 .* k_terms .* kk_terms .* f + k_terms; #k, a second derivative der = -der ./ sigma; else der = ( (b .^ (-d)) .* (c ./ k - a ./ b) ./ k - ... a .* (b .^ (-d-1)) + ... 
((1 ./ k) - d) ./ b + a .* k .* d ./ (b .^ 2)) ./ sigma; endif der(z <= 0) = 0; %no probability mass in this region ACOV(1, 3) = ACOV(3, 1) = sum(der(:)); #k, sigma der = a .* der; der(z <= 0) = 0; %no probability mass in this region ACOV(1, 2) = ACOV(2, 1) = sum(der(:)); #sigma, mu if nargin > 5 && ~isempty(kk_terms) #use a series expansion to find the derivatives involving k more accurately when k is small der = ((-k_terms + 3 * a .* k .* kk_terms - (a .* k) .^ 2 .* kkk_terms) .* (f - k - 1) + a .* (k_terms - a .* k .* kk_terms) .^ 2 .* f) ./ (sigma .^ 2); else der = ( -(b .^ (-d)) + ... a .* k .* d .* (b .^ (-d-1)) + ... (d .* k ./ b) - a .* (k./b).^2 .* d) ./ (sigma .^ 2); end der(z <= 0) = 0; %no probability mass in this region ACOV(2, 3) = ACOV(3, 2) = sum(der(:)); endfunction %!test %! x = 1; %! k = 0.2; %! sigma = 0.3; %! mu = 0.5; %! [L, D, C] = gevlike ([k sigma mu], x); %! expected_L = 0.75942; %! expected_D = [0.53150; -0.67790; -2.40674]; %! expected_C = [-0.12547 1.77884 1.06731; 1.77884 16.40761 8.48877; 1.06731 8.48877 0.27979]; %! assert (L, expected_L, 0.001); %! assert (D, expected_D, 0.001); %! assert (C, expected_C, 0.001); %!test %! x = 1; %! k = 0; %! sigma = 0.3; %! mu = 0.5; %! [L, D, C] = gevlike ([k sigma mu], x); %! expected_L = 0.65157; %! expected_D = [0.54011; -1.17291; -2.70375]; %! expected_C = [0.090036 3.41229 2.047337; 3.412229 24.760027 12.510190; 2.047337 12.510190 2.098618]; %! assert (L, expected_L, 0.001); %! assert (D, expected_D, 0.001); %! assert (C, expected_C, 0.001); %!test %! x = -5:-1; %! k = -0.2; %! sigma = 0.3; %! mu = 0.5; %! [L, D, C] = gevlike ([k sigma mu], x); %! expected_L = 3786.4; %! expected_D = [6.4511e+04; -4.8194e+04; 3.0633e+03]; %! expected_C = -[-1.4937e+06 1.0083e+06 -6.1837e+04; 1.0083e+06 -8.1138e+05 4.0917e+04; -6.1837e+04 4.0917e+04 -2.0422e+03]; %! assert (L, expected_L, -0.001); %! assert (D, expected_D, -0.001); %! 
assert (C, expected_C, -0.001);
statistics/inst/pcares.m0000644000175000017500000000533612077545776015251 0ustar asneltasnelt
## Copyright (C) 2013 Fernando Damian Nieuwveldt
##
## This program is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License
## as published by the Free Software Foundation; either version 3
## of the License, or (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{residuals}, @var{reconstructed}] =} pcares (@var{X}, @var{NDIM})
## Calculate the residuals from a principal components analysis of @var{X}.
##
## @itemize @bullet
## @item
## @var{X} : N x P matrix with N observations and P variables; the variables will be mean centered
## @item
## @var{NDIM} : a scalar indicating the number of principal components to use; should be <= P
## @end itemize
##
## @subheading References
##
## @enumerate
## @item
## Jolliffe, I. T., Principal Component Analysis, 2nd Edition, Springer, 2002
##
## @end enumerate
## @end deftypefn

## Author: Fernando Damian Nieuwveldt
## Description: Residuals from Principal Components Analysis

function [residuals,reconstructed] = pcares(X,NDIM)

  if (nargin ~= 2)
    error('pcares takes two inputs: The data Matrix X and number of principal components NDIM')
  endif

  # Mean center data
  Xcentered = bsxfun(@minus,X,mean(X));

  # Apply svd to get the principal component coefficients
  [U,S,V] = svd(Xcentered);

  # Use only the first NDIM PCA components
  v = V(:,1:NDIM);

  # Calculate the residuals
  residuals = Xcentered - Xcentered * (v*v');

  if (nargout == 2)
    # Reconstructed data using NDIM PCA components
    reconstructed = X - residuals;
  endif

endfunction

%!demo
%! 
X = [ 7 26 6 60;
%! 1 29 15 52;
%! 11 56 8 20;
%! 11 31 8 47;
%! 7 52 6 33;
%! 11 55 9 22;
%! 3 71 17 6;
%! 1 31 22 44;
%! 2 54 18 22;
%! 21 47 4 26;
%! 1 40 23 34;
%! 11 66 9 12;
%! 10 68 8 12
%! ];
%! # As we increase the number of principal components, the norm
%! # of the residuals matrix will decrease
%! r1 = pcares(X,1);
%! n1 = norm(r1)
%! r2 = pcares(X,2);
%! n2 = norm(r2)
%! r3 = pcares(X,3);
%! n3 = norm(r3)
%! r4 = pcares(X,4);
%! n4 = norm(r4)
statistics/inst/pcacov.m0000644000175000017500000000454612174252676015241 0ustar asneltasnelt
## Copyright (C) 2013 Fernando Damian Nieuwveldt
##
## This program is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License
## as published by the Free Software Foundation; either version 3
## of the License, or (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{COEFF}]} = pcacov(@var{X})
## @deftypefnx {Function File} {[@var{COEFF},@var{latent}]} = pcacov(@var{X})
## @deftypefnx {Function File} {[@var{COEFF},@var{latent},@var{explained}]} = pcacov(@var{X})
## @itemize @bullet
## @item
## pcacov performs principal component analysis on the n x n covariance matrix @var{X}
## @item
## @var{COEFF} : an n x n matrix with columns containing the principal component coefficients
## @item
## @var{latent} : a vector containing the principal component variances
## @item
## @var{explained} : a vector containing the percentage of the total variance explained by each principal component
##
## @end itemize
##
## @subheading References
##
## @enumerate
## @item
## Jolliffe, I. 
T., Principal Component Analysis, 2nd Edition, Springer, 2002 ## ## @end enumerate ## @end deftypefn ## Author: Fernando Damian Nieuwveldt ## Description: Principal Components Analysis using a covariance matrix function [COEFF, latent, explained] = pcacov(X) [U,S,V] = svd(X); if nargout == 1 COEFF = U; elseif nargout == 2 COEFF = U; latent = diag(S); else COEFF = U; latent = diag(S); explained = 100*latent./sum(latent); end endfunction %!demo %! X = [ 7 26 6 60; %! 1 29 15 52; %! 11 56 8 20; %! 11 31 8 47; %! 7 52 6 33; %! 11 55 9 22; %! 3 71 17 6; %! 1 31 22 44; %! 2 54 18 22; %! 21 47 4 26; %! 1 40 23 34; %! 11 66 9 12; %! 10 68 8 12 %! ]; %! covx = cov(X); %! [COEFF,latent,explained] = pcacov(covx) statistics/inst/vmrnd.m0000644000175000017500000000474011741556364015110 0ustar asneltasnelt## Copyright (C) 2009 Soren Hauberg ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} @var{theta} = vmrnd (@var{mu}, @var{k}) ## @deftypefnx{Function File} @var{theta} = vmrnd (@var{mu}, @var{k}, @var{sz}) ## Draw random angles from a Von Mises distribution with mean @var{mu} and ## concentration @var{k}. ## ## The Von Mises distribution has probability density function ## @example ## f (@var{x}) = exp (@var{k} * cos (@var{x} - @var{mu})) / @var{Z} , ## @end example ## where @var{Z} is a normalisation constant. 
##
## The output, @var{theta}, is a matrix of size @var{sz} containing random angles
## drawn from the given Von Mises distribution. By default, @var{mu} is 0
## and @var{k} is 1.
## @seealso{vmpdf}
## @end deftypefn

function theta = vmrnd (mu = 0, k = 1, sz = 1)
  ## Check input
  if (!isreal (mu))
    error ("vmrnd: first input must be real");
  endif

  if (!isreal (k) || k <= 0)
    error ("vmrnd: second input must be a real positive scalar");
  endif

  if (isscalar (sz))
    sz = [sz, sz];
  elseif (!isvector (sz))
    error ("vmrnd: third input must be a scalar or a vector");
  endif

  ## Simulate!
  if (k < 1e-6)
    ## k is small: sample uniformly on circle
    theta = 2 * pi * rand (sz) - pi;

  else
    a = 1 + sqrt (1 + 4 * k.^2);
    b = (a - sqrt (2 * a)) / (2 * k);
    r = (1 + b^2) / (2 * b);

    N = prod (sz);
    notdone = true (N, 1);
    ## Rejection sampling: redraw only the entries that have not been accepted yet
    while (any (notdone))
      u (:, notdone) = rand (3, N);

      z (notdone) = cos (pi * u (1, notdone));
      f (notdone) = (1 + r * z (notdone)) ./ (r + z (notdone));
      c (notdone) = k * (r - f (notdone));

      notdone = (u (2, :) >= c .* (2 - c)) & (log (c) - log (u (2, :)) + 1 - c < 0);
      N = sum (notdone);
    endwhile

    theta = mu + sign (u (3, :) - 0.5) .* acos (f);
    theta = reshape (theta, sz);
  endif
endfunction
statistics/inst/jsucdf.m0000644000175000017500000000402511741556364015234 0ustar asneltasnelt
## Copyright (C) 2006 Frederick (Rick) A Niles
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {} jsucdf (@var{x}, @var{alpha1}, @var{alpha2})
## For each element of @var{x}, compute the cumulative distribution
## function (CDF) at @var{x} of the Johnson SU distribution with shape parameters
## @var{alpha1} and @var{alpha2}.
##
## Default values are @var{alpha1} = 1, @var{alpha2} = 1.
## @end deftypefn

## Author: Frederick (Rick) A Niles
## Description: CDF of the Johnson SU distribution

## This function is derived from normcdf.m

## This is the TeX equation of this function:
##
## \[ F(x) = \Phi\left(\alpha_1 + \alpha_2
##    \log\left(x + \sqrt{x^2 + 1} \right)\right) \]
##
## where \[ -\infty < x < \infty ; \alpha_2 > 0 \] and $\Phi$ is the
## standard normal cumulative distribution function. $\alpha_1$ and
## $\alpha_2$ are shape parameters.

function cdf = jsucdf (x, alpha1, alpha2)

  if (! ((nargin == 1) || (nargin == 3)))
    print_usage;
  endif

  ## Apply the documented defaults when only x is given
  if (nargin == 1)
    alpha1 = 1;
    alpha2 = 1;
  endif

  if (!isscalar (alpha1) || !isscalar(alpha2))
    [retval, x, alpha1, alpha2] = common_size (x, alpha1, alpha2);
    if (retval > 0)
      error ("jsucdf: x, alpha1 and alpha2 must be of common size or scalar");
    endif
  endif

  one = ones (size (x));
  cdf = stdnormal_cdf (alpha1 .* one + alpha2 .* log (x + sqrt(x.*x + one)));

endfunction
statistics/inst/vmpdf.m0000644000175000017500000000303711741556364015074 0ustar asneltasnelt
## Copyright (C) 2009 Soren Hauberg
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {@var{p} =} vmpdf (@var{x}, @var{mu}, @var{k})
## Evaluate the Von Mises probability density function.
##
## The Von Mises distribution has probability density function
## @example
## f (@var{x}) = exp (@var{k} * cos (@var{x} - @var{mu})) / @var{Z} ,
## @end example
## where @var{Z} is a normalisation constant. By default, @var{mu} is 0 and
## @var{k} is 1.
## @seealso{vmrnd}
## @end deftypefn

function p = vmpdf (x, mu = 0, k = 1)
  ## Check input
  if (!isreal (x))
    error ("vmpdf: first input must be real");
  endif

  if (!isreal (mu))
    error ("vmpdf: second input must be real");
  endif

  if (!isreal (k) || k <= 0)
    error ("vmpdf: third input must be a real positive scalar");
  endif

  ## Evaluate PDF
  Z = 2 * pi * besseli (0, k);
  p = exp (k * cos (x-mu)) / Z;
endfunction
statistics/inst/trimmean.m0000644000175000017500000000317611741556364015600 0ustar asneltasnelt
## Copyright (C) 2001 Paul Kienzle
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {@var{a} =} trimmean (@var{x}, @var{p})
##
## Compute the trimmed mean.
##
## The trimmed mean of @var{x} is defined as the mean of @var{x} excluding the
## highest and lowest @var{p} percent of the data.
##
## For example
##
## @example
## mean ([-inf, 1:9, inf])
## @end example
##
## is NaN, while
##
## @example
## trimmean ([-inf, 1:9, inf], 10)
## @end example
##
## excludes the infinite values, which makes the result 5.
##
## @seealso{mean}
## @end deftypefn

function a = trimmean(x, p, varargin)
  if (nargin != 2 && nargin != 3)
    print_usage;
  endif
  y = sort(x, varargin{:});
  sz = size(x);
  if nargin < 3
    dim = min(find(sz>1));
    if isempty(dim), dim=1; endif;
  else
    dim = varargin{1};
  endif
  idx = cell (0);
  for i=1:length(sz), idx{i} = 1:sz(i); end;
  ## Number of elements to drop from each end along dimension dim
  trim = round(sz(dim)*p*0.01);
  idx{dim} = 1+trim : sz(dim)-trim;
  a = mean (y (idx{:}), varargin{:});
endfunction
statistics/inst/wishrnd.m0000644000175000017500000000642212270062651015424 0ustar asneltasnelt
## Copyright (C) 2013 Nir Krakauer
##
## This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along with Octave; see the file COPYING. If not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {} [@var{W}[, @var{D}]] = wishrnd (@var{Sigma}, @var{df}[, @var{D}][, @var{n}=1])
## Return a random matrix sampled from the Wishart distribution with given parameters.
##
## Inputs: the @var{p} x @var{p} positive definite matrix @var{Sigma} and scalar degrees of freedom parameter @var{df} (and optionally the Cholesky factor @var{D} of @var{Sigma}).
## @var{df} can be non-integer as long as @var{df} > @var{p}.
##
## Output: a random @var{p} x @var{p} matrix @var{W} from the Wishart(@var{Sigma}, @var{df}) distribution. 
If @var{n} > 1, then @var{W} is @var{p} x @var{p} x @var{n} and holds @var{n} such random matrices. (Optionally, the Cholesky factor @var{D} of @var{Sigma} is also returned.)
##
## Averaged across many samples, the mean of @var{W} should approach @var{df}*@var{Sigma}, and the variance of each element @var{W}_ij should approach @var{df}*(@var{Sigma}_ij^2 + @var{Sigma}_ii*@var{Sigma}_jj)
##
## Reference: Yu-Cheng Ku and Peter Bloomfield (2010), Generating Random Wishart Matrices with Fractional Degrees of Freedom in OX, http://www.gwu.edu/~forcpgm/YuChengKu-030510final-WishartYu-ChengKu.pdf
##
## @seealso{iwishrnd, wishpdf}
## @end deftypefn

## Author: Nir Krakauer
## Description: Generate random matrices from the Wishart distribution

function [W, D] = wishrnd(Sigma, df, D, n=1)

  ## Sigma and df are required; D and n are optional, as documented above
  if (nargin < 2)
    print_usage ();
  endif

  if nargin < 3 || isempty(D)
    try
      D = chol(Sigma);
    catch
      error('Cholesky decomposition failed; Sigma probably not positive definite')
    end_try_catch
  endif

  p = size(D, 1);

  if df < p
    df = floor(df); #distribution not defined for small noninteger df
    df_isint = 1;
  else
    #check for integer degrees of freedom
    df_isint = (df == floor(df));
  endif

  if ~df_isint
    [ii, jj] = ind2sub([p, p], 1:(p*p));
  endif

  if n > 1
    W = nan(p, p, n);
  endif

  for i = 1:n
    if df_isint
      Z = randn(df, p) * D;
      W(:, :, i) = Z'*Z;
    else
      Z = diag(sqrt(chi2rnd(df - (0:(p-1))))); #fill diagonal
      #note: chi2rnd(x) is equivalent to 2*randg(x/2), but the latter seems to offer no performance advantage
      Z(ii > jj) = randn(p*(p-1)/2, 1); #fill lower triangle with normally distributed variates
      Z = D * Z;
      W(:, :, i) = Z*Z';
    endif
  endfor

endfunction

%!assert(size (wishrnd (1,2,1)), [1, 1]);
%!assert(size (wishrnd ([],2,1)), [1, 1]);
%!assert(size (wishrnd ([3 1; 1 3], 2.00001, [], 1)), [2, 2]);
%!assert(size (wishrnd (eye(2), 2, [], 3)), [2, 2, 3]);

%% Test input validation
%!error wishrnd ()
%!error wishrnd (1)
%!error wishrnd ([-3 1; 1 3],1)
%!error wishrnd ([1; 1],1)
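
%% The help text above states that the mean of W approaches df*Sigma when
%% averaged across many samples. The demo below is a hedged sketch of that
%% check, not part of the original test suite: the values of Sigma, df and n
%% are illustrative assumptions, and the two printed matrices should agree
%% only up to random sampling error.
%!demo
%! Sigma = [3 1; 1 3];               # assumed positive definite scale matrix
%! df = 5;                           # assumed integer degrees of freedom
%! n = 2000;                         # assumed number of draws
%! W = wishrnd (Sigma, df, [], n);   # W is 2 x 2 x n; [] lets wishrnd compute D
%! sample_mean = mean (W, 3)         # average over the n random matrices
%! expected_mean = df * Sigma        # theoretical mean quoted in the help text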
statistics/inst/copulacdf.m0000644000175000017500000002174011741556364015721 0ustar asneltasnelt## Copyright (C) 2008 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{p} =} copulacdf (@var{family}, @var{x}, @var{theta}) ## @deftypefnx {Function File} {} copulacdf ('t', @var{x}, @var{theta}, @var{nu}) ## Compute the cumulative distribution function of a copula family. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{family} is the copula family name. Currently, @var{family} can ## be @code{'Gaussian'} for the Gaussian family, @code{'t'} for the ## Student's t family, @code{'Clayton'} for the Clayton family, ## @code{'Gumbel'} for the Gumbel-Hougaard family, @code{'Frank'} for ## the Frank family, @code{'AMH'} for the Ali-Mikhail-Haq family, or ## @code{'FGM'} for the Farlie-Gumbel-Morgenstern family. ## ## @item ## @var{x} is the support where each row corresponds to an observation. ## ## @item ## @var{theta} is the parameter of the copula. For the Gaussian and ## Student's t copula, @var{theta} must be a correlation matrix. For ## bivariate copulas @var{theta} can also be a correlation coefficient. ## For the Clayton family, the Gumbel-Hougaard family, the Frank family, ## and the Ali-Mikhail-Haq family, @var{theta} must be a vector with the ## same number of elements as observations in @var{x} or be scalar. 
For ## the Farlie-Gumbel-Morgenstern family, @var{theta} must be a matrix of ## coefficients for the Farlie-Gumbel-Morgenstern polynomial where each ## row corresponds to one set of coefficients for an observation in ## @var{x}. A single row is expanded. The coefficients are in binary ## order. ## ## @item ## @var{nu} is the degrees of freedom for the Student's t family. ## @var{nu} must be a vector with the same number of elements as ## observations in @var{x} or be scalar. ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{p} is the cumulative distribution of the copula at each row of ## @var{x} and corresponding parameter @var{theta}. ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## x = [0.2:0.2:0.6; 0.2:0.2:0.6]; ## theta = [1; 2]; ## p = copulacdf ("Clayton", x, theta) ## @end group ## ## @group ## x = [0.2:0.2:0.6; 0.2:0.1:0.4]; ## theta = [0.2, 0.1, 0.1, 0.05]; ## p = copulacdf ("FGM", x, theta) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Roger B. Nelsen. @cite{An Introduction to Copulas}. Springer, ## New York, second edition, 2006. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: CDF of a copula family function p = copulacdf (family, x, theta, nu) # Check arguments if (nargin != 3 && (nargin != 4 || ! strcmpi (family, "t"))) print_usage (); endif if (! ischar (family)) error ("copulacdf: family must be one of 'Gaussian', 't', 'Clayton', 'Gumbel', 'Frank', 'AMH', and 'FGM'"); endif if (! isempty (x) && ! 
ismatrix (x))
    error ("copulacdf: x must be a numeric matrix");
  endif

  [n, d] = size (x);

  lower_family = lower (family);

  # Check family and copula parameters
  switch (lower_family)

    case {"gaussian", "t"}
      # Family with a correlation matrix
      if (d == 2 && isscalar (theta))
        # Expand a scalar to a correlation matrix
        theta = [1, theta; theta, 1];
      endif
      if (any (size (theta) != [d, d]) || any (diag (theta) != 1) || any (any (theta != theta')) || min (eig (theta)) <= 0)
        error ("copulacdf: theta must be a correlation matrix");
      endif
      if (nargin == 4)
        # Student's t family
        if (! isscalar (nu) && (! isvector (nu) || length (nu) != n))
          error ("copulacdf: nu must be a vector with the same number of rows as x or be scalar");
        endif
        nu = nu(:);
      endif

    case {"clayton", "gumbel", "frank", "amh"}
      # Archimedean one parameter family
      if (! isvector (theta) || (! isscalar (theta) && length (theta) != n))
        error ("copulacdf: theta must be a vector with the same number of rows as x or be scalar");
      endif
      theta = theta(:);
      if (n > 1 && isscalar (theta))
        theta = repmat (theta, n, 1);
      endif

    case {"fgm"}
      # Exponential number of parameters
      if (! 
ismatrix (theta) || size (theta, 2) != (2 .^ d - d - 1) || (size (theta, 1) != 1 && size (theta, 1) != n)) error ("copulacdf: theta must be a row vector of length 2^d-d-1 or a matrix of size n x (2^d-d-1)"); endif if (n > 1 && size (theta, 1) == 1) theta = repmat (theta, n, 1); endif otherwise error ("copulacdf: unknown copula family '%s'", family); endswitch if (n == 0) # Input is empty p = zeros (0, 1); else # Truncate input to unit hypercube x(x < 0) = 0; x(x > 1) = 1; # Compute the cumulative distribution function according to family switch (lower_family) case {"gaussian"} # The Gaussian family p = mvncdf (norminv (x), zeros (1, d), theta); # No parameter bounds check k = []; case {"t"} # The Student's t family p = mvtcdf (tinv (x, nu), theta, nu); # No parameter bounds check k = []; case {"clayton"} # The Clayton family p = exp (-log (max (sum (x .^ (repmat (-theta, 1, d)), 2) - d + 1, 0)) ./ theta); # Product copula at columns where theta == 0 k = find (theta == 0); if (any (k)) p(k) = prod (x(k, :), 2); endif # Check bounds if (d > 2) k = find (! (theta >= 0) | ! (theta < inf)); else k = find (! (theta >= -1) | ! (theta < inf)); endif case {"gumbel"} # The Gumbel-Hougaard family p = exp (-(sum ((-log (x)) .^ repmat (theta, 1, d), 2)) .^ (1 ./ theta)); # Check bounds k = find (! (theta >= 1) | ! (theta < inf)); case {"frank"} # The Frank family p = -log (1 + (prod (expm1 (repmat (-theta, 1, d) .* x), 2)) ./ (expm1 (-theta) .^ (d - 1))) ./ theta; # Product copula at columns where theta == 0 k = find (theta == 0); if (any (k)) p(k) = prod (x(k, :), 2); endif # Check bounds if (d > 2) k = find (! (theta > 0) | ! (theta < inf)); else k = find (! (theta > -inf) | ! (theta < inf)); endif case {"amh"} # The Ali-Mikhail-Haq family p = (theta - 1) ./ (theta - prod ((1 + repmat (theta, 1, d) .* (x - 1)) ./ x, 2)); # Check bounds if (d > 2) k = find (! (theta >= 0) | ! (theta < 1)); else k = find (! (theta >= -1) | ! 
(theta < 1)); endif case {"fgm"} # The Farlie-Gumbel-Morgenstern family # All binary combinations bcomb = logical (floor (mod (((0:(2 .^ d - 1))' * 2 .^ ((1 - d):0)), 2))); ecomb = ones (size (bcomb)); ecomb(bcomb) = -1; # Summation over all combinations of order >= 2 bcomb = bcomb(sum (bcomb, 2) >= 2, end:-1:1); # Linear constraints matrix ac = zeros (size (ecomb, 1), size (bcomb, 1)); # Matrix to compute p ap = zeros (size (x, 1), size (bcomb, 1)); for i = 1:size (bcomb, 1) ac(:, i) = -prod (ecomb(:, bcomb(i, :)), 2); ap(:, i) = prod (1 - x(:, bcomb(i, :)), 2); endfor p = prod (x, 2) .* (1 + sum (ap .* theta, 2)); # Check linear constraints k = false (n, 1); for i = 1:n k(i) = any (ac * theta(i, :)' > 1); endfor endswitch # Out of bounds parameters if (any (k)) p(k) = NaN; endif endif endfunction %!test %! x = [0.2:0.2:0.6; 0.2:0.2:0.6]; %! theta = [1; 2]; %! p = copulacdf ("Clayton", x, theta); %! expected_p = [0.1395; 0.1767]; %! assert (p, expected_p, 0.001); %!test %! x = [0.2:0.2:0.6; 0.2:0.2:0.6]; %! p = copulacdf ("Gumbel", x, 2); %! expected_p = [0.1464; 0.1464]; %! assert (p, expected_p, 0.001); %!test %! x = [0.2:0.2:0.6; 0.2:0.2:0.6]; %! theta = [1; 2]; %! p = copulacdf ("Frank", x, theta); %! expected_p = [0.0699; 0.0930]; %! assert (p, expected_p, 0.001); %!test %! x = [0.2:0.2:0.6; 0.2:0.2:0.6]; %! theta = [0.3; 0.7]; %! p = copulacdf ("AMH", x, theta); %! expected_p = [0.0629; 0.0959]; %! assert (p, expected_p, 0.001); %!test %! x = [0.2:0.2:0.6; 0.2:0.1:0.4]; %! theta = [0.2, 0.1, 0.1, 0.05]; %! p = copulacdf ("FGM", x, theta); %! expected_p = [0.0558; 0.0293]; %! 
assert (p, expected_p, 0.001); statistics/inst/nanmean.m0000644000175000017500000000250711741556364015376 0ustar asneltasnelt## Copyright (C) 2001 Paul Kienzle ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{v} =} nanmean (@var{X}) ## @deftypefnx{Function File} {@var{v} =} nanmean (@var{X}, @var{dim}) ## Compute the mean value while ignoring NaN values. ## ## @code{nanmean} is identical to the @code{mean} function except that NaN values ## are ignored. If all values are NaN, the mean is returned as NaN. ## ## @seealso{mean, nanmin, nanmax, nansum, nanmedian} ## @end deftypefn function v = nanmean (X, varargin) if nargin < 1 print_usage; else n = sum (!isnan(X), varargin{:}); n(n == 0) = NaN; X(isnan(X)) = 0; v = sum (X, varargin{:}) ./ n; endif endfunction statistics/inst/ff2n.m0000644000175000017500000000047611741556364014617 0ustar asneltasnelt## Author: Paul Kienzle ## This program is granted to the public domain. ## -*- texinfo -*- ## @deftypefn {Function File} ff2n (@var{n}) ## Full-factor design with n binary terms. 
##
## @seealso{fullfact}
## @end deftypefn

function A=ff2n(n)
  A = fullfact (2 * ones (1,n)) - 1;
endfunction
statistics/inst/runstest.m0000644000175000017500000001243112133602332015624 0ustar asneltasnelt
## Copyright (C) 2013 Nir Krakauer
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{h}, @var{p}, @var{stats}] =} runstest (@var{x}, @var{v})
## Runs test for detecting serial correlation in the vector @var{x}.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{x} is the vector of given values.
## @item
## @var{v} is the value to subtract from @var{x} to get runs (defaults to @code{median(x)}).
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{h} is true if serial correlation is detected at the 95% confidence level (two-tailed), false otherwise.
## @item
## @var{p} is the probability of obtaining a test statistic of the magnitude found under the null hypothesis of no serial correlation.
## @item
## @var{stats} is the structure containing as fields the number of runs @var{nruns}; the numbers of positive and negative values of @code{x - v}, @var{n1} and @var{n0}; and the test statistic @var{z}.
##
## @end itemize
##
## Note: the large-sample normal approximation is used to find @var{h} and @var{p}. This is accurate if @var{n1}, @var{n0} are both greater than 10.
## ## Reference: ## NIST Engineering Statistics Handbook, 1.3.5.13. Runs Test for Detecting Non-randomness, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm ## ## @seealso{} ## @end deftypefn ## Author: Nir Krakauer ## Description: Runs test for detecting serial correlation function [h, p, stats] = runstest (x, x2) # Check arguments if (nargin < 1) print_usage; endif if nargin > 1 && isnumeric(x2) v = x2; else v = median(x); endif x = x(~isnan(x)); #delete missing values x = sign(x - v); x = x(x ~= 0); #delete any zeros R = sum((x(1:(end-1)) .* x(2:end)) < 0) + 1; #number of runs #expected number of runs for an iid sequence n1 = sum(x > 0); n2 = sum(x < 0); R_bar = 1 + 2*n1*n2/(n1 + n2); #standard deviation of number of runs for an iid sequence s_R = sqrt(2*n1*n2*(2*n1*n2 - n1 - n2)/((n1 + n2)^2 * (n1 + n2 - 1))); #desired significance level alpha = 0.05; Z = (R - R_bar) / s_R; #test statistic p = 2 * normcdf(-abs(Z)); h = p < alpha; if nargout > 2 stats.nruns = R; stats.n1 = n1; stats.n0 = n2; stats.z = Z; endif endfunction %!test %! 
data = [-213 -564 -35 -15 141 115 -420 -360 203 -338 -431 194 -220 -513 154 -125 -559 92 -21 -579 -52 99 -543 -175 162 -457 -346 204 -300 -474 164 -107 -572 -8 83 -541 -224 180 -420 -374 201 -236 -531 83 27 -564 -112 131 -507 -254 199 -311 -495 143 -46 -579 -90 136 -472 -338 202 -287 -477 169 -124 -568 17 48 -568 -135 162 -430 -422 172 -74 -577 -13 92 -534 -243 194 -355 -465 156 -81 -578 -64 139 -449 -384 193 -198 -538 110 -44 -577 -6 66 -552 -164 161 -460 -344 205 -281 -504 134 -28 -576 -118 156 -437 -381 200 -220 -540 83 11 -568 -160 172 -414 -408 188 -125 -572 -32 139 -492 -321 205 -262 -504 142 -83 -574 0 48 -571 -106 137 -501 -266 190 -391 -406 194 -186 -553 83 -13 -577 -49 103 -515 -280 201 300 -506 131 -45 -578 -80 138 -462 -361 201 -211 -554 32 74 -533 -235 187 -372 -442 182 -147 -566 25 68 -535 -244 194 -351 -463 174 -125 -570 15 72 -550 -190 172 -424 -385 198 -218 -536 96]; #NIST beam deflection data, http://www.itl.nist.gov/div898/handbook/eda/section4/eda425.htm %! [h, p, stats] = runstest (data); %! expected_h = true; %! expected_p = 0.0070646; %! expected_z = 2.6938; %! assert (h, expected_h); %! assert (p, expected_p, 1E-6); %! assert (stats.z, expected_z, 1E-4); statistics/inst/poisstat.m0000644000175000017500000000452711741556364015633 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . 
## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} poisstat (@var{lambda}) ## Compute mean and variance of the Poisson distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{lambda} is the parameter of the Poisson distribution. The ## elements of @var{lambda} must be positive ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the Poisson distribution ## ## @item ## @var{v} is the variance of the Poisson distribution ## @end itemize ## ## @subheading Example ## ## @example ## @group ## lambda = 1 ./ (1:6); ## [m, v] = poisstat (lambda) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the Poisson distribution function [m, v] = poisstat (lambda) # Check arguments if (nargin != 1) print_usage (); endif if (! isempty (lambda) && ! ismatrix (lambda)) error ("poisstat: lambda must be a numeric matrix"); endif # Set moments m = lambda; v = lambda; # Continue argument check k = find (! (lambda > 0) | ! (lambda < Inf)); if (any (k)) m(k) = NaN; v(k) = NaN; endif endfunction %!test %! lambda = 1 ./ (1:6); %! [m, v] = poisstat (lambda); %! assert (m, lambda); %! assert (v, lambda); statistics/inst/gevrnd.m0000644000175000017500000001021512067160771015234 0ustar asneltasnelt## Copyright (C) 2012 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. 
## ## Octave is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Octave; see the file COPYING. If not, see ## . ## -*- texinfo -*- ## @deftypefn {Function File} {} gevrnd (@var{k}, @var{sigma}, @var{mu}) ## @deftypefnx {Function File} {} gevrnd (@var{k}, @var{sigma}, @var{mu}, @var{r}) ## @deftypefnx {Function File} {} gevrnd (@var{k}, @var{sigma}, @var{mu}, @var{r}, @var{c}, @dots{}) ## @deftypefnx {Function File} {} gevrnd (@var{k}, @var{sigma}, @var{mu}, [@var{sz}]) ## Return a matrix of random samples from the generalized extreme value (GEV) distribution with parameters ## @var{k}, @var{sigma}, @var{mu}. ## ## When called with a single size argument, returns a square matrix with ## the dimension specified. When called with more than one scalar argument the ## first two arguments are taken as the number of rows and columns and any ## further arguments specify additional matrix dimensions. The size may also ## be specified with a vector @var{sz} of dimensions. ## ## If no size arguments are given, then the result matrix is the common size of ## the input parameters. 
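##
## As a brief sketch of the size conventions described above (the parameter
## values here are arbitrary and chosen only for illustration):
##
## @example
## @group
## r = gevrnd (0.1, 2, 1, 3);       # single size argument: 3-by-3 matrix
## r = gevrnd (0.1, 2, 1, [4 1]);   # size vector: 4-by-1 column
## r = gevrnd (0.1, 2, 1, 4, 1);    # rows and columns: also 4-by-1
## @end group
## @end example
##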
## @seealso{gevcdf, gevfit, gevinv, gevlike, gevpdf, gevstat} ## @end deftypefn ## Author: Nir Krakauer ## Description: Random deviates from the generalized extreme value distribution function rnd = gevrnd (k, sigma, mu, varargin) if (nargin < 3) print_usage (); endif if any (sigma <= 0) error ("gevrnd: sigma must be positive"); endif if (!isscalar (k) || !isscalar (sigma) || !isscalar (mu)) [retval, k, sigma, mu] = common_size (k, sigma, mu); if (retval > 0) error ("gevrnd: k, sigma, mu must be of common size or scalars"); endif endif if (iscomplex (k) || iscomplex (sigma) || iscomplex (mu)) error ("gevrnd: k, sigma, mu must not be complex"); endif if (nargin == 3) sz = size (k); elseif (nargin == 4) if (isscalar (varargin{1}) && varargin{1} >= 0) sz = [varargin{1}, varargin{1}]; elseif (isrow (varargin{1}) && all (varargin{1} >= 0)) sz = varargin{1}; else error ("gevrnd: dimension vector must be row vector of non-negative integers"); endif elseif (nargin > 4) if (any (cellfun (@(x) (!isscalar (x) || x < 0), varargin))) error ("gevrnd: dimensions must be non-negative integers"); endif sz = [varargin{:}]; endif if (!isscalar (k) && !isequal (size (k), sz)) error ("gevrnd: k, sigma, mu must be scalar or of size SZ"); endif if (isa (k, "single") || isa (sigma, "single") || isa (mu, "single")) cls = "single"; else cls = "double"; endif rnd = gevinv (rand(sz), k, sigma, mu); if (strcmp (cls, "single")) rnd = single (rnd); endif endfunction %!assert(size (gevrnd (1,2,1)), [1, 1]); %!assert(size (gevrnd (ones(2,1), 2, 1)), [2, 1]); %!assert(size (gevrnd (ones(2,2), 2, 1)), [2, 2]); %!assert(size (gevrnd (1, 2*ones(2,1), 1)), [2, 1]); %!assert(size (gevrnd (1, 2*ones(2,2), 1)), [2, 2]); %!assert(size (gevrnd (1, 2, 1, 3)), [3, 3]); %!assert(size (gevrnd (1, 2, 1, [4 1])), [4, 1]); %!assert(size (gevrnd (1, 2, 1, 4, 1)), [4, 1]); %% Test input validation %!error gevrnd () %!error gevrnd (1, 2) %!error gevrnd (ones(3),ones(2),1) %!error gevrnd (ones(2),ones(3),1) %!error 
gevrnd (i, 2, 1) %!error gevrnd (2, i, 1) %!error gevrnd (2, 0, 1) %!error gevrnd (1,2, 1, -1) %!error gevrnd (1,2, 1, ones(2)) %!error gevrnd (1,2, 1, [2 -1 2]) %!error gevrnd (1,2, 1, 1, ones(2)) %!error gevrnd (1,2, 1, 1, -1) %!error gevrnd (ones(2,2), 2, 1, 3) %!error gevrnd (ones(2,2), 2, 1, [3, 2]) %!error gevrnd (ones(2,2), 2, 1, 2, 3) statistics/inst/random.m0000644000175000017500000001546211741556364015245 0ustar asneltasnelt## Copyright (C) 2007 Soren Hauberg ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} @var{r} = random(@var{name}, @var{arg1}) ## @deftypefnx{Function File} @var{r} = random(@var{name}, @var{arg1}, @var{arg2}) ## @deftypefnx{Function File} @var{r} = random(@var{name}, @var{arg1}, @var{arg2}, @var{arg3}) ## @deftypefnx{Function File} @var{r} = random(@var{name}, ..., @var{s1}, ...) ## Generates pseudo-random numbers from a given one-, two-, or three-parameter ## distribution. ## ## The variable @var{name} must be a string that names the distribution from ## which to sample. If this distribution is a one-parameter distribution, @var{arg1} ## must be supplied; if it is a two-parameter distribution, @var{arg2} must also ## be supplied; and if it is a three-parameter distribution, @var{arg3} must also ## be present. Any arguments following the distribution parameters determine ## the size of the result. 
## ## As an example, the following code generates a 10 by 20 matrix containing ## random numbers from a normal distribution with mean 5 and standard deviation ## 2. ## @example ## R = random("normal", 5, 2, [10, 20]); ## @end example ## ## The variable @var{name} can be one of the following strings ## ## @table @asis ## @item "beta" ## @itemx "beta distribution" ## Samples are drawn from the Beta distribution. ## @item "bino" ## @itemx "binomial" ## @itemx "binomial distribution" ## Samples are drawn from the Binomial distribution. ## @item "chi2" ## @itemx "chi-square" ## @itemx "chi-square distribution" ## Samples are drawn from the Chi-Square distribution. ## @item "exp" ## @itemx "exponential" ## @itemx "exponential distribution" ## Samples are drawn from the Exponential distribution. ## @item "f" ## @itemx "f distribution" ## Samples are drawn from the F distribution. ## @item "gam" ## @itemx "gamma" ## @itemx "gamma distribution" ## Samples are drawn from the Gamma distribution. ## @item "geo" ## @itemx "geometric" ## @itemx "geometric distribution" ## Samples are drawn from the Geometric distribution. ## @item "hyge" ## @itemx "hypergeometric" ## @itemx "hypergeometric distribution" ## Samples are drawn from the Hypergeometric distribution. ## @item "logn" ## @itemx "lognormal" ## @itemx "lognormal distribution" ## Samples are drawn from the Log-Normal distribution. ## @item "nbin" ## @itemx "negative binomial" ## @itemx "negative binomial distribution" ## Samples are drawn from the Negative Binomial distribution. ## @item "norm" ## @itemx "normal" ## @itemx "normal distribution" ## Samples are drawn from the Normal distribution. ## @item "poiss" ## @itemx "poisson" ## @itemx "poisson distribution" ## Samples are drawn from the Poisson distribution. ## @item "rayl" ## @itemx "rayleigh" ## @itemx "rayleigh distribution" ## Samples are drawn from the Rayleigh distribution. ## @item "t" ## @itemx "t distribution" ## Samples are drawn from the T distribution. 
## @item "unif" ## @itemx "uniform" ## @itemx "uniform distribution" ## Samples are drawn from the Uniform distribution. ## @item "unid" ## @itemx "discrete uniform" ## @itemx "discrete uniform distribution" ## Samples are drawn from the Uniform Discrete distribution. ## @item "wbl" ## @itemx "weibull" ## @itemx "weibull distribution" ## Samples are drawn from the Weibull distribution. ## @end table ## @seealso{rand, betarnd, binornd, chi2rnd, exprnd, frnd, gamrnd, geornd, hygernd, ## lognrnd, nbinrnd, normrnd, poissrnd, raylrnd, trnd, unifrnd, unidrnd, wblrnd} ## @end deftypefn function retval = random(name, varargin) ## General input checking if (nargin < 2) print_usage(); endif if (!ischar(name)) error("random: first input argument must be a string"); endif ## Select distribution switch (lower(name)) case {"beta", "beta distribution"} retval = betarnd(varargin{:}); case {"bino", "binomial", "binomial distribution"} retval = binornd(varargin{:}); case {"chi2", "chi-square", "chi-square distribution"} retval = chi2rnd(varargin{:}); case {"exp", "exponential", "exponential distribution"} retval = exprnd(varargin{:}); case {"ev", "extreme value", "extreme value distribution"} error("random: distribution type '%s' is not yet implemented", name); case {"f", "f distribution"} retval = frnd(varargin{:}); case {"gam", "gamma", "gamma distribution"} retval = gamrnd(varargin{:}); case {"gev", "generalized extreme value", "generalized extreme value distribution"} error("random: distribution type '%s' is not yet implemented", name); case {"gp", "generalized pareto", "generalized pareto distribution"} error("random: distribution type '%s' is not yet implemented", name); case {"geo", "geometric", "geometric distribution"} retval = geornd(varargin{:}); case {"hyge", "hypergeometric", "hypergeometric distribution"} retval = hygernd(varargin{:}); case {"logn", "lognormal", "lognormal distribution"} retval = lognrnd(varargin{:}); case {"nbin", "negative binomial", "negative 
binomial distribution"} retval = nbinrnd(varargin{:}); case {"ncf", "noncentral f", "noncentral f distribution"} error("random: distribution type '%s' is not yet implemented", name); case {"nct", "noncentral t", "noncentral t distribution"} error("random: distribution type '%s' is not yet implemented", name); case {"ncx2", "noncentral chi-square", "noncentral chi-square distribution"} error("random: distribution type '%s' is not yet implemented", name); case {"norm", "normal", "normal distribution"} retval = normrnd(varargin{:}); case {"poiss", "poisson", "poisson distribution"} retval = poissrnd(varargin{:}); case {"rayl", "rayleigh", "rayleigh distribution"} retval = raylrnd(varargin{:}); case {"t", "t distribution"} retval = trnd(varargin{:}); case {"unif", "uniform", "uniform distribution"} retval = unifrnd(varargin{:}); case {"unid", "discrete uniform", "discrete uniform distribution"} retval = unidrnd(varargin{:}); case {"wbl", "weibull", "weibull distribution"} retval = wblrnd(varargin{:}); otherwise error("random: unsupported distribution type '%s'", name); endswitch endfunction statistics/inst/jackknife.m0000644000175000017500000001177111741556364015711 0ustar asneltasnelt## Copyright (C) 2011 Alexander Klein ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . 
## -*- texinfo -*- ## @deftypefn{Function File} {@var{jackstat} =} jackknife (@var{E}, @var{x}, @dots{}) ## Compute jackknife estimates of a parameter taking one or more given samples as parameters. ## In particular, @var{E} is the estimator to be jackknifed as a function name, handle, ## or inline function, and @var{x} is the sample for which the estimate is to be taken. ## The @var{i}-th entry of @var{jackstat} will contain the value of the estimator ## on the sample @var{x} with its @var{i}-th row omitted. ## ## @example ## @group ## jackstat(@var{i}) = @var{E}(@var{x}([1 : @var{i} - 1, @var{i} + 1 : length(@var{x})])) ## @end group ## @end example ## ## Depending on the number of samples to be used, the estimator must have the appropriate form: ## If only one sample is used, then the estimator need not be concerned with cell arrays, ## for example jackknifing the standard deviation of a sample can be performed with ## @code{@var{jackstat} = jackknife (@@std, rand (100, 1))}. ## If, however, more than one sample is to be used, the samples must all be of equal size, ## and the estimator must address them as elements of a cell-array, ## in which they are aggregated in their order of appearance: ## ## @example ## @group ## @var{jackstat} = jackknife(@@(x) std(x@{1@})/var(x@{2@}), rand (100, 1), randn (100, 1)) ## @end group ## @end example ## ## If all goes well, a theoretical value @var{P} for the parameter is already known, ## @var{n} is the sample size, ## @code{@var{t} = @var{n} * @var{E}(@var{x}) - (@var{n} - 1) * mean(@var{jackstat})}, and ## @code{@var{v} = sumsq(@var{n} * @var{E}(@var{x}) - (@var{n} - 1) * @var{jackstat} - @var{t}) / (@var{n} * (@var{n} - 1))}, then ## @code{(@var{t}-@var{P})/sqrt(@var{v})} should follow a t-distribution with @var{n}-1 degrees of freedom. ## ## Jackknifing is a well-known method to reduce bias; further details can be found in: ## @itemize @bullet ## @item Rupert G. 
Miller: The jackknife-a review; Biometrika (1974) 61(1): 1-15; doi:10.1093/biomet/61.1.1 ## @item Rupert G. Miller: Jackknifing Variances; Ann. Math. Statist. Volume 39, Number 2 (1968), 567-582; doi:10.1214/aoms/1177698418 ## @item M. H. Quenouille: Notes on Bias in Estimation; Biometrika Vol. 43, No. 3/4 (Dec., 1956), pp. 353-360; doi:10.1093/biomet/43.3-4.353 ## @end itemize ## @end deftypefn ## Author: Alexander Klein ## Created: 2011-11-25 function jackstat = jackknife ( anEstimator, varargin ) ## Convert function name to handle if necessary, or throw ## an error. if ( !strcmp ( typeinfo ( anEstimator ), "function handle" ) ) if ( isascii ( anEstimator ) ) anEstimator = str2func ( anEstimator ); else error ( "Estimators must be passed as function names or handles!" ); end end ## Simple jackknifing can be done with a single vector argument, and ## first and foremost with a function that does not care about ## cell-arrays. if ( length ( varargin ) == 1 && isnumeric ( varargin { 1 } ) ) aSample = varargin { 1 }; g = length ( aSample ); jackstat = zeros ( 1, g ); for k = 1 : g jackstat ( k ) = anEstimator ( aSample ( [ 1 : k - 1, k + 1 : g ] ) ); end ## More complicated input requires more work, however. else g = cellfun ( @(x) length ( x ), varargin ); if ( any ( g - g ( 1 ) ) ) error ( "All passed data must be of equal length!" ); end g = g ( 1 ); jackstat = zeros ( 1, g ); for k = 1 : g jackstat ( k ) = anEstimator ( cellfun ( @(x) x( [ 1 : k - 1, k + 1 : g ] ), varargin, "UniformOutput", false ) ); end end endfunction %!test %! ##Example from Quenouille, Table 1 %! d=[0.18 4.00 1.04 0.85 2.14 1.01 3.01 2.33 1.57 2.19]; %! jackstat = jackknife ( @(x) 1/mean(x), d ); %! assert ( 10 / mean(d) - 9 * mean(jackstat), 0.5240, 1e-5 ); %!demo %! for k = 1:1000 %! x=rand(10,1); %! s(k)=std(x); %! jackstat=jackknife(@std,x); %! j(k)=10*std(x) - 9*mean(jackstat); %! end %! figure();hist([s',j'], 0:sqrt(1/12)/10:2*sqrt(1/12)) %!demo %! for k = 1:1000 %! x=randn(1,50); %! 
y=rand(1,50); %! jackstat=jackknife(@(x) std(x{1})/std(x{2}),y,x); %! j(k)=50*std(y)/std(x) - 49*mean(jackstat); %! v(k)=sumsq((50*std(y)/std(x) - 49*jackstat) - j(k)) / (50 * 49); %! end %! t=(j-sqrt(1/12))./sqrt(v); %! figure();plot(sort(tcdf(t,49)),"-;Almost linear mapping indicates good fit with t-distribution.;") statistics/inst/mvnpdf.m0000644000175000017500000000743511741556364015260 0ustar asneltasnelt## Author: Paul Kienzle ## This program is granted to the public domain. ## -*- texinfo -*- ## @deftypefn {Function File} {@var{y} =} mvnpdf (@var{x}) ## @deftypefnx{Function File} {@var{y} =} mvnpdf (@var{x}, @var{mu}) ## @deftypefnx{Function File} {@var{y} =} mvnpdf (@var{x}, @var{mu}, @var{sigma}) ## Compute multivariate normal pdf for @var{x} given mean @var{mu} and covariance matrix ## @var{sigma}. The dimension of @var{x} is @var{d} x @var{p}, @var{mu} is ## @var{1} x @var{p} and @var{sigma} is @var{p} x @var{p}. The normal pdf is ## defined as ## ## @example ## @iftex ## @tex ## $$ 1/y^2 = (2 pi)^p |\Sigma| \exp \{ (x-\mu)^T \Sigma^{-1} (x-\mu) \} $$ ## @end tex ## @end iftex ## @ifnottex ## 1/@var{y}^2 = (2 pi)^@var{p} |@var{Sigma}| exp @{ (@var{x}-@var{mu})' inv(@var{Sigma})@ ## (@var{x}-@var{mu}) @} ## @end ifnottex ## @end example ## ## @strong{References} ## ## NIST Engineering Statistics Handbook 6.5.4.2 ## http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc542.htm ## ## @strong{Algorithm} ## ## Using Cholesky factorization on the positive definite covariance matrix: ## ## @example ## @var{r} = chol (@var{sigma}); ## @end example ## ## where @var{r}'*@var{r} = @var{sigma}. Being upper triangular, the determinant ## of @var{r} is trivially the product of the diagonal, and the determinant of ## @var{sigma} is the square of this: ## ## @example ## @var{det} = prod (diag (@var{r}))^2; ## @end example ## ## The formula asks for the square root of the determinant, so no need to ## square it. 
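##
## As a quick numerical check of this identity (a sketch using an arbitrarily
## chosen positive definite matrix; it is not part of the original derivation):
##
## @example
## @var{sigma} = [2, 0.5; 0.5, 1];
## @var{r} = chol (@var{sigma});
## abs (prod (diag (@var{r})) - sqrt (det (@var{sigma}))) < 1e-12
## @end example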
## ## The exponential argument @var{A} = @var{x}' * inv (@var{sigma}) * @var{x} ## ## @example ## @var{A} = @var{x}' * inv (@var{sigma}) * @var{x} ## = @var{x}' * inv (@var{r}' * @var{r}) * @var{x} ## = @var{x}' * inv (@var{r}) * inv(@var{r}') * @var{x} ## @end example ## ## Given that inv (@var{r}') == inv(@var{r})', at least in theory if not numerically, ## ## @example ## @var{A} = (@var{x}' / @var{r}) * (@var{x}'/@var{r})' = sumsq (@var{x}'/@var{r}) ## @end example ## ## The interface takes the parameters to the multivariate normal in columns rather than ## rows, so we are actually dealing with the transpose: ## ## @example ## @var{A} = sumsq (@var{x}/r) ## @end example ## ## and the final result is: ## ## @example ## @var{r} = chol (@var{sigma}) ## @var{y} = (2*pi)^(-@var{p}/2) * exp (-sumsq ((@var{x}-@var{mu})/@var{r}, 2)/2) / prod (diag (@var{r})) ## @end example ## ## @seealso{mvncdf, mvnrnd} ## @end deftypefn function pdf = mvnpdf (x, mu = 0, sigma = 1) ## Check input if (!ismatrix (x)) error ("mvnpdf: first input must be a matrix"); endif if (!isvector (mu) && !isscalar (mu)) error ("mvnpdf: second input must be a real scalar or vector"); endif if (!ismatrix (sigma) || !issquare (sigma)) error ("mvnpdf: third input must be a square matrix"); endif [ps, ps] = size (sigma); [d, p] = size (x); if (p != ps) error ("mvnpdf: dimensions of data and covariance matrix does not match"); endif if (numel (mu) != p && numel (mu) != 1) error ("mvnpdf: dimensions of data does not match dimensions of mean value"); endif mu = mu (:).'; if (all (size (mu) == [1, p])) mu = repmat (mu, [d, 1]); endif if (nargin < 3) pdf = (2*pi)^(-p/2) * exp (-sumsq (x-mu, 2)/2); else r = chol (sigma); pdf = (2*pi)^(-p/2) * exp (-sumsq ((x-mu)/r, 2)/2) / prod (diag (r)); endif endfunction %!demo %! mu = [0, 0]; %! sigma = [1, 0.1; 0.1, 0.5]; %! [X, Y] = meshgrid (linspace (-3, 3, 25)); %! XY = [X(:), Y(:)]; %! Z = mvnpdf (XY, mu, sigma); %! mesh (X, Y, reshape (Z, size (X))); %! 
colormap jet %!test %! mu = [1,-1]; %! sigma = [.9 .4; .4 .3]; %! x = [ 0.5 -1.2; -0.5 -1.4; 0 -1.5]; %! p = [ 0.41680003660313; 0.10278162359708; 0.27187267524566 ]; %! q = mvnpdf (x, mu, sigma); %! assert (p, q, 10*eps); statistics/inst/raylrnd.m0000644000175000017500000001053511752275032015424 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{x} =} raylrnd (@var{sigma}) ## @deftypefnx {Function File} {@var{x} =} raylrnd (@var{sigma}, @var{sz}) ## @deftypefnx {Function File} {@var{x} =} raylrnd (@var{sigma}, @var{r}, @var{c}) ## Generate a matrix of random samples from the Rayleigh distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{sigma} is the parameter of the Rayleigh distribution. The elements ## of @var{sigma} must be positive. ## ## @item ## @var{sz} is the size of the matrix to be generated. @var{sz} must be a ## vector of non-negative integers. ## ## @item ## @var{r} is the number of rows of the matrix to be generated. @var{r} must ## be a non-negative integer. ## ## @item ## @var{c} is the number of columns of the matrix to be generated. @var{c} ## must be a non-negative integer. 
## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{x} is a matrix of random samples from the Rayleigh distribution with ## corresponding parameter @var{sigma}. If neither @var{sz} nor @var{r} and ## @var{c} are specified, then @var{x} is of the same size as @var{sigma}. ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## sigma = 1:6; ## x = raylrnd (sigma) ## @end group ## ## @group ## sz = [2, 3]; ## x = raylrnd (0.5, sz) ## @end group ## ## @group ## r = 2; ## c = 3; ## x = raylrnd (0.5, r, c) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. pages 104 and 148, McGraw-Hill, New York, second edition, ## 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Random samples from the Rayleigh distribution function x = raylrnd (sigma, r, c) # Check arguments if (nargin == 1) sz = size (sigma); elseif (nargin == 2) if (! isvector (r) || any ((r < 0) | round (r) != r)) error ("raylrnd: sz must be a vector of non-negative integers") endif sz = r(:)'; if (! isscalar (sigma) && ! isempty (sigma) && (length (size (sigma)) != length (sz) || any (size (sigma) != sz))) error ("raylrnd: sigma must be scalar or of size sz"); endif elseif (nargin == 3) if (! isscalar (r) || any ((r < 0) | round (r) != r)) error ("raylrnd: r must be a non-negative integer") endif if (! isscalar (c) || any ((c < 0) | round (c) != c)) error ("raylrnd: c must be a non-negative integer") endif sz = [r, c]; if (! isscalar (sigma) && ! isempty (sigma) && (length (size (sigma)) != length (sz) || any (size (sigma) != sz))) error ("raylrnd: sigma must be scalar or of size [r, c]"); endif else print_usage (); endif if (! 
isempty (sigma) && ! ismatrix (sigma)) error ("raylrnd: sigma must be a numeric matrix"); endif if (isempty (sigma)) x = []; elseif (isscalar (sigma) && ! (sigma > 0)) x = NaN .* ones (sz); else # Draw random samples x = sqrt (-2 .* log (1 - rand (sz)) .* sigma .^ 2); # Continue argument check k = find (! (sigma > 0)); if (any (k)) x(k) = NaN; endif endif endfunction %!test %! sigma = 1:6; %! x = raylrnd (sigma); %! assert (size (x), size (sigma)); %! assert (all (x >= 0)); %!test %! sigma = 0.5; %! sz = [2, 3]; %! x = raylrnd (sigma, sz); %! assert (size (x), sz); %! assert (all (x >= 0)); %!test %! sigma = 0.5; %! r = 2; %! c = 3; %! x = raylrnd (sigma, r, c); %! assert (size (x), [r, c]); %! assert (all (x >= 0)); statistics/inst/regress.m0000644000175000017500000001516111741556364015433 0ustar asneltasnelt## Copyright (C) 2005, 2006 William Poetra Yoga Hadisoeseno ## Copyright (C) 2011 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{b}, @var{bint}, @var{r}, @var{rint}, @var{stats}] =} regress (@var{y}, @var{X}, [@var{alpha}]) ## Multiple Linear Regression using Least Squares Fit of @var{y} on @var{X} ## with the model @code{y = X * beta + e}. 
## ## Here, ## ## @itemize ## @item ## @code{y} is a column vector of observed values ## @item ## @code{X} is a matrix of regressors, with the first column filled with ## the constant value 1 ## @item ## @code{beta} is a column vector of regression parameters ## @item ## @code{e} is a column vector of random errors ## @end itemize ## ## Arguments are ## ## @itemize ## @item ## @var{y} is the @code{y} in the model ## @item ## @var{X} is the @code{X} in the model ## @item ## @var{alpha} is the significance level used to calculate the confidence ## intervals @var{bint} and @var{rint} (see `Return values' below). If not ## specified, ALPHA defaults to 0.05 ## @end itemize ## ## Return values are ## ## @itemize ## @item ## @var{b} is the @code{beta} in the model ## @item ## @var{bint} is the confidence interval for @var{b} ## @item ## @var{r} is a column vector of residuals ## @item ## @var{rint} is the confidence interval for @var{r} ## @item ## @var{stats} is a row vector containing: ## ## @itemize ## @item The R^2 statistic ## @item The F statistic ## @item The p value for the full model ## @item The estimated error variance ## @end itemize ## @end itemize ## ## @var{r} and @var{rint} can be passed to @code{rcoplot} to visualize ## the residual intervals and identify outliers. ## ## NaN values in @var{y} and @var{X} are removed before calculation begins. ## ## @end deftypefn ## References: ## - Matlab 7.0 documentation (pdf) ## - Jiang Qiyuan et al., 《大学数学实验》 (University Mathematics Experiments, textbook) ## - http://www.netnam.vn/unescocourse/statistics/12_5.htm ## - wsolve.m in octave-forge ## - http://www.stanford.edu/class/ee263/ls_ln_matlab.pdf function [b, bint, r, rint, stats] = regress (y, X, alpha) if (nargin < 2 || nargin > 3) print_usage; endif if (! ismatrix (y)) error ("regress: y must be a numeric matrix"); endif if (! 
ismatrix (X)) error ("regress: X must be a numeric matrix"); endif if (columns (y) != 1) error ("regress: y must be a column vector"); endif if (rows (y) != rows (X)) error ("regress: y and X must contain the same number of rows"); endif if (nargin < 3) alpha = 0.05; elseif (! isscalar (alpha)) error ("regress: alpha must be a scalar value") endif notnans = ! logical (sum (isnan ([y X]), 2)); y = y(notnans); X = X(notnans,:); [Xq Xr] = qr (X, 0); pinv_X = Xr \ Xq'; b = pinv_X * y; if (nargout > 1) n = rows (X); p = columns (X); dof = n - p; t_alpha_2 = tinv (alpha / 2, dof); r = y - X * b; # added -- Nir SSE = sum (r .^ 2); v = SSE / dof; # c = diag(inv (X' * X)) using (economy) QR decomposition # which means that we only have to use Xr c = diag (inv (Xr' * Xr)); db = t_alpha_2 * sqrt (v * c); bint = [b + db, b - db]; endif if (nargout > 3) dof1 = n - p - 1; h = sum(X.*pinv_X', 2); #added -- Nir (same as diag(X*pinv_X), without doing the matrix multiply) # From Matlab's documentation on Multiple Linear Regression, # sigmaihat2 = norm (r) ^ 2 / dof1 - r .^ 2 / (dof1 * (1 - h)); # dr = -tinv (1 - alpha / 2, dof) * sqrt (sigmaihat2 .* (1 - h)); # Substitute # norm (r) ^ 2 == sum (r .^ 2) == SSE # -tinv (1 - alpha / 2, dof) == tinv (alpha / 2, dof) == t_alpha_2 # We get # sigmaihat2 = (SSE - r .^ 2 / (1 - h)) / dof1; # dr = t_alpha_2 * sqrt (sigmaihat2 .* (1 - h)); # Combine, we get # dr = t_alpha_2 * sqrt ((SSE * (1 - h) - (r .^ 2)) / dof1); dr = t_alpha_2 * sqrt ((SSE * (1 - h) - (r .^ 2)) / dof1); rint = [r + dr, r - dr]; endif if (nargout > 4) R2 = 1 - SSE / sum ((y - mean (y)) .^ 2); # F = (R2 / (p - 1)) / ((1 - R2) / dof); F = dof / (p - 1) / (1 / R2 - 1); pval = 1 - fcdf (F, p - 1, dof); stats = [R2 F pval v]; endif endfunction %!test %! % Longley data from the NIST Statistical Reference Dataset %! Z = [ 60323 83.0 234289 2356 1590 107608 1947 %! 61122 88.5 259426 2325 1456 108632 1948 %! 60171 88.2 258054 3682 1616 109773 1949 %! 
61187 89.5 284599 3351 1650 110929 1950 %! 63221 96.2 328975 2099 3099 112075 1951 %! 63639 98.1 346999 1932 3594 113270 1952 %! 64989 99.0 365385 1870 3547 115094 1953 %! 63761 100.0 363112 3578 3350 116219 1954 %! 66019 101.2 397469 2904 3048 117388 1955 %! 67857 104.6 419180 2822 2857 118734 1956 %! 68169 108.4 442769 2936 2798 120445 1957 %! 66513 110.8 444546 4681 2637 121950 1958 %! 68655 112.6 482704 3813 2552 123366 1959 %! 69564 114.2 502601 3931 2514 125368 1960 %! 69331 115.7 518173 4806 2572 127852 1961 %! 70551 116.9 554894 4007 2827 130081 1962 ]; %! % Results certified by NIST using 500 digit arithmetic %! % b and standard error in b %! V = [ -3482258.63459582 890420.383607373 %! 15.0618722713733 84.9149257747669 %! -0.358191792925910E-01 0.334910077722432E-01 %! -2.02022980381683 0.488399681651699 %! -1.03322686717359 0.214274163161675 %! -0.511041056535807E-01 0.226073200069370 %! 1829.15146461355 455.478499142212 ]; %! Rsq = 0.995479004577296; %! F = 330.285339234588; %! y = Z(:,1); X = [ones(rows(Z),1), Z(:,2:end)]; %! alpha = 0.05; %! [b, bint, r, rint, stats] = regress (y, X, alpha); %! assert(b,V(:,1),3e-6); %! assert(stats(1),Rsq,1e-12); %! assert(stats(2),F,3e-8); %! assert(((bint(:,1)-bint(:,2))/2)/tinv(alpha/2,9),V(:,2),-1.e-5); statistics/inst/kmeans.m0000644000175000017500000001034312001676530015220 0ustar asneltasnelt## Copyright (C) 2011 Soren Hauberg ## Copyright (C) 2012 Daniel Ward ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. 
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{idx}, @var{centers}] =} kmeans (@var{data}, @var{k}, @var{param1}, @var{value1}, @dots{})
## K-means clustering.
##
## @seealso{linkage}
## @end deftypefn

function [classes, centers, sumd, D] = kmeans (data, k, varargin)
  [reg, prop] = parseparams (varargin);

  ## defaults for options
  emptyaction = "error";
  start = "sample";

  ## used for getting the number of samples
  nRows = rows (data);

  ## used to hold the distances from each sample to each class
  D = zeros (nRows, k);

  ## used for convergence of the centroids
  err = 1;

  ## initial sum of distances
  sumd = Inf;

  ## Input checking, validate the matrix and k
  if (!isnumeric (data) || !ismatrix (data) || !isreal (data))
    error ("kmeans: first input argument must be an NxD real data matrix");
  elseif (!isscalar (k))
    error ("kmeans: second input argument must be a scalar");
  endif

  if (length (varargin) > 0)
    ## check for the 'emptyaction' property
    found = find (strcmpi (prop, "emptyaction") == 1);
    ## only inspect the value when the property was actually passed;
    ## indexing prop with an empty 'found' would raise an index error
    if (! isempty (found))
      switch (lower (prop{found+1}))
        case "singleton"
          emptyaction = "singleton";
        otherwise
          error ("kmeans: unsupported empty cluster action parameter");
      endswitch
    endif
  endif

  ## check for the 'start' property
  switch (lower (start))
    case "sample"
      idx = randperm (nRows) (1:k);
      centers = data (idx, :);
    otherwise
      error ("kmeans: unsupported initial clustering parameter");
  endswitch

  ## Run the algorithm
  while err > .001
    ## Compute distances
    for i = 1:k
      D (:, i) = sumsq (data - repmat (centers(i, :), nRows, 1), 2);
    endfor

    ## Classify
    [tmp, classes] = min (D, [], 2);

    ## Calculate new centroids
    for i = 1:k
      ## Get binary vector indicating membership in cluster i
      membership = (classes == i);

      ## Check for empty clusters
      if (sum (membership) == 0)
        switch emptyaction
          ## if 'singleton', then find the point that is the
          ## farthest and add it to the empty cluster
          case 'singleton'
            idx=maxCostSampleIndex
(data, centers(i,:));
            classes(idx) = i;
            membership(idx)=1;
          ## if 'error' then throw the error
          otherwise
            error ("kmeans: empty cluster created");
        endswitch
      endif ## end check for empty clusters

      ## update the centroids
      members = data(membership, :);
      centers(i, :) = sum(members,1)/size(members,1);
    endfor

    ## calculate the difference in the sum of distances
    err = sumd - objCost (data, classes, centers);
    ## update the current sum of distances
    sumd = objCost (data, classes, centers);
  endwhile
endfunction

## calculate the sum of distances
function obj = objCost (data, classes, centers)
  obj = 0;
  for i=1:rows (data)
    obj = obj + sumsq (data(i,:) - centers(classes(i),:));
  endfor
endfunction

## return the index of the sample that is farthest from the given center
function idx = maxCostSampleIndex (data, centers)
  cost = 0;
  idx = 1;
  for i = 1:rows (data)
    d = sumsq (data(i,:) - centers);
    if (cost < d)
      ## remember both the largest cost seen so far and where it occurred
      cost = d;
      idx = i;
    endif
  endfor
endfunction

%!demo
%! ## Generate a two-cluster problem
%! C1 = randn (100, 2) + 1;
%! C2 = randn (100, 2) - 1;
%! data = [C1; C2];
%!
%! ## Perform clustering
%! [idx, centers] = kmeans (data, 2);
%!
%! ## Plot the result
%! figure
%! plot (data (idx==1, 1), data (idx==1, 2), 'ro');
%! hold on
%! plot (data (idx==2, 1), data (idx==2, 2), 'bs');
%! plot (centers (:, 1), centers (:, 2), 'kv', 'markersize', 10);
%! hold off
statistics/inst/hmmgenerate.m0000644000175000017500000002102411741556364016250 0ustar asneltasnelt
## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{sequence}, @var{states}] =} hmmgenerate (@var{len}, @var{transprob}, @var{outprob})
## @deftypefnx {Function File} {} hmmgenerate (@dots{}, 'symbols', @var{symbols})
## @deftypefnx {Function File} {} hmmgenerate (@dots{}, 'statenames', @var{statenames})
## Generate an output sequence and hidden states of a hidden Markov model.
## The model starts in state @code{1} at step @code{0} but will not include
## step @code{0} in the generated states and sequence.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{len} is the number of steps to generate. @var{sequence} and
## @var{states} will have @var{len} entries each.
##
## @item
## @var{transprob} is the matrix of transition probabilities of the states.
## @code{transprob(i, j)} is the probability of a transition to state
## @code{j} given state @code{i}.
##
## @item
## @var{outprob} is the matrix of output probabilities.
## @code{outprob(i, j)} is the probability of generating output @code{j}
## given state @code{i}.
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{sequence} is a vector of length @var{len} of the generated
## outputs. The outputs are integers ranging from @code{1} to
## @code{columns (outprob)}.
##
## @item
## @var{states} is a vector of length @var{len} of the generated hidden
## states. The states are integers ranging from @code{1} to
## @code{columns (transprob)}.
## @end itemize
##
## If @code{'symbols'} is specified, then the elements of @var{symbols} are
## used for the output sequence instead of integers ranging from @code{1} to
## @code{columns (outprob)}. @var{symbols} can be a cell array.
##
## If @code{'statenames'} is specified, then the elements of
## @var{statenames} are used for the states instead of integers ranging from
## @code{1} to @code{columns (transprob)}.
@var{statenames} can be a cell
## array.
##
## @subheading Examples
##
## @example
## @group
## transprob = [0.8, 0.2; 0.4, 0.6];
## outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
## [sequence, states] = hmmgenerate (25, transprob, outprob)
## @end group
##
## @group
## symbols = @{'A', 'B', 'C'@};
## statenames = @{'One', 'Two'@};
## [sequence, states] = hmmgenerate (25, transprob, outprob,
##                      'symbols', symbols, 'statenames', statenames)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC,
## 2001.
##
## @item
## Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected
## Applications in Speech Recognition. @cite{Proceedings of the IEEE},
## 77(2), pages 257-286, February 1989.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: Output sequence and hidden states of a hidden Markov model

function [sequence, states] = hmmgenerate (len, transprob, outprob, varargin)

  # Check arguments
  if (nargin < 3 || mod (length (varargin), 2) != 0)
    print_usage ();
  endif

  if (! isscalar (len) || len < 0 || round (len) != len)
    error ("hmmgenerate: len must be a non-negative scalar integer")
  endif

  if (! ismatrix (transprob))
    error ("hmmgenerate: transprob must be a non-empty numeric matrix");
  endif
  if (!
ismatrix (outprob))
    error ("hmmgenerate: outprob must be a non-empty numeric matrix");
  endif

  # nstate is the number of states of the hidden Markov model
  nstate = rows (transprob);
  # noutput is the number of different outputs that the hidden Markov model
  # can generate
  noutput = columns (outprob);

  # Check whether transprob and outprob are feasible for a hidden Markov
  # model
  if (columns (transprob) != nstate)
    error ("hmmgenerate: transprob must be a square matrix");
  endif
  if (rows (outprob) != nstate)
    error ("hmmgenerate: outprob must have the same number of rows as transprob");
  endif

  # Flag for symbols
  usesym = false;
  # Flag for statenames
  usesn = false;

  # Process varargin
  for i = 1:2:length (varargin)
    # There must be an identifier: 'symbols' or 'statenames'
    if (! ischar (varargin{i}))
      print_usage ();
    endif
    # Upper case is also fine
    lowerarg = lower (varargin{i});
    if (strcmp (lowerarg, 'symbols'))
      if (length (varargin{i + 1}) != noutput)
        error ("hmmgenerate: number of symbols does not match number of possible outputs");
      endif
      usesym = true;
      # Use the following argument as symbols
      symbols = varargin{i + 1};
    # The same for statenames
    elseif (strcmp (lowerarg, 'statenames'))
      if (length (varargin{i + 1}) != nstate)
        error ("hmmgenerate: number of statenames does not match number of states");
      endif
      usesn = true;
      # Use the following argument as statenames
      statenames = varargin{i + 1};
    else
      error ("hmmgenerate: expected 'symbols' or 'statenames' but found '%s'", varargin{i});
    endif
  endfor

  # Each row in transprob and outprob should contain probabilities
  # => scale so that the sum is 1
  # A zero row remains zero
  # - for transprob
  s = sum (transprob, 2);
  s(s == 0) = 1;
  transprob = transprob ./ repmat (s, 1, nstate);
  # - for outprob
  s = sum (outprob, 2);
  s(s == 0) = 1;
  outprob = outprob ./ repmat (s, 1, noutput);

  # Generate sequences of uniformly distributed random numbers between 0 and
  # 1
  # - for the state transitions
  transdraw = rand (1, len);
  # - for the outputs
  outdraw =
rand (1, len);

  # Generate the return vectors
  # They remain unchanged if the corresponding probability rows of transprob
  # and outprob, respectively, contain only zeros
  sequence = ones (1, len);
  states = ones (1, len);

  if (len > 0)
    # Calculate cumulated probabilities backwards for easy comparison with
    # the generated random numbers
    # Cumulated probability in first column must always be 1
    # We might have a zero row
    # - for transprob
    transprob(:, end:-1:1) = cumsum (transprob(:, end:-1:1), 2);
    transprob(:, 1) = 1;
    # - for outprob
    outprob(:, end:-1:1) = cumsum (outprob(:, end:-1:1), 2);
    outprob(:, 1) = 1;

    # cstate is the current state
    # Start in state 1 but do not include it in the states vector
    cstate = 1;
    for i = 1:len
      # Compare the random number i of transdraw to the cumulated
      # probability of the state transition and set the transition
      # accordingly
      states(i) = sum (transdraw(i) <= transprob(cstate, :));
      cstate = states(i);
    endfor

    # Compare the random numbers of outdraw to the cumulated probabilities
    # of the outputs and set the sequence vector accordingly
    sequence = sum (repmat (outdraw, noutput, 1) <= outprob(states, :)', 1);

    # Transform default matrices into symbols/statenames if requested
    if (usesym)
      sequence = reshape (symbols(sequence), 1, len);
    endif
    if (usesn)
      states = reshape (statenames(states), 1, len);
    endif
  endif

endfunction

%!test
%! len = 25;
%! transprob = [0.8, 0.2; 0.4, 0.6];
%! outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
%! [sequence, states] = hmmgenerate (len, transprob, outprob);
%! assert (length (sequence), len);
%! assert (length (states), len);
%! assert (min (sequence) >= 1);
%! assert (max (sequence) <= columns (outprob));
%! assert (min (states) >= 1);
%! assert (max (states) <= rows (transprob));

%!test
%! len = 25;
%! transprob = [0.8, 0.2; 0.4, 0.6];
%! outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1];
%! symbols = {'A', 'B', 'C'};
%! statenames = {'One', 'Two'};
%!
[sequence, states] = hmmgenerate (len, transprob, outprob, 'symbols', symbols, 'statenames', statenames);
%! assert (length (sequence), len);
%! assert (length (states), len);
%! assert (strcmp (sequence, 'A') + strcmp (sequence, 'B') + strcmp (sequence, 'C') == ones (1, len));
%! assert (strcmp (states, 'One') + strcmp (states, 'Two') == ones (1, len));
statistics/inst/betastat.m0000644000175000017500000000641311741556364015570 0ustar asneltasnelt
## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{m}, @var{v}] =} betastat (@var{a}, @var{b})
## Compute mean and variance of the beta distribution.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{a} is the first parameter of the beta distribution. @var{a} must be
## positive
##
## @item
## @var{b} is the second parameter of the beta distribution.
@var{b} must be
## positive
## @end itemize
## @var{a} and @var{b} must be of common size or one of them must be scalar
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{m} is the mean of the beta distribution
##
## @item
## @var{v} is the variance of the beta distribution
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## a = 1:6;
## b = 1:0.2:2;
## [m, v] = betastat (a, b)
## @end group
##
## @group
## [m, v] = betastat (a, 1.5)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC,
## 2001.
##
## @item
## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic
## Processes}. McGraw-Hill, New York, second edition, 1984.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: Moments of the beta distribution

function [m, v] = betastat (a, b)

  # Check arguments
  if (nargin != 2)
    print_usage ();
  endif

  if (! isempty (a) && ! ismatrix (a))
    error ("betastat: a must be a numeric matrix");
  endif
  if (! isempty (b) && ! ismatrix (b))
    error ("betastat: b must be a numeric matrix");
  endif

  if (! isscalar (a) || ! isscalar (b))
    [retval, a, b] = common_size (a, b);
    if (retval > 0)
      error ("betastat: a and b must be of common size or scalar");
    endif
  endif

  # Calculate moments
  m = a ./ (a + b);
  v = (a .* b) ./ (((a + b) .^ 2) .* (a + b + 1));

  # Continue argument check
  k = find (! (a > 0) | ! (a < Inf) | ! (b > 0) | ! (b < Inf));
  if (any (k))
    m(k) = NaN;
    v(k) = NaN;
  endif

endfunction

%!test
%! a = 1:6;
%! b = 1:0.2:2;
%! [m, v] = betastat (a, b);
%! expected_m = [0.5000, 0.6250, 0.6818, 0.7143, 0.7353, 0.7500];
%! expected_v = [0.0833, 0.0558, 0.0402, 0.0309, 0.0250, 0.0208];
%! assert (m, expected_m, 0.001);
%! assert (v, expected_v, 0.001);

%!test
%! a = 1:6;
%! [m, v] = betastat (a, 1.5);
%!
expected_m = [0.4000, 0.5714, 0.6667, 0.7273, 0.7692, 0.8000];
%! expected_v = [0.0686, 0.0544, 0.0404, 0.0305, 0.0237, 0.0188];
%! assert (m, expected_m, 0.001);
%! assert (v, expected_v, 0.001);
statistics/inst/combnk.m0000644000175000017500000000454611730217466015232 0ustar asneltasnelt
## Copyright (C) 2010 Soren Hauberg
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {@var{c} =} combnk (@var{data}, @var{k})
## Return all combinations of @var{k} elements in @var{data}.
## @end deftypefn

function retval = combnk (data, k)
  ## Check input
  if (nargin != 2)
    print_usage;
  elseif (! isvector (data))
    error ("combnk: first input argument must be a vector");
  elseif (!isreal (k) || k != round (k) || k < 0)
    error ("combnk: second input argument must be a non-negative integer");
  endif

  ## Simple checks
  n = numel (data);
  if (k == 0 || k > n)
    retval = resize (data, 0, k);
  elseif (k == n)
    retval = data (:).';
  else
    retval = __combnk__ (data, k);
  endif

  ## For some odd reason Matlab seems to treat strings differently compared to other data-types...
  if (ischar (data))
    retval = flipud (retval);
  endif
endfunction

function retval = __combnk__ (data, k)
  ## Recursion stopping criteria
  if (k == 1)
    retval = data (:);
  else
    ## Process data
    n = numel (data);
    if iscell (data)
      retval = {};
    else
      retval = [];
    endif
    for j = 1:n
      C = __combnk__ (data ((j+1):end), k-1);
      C = cat (2, repmat (data (j), rows (C), 1), C);
      if (!isempty (C))
        retval = [retval; C];
      endif
    endfor
  endif
endfunction

%!demo
%! c = combnk (1:5, 2);
%! disp ("All pairs of integers between 1 and 5:");
%! disp (c);

%!test
%! c = combnk (1:3, 2);
%! assert (c, [1, 2; 1, 3; 2, 3]);

%!test
%! c = combnk (1:3, 6);
%! assert (isempty (c));

%!test
%! c = combnk ({1, 2, 3}, 2);
%! assert (c, {1, 2; 1, 3; 2, 3});

%!test
%! c = combnk ("hello", 2);
%! assert (c, ["lo"; "lo"; "ll"; "eo"; "el"; "el"; "ho"; "hl"; "hl"; "he"]);
statistics/inst/nbinstat.m0000644000175000017500000000643011741556364015602 0ustar asneltasnelt
## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{m}, @var{v}] =} nbinstat (@var{n}, @var{p})
## Compute mean and variance of the negative binomial distribution.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{n} is the first parameter of the negative binomial distribution.
The elements
## of @var{n} must be natural numbers
##
## @item
## @var{p} is the second parameter of the negative binomial distribution. The
## elements of @var{p} must be probabilities
## @end itemize
## @var{n} and @var{p} must be of common size or one of them must be scalar
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{m} is the mean of the negative binomial distribution
##
## @item
## @var{v} is the variance of the negative binomial distribution
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## n = 1:4;
## p = 0.2:0.2:0.8;
## [m, v] = nbinstat (n, p)
## @end group
##
## @group
## [m, v] = nbinstat (n, 0.5)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC,
## 2001.
##
## @item
## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic
## Processes}. McGraw-Hill, New York, second edition, 1984.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: Moments of the negative binomial distribution

function [m, v] = nbinstat (n, p)

  # Check arguments
  if (nargin != 2)
    print_usage ();
  endif

  if (! isempty (n) && ! ismatrix (n))
    error ("nbinstat: n must be a numeric matrix");
  endif
  if (! isempty (p) && ! ismatrix (p))
    error ("nbinstat: p must be a numeric matrix");
  endif

  if (! isscalar (n) || ! isscalar (p))
    [retval, n, p] = common_size (n, p);
    if (retval > 0)
      error ("nbinstat: n and p must be of common size or scalar");
    endif
  endif

  # Calculate moments
  q = 1 - p;
  m = n .* q ./ p;
  v = n .* q ./ (p .^ 2);

  # Continue argument check
  k = find (! (n > 0) | ! (n < Inf) | ! (p > 0) | ! (p < 1));
  if (any (k))
    m(k) = NaN;
    v(k) = NaN;
  endif

endfunction

%!test
%! n = 1:4;
%! p = 0.2:0.2:0.8;
%! [m, v] = nbinstat (n, p);
%! expected_m = [ 4.0000, 3.0000, 2.0000, 1.0000];
%! expected_v = [20.0000, 7.5000, 3.3333, 1.2500];
%!
assert (m, expected_m, 0.001);
%! assert (v, expected_v, 0.001);

%!test
%! n = 1:4;
%! [m, v] = nbinstat (n, 0.5);
%! expected_m = [1, 2, 3, 4];
%! expected_v = [2, 4, 6, 8];
%! assert (m, expected_m, 0.001);
%! assert (v, expected_v, 0.001);
statistics/inst/chi2stat.m0000644000175000017500000000445011741556364015501 0ustar asneltasnelt
## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{m}, @var{v}] =} chi2stat (@var{n})
## Compute mean and variance of the chi-square distribution.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{n} is the parameter of the chi-square distribution. The elements
## of @var{n} must be positive
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{m} is the mean of the chi-square distribution
##
## @item
## @var{v} is the variance of the chi-square distribution
## @end itemize
##
## @subheading Example
##
## @example
## @group
## n = 1:6;
## [m, v] = chi2stat (n)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC,
## 2001.
##
## @item
## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic
## Processes}.
McGraw-Hill, New York, second edition, 1984.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: Moments of the chi-square distribution

function [m, v] = chi2stat (n)

  # Check arguments
  if (nargin != 1)
    print_usage ();
  endif

  if (! isempty (n) && ! ismatrix (n))
    error ("chi2stat: n must be a numeric matrix");
  endif

  # Calculate moments
  m = n;
  v = 2 .* n;

  # Continue argument check
  k = find (! (n > 0) | ! (n < Inf));
  if (any (k))
    m(k) = NaN;
    v(k) = NaN;
  endif

endfunction

%!test
%! n = 1:6;
%! [m, v] = chi2stat (n);
%! assert (m, n);
%! assert (v, [2, 4, 6, 8, 10, 12], 0.001);
statistics/inst/fullfact.m0000644000175000017500000000140411741556364015554 0ustar asneltasnelt
## Author: Paul Kienzle
## This program is granted to the public domain.

## -*- texinfo -*-
## @deftypefn {Function File} fullfact (@var{N})
## Full factorial design.
##
## If @var{N} is a scalar, return the full factorial design with @var{N} binary
## choices, 0 and 1.
##
## If @var{N} is a vector, return the full factorial design with choices 1
## through @var{n_i} for each factor @var{i}.
##
## @end deftypefn

function A = fullfact(n)
  if length(n) == 1
    % combinatorial design with n either/or choices
    A = fullfact(2*ones(1,n))-1;
  else
    % combinatorial design with n(i) choices per level
    A = [1:n(end)]';
    for i=length(n)-1:-1:1
      A = [kron([1:n(i)]',ones(rows(A),1)), repmat(A,n(i),1)];
    end
  end
endfunction
statistics/inst/casewrite.m0000644000175000017500000000402111741556364015740 0ustar asneltasnelt
## Copyright (C) 2008 Bill Denney
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {} casewrite (@var{strmat}, @var{filename})
## Write case names to an ASCII file.
##
## Essentially, this writes all lines from @var{strmat} to
## @var{filename} (after deblanking them).
## @seealso{caseread, tblread, tblwrite, csv2cell, cell2csv, fopen}
## @end deftypefn

## Author: Bill Denney
## Description: Write strings to a file

function names = casewrite (s="", f="")

  ## Check arguments
  if nargin != 2
    print_usage ();
  endif
  if isempty (f)
    ## FIXME: open a file dialog box in this case when a file dialog box
    ## becomes available
    error ("casewrite: filename must be given")
  endif
  if isempty (s)
    error ("casewrite: strmat must be given")
  elseif ! ischar (s)
    error ("casewrite: strmat must be a character matrix")
  elseif ndims (s) != 2
    error ("casewrite: strmat must be two dimensional")
  endif

  [fid msg] = fopen (f, "wt");
  if fid < 0 || (! isempty (msg))
    error ("casewrite: cannot open %s for writing: %s", f, msg);
  endif

  for i = 1:rows (s)
    status = fputs (fid, sprintf ("%s\n", deblank (s(i,:))));
  endfor
  if (fclose (fid) < 0)
    error ("casewrite: error closing %s", f)
  endif

endfunction

## Tests
%!shared s
%! s = ["a  ";"bcd";"ef "];
%!test
%! casewrite (s, "casewrite.dat")
%! assert(caseread ("casewrite.dat"), s);
statistics/inst/gamstat.m0000644000175000017500000000627611741556364015430 0ustar asneltasnelt
## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{m}, @var{v}] =} gamstat (@var{a}, @var{b})
## Compute mean and variance of the gamma distribution.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{a} is the first parameter of the gamma distribution. @var{a} must be
## positive
##
## @item
## @var{b} is the second parameter of the gamma distribution. @var{b} must be
## positive
## @end itemize
## @var{a} and @var{b} must be of common size or one of them must be scalar
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{m} is the mean of the gamma distribution
##
## @item
## @var{v} is the variance of the gamma distribution
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## a = 1:6;
## b = 1:0.2:2;
## [m, v] = gamstat (a, b)
## @end group
##
## @group
## [m, v] = gamstat (a, 1.5)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC,
## 2001.
##
## @item
## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic
## Processes}. McGraw-Hill, New York, second edition, 1984.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: Moments of the gamma distribution

function [m, v] = gamstat (a, b)

  # Check arguments
  if (nargin != 2)
    print_usage ();
  endif

  if (! isempty (a) && ! ismatrix (a))
    error ("gamstat: a must be a numeric matrix");
  endif
  if (! isempty (b) && ! ismatrix (b))
    error ("gamstat: b must be a numeric matrix");
  endif

  if (!
isscalar (a) || ! isscalar (b))
    [retval, a, b] = common_size (a, b);
    if (retval > 0)
      error ("gamstat: a and b must be of common size or scalar");
    endif
  endif

  # Calculate moments
  m = a .* b;
  v = a .* (b .^ 2);

  # Continue argument check
  k = find (! (a > 0) | ! (a < Inf) | ! (b > 0) | ! (b < Inf));
  if (any (k))
    m(k) = NaN;
    v(k) = NaN;
  endif

endfunction

%!test
%! a = 1:6;
%! b = 1:0.2:2;
%! [m, v] = gamstat (a, b);
%! expected_m = [1.00, 2.40, 4.20, 6.40, 9.00, 12.00];
%! expected_v = [1.00, 2.88, 5.88, 10.24, 16.20, 24.00];
%! assert (m, expected_m, 0.001);
%! assert (v, expected_v, 0.001);

%!test
%! a = 1:6;
%! [m, v] = gamstat (a, 1.5);
%! expected_m = [1.50, 3.00, 4.50, 6.00, 7.50, 9.00];
%! expected_v = [2.25, 4.50, 6.75, 9.00, 11.25, 13.50];
%! assert (m, expected_m, 0.001);
%! assert (v, expected_v, 0.001);
statistics/inst/normplot.m0000644000175000017500000000435112041401712015606 0ustar asneltasnelt
## Author: Paul Kienzle
## This program is granted to the public domain.

## -*- texinfo -*-
## @deftypefn {Function File} normplot (@var{X})
## Produce normal probability plot for each column of @var{X}.
##
## The line joining the first and third quartiles is drawn on the
## graph. If the underlying distribution is normal, the
## points will cluster around this line.
##
## Note that this function sets the title, xlabel, ylabel,
## axis, grid, tics and hold properties of the graph. These
## need to be cleared before subsequent graphs using 'clf'.
## @end deftypefn

function normplot(X)
  if nargin!=1, print_usage; end
  if (rows(X) == 1), X=X(:); end

  # Transform data
  n = rows(X);
  if n<2, error("normplot requires a vector"); end
  q = norminv([1:n]'/(n+1));
  Y = sort(X);

  # Find the line joining the first to the third quartile for each column
  q1 = ceil(n/4);
  q3 = n-q1+1;
  m = (q(q3)-q(q1))./(Y(q3,:)-Y(q1,:));
  p = [ m; q(q1)-m.*Y(q1,:) ];

  # Plot the lines one at a time. Plot the lines before overlaying the
  # normals so that the default label is 'line n'.
  if columns(Y)==1,
    leg = "+;;";
  else
    leg = "%d+;Column %d;";
  endif

  for i=1:columns(Y)
    plot(Y(:,i),q,sprintf(leg,i,i)); hold on;

    # estimate the mean and standard deviation by linear regression
    # [v,dv] = wpolyfit(q,Y(:,i),1)
  end

  # Overlay the estimated normal lines.
  for i=1:columns(Y)
    # Use the end points and one point guaranteed to be in the view since
    # gnuplot skips any lines whose points are all outside the view.
    pts = [Y(1,i);Y(q1,i);Y(end,i)];
    plot(pts, polyval(p(:,i),pts), [num2str(i),";;"]);
  end
  hold off;

  # plot labels
  title "Normal Probability Plot"
  ylabel "% Probability"
  xlabel "Data"

  # plot grid
  t = [0.00001;0.0001;0.001;0.01;0.1;0.3;1;2;5;10;25;50;
       75;90;95;98;99;99.7;99.9;99.99;99.999;99.9999;99.99999];
  set(gca, "ytick", norminv(t/100), "yticklabel", num2str(t));
  grid on

  # Set view range with a bit of space around data
  miny = min(Y(:)); minq = min(q(1),norminv(0.05));
  maxy = max(Y(:)); maxq = max(q(end),norminv(0.95));
  yspace = (maxy-miny)*0.05; qspace = (q(end)-q(1))*0.05;
  axis ([miny-yspace, maxy+yspace, minq-qspace, maxq+qspace]);

end
statistics/inst/harmmean.m0000644000175000017500000000215111741556364015544 0ustar asneltasnelt
## Copyright (C) 2001 Paul Kienzle
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} harmmean (@var{x})
## @deftypefnx{Function File} harmmean (@var{x}, @var{dim})
## Compute the harmonic mean.
##
## This function does the same as @code{mean (x, "h")}.
##
## @seealso{mean}
## @end deftypefn

function a = harmmean(x, dim)
  if (nargin == 1)
    a = mean(x, "h");
  elseif (nargin == 2)
    a = mean(x, "h", dim);
  else
    print_usage;
  endif
endfunction
statistics/inst/geostat.m0000644000175000017500000000455011741556364015427 0ustar asneltasnelt
## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{m}, @var{v}] =} geostat (@var{p})
## Compute mean and variance of the geometric distribution.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{p} is the rate parameter of the geometric distribution. The
## elements of @var{p} must be probabilities
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{m} is the mean of the geometric distribution
##
## @item
## @var{v} is the variance of the geometric distribution
## @end itemize
##
## @subheading Example
##
## @example
## @group
## p = 1 ./ (1:6);
## [m, v] = geostat (p)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC,
## 2001.
##
## @item
## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic
## Processes}.
McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the geometric distribution function [m, v] = geostat (p) # Check arguments if (nargin != 1) print_usage (); endif if (! isempty (p) && ! ismatrix (p)) error ("geostat: p must be a numeric matrix"); endif # Calculate moments q = 1 - p; m = q ./ p; v = q ./ (p .^ 2); # Continue argument check k = find (! (p >= 0) | ! (p <= 1)); if (any (k)) m(k) = NaN; v(k) = NaN; endif endfunction %!test %! p = 1 ./ (1:6); %! [m, v] = geostat (p); %! assert (m, [0, 1, 2, 3, 4, 5], 0.001); %! assert (v, [0, 2, 6, 12, 20, 30], 0.001); statistics/inst/boxplot.m0000644000175000017500000002470512246445126015446 0ustar asneltasnelt## Copyright (C) 2002 Alberto Terruzzi ## Copyright (C) 2006 Alberto Pose ## Copyright (C) 2011 Pascal Dupuis ## Copyright (C) 2012 Juan Pablo Carbajal ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{s} =} boxplot (@var{data}, @var{notched}, @ ## @var{symbol}, @var{vertical}, @var{maxwhisker}, @dots{}) ## @deftypefnx {Function File} {[@dots{} @var{h}]=} boxplot (@dots{}) ## ## Produce a box plot. 
## ## The box plot is a graphical display that simultaneously describes several ## important features of a data set, such as center, spread, departure from ## symmetry, and identification of observations that lie unusually far from ## the bulk of the data. ## ## @var{data} is a matrix with one column for each data set, or a cell ## vector with one cell for each data set. ## ## @var{notched} = 1 produces a notched-box plot. Notches represent a robust ## estimate of the uncertainty about the median. ## ## @var{notched} = 0 (default) produces a rectangular box plot. ## ## @var{notched} in (0,1) produces a notch of the specified depth. ## Values of @var{notched} outside (0,1) are amusing if not exactly practical. ## ## @var{symbol} sets the symbols for the outlier values. The default symbol for ## points that lie outside 3 times the interquartile range (IQR) is 'o'; the ## default symbol for points between 1.5 and 3 times the IQR ## is '+'. ## ## @var{symbol} = '.' marks all outliers with '.', both the points between 1.5 ## and 3 times the IQR and the points outside 3 times the IQR. ## ## @var{symbol} = ['x','*'] marks points between 1.5 and 3 times the IQR with ## 'x' and points outside 3 times the IQR with '*'. ## ## @var{vertical} = 0 makes the boxes horizontal; by default @var{vertical} = 1. ## ## @var{maxwhisker} defines the length of the whiskers as a function of the IQR ## (default = 1.5). If @var{maxwhisker} = 0 then @code{boxplot} displays all data ## values outside the box using the plotting symbol for points that lie ## outside 3 times the IQR. ## ## Supplemental arguments are concatenated and passed to plot.
## ## The returned matrix @var{s} has one column for each data set as follows: ## ## @multitable @columnfractions .1 .8 ## @item 1 @tab Minimum ## @item 2 @tab 1st quartile ## @item 3 @tab 2nd quartile (median) ## @item 4 @tab 3rd quartile ## @item 5 @tab Maximum ## @item 6 @tab Lower confidence limit for median ## @item 7 @tab Upper confidence limit for median ## @end multitable ## ## The returned structure @var{h} has handles to the plot elements, allowing ## customization of the visualization using set/get functions. ## ## Example ## ## @example ## title ("Grade 3 heights"); ## axis ([0,3]); ## tics ("x", 1:2, @{"girls"; "boys"@}); ## boxplot (@{randn(10,1)*5+140, randn(13,1)*8+135@}); ## @end example ## ## @end deftypefn ## Author: Alberto Terruzzi ## Version: 1.4 ## Created: 6 January 2002 ## Version: 1.4.1 ## Author: Alberto Pose ## Updated: 3 September 2006 ## - Replaced deprecated is_nan_or_na(X) with (isnan(X) | isna(X)) ## (now works with this software 2.9.7 and forward) ## Version: 1.4.2 ## Author: Pascal Dupuis ## Updated: 14 October 2011 ## - Added support for named arguments ## Version: 1.4.2 ## Author: Juan Pablo Carbajal ## Updated: 01 March 2012 ## - Returns structure with handles to plot elements ## - Added example as demo %# function s = boxplot (data,notched,symbol,vertical,maxwhisker) function [s hs] = boxplot (data, varargin) ## assign parameter defaults if (nargin < 1) print_usage; endif %# default values maxwhisker = 1.5; vertical = 1; symbol = ['+', 'o']; notched = 0; plot_opts = {}; %# Optional arguments analysis numarg = nargin - 1; option_args = ['Notch'; 'Symbol'; 'Vertical'; 'Maxwhisker']; indopt = 1; while (numarg) dummy = varargin{indopt++}; if (!ischar (dummy)) %# old way: positional argument switch indopt case 2 notched = dummy; case 4 vertical = dummy; case 5 maxwhisker = dummy; otherwise error("No positional argument allowed at position %d", --indopt); endswitch numarg--; continue; else if (3 == indopt && length (dummy) <= 2)
symbol = dummy; numarg--; continue; else tt = strmatch(dummy, option_args); switch (tt) case 1 notched = varargin{indopt}; case 2 symbol = varargin{indopt}; case 3 vertical = varargin{indopt}; case 4 maxwhisker = varargin{indopt}; otherwise %# take two args and append them to plot_opts plot_opts(1, end+1:end+2) = {dummy, varargin{indopt}}; endswitch endif indopt++; numarg -= 2; endif endwhile if (1 == length (symbol)) symbol(2) = symbol(1); endif if (1 == notched) notched = 0.25; endif a = 1-notched; ## figure out how many data sets we have if (iscell (data)) nc = length (data); else if (isvector (data)) data = data(:); endif nc = columns (data); endif ## compute statistics ## s will contain ## 1,5 min and max ## 2,3,4 1st, 2nd and 3rd quartile ## 6,7 lower and upper confidence intervals for median s = zeros (7,nc); box = zeros (1,nc); whisker_x = ones (2,1)*[1:nc,1:nc]; whisker_y = zeros (2,2*nc); outliers_x = []; outliers_y = []; outliers2_x = []; outliers2_y = []; for indi = (1:nc) ## Get the next data set from the array or cell array if (iscell (data)) col = data{indi}(:); else col = data(:, indi); endif ## Skip missing data col(isnan (col) | isna (col)) = []; ## Remember the data length nd = length (col); box(indi) = nd; if (nd > 1) ## min,max and quartiles s(1:5, indi) = statistics (col)(1:5); ## confidence interval for the median est = 1.57*(s(4, indi)-s(2, indi))/sqrt (nd); s(6, indi) = max ([s(3, indi)-est, s(2, indi)]); s(7, indi) = min ([s(3, indi)+est, s(4, indi)]); ## whiskers out to the last point within the desired inter-quartile range IQR = maxwhisker*(s(4, indi)-s(2, indi)); whisker_y(:, indi) = [min(col(col >= s(2, indi)-IQR)); s(2, indi)]; whisker_y(:,nc+indi) = [max(col(col <= s(4, indi)+IQR)); s(4, indi)]; ## outliers beyond 1 and 2 inter-quartile ranges outliers = col((col < s(2, indi)-IQR & col >= s(2, indi)-2*IQR) | (col > s(4, indi)+IQR & col <= s(4, indi)+2*IQR)); outliers2 = col(col < s(2, indi)-2*IQR | col > s(4, indi)+2*IQR); outliers_x 
= [outliers_x; indi*ones(size(outliers))]; outliers_y = [outliers_y; outliers]; outliers2_x = [outliers2_x; indi*ones(size(outliers2))]; outliers2_y = [outliers2_y; outliers2]; elseif (1 == nd) ## all statistics collapse to the value of the point s(:, indi) = col; ## single point data sets are plotted as outliers. outliers_x = [outliers_x; indi]; outliers_y = [outliers_y; col]; else ## no statistics if no points s(:, indi) = NaN; end end ## Note which boxes don't have enough stats chop = find (box <= 1); ## Draw a box around the quartiles, with width proportional to the number of ## items in the box. Draw notches if desired. box *= 0.4/max (box); quartile_x = ones (11,1)*[1:nc] + [-a;-1;-1;1;1;a;1;1;-1;-1;-a]*box; quartile_y = s([3,7,4,4,7,3,6,2,2,6,3],:); ## Draw a line through the median median_x = ones (2,1)*[1:nc] + [-a;+a]*box; median_y = s([3,3],:); ## Chop all boxes which don't have enough stats quartile_x(:, chop) = []; quartile_y(:, chop) = []; whisker_x(:,[chop, chop+nc]) = []; whisker_y(:,[chop, chop+nc]) = []; median_x(:, chop) = []; median_y(:, chop) = []; ## Add caps to the remaining whiskers cap_x = whisker_x; cap_x(1, :) -= 0.05; cap_x(2, :) += 0.05; cap_y = whisker_y([1, 1], :); #quartile_x,quartile_y #whisker_x,whisker_y #median_x,median_y #cap_x,cap_y ## Do the plot if (vertical) if (isempty (plot_opts)) h = plot (quartile_x, quartile_y, "b;;", whisker_x, whisker_y, "b;;", cap_x, cap_y, "b;;", median_x, median_y, "r;;", outliers_x, outliers_y, [symbol(1), "r;;"], outliers2_x, outliers2_y, [symbol(2), "r;;"]); else h = plot (quartile_x, quartile_y, "b;;", whisker_x, whisker_y, "b;;", cap_x, cap_y, "b;;", median_x, median_y, "r;;", outliers_x, outliers_y, [symbol(1), "r;;"], outliers2_x, outliers2_y, [symbol(2), "r;;"], plot_opts{:}); endif else if (isempty (plot_opts)) h = plot (quartile_y, quartile_x, "b;;", whisker_y, whisker_x, "b;;", cap_y, cap_x, "b;;", median_y, median_x, "r;;", outliers_y, outliers_x, [symbol(1), "r;;"], outliers2_y, 
outliers2_x, [symbol(2), "r;;"]); else h = plot (quartile_y, quartile_x, "b;;", whisker_y, whisker_x, "b;;", cap_y, cap_x, "b;;", median_y, median_x, "r;;", outliers_y, outliers_x, [symbol(1), "r;;"], outliers2_y, outliers2_x, [symbol(2), "r;;"], plot_opts{:}); endif endif % Distribute handles nq = 1:size(quartile_x,2); hs.box = h(nq); nw = nq(end) + [1:2*size(whisker_x,2)]; hs.whisker = h(nw); nm = nw(end)+ [1:size(median_x,2)]; hs.median = h(nm); no = nm; if ~isempty (outliers_y) no = nm(end) + [1:size(outliers_y,2)]; hs.outliers = h(no); end if ~isempty (outliers2_y) no2 = no(end) + [1:size(outliers2_y,2)]; hs.outliers2 = h(no2); end endfunction %!demo %! axis ([0,3]); %! boxplot ({randn(10,1)*5+140, randn(13,1)*8+135}); %! set(gca (), "xtick", [1 2], "xticklabel", {"girls", "boys"}) %! title ("Grade 3 heights"); statistics/inst/mad.m0000644000175000017500000000544011741556364014521 0ustar asneltasnelt## Copyright (C) 2001 Paul Kienzle ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} mad (@var{x}) ## @deftypefnx{Function File} mad (@var{x}, @var{flag}) ## @deftypefnx{Function File} mad (@var{x}, @var{flag}, @var{dim}) ## Compute the mean/median absolute deviation of @var{x}. 
## ## The mean absolute deviation is computed as ## ## @example ## mean (abs (@var{x} - mean (@var{x}))) ## @end example ## ## and the median absolute deviation is computed as ## ## @example ## median (abs (@var{x} - median (@var{x}))) ## @end example ## ## Elements of @var{x} containing NaN or NA values are ignored during computations. ## ## If @var{flag} is 0, the mean absolute deviation is computed, and if @var{flag} ## is 1, the median absolute deviation is computed. By default @var{flag} is 0. ## ## This is done along the dimension @var{dim} of @var{x}. If this variable is not ## given, the mean/median absolute deviation is computed along the first non-singleton dimension of ## @var{x}. ## ## @seealso{std} ## @end deftypefn function a = mad (X, flag = 0, dim = []) ## Check input if (nargin < 1) print_usage (); endif if (nargin > 3) error ("mad: too many input arguments"); endif if (!isnumeric (X)) error ("mad: first input must be numeric"); endif if (isempty (dim)) dim = min (find (size (X) > 1)); if (isempty(dim)) dim = 1; endif endif if (!isscalar (flag)) error ("mad: second input argument must be a scalar"); endif if (!isscalar (dim)) error ("mad: dimension argument must be a scalar"); endif if (flag == 0) f = @nanmean; else f = @nanmedian; endif ## Compute the mad if (prod(size(X)) != size(X,dim)) sz = ones (1, length (size (X))); sz (dim) = size (X,dim); a = f (abs (X - repmat (f (X, dim), sz)), dim); elseif (all (size (X) > 1)) a = f (abs (X - ones (size(X, 1), 1) * f (X, dim)), dim); else a = f (abs (X - f(X, dim)), dim); endif endfunction ## Tests %!assert (mad(1), 0); %!test %! X = eye(3); abs_mean = [4/9, 4/9, 4/9]; abs_median=[0,0,0]; %! assert(mad(X), abs_mean, eps); %! assert(mad(X, 0), abs_mean, eps); %! assert(mad(X,1), abs_median); statistics/inst/hist3.m0000644000175000017500000001134412175466150015005 0ustar asneltasnelt## Copyright (C) 2007 Roman Stanchak ## ## This file is part of Octave.
## ## Octave is free software; you can redistribute it and/or modify it ## under the terms of the GNU General Public License as published by ## the Free Software Foundation; either version 2, or (at your option) ## any later version. ## ## Octave is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Octave; see the file COPYING. If not, write to the Free ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA ## 02110-1301, USA. ## -*- texinfo -*- ## @deftypefn {Function File} hist3(@var{X}) ## @deftypefnx {Function File} hist3(@var{X}, @var{nbins}) ## @deftypefnx {Function File} hist3(@var{X}, 'Nbins', @var{nbins}) ## @deftypefnx {Function File} hist3(@var{X}, @var{centers}) ## @deftypefnx {Function File} hist3(@var{X}, 'Centers', @var{centers}) ## @deftypefnx {Function File} hist3(@var{X}, 'Edges', @var{edges}) ## @deftypefnx {Function File} {@var{N} =} hist3(@var{X}, ...) ## @deftypefnx {Function File} {[@var{N}, @var{C}] =} hist3(@var{X}, ...) ## Plots a 2D histogram of the N x 2 matrix @var{X} with 10 equally spaced ## bins in both the x and y direction using the @code{mesh} function ## ## The number of equally spaced bins to compute histogram can be specified with ## @var{nbins}. If @var{nbins} is a 2 element vector, use the two values as the ## number of bins in the x and y axis, respectively, otherwise, use the same ## value for each. ## ## The centers of the histogram bins can be specified with @var{centers}. ## @var{centers} should be a cell array containing two arrays of the bin ## centers on the x and y axis, respectively. ## ## The edges of the histogram bins can be specified with @var{edges}. 
## @var{edges} should be a cell array containing two arrays of the bin edges ## on the x and y axis, respectively. ## ## @var{N} returns the 2D array of bin counts, and does not plot the ## histogram. ## ## @var{N} and @var{C} return the 2D array of bin counts in @var{N} and the ## bin centers in the 2 element cell array @var{C}, and do not plot the ## histogram. ## ## @seealso{hist, mesh} ## @end deftypefn ## Authors: Paul Kienzle (segments borrowed from hist2d), ## Roman Stanchak (addition of matlab compatible syntax, bin edge arg) function varargout = hist3(varargin) methods={'Nbins', 'Centers', 'Edges'}; method=1; xbins=10; ybins=10; edges={}; M=varargin{1}; if nargin>=2, % is a binning method specified? if isstr(varargin{2}), method = find(strcmp(methods,varargin{2})); if isempty(method), error('Unknown property string'); elseif nargin < 3, error('Expected an additional argument'); elseif method==2, xbins = varargin{3}{1}; ybins = varargin{3}{2}; elseif method==3 edges = varargin{3}; end elseif iscell(varargin{2}) % second argument contains centers method = 2; xbins = varargin{2}{1}; ybins = varargin{2}{2}; elseif isscalar(varargin{2}), xbins = ybins = varargin{2}; elseif isvector(varargin{2}), % second argument contains the number of bins xbins = varargin{2}(1); ybins = varargin{2}(2); else error('Unsupported type for 2nd argument'); end end % If n bins, find centers based on n+1 bin edges if method==1, lo = min(M); hi = max(M); if isscalar(xbins) xbins = linspace(lo(1),hi(1),xbins+1); xbins = (xbins(1:end-1)+xbins(2:end))/2; end if isscalar(ybins) ybins = linspace(lo(2),hi(2),ybins+1); ybins = (ybins(1:end-1)+ybins(2:end))/2; end method=2; end if method==2, % centers specified, compute edges xcut = (xbins(1:end-1)+xbins(2:end))/2; ycut = (ybins(1:end-1)+ybins(2:end))/2; xidx = lookup(xcut,M(:,1))+1; yidx = lookup(ycut,M(:,2))+1; else, % edges specified.
Filter points outside edge range xidx = lookup(edges{1},M(:,1)); yidx = lookup(edges{2},M(:,2)); idx = find(xidx>0); idx = intersect(idx, find(xidx0)); idx = intersect(idx, find(yidx1 varargout{2} = {xbins,ybins}; end else mesh(xbins,ybins,full(counts)); end end statistics/inst/mnpdf.m0000644000175000017500000000776611751505627015076 0ustar asneltasnelt## Copyright (C) 2012 Arno Onken ## ## This program is free software: you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by ## the Free Software Foundation, either version 3 of the License, or ## (at your option) any later version. ## ## This program is distributed in the hope that it will be useful, ## but WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with this program. If not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{y} =} mnpdf (@var{x}, @var{p}) ## Compute the probability density function of the multinomial distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{x} is vector with a single sample of a multinomial distribution with ## parameter @var{p} or a matrix of random samples from multinomial ## distributions. In the latter case, each row of @var{x} is a sample from a ## multinomial distribution with the corresponding row of @var{p} being its ## parameter. ## ## @item ## @var{p} is a vector with the probabilities of the categories or a matrix ## with each row containing the probabilities of a multinomial sample. ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{y} is a vector of probabilites of the random samples @var{x} from the ## multinomial distribution with corresponding parameter @var{p}. 
The parameter ## @var{n} of the multinomial distribution is the sum of the elements of each ## row of @var{x}. The length of @var{y} is the number of rows of @var{x}. ## If a row of @var{p} does not sum to @code{1}, then the corresponding element ## of @var{y} will be @code{NaN}. ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## x = [1, 4, 2]; ## p = [0.2, 0.5, 0.3]; ## y = mnpdf (x, p); ## @end group ## ## @group ## x = [1, 4, 2; 1, 0, 9]; ## p = [0.2, 0.5, 0.3; 0.1, 0.1, 0.8]; ## y = mnpdf (x, p); ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, 2001. ## ## @item ## Merran Evans, Nicholas Hastings and Brian Peacock. @cite{Statistical ## Distributions}. pages 134-136, Wiley, New York, third edition, 2000. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: PDF of the multinomial distribution function y = mnpdf (x, p) # Check arguments if (nargin != 2) print_usage (); endif if (! ismatrix (x) || any (x(:) < 0 | round (x(:)) != x(:))) error ("mnpdf: x must be a matrix of non-negative integer values"); endif if (! ismatrix (p) || any (p(:) < 0)) error ("mnpdf: p must be a non-empty matrix with rows of probabilities"); endif # Adjust input sizes if (! isvector (x) || !
isvector (p)) if (isvector (x)) x = x(:)'; endif if (isvector (p)) p = p(:)'; endif if (size (x, 1) == 1 && size (p, 1) > 1) x = repmat (x, size (p, 1), 1); elseif (size (x, 1) > 1 && size (p, 1) == 1) p = repmat (p, size (x, 1), 1); endif endif # Continue argument check if (any (size (x) != size (p))) error ("mnpdf: x and p must have compatible sizes"); endif # Count total number of elements of each multinomial sample n = sum (x, 2); # Compute probability density function of the multinomial distribution t = x .* log (p); t(x == 0) = 0; y = exp (gammaln (n+1) - sum (gammaln (x+1), 2) + sum (t, 2)); # Set invalid rows to NaN k = (abs (sum (p, 2) - 1) > 1e-6); y(k) = NaN; endfunction %!test %! x = [1, 4, 2]; %! p = [0.2, 0.5, 0.3]; %! y = mnpdf (x, p); %! assert (y, 0.11812, 0.001); %!test %! x = [1, 4, 2; 1, 0, 9]; %! p = [0.2, 0.5, 0.3; 0.1, 0.1, 0.8]; %! y = mnpdf (x, p); %! assert (y, [0.11812; 0.13422], 0.001); statistics/inst/unifstat.m0000644000175000017500000000645011741556364015617 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} unifstat (@var{a}, @var{b}) ## Compute mean and variance of the continuous uniform distribution. 
## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{a} is the first parameter of the continuous uniform distribution ## ## @item ## @var{b} is the second parameter of the continuous uniform distribution ## @end itemize ## @var{a} and @var{b} must be of common size or one of them must be scalar ## and @var{a} must be less than @var{b} ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the continuous uniform distribution ## ## @item ## @var{v} is the variance of the continuous uniform distribution ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## a = 1:6; ## b = 2:2:12; ## [m, v] = unifstat (a, b) ## @end group ## ## @group ## [m, v] = unifstat (a, 10) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the continuous uniform distribution function [m, v] = unifstat (a, b) # Check arguments if (nargin != 2) print_usage (); endif if (! isempty (a) && ! ismatrix (a)) error ("unifstat: a must be a numeric matrix"); endif if (! isempty (b) && ! ismatrix (b)) error ("unifstat: b must be a numeric matrix"); endif if (! isscalar (a) || ! isscalar (b)) [retval, a, b] = common_size (a, b); if (retval > 0) error ("unifstat: a and b must be of common size or scalar"); endif endif # Calculate moments m = (a + b) ./ 2; v = ((b - a) .^ 2) ./ 12; # Continue argument check k = find (! (-Inf < a) | ! (a < b) | ! (b < Inf)); if (any (k)) m(k) = NaN; v(k) = NaN; endif endfunction %!test %! a = 1:6; %! b = 2:2:12; %! [m, v] = unifstat (a, b); %! 
expected_m = [1.5000, 3.0000, 4.5000, 6.0000, 7.5000, 9.0000]; %! expected_v = [0.0833, 0.3333, 0.7500, 1.3333, 2.0833, 3.0000]; %! assert (m, expected_m, 0.001); %! assert (v, expected_v, 0.001); %!test %! a = 1:6; %! [m, v] = unifstat (a, 10); %! expected_m = [5.5000, 6.0000, 6.5000, 7.0000, 7.5000, 8.0000]; %! expected_v = [6.7500, 5.3333, 4.0833, 3.0000, 2.0833, 1.3333]; %! assert (m, expected_m, 0.001); %! assert (v, expected_v, 0.001); statistics/inst/nanmedian.m0000644000175000017500000000522511741556364015713 0ustar asneltasnelt## Copyright (C) 2001 Paul Kienzle ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} @var{v} = nanmedian (@var{x}) ## @deftypefnx{Function File} @var{v} = nanmedian (@var{x}, @var{dim}) ## Compute the median of data while ignoring NaN values. ## ## This function is identical to the @code{median} function except that NaN values ## are ignored. If all values are NaN, the median is returned as NaN. 
## ## @seealso{median, nanmin, nanmax, nansum, nanmean} ## @end deftypefn function v = nanmedian (X, varargin) if nargin < 1 || nargin > 2 print_usage; endif if nargin < 2 dim = min(find(size(X)>1)); if isempty(dim), dim=1; endif; else dim = varargin{:}; endif sz = size (X); if (prod (sz) > 1) ## Find lengths of datasets after excluding NaNs; valid datasets ## are those that are not empty after you remove all the NaNs n = sz(dim) - sum (isnan(X),varargin{:}); ## When n is equal to zero, force it to one, so that median ## picks up a NaN value below n (n==0) = 1; ## Sort the datasets, with the NaN going to the end of the data X = sort (X, varargin{:}); ## Determine the offset for each column in single index mode colidx = reshape((0:(prod(sz) / sz(dim) - 1)), size(n)); colidx = floor(colidx / prod(sz(1:dim-1))) * prod(sz(1:dim)) + ... mod(colidx,prod(sz(1:dim-1))); stride = prod(sz(1:dim-1)); ## Average the two central values of the sorted list to compute ## the median, but only do so for valid rows. If the dataset ## is odd length, the single central value will be used twice. ## E.g., ## for n==5, ceil(2.5+0.5) is 3 and floor(2.5+0.5) is also 3 ## for n==6, ceil(3.0+0.5) is 4 and floor(3.0+0.5) is 3 ## correction made for stride of data "stride*ceil(2.5-0.5)+1" v = (X(colidx + stride*ceil(n./2-0.5) + 1) + ... X(colidx + stride*floor(n./2-0.5) + 1)) ./ 2; else error ("nanmedian: invalid matrix argument"); endif endfunction statistics/inst/mvncdf.m0000644000175000017500000001115011741556364015230 0ustar asneltasnelt## Copyright (C) 2008 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. 
## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{p} =} mvncdf (@var{x}, @var{mu}, @var{sigma}) ## @deftypefnx {Function File} {} mvncdf (@var{a}, @var{x}, @var{mu}, @var{sigma}) ## @deftypefnx {Function File} {[@var{p}, @var{err}] =} mvncdf (@dots{}) ## Compute the cumulative distribution function of the multivariate ## normal distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{x} is the upper limit for integration where each row corresponds ## to an observation. ## ## @item ## @var{mu} is the mean. ## ## @item ## @var{sigma} is the correlation matrix. ## ## @item ## @var{a} is the lower limit for integration where each row corresponds ## to an observation. @var{a} must have the same size as @var{x}. ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{p} is the cumulative distribution at each row of @var{x} and ## @var{a}. ## ## @item ## @var{err} is the estimated error. ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## x = [1 2]; ## mu = [0.5 1.5]; ## sigma = [1.0 0.5; 0.5 1.0]; ## p = mvncdf (x, mu, sigma) ## @end group ## ## @group ## a = [-inf 0]; ## p = mvncdf (a, x, mu, sigma) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Alan Genz and Frank Bretz. Numerical Computation of Multivariate ## t-Probabilities with Application to Power Calculation of Multiple ## Contrasts. @cite{Journal of Statistical Computation and Simulation}, ## 63, pages 361-378, 1999.
## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: CDF of the multivariate normal distribution function [p, err] = mvncdf (varargin) # Monte-Carlo confidence factor for the standard error: 99 % gamma = 2.5; # Tolerance err_eps = 1e-3; if (length (varargin) == 1) x = varargin{1}; mu = []; sigma = eye (size (x, 2)); a = -Inf .* ones (size (x)); elseif (length (varargin) == 3) x = varargin{1}; mu = varargin{2}; sigma = varargin{3}; a = -Inf .* ones (size (x)); elseif (length (varargin) == 4) a = varargin{1}; x = varargin{2}; mu = varargin{3}; sigma = varargin{4}; else print_usage (); endif # Dimension q = size (sigma, 1); cases = size (x, 1); # Default value for mu if (isempty (mu)) mu = zeros (1, q); endif # Check parameters if (size (x, 2) != q) error ("mvncdf: x must have the same number of columns as sigma"); endif if (any (size (x) != size (a))) error ("mvncdf: a must have the same size as x"); endif if (isscalar (mu)) mu = ones (1, q) .* mu; elseif (! isvector (mu) || size (mu, 2) != q) error ("mvncdf: mu must be a scalar or a vector with the same number of columns as x"); endif x = x - repmat (mu, cases, 1); if (q < 1 || size (sigma, 2) != q || any (any (sigma != sigma')) || min (eig (sigma)) <= 0) error ("mvncdf: sigma must be nonempty symmetric positive definite"); endif c = chol (sigma)'; # Number of integral transformations n = 1; p = zeros (cases, 1); varsum = zeros (cases, 1); err = ones (cases, 1) .* err_eps; # Apply crude Monte-Carlo estimation while any (err >= err_eps) # Sample from q-1 dimensional unit hypercube w = rand (cases, q - 1); # Transformation of the multivariate normal integral dvev = normcdf ([a(:, 1) / c(1, 1), x(:, 1) / c(1, 1)]); dv = dvev(:, 1); ev = dvev(:, 2); fv = ev - dv; y = zeros (cases, q - 1); for i = 1:(q - 1) y(:, i) = norminv (dv + w(:, i) .* (ev - dv)); dvev = normcdf ([(a(:, i + 1) - c(i + 1, 1:i) .* y(:, 1:i)) ./ c(i + 1, i + 1), (x(:, i + 1) - c(i + 1, 1:i) .* y(:, 1:i)) ./ c(i + 1, i + 1)]); dv = 
dvev(:, 1); ev = dvev(:, 2); fv = (ev - dv) .* fv; endfor n++; # Estimate standard error varsum += (n - 1) .* ((fv - p) .^ 2) ./ n; err = gamma .* sqrt (varsum ./ (n .* (n - 1))); p += (fv - p) ./ n; endwhile endfunction statistics/inst/gevstat.m0000644000175000017500000000570312260307464015427 0ustar asneltasnelt## Copyright (C) 2012 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} gevstat (@var{k}, @var{sigma}, @var{mu}) ## Compute the mean and variance of the generalized extreme value (GEV) distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{k} is the shape parameter of the GEV distribution. (Also denoted gamma or xi.) ## @item ## @var{sigma} is the scale parameter of the GEV distribution. The elements ## of @var{sigma} must be positive. ## @item ## @var{mu} is the location parameter of the GEV distribution. ## @end itemize ## The inputs must be of common size, or some of them must be scalar. 
## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the GEV distribution ## ## @item ## @var{v} is the variance of the GEV distribution ## @end itemize ## @seealso{gevcdf, gevfit, gevinv, gevlike, gevpdf, gevrnd} ## @end deftypefn ## Author: Nir Krakauer ## Description: Moments of the generalized extreme value distribution function [m, v] = gevstat (k, sigma, mu) # Check arguments if (nargin < 3) print_usage (); endif if (isempty (k) || isempty (sigma) || isempty (mu) || ~ismatrix (k) || ~ismatrix (sigma) || ~ismatrix (mu)) error ("gevstat: inputs must be numeric matrices"); endif [retval, k, sigma, mu] = common_size (k, sigma, mu); if (retval > 0) error ("gevstat: inputs must be of common size or scalars"); endif eg = 0.57721566490153286; %Euler-Mascheroni constant m = v = k; #find the mean m(k >= 1) = Inf; m(k == 0) = mu(k == 0) + eg*sigma(k == 0); m(k < 1 & k ~= 0) = mu(k < 1 & k ~= 0) + sigma(k < 1 & k ~= 0) .* (gamma(1-k(k < 1 & k ~= 0)) - 1) ./ k(k < 1 & k ~= 0); #find the variance v(k >= 0.5) = Inf; v(k == 0) = (pi^2 / 6) * sigma(k == 0) .^ 2; v(k < 0.5 & k ~= 0) = (gamma(1-2*k(k < 0.5 & k ~= 0)) - gamma(1-k(k < 0.5 & k ~= 0)).^2) .* (sigma(k < 0.5 & k ~= 0) ./ k(k < 0.5 & k ~= 0)) .^ 2; endfunction %!test %! k = [-1 -0.5 0 0.2 0.4 0.5 1]; %! sigma = 2; %! mu = 1; %! [m, v] = gevstat (k, sigma, mu); %! expected_m = [1 1.4551 2.1544 2.6423 3.4460 4.0898 Inf]; %! expected_v = [4 3.4336 6.5797 13.3761 59.3288 Inf Inf]; %! assert (m, expected_m, -0.001); %! assert (v, expected_v, -0.001); statistics/inst/gevinv.m0000644000175000017500000000621212070346332015240 0ustar asneltasnelt## Copyright (C) 2012 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. 
## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{X} =} gevinv (@var{P}, @var{k}, @var{sigma}, @var{mu}) ## Compute a desired quantile (inverse CDF) of the generalized extreme value (GEV) distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{P} is the desired quantile of the GEV distribution. (Between 0 and 1.) ## @item ## @var{k} is the shape parameter of the GEV distribution. (Also denoted gamma or xi.) ## @item ## @var{sigma} is the scale parameter of the GEV distribution. The elements ## of @var{sigma} must be positive. ## @item ## @var{mu} is the location parameter of the GEV distribution. ## @end itemize ## The inputs must be of common size, or some of them must be scalar. ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{X} is the value corresponding to each quantile of the GEV distribution ## @end itemize ## @subheading References ## ## @enumerate ## @item ## Rolf-Dieter Reiss and Michael Thomas. @cite{Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields}. Chapter 1, pages 16-17, Springer, 2007. ## @item ## J. R. M. Hosking (2012). @cite{L-moments}. R package, version 1.6. URL: http://CRAN.R-project.org/package=lmom. 
##
## @end enumerate
## @seealso{gevcdf, gevfit, gevlike, gevpdf, gevrnd, gevstat}
## @end deftypefn

## Author: Nir Krakauer
## Description: Inverse CDF of the generalized extreme value distribution

function [X] = gevinv (P, k = 0, sigma = 1, mu = 0)

  [retval, P, k, sigma, mu] = common_size (P, k, sigma, mu);
  if (retval > 0)
    error ("gevinv: inputs must be of common size or scalars");
  endif

  X = P;

  llP = log(-log(P));
  kllP = k .* llP;

  ii = (abs(kllP) < 1E-4);
  # Use the Taylor series expansion of the exponential to avoid roundoff
  # error or dividing by zero when k is small:
  # (exp(-u) - 1) / u = -(1 - u/2 + u^2/6 - ...), with u = k .* llP
  X(ii) = mu(ii) - sigma(ii) .* llP(ii) .* (1 - kllP(ii) ./ 2 .* (1 - kllP(ii) ./ 3));
  X(~ii) = mu(~ii) + (sigma(~ii) ./ k(~ii)) .* (exp(-kllP(~ii)) - 1);

endfunction

%!test
%! p = 0.1:0.1:0.9;
%! k = 0;
%! sigma = 1;
%! mu = 0;
%! x = gevinv (p, k, sigma, mu);
%! c = gevcdf(x, k, sigma, mu);
%! assert (c, p, 0.001);

%!test
%! p = 0.1:0.1:0.9;
%! k = 1;
%! sigma = 1;
%! mu = 0;
%! x = gevinv (p, k, sigma, mu);
%! c = gevcdf(x, k, sigma, mu);
%! assert (c, p, 0.001);

%!test
%! p = 0.1:0.1:0.9;
%! k = 0.3;
%! sigma = 1;
%! mu = 0;
%! x = gevinv (p, k, sigma, mu);
%! c = gevcdf(x, k, sigma, mu);
%! assert (c, p, 0.001);
statistics/inst/hmmviterbi.m0000644000175000017500000002203711741556364016127 0ustar asneltasnelt
## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.
## -*- texinfo -*- ## @deftypefn {Function File} {@var{vpath} =} hmmviterbi (@var{sequence}, @var{transprob}, @var{outprob}) ## @deftypefnx {Function File} {} hmmviterbi (@dots{}, 'symbols', @var{symbols}) ## @deftypefnx {Function File} {} hmmviterbi (@dots{}, 'statenames', @var{statenames}) ## Use the Viterbi algorithm to find the Viterbi path of a hidden Markov ## model given a sequence of outputs. The model assumes that the generation ## starts in state @code{1} at step @code{0} but does not include step ## @code{0} in the generated states and sequence. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{sequence} is the vector of length @var{len} of given outputs. The ## outputs must be integers ranging from @code{1} to ## @code{columns (outprob)}. ## ## @item ## @var{transprob} is the matrix of transition probabilities of the states. ## @code{transprob(i, j)} is the probability of a transition to state ## @code{j} given state @code{i}. ## ## @item ## @var{outprob} is the matrix of output probabilities. ## @code{outprob(i, j)} is the probability of generating output @code{j} ## given state @code{i}. ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{vpath} is the vector of the same length as @var{sequence} of the ## estimated hidden states. The states are integers ranging from @code{1} to ## @code{columns (transprob)}. ## @end itemize ## ## If @code{'symbols'} is specified, then @var{sequence} is expected to be a ## sequence of the elements of @var{symbols} instead of integers ranging ## from @code{1} to @code{columns (outprob)}. @var{symbols} can be a cell array. ## ## If @code{'statenames'} is specified, then the elements of ## @var{statenames} are used for the states in @var{vpath} instead of ## integers ranging from @code{1} to @code{columns (transprob)}. ## @var{statenames} can be a cell array. 
## ## @subheading Examples ## ## @example ## @group ## transprob = [0.8, 0.2; 0.4, 0.6]; ## outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1]; ## [sequence, states] = hmmgenerate (25, transprob, outprob) ## vpath = hmmviterbi (sequence, transprob, outprob) ## @end group ## ## @group ## symbols = @{'A', 'B', 'C'@}; ## statenames = @{'One', 'Two'@}; ## [sequence, states] = hmmgenerate (25, transprob, outprob, ## 'symbols', symbols, 'statenames', statenames) ## vpath = hmmviterbi (sequence, transprob, outprob, ## 'symbols', symbols, 'statenames', statenames) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected ## Applications in Speech Recognition. @cite{Proceedings of the IEEE}, ## 77(2), pages 257-286, February 1989. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Viterbi path of a hidden Markov model function vpath = hmmviterbi (sequence, transprob, outprob, varargin) # Check arguments if (nargin < 3 || mod (length (varargin), 2) != 0) print_usage (); endif if (! ismatrix (transprob)) error ("hmmviterbi: transprob must be a non-empty numeric matrix"); endif if (! 
ismatrix (outprob)) error ("hmmviterbi: outprob must be a non-empty numeric matrix"); endif len = length (sequence); # nstate is the number of states of the hidden Markov model nstate = rows (transprob); # noutput is the number of different outputs that the hidden Markov model # can generate noutput = columns (outprob); # Check whether transprob and outprob are feasible for a hidden Markov model if (columns (transprob) != nstate) error ("hmmviterbi: transprob must be a square matrix"); endif if (rows (outprob) != nstate) error ("hmmviterbi: outprob must have the same number of rows as transprob"); endif # Flag for symbols usesym = false; # Flag for statenames usesn = false; # Process varargin for i = 1:2:length (varargin) # There must be an identifier: 'symbols' or 'statenames' if (! ischar (varargin{i})) print_usage (); endif # Upper case is also fine lowerarg = lower (varargin{i}); if (strcmp (lowerarg, 'symbols')) if (length (varargin{i + 1}) != noutput) error ("hmmviterbi: number of symbols does not match number of possible outputs"); endif usesym = true; # Use the following argument as symbols symbols = varargin{i + 1}; # The same for statenames elseif (strcmp (lowerarg, 'statenames')) if (length (varargin{i + 1}) != nstate) error ("hmmviterbi: number of statenames does not match number of states"); endif usesn = true; # Use the following argument as statenames statenames = varargin{i + 1}; else error ("hmmviterbi: expected 'symbols' or 'statenames' but found '%s'", varargin{i}); endif endfor # Transform sequence from symbols to integers if necessary if (usesym) # sequenceint is used to build the transformed sequence sequenceint = zeros (1, len); for i = 1:noutput # Search for symbols(i) in the sequence, isequal will have 1 at # corresponding indices; i is the right integer for that symbol isequal = ismember (sequence, symbols(i)); # We do not want to change sequenceint if the symbol appears a second # time in symbols if (any ((sequenceint == 0) & (isequal == 
1)))
        isequal *= i;
        sequenceint += isequal;
      endif
    endfor
    if (! all (sequenceint))
      index = max ((sequenceint == 0) .* (1:len));
      error (["hmmviterbi: sequence(" int2str (index) ") not in symbols"]);
    endif
    sequence = sequenceint;
  else
    if (! isvector (sequence) && ! isempty (sequence))
      error ("hmmviterbi: sequence must be a vector");
    endif
    if (! all (ismember (sequence, 1:noutput)))
      index = max ((ismember (sequence, 1:noutput) == 0) .* (1:len));
      error (["hmmviterbi: sequence(" int2str (index) ") out of range"]);
    endif
  endif

  # Each row in transprob and outprob should contain probabilities
  # => scale so that the sum is 1
  # A zero row remains zero
  # - for transprob
  s = sum (transprob, 2);
  s(s == 0) = 1;
  transprob = transprob ./ (s * ones (1, columns (transprob)));
  # - for outprob
  s = sum (outprob, 2);
  s(s == 0) = 1;
  outprob = outprob ./ (s * ones (1, columns (outprob)));

  # Store the path starting from i in spath(i, :)
  spath = ones (nstate, len + 1);
  # Set the first state for each path
  spath(:, 1) = (1:nstate)';
  # Store the probability of path i in spathprob(i)
  spathprob = transprob(1, :);

  # Find the most likely paths for the given output sequence
  for i = 1:len
    # Calculate the new probabilities of the continuation with each state
    nextpathprob = ((spathprob' .* outprob(:, sequence(i))) * ones (1, nstate)) .* transprob;
    # Find the paths with the highest probabilities
    [spathprob, mindex] = max (nextpathprob);
    # Update spath and spathprob with the new paths
    spath = spath(mindex, :);
    spath(:, i + 1) = (1:nstate)';
  endfor

  # Set vpath to the most likely path
  # We do not want the last state because we do not have an output for it
  [m, mindex] = max (spathprob);
  vpath = spath(mindex, 1:len);

  # Transform vpath into statenames if requested
  if (usesn)
    vpath = reshape (statenames(vpath), 1, len);
  endif

endfunction

%!test
%! sequence = [1, 2, 1, 1, 1, 2, 2, 1, 2, 3, 3, 3, 3, 2, 3, 1, 1, 1, 1, 3, 3, 2, 3, 1, 3];
%! transprob = [0.8, 0.2; 0.4, 0.6];
%! 
outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1]; %! vpath = hmmviterbi (sequence, transprob, outprob); %! expected = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]; %! assert (vpath, expected); %!test %! sequence = {'A', 'B', 'A', 'A', 'A', 'B', 'B', 'A', 'B', 'C', 'C', 'C', 'C', 'B', 'C', 'A', 'A', 'A', 'A', 'C', 'C', 'B', 'C', 'A', 'C'}; %! transprob = [0.8, 0.2; 0.4, 0.6]; %! outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1]; %! symbols = {'A', 'B', 'C'}; %! statenames = {'One', 'Two'}; %! vpath = hmmviterbi (sequence, transprob, outprob, 'symbols', symbols, 'statenames', statenames); %! expected = {'One', 'One', 'Two', 'Two', 'Two', 'One', 'One', 'One', 'One', 'One', 'One', 'One', 'One', 'One', 'One', 'Two', 'Two', 'Two', 'Two', 'One', 'One', 'One', 'One', 'One', 'One'}; %! assert (vpath, expected); statistics/inst/gevcdf.m0000644000175000017500000000723012067160771015210 0ustar asneltasnelt## Copyright (C) 2012 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{p} =} gevcdf (@var{x}, @var{k}, @var{sigma}, @var{mu}) ## Compute the cumulative distribution function of the generalized extreme value (GEV) distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{x} is the support. ## ## @item ## @var{k} is the shape parameter of the GEV distribution. (Also denoted gamma or xi.) 
## @item
## @var{sigma} is the scale parameter of the GEV distribution. The elements
## of @var{sigma} must be positive.
## @item
## @var{mu} is the location parameter of the GEV distribution.
## @end itemize
## The inputs must be of common size, or some of them must be scalar.
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{p} is the cumulative distribution of the GEV distribution at each
## element of @var{x} and corresponding parameter values.
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## x = 0:0.5:2.5;
## sigma = 1:6;
## k = 1;
## mu = 0;
## y = gevcdf (x, k, sigma, mu)
## @end group
##
## @group
## y = gevcdf (x, k, 0.5, mu)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Rolf-Dieter Reiss and Michael Thomas. @cite{Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields}. Chapter 1, pages 16-17, Springer, 2007.
##
## @end enumerate
## @seealso{gevfit, gevinv, gevlike, gevpdf, gevrnd, gevstat}
## @end deftypefn

## Author: Nir Krakauer
## Description: CDF of the generalized extreme value distribution

function p = gevcdf (x, k, sigma, mu)

  # Check arguments
  if (nargin != 4)
    print_usage ();
  endif

  if (isempty (x) || isempty (k) || isempty (sigma) || isempty (mu) || ~ismatrix (x) || ~ismatrix (k) || ~ismatrix (sigma) || ~ismatrix (mu))
    error ("gevcdf: inputs must be numeric matrices");
  endif

  [retval, x, k, sigma, mu] = common_size (x, k, sigma, mu);
  if (retval > 0)
    error ("gevcdf: inputs must be of common size or scalars");
  endif

  z = 1 + k .* (x - mu) ./ sigma;

  # Calculate cdf
  p = exp(-(z .^ (-1 ./ k)));

  # Outside the support the cdf is 0 (below) or 1 (above)
  p(z <= 0 & x < mu) = 0;
  p(z <= 0 & x > mu) = 1;

  # Use the limiting (Gumbel) formula where k == 0
  inds = (k == 0);
  if any(inds)
    z = (mu(inds) - x(inds)) ./ sigma(inds);
    p(inds) = exp(-exp(z));
  endif

endfunction

%!test
%! x = 0:0.5:2.5;
%! sigma = 1:6;
%! k = 1;
%! mu = 0;
%! p = gevcdf (x, k, sigma, mu);
%! 
expected_p = [0.36788 0.44933 0.47237 0.48323 0.48954 0.49367]; %! assert (p, expected_p, 0.001); %!test %! x = -0.5:0.5:2.5; %! sigma = 0.5; %! k = 1; %! mu = 0; %! p = gevcdf (x, k, sigma, mu); %! expected_p = [0 0.36788 0.60653 0.71653 0.77880 0.81873 0.84648]; %! assert (p, expected_p, 0.001); %!test #check for continuity for k near 0 %! x = 1; %! sigma = 0.5; %! k = -0.03:0.01:0.03; %! mu = 0; %! p = gevcdf (x, k, sigma, mu); %! expected_p = [0.88062 0.87820 0.87580 0.87342 0.87107 0.86874 0.86643]; %! assert (p, expected_p, 0.001); statistics/inst/nanstd.m0000644000175000017500000000614212241756572015246 0ustar asneltasnelt## Copyright (C) 2001 Paul Kienzle ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{v} =} nanstd (@var{X}) ## @deftypefnx{Function File} {@var{v} =} nanstd (@var{X}, @var{opt}) ## @deftypefnx{Function File} {@var{v} =} nanstd (@var{X}, @var{opt}, @var{dim}) ## Compute the standard deviation while ignoring NaN values. ## ## @code{nanstd} is identical to the @code{std} function except that NaN values are ## ignored. If all values are NaN, the standard deviation is returned as NaN. ## If there is only a single non-NaN value, the deviation is returned as 0. ## ## The argument @var{opt} determines the type of normalization to use. 
Valid values
## are
##
## @table @asis
## @item 0:
## normalizes with @math{N-1}, provides the square root of the best unbiased estimator of
## the variance [default]
## @item 1:
## normalizes with @math{N}, this provides the square root of the second moment around
## the mean
## @end table
##
## The third argument @var{dim} determines the dimension along which the standard
## deviation is calculated.
##
## @seealso{std, nanmin, nanmax, nansum, nanmedian, nanmean}
## @end deftypefn

function v = nanstd (X, opt, varargin)
  if nargin < 1
    print_usage;
  else
    if nargin < 3
      dim = min(find(size(X)>1));
      if isempty(dim), dim=1; endif;
    else
      dim = varargin{1};
    endif
    if ((nargin < 2) || isempty(opt))
      opt = 0;
    endif

    ## determine the number of non-missing points in each data set
    n = sum (!isnan(X), varargin{:});

    ## replace missing data with zero and compute the mean
    X(isnan(X)) = 0;
    meanX = sum (X, varargin{:}) ./ n;

    ## subtract the mean from the data and compute the sum squared
    sz = ones(1,length(size(X)));
    sz(dim) = size(X,dim);
    v = sumsq (X - repmat(meanX,sz), varargin{:});

    ## because the missing data was set to zero each missing data
    ## point will contribute (-meanX)^2 to sumsq, so remove these
    v = v - (meanX .^ 2) .* (size(X,dim) - n);

    if (opt == 0)
      ## compute the standard deviation from the corrected sumsq using
      ## max(n-1,1) in the denominator so that the std for a single point is 0
      v = sqrt ( v ./ max(n - 1, 1) );
    elseif (opt == 1)
      ## compute the standard deviation from the corrected sumsq
      v = sqrt ( v ./ n );
    else
      error ("nanstd: unrecognized normalization type");
    endif

    ## make sure that we return a real number
    v = real (v);
  endif
endfunction
statistics/inst/anderson_darling_test.m0000644000175000017500000001235011741556364020326 0ustar asneltasnelt
## Author: Paul Kienzle
## This program is granted to the public domain.
## -*- texinfo -*-
## @deftypefn {Function File} {[@var{q}, @var{Asq}, @var{info}] =} anderson_darling_test (@var{x}, @var{distribution})
##
## Test the hypothesis that @var{x} is selected from the given distribution
## using the Anderson-Darling test. If the returned @var{q} is small, reject
## the hypothesis at the @var{q}*100% level.
##
## The Anderson-Darling @math{@var{A}^2} statistic is calculated as follows:
##
## @example
## @iftex
## A^2_n = -n - \sum_{i=1}^n (2i-1)/n log(z_i (1-z_{n-i+1}))
## @end iftex
## @ifnottex
##               n
## A^2_n = -n - SUM (2i-1)/n log(@math{z_i} (1 - @math{z_@{n-i+1@}}))
##              i=1
## @end ifnottex
## @end example
##
## where @math{z_i} is the ordered position of the @var{x}'s in the CDF of the
## distribution. Unlike the Kolmogorov-Smirnov statistic, the
## Anderson-Darling statistic is sensitive to the tails of the
## distribution.
##
## The @var{distribution} argument must be either @t{"uniform"}, @t{"normal"},
## or @t{"exponential"}.
##
## For @t{"normal"} and @t{"exponential"} distributions, estimate the
## distribution parameters from the data, convert the values
## to CDF values, and compare the result to tabulated critical
## values. This includes a correction for small @var{n} which
## works well enough for @var{n} >= 8, but less so for smaller @var{n}. The
## returned @code{info.Asq_corrected} contains the adjusted statistic.
##
## For @t{"uniform"}, assume the values are uniformly distributed
## in (0,1), compute @math{@var{A}^2} and return the corresponding @math{p}-value from
## @code{1-anderson_darling_cdf(A^2,n)}.
##
## If you are selecting from a known distribution, convert your
## values into CDF values for the distribution and use @t{"uniform"}.
## Do not use @t{"uniform"} if the distribution parameters are estimated
## from the data itself, as this sharply biases the @math{A^2} statistic
## toward smaller values.
##
## [1] Stephens, MA; (1986), "Tests based on EDF statistics", in
## D'Agostino, RB; Stephens, MA; (eds.) Goodness-of-fit Techniques.
## New York: Dekker.
##
## @seealso{anderson_darling_cdf}
## @end deftypefn

function [q,Asq,info] = anderson_darling_test(x,dist)

  if size(x,1) == 1, x=x(:); end
  x = sort(x);
  n = size(x,1);
  use_cdf = 0;

  # Compute adjustment and critical values to use for stats.
  switch dist
    case 'normal',
      # This expression for adj is used in R.
      # Note that the values from NIST dataplot don't work nearly as well.
      adj = 1 + (.75 + 2.25/n)/n;
      qvals = [ 0.1, 0.05, 0.025, 0.01 ];
      Acrit = [ 0.631, 0.752, 0.873, 1.035];
      x = stdnormal_cdf(zscore(x));

    case 'uniform',
      ## Put invalid data at the limits of the distribution
      ## This will drive the statistic to infinity.
      x(x<0) = 0;
      x(x>1) = 1;
      adj = 1.;
      qvals = [ 0.1, 0.05, 0.025, 0.01 ];
      Acrit = [ 1.933, 2.492, 3.070, 3.857 ];
      use_cdf = 1;

    case 'XXXweibull',
      adj = 1 + 0.2/sqrt(n);
      qvals = [ 0.1, 0.05, 0.025, 0.01 ];
      Acrit = [ 0.637, 0.757, 0.877, 1.038];
      ## XXX FIXME XXX how to fit alpha and sigma?
      x = wblcdf (x, ones(n,1)*sigma, ones(n,1)*alpha);

    case 'exponential',
      adj = 1 + 0.6/n;
      qvals = [ 0.1, 0.05, 0.025, 0.01 ];
      # Critical values depend on n. Choose the appropriate critical set.
      # These values come from NIST dataplot/src/dp8.f.
      Acritn = [
                  0, 1.022, 1.265, 1.515, 1.888;
                 11, 1.045, 1.300, 1.556, 1.927;
                 21, 1.062, 1.323, 1.582, 1.945;
                 51, 1.070, 1.330, 1.595, 1.951;
                101, 1.078, 1.341, 1.606, 1.957;
               ];
      # FIXME: consider interpolating in the critical value table.
      Acrit = Acritn(lookup(Acritn(:,1),n),2:5);

      lambda = 1./mean(x); # exponential parameter estimation
      x = expcdf(x, 1./(ones(n,1)*lambda));

    otherwise
      # FIXME consider implementing more of distributions; a number
      # of them are defined in NIST dataplot/src/dp8.f.
      error("Anderson-Darling test for %s not implemented", dist);
  endswitch

  if any(x<0 | x>1)
    error('Anderson-Darling test requires data in CDF form');
  endif

  i = [1:n]'*ones(1,size(x,2));
  Asq = -n - sum( (2*i-1) .* (log(x) + log(1-x(n:-1:1,:))) )/n;

  # Lookup adjusted critical value in the cdf (if uniform) or in
  # the critical table.
  if use_cdf
    q = 1-anderson_darling_cdf(Asq*adj, n);
  else
    idx = lookup([-Inf,Acrit],Asq*adj);
    q = [1,qvals](idx);
  endif

  if nargout > 2,
    info.Asq = Asq;
    info.Asq_corrected = Asq*adj;
    info.Asq_critical = [100*(1-qvals); Acrit]';
    info.p = 1-q;
    info.p_is_precise = use_cdf;
  endif
endfunction

%!demo
%! c = anderson_darling_test(10*rande(12,10000),'exponential');
%! tabulate(100*c,100*[unique(c),1]);
%! % The Fc column should report 100, 250, 500, 1000, 10000 more or less.

%!demo
%! c = anderson_darling_test(randn(12,10000),'normal');
%! tabulate(100*c,100*[unique(c),1]);
%! % The Fc column should report 100, 250, 500, 1000, 10000 more or less.

%!demo
%! c = anderson_darling_test(rand(12,10000),'uniform');
%! hist(100*c,1:2:99);
%! % The histogram should be flat more or less.
statistics/inst/tblwrite.m0000644000175000017500000000774111741556364015622 0ustar asneltasnelt
## Copyright (C) 2008 Bill Denney
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.
## -*- texinfo -*-
## @deftypefn {Function File} {} tblwrite (@var{data}, @var{varnames}, @var{casenames}, @var{filename})
## @deftypefnx {Function File} {} tblwrite (@var{data}, @var{varnames}, @var{casenames}, @var{filename}, @var{delimiter})
## Write tabular data to an ascii file.
##
## @var{data} is written to an ascii data file named @var{filename} with
## an optional @var{delimiter}. The delimiter may be any single
## character or
## @itemize
## @item "space" " " (default)
## @item "tab" "\t"
## @item "comma" ","
## @item "semi" ";"
## @item "bar" "|"
## @end itemize
##
## The @var{data} is written starting at cell (2,2) where the
## @var{varnames} are a char matrix or cell vector written to the first
## row (starting at (1,2)), and the @var{casenames} are a char matrix
## (or cell vector) written to the first column (starting at (2,1)).
## @seealso{tblread, csv2cell, cell2csv}
## @end deftypefn

function tblwrite (data, varnames, casenames, f="", d=" ")

  ## Check arguments
  if nargin < 4 || nargin > 5
    print_usage ();
  endif
  varnames = __makecell__ (varnames, "varnames");
  casenames = __makecell__ (casenames, "casenames");
  if numel (varnames) != columns (data)
    error ("tblwrite: the number of rows (or cells) in varnames must equal the number of columns in data")
  endif
  if numel (casenames) != rows (data)
    error ("tblwrite: the number of rows (or cells) in casenames must equal the number of rows in data")
  endif

  if isempty (f)
    ## FIXME: open a file dialog box in this case when a file dialog box
    ## becomes available
    error ("tblwrite: filename must be given")
  endif

  [d err] = tbl_delim (d);
  if ! isempty (err)
    error ("tblwrite: %s", err)
  endif

  ## Build the output cell: names in the first row and column, data from (2,2)
  dat = cell (size (data) + 1);
  dat(1,2:end) = varnames;
  dat(2:end,1) = casenames;
  dat(2:end,2:end) = mat2cell (data, ones (rows (data), 1), ones (columns (data), 1));
  cell2csv (f, dat, d);

endfunction

function x = __makecell__ (x, name)
  ## force x into a cell matrix
  if ! 
iscell (x) if ischar (x) ## convert varnames into a cell x = mat2cell (x, ones (rows (x), 1)); else error ("tblwrite: %s must be either a char or a cell", name) endif endif endfunction ## Tests %!shared d, v, c %! d = [1 2;3 4]; %! v = ["a ";"bc"]; %! c = ["de";"f "]; %!test %! tblwrite (d, v, c, "tblwrite-space.dat"); %! [dt vt ct] = tblread ("tblwrite-space.dat", " "); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! tblwrite (d, v, c, "tblwrite-space.dat", " "); %! [dt vt ct] = tblread ("tblwrite-space.dat", " "); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! tblwrite (d, v, c, "tblwrite-space.dat", "space"); %! [dt vt ct] = tblread ("tblwrite-space.dat"); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! tblwrite (d, v, c, "tblwrite-tab.dat", "tab"); %! [dt vt ct] = tblread ("tblwrite-tab.dat", "tab"); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! tblwrite (d, v, c, "tblwrite-tab.dat", "\t"); %! [dt vt ct] = tblread ("tblwrite-tab.dat", "\t"); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! tblwrite (d, v, c, "tblwrite-tab.dat", '\t'); %! [dt vt ct] = tblread ("tblwrite-tab.dat", '\t'); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); statistics/inst/raylcdf.m0000644000175000017500000000617411741556364015411 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. 
## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{p} =} raylcdf (@var{x}, @var{sigma}) ## Compute the cumulative distribution function of the Rayleigh ## distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{x} is the support. The elements of @var{x} must be non-negative. ## ## @item ## @var{sigma} is the parameter of the Rayleigh distribution. The elements ## of @var{sigma} must be positive. ## @end itemize ## @var{x} and @var{sigma} must be of common size or one of them must be ## scalar. ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{p} is the cumulative distribution of the Rayleigh distribution at ## each element of @var{x} and corresponding parameter @var{sigma}. ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## x = 0:0.5:2.5; ## sigma = 1:6; ## p = raylcdf (x, sigma) ## @end group ## ## @group ## p = raylcdf (x, 0.5) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. pages 104 and 148, McGraw-Hill, New York, second edition, ## 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: CDF of the Rayleigh distribution function p = raylcdf (x, sigma) # Check arguments if (nargin != 2) print_usage (); endif if (! isempty (x) && ! ismatrix (x)) error ("raylcdf: x must be a numeric matrix"); endif if (! isempty (sigma) && ! ismatrix (sigma)) error ("raylcdf: sigma must be a numeric matrix"); endif if (! isscalar (x) || ! 
isscalar (sigma))
    [retval, x, sigma] = common_size (x, sigma);
    if (retval > 0)
      error ("raylcdf: x and sigma must be of common size or scalar");
    endif
  endif

  # Calculate cdf
  p = 1 - exp ((-x .^ 2) ./ (2 * sigma .^ 2));

  # Continue argument check
  k = find (! (x >= 0) | ! (x < Inf) | ! (sigma > 0));
  if (any (k))
    p(k) = NaN;
  endif

endfunction

%!test
%! x = 0:0.5:2.5;
%! sigma = 1:6;
%! p = raylcdf (x, sigma);
%! expected_p = [0.0000, 0.0308, 0.0540, 0.0679, 0.0769, 0.0831];
%! assert (p, expected_p, 0.001);

%!test
%! x = 0:0.5:2.5;
%! p = raylcdf (x, 0.5);
%! expected_p = [0.0000, 0.3935, 0.8647, 0.9889, 0.9997, 1.0000];
%! assert (p, expected_p, 0.001);
statistics/inst/iwishrnd.m0000644000175000017500000000543712260336563015607 0ustar asneltasnelt
## Copyright (C) 2013 Nir Krakauer
##
## This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along with Octave; see the file COPYING. If not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {} [@var{W}[, @var{DI}]] = iwishrnd (@var{Tau}, @var{df}[, @var{DI}][, @var{n}=1])
## Return a random matrix sampled from the inverse Wishart distribution with given parameters.
##
## Inputs: the @var{p} x @var{p} positive definite matrix @var{Tau} and scalar degrees of freedom parameter @var{df} (and optionally the transposed Cholesky factor @var{DI} of @var{Sigma} = @code{inv(Tau)}).
## @var{df} can be non-integer as long as @var{df} > @var{d} ## ## Output: a random @var{p} x @var{p} matrix @var{W} from the inverse Wishart(@var{Tau}, @var{df}) distribution. (@code{inv(W)} is from the Wishart(@code{inv(Tau)}, @var{df}) distribution.) If @var{n} > 1, then @var{W} is @var{p} x @var{p} x @var{n} and holds @var{n} such random matrices. (Optionally, the transposed Cholesky factor @var{DI} of @var{Sigma} is also returned.) ## ## Averaged across many samples, the mean of @var{W} should approach @var{Tau} / (@var{df} - @var{p} - 1). ## ## Reference: Yu-Cheng Ku and Peter Bloomfield (2010), Generating Random Wishart Matrices with Fractional Degrees of Freedom in OX, http://www.gwu.edu/~forcpgm/YuChengKu-030510final-WishartYu-ChengKu.pdf ## ## @seealso{wishrnd, iwishpdf} ## @end deftypefn ## Author: Nir Krakauer ## Description: Random matrices from the inverse Wishart distribution function [W, DI] = iwishrnd(Tau, df, DI, n = 1) if (nargin < 2) print_usage (); endif if nargin < 3 || isempty(DI) try D = chol(inv(Tau)); catch error('Cholesky decomposition failed; Tau probably not positive definite') end_try_catch DI = D'; else D = DI'; endif w = wishrnd([], df, D, n); if n > 1 p = size(D, 1); W = nan(p, p, n); endif for i = 1:n W(:, :, i) = inv(w(:, :, i)); endfor endfunction %!assert(size (iwishrnd (1,2,1)), [1, 1]); %!assert(size (iwishrnd ([],2,1)), [1, 1]); %!assert(size (iwishrnd ([3 1; 1 3], 2.00001, [], 1)), [2, 2]); %!assert(size (iwishrnd (eye(2), 2, [], 3)), [2, 2, 3]); %% Test input validation %!error iwishrnd () %!error iwishrnd (1) %!error iwishrnd ([-3 1; 1 3],1) %!error iwishrnd ([1; 1],1) statistics/inst/fstat.m0000644000175000017500000000672411741556364015107 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your 
option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{mn}, @var{v}] =} fstat (@var{m}, @var{n}) ## Compute mean and variance of the F distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{m} is the first parameter of the F distribution. The elements ## of @var{m} must be positive ## ## @item ## @var{n} is the second parameter of the F distribution. The ## elements of @var{n} must be positive ## @end itemize ## @var{m} and @var{n} must be of common size or one of them must be scalar ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{mn} is the mean of the F distribution. The mean is undefined for ## @var{n} not greater than 2 ## ## @item ## @var{v} is the variance of the F distribution. The variance is undefined ## for @var{n} not greater than 4 ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## m = 1:6; ## n = 5:10; ## [mn, v] = fstat (m, n) ## @end group ## ## @group ## [mn, v] = fstat (m, 5) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the F distribution function [mn, v] = fstat (m, n) # Check arguments if (nargin != 2) print_usage (); endif if (! isempty (m) && ! 
ismatrix (m)) error ("fstat: m must be a numeric matrix"); endif if (! isempty (n) && ! ismatrix (n)) error ("fstat: n must be a numeric matrix"); endif if (! isscalar (m) || ! isscalar (n)) [retval, m, n] = common_size (m, n); if (retval > 0) error ("fstat: m and n must be of common size or scalar"); endif endif # Calculate moments mn = n ./ (n - 2); v = (2 .* (n .^ 2) .* (m + n - 2)) ./ (m .* ((n - 2) .^ 2) .* (n - 4)); # Continue argument check k = find (! (m > 0) | ! (m < Inf) | ! (n > 2) | ! (n < Inf)); if (any (k)) mn(k) = NaN; v(k) = NaN; endif k = find (! (n > 4)); if (any (k)) v(k) = NaN; endif endfunction %!test %! m = 1:6; %! n = 5:10; %! [mn, v] = fstat (m, n); %! expected_mn = [1.6667, 1.5000, 1.4000, 1.3333, 1.2857, 1.2500]; %! expected_v = [22.2222, 6.7500, 3.4844, 2.2222, 1.5869, 1.2153]; %! assert (mn, expected_mn, 0.001); %! assert (v, expected_v, 0.001); %!test %! m = 1:6; %! [mn, v] = fstat (m, 5); %! expected_mn = [1.6667, 1.6667, 1.6667, 1.6667, 1.6667, 1.6667]; %! expected_v = [22.2222, 13.8889, 11.1111, 9.7222, 8.8889, 8.3333]; %! assert (mn, expected_mn, 0.001); %! assert (v, expected_v, 0.001); statistics/inst/normstat.m0000644000175000017500000000611111741556364015623 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . 
## -*- texinfo -*- ## @deftypefn {Function File} {[@var{mn}, @var{v}] =} normstat (@var{m}, @var{s}) ## Compute mean and variance of the normal distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{m} is the mean of the normal distribution ## ## @item ## @var{s} is the standard deviation of the normal distribution. ## @var{s} must be positive ## @end itemize ## @var{m} and @var{s} must be of common size or one of them must be ## scalar ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{mn} is the mean of the normal distribution ## ## @item ## @var{v} is the variance of the normal distribution ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## m = 1:6; ## s = 0:0.2:1; ## [mn, v] = normstat (m, s) ## @end group ## ## @group ## [mn, v] = normstat (0, s) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the normal distribution function [mn, v] = normstat (m, s) # Check arguments if (nargin != 2) print_usage (); endif if (! isempty (m) && ! ismatrix (m)) error ("normstat: m must be a numeric matrix"); endif if (! isempty (s) && ! ismatrix (s)) error ("normstat: s must be a numeric matrix"); endif if (! isscalar (m) || ! isscalar (s)) [retval, m, s] = common_size (m, s); if (retval > 0) error ("normstat: m and s must be of common size or scalar"); endif endif # Set moments mn = m; v = s .* s; # Continue argument check k = find (! (s > 0) | ! (s < Inf)); if (any (k)) mn(k) = NaN; v(k) = NaN; endif endfunction %!test %! m = 1:6; %! s = 0.2:0.2:1.2; %! [mn, v] = normstat (m, s); %! 
expected_v = [0.0400, 0.1600, 0.3600, 0.6400, 1.0000, 1.4400]; %! assert (mn, m); %! assert (v, expected_v, 0.001); %!test %! s = 0.2:0.2:1.2; %! [mn, v] = normstat (0, s); %! expected_mn = [0, 0, 0, 0, 0, 0]; %! expected_v = [0.0400, 0.1600, 0.3600, 0.6400, 1.0000, 1.4400]; %! assert (mn, expected_mn, 0.001); %! assert (v, expected_v, 0.001); statistics/inst/raylstat.m0000644000175000017500000000477211741556364015632 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} raylstat (@var{sigma}) ## Compute mean and variance of the Rayleigh distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{sigma} is the parameter of the Rayleigh distribution. The elements ## of @var{sigma} must be positive. ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the Rayleigh distribution. ## ## @item ## @var{v} is the variance of the Rayleigh distribution. ## @end itemize ## ## @subheading Example ## ## @example ## @group ## sigma = 1:6; ## [m, v] = raylstat (sigma) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. 
## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the Rayleigh distribution function [m, v] = raylstat (sigma) # Check arguments if (nargin != 1) print_usage (); endif if (! isempty (sigma) && ! ismatrix (sigma)) error ("raylstat: sigma must be a numeric matrix"); endif # Calculate moments m = sigma .* sqrt (pi ./ 2); v = (2 - pi ./ 2) .* sigma .^ 2; # Continue argument check k = find (! (sigma > 0)); if (any (k)) m(k) = NaN; v(k) = NaN; endif endfunction %!test %! sigma = 1:6; %! [m, v] = raylstat (sigma); %! expected_m = [1.2533, 2.5066, 3.7599, 5.0133, 6.2666, 7.5199]; %! expected_v = [0.4292, 1.7168, 3.8628, 6.8673, 10.7301, 15.4513]; %! assert (m, expected_m, 0.001); %! assert (v, expected_v, 0.001); statistics/inst/binostat.m0000644000175000017500000000642411741556364015606 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} binostat (@var{n}, @var{p}) ## Compute mean and variance of the binomial distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{n} is the first parameter of the binomial distribution. 
The elements ## of @var{n} must be natural numbers ## ## @item ## @var{p} is the second parameter of the binomial distribution. The ## elements of @var{p} must be probabilities ## @end itemize ## @var{n} and @var{p} must be of common size or one of them must be scalar ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the binomial distribution ## ## @item ## @var{v} is the variance of the binomial distribution ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## n = 1:6; ## p = 0:0.2:1; ## [m, v] = binostat (n, p) ## @end group ## ## @group ## [m, v] = binostat (n, 0.5) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the binomial distribution function [m, v] = binostat (n, p) # Check arguments if (nargin != 2) print_usage (); endif if (! isempty (n) && ! ismatrix (n)) error ("binostat: n must be a numeric matrix"); endif if (! isempty (p) && ! ismatrix (p)) error ("binostat: p must be a numeric matrix"); endif if (! isscalar (n) || ! isscalar (p)) [retval, n, p] = common_size (n, p); if (retval > 0) error ("binostat: n and p must be of common size or scalar"); endif endif # Calculate moments m = n .* p; v = n .* p .* (1 - p); # Continue argument check k = find (! (n > 0) | ! (n < Inf) | ! (n == round (n)) | ! (p >= 0) | ! (p <= 1)); if (any (k)) m(k) = NaN; v(k) = NaN; endif endfunction %!test %! n = 1:6; %! p = 0:0.2:1; %! [m, v] = binostat (n, p); %! expected_m = [0.00, 0.40, 1.20, 2.40, 4.00, 6.00]; %! expected_v = [0.00, 0.32, 0.72, 0.96, 0.80, 0.00]; %! 
assert (m, expected_m, 0.001); %! assert (v, expected_v, 0.001); %!test %! n = 1:6; %! [m, v] = binostat (n, 0.5); %! expected_m = [0.50, 1.00, 1.50, 2.00, 2.50, 3.00]; %! expected_v = [0.25, 0.50, 0.75, 1.00, 1.25, 1.50]; %! assert (m, expected_m, 0.001); %! assert (v, expected_v, 0.001); statistics/inst/gamlike.m0000644000175000017500000000106111741556364015364 0ustar asneltasnelt## Author: Martijn van Oosterhout ## This program is granted to the public domain. ## -*- texinfo -*- ## @deftypefn {Function File} {@var{X} =} gamlike ([@var{A} @var{B}], @var{R}) ## Calculates the negative log-likelihood function for the Gamma ## distribution over vector @var{R}, with the given parameters @var{A} and @var{B}. ## @seealso{gampdf, gaminv, gamrnd, gamfit} ## @end deftypefn function res = gamlike(P,K) if (nargin != 2) print_usage; endif a=P(1); b=P(2); res = -sum( log( gampdf(K, a, b) ) ); endfunction statistics/inst/stepwisefit.m0000644000175000017500000001046312117711112016305 0ustar asneltasnelt## Copyright (C) 2013 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{X_use}, @var{b}, @var{bint}, @var{r}, @var{rint}, @var{stats} =} stepwisefit (@var{y}, @var{X}, @var{penter} = 0.05, @var{premove} = 0.1) ## Linear regression with stepwise variable selection. 
## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{y} is an @var{n} by 1 vector of data to fit. ## @item ## @var{X} is an @var{n} by @var{k} matrix containing the values of @var{k} potential predictors. No constant term should be included (one will always be added to the regression automatically). ## @item ## @var{penter} is the maximum p-value to enter a new variable into the regression (default: 0.05). ## @item ## @var{premove} is the minimum p-value to remove a variable from the regression (default: 0.1). ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{X_use} contains the indices of the predictors included in the final regression model. The predictors are listed in the order they were added, so typically the first ones listed are the most significant. ## @item ## @var{b}, @var{bint}, @var{r}, @var{rint}, @var{stats} are the results of @code{[b, bint, r, rint, stats] = regress(y, [ones(size(y)) X(:, X_use)], penter);} ## @end itemize ## @subheading References ## ## @enumerate ## @item ## N. R. Draper and H. Smith (1966). @cite{Applied Regression Analysis}. Wiley. Chapter 6. 
## ## @end enumerate ## @seealso{regress} ## @end deftypefn ## Author: Nir Krakauer ## Description: Linear regression with stepwise variable selection function [X_use, b, bint, r, rint, stats] = stepwisefit(y, X, penter = 0.05, premove = 0.1) n = numel(y); #number of data points k = size(X, 2); #number of predictors X_use = []; v = 0; #number of predictor variables in regression model r = y; while 1 #decide which variable to add to regression, if any added = false; if numel(X_use) < k X_inds = zeros(k, 1, "logical"); X_inds(X_use) = 1; [~, i_max_corr] = max(abs(corrcoef(X(:, ~X_inds), r))); #try adding the variable with the highest correlation to the residual from current regression [b_new, bint_new, r_new, rint_new, stats_new] = regress(y, [ones(n, 1) X(:, [X_use i_max_corr])], penter); z_new = abs(b_new(end)) / (bint_new(end, 2) - b_new(end)); if z_new > 1 #accept new variable added = true; X_use = [X_use i_max_corr]; b = b_new; bint = bint_new; r = r_new; rint = rint_new; stats = stats_new; v = v + 1; endif endif #decide which variable to drop from regression, if any dropped = false; if v > 0 t_ratio = tinv(1 - premove/2, n - v - 1) / tinv(1 - penter/2, n - v - 1); #estimate the ratio between the z score corresponding to premove to that corresponding to penter [z_min, i_min] = min(abs(b(2:end)) / (bint(2:end, 2) - b(2:end))); if z_min < t_ratio #drop a variable dropped = true; X_use(i_min) = []; [b, bint, r, rint, stats] = regress(y, [ones(n, 1) X(:, X_use)], penter); v = v - 1; endif endif #terminate if no change in the list of regression variables if ~added && ~dropped break endif endwhile endfunction %!test %! % Sample data from Draper and Smith (n = 13, k = 4) %! X = [7 1 11 11 7 11 3 1 2 21 1 11 10; ... %! 26 29 56 31 52 55 71 31 54 47 40 66 68; ... %! 6 15 8 8 6 9 17 22 18 4 23 9 8; ... %! 60 52 20 47 33 22 6 44 22 26 34 12 12]'; %! y = [78.5 74.3 104.3 87.6 95.9 109.2 102.7 72.5 93.1 115.9 83.8 113.3 109.4]'; %! 
[X_use, b, bint, r, rint, stats] = stepwisefit(y, X); %! assert(X_use, [4 1]) %! assert(b, regress(y, [ones(size(y)) X(:, X_use)], 0.05)) statistics/inst/tstat.m0000644000175000017500000000474311741556364015124 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} tstat (@var{n}) ## Compute mean and variance of the t (Student) distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{n} is the parameter of the t (Student) distribution. The elements ## of @var{n} must be positive ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the t (Student) distribution ## ## @item ## @var{v} is the variance of the t (Student) distribution ## @end itemize ## ## @subheading Example ## ## @example ## @group ## n = 3:8; ## [m, v] = tstat (n) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. 
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: Moments of the t (Student) distribution

function [m, v] = tstat (n)

  # Check arguments
  if (nargin != 1)
    print_usage ();
  endif

  if (! isempty (n) && ! ismatrix (n))
    error ("tstat: n must be a numeric matrix");
  endif

  # Calculate moments
  m = zeros (size (n));
  v = n ./ (n - 2);

  # Continue argument check
  k = find (! (n > 1) | ! (n < Inf));
  if (any (k))
    m(k) = NaN;
    v(k) = NaN;
  endif

  k = find (! (n > 2) & (n < Inf));
  if (any (k))
    v(k) = Inf;
  endif

endfunction

%!test
%! n = 3:8;
%! [m, v] = tstat (n);
%! expected_m = [0, 0, 0, 0, 0, 0];
%! expected_v = [3.0000, 2.0000, 1.6667, 1.5000, 1.4000, 1.3333];
%! assert (m, expected_m);
%! assert (v, expected_v, 0.001);

statistics/inst/iwishpdf.m
## Copyright (C) 2013 Nir Krakauer
##
## This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along with Octave; see the file COPYING. If not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {} @var{y} = iwishpdf (@var{W}, @var{Tau}, @var{df}, @var{log_y}=false)
## Compute the probability density function of the inverse Wishart distribution
##
## Inputs: A @var{p} x @var{p} matrix @var{W} at which to evaluate the PDF, the @var{p} x @var{p} positive definite scale matrix @var{Tau}, and the scalar degrees of freedom parameter @var{df} characterizing the inverse Wishart distribution. (For the density to be finite, @var{df} must be greater than @var{p} - 1.)
## If the flag @var{log_y} is set, return the log probability density, which helps avoid underflow when the numerical value of the density is very small
##
## Output: @var{y} is the probability density of the inverse Wishart(@var{Tau}, @var{df}) distribution at @var{W}.
##
## @seealso{iwishrnd, wishpdf}
## @end deftypefn

## Author: Nir Krakauer
## Description: Compute the probability density function of the inverse Wishart distribution

function [y] = iwishpdf(W, Tau, df, log_y=false)

  if (nargin < 3)
    print_usage ();
  endif

  p = size(Tau, 1);

  if (df <= (p - 1))
    error('df too small, no finite densities exist')
  endif

  #calculate the logarithm of G_p(df/2), the multivariate gamma function
  g = (p * (p-1) / 4) * log(pi);
  for i = 1:p
    g = g + log(gamma((df + (1 - i))/2)); #using lngamma_gsl(.) from the gsl package instead of log(gamma(.)) might help avoid underflow/overflow
  endfor

  C = chol(W);

  #use formulas for determinant of positive definite matrix for better efficiency and numerical accuracy
  logdet_W = 2*sum(log(diag(C)));
  logdet_Tau = 2*sum(log(diag(chol(Tau))));

  y = -(df*p)/2 * log(2) + (df/2)*logdet_Tau - g - ((df + p + 1)/2)*logdet_W - trace(Tau*chol2inv(C))/2;

  if ~log_y
    y = exp(y);
  endif

endfunction

##test results cross-checked against diwish function in R MCMCpack library
%!assert(iwishpdf(4, 3, 3.1), 0.04226595, 1E-7);
%!assert(iwishpdf([2 -0.3;-0.3 4], [1 0.3;0.3 1], 4), 1.60166e-05, 1E-10);
%!assert(iwishpdf([6 2 5; 2 10 -5; 5 -5 25], [9 5 5; 5 10 -8; 5 -8 22], 5.1), 4.946831e-12, 1E-17);

%% Test input validation
%!error iwishpdf ()
%!error iwishpdf (1, 2)
%!error iwishpdf (1, 2, 0)

%!error wishpdf (1, 2)

statistics/inst/raylinv.m
## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{x} =} raylinv (@var{p}, @var{sigma}) ## Compute the quantile of the Rayleigh distribution. The quantile is the ## inverse of the cumulative distribution function. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{p} is the cumulative distribution. The elements of @var{p} must be ## probabilities. ## ## @item ## @var{sigma} is the parameter of the Rayleigh distribution. The elements ## of @var{sigma} must be positive. ## @end itemize ## @var{p} and @var{sigma} must be of common size or one of them must be ## scalar. ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{x} is the quantile of the Rayleigh distribution at each element of ## @var{p} and corresponding parameter @var{sigma}. ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## p = 0:0.1:0.5; ## sigma = 1:6; ## x = raylinv (p, sigma) ## @end group ## ## @group ## x = raylinv (p, 0.5) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. pages 104 and 148, McGraw-Hill, New York, second edition, ## 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Quantile of the Rayleigh distribution function x = raylinv (p, sigma) # Check arguments if (nargin != 2) print_usage (); endif if (! isempty (p) && ! 
ismatrix (p)) error ("raylinv: p must be a numeric matrix"); endif if (! isempty (sigma) && ! ismatrix (sigma)) error ("raylinv: sigma must be a numeric matrix"); endif if (! isscalar (p) || ! isscalar (sigma)) [retval, p, sigma] = common_size (p, sigma); if (retval > 0) error ("raylinv: p and sigma must be of common size or scalar"); endif endif # Calculate quantile x = sqrt (-2 .* log (1 - p) .* sigma .^ 2); k = find (p == 1); if (any (k)) x(k) = Inf; endif # Continue argument check k = find (! (p >= 0) | ! (p <= 1) | ! (sigma > 0)); if (any (k)) x(k) = NaN; endif endfunction %!test %! p = 0:0.1:0.5; %! sigma = 1:6; %! x = raylinv (p, sigma); %! expected_x = [0.0000, 0.9181, 2.0041, 3.3784, 5.0538, 7.0645]; %! assert (x, expected_x, 0.001); %!test %! p = 0:0.1:0.5; %! x = raylinv (p, 0.5); %! expected_x = [0.0000, 0.2295, 0.3340, 0.4223, 0.5054, 0.5887]; %! assert (x, expected_x, 0.001); statistics/inst/gevpdf.m0000644000175000017500000000724612067160771015234 0ustar asneltasnelt## Copyright (C) 2012 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{y} =} gevpdf (@var{x}, @var{k}, @var{sigma}, @var{mu}) ## Compute the probability density function of the generalized extreme value (GEV) distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{x} is the support. 
##
## @item
## @var{k} is the shape parameter of the GEV distribution. (Also denoted gamma or xi.)
## @item
## @var{sigma} is the scale parameter of the GEV distribution. The elements
## of @var{sigma} must be positive.
## @item
## @var{mu} is the location parameter of the GEV distribution.
## @end itemize
## The inputs must be of common size, or some of them must be scalar.
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{y} is the probability density of the GEV distribution at each
## element of @var{x} and corresponding parameter values.
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## x = 0:0.5:2.5;
## sigma = 1:6;
## k = 1;
## mu = 0;
## y = gevpdf (x, k, sigma, mu)
## @end group
##
## @group
## y = gevpdf (x, k, 0.5, mu)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Rolf-Dieter Reiss and Michael Thomas. @cite{Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields}. Chapter 1, pages 16-17, Springer, 2007.
##
## @end enumerate
## @seealso{gevcdf, gevfit, gevinv, gevlike, gevrnd, gevstat}
## @end deftypefn

## Author: Nir Krakauer
## Description: PDF of the generalized extreme value distribution

function y = gevpdf (x, k, sigma, mu)

  # Check arguments
  if (nargin != 4)
    print_usage ();
  endif

  if (isempty (x) || isempty (k) || isempty (sigma) || isempty (mu) || ~ismatrix (x) || ~ismatrix (k) || ~ismatrix (sigma) || ~ismatrix (mu))
    error ("gevpdf: inputs must be numeric matrices");
  endif

  [retval, x, k, sigma, mu] = common_size (x, k, sigma, mu);
  if (retval > 0)
    error ("gevpdf: inputs must be of common size or scalars");
  endif

  z = 1 + k .* (x - mu) ./ sigma;

  # Calculate pdf
  y = exp(-(z .^ (-1 ./ k))) .* (z .^ (-1 - 1 ./ k)) ./ sigma;

  # The density is zero outside the support (z <= 0)
  y(z <= 0) = 0;

  # k == 0 is the Gumbel limit, which needs a different formula
  inds = (k == 0);
  if any(inds)
    z = (mu(inds) - x(inds)) ./ sigma(inds);
    y(inds) = exp(z-exp(z)) ./ sigma(inds);
  endif

endfunction

%!test
%! x = 0:0.5:2.5;
%! sigma = 1:6;
%! k = 1;
%! mu = 0;
%! y = gevpdf (x, k, sigma, mu);
%! expected_y = [0.367879 0.143785 0.088569 0.063898 0.049953 0.040997];
%! assert (y, expected_y, 0.001);

%!test
%! x = -0.5:0.5:2.5;
%! sigma = 0.5;
%! k = 1;
%! mu = 0;
%! y = gevpdf (x, k, sigma, mu);
%! expected_y = [0 0.735759 0.303265 0.159229 0.097350 0.065498 0.047027];
%! assert (y, expected_y, 0.001);

%!test #check for continuity for k near 0
%! x = 1;
%! sigma = 0.5;
%! k = -0.03:0.01:0.03;
%! mu = 0;
%! y = gevpdf (x, k, sigma, mu);
%! expected_y = [0.23820 0.23764 0.23704 0.23641 0.23576 0.23508 0.23438];
%! assert (y, expected_y, 0.001);

statistics/inst/copularnd.m
## Copyright (C) 2012 Arno Onken
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program. If not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {@var{x} =} copularnd (@var{family}, @var{theta}, @var{n})
## @deftypefnx {Function File} {} copularnd (@var{family}, @var{theta}, @var{n}, @var{d})
## @deftypefnx {Function File} {} copularnd ('t', @var{theta}, @var{nu}, @var{n})
## Generate random samples from a copula family.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{family} is the copula family name. Currently, @var{family} can be
## @code{'Gaussian'} for the Gaussian family, @code{'t'} for the Student's t
## family, or @code{'Clayton'} for the Clayton family.
## ## @item ## @var{theta} is the parameter of the copula. For the Gaussian and Student's t ## copula, @var{theta} must be a correlation matrix. For bivariate copulas ## @var{theta} can also be a correlation coefficient. For the Clayton family, ## @var{theta} must be a vector with the same number of elements as samples to ## be generated or be scalar. ## ## @item ## @var{nu} is the degrees of freedom for the Student's t family. @var{nu} must ## be a vector with the same number of elements as samples to be generated or ## be scalar. ## ## @item ## @var{n} is the number of rows of the matrix to be generated. @var{n} must be ## a non-negative integer and corresponds to the number of samples to be ## generated. ## ## @item ## @var{d} is the number of columns of the matrix to be generated. @var{d} must ## be a positive integer and corresponds to the dimension of the copula. ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{x} is a matrix of random samples from the copula with @var{n} samples ## of distribution dimension @var{d}. ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## theta = 0.5; ## x = copularnd ("Gaussian", theta); ## @end group ## ## @group ## theta = 0.5; ## nu = 2; ## x = copularnd ("t", theta, nu); ## @end group ## ## @group ## theta = 0.5; ## n = 2; ## x = copularnd ("Clayton", theta, n); ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Roger B. Nelsen. @cite{An Introduction to Copulas}. Springer, New York, ## second edition, 2006. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Random samples from a copula family function x = copularnd (family, theta, nu, n) # Check arguments if (nargin < 2) print_usage (); endif if (! 
ischar (family))
    error ("copularnd: family must be one of 'Gaussian', 't', and 'Clayton'");
  endif

  lower_family = lower (family);

  # Check family and copula parameters
  switch (lower_family)

    case {"gaussian"}
      # Gaussian family
      if (isscalar (theta))
        # Expand a scalar to a correlation matrix
        theta = [1, theta; theta, 1];
      endif
      if (! ismatrix (theta) || any (diag (theta) != 1) || any (any (theta != theta')) || min (eig (theta)) <= 0)
        error ("copularnd: theta must be a correlation matrix");
      endif
      if (nargin > 3)
        d = n;
        if (! isscalar (d) || d != size (theta, 1))
          error ("copularnd: d must correspond to dimension of theta");
        endif
      else
        d = size (theta, 1);
      endif
      if (nargin < 3)
        n = 1;
      else
        n = nu;
        if (! isscalar (n) || (n < 0) || round (n) != n)
          error ("copularnd: n must be a non-negative integer");
        endif
      endif

    case {"t"}
      # Student's t family
      if (nargin < 3)
        print_usage ();
      endif
      if (isscalar (theta))
        # Expand a scalar to a correlation matrix
        theta = [1, theta; theta, 1];
      endif
      if (! ismatrix (theta) || any (diag (theta) != 1) || any (any (theta != theta')) || min (eig (theta)) <= 0)
        error ("copularnd: theta must be a correlation matrix");
      endif
      # The dimension follows from theta; it is needed for the empty
      # result returned when n == 0
      d = size (theta, 1);
      # Validate n before nu, because the check on nu depends on n
      if (nargin < 4)
        n = 1;
      else
        if (! isscalar (n) || (n < 0) || round (n) != n)
          error ("copularnd: n must be a non-negative integer");
        endif
      endif
      if (! isscalar (nu) && (! isvector (nu) || length (nu) != n))
        error ("copularnd: nu must be a vector with the same number of rows as x or be scalar");
      endif
      nu = nu(:);

    case {"clayton"}
      # Archimedean one parameter family
      if (nargin < 4)
        # Default is bivariate
        d = 2;
      else
        d = n;
        if (! isscalar (d) || (d < 2) || round (d) != d)
          error ("copularnd: d must be an integer greater than 1");
        endif
      endif
      if (nargin < 3)
        # Default is one sample
        n = 1;
      else
        n = nu;
        if (! isscalar (n) || (n < 0) || round (n) != n)
          error ("copularnd: n must be a non-negative integer");
        endif
      endif
      if (! isvector (theta) || (!
isscalar (theta) && size (theta, 1) != n))
        error ("copularnd: theta must be a column vector with the number of rows equal to n or be scalar");
      endif
      if (n > 1 && isscalar (theta))
        theta = repmat (theta, n, 1);
      endif

    otherwise
      error ("copularnd: unknown copula family '%s'", family);

  endswitch

  if (n == 0)
    # Input is empty
    x = zeros (0, d);
  else

    # Draw random samples according to family
    switch (lower_family)

      case {"gaussian"}
        # The Gaussian family
        x = normcdf (mvnrnd (zeros (1, d), theta, n), 0, 1);
        # No parameter bounds check
        k = [];

      case {"t"}
        # The Student's t family
        x = tcdf (mvtrnd (theta, nu, n), nu);
        # No parameter bounds check
        k = [];

      case {"clayton"}
        # The Clayton family
        u = rand (n, d);
        if (d == 2)
          x = zeros (n, 2);
          # Conditional distribution method for the bivariate case which also
          # works for theta < 0
          x(:, 1) = u(:, 1);
          x(:, 2) = (1 + u(:, 1) .^ (-theta) .* (u(:, 2) .^ (-theta ./ (1 + theta)) - 1)) .^ (-1 ./ theta);
        else
          # Apply the algorithm by Marshall and Olkin:
          # Frailty distribution for Clayton copula is gamma
          y = randg (1 ./ theta, n, 1);
          x = (1 - log (u) ./ repmat (y, 1, d)) .^ (-1 ./ repmat (theta, 1, d));
        endif
        k = find (theta == 0);
        if (any (k))
          # Product copula at rows k
          x(k, :) = u(k, :);
        endif
        # Continue argument check
        if (d == 2)
          k = find (! (theta >= -1) | ! (theta < inf));
        else
          k = find (! (theta >= 0) | ! (theta < inf));
        endif

    endswitch

    # Out of bounds parameters
    if (any (k))
      x(k, :) = NaN;
    endif

  endif

endfunction

%!test
%! theta = 0.5;
%! x = copularnd ("Gaussian", theta);
%! assert (size (x), [1, 2]);
%! assert (all ((x >= 0) & (x <= 1)));

%!test
%! theta = 0.5;
%! nu = 2;
%! x = copularnd ("t", theta, nu);
%! assert (size (x), [1, 2]);
%! assert (all ((x >= 0) & (x <= 1)));

%!test
%! theta = 0.5;
%! x = copularnd ("Clayton", theta);
%! assert (size (x), [1, 2]);
%! assert (all ((x >= 0) & (x <= 1)));

%!test
%! theta = 0.5;
%! n = 2;
%! x = copularnd ("Clayton", theta, n);
%! assert (size (x), [n, 2]);
%!
assert (all ((x >= 0) & (x <= 1))); %!test %! theta = [1; 2]; %! n = 2; %! d = 3; %! x = copularnd ("Clayton", theta, n, d); %! assert (size (x), [n, d]); %! assert (all ((x >= 0) & (x <= 1))); statistics/inst/hygestat.m0000644000175000017500000000746111741556364015615 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{mn}, @var{v}] =} hygestat (@var{t}, @var{m}, @var{n}) ## Compute mean and variance of the hypergeometric distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{t} is the total size of the population of the hypergeometric ## distribution. The elements of @var{t} must be positive natural numbers ## ## @item ## @var{m} is the number of marked items of the hypergeometric distribution. ## The elements of @var{m} must be natural numbers ## ## @item ## @var{n} is the size of the drawn sample of the hypergeometric ## distribution. 
The elements of @var{n} must be positive natural numbers ## @end itemize ## @var{t}, @var{m}, and @var{n} must be of common size or scalar ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{mn} is the mean of the hypergeometric distribution ## ## @item ## @var{v} is the variance of the hypergeometric distribution ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## t = 4:9; ## m = 0:5; ## n = 1:6; ## [mn, v] = hygestat (t, m, n) ## @end group ## ## @group ## [mn, v] = hygestat (t, m, 2) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the hypergeometric distribution function [mn, v] = hygestat (t, m, n) # Check arguments if (nargin != 3) print_usage (); endif if (! isempty (t) && ! ismatrix (t)) error ("hygestat: t must be a numeric matrix"); endif if (! isempty (m) && ! ismatrix (m)) error ("hygestat: m must be a numeric matrix"); endif if (! isempty (n) && ! ismatrix (n)) error ("hygestat: n must be a numeric matrix"); endif if (! isscalar (t) || ! isscalar (m) || ! isscalar (n)) [retval, t, m, n] = common_size (t, m, n); if (retval > 0) error ("hygestat: t, m and n must be of common size or scalar"); endif endif # Calculate moments mn = (n .* m) ./ t; v = (n .* (m ./ t) .* (1 - m ./ t) .* (t - n)) ./ (t - 1); # Continue argument check k = find (! (t >= 0) | ! (m >= 0) | ! (n > 0) | ! (t == round (t)) | ! (m == round (m)) | ! (n == round (n)) | ! (m <= t) | ! (n <= t)); if (any (k)) mn(k) = NaN; v(k) = NaN; endif endfunction %!test %! t = 4:9; %! m = 0:5; %! n = 1:6; %! [mn, v] = hygestat (t, m, n); %! 
expected_mn = [0.0000, 0.4000, 1.0000, 1.7143, 2.5000, 3.3333]; %! expected_v = [0.0000, 0.2400, 0.4000, 0.4898, 0.5357, 0.5556]; %! assert (mn, expected_mn, 0.001); %! assert (v, expected_v, 0.001); %!test %! t = 4:9; %! m = 0:5; %! [mn, v] = hygestat (t, m, 2); %! expected_mn = [0.0000, 0.4000, 0.6667, 0.8571, 1.0000, 1.1111]; %! expected_v = [0.0000, 0.2400, 0.3556, 0.4082, 0.4286, 0.4321]; %! assert (mn, expected_mn, 0.001); %! assert (v, expected_v, 0.001); statistics/inst/expstat.m0000644000175000017500000000450111741556364015445 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} expstat (@var{l}) ## Compute mean and variance of the exponential distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{l} is the parameter of the exponential distribution. The ## elements of @var{l} must be positive ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the exponential distribution ## ## @item ## @var{v} is the variance of the exponential distribution ## @end itemize ## ## @subheading Example ## ## @example ## @group ## l = 1:6; ## [m, v] = expstat (l) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. 
@cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the exponential distribution function [m, v] = expstat (l) # Check arguments if (nargin != 1) print_usage (); endif if (! isempty (l) && ! ismatrix (l)) error ("expstat: l must be a numeric matrix"); endif # Calculate moments m = l; v = m .^ 2; # Continue argument check k = find (! (l > 0) | ! (l < Inf)); if (any (k)) m(k) = NaN; v(k) = NaN; endif endfunction %!test %! l = 1:6; %! [m, v] = expstat (l); %! assert (m, [1, 2, 3, 4, 5, 6], 0.001); %! assert (v, [1, 4, 9, 16, 25, 36], 0.001); statistics/inst/lognstat.m0000644000175000017500000000662611741556364015622 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} lognstat (@var{mu}, @var{sigma}) ## Compute mean and variance of the lognormal distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{mu} is the first parameter of the lognormal distribution ## ## @item ## @var{sigma} is the second parameter of the lognormal distribution. 
## @var{sigma} must be positive or zero ## @end itemize ## @var{mu} and @var{sigma} must be of common size or one of them must be ## scalar ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the lognormal distribution ## ## @item ## @var{v} is the variance of the lognormal distribution ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## mu = 0:0.2:1; ## sigma = 0.2:0.2:1.2; ## [m, v] = lognstat (mu, sigma) ## @end group ## ## @group ## [m, v] = lognstat (0, sigma) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the lognormal distribution function [m, v] = lognstat (mu, sigma) # Check arguments if (nargin != 2) print_usage (); endif if (! isempty (mu) && ! ismatrix (mu)) error ("lognstat: mu must be a numeric matrix"); endif if (! isempty (sigma) && ! ismatrix (sigma)) error ("lognstat: sigma must be a numeric matrix"); endif if (! isscalar (mu) || ! isscalar (sigma)) [retval, mu, sigma] = common_size (mu, sigma); if (retval > 0) error ("lognstat: mu and sigma must be of common size or scalar"); endif endif # Calculate moments m = exp (mu + (sigma .^ 2) ./ 2); v = (exp (sigma .^ 2) - 1) .* exp (2 .* mu + sigma .^ 2); # Continue argument check k = find (! (sigma >= 0) | ! (sigma < Inf)); if (any (k)) m(k) = NaN; v(k) = NaN; endif endfunction %!test %! mu = 0:0.2:1; %! sigma = 0.2:0.2:1.2; %! [m, v] = lognstat (mu, sigma); %! expected_m = [1.0202, 1.3231, 1.7860, 2.5093, 3.6693, 5.5845]; %! expected_v = [0.0425, 0.3038, 1.3823, 5.6447, 23.1345, 100.4437]; %! assert (m, expected_m, 0.001); %! 
assert (v, expected_v, 0.001); %!test %! sigma = 0.2:0.2:1.2; %! [m, v] = lognstat (0, sigma); %! expected_m = [1.0202, 1.0833, 1.1972, 1.3771, 1.6487, 2.0544]; %! expected_v = [0.0425, 0.2036, 0.6211, 1.7002, 4.6708, 13.5936]; %! assert (m, expected_m, 0.001); %! assert (v, expected_v, 0.001); statistics/inst/cmdscale.m0000644000175000017500000000526312267316306015530 0ustar asneltasnelt## Copyright (C) 2014 Nir Krakauer ## ## This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License along with this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} @var{Y} = cmdscale (@var{D}) ## @deftypefnx{Function File} [@var{Y}, @var{e}] = cmdscale (@var{D}) ## Classical multidimensional scaling of a matrix. Also known as principal coordinates analysis. ## ## Given an @var{n} by @var{n} Euclidean distance matrix @var{D}, find @var{n} points in @var{p} dimensional space which have this distance matrix. The coordinates of the points @var{Y} are returned. ## ## @var{D} should be a full distance matrix (hollow, symmetric, entries obeying the triangle inequality), or can be a vector of length @code{n(n-1)/2} containing the upper triangular elements of the distance matrix (such as that returned by the pdist function). If @var{D} is not a valid distance matrix, points @var{Y} will be returned whose distance matrix approximates @var{D}. 
## ## The returned @var{Y} is an @var{n} by @var{p} matrix showing possible coordinates of the points in @var{p} dimensional space (@code{p < n}). Of course, any translation, rotation, or reflection of these would also have the same distance matrix. ## ## Can also return the eigenvalues @var{e} of @code{(D(1, :) .^ 2 + D(:, 1) .^ 2 - D .^ 2) / 2}, where the number of positive eigenvalues determines @var{p}. ## ## Reference: Rudolf Mathar (1985), The best Euclidian fit to a given distance matrix in prescribed dimensions, Linear Algebra and its Applications, 67: 1-6, doi: 10.1016/0024-3795(85)90181-8 ## ## @seealso{pdist, squareform} ## @end deftypefn ## Author: Nir Krakauer ## Description: Classical multidimensional scaling function [Y, e] = cmdscale (D) if isvector (D) D = squareform (D); endif warning ("off", "Octave:broadcast","local"); M = (D(1, :) .^ 2 + D(:, 1) .^ 2 - D .^ 2) / 2; [v e] = eig(M); e = diag(e); pe = (e > 0); #positive eigenvalues Y = v(:, pe) * diag(sqrt(e(pe))); endfunction %!shared m, n, X, D %! m = 4; n = 3; X = rand(m, n); D = pdist(X); %!assert(pdist(cmdscale(D)), D, m*n*eps) %!assert(pdist(cmdscale(squareform(D))), D, m*n*eps) statistics/inst/wblstat.m0000644000175000017500000000667711741556364015455 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . 
## -*- texinfo -*- ## @deftypefn {Function File} {[@var{m}, @var{v}] =} wblstat (@var{scale}, @var{shape}) ## Compute mean and variance of the Weibull distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{scale} is the scale parameter of the Weibull distribution. ## @var{scale} must be positive ## ## @item ## @var{shape} is the shape parameter of the Weibull distribution. ## @var{shape} must be positive ## @end itemize ## @var{scale} and @var{shape} must be of common size or one of them must be ## scalar ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{m} is the mean of the Weibull distribution ## ## @item ## @var{v} is the variance of the Weibull distribution ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## scale = 3:8; ## shape = 1:6; ## [m, v] = wblstat (scale, shape) ## @end group ## ## @group ## [m, v] = wblstat (6, shape) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic ## Processes}. McGraw-Hill, New York, second edition, 1984. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Moments of the Weibull distribution function [m, v] = wblstat (scale, shape) # Check arguments if (nargin != 2) print_usage (); endif if (! isempty (scale) && ! ismatrix (scale)) error ("wblstat: scale must be a numeric matrix"); endif if (! isempty (shape) && ! ismatrix (shape)) error ("wblstat: shape must be a numeric matrix"); endif if (! isscalar (scale) || ! 
isscalar (shape)) [retval, scale, shape] = common_size (scale, shape); if (retval > 0) error ("wblstat: scale and shape must be of common size or scalar"); endif endif # Calculate moments m = scale .* gamma (1 + 1 ./ shape); v = (scale .^ 2) .* gamma (1 + 2 ./ shape) - m .^ 2; # Continue argument check k = find (! (scale > 0) | ! (scale < Inf) | ! (shape > 0) | ! (shape < Inf)); if (any (k)) m(k) = NaN; v(k) = NaN; endif endfunction %!test %! scale = 3:8; %! shape = 1:6; %! [m, v] = wblstat (scale, shape); %! expected_m = [3.0000, 3.5449, 4.4649, 5.4384, 6.4272, 7.4218]; %! expected_v = [9.0000, 3.4336, 2.6333, 2.3278, 2.1673, 2.0682]; %! assert (m, expected_m, 0.001); %! assert (v, expected_v, 0.001); %!test %! shape = 1:6; %! [m, v] = wblstat (6, shape); %! expected_m = [ 6.0000, 5.3174, 5.3579, 5.4384, 5.5090, 5.5663]; %! expected_v = [36.0000, 7.7257, 3.7920, 2.3278, 1.5923, 1.1634]; %! assert (m, expected_m, 0.001); %! assert (v, expected_v, 0.001); statistics/inst/geomean.m0000644000175000017500000000214711741556364015374 0ustar asneltasnelt## Copyright (C) 2001 Paul Kienzle ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} geomean (@var{x}) ## @deftypefnx{Function File} geomean (@var{x}, @var{dim}) ## Compute the geometric mean. ## ## This function does the same as @code{mean (x, "g")}. 
##
## @seealso{mean}
## @end deftypefn

function a = geomean(x, dim)
  if (nargin == 1)
    a = mean(x, "g");
  elseif (nargin == 2)
    a = mean(x, "g", dim);
  else
    print_usage;
  endif
endfunction
statistics/inst/cl_multinom.m0000644000175000017500000001201111741556364016272 0ustar asneltasnelt
## Copyright (C) 2009 Levente Torok
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
##
## @deftypefn {Function File} {@var{CL} =} cl_multinom (@var{x}, @var{N}, @var{b}, @var{calculation_type} ) - Confidence level of multinomial portions
## Returns the confidence level of multinomial parameters estimated as @math{ p = x / sum(x) } with a predefined confidence interval @var{b}.
## Finite populations are also considered.
##
## This function calculates the level of confidence at which the samples represent the true distribution
## given that there is a predefined tolerance (confidence interval).
## This is the inverse of the typical exercise, in which we want to get the confidence interval
## given the confidence level (and the estimated parameters of the underlying distribution).
## But once we accept (say, in an election) that we have a standard predefined
## maximal acceptable error rate (e.g. @var{b}=0.02) in the estimation and we just want to know how sure we
## can be that the measured proportions are the same as in the
## entire population (i.e.
the expected value and mean of the samples are roughly the same), we need to use this function.
##
## @subheading Arguments
## @itemize @bullet
## @item @var{x} : int vector : sample frequencies bins
## @item @var{N} : int : Population size that was sampled by x. If N 4) print_usage;
  elseif (!ischar (calculation_type))
    error ("Argument calculation_type must be a string");
  endif

  k = rows(x);
  nn = sum(x);
  p = x / nn;

  if (isscalar( b ))
    if (b==0)
      b=0.02;
    endif
    b = ones( rows(x), 1 ) * b;

    if (b<0)
      b=1 ./ max( x, 1 );
    endif
  endif
  bb = b .* b;

  if (N==nn)
    CL = 1;
    return;
  endif

  if (N
##
## This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along with Octave; see the file COPYING. If not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {} @var{y} = wishpdf (@var{W}, @var{Sigma}, @var{df}, @var{log_y}=false)
## Compute the probability density function of the Wishart distribution.
##
## Inputs: A @var{p} x @var{p} matrix @var{W} at which to evaluate the PDF. The @var{p} x @var{p} positive definite matrix @var{Sigma} and scalar degrees of freedom parameter @var{df} characterize the Wishart distribution. (For the density to be finite, @var{df} > (@var{p} - 1) is required.)
## If the flag @var{log_y} is set, return the log probability density. This helps avoid underflow when the numerical value of the density is very small.
##
## Output: @var{y} is the probability density of Wishart(@var{Sigma}, @var{df}) at @var{W}.
## ## @seealso{wishrnd, iwishpdf} ## @end deftypefn ## Author: Nir Krakauer ## Description: Compute the probability density function of the Wishart distribution function [y] = wishpdf(W, Sigma, df, log_y=false) if (nargin < 3) print_usage (); endif p = size(Sigma, 1); if (df <= (p - 1)) error('df too small, no finite densities exist') endif #calculate the logarithm of G_d(df/2), the multivariate gamma function g = (p * (p-1) / 4) * log(pi); for i = 1:p g = g + log(gamma((df + (1 - i))/2)); #using lngamma_gsl(.) from the gsl package instead of log(gamma(.)) might help avoid underflow/overflow endfor C = chol(Sigma); #use formulas for determinant of positive definite matrix for better efficiency and numerical accuracy logdet_W = 2*sum(log(diag(chol(W)))); logdet_Sigma = 2*sum(log(diag(C))); y = -(df*p)/2 * log(2) - (df/2)*logdet_Sigma - g + ((df - p - 1)/2)*logdet_W - trace(chol2inv(C)*W)/2; if ~log_y y = exp(y); endif endfunction ##test results cross-checked against dwish function in R MCMCpack library %!assert(wishpdf(4, 3, 3.1), 0.07702496, 1E-7); %!assert(wishpdf([2 -0.3;-0.3 4], [1 0.3;0.3 1], 4), 0.004529741, 1E-7); %!assert(wishpdf([6 2 5; 2 10 -5; 5 -5 25], [9 5 5; 5 10 -8; 5 -8 22], 5.1), 4.474865e-10, 1E-15); %% Test input validation %!error wishpdf () %!error wishpdf (1, 2) %!error wishpdf (1, 2, 0) %!error wishpdf (1, 2) statistics/inst/mvtcdf.m0000644000175000017500000001123411741556364015241 0ustar asneltasnelt## Copyright (C) 2008 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. 
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {@var{p} =} mvtcdf (@var{x}, @var{sigma}, @var{nu})
## @deftypefnx {Function File} {} mvtcdf (@var{a}, @var{x}, @var{sigma}, @var{nu})
## @deftypefnx {Function File} {[@var{p}, @var{err}] =} mvtcdf (@dots{})
## Compute the cumulative distribution function of the multivariate
## Student's t distribution.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{x} is the upper limit for integration where each row corresponds
## to an observation.
##
## @item
## @var{sigma} is the correlation matrix.
##
## @item
## @var{nu} is the degrees of freedom.
##
## @item
## @var{a} is the lower limit for integration where each row corresponds
## to an observation. @var{a} must have the same size as @var{x}.
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{p} is the cumulative distribution at each row of @var{x} and
## @var{a}.
##
## @item
## @var{err} is the estimated error.
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## x = [1 2];
## sigma = [1.0 0.5; 0.5 1.0];
## nu = 4;
## p = mvtcdf (x, sigma, nu)
## @end group
##
## @group
## a = [-inf 0];
## p = mvtcdf (a, x, sigma, nu)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Alan Genz and Frank Bretz. Numerical Computation of Multivariate
## t-Probabilities with Application to Power Calculation of Multiple
## Contrasts. @cite{Journal of Statistical Computation and Simulation},
## 63, pages 361-378, 1999.
## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: CDF of the multivariate Student's t distribution function [p, err] = mvtcdf (varargin) # Monte-Carlo confidence factor for the standard error: 99 % gamma = 2.5; # Tolerance err_eps = 1e-3; if (length (varargin) == 3) x = varargin{1}; sigma = varargin{2}; nu = varargin{3}; a = -Inf .* ones (size (x)); elseif (length (varargin) == 4) a = varargin{1}; x = varargin{2}; sigma = varargin{3}; nu = varargin{4}; else print_usage (); endif # Dimension q = size (sigma, 1); cases = size (x, 1); # Check parameters if (size (x, 2) != q) error ("mvtcdf: x must have the same number of columns as sigma"); endif if (any (size (x) != size (a))) error ("mvtcdf: a must have the same size as x"); endif if (! isscalar (nu) && (! isvector (nu) || length (nu) != cases)) error ("mvtcdf: nu must be a scalar or a vector with the same number of rows as x"); endif # Convert to correlation matrix if necessary if (any (diag (sigma) != 1)) svar = repmat (diag (sigma), 1, q); sigma = sigma ./ sqrt (svar .* svar'); endif if (q < 1 || size (sigma, 2) != q || any (any (sigma != sigma')) || min (eig (sigma)) <= 0) error ("mvtcdf: sigma must be nonempty symmetric positive definite"); endif nu = nu(:); c = chol (sigma)'; # Number of integral transformations n = 1; p = zeros (cases, 1); varsum = zeros (cases, 1); err = ones (cases, 1) .* err_eps; # Apply crude Monte-Carlo estimation while any (err >= err_eps) # Sample from q-1 dimensional unit hypercube w = rand (cases, q - 1); # Transformation of the multivariate t-integral dvev = tcdf ([a(:, 1) / c(1, 1), x(:, 1) / c(1, 1)], nu); dv = dvev(:, 1); ev = dvev(:, 2); fv = ev - dv; y = zeros (cases, q - 1); for i = 1:(q - 1) y(:, i) = tinv (dv + w(:, i) .* (ev - dv), nu + i - 1) .* sqrt ((nu + sum (y(:, 1:(i-1)) .^ 2, 2)) ./ (nu + i - 1)); tf = (sqrt ((nu + i) ./ (nu + sum (y(:, 1:i) .^ 2, 2)))) ./ c(i + 1, i + 1); dvev = tcdf ([(a(:, i + 1) - c(i + 1, 1:i) .* y(:, 1:i)) .* tf, (x(:, i 
+ 1) - c(i + 1, 1:i) .* y(:, 1:i)) .* tf], nu + i); dv = dvev(:, 1); ev = dvev(:, 2); fv = (ev - dv) .* fv; endfor n++; # Estimate standard error varsum += (n - 1) .* ((fv - p) .^ 2) ./ n; err = gamma .* sqrt (varsum ./ (n .* (n - 1))); p += (fv - p) ./ n; endwhile endfunction statistics/inst/nansum.m0000644000175000017500000000241611741556364015261 0ustar asneltasnelt## Copyright (C) 2001 Paul Kienzle ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{v} =} nansum (@var{X}) ## @deftypefnx{Function File} {@var{v} =} nansum (@var{X}, @var{dim}) ## Compute the sum while ignoring NaN values. ## ## @code{nansum} is identical to the @code{sum} function except that NaN values are ## treated as 0 and so ignored. If all values are NaN, the sum is ## returned as 0. 
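##
## As a small illustration of that rule (a sketch only; the input values are
## made up for the example):
##
## @example
## @group
## nansum ([1 NaN 3])         # 4
## nansum ([NaN NaN])         # 0, since every value is NaN
## nansum ([1 NaN; 2 3], 2)   # [1; 5], summing along rows
## @end group
## @end example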
## ## @seealso{sum, nanmin, nanmax, nanmean, nanmedian} ## @end deftypefn function v = nansum (X, varargin) if nargin < 1 print_usage; else X(isnan(X)) = 0; v = sum (X, varargin{:}); endif endfunction statistics/inst/nanmin.m0000644000175000017500000000352012246443234015225 0ustar asneltasnelt## Copyright (C) 2001 Paul Kienzle ## Copyright (C) 2003 Alois Schloegl ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{v}, @var{idx}] =} nanmin (@var{X}) ## @deftypefnx{Function File} {[@var{v}, @var{idx}] =} nanmin (@var{X}, @var{Y}) ## Find the minimal element while ignoring NaN values. ## ## @code{nanmin} is identical to the @code{min} function except that NaN values ## are ignored. If all values in a column are NaN, the minimum is ## returned as NaN rather than []. 
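##
## A short sketch of the all-NaN rule, using made-up input:
##
## @example
## @group
## nanmin ([1 NaN; NaN NaN])   # [1 NaN], the second column holds only NaN
## @end group
## @end example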
## ## @seealso{min, nansum, nanmax, nanmean, nanmedian} ## @end deftypefn function [v, idx] = nanmin (X, Y, DIM) if nargin < 1 || nargin > 3 print_usage; elseif nargin == 1 || (nargin == 2 && isempty(Y)) nanvals = isnan(X); X(nanvals) = Inf; [v, idx] = min (X); v(all(nanvals)) = NaN; elseif (nargin == 3 && isempty(Y)) nanvals = isnan(X); X(nanvals) = Inf; [v, idx] = min (X,[],DIM); v(all(nanvals,DIM)) = NaN; else Xnan = isnan(X); Ynan = isnan(Y); X(Xnan) = Inf; Y(Ynan) = Inf; if (nargin == 3) [v, idx] = min(X,Y,DIM); else [v, idx] = min(X,Y); endif v(Xnan & Ynan) = NaN; endif endfunction statistics/inst/linkage.m0000644000175000017500000002301612026164456015363 0ustar asneltasnelt## Copyright (C) 2008 Francesco Potort́ ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{y} =} linkage (@var{d}) ## @deftypefnx {Function File} {@var{y} =} linkage (@var{d}, @var{method}) ## @deftypefnx {Function File} @ ## {@var{y} =} linkage (@var{x}, @var{method}, @var{metric}) ## @deftypefnx {Function File} @ ## {@var{y} =} linkage (@var{x}, @var{method}, @var{arglist}) ## ## Produce a hierarchical clustering dendrogram ## ## @var{d} is the dissimilarity matrix relative to n observations, ## formatted as a @math{(n-1)*n/2}x1 vector as produced by @code{pdist}. 
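##
## As an illustration of that vector format (a sketch only; the data are
## made up for the example), three observations give 3*2/2 = 3 pairwise
## distances:
##
## @example
## @group
## x = [0; 1; 3];
## d = pdist (x);   # distances for the pairs (1,2), (1,3), (2,3)
## numel (d)        # 3
## @end group
## @end example
##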
## Alternatively, @var{x} contains data formatted for input to
## @code{pdist}, @var{metric} is a metric for @code{pdist} and
## @var{arglist} is a cell array containing arguments that are passed to
## @code{pdist}.
##
## @code{linkage} starts by putting each observation into a singleton
## cluster and numbering those from 1 to n.  Then it merges two
## clusters, chosen according to @var{method}, to create a new cluster
## numbered n+1, and so on until all observations are grouped into
## a single cluster numbered 2n-1.  Row k of the
## (n-1)x3 output matrix relates to cluster n+k: the first
## two columns are the numbers of the two component clusters and column
## 3 contains their distance.
##
## @var{method} defines the way the distance between two clusters is
## computed and how it is recomputed when two clusters are merged:
##
## @table @samp
## @item "single" (default)
## Distance between two clusters is the minimum distance between two
## elements belonging each to one cluster.  Produces a cluster tree
## known as minimum spanning tree.
##
## @item "complete"
## Furthest distance between two elements belonging each to one cluster.
##
## @item "average"
## Unweighted pair group method with averaging (UPGMA).
## The mean distance between all pairs of elements each belonging to one
## cluster.
##
## @item "weighted"
## Weighted pair group method with averaging (WPGMA).
## When two clusters A and B are joined together, the new distance to a
## cluster C is the mean between distances A-C and B-C.
##
## @item "centroid"
## Unweighted Pair-Group Method using Centroids (UPGMC).
## Assumes Euclidean metric.  The distance between cluster centroids,
## each centroid being the center of mass of a cluster.
##
## @item "median"
## Weighted pair-group method using centroids (WPGMC).
## Assumes Euclidean metric.  Distance between cluster centroids.  When
## two clusters are joined together, the new centroid is the midpoint
## between the joined centroids.
##
## @item "ward"
## Ward's sum of squared deviations about the group mean (ESS).
## Also known as minimum variance or inner squared distance.
## Assumes Euclidean metric.  How much the moment of inertia of the
## merged cluster exceeds the sum of those of the individual clusters.
## @end table
##
## @strong{Reference}
## Ward, J. H. Hierarchical Grouping to Optimize an Objective Function
## J. Am. Statist. Assoc. 1963, 58, 236-244,
## @url{http://iv.slis.indiana.edu/sw/data/ward.pdf}.
##
## @seealso{pdist,squareform}
## @end deftypefn

## Author: Francesco Potortì

function dgram = linkage (d, method = "single", distarg)

  ## check the input
  if (nargin < 1) || (nargin > 3)
    print_usage ();
  endif

  if (isempty (d))
    error ("linkage: d cannot be empty");
  elseif ( nargin < 3 && ~ isvector (d))
    error ("linkage: d must be a vector");
  endif

  methods = struct ...
  ("name", { "single"; "complete"; "average"; "weighted";
            "centroid"; "median"; "ward" },
   "distfunc", {(@(x) min(x))                                # single
                (@(x) max(x))                                # complete
                (@(x,i,j,w) sum(diag(q=w([i,j]))*x)/sum(q))  # average
                (@(x) mean(x))                               # weighted
                (@massdist)                                  # centroid
                (@(x,i) massdist(x,i))                       # median
                (@inertialdist)                              # ward
   });
  mask = strcmp (lower (method), {methods.name});
  if (! any (mask))
    error ("linkage: %s: unknown method", method);
  endif
  dist = {methods.distfunc}{mask};

  if (nargin == 3)
    if (ischar (distarg))
      d = pdist (d, distarg);
    elseif (iscell (distarg))
      d = pdist (d, distarg{:});
    else
      print_usage ();
    endif
  endif

  d = squareform (d, "tomatrix");      # dissimilarity NxN matrix
  n = rows (d);                        # the number of observations
  diagidx = sub2ind ([n,n], 1:n, 1:n); # indices of diagonal elements
  d(diagidx) = Inf;     # consider a cluster as far from itself
  ## For equal-distance nodes, the order in which clusters are
  ## merged is arbitrary.  Rotating the initial matrix produces an
  ## ordering similar to Matlab's.
cname = n:-1:1; # cluster names in d d = rot90 (d, 2); # exchange low and high cluster numbers weight = ones (1, n); # cluster weights dgram = zeros (n-1, 3); # clusters from n+1 to 2*n-1 for cluster = n+1:2*n-1 ## Find the two nearest clusters [m midx] = min (d(:)); [r, c] = ind2sub (size (d), midx); ## Here is the new cluster dgram(cluster-n, :) = [cname(r) cname(c) d(r, c)]; ## Put it in place of the first one and remove the second cname(r) = cluster; cname(c) = []; ## Compute the new distances newd = dist (d([r c], :), r, c, weight); newd(r) = Inf; # take care of the diagonal element ## Put distances in place of the first ones, remove the second ones d(r,:) = newd; d(:,r) = newd'; d(c,:) = []; d(:,c) = []; ## The new weight is the sum of the components' weights weight(r) += weight(c); weight(c) = []; endfor ## Sort the cluster numbers, as Matlab does dgram(:,1:2) = sort (dgram(:,1:2), 2); ## Check that distances are monotonically increasing if (any (diff (dgram(:,3)) < 0)) warning ("clustering", "linkage: cluster distances do not monotonically increase\n\ you should probably use a method different from \"%s\"", method); endif endfunction ## Take two row vectors, which are the Euclidean distances of clusters I ## and J from the others. Column I of second row contains the distance ## between clusters I and J. The centre of gravity of the new cluster ## is on the segment joining the old ones. W are the weights of all ## clusters. Use the law of cosines to find the distances of the new ## cluster from all the others. function y = massdist (x, i, j, w) x .^= 2; # squared Euclidean distances if (nargin == 2) # median distance qi = 0.5; # equal weights ("weighted") else # centroid distance qi = 1 / (1 + w(j) / w(i)); # proportional weights ("unweighted") endif y = sqrt (qi*x(1,:) + (1-qi)*(x(2,:) - qi*x(2,i))); endfunction ## Take two row vectors, which are the inertial distances of clusters I ## and J from the others. 
Column I of second row contains the inertial ## distance between clusters I and J. The centre of gravity of the new ## cluster K is on the segment joining I and J. W are the weights of ## all clusters. Convert inertial to Euclidean distances, then use the ## law of cosines to find the Euclidean distances of K from all the ## other clusters, convert them back to inertial distances and return ## them. function y = inertialdist (x, i, j, w) wi = w(i); wj = w(j); # the cluster weights s = [wi + w; wj + w]; # sum of weights for all cluster pairs p = [wi * w; wj * w]; # product of weights for all cluster pairs x = x.^2 .* s ./ p; # convert inertial dist. to squared Eucl. sij = wi + wj; # sum of weights of I and J qi = wi/sij; # normalise the weight of I ## Squared Euclidean distances between all clusters and new cluster K x = qi*x(1,:) + (1-qi)*(x(2,:) - qi*x(2,i)); y = sqrt (x * sij .* w ./ (sij + w)); # convert Eucl. dist. to inertial endfunction %!shared x, t %! x = reshape(mod(magic(6),5),[],3); %! t = 1e-6; %!assert (cond (linkage (pdist (x))), 34.119045,t); %!assert (cond (linkage (pdist (x), "complete")), 21.793345,t); %!assert (cond (linkage (pdist (x), "average")), 27.045012,t); %!assert (cond (linkage (pdist (x), "weighted")), 27.412889,t); %! lastwarn(); # Clear last warning before the test %!warning linkage (pdist (x), "centroid"); %!test warning off clustering %! assert (cond (linkage (pdist (x), "centroid")), 27.457477,t); %! warning on clustering %!warning linkage (pdist (x), "median"); %!test warning off clustering %! assert (cond (linkage (pdist (x), "median")), 27.683325,t); %! 
warning on clustering %!assert (cond (linkage (pdist (x), "ward")), 17.195198,t); %!assert (cond (linkage(x,"ward","euclidean")), 17.195198,t); %!assert (cond (linkage(x,"ward",{"euclidean"})), 17.195198,t); %!assert (cond (linkage(x,"ward",{"minkowski",2})),17.195198,t); statistics/inst/caseread.m0000644000175000017500000000341411741556364015526 0ustar asneltasnelt## Copyright (C) 2008 Bill Denney ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{names} =} caseread (@var{filename}) ## Read case names from an ascii file. ## ## Essentially, this reads all lines from a file as text and returns ## them in a string matrix. ## @seealso{casewrite, tblread, tblwrite, csv2cell, cell2csv, fopen} ## @end deftypefn ## Author: Bill Denney ## Description: Read strings from a file function names = caseread (f="") ## Check arguments if nargin != 1 print_usage (); endif if isempty (f) ## FIXME: open a file dialog box in this case when a file dialog box ## becomes available error ("caseread: filename must be given") endif [fid msg] = fopen (f, "rt"); if fid < 0 || (! isempty (msg)) error ("caseread: cannot open %s: %s", f, msg); endif names = {}; t = fgetl (fid); while ischar (t) names{end+1} = t; t = fgetl (fid); endwhile if (fclose (fid) < 0) error ("caseread: error closing f") endif names = strvcat (names); endfunction ## Tests %!shared n %! 
n = ["a ";"bcd";"ef "]; %!assert (caseread ("caseread.dat"), n); statistics/inst/copulapdf.m0000644000175000017500000001372511741556364015742 0ustar asneltasnelt## Copyright (C) 2008 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{p} =} copulapdf (@var{family}, @var{x}, @var{theta}) ## Compute the probability density function of a copula family. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{family} is the copula family name. Currently, @var{family} can ## be @code{'Clayton'} for the Clayton family, @code{'Gumbel'} for the ## Gumbel-Hougaard family, @code{'Frank'} for the Frank family, or ## @code{'AMH'} for the Ali-Mikhail-Haq family. ## ## @item ## @var{x} is the support where each row corresponds to an observation. ## ## @item ## @var{theta} is the parameter of the copula. The elements of ## @var{theta} must be greater than or equal to @code{-1} for the ## Clayton family, greater than or equal to @code{1} for the ## Gumbel-Hougaard family, arbitrary for the Frank family, and greater ## than or equal to @code{-1} and lower than @code{1} for the ## Ali-Mikhail-Haq family. Moreover, @var{theta} must be non-negative ## for dimensions greater than @code{2}. @var{theta} must be a column ## vector with the same number of rows as @var{x} or be scalar. 
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{p} is the probability density of the copula at each row of
## @var{x} and corresponding parameter @var{theta}.
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## x = [0.2:0.2:0.6; 0.2:0.2:0.6];
## theta = [1; 2];
## p = copulapdf ("Clayton", x, theta)
## @end group
##
## @group
## p = copulapdf ("Gumbel", x, 2)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Roger B. Nelsen. @cite{An Introduction to Copulas}. Springer,
## New York, second edition, 2006.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: PDF of a copula family

function p = copulapdf (family, x, theta)

  # Check arguments
  if (nargin != 3)
    print_usage ();
  endif

  if (! ischar (family))
    error ("copulapdf: family must be one of 'Clayton', 'Gumbel', 'Frank', and 'AMH'");
  endif

  if (! isempty (x) && ! ismatrix (x))
    error ("copulapdf: x must be a numeric matrix");
  endif

  [n, d] = size (x);

  if (! isvector (theta) || (! isscalar (theta) && size (theta, 1) != n))
    error ("copulapdf: theta must be a column vector with the same number of rows as x or be scalar");
  endif

  if (n == 0)
    # Input is empty
    p = zeros (0, 1);
  else
    if (n > 1 && isscalar (theta))
      theta = repmat (theta, n, 1);
    endif

    # Truncate input to unit hypercube
    x(x < 0) = 0;
    x(x > 1) = 1;

    # Compute the probability density function according to family
    lowerarg = lower (family);

    if (strcmp (lowerarg, "clayton"))
      # The Clayton family
      log_cdf = -log (max (sum (x .^ (repmat (-theta, 1, d)), 2) - d + 1, 0)) ./ theta;
      p = prod (repmat (theta, 1, d) .* repmat (0:(d - 1), n, 1) + 1, 2) .* exp ((1 + theta .* d) .* log_cdf - (theta + 1) .* sum (log (x), 2));
      # Product copula at rows where theta == 0
      k = find (theta == 0);
      if (any (k))
        p(k) = 1;
      endif
      # Check theta
      if (d > 2)
        k = find (! (theta >= 0) | ! (theta < inf));
      else
        k = find (! (theta >= -1) | ! ...
(theta < inf)); endif elseif (strcmp (lowerarg, "gumbel")) # The Gumbel-Hougaard family g = sum ((-log (x)) .^ repmat (theta, 1, d), 2); c = exp (-g .^ (1 ./ theta)); p = ((prod (-log (x), 2)) .^ (theta - 1)) ./ prod (x, 2) .* c .* (g .^ (2 ./ theta - 2) + (theta - 1) .* g .^ (1 ./ theta - 2)); # Check theta k = find (! (theta >= 1) | ! (theta < inf)); elseif (strcmp (lowerarg, "frank")) # The Frank family if (d != 2) error ("copulapdf: Frank copula PDF implemented as bivariate only"); endif p = (theta .* exp (theta .* (1 + sum (x, 2))) .* (exp (theta) - 1))./ (exp (theta) - exp (theta + theta .* x(:, 1)) + exp (theta .* sum (x, 2)) - exp (theta + theta .* x(:, 2))) .^ 2; # Product copula at columns where theta == 0 k = find (theta == 0); if (any (k)) p(k) = 1; endif # Check theta k = find (! (theta > -inf) | ! (theta < inf)); elseif (strcmp (lowerarg, "amh")) # The Ali-Mikhail-Haq family if (d != 2) error ("copulapdf: Ali-Mikhail-Haq copula PDF implemented as bivariate only"); endif z = theta .* prod (x - 1, 2) - 1; p = (theta .* (1 - sum (x, 2) - prod (x, 2) - z) - 1) ./ (z .^ 3); # Check theta k = find (! (theta >= -1) | ! (theta < 1)); else error ("copulapdf: unknown copula family '%s'", family); endif if (any (k)) p(k) = NaN; endif endif endfunction %!test %! x = [0.2:0.2:0.6; 0.2:0.2:0.6]; %! theta = [1; 2]; %! p = copulapdf ("Clayton", x, theta); %! expected_p = [0.9872; 0.7295]; %! assert (p, expected_p, 0.001); %!test %! x = [0.2:0.2:0.6; 0.2:0.2:0.6]; %! p = copulapdf ("Gumbel", x, 2); %! expected_p = [0.9468; 0.9468]; %! assert (p, expected_p, 0.001); %!test %! x = [0.2, 0.6; 0.2, 0.6]; %! theta = [1; 2]; %! p = copulapdf ("Frank", x, theta); %! expected_p = [0.9378; 0.8678]; %! assert (p, expected_p, 0.001); %!test %! x = [0.2, 0.6; 0.2, 0.6]; %! theta = [0.3; 0.7]; %! p = copulapdf ("AMH", x, theta); %! expected_p = [0.9540; 0.8577]; %! 
%! assert (p, expected_p, 0.001);
statistics/inst/princomp.m0000644000175000017500000001070212174252676015604 0ustar asneltasnelt
## Copyright (C) 2013 Fernando Damian Nieuwveldt
##
## This program is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License
## as published by the Free Software Foundation; either version 3
## of the License, or (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{COEFF}]} = princomp(@var{X})
## @deftypefnx {Function File} {[@var{COEFF},@var{SCORE}]} = princomp(@var{X})
## @deftypefnx {Function File} {[@var{COEFF},@var{SCORE},@var{latent}]} = princomp(@var{X})
## @deftypefnx {Function File} {[@var{COEFF},@var{SCORE},@var{latent},@var{tsquare}]} = princomp(@var{X})
## @deftypefnx {Function File} {[...]} = princomp(@var{X},'econ')
## @itemize @bullet
## @item
## princomp performs principal component analysis on an NxP data matrix X
## @item
## @var{COEFF} : returns the principal component coefficients
## @item
## @var{SCORE} : returns the principal component scores, the representation of X
## in the principal component space
## @item
## @var{latent} : returns the principal component variances, i.e., the
## eigenvalues of the covariance matrix of X.
## @item
## @var{tsquare} : returns Hotelling's T-squared statistic for each observation in X
## @item
## [...] = princomp(X,'econ') returns only the elements of latent that are not
## necessarily zero, and the corresponding columns of COEFF and SCORE, that is,
## when n <= p, only the first n-1.  This can be significantly faster when p is
## much larger than n.
## In this case the svd will be applied to the transpose of
## the data matrix X.
##
## @end itemize
##
## @subheading References
##
## @enumerate
## @item
## Jolliffe, I. T., Principal Component Analysis, 2nd Edition, Springer, 2002
##
## @end enumerate
## @end deftypefn

function [COEFF,SCORE,latent,tsquare] = princomp(X,varargin)

  if (nargin < 1 || nargin > 2)
    print_usage ();
  endif

  if (nargin == 2 && ! strcmpi (varargin{:}, "econ"))
    error ("princomp: if a second input argument is present, it must be the string 'econ'");
  endif

  [nobs nvars] = size(X);

  # Center the columns to mean zero
  Xcentered = bsxfun(@minus,X,mean(X));

  # Check if there are more variables than observations
  if nvars <= nobs

    [U,S,COEFF] = svd(Xcentered);

  else

    # Calculate the svd on the transpose matrix, much faster
    if (nargin == 2 && strcmpi ( varargin{:} , "econ"))
      [COEFF,S,V] = svd(Xcentered' , 'econ');
    else
      [COEFF,S,V] = svd(Xcentered');
    endif

  endif

  if nargout > 1

    # Get the Scores
    SCORE = Xcentered*COEFF;

    # Get the rank of the SCORE matrix
    r = rank(SCORE);

    # Only use the first r columns, pad rest with zeros if economy != 'econ'
    SCORE = SCORE(:,1:r) ;

    if !(nargin == 2 && strcmpi ( varargin{:} , "econ"))
      SCORE = [SCORE, zeros(nobs , nvars-r)];
    else
      COEFF = COEFF(: , 1:r);
    endif

  endif

  if nargout > 2

    # This is the same as the eigenvalues of the covariance matrix of X.
    # It is computed only here because r is defined only when nargout > 1.
    latent = (diag(S'*S)/(size(Xcentered,1)-1))(1:r);

    if !(nargin == 2 && strcmpi ( varargin{:} , "econ"))
      latent= [latent;zeros(nvars-r,1)];
    endif

  endif

  if nargout > 3
    # Calculate the Hotelling T-Square statistic for the observations
    tsquare = sumsq(zscore(SCORE(:,1:r)),2);
  endif

endfunction

%!shared COEFF,SCORE,latent,tsquare,m,x

%!test
%! x=[1,2,3;2,1,3]';
%! [COEFF,SCORE,latent,tsquare] = princomp(x);
%! m=[sqrt(2),sqrt(2);sqrt(2),-sqrt(2);-2*sqrt(2),0]/2;
%! m(:,1) = m(:,1)*sign(COEFF(1,1));
%! 
m(:,2) = m(:,2)*sign(COEFF(1,2)); %!assert(COEFF,m(1:2,:),10*eps); %!assert(SCORE,-m,10*eps); %!assert(latent,[1.5;.5],10*eps); %!assert(tsquare,[4;4;4]/3,10*eps); %!test %! x=x'; %! [COEFF,SCORE,latent,tsquare] = princomp(x); %! m=[sqrt(2),sqrt(2),0;-sqrt(2),sqrt(2),0;0,0,2]/2; %! m(:,1) = m(:,1)*sign(COEFF(1,1)); %! m(:,2) = m(:,2)*sign(COEFF(1,2)); %! m(:,3) = m(:,3)*sign(COEFF(3,3)); %!assert(COEFF,m,10*eps); %!assert(SCORE(:,1),-m(1:2,1),10*eps); %!assert(SCORE(:,2:3),zeros(2),10*eps); %!assert(latent,[1;0;0],10*eps); %!xtest %! assert(tsquare,[0.5;0.5],10*eps) statistics/inst/jsupdf.m0000644000175000017500000000414711741556364015256 0ustar asneltasnelt## Copyright (C) 2006 Frederick (Rick) A Niles ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {} jsupdf (@var{x}, @var{alpha1}, @var{alpha2}) ## For each element of @var{x}, compute the probability density function ## (PDF) at @var{x} of the Johnson SU distribution with shape parameters @var{alpha1} ## and @var{alpha2}. ## ## Default values are @var{alpha1} = 1, @var{alpha2} = 1. 
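##
## A minimal sketch of the default case (the evaluation point is chosen only
## for illustration).  At @var{x} = 0 the density reduces to the standard
## normal density evaluated at @var{alpha1}:
##
## @example
## @group
## jsupdf (0)         # about 0.2420, the standard normal density at 1
## jsupdf (0, 1, 1)   # same value, with the defaults written out
## @end group
## @end example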
## @end deftypefn

## Author: Frederick (Rick) A Niles
## Description: PDF of Johnson SU distribution

## This function is derived from normpdf.m

## This is the TeX equation of this function:
##
## \[ f(x) = \frac{\alpha_2}{\sqrt{x^2+1}} \phi\left(\alpha_1+\alpha_2
##    \log{\left(x+\sqrt{x^2+1}\right)}\right) \]
##
## where \[ -\infty < x < \infty ; \alpha_2 > 0 \] and $\phi$ is the
## standard normal probability density function.  $\alpha_1$ and
## $\alpha_2$ are shape parameters.

function pdf = jsupdf (x, alpha1, alpha2)

  if (nargin != 1 && nargin != 3)
    print_usage;
  endif

  if (nargin == 1)
    alpha1 = 1;
    alpha2 = 1;
  endif

  if (!isscalar (alpha1) || !isscalar(alpha2))
    [retval, x, alpha1, alpha2] = common_size (x, alpha1, alpha2);
    if (retval > 0)
      error ("jsupdf: x, alpha1 and alpha2 must be of common size or scalars");
    endif
  endif

  one = ones(size(x));
  sr = sqrt(x.*x + one);
  pdf = (alpha2 ./ sr) .* stdnormal_pdf (alpha1 .* one + alpha2 .* log (x + sr));

endfunction
statistics/inst/monotone_smooth.m0000644000175000017500000001245611741556364017214 0ustar asneltasnelt
## Copyright (C) 2011 Nir Krakauer
## Copyright (C) 2011 Carnë Draug
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .
## -*- texinfo -*- ## @deftypefn {Function File} {@var{yy} =} monotone_smooth (@var{x}, @var{y}, @var{h}) ## Produce a smooth monotone increasing approximation to a sampled functional ## dependence y(x) using a kernel method (an Epanechnikov smoothing kernel is ## applied to y(x); this is integrated to yield the monotone increasing form. ## See Reference 1 for details.) ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{x} is a vector of values of the independent variable. ## ## @item ## @var{y} is a vector of values of the dependent variable, of the same size as ## @var{x}. For best performance, it is recommended that the @var{y} already be ## fairly smooth, e.g. by applying a kernel smoothing to the original values if ## they are noisy. ## ## @item ## @var{h} is the kernel bandwidth to use. If @var{h} is not given, a "reasonable" ## value is computed. ## ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{yy} is the vector of smooth monotone increasing function values at @var{x}. ## ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## x = 0:0.1:10; ## y = (x .^ 2) + 3 * randn(size(x)); %typically non-monotonic from the added noise ## ys = ([y(1) y(1:(end-1))] + y + [y(2:end) y(end)])/3; %crudely smoothed via ## moving average, but still typically non-monotonic ## yy = monotone_smooth(x, ys); %yy is monotone increasing in x ## plot(x, y, '+', x, ys, x, yy) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Holger Dette, Natalie Neumeyer and Kay F. 
## Pilz (2006), A simple nonparametric
## estimator of a strictly monotone regression function, @cite{Bernoulli}, 12:469-490
## @item
## Regine Scheder (2007), R Package 'monoProc', Version 1.0-6,
## @url{http://cran.r-project.org/web/packages/monoProc/monoProc.pdf} (The
## implementation here is based on the monoProc function mono.1d)
## @end enumerate
## @end deftypefn

## Author: Nir Krakauer
## Description: Nonparametric monotone increasing regression

function yy = monotone_smooth (x, y, h)

  if (nargin < 2 || nargin > 3)
    print_usage ();
  elseif (!isnumeric (x) || !isvector (x))
    error ("first argument x must be a numeric vector")
  elseif (!isnumeric (y) || !isvector (y))
    error ("second argument y must be a numeric vector")
  elseif (numel (x) != numel (y))
    error ("x and y must have the same number of elements")
  elseif (nargin == 3 && (!isscalar (h) || !isnumeric (h)))
    error ("third argument 'h' (kernel bandwidth) must be a numeric scalar")
  endif

  n = numel(x);

  %set filter bandwidth at a reasonable default value, if not specified
  if (nargin != 3)
    s = std(x);
    h = s / (n^0.2);
  end

  x_min = min(x);
  x_max = max(x);

  y_min = min(y);
  y_max = max(y);

  %transform range of x to [0, 1]
  xl = (x - x_min) / (x_max - x_min);

  yy = ones(size(y));

  %Epanechnikov smoothing kernel (with finite support)
  %K_epanech_kernel = @(z) (3/4) * (1 - z.^2) .* (abs(z) < 1);

  K_epanech_int = @(z) mean(((abs(z) < 1)/2) - (3/4) * (z .* (abs(z) < 1) - (1/3) * (z.^3) .* (abs(z) < 1)) + (z < -1));

  %integral of kernels up to t
  monotone_inverse = @(t) K_epanech_int((y - t) / h);

  %find the value of the monotone smooth function at each point in x
  niter_max = 150; %maximum number of iterations for estimating each value (should not be reached in most cases)
  for l = 1:n

    tmax = y_max;
    tmin = y_min;
    wmin = monotone_inverse(tmin);
    wmax = monotone_inverse(tmax);
    if (wmax == wmin)
      yy(l) = tmin;
    else
      wt = xl(l);
      iter_max_reached = 1;
      for i = 1:niter_max
        wt_scaled = (wt - wmin) / (wmax - wmin);
        tn = tmin + wt_scaled * (tmax - ...
tmin) ;
        wn = monotone_inverse(tn);
        wn_scaled = (wn - wmin) / (wmax - wmin);

        %if (abs(wt-wn) < 1E-4) || (tn < (y_min-0.1)) || (tn > (y_max+0.1))
        %% criterion for break in the R code -- replaced by the following line to
        %% hopefully be less dependent on the scale of y
        if (abs(wt_scaled-wn_scaled) < 1E-4) || (wt_scaled < -0.1) || (wt_scaled > 1.1)
          iter_max_reached = 0;
          break
        endif
        if wn > wt
          tmax = tn;
          wmax = wn;
        else
          tmin = tn;
          wmin = wn;
        endif
      endfor
      if iter_max_reached
        warning("at x = %g, maximum number of iterations %d reached without convergence; approximation may not be optimal", x(l), niter_max)
      endif
      yy(l) = tmin + (wt - wmin) * (tmax - tmin) / (wmax - wmin);
    endif
  endfor
endfunction
statistics/inst/gamfit.m0000644000175000017500000000324712041377754015230 0ustar asneltasnelt
## Author: Martijn van Oosterhout
## This program is granted to the public domain.

## -*- texinfo -*-
## @deftypefn {Function File} {@var{MLE} =} gamfit (@var{data})
## Calculate gamma distribution parameters.
##
## Find the maximum likelihood estimators (@var{mle}s) of the Gamma distribution
## of @var{data}.  @var{MLE} is a two element vector with shape parameter
## @var{A} and scale @var{B}.
##
## @seealso{gampdf, gaminv, gamrnd, gamlike}
## @end deftypefn

## This function works by minimizing the value of gamlike for the vector R.
## Just about any minimization function will work; all it has to do is
## minimize over one variable.  Although the gamma distribution has two
## parameters, their product is the mean of the data, so a helper function
## for the search takes one parameter, calculates the other and then returns
## the value of gamlike.

## FIXME is this still true???
## Note: Octave uses the inverse scale parameter, which is the opposite of
## Matlab.  To work for Matlab, value of b needs to be inverted in a few
## places (marked with **)

function res = gamfit(R)

  if (nargin != 1)
    print_usage;
  endif

  avg = mean(R);

  # This can be just about any search function.
I chose this because it
  # seemed to be the only one that might work in this situation...
  a=nmsmax( @gamfit_search, 1, [], [], avg, R );

  b=a/avg;      # **

  res=[a 1/b];
endfunction

# Helper function so we only have to minimize over one variable. It also
# negates the output of gamlike, in case the optimisation function wants to
# maximize rather than minimize.
function res = gamfit_search( a, avg, R )
  b=a/avg;      # **
  res = -gamlike([a 1/b], R);
endfunction
statistics/inst/private/0000755000175000017500000000000012271566224015244 5ustar asneltasnelt
statistics/inst/private/tbl_delim.m0000644000175000017500000000614411741556364017367 0ustar asneltasnelt
## Copyright (C) 2008 Bill Denney
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{d}, @var{err}] = } tbl_delim (@var{d})
## Return the delimiter for tblread or tblwrite.
##
## The delimiter, @var{d}, may be any single character or
## @itemize
## @item "space" " " (default)
## @item "tab" "\t"
## @item "comma" ","
## @item "semi" ";"
## @item "bar" "|"
## @end itemize
##
## @var{err} will be empty if there is no error, and @var{d} will be NaN
## if there is an error. You MUST check the value of @var{err}.
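##
## A minimal sketch of the intended calling pattern; the caller name
## @code{myreader} used in the message is hypothetical:
##
## @example
## @group
## [d, err] = tbl_delim ("comma");
## if (! isempty (err))
##   error ("myreader: %s", err);
## endif
## @end group
## @end example
##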
## @seealso{tblread, tblwrite}
## @end deftypefn

function [d, err] = tbl_delim (d)

  ## Check arguments
  if nargin != 1
    print_usage ();
  endif

  err = "";
  ## Format the delimiter
  if ischar (d)
    ## allow for escape characters
    d = sprintf (d);
    if numel (d) > 1
      ## allow the word forms
      s.space = " ";
      s.tab = "\t";
      s.comma = ",";
      s.semi = ";";
      s.bar = "|";
      if ! ismember (d, fieldnames (s))
        err = ["tblread: delimiter must be either a single " ...
               "character or one of\n" ...
               sprintf("%s, ", fieldnames (s){:})(1:end-2)];
        d = NaN;
      else
        d = s.(d);
      endif
    endif
  else
    err = "delimiter must be a character";
    d = NaN;
  endif
  if isempty (d)
    err = "the delimiter may not be empty";
    d = NaN;
  endif

endfunction

## Tests
## The defaults
%!test
%! [d err] = tbl_delim (" ");
%! assert (d, " ");
%! assert (err, "");
## Named delimiters
%!test
%! [d err] = tbl_delim ("space");
%! assert (d, " ");
%! assert (err, "");
%!test
%! [d err] = tbl_delim ("tab");
%! assert (d, sprintf ("\t"));
%! assert (err, "");
%!test
%! [d err] = tbl_delim ("comma");
%! assert (d, ",");
%! assert (err, "");
%!test
%! [d err] = tbl_delim ("semi");
%! assert (d, ";");
%! assert (err, "");
%!test
%! [d err] = tbl_delim ("bar");
%! assert (d, "|");
%! assert (err, "");
## An arbitrary character
%!test
%! [d err] = tbl_delim ("x");
%! assert (d, "x");
%! assert (err, "");
## An arbitrary escape string
%!test
%! [d err] = tbl_delim ('\r');
%! assert (d, sprintf ('\r'))
%! assert (err, "");
## Errors
%!test
%! [d err] = tbl_delim ("bars");
%! assert (isnan (d));
%! assert (! isempty (err));
%!test
%! [d err] = tbl_delim ("");
%! assert (isnan (d));
%! assert (! isempty (err));
%!test
%! [d err] = tbl_delim (5);
%! assert (isnan (d));
%! assert (! isempty (err));
%!test
%! [d err] = tbl_delim ({"."});
%! assert (isnan (d));
%! assert (!
isempty (err)); statistics/inst/mnrnd.m0000644000175000017500000001334611752275032015072 0ustar asneltasnelt## Copyright (C) 2012 Arno Onken ## ## This program is free software: you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by ## the Free Software Foundation, either version 3 of the License, or ## (at your option) any later version. ## ## This program is distributed in the hope that it will be useful, ## but WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with this program. If not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {@var{x} =} mnrnd (@var{n}, @var{p}) ## @deftypefnx {Function File} {@var{x} =} mnrnd (@var{n}, @var{p}, @var{s}) ## Generate random samples from the multinomial distribution. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{n} is the first parameter of the multinomial distribution. @var{n} can ## be scalar or a vector containing the number of trials of each multinomial ## sample. The elements of @var{n} must be non-negative integers. ## ## @item ## @var{p} is the second parameter of the multinomial distribution. @var{p} can ## be a vector with the probabilities of the categories or a matrix with each ## row containing the probabilities of a multinomial sample. If @var{p} has ## more than one row and @var{n} is non-scalar, then the number of rows of ## @var{p} must match the number of elements of @var{n}. ## ## @item ## @var{s} is the number of multinomial samples to be generated. @var{s} must ## be a non-negative integer. If @var{s} is specified, then @var{n} must be ## scalar and @var{p} must be a vector. 
## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{x} is a matrix of random samples from the multinomial distribution with ## corresponding parameters @var{n} and @var{p}. Each row corresponds to one ## multinomial sample. The number of columns, therefore, corresponds to the ## number of columns of @var{p}. If @var{s} is not specified, then the number ## of rows of @var{x} is the maximum of the number of elements of @var{n} and ## the number of rows of @var{p}. If a row of @var{p} does not sum to @code{1}, ## then the corresponding row of @var{x} will contain only @code{NaN} values. ## @end itemize ## ## @subheading Examples ## ## @example ## @group ## n = 10; ## p = [0.2, 0.5, 0.3]; ## x = mnrnd (n, p); ## @end group ## ## @group ## n = 10 * ones (3, 1); ## p = [0.2, 0.5, 0.3]; ## x = mnrnd (n, p); ## @end group ## ## @group ## n = (1:2)'; ## p = [0.2, 0.5, 0.3; 0.1, 0.1, 0.8]; ## x = mnrnd (n, p); ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, 2001. ## ## @item ## Merran Evans, Nicholas Hastings and Brian Peacock. @cite{Statistical ## Distributions}. pages 134-136, Wiley, New York, third edition, 2000. ## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Random samples from the multinomial distribution function x = mnrnd (n, p, s) # Check arguments if (nargin == 3) if (! isscalar (n) || n < 0 || round (n) != n) error ("mnrnd: n must be a non-negative integer"); endif if (! isvector (p) || any (p < 0 | p > 1)) error ("mnrnd: p must be a vector of probabilities"); endif if (! isscalar (s) || s < 0 || round (s) != s) error ("mnrnd: s must be a non-negative integer"); endif elseif (nargin == 2) if (isvector (p) && size (p, 1) > 1) p = p'; endif if (! 
isvector (n) || any (n < 0 | round (n) != n) || size (n, 2) > 1) error ("mnrnd: n must be a non-negative integer column vector"); endif if (! ismatrix (p) || isempty (p) || any (p < 0 | p > 1)) error ("mnrnd: p must be a non-empty matrix with rows of probabilities"); endif if (! isscalar (n) && size (p, 1) > 1 && length (n) != size (p, 1)) error ("mnrnd: the length of n must match the number of rows of p"); endif else print_usage (); endif # Adjust input sizes if (nargin == 3) n = n * ones (s, 1); p = repmat (p(:)', s, 1); elseif (nargin == 2) if (isscalar (n) && size (p, 1) > 1) n = n * ones (size (p, 1), 1); elseif (size (p, 1) == 1) p = repmat (p, length (n), 1); endif endif sz = size (p); # Upper bounds of categories ub = cumsum (p, 2); # Make sure that the greatest upper bound is 1 gub = ub(:, end); ub(:, end) = 1; # Lower bounds of categories lb = [zeros(sz(1), 1) ub(:, 1:(end-1))]; # Draw multinomial samples x = zeros (sz); for i = 1:sz(1) # Draw uniform random numbers r = repmat (rand (n(i), 1), 1, sz(2)); # Compare the random numbers of r to the cumulated probabilities of p and # count the number of samples for each category x(i, :) = sum (r <= repmat (ub(i, :), n(i), 1) & r > repmat (lb(i, :), n(i), 1), 1); endfor # Set invalid rows to NaN k = (abs (gub - 1) > 1e-6); x(k, :) = NaN; endfunction %!test %! n = 10; %! p = [0.2, 0.5, 0.3]; %! x = mnrnd (n, p); %! assert (size (x), size (p)); %! assert (all (x >= 0)); %! assert (all (round (x) == x)); %! assert (sum (x) == n); %!test %! n = 10 * ones (3, 1); %! p = [0.2, 0.5, 0.3]; %! x = mnrnd (n, p); %! assert (size (x), [length(n), length(p)]); %! assert (all (x >= 0)); %! assert (all (round (x) == x)); %! assert (all (sum (x, 2) == n)); %!test %! n = (1:2)'; %! p = [0.2, 0.5, 0.3; 0.1, 0.1, 0.8]; %! x = mnrnd (n, p); %! assert (size (x), size (p)); %! assert (all (x >= 0)); %! assert (all (round (x) == x)); %! 
assert (all (sum (x, 2) == n));
statistics/inst/mvtrnd.m0000644000175000017500000001032212056125211015244 0ustar asneltasnelt
## Copyright (C) 2012 Arno Onken, Iñigo Urteaga
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {@var{x} =} mvtrnd (@var{sigma}, @var{nu})
## @deftypefnx {Function File} {@var{x} =} mvtrnd (@var{sigma}, @var{nu}, @var{n})
## Generate random samples from the multivariate t-distribution.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{sigma} is the matrix of correlation coefficients. If there are any
## non-unit diagonal elements then @var{sigma} will be normalized, so that the
## resulting covariance of the obtained samples @var{x} follows:
## @code{cov (x) = nu/(nu-2) * sigma ./ (sqrt (diag (sigma) * diag (sigma)'))}.
## In order to obtain samples distributed according to a standard multivariate
## t-distribution, @var{sigma} must be equal to the identity matrix. To generate
## multivariate t-distribution samples @var{x} with arbitrary covariance matrix
## @var{sigma}, the following scaling might be used:
## @code{x = mvtrnd (sigma, nu, n) * diag (sqrt (diag (sigma)))}.
##
## @item
## @var{nu} is the degrees of freedom for the multivariate t-distribution.
## @var{nu} must be a vector with the same number of elements as samples to be
## generated or be scalar.
##
## @item
## @var{n} is the number of rows of the matrix to be generated. @var{n} must be
## a non-negative integer and corresponds to the number of samples to be
## generated.
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{x} is a matrix of random samples from the multivariate t-distribution
## with @var{n} row samples.
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## sigma = [1, 0.5; 0.5, 1];
## nu = 3;
## n = 10;
## x = mvtrnd (sigma, nu, n);
## @end group
##
## @group
## sigma = [1, 0.5; 0.5, 1];
## nu = [2; 3];
## n = 2;
## x = mvtrnd (sigma, nu, 2);
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, 2001.
##
## @item
## Samuel Kotz and Saralees Nadarajah. @cite{Multivariate t Distributions and
## Their Applications}. Cambridge University Press, Cambridge, 2004.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: Random samples from the multivariate t-distribution

function x = mvtrnd (sigma, nu, n)

  # Check arguments
  if (nargin < 2)
    print_usage ();
  endif

  if (! ismatrix (sigma) || any (any (sigma != sigma')) || min (eig (sigma)) <= 0)
    error ("mvtrnd: sigma must be a positive definite matrix");
  endif

  if (!isvector (nu) || any (nu <= 0))
    error ("mvtrnd: nu must be a positive scalar or vector");
  endif
  nu = nu(:);

  if (nargin > 2)
    if (!
isscalar (n) || n < 0 || round (n) != n)
      error ("mvtrnd: n must be a non-negative integer");
    endif
    if (isscalar (nu))
      nu = nu * ones (n, 1);
    else
      if (length (nu) != n)
        error ("mvtrnd: n must match the length of nu");
      endif
    endif
  else
    n = length (nu);
  endif

  # Normalize sigma
  if (any (diag (sigma) != 1))
    sigma = sigma ./ sqrt (diag (sigma) * diag (sigma)');
  endif

  # Dimension
  d = size (sigma, 1);
  # Draw samples
  y = mvnrnd (zeros (1, d), sigma, n);
  u = repmat (chi2rnd (nu), 1, d);
  x = y .* sqrt (repmat (nu, 1, d) ./ u);
endfunction

%!test
%! sigma = [1, 0.5; 0.5, 1];
%! nu = 3;
%! n = 10;
%! x = mvtrnd (sigma, nu, n);
%! assert (size (x), [10, 2]);

%!test
%! sigma = [1, 0.5; 0.5, 1];
%! nu = [2; 3];
%! n = 2;
%! x = mvtrnd (sigma, nu, 2);
%! assert (size (x), [2, 2]);
statistics/inst/dendogram.m0000644000175000017500000000640712271002042015676 0ustar asneltasnelt
%% Copyright (c) 2012 Juan Pablo Carbajal
%%
%% This program is free software: you can redistribute it and/or modify
%% it under the terms of the GNU General Public License as published by
%% the Free Software Foundation, either version 3 of the License, or
%% any later version.
%%
%% This program is distributed in the hope that it will be useful,
%% but WITHOUT ANY WARRANTY; without even the implied warranty of
%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%% GNU General Public License for more details.
%%
%% You should have received a copy of the GNU General Public License
%% along with this program. If not, see .

%% -*- texinfo -*-
%% @deftypefn {Function File} {@var{p} = } dendogram (@var{tree})
%% @deftypefnx {Function File} {@var{p, t} = } dendogram (@var{tree})
%% @deftypefnx {Function File} {@var{p, t, perm} = } dendogram (@var{tree})
%% Plots a dendrogram using the output of function @command{linkage}.
%%
%% t is a vector containing the leaf node number for each object in the
%% original dataset. For now, all objects are leaf nodes.
%% %% perm is the permutation of the input objects used to display the %% dendrogram, in left-to-right order. %% %% TODO: Return handle to lines to set properties %% TODO: Rescale the plot automatically based on data. %% %% @seealso{linkage} %% @end deftypefn function [p, t, perm] = dendogram (tree) [m d] = size (tree); if d != 3 error ("Input data must be a tree as returned by function linkage.") end n = m + 1; % t is the leaf node number for all objects in the original dataset. % TODO: Add support for collapsing the tree. % For now, we always display all objects, so this is the identity map. t = (1:m)'; nc = max(tree(:,1:2)(:)); % Vector with the horizontal and vertical position of each cluster p = zeros (nc,2); perm = zeros (n,1); %% Ordering by depth-first search nodecount = 0; nodes_to_visit = nc+1; while !isempty(nodes_to_visit) currentnode = nodes_to_visit(1); nodes_to_visit(1) = []; if currentnode > n node = currentnode - n; nodes_to_visit = [tree(node,[2 1]) nodes_to_visit]; end if currentnode <= n && p(currentnode,1) == 0 nodecount +=1; p(currentnode,1) = nodecount; perm(nodecount) = currentnode; end end % Compute the horizontal position, begin-end % and vertical position of all clusters. for i = 1:m p(n+i,1) = mean (p(tree(i,1:2),1)); p(n+i,2) = tree(i,3); x(i,1:2) = p(tree(i,1:2),1); end figure(gcf) % plot horizontal lines tmp = line (x', tree(:,[3 3])'); % plot vertical lines [~,tf] = ismember (1:nc, tree(:,1:2)); [ind,~] = ind2sub (size (tree(:,1:2)), tf); y = [p(1:nc,2) tree(ind,3)]; tmp = line ([p(1:nc,1) p(1:nc,1)]',y'); xticks = 1:n; xl_txt = arrayfun (@num2str, perm,"uniformoutput",false); set (gca,"xticklabel",xl_txt,"xtick",xticks); axis ([0.5 n+0.5 0 max(tree(:,3))+0.1*min(tree(:,3))]); endfunction %!demo %! y = [4 5; 2 6; 3 7; 8 9; 1 10]; %! y(:,3) = 1:5; %! figure(gcf); clf; %! dendogram(y); %!demo %! v = 2*rand(30,1)-1; %! d = abs(bsxfun(@minus, v(:,1), v(:,1)')); %! y = linkage (squareform(d,"tovector")); %! figure(gcf); clf; %! 
dendogram(y); statistics/inst/hmmestimate.m0000644000175000017500000003254411741556364016302 0ustar asneltasnelt## Copyright (C) 2006, 2007 Arno Onken ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{transprobest}, @var{outprobest}] =} hmmestimate (@var{sequence}, @var{states}) ## @deftypefnx {Function File} {} hmmestimate (@dots{}, 'statenames', @var{statenames}) ## @deftypefnx {Function File} {} hmmestimate (@dots{}, 'symbols', @var{symbols}) ## @deftypefnx {Function File} {} hmmestimate (@dots{}, 'pseudotransitions', @var{pseudotransitions}) ## @deftypefnx {Function File} {} hmmestimate (@dots{}, 'pseudoemissions', @var{pseudoemissions}) ## Estimate the matrix of transition probabilities and the matrix of output ## probabilities of a given sequence of outputs and states generated by a ## hidden Markov model. The model assumes that the generation starts in ## state @code{1} at step @code{0} but does not include step @code{0} in the ## generated states and sequence. ## ## @subheading Arguments ## ## @itemize @bullet ## @item ## @var{sequence} is a vector of a sequence of given outputs. The outputs ## must be integers ranging from @code{1} to the number of outputs of the ## hidden Markov model. ## ## @item ## @var{states} is a vector of the same length as @var{sequence} of given ## states. 
The states must be integers ranging from @code{1} to the number ## of states of the hidden Markov model. ## @end itemize ## ## @subheading Return values ## ## @itemize @bullet ## @item ## @var{transprobest} is the matrix of the estimated transition ## probabilities of the states. @code{transprobest(i, j)} is the estimated ## probability of a transition to state @code{j} given state @code{i}. ## ## @item ## @var{outprobest} is the matrix of the estimated output probabilities. ## @code{outprobest(i, j)} is the estimated probability of generating ## output @code{j} given state @code{i}. ## @end itemize ## ## If @code{'symbols'} is specified, then @var{sequence} is expected to be a ## sequence of the elements of @var{symbols} instead of integers. ## @var{symbols} can be a cell array. ## ## If @code{'statenames'} is specified, then @var{states} is expected to be ## a sequence of the elements of @var{statenames} instead of integers. ## @var{statenames} can be a cell array. ## ## If @code{'pseudotransitions'} is specified then the integer matrix ## @var{pseudotransitions} is used as an initial number of counted ## transitions. @code{pseudotransitions(i, j)} is the initial number of ## counted transitions from state @code{i} to state @code{j}. ## @var{transprobest} will have the same size as @var{pseudotransitions}. ## Use this if you have transitions that are very unlikely to occur. ## ## If @code{'pseudoemissions'} is specified then the integer matrix ## @var{pseudoemissions} is used as an initial number of counted outputs. ## @code{pseudoemissions(i, j)} is the initial number of counted outputs ## @code{j} given state @code{i}. If @code{'pseudoemissions'} is also ## specified then the number of rows of @var{pseudoemissions} must be the ## same as the number of rows of @var{pseudotransitions}. @var{outprobest} ## will have the same size as @var{pseudoemissions}. Use this if you have ## outputs or states that are very unlikely to occur. 
## ## @subheading Examples ## ## @example ## @group ## transprob = [0.8, 0.2; 0.4, 0.6]; ## outprob = [0.2, 0.4, 0.4; 0.7, 0.2, 0.1]; ## [sequence, states] = hmmgenerate (25, transprob, outprob); ## [transprobest, outprobest] = hmmestimate (sequence, states) ## @end group ## ## @group ## symbols = @{'A', 'B', 'C'@}; ## statenames = @{'One', 'Two'@}; ## [sequence, states] = hmmgenerate (25, transprob, outprob, ## 'symbols', symbols, 'statenames', statenames); ## [transprobest, outprobest] = hmmestimate (sequence, states, ## 'symbols', symbols, ## 'statenames', statenames) ## @end group ## ## @group ## pseudotransitions = [8, 2; 4, 6]; ## pseudoemissions = [2, 4, 4; 7, 2, 1]; ## [sequence, states] = hmmgenerate (25, transprob, outprob); ## [transprobest, outprobest] = hmmestimate (sequence, states, 'pseudotransitions', pseudotransitions, 'pseudoemissions', pseudoemissions) ## @end group ## @end example ## ## @subheading References ## ## @enumerate ## @item ## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics ## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC, ## 2001. ## ## @item ## Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected ## Applications in Speech Recognition. @cite{Proceedings of the IEEE}, ## 77(2), pages 257-286, February 1989. 
## @end enumerate ## @end deftypefn ## Author: Arno Onken ## Description: Estimation of a hidden Markov model for a given sequence function [transprobest, outprobest] = hmmestimate (sequence, states, varargin) # Check arguments if (nargin < 2 || mod (length (varargin), 2) != 0) print_usage (); endif len = length (sequence); if (length (states) != len) error ("hmmestimate: sequence and states must have equal length"); endif # Flag for symbols usesym = false; # Flag for statenames usesn = false; # Variables for return values transprobest = []; outprobest = []; # Process varargin for i = 1:2:length (varargin) # There must be an identifier: 'symbols', 'statenames', # 'pseudotransitions' or 'pseudoemissions' if (! ischar (varargin{i})) print_usage (); endif # Upper case is also fine lowerarg = lower (varargin{i}); if (strcmp (lowerarg, 'symbols')) usesym = true; # Use the following argument as symbols symbols = varargin{i + 1}; # The same for statenames elseif (strcmp (lowerarg, 'statenames')) usesn = true; # Use the following argument as statenames statenames = varargin{i + 1}; elseif (strcmp (lowerarg, 'pseudotransitions')) # Use the following argument as an initial count for transitions transprobest = varargin{i + 1}; if (! ismatrix (transprobest)) error ("hmmestimate: pseudotransitions must be a non-empty numeric matrix"); endif if (rows (transprobest) != columns (transprobest)) error ("hmmestimate: pseudotransitions must be a square matrix"); endif elseif (strcmp (lowerarg, 'pseudoemissions')) # Use the following argument as an initial count for outputs outprobest = varargin{i + 1}; if (! 
ismatrix (outprobest)) error ("hmmestimate: pseudoemissions must be a non-empty numeric matrix"); endif else error ("hmmestimate: expected 'symbols', 'statenames', 'pseudotransitions' or 'pseudoemissions' but found '%s'", varargin{i}); endif endfor # Transform sequence from symbols to integers if necessary if (usesym) # sequenceint is used to build the transformed sequence sequenceint = zeros (1, len); for i = 1:length (symbols) # Search for symbols(i) in the sequence, isequal will have 1 at # corresponding indices; i is the right integer for that symbol isequal = ismember (sequence, symbols(i)); # We do not want to change sequenceint if the symbol appears a second # time in symbols if (any ((sequenceint == 0) & (isequal == 1))) isequal *= i; sequenceint += isequal; endif endfor if (! all (sequenceint)) index = max ((sequenceint == 0) .* (1:len)); error (["hmmestimate: sequence(" int2str (index) ") not in symbols"]); endif sequence = sequenceint; else if (! isvector (sequence)) error ("hmmestimate: sequence must be a non-empty vector"); endif if (! all (ismember (sequence, 1:max (sequence)))) index = max ((ismember (sequence, 1:max (sequence)) == 0) .* (1:len)); error (["hmmestimate: sequence(" int2str (index) ") not feasible"]); endif endif # Transform states from statenames to integers if necessary if (usesn) # statesint is used to build the transformed states statesint = zeros (1, len); for i = 1:length (statenames) # Search for statenames(i) in states, isequal will have 1 at # corresponding indices; i is the right integer for that statename isequal = ismember (states, statenames(i)); # We do not want to change statesint if the statename appears a second # time in statenames if (any ((statesint == 0) & (isequal == 1))) isequal *= i; statesint += isequal; endif endfor if (! all (statesint)) index = max ((statesint == 0) .* (1:len)); error (["hmmestimate: states(" int2str (index) ") not in statenames"]); endif states = statesint; else if (! 
isvector (states)) error ("hmmestimate: states must be a non-empty vector"); endif if (! all (ismember (states, 1:max (states)))) index = max ((ismember (states, 1:max (states)) == 0) .* (1:len)); error (["hmmestimate: states(" int2str (index) ") not feasible"]); endif endif # Estimate the number of different states as the max of states nstate = max (states); # Estimate the number of different outputs as the max of sequence noutput = max (sequence); # transprobest is empty if pseudotransitions is not specified if (isempty (transprobest)) # outprobest is not empty if pseudoemissions is specified if (! isempty (outprobest)) if (nstate > rows (outprobest)) error ("hmmestimate: not enough rows in pseudoemissions"); endif # The number of states is specified by pseudoemissions nstate = rows (outprobest); endif transprobest = zeros (nstate, nstate); else if (nstate > rows (transprobest)) error ("hmmestimate: not enough rows in pseudotransitions"); endif # The number of states is given by pseudotransitions nstate = rows (transprobest); endif # outprobest is empty if pseudoemissions is not specified if (isempty (outprobest)) outprobest = zeros (nstate, noutput); else if (noutput > columns (outprobest)) error ("hmmestimate: not enough columns in pseudoemissions"); endif # Number of outputs is specified by pseudoemissions noutput = columns (outprobest); if (rows (outprobest) != nstate) error ("hmmestimate: pseudoemissions must have the same number of rows as pseudotransitions"); endif endif # Assume that the model started in state 1 cstate = 1; for i = 1:len # Count the number of transitions for each state pair transprobest(cstate, states(i)) ++; cstate = states (i); # Count the number of outputs for each state output pair outprobest(cstate, sequence(i)) ++; endfor # transprobest and outprobest contain counted numbers # Each row in transprobest and outprobest should contain estimated # probabilities # => scale so that the sum is 1 # A zero row remains zero # - for 
transprobest s = sum (transprobest, 2); s(s == 0) = 1; transprobest = transprobest ./ (s * ones (1, nstate)); # - for outprobest s = sum (outprobest, 2); s(s == 0) = 1; outprobest = outprobest ./ (s * ones (1, noutput)); endfunction %!test %! sequence = [1, 2, 1, 1, 1, 2, 2, 1, 2, 3, 3, 3, 3, 2, 3, 1, 1, 1, 1, 3, 3, 2, 3, 1, 3]; %! states = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]; %! [transprobest, outprobest] = hmmestimate (sequence, states); %! expectedtransprob = [0.88889, 0.11111; 0.28571, 0.71429]; %! expectedoutprob = [0.16667, 0.33333, 0.50000; 1.00000, 0.00000, 0.00000]; %! assert (transprobest, expectedtransprob, 0.001); %! assert (outprobest, expectedoutprob, 0.001); %!test %! sequence = {'A', 'B', 'A', 'A', 'A', 'B', 'B', 'A', 'B', 'C', 'C', 'C', 'C', 'B', 'C', 'A', 'A', 'A', 'A', 'C', 'C', 'B', 'C', 'A', 'C'}; %! states = {'One', 'One', 'Two', 'Two', 'Two', 'One', 'One', 'One', 'One', 'One', 'One', 'One', 'One', 'One', 'One', 'Two', 'Two', 'Two', 'Two', 'One', 'One', 'One', 'One', 'One', 'One'}; %! symbols = {'A', 'B', 'C'}; %! statenames = {'One', 'Two'}; %! [transprobest, outprobest] = hmmestimate (sequence, states, 'symbols', symbols, 'statenames', statenames); %! expectedtransprob = [0.88889, 0.11111; 0.28571, 0.71429]; %! expectedoutprob = [0.16667, 0.33333, 0.50000; 1.00000, 0.00000, 0.00000]; %! assert (transprobest, expectedtransprob, 0.001); %! assert (outprobest, expectedoutprob, 0.001); %!test %! sequence = [1, 2, 1, 1, 1, 2, 2, 1, 2, 3, 3, 3, 3, 2, 3, 1, 1, 1, 1, 3, 3, 2, 3, 1, 3]; %! states = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]; %! pseudotransitions = [8, 2; 4, 6]; %! pseudoemissions = [2, 4, 4; 7, 2, 1]; %! [transprobest, outprobest] = hmmestimate (sequence, states, 'pseudotransitions', pseudotransitions, 'pseudoemissions', pseudoemissions); %! expectedtransprob = [0.85714, 0.14286; 0.35294, 0.64706]; %! 
expectedoutprob = [0.178571, 0.357143, 0.464286; 0.823529, 0.117647, 0.058824]; %! assert (transprobest, expectedtransprob, 0.001); %! assert (outprobest, expectedoutprob, 0.001); statistics/inst/histfit.m0000644000175000017500000000422611762205451015422 0ustar asneltasnelt## Copyright (C) 2003 Alberto Terruzzi ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} histfit (@var{data}, @var{nbins}) ## ## Plot histogram with superimposed fitted normal density. ## ## @code{histfit (@var{data}, @var{nbins})} plots a histogram of the values in ## the vector @var{data} using @var{nbins} bars in the histogram. With one input ## argument, @var{nbins} is set to the square root of the number of elements in ## data. ## ## Example ## ## @example ## histfit (randn (100, 1)) ## @end example ## ## @seealso{bar,hist, pareto} ## @end deftypefn ## Author: Alberto Terruzzi ## Version: 1.0 ## Created: 3 March 2004 function histfit (data,nbins) if nargin < 1 || nargin > 2 print_usage; endif if isvector (data) != 1 error ("data must be a vector."); endif row = sum(~isnan(data)); if nargin < 2 nbins = ceil(sqrt(row)); endif [n,xbin]=hist(data,nbins); if any(abs(diff(xbin,2)) > 10*max(abs(xbin))*eps) error("histfit bins must be uniform width"); endif mr = nanmean(data); ## Estimates the parameter, MU, of the normal distribution. 
sr = nanstd(data); ## Estimates the parameter, SIGMA, of the normal distribution.
  x=(-3*sr+mr:0.1*sr:3*sr+mr)';## Evenly spaced samples of the expected data range.
  [xb,yb] = bar(xbin,n);
  y = normpdf(x,mr,sr);
  binwidth = xbin(2)-xbin(1);
  y = row*y*binwidth; ## Normalization necessary to overplot the histogram.
  plot(xb,yb,";;b",x,y,";;r-"); ## Plots density line over histogram.

endfunction
statistics/inst/mvnrnd.m0000644000175000017500000001332712265542200015251 0ustar asneltasnelt
## Copyright (C) 2003 Iain Murray
##
## This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along with this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} @var{s} = mvnrnd (@var{mu}, @var{Sigma})
## @deftypefnx{Function File} @var{s} = mvnrnd (@var{mu}, @var{Sigma}, @var{n})
## @deftypefnx{Function File} @var{s} = mvnrnd (@dots{}, @var{tol})
## Draw @var{n} random @var{d}-dimensional vectors from a multivariate Gaussian distribution with mean @var{mu}(@var{n}x@var{d}) and covariance matrix
## @var{Sigma}(@var{d}x@var{d}).
##
## @var{mu} must be @var{n}-by-@var{d} (or 1-by-@var{d} if @var{n} is given) or a scalar.
##
## If the argument @var{tol} is given the eigenvalues of @var{Sigma} are checked for positivity against -100*tol. The default value of tol is @code{eps*norm (Sigma, "fro")}.
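##
## A minimal usage sketch; the parameter values below are illustrative only:
##
## @example
## @group
## mu = [1, 2];
## Sigma = [1, 0.5; 0.5, 2];
## s = mvnrnd (mu, Sigma, 1000);
## @end group
## @end example
##
## Here @var{s} has one row per draw and one column per dimension.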
## ## @end deftypefn function s = mvnrnd (mu, Sigma, K, tol=eps*norm (Sigma, "fro")) % Iain Murray 2003 -- I got sick of this simple thing not being in Octave and locking up a stats-toolbox license in Matlab for no good reason. % May 2004 take a third arg, cases. Makes it more compatible with Matlab's. % Paul Kienzle % * Add GPL notice. % * Add docs for argument K % 2012 Juan Pablo Carbajal % * Uses Octave 3.6.2 broadcast. % * Stabilizes chol by perturbing Sigma with a epsilon multiple of the identity. % The effect on the generated samples is to add additional independent noise of variance epsilon. Ref: GPML Rasmussen & Williams. 2006. pp 200-201 % * Improved doc. % * Added tolerance to the positive definite check % * Used chol with option 'upper'. % 2014 Nir Krakauer % * Add tests. % * Allow mu to be scalar, in which case it's assumed that all elements share this mean. %perform some input checking if ~issquare (Sigma) error ('Sigma must be a square covariance matrix.'); end d = size(Sigma, 1); % If mu is column vector and Sigma not a scalar then assume user didn't read help but let them off and flip mu. Don't be more liberal than this or it will encourage errors (eg what should you do if mu is square?). if (size (mu, 2) == 1) && (d != 1) mu = mu'; end if nargin >= 3 n = K; else n = size(mu, 1); %1 if mu is scalar end if (~isscalar (mu)) && any(size (mu) != [1,d]) && any(size (mu) != [n,d]) error ('mu must be nxd, 1xd, or scalar, where Sigma has dimensions dxd.'); end warning ("off", "Octave:broadcast","local"); try U = chol (Sigma + tol*eye (d),"upper"); catch [E , Lambda] = eig (Sigma); if min (diag (Lambda)) < -100*tol error('Sigma must be positive semi-definite. Lowest eigenvalue %g', ... min (diag (Lambda))); else Lambda(Lambda<0) = 0; end warning ("mvnrnd:InvalidInput","Cholesky factorization failed. 
Using diagonalized matrix.")
    U = sqrt (Lambda) * E';
  end

  s = randn(n,d)*U + mu;

  warning ("on", "Octave:broadcast");
endfunction

% {{{ END OF CODE --- Guess I should provide an explanation:
%
% We can draw from axis aligned unit Gaussians with randn(d)
%   x ~ A*exp(-0.5*x'*x)
% We can then rotate this distribution using
%   y = U'*x
% Note that
%   x = inv(U')*y
% Our new variable y is distributed according to:
%   y ~ B*exp(-0.5*y'*inv(U'*U)*y)
% or
%   y ~ N(0,Sigma)
% where
%   Sigma = U'*U
% For a given Sigma we can use the chol function to find the corresponding U,
% draw x and find y. We can adjust for a non-zero mean by just adding it on.
%
% But the Cholesky decomposition function doesn't always work...
% Consider Sigma=[1 1;1 1]. Now inv(Sigma) doesn't actually exist, but Matlab's
% mvnrnd provides samples with this covariance st x(1)~N(0,1) x(2)=x(1). The
% fast way to deal with this would do something similar to chol but be clever
% when the rows aren't linearly independent. However, I can't be bothered, so
% another way of doing the decomposition is by diagonalising Sigma (which is
% slower but works).
% if
%   [E,Lambda]=eig(Sigma)
% then
%   Sigma = E*Lambda*E'
% so
%   U = sqrt(Lambda)*E'
% If any Lambdas are negative then Sigma just isn't even positive semi-definite
% so we can give up.
%
% Paul Kienzle adds:
% Where it exists, chol(Sigma) is numerically well behaved. chol(hilb(12)) for doubles and for 100 digit floating point differ in the last digit.
% Where chol(Sigma) doesn't exist, X*sqrt(Lambda)*E' will be somewhat accurate. For example, the elements of sqrt(Lambda)*E' for hilb(12), hilb(55) and hilb(120) are accurate to around 1e-8 or better. This was tested using the TNT+JAMA for eig and chol templates, and qlib for 100 digit precision.
% }}}

%!shared m, n, C, rho
%!
m = 10; n = 3; rho = 0.4; C = rho*ones(n, n) + (1 - rho)*eye(n); %!assert(size(mvnrnd(0, C, m)), [m n]) %!assert(size(mvnrnd(zeros(1, n), C, m)), [m n]) %!assert(size(mvnrnd(zeros(n, 1), C, m)), [m n]) %!assert(size(mvnrnd(zeros(m, n), C, m)), [m n]) %!assert(size(mvnrnd(zeros(m, n), C)), [m n]) %!assert(size(mvnrnd(zeros(1, n), C)), [1 n]) %!assert(size(mvnrnd(zeros(n, 1), C)), [1 n]) %!error(mvnrnd(zeros(m+1, n), C, m)) %!error(mvnrnd(zeros(1, n+1), C, m)) %!error(mvnrnd(zeros(n+1, 1), C, m)) %!error(mvnrnd(zeros(m, n), eye(n+1), m)) %!error(mvnrnd(zeros(m, n), eye(n+1, n), m)) statistics/inst/repanova.m0000644000175000017500000001002611741556364015567 0ustar asneltasnelt## Copyright (C) 2011 Kyle Winfree ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{pval}, @var{table}, @var{st}] =} repanova (@var{X}, @var{cond}) ## @deftypefnx {Function File} {[@var{pval}, @var{table}, @var{st}] =} repanova (@var{X}, @var{cond}, ['string' | 'cell']) ## Perform a repeated measures analysis of variance (Repeated ANOVA). ## X is formated such that each row is a subject and each column is a condition. ## ## condition is typically a point in time, say t=1 then t=2, etc ## condition can also be thought of as groups. ## ## The optional flag can be either 'cell' or 'string' and reflects ## the format of the table returned. Cell is the default. 
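##
## Example (an illustrative sketch with made-up data, not part of the original
## help text): four subjects, each measured under three conditions.
##
## @example
## X = [10 12 15; 9 11 13; 11 14 16; 8 10 14];
## [pval, tbl, st] = repanova (X);  # conditions default to time1, time2, time3
## @end example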
##
## NaNs are ignored using nanmean and nanstd.
##
## This function does not currently support multiple columns of the same
## condition!
## @end deftypefn

function [p, table, st] = repanova(varargin)

switch nargin
  case 0
    error('Too few inputs.');
  case 1
    X = varargin{1};
    for c = 1:size(X, 2)
      condition{c} = ['time', num2str(c)];
    end
    option = 'cell';
  case 2
    X = varargin{1};
    condition = varargin{2};
    option = 'cell';
  case 3
    X = varargin{1};
    condition = varargin{2};
    option = varargin{3};
  otherwise
    error('Too many inputs.');
end

% Find the means of the subjects and measures, ignoring any NaNs
u_subjects = nanmean(X,2);
u_measures = nanmean(X,1);
u_grand = nansum(nansum(X)) / (size(X,1) * size(X,2));

% Differences between rows will be reflected in SS subjects, differences
% between columns will be reflected in SS_within subjects.
N = size(X,1); % number of subjects
J = size(X,2); % number of samples per subject
SS_measures = N * nansum((u_measures - u_grand).^2);
SS_subjects = J * nansum((u_subjects - u_grand).^2);
SS_total = nansum(nansum((X - u_grand).^2));
SS_error = SS_total - SS_measures - SS_subjects;
df_measures = J - 1;
df_subjects = N - 1;
df_grand = (N*J) - 1;
df_error = df_grand - df_measures - df_subjects;

MS_measures = SS_measures / df_measures;
MS_subjects = SS_subjects / df_subjects;
MS_error = SS_error / df_error; % variation expected as a result of sampling error alone
F = MS_measures / MS_error;
p = 1 - fcdf(F, df_measures, df_error); % Probability of F given equal means.

if strcmp(option, 'string')
  table = [sprintf('\nSource\tSS\tdf\tMS\tF\tProb > F'), ...
           sprintf('\nSubject\t%g\t%i\t%g', SS_subjects, df_subjects, MS_subjects), ...
           sprintf('\nMeasure\t%g\t%i\t%g\t%g\t%g', SS_measures, df_measures, MS_measures, F, p), ...
           sprintf('\nError\t%g\t%i\t%g', SS_error, df_error, MS_error), ...
           sprintf('\n')];
else
  table = {'Source', 'Partial SS', 'df', 'MS', 'F', 'Prob > F'; ...
           'Subject', SS_subjects, df_subjects, MS_subjects, '', ''; ...
'Measure', SS_measures, df_measures, MS_measures, F, p}; end st.gnames = condition'; % this is the same struct format used in anova1 st.n = repmat(N, 1, J); st.source = 'anova1'; % it cannot be assumed that 'repanova' is a supported source for multcompare st.means = u_measures; st.df = df_error; st.s = sqrt(MS_error); end % This function was created with guidance from the following websites: % http://courses.washington.edu/stat217/rmANOVA.html % http://grants.hhp.coe.uh.edu/doconnor/PEP6305/Topic%20010%20Repeated%20Measures.htm statistics/inst/tblread.m0000644000175000017500000000560611741556364015401 0ustar asneltasnelt## Copyright (C) 2008 Bill Denney ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {[@var{data}, @var{varnames}, @var{casenames}] =} tblread (@var{filename}) ## @deftypefnx {Function File} {[@var{data}, @var{varnames}, @var{casenames}] =} tblread (@var{filename}, @var{delimeter}) ## Read tabular data from an ascii file. ## ## @var{data} is read from an ascii data file named @var{filename} with ## an optional @var{delimeter}. 
The delimeter may be any single ## character or ## @itemize ## @item "space" " " (default) ## @item "tab" "\t" ## @item "comma" "," ## @item "semi" ";" ## @item "bar" "|" ## @end itemize ## ## The @var{data} is read starting at cell (2,2) where the ## @var{varnames} form a char matrix from the first row (starting at ## (1,2)) vertically concatenated, and the @var{casenames} form a char ## matrix read from the first column (starting at (2,1)) vertically ## concatenated. ## @seealso{tblwrite, csv2cell, cell2csv} ## @end deftypefn function [data, varnames, casenames] = tblread (f="", d=" ") ## Check arguments if nargin < 1 || nargin > 2 print_usage (); endif if isempty (f) ## FIXME: open a file dialog box in this case when a file dialog box ## becomes available error ("tblread: filename must be given") endif [d err] = tbl_delim (d); if ! isempty (err) error ("tblread: %s", err) endif d = csv2cell (f, d); data = cell2mat (d(2:end, 2:end)); varnames = strvcat (d(1,2:end)); casenames = strvcat (d(2:end,1)); endfunction ## Tests %!shared d, v, c %! d = [1 2;3 4]; %! v = ["a ";"bc"]; %! c = ["de";"f "]; %!test %! [dt vt ct] = tblread ("tblread-space.dat"); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! [dt vt ct] = tblread ("tblread-space.dat", " "); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! [dt vt ct] = tblread ("tblread-space.dat", "space"); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! [dt vt ct] = tblread ("tblread-tab.dat", "tab"); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! [dt vt ct] = tblread ("tblread-tab.dat", "\t"); %! assert (dt, d); %! assert (vt, v); %! assert (ct, c); %!test %! [dt vt ct] = tblread ("tblread-tab.dat", '\t'); %! assert (dt, d); %! assert (vt, v); %! 
assert (ct, c);
statistics/inst/regress_gp.m0000644000175000017500000001064112271002042016071 0ustar asneltasnelt
## Copyright (c) 2012 Juan Pablo Carbajal
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{m}, @var{K}] =} regress_gp (@var{x}, @var{y}, @var{Sp})
## @deftypefnx {Function File} {[@dots{} @var{yi} @var{dy}] =} regress_gp (@dots{}, @var{xi})
## Linear scalar regression using Gaussian processes.
##
## It estimates the model @var{y} = @var{x}'*m for @var{x} in R^D and @var{y} in R.
## The information about errors of the predictions (interpolation/extrapolation) is given
## by the covariance matrix @var{K}. If D==1 the inputs must be column vectors,
## if D>1 then @var{x} is n-by-D, with n the number of data points. @var{Sp} defines
## the prior covariance of @var{m}, it should be a (D+1)-by-(D+1) positive definite matrix,
## if it is empty, the default is @code{Sp = 100*eye(size(x,2)+1)}.
##
## If @var{xi} inputs are provided, the model is evaluated and returned in @var{yi}.
## The estimates of the variation of @var{yi} are given in @var{dy}.
##
## Run @code{demo regress_gp} to see examples.
##
## The function is a direct implementation of the formulae in pages 11-12 of
## Gaussian Processes for Machine Learning. Carl Edward Rasmussen and @
## Christopher K. I. Williams. The MIT Press, 2006. ISBN 0-262-18253-X.
## available online at @url{http://gaussianprocess.org/gpml/}.
##
## @seealso{regress}
## @end deftypefn

function [wm K yi dy] = regress_gp (x,y,Sp=[],xi=[])

  if isempty(Sp)
    Sp = 100*eye(size(x,2)+1);
  end

  x  = [ones(1,size(x,1)); x'];

  ## Juan Pablo Carbajal
  ## Note that in the book the equation (below 2.11) for A reads
  ## A  = (1/sy^2)*x*x' + inv (Vp);
  ## where sy^2 is the scalar variance of the residuals (i.e. y = x' * w + epsilon)
  ## and epsilon is drawn from N(0,sy^2). Vp is the variance of the parameters w.
  ## Note that
  ## (sy^2 * A)^{-1} = (1/sy^2)*A^{-1} = (x*x' + sy^2 * inv(Vp))^{-1};
  ## and that the formula for the w mean is
  ## (1/sy^2)*A^{-1}*x*y
  ## Then one obtains
  ## inv(x*x' + sy^2 * inv(Vp))*x*y
  ## Looking at the formula below we see that Sp = (1/sy^2)*Vp
  ## making the regression depend on only one parameter, Sp, and not two.
  A  = x*x' + inv (Sp);
  K  = inv (A);
  wm = K*x*y;

  yi =[];
  dy =[];
  if !isempty (xi);
    xi = [ones(size(xi,1),1) xi];
    yi = xi*wm;
    dy = diag (xi*K*xi');
  end

endfunction

%!demo
%! % 1D Data
%! x = 2*rand (5,1)-1;
%! y = 2*x -1 + 0.3*randn (5,1);
%!
%! % Points for interpolation/extrapolation
%! xi = linspace (-2,2,10)';
%!
%! [m K yi dy] = regress_gp (x,y,[],xi);
%!
%! plot (x,y,'xk',xi,yi,'r-',xi,bsxfun(@plus, yi, [-dy +dy]),'b-');

%!demo
%! % 2D Data
%! x = 2*rand (4,2)-1;
%! y = 2*x(:,1)-3*x(:,2) -1 + 1*randn (4,1);
%!
%! % Mesh for interpolation/extrapolation
%! [xi yi] = meshgrid (linspace (-1,1,10));
%!
%! [m K zi dz] = regress_gp (x,y,[],[xi(:) yi(:)]);
%! zi = reshape (zi, 10,10);
%! dz = reshape (dz,10,10);
%!
%! plot3 (x(:,1),x(:,2),y,'.g','markersize',8);
%! hold on;
%! h = mesh (xi,yi,zi,zeros(10,10));
%! set(h,'facecolor','none');
%! h = mesh (xi,yi,zi+dz,ones(10,10));
%! set(h,'facecolor','none');
%! h = mesh (xi,yi,zi-dz,ones(10,10));
%! set(h,'facecolor','none');
%! hold off
%! axis tight
%! view(80,25)

%!demo
%! % Projection over basis function
%! pp = [2 2 0.3 1];
%! n = 10;
%! x = 2*rand (n,1)-1;
%!
y = polyval(pp,x) + 0.3*randn (n,1); %! %! % Powers %! px = [sqrt(abs(x)) x x.^2 x.^3]; %! %! % Points for interpolation/extrapolation %! xi = linspace (-1,1,100)'; %! pxi = [sqrt(abs(xi)) xi xi.^2 xi.^3]; %! %! Sp = 100*eye(size(px,2)+1); %! Sp(2,2) = 1; # We don't believe the sqrt is present %! [m K yi dy] = regress_gp (px,y,Sp,pxi); %! disp(m) %! %! plot (x,y,'xk;Data;',xi,yi,'r-;Estimation;',xi,polyval(pp,xi),'g-;True;'); %! axis tight %! axis manual %! hold on %! plot (xi,bsxfun(@plus, yi, [-dy +dy]),'b-'); %! hold off statistics/inst/normalise_distribution.m0000644000175000017500000002302711730425221020532 0ustar asneltasnelt## Copyright (C) 2011 Alexander Klein ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn{Function File} {@var{NORMALISED} =} normalise_distribution (@var{DATA}) ## @deftypefnx{Function File} {@var{NORMALISED} =} normalise_distribution (@var{DATA}, @var{DISTRIBUTION}) ## @deftypefnx{Function File} {@var{NORMALISED} =} normalise_distribution (@var{DATA}, @var{DISTRIBUTION}, @var{DIMENSION}) ## ## Transform a set of data so as to be N(0,1) distributed according to an idea ## by van Albada and Robinson. ## This is achieved by first passing it through its own cumulative distribution ## function (CDF) in order to get a uniform distribution, and then mapping ## the uniform to a normal distribution. 
## The data must be passed as a vector or matrix in @var{DATA}. ## If the CDF is unknown, then [] can be passed in @var{DISTRIBUTION}, and in ## this case the empirical CDF will be used. ## Otherwise, if the CDFs for all data are known, they can be passed in ## @var{DISTRIBUTION}, ## either in the form of a single function name as a string, ## or a single function handle, ## or a cell array consisting of either all function names as strings, ## or all function handles. ## In the latter case, the number of CDFs passed must match the number ## of rows, or columns respectively, to normalise. ## If the data are passed as a matrix, then the transformation will ## operate either along the first non-singleton dimension, ## or along @var{DIMENSION} if present. ## ## Notes: ## The empirical CDF will map any two sets of data ## having the same size and their ties in the same places after sorting ## to some permutation of the same normalised data: ## @example ## @code{normalise_distribution([1 2 2 3 4])} ## @result{} -1.28 0.00 0.00 0.52 1.28 ## ## @code{normalise_distribution([1 10 100 10 1000])} ## @result{} -1.28 0.00 0.52 0.00 1.28 ## @end example ## ## Original source: ## S.J. van Albada, P.A. Robinson ## "Transformation of arbitrary distributions to the ## normal distribution with application to EEG ## test-retest reliability" ## Journal of Neuroscience Methods, Volume 161, Issue 2, ## 15 April 2007, Pages 205-211 ## ISSN 0165-0270, 10.1016/j.jneumeth.2006.11.004. ## (http://www.sciencedirect.com/science/article/pii/S0165027006005668) ## @end deftypefn function [ normalised ] = normalise_distribution ( data, distribution, dimension ) if ( nargin < 1 || nargin > 3 ) print_usage; elseif ( !ismatrix ( data ) || length ( size ( data ) ) > 2 ) error ( "First argument must be a vector or matrix" ); end if ( nargin >= 2 ) if ( !isempty ( distribution ) ) #Wrap a single handle in a cell array. 
if ( strcmp ( typeinfo ( distribution ), typeinfo ( @(x)(x) ) ) ) distribution = { distribution }; #Do we have a string argument instead? elseif ( ischar ( distribution ) ) ##Is it a single string? if ( rows ( distribution ) == 1 ) distribution = { str2func( distribution ) }; else error ( ["Second argument cannot contain more than one string" ... " unless in a cell array"] ); end ##Do we have a cell array of distributions instead? elseif ( iscell ( distribution ) ) ##Does it consist of strings only? if ( all ( cellfun ( @ischar, distribution ) ) ) distribution = cellfun ( @str2func, distribution, "UniformOutput", false ); end ##Does it eventually consist of function handles only if ( !all ( cellfun ( @ ( h ) ( strcmp ( typeinfo ( h ), typeinfo ( @(x)(x) ) ) ), distribution ) ) ) error ( ["Second argument must contain either" ... " a single function name or handle or " ... " a cell array of either all function names or handles!"] ); end else error ( "Illegal second argument: ", typeinfo ( distribution ) ); end end else distribution = []; end if ( nargin == 3 ) if ( !isscalar ( dimension ) || ( dimension != 1 && dimension != 2 ) ) error ( "Third argument must be either 1 or 2" ); end else if ( isvector ( data ) && rows ( data ) == 1 ) dimension = 2; else dimension = 1; end end trp = ( dimension == 2 ); if ( trp ) data = data'; end r = rows ( data ); c = columns ( data ); normalised = NA ( r, c ); ##Do we know the distribution of the sample? if ( isempty ( distribution ) ) precomputed_normalisation = []; for k = 1 : columns ( data ) ##Note that this line is in accordance with equation (16) in the ##original text. The author's original program, however, produces ##different values in the presence of ties, namely those you'd ##get replacing "last" by "first". [ uniq, indices ] = unique ( sort ( data ( :, k ) ), "last" ); ##Does the sample have ties? if ( rows ( uniq ) != r ) ##Transform to uniform, then normal distribution. 
uniform = ( indices - 1/2 ) / r; normal = norminv ( uniform ); else ## Without ties everything is pretty much straightforward as ## stated in the text. if ( isempty ( precomputed_normalisation ) ) precomputed_normalisation = norminv ( 1 / (2*r) : 1/r : 1 - 1 / (2*r) ); end normal = precomputed_normalisation; end #Find the original indices in the unsorted sample. #This somewhat quirky way of doing it is still faster than #using a for-loop. [ ignore, ignore, target_indices ] = unique ( data (:, k ) ); #Put normalised values in the places where they belong. ## A regression in the 3.4 series made this no longer work so we behave ## differently depending on octave version. This applies the fix for all ## 3.4 releases but it may have appeared on 3.2.4 (can someone check?) ## See https://savannah.gnu.org/bugs/index.php?34765 ## FIXME Once package dependency increases beyond an octave version that ## has this fixed, remove this if (compare_versions (OCTAVE_VERSION, "3.4", "<") || compare_versions (OCTAVE_VERSION, "3.6.2", ">=")) ## this is how it should work f_remap = @( k ) ( normal ( k ) ); normalised ( :, k ) = arrayfun ( f_remap, target_indices ); else ## this is the workaround because of bug in 3.4.?? for index = 1:numel(target_indices) normalised ( index, k ) = normal(target_indices(index)); endfor endif end else ##With known distributions, everything boils down to a few lines of code ##The same distribution for all data? if ( all ( size ( distribution ) == 1 ) ) normalised = norminv ( distribution {1,1} ( data ) ); elseif ( length ( vec ( distribution ) ) == c ) for k = 1 : c normalised ( :, k ) = norminv ( distribution { k } ( data ) ( :, k ) ); end else error ( "Number of distributions does not match data size! ") end end if ( trp ) normalised = normalised'; end endfunction %!test %! v = normalise_distribution ( [ 1 2 3 ], [], 1 ); %! assert ( v, [ 0 0 0 ] ) %!test %! v = normalise_distribution ( [ 1 2 3 ], [], 2 ); %! 
assert ( v, norminv ( [ 1 3 5 ] / 6 ), 3 * eps ) %!test %! v = normalise_distribution ( [ 1 2 3 ]', [], 2 ); %! assert ( v, [ 0 0 0 ]' ) %!test %! v = normalise_distribution ( [ 1 2 3 ]' , [], 1 ); %! assert ( v, norminv ( [ 1 3 5 ]' / 6 ), 3 * eps ) %!test %! v = normalise_distribution ( [ 1 1 2 2 3 3 ], [], 2 ); %! assert ( v, norminv ( [ 3 3 7 7 11 11 ] / 12 ), 3 * eps ) %!test %! v = normalise_distribution ( [ 1 1 2 2 3 3 ]', [], 1 ); %! assert ( v, norminv ( [ 3 3 7 7 11 11 ]' / 12 ), 3 * eps ) %!test %! A = randn ( 10 ); %! N = normalise_distribution ( A, @normcdf ); %! assert ( A, N, 1000 * eps ) %!xtest %! A = exprnd ( 1, 100 ); %! N = normalise_distribution ( A, @ ( x ) ( expcdf ( x, 1 ) ) ); %! assert ( mean ( vec ( N ) ), 0, 0.1 ) %! assert ( std ( vec ( N ) ), 1, 0.1 ) %!xtest %! A = rand (1000,1); %! N = normalise_distribution ( A, "unifcdf" ); %! assert ( mean ( vec ( N ) ), 0, 0.1 ) %! assert ( std ( vec ( N ) ), 1, 0.1 ) %!xtest %! A = [rand(1000,1), randn( 1000, 1)]; %! N = normalise_distribution ( A, { "unifcdf", "normcdf" } ); %! assert ( mean ( N ), [ 0, 0 ], 0.1 ) %! assert ( std ( N ), [ 1, 1 ], 0.1 ) %!xtest %! A = [rand(1000,1), randn( 1000, 1), exprnd( 1, 1000, 1 )]'; %! N = normalise_distribution ( A, { @unifcdf; @normcdf; @( x )( expcdf ( x, 1 ) ) }, 2 ); %! assert ( mean ( N, 2 ), [ 0, 0, 0 ]', 0.1 ) %! assert ( std ( N, [], 2 ), [ 1, 1, 1 ]', 0.1 ) %!xtest %! A = exprnd ( 1, 1000, 9 ); A ( 300 : 500, 4:6 ) = 17; %! N = normalise_distribution ( A ); %! assert ( mean ( N ), [ 0 0 0 0.38 0.38 0.38 0 0 0 ], 0.1 ); %! assert ( var ( N ), [ 1 1 1 2.59 2.59 2.59 1 1 1 ], 0.1 ); %!test %! fail ("normalise_distribution( zeros ( 3, 4 ), { @unifcdf; @normcdf; @( x )( expcdf ( x, 1 ) ) } )", ... %! 
"Number of distributions does not match data size!"); statistics/inst/nanvar.m0000644000175000017500000000400011741556364015234 0ustar asneltasnelt# Copyright (C) 2008 Sylvain Pelissier ## ## This program is free software; you can redistribute it and/or modify it under ## the terms of the GNU General Public License as published by the Free Software ## Foundation; either version 3 of the License, or (at your option) any later ## version. ## ## This program is distributed in the hope that it will be useful, but WITHOUT ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more ## details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . ## -*- texinfo -*- ## @deftypefn {Function File} {} nanvar (@var{x}) ## @deftypefnx{Function File} {@var{v} =} nanvar (@var{X}, @var{opt}) ## @deftypefnx{Function File} {@var{v} =} nanvar (@var{X}, @var{opt}, @var{dim}) ## Compute the variance while ignoring NaN values. ## ## For vector arguments, return the (real) variance of the values. ## For matrix arguments, return a row vector containing the variance for ## each column. ## ## The argument @var{opt} determines the type of normalization to use. ## Valid values are ## ## @table @asis ## @item 0: ## Normalizes with @math{N-1}, provides the best unbiased estimator of the ## variance [default]. ## @item 1: ## Normalizes with @math{N}, this provides the second moment around the mean. ## @end table ## ## The third argument @var{dim} determines the dimension along which the ## variance is calculated. ## ## @seealso{var, nanmean, nanstd, nanmax, nanmin} ## @end deftypefn function y = nanvar(x,w,dim) if nargin < 1 print_usage (); else if ((nargin < 2) || isempty(w)) w = 0; endif if nargin < 3 dim = min(find(size(x)>1)); if isempty(dim) dim=1; endif endif y = nanstd(x,w,dim).^2; endif endfunction ## Tests %!shared x %! 
x = [1 2 nan 3 4 5];
%!assert (nanvar (x), var (x(! isnan (x))), 10*eps)
statistics/inst/anderson_darling_cdf.m0000644000175000017500000000726611741556364020115 0ustar asneltasnelt
## Author: Paul Kienzle
## This program is granted to the public domain.

## -*- texinfo -*-
## @deftypefn {Function File} @var{p} = anderson_darling_cdf (@var{A}, @var{n})
##
## Return the CDF for the given Anderson-Darling coefficient @var{A}
## computed from @var{n} values sampled from a distribution. For a
## vector of random variables @var{x} of length @var{n}, compute the CDF
## of the values from the distribution from which they are drawn.
## You can use these values to compute @var{A} as follows:
##
## @example
## @var{A} = -@var{n} - sum( (2*i-1) .* (log(@var{x}) + log(1 - @var{x}(@var{n}:-1:1,:))) )/@var{n};
## @end example
##
## From the value @var{A}, @code{anderson_darling_cdf} returns the probability
## that @var{A} could be returned from a set of samples.
##
## The algorithm given in [1] claims to be an approximation for the
## Anderson-Darling CDF accurate to 6 decimal places.
##
## Demonstrate using:
##
## @example
## n = 300; reps = 10000;
## z = randn(n, reps);
## x = sort ((1 + erf (z/sqrt (2)))/2);
## i = [1:n]' * ones (1, size (x, 2));
## A = -n - sum ((2*i-1) .* (log (x) + log (1 - x (n:-1:1, :))))/n;
## p = anderson_darling_cdf (A, n);
## hist (100 * p, [1:100] - 0.5);
## @end example
##
## You will see that the histogram is basically flat, which is to
## say that the probabilities returned by the Anderson-Darling CDF
## are distributed uniformly.
##
## You can easily determine the extreme values of @var{p}:
##
## @example
## [junk, idx] = sort (p);
## @end example
##
## The histograms of various @var{p} aren't very informative:
##
## @example
## histfit (z (:, idx (1)), linspace (-3, 3, 15));
## histfit (z (:, idx (end/2)), linspace (-3, 3, 15));
## histfit (z (:, idx (end)), linspace (-3, 3, 15));
## @end example
##
## More telling is the qqplot:
##
## @example
## qqplot (z (:, idx (1))); hold on; plot ([-3, 3], [-3, 3], ';;'); hold off;
## qqplot (z (:, idx (end/2))); hold on; plot ([-3, 3], [-3, 3], ';;'); hold off;
## qqplot (z (:, idx (end))); hold on; plot ([-3, 3], [-3, 3], ';;'); hold off;
## @end example
##
## Try a similar analysis for @var{z} uniform:
##
## @example
## z = rand (n, reps); x = sort(z);
## @end example
##
## and for @var{z} exponential:
##
## @example
## z = rande (n, reps); x = sort (1 - exp (-z));
## @end example
##
## [1] Marsaglia, G; Marsaglia JCW; (2004) "Evaluating the Anderson Darling
## distribution", Journal of Statistical Software, 9(2).
## ## @seealso{anderson_darling_test} ## @end deftypefn function y = anderson_darling_cdf(z,n) y = ADinf(z); y += ADerrfix(y,n); end function y = ADinf(z) y = zeros(size(z)); idx = (z < 2); if any(idx(:)) p = [.00168691, -.0116720, .0347962, -.0649821, .247105, 2.00012]; z1 = z(idx); y(idx) = exp(-1.2337141./z1)./sqrt(z1).*polyval(p,z1); end idx = (z >= 2); if any(idx(:)) p = [-.0003146, +.008056, -.082433, +.43424, -2.30695, 1.0776]; y(idx) = exp(-exp(polyval(p,z(idx)))); end end function y = ADerrfix(x,n) if isscalar(n), n = n*ones(size(x)); elseif isscalar(x), x = x*ones(size(n)); end y = zeros(size(x)); c = .01265 + .1757./n; idx = (x >= 0.8); if any(idx(:)) p = [255.7844, -1116.360, 1950.646, -1705.091, 745.2337, -130.2137]; g3 = polyval(p,x(idx)); y(idx) = g3./n(idx); end idx = (x < 0.8 & x > c); if any(idx(:)) p = [1.91864, -8.259, 14.458, -14.6538, 6.54034, -.00022633]; n1 = 1./n(idx); c1 = c(idx); g2 = polyval(p,(x(idx)-c1)./(.8-c1)); y(idx) = (.04213 + .01365*n1).*n1 .* g2; end idx = (x <= c); if any(idx(:)) x1 = x(idx)./c(idx); n1 = 1./n(idx); g1 = sqrt(x1).*(1-x1).*(49*x1-102); y(idx) = ((.0037*n1+.00078).*n1+.00006).*n1 .* g1; end end statistics/inst/plsregress.m0000644000175000017500000000701712064656741016152 0ustar asneltasnelt## Copyright (C) 2012 Fernando Damian Nieuwveldt ## ## This program is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License ## as published by the Free Software Foundation; either version 3 ## of the License, or (at your option) any later version. ## ## This program is distributed in the hope that it will be useful, ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License along with ## this program; if not, see . 
## -*- texinfo -*-
## @deftypefn {Function File} {[@var{XLOADINGS},@var{YLOADINGS},@var{XSCORES},@var{YSCORES},@var{coefficients},@var{fitted}] =} plsregress (@var{X}, @var{Y}, @var{NCOMP})
## @itemize @bullet
## @item
## @var{X}: Matrix of observations
## @item
## @var{Y}: Vector or matrix of responses
## @item
## @var{NCOMP}: Number of components used for modelling
## @item
## @var{X} and @var{Y} will be mean centered to improve accuracy
## @end itemize
##
## @subheading References
##
## @enumerate
## @item
## SIMPLS: An alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory
## Systems (1993)
##
## @end enumerate
## @end deftypefn

## Author: Fernando Damian Nieuwveldt
## Description: Partial least squares regression using SIMPLS algorithm

function [XLOADINGS, YLOADINGS, XSCORES, YSCORES, coefficients, fitted] = plsregress (X, Y, NCOMP)

  if nargout != 6
    print_usage();
  end

  nobs  = rows (X);    # Number of observations
  npred = columns (X); # Number of predictor variables
  nresp = columns (Y); # Number of responses

  if (! isnumeric (X) || ! isnumeric (Y))
    error ("plsregress: Data matrix X and response matrix Y must be real matrices");
  elseif (nobs != rows (Y))
    error ("plsregress: Number of observations for data matrix X and response matrix Y must be equal");
  elseif(!
          isscalar (NCOMP))
    error ("plsregress: Third argument must be a scalar");
  end

  ## Mean centering Data matrix
  Xmeans = mean (X);
  X = bsxfun (@minus, X, Xmeans);

  ## Mean centering responses
  Ymeans = mean (Y);
  Y = bsxfun (@minus, Y, Ymeans);

  S = X'*Y;

  R = P = V = zeros (npred, NCOMP);
  T = U = zeros (nobs, NCOMP);
  Q = zeros (nresp, NCOMP);

  for a = 1:NCOMP
    [eigvec eigval] = eig (S'*S);   # Y factor weights
    domindex = find (diag (eigval) == max (diag (eigval))); # get dominant eigenvector
    q = eigvec(:,domindex);

    r = S*q;            # X block factor weights
    t = X*r;            # X block factor scores
    t = t - mean (t);

    nt = sqrt (t'*t);   # compute norm
    t = t/nt;
    r = r/nt;           # normalize

    p = X'*t;           # X block factor loadings
    q = Y'*t;           # Y block factor loadings
    u = Y*q;            # Y block factor scores
    v = p;

    ## Ensure orthogonality
    if a > 1
      v = v - V*(V'*p);
      u = u - T*(T'*u);
    endif

    v = v/sqrt(v'*v);   # normalize orthogonal loadings
    S = S - v*(v'*S);   # deflate S wrt loadings

    ## Store data
    R(:,a) = r;
    T(:,a) = t;
    P(:,a) = p;
    Q(:,a) = q;
    U(:,a) = u;
    V(:,a) = v;
  endfor

  ## Regression coefficients
  B = R*Q';
  fitted = bsxfun (@plus, T*Q', Ymeans); # Add mean

  ## Return
  coefficients = B;
  XSCORES = T;
  XLOADINGS = P;
  YSCORES = U;
  YLOADINGS = Q;
  projection = R;

endfunction

statistics/inst/pdist.m

## Copyright (C) 2008 Francesco Potortì
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {@var{y} =} pdist (@var{x})
## @deftypefnx {Function File} {@var{y} =} pdist (@var{x}, @var{metric})
## @deftypefnx {Function File} {@var{y} =} pdist (@var{x}, @var{metric}, @var{metricarg}, @dots{})
##
## Return the distance between any two rows in @var{x}.
##
## @var{x} is the @var{n}x@var{d} matrix representing @var{n} row
## vectors of size @var{d}.
##
## The output is a dissimilarity matrix formatted as a row vector
## @var{y}, @math{(n-1)*n/2} long, where the distances are in
## the order [(1, 2) (1, 3) @dots{} (2, 3) @dots{} (n-1, n)]. You can
## use the @code{squareform} function to display the distances between
## the vectors arranged into an @var{n}x@var{n} matrix.
##
## @code{metric} is an optional argument specifying how the distance is
## computed. It can be any of the following ones, defaulting to
## "euclidean", or a user defined function that takes two arguments
## @var{x} and @var{y} plus any number of optional arguments,
## where @var{x} is a row vector and @var{y} is a matrix having the
## same number of columns as @var{x}. @code{metric} returns a column
## vector where row @var{i} is the distance between @var{x} and row
## @var{i} of @var{y}. Any additional arguments after the @code{metric}
## are passed as metric (@var{x}, @var{y}, @var{metricarg1},
## @var{metricarg2} @dots{}).
##
## Predefined distance functions are:
##
## @table @samp
## @item "euclidean"
## Euclidean distance (default).
##
## @item "seuclidean"
## Standardized Euclidean distance. Each coordinate in the sum of
## squares is inverse weighted by the sample variance of that
## coordinate.
##
## @item "mahalanobis"
## Mahalanobis distance: see the function mahalanobis.
##
## @item "cityblock"
## City Block metric, aka Manhattan distance.
##
## @item "minkowski"
## Minkowski metric.
## Accepts a numeric parameter @var{p}: for @var{p}=1
## this is the same as the cityblock metric, with @var{p}=2 (default) it
## is equal to the euclidean metric.
##
## @item "cosine"
## One minus the cosine of the included angle between rows, seen as
## vectors.
##
## @item "correlation"
## One minus the sample correlation between points (treated as
## sequences of values).
##
## @item "spearman"
## One minus the sample Spearman's rank correlation between
## observations, treated as sequences of values.
##
## @item "hamming"
## Hamming distance: the fraction of coordinates that differ.
##
## @item "jaccard"
## One minus the Jaccard coefficient: the fraction of nonzero
## coordinates that differ.
##
## @item "chebychev"
## Chebychev distance: the maximum coordinate difference.
## @end table
## @seealso{linkage, mahalanobis, squareform}
## @end deftypefn

## Author: Francesco Potortì

function y = pdist (x, metric, varargin)

  if (nargin < 1)
    print_usage ();
  elseif ((nargin > 1)
          && ! ischar (metric)
          && ! isa (metric, "function_handle"))
    error ("pdist: the distance function must be either a string or a function handle.");
  endif

  if (nargin < 2)
    metric = "euclidean";
  endif

  if (!
      ismatrix (x) || isempty (x))
    error ("pdist: x must be a nonempty matrix");
  elseif (length (size (x)) > 2)
    error ("pdist: x must be 1 or 2 dimensional");
  endif

  y = [];
  if (rows(x) == 1)
    return;
  endif

  if (ischar (metric))
    order = nchoosek(1:rows(x),2);
    Xi = order(:,1);
    Yi = order(:,2);
    X = x';
    metric = lower (metric);
    switch (metric)
      case "euclidean"
        d = X(:,Xi) - X(:,Yi);
        if (str2num(version()(1:3)) > 3.1)
          y = norm (d, "cols");
        else
          y = sqrt (sumsq (d, 1));
        endif

      case "seuclidean"
        d = X(:,Xi) - X(:,Yi);
        weights = inv (diag (var (x, 0, 1)));
        y = sqrt (sum ((weights * d) .* d, 1));

      case "mahalanobis"
        d = X(:,Xi) - X(:,Yi);
        weights = inv (cov (x));
        y = sqrt (sum ((weights * d) .* d, 1));

      case "cityblock"
        d = X(:,Xi) - X(:,Yi);
        if (str2num(version()(1:3)) > 3.1)
          y = norm (d, 1, "cols");
        else
          y = sum (abs (d), 1);
        endif

      case "minkowski"
        d = X(:,Xi) - X(:,Yi);
        p = 2;                  # default
        if (nargin > 2)
          p = varargin{1};      # explicitly assigned
        endif;
        if (str2num(version()(1:3)) > 3.1)
          y = norm (d, p, "cols");
        else
          y = (sum ((abs (d)).^p, 1)).^(1/p);
        endif

      case "cosine"
        prod = X(:,Xi) .* X(:,Yi);
        weights = sumsq (X(:,Xi), 1) .* sumsq (X(:,Yi), 1);
        y = 1 - sum (prod, 1) ./ sqrt (weights);

      case "correlation"
        if (rows(X) == 1)
          error ("pdist: correlation distance between scalars not defined")
        endif
        corr = cor (X);
        y = 1 - corr (sub2ind (size (corr), Xi, Yi))';

      case "spearman"
        if (rows(X) == 1)
          error ("pdist: spearman distance between scalars not defined")
        endif
        corr = spearman (X);
        y = 1 - corr (sub2ind (size (corr), Xi, Yi))';

      case "hamming"
        d = logical (X(:,Xi) - X(:,Yi));
        y = sum (d, 1) / rows (X);

      case "jaccard"
        d = logical (X(:,Xi) - X(:,Yi));
        weights = X(:,Xi) | X(:,Yi);
        y = sum (d & weights, 1) ./ sum (weights, 1);

      case "chebychev"
        d = X(:,Xi) - X(:,Yi);
        if (str2num(version()(1:3)) > 3.1)
          y = norm (d, Inf, "cols");
        else
          y = max (abs (d), [], 1);
        endif

    endswitch
  endif

  if (isempty (y))
    ## Metric is a function handle or the name of an external function
    l = rows (x);
    y = zeros (1,
               nchoosek (l, 2));
    idx = 1;
    for ii = 1:l-1
      for jj = ii+1:l
        y(idx++) = feval (metric, x(ii,:), x, varargin{:})(jj);
      endfor
    endfor
  endif

endfunction

%!shared xy, t, eucl
%! xy = [0 1; 0 2; 7 6; 5 6];
%! t = 1e-3;
%! eucl = @(v,m) sqrt(sumsq(repmat(v,rows(m),1)-m,2));
%!assert(pdist(xy), [1.000 8.602 7.071 8.062 6.403 2.000],t);
%!assert(pdist(xy,eucl), [1.000 8.602 7.071 8.062 6.403 2.000],t);
%!assert(pdist(xy,"euclidean"), [1.000 8.602 7.071 8.062 6.403 2.000],t);
%!assert(pdist(xy,"seuclidean"), [0.380 2.735 2.363 2.486 2.070 0.561],t);
%!assert(pdist(xy,"mahalanobis"),[1.384 1.967 2.446 2.384 1.535 2.045],t);
%!assert(pdist(xy,"cityblock"), [1.000 12.00 10.00 11.00 9.000 2.000],t);
%!assert(pdist(xy,"minkowski"), [1.000 8.602 7.071 8.062 6.403 2.000],t);
%!assert(pdist(xy,"minkowski",3),[1.000 7.763 6.299 7.410 5.738 2.000],t);
%!assert(pdist(xy,"cosine"), [0.000 0.349 0.231 0.349 0.231 0.013],t);
%!assert(pdist(xy,"correlation"),[0.000 2.000 0.000 2.000 0.000 2.000],t);
%!assert(pdist(xy,"spearman"), [0.000 2.000 0.000 2.000 0.000 2.000],t);
%!assert(pdist(xy,"hamming"), [0.500 1.000 1.000 1.000 1.000 0.500],t);
%!assert(pdist(xy,"jaccard"), [1.000 1.000 1.000 1.000 1.000 0.500],t);
%!assert(pdist(xy,"chebychev"), [1.000 7.000 5.000 7.000 5.000 2.000],t);

statistics/inst/raylpdf.m

## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {@var{y} =} raylpdf (@var{x}, @var{sigma})
## Compute the probability density function of the Rayleigh distribution.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{x} is the support. The elements of @var{x} must be non-negative.
##
## @item
## @var{sigma} is the parameter of the Rayleigh distribution. The elements
## of @var{sigma} must be positive.
## @end itemize
## @var{x} and @var{sigma} must be of common size or one of them must be
## scalar.
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{y} is the probability density of the Rayleigh distribution at each
## element of @var{x} and corresponding parameter @var{sigma}.
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## x = 0:0.5:2.5;
## sigma = 1:6;
## y = raylpdf (x, sigma)
## @end group
##
## @group
## y = raylpdf (x, 0.5)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC,
## 2001.
##
## @item
## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic
## Processes}. pages 104 and 148, McGraw-Hill, New York, second edition,
## 1984.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: PDF of the Rayleigh distribution

function y = raylpdf (x, sigma)

  # Check arguments
  if (nargin != 2)
    print_usage ();
  endif

  if (! isempty (x) && ! ismatrix (x))
    error ("raylpdf: x must be a numeric matrix");
  endif
  if (! isempty (sigma) && ! ismatrix (sigma))
    error ("raylpdf: sigma must be a numeric matrix");
  endif

  if (! isscalar (x) || !
      isscalar (sigma))
    [retval, x, sigma] = common_size (x, sigma);
    if (retval > 0)
      error ("raylpdf: x and sigma must be of common size or scalar");
    endif
  endif

  # Calculate pdf
  y = x .* exp ((-x .^ 2) ./ (2 .* sigma .^ 2)) ./ (sigma .^ 2);

  # Continue argument check
  k = find (! (x >= 0) | ! (x < Inf) | ! (sigma > 0));
  if (any (k))
    y(k) = NaN;
  endif

endfunction

%!test
%! x = 0:0.5:2.5;
%! sigma = 1:6;
%! y = raylpdf (x, sigma);
%! expected_y = [0.0000, 0.1212, 0.1051, 0.0874, 0.0738, 0.0637];
%! assert (y, expected_y, 0.001);

%!test
%! x = 0:0.5:2.5;
%! y = raylpdf (x, 0.5);
%! expected_y = [0.0000, 1.2131, 0.5413, 0.0667, 0.0027, 0.0000];
%! assert (y, expected_y, 0.001);

statistics/inst/tabulate.m

## Copyright (C) 2003 Alberto Terruzzi
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {@var{table} =} tabulate (@var{data}, @var{edges})
##
## Compute a frequency table.
##
## For vector data, the function counts the number of
## values in data that fall between the elements in the edges vector
## (which must contain monotonically non-decreasing values). @var{table} is a
## matrix.
## The first column of @var{table} is the lower edge of each bin, the second
## is the number of instances in each class (absolute frequency).
## The third column contains the percentage of each value (relative
## frequency) and the fourth column contains the cumulative frequency.
##
## If @var{edges} is omitted, each class has unit width. If @var{edges}
## is a scalar, it gives the number of classes. Otherwise @var{edges} is a
## vector of bin edges, which sets the width of each bin.
## @var{table}(@var{k}, 2) will count the value @var{data} (@var{i}) if
## @var{edges} (@var{k}) <= @var{data} (@var{i}) < @var{edges} (@var{k}+1).
## The last bin will count the value of @var{data} (@var{i}) if
## @var{edges}(@var{k}) <= @var{data} (@var{i}) <= @var{edges} (@var{k}+1).
## Values outside the values in @var{edges} are not counted. Use -inf and inf
## in @var{edges} to include all values.
## Tabulate with no output arguments prints a formatted table in the
## command window.
##
## Example
##
## @example
## sphere_radius = [1:0.05:2.5];
## tabulate (sphere_radius)
## @end example
##
## Tabulate returns two bins: the first contains the spheres with radius
## from 1 mm up to, but excluding, 2 mm, and the second contains the spheres
## with radius between 2 and 3 mm.
##
## @example
## tabulate (sphere_radius, 10)
## @end example
##
## Tabulate returns ten bins.
##
## @example
## tabulate (sphere_radius, [1, 1.5, 2, 2.5])
## @end example
##
## Tabulate returns three bins: the first contains the spheres with radius
## from 1 mm up to, but excluding, 1.5 mm, the second contains the spheres
## with radius from 1.5 mm up to, but excluding, 2 mm, and the third contains
## the spheres with radius between 2 and 2.5 mm.
##
## @example
## bar (table (:, 1), table (:, 2))
## @end example
##
## draws a histogram.
##
## @seealso{bar, pareto}
## @end deftypefn

## Author: Alberto Terruzzi
## Version: 1.0
## Created: 13 February 2003

function table = tabulate (varargin)

  if nargin < 1 || nargin > 2
    print_usage;
  endif

  data = varargin{1};
  if isvector (data) != 1
    error ("data must be a vector.");
  endif
  n = length(data);
  m = min(data);
  M = max(data);

  if nargin == 1
    edges = 1:1:max(data)+1;
  else
    edges = varargin{2};
  end

  if isscalar(edges)
    h=(M-m)/edges;
    edges = [m:h:M];
  end

  # number of classes
  bins=length(edges)-1;
  # initialize frequency table
  freqtable = zeros(bins,4);

  for k=1:1:bins;
    if k != bins
      freqtable(k,2)=length(find (data >= edges(k) & data < edges(k+1)));
    else
      freqtable(k,2)=length(find (data >= edges(k) & data <= edges(k+1)));
    end
    if k == 1
      freqtable (k,4) = freqtable(k,2);
    else
      freqtable(k,4) = freqtable(k-1,4) + freqtable(k,2);
    end
  end

  freqtable(:,1) = edges(1:end-1)(:);
  freqtable(:,3) = 100*freqtable(:,2)/n;

  if nargout == 0
    disp(" bin Fa Fr% Fc");
    printf("%8g %5d %6.2f%% %5d\n",freqtable');
  else
    table = freqtable;
  end

endfunction

statistics/inst/unidstat.m

## Copyright (C) 2006, 2007 Arno Onken
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{m}, @var{v}] =} unidstat (@var{n})
## Compute mean and variance of the discrete uniform distribution.
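##
## As a rough sketch (not part of the original help text), the closed forms
## that the function body evaluates for each element of @var{n} are:
##
## @example
## @group
## m = (n + 1) / 2
## v = (n^2 - 1) / 12
## @end group
## @end example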
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{n} is the parameter of the discrete uniform distribution. The elements
## of @var{n} must be positive natural numbers
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{m} is the mean of the discrete uniform distribution
##
## @item
## @var{v} is the variance of the discrete uniform distribution
## @end itemize
##
## @subheading Example
##
## @example
## @group
## n = 1:6;
## [m, v] = unidstat (n)
## @end group
## @end example
##
## @subheading References
##
## @enumerate
## @item
## Wendy L. Martinez and Angel R. Martinez. @cite{Computational Statistics
## Handbook with MATLAB}. Appendix E, pages 547-557, Chapman & Hall/CRC,
## 2001.
##
## @item
## Athanasios Papoulis. @cite{Probability, Random Variables, and Stochastic
## Processes}. McGraw-Hill, New York, second edition, 1984.
## @end enumerate
## @end deftypefn

## Author: Arno Onken
## Description: Moments of the discrete uniform distribution

function [m, v] = unidstat (n)

  # Check arguments
  if (nargin != 1)
    print_usage ();
  endif

  if (! isempty (n) && ! ismatrix (n))
    error ("unidstat: n must be a numeric matrix");
  endif

  # Calculate moments
  m = (n + 1) ./ 2;
  v = ((n .^ 2) - 1) ./ 12;

  # Continue argument check
  k = find (! (n > 0) | ! (n < Inf) | ! (n == round (n)));
  if (any (k))
    m(k) = NaN;
    v(k) = NaN;
  endif

endfunction

%!test
%! n = 1:6;
%! [m, v] = unidstat (n);
%! expected_m = [1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000];
%! expected_v = [0.0000, 0.2500, 0.6667, 1.2500, 2.0000, 2.9167];
%! assert (m, expected_m, 0.001);
%!
%! assert (v, expected_v, 0.001);

statistics/inst/gevfit.m

## Copyright (C) 2012 Nir Krakauer
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{paramhat}, @var{paramci}] =} gevfit (@var{data}, @var{parmguess})
## Find the maximum likelihood estimator (@var{paramhat}) of the generalized extreme value (GEV) distribution to fit @var{data}.
##
## @subheading Arguments
##
## @itemize @bullet
## @item
## @var{data} is the vector of given values.
## @item
## @var{parmguess} is an initial guess for the maximum likelihood parameter vector. If not given, this defaults to [0; 1; 0].
## @end itemize
##
## @subheading Return values
##
## @itemize @bullet
## @item
## @var{paramhat} is the 3-parameter maximum-likelihood parameter vector [@var{k}; @var{sigma}; @var{mu}], where @var{k} is the shape parameter of the GEV distribution, @var{sigma} is the scale parameter of the GEV distribution, and @var{mu} is the location parameter of the GEV distribution.
## @item
## @var{paramci} has the approximate 95% confidence intervals of the parameter values based on the Fisher information matrix at the maximum-likelihood position.
##
## @end itemize
##
## @subheading Examples
##
## @example
## @group
## data = 1:50;
## [pfit, pci] = gevfit (data);
## p1 = gevcdf(data,pfit(1),pfit(2),pfit(3));
## plot(data, p1)
## @end group
## @end example
## @seealso{gevcdf, gevinv, gevlike, gevpdf, gevrnd, gevstat}
## @end deftypefn

## Author: Nir Krakauer
## Description: Maximum likelihood parameter estimation for the generalized extreme value distribution

function [paramhat, paramci] = gevfit (data, parmguess=[0; 1; 0])

  # Check arguments
  if (nargin < 1)
    print_usage;
  endif

  #cost function to minimize
  f = @(p) gevlike(p, data);

  paramhat = fminunc(f, parmguess, optimset("GradObj", "on"));

  if nargout > 1
    [nlogL, ~, ACOV] = gevlike (paramhat, data);
    param_se = sqrt(diag(inv(ACOV)));
    paramci(:, 1) = paramhat - 1.96*param_se;
    paramci(:, 2) = paramhat + 1.96*param_se;
  endif

endfunction

%!test
%! data = 1:50;
%! [pfit, pci] = gevfit (data);
%! expected_p = [-0.44 15.19 21.53]';
%! expected_pu = [-0.13 19.31 26.49]';
%! assert (pfit, expected_p, 0.1);
%! assert (pci(:, 2), expected_pu, 0.1);

statistics/inst/nanmax.m

## Copyright (C) 2001 Paul Kienzle
##
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 3 of the License, or (at your option) any later
## version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
##
## You should have received a copy of the GNU General Public License along with
## this program; if not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{v}, @var{idx}] =} nanmax (@var{X})
## @deftypefnx{Function File} {[@var{v}, @var{idx}] =} nanmax (@var{X}, @var{Y})
## @deftypefnx{Function File} {[@var{v}, @var{idx}] =} nanmax (@var{X}, [], @var{DIM})
## Find the maximal element while ignoring NaN values.
##
## @code{nanmax} is identical to the @code{max} function except that NaN values
## are ignored. If all values in a column are NaN, the maximum is
## returned as NaN rather than [].
##
## @seealso{max, nansum, nanmin, nanmean, nanmedian}
## @end deftypefn

function [v, idx] = nanmax (X, Y, DIM)
  if nargin < 1 || nargin > 3
    print_usage;
  elseif nargin == 1 || (nargin == 2 && isempty(Y))
    nanvals = isnan(X);
    X(nanvals) = -Inf;
    [v, idx] = max (X);
    v(all(nanvals)) = NaN;
  elseif (nargin == 3 && isempty(Y))
    nanvals = isnan(X);
    X(nanvals) = -Inf;
    [v, idx] = max (X,[],DIM);
    v(all(nanvals,DIM)) = NaN;
  else
    Xnan = isnan(X);
    Ynan = isnan(Y);
    X(Xnan) = -Inf;
    Y(Ynan) = -Inf;
    if (nargin == 3)
      [v, idx] = max(X,Y,DIM);
    else
      [v, idx] = max(X,Y);
    endif
    v(Xnan & Ynan) = NaN;
  endif
endfunction

statistics/DESCRIPTION

Name: Statistics
Version: 1.2.3
Date: 2014-01-28
Author: various authors
Maintainer: Arno Onken
Title: Statistics
Description: Additional statistics functions for Octave.
Categories: Statistics
Depends: octave (>= 3.6.1), io (>= 1.0.18)
License: GPLv3+, public domain
Url: http://octave.sf.net

statistics/INDEX

statistics >> Statistics
Distributions
 anderson_darling_cdf
 betastat
 binostat
 chi2stat
 cl_multinom
 copulacdf
 copulapdf
 copularnd
 expstat
 fstat
 gamlike
 gamstat
 geostat
 gevcdf
 gevfit
 gevfit_lmom
 gevinv
 gevlike
 gevpdf
 gevrnd
 gevstat
 hygestat
 iwishpdf
 iwishrnd
 jsucdf
 jsupdf
 lognstat
 mvnpdf
 mvnrnd
 mvncdf
 mnpdf
 mnrnd
 mvtcdf
 mvtrnd
 nbinstat
 normalise_distribution
 normstat
 poisstat
 random
 raylcdf
 raylinv
 raylpdf
 raylrnd
 raylstat
 tstat
 unidstat
 unifstat
 vmpdf
 vmrnd
 wblstat
 wishpdf
 wishrnd
Descriptive statistics
 nansum
 nanmax
 nanmean
 nanmedian
 nanmin
 nanstd
 nanvar
 geomean
 harmmean
 mad
 trimmean
 tabulate
 combnk
 jackknife
Experimental design
 fullfact
 ff2n
Regression
 anovan
 monotone_smooth
 princomp
 pcares
 pcacov
 plsregress
 regress
 regress_gp
 stepwisefit
Plots
 boxplot
 normplot
 histfit
 hist3
 repanova
 dendogram
Models
 hmmestimate
 hmmgenerate
 hmmviterbi
Hypothesis testing
 anderson_darling_test
 runstest
Fitting
 gamfit
Clustering
 cmdscale
 kmeans
 linkage
 pdist
 squareform
Reading and Writing
 caseread
 casewrite
 tblread
 tblwrite