debian/0000755000000000000000000000000011565104212007163 5ustar debian/source/0000755000000000000000000000000011565104212010463 5ustar debian/source/format0000644000000000000000000000001411565104212011671 0ustar 3.0 (quilt) debian/dirs0000644000000000000000000000001011565104212010036 0ustar usr/bin debian/watch0000644000000000000000000000007511565104212010216 0ustar version=3 http://sf.net/klustakwik/KlustaKwik-(.*)\.tar\.bz2 debian/patches/0000755000000000000000000000000011565104212010612 5ustar debian/patches/series0000644000000000000000000000004411565104212012025 0ustar up_get_makefile_back up_extra_files debian/patches/up_extra_files0000644000000000000000000002164611565104212013557 0ustar --- /dev/null +++ b/README @@ -0,0 +1,148 @@ +KlustaKwik version 2.01 +---------------------- + +KlustaKwik is a program for unsupervised classification of multidimensional +continuous data. It arose from a specific need - automatic sorting of neuronal +action potential waveforms (see KD Harris et al, Journal of Neurophysiology +84:401-414,2000), but works for any type of data. We needed a program that +would: + +1) Fit a mixture of Gaussians with unconstrained covariance matrices +2) Automatically choose the number of mixture components +3) Be robust against noise +4) Reduce the problem of local minima +5) Run fast on large data sets (up to 100000 points, 48 dimensions) + +Speed in particular was essential. KlustaKwik is based on the CEM algorithm of +Celeux and Govaert (which is faster than the standard EM algorithm), and also +uses several tricks to improve execution speed while maintaining good +performance. On our data, it runs at least 10 times faster than Autoclass. + +Cluster splitting and deletion +------------------------------ + +The main improvement in version 1.5 is a cluster splitting feature. KlustaKwik +allows for a variable number of clusters to be fit, penalized by AIC. 
The +program periodically checks if splitting any cluster would improve the overall +score. It also checks to see if deleting any cluster and reallocating its +points would improve overall score. The splitting and deletion features allow +the program to often escape from local minima, reducing sensitivity to the +initial number of clusters, and reducing the total number of starts needed for +a data set. + + +Compilation +----------- + +The program is written in C++. To compile under unix, extract all files to a +single directory and type make. That should be all you need to do. If it +doesn't work, change the makefile to replace g++ with the name of your C++ +compiler. + +To check it compiled properly type "KlustaKwik test 1 -MinClusters 2" to run +the program on the supplied test file. + +Usage +----- + +The program takes a "feature file" as input, and produces two output files, the +"cluster file", and a log file. The file formats and conventions may seem +slightly strange. This is for historical reasons. If you want to change the +code, go ahead, this is open source software. + +The feature file should have a name like FILE.fet.n, where FILE is any string, +and n is a number. The program is invoked by running "KlustaKwik FILE n", and +will create a cluster file FILE.clu.n and a log file FILE.klg.n. The number n +doesn't serve any purpose other than to let you have several files with the same +file base. + +The first line of the feature file should be the number of input dimensions. +The following lines are the data, with each line being one data instance, +consisting of a list of numbers separated by spaces. An example file test.fet.1 +is provided. + +The first line of the cluster file will be the number of classes that the +program chose. The following lines will be the classes assigned to the data +points. Class 1 is a "noise cluster" modelled by a uniform distribution, which +should contain outliers, if there are any. 
+ + +Parameters +---------- + +It is possible to pass the program parameters by running "KlustaKwik FILE n +params" etc. All parameters have default values. Here are the parameters you can +use: + +-help +Prints a short message and then the default parameter values. + +-MinClusters n (default 20) +The random initial assignment will have no less than n clusters. The final +number may be different, since clusters can be split or deleted during the +course of the algorithm. + +-MaxClusters n (default 30) +The random initial assignment will have no more than n clusters. + +-nStarts n (default 1) +The algorithm will be started n times for each initial cluster count between +MinClusters and MaxClusters. + +-SplitEvery n (default 50) +Test to see if any clusters should be split every n steps. 0 means don't split. + +-MaxPossibleClusters n (default 100) +Cluster splitting can produce no more than n clusters. + +-RandomSeed n (default 1) +Specifies a seed for the random number generator. + +-UseFeatures STRING (default 11111111111100001) +Specifies a subset of the input features to use. STRING should consist of 1s +and 0s with a 1 indicating to use the feature and a 0 to leave it out. NB The +default value for this parameter is 11111111111100001 (because this is what we +use in the lab) - so if you have more than 12 dimensions you will need to change +it. + +-StartCluFile STRING (default "") +Treats the specified cluster file as a "gold standard". If it can't find a +better cluster assignment, it will output this. + +-DistThresh d (default 6.907755) +Time-saving parameter. If a point has log likelihood more than d worse for a +given class than for the best class, the log likelihood for that class is not +recalculated. This saves an awful lot of time. 
+ +-FullStepEvery n (default 10) +All log-likelihoods are recalculated every n steps (see DistThresh) + +-ChangedThresh f (default 0.05) +All log-likelihoods are recalculated if the fraction of instances changing class +exceeds f (see DistThresh) + +-MaxIter n (default 500) +Don't try more than n iterations from any starting point. + +-Log (default 1) + +Produces .klg log file (default is yes, to switch off do -Log 0) + +-Screen (default 1) + +Produces parameters and progress information on the console. Set to 0 to suppress +output in batches. + +-Debug (default 0) +Miscellaneous debugging information (not recommended) + +-DistDump (default 0) +Outputs a ridiculous amount of debugging information (definitely not recommended). + + +Contact Information +------------------- + +This program is copyright Ken Harris (harris@axon.rutgers.edu), 2000-2002. It +is distributed under the GNU General Public License (www.gnu.org). If you make +any changes or improvements, please let me know. --- /dev/null +++ b/test.fet.1 @@ -0,0 +1,202 @@ +2 +-4326 -1834 +-2437 -3718 +-3642 -2409 +-2392 -3417 +-2483 -3470 +-1751 -4523 +-4094 -1892 +-3774 -2010 +-2635 -3306 +-4117 -1770 +-3669 -2095 +-3085 -2993 +-3290 -2744 +-2238 -3799 +-3704 -2294 +-2491 -3533 +-3597 -2386 +-3966 -1797 +-1339 -4910 +-3095 -3061 +-3162 -2953 +-3620 -2456 +-3407 -2760 +-1948 -4340 +-3721 -2314 +-3898 -2204 +-3407 -2588 +-4588 -1501 +-2688 -3253 +-3666 -2507 +-761 -5480 +-2184 -3818 +-4029 -1732 +-995 -5200 +-2979 -3036 +-3643 -2197 +-3755 -2309 +-2870 -2956 +-3072 -2963 +-2109 -3610 +-2920 -3521 +-2860 -3409 +-4234 -1824 +-3813 -2090 +-3447 -2357 +-1362 -4430 +-4773 -973 +-4041 -1688 +-3409 -2426 +-3256 -2679 +-3367 -2793 +-4368 -1488 +-503 -5354 +-1968 -4362 +-4979 -1032 +-3115 -2816 +-1196 -4717 +-2486 -3729 +-2642 -3450 +-2460 -3424 +-3120 -2823 +-3965 -2088 +-2232 -3793 +-665 -5335 +-4442 -1923 +-2697 -3232 +-2417 -3317 +-1995 -4416 +-2891 -3090 +-2306 -3696 +-890 -4959 +-2857 -3257 +-4396 -1656 +-4724 -1194 
+-3795 -2216 +-2349 -3429 +-2352 -3380 +-2216 -3863 +-3392 -2511 +-4628 -1471 +-1961 -3789 +-2783 -3583 +-3486 -2652 +-2084 -3307 +-2361 -3520 +-3568 -2269 +-2428 -3390 +-2731 -3322 +-3008 -3067 +-5142 -808 +-4021 -1913 +-3600 -2423 +-1879 -4507 +-2902 -2907 +-2790 -3325 +-3749 -2061 +-4278 -1728 +-2407 -3597 +-2347 -3766 +-3671 -2647 +2918 4029 +3185 4445 +2637 3126 +3162 3983 +3033 4518 +2898 4001 +3118 3818 +2854 1385 +2818 4377 +2957 2300 +3094 3119 +3040 2649 +3079 2914 +3347 1310 +3232 2064 +3383 2916 +3527 3780 +2667 4199 +2860 2151 +2995 3161 +3298 4057 +3163 2902 +3318 3694 +3107 1996 +2853 2448 +2927 4169 +2954 1052 +2893 2598 +2939 3064 +2993 1013 +2792 3996 +3211 3226 +3076 3885 +2943 2748 +2928 3930 +2953 3012 +3039 1962 +3140 2110 +2991 3878 +2930 3650 +2873 3107 +2897 2983 +3107 2813 +3223 2366 +3246 4391 +2869 3684 +2706 1623 +3263 1425 +3007 3931 +3244 3060 +3142 2632 +3218 3530 +3058 1120 +2879 2784 +2285 3624 +2871 921 +2981 3129 +2725 2852 +2884 1657 +2891 1722 +3089 4797 +2984 1936 +3443 3679 +3165 3726 +2875 4545 +2865 2137 +3115 2169 +3012 1031 +3148 1722 +3142 2500 +2830 3383 +3084 3545 +3120 2423 +2765 2456 +2984 1631 +2981 3797 +2407 3704 +2885 3240 +3189 4081 +2653 3172 +2993 5084 +2940 3365 +2891 4528 +2677 4228 +3044 5899 +3124 426 +3213 1975 +2929 4583 +3164 4701 +3100 2755 +2951 -356 +3174 2629 +3129 4391 +2965 2793 +2527 4671 +3327 4180 +3187 2113 +3142 1422 +2904 3945 +2909 4102 +0 0 --- /dev/null +++ b/test_res.clu.1 @@ -0,0 +1,202 @@ +3 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +2 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 
+3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +3 +1 debian/patches/up_get_makefile_back0000644000000000000000000000114711565104212014640 0ustar --- a/makefile +++ b/makefile @@ -1,2 +1,20 @@ -KlustaKwik: KlustaKwik.C - g++ -g -O -o KlustaKwik KlustaKwik.C param.c -lm +SRC=KlustaKwik.C +MSRC=KlustaKwik.h param.c param.h Array.h +ASRC=$(SRC) $(MSRC) + +CPPFLAGS=-O -g + +KlustaKwik: $(SRC) $(MSRC) + g++ $(CPPFLAGS) -o $@ KlustaKwik.C param.c -lm + +test: check +check: KlustaKwik + ./KlustaKwik test 1 -MinClusters 2 >| tempout + : # For now no actual test, because non-noise cluster + : # indicies differ from the 'test' ones + diff test_res.clu.1 test.clu.1 || : + +clean: + -rm -f $(SRC:.C=) tempout test.clu.1 test.klg.1 test.out + +.PHONY: check test debian/changelog0000644000000000000000000000024311565104212011034 0ustar klustakwik (2.0.1-1) unstable; urgency=low * Initial release (Closes: #627262) -- Yaroslav Halchenko Wed, 18 May 2011 21:50:28 -0400 debian/klustakwik.lintian-overrides0000644000000000000000000000026611565104212014740 0ustar # upstream does not provide binary -- user manual consists of 1 README # file. Compilation of a manpage is on TODO list klustakwik binary: binary-without-manpage usr/bin/KlustaKwik debian/copyright0000644000000000000000000000204111565104212011113 0ustar Format: http://dep.debian.net/deps/dep5 Upstream-Name: klustakwik Source: http://sourceforge.net/projects/klustakwik/ Files: * Copyright: 2000-2011, Ken Harris (harris@axon.rutgers.edu) License: GPL-2+ This package is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. . This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License for more details. . You should have received a copy of the GNU General Public License along with this program. If not, see . On Debian systems, the complete text of the GNU General Public License version 2 can be found in "/usr/share/common-licenses/GPL-2". Files: debian/* Copyright: 2011 Yaroslav Halchenko License: GPL-2+ debian/rules0000755000000000000000000000100311565104212010235 0ustar #!/usr/bin/make -f # -*- makefile -*- # Sample debian/rules that uses debhelper. # This file was originally written by Joey Hess and Craig Small. # As a special exception, when this file is copied by dh-make into a # dh-make output file, you may use that output file without restriction. # This special exception was added by Craig Small in version 0.37 of dh-make. # Uncomment this to turn on verbose mode. #export DH_VERBOSE=1 %: dh $@ override_dh_auto_install: install KlustaKwik debian/klustakwik/usr/bin/ debian/README.source0000644000000000000000000000052311565104212011342 0ustar KlustaKwik for Debian --------------------- Directory debian/patches/files contains additional files which are available from upstream GIT repository git://klustakwik.git.sourceforge.net/gitroot/klustakwik but were not included in the shipped tarball. -- Yaroslav Halchenko , Wed, 18 May 2011 21:57:38 -0400 debian/docs0000644000000000000000000000000711565104212010033 0ustar README debian/blends0000644000000000000000000000066211565104212010361 0ustar Source: klustakwik Format: extended Tasks: debian-science/electrophysiology Homepage: http://klustakwik.sourceforge.net Published-Title: Accuracy of Tetrode Spike Separation as Determined by Simultaneous Intracellular and Extracellular Measurements Published-Authors: Kenneth D. Harris, Darrell A. 
Henze, Jozsef Csicsvari, Hajime Hirase, and György Buzsáki Published-In: Journal of Neurophysiology, 84, 401-414 Published-Year: 2000 debian/TODO0000644000000000000000000000003711565104212007653 0ustar * Cook a manpage out of README debian/control0000644000000000000000000000145711565104212010575 0ustar Source: klustakwik Section: science Priority: extra Maintainer: NeuroDebian Team Uploaders: Yaroslav Halchenko , Michael Hanke Build-Depends: debhelper (>= 7.0.50~) Standards-Version: 3.9.2 Homepage: http://sourceforge.net/projects/klustakwik/ Vcs-Browser: http://git.debian.org/?p=pkg-exppsy/klustakwik.git Vcs-Git: git://git.debian.org/git/pkg-exppsy/klustakwik.git Package: klustakwik Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends} Description: automatic sorting of the samples (spikes) into clusters KlustaKwik is a program for automatic clustering of continuous data into a mixture of Gaussians. The program was originally developed for sorting of neuronal action potentials, but can be applied to any sort of data. debian/compat0000644000000000000000000000000211565104212010361 0ustar 7