debian/source/format
3.0 (quilt)

debian/ncbi-seg.install
usr/bin
usr/share

debian/README.source
ncbi-seg for Debian
===================

COPYING
-------

Licensing information was provided by John C. Wootton; see his email at [1].

[1] http://lists.alioth.debian.org/pipermail/debian-med-packaging/2012-July/016269.html

Repackaged Upstream Source
--------------------------

Upstream is distributed as a set of individual files at
ftp://ftp.ncbi.nih.gov/pub/seg/seg/. There is no upstream tarball at all.
The orig.tar.gz can be generated with the get-orig-source target of
debian/rules.

debian/watch
------------

Upstream is not versioned and there is no upstream tarball, so there can
be no watch file.

debian/copyright
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: seg
Upstream-Contact: John Wootton
Source: ftp://ftp.ncbi.nih.gov/pub/seg/seg/

Files: debian/*
Copyright: 2012 Laszlo Kajan
License: GPL-3+

Files: *
Copyright: public-domain
License: public-domain
 PUBLIC DOMAIN NOTICE
 National Center for Biotechnology Information
 .
 This software/database is a "United States Government Work" under the
 terms of the United States Copyright Act. It was written as part of
 the authors' official duties as United States Government employees and
 thus cannot be copyrighted. This software/database is freely available
 to the public for use. The National Library of Medicine and the U.S.
 Government have not placed any restriction on its use or reproduction.
 .
 Although all reasonable efforts have been taken to ensure the accuracy
 and reliability of the software and data, the NLM and the U.S.
 Government do not and cannot warrant the performance or results that
 may be obtained by using this software or data. The NLM and the U.S.
 Government disclaim all warranties, express or implied, including
 warranties of performance, merchantability or fitness for any particular
 purpose.
 .
 Please cite the authors in any work or product based on this material.
 .
 Authors: John C. Wootton, Scott Federhen
          National Center For Biotechnology Information
          National Library of Medicine
          National Institutes of Health

License: GPL-3+
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation, either version 3 of the License, or
 (at your option) any later version.
 .
 This package is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU General Public License for more details.
 .
 You should have received a copy of the GNU General Public License
 along with this program. If not, see .
 .
 On Debian systems, the complete text of the GNU General
 Public License version 3 can be found in "/usr/share/common-licenses/GPL-3".

debian/upstream
Reference:
 Author: John C. Wootton and Scott Federhen
 Title: 'Statistics of local complexity in amino acid sequences and sequence databases.'
 Journal: 'Computers & Chemistry'
 Year: 1993
 Volume: 17
 Pages: 149-163
 DOI: 10.1016/0097-8485(93)85006-X
 URL: http://www.sciencedirect.com/science/article/pii/009784859385006X

debian/rules
#!/usr/bin/make -f
# Uncomment this to turn on verbose mode.
#export DH_VERBOSE=1

export CPPFLAGS:=$(shell dpkg-buildflags --get CPPFLAGS)
export CFLAGS:=$(shell dpkg-buildflags --get CFLAGS)
export LDFLAGS:=$(shell dpkg-buildflags --get LDFLAGS)

ver := 0.0.20000620
pkg := ncbi-seg

.PHONY: get-orig-source
get-orig-source:
	set -e; \
	t=$$(mktemp -d) || exit 1; \
	trap "rm -rf -- '$$t'" EXIT; \
	d="$$t/$(pkg)-$(ver).orig"; \
	mkdir "$$d"; \
	( cd "$$d"; \
	  wget \
	    ftp://ftp.ncbi.nih.gov/pub/seg/seg/README \
	    ftp://ftp.ncbi.nih.gov/pub/seg/seg/genwin.c \
	    ftp://ftp.ncbi.nih.gov/pub/seg/seg/genwin.h \
	    ftp://ftp.ncbi.nih.gov/pub/seg/seg/hiseg.c \
	    ftp://ftp.ncbi.nih.gov/pub/seg/seg/lnfac.h \
	    ftp://ftp.ncbi.nih.gov/pub/seg/seg/makefile \
	    ftp://ftp.ncbi.nih.gov/pub/seg/seg/seg.c \
	    ftp://ftp.ncbi.nih.gov/pub/seg/seg/seg.doc; \
	); \
	GZIP="--best --no-name" tar --owner=root --group=root --mode=a+rX -caf "./$(pkg)_$(ver).orig.tar.gz" -C "$$t" "$(pkg)-$(ver).orig"

.PHONY: override_dh_autoreconf
override_dh_autoreconf:
	dh_autoreconf -X./COPYING

.PHONY: override_dh_strip
override_dh_strip:
	dh_strip --dbg-package=$(pkg)-dbg

%:
	dh $@ --with autoreconf

debian/patches/genwin.c
Description: add missing include file and fix type casts From: Laszlo Kajan Forwarded: http://lists.alioth.debian.org/pipermail/debian-med-packaging/2012-July/016275.html Index: seg-1994101801/genwin.c =================================================================== --- seg-1994101801.orig/genwin.c 2012-07-04 18:15:40.000000000 +0000 +++ seg-1994101801/genwin.c 2012-07-04 18:48:41.414821989 +0000 @@ -5,6 +5,7 @@ /*--------------------------------------------------------------(includes)---*/ +#include #include "genwin.h" /*---------------------------------------------------------------(defines)---*/ @@ -53,7 +54,7 @@ #define TESTMAX 1000 void *tmalloc(); -int record_ptrs[TESTMAX] = {0,0,0,0}; +void *record_ptrs[TESTMAX] = {NULL,NULL,NULL,NULL}; int rptr = 0;
/*------------------------------------------------------------(genwininit)---*/ @@ -850,7 +851,7 @@ exit(2); } - record_ptrs[rptr] = (int) ptr; + record_ptrs[rptr] = ptr; rptr++; return(ptr); @@ -865,9 +866,9 @@ for (i=0; i Forwarded: http://lists.alioth.debian.org/pipermail/debian-med-packaging/2012-July/016275.html --- /dev/null +++ b/ncbi-seg.pod @@ -0,0 +1,290 @@ +=head1 NAME + +ncbi-seg - segment sequence(s) by local complexity + +=head1 SYNOPSIS + +ncbi-seg sequence [ W ] [ K(1) ] [ K(2) ] [ -x ] [ options ] + +=head1 DESCRIPTION + +ncbi-seg divides sequences into contrasting segments of low-complexity +and high-complexity. Low-complexity segments defined by the +algorithm represent "simple sequences" or "compositionally-biased +regions". + +Locally-optimized low-complexity segments are produced at defined +levels of stringency, based on formal definitions of local +compositional complexity (Wootton & Federhen, 1993). The segment +lengths and the number of segments per sequence are determined +automatically by the algorithm. + +The input is a FASTA-formatted sequence file, or a database file +containing many FASTA-formatted sequences. ncbi-seg is tuned for amino +acid sequences. For nucleotide sequences, see EXAMPLES OF +PARAMETER SETS below. + +The stringency of the search for low-complexity segments is +determined by three user-defined parameters, trigger window length +[ W ], trigger complexity [ K(1) ] and extension complexity [ K(2)] +(see below under PARAMETERS ). The defaults provided are suitable +for low-complexity masking of database search query sequences [ -x +option required, see below]. + + +=head1 OUTPUTS AND APPLICATIONS + +(1) Readable segmented sequence [Default]. Regions of contrasting +complexity are displayed in "tree format". See EXAMPLES. + +(2) Low-complexity masking (see Altschul et al, 1994). Produce a +masked FASTA-formatted file, ready for input as a query sequence for +database search programs such as BLAST or FASTA. 
The amino acids in +low-complexity regions are replaced with "x" characters [-x option]. +See EXAMPLES. + +(3) Database construction. Produce FASTA-formatted files containing +low-complexity segments [-l option], or high-complexity segments +[-h option], or both [-a option]. Each segment is a separate +sequence entry with an informative header line. + +=head1 ALGORITHM + +The SEG algorithm has two stages. First, identification of +approximate raw segments of low- complexity; second local +optimization. + +At the first stage, the stringency and resolution of the search for +low-complexity segments is determined by the W, K(1) and K(2) +parameters. All trigger windows are defined, including overlapping +windows, of length W and complexity less than or equal to K(1). +"Complexity" here is defined by equation (3) of Wootton & Federhen +(1993). Each trigger window is then extended into a contig in both +directions by merging with extension windows, which are overlapping +windows of length W and complexity less than or equal to K(2). +Each contig is a raw segment. + +At the second stage, each raw segment is reduced to a single +optimal low-complexity segment, which may be the entire raw +segment but is usually a subsequence. The optimal subsequence has +the lowest value of the probability P(0) (equation (5) of Wootton +& Federhen, 1993). + +=head1 PARAMETERS + +These three numeric parameters are in obligatory order after the +sequence file name. + +Trigger window length [ W ]. An integer greater than zero [ Default +12 ]. + +Trigger complexity. [ K1 ]. The maximum complexity of a trigger +window in units of bits. K1 must be equal to or greater than zero. +The maximum value is 4.322 (log[base 2]20) for amino acid +sequences [ Default 2.2 ]. + +Extension complexity [ K2 ]. The maximum complexity of an extension +window in units of bits. Only values greater than K1 are effective +in extending triggered windows. Range of possible values is as for +K1 [ Default 2.5 ]. 
+ + +=head1 OPTIONS + +The following options may be placed in any order in the command +line after the W, K1 and K2 parameters: + +=over + +=item -a + +Output both low-complexity and high-complexity segments in a +FASTA-formatted file, as a set of separate entries with header +lines. + +=item -c [characters-per-line] + +Number of sequence characters per line of +output [Default 60]. Other characters, such as residue numbers, are additional. + +=item -h + +Output only the high-complexity segments in a FASTA-formatted +file, as a set of separate entries with header lines. + +=item -l + +Output only the low-complexity segments in a FASTA-formatted +file, as a set of separate entries with header lines. + +=item -m [length] + +Minimum length in residues for a high-complexity +segment [default 0]. Shorter segments are merged with adjacent +low-complexity segments. + +=item -o + +Show all overlapping, independently-triggered low-complexity +segments [these are merged by default]. + +=item -q + +Produce an output format with the sequence in a numbered block +with markings to assist residue counting. The low-complexity and +high-complexity segments are in lower- and upper-case characters +respectively. + +=item -t [length] + +"Maximum trim length" parameter [default 100]. This +controls the search space (and search time) during the +optimization of raw segments (see ALGORITHM above). By default, +subsequences 100 or more residues shorter than the raw segment are +omitted from the search. This parameter may be increased to give +a more extensive search if raw segments are longer than 100 residues. + +=item -x + +The masking option for amino acid sequences. Each input +sequence is represented by a single output sequence in FASTA-format +with low-complexity regions replaced by strings of "x" characters. + +=back + +=head1 EXAMPLES OF PARAMETER SETS + +Default parameters are given by 'ncbi-seg sequence' (equivalent to 'ncbi-seg +sequence 12 2.2 2.5'). 
These parameters are appropriate for low- +complexity masking of many amino acid sequences [with -x option ]. + +=head2 Database-database comparisons: + +More stringent (lower) complexity parameters are suitable when +masked sequences are compared with masked sequences. For example, +for BLAST or FASTA searches that compare two amino acid sequence +databases, the following masking may be applied to both databases: + + ncbi-seg database 12 1.8 2.0 -x + +=head2 Homopolymer analysis: + +To examine all homopolymeric subsequences of length (for example) +7 or greater: + + ncbi-seg sequence 7 0 0 + +=head2 Non-globular regions of protein sequences: + +Many long non-globular domains may be diagnosed at longer window +lengths, typically: + + ncbi-seg sequence 45 3.4 3.75 + +For some shorter non-globular domains, the following set is +appropriate: + + ncbi-seg sequence 25 3.0 3.3 + +=head2 Nucleotide sequences: + +The maximum value of the complexity parameters is 2 (log[base 2]4). +For masking, the following is approximately equivalent in effect +to the default parameters for amino acid sequences: + + ncbi-seg sequence.na 21 1.4 1.6 + +=head1 EXAMPLES + +The following is a file named 'prion' in FASTA format: + + >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR + MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP + HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA + VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV + NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV + ILLISFLIFLIVG + +The command line: + + ncbi-seg __docdir__/examples/prion.fa + +gives the standard output below + + + >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR + + 1-49 MANLGCWMLVLFVATWSDLGLCKKRPKPGG + WNTGGSRYPGQGSPGGNRY + ppqggggwgqphgggwgqphgggwgqphgg 50-94 + gwgqphgggwgqggg + 95-112 THSQWNKPSKPKTNMKHM + agaaaagavvgglggymlgsams 113-135 + 136-187 RPIIHFGSDYEDRYYRENMHRYPNQVYYRP + MDEYSNQNNFVHDCVNITIKQH + tvttttkgenftet 188-201 + 202-236 DVKMMERVVEQMCITQYERESQAYYQRGSS + MVLFS 
+ sppvillisflifliv 237-252 + 253-253 G + +The low-complexity sequences are on the left (lower case) and +high-complexity sequences are on the right (upper case). All +sequence segments read from left to right and their order in the +sequence is from top to bottom, as shown by the central column of +residue numbers. + +The command line: + + ncbi-seg __docdir__/examples/prion.fa -x + +gives the following FASTA-formatted file:- + + >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR + MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYxxxxxxxxxxx + xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxTHSQWNKPSKPKTNMKHMxxxxxxxx + xxxxxxxxxxxxxxxRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV + NITIKQHxxxxxxxxxxxxxxDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSxxxx + xxxxxxxxxxxxG + +=head1 SEE ALSO + +segn(1), blast(1), saps(1), xnu(1) + +=head1 AUTHORS + +John Wootton: wootton@ncbi.nlm.nih.gov + +Scott Federhen: federhen@ncbi.nlm.nih.gov + + National Center for Biotechnology Information + Building 38A, Room 8N805 + National Library of Medicine + National Institutes of Health + Bethesda, Maryland, MD 20894 + U.S.A. + + +=head1 PRIMARY REFERENCE + +Wootton, J.C., Federhen, S. (1993) Statistics of local complexity +in amino acid sequences and sequence databases. Computers & +Chemistry 17: 149-163. + + +=head1 OTHER REFERENCES + +Wootton, J.C. (1994) Non-globular domains in protein sequences: +automated segmentation using complexity measures. Computers & +Chemistry 18: (in press). + +Altschul, S.F., Boguski, M., Gish, W., Wootton, J.C. (1994) Issues +in searching molecular sequence databases. Nature Genetics 6: +119-129. + +Wootton, J.C. (1994) Simple sequences of protein and DNA. In: +Nucleic Acid and Protein Sequence Analysis: A Practical Approach. +(Second Edition, Chapter 8, Bishop, M.J. and Rawlings, C.R. Eds. +IRL Press, Oxford) (In press). 
+ + --- a/seg.doc +++ b/seg.doc @@ -13,12 +13,12 @@ seg sequence [ W ] [ K(1) ] [ K(2) ] [ -x ] [ options ] -DESCRIPTION +DESCRIPTION ----------- seg divides sequences into contrasting segments of low-complexity and high-complexity. Low-complexity segments defined by the -algorithm represent "simple sequences" or "compositionally-biased +algorithm represent "simple sequences" or "compositionally-biased regions". Locally-optimized low-complexity segments are produced at defined @@ -29,36 +29,36 @@ The input is a FASTA-formatted sequence file, or a database file containing many FASTA-formatted sequences. seg is tuned for amino -acid sequences. For nucleotide sequences, see EXAMPLES OF +acid sequences. For nucleotide sequences, see EXAMPLES OF PARAMETER SETS below. The stringency of the search for low-complexity segments is determined by three user-defined parameters, trigger window length -[ W ], trigger complexity [ K(1) ] and extension complexity [ K(2)] +[ W ], trigger complexity [ K(1) ] and extension complexity [ K(2)] (see below under PARAMETERS ). The defaults provided are suitable for low-complexity masking of database search query sequences [ -x option required, see below]. -OUTPUTS AND APPLICATIONS +OUTPUTS AND APPLICATIONS ------------------------ (1) Readable segmented sequence [Default]. Regions of contrasting complexity are displayed in "tree format". See EXAMPLES. -(2) Low-complexity masking (see Altschul et al, 1994). Produce a +(2) Low-complexity masking (see Altschul et al, 1994). Produce a masked FASTA-formatted file, ready for input as a query sequence for -database search programs such as BLAST or FASTA. The amino acids in -low-complexity regions are replaced with "x" characters [-x option]. +database search programs such as BLAST or FASTA. The amino acids in +low-complexity regions are replaced with "x" characters [-x option]. See EXAMPLES. (3) Database construction. 
Produce FASTA-formatted files containing low-complexity segments [-l option], or high-complexity segments -[-h option], or both [-a option]. Each segment is a separate +[-h option], or both [-a option]. Each segment is a separate sequence entry with an informative header line. -ALGORITHM +ALGORITHM --------- The SEG algorithm has two stages. First, identification of @@ -81,7 +81,7 @@ the lowest value of the probability P(0) (equation (5) of Wootton & Federhen, 1993). -PARAMETERS +PARAMETERS ---------- These three numeric parameters are in obligatory order after the @@ -92,16 +92,16 @@ Trigger complexity. [ K1 ]. The maximum complexity of a trigger window in units of bits. K1 must be equal to or greater than zero. -The maximum value is 4.322 (log[base 2]20) for amino acid +The maximum value is 4.322 (log[base 2]20) for amino acid sequences [ Default 2.2 ]. Extension complexity [ K2 ]. The maximum complexity of an extension window in units of bits. Only values greater than K1 are effective -in extending triggered windows. Range of possible values is as for +in extending triggered windows. Range of possible values is as for K1 [ Default 2.5 ]. -OPTIONS +OPTIONS ------- The following options may be placed in any order in the command @@ -112,7 +112,7 @@ lines. -c [characters-per-line] Number of sequence characters per line of - output [Default 60]. Other characters, such as residue numbers, + output [Default 60]. Other characters, such as residue numbers, are additional. -h Output only the high-complexity segments in a FASTA-formatted @@ -122,7 +122,7 @@ file, as a set of separate entries with header lines. -m [length] Minimum length in residues for a high-complexity - segment [default 0]. Shorter segments are merged with adjacent + segment [default 0]. Shorter segments are merged with adjacent low-complexity segments. 
-o Show all overlapping, independently-triggered low-complexity @@ -130,7 +130,7 @@ -q Produce an output format with the sequence in a numbered block with markings to assist residue counting. The low-complexity and - high-complexity segments are in lower- and upper-case characters + high-complexity segments are in lower- and upper-case characters respectively. -t [length] "Maximum trim length" parameter [default 100]. This @@ -145,7 +145,7 @@ with low-complexity regions replaced by strings of "x" characters. -EXAMPLES OF PARAMETER SETS +EXAMPLES OF PARAMETER SETS -------------------------- Default parameters are given by 'seg sequence' (equivalent to 'seg @@ -154,48 +154,48 @@ Database-database comparisons: ----------------------------- -More stringent (lower) complexity parameters are suitable when -masked sequences are compared with masked sequences. For example, -for BLAST or FASTA searches that compare two amino acid sequence +More stringent (lower) complexity parameters are suitable when +masked sequences are compared with masked sequences. 
For example, +for BLAST or FASTA searches that compare two amino acid sequence databases, the following masking may be applied to both databases: seg database 12 1.8 2.0 -x Homopolymer analysis: -------------------- -To examine all homopolymeric subsequences of length (for example) +To examine all homopolymeric subsequences of length (for example) 7 or greater: - seg sequence 7 0 0 + seg sequence 7 0 0 Non-globular regions of protein sequences: ----------------------------------------- -Many long non-globular domains may be diagnosed at longer window +Many long non-globular domains may be diagnosed at longer window lengths, typically: seg sequence 45 3.4 3.75 -For some shorter non-globular domains, the following set is +For some shorter non-globular domains, the following set is appropriate: seg sequence 25 3.0 3.3 Nucleotide sequences: -------------------- -The maximum value of the complexity parameters is 2 (log[base 2]4). -For masking, the following is approximately equivalent in effect +The maximum value of the complexity parameters is 2 (log[base 2]4). 
+For masking, the following is approximately equivalent in effect to the default parameters for amino acid sequences: seg sequence.na 21 1.4 1.6 -EXAMPLES +EXAMPLES The following is a file named 'prion' in FASTA format: >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR -MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP -HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA -VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV -NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV +MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP +HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA +VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV +NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV ILLISFLIFLIVG The command line: @@ -221,8 +221,8 @@ sppvillisflifliv 237-252 253-253 G -The low-complexity sequences are on the left (lower case) and -high-complexity sequences are on the right (upper case). All +The low-complexity sequences are on the left (lower case) and +high-complexity sequences are on the right (upper case). All sequence segments read from left to right and their order in the sequence is from top to bottom, as shown by the central column of residue numbers. 
@@ -234,21 +234,21 @@ gives the following FASTA-formatted file:- >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR -MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYxxxxxxxxxxx -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxTHSQWNKPSKPKTNMKHMxxxxxxxx -xxxxxxxxxxxxxxxRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV -NITIKQHxxxxxxxxxxxxxxDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSxxxx +MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYxxxxxxxxxxx +xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxTHSQWNKPSKPKTNMKHMxxxxxxxx +xxxxxxxxxxxxxxxRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV +NITIKQHxxxxxxxxxxxxxxDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSxxxx xxxxxxxxxxxxG -SEE ALSO +SEE ALSO -------- segn, blast, saps, xnu -AUTHORS +AUTHORS ------- John Wootton: wootton@ncbi.nlm.nih.gov @@ -262,7 +262,7 @@ U.S.A. -PRIMARY REFERENCE +PRIMARY REFERENCE ----------------- Wootton, J.C., Federhen, S. (1993) Statistics of local complexity debian/patches/example0000644000000000000000000000136411775324707012214 0ustar Description: add an example Upstream has an example in the text documentation file. From: Laszlo Kajan Forwarded: http://lists.alioth.debian.org/pipermail/debian-med-packaging/2012-July/016275.html Index: seg-1994101801/prion.fa =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ seg-1994101801/prion.fa 2012-07-04 19:08:07.066840944 +0000 @@ -0,0 +1,6 @@ +>PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR +MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP +HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA +VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV +NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV +ILLISFLIFLIVG debian/patches/autotools0000644000000000000000000001112011775324707012601 0ustar Description: using autotools build system Upstream uses a simple makefile. 
From: Laszlo Kajan Forwarded: http://lists.alioth.debian.org/pipermail/debian-med-packaging/2012-July/016275.html --- /dev/null +++ b/configure.ac @@ -0,0 +1,7 @@ +AC_INIT([ncbi-seg], [0.0.20000620]) +AC_CONFIG_SRCDIR([seg.c]) +AM_INIT_AUTOMAKE +AC_CONFIG_FILES([Makefile]) +AC_PROG_CC + +AC_OUTPUT --- /dev/null +++ b/Makefile.am @@ -0,0 +1,14 @@ +man_MANS = ncbi-seg.1 + +bin_PROGRAMS = ncbi-seg + +ncbi_seg_SOURCES = seg.c genwin.c genwin.h lnfac.h + +LDADD = -lm + +ncbi-seg.1: ncbi-seg.pod + sed -e 's|__datadir__|$(datadir)|g;s|__docdir__|$(docdir)|g;s|__pkgdatadir__|$(pkgdatadir)|g;s|__PREFIX__|$(prefix)|g;s|__sysconfdir__|$(sysconfdir)|g;s|__VERSION__|$(VERSION)|g;' "$<" | \ + pod2man -c 'User Commands' -r "$(VERSION)" -name $(shell echo "$(basename $@)" | tr '[:lower:]' '[:upper:]') > "$@" + +clean-local: + rm -f $(man_MANS) --- /dev/null +++ b/AUTHORS @@ -0,0 +1,4 @@ +John C. Wootton, Scott Federhen +National Center For Biotechnology Information +National Library of Medicine +National Institutes of Health --- /dev/null +++ b/ChangeLog @@ -0,0 +1,52 @@ + +This directory contains C language source code for the SEG program of Wootton +and Federhen, for identifying and masking segments of low compositional +complexity in amino acid sequences. This program is inappropriate for +masking nucleotide sequences and, in fact, may strip some nucleotide +ambiguity codes from nt. sequences as they are being read. + +The SEG program can be used as a plug-in filter of query sequences used in the +NCBI BLAST programs. See the -filter and -echofilter options described in the +BLAST software's manual page. + +Input to SEG must be sequences in FASTA format. Output can be produced in a +variety of formats, with FASTA format being one of them when the -x option is +used. The file seg.doc includes a copy of the man page for the seg program. + + +References: +Wootton, J. C. and S. Federhen (1993). Statistics of local complexity in amino +acid sequences and sequence databases. 
Computers and Chemistry 17:149-163. + + +MODIFICATION HISTORY +10/18/94 +Fixed a bug in the boundary conditions for the alphabet assignments +(colorings) calculations. This condition seems not to arise in the +current protein sequence databases, but does appear when the algorithm +is customized for the nucleic acid alphabet. + +4/2/94 +Fixed a bug in the reading of input sequence files. B, Z, and U letters found +in the IUB amino acid alphabet and the NCBI standard amino acid alphabet +were being stripped. + +3/30/94 +WRG improved speed by about 3X (roughly 5X overall since 3/21/94), due in part +to the elimination of nearly all log() function calls, plus the removal of much +unused or unnecessary code. + +3/21/94 +Included support for the special characters "*" (translation stop) and "-" +(gap) which are found in some NCBI standard amino acid alphabets. + +WRG replaced repetitive dynamic calls to log(2.) and log(20.) with precomputed +values, yielding a 33-50% speed improvement. + +WRG added EOF checks in several places, the lack of which could produce +infinite looping. + +The previous version of seg is archived beneath the archive subdirectory. + +9/30/97 +HMF5 plugged a memory leak. --- /dev/null +++ b/NEWS @@ -0,0 +1 @@ +2012-07-04 Debianization of seg as ncbi-seg. --- /dev/null +++ b/COPYING @@ -0,0 +1,24 @@ + PUBLIC DOMAIN NOTICE + National Center for Biotechnology Information + + This software/database is a "United States Government Work" under the + terms of the United States Copyright Act. It was written as part of + the authors' official duties as United States Government employees and + thus cannot be copyrighted. This software/database is freely available + to the public for use. The National Library of Medicine and the U.S. + Government have not placed any restriction on its use or reproduction. + + Although all reasonable efforts have been taken to ensure the accuracy + and reliability of the software and data, the NLM and the U.S. 
+ Government do not and cannot warrant the performance or results that + may be obtained by using this software or data. The NLM and the U.S. + Government disclaim all warranties, express or implied, including + warranties of performance, merchantability or fitness for any particular + purpose. + + Please cite the authors in any work or product based on this material. + + Authors: John C. Wootton, Scott Federhen + National Center For Biotechnology Information + National Library of Medicine + National Institutes of Health

debian/patches/makefile
Description: rename original makefile
From: Laszlo Kajan
Forwarded: http://lists.alioth.debian.org/pipermail/debian-med-packaging/2012-July/016275.html
Index: seg-1994101801/makefile
===================================================================
--- seg-1994101801.orig/makefile	2012-07-04 19:09:41.530825609 +0000
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,15 +0,0 @@
-
-all : seg
-
-seg : seg.c lnfac.h genwin.h genwin.o
-	cc -O -o seg seg.c genwin.o -lm
-
-hiseg : hiseg.c lnfac.h genwin.h genwin.o
-	cc -O -o hiseg hiseg.c genwin.o -lm
-
-genwin.o : genwin.c genwin.h
-	cc -O -c genwin.c
-
-clean:
-	rm -f seg seg.o genwin.o
-
Index: seg-1994101801/makefile.old
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/makefile.old	2012-07-04 19:09:41.530825609 +0000
@@ -0,0 +1,15 @@
+
+all : seg
+
+seg : seg.c lnfac.h genwin.h genwin.o
+	cc -O -o seg seg.c genwin.o -lm
+
+hiseg : hiseg.c lnfac.h genwin.h genwin.o
+	cc -O -o hiseg hiseg.c genwin.o -lm
+
+genwin.o : genwin.c genwin.h
+	cc -O -c genwin.c
+
+clean:
+	rm -f seg seg.o genwin.o
+

debian/patches/series
makefile
example
seg.c
genwin.c
autotools
seg.pod

debian/patches/seg.c
Description: add missing include file
From: Laszlo Kajan Forwarded: http://lists.alioth.debian.org/pipermail/debian-med-packaging/2012-July/016275.html Index: seg-1994101801/seg.c =================================================================== --- seg-1994101801.orig/seg.c 2012-07-04 18:15:49.000000000 +0000 +++ seg-1994101801/seg.c 2012-07-04 18:47:15.222835801 +0000 @@ -6,6 +6,7 @@ /*--------------------------------------------------------------(includes)---*/ +#include #include "genwin.h" #include "lnfac.h"

debian/changelog
ncbi-seg (0.0.20000620-1) unstable; urgency=low

  * Initial version. (Closes: #680233)

 -- Laszlo Kajan  Wed, 04 Jul 2012 16:11:12 +0000

debian/compat
8

debian/control
Source: ncbi-seg
Section: science
Priority: extra
Maintainer: Debian Med Packaging Team
Uploaders: Laszlo Kajan
Build-Depends: debhelper (>= 8), dh-autoreconf
Standards-Version: 3.9.3
Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/seg/trunk/
Vcs-Browser: http://svn.debian.org/wsvn/debian-med/trunk/packages/seg/trunk/
DM-Upload-Allowed: yes
Homepage: ftp://ftp.ncbi.nih.gov/pub/seg/seg/

Package: ncbi-seg
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
Description: tool to mask segments of low compositional complexity in amino acid sequences
 ncbi-seg (a.k.a. SEG) is a program for identifying and masking segments
 of low compositional complexity in amino acid sequences.
 .
 ncbi-seg divides sequences into contrasting segments of low-complexity
 and high-complexity. Low-complexity segments defined by the algorithm
 represent "simple sequences" or "compositionally-biased regions".
 .
 This program is inappropriate for masking nucleotide sequences and, in
 fact, may strip some nucleotide ambiguity codes from nucleotide
 sequences as they are read.

Package: ncbi-seg-dbg
Section: debug
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}, ncbi-seg (= ${binary:Version})
Description: debug symbols for ncbi-seg
 ncbi-seg (a.k.a. SEG) is a program for identifying and masking segments
 of low compositional complexity in amino acid sequences.
 .
 This package contains detached debug symbols for ncbi-seg.

debian/ncbi-seg.examples
prion.fa