Lutefisk Code History
----------------------

Version 1.0 - Released 5/30/97

Version 1.1 - Released 6/22/97
  - Fixed some problems in GetCidData() with reading of Finnigan ASCII files.

Version 1.3 - Released 11/17/97
  - Compilable with gcc 2.7 on UNIX.
  - Command line options added to specify the output and param files.

Version 1.4.5
  - monoisotopic to average mass switch occurs over a 400 Da range
  - auto-tag automatically finds sequence tags prior to sequencing
  - auto-peakwidth automatically finds the peakwidth of profile data
  - for unit resolved spectra, the 2xC13 peak is discarded
  - cross-correlation bins every 0.5 Da instead of 1 Da
  - over-used ions in the final scoring are identified (same ions considered to be b and y) and final score reduced
  - the final scoring section was more carefully debugged and simplified

Version 1.4.6
  - auto-tag altered so that N-terminal mass can be three amino acids
  - in final scoring, sequences not ending in R or K for tryptic peptides are penalized
  - ions that connect to 147 or 175 for tryptic peptides were retained regardless of the intensity

Version 1.4.7
  - a few bugs that affected Unix operation were fixed

Version 1.4.8
  - a few more bugs were found and fixed

Version 2.0.1
  - more bugs
  - modified to better handle ion trap data

Version 2.0.2
  - added ability to directly read Finnigan .dat files
  - rearranged the .params file into a more sensible order

Version 2.0.3
  - found a spot in GetCID where it was dividing by zero
  - added function to GetCID that eliminates ions below a specific s/n (SIGNAL_NOISE in the definitions file). The noise is determined in a 100 Da range local to each ion.
  - added ability to read Sequest .dta files (Martin Baker)
  - added default output to match input name plus ".lut" (Martin Baker)

Version 2.0.4
  - added stuff to make compatible with Codewarrior compiler for Win32
  - added stuff to allow it to compile on DEC alpha

Version 2.0.5 - Released 10/16/98
  - bugs and the like

##############################################################################
Lutefisk for the new millennium:
##############################################################################

Lutefisk1900 version 1.2.4
  - changed code so that data was not converted to nominal masses; this means that high mass accuracy measurements will not be wasted.
  - because of temperature dependent drifts in the calibration slope of qtof data, the program will recalibrate the data for each candidate sequence. This allows for poorly calibrated qtof data to still have a tight fragment ion tolerance applied.
  - the program will input sequences derived from database matches and compare these with the de novo derived candidate sequences. This allows for an electronic validation of a database hit.
  - a CID data quality check has been added as an option. It looks for low mass amino acid related ions, high mass ions that can be connected by amino acid residue masses, as well as a general check on data strangeness (few ions, negative masses, precursor charge, etc).
  - code was modified to accommodate data that had been processed using Micromass's Maxent3 software. This software converts fragments to corresponding singly-charged values, and also removes isotope peaks.
  - the lutefisk.residues file was added in order for people to add unusual amino acids to the list of common ones.
  - some bugs were removed and others were added

Lutefisk1900 version 1.2.5
  - changed LutefiskMakeGraph so that y ions resulting from the loss of a single N-terminal amino acid are graphed as a node even if there is no corresponding y ion.
  - changed main so that if the peakWidth parameter is set to zero for centroided data, the auto-peakwidth feature is not used. It's not possible to determine peakwidths from centroided data, and it was getting slightly screwy results. If peakwidth is zero, then reasonable values are inserted depending on the instrument (0.75 for qtof and 1 for lcq).
  - changed LutefiskGetCID where there was an error in reading LCQ text files. The intensity values in the file are real, but were read into a struct value that needed an int; hence the intensity values were getting garbled.

Lutefisk1900 version 1.2.6
  - changed LutefiskGetCID so that it would not bomb out when looking for a header to 'tab text' input file formats.
  - changed LutefiskScore so that it would include losses of CH3SOH from oxidized methionine. If the mass accuracy is sufficient to distinguish oxMet from Phe, then this loss is only considered for b and y ions that contain the oxMet. For lower accuracy, losses of 64 u from b and y ions that contain either Phe or oxMet are considered.

Lutefisk1900 version 1.2.7
  - changed ExpandSequence in LutefiskScore.c so that it checks to see how many new qtof sequences could be generated for each original sequence. In some rare instances (particularly for longer sequences), it was possible to expand a single sequence into many thousand related sequences (oxMet replacing Phe, or Gln replacing Lys, etc). Although not a bug, it was bogging things down. It now checks to make sure that there are no more than 500 new sequences expanded from the original; if there are too many, then it only allows sequence expansion for single amino acids (oxMet/Phe and Gln/Lys), and other reasons for expansion (multiple dipeptide choices, or Trp plus three dipeptides of the same mass) are eliminated. This should keep Lutefisk from getting hung up on certain spectra.
  - checks to make sure that peptide lengths do not exceed the array limits -- if they do, then it just exits w/o going any further.

Lutefisk1900 version 1.2.8
  - changed the cross-correlation scoring so that instead of subtracting the average value of tau from -75 to 75, it does a point by point subtraction of the two numbers on each side of tau, takes the absolute value, and sums this over the range of -250 to 250. In other words, it subtracts tau(250) from tau(-250) and adds that to tau(249)-tau(-249), etc. This value is subtracted from tau(0) to give the cross-correlation score.
  - changed the normalization of the cross-correlation score so that it is normalized to the auto-correlation of the spectrum itself (after the funny intensity normalizations).
  - The program can determine best scoring sequences for incorrect peptide molecular weights. It determines the best sequences for MW 14xN where N is any number from zero to 20 (or more if you want). It determines an average best wrong score and standard deviations around the mean, and then compares this with the best score for the correct peptide molecular weight. This allows for a statistical evaluation of the Lutefisk1900 results.
  - Because of all this looping, I found some memory leaks (fixed).
  - Found a place where peptides exceeding the allowed length were stomping on some memory.
  - Added a spectrum quality assessment following completion of the sequencing. It's based on Pavel Pevzner's comment that quality = #y or b ions / number of possible y or b ions. In practice, lutefisk will find the longest contiguous stretch of uninterrupted y or b ions and divide this by the (total number of amino acids minus one). A dipeptide in the sequence is not counted as an interrupted series; however, the dipeptide counts as two amino acids in the denominator. No skips are allowed.

Lutefisk1900 version 1.2.9
  - The quality score from v1.2.8 is now used for screening candidate sequences.
    A minimum quality score is determined from the highest quality candidate, and candidates below this minimum are tossed out.
  - Various bug fixes.

Lutefisk1900 version 1.3.1
  - Revamped command-line invocation to be more intuitive. Now can say 'lutefisk *.dta'
  - The '-q' flag now completely quiets STDOUT output.
  - Various minor bug fixes to aid compilation on some systems.

Lutefisk1900 version 1.3.2
  - fixed bug in GetCID so that plusArgLys array is initialized properly
  - set limits on "quality" and final combined scores, below which the sequences are trashed
  - minor change in final scoring of database-derived sequence
  - changed LutefiskScore so that when Qtof sequences are expanded, any given original sequence can only generate 250 new ones. When doing this procedure using the wrong peptide mass (statistical analysis) the program is limited to 50 new sequences for every old one.

Lutefisk1900 version 1.3.3
  - An output file is now made even when no scoring candidates are generated.

Lutefisk1900 version 1.3.4
  - Fixed a buffer overflow problem in PeptideString() caused by long peptides.

Lutefisk1900 version 1.3.5
  - Changed the way that the N- and C-terminal masses are entered. Now it is possible to provide any mass rather than specific ones.

Lutefisk1900 version 1.3.7
  - Made it so that the y1 and y2 goldenboy ions could not be removed on the basis of their intensity
  - Added the Haggis type of subsequencing. This finds series of ions connected by single amino acids, makes two sequences (forward and backward), and adds them to the list of sequence candidates.
  - Changed the "fuzzy logic" bit in LutefiskScorer so that mass variations from the calculated values have an effect on ion scores in a manner that follows a normal distribution (instead of a linear drop off).

Lutefisk1900 version 1.3.8
  - Changed final scoring of candidate sequences.
    About 150 spectra from both LCQ and Qtof were examined, and sequence candidates with particularly high or low values for Pevscr (Pavel Pevzner), Intscr (intensity based scoring), quality (fraction of peptide mass accounted for by contiguous ion series), and xcorr (cross correlation score) were retained or discarded. An average of these four scores was calculated and an empirically derived probability of being correct was determined. The output is an estimated probability of being correct (Pr(c)).
  - Looks for pairs of y/b ions to re-determine the peptide MW in LCQ data.

Lutefisk1900 version 1.3.9
  - bug fixes
  - scores slightly differently for candidate sequences that contain more Arg's than protons on the precursor

##################################################################################################
LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP
LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP
LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP LutefiskXP
##################################################################################################

LutefiskXP version 1.0
  - The Haggis sequencing was modified so that two ion series could be combined.
  - The Haggis sequencing was modified so that the two unsequenced masses at either end could be matched to randomly derived sequences. The random sequence that fits with the most b and y ions is saved and replaces the chunk of mass.
  - Added two output variables to the params file. Now can specify the number of sequences and their Pr(c) limit.
  - Modified final scoring slightly, such that "quality" has less of an influence.

LutefiskXP version 1.0.3
  - When Q is in the second position, then a c1 ion is present. It is now scored, and used to distinguish the two amino acids if they are not already.
  - Limits placed on the number of subsequences (drops by half) if the program has been processing for over 30 seconds. It drops by another half if the processing is over a minute. This will speed long ones up.
  - Limits placed on the deconvolution of the C-terminal unsequenced chunk of mass (in Haggis). If the overall processing has gone on for over 45 sec then it drops the mass to 600.
  - For LCQ data, b/y pairs are found and labeled as "golden boys" that cannot be gotten rid of as easily. These could also be considered "favored sons", although we can all hope that the favored son "W" is eliminated in November.
  - Fixed the ranking so that the db derived sequences don't foul up the rank numbering.
  - Fixed a bug that, in certain cases, produced negative values for summed residue masses displayed within brackets
  - Changed the LCQ b/y pair procedures (both in GetCID and main) such that pairs were either both singly charged, or one was singly-charged and the other doubly-charged (but only if the precursor is > 2 charge state).
  - Changed Haggis a bit so that the minimum number of edges required to be considered a sequence varied with the number of Lutefisk sequences already obtained as well as peptide molecular weight. Minimum stayed at 4, but could be higher.
  - Changed Haggis so that if the array capacity was exceeded, it dumped the results obtained with the Lutefisk sequences and then continued to run (rather than exit(1), which is a waste).

LutefiskXP version 1.0.4
  - changed the way the ion intensity is altered in the final scoring. The high intensity ions are reduced, but the low intensity ones are not increased

LutefiskXP version 1.0.5
  - Added clean bailout code if less than five ions remain after pre-processing the data. (JAT)

LutefiskXP version 1.0.6
  - Internal release. Minor changes.

LutefiskXP version 1.0.7
  - Minor changes.

Richard S. Johnson
jsrichar@alum.mit.edu

Lutefisk.edman
SG		/ Cycle 1.	Enter single letter code w/o spaces. "X" means it could be anything.
Y		/ Cycle 2.	Enter single letter code w/o spaces. "X" means it could be anything.
Y		/ Cycle 3.	Enter single letter code w/o spaces. "X" means it could be anything.
DE		/ Cycle 4.	Enter single letter code w/o spaces. "X" means it could be anything.
V		/ Cycle 5.	Enter single letter code w/o spaces. "X" means it could be anything.
G		/ Cycle 6.	Enter single letter code w/o spaces. "X" means it could be anything.
M		/ Cycle 7.	Enter single letter code w/o spaces. "X" means it could be anything.
L		/ Cycle 8.	Enter single letter code w/o spaces. "X" means it could be anything.
T		/ Cycle 9.	Enter single letter code w/o spaces. "X" means it could be anything.
R		/ Cycle 10.	Enter single letter code w/o spaces. "X" means it could be anything.
		/ Cycle 11.	Enter single letter code w/o spaces. "X" means it could be anything.
		/ Cycle 12.	Enter single letter code w/o spaces. "X" means it could be anything.
databaseSeq

ELVISLIVESK
ELVISISDEADK
ELVISDECAYSK

details
4	8	8		/ b ion values for nodes of type general, triple quad tryptic, and ion trap
2	0	0		/ a ion values 
0	0	0		/ c ion values                 Note: The sum of the ion values must be less
0	0	0		/ d ion values                       than 30 (memory issues).
2	2	2		/ b-17 or b-18 ion values
2	0	0		/ a-17 or a-18 ion values
4	8	8		/ y ion values
0	0	0		/ y-2 ion values
2	2	2		/ y-17 or y-18 ion values
0	0	0		/ x ion values
0	0	0		/ z+1 ion values
0	0	0		/ w ion values
0	0	0		/ v ion values
0	0	0		/ b-OH ion values
0	0	0		/ b-OH-17 ion values
// Lutefisk parameters file
//
// If this file is present in the directory from which Lutefisk is invoked,
// then the value of the parameters listed in the 'VALUE' column below
// will override the program defaults.
//
// TITLE VALUE DEFAULT
CID Filename: peptideMDH897.4.dta | CID Filename.
CID Quality: N | Check for CID data quality. (Y/N)
Peptide MW: 0 | Peptide molecular weight. Zero will calc. from input file.
Charge-state: 1 | Number of charges on the precursor ion. Zero will calc. from input file.
MaxEnt3: N | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u): 0.75 | Peptide molecular weight tolerance.
Fragment Error (u): 0.75 | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u): 0 | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences: 20000 | Number of final sequences stored.
Max. Subsequences: 5000 | Number of subsequences allowed.
Mass Scrambles for Statistics: 0 | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type: D | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid: C | Is this CID data in profile or centroid form? P=Profile, C=Centroid, A=Autodetect.
Peak Width (u): 1 | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold: 0.1 | Ion threshold. (Ions > average intensity x Ion threshold are utilized.)
Mass Offset (u): 0.0 | Mass offset.
Ions Per Window: 6 | Ions per input window (windows are 60 Da wide).
Ions Per Residue: 2.7 | Number of ions per average residue.
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u): 5000 | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern: T | Fragmentation pattern (T=triple quad tryptic, L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps: -1 | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold: 0.15 | Extension threshold.
Max. Extensions: 6 | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass: 160.03065 | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis: T | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus: N | Modified N-terminus? (N=none, A=acetylated, C=carbamylated, P=pyroglutamic acid)
Modified C-terminus: N | Modified C-terminus? (N=none, A=amidated)
Present Amino Acids: * | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids: * | Amino acids known to be absent from the peptide. * means none.
Auto Tag: N | Auto-tag (Y/N).
Tag Low Mass y Ion: 0 | Sequence tag - low mass y ion
Sequence Tag: * | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion: 0 | Sequence tag - high mass y ion
Edman Data File: | File with Edman data
DB Sequence File: | File with sequences to score with the final results.
Shoe Size (US): | US shoe size. Default of 17.

Lutefisk.residues
A	 71.0371	 	71.08		 71		/Ala
R	156.1011		156.19		156		/Arg
N	114.0429		114.10		114		/Asn
D	115.0270		115.09		115		/Asp
C	103.0092		103.14		103		/Cys
E	129.0426		129.12		129		/Glu
Q	128.0586		128.13		128		/Gln
G	 57.0215	 	57.05		 57		/Gly
H	137.0589		137.14		137		/His
I	113.0841		113.16		113		/Ile
L	113.0841		113.16		113		/Leu
K	128.0950		128.17		128		/Lys
M	131.0405		131.20		131		/Met
F	147.0684		147.18		147		/Phe
P	 97.0528		 97.12		 97		/Pro
S	 87.0320		 87.08		 87		/Ser
T	101.0477		101.11		101		/Thr
W	186.0793		186.21		186		/Trp
Y	163.0633		163.17		163		/Tyr
V	 99.0684		 99.13		 99		/Val
J	  0.0			  0.0		  0		/Modified amino acid
O	  0.0			  0.0		  0		/Modified amino acid
U	  0.0			  0.0		  0		/Modified amino acid
X	  0.0			  0.0		  0		/Modified amino acid
Z	  0.0			  0.0		  0		/Modified amino acid
paramsDescrip

Lutefisk.params file parameters

 

CID Filename: Name of the CID data file. A full or partial pathname can be specified.

CID Quality: If you would like the program to give you its opinion on the quality of the CID data, type "Y"; otherwise type "N".  I gave up on this and no longer use it, so the default is “N”.

Peptide MW: Give the peptide molecular weight (NOT MH+!!) including any number of decimal places, depending on the mass accuracy of the instrument. For Sequest ".dta" files, a zero can be entered here, in which case the peptide molecular weight is obtained from the file header.
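
Since this parameter expects the neutral peptide molecular weight rather than MH+ or the precursor m/z, the conversion can be sketched as below. This is only an illustration, not code from Lutefisk: the function name `peptide_mw` and the proton mass constant are assumptions introduced here.

```python
# Assumed proton mass in u; this constant is not taken from the Lutefisk source.
PROTON_MASS = 1.00728


def peptide_mw(precursor_mz: float, charge: int) -> float:
    """Convert a precursor m/z and charge state to a neutral peptide MW.

    Each charge is assumed to come from one added proton, so the neutral
    mass is (m/z - proton mass) * charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (precursor_mz - PROTON_MASS) * charge


if __name__ == "__main__":
    # A doubly charged precursor observed at m/z 609.8
    print(round(peptide_mw(609.8, 2), 4))
    # An MH+ value (charge 1) of 1000.5
    print(round(peptide_mw(1000.5, 1), 4))
```

For a singly charged ion the m/z value is MH+ itself, so the same function covers the "NOT MH+" case with `charge=1`.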

Charge-state: This is the charge state of the precursor ion. Any integer number can be used, although the program works best on CID spectra obtained from singly or doubly charged ions. Triply-charged ion precursors in a triple quad do not often yield complete sets of fragmentation ions sufficient to delineate a full-length sequence. For Sequest ".dta" files, a zero can be entered here, in which case the precursor charge is obtained from the file header.

MaxEnt3: Were the data subjected to a Max Ent 3 type of processing, i.e., were the multiply charged fragment ions converted to their singly-charged counterparts and were the C13 isotope peaks removed? Answer "Y" or "N".

 

Mass Tolerances:

Peptide Error (u): This is the error in the peptide mass measurement in Daltons or fractions of a Dalton. This tolerance can be set as tight as you think your data warrants - 1 or 2 Daltons for low mass accuracy is suitable, or you can use a few hundredths of a Dalton for very accurate mass measurements. It is up to you.  For LCQ data, the software will try to “re-adjust” the peptide MW based on y/b ion pairs, so I generally choose 0.65 u as the peptide MW error for ion traps.  I use 0.45 for Qtof’s.

Fragment Error (u): This is the error in measurement of the m/z values of the fragment ions. For high quality triple quad data with unit resolution in Q3, I use a value of 0.5; for low resolution triple quad data I go with 0.75 or 1.0. For ion trap data, I typically use a value of 0.65. For poorly calibrated Qtof data, I use a value of 0.15 to 0.25, but for very well-calibrated data this tolerance can be reduced to 0.02 to 0.05 u.

Final Fragment Err (u): This value only applies to Qtof data. The idea is that temperature dependent expansion and contraction of the flight tube will change the calibration; however, the errors that result are linear. Lutefisk operates by finding a list of candidate sequences, and then it scores these candidates based on how well the predicted fragments match up with the observed fragments. In the final evaluation of sequence candidates derived from Qtof data, the calculated b- and y-type ions of each sequence are used to adjust the calibration of the data. Once the data has been recalibrated, then this Final Fragment Err is applied. Typically, I use a value of 0.02. If a value of zero is entered, then this recalibration feature is disabled and not applied.
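
The linear recalibration idea can be sketched as a least-squares line through (calculated, observed) pairs, followed by inverting that line for every observed m/z. This is a minimal sketch under stated assumptions: an ordinary least-squares fit is assumed here, and the function names `fit_line` and `recalibrate` are hypothetical. The text above does not say how Lutefisk actually performs the fit.

```python
def fit_line(calculated, observed):
    """Least-squares fit of observed = slope * calculated + intercept."""
    n = len(calculated)
    if n < 2:
        raise ValueError("need at least two matched ions to fit a line")
    mean_x = sum(calculated) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in calculated)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(calculated, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def recalibrate(observed, slope, intercept):
    """Map observed m/z values back onto the calculated mass scale."""
    return [(mz - intercept) / slope for mz in observed]


if __name__ == "__main__":
    # Calculated b/y ion masses for one candidate, and the same ions as
    # "observed" with a small slope and offset error (made-up numbers).
    calc = [175.119, 500.25, 900.45]
    obs = [1.0001 * c + 0.01 for c in calc]
    slope, intercept = fit_line(calc, obs)
    print(recalibrate(obs, slope, intercept))
```

After such a correction the residual errors are small, which is why a tolerance as tight as 0.02 u becomes usable for the final scoring.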

Lately, for Qtof data, I use a Peptide Error of 0.45, a Fragment Error of 0.25, and a Final Fragment Err of 0.02. For nanospray ion trap data (collected in profile mode so that monoisotopic peaks can be identified), I use a Peptide Error of 0.45, a Fragment Error of 0.45, and a Final Fragment Err of zero (no effect). Larger peptide and fragment tolerances of 0.65 u are used for LC/MS/MS data from ion traps -- the centroided ions are not monoisotopic, hence the greater error.

Memory and Speed:

Max. Final Sequences: This is the maximum number of completed sequences (sequences that equal the specified peptide mass plus/minus the peptide mass tolerance) that can be stored before discarding low scoring sequences. This value is dependent on the RAM available to the program (see below); I generally use a value of 20000.

Max. Subsequences: This is the maximum number of subsequences (partial sequences that get extended amino acid by amino acid) that can be stored before discarding low scoring subsequences. I usually allow 5000 subsequences to be processed, but this is also dependent on the amount of RAM that is available for Lutefisk. In one test case (Mac G3), I found that 12288 K was sufficient to allow for 20000 final sequences (above) and 5000 subsequences; 4096 K was sufficient to allow for 10000 final sequences (above) and 2500 subsequences. I would recommend giving Lutefisk a bit more than the bare minimum, since I won't guarantee that in all cases your computer won't crash when short on RAM. In addition, the number of subsequences allowed is also dependent on the processor speed; I find that 5000 subsequences can take my G3 a few seconds to a minute to process data from a 1500 u peptide.

Mass Scrambles for Statistics: To help determine if the output is correct or nearly correct, Lutefisk compares the output to other sequences that are close matches, but known to be wrong. Typically, a value of six is used for this parameter, in which case, it derives the six best candidate sequences assuming six different incorrect peptide molecular weights. The incorrect molecular weights are 14 u, 28 u, and 42 u less than and greater than the correct peptide mass. The results are known to be wrong, and the scores for these wrong sequences are compared to the results derived by using the correct peptide mass. If you don't want to make a comparison to wrong sequences, then enter a zero for this parameter.  Lately, I have decided that this feature is not all that useful, so I use a value of zero. 
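
The wrong-mass comparison can be sketched as follows. The 14 u step and the example of six scrambles come from the description above; the alternating below/above ordering for other counts, the function names, and the use of a simple "standard deviations above the mean" figure are assumptions made for illustration only.

```python
from statistics import mean, stdev


def scrambled_masses(peptide_mw, n_scrambles, step=14.0):
    """Return n wrong peptide MWs at multiples of `step` below and above
    the correct one (for six: -42, -28, -14, +14, +28, +42)."""
    masses = []
    for i in range(n_scrambles):
        multiple = i // 2 + 1             # 1, 1, 2, 2, 3, 3, ...
        sign = -1 if i % 2 == 0 else 1    # alternate below / above
        masses.append(peptide_mw + sign * multiple * step)
    return sorted(masses)


def score_significance(correct_score, wrong_scores):
    """How many standard deviations the correct-mass score lies above
    the mean of the best wrong-mass scores."""
    return (correct_score - mean(wrong_scores)) / stdev(wrong_scores)


if __name__ == "__main__":
    print(scrambled_masses(1000.0, 6))
    print(score_significance(0.8, [0.2, 0.3, 0.4]))
```

A correct-mass score that sits several standard deviations above the wrong-mass scores is the kind of separation this feature is meant to reveal.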

Spectral Processing:

CID File Type: Enter "F" if the CID data file is derived from the Finnigan "List" program, "T" if it is a tab-delimited ASCII file, "L" if it is a text file from the LCQ file converter program, "D" if it is a ".dta" file, or "N" if it is a Finnigan ".dat" file.

Profile/Centroid: Enter "P" for profile data or "C" for centroid data. Profile data is subjected to a 5-point digital smooth; this is the only difference in processing. By entering an "A" (autodetect) here, the program automatically differentiates between profile and centroid data. When using this autodetect feature, I found that for some Sequest ".dta" files the program would mistakenly decide that it was profile data, so data files ending w/ ".dta" are automatically assumed to be centroided.

Peak Width (u): This value is used in the peak detection part of the program, and is dependent on the resolution of the mass analyzer. For unit resolved peaks, the program tries to identify and discard adjacent C13 peaks. For unit resolved spectra I usually use a value of 1.5; for lower resolution MS/MS data on a triple quad I use a value of 3. The auto-peakwidth seems to work quite well for triple quad data. Put a zero here ("0") to use auto-peakwidth when using profile data obtained from triple quads. For ion trap data, use 1 and for Qtof data use a value of 0.75.

Ion Threshold: Data with an intensity greater than the average intensity times this threshold is used for identifying peaks. I use a fairly low value of 0.1.
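
As a worked illustration of this cutoff, the sketch below keeps only ions whose intensity exceeds the average intensity times the threshold. The function name and the (m/z, intensity) tuple layout are assumptions for the example, not Lutefisk's data structures.

```python
def apply_ion_threshold(ions, threshold=0.1):
    """Keep ions whose intensity is greater than average intensity x threshold.

    `ions` is a list of (mz, intensity) tuples.
    """
    if not ions:
        return []
    average = sum(intensity for _, intensity in ions) / len(ions)
    cutoff = average * threshold
    return [(mz, intensity) for mz, intensity in ions if intensity > cutoff]


if __name__ == "__main__":
    spectrum = [(100.0, 1000.0), (200.0, 50.0), (300.0, 950.0)]
    # Average intensity is about 666.7, so the cutoff is about 66.7
    print(apply_ion_threshold(spectrum, 0.1))
```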

Mass Offset (u): For data where the CID fragment ion m/z values are consistently off by a known value, this value can be entered here. For example, if the data is always low by 0.2 Da then 0.2 can be entered here. If it is always high by 0.2, then the value of -0.2 is entered. This situation arises if you acquire data at a different resolution setting than what the third quadrupole was calibrated for.

Ions Per Window: The program steps from ion to ion and counts the number of ions between it and a mass 120 Da higher (120 Da is close to the weight averaged amino acid residue mass). If there are too many ions within this moving window, then only those with the greatest intensity are retained. For regions of a CID spectrum that could contain multiply charged fragment ions, this window is narrowed accordingly (e.g., 60 Daltons for regions that could possibly contain doubly-charged fragment ions). I usually use a value of 6 ions per window for unprocessed profile data. If your CID data contains centroided or peak top data that you have already processed by hand, i.e., you've eliminated superfluous ions and you wish to use all of the ions in the interpretation, try using a larger number here (like 20).
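
The moving-window filter can be sketched roughly as below. This is a simplified reading of the description, not the Lutefisk implementation: it uses one fixed window width (the narrowing for regions that may hold multiply charged fragments is left out), and the function name and tuple layout are hypothetical.

```python
def limit_ions_per_window(ions, max_ions=6, width=120.0):
    """Step from ion to ion; within [mz, mz + width] keep only the
    `max_ions` most intense ions.

    `ions` is a list of (mz, intensity) tuples.
    """
    ions = sorted(ions)                  # ascending m/z
    keep = [True] * len(ions)
    for start in range(len(ions)):
        if not keep[start]:
            continue                     # already discarded by an earlier window
        window = [j for j in range(start, len(ions))
                  if keep[j] and ions[j][0] - ions[start][0] <= width]
        if len(window) > max_ions:
            # Rank the window by intensity and drop the weakest extras.
            window.sort(key=lambda j: ions[j][1], reverse=True)
            for j in window[max_ions:]:
                keep[j] = False
    return [ion for ion, kept in zip(ions, keep) if kept]


if __name__ == "__main__":
    spectrum = [(100.0, 10.0), (110.0, 40.0), (120.0, 30.0),
                (130.0, 20.0), (500.0, 5.0)]
    print(limit_ions_per_window(spectrum, max_ions=2, width=120.0))
```

In this example the four crowded ions between m/z 100 and 130 are reduced to the two most intense, while the isolated ion at m/z 500 survives even though it is weak.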

Ions Per Residue: This sets an overall limit to the number of ions to be considered. Since an average residue is of mass 120 Da, then a peptide of mass 1218 would be expected to have around 10 residues. I usually use a value of 2.7 here, so in this example, the number of ions used for sequencing would be limited to 27.
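
The arithmetic in this example can be written out as a small sketch. The rounding to whole residues and whole ions is an assumption made here so that the numbers match the example in the text; the function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS = 120.0  # Da, the average residue mass quoted above


def max_ions_for_peptide(peptide_mw, ions_per_residue=2.7):
    """Estimate the overall ion limit from the peptide MW."""
    residues = round(peptide_mw / AVERAGE_RESIDUE_MASS)  # 1218 / 120 -> about 10
    return round(residues * ions_per_residue)            # 10 x 2.7 -> 27


if __name__ == "__main__":
    print(max_ions_for_peptide(1218.0))
```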

Subsequencing:

Transition Mass (u): This is the mass where the fragment mass values are in transition from monoisotopic to average mass values. Below this cutoff, peptide molecular weights and fragment ion m/z values are assumed to be monoisotopic masses. Above this cutoff average masses are assumed. For triple quadrupole data, I usually use a value of 1800. The cutoff is not abrupt; rather, the switch occurs linearly over a 400 Da range below the cutoff mass (i.e., if 1800 is selected, the masses below 1400 are assumed to be monoisotopic, and those between 1400 and 1800 are in between). Since LCQ and Qtof data routinely give at least unit resolved MSMS data, I tend to set the cutoff very high (5000) for the trap data. This ensures that the program never has to deal with average mass calculations.
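
The gradual switch can be sketched as a linear ramp over the 400 Da below the cutoff. The interpolation between a monoisotopic and an average residue mass shown here is one plausible reading of "in between" and is an assumption, as are the function names.

```python
def average_fraction(mass, transition_mass=1800.0, ramp=400.0):
    """Fraction of 'average mass' character at a given mass: 0.0 below
    (transition_mass - ramp), 1.0 at or above transition_mass, linear between."""
    start = transition_mass - ramp
    if mass <= start:
        return 0.0
    if mass >= transition_mass:
        return 1.0
    return (mass - start) / ramp


def blended_residue_mass(monoisotopic, average, mass, transition_mass=1800.0):
    """Interpolate between monoisotopic and average residue masses."""
    f = average_fraction(mass, transition_mass)
    return monoisotopic + f * (average - monoisotopic)


if __name__ == "__main__":
    for m in (1300.0, 1600.0, 2000.0):
        print(m, average_fraction(m))
    # Glycine (57.0215 monoisotopic, 57.05 average) halfway through the ramp
    print(blended_residue_mass(57.0215, 57.05, 1600.0))
```

With the cutoff set to 5000, every mass in a typical tryptic peptide spectrum falls below the start of the ramp (4600), so only monoisotopic values are used.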

Fragmentation Pattern: The idea here was to allow for different types of fragmentation patterns to be recognized by the algorithm, thereby increasing the probability that the correct sequence will be amongst the candidate sequence list. Currently, there are only three types available - low energy CID of tryptic peptides on a triple quad, low energy CID of tryptic peptides on a Qtof, and low energy CID of tryptic peptides on an ion trap. So this means that for now you must enter 'T' (for triple quad), 'Q' (for Qtof), or 'L' (for ion trap).

Max. Gaps: A gap is a dipeptide of unknown sequence, but of known mass. Usually I allow the presence of only one gap per sequence. However, since the two N-terminal amino acids are so frequently unsequenceable, this "gap" is not counted in this limit. A value of "-1" is typically used, which is a signal to use a default number of gaps per sequence that depends on the peptide mass -- larger peptides are allowed more dipeptide gaps than smaller ones.

Extension Threshold: For a given subsequence there may be several possible amino acid extensions. The extension with the best score determines a threshold that the other extensions must exceed - highest score times this threshold equals the limit. I've been using a value of 0.15.

Max. Extensions: In addition to the threshold described above, it is possible to set a limit on the number of extensions allowed for each subsequence. Only those extensions with the highest score are used and the low scoring extensions are ignored. I use a value of 6 here.
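
The two extension limits work together, which the sketch below tries to show: first the relative threshold, then the cap on the count. The function name, the (residue, score) tuple layout, and the choice of "greater than or equal" at the limit are assumptions for illustration.

```python
def select_extensions(scored, threshold=0.15, max_extensions=6):
    """Filter candidate extensions of one subsequence.

    `scored` is a list of (residue, score) tuples. An extension survives if
    its score reaches best_score * threshold; of the survivors, only the
    `max_extensions` highest scoring ones are kept.
    """
    if not scored:
        return []
    best = max(score for _, score in scored)
    limit = best * threshold
    passing = [(aa, score) for aa, score in scored if score >= limit]
    passing.sort(key=lambda item: item[1], reverse=True)
    return passing[:max_extensions]


if __name__ == "__main__":
    candidates = [("A", 10.0), ("G", 1.0), ("S", 2.0), ("L", 9.0)]
    # Limit is 10.0 x 0.15 = 1.5, so "G" is dropped by the threshold
    print(select_extensions(candidates, threshold=0.15, max_extensions=6))
    # With a cap of two, only the two best remain
    print(select_extensions(candidates, threshold=0.15, max_extensions=2))
```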

Extras:

Cysteine Mass: This variable is necessary to account for the various ways of alkylating cysteine residues. The easiest way to deal with the many possibilities is to have the user enter the residue mass of cysteine - 160.03 for carbamidomethylated cysteine, 161.01 for carboxymethylated cysteine, 208.07 for pyridylethylated cysteine, or any other value you want.

Proteolysis: This is different from the "fragmentation pattern" described above. If tryptic proteolysis ("T") is selected then both Arg and Lys are forced into the C-terminal position regardless of whether there is any fragmentation data to support their presence. This does not eliminate other possible C-terminal amino acids; it only ensures that Lys and Arg are included as possibilities. Likewise, selecting Lys-C ("K") ensures that Lys is considered at the C-terminus, and selecting Glu-C or V8 ("E") ensures that Asp and Glu are considered as C-terminal amino acids. By selecting Asp-N ("D") the program makes sure that D is considered for the N-terminal amino acid even if there is no data supporting its presence.

Modified N-terminus: You must specify the mass of the N-terminus.  For example, use 1.0078 for an unmodified peptide, 43.0184 if the peptide has been acetylated, or 44.0136 for N-carbamylated peptides.

Modified C-terminus: You must specify the mass of the C-terminus.  This is typically 17.0027 for unmodified peptides (-OH).
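The two terminal masses combine with the residue masses to give the peptide molecular weight. As a worked example, the Python sketch below sums the monoisotopic residue masses from the default Lutefisk.residues table and adds the N- and C-terminal masses. The function name is hypothetical and the code is an illustration, not the program's own routine.

```python
# Monoisotopic residue masses, copied from the default Lutefisk.residues table.
MONOISOTOPIC = {
    "A": 71.0371, "R": 156.1011, "N": 114.0429, "D": 115.0270,
    "C": 103.0092, "E": 129.0426, "Q": 128.0586, "G": 57.0215,
    "H": 137.0589, "I": 113.0841, "L": 113.0841, "K": 128.0950,
    "M": 131.0405, "F": 147.0684, "P": 97.0528, "S": 87.0320,
    "T": 101.0477, "W": 186.0793, "Y": 163.0633, "V": 99.0684,
}

def peptide_mw(sequence, n_term=1.0078, c_term=17.0027):
    """Sum of residue masses plus the N-terminal and C-terminal group masses."""
    return sum(MONOISOTOPIC[aa] for aa in sequence) + n_term + c_term
```

For the unmodified peptide ELVISLIVESK this gives about 1228.728. Passing n_term=43.0184 models the acetylated peptide, which is heavier by 42.0106.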

Present Amino Acids: If a complete sequence lacks one of these amino acids then it is discarded. Use single letter code without spaces. Use '*' to denote none.

Absent Amino Acids: These amino acids are not even considered when generating sequences. Use single letter code without spaces. Use '*' to denote none.
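The effect of these two parameters on a finished sequence can be sketched as a simple composition check. Note the difference in timing described above: absent amino acids are excluded while sequences are generated, whereas this hypothetical helper only tests a completed sequence after the fact.

```python
def passes_composition(sequence, present="*", absent="*"):
    """True if the sequence contains every 'present' residue and no 'absent' one.

    '*' means no constraint, matching the convention in the params file.
    """
    required = set() if present == "*" else set(present)
    excluded = set() if absent == "*" else set(absent)
    residues = set(sequence)
    return required <= residues and not (excluded & residues)
```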

Auto Tag: Auto-tag looks at the most intense ions at m/z values greater than the precursor. It then tries to find short stretches of sequences called "sequence-tags", which are used to limit the number of sequences that are generated. I recommend using it for triple quad and Qtof data obtained for tryptic peptides with doubly-charged precursor ions. Specific sequence tags can still be entered as described below. Since ion trap data can have both b and y ions in the m/z region greater than the precursor ion, I find that it is best to not use the Auto-tag when sequencing with trap data.

Tag Low Mass y Ion: A sequence tag is a short stretch of sequence, usually interpreted by hand, that is surrounded by regions of unknown sequence but of known mass. Typically, these sequence tags are determined from y-type ions at m/z values greater than the precursor ion. If you have a sequence tag, then for this parameter, enter the m/z value of the lowest mass y ion in the series of y ions that delineates the sequence tag. If you do not wish to enter a sequence tag, then this value should be zero.

Sequence Tag: If you have a sequence tag, use the single letter code without spaces ordered from the low mass y ion to the high mass y ion. If you do not have a sequence tag, enter an asterisk ("*").

Tag High Mass y Ion: If you have a sequence tag, then the m/z value of the highest mass y ion in the y ion series that delineates the sequence tag is entered here. If you do not wish to enter a sequence tag, then this value should be zero.
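A quick consistency check on a hand-derived tag is that the high mass y ion should equal the low mass y ion plus the residue masses of the tag. The Python sketch below assumes singly-charged y ions and uses a small subset of the default monoisotopic residue masses; the function name and the example m/z value are hypothetical.

```python
# Subset of the default monoisotopic residue masses (extend as needed).
MONO = {"S": 87.0320, "I": 113.0841, "L": 113.0841, "V": 99.0684,
        "E": 129.0426, "K": 128.0950}

def expected_high_y(low_y_mz, tag):
    """m/z of the high mass y ion implied by a low mass y ion and a tag.

    Assumes all y ions in the series are singly charged.
    """
    return low_y_mz + sum(MONO[aa] for aa in tag)
```

For example, a low mass y ion at 688.42 and the tag "SIV" (ordered from low to high mass) imply a high mass y ion near 987.60. Because y ions grow from the C-terminus, a tag written in this order reads in the opposite direction to the peptide sequence.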

Edman Data File: The program used to use Edman sequencing data, but this is no longer supported.

DB Sequence File: If you have one or more sequences that you think might be correct (derived from, say, a database search), put them in this file. Give the path and filename here; if this is left blank, no database-derived sequences are checked.

Shoe size (US): Enter your shoe size here. If no entry, then a default value of 15 will be assumed.

Output:

Number of sequences: Number of sequences to list in the .lut output file.  This number is the upper limit, and in many cases there will be fewer.

Score threshold:  The lower Pr score (probability of having half of the sequence correct) limit.  Typically, a lower threshold score of 0.2 is fine.  To maximize the number of sequences in the output, make this value 0.01 and give a high number to “Number of sequences” (e.g., 50).

 

lutefisk-1.0.7+dfsg.orig/docs/ELVISLIVESK.htm
ELVISLIVESK
ELVISISDEADK
ELVISDECAYSK
lutefisk-1.0.7+dfsg.orig/docs/LCMSTrap_Lutefisk.params
// Lutefisk parameters file
//
// If this file is present in the directory from which Lutefisk is invoked,
// then the value of the parameters listed in the 'VALUE' column below
// will override the program defaults.
//
// TITLE VALUE DEFAULT
CID Filename: data\databaseTestLCQ\SHCIAEVEK_1_lcq.dta | CID Filename.
CID Quality: N | Check for CID data quality. (Y/N)
Peptide MW: 0 | Peptide molecular weight. Zero will calc. from input file.
Charge-state: 2 | Number of charges on the precursor ion. Zero will calc. from input file.
MaxEnt3: N | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u): 0.65 | Peptide molecular weight tolerance.
Fragment Error (u): 0.65 | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u): 0 | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences: 20000 | Number of final sequences stored.
Max. Subsequences: 5000 | Number of subsequence allowed.
Mass Scrambles for Statistics: 0 | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type: D | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid: C | Is this CID data in profile or centroid form? P=Profile, C=Centroid, A=Autodetect.
Peak Width (u): 1 | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold: 0.01 | Ion threshold. (Ions > average intensity x Ion threshold are utilized.)
Mass Offset (u): 0.0 | Mass offset.
Ions Per Window: 6 | Ions per input window (windows are 60 Da wide).
Ions Per Residue: 4 | Number of ions per average residue.
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u): 5000 | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern: L | Fragmentation pattern (T=triple quad tryptic,L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps: -1 | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold: 0.15 | Extension threshold.
Max. Extensions: 6 | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass: 160.03065 | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis: T | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus: 1.0078 | N-terminal mass [1.0078(unmod), 43.0184(acetyl), 44.0136(carbamyl)]
Modified C-terminus: 17.0027 | C-terminal mass [17.0027(unmod), 16.0187(amide), 31.0184(methyl)]
Present Amino Acids: * | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids: * | Amino acids known to be absent from the peptide. * means none.
Auto Tag: N | Auto-tag (Y/N).
Tag Low Mass y Ion: 0 | Sequence tag - low mass y ion
Sequence Tag: * | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion: 0 | Sequence tag - high mass y ion
DB Sequence File: | File with sequences to score with the final results.
Shoe Size (US): 15 | US shoe size. Default of 17.
// Output --------------------------------------------------------------------------------
Number of sequences: 5 | Number of output sequences listed. A good bet is 5
Score threshold: 0.2 | Pr(c) is approximate probability that at least half of the sequence is correct. A good bet is 0.20.
lutefisk-1.0.7+dfsg.orig/docs/index.html

LutefiskXP v1.0.5 Operators Manual




Documentation updated Aug. 26, 2005

Copyright © 2005 Rich Johnson (jsrichar@alum.mit.edu)
All rights reserved worldwide




Lutefisk Homepage:  lutefiskxp.sourceforge.net

Table of Contents

  1. Overview
  2. Lutefisk files
  3. Compilation notes
  4. Contact information
  5. Version history

Overview

Lutefisk is a program for the de novo interpretation of peptide CID spectra. While it has a rudimentary interface, it can be compiled for virtually any operating system with a C compiler. [Source code and instructions are provided for MacOS, Win32, OSF, Solaris, Irix, and Linux.]

Lutefisk can be used in conjunction with homology-based database search programs (e.g., OpenSea, MS-BLAST, or CIDentify) as a supplement to standard MS/MS database search programs.  We use it to find modified peptides or sequence variants, as well as to help validate the results obtained from database search programs (e.g., Mascot, Sequest, etc).  We also find it useful for flagging high quality data that the search programs (Mascot, etc) fail to identify, which may happen for a variety of reasons (searching a human protein sequence database for an MS/MS spectrum of a mycobacterial derived peptide, for example).


Lutefisk Files

To run Lutefisk, you need to have four files within the same directory or folder:

  1. CID data file (data files can be specified with a full or partial pathname)
  2. Lutefisk.details
  3. Lutefisk.params
  4. Lutefisk.residues

One additional file is optional:

  1. Database.sequence

Once these files (containing the appropriate information as described below) and the Lutefisk application are gathered together in one folder, you start the application. Once execution begins, Lutefisk proceeds with minimal user intervention.

On a Mac, a single dialog box appears where you can specify a variety of command line arguments (if interested type -h to see help); in most cases, you will click on the "Ok" button and use all of the default values. On Windows, there is no such dialog box; however, command-line arguments can be implemented by starting Lutefisk from the Command Prompt program supplied with the Windows operating system.

A simulated teletype interface appears and indicates the various stages of processing that have been achieved. When it is finished, the teletype interface provides an initial list of sequences ranked in order of the "intensity score", followed by a more refined and shorter list of sequences. This short list of sequences is also placed in a file with the default name identical to the CID data file name with ".lut" appended. The header to this file contains the information found in the parameter file "Lutefisk.params".

Note regarding use of Lutefisk with CIDentify: The output file can be read directly by the modified FASTA program called CIDentify. If you don't like Lutefisk, you can still use CIDentify without using Lutefisk. This can easily be done by editing a Lutefisk output file so that it contains your own sequences (determined by hand or another sequencing program). Alternatively, you can obtain the CIDentify source code and modify the data input format to suit yourself. It is also worth pointing out that the Lutefisk output file can be edited in order to eliminate any sequences that you somehow know are incorrect.

CID data files

Lutefisk can read CID data files in four different formats:

  1. ASCII files created by the Finnigan TSQ program called "List". These are created by starting the "List" program within "ICIS Executive" and opening the data file of interest within "List". Under the "File" menu, go to "Print...". A dialog box appears wherein you select "ASCII" as the saved format, and under "Text Displays" select "Multiple Pages". Provide a file name and select the "Save to File" button (don't select "Print" or else you will have reams of scratch paper).
  2. ASCII files created by the Finnigan LCQ File Converter program. From the destination box select "text" as the format, select the LCQ .raw files to convert, click on the arrow button, and then click on the convert button.
  3. Tab-delimited ASCII text files. The first column contains m/z values followed by a tab and the second column is unitless relative intensity (an example is shown on the web site).

For any of these first three formats, the data can contain profile data with multiple data points per mass unit, or it can be centroided (or peak top) m/z values.

  4. In addition, Lutefisk can read the Sequest ".dta" files; for ".dta" files "C" should be selected for the parameter "Profile/Centroid" found in the Lutefisk.params file (see below). This is our favorite data file format, since our Micromass Qtof and Thermofinnigan ion traps all have software that converts raw LC/MS/MS data to dta files.
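For readers preparing their own tab-delimited input (the third format above), the layout is simple enough to sketch in a few lines of Python. This is a hypothetical reader written for illustration, not the routine used in GetCID, and it assumes exactly the two columns described: m/z, a tab, then relative intensity.

```python
def parse_tab_text(text):
    """Parse tab-delimited 'm/z<TAB>intensity' lines into (mz, intensity) pairs."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        mz, intensity = line.split("\t")[:2]
        peaks.append((float(mz), float(intensity)))
    return peaks
```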

Lutefisk.details

The Lutefisk.details file contains the so-called "ion probabilities" for each type of ion. Here is an example. Each column in the file contains the "ion probabilities" for a different fragmentation pattern (see the description of "fragmentation patterns" below). Currently only two types of fragmentation pattern have been coded, which are for low energy CID of tryptic peptides on triple quadrupole (or Qtof) instruments and on ion traps; these ion probabilities are listed in the second and third columns. The first column is not used (oddly enough).

Lutefisk.residues

The Lutefisk.residues file contains the single letter code, monoisotopic masses, average masses, and nominal masses for each amino acid. The default Lutefisk.residues file is shown here. To add an additional residue to the list, replace the 0's in one of the rows with the corresponding monoisotopic, average, and nominal masses. Up to five additional non-traditional residues can be entered here, and will be given the single letter code of J, O, U, X, or Z. Also, if you don't like my masses for the usual amino acids, you should feel free to change them here.
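To show how one row of this file breaks down, here is a small Python sketch that splits a row into its code, three masses, and trailing comment. It assumes the whitespace-separated layout of the default file, with the residue name after a slash; the function name and the returned dictionary keys are hypothetical.

```python
def parse_residue_line(line):
    """Split a Lutefisk.residues row into code, masses, and the '/' comment."""
    fields, _, comment = line.partition("/")
    code, mono, avg, nominal = fields.split()
    return {
        "code": code,
        "monoisotopic": float(mono),
        "average": float(avg),
        "nominal": int(nominal),
        "name": comment.strip(),
    }
```

A row for one of the spare codes (J, O, U, X, or Z) parses the same way, with all three masses equal to zero until you fill them in.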

Database.sequence

The Database.sequence file is a text file containing a sequence or a list of sequences that might have been derived from a sequence database search. An example of such a file is shown here. Although this seems like one is giving Lutefisk the answers up front, in fact, Lutefisk will do its usual de novo sequencing regardless of the Database.sequence list. In the final steps, where it determines scores for the candidate sequences, Lutefisk tosses in these database-derived sequences along with the de novo sequence candidates to determine if the database sequences are as good as or better than the de novo sequences. If so, then this constitutes evidence that the database derived sequences might actually be correct.

Lutefisk.params

The Lutefisk.params file is where most of the user-selected variables are altered. Once appropriate parameters have been chosen for a given set of data, one usually needs to change only three parameters -- "CID Filename", "Peptide MW", and "Charge-state" (of the precursor ion). If the CID file is in the "dta" format, the latter two parameters can be read automatically from the file header. Invoking the program as 'lutefisk <dta_file>' overrides all three parameters with those in the specified data file(s).
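Judging from the example params files distributed with the documentation, each setting sits on one line as a title, a colon, the value, and then a description after a vertical bar, while lines beginning with "//" are comments. The Python sketch below reads one such line under that assumption; it is a hypothetical helper, and the program itself may parse the file differently.

```python
def parse_param_line(line):
    """Return (title, value) for a 'TITLE: VALUE | DEFAULT' line, else None."""
    if line.startswith("//") or ":" not in line:
        return None  # comment, section divider, or blank line
    title, _, rest = line.partition(":")
    value = rest.split("|", 1)[0].strip()
    return title.strip(), value
```

An empty value, as in the "DB Sequence File" line of the example files, comes back as an empty string.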

Certain parameters need to be changed to accommodate data obtained from different instruments. The mass tolerances should match the anticipated errors for each type of instrument. The tolerance parameter "Final Fragment Err" should be set to zero unless the data was obtained from a Qtof, in which case it should be a value of 0.02 - 0.05 (I currently use 0.04). The "Peak Width" parameter should be set to 1 for unit resolved data (ion trap), and 0.75 for higher resolution data obtained from a Qtof. Triple quads are often run in a low resolution mode to enhance sensitivity, so the peak widths might be 2-3 u. For less than unit resolved spectra (triple quad, say) set the "Transition Mass" to 1800. This is the mass above which average (rather than monoisotopic) masses are used in the calculations; for unit resolved or better data, set this high (5000) so that average masses are never used. Since fragmentation patterns are slightly different for triple quads, ion traps, and Qtof's, set the parameter "Fragmentation Pattern" accordingly. Finally, the parameter "Auto Tag" should be set to N for ion trap data, and Y for Qtof or triple quad data. Here are the parameter files I use for data obtained by LC/MS/MS using an ion trap, and LC/MS/MS using a Qtof.
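The instrument-specific recommendations above can be collected into a small lookup, sketched here in Python. The preset names and the helper function are hypothetical, and the triple quad peak width of 2.0 is just one pick from the 2-3 u range mentioned above; adjust it to your own data.

```python
# Instrument-dependent settings summarized from the paragraph above.
INSTRUMENT_PRESETS = {
    "ion_trap": {
        "Final Fragment Err (u)": 0, "Peak Width (u)": 1,
        "Transition Mass (u)": 5000, "Fragmentation Pattern": "L", "Auto Tag": "N",
    },
    "qtof": {
        "Final Fragment Err (u)": 0.04, "Peak Width (u)": 0.75,
        "Transition Mass (u)": 5000, "Fragmentation Pattern": "Q", "Auto Tag": "Y",
    },
    "triple_quad": {
        "Final Fragment Err (u)": 0, "Peak Width (u)": 2.0,  # assumed; 2-3 u is typical
        "Transition Mass (u)": 1800, "Fragmentation Pattern": "T", "Auto Tag": "Y",
    },
}

def apply_preset(params, instrument):
    """Return a copy of params with the instrument-specific values overlaid."""
    merged = dict(params)
    merged.update(INSTRUMENT_PRESETS[instrument])
    return merged
```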

Here is a more complete description of the Lutefisk.params file parameters.

Output files (.lut)

The header repeats the information contained in the params file, and also lists several scores that need some explanation. The candidate sequences are ranked according to Pr(C), which is the estimated probability of being correct.  I find that values over 0.5 are worth submitting to a homology-based sequence database search program, and anything over 0.8 is particularly worthy of serious consideration.  Pr(C) is calculated from an empirically-derived second-order polynomial fit to a weighted average of the four remaining scores (Pevzscr, Quality, Intscr, and X-corr).  Pevzscr is an adaptation of the ideas presented by Dancik et al (J. of Comput. Biol (1999) Vol 6, 327), which is a score that penalizes for the absence of expected ions and accounts for the possibility of random matches.  Quality is the percentage of the peptide mass that can be accounted for by a contiguous ion series.  Intscr is the percentage of the fragment ion intensity that can be accounted for as b, y, internal fragment, etc, ions.  X-corr is the cross-correlation score that has been normalized by its auto-correlation score.
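If you want to pull out the candidates worth sending on to a homology search, the result rows are easy to filter. The Python sketch below assumes each row has seven whitespace-separated columns in the order Sequence, Rank, Pr(c), PevzScr, Quality, IntScr, X-corr, as in the example output file; the function names and dictionary keys are hypothetical.

```python
def parse_candidate(line):
    """Parse one result row of a .lut file into a dictionary."""
    seq, rank, pr_c, pevz, quality, intscr, xcorr = line.split()
    return {
        "sequence": seq, "rank": int(rank), "pr_c": float(pr_c),
        "pevzscr": float(pevz), "quality": float(quality),
        "intscr": float(intscr), "xcorr": float(xcorr),
    }

def worth_homology_search(candidates, min_pr_c=0.5):
    """Keep candidates whose Pr(c) is above the chosen cutoff."""
    return [c for c in candidates if c["pr_c"] > min_pr_c]
```

Raising min_pr_c to 0.8 keeps only the candidates described above as particularly worthy of serious consideration.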

If "Mass Scrambles for Statistics" in the params file was used, then the bottom of the output file contains a summary of the statistical analysis. The first column "1st ranked" lists the un-normalized scores for the top ranked sequence. The column "St Deviations" shows how many standard deviations the top ranked sequence scores lie from the average wrong scores. The column "Average Wrong" lists these wrong score averages, and the column "Correct/Wrong" shows the ratio of the top correct score to the wrong score. Currently, I don’t recommend using this, so just give a zero value for the parameter “Mass Scrambles for Statistics” (lutefisk.params file).


Compilation Notes

See the 'README' file for compilation information.

Unix/Linux:

After untarring the archive, copy the makefile for your system to "Makefile". Use the "make lutefisk" command.

Win32:

Current Metrowerks Projects are included in the "Win32" folder. If you have an older compiler you will need to create a new "C Console App" project and add the source files as specified in the '0_ReadMe' file.

Macintosh:

Current Metrowerks Projects are included in the "Macintosh" folder (provided as a self-extracting archive to maintain fidelity). If you have an older compiler you will need to create a new "Std C Console PPC" project and add the source files as specified in the '0_ReadMe' file. Set the preferred heap size to 16M, the minimum heap size to 8M, and the stack size to 512k.


Contact Information

Lutefisk is software for de novo sequencing of peptides from tandem mass spectra.

Copyright (C) 1995  Richard S. Johnson

 

This program is free software; you can redistribute it and/or

modify it under the terms of the GNU General Public License

as published by the Free Software Foundation; either version 2

of the License, or (at your option) any later version.

 

This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

GNU General Public License for more details.

 

You should have received a copy of the GNU General Public License

along with this program; if not, write to the Free Software

Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

 

Contact:

 

Richard S Johnson

4650 Forest Ave SE

Mercer Island, WA 98040

 

jsrichar@alum.mit.edu


Version History

LutefiskXP version 1.0.4 – External release

·        When Q is in the second position, then a c1 ion is present.  It is now scored, and used to distinguish the two amino acids if they are not already.

·        Limits placed on the number of subsequences (drops by half) if the program has been processing for over 30 seconds. It drops by another half if the processing is over a minute. This will speed long ones up.

·        Limits placed on the deconvolution of the C-terminal unsequenced chunk of mass (in Haggis).  If the overall processing has gone on for over 45 sec then it drops the mass to 600.

·        For LCQ data, b/y pairs are found and labeled as "golden boys" that cannot be gotten rid of as easily.  These could also be considered "favored sons", although we can all hope that the favored son "W" is eliminated in November (post-election news: sadly the favored son is still in DC).

·        Fixed the ranking so that the db derived sequences don't foul up the rank numbering.

·        Fixed a bug that, in certain cases, produced negative values for summed residue masses displayed within brackets

·        Changed the LCQ b/y pair procedures (both in GetCID and main) such that pairs were either both singly charged, or one was singly-charged and the other doubly-charged (but only if the precursor is > 2 charge state).

·        Changed Haggis a bit so that the minimum number of edges required to be considered a sequence varied with the number of Lutefisk sequences already obtained as well as peptide molecular weight.  Minimum stayed at 4, but could be higher.

·        Changed Haggis so that if the array capacity was exceeded, it dumped the results obtained with the Lutefisk sequences and then continued to run (rather than exit(1), which is a waste).

·        Changed the way the ion intensity is altered in the final scoring.  The high intensity ions are reduced, but the low intensity ones are not increased

LutefiskXP version 1.0 – External release

·        The Haggis sequencing was modified so that two ion series could be combined.

·        The Haggis sequencing was modified so that the two unsequenced masses at either end could be matched to randomly derived sequences.  The random sequence that fits the most b and y ions is saved and replaces the chunk of mass.

·        Added two output variables to the params file.  Now can specify the number of sequences and their Pr(c) limit.

·        Modified final scoring slightly, such that "quality" has less of an influence.

Lutefisk1900 version 1.3.8 - Internal release only

Lutefisk1900 version 1.3.7 - Internal release only

Lutefisk1900 version 1.3.5 - Internal release only

Lutefisk1900 version 1.3.4 - Internal release only

Version 1900 1.3.2 – Released 1/28/02

Version 1900 1.2.9 - Internal release only

Version 1900 1.2.8 - Internal release only

Version 1900 1.2.7 - Internal release only

Version 1900 1.2.6 - Released 10/15/00

Version 1900 1.2.5 - Internal release only

Version 1900 1.2.4 - Released 4/6/00

 

Version 2.0.5 - Released 10/16/98

Version 2.0.4

Version 2.0.3

Version 2.0.2

Version 2.0.1

Version 1.4.8

Version 1.4.7

Version 1.4.6

Version 1.4.5

Version 1.3 - Released 11/17/97

Version 1.1 - Released 6/22/97

Version 1.0 - Released 5/30/97

lutefisk-1.0.7+dfsg.orig/docs/edman.html
SG		/ Cycle 1.	Enter single letter code w/o spaces. "X" means it could be anything.
Y		/ Cycle 2.	Enter single letter code w/o spaces. "X" means it could be anything.
Y		/ Cycle 3.	Enter single letter code w/o spaces. "X" means it could be anything.
DE		/ Cycle 4.	Enter single letter code w/o spaces. "X" means it could be anything.
V		/ Cycle 5.	Enter single letter code w/o spaces. "X" means it could be anything.
G		/ Cycle 6.	Enter single letter code w/o spaces. "X" means it could be anything.
M		/ Cycle 7.	Enter single letter code w/o spaces. "X" means it could be anything.
L		/ Cycle 8.	Enter single letter code w/o spaces. "X" means it could be anything.
T		/ Cycle 9.	Enter single letter code w/o spaces. "X" means it could be anything.
R		/ Cycle 10.	Enter single letter code w/o spaces. "X" means it could be anything.
		/ Cycle 11.	Enter single letter code w/o spaces. "X" means it could be anything.
		/ Cycle 12.	Enter single letter code w/o spaces. "X" means it could be anything.
lutefisk-1.0.7+dfsg.orig/docs/Qtof_Lutefisk.params
// Lutefisk parameters file
//
// If this file is present in the directory from which Lutefisk is invoked,
// then the value of the parameters listed in the 'VALUE' column below
// will override the program defaults.
//
// TITLE VALUE DEFAULT
CID Filename: data\databaseTestQtof\DAIPENLPPLTADFAEDKDVCK_3_qtof.dta | CID Filename.
CID Quality: N | Check for CID data quality. (Y/N)
Peptide MW: 0 | Peptide molecular weight. Zero will calc. from input file.
Charge-state: 2 | Number of charges on the precursor ion. Zero will calc. from input file.
MaxEnt3: N | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u): 0.45 | Peptide molecular weight tolerance.
Fragment Error (u): 0.25 | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u): 0.04 | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences: 20000 | Number of final sequences stored.
Max. Subsequences: 5000 | Number of subsequence allowed.
Mass Scrambles for Statistics: 0 | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type: D | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid: C | Is this CID data in profile or centroid form? P=Profile, C=Centroid, A=Autodetect.
Peak Width (u): 0.75 | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold: 0.01 | Ion threshold. (Ions > average intensity x Ion threshold are utilized.)
Mass Offset (u): 0.0 | Mass offset.
Ions Per Window: 8 | Ions per input window (windows are 60 Da wide).
Ions Per Residue: 6 | Number of ions per average residue.
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u): 5000 | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern: Q | Fragmentation pattern (T=triple quad tryptic,L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps: -1 | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold: 0.15 | Extension threshold.
Max. Extensions: 6 | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass: 160.03065 | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis: T | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus: 1.0078 | N-terminal mass [1.0078(unmod), 43.0184(acetyl), 44.0136(carbamyl)]
Modified C-terminus: 17.0027 | C-terminal mass [17.0027(unmod), 16.0187(amide), 31.0184(methyl)]
Present Amino Acids: * | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids: * | Amino acids known to be absent from the peptide. * means none.
Auto Tag: Y | Auto-tag (Y/N).
Tag Low Mass y Ion: 0 | Sequence tag - low mass y ion
Sequence Tag: * | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion: 0 | Sequence tag - high mass y ion
DB Sequence File: | File with sequences to score with the final results.
Shoe Size (US): 15 | US shoe size. Default of 17.
// Output --------------------------------------------------------------------------------
Number of sequences: 5 | Number of output sequences listed. A good bet is 5
Score threshold: 0.2 | Pr(c) is approximate probability that at least half of the sequence is correct. A good bet is 0.20.
lutefisk-1.0.7+dfsg.orig/docs/residues.html
A	 71.0371	 	71.08		 71		/Ala
R	156.1011		156.19		156		/Arg
N	114.0429		114.10		114		/Asn
D	115.0270		115.09		115		/Asp
C	103.0092		103.14		103		/Cys
E	129.0426		129.12		129		/Glu
Q	128.0586		128.13		128		/Gln
G	 57.0215	 	57.05		 57		/Gly
H	137.0589		137.14		137		/His
I	113.0841		113.16		113		/Ile
L	113.0841		113.16		113		/Leu
K	128.0950		128.17		128		/Lys
M	131.0405		131.20		131		/Met
F	147.0684		147.18		147		/Phe
P	 97.0528		 97.12		 97		/Pro
S	 87.0320		 87.08		 87		/Ser
T	101.0477		101.11		101		/Thr
W	186.0793		186.21		186		/Trp
Y	163.0633		163.17		163		/Tyr
V	 99.0684		 99.13		 99		/Val
J	  0.0			  0.0		  0		/Modified amino acid
O	  0.0			  0.0		  0		/Modified amino acid
U	  0.0			  0.0		  0		/Modified amino acid
X	  0.0			  0.0		  0		/Modified amino acid
Z	  0.0			  0.0		  0		/Modified amino acid
lutefisk-1.0.7+dfsg.orig/README.txt0000644000175000017500000001031710303671100016640 0ustar rusconirusconi To get a brief summary of the important command-line options, invoke lutefisk with the '-h' option. (On the Macintosh, command-line options are entered on the 'Argurments:' line of the initial dialog box. To use command-line options under Win32 the executable must be invoked from a DOS prompt such as by using the Command Prompt program.) USAGE: lutefisk [options] [CID file pathname] -o = output file pathname -q = quiet mode ON (default OFF) -m = precursor ion mass -d = details file pathname -p = params file pathname -r = residues file pathname -s = pathnane of file with database sequences to score -v = verbose mode ON (default OFF) -h = print this help text ________________________________________________________________________ LUTEFISK SOURCE CODE ARCHIVE CONTENTS: (The Lutefisk_src.tar.gz archive is a tar archive that has been gzipped.) * Example Lutefisk input file, "QTof_ETYGDMADCCEK.dta" * Example Lutefisk output file, "QTof_ETYGDMADCCEK.lut" * This README file, "README" * The copright file, "COPYRIGHT" * The version history, "HISTORY" * The lutefisk accessory files: Lutefisk.details Lutefisk.params Lutefisk.residues database.sequence * C Source code files for Lutefisk: Makefile.XXX - Makefiles for compiling Lutefisk for AIX, IRIX, LINUX, OSF, OSX, or SUN LutefiskGlobalDeclarations.c Declaration of global variables. LutefiskMain.c Reads parameter, edman, and detail files. Also contains the function main. LutefiskGetAutoTag.c Finds peptide sequence tags automatically. LutefiskGetCID.c Loads data file and performs peak detection. LutefiskMakeGraph.c Produces N and C terminal sequence graphs. LutefiskSummedNode.c Combines N and C terminal sequence graphs into a single graph. LutefiskHaggis.c Finds candidate sequences by searching for contiguous series of ions that do not necessarily connect with either N- or C-termini. 
LutefiskSubseqMaker.c Uses the sequence graph to determine sequence candidates. LutefiskScore.c Assigns score and rank to sequence candidates. LutefiskXCorr.c Performs cross-correlation scoring. LutefiskFourier.c Used for cross-correlation scoring. ListRoutines.c Handles lists. getopt.c Interprets line commands when starting the program (Needed by the Mac and Win32 versions). Lutefisk.rsrc Resources needed by the Mac version. Header files: LutefiskDefinitions.h Contains #defines and struct definitions. LutefiskPrototypes.h Contains function prototypes. ListRoutines.h getopt.h Needed by the Mac and Win32 versions. ________________________________________________________________________ * Compiling on the Macintosh Current Metrowerks Projects are included in the "Macintosh" folder (provided as a self- extracting archive to maintain fidelity). If you have an older compiler you will need to create a new "Std C Console PPC" project and add the source files as specified above. Set the preferred heap size to 16M, the minimum heap size to 8M, and the stack size to 512k. You may have to change the files to creator 'CWIE' and type 'TEXT' to get Codewarrior to like them. Note that in MacOS X the apps can be compiled using the UNIX directions if desired. * Compiling under Win32 Current Metrowerks Projects are included in the "Win32" folder. If you have an older compiler you will need to create a new "C Console App" project and add the source files as specified above. * Compiling on UNIX After untarring the archive, copy the makefile for your system to "Makefile". Use the "make lutefisk" command. ________________________________________________________________________ Questions? Problems? Richard S. Johnson jsrichar@alum.mit.edu lutefisk-1.0.7+dfsg.orig/Qtof_ELVISLIVESK.lut0000644000175000017500000000367510307633116020503 0ustar rusconirusconiLutefiskXP v1.0.4 Copyright 1996-1904 Richard S. 
Johnson
Run Date: Wed Sep 7 11:49:50 2005
Filename: Qtof_ELVISLIVESK.dta
Molecular Weight: 1228.73
Molecular Weight Tolerance: 0.45
Fragment Ion Tolerance: 0.25
Ion Offset: 0.00
Charge State: 2
Centroided or Pre-processed Data
Tryptic Digest
Tryptic QTOF Fragmentation Pattern
Cysteine residue mass: 160.03
Switch from monoisotopic to average mass at 5000
Ions per window: 8.0
Extension Threshold: 0.15
Extension Number: 6
Gaps: 1
Peak Width: 0.8
Data Threshold: 0.01 (1)
Ions per residue: 6.0
Amino acids known to be present: *
Amino acids known to be absent: *
C-terminal mass: 17.0027
N-terminal mass: 1.0078
N-terminal Tag Mass: 0.00
C-terminal Tag Mass: 0.00
Sequence Tag: *0
Edman data is not available.
AutoTag ON

Sequence              Rank   Pr(c)  PevzScr  Quality   IntScr   X-corr
NQVLSLLVESK              1   0.990    0.713    0.800    0.951    0.831
NQVLSLLVESK              2   0.990    0.691    0.800    0.952    0.831
AGNVLSLLVESQ             3   0.990    0.668    0.822    0.951    0.775
AGNVLSLLVESK             4   0.990    0.668    0.822    0.951    0.775
AGNVLSLVLE[215.09]       5   0.957    0.517    0.822    0.869    0.660
AGNVLSLVLE[215.13]       6   0.957    0.517    0.822    0.869    0.660
AGNVLSLVLE[215.16]       7   0.940    0.497    0.822    0.830    0.660
[212.15]ELSLLVESK        8   0.934    0.590    0.718    0.898    0.585
[212.08]ELSLLVESK        9   0.933    0.447    0.822    0.872    0.585
NQVLSLLV[188.06]R       10   0.914    0.550    0.622    0.899    0.694

Search time: 0:00:02

lutefisk-1.0.7+dfsg.orig/License.txt

GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.
This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. 
We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. 
However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. 
If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. 
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
lutefisk-1.0.7+dfsg.orig/Lutefisk.residues

A     71.0371    71.08    71   /Ala - Single letter code, monoisotopic, average, nominal masses
R    156.1011   156.19   156   /Arg
N    114.0429   114.10   114   /Asn
D    115.0269   115.09   115   /Asp
C    103.0092   103.14   103   /Cys
E    129.0426   129.12   129   /Glu
Q    128.0586   128.13   128   /Gln
G     57.0215    57.05    57   /Gly
H    137.0589   137.14   137   /His
I    113.0841   113.16   113   /Ile and later used for oxidized Met, although B is its single letter code in the output
L    113.0841   113.16   113   /Leu is used to represent Leu and Ile
K    128.0950   128.17   128   /Lys
M    131.0405   131.20   131   /Met
F    147.0684   147.18   147   /Phe
P     97.0528    97.12    97   /Pro
S     87.0320    87.08    87   /Ser
T    101.0477   101.11   101   /Thr
W    186.0793   186.21   186   /Trp
Y    163.0633   163.17   163   /Tyr
V     99.0684    99.13    99   /Val
J      0.0        0.0      0   /Modified amino acid: ICRAP-D0 (188.062, 188.251)
O      0.0        0.0      0   /Modified amino acid: ICRAP-D5 (193.093, 193.281)
U      0.0        0.0      0   /Modified amino acid: phospho-Ser (166.9983)
X      0.0        0.0      0   /Modified amino acid: phospho-Thr (181.0140)
Z      0.0        0.0      0   /Modified amino acid: phospho-Tyr (243.0296)

lutefisk-1.0.7+dfsg.orig/src/

lutefisk-1.0.7+dfsg.orig/src/LutefiskGetAutoTag.c

/*********************************************************************************************
Lutefisk is software for de novo sequencing of peptides from tandem mass spectra.
    Copyright (C) 1995 Richard S. Johnson

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    See the GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

    Contact:

    Richard S Johnson
    4650 Forest Ave SE
    Mercer Island, WA 98040

    jsrichar@alum.mit.edu
*********************************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "LutefiskPrototypes.h"
#include "LutefiskDefinitions.h"
#include "ListRoutines.h"

/*Globals for this file only.*/
tsequenceList *TagSeqList;
tsequenceList *TagSubseqList;
tMSDataList *TagMassList;

/*************************************  SortExtension  **************************************
*
*
*
*/
struct extension *SortExtension(struct extension *inExtensionList)
{
    INT_4 i;
    INT_4 extensionCount, outIndex;
    INT_4 highestExtensionScore, threshold;
    REAL_4 extensionThreshold = 0.05;    /*extensions must have score greater than this*/
    struct extension *outExtensionList;

    outExtensionList = (struct extension *) calloc(MAX_GAPLIST, sizeof(struct extension));
    if(outExtensionList == NULL)
    {
        printf("SortExtension: Out of memory.");
        exit(1);
    }

    /*Find the highest extension score.*/
    highestExtensionScore = inExtensionList[0].score;
    extensionCount = 0;
    i = 0;
    while(i < MAX_GAPLIST && inExtensionList[i].mass > 0)    /*check the bound before reading the element*/
    {
        if(inExtensionList[i].score > highestExtensionScore)
        {
            highestExtensionScore = inExtensionList[i].score;
        }
        i++;
        extensionCount++;
    }

    /*Set the extension score threshold.*/
    threshold = highestExtensionScore * extensionThreshold;

    /*If the number of extensions found does not exceed the specified maximum number of
      extensions per subsequence, keep every extension that passes the threshold.*/
    if(extensionCount <= gParam.maxExtNum)
    {
        outIndex = 0;
        for(i = 0; i < extensionCount; i++)
        {
            if(inExtensionList[i].score >= threshold && i < gParam.topSeqNum)
            {
                outExtensionList[outIndex] = inExtensionList[i];
                outIndex++;
                if(outIndex >= MAX_GAPLIST)
                {
                    printf("SortExtension: outIndex >= MAX_GAPLIST\n");
                    exit(1);
                }
            }
        }
    }

    if(extensionCount > gParam.maxExtNum)
    {
        /*There are too many extensions; throw out the worst ones.*/
        /*First sort the extensions by whether they are singleAA or not and then by score.*/
        qsort(inExtensionList, extensionCount, sizeof(struct extension), ExtensionsSortDescend);
        outIndex = 0;
        i = 0;
        while(i < extensionCount && outIndex < gParam.maxExtNum && outIndex < gParam.topSeqNum)
        {
            if(inExtensionList[i].score >= threshold)
            {
                outExtensionList[outIndex] = inExtensionList[i];
                outIndex++;
                if(outIndex >= MAX_GAPLIST)
                {
                    printf("SortExtension: outIndex >= MAX_GAPLIST\n");
                    exit(1);
                }
            }
            i++;
        }
    }

    free(inExtensionList);
    return(outExtensionList);
}

/***************************RemoveNeutralLosses****************************************
*
*  Any ion that is 17 or 18 Da (ammonia or water) lighter than another ion, and less
*  than half as intense as that higher mass ion, is discarded.
*/
void RemoveNeutralLosses(INT_4 charge)
{
    REAL_4 massDiff;
    char test;
    tMSData *tagMassPtr, *nextMassPtr;

    if(TagMassList->numObjects < 2)
    {
        return;
    }

    tagMassPtr = &TagMassList->mass[0];
    while(tagMassPtr < TagMassList->mass + (TagMassList->numObjects) - 1)
    {
        nextMassPtr = tagMassPtr + 1;
        while(nextMassPtr < TagMassList->mass + TagMassList->numObjects)
        {
            test = FALSE;
            massDiff = nextMassPtr->mOverZ - tagMassPtr->mOverZ;
            massDiff = massDiff * charge;    /*deal w/ charge > 1*/
            if(massDiff <= gWater + gParam.fragmentErr && massDiff >= gAmmonia - gParam.fragmentErr)
            {
                if((2 * tagMassPtr->intensity) < nextMassPtr->intensity)
                {
                    RemoveFromList(tagMassPtr - TagMassList->mass, TagMassList);
                    break;
                }
            }
            nextMassPtr++;
        }
        tagMassPtr++;
    }
    return;
}

/***************************FilterTagMasses********************************************
*
*  Get rid of ions that are closer than Gly (remove the lower intensity of the pair).
*
*/
void FilterTagMasses(INT_4 charge)
{
    REAL_4 massDiff, minDiff;
    tMSData *tagMassPtr;

    minDiff = (gGapList[G] / charge) - 2 * gParam.fragmentErr;    /*2x err for a little slop*/

    if(TagMassList->numObjects < 2)
    {
        return;
    }

    /*If two neighboring ions are too close, remove the less intense one of the pair.*/
    tagMassPtr = &TagMassList->mass[1];    /*start w/ the second tagMass*/
    while(tagMassPtr < TagMassList->mass + TagMassList->numObjects)
    {
        massDiff = tagMassPtr->mOverZ - (tagMassPtr - 1)->mOverZ;
        if(massDiff < minDiff)
        {
            if((2 * (tagMassPtr - 1)->intensity) < tagMassPtr->intensity)
            {
                /*just remove things that are more than 2x diff in intensity*/
                RemoveFromList(tagMassPtr - 1 - TagMassList->mass, TagMassList);
                tagMassPtr--;
            }
            else if((2 * tagMassPtr->intensity) < (tagMassPtr - 1)->intensity)
            {
                RemoveFromList(tagMassPtr - TagMassList->mass, TagMassList);
                tagMassPtr--;
            }
        }
        tagMassPtr++;
    }
}

/**********************************NterminalTags****************************************
*
*  This function sets up the first batch of subsequences. It starts with a single subsequence
*  containing the N-terminal group (usually hydrogen of mass 1) and then tries to connect with
*  nodes that are either one or two amino acids higher in mass. It uses the array gGapList
*  to determine what is an acceptable mass difference. Each extension that passes the score
*  threshold is added to TagSubseqList as a one-element subsequence. This function returns
*  the number of extensions found.
*/
INT_4 NterminalTags(char *tagNode, INT_4 *tagNodeIntensity)
{
    struct Sequence *subsequencePtr;
    tsequence subsequenceToAdd;
    INT_4 i, j, k, testValue, nTerminus;
    INT_4 *extensions, *extScore, extNum;
    INT_4 *bestExtensions, *bestExtScore, bestExtNum;
    INT_4 highestExtensionScore;
    INT_4 *singleAAExtension, singleAAExtNum, threshold;
    INT_4 *peptide;

    extensions = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(extensions == NULL)
    {
        printf("NterminalTags: Out of memory");
        exit(1);
    }

    extScore = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(extScore == NULL)
    {
        printf("NterminalTags: Out of memory");
        exit(1);
    }

    bestExtensions = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(bestExtensions == NULL)
    {
        printf("NterminalTags: Out of memory");
        exit(1);
    }

    bestExtScore = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(bestExtScore == NULL)
    {
        printf("NterminalTags: Out of memory");
        exit(1);
    }

    singleAAExtension = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(singleAAExtension == NULL)
    {
        printf("NterminalTags: Out of memory");
        exit(1);
    }

    peptide = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(peptide == NULL)
    {
        printf("NterminalTags: Out of memory");
        exit(1);
    }

    extNum = 0;
    singleAAExtNum = 0;
    subsequencePtr = NULL;

    /*Figure out what the N-terminal node is.*/
    nTerminus = gParam.modifiedNTerm + 0.5;

    for(i = 0; i < gAminoAcidNumber; i++)    /*Find the one amino acid extensions.*/
    {
        if(gGapList[i] != 0)
        {
            testValue = nTerminus + gGapList[i];
            if(tagNode[testValue] != 0)
            {
                extensions[extNum] = gGapList[i];
                singleAAExtension[extNum] = gGapList[i];
                extScore[extNum] = tagNodeIntensity[testValue];
                extNum++;
                singleAAExtNum++;
            }
        }
    }

    /* Now find the two amino acid extensions.
    */
    for(i = gAminoAcidNumber; i <= gGapListIndex; i++)    /*Start at the end of the one aa extensions and move up from there.*/
    {
        testValue = nTerminus + gGapList[i];    /*This is the test mass (nominal).*/
        if(tagNode[testValue] != 0)    /*If there is any evidence at that mass.*/
        {
            extensions[extNum] = gGapList[i];
            extScore[extNum] = tagNodeIntensity[testValue];
            extNum++;
        }
    }

    /* If two extensions differ by the mass of an amino acid, then the higher mass one's intensity
       is assigned a zero so that it is removed in the section below.*/
    if(extNum > 0)
    {
        for(i = 0; i < extNum; i++)
        {
            for(j = i + 1; j < extNum; j++)
            {
                testValue = extensions[j] - extensions[i];
                for(k = 0; k < gAminoAcidNumber; k++)
                {
                    if(gGapList[k] != 0)
                    {
                        if(testValue == gGapList[k])
                        {
                            extScore[j] = 0;
                        }
                    }
                }
            }
        }
    }

    /*
    *  Now I need to find the best extensions, ie, the top maxExtNum of them and only if these
    *  extensions are greater than the product of the highest score and extThresh.
    *  bestExtensions[MAX_GAPLIST], bestExtScore[MAX_GAPLIST], bestExtNum;
    */
    highestExtensionScore = extScore[0];    /*Find the highest extension score.*/
    for(i = 0; i < extNum; i++)
    {
        if(extScore[i] > highestExtensionScore)
        {
            highestExtensionScore = extScore[i];
        }
    }

    threshold = highestExtensionScore * 0.01;    /*Set the extension score threshold.*/

    bestExtNum = 0;
    for(i = 0; i < extNum; i++)
    {
        if(extScore[i] > threshold)
        {
            bestExtensions[bestExtNum] = extensions[i];
            bestExtScore[bestExtNum] = extScore[i];
            bestExtNum++;
        }
    }

    /* Store this information in the linked list of Sequence structs.*/
    for(i = 0; i < bestExtNum; i++)
    {
        subsequenceToAdd.score = bestExtScore[i];
        subsequenceToAdd.peptide[0] = bestExtensions[i];
        subsequenceToAdd.peptideLength = 1;
        subsequenceToAdd.gapNum = 0;
        subsequenceToAdd.nodeValue = nTerminus + bestExtensions[i];
        if(!AddToList(&subsequenceToAdd, TagSubseqList))
        {
            free(extensions);
            free(extScore);
            free(bestExtensions);
            free(bestExtScore);
            free(singleAAExtension);
            free(peptide);
            return(extNum);
        }
    }

    free(extensions);
    free(extScore);
    free(bestExtensions);
    free(bestExtScore);
    free(singleAAExtension);
    free(peptide);

    return(extNum);
}

/**********************************AlternateNterminalTags****************************************
*
*  This function sets up the first batch of subsequences. It starts with a single subsequence
*  containing the N-terminal group (usually hydrogen of mass 1) and then tries to connect with
*  nodes that are one, two, or three amino acids higher in mass. It uses the array gGapList
*  (plus a locally built table of three amino acid masses) to determine what is an acceptable
*  mass difference. Each extension that passes the score threshold is added to TagSubseqList
*  as a one-element subsequence. This function returns the number of extensions.
*/
INT_4 AlternateNterminalTags(char *tagNode, INT_4 *tagNodeIntensity)
{
    tsequence subsequenceToAdd;
    INT_4 i, j, k, m, testValue, nTerminus;
    INT_4 *extensions, *extScore, extNum;
    INT_4 *bestExtensions, *bestExtScore, bestExtNum;
    INT_4 highestExtensionScore;
    INT_4 threshold;
    /*INT_4 *peptide;*/
    INT_4 *threeAA, threeAANum, sum;
    INT_4 sameNum, averageExtension, *sameExtension;
    char nTerminusPossible, duplicateFlag;
    char sameTest;

    sameExtension = (INT_4 *) malloc(gParam.fragmentErr * 10 * sizeof(INT_4));
    if(sameExtension == NULL)
    {
        printf("AlternateNterminalTags: Out of memory.");
        exit(1);
    }

    threeAA = (INT_4 *) malloc(gAminoAcidNumber*gAminoAcidNumber*gAminoAcidNumber*sizeof(INT_4));
    if(threeAA == NULL)
    {
        printf("AlternateNterminalTags: Out of memory.");
        exit(1);
    }

    extensions = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(extensions == NULL)
    {
        printf("AlternateNterminalTags: Out of memory.");
        exit(1);
    }

    extScore = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(extScore == NULL)
    {
        printf("AlternateNterminalTags: Out of memory.");
        exit(1);
    }

    bestExtensions = (INT_4 *) malloc(MAX_GAPLIST * sizeof(INT_4));
    if(bestExtensions == NULL)
    {
        printf("AlternateNterminalTags: Out of memory.");
        exit(1);
    }

    bestExtScore = (INT_4 *)
        malloc(MAX_GAPLIST * sizeof(INT_4));
    if(bestExtScore == NULL)
    {
        printf("AlternateNterminalTags: Out of memory.");
        exit(1);
    }

    /*peptide = malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4 ));
    if(peptide == NULL)
    {
        printf("Out of memory");
        exit(1);
    }*/

    /* Fill in the masses for three amino acids.*/
    threeAANum = 0;
    for(i = 0; i < gAminoAcidNumber; i++)    /*Fill in the masses of the 3 AA extensions.*/
    {
        for(j = 0; j < gAminoAcidNumber; j++)
        {
            for(k = 0; k < gAminoAcidNumber; k++)
            {
                if(gGapList[i] != 0 && gGapList[j] != 0 && gGapList[k] != 0)
                {
                    sum = gGapList[i] + gGapList[j] + gGapList[k];
                    duplicateFlag = FALSE;
                    for(m = 0; m < threeAANum; m++)
                    {
                        if(threeAA[m] == sum)
                        {
                            /* We already have this mass in threeAA so don't add it to the list. */
                            duplicateFlag = TRUE;
                            break;
                        }
                    }
                    if(duplicateFlag == FALSE)
                    {
                        for(m = 0; m <= gGapListIndex; m++)
                        {
                            if(gGapList[m] == sum)
                            {
                                /* We already have this mass in gGapList so don't add it to the list. */
                                duplicateFlag = TRUE;
                                break;
                            }
                        }
                    }
                    if(!duplicateFlag)
                    {
                        threeAA[threeAANum] = sum;
                        threeAANum++;
                        if(threeAANum >= gAminoAcidNumber*gAminoAcidNumber*gAminoAcidNumber)
                        {
                            printf("AlternateNTerminalTags: threeAANum >= gAminoAcidNumber*gAminoAcidNumber*gAminoAcidNumber\n");
                            exit(1);
                        }
                    }
                }
            }
        }
    }

    /* Find the n-terminus.*/
    extNum = 0;

    /*Figure out what the N-terminal node is.*/
    nTerminus = gParam.modifiedNTerm + 0.5;

    /* Find the one, two, and three amino acid jumps from gGapList and threeAA.*/
    for(i = nTerminus + gMonoMass_x100[G]; i < gMonoMass_x100[W] * 3; i++)    /*step thru each node*/
    {
        if(i < gGraphLength && tagNode[i] != 0)    /*ignore the nodes w/ zero evidence; check the bound first*/
        {
            nTerminusPossible = FALSE;    /*start assuming that this is not an extension*/

            /* Check for one and two amino acid extensions using the gGapList array.*/
            for(j = 0; j <= gGapListIndex; j++)
            {
                if(gGapList[j] != 0)
                {
                    if(i - nTerminus == gGapList[j])
                    {
                        nTerminusPossible = TRUE;    /*it's a possible extension*/
                        break;
                    }
                }
            }

            /* Now check for three amino acid extensions if it wasn't a one or two aa extension.*/
            if(nTerminusPossible == FALSE)
            {
                for(j = 0; j < threeAANum; j++)
                {
                    if(threeAA[j] != 0)
                    {
                        if(i - nTerminus == threeAA[j])
                        {
                            nTerminusPossible = TRUE;
                            break;
                        }
                    }
                }
            }

            if(nTerminusPossible)    /*save as an extension?*/
            {
                extensions[extNum] = i - nTerminus;
                extScore[extNum] = tagNodeIntensity[i];
                extNum++;
                if(extNum >= MAX_GAPLIST)
                {
                    printf("AlternateNTerminalTags: extNum >= MAX_GAPLIST\n");
                    exit(1);
                }
            }
        }
    }

    /* If there are no extensions that match combinations of three amino acids, then look for anything*/
    for(i = nTerminus + gMonoMass_x100[G]; i < gMonoMass_x100[W] * 3; i++)    /*step thru each node*/
    {
        if(i < gGraphLength && tagNode[i] != 0)    /*ignore the nodes w/ zero evidence; check the bound first*/
        {
            extensions[extNum] = i - nTerminus;
            extScore[extNum] = tagNodeIntensity[i];
            extNum++;
            if(extNum >= MAX_GAPLIST - 1) break;
        }
    }

    /* Find extensions that are 1 node unit apart, and consolidate them.*/
    for(i = 0; i < extNum; i++)
    {
        if(extScore[i] != 0)
        {
            sameNum = 0;
            averageExtension = extensions[i];
            for(j = 0; j < extNum; j++)
            {
                if(j != i && extScore[j] != 0)
                {
                    sameTest = FALSE;
                    for(k = 0; k < sameNum; k++)
                    {
                        if(extensions[sameExtension[k]] - extensions[j] == 1 ||
                           extensions[j] - extensions[sameExtension[k]] == 1)
                        {
                            sameTest = TRUE;
                        }
                    }
                    if(extensions[i] - extensions[j] == 1 || extensions[j] - extensions[i] == 1 || sameTest)
                    {
                        sameExtension[sameNum] = j;
                        averageExtension += extensions[j];
                        sameNum++;
                        if(sameNum >= gParam.fragmentErr * 10)
                        {
                            printf("AlternateNTerminalTags: sameNum >= gParam.fragmentErr * 10\n");
                            exit(1);
                        }
                    }
                }
            }
            if(sameNum != 0)
            {
                averageExtension = ((float)averageExtension / (sameNum + 1)) + 0.5;    /*count the i extension and round the value*/
                extensions[i] = averageExtension;
                for(j = 0; j < sameNum; j++)
                {
                    extScore[sameExtension[j]] = 0;
                }
            }
        }
    }

    /* Get rid of the extensions that are within gParam.fragmentErr of each other.*/
    for(i = 0; i < extNum; i++)
    {
        if(extScore[i] != 0)
        {
            for(j = 0; j < extNum; j++)
            {
                if(extScore[j] != 0 && i != j)
                {
                    if(extensions[i] <= extensions[j] +
gParam.fragmentErr && extensions[i] >= extensions[j] - gParam.fragmentErr) { if(extScore[i] >= extScore[j]) { extScore[j] = 0; } else { extScore[i] = 0; } } } } } } /* If two extensions differ by the mass of an amino acid, then the higher mass one's intensity is assigned a zero so that it is removed in the section below.*/ if(extNum > 0) { for(i = 0; i < extNum; i++) { if(extScore[i] != 0) { for(j = i + 1; j < extNum; j++) { if(extScore[j] != 0) { testValue = extensions[j] - extensions[i]; for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { if(testValue <= gGapList[k] + gParam.fragmentErr && testValue >= gGapList[k] - gParam.fragmentErr) { extScore[j] = -1; } } } } } } } } /* * Now I need to find the best extensions, ie, the top maxExtNum of them and only if these * extensions are greater than the product of the highest score and extThresh. * bestExtensions[MAX_GAPLIST], bestExtScore[MAX_GAPLIST], bestExtNum; */ if(extNum >= MAX_GAPLIST) { printf("AlternateNTerminalTags: extNum >= MAX_GAPLIST\n"); exit(1); } highestExtensionScore = extScore[0]; /*Find the highest extension score.*/ for(i = 0; i < extNum; i++) { if(extScore[i] > highestExtensionScore) { highestExtensionScore = extScore[i]; } } threshold = highestExtensionScore * 0.01; /*Set the extension score threshold.*/ bestExtNum = 0; for(i = 0; i < extNum; i++) { if(extScore[i] > threshold) { bestExtensions[bestExtNum] = extensions[i]; bestExtScore[bestExtNum] = extScore[i]; bestExtNum++; if(bestExtNum >= MAX_GAPLIST) { printf("AlternateNTerminalTags: bestExtNum >= MAX_GAPLIST\n"); exit(1); } } } /* Store this information in the linked list of Sequence structs.*/ for(i = 0; i < bestExtNum; i++) { subsequenceToAdd.score = bestExtScore[i]; subsequenceToAdd.peptide[0] = bestExtensions[i]; subsequenceToAdd.peptideLength = 1; subsequenceToAdd.gapNum = 0; subsequenceToAdd.nodeValue = nTerminus + bestExtensions[i]; if(!AddToList(&subsequenceToAdd, TagSubseqList)) { free(extensions); free(extScore); 
free(bestExtensions); free(bestExtScore); free(sameExtension); free(threeAA); /*free(peptide);*/ return(extNum); } } /* Free the arrays*/ free(extensions); free(extScore); free(bestExtensions); free(bestExtScore); /*free(peptide);*/ free(sameExtension); free(threeAA); return(extNum); } /**********************************TagMaker******************************************** * * This function uses a subsequencing approach to derive a list of incomplete peptide sequences. * The array tagNode contains the information used to build the subsequences. The indexing * of this array corresponds to nominal masses of b ions, and the information contained * relates to the probability that a cleavage was actually present at that nominal mass. * The array tagNodeIntensity contains the ion intensity of the original ions. * */ void TagMaker(char *tagNode, INT_4 *tagNodeIntensity, INT_4 totalIonIntensity) { INT_4 topSeqNum; INT_4 extensions; topSeqNum = gParam.topSeqNum; /* * The function NterminalTags starts at the N-terminal node (usually a value of one), * and jumps by one amino acid, or by two amino acids to generate the first set of subsequences. * Its return value is non-zero if there are subsequences to extend. */ extensions = NterminalTags(tagNode, tagNodeIntensity); /* * The function TagExtensions adds one amino acid at a time to the existing list * of subsequences. If a subsequence cannot be extended, then a score is assigned * based on intensity. If the intensity score exceeds a threshold, then the subsequence * is stored as a completed tag. * Once no subsequence can be extended any further, the function returns zero. */ while(extensions) { extensions = TagExtensions(tagNode, topSeqNum, tagNodeIntensity, totalIonIntensity); } } /**********************************AlternateTagMaker******************************************** * * This function uses a subsequencing approach to derive a list of incomplete peptide sequences. 
* The array tagNode contains the information used to build the subsequences. The indexing * of this array corresponds to nominal masses of b ions, and the information contained * relates to the probability that a cleavage was actually present at that nominal mass. * The array tagNodeIntensity contains the ion intensity of the original ions. This differs * from TagMaker in that the N-terminal jump is not limited to one or two amino acids * (singles or doubles); this allows for a larger jump at the N-terminus of the peptide. * */ void AlternateTagMaker(char *tagNode, INT_4 *tagNodeIntensity, INT_4 totalIonIntensity) { INT_4 topSeqNum; INT_4 extensions; topSeqNum = gParam.topSeqNum; /* * The function AlternateNterminalTags starts at the N-terminal node (usually a value of one), * and jumps to any node with evidence, from one glycine mass above the N-terminus up to * three tryptophan masses. The resulting subsequences are added to TagSubseqList, and the * number of extensions found is returned. */ extensions = AlternateNterminalTags(tagNode, tagNodeIntensity); /* * The function TagExtensions adds one amino acid at a time to the existing list * of subsequences. If a subsequence cannot be extended, then a score is assigned * based on intensity. If the intensity score exceeds a threshold, then the subsequence * is stored as a completed tag. * Once no subsequence can be extended any further, the function returns zero. 
*/ while(extensions) { extensions = TagExtensions(tagNode, topSeqNum, tagNodeIntensity, totalIonIntensity); } } /***************************FindTagB2Ions*********************************************** * * Looks for b2 ions in the low mass region by finding a/b ion pairs, ie, two ions that * differ by 28 Da (the mass of CO). For each pair with a reasonable intensity ratio, the * b ion nodes are marked in tagNode, the ion intensity is added to tagNodeIntensity, and * the total ion intensity (passed by pointer) is increased. * */ void FindTagB2Ions(struct MSData *firstMassPtr, INT_4 *totalIntensity, char *tagNode, INT_4 *tagNodeIntensity) { INT_4 ionNum; struct MSData *currPtr; REAL_4 ionMass[258], massDiff, error, bMass; INT_4 ionInt[258], i, j, bMassMax, bMassMin, k; /* Since I'm comparing the mass difference of two adjacent ion m/z's to the mass of CO, I am guessing that the error will always be fairly tight; at least better than the standard m/z fragment error of 0.5 Da. */ if(gParam.fragmentErr > 0.25) { error = 0.25; } else { error = gParam.fragmentErr; } /* Putting the ions into an array makes it easier to look for ions differing by 28 Da.*/ ionNum = 0; currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ > (115.0 - gParam.fragmentErr) && currPtr->mOverZ < (372.2 + gParam.fragmentErr)) { ionMass[ionNum] = currPtr->mOverZ; ionInt[ionNum] = currPtr->intensity; ionNum++; if(ionNum >= 258) { printf("FindTagB2Ions: ionNum >= 258\n"); exit(1); } } currPtr = currPtr->next; } for(i = 1; i < ionNum; i++) { j = i-1; while(j > 0) { massDiff = ionMass[i] - ionMass[j]; if(massDiff >= 28 - error && massDiff <= 28 + error) { if(ionInt[j] < ionInt[i] * 0.1 || ionInt[j] > ionInt[i] * 3) { break; /*if the a/b pair intensity doesn't look right, then toss it out.*/ } *totalIntensity += ionInt[i]; /*Boost the total ion intensity (dereference the pointer; do not advance it).*/ bMass = ionMass[i]; /*Find highest mass node.*/ bMass = bMass + gParam.fragmentErr; bMassMax = bMass; /*Truncate w/o rounding up.*/ tagNode[bMassMax] = 1; tagNodeIntensity[bMassMax] += ionInt[i]; /*Next I'll find the lowest mass node.*/ bMass = bMass - gParam.fragmentErr - gParam.fragmentErr + 0.5; bMassMin = bMass; /*Truncate after rounding up.*/ if(bMassMin != bMassMax) { tagNode[bMassMin] = 1; tagNodeIntensity[bMassMin] += ionInt[i]; } /*Fill in the 
space.*/ if((bMassMax - bMassMin) > 1) { for(k = (bMassMin + 1); k < bMassMax; k++) { tagNode[k] = 1; tagNodeIntensity[k] += ionInt[i]; } } } j--; } } return; } /**********************FreeTagStructs**************************************** * * Used for freeing memory in a linked list. Bob DuBose tells me it's best to free * space in the reverse order that the space was malloc'ed. This routine walks the * list from the first struct to the last, freeing each one in turn. * */ void FreeTagStructs(struct Sequence *currPtr) { struct Sequence *freeMePtr; while(currPtr != NULL) { freeMePtr = currPtr; currPtr = currPtr->next; free(freeMePtr); } return; } /*****************************MaskSequenceNodeWithTags*************************************** * * MaskSequenceNodeWithTags makes a composite tagNode from the tags in TagSeqList. The * tagNode array is zeroed out and reassigned values of one for positions corresponding * to the sequences in the tags. Regions outside of the tag region are also assigned values * of one. The composite tagNode is then multiplied against the array sequenceNode, so that * only sequences identified as possible tags are allowed when generating subsequences. * */ void MaskSequenceNodeWithTags(SCHAR *sequenceNode, char *tagNode) { INT_4 i, maxTagRegion; INT_4 firstNode, currentNode; INT_4 charge = gParam.chargeState - 1; INT_4 stop, firstZeroNode; REAL_4 precursor; tsequence *currentTag; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; maxTagRegion = gParam.peptideMW - (precursor * charge) - (charge - 1) * gElementMass_x100[HYDROGEN] /*if precursor was a y ion*/ + 2 * gElementMass_x100[HYDROGEN]; /*This is the equation for converting y ion to b ion.*/ if(gParam.maxent3) { charge = 1; /*all maxent3 processed fragment ions are +1*/ } /* Initialize tagNode to zero's.*/ for(i = 0; i < gGraphLength; i++) { tagNode[i] = 0; } /* Find the N-terminal node. 
This will equal the nominal mass of the N-terminal group R-NH-. The value is rounded in the same way as in TagNodeInit and AlternateNterminalTags, so that the tag positions line up with the nodes used to build the tags.*/ firstNode = gParam.modifiedNTerm + 0.5; tagNode[firstNode] = 1; /*Give the N-terminal node a value of one.*/ /* For each tag, assign the nodes a value of one.*/ currentTag = TagSeqList->seq; while(currentTag < TagSeqList->seq + TagSeqList->numObjects) { currentNode = firstNode; for(i = 0; i < currentTag->peptideLength; i++) { currentNode = currentNode + currentTag->peptide[i]; if(currentNode >= gGraphLength) { printf("MaskSequenceNodeWithTags: currentNode >= gGraphLength\n"); exit(1); } tagNode[currentNode] = 1; } currentTag++; } /* Find the highest index value in tagNode that is non-zero, add 57 (for Gly) and from that node on up reassign each node to a value of one. */ i = gGraphLength - 1; while(tagNode[i] == 0) { i--; } i = i + gMonoMass_x100[G]; /* If the highest index value plus 57 is less than the maxTagRegion (based on assuming only y ions and that no y ions are used that are less than the precursor), then maxTagRegion is dropped to the highest index plus 57. */ if(maxTagRegion > i) { maxTagRegion = i; } for(i = maxTagRegion; i < gGraphLength; i++) { tagNode[i] = 1; } /* If the N-terminal position contains more than two aa's, then give a value of one to these N-terminal positions in the array.*/ currentTag = TagSeqList->seq; currentNode = firstNode; /*firstNode is the N-terminal group*/ while(currentTag < TagSeqList->seq + TagSeqList->numObjects) { if(currentTag->peptide[0] > currentNode) { currentNode = currentTag->peptide[0] + firstNode; } currentTag++; } if(currentNode >= gGraphLength) { printf("MaskSequenceNodeWithTags: currentNode >= gGraphLength\n"); exit(1); } for(i = 0; i <= currentNode; i++) { tagNode[i] = 1; } firstZeroNode = currentNode + 1; /* For tags that contain K, N, Q, R, or W, the program inserts a value of one at those positions corresponding to two amino acid combinations. 
This allows data in regions outside of the tag region to contribute to the sequencing if there is a gap in the tag region. */ currentTag = TagSeqList->seq; while(currentTag < TagSeqList->seq + TagSeqList->numObjects) { currentNode = firstNode; for(i = 0; i < currentTag->peptideLength; i++) { currentNode += currentTag->peptide[i]; if(currentNode >= gGraphLength) { printf("MaskSequenceNodeWithTags: currentNode >= gGraphLength\n"); exit(1); } if(currentTag->peptide[i] == gGapList[N]) /*Asn*/ { if(gGapList[G] != 0) { tagNode[currentNode - gGapList[G]] = 1; } } if(currentTag->peptide[i] == gGapList[K] || currentTag->peptide[i] == gGapList[Q]) /*Lys or Gln*/ { if(gGapList[A] != 0) { tagNode[currentNode - gGapList[A]] = 1; } if(gGapList[G] != 0) { tagNode[currentNode - gGapList[G]] = 1; } } if(currentTag->peptide[i] == gGapList[R]) /*Arg*/ { if(gGapList[G] != 0) { tagNode[currentNode - gGapList[G]] = 1; } if(gGapList[V] != 0) { tagNode[currentNode - gGapList[V]] = 1; } } if(currentTag->peptide[i] == gNomMass[W]) /*Trp*/ { if(gGapList[A] != 0) { tagNode[currentNode - gGapList[A]] = 1; } if(gGapList[D] != 0) { tagNode[currentNode - gGapList[D]] = 1; } if(gGapList[E] != 0) { tagNode[currentNode - gGapList[E]] = 1; } if(gGapList[G] != 0) { tagNode[currentNode - gGapList[G]] = 1; } if(gGapList[S] != 0) { tagNode[currentNode - gGapList[S]] = 1; } if(gGapList[V] != 0) { tagNode[currentNode - gGapList[V]] = 1; } } } currentTag++; } /* Assign value of one to gParam.fragmentErr region around each node.*/ i = firstZeroNode; while(i < maxTagRegion && i < gGraphLength) { if(i != firstNode && tagNode[i] != 0) { i = i - gParam.fragmentErr; stop = i + gParam.fragmentErr * 2; while(i <= stop && i < gGraphLength) { tagNode[i] = 1; i++; } } i++; } for(i = firstZeroNode; i <= firstZeroNode + gParam.fragmentErr; i++) { tagNode[i] = 1; } /* * The composite tagNode is multiplied against the array sequenceNode, so that * only sequences identified as possible tags are allowed. 
*/ for(i = 0; i < gGraphLength; i++) { sequenceNode[i] = tagNode[i] * sequenceNode[i]; } return; } /*************************************TagExtensions******************************************** * * This function adds single amino acid extensions onto the subsequences found in the linked * list starting w/ subsequencePtr. As a result, a new linked list starting with * newSubsequencePtr is generated. When all of the extensions have been made and the list * starting with newSubsequencePtr is complete, then the list starting with subsequencePtr * is free'ed, and newSubsequencePtr is returned. * The scores for the subsequences are derived from the array * tagNodeIntensity, and are a summation of ion intensities. The * array gGapList and gGapListIndex describe the possible extensions that are allowed. * The * finalSeqNum is the upper limit on the number of completed sequences that will be stored * in the list of Sequence structs in TagSeqList. The topSeqNum * is the maximum number of subsequences that are in the linked list of sequences (both * the list starting with subsequencePtr and newSubsequencePtr). 
*/ INT_4 TagExtensions(char *tagNode, INT_4 topSeqNum, INT_4 *tagNodeIntensity, INT_4 totalIonIntensity) { tsequence *currPtr; tsequence subsequenceToAdd; char test; INT_4 i, j, testValue; INT_4 extNum; INT_4 score; INT_4 plusPro[AMINO_ACID_NUMBER]; INT_4 flagToStop = 0; /*if any subsequence can be extended, then this becomes 1*/ INT_4 *peptide, peptideLength, nodeValue, gapNum; REAL_4 finalScore; REAL_4 scoreAdjuster; tsequenceList *NewTagSubseqList; struct extension clearExtension; struct extension *extensionList; clearExtension.gapSize = 0; clearExtension.mass = 0; clearExtension.singleAAFLAG = 0; clearExtension.score = 0; extensionList = (extension *) calloc(MAX_GAPLIST, sizeof(struct extension)); if(extensionList == NULL) { printf("TagExtensions: Out of memory."); exit(1); } NewTagSubseqList = (tsequenceList *) CreateNewList( sizeof(tsequence), 50, 10 ); if (!NewTagSubseqList) { free(extensionList); return(flagToStop); } peptide = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4 )); if(peptide == NULL) { printf("TagExtensions: Out of memory."); exit(1); } /* Calculate values for two aa extensions that contain proline.*/ for(i = 0; i < gAminoAcidNumber; i++) { if(gGapList[i] == 0) { plusPro[i] = 0; } else { plusPro[i] = gGapList[P] + gGapList[i]; } } currPtr = TagSubseqList->seq; while(currPtr < TagSubseqList->seq + TagSubseqList->numObjects) { /*Clear the extension list*/ for(i = 0; i < MAX_GAPLIST; i++) { extensionList[i] = clearExtension; } extNum = 0; for(i = 0; i < gAminoAcidNumber; i++) /*Find the one amino acid extensions.*/ { if(gGapList[i] != 0) { testValue = currPtr->nodeValue + gGapList[i]; if(tagNode[testValue] != 0) { extensionList[extNum].mass = gGapList[i]; extensionList[extNum].gapSize = 0; extensionList[extNum].score = tagNodeIntensity[testValue]; extensionList[extNum].singleAAFLAG = 1; extNum++; if(extNum >= MAX_GAPLIST) { printf("TagExtensions: extNum >= MAX_GAPLIST\n"); exit(1); } } } } /* If no extensions were found, then look for two aa 
extensions that contain proline.*/ if(extNum == 0) { for(i = 0; i < gAminoAcidNumber; i++) /*Find the one amino acid extensions.*/ { if(plusPro[i] != 0) { testValue = currPtr->nodeValue + plusPro[i]; if(tagNode[testValue] != 0) { extensionList[extNum].mass = plusPro[i]; extensionList[extNum].gapSize = 0; extensionList[extNum].score = tagNodeIntensity[testValue]; extensionList[extNum].singleAAFLAG = 1; extNum++; if(extNum >= MAX_GAPLIST) { printf("TagExtensions: extNum >= MAX_GAPLIST\n"); exit(1); } } } } } /* * Now I need to find the best extensions, ie, the top maxExtNum of them and only if these * extensions are greater than the product of the highest score and extThresh. * bestExtensions[MAX_GAPLIST], bestExtScore[MAX_GAPLIST], bestExtNum; */ if(extNum > 0) { extensionList = SortExtension(extensionList); flagToStop = 1; /*keep extending the subsequences*/ /* Store this information in the linked list of Sequence structs. The values placed in the * arrays extensionList are used to determine * values for the variables peptide[], score, peptideLength, gapNum, and nodeValue. 
These * variables are passed to a few functions that are used to set up the linked list of structs * of type Sequence, which contain the next set of subsequences (newSubsequencePtr*/ for(j = 0; j < currPtr->peptideLength; j++) /*Set up the peptide field so that it includes the previous values contained in the peptide field of currPtr (the subsequence currently under investigation.*/ { if(j >= MAX_PEPTIDE_LENGTH) { printf("TagExtensions: j >= MAX_PEPTIDE_LENGTH\n"); exit(1); } peptide[j] = currPtr->peptide[j]; } i = 0; while(extensionList[i].mass > 0 && i < MAX_GAPLIST) { test = TRUE; /*Becomes FALSE if the data is stored as a final sequence.*/ score = currPtr->score + extensionList[i].score; peptideLength = currPtr->peptideLength + 1; peptide[peptideLength - 1] = extensionList[i].mass; gapNum = currPtr->gapNum + extensionList[i].gapSize; nodeValue = currPtr->nodeValue + extensionList[i].mass; if(peptideLength >= MAX_PEPTIDE_LENGTH) { printf("TagExtensions: peptideLength >= MAX_PEPTIDE_LENGTH\n"); exit(1); } /*If there are not too many subsequences stored, then do this.*/ if(NewTagSubseqList->numObjects < topSeqNum) { subsequenceToAdd.score = score; for(j = 0; j < peptideLength; j++) { subsequenceToAdd.peptide[j] = peptide[j]; } subsequenceToAdd.peptideLength = peptideLength; subsequenceToAdd.gapNum = gapNum; subsequenceToAdd.nodeValue = nodeValue; if(!AddToList(&subsequenceToAdd, NewTagSubseqList)) { DisposeList(NewTagSubseqList); free(extensionList); free(peptide); return(flagToStop); } } else /*If I have the max allowed number of subsequence, then do this.*/ { qsort(NewTagSubseqList->seq, (size_t)NewTagSubseqList->numObjects, (size_t)sizeof(tsequence),SequenceScoreDescendSortFunc); if(score > NewTagSubseqList->seq[NewTagSubseqList->numObjects - 1].score) { /* Replace the worst scorer with this sequence. 
*/ for(j = 0; j < peptideLength; j++) { NewTagSubseqList->seq[NewTagSubseqList->numObjects -1].peptide[j] = peptide[j]; } NewTagSubseqList->seq[NewTagSubseqList->numObjects -1].peptideLength = peptideLength; NewTagSubseqList->seq[NewTagSubseqList->numObjects -1].score = score; NewTagSubseqList->seq[NewTagSubseqList->numObjects -1].nodeValue = nodeValue; NewTagSubseqList->seq[NewTagSubseqList->numObjects -1].gapNum = gapNum; } } i++; } } if(extNum == 0 && currPtr->peptideLength > 1) { if(totalIonIntensity == 0) { printf("TagExtensions: totalIonIntensity == 0\n"); exit(1); } finalScore = (REAL_4)currPtr->score / (REAL_4)totalIonIntensity; finalScore = finalScore * 100; if((finalScore >= TAG_CUTOFF && (gParam.fragmentPattern == 'Q' || gParam.fragmentPattern == 'T')) || (finalScore >= (TAG_CUTOFF * 0.5) && gParam.fragmentPattern == 'L')) { for(j = 0; j < currPtr->peptideLength; j++) { peptide[j] = currPtr->peptide[j]; } score = finalScore; peptideLength = currPtr->peptideLength; gapNum = currPtr->gapNum; nodeValue = currPtr->nodeValue; /*Here's the expected average peptide length.*/ scoreAdjuster = nodeValue; scoreAdjuster = scoreAdjuster / gAvResidueMass; /*Here's the ratio between the average expected and actual length.*/ if(peptideLength == 0) { printf("TagExtensions: peptideLength == 0\n"); exit(1); } scoreAdjuster = scoreAdjuster / peptideLength; if(TagSeqList->numObjects < gParam.finalSeqNum) { tsequence TagToAdd; for(j = 0; j < peptideLength; j++) { TagToAdd.peptide[j] = peptide[j]; } TagToAdd.peptideLength = peptideLength; TagToAdd.score = score * scoreAdjuster; TagToAdd.nodeValue = nodeValue; TagToAdd.gapNum = gapNum; if(!AddToList(&TagToAdd, TagSeqList)) { DisposeList(NewTagSubseqList); free(extensionList); free(peptide); return(flagToStop); } } else { qsort(TagSeqList->seq, (size_t)TagSeqList->numObjects, (size_t)sizeof(tsequence),SequenceScoreDescendSortFunc); if(score > TagSeqList->seq[TagSeqList->numObjects - 1].score) { /* Replace the worst scorer with 
this sequence. */ for(j = 0; j < peptideLength; j++) { TagSeqList->seq[TagSeqList->numObjects -1].peptide[j] = peptide[j]; } TagSeqList->seq[TagSeqList->numObjects -1].peptideLength = peptideLength; TagSeqList->seq[TagSeqList->numObjects -1].score = score; TagSeqList->seq[TagSeqList->numObjects -1].nodeValue = nodeValue; TagSeqList->seq[TagSeqList->numObjects -1].gapNum = gapNum; } } } } currPtr++; /*Go to the subsequence.*/ } DisposeList(TagSubseqList); /* Throw away the old list of subsequences */ TagSubseqList = NewTagSubseqList; /* Make the new subseqs the list of subsequences */ free(extensionList); free(peptide); return(flagToStop); } /********************************* FindTagYIons*************************************************** * * This function assumes that the CID ions are all of type y. The nominal mass values are * determined and the corresponding positions in the array sequenceNodeC are assigned the * additional value of gWeightedIonValues.y. */ void FindTagYIons(char *tagNode, INT_4 charge, INT_4 *tagNodeIntensity) { tMSData *tagMassPtr; INT_4 yMassMin, yMassMax, j; REAL_4 yMass, peptideMass; REAL_8 aToMFactor; tagMassPtr = TagMassList->mass; while(tagMassPtr < TagMassList->mass + TagMassList->numObjects) { yMass = tagMassPtr->mOverZ; yMass = (yMass * charge) - ((charge - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ if(yMass >= gParam.monoToAv) /*Fully apply the average to mono mass factor*/ { aToMFactor = 0; } else { if(yMass > (gParam.monoToAv - gAvMonoTransition)) /*In the transition region*/ { aToMFactor = (gParam.monoToAv - yMass) / gAvMonoTransition; } else /*Don't apply this factor*/ { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(yMass > (gParam.monoToAv - gAvMonoTransition)) /*Convert the y ion to monoisotopic.*/ { yMass = yMass * aToMFactor; } peptideMass = gParam.peptideMW; if(peptideMass >= gParam.monoToAv) /*Fully apply the average to mono mass factor*/ { aToMFactor = 0; } else { 
if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) /*In the transition region*/ { aToMFactor = (gParam.monoToAv - peptideMass) / gAvMonoTransition; } else /*Don't apply this factor*/ { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) /*Convert the y ion to monoisotopic.*/ { peptideMass = peptideMass * aToMFactor; } /*Peptide mass and ymass are converted to monoisotopic before converting to the b ion value*/ yMass = peptideMass - yMass + (2 * gElementMass_x100[HYDROGEN]); /*Convert to b ion.*/ /*The two extremes for the possible b ion nodes are identified first.*/ /*First I'll find the highest mass node, and convert to the b ion mass.*/ yMass = yMass + gParam.fragmentErr; yMassMax = yMass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ yMass = yMass - gParam.fragmentErr - gParam.fragmentErr + 0.5; yMassMin = yMass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(yMassMax >= gGraphLength) { printf("FindTagYIons: yMassMax >= gGraphLength\n"); exit(1); } for(j = yMassMin; j <= yMassMax; j++) { tagNode[j] = 1; tagNodeIntensity[j] = tagMassPtr->intensity; } tagMassPtr++; } return; } /*******************************TagNodeInit************************************************ * * This function initializes the array tagNode by first assigning the value of zero * to each element in the array (from 0 -> 3999). Next an N-terminal value is assigned to * position number 1, and then the possible C-terminal nodes are found and given an N-terminal * value. There may be more than one C-terminal node, depending on the mass and the error. 
*/ void TagNodeInit(char *tagNode, INT_4 *tagNodeIntensity) { INT_4 i, firstNode; REAL_4 lastNode; REAL_8 aToMFactor; INT_4 lastNodeHigh, lastNodeLow; for(i = 0; i < gGraphLength; i++) /*Initialize tagNode to zero's.*/ { tagNode[i] = 0; tagNodeIntensity[i] = 0; } /* Find the N-terminal node. This will equal the nominal mass of the N-terminal group R-NH-.*/ firstNode = gParam.modifiedNTerm + 0.5; tagNode[firstNode] = 1; /*Give the N-terminal node a value of one.*/ /*Figure out what the C-terminal node(s) are.*/ lastNode = gParam.peptideMW - gParam.modifiedCTerm; /*Alter the values so that they are closer to the expected nominal masses.*/ if(lastNode > gParam.monoToAv) { aToMFactor = 0; } else { if(lastNode > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - lastNode) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(lastNode > gParam.monoToAv - gAvMonoTransition) { lastNode = lastNode * aToMFactor; /*Convert to monoisotopic*/ } /*The two extremes for the possible C-terminal nodes are identified first.*/ /*First I'll find the highest mass node.*/ lastNode = lastNode + gParam.peptideErr; lastNodeHigh = lastNode; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ lastNode = lastNode - gParam.peptideErr - gParam.peptideErr + 0.5; lastNodeLow = lastNode; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(lastNodeHigh >= gGraphLength) { printf("TagNodeInit: lastNodeHigh >= gGraphLength\n"); exit(1); } for(i = lastNodeLow; i <= lastNodeHigh; i++) { tagNode[i] = 1; } return; } /*********************************GetAutoTag****************************************** * * * */ void GetAutoTag(struct MSData *firstMassPtr, SCHAR *sequenceNode) { struct MSData *currPtr; struct Sequence *firstTagPtr; INT_4 threshold, charge, *tagNodeIntensity, totalIntensity; INT_4 *peptide, count; char *tagNode; REAL_4 precursor, maxMOverZ; 
TagMassList = (tMSDataList *) CreateNewList( sizeof(tMSData), 10, 10 ); if (!TagMassList) return; TagSubseqList = (tsequenceList *) CreateNewList( sizeof(tsequence), 50, 10 ); if (!TagSubseqList) { DisposeList(TagMassList); return; } TagSeqList = (tsequenceList *) CreateNewList( sizeof(tsequence), 10, 10 ); if (!TagSeqList) { DisposeList(TagMassList); DisposeList(TagSubseqList); return; } peptide = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4)); if(peptide == NULL) { printf("GetAutoTag: Out of memory."); exit(1); } tagNode = (char *) malloc(gGraphLength * sizeof(char)); /*set aside some space for the tagNode*/ if(tagNode == NULL) { printf("GetAutoTag: Out of memory."); exit(1); } tagNodeIntensity = (int *) malloc(gGraphLength * sizeof(INT_4)); if(tagNodeIntensity == NULL) { printf("GetAutoTag: Out of memory."); exit(1); } gTagLength = 0; /*initialize; used for spectrum quality assessment*/ charge = gParam.chargeState - 1; /*the only charge considered for the y fragment ions*/ if(gParam.maxent3) { charge = 1; /*all maxent3 processed data is +1*/ } firstTagPtr = NULL; totalIntensity = 0; if(charge == 0) { free(tagNode); free(tagNodeIntensity); free(peptide); DisposeList(TagMassList); DisposeList(TagSeqList); return; /* Don't try to find tags for singly charged precursors; this check comes before the division by charge below */ } precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; maxMOverZ = (((precursor * gParam.chargeState) - gElementMass_x100[HYDROGEN]) / charge) - gGapList[G] + gParam.fragmentErr; /* Find ions that are greater than the precursor but less than the maximum m/z value for the given value of charge. The number 'charge' is the only charge state for the fragment ions that is considered here; for +2 precursors I look only for singly charged y ions and for +3 precursors I look for +2 y fragment ions. Ions also must be of intensity greater than 20% of the average ion intensity in this region of the spectrum. 
*/ currPtr = firstMassPtr; /*find an intensity threshold*/ count = 0; threshold = 0; while(currPtr != NULL) { if(currPtr->mOverZ > (precursor + (gParam.fragmentErr * 4)) && currPtr->mOverZ < maxMOverZ) { threshold = threshold + (0.20 * currPtr->intensity); count++; } currPtr = currPtr->next; } if(count == 0) { free(tagNode); free(tagNodeIntensity); free(peptide); DisposeList(TagMassList); DisposeList(TagSeqList); return; } threshold = threshold / count; /*threshold is 20% of the average*/ /*currPtr = firstMassPtr;*/ /*find 0.1 x most intense ion*/ /*threshold = 0; while(currPtr != NULL) { if(currPtr->intensity > threshold && currPtr->mOverZ > (precursor + (gParam.peakWidth * 4)) && currPtr->mOverZ < maxMOverZ) { threshold = currPtr->intensity; } currPtr = currPtr->next; } threshold = threshold * 0.1;*/ /*Set intensity threshold.*/ /* Find the potential tag ions */ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ > (precursor + (4 * gParam.fragmentErr)) && currPtr->mOverZ < maxMOverZ && currPtr != NULL) { if(currPtr->intensity > threshold) { tMSData massToAdd; massToAdd.mOverZ = currPtr->mOverZ; massToAdd.intensity = currPtr->intensity; massToAdd.normIntensity = currPtr->intensity; if(!AddToList(&massToAdd, TagMassList)) { free(tagNode); free(tagNodeIntensity); free(peptide); DisposeList(TagMassList); DisposeList(TagSeqList); return; } totalIntensity = totalIntensity + currPtr->intensity; /*set to zero at the top*/ } } currPtr = currPtr->next; } /* * Filter through the tag masses, so that no ions are closer together than 57/charge. * This should not be used for LCQ data, since high mass b ions are mingled with the * high mass y ions. For LCQ data, I remove neutral loss ions. 
*/ if(gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q') { FilterTagMasses(charge); } if(gParam.fragmentPattern == 'L') { RemoveNeutralLosses(charge); } /* * Initialize the array tagNode so that all values * are zero, except for the N and C terminal nodes, which are assigned a value of one. */ TagNodeInit(tagNode, tagNodeIntensity); /* * Assume all ions in TagMassList are y ions of charge 'charge', and convert these to positions * in the tagNode array, where the indexing corresponds to nominal b ion masses. */ FindTagYIons(tagNode, charge, tagNodeIntensity); /* * To help reduce the number of sequence tags produced in LCQ data, the b ion nodes are * crossed with the y ion nodes. That is, only ions that have a corresponding b and y ions * are utilized. */ /*if(gParam.fragmentPattern == 'L') { FindTagBIons(tagNode, charge, tagNodeIntensity); }*/ /* * Now that the node scores have been finalized (in the array tagNode), * it is time to start building up subsequences from the * N-terminus. I connect the nodes that are spaced one or two amino acid residues apart. * The function * TagMaker returns a pointer to a struct of type Sequence (firstTagPtr, * which is the first element in * a linked list of completed sequences plus the associated score. If there were * no completed sequences, then the function returns a NULL value. */ /*firstTagPtr = TagMaker(tagNode, tagNodeIntensity, totalIntensity);*/ /* If no tags are found, try sticking in the b2 ion and checking again. 
*/ /*if(firstTagPtr == NULL) { FindTagB2Ions(firstMassPtr, &totalIntensity, tagNode, tagNodeIntensity); firstTagPtr = TagMaker(tagNode, tagNodeIntensity, tagMassPtr); }*/ /* If still no tags are found, then try searching tagNode from the top down.*/ if(TagSeqList->numObjects == 0) { AlternateTagMaker(tagNode, tagNodeIntensity, totalIntensity); } /* * If there are any tags found, then these are used to make a composite tagNode, where * tagNode array is zeroed out, and reassigned values of one for positions corresponding * to the sequences in the tags. Regions outside of the tag region are assigned values of * one. The composite tagNode is multiplied against the array sequenceNode, so that * only sequences identified as possible tags are allowed when generating subsequences. */ if(TagSeqList->numObjects) { MaskSequenceNodeWithTags(sequenceNode, tagNode); } /* Announce to the world that I'm done w/ the auto-tag.*/ if(gParam.fMonitor && gCorrectMass) { if(TagSeqList->numObjects > 1) { printf("Auto-tag found %ld tags:\n", TagSeqList->numObjects); } else { if(TagSeqList->numObjects == 1) { printf("Auto-tag found one tag:\n"); } else { printf("Auto-tag found no tags:\n"); } } if(TagSeqList->numObjects > 0) { /* Print out the tags */ tsequence *currentTag; currentTag = TagSeqList->seq; while(currentTag < TagSeqList->seq + TagSeqList->numObjects) { char *peptideString; if(currentTag->peptideLength > gTagLength) { gTagLength = currentTag->peptideLength; } peptideString = ComposePeptideString(currentTag->peptide, currentTag->peptideLength); if(peptideString) { printf("%s\n", peptideString); free(peptideString); } currentTag++; } } } /* Free memory allocations specific to this function.*/ free(tagNode); free(tagNodeIntensity); free(peptide); DisposeList(TagMassList); DisposeList(TagSeqList); return; } /* ------------------------------------------------------------------------------------- // SEQUENCE SCORE DESCEND SORT FUNC -- Sort the sequences by their score (High to Low). 
*/
int SequenceScoreDescendSortFunc(const void *n1, const void *n2)
{
	tsequence *n3, *n4;

	n3 = (tsequence *)n1;
	n4 = (tsequence *)n2;

	/* Descending order; equal scores compare as equal so the sort sees a consistent ordering. */
	if(n3->score < n4->score)
	{
		return 1;
	}
	if(n3->score > n4->score)
	{
		return -1;
	}
	return 0;
}

/* -------------------------------------------------------------------------------------
 * COMPOSE PEPTIDE STRING -- Convert the peptide (stored as masses) into a string,
 *                           replacing single amino acids with their one letter equivalent
 *                           and masses of ambiguous multiple amino acids with the mass
 *                           in brackets.
 *                           ex: peptide = 201, 147, 97 => string = "[201]FP"
 */
char *ComposePeptideString(INT_4 *peptide, INT_4 peptideLength)
{
	INT_4 i, j;
	REAL_4 error = 0.4 * gMultiplier;
	char test;
	char *string;
	char *p;

	string = (char *) malloc(50);
	if(!string)
	{
		return NULL;
	}
	p = string;

	for(i = 0; i < peptideLength; i++)
	{
		test = TRUE;
		for(j = 0; j < gAminoAcidNumber; j++)
		{
			if(peptide[i] <= gGapList[j] + error && peptide[i] >= gGapList[j] - error)
			{
				p+= sprintf(p, "%c", gSingAA[j]);
				test = FALSE;
				break;
			}
		}
		if(test)	/* More than a single AA */
		{
			if(peptide[i] < 1000)
			{
				p+= sprintf(p, "[%3d]", peptide[i]);
			}
			else
			{
				p+= sprintf(p, "[%3d]", peptide[i] / gMultiplier);
			}
		}
	}
	*p = '\0';	/* NUL-terminate the string */

	return string;
}

lutefisk-1.0.7+dfsg.orig/src/LutefiskProbScorer.c
/*********************************************************************************************
 * Copyright © 1995-1999
 * Richard S. Johnson
 * Immunex Corp.
 * Seattle, WA
 *
 * First version: 10/95
 *********************************************************************************************/

/* ANSI Headers */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Lutefisk Headers */
#include "LutefiskPrototypes.h"
#include "LutefiskDefinitions.h"

/* Globals for this file*/
REAL_4 bIonProb = 0.7;	/*b ion probability .6 */
REAL_4 bMinWaterProb = 0.3;	/*b-18 probability .3 */
REAL_4 bMinAmmoniaProb = 0.15;	/*b-17 probability .15 */
REAL_4 bDoublyProbMultiplier = 0.5;	/*Multiply this for more than one charge .5 */
REAL_4 aIonProb = 0.1;	/*a ion probability .1 */
REAL_4 yIonProb = 0.8;	/*y ion probability .8 */
REAL_4 yMinWaterProb = 0.1;	/*y-18 probability .1 */
REAL_4 yMinAmmoniaProb = 0.1;	/*y-17 probability .1 */
REAL_4 yDoublyProbMultiplier = 0.5;	/*Multiply this for more than one charge .5 */
REAL_4 immoniumProb = 0.2;	/*Immonium ion probability .2 */
REAL_4 internalProb = 0.1;	/*Internal ion probability .1 */
REAL_4 internalProProb = 0.2;	/*Internal ions with N-terminal proline probability .2 */

/********************************SequenceScorer*****************************************************
 *
 * Assign probability scores to sequences.
 *
 */
REAL_4 LutefiskProbScorer(INT_4 *sequence, INT_4 seqLength, REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ)
{
	INT_4 i, j;
	REAL_4 *randomProb;
	REAL_4 probScore = 0;

	/* Make some space*/
	randomProb = malloc(MAX_ION_NUM * sizeof(REAL_4));
	if(randomProb == NULL)
	{
		printf("SequenceScorer: Out of memory");
		exit(1);
	}

	/*Initialize*/
	for(i = 0; i < MAX_ION_NUM; i++)
	{
		randomProb[i] = 0;
	}

	/* Calculate random probability for each ion*/
	CalcRandomProb(randomProb, fragMOverZ, fragNum);

	/* Score the sequences*/

	/*initialize ionFound and sequenceProb to all zero, and fill in the precursor related ions*/
//	InitIonFound(ionFound, fragMOverZ, fragNum, randomProb);

	/*Get initial probability based on terminal group (Lys and Arg are good; others are not)*/
//	probScore = InitProbScore(sequence, seqLength);

//	probScore = FindBIons(ionFound, fragMOverZ, fragNum, probScore, randomProb, sequence, seqLength);

	/*Find the y ions*/
//	probScore = FindYIons(ionFound, fragMOverZ, fragNum, probScore, randomProb, sequence, seqLength);

	/*Find the internal fragment ions*/
//	probScore = FindInternalIons(ionFound, fragMOverZ, fragNum, probScore, randomProb, sequence, seqLength);

	/*Find the immonium ions*/
/*	probScore = FindImmoniumIons(ionFound, fragMOverZ, fragNum, probScore, randomProb, sequence, seqLength);*/

	/*Change probability score to log base 10 scale*/
	if(probScore > 1)
	{
		probScore = log10(probScore);
	}
	else	/*keep things positive by only logging things over a value of 1*/
	{
		probScore = 0;
	}

	/* Free array*/
	free(randomProb);

	return(probScore);
}

/***********************************FindImmoniumIons******************************************
 *
 * Finds and scores the amino acid immonium ions.
 */
REAL_4 FindImmoniumIons(REAL_4 *ionFound, INT_4 *mass, INT_4 ionCount, REAL_4 score, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength)
{
	REAL_4 lowMassIons[AMINO_ACID_NUMBER][3] = {
		/* A */	44.0500, 0, 0,
		/* R */	70.0657, 87.0922, 112.0875,
		/* N */	87.0558, 0, 0,
		/* D */	88.0399, 0, 0,
		/* C */	0, 0, 0,
		/* E */	102.0555, 0, 0,
		/* Q */	84.0450, 101.0715, 129.0664,
		/* G */	0, 0, 0,
		/* H */	110.0718, 0, 0,
		/* I */	86.0970, 120.0483, 0,	/*I position also represents oxidized Met for qtof*/
		/* L */	86.0970, 0, 0,
		/* K */	84.0814, 101.1079, 129.1028,
		/* M */	104.0534, 0, 0,
		/* F */	120.0813, 0, 0,
		/* P */	70.0657, 0, 0,
		/* S */	60.0449, 0, 0,
		/* T */	74.0606, 0, 0,
		/* W */	159.0922, 0, 0,
		/* Y */	136.0762, 0, 0,
		/* V */	72.0813, 0, 0,
		0,0,0,
		0,0,0,
		0,0,0,
		0,0,0,
		0,0,0
	};
	INT_4 i, j, k, amAcidIndex, immoniumIndex;
	REAL_4 individualProb;

	for(i = 0; i < seqLength; i++)
	{
		amAcidIndex = -1;
		for(j = 0; j < gAminoAcidNumber; j++)
		{
			if(sequence[i] <= gMonoMass[j] + gParam.fragmentErr && sequence[i] >= gMonoMass[j] - gParam.fragmentErr)
			{
				amAcidIndex = j;
				break;
			}
		}
		if(amAcidIndex > -1)
		{
			for(j = 0; j < 3; j++)	/*lowMassIons has three columns per amino acid*/
			{
				if(lowMassIons[amAcidIndex][j] > 0 && lowMassIons[amAcidIndex][j] > mass[0])
				{
					immoniumIndex = 0;
					for(k = 0; k < ionCount; k++)
					{
						if(mass[k] > 160)
						{
							break;
						}
						if(mass[k] < lowMassIons[amAcidIndex][j] + gParam.fragmentErr && mass[k] > lowMassIons[amAcidIndex][j] - gParam.fragmentErr)
						{
							ionFound[k] = 1;
							immoniumIndex = k;
						}
					}
					if(immoniumIndex > 0)	/*something was found*/
					{
						individualProb = immoniumProb / randomProb[immoniumIndex];
						score *= individualProb;
					}
					else
					{
						individualProb = (1 - immoniumProb) / (1 - randomProb[immoniumIndex]);
						score *= individualProb;
					}
				}
			}
		}
	}

	return(score);
}

/***********************************FindInternalIons******************************************
 *
 * Finds and scores the internal fragment ions.
*/ REAL_4 FindInternalIons(REAL_4 *ionFound, INT_4 *mass, INT_4 ionCount, REAL_4 score, REAL_4 *randomProb, INT_4 *ionRank, INT_4 *sequence, INT_4 seqLength) { INT_4 i, j, k, residueCount; REAL_4 testMass, testFound, individualProb; REAL_4 precursor = (gParam.peptideMW + gParam.chargeState * gElementMass[HYDROGEN]) / gParam.chargeState; BOOLEAN nTermPro, intFragTest; if(seqLength < 4) { return(score); /*need at least four residues for an internal fragment*/ } for(i = 1; i < seqLength - 2; i++) { testMass = sequence[i] + gElementMass[HYDROGEN]; residueCount = 1; if(sequence[i] > gMonoMass[P] - gParam.fragmentErr && sequence[i] < gMonoMass[P] + gParam.fragmentErr) { nTermPro = TRUE; /*The N-terminus of this fragment is proline*/ } else { nTermPro = FALSE; } for(j = i + 1; j < seqLength - 1; j++) { testMass += sequence[j]; residueCount++; if(testMass < precursor - gParam.fragmentErr && residueCount < 6 && testMass > mass[0]) /*dont bother w/ high mass internal frags*/ { intFragTest = FALSE; for(k = 0; k < ionCount; k++) { if(mass[k] > testMass + gParam.fragmentErr) { break; /*I need to save the k value at the point where this occurs*/ } if(testMass < mass[k] + gParam.fragmentErr && testMass > mass[k] - gParam.fragmentErr) { if(!nTermPro) { if((REAL_4)ionRank[k] / (REAL_4)ionCount > 0.5) /*bottom half of the ions, intensity-wise*/ { intFragTest = TRUE; testFound = (REAL_4)ionRank[k] / (REAL_4)ionCount - 0.5; /*ranges from 0 to 0.5*/ if(testFound > ionFound[k]) { ionFound[k] = testFound; } } } else { intFragTest = TRUE; ionFound[k] = 1; /*internal frags w/ N-terminal Pro are common*/ } break; } } /*need to make sure k index is in range*/ if(k >= ionCount) { k = ionCount; } if(k < 0) { k = 1; } /*score the probability*/ if(intFragTest) { if(nTermPro) { individualProb = internalProProb / randomProb[k]; score *= individualProb; } else { individualProb = internalProb / randomProb[k]; score *= individualProb; } } else /*didn't find any*/ { if(nTermPro) { individualProb = (1 
- internalProProb) / (1 - randomProb[k]); score *= individualProb; } else { individualProb = (1 - internalProb) / (1 - randomProb[k]); score *= individualProb; } } } } } return(score); } /***********************************InitProbScore********************************************** * * If the C-terminus is Arg or Lys, then give higher probability. * */ REAL_4 InitProbScore(INT_4 *sequence, INT_4 seqLength) { REAL_4 score = 0.05; REAL_4 residueMass, testMass; INT_4 i; BOOLEAN test = FALSE; BOOLEAN weirdMass; residueMass = sequence[seqLength - 1]; if(residueMass < gMonoMass[R] + gParam.fragmentErr && residueMass > gMonoMass[R] - gParam.fragmentErr) { score = 0.95; } else if(residueMass < gMonoMass[K] + gParam.fragmentErr && residueMass > gMonoMass[K] - gParam.fragmentErr) { score = 0.95; } else { for(i = 0; i < gAminoAcidNumber; i++) { testMass = residueMass - gMonoMass[i]; if(testMass < gMonoMass[R] + gParam.fragmentErr && testMass > gMonoMass[R] - gParam.fragmentErr) { score = 0.95; break; } else if(testMass < gMonoMass[K] + gParam.fragmentErr && testMass > gMonoMass[K] - gParam.fragmentErr) { score = 0.95; break; } } } return(score); } /************************************CalcRandomProb********************************************* * * For each ion, a 200 u window is identified (usually +/- 100 u surrounding it) and the number * of ions is counted within the window. That counted number is divided by the number of possible * ions that could fit in that 200 u window, which depends on the instrument resolution. 
 */
void CalcRandomProb(REAL_4 *randomProb, INT_4 *mass, INT_4 ionCount)
{
	INT_4 i, j, windowCount;
	REAL_4 lowMass, highMass;

	/* Initialize*/
	lowMass = mass[0];
	highMass = mass[ionCount-1];
	for(i = 0; i < MAX_ION_NUM; i++)
	{
		randomProb[i] = 0;
	}

	for(i = 0; i < ionCount; i++)
	{
		windowCount = 0;
		if(mass[i] < lowMass + 100)	/*bottom 200 u window before it moves*/
		{
			for(j = 0; j < ionCount; j++)
			{
				if(mass[j] < lowMass + 200)
				{
					windowCount++;
				}
			}
		}
		else if(mass[i] > highMass - 100)	/*top 200 u window that stops moving*/
		{
			for(j = 0; j < ionCount; j++)
			{
				if(mass[j] > highMass - 200)
				{
					windowCount++;
				}
			}
		}
		else	/*this is the moving window*/
		{
			for(j = 0; j < ionCount; j++)
			{
				if(mass[j] > mass[i] - 100 && mass[j] < mass[i] + 100)
				{
					windowCount++;
				}
			}
		}

		/*calculate the randomness of this ion*/
		randomProb[i] = (REAL_4) windowCount / 200;	/*assuming low resolution of one peak per amu*/
	}

	/*Verify that randomProb never equals zero or one (avoid divide by zero later on)*/
	for(i = 0; i < ionCount; i++)
	{
		if(randomProb[i] < 0.005)
		{
			randomProb[i] = 0.005;	/*this is 1 out of 200*/
		}
		if(randomProb[i] > 0.995)
		{
			randomProb[i] = 0.995;	/*this is 199 out of 200*/
		}
	}

	return;
}

/*************************************InitIonFound***********************************************
 *
 * Initialize ionFound to zero for each sequence, and set precursor to "found".
*/ void InitIonFound(REAL_4 *ionFound, INT_4 *mass, INT_4 ionCount, REAL_4 *randomProb) { INT_4 i; REAL_4 testMass, water, ammonia; for(i = 0; i < MAX_ION_NUM; i++) { ionFound[i] = 0; } /* Find precursor ion*/ testMass = (gParam.peptideMW + gElementMass[HYDROGEN] * gParam.chargeState) / gParam.chargeState; water = gElementMass[HYDROGEN] * 2 + gElementMass[OXYGEN]; ammonia = gElementMass[NITROGEN] + gElementMass[HYDROGEN] * 3; for(i = 0; i < ionCount; i++) { if(mass[i] > testMass + gParam.fragmentErr) { break; } if(mass[i] > testMass - water - gParam.fragmentErr) { if(mass[i] < testMass - water + gParam.fragmentErr) { ionFound[i] = 1; randomProb[i] = 0.005; } if(mass[i] > testMass - ammonia - gParam.fragmentErr && mass[i] < testMass - ammonia + gParam.fragmentErr) { ionFound[i] = 1; randomProb[i] = 0.005; } if(mass[i] > testMass - gParam.fragmentErr && mass[i] < testMass + gParam.fragmentErr) { ionFound[i] = 1; randomProb[i] = 0.005; } } } return; } /**************************************FindBIons********************************************* * * Find the b ions for the sequence and change the ionFound to 1. Return a value that corresponds * to the number of consecutive b ions. 
*/ REAL_4 FindBIons(REAL_4 *ionFound, INT_4 *mass, INT_4 ionCount, INT_4 sequenceNum, REAL_4 probScore, REAL_4 *randomProb, REAL_4 *intensity, INT_4 *sequence, INT_4 seqLength) { INT_4 ionsInARow = 0; INT_4 i, j, k, bIonIndex; REAL_4 water, ammonia, bIonTemplate, bIonMin17Template, bIonMin18Template, testFound; REAL_4 bIon, bIonMin17, bIonMin18, individualProb, aIon, aIonTemplate, carbonMonoxide; BOOLEAN bIonTest, bMin18Test, bMin17Test, aIonTest, isItAGap; /* Initialize*/ water = gElementMass[OXYGEN] + gElementMass[HYDROGEN] * 2; ammonia = gElementMass[NITROGEN] + gElementMass[HYDROGEN] * 3; carbonMonoxide = gElementMass[CARBON] + gElementMass[OXYGEN]; bIonTemplate = gParam.modifiedNTerm; /* Start the calculations and searches*/ for(i = 0; i < seqLength; i++) { isItAGap = TRUE; if(i == 0 || i == seqLength - 1) { isItAGap = FALSE; /*won't call the two ends "gaps"*/ } if(isItAGap) { for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] < gMonoMass[j] + gParam.fragmentErr && sequence[i] > gMonoMass[j] - gParam.fragmentErr) { isItAGap = FALSE; break; } } } bIonTemplate += sequence[i]; bIonMin17Template = bIonTemplate - ammonia; bIonMin18Template = bIonTemplate - water; aIonTemplate = bIonTemplate - carbonMonoxide; for(j = 1; j <= gParam.chargeState; j++) /*check different charge states*/ { bIon = (bIonTemplate + (j-1)*gElementMass[HYDROGEN]) / j; bIonMin17 = (bIonMin17Template + (j-1)*gElementMass[HYDROGEN]) / j; bIonMin18 = (bIonMin18Template + (j-1)*gElementMass[HYDROGEN]) / j; aIon = (aIonTemplate + (j-1)*gElementMass[HYDROGEN]) / j; bIonTest = FALSE; aIonTest = FALSE; bMin18Test = FALSE; bMin17Test = FALSE; /*apply constraints to charge and mass*/ if((bIon * j) > ((j-1) * 350) && bIon > mass[0]) { for(k = 0; k < ionCount; k++) { if(mass[k] > bIon + gParam.fragmentErr) { break; /*don't waste any more time looking*/ } if(mass[k] > bIon - gParam.fragmentErr) { ionFound[k] = 1; bIonTest = TRUE; bIonIndex = k; } } if(bIonTest) /*there's a b ion, so look for the b-17 
and b-18*/ { k = bIonIndex; k--; while(mass[k] > aIon - gParam.fragmentErr && k >= 0) { if(mass[k] > bIonMin17 - gParam.fragmentErr && mass[k] < bIonMin17 + gParam.fragmentErr) { testFound = intensity[bIonIndex] / (intensity[bIonIndex] + intensity[k]); bMin17Test = TRUE; if(testFound > ionFound[k]) { ionFound[k] = testFound; } } if(mass[k] > bIonMin18 - gParam.fragmentErr && mass[k] < bIonMin18 + gParam.fragmentErr) { testFound = intensity[bIonIndex] / (intensity[bIonIndex] + intensity[k]); bMin18Test = TRUE; if(testFound > ionFound[k]) { ionFound[k] = testFound; } } if(mass[k] > aIon - gParam.fragmentErr && mass[k] < aIon + gParam.fragmentErr) { testFound = intensity[bIonIndex] / (intensity[bIonIndex] + intensity[k]); aIonTest = TRUE; if(testFound > ionFound[k]) { ionFound[k] = testFound; } } k--; } } /*Calculate the probability scores*/ if(bIonTest) /*if the calculated b ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = bIonProb / randomProb[bIonIndex]; probScore *= individualProb; } else /*for multiply charged fragments*/ { individualProb = bIonProb * bDoublyProbMultiplier / randomProb[bIonIndex]; probScore *= individualProb; } } else /*if the calculated b ion is not present*/ { if(k >= ionCount) /*need to make sure k index is in range*/ { k = ionCount; } if(k < 0) { k = 1; } if(j == 1) { individualProb = (1-bIonProb) / (1 - randomProb[k - 1]); probScore *= individualProb; } else { individualProb = (1 - bIonProb * bDoublyProbMultiplier) / (1 - randomProb[k - 1]); probScore *= individualProb; } } if(bMin18Test) /*if the calculated b-18 ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = bMinWaterProb / randomProb[bIonIndex]; probScore *= individualProb; } else /*for multiply charged fragments*/ { individualProb = bMinWaterProb * bDoublyProbMultiplier / randomProb[bIonIndex]; probScore *= individualProb; } } else /*if the calculated ion is not present*/ { if(k >= ionCount) /*need to make sure k index is 
in range*/ { k = ionCount; } if(k < 0) { k = 1; } if(j == 1) { individualProb = (1 - bMinWaterProb) / (1 - randomProb[k - 1]); probScore *= individualProb; } else { individualProb = (1 - bMinWaterProb * bDoublyProbMultiplier) / (1 - randomProb[k - 1]); probScore *= individualProb; } } if(bMin17Test) /*if the calculated b-17 ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = bMinAmmoniaProb / randomProb[bIonIndex]; probScore *= individualProb; } else /*for multiply charged fragments*/ { individualProb = bMinAmmoniaProb * bDoublyProbMultiplier / randomProb[bIonIndex]; probScore *= individualProb; } } else /*if the calculated b ion is not present*/ { if(k >= ionCount) /*need to make sure k index is in range*/ { k = ionCount; } if(k < 0) { k = 1; } if(j == 1) { individualProb = (1 - bMinAmmoniaProb) / (1 - randomProb[k - 1]); probScore *= individualProb; } else { individualProb = (1 - bMinAmmoniaProb * bDoublyProbMultiplier) / (1 - randomProb[k - 1]); probScore *= individualProb; } } if(aIonTest) /*if the calculated a ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = aIonProb / randomProb[bIonIndex]; probScore *= individualProb; } else /*for multiply charged fragments*/ { individualProb = aIonProb * bDoublyProbMultiplier / randomProb[bIonIndex]; probScore *= individualProb; } } else /*if the calculated b ion is not present*/ { if(k >= ionCount) /*need to make sure k index is in range*/ { k = ionCount; } if(k < 0) { k = 1; } if(j == 1) { individualProb = (1 - aIonProb) / (1 - randomProb[k - 1]); probScore *= individualProb; } else { individualProb = (1 - aIonProb * bDoublyProbMultiplier) / (1 - randomProb[k - 1]); probScore *= individualProb; } } if(isItAGap && j == 1) /*a gap arises from lack of a y ion, so penalize, but only do it once for j=1*/ { individualProb = (1-bIonProb * 0.5) / (1 - randomProb[k - 1]); /*0.5 reflects fact that many gaps (except for the ends) are due to proline*/ probScore *= 
individualProb; } } } } return(probScore); /*ionsInARow is not used for now*/ } /**************************************FindYIons********************************************* * * Find the y ions for the sequence and change the ionFound to 1. Return a value that corresponds * to the number of consecutive y ions. */ REAL_4 FindYIons(REAL_4 *ionFound, INT_4 *mass, INT_4 ionCount, INT_4 sequenceNum, REAL_4 probScore, REAL_4 *randomProb, REAL_4 *intensity, INT_4 *sequence, INT_4 seqLength) { INT_4 ionsInARow = 0; INT_4 i, j, k, yIonIndex; REAL_4 water, ammonia, yIonTemplate, yIonMin17Template, yIonMin18Template; REAL_4 yIon, yIonMin17, yIonMin18, individualProb, testFound; BOOLEAN yIonTest, yMin17Test, yMin18Test, isItAGap; /* Initialize*/ water = gElementMass[OXYGEN] + gElementMass[HYDROGEN] * 2; ammonia = gElementMass[NITROGEN] + gElementMass[HYDROGEN] * 3; yIonTemplate = gParam.modifiedCTerm + 2 * gElementMass[HYDROGEN]; for(i = seqLength - 1; i > -1; i--) { isItAGap = TRUE; if(i == 0 || i == seqLength - 1) { isItAGap = FALSE; /*won't call the two ends "gaps"*/ } if(isItAGap) { for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] < gMonoMass[j] + gParam.fragmentErr && sequence[i] > gMonoMass[j] - gParam.fragmentErr) { isItAGap = FALSE; break; } } } yIonTemplate += sequence[i]; yIonMin17Template = yIonTemplate - ammonia; yIonMin18Template = yIonTemplate - water; for(j = 1; j <= gParam.chargeState; j++) /*check different charge states*/ { yIon = (yIonTemplate + (j-1)*gElementMass[HYDROGEN]) / j; yIonMin17 = (yIonMin17Template + (j-1)*gElementMass[HYDROGEN]) / j; yIonMin18 = (yIonMin18Template + (j-1)*gElementMass[HYDROGEN]) / j; yIonTest = FALSE; yMin17Test = FALSE; yMin18Test = FALSE; /*apply constraints to charge and mass*/ if((yIon * j) > ((j-1) * 350) && yIon > mass[0]) { for(k = 0; k < ionCount; k++) { if(mass[k] > yIon + gParam.fragmentErr) { break; /*don't waste any more time looking*/ } if(mass[k] > yIon - gParam.fragmentErr) { ionFound[k] = 1; yIonTest = 
TRUE; yIonIndex = k; } } if(yIonTest) /*there's a y ion, so look for the y-17 and y-18*/ { k = yIonIndex; k--; while(mass[k] > yIonMin18 - gParam.fragmentErr && k >= 0) { if(mass[k] > yIonMin17 - gParam.fragmentErr && mass[k] < yIonMin17 + gParam.fragmentErr) { /*y-17 intensity should be less than the y ion*/ testFound = intensity[yIonIndex] / (intensity[yIonIndex] + intensity[k]); yMin17Test = TRUE; if(testFound > ionFound[k]) { ionFound[k] = testFound; } } if(mass[k] > yIonMin18 - gParam.fragmentErr && mass[k] < yIonMin18 + gParam.fragmentErr) { testFound = intensity[yIonIndex] / (intensity[yIonIndex] + intensity[k]); yMin18Test = TRUE; if(testFound > ionFound[k]) { ionFound[k] = testFound; } } k--; } } /*Calculate the probability scores*/ if(yIonTest) /*if the calculated y ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = yIonProb / randomProb[yIonIndex]; probScore *= individualProb; } else /*for multiply charged fragments*/ { individualProb = yIonProb * yDoublyProbMultiplier / randomProb[yIonIndex]; probScore *= individualProb; } } else /*if the calculated y ion is not present*/ { if(k >= ionCount) /*need to make sure k index is in range*/ { k = ionCount; } if(k < 0) { k = 1; } if(j == 1) { individualProb = (1-yIonProb) / (1 - randomProb[k - 1]); probScore *= individualProb; } else { individualProb = (1 - yIonProb * yDoublyProbMultiplier) / (1 - randomProb[k - 1]); probScore *= individualProb; } } if(yMin18Test) /*if the calculated y-18 ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = yMinWaterProb / randomProb[yIonIndex]; probScore *= individualProb; } else /*for multiply charged fragments*/ { individualProb = yMinWaterProb * yDoublyProbMultiplier / randomProb[yIonIndex]; probScore *= individualProb; } } else /*if the calculated y ion is not present*/ { if(k >= ionCount) /*need to make sure k index is in range*/ { k = ionCount; } if(k < 0) { k = 1; } if(j == 1) { individualProb = (1 - 
yMinWaterProb) / (1 - randomProb[k - 1]);
						probScore *= individualProb;
					}
					else
					{
						individualProb = (1 - yMinWaterProb * yDoublyProbMultiplier) / (1 - randomProb[k - 1]);
						probScore *= individualProb;
					}
				}

				if(yMin17Test)	/*if the calculated y-17 ion is present*/
				{
					if(j == 1)	/*for singly charged fragments*/
					{
						individualProb = yMinAmmoniaProb / randomProb[yIonIndex];
						probScore *= individualProb;
					}
					else	/*for multiply charged fragments*/
					{
						individualProb = yMinAmmoniaProb * yDoublyProbMultiplier / randomProb[yIonIndex];
						probScore *= individualProb;
					}
				}
				else	/*if the calculated y-17 ion is not present*/
				{
					if(k >= ionCount)	/*need to make sure k index is in range*/
					{
						k = ionCount;
					}
					if(k < 0)
					{
						k = 1;
					}
					if(j == 1)
					{
						individualProb = (1 - yMinAmmoniaProb) / (1 - randomProb[k - 1]);
						probScore *= individualProb;
					}
					else
					{
						individualProb = (1 - yMinAmmoniaProb * yDoublyProbMultiplier) / (1 - randomProb[k - 1]);
						probScore *= individualProb;
					}
				}

				if(isItAGap && j == 1)	/*a gap arises from lack of a y ion, so penalize, but only do it once for j=1*/
				{
					individualProb = (1-yIonProb * 0.5) / (1 - randomProb[k - 1]);	/*0.5 reflects fact that many gaps (except for the ends) are due to proline*/
					probScore *= individualProb;
				}
			}
		}
	}

	return(probScore);
}

lutefisk-1.0.7+dfsg.orig/src/Makefile.osf
#
#sun (bsd)
# for mips, also use: -mips2 -O2
#
CC= cc -O4 -std
CFLAGS= -D__ALPHA
LFLAGS= -lm -o
BIN = /seqprg/slib/bin
#NRAND= nrand
#IBM RS/6000
NRAND= nrand48
RANFLG= -DRAND32
#HZ=60 for sun, mips, 100 for rs/6000, SGI, LINUX
HZ=60

PROGS= lutefisk
SPROGS= lutefisk

.c.o:
	$(CC) $(CFLAGS) -c $<

all : $(PROGS)

sall : $(SPROGS)

install :
	cp $(PROGS) $(BIN)

clean-up :
	rm *.o $(PROGS)

lutefisk : LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o
	$(CC) \
	LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o $(LFLAGS) lutefisk

LutefiskGlobalDeclarations.o : LutefiskGlobalDeclarations.c
	$(CC) $(CFLAGS) -c LutefiskGlobalDeclarations.c

LutefiskMain.o : LutefiskMain.c
	$(CC) $(CFLAGS) -c LutefiskMain.c

LutefiskGetCID.o : LutefiskGetCID.c
	$(CC) $(CFLAGS) -c LutefiskGetCID.c

LutefiskHaggis.o : LutefiskHaggis.c
	$(CC) $(CFLAGS) -c LutefiskHaggis.c

LutefiskMakeGraph.o : LutefiskMakeGraph.c
	$(CC) $(CFLAGS) -c LutefiskMakeGraph.c

LutefiskSummedNode.o : LutefiskSummedNode.c
	$(CC) $(CFLAGS) -c LutefiskSummedNode.c

LutefiskSubseqMaker.o : LutefiskSubseqMaker.c
	$(CC) $(CFLAGS) -c LutefiskSubseqMaker.c

LutefiskScore.o : LutefiskScore.c
	$(CC) $(CFLAGS) -c LutefiskScore.c

LutefiskXCorr.o : LutefiskXCorr.c
	$(CC) $(CFLAGS) -c LutefiskXCorr.c

LutefiskFourier.o : LutefiskFourier.c
	$(CC) $(CFLAGS) -c LutefiskFourier.c

LutefiskGetAutoTag.o : LutefiskGetAutoTag.c
	$(CC) $(CFLAGS) -c LutefiskGetAutoTag.c

ListRoutines.o : ListRoutines.c
	$(CC) $(CFLAGS) -c ListRoutines.c

lutefisk-1.0.7+dfsg.orig/src/Makefile.AIX
#CC= gcc -O
#CC= cc -O3 -qstrict
CC= xlc -O3 -qstrict
CFLAGS= -qcpluscmt -D__AIX
LFLAGS= -lm -o

PROGS= lutefisk
SPROGS= lutefisk

.c.o:
	$(CC) $(CFLAGS) -c $<

all : $(PROGS)

sall : $(SPROGS)

install :
	cp $(PROGS) $(BIN)

clean-up :
	rm *.o $(PROGS)

lutefisk : LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o
	$(CC) LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o \
	ListRoutines.o $(LFLAGS) lutefisk

LutefiskGlobalDeclarations.o : LutefiskGlobalDeclarations.c
	$(CC) $(CFLAGS) -c LutefiskGlobalDeclarations.c

LutefiskMain.o : LutefiskMain.c
	$(CC) $(CFLAGS) -c LutefiskMain.c

LutefiskGetCID.o : LutefiskGetCID.c
	$(CC) $(CFLAGS) -c LutefiskGetCID.c

LutefiskHaggis.o : LutefiskHaggis.c
	$(CC) $(CFLAGS) -c LutefiskHaggis.c

LutefiskMakeGraph.o : LutefiskMakeGraph.c
	$(CC) $(CFLAGS) -c LutefiskMakeGraph.c

LutefiskSummedNode.o : LutefiskSummedNode.c
	$(CC) $(CFLAGS) -c LutefiskSummedNode.c

LutefiskSubseqMaker.o : LutefiskSubseqMaker.c
	$(CC) $(CFLAGS) -c LutefiskSubseqMaker.c

LutefiskScore.o : LutefiskScore.c
	$(CC) $(CFLAGS) -c LutefiskScore.c

LutefiskXCorr.o : LutefiskXCorr.c
	$(CC) $(CFLAGS) -c LutefiskXCorr.c

LutefiskFourier.o : LutefiskFourier.c
	$(CC) $(CFLAGS) -c LutefiskFourier.c

LutefiskGetAutoTag.o : LutefiskGetAutoTag.c
	$(CC) $(CFLAGS) -c LutefiskGetAutoTag.c

ListRoutines.o : ListRoutines.c
	$(CC) $(CFLAGS) -c ListRoutines.c

lutefisk-1.0.7+dfsg.orig/src/Makefile.irix
#
#sun (bsd)
# for mips, also use: -mips2 -O2
#
CC= gcc -O
CFLAGS= -D__IRIX
LFLAGS= -lm -o
BIN = /seqprg/slib/bin
#NRAND= nrand
#IBM RS/6000
NRAND= nrand48
RANFLG= -DRAND32
#HZ=60 for sun, mips, 100 for rs/6000, SGI, LINUX
HZ=60

PROGS= lutefisk
SPROGS= lutefisk

.c.o:
	$(CC) $(CFLAGS) -c $<

all : $(PROGS)

sall : $(SPROGS)

install :
	cp $(PROGS) $(BIN)

clean-up :
	rm *.o $(PROGS)

lutefisk : LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o
	$(CC) LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o $(LFLAGS) lutefisk

LutefiskGlobalDeclarations.o : \
LutefiskGlobalDeclarations.c
	$(CC) $(CFLAGS) -c LutefiskGlobalDeclarations.c

LutefiskMain.o : LutefiskMain.c
	$(CC) $(CFLAGS) -c LutefiskMain.c

LutefiskGetCID.o : LutefiskGetCID.c
	$(CC) $(CFLAGS) -c LutefiskGetCID.c

LutefiskHaggis.o : LutefiskHaggis.c
	$(CC) $(CFLAGS) -c LutefiskHaggis.c

LutefiskMakeGraph.o : LutefiskMakeGraph.c
	$(CC) $(CFLAGS) -c LutefiskMakeGraph.c

LutefiskSummedNode.o : LutefiskSummedNode.c
	$(CC) $(CFLAGS) -c LutefiskSummedNode.c

LutefiskSubseqMaker.o : LutefiskSubseqMaker.c
	$(CC) $(CFLAGS) -c LutefiskSubseqMaker.c

LutefiskScore.o : LutefiskScore.c
	$(CC) $(CFLAGS) -c LutefiskScore.c

LutefiskXCorr.o : LutefiskXCorr.c
	$(CC) $(CFLAGS) -c LutefiskXCorr.c

LutefiskFourier.o : LutefiskFourier.c
	$(CC) $(CFLAGS) -c LutefiskFourier.c

LutefiskGetAutoTag.o : LutefiskGetAutoTag.c
	$(CC) $(CFLAGS) -c LutefiskGetAutoTag.c

ListRoutines.o : ListRoutines.c
	$(CC) $(CFLAGS) -c ListRoutines.c

lutefisk-1.0.7+dfsg.orig/src/LutefiskScore.c
/*********************************************************************************************
Lutefisk is software for de novo sequencing of peptides from tandem mass spectra.
Copyright (C) 1995  Richard S. Johnson

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

Contact:

Richard S Johnson
4650 Forest Ave SE
Mercer Island, WA 98040

jsrichar@alum.mit.edu
*********************************************************************************************/

/*
	Richard S. Johnson
	Made 12/95

	This is for assigning scores to sequences derived from de novo interpretation of CID data.
	Input parameters are pointers to the first Sequence struct (containing the completed
	sequences to be scored), and the first MSData struct (containing the cid data - already
	offset), "peptideMW" (the peptide molecular weight), "fragmentErr" (the fragment ion m/z
	tolerance), and "chargeState" (the charge state of the precursor ion). This function and
	its related functions all use amino acid residue, element masses, etc that are 100 x the
	actual value. This allows for all values to be INT_4s, and this presumably speeds up the
	math and the comparisons (although I don't know this for sure). This bit of code was
	originally used by MADMAE, and has been extensively modified for Lutefisk. All of the
	functions within this file are either ScoreSequences or are called by ScoreSequences.
*/

#include
#include
#include
#include
#include

#include "LutefiskPrototypes.h"
#include "LutefiskDefinitions.h"

/*
 * Here are some globals that are specific to this file. They are two amino acid monoisotopic masses
 * times 10000 for Cys, Arg, His, and Lys. These get modified at the start of the ScoreSequences
 * function in order to accommodate different alkyl groups on cysteine.
*/ INT_4 gCysPlus[AMINO_ACID_NUMBER] = { 1740463, 2591103, 2170521, 2180361, 2060184, 2320518, 2310678, 1600307, 2400681, 2160933, 2160933, 2311042, 2340497, 2500776, 2000620, 1900412, 2040569, 2890885, 2660725, 2020776, 0, 0, 0, 0, 0 }; INT_4 gArgPlus[AMINO_ACID_NUMBER] = { 2271382, 3122022, 2701441, 2711281, 2591103, 2851437, 2841597, 2131226, 2931600, 2691852, 2691852, 2841961, 2871416, 3031695, 2531539, 2431332, 2571488, 3421804, 3191645, 2551695, 0, 0, 0, 0, 0 }; INT_4 gHisPlus[AMINO_ACID_NUMBER] = { 2080960, 2931600, 2511018, 2520859, 2400681, 2661015, 2651175, 1940804, 2741178, 2501430, 2501430, 2651539, 2680994, 2841273, 2341117, 2240909, 2381066, 3231382, 3001222, 2361273, 0, 0, 0, 0, 0 }; INT_4 gLysPlus[AMINO_ACID_NUMBER] = { 1991321, 2841961, 2421379, 2431219, 2311042, 2571376, 2561536, 1851164, 2651539, 2411790, 2411790, 2561899, 2591355, 2751634, 2251477, 2151270, 2291427, 3141743, 2911583, 2271634, 0, 0, 0, 0, 0 }; INT_4 gProPlus[AMINO_ACID_NUMBER] = { 1680899, 2531539, 2110957, 2120797, 2000620, 2260954, 2251114, 1540742, 2341117, 2101368, 2101368, 2251477, 2280933, 2441212, 1941055, 1840848, 1981005, 2831321, 2601161, 1961212, 0, 0, 0, 0, 0 }; INT_4 gGlnPlus[AMINO_ACID_NUMBER] = { 1990957, 2841597, 2421015, 2430855, 2310678, 2571012, 2561172, 1850801, 2651175, 2411427, 2411427, 2561536, 2590991, 2751270, 2251114, 2150906, 2291063, 3141379, 2911219, 2271270, 0, 0, 0, 0, 0 }; INT_4 gGluPlus[AMINO_ACID_NUMBER] = { 2000797, 2851437, 2430855, 2440696, 2320518, 2580852, 2571012, 1860641, 2661015, 2421267, 2421267, 2571376, 2600831, 2761110, 2260954, 2160746, 2300903, 3151219, 2921059, 2281110, 0, 0, 0, 0, 0 }; REAL_4 gToleranceNarrow, gToleranceWide; /*This is used in the "fuzzy logic" of matching calculated and observed ion m/z values.*/ char gTrypticCterm = FALSE; /*TRUE if tryptic or lys-c proteolysis with an ion at 147 or 175.*/ INT_4 gCleavageSiteStringent; /*used for determining quality of spectrum*/ char gAmIHere = FALSE; /*TRUE if tryptic or lys-c 
proteolysis with an ion at 147 or 175.*/ INT_4 gRightSequence[50] = { /*Used for checking if the correct sequence is remaining.*/ 7104, 11503, 13706, 9705, 14707, 11308, 14707, 16003, 11308, 12809, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; char gDatabaseSeq[MAX_DATABASE_SEQ_NUM][MAX_PEPTIDE_LENGTH]; INT_4 gPeptideLength[MAX_DATABASE_SEQ_NUM]; INT_4 gSeqNum = 0; REAL_8 gProbScoreMax; INT_4 gGapListDipeptideIndex; /*INT_4 gRightSequence[50] = { 1131, 971, 1471, 971, 870, 991, 570, 570, 1010, 570, 570, 1962, 1150, 2122, 1561, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };*/ /* Globals for haggis in this file*/ REAL_4 bIonProb = 0.7; /*b ion probability .6 */ REAL_4 bMinWaterProb = 0.2; /*b-18 probability .3 */ REAL_4 bMinAmmoniaProb = 0.15; /*b-17 probability .15 */ REAL_4 bMin64IonProb = 0.15; /*b-46 probability for b fragments containing oxMet*/ REAL_4 bDoublyProbMultiplier = 0.5; /*Multiply this for more than one charge .5 */ REAL_4 aIonProb = 0.1; /*a ion probability .1 */ REAL_4 yIonProb = 0.8; /*y ion probability .8 */ REAL_4 yMinWaterProb = 0.1; /*y-18 probability .1 */ REAL_4 yMinAmmoniaProb = 0.1; /*y-17 probability .1 */ REAL_4 yMin64IonProb = 0.2; /*y-46 probability for y fragments containing oxMet*/ REAL_4 yDoublyProbMultiplier = 0.5; /*Multiply this for more than one charge .5 */ REAL_4 immoniumProb = 0.2; /*Immonium ion probability .2 */ REAL_4 internalProb = 0.1; /*Internal ion probability .1 */ REAL_4 internalProProb = 0.2; /*Internal ions with N-terminal proline probability .2 */ /*****************************GetDatabaseSeq********************************************** * * * */ char *GetDatabaseSeq(INT_4 *peptide, INT_4 peptideLength) { char *string, *p; BOOLEAN correctSeqTest; INT_4 i, j, correctSeq; INT_4 metPheDiff = (gMonoMass_x100[F] - gMonoMass_x100[I]) * 1.5; INT_4 lysGlnDiff = (gMonoMass_x100[K] - gMonoMass_x100[Q]) * 
1.5; string = (char *) malloc(50); if(!string) { return(NULL); } p = string; correctSeq = -1; /*this is an index value, so -1 flags as unassigned*/ if(gParam.qtofErr * gMultiplier <= metPheDiff && gParam.qtofErr != 0) { metPheDiff = TRUE; } else { metPheDiff = FALSE; } if(gParam.qtofErr * gMultiplier <= lysGlnDiff && gParam.qtofErr != 0) { lysGlnDiff = TRUE; } else { lysGlnDiff = FALSE; } /*Convert the peptide to the actual sequence (rather than index values of gGapList)*/ for(i = 0; i < peptideLength; i++) { if(peptide[i] < gAminoAcidNumber) /*if not a dipeptide*/ { peptide[i] = gSingAA[peptide[i]]; } else { free(string); return(NULL); /*something wrong with this sequence, so skip over it*/ } } peptide[peptideLength] = 0; /*NUL terminate the string*/ /*Now compare peptide w/ the database sequences*/ for(i = 0; i < gSeqNum; i++) { correctSeqTest = TRUE; if(gPeptideLength[i] == peptideLength) { for(j = 0; j < gPeptideLength[i]; j++) { if(peptide[j] != gDatabaseSeq[i][j]) { correctSeqTest = FALSE; /*not the same amino acid*/ if(peptide[j] == 'L' && gDatabaseSeq[i][j] == 'I') { correctSeqTest = TRUE; /*I and L cannot be told apart, so this mismatch is ok*/ } if(metPheDiff) { if(peptide[j] == 'B' && gDatabaseSeq[i][j] == 'm') { correctSeqTest = TRUE; /*B means m here (both represent ox Met)*/ } } else { if(peptide[j] == 'F' && gDatabaseSeq[i][j] == 'm') { correctSeqTest = TRUE; } if(peptide[j] == 'F' && gDatabaseSeq[i][j] == 'B') { correctSeqTest = TRUE; } } if(!lysGlnDiff) { if((peptide[j] == 'Q' || peptide[j] == 'K') && (gDatabaseSeq[i][j] == 'Q' || gDatabaseSeq[i][j] == 'K')) { correctSeqTest = TRUE; } } } if(correctSeqTest == FALSE) { break; } } } else { correctSeqTest = FALSE; } if(correctSeqTest) { correctSeq = i; /*found the correct sequence, now break out*/ break; } } if(correctSeq >= 0) { for(i = 0; i < gPeptideLength[correctSeq]; i++) { p+= sprintf(p, "%c", gDatabaseSeq[correctSeq][i]); } } else { free(string); return(NULL); /*no database match, so release the buffer instead of leaking it*/ } *p = '\0'; /* NUL terminate the string */ 
return(string); } char *PeptideString(INT_4 *peptide, INT_4 peptideLength) { INT_4 i, j, k; char test; char *string; char *p; char *dipeptide; INT_4 dipeptideCounter; INT_4 dipeptideMass, sum; string = (char *) malloc(200); if(!string) { return NULL; } dipeptide = (char *) malloc(3); if(dipeptide == NULL) { free(string); /*don't leak the output buffer*/ return NULL; } p = string; for(i = 0; i < peptideLength; i++) { test = TRUE; if(peptide[i] < gAminoAcidNumber) /*if single amino acid, just print the character*/ { p+= sprintf(p, "%c", gSingAA[peptide[i]]); } else /*if dipeptide....*/ { if(gGapList[peptide[i]] /gMultiplier < 1000) /*if the dipeptide is less than 1000 daltons*/ { dipeptideCounter = 0; /*count the number of possibilities for dipeptides of this mass*/ for(j = 0; j < gAminoAcidNumber; j++) { for(k = j; k < gAminoAcidNumber; k++) { sum = gMonoMass_x100[j] + gMonoMass_x100[k]; if(sum <= gGapList[peptide[i]] + gToleranceWide && sum >= gGapList[peptide[i]] - gToleranceWide) { dipeptideCounter++; } } } if(dipeptideCounter == 0) { dipeptideCounter = 1; } if(dipeptideCounter > 1 || gParam.qtofErr == 0) /*if there is more than one choice, or if the qtofErr was not used, then print the mass in brackets*/ { dipeptideMass = gGapList[peptide[i]]; /*if(gNodeCorrection[peptide[i]] > 5) { dipeptideMass = dipeptideMass + 1; } else if(gNodeCorrection[peptide[i]] <= -5) { gNodeCorrection[i] = gNodeCorrection[i] + 10; dipeptideMass = dipeptideMass - 1; }*/ if(gMultiplier == 1) { p+= sprintf(p, "[%3d]", (dipeptideMass)/gMultiplier); } else if(gMultiplier == 10) { p+= sprintf(p, "[%.1f]", ((REAL_4)dipeptideMass)/gMultiplier); } else if(gMultiplier == 100) { p+= sprintf(p, "[%.2f]", ((REAL_4)dipeptideMass)/gMultiplier); } else { p+= sprintf(p, "[%.3f]", ((REAL_4)dipeptideMass)/gMultiplier); } } else /*if there is only one choice, then print the two amino acids in brackets*/ { test = TRUE; for(j = 0; j < gAminoAcidNumber; j++) { for(k = j; k < gAminoAcidNumber; k++) { if(gGapList[peptide[i]] == gMonoMass_x100[j] + 
gMonoMass_x100[k]) { dipeptide[0] = gSingAA[j]; dipeptide[1] = gSingAA[k]; dipeptide[2] = 0; test = FALSE; } } } if(test) /*non-standard dipeptide*/ { if(gMultiplier == 1) { p+= sprintf(p, "[%3d]", (gGapList[peptide[i]])/gMultiplier); } else if(gMultiplier == 10) { p+= sprintf(p, "[%.1f]", ((REAL_4)gGapList[peptide[i]])/gMultiplier); } else if(gMultiplier == 100) { p+= sprintf(p, "[%.2f]", ((REAL_4)gGapList[peptide[i]])/gMultiplier); } else { p+= sprintf(p, "[%.3f]", ((REAL_4)gGapList[peptide[i]])/gMultiplier); } } else /*standard dipeptide*/ { p += sprintf(p, "[%s]", dipeptide); } } } else /*if it's some sort of weird dipeptide of mass greater than 1000 Da, then print it as an integer*/ { p+= sprintf(p, "[%3d]", gGapList[peptide[i]] / gMultiplier); } } } *p = '\0'; /* NUL terminate the string */ free(dipeptide); /*scratch buffer is no longer needed*/ return string; } /****************************AddPyroGlu********************************************** * * * */ void AddPyroGlu() { gSingAA[gAminoAcidNumber] = 'q'; gMonoMass[gAminoAcidNumber] = 111.032028; gAvMass[gAminoAcidNumber] = 111.05; gNomMass[gAminoAcidNumber] = 111; gMonoMass_x100[gAminoAcidNumber] = gMonoMass[gAminoAcidNumber] * gMultiplier + 0.5; gAminoAcidNumber++; return; } /***************************AddDatabaseSequences************************************* * * This bit of code is for adding sequence(s) derived from database matches using * sequest, bequest, mascot, etc. The idea is to add database-derived sequences * to the de novo - derived sequences to have them battle it out in the final * scoring and ranking. Presumably, if the database sequences are correct, they will * best account for the data, compared to the de novo sequences; if they are wrong, then * the de novo sequences will win out. 
* */ void AddDatabaseSequences(struct Sequence *firstSequencePtr) { struct Sequence *currPtr, *newPtr; char *stringBuffer, test; char databaseSequences[MAX_DATABASE_SEQ_NUM][MAX_PEPTIDE_LENGTH]; REAL_4 nNodeValue, peptideStartMass, peptideMass; INT_4 peptideLength[MAX_DATABASE_SEQ_NUM]; INT_4 i, j, k, seqNum, intNodeValue; INT_4 peptide[MAX_PEPTIDE_LENGTH]; REAL_4 peptideNodeMass; FILE *fp; INT_4 lysGlnDiff = (gMonoMass_x100[K] - gMonoMass_x100[Q]) * 1.5; if(gParam.qtofErr <= lysGlnDiff && gParam.qtofErr != 0) { lysGlnDiff = TRUE; } else { lysGlnDiff = FALSE; } peptideStartMass = gParam.modifiedCTerm + gParam.modifiedNTerm; /*Determine the initial nodeValue, depending on N-terminal modification*/ nNodeValue = gParam.modifiedNTerm / gMultiplier; stringBuffer = (char *)malloc(258); if (stringBuffer == NULL) { printf("LutefiskScore: stringBuffer == NULL\n"); exit(1); } fp = fopen(gParam.databaseSequences,"r"); if (fp == NULL) { printf("Cannot open the database sequence file '%s'\n", gParam.databaseSequences); exit(1); } if(gParam.fMonitor) { printf("Processing database sequence file '%s'\n", gParam.databaseSequences); } /*Read the information into a character array of single letter amino acid codes.*/ test = TRUE; seqNum = 0; gSeqNum = 0; while(!feof(fp) && test) { for(i = 0; i < 258; i++) { stringBuffer[i] = 0; } test = FALSE; if(my_fgets(stringBuffer, 256, fp) == NULL) { continue; } j = 0; /*the gDatabaseSeq and related globals keep track of the actual characters from the database sequences w/o conversion of I to L or B to F. This info is retained in order to have the output have the same sequence as in the database. 
N-terminal 'q' is considered as pyroglutamic acid, but 'q' anywhere else is converted to 'Q' and considered to be Gln*/ while(stringBuffer[j] >= 65 && stringBuffer[j] <= 121) { gDatabaseSeq[gSeqNum][j] = stringBuffer[j]; /*Except for m, convert lower case to upper case for global database seqs*/ if(gDatabaseSeq[gSeqNum][j] >= 97) { if(gDatabaseSeq[gSeqNum][j] != 109) { if(gDatabaseSeq[gSeqNum][j] != 113 || j != 0) /*except for N-terminal q (pyroGlu)*/ { gDatabaseSeq[gSeqNum][j] = gDatabaseSeq[gSeqNum][j] - 32; } } } test = TRUE; /*Now that the original stringBuffer was saved in the global array, I muck w/ it by converting m to F and capitalizing everything else except for q*/ if(stringBuffer[j] >= 97) { if(stringBuffer[j] == 109) { stringBuffer[j] = 70; /*m (109) usually means oxidized Met, which is almost the same mass as F (70)*/ } else if(stringBuffer[j] == 113 && j == 0) /*q (113) means N-terminal pyroglu*/ { AddPyroGlu(); /*adds pyroGlu to list*/ } else { stringBuffer[j] = stringBuffer[j] - 32; } } /*If tolerance can't differentiate lys from gln, then set Q's to K*/ if(stringBuffer[j] == 81 && !lysGlnDiff) { stringBuffer[j] = 75; } if(stringBuffer[j] == 66) /*Change B to F*/ { stringBuffer[j] = 70; } if(stringBuffer[j] == 73) { stringBuffer[j] = 76; } databaseSequences[seqNum][j] = stringBuffer[j]; j+=1; } if(j > 0) /*a legitimate sequence found*/ { for(i = j; i < 258; i++) { databaseSequences[seqNum][i] = 0; /*fill in the rest w/ zero's*/ gDatabaseSeq[gSeqNum][i] = 0; } peptideLength[seqNum] = j; gPeptideLength[gSeqNum] = j; gSeqNum += 1; seqNum += 1; } } fclose(fp); /* Convert the characters in databaseSequences to monoisotopic masses*/ for(i = 0; i < seqNum; i++) { for(j = 0; j < MAX_PEPTIDE_LENGTH; j++) { peptide[j] = 0; } peptideMass = peptideStartMass; peptideNodeMass = nNodeValue; for(j = 0; j < peptideLength[i]; j++) { for(k = 0; k < gAminoAcidNumber; k++) { if(gSingAA[k] == databaseSequences[i][j]) { peptideNodeMass += gMonoMass[k]; peptide[j] = 
gMonoMass_x100[k]; peptideMass += gMonoMass_x100[k]; break; } } } peptideNodeMass = peptideNodeMass * gMultiplier + 0.5; intNodeValue = peptideNodeMass; if(peptideMass <= gParam.peptideMW + gParam.peptideErr && peptideMass >= gParam.peptideMW - gParam.peptideErr) { /*first find the end of the list of the de novo sequences*/ currPtr = firstSequencePtr; while(currPtr->next != NULL) { currPtr = currPtr->next; } newPtr = (struct Sequence *) malloc(sizeof(struct Sequence)); if(newPtr == NULL) { printf("AddDatabaseSequences: out of memory (newPtr == NULL)\n"); exit(1); } currPtr->next = newPtr; newPtr->next = NULL; for(j = 0; j < peptideLength[i]; j++) { newPtr->peptide[j] = peptide[j]; } newPtr->peptideLength = peptideLength[i]; newPtr->score = 500; newPtr->nodeValue = intNodeValue; newPtr->nodeCorrection = 0; newPtr->gapNum = -100; /*FLAG FOR BEING A DATABASE SEQUENCE*/ } } printf("%ld database sequences added.\n", (long)seqNum); /*cast so the argument matches %ld whatever INT_4 is*/ free(stringBuffer); return; } /***************************ScoreAttenuationFromCalfactor**************************** * * Sequences that need calFactors that deviate from 1.0000000000 are attenuated * according to the degree to which they differ. The multiplier falls linearly from 1 * (calFactor = 1) to 0 at a deviation of about 1%. */ REAL_4 ScoreAttenuationFromCalfactor(REAL_4 calFactor, REAL_4 intScore) { REAL_4 newIntScore = 1; REAL_4 fudgeFactor; if(calFactor <= 0) /*calFactor = 0 was a flag for the output that no recalibration was done to the data due to a lack of sufficient calibrant ions.*/ return(intScore); if(calFactor > 1) { fudgeFactor = 1 / calFactor; } else { fudgeFactor = calFactor; } fudgeFactor = (fudgeFactor - 0.99) * 100; newIntScore = intScore * fudgeFactor; return(newIntScore); } /******************************ProlineInternalFrag************************************************ * * ProlineInternalFrag identifies internal fragment ions that contain proline at the * N-terminus. 
*/ void ProlineInternalFrag(REAL_4 *ionFound, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, INT_4 fragNum) { INT_4 i, j, k, intFrag, intFragMinErr, intFragPlusErr; INT_4 massDiff, precursor; REAL_4 currentIonFound; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; if(seqLength >= 4) /*Sequences shorter than 4 amino acids cannot have internal fragment ions.*/ { for(i = 1; i < (seqLength - 3); i++) /*This is the N-terminus of the fragment.*/ { if(sequence[i] == gGapList[P]) { for(j = (i + 1); j < (seqLength - 2); j++) /*C-terminus of the fragment.*/ { if(j < i + 3) /*just look at int frags less than 4 aa*/ { intFrag = gElementMass_x100[HYDROGEN]; /*Calc the mass of the int frag.*/ for(k = i; k <= j; k++) { intFrag += sequence[k]; } intFragMinErr = intFrag - gToleranceWide; intFragPlusErr = intFrag + gToleranceWide; if(intFragMinErr < precursor) /*Only count those matches where the internal frag mass is less than the precursor m/z value.*/ { k = 0; /*Look for this ion. The bounds check comes first so fragMOverZ[fragNum] is never read.*/ while(k < fragNum && fragMOverZ[k] <= intFragPlusErr) { if(fragMOverZ[k] >= intFragMinErr) { currentIonFound = ionFound[k]; massDiff = abs(intFrag - fragMOverZ[k]); ionFound[k] = CalcIonFound(ionFound[k], massDiff); /*Attenuate the internal ion intensity.*/ ionFound[k] = ionFound[k] * INTERNAL_FRAG_MULTIPLIER ; if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } } k++; } } } /*if(j < i + 3)*/ } /*for(j = (i + 1); j < (seqLength - 2); j++)*/ } /*if(sequence[i] == gGapList[P])*/ } /*for(i = 1; i < (seqLength - 3); i++)*/ } /*if(seqLength >= 4)*/ return; } /**************************Recalibrate************************************************ * * If qtof data is used and if the qtofErr feature is employed, then the mass data * is recalibrated according to the errors found between the calculated y and b * ions and the original data. The original mass values are restored at the end * of the scoring loop. 
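A minimal sketch of the correction-factor step, assuming the calculated and
observed m/z values are already paired by index (0 marks an unmatched ion).
The name WeightedCorrectionFactor and its parameters are hypothetical and not
part of Lutefisk. The sketch leaves out the standard-deviation outlier filter
and the final +/- 1 unit offset that Recalibrate applies.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper, not part of Lutefisk: intensity-weighted mean of the
// calculated/observed m/z ratios. Returns 0 when fewer than two usable points
// exist (the "no recalibration" flag) and 1 when the averaged factor would
// exceed a 500 ppm correction.
double WeightedCorrectionFactor(const int *calculated, const int *observed,
                                const int *intensity, size_t n)
{
    double weightedSum = 0.0, totalIntensity = 0.0;
    size_t used = 0;

    assert(n == 0 || (calculated && observed && intensity));
    for (size_t i = 0; i < n; i++) {
        if (calculated[i] == 0 || observed[i] == 0) continue; // unmatched ion
        double ratio = (double)calculated[i] / (double)observed[i];
        if (ratio <= 0.9995 || ratio >= 1.0005) continue;     // beyond 500 ppm
        weightedSum += ratio * intensity[i];
        totalIntensity += intensity[i];
        used++;
    }
    if (used < 2 || totalIntensity == 0.0) return 0.0;

    double factor = weightedSum / totalIntensity;
    if (factor > 1.0005 || factor < 0.9995) factor = 1.0;
    return factor;
}
```

A return value of 0 plays the same role as in Recalibrate: too few calibrant
ions were available, so no correction should be applied.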
* */ REAL_4 Recalibrate(INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, INT_4 *fragIntensity) { INT_4 i, j, k, bCal, yCal, errNum; INT_4 bCalStart, yCalStart; INT_4 bIonMass, bIonMassMinErr, bIonMassPlusErr; INT_4 yIonMass, yIonMassMinErr, yIonMassPlusErr; INT_4 *ionFound; INT_4 yCalCorrection = 0; INT_4 bCalCorrection = 0; INT_4 chargeLimit = 1; REAL_8 *byError, stDev, avCorrectionFactor; REAL_4 precursor, avIntensity; REAL_4 totalIntensity = 0; REAL_4 totalErrorIntensity = 0; REAL_4 ratio, offSet; char test; /* Initialize.*/ precursor = (gParam.peptideMW + gParam.chargeState * gElementMass_x100[HYDROGEN]) / gParam.chargeState; /*For +3 precursors, only look at +1 and +2 fragment ions.*/ if(gParam.chargeState > 1) { chargeLimit = gParam.chargeState - 1; } stDev = 0; ionFound = (int *) malloc(MAX_ION_NUM * sizeof(INT_4)); if(ionFound == NULL) { printf("Recalibrate: out of memory.\n"); exit(1); } for(i = 0; i < MAX_ION_NUM; i++) { ionFound[i] = 0; } byError = (double *) malloc(MAX_ION_NUM * sizeof(REAL_8)); if(byError == NULL) { printf("Recalibrate: out of memory.\n"); exit(1); } for(i = 0; i < MAX_ION_NUM; i++) { byError[i] = 0; } avIntensity = 0; for(i = 0; i < fragNum; i++) { avIntensity += fragIntensity[i]; totalIntensity += fragIntensity[i]; } if(fragNum == 0) { printf("Recalibrate: fragNum = 0\n"); exit(1); } avIntensity = avIntensity / fragNum; /* Initialize the starting b ion mass (acetylated, etc). */ bCalStart = gParam.modifiedNTerm + 0.5; /*Determine the correction factor for high accuracy*/ i = gParam.modifiedNTerm * 10 + 0.5; bCalCorrection = i - bCalStart * 10; /* Initialize the starting mass for y ions. 
*/ yCalStart = (2 * gElementMass_x100[HYDROGEN]) + gParam.modifiedCTerm + 0.5; i = (gParam.modifiedCTerm + (2 * gElementMass[HYDROGEN] * gMultiplier)) * 10 + 0.5; yCalCorrection = i - yCalStart * 10; for(i = (seqLength - 1); i > 0; i--) /*Don't do this loop for i = 0 (doesn't make sense).*/ { /* Calculate the singly charged y ion mass. */ yCal = YCalculator(i, sequence, seqLength, yCalStart, yCalCorrection); /* Calculate the singly charged b ion mass. */ bCal = BCalculator(i, sequence, bCalStart, bCalCorrection); for(j = 1; j <= chargeLimit; j++) { /*bIonMass = bCal;*/ bIonMass = (bCal + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; bIonMassMinErr = bIonMass - gParam.fragmentErr; bIonMassPlusErr = bIonMass + gParam.fragmentErr; /*yIonMass = yCal;*/ yIonMass = (yCal + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; yIonMassMinErr = yIonMass - gParam.fragmentErr; yIonMassPlusErr = yIonMass + gParam.fragmentErr; /* Search for b ions.*/ if(j == 1) /*Only look for +1 b ions*/ { k = fragNum - 1; /*Check k >= 0 first so fragMOverZ[-1] is never read.*/ while(k >= 0 && fragMOverZ[k] >= bIonMassMinErr) { if(fragMOverZ[k] <= bIonMassPlusErr) { if(ionFound[k] == 0) { ionFound[k] = bIonMass; } else { ionFound[k] = 0; /*if both b and y ions match to same ion*/ } } k--; } } /* Search for the y ion values.*/ test = TRUE; /*Only look for multiply charged y ions greater than precursor*/ if(j > 1) { if(yIonMass <= precursor + gParam.fragmentErr) { test = FALSE; } } if(test) { k = fragNum - 1; /*Check k >= 0 first so fragMOverZ[-1] is never read.*/ while(k >= 0 && fragMOverZ[k] >= yIonMassMinErr) { if(fragMOverZ[k] <= yIonMassPlusErr) { if(ionFound[k] == 0) { ionFound[k] = yIonMass ; } else { ionFound[k] = 0; /*if both b and y ions match to same ion*/ } } k--; } } /*if(test)*/ } /*for j*/ } /*for i*/ /* Determine the calibration slope change*/ /* First find ions that are either above the precursor mass or, if below it, of unusually high intensity. Don't use ions of unusually low abundance. 
*/ avCorrectionFactor = 0; errNum = 0; for(i = 0; i < fragNum; i++) { if(ionFound[i] != 0 && fragMOverZ[i] != 0 && (fragMOverZ[i] > precursor || fragIntensity[i] > 2 * avIntensity) && fragIntensity[i] > avIntensity / 4) { ratio = (REAL_4)ionFound[i] / (REAL_4)fragMOverZ[i]; if(ratio > 0.9995 && ratio < 1.0005) { byError[i] = ratio; errNum++; avCorrectionFactor += byError[i]; } else { byError[i] = 100; } } else { byError[i] = 100; } } if(errNum < 3) /*if less than three points for determining correction, then dont recalibrate*/ { free(byError); free(ionFound); return(0); /*0 is returned, but is not applied to data*/ } /* Eliminate any outliers.*/ stDev = StandardDeviationOfTheBYErrors(byError, fragNum); /*stDev = stDev * 2;*/ avCorrectionFactor = avCorrectionFactor / errNum; if(stDev != 0) { for(i = 0; i < fragNum; i++) { if(byError[i] != 100) { if(byError[i] < avCorrectionFactor - stDev || byError[i] > avCorrectionFactor + stDev) { byError[i] = 100; } } } } /* Determine the correction factor.*/ avCorrectionFactor = 0; errNum = 0; for(i = 0; i < fragNum; i++) { if(byError[i] < 100) { avCorrectionFactor += (byError[i] * fragIntensity[i]); totalErrorIntensity += fragIntensity[i]; errNum++; } } if(errNum < 2) /*if less than two points for determining correction, then dont recalibrate*/ { free(byError); free(ionFound); return(0); /*0 is returned, but is not applied to data*/ } if(totalErrorIntensity == 0) { printf("Recalibrate: totalErrorIntensity = 0\n"); exit(1); } avCorrectionFactor = avCorrectionFactor / totalErrorIntensity; /* Don't use correction factors greater than 500 ppm.*/ if(avCorrectionFactor > 1.0005 || avCorrectionFactor < 0.9995) { avCorrectionFactor = 1; } /* Apply the correction factor to the masses.*/ for(i = 0; i < fragNum; i++) { fragMOverZ[i] = fragMOverZ[i] * avCorrectionFactor; if(ionFound[i] != 0) { byError[i] = ionFound[i] - fragMOverZ[i]; } else { byError[i] = 100; } } /* Determine the slope offset and apply*/ offSet = 0; errNum = 0; for(i = 
0; i < fragNum; i++) { if(byError[i] != 100) { offSet += byError[i]; errNum++; } } if(offSet > 0) { offSet = offSet / errNum + 0.5; } else { offSet = offSet / errNum - 0.5; } /*this offset is only for very minor adjustments and limited to +/- 1 unit of mass*/ if(offSet >= 1) { offSet = 1; } else if(offSet <= -1) { offSet = -1; } else { offSet = 0; } for(i = 0; i < fragNum; i++) { fragMOverZ[i] = fragMOverZ[i] + offSet; } free(ionFound); free(byError); return(avCorrectionFactor); } /**********************************IsThisADuplicate************************************ * * Find out if this sequence is similar to existing sequences that have been scored * and stored (ie, same intOnlyScore and where elements of the sequence are identical * or within tolerance). */ char IsThisADuplicate(struct SequenceScore *firstScorePtr, INT_4 *sequence, REAL_4 intOnlyScore, REAL_4 intScore, INT_4 seqLength) { struct SequenceScore *currPtr; INT_4 i, j, k, m, n; char test = TRUE; currPtr = firstScorePtr; while(currPtr != NULL) { test = TRUE; j = intOnlyScore * 1000 + 0.5; k = (currPtr->intensityOnlyScore) * 1000 + 0.5; m = intScore * 1000 + 0.5; n = (currPtr->intensityScore) * 1000 + 0.5; if(j == k && m == n) { i = 0; while(currPtr->peptide[i] != 0) { i++; } if(i == seqLength) { for(i = 0; i < seqLength; i++) { if(sequence[i] < currPtr->peptide[i] - gToleranceNarrow || sequence[i] > currPtr->peptide[i] + gToleranceNarrow) { test = FALSE; } } if(test) { test = FALSE; return(test); } } } currPtr = currPtr->next; } test = TRUE; /*if it made it this far, then its not a duplicate*/ return(test); } /***********************************GoodSequence*************************************** * * This function determines if the "new" sequence is just a repeat of an old sequence, * and it also makes sure that the mass of the sequence equals the peptide mass. 
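A minimal sketch of the mass check only, assuming every value is already in
the same scaled integer units. MassWithinTolerance and its parameters are
hypothetical names, not part of Lutefisk, and the sketch ignores the
next-decimal-place corrections that GoodSequence carries along.

```c
#include <assert.h>

// Hypothetical helper, not part of Lutefisk: returns 1 when the summed
// residue masses plus both terminal groups fall within peptideErr of
// peptideMW, otherwise 0.
int MassWithinTolerance(const int *residues, int count, int terminiMass,
                        int peptideMW, int peptideErr)
{
    assert(count >= 0 && peptideErr >= 0);
    long total = terminiMass;
    for (int i = 0; i < count; i++) total += residues[i];
    return total >= (long)peptideMW - peptideErr &&
           total <= (long)peptideMW + peptideErr;
}
```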
*/ char GoodSequence(struct Sequence *firstSequencePtr, INT_4 *peptide, INT_4 *aaCorrection, INT_4 peptideLength) { INT_4 peptideMass, peptideMassCorrection, i, j; char test = TRUE; /*test is assumed to be true, unless proven otherwise, and this is the returned value*/ struct Sequence *checkPtr; checkPtr = firstSequencePtr; while(checkPtr != NULL && test) { test = FALSE; for(i = 0; i < checkPtr->peptideLength; i++) { if(peptide[i] != checkPtr->peptide[i]) { test = TRUE; } } checkPtr = checkPtr->next; } if(test) /*verify that the sequences matches the peptide molecular weight.*/ { peptideMass = gParam.modifiedNTerm + 0.5; /*Determine the correction factor for high accuracy*/ i = gParam.modifiedNTerm * 10 + 0.5; peptideMassCorrection = i - peptideMass * 10; peptideMass += (gParam.modifiedCTerm + 0.5); i = gParam.modifiedCTerm * 10 + 0.5; j = gParam.modifiedCTerm + 0.5; j *= 10; peptideMassCorrection += (i - j); for(i = 0; i < peptideLength; i++) { peptideMass += peptide[i]; peptideMassCorrection += aaCorrection[i]; if(peptideMassCorrection >= 10) { peptideMass += 1; peptideMassCorrection -= 10; } if(peptideMassCorrection <= -10) { peptideMass -= 1; peptideMassCorrection += 10; } } if(peptideMass < gParam.peptideMW - gParam.peptideErr || peptideMass > gParam.peptideMW + gParam.peptideErr) { test = FALSE; } } return(test); } /*****************************ExpandSequences******************************************* * * This function takes the final list of completed sequences and (for qtof only) expands * the number of sequences depending on the other members of the updated gGapList that are * within the fragment ion tolerance that is set wide during the subsequence buildup. Later * these sequences (the original ones plus the new ones based upon the old sequences) are scored * using tighter qtof error tolerances and the data is recalibrated for each sequence. 
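The expansion amounts to visiting every combination of the per-position
choices. A minimal sketch of that enumeration, written as an iterative
odometer rather than the recursive Ratchet used in this file. NextCombination
and CountCombinations are hypothetical names, not part of Lutefisk, and every
counts[i] is assumed to be at least 1.

```c
#include <assert.h>

// Hypothetical odometer step, not part of Lutefisk: advance idx[] to the next
// combination, where position i ranges over 0..counts[i]-1. Returns 1 while a
// new combination is available and 0 once all of them have been visited.
int NextCombination(const int *counts, int *idx, int length)
{
    assert(length > 0);
    for (int pos = 0; pos < length; pos++) {
        idx[pos]++;
        if (idx[pos] < counts[pos]) return 1; // no carry needed
        idx[pos] = 0;                         // roll over and carry
    }
    return 0; // carried past the last position: enumeration finished
}

// Count the combinations, including the all-zero starting combination.
int CountCombinations(const int *counts, int length)
{
    int idx[32] = {0};
    int total = 1;
    assert(length <= 32);
    while (NextCombination(counts, idx, length)) total++;
    return total;
}
```

Ratchet reaches the all-zero combination differently: the caller sets
sequence[0] to -1 so that the first call produces it.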
*/ void ExpandSequences(struct Sequence *firstSequencePtr) { struct Sequence *currPtr = NULL; struct Sequence *stopPtr = NULL; struct Sequence *newPtr = NULL; struct Sequence *lastPtr = NULL; struct Sequence *checkPtr = NULL; struct Sequence *testPtr = NULL; /*debug*/ INT_4 i, j, peptideNum; INT_4 n = 1, p = 0, q = 0; /*debug*/ INT_4 sequenceMatrix[MAX_PEPTIDE_LENGTH][30], aaNum[MAX_PEPTIDE_LENGTH]; INT_4 sequence[MAX_PEPTIDE_LENGTH]; char stop = TRUE; char allNegative; char test; INT_4 cycle, countTheSeqs; INT_4 peptide[MAX_PEPTIDE_LENGTH], aaCorrection[MAX_PEPTIDE_LENGTH]; INT_4 peptideLength, score, nodeValue, gapNum; INT_2 nodeCorrection; currPtr = firstSequencePtr; if(currPtr == NULL) { printf("There are no sequences in ExpandSequences.\n"); exit(1); } while(currPtr->next != NULL) { currPtr = currPtr->next; } stopPtr = currPtr; lastPtr = currPtr; currPtr = firstSequencePtr; while(stop) { if(currPtr == stopPtr) /*Signal that the last of the original sequences is being examined.*/ stop = FALSE; /*initialize variables.*/ for(i = 0; i < MAX_PEPTIDE_LENGTH; i++) { aaNum[i] = 0; sequence[i] = 0; } for(i = 0; i < MAX_PEPTIDE_LENGTH; i++) { for(j = 0; j < 30; j++) { sequenceMatrix[i][j] = 0; } } /* Determine the different residue possibilities for each position in the sequence.*/ for(i = 0; i < currPtr->peptideLength; i++) { for(j = 0; j <= gGapListIndex; j++) { if(currPtr->peptide[i] <= gGapList[j] + gParam.fragmentErr && currPtr->peptide[i] >= gGapList[j] - gParam.fragmentErr) { test = FALSE; sequenceMatrix[i][aaNum[i]] = j; aaNum[i] = aaNum[i] + 1; if(aaNum[i] >= 30) { printf("LutefiskScore: Number of residues in sequenceMatrix exceeds 30.\n"); exit(1); } } } if(aaNum[i] == 0) { gGapListIndex++; gGapList[gGapListIndex] = currPtr->peptide[i]; gNodeCorrection[gGapListIndex] = 0; sequenceMatrix[i][aaNum[i]] = gGapListIndex; aaNum[i] += 1; } n *= aaNum[i]; } if(gParam.proteolysis == 'T') /*If the C-terminal amino acid is close to R or K, then force it for tryptics*/ { 
test = FALSE; for(i = 0; i < aaNum[(currPtr->peptideLength) - 1]; i++) { if(gGapList[sequenceMatrix[(currPtr->peptideLength)-1][i]] <= gGapList[K] + gParam.fragmentErr && gGapList[sequenceMatrix[(currPtr->peptideLength)-1][i]] >= gGapList[K] - gParam.fragmentErr) { test = TRUE; } } if(test) { aaNum[(currPtr->peptideLength)-1] = 1; sequenceMatrix[(currPtr->peptideLength)-1][0] = K; } test = FALSE; for(i = 0; i < aaNum[(currPtr->peptideLength)-1]; i++) { if(gGapList[sequenceMatrix[(currPtr->peptideLength)-1][i]] <= gGapList[R] + gParam.fragmentErr && gGapList[sequenceMatrix[(currPtr->peptideLength)-1][i]] >= gGapList[R] - gParam.fragmentErr) { test = TRUE; } } if(test) { aaNum[(currPtr->peptideLength)-1] = 1; sequenceMatrix[(currPtr->peptideLength)-1][0] = R; } } /* Test that there won't be too many sequences made. If too many, then eliminate all multiple choices for each position, except for Q/K and m/F.*/ peptideNum = 1; for(i = 0; i < currPtr->peptideLength; i++) { peptideNum *= aaNum[i]; } if((peptideNum > 250 && gCorrectMass) || (peptideNum > 50 && !gCorrectMass)) /*100 is arbitrary gAminoAcidNumber*/ { /*anything thats not a single aa is made negative*/ for(i = 0; i < currPtr->peptideLength; i++) { if(aaNum[i] > 1) { for(j = 0; j < aaNum[i]; j++) { if(sequenceMatrix[i][j] >= gAminoAcidNumber) { sequenceMatrix[i][j] = -1 * sequenceMatrix[i][j]; } } } } /*if all are negative then the first is made positive*/ for(i = 0; i < currPtr->peptideLength; i++) { if(aaNum[i] > 1) { allNegative = TRUE; for(j = 0; j < aaNum[i]; j++) { if(sequenceMatrix[i][j] > 0) { allNegative = FALSE; } } if(allNegative) { sequenceMatrix[i][0] = -1 * sequenceMatrix[i][0]; } } } /*eliminate negatives*/ for(i = 0; i < currPtr->peptideLength; i++) { for(j = 0; j < aaNum[i]; j++) { if(sequenceMatrix[i][j] < 0) break; } aaNum[i] = j; } } /* Create and store the new sequences.*/ for(i = 0; i < MAX_PEPTIDE_LENGTH; i++) { sequence[i] = 0; /*initialize*/ } sequence[0] = -1; cycle = 0; /*initialize*/ 
while(Ratchet(aaNum, cycle, sequence, currPtr->peptideLength)) { for(i = 0; i < currPtr->peptideLength; i++) { peptide[i] = gGapList[sequenceMatrix[i][sequence[i]]]; aaCorrection[i] = gNodeCorrection[sequenceMatrix[i][sequence[i]]]; } peptideLength = currPtr->peptideLength; score = currPtr->score; nodeValue = currPtr->nodeValue; nodeCorrection = currPtr->nodeCorrection; gapNum = currPtr->gapNum; test = GoodSequence(firstSequencePtr, peptide, aaCorrection, peptideLength); if(test) /*if different from the original, then store it as a new addition*/ { /*debug*/ /* p = 0; testPtr = firstSequencePtr; while(testPtr != NULL) { p++; testPtr = testPtr->next; }*/ newPtr = LoadSequenceStruct(peptide, peptideLength, score, nodeValue, gapNum, nodeCorrection); /*debug*/ /* p = 0; testPtr = firstSequencePtr; while(testPtr != NULL) { p++; testPtr = testPtr->next; }*/ lastPtr->next = newPtr; lastPtr = newPtr; } } currPtr = currPtr->next; } /* Count the sequences in the list again. */ countTheSeqs = 0; currPtr = firstSequencePtr; while(currPtr != NULL) { countTheSeqs += 1; currPtr = currPtr->next; } if(gParam.fMonitor && gCorrectMass) { printf("Sequences expanded to %5ld for qtof score.\n", (long)countTheSeqs); /*cast so the argument matches %ld whatever INT_4 is*/ } return; } /*****************************Ratchet************************************************ * * This function is similar to RatchetIt that was used for incorporating Edman data. * It is used to alter the values in the array sequence. * */ char Ratchet(INT_4 *aaNum, INT_4 cycle, INT_4 *sequence, INT_4 seqLength) { char test = TRUE; sequence[cycle]++; if(sequence[cycle] >= aaNum[cycle] && cycle <= seqLength) { sequence[cycle] = 0; cycle++; if(cycle > seqLength) { test = FALSE; return(test); } test = Ratchet(aaNum, cycle, sequence, seqLength); } return(test); } /*************************AddToGapList**************************************** * * If a "dipeptide" from a sequence is actually due to three amino acids or more, then * it will not be found in the gGapList. 
To make this work, I need to add any * of these strange "dipeptides" to gGapList. */ void AddToGapList(struct Sequence *firstSequencePtr) { struct Sequence *currPtr; INT_4 i, j; char test; /*see if any "amino acids" in the sequence do not match any of the gGapList entries*/ currPtr = firstSequencePtr; while(currPtr != NULL) { for(i = 0; i < currPtr->peptideLength; i++) { test = TRUE; for(j = 0; j <= gGapListIndex; j++) { /*if(currPtr->peptide[i] <= gGapList[j] + gParam.fragmentErr && currPtr->peptide[i] >= gGapList[j] - gParam.fragmentErr)*/ if(currPtr->peptide[i] == gGapList[j]) { test = FALSE; break; } } if(test) /*if not found, then add this to the list*/ { if(gGapListIndex < MAX_GAPLIST - 1) { gGapListIndex++; gGapList[gGapListIndex] = currPtr->peptide[i]; gNodeCorrection[gGapListIndex] = 0; } } } currPtr = currPtr->next; } return; } /*****************************MakeNewgGapList**************************************** * * This function makes a new gGapList where Q and K are differentiated, and the position for * 'I' becomes 'm' and represents oxidized Met. Dipeptides are all listed unless they are exactly * the same mass (isomeric). A corresponding gNodeCorrection is used for all gGapList members * which represents the value that is added to gGapList to obtain the value of the next decimal * place. 
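A minimal sketch of how a mass splits into a gGapList-style integer and a
gNodeCorrection-style digit, following the rounding used in the function
below. ScaledMass and SplitMass are hypothetical names, not part of Lutefisk.

```c
#include <assert.h>

// Hypothetical illustration, not part of Lutefisk.
typedef struct {
    int mass;       // mass rounded at the working precision (mass * multiplier)
    int correction; // signed value for the next decimal place
} ScaledMass;

ScaledMass SplitMass(double monoMass, int multiplier)
{
    ScaledMass out;
    assert(monoMass >= 0.0 && multiplier > 0);
    double scaled = monoMass * multiplier;
    out.mass = (int)(scaled + 0.5);                // round to nearest integer
    double diff = scaled * 10.0 - out.mass * 10.0; // remainder, one more digit
    out.correction = (int)(diff >= 0 ? diff + 0.5 : diff - 0.5);
    return out;
}
```

For the oxidized Met mass of 147.0354 at a multiplier of 100 this gives a mass
of 14704 and a correction of -5, so mass * 10 + correction is 147035, that is
147.035 Da.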
*/ void MakeNewgGapList() { INT_4 i, j, k; INT_4 lysGlnDiff = (gMonoMass_x100[K] - gMonoMass_x100[Q]) * 1.5; INT_4 metPheDiff; INT_4 sum, massToAdd, absentFlag; REAL_4 correction; char delAmAcid, duplicateFlag; gMonoMass[I] = 147.0354; /*Reset the 'I' position to represent oxidized Met*/ gAvMass[I] = 147.193; gNomMass[I] = 147; gSingAA[I] = 'm'; gMonoMass_x100[I] = (gMonoMass[I] * gMultiplier) + 0.5; metPheDiff = (gMonoMass_x100[F] - gMonoMass_x100[I]) * 1.5; for(i = 0; i < MAX_GAPLIST; i++) { gGapList[i] = 0; } gGapListIndex = -1; for(i = 0; i < gAminoAcidNumber; i++) /*Copy the single aa extension masses.*/ { absentFlag = FALSE; if(gParam.aaAbsent[0] != '*') /* Check to see if the AA is on the absent list. */ { /* (We won't add it to the gap list if it is.) */ delAmAcid = gParam.aaAbsent[0]; j = 0; while(delAmAcid != 0 && (delAmAcid >= 'A' && delAmAcid <= 'Y')) { if(gSingAA[i] == delAmAcid) { absentFlag = TRUE; break; } j++; delAmAcid = gParam.aaAbsent[j]; } } if(absentFlag || (i == I && gParam.qtofErr >= metPheDiff) || (i == Q && gParam.qtofErr >= lysGlnDiff)) /* Ile and Gln, which are represented by Leu and Lys.*/ { massToAdd = 0; correction = 0; } else if(i == C && gParam.cysMW != 0) { massToAdd = (gParam.cysMW) + 0.5; /*Change the mass for cysteine (in case its alkylated).*/ correction = (gMonoMass[C] * gMultiplier * 10) - (gMonoMass_x100[C] * 10); } else { massToAdd = gMonoMass_x100[i]; correction = (gMonoMass[i] * gMultiplier * 10) - (gMonoMass_x100[i] * 10); } gGapList[i] = massToAdd; if(correction >= 0) { gNodeCorrection[i] = correction + 0.5; } else { gNodeCorrection[i] = correction - 0.5; } } gGapListIndex = gAminoAcidNumber - 1; for(i = 0; i < gAminoAcidNumber; i++) /*Fill in the masses of the 2 AA extensions.*/ { for(j = i; j < gAminoAcidNumber; j++) { if(gGapList[i] == 0 || gGapList[j] == 0) continue; sum = gGapList[i] + gGapList[j]; /*sum = ((gMonoMass[i] + gMonoMass[j]) * gMultiplier) + 0.5;*/ correction = ((gMonoMass[i] + gMonoMass[j]) * 
gMultiplier * 10) - ((gMonoMass_x100[i] + gMonoMass_x100[j]) * 10); /*correction = (((gMonoMass[i] + gMonoMass[j]) * gMultiplier) * 10) - (sum * 10);*/ duplicateFlag = FALSE; for(k = 0; k <= gGapListIndex; k++) { if(gGapList[k] == sum && (gNodeCorrection[k] <= correction + 1 && gNodeCorrection[k] >= correction - 1)) { /* We already have this mass so don't add it to the list. */ duplicateFlag = TRUE; break; } } if(!duplicateFlag) { gGapListIndex = gGapListIndex + 1; gGapList[gGapListIndex] = sum; if(correction >= 0) { gNodeCorrection[gGapListIndex] = correction + 0.5; } else { gNodeCorrection[gGapListIndex] = correction - 0.5; } } } } for(i = 0; i <= gGapListIndex; i++) { if(gNodeCorrection[i] >= 10) { gNodeCorrection[i] = gNodeCorrection[i] - 10; gGapList[i] = gGapList[i] + 1; } else if(gNodeCorrection[i] <= -10) { gNodeCorrection[i] = gNodeCorrection[i] + 10; gGapList[i] = gGapList[i] - 1; } } return; } /*****************************SequenceLengthCalc************************************* * * Calculate the actual sequence length by assuming that any gaps are due to two * amino acids rather than counting them as one. */ INT_4 SequenceLengthCalc(INT_4 *sequence, INT_4 seqLength) { INT_4 realSeqLength, i, j; char test; realSeqLength = 0; for(i = 0; i < seqLength; i++) { test = FALSE; /*test = FALSE when its not a single amino acid, or a two aa extension w/ Pro*/ for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] == gGapList[j]) { test = TRUE; break; } if(sequence[i] <= gProPlus[j] + gToleranceNarrow && sequence[i] >= gProPlus[j] - gToleranceNarrow) { test = TRUE; break; } } if(i == 0) /*don't penalize for a gap at the n-terminal position*/ { test = TRUE; } if(test) { realSeqLength++; } else { realSeqLength += 2; } } return(realSeqLength); } /*****************************SequenceLengthCalc************************************* * * Calculate the actual sequence length by assuming that any gaps are due to two * amino acids rather than counting them as one. 
*/ INT_4 SequenceLengthCalcNoFudge(INT_4 *sequence, INT_4 seqLength) { INT_4 realSeqLength, residueNumGuess, i, j; REAL_4 averageResidueMass = AV_RESIDUE_MASS * gMultiplier; char test; realSeqLength = 0; for(i = 0; i < seqLength; i++) { test = FALSE; /*test = FALSE when its not a single amino acid, or a two aa extension w/ Pro*/ for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] == gGapList[j]) { test = TRUE; break; } } if(test) { realSeqLength++; } else { for(j = gAminoAcidNumber; j < gGapListIndex; j++) { if(sequence[i] == gGapList[j]) { test = TRUE; break; } } if(test) { realSeqLength += 2; } else { residueNumGuess = ((REAL_4)sequence[i] / averageResidueMass) + 0.5; realSeqLength = realSeqLength + residueNumGuess; } } } return(realSeqLength); } /***************************** BoostTheCTerminals ************************************ * * For tryptic peptides, the +1 y ions for C-terminal Lys or Arg are boosted for QTof * and quadrupole data for +1 or +2 precursors. For LCQ data, the high mass b ions for * cleavage of Lys or Arg are also boosted. The same boosting is done for C-terminal * Lys for Lys-C peptides, Asp or Glu for Glu-C peptides, and Asp for N-terminal Asp-N * peptides. */ void BoostTheCTerminals(struct MSData *firstMassPtr) { struct MSData *currPtr; INT_4 yArg, yLys, bArg, bLys, yGlu, yAsp, bGlu, bAsp; INT_4 bCalStart, yCalStart; INT_4 ionNum = 0; REAL_4 cTerminalBoost = 2.5; /*Boost ion intensity by this amount.*/ REAL_4 averageIntensity = 0; REAL_4 lcqBIonIntensity = 0; REAL_4 mToAFactor, argInt, lysInt; if(gParam.chargeState > 2) return; /*don't do anything for precursor charges greater than 2*/ /* Find the average intensity of ions in the list, and calculate the lcqBIonIntensity value, which will be twice the average. 
This will be the new intensity for high mass b ions for loss of K or R in lcq data.*/ currPtr = firstMassPtr; while(currPtr != NULL) { averageIntensity += currPtr->intensity; ionNum++; currPtr = currPtr->next; } if(ionNum == 0) { printf("BoostTheCTerminals: ionNum = 0"); exit(1); } averageIntensity = averageIntensity / ionNum; lcqBIonIntensity = averageIntensity * 2; /* Initialize the starting b ion mass. */ bCalStart = gParam.peptideMW - (gParam.modifiedCTerm + 0.5); /*Correct for average masses.*/ if(bCalStart > gParam.monoToAv) { mToAFactor = 0; } else { if(bCalStart >= (gParam.monoToAv - gAvMonoTransition)) { mToAFactor = (gParam.monoToAv - bCalStart) / gAvMonoTransition; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(bCalStart >= (gParam.monoToAv - gAvMonoTransition)) { bCalStart = bCalStart * mToAFactor; } /* Initialize the starting mass for y ions. */ yCalStart = gParam.modifiedCTerm + (2 * gElementMass_x100[HYDROGEN]) + 0.5; /*Correct for average masses.*/ if(yCalStart > gParam.monoToAv) { mToAFactor = 0; } else { if(yCalStart >= (gParam.monoToAv - gAvMonoTransition)) { mToAFactor = (gParam.monoToAv - yCalStart) / gAvMonoTransition; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(yCalStart >= (gParam.monoToAv - gAvMonoTransition)) { yCalStart = yCalStart * mToAFactor; } /* Boost ions for tryptic peptides.*/ if(gParam.proteolysis == 'T') { yArg = yCalStart + gMonoMass_x100[R]; yLys = yCalStart + gMonoMass_x100[K]; bArg = bCalStart - gMonoMass_x100[R]; bLys = bCalStart - gMonoMass_x100[K]; /*Look to see if both y1 ions are present and pick the most abundant one to boost*/ argInt = 0; lysInt = 0; currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= yArg + gParam.fragmentErr && currPtr->mOverZ >= yArg - gParam.fragmentErr) { argInt = currPtr->intensity; } if(currPtr->mOverZ <= yLys + gParam.fragmentErr && currPtr->mOverZ >= yLys - gParam.fragmentErr) { 
lysInt = currPtr->intensity; } currPtr = currPtr->next; } if(lysInt != 0 && argInt != 0) { if(lysInt > 2 * argInt) { argInt = 0; } if(argInt > 2 * lysInt) { lysInt = 0; } } /*Arg y ion*/ if(argInt != 0) { currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= yArg + gParam.fragmentErr && currPtr->mOverZ >= yArg - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } } /*Lys y ion*/ if(lysInt != 0) { currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= yLys + gParam.fragmentErr && currPtr->mOverZ >= yLys - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } } if(gParam.fragmentPattern == 'L') { /*Look to see if both b ions are present and pick the most abundant one to boost*/ argInt = 0; lysInt = 0; currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= bArg + gParam.fragmentErr && currPtr->mOverZ >= bArg - gParam.fragmentErr) { argInt = currPtr->intensity; } if(currPtr->mOverZ <= bLys + gParam.fragmentErr && currPtr->mOverZ >= bLys - gParam.fragmentErr) { lysInt = currPtr->intensity; } currPtr = currPtr->next; } if(lysInt != 0 && argInt != 0) { if(lysInt > 2 * argInt) { argInt = 0; } if(argInt > 2 * lysInt) { lysInt = 0; } } /*Arg b ion*/ if(argInt != 0) { currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= bArg + gParam.fragmentErr && currPtr->mOverZ >= bArg - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } } /*Lys b ion*/ if(lysInt != 0) { currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= bLys + gParam.fragmentErr && currPtr->mOverZ >= bLys - 
gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } } } } /* Boost ions for Lys-C peptides.*/ if(gParam.proteolysis == 'K') { yLys = yCalStart + gMonoMass_x100[K]; bLys = bCalStart - gMonoMass_x100[K]; /*Lys y ion*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= yLys + gParam.fragmentErr && currPtr->mOverZ >= yLys - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } if(gParam.fragmentPattern == 'L') { /*Lys b ion*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= bLys + gParam.fragmentErr && currPtr->mOverZ >= bLys - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } } } /* Boost ions for Glu-C peptides.*/ if(gParam.proteolysis == 'E') { yGlu = yCalStart + gMonoMass_x100[E]; yAsp = yCalStart + gMonoMass_x100[D]; bGlu = bCalStart - gMonoMass_x100[E]; bAsp = bCalStart - gMonoMass_x100[D]; /*Glu y ion*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= yGlu + gParam.fragmentErr && currPtr->mOverZ >= yGlu - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } /*Asp y ion*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= yAsp + gParam.fragmentErr && currPtr->mOverZ >= yAsp - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } if(gParam.fragmentPattern == 'L') { /*Glu b ion*/ 
currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= bGlu + gParam.fragmentErr && currPtr->mOverZ >= bGlu - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } /*Asp b ion*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= bAsp + gParam.fragmentErr && currPtr->mOverZ >= bAsp - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } } } /* Boost ions for Asp-N peptides.*/ if(gParam.proteolysis == 'D') { /*Initialize the starting b ion mass (acetylated, etc).*/ bCalStart = gParam.modifiedNTerm + 0.5; /*Correct for average masses.*/ if(bCalStart > gParam.monoToAv) { mToAFactor = 0; } else { if(bCalStart >= (gParam.monoToAv - gAvMonoTransition)) { mToAFactor = (gParam.monoToAv - bCalStart) / gAvMonoTransition; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(bCalStart >= (gParam.monoToAv - gAvMonoTransition)) { bCalStart = bCalStart * mToAFactor; } /*Initialize the starting y ion mass.*/ yCalStart = gParam.peptideMW + (2 * gElementMass_x100[HYDROGEN]) - (gParam.modifiedNTerm + 0.5); /*Correct for average masses.*/ if(yCalStart > gParam.monoToAv) { mToAFactor = 0; } else { if(yCalStart >= (gParam.monoToAv - gAvMonoTransition)) { mToAFactor = (gParam.monoToAv - yCalStart) / gAvMonoTransition; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(yCalStart >= (gParam.monoToAv - gAvMonoTransition)) { yCalStart = yCalStart * mToAFactor; } yAsp = yCalStart - gMonoMass_x100[D]; bAsp = bCalStart + gMonoMass_x100[D]; /*Asp y ion*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= yAsp + gParam.fragmentErr && currPtr->mOverZ >= yAsp - gParam.fragmentErr) { 
currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } if(gParam.fragmentPattern == 'L') { /*Asp b ion*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ <= bAsp + gParam.fragmentErr && currPtr->mOverZ >= bAsp - gParam.fragmentErr) { currPtr->intensity = (currPtr->intensity) * cTerminalBoost; if(currPtr->intensity < averageIntensity) { currPtr->intensity = averageIntensity; } } currPtr = currPtr->next; } } } return; } /*********************** StandardDeviationOfTheBYErrors ****************************** * * For each peptide, the average error is determined and the standard deviation * around this average is determined. Smaller standard deviations of the error * indicate tighter agreement with the proposed sequence. */ REAL_4 StandardDeviationOfTheBYErrors(REAL_8 *byError, INT_4 fragNum) { REAL_8 averageError, diffSquared, sumOfDiffSquared, stDevErr; REAL_4 stDev; INT_4 errorNum, i; averageError = 0; errorNum = 0; for(i = 0; i < fragNum; i++) { if(byError[i] < 100) { averageError += byError[i]; errorNum++; } } if(errorNum < 2) { return(0); } averageError = averageError / errorNum; sumOfDiffSquared = 0; for(i = 0; i < fragNum; i++) { if(byError[i] < 100) { diffSquared = byError[i] - averageError; diffSquared = diffSquared * diffSquared; sumOfDiffSquared += diffSquared; } } if(errorNum <= 1) return(0); stDevErr = sumOfDiffSquared / (errorNum - 1); stDevErr = sqrt(stDevErr); stDev = stDevErr; /*convert to a REAL_4 to return*/ return(stDev); } /*************************** AssignError ********************************************* * * Assigns plus/negative error to ions found as b or y ions. If an ion has been found * as both b and y, then the lower error is saved. 
*/
REAL_4 AssignError(REAL_4 currentError, INT_4 calculatedMass, INT_4 observedMass)
{
    REAL_4 newError;
    INT_4 oldMassDiff, newMassDiff;

    newMassDiff = abs(calculatedMass - observedMass);
    oldMassDiff = currentError;
    oldMassDiff = abs(oldMassDiff);
    newError = observedMass - calculatedMass;

    if(currentError != 100)
    {
        if(oldMassDiff < newMassDiff)
        {
            newError = currentError;
        }
    }

    return(newError);
}

/******************************* SingleAA *********************************************
*
*   Returns TRUE if the mass matches a single amino acid residue (within 0.4 mass
*   units, scaled by gMultiplier), and FALSE otherwise.
*/
char SingleAA(INT_4 aminoAcidMass)
{
    char test = FALSE;
    INT_4 i;
    INT_4 error = 0.4 * gMultiplier;

    for(i = 0; i < gAminoAcidNumber; i++)
    {
        if(aminoAcidMass <= gMonoMass_x100[i] + error &&
            aminoAcidMass >= gMonoMass_x100[i] - error)
        {
            test = TRUE;
            break;
        }
    }
    return(test);
}

/************************** RemoveRedundantSequences ***********************************
*
*   Finds pairs of sequences in which an entry of the shorter (or equal length) sequence
*   corresponds to the summed mass of consecutive entries in the other, and sets the score
*   of the redundant sequence to zero. If the sequence being checked came from a database
*   (gapNum == -100), the current sequence is discarded instead. All zero-score sequences
*   are then freed, and the head of the remaining list is returned.
*
*/
struct Sequence *RemoveRedundantSequences(struct Sequence *firstSequencePtr)
{
    struct Sequence *currPtr, *checkThisPtr, *freeMePtr, *previousPtr;
    INT_4 i, j, testMass, countTheSeqs;
    INT_4 error = 0.4 * gMultiplier; /*if error is big, then many unrelated seqs are removed*/
    char test;

    currPtr = firstSequencePtr;
    while(currPtr != NULL)
    {
        if(currPtr->score != 0)
        {
            checkThisPtr = firstSequencePtr->next;
            while(checkThisPtr != NULL)
            {
                if(checkThisPtr->peptideLength <= currPtr->peptideLength &&
                    checkThisPtr->score != 0 && checkThisPtr != currPtr)
                {
                    test = TRUE; /*is TRUE if the two sequences are similar*/
                    j = 0;
                    for(i = 0; i < checkThisPtr->peptideLength; i++)
                    {
                        if(j < currPtr->peptideLength)
                        {
                            if(currPtr->peptide[j] <= checkThisPtr->peptide[i] + error &&
                                currPtr->peptide[j] >= checkThisPtr->peptide[i] - error)
                            {
                                j++;
                            }
                            else
                            {
                                if(checkThisPtr->peptide[i] < currPtr->peptide[j])
                                {
                                    test = FALSE;
                                    break;
                                }
                                testMass = currPtr->peptide[j];
                                test = SingleAA(currPtr->peptide[j]);
                                while(j < currPtr->peptideLength && testMass < checkThisPtr->peptide[i] - error)
                                {
                                    j++;
testMass += currPtr->peptide[j]; test = SingleAA(currPtr->peptide[j]); if(testMass <= checkThisPtr->peptide[i] + error && testMass >= checkThisPtr->peptide[i] - error) { /*make sure this is not a K = AG situation*/ if((testMass <= gMonoMass_x100[K] + error && testMass >= gMonoMass_x100[K] - error) || (testMass <= gMonoMass_x100[Q] + error && testMass >= gMonoMass_x100[Q] - error) || (testMass <= gMonoMass_x100[R] + error && testMass >= gMonoMass_x100[R] - error) || (testMass <= gMonoMass_x100[W] + error && testMass >= gMonoMass_x100[W] - error) || (testMass <= gMonoMass_x100[N] + error && testMass >= gMonoMass_x100[N] - error)) { test = FALSE; } j++; break; } if(testMass > checkThisPtr->peptide[i] + error) { test = FALSE; break; } } if(test == FALSE) { break; } } } } if(test) { if(checkThisPtr->gapNum == -100) /*this is signal that sequence was from database*/ { currPtr->score = 0; } else { checkThisPtr->score = 0; } } } checkThisPtr = checkThisPtr->next; } } currPtr = currPtr->next; } /*Free the pointers with scores of zero*/ while(firstSequencePtr != NULL) { if(firstSequencePtr->score == 0) { freeMePtr = firstSequencePtr; firstSequencePtr = firstSequencePtr->next; free(freeMePtr); } else { break; } } if(firstSequencePtr != NULL) { previousPtr = firstSequencePtr; currPtr = firstSequencePtr->next; while(currPtr != NULL) { if(currPtr->score == 0) { freeMePtr = currPtr; previousPtr->next = currPtr->next; currPtr = currPtr->next; free(freeMePtr); } else { currPtr = currPtr->next; previousPtr = previousPtr->next; } } } countTheSeqs = 0; currPtr = firstSequencePtr; while(currPtr != NULL) { countTheSeqs++; currPtr = currPtr->next; } if(gParam.fMonitor && gCorrectMass) { printf("Scoring %4ld remaining after removing redundant sequences.\n", countTheSeqs); } if(countTheSeqs == 0) { printf("RemoveRedundantSequences: countTheSeqs = 0\n"); exit(1); } return(firstSequencePtr); } /***************************RevertBackToReals***************************************** * * Divide 
all of the mass-related gParams values by gMultiplier. Revert the peak masses * back to floats. * */ void RevertBackToReals(struct MSData *firstMassPtr, struct SequenceScore *firstScorePtr) { struct MSData *currPtr; struct SequenceScore *currSeqPtr; INT_4 i, j; /* Assign character sequence to peptideSequence field (so as to not lose the K/Q differentiation.*/ currSeqPtr = firstScorePtr; while(currSeqPtr != NULL) { i = 0; while(currSeqPtr->peptide[i] != 0) { for(j = 0; j <= /*gAminoAcidNumber*/ gGapListIndex; j++) { if(currSeqPtr->peptide[i] == gGapList[j]) { currSeqPtr->peptideSequence[i] = j; } } i++; } currSeqPtr = currSeqPtr->next; } /* Convert the standard deviations of the b and y errors back.*/ currSeqPtr = firstScorePtr; while(currSeqPtr != NULL) { currSeqPtr->stDevErr = (currSeqPtr->stDevErr) / gMultiplier; currSeqPtr = currSeqPtr->next; } /* Convert the gParam values back.*/ gParam.peptideMW = gParam.peptideMW / gMultiplier; gParam.monoToAv = gParam.monoToAv / gMultiplier; gParam.peptideErr = gParam.peptideErr / gMultiplier; gParam.fragmentErr = gParam.fragmentErr / gMultiplier; gParam.ionOffset = gParam.ionOffset / gMultiplier; gParam.cysMW = gParam.cysMW / gMultiplier; gParam.tagNMass = gParam.tagNMass / gMultiplier; gParam.tagCMass = gParam.tagCMass / gMultiplier; gParam.peakWidth = gParam.peakWidth / gMultiplier; gParam.qtofErr = gParam.qtofErr / gMultiplier; gParam.modifiedNTerm = gParam.modifiedNTerm / gMultiplier; gParam.modifiedCTerm = gParam.modifiedCTerm / gMultiplier; /* gToleranceWide = (REAL_4)gToleranceWide / gMultiplier; gToleranceNarrow = (REAL_4)gToleranceNarrow / gMultiplier;*/ /* Assign 0 values for gNomMass in places where gGapList is zero (to eliminate using amino acids that are absent, or redundant. 
I need to do this in order to calculate b and y ions in xcorr properly*/ for(i = 0; i < gAminoAcidNumber; i++) { gNomMass[i] = (REAL_4)gGapList[i] / gMultiplier + 0.5; } /* Convert gGapList to nominal values*/ /*for(i = 0; i < gGapListIndex; i++) { gGapList[i] = gGapList[i] / gMultiplier; } for(i = 0; i < gGapListIndex; i++) { for(j = 0; j < gGapListIndex; j++) { if(i != j) { if(gGapList[i] != 0 && gGapList[j] != 0) { if(gGapList[i] == gGapList[j]) { gGapList[i] = 0; } } } } }*/ /* Convert the peak masses back.*/ currPtr = firstMassPtr; while(currPtr != NULL) { currPtr->mOverZ = (currPtr->mOverZ) / gMultiplier; currPtr = currPtr->next; } /* Convert the sequence list.*/ currSeqPtr = firstScorePtr; while(currSeqPtr != NULL) { i = 0; while(currSeqPtr->peptide[i] != 0) { currSeqPtr->peptide[i] = (currSeqPtr->peptide[i]) / gMultiplier; i++; } currSeqPtr = currSeqPtr->next; } if(gAmIHere) { i = 0; while ( gRightSequence[i] != 0) { gRightSequence[i] = gRightSequence[i] / gMultiplier; i++; } } return; } /***************************RevertTheRevertBackToReals***************************************** * * Change everything back the way it was (I know this is stupid...). 
* */ void RevertTheRevertBackToReals(struct MSData *firstMassPtr) { struct MSData *currPtr; INT_4 i; /* Convert the gParam values back.*/ gParam.peptideMW = gParam.peptideMW * gMultiplier; gParam.monoToAv = gParam.monoToAv * gMultiplier; gParam.peptideErr = gParam.peptideErr * gMultiplier; gParam.fragmentErr = gParam.fragmentErr * gMultiplier; gParam.ionOffset = gParam.ionOffset * gMultiplier; gParam.cysMW = gParam.cysMW * gMultiplier; gParam.tagNMass = gParam.tagNMass * gMultiplier; gParam.tagCMass = gParam.tagCMass * gMultiplier; gParam.peakWidth = gParam.peakWidth * gMultiplier; gParam.qtofErr = gParam.qtofErr * gMultiplier; gParam.modifiedNTerm = gParam.modifiedNTerm * gMultiplier; gParam.modifiedCTerm = gParam.modifiedCTerm * gMultiplier; /* Assign 0 values for gNomMass in places where gGapList is zero (to eliminate using amino acids that are absent, or redundant. I need to do this in order to calculate b and y ions in xcorr properly*/ for(i = 0; i < gAminoAcidNumber; i++) { if(gGapList[i] == 0) { gNomMass[i] = 0; } /*gNomMass[i] = (REAL_4)gGapList[i] * gMultiplier; */ } gMonoMass[9] = 113.08407; gMonoMass[6] = 128.05858; gSingAA[9] = 'I'; /* Convert the peak masses back.*/ currPtr = firstMassPtr; while(currPtr != NULL) { currPtr->mOverZ = (currPtr->mOverZ) * gMultiplier; currPtr = currPtr->next; } return; } /******************************* AdjustIonIntensity ****************************************** * * */ void AdjustIonIntensity(INT_4 fragNum, INT_4 *fragIntensity) { REAL_8 averageIntensity = 0; REAL_8 intensityDiff = 0; REAL_8 summedIntensityDiff = 0; REAL_8 standardDev = 0; INT_4 confidenceLimitHigh, confidenceLimitLow; INT_4 i; for(i = 0; i < fragNum; i++) { averageIntensity += fragIntensity[i]; } if(fragNum <= 1) { printf("AdjustIonIntensity: fragNum = 0\n"); exit(1); } averageIntensity = (REAL_4)averageIntensity / fragNum + 0.5; /*Calculate the standard deviation*/ for(i = 0; i < fragNum; i++) { intensityDiff = averageIntensity - fragIntensity[i]; 
        intensityDiff = intensityDiff * intensityDiff;
        summedIntensityDiff += intensityDiff;
    }
    summedIntensityDiff = summedIntensityDiff / (fragNum - 1);
    standardDev = sqrt(summedIntensityDiff);

    confidenceLimitHigh = averageIntensity + (standardDev * 1.64); /*1.64 defines 90% conf lim*/
    confidenceLimitLow = averageIntensity - (standardDev * 1.64); /*1.64 defines 90% conf lim*/
    if(confidenceLimitLow < 0)
    {
        confidenceLimitLow = averageIntensity / 5;
    }

    /*Adjust the outliers*/
    for(i = 0; i < fragNum; i++)
    {
        if(fragIntensity[i] > confidenceLimitHigh)
        {
            fragIntensity[i] = confidenceLimitHigh;
        }
        /* if(fragIntensity[i] < confidenceLimitLow)
        {
            fragIntensity[i] = confidenceLimitLow;
        }*/
    }

    return;
}

/**********************ScoreBYIsotopes*******************************************************
*
*   If the peak width is less than 1.4 Da, then it is possible that there are some isotope
*   peaks that were not identified, but are in fact C13 peaks of b or y ions. Whenever an
*   ion has a nonzero ionFound value and the next fragment lies one C13 - C12 mass
*   difference higher (within gToleranceWide), the ionFound value of that isotope peak is
*   set to the preceding ion's value multiplied by IONFOUND_ISOTOPE, provided this exceeds
*   its current value, and its ionType is set to 15.
*
*/
void ScoreBYIsotopes(REAL_4 *ionFound, INT_4 *fragMOverZ, INT_4 fragNum, INT_4 *ionType)
{
    INT_4 i;
    REAL_4 massDiff, c13MinusC12, oldIonFoundValue, newIonFoundValue;

    c13MinusC12 = 1.003354 * gMultiplier;

    for(i = 0; i < fragNum - 1; i++)
    {
        if(ionFound[i] != 0)
        {
            massDiff = fragMOverZ[i + 1] - fragMOverZ[i];
            if(massDiff <= c13MinusC12 + gToleranceWide &&
                massDiff >= c13MinusC12 - gToleranceWide)
            {
                oldIonFoundValue = ionFound[i + 1];
                newIonFoundValue = ionFound[i] * IONFOUND_ISOTOPE;
                if(newIonFoundValue > oldIonFoundValue)
                {
                    ionFound[i + 1] = newIonFoundValue;
                    ionType[i+1] = 15; /*isotope ions*/
                }
            }
        }
    }

    return;
}

/**********************FreeAllSequenceScore*******************************
*
*   Free linked list of SequenceScore structs.
*
*/
void FreeAllSequenceScore(struct SequenceScore *currPtr)
{
    struct SequenceScore *freeMePtr;

    while(currPtr != NULL)
    {
        freeMePtr = currPtr;
        currPtr = currPtr->next;
        free(freeMePtr);
    }
    return;
}

/********************************AlterIonFound************************************************
*
*   This function multiplies the arrays yFound and bFound, and then checks to see if the
*   corresponding values of fragMOverZ have differences corresponding to the sequence. In
*   cases where there is a series of three or more ions, those values of ionFound are attenuated
*   by a factor of OVER_USED_IONS. The cleavageSites value is reduced by one for every two such ions.
*
*/
INT_4 AlterIonFound(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence,
                    INT_4 seqLength, REAL_4 *yFound, REAL_4 *bFound, INT_4 newCleavageSites)
{
    INT_4 i, j;
    INT_4 yCal, ionsInARow, *theIndexValues, yCalStart;
    REAL_8 mToAFactor;
    char test;

    /* Make room for an array. The cast matches the declared INT_4 pointer type.*/
    theIndexValues = (INT_4 *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4));
    if(theIndexValues == NULL)
    {
        printf("AlterIonFound: Out of memory.");
        exit(1);
    }

    /* Initialize some variables.*/
    ionsInARow = 0;

    /* Initialize the starting mass for y ions. */
    yCalStart = gParam.modifiedCTerm + (2 * gElementMass_x100[HYDROGEN]) + 0.5;

    /* Multiply the b and y Found arrays; both can only contain 0 or 1, so multiplying the two
       gives 1 if both are one and zero for other cases. */
    for(i = 0; i < fragNum; i++)
    {
        yFound[i] = yFound[i] * bFound[i];
    }

    /* Calculate the y ions and see if they match with the ions from yFound.
*/ for(i = (seqLength - 1); i >= 0; i--) { yCal = yCalStart; for(j = (seqLength - 1); j >= i; j--) { yCal += sequence[j]; /*This is the monoisotopic mass.*/ } /* Adjust to average mass if necessary.*/ if(yCal > gParam.monoToAv) { mToAFactor = 0; } else { if(yCal >= (gParam.monoToAv - gAvMonoTransition)) { mToAFactor = (gParam.monoToAv - yCal) / gAvMonoTransition; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(yCal >= (gParam.monoToAv - gAvMonoTransition)) { yCal = yCal * mToAFactor; } /* Set test to TRUE, which becomes FALSE if a fragment ion matching this yCal is found.*/ test = TRUE; for(j = 0; j < fragNum; j++) { if(yFound[j] == 1) { if(fragMOverZ[j] <= yCal + gToleranceWide && fragMOverZ[j] >= yCal - gToleranceWide) { theIndexValues[ionsInARow] = j; ionsInARow++; test = FALSE; break; } } } /* test is TRUE if the most recently calculated yCal does not match an observed fragment ion. If TRUE, then it first checks to see if there has been a run of more than two ions in a row. If so, it attenuates the ionFound values and resets the cleavageSites value. */ if(test) { if(ionsInARow > 2) { for(j = 0; j < ionsInARow; j++) { ionFound[theIndexValues[j]] = ionFound[theIndexValues[j]] * OVER_USED_IONS; } ionsInARow = ionsInARow / 2; newCleavageSites = newCleavageSites - ionsInARow; } ionsInARow = 0; } } free(theIndexValues); return(newCleavageSites); } /*****************************CheckItOut****************************************************** * * This function is used to determine if the correct sequence is present. By setting * gAmIHere to be FALSE, this stuff is always skipped. If * gAmIHere is TRUE, then this function is activated. * */ BOOLEAN CheckItOutSequenceScore(struct SequenceScore *firstSequencePtr) { struct SequenceScore *currPtr, *correctPtr; INT_4 totalSubsequences, rank, correctPeptideLength; INT_4 i, j = 0; INT_4 lowestScore; char test; /* Find the length of the correct sequence. 
    */
    correctPeptideLength = 0;
    while ( gRightSequence[correctPeptideLength] != 0)
    {
        correctPeptideLength++;
    }

    /* Guard against an empty list, as CheckItOut does, before dereferencing the head. */
    if(firstSequencePtr == NULL)
    {
        printf("CheckItOutSequenceScore: There were no final sequences to check out.\n");
        exit(1);
    }

    /* Find the lowest score here. */
    currPtr = firstSequencePtr;
    while(currPtr->next != NULL)
    {
        currPtr = currPtr->next;
    }
    lowestScore = currPtr->intensityScore;

    /* Find the correct sequence in the linked list. */
    currPtr = firstSequencePtr;
    while(currPtr != NULL)
    {
        test = TRUE;
        for(i = 0; i < correctPeptideLength; i++)
        {
            /*if(currPtr->peptide[i] < gRightSequence[i] - gParam.fragmentErr
                || currPtr->peptide[i] > gRightSequence[i] + gParam.fragmentErr)*/
            if(currPtr->peptide[i] != gRightSequence[i])
            {
                test = FALSE;
                break;
            }
        }
        if(test) /*If this is the correct subsequence, then..*/
        {
            gAmIHere = TRUE;
            totalSubsequences = 0;
            rank = 1;
            correctPtr = currPtr;
            currPtr = firstSequencePtr;
            while(currPtr != NULL) /*Count the subsequences and determine the rank.*/
            {
                totalSubsequences++;
                if(currPtr->intensityScore > correctPtr->intensityScore)
                {
                    rank++;
                }
                currPtr = currPtr->next;
            }
            j++; /*Stop here in the debugger.*/
            return(TRUE);
        }
        currPtr = currPtr->next;
    }

    /* No sequence matched, so every path through the function returns a value. */
    j++; /*Stop here in the debugger for when it doesn't match anymore.*/
    return(FALSE);
}

/*****************************CheckItOut******************************************************
*
*   This function is used to determine if the correct sequence is present. By setting
*   gAmIHere to be FALSE, this stuff is always skipped. If
*   gAmIHere is TRUE, then this function is activated.
*
*/
BOOLEAN CheckItOut(struct Sequence *firstSequencePtr)
{
    struct Sequence *currPtr, *correctPtr;
    INT_4 totalSubsequences, rank, correctPeptideLength;
    INT_4 i, j = 0;
    INT_4 lowestScore;
    INT_4 error = 0.4 * gMultiplier;
    char test;

    if(firstSequencePtr == NULL)
    {
        printf("There were no final sequences to check out.");
        exit(1);
    }

    /* Find the length of the correct sequence. */
    correctPeptideLength = 0;
    while ( gRightSequence[correctPeptideLength] != 0)
    {
        correctPeptideLength++;
    }

    /* Find the lowest score here.
    */
    currPtr = firstSequencePtr;
    while(currPtr->next != NULL)
    {
        currPtr = currPtr->next;
    }
    lowestScore = currPtr->score;

    /* Find the correct sequence in the linked list. */
    currPtr = firstSequencePtr;
    while(currPtr != NULL)
    {
        test = TRUE;
        for(i = 0; i < correctPeptideLength; i++)
        {
            if(currPtr->peptide[i] < gRightSequence[i] - error ||
                currPtr->peptide[i] > gRightSequence[i] + error)
            {
                test = FALSE;
                break;
            }
        }
        if(test) /*If this is the correct subsequence, then..*/
        {
            gAmIHere = TRUE;
            totalSubsequences = 0;
            rank = 1;
            correctPtr = currPtr;
            currPtr = firstSequencePtr;
            while(currPtr != NULL) /*Count the subsequences and determine the rank.*/
            {
                totalSubsequences++;
                if(currPtr->score > correctPtr->score)
                {
                    rank++;
                }
                currPtr = currPtr->next;
            }
            j++; /*Stop here in the debugger.*/
            return(TRUE);
        }
        currPtr = currPtr->next;
    }

    /* No sequence matched, so every path through the function returns a value. */
    j++; /*Stop here in the debugger for when it doesn't match anymore.*/
    return(FALSE);
}

/******************************BCalculator********************************************
*
*   This function calculates singly charged b ion masses. It applies the appropriate
*   average mass correction factor, and returns an INT_4.
* */ INT_4 BCalculator(INT_4 i, INT_4 *sequence, INT_4 bCalStart, INT_4 bCalCorrection) { INT_4 bCal = bCalStart; INT_4 j, corrections[10], residueCount, k, maxCorrection, minCorrection, correctionSpread, avCorrection; INT_4 nodeCorrection = bCalCorrection; REAL_8 mToAFactor; char test; for(j = 0; j < 10; j++) { corrections[j] = 0; } for(j = 0 ; j < i; j++) { bCal += sequence[j]; } for(j = 0; j < i; j++) /*determine the correction*/ { test = TRUE; for(k = 0; k < gAminoAcidNumber; k++) { if(sequence[j] == gGapList[k]) { nodeCorrection += gNodeCorrection[k]; test = FALSE; } } if(test) /*test is true if this is a dipeptide*/ { residueCount = 0; for(k = gAminoAcidNumber; k <= gGapListIndex; k++) { if(sequence[j] == gGapList[k]) { corrections[residueCount] = gNodeCorrection[k]; residueCount++; if(residueCount >= 10) { printf("LutefiskScore: residueCount exceeds 10.\n"); exit(1); } } } if(residueCount <= 1) /*if not in gGapList, then no correction is added, cuz corrections array initialized to zero.*/ { nodeCorrection += corrections[0]; } else { maxCorrection = corrections[0]; minCorrection = corrections[0]; for(k = 1; k < residueCount; k++) { if(corrections[k] > maxCorrection) { maxCorrection = corrections[k]; } if(corrections[k] < minCorrection) { minCorrection = corrections[k]; } } correctionSpread = maxCorrection - minCorrection; if(correctionSpread <= 5) { avCorrection = 0; for(k = 0; k < residueCount; k++) { avCorrection += corrections[k]; } if(avCorrection > 0) { avCorrection = ((REAL_4)avCorrection / residueCount) + 0.5; } else { avCorrection = ((REAL_4)avCorrection / residueCount) - 0.5; } nodeCorrection += avCorrection; } } } } if(nodeCorrection >= 5) { nodeCorrection = ((REAL_4)nodeCorrection / 10) + 0.5; bCal += nodeCorrection; } else if(nodeCorrection <= -5) { nodeCorrection = ((REAL_4)nodeCorrection / 10) - 0.5; bCal += nodeCorrection; } if(bCal > gParam.monoToAv) { mToAFactor = 0; } else { if(bCal >= (gParam.monoToAv - gAvMonoTransition)) { mToAFactor = 
(gParam.monoToAv - bCal) / gAvMonoTransition; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(bCal >= (gParam.monoToAv - gAvMonoTransition)) { bCal = bCal * mToAFactor; } return(bCal); } /******************************YCalculator******************************************** * * This function calculates singly charged y ion masses. It applies the appropriate * average mass correction factor, and returns a INT_4. * */ INT_4 YCalculator(INT_4 i, INT_4 *sequence, INT_4 seqLength, INT_4 yCalStart, INT_4 yCalCorrection) { INT_4 yCal = yCalStart; INT_4 j, k, nodeCorrection = yCalCorrection; INT_4 residueCount, correctionSpread, corrections[10], maxCorrection, minCorrection; INT_4 avCorrection; REAL_8 mToAFactor; char test; for(j = 0; j < 10; j++) { corrections[j] = 0; } for(j = i; j < seqLength; j++) { yCal += sequence[j]; } for(j = i; j < seqLength; j++) { test = TRUE; for(k = 0; k < gAminoAcidNumber; k++) { if(sequence[j] == gGapList[k]) { nodeCorrection += gNodeCorrection[k]; test = FALSE; } } if(test) /*test is true if this is a dipeptide*/ { residueCount = 0; for(k = gAminoAcidNumber; k <= gGapListIndex; k++) { if(sequence[j] == gGapList[k]) { corrections[residueCount] = gNodeCorrection[k]; residueCount++; if(residueCount >= 10) { printf("LutefiskScore: residueCount exceeds 10.\n"); exit(1); } } } if(residueCount <= 1) /*if not in gGapList, then nothing is added to nodecorrection, since corrections array initialized to zero*/ { nodeCorrection += corrections[0]; } else { maxCorrection = corrections[0]; minCorrection = corrections[0]; for(k = 1; k < residueCount; k++) { if(corrections[k] > maxCorrection) { maxCorrection = corrections[k]; } if(corrections[k] < minCorrection) { minCorrection = corrections[k]; } } correctionSpread = maxCorrection - minCorrection; if(correctionSpread <= 5) { avCorrection = 0; for(k = 0; k < residueCount; k++) { avCorrection += corrections[k]; } if(avCorrection > 0) { avCorrection = 
((REAL_4)avCorrection / residueCount) + 0.5; } else { avCorrection = ((REAL_4)avCorrection / residueCount) - 0.5; } nodeCorrection += avCorrection; } } } } if(nodeCorrection >= 5) { nodeCorrection = ((REAL_4)nodeCorrection / 10) + 0.5; yCal += nodeCorrection; } else if(nodeCorrection <= -5) { nodeCorrection = ((REAL_4)nodeCorrection / 10) - 0.5; yCal += nodeCorrection; } if(yCal > gParam.monoToAv) { mToAFactor = 0; } else { if(yCal >= (gParam.monoToAv - gAvMonoTransition)) { mToAFactor = (gParam.monoToAv - yCal) / gAvMonoTransition; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(yCal >= (gParam.monoToAv - gAvMonoTransition)) { yCal = yCal * mToAFactor; } return(yCal); } /***************************TwoAAExtFinder********************************* * * Returns a char value of TRUE or FALSE depending on if the extension contains two * amino acids (TRUE) or not (FALSE). * */ char TwoAAExtFinder(INT_4 *sequence, INT_4 i) { INT_4 j; char twoAAExtension; twoAAExtension = TRUE; j = 0; while(twoAAExtension && j < gAminoAcidNumber) { if((sequence[i] <= (gMonoMass_x100[j] + gToleranceNarrow)) && (sequence[i] >= (gMonoMass_x100[j] - gToleranceNarrow))) { twoAAExtension = FALSE; } j++; } return(twoAAExtension); } /******************************FindNOxMet**************************************** * * This function counts the number of oxidized Mets in a sequence, which is used * to keep track of the number of ox met in each b ion. 
* */ INT_4 FindNOxMet(INT_4 *sequence, INT_4 seqLength) { INT_4 i, oxMetCount; if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0) /*mass accuracy sufficient to determine oxMet*/ { oxMetCount = 0; for(i = 0; i < seqLength; i++) { if(sequence[i] >= gMonoMass_x100[9] - gToleranceNarrow && sequence[i] <= gMonoMass_x100[9] + gToleranceNarrow) { oxMetCount++; } } } else /*mass accuracy not sufficient to differentiate oxMet from Phe*/ { oxMetCount = 0; for(i = 0; i < seqLength; i++) { if(sequence[i] >= gMonoMass_x100[F] - gToleranceNarrow && sequence[i] <= gMonoMass_x100[F] + gToleranceNarrow) { oxMetCount++; } } } return(oxMetCount); } /******************************FindNCharge*************************************** * * This function counts the number of basic residues in a sequence, and returns * a INT_4 that contains this number. * */ INT_4 FindNCharge(INT_4 *sequence, INT_4 seqLength) { INT_4 i, j, nChargeCount; nChargeCount = 1; for(i = 0; i < seqLength; i++) { if(((sequence[i] <= (gMonoMass_x100[R] + gToleranceWide)) && (sequence[i] >= (gMonoMass_x100[R] - gToleranceWide))) || (sequence[i] <= (gMonoMass_x100[H] + gToleranceWide) && (sequence[i] >= (gMonoMass_x100[H] - gToleranceWide))) || (sequence[i] <= (gMonoMass_x100[K] + gToleranceWide) && (sequence[i] >= (gMonoMass_x100[K] - gToleranceWide)))) { nChargeCount += 1; } } for(i = 0; i < seqLength; i++) /*Here I look for two amino acid extensions containing Arg, His, or Lys.*/ { for(j = 0; j < gAminoAcidNumber; j++) { if(((sequence[i] <= (gArgPlus[j] + gToleranceWide)) && (sequence[i] >= (gArgPlus[j] - gToleranceWide))) || ((sequence[i] <= (gHisPlus[j] + gToleranceWide)) && (sequence[i] >= (gHisPlus[j] - gToleranceWide))) || ((sequence[i] <= (gLysPlus[j] + gToleranceWide)) && (sequence[i] >= (gLysPlus[j] - gToleranceWide)))) { nChargeCount += 1; } } } return(nChargeCount); } /******************************InitLutefiskScore*************************************** * * This function assigns space to some arrays, 
counts the sequences in the final list of * completed sequences, puts the sequence tag back into the list of sequences, determines * if there is a C-terminal Lys or Arg for tryptic peptides, and multiplies several mass * variables by 100 so that integers can be used rather than REAL_4s. * */ struct Sequence *InitLutefiskScore(INT_4 *sequence, INT_4 *fragMOverZ, INT_4 *fragIntensity, REAL_4 *ionFound, REAL_4 *ionFoundTemplate, INT_4 *countTheSeqs, struct Sequence *firstSequencePtr, REAL_4 *yFound, REAL_4 *bFound, struct MSData *firstMassPtr, INT_4 *charSequence, REAL_8 *byError, INT_4 *ionType) { struct Sequence *currSeqPtr; INT_4 i; REAL_4 lowMassIonConversion; /* Check that arrays are ok. */ if(yFound == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(bFound == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(sequence == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(charSequence == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(fragMOverZ == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(fragIntensity == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(ionFound == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(ionType == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(byError == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } if(ionFoundTemplate == NULL) { printf("InitLutefiskScore: Out of memory"); exit(1); } /* Count the sequences in the list, print the number to the console, and exit if no sequences. */ *countTheSeqs = 0; currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { *countTheSeqs += 1; currSeqPtr = currSeqPtr->next; } if(gParam.fMonitor && gCorrectMass) { printf("Scoring %4ld completed sequences.\n", *countTheSeqs); } if(*countTheSeqs == 0) { printf("InitLutefiskScore: countTheSeqs = 0\n"); exit(1); } /* Insert the sequence tag back into the sequence before scoring it. 
*/ if(gParam.tagSequence[0] != '*') { firstSequencePtr = AddTagBack(firstSequencePtr); } /* Check to see if there is an ion at 147 or 175, indicating a tryptic C-terminus. If gTrypticCterm is TRUE then something is done in "MassageScores". */ if(gParam.proteolysis == 'T') { if(gParam.fragmentPattern == 'Q' || gParam.fragmentPattern == 'T') { CTerminalLysOrArg(firstMassPtr); } else { gTrypticCterm = TRUE; /*Its difficult to tell if Lys or Arg is present in LCQ data.*/ } } /* Setup the global arrays gCysPlus, gArgPlus, gHisPlus, and gLysPlus so that they contain the appropriate values for the type of cysteine alkyl group. But don't do this again, cuz the numbers get screwy. */ if(gFirstTimeThru) { lowMassIonConversion = (REAL_4)gMultiplier / 10000; for(i = 0; i < gAminoAcidNumber; i++) { if(gGapList[i] == 0) { gArgPlus[i] = 0; gHisPlus[i] = 0; gLysPlus[i] = 0; gCysPlus[i] = 0; gGlnPlus[i] = 0; gGluPlus[i] = 0; gProPlus[i] = 0; } else { /*gArgPlus[i] = (REAL_4)gArgPlus[i] * lowMassIonConversion + 0.5; gHisPlus[i] = (REAL_4)gHisPlus[i] * lowMassIonConversion + 0.5; gLysPlus[i] = (REAL_4)gLysPlus[i] * lowMassIonConversion + 0.5; gCysPlus[i] = (REAL_4)gCysPlus[i] * lowMassIonConversion + 0.5; gGlnPlus[i] = (REAL_4)gGlnPlus[i] * lowMassIonConversion + 0.5; gGluPlus[i] = (REAL_4)gGluPlus[i] * lowMassIonConversion + 0.5; gProPlus[i] = (REAL_4)gProPlus[i] * lowMassIonConversion + 0.5;*/ gArgPlus[i] = gMonoMass_x100[R] + gMonoMass_x100[i]; gHisPlus[i] = gMonoMass_x100[H] + gMonoMass_x100[i]; gLysPlus[i] = gMonoMass_x100[K] + gMonoMass_x100[i]; gCysPlus[i] = gMonoMass_x100[C] + gMonoMass_x100[i]; gGlnPlus[i] = gMonoMass_x100[Q] + gMonoMass_x100[i]; gGluPlus[i] = gMonoMass_x100[E] + gMonoMass_x100[i]; gProPlus[i] = gMonoMass_x100[P] + gMonoMass_x100[i]; } } gArgPlus[C] = gParam.cysMW + gMonoMass_x100[R]; gHisPlus[C] = gParam.cysMW + gMonoMass_x100[H]; gLysPlus[C] = gParam.cysMW + gMonoMass_x100[K]; gGlnPlus[C] = gParam.cysMW + gMonoMass_x100[Q]; gGluPlus[C] = gParam.cysMW + 
                      gMonoMass_x100[E];
        gCysPlus[C] = gParam.cysMW + gParam.cysMW;
        gProPlus[C] = gParam.cysMW + gMonoMass_x100[P];
        for(i = 0; i < gAminoAcidNumber; i++)
        {
            if(gGapList[i] == 0)
            {
                gCysPlus[i] = gParam.cysMW + gMonoMass_x100[i];
            }
        }
    }

    return(firstSequencePtr);
}

/********************************* HighMOverZFilter ***********************************
*
*   This function weeds out candidate sequences that fail to account for the more abundant
*   fragment ions found above the precursor m/z. Each sequence is given an intensity-only
*   score for these high m/z ions, and sequences scoring below a cutoff are discarded. The
*   cutoff is lowered in 5% steps until at least seqLimit sequences survive.
*
*/
void HighMOverZFilter(struct Sequence *firstSequencePtr, INT_4 *fragMOverZ, INT_4 *fragIntensity,
                      INT_4 *countTheSeqs, INT_4 *sequence, INT_4 fragNum)
{
    INT_4 precursor, i, j, seqLength, seqLimit;
    INT_4 *highMZFrags, *highMZInts, highMZNum;
    INT_4 *tempMZFrags, *tempMZInts, tempMZNum, intScoreCutoff;
    INT_4 maxNumOfIons, greatestInt, greatestIntIndex;
    REAL_4 totalIntensity, score, averageIntensity, scoreCutoff;
    REAL_4 obsdIsotopeRatio, calcIsotopeRatio;
    REAL_4 C13minusC12 = 1.00335 * gMultiplier;
    char test;
    struct Sequence *currSeqPtr, *previousSeqPtr;

    /*Assign some space for these arrays.*/
    highMZFrags = (INT_4 *) malloc(MAX_ION_NUM * sizeof(INT_4));
    if(highMZFrags == NULL)
    {
        printf("HighMOverZFilter: Out of memory.");
        exit(1);
    }
    highMZInts = (INT_4 *) malloc(MAX_ION_NUM * sizeof(INT_4));
    if(highMZInts == NULL)
    {
        printf("HighMOverZFilter: Out of memory.");
        exit(1);
    }
    tempMZFrags = (INT_4 *) malloc(MAX_ION_NUM * sizeof(INT_4));
    if(tempMZFrags == NULL) /*check the pointer that was just allocated*/
    {
        printf("HighMOverZFilter: Out of memory.");
        exit(1);
    }
    tempMZInts = (INT_4 *) malloc(MAX_ION_NUM * sizeof(INT_4));
    if(tempMZInts == NULL) /*check the pointer that was just allocated*/
    {
        printf("HighMOverZFilter: Out of memory.");
        exit(1);
    }

    /*Initialize variables.*/
    precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN]))
                / gParam.chargeState;
    totalIntensity = 0;

    /*seqLimit is the num of seqs required before the weeding is stopped*/
    if(gParam.chargeState <= 2)
    {
        seqLimit = 50;
    }
    else
    {
        seqLimit = 75;
    }

    /*set a limit on the num of ions*/
    if(gParam.fragmentPattern == 'L') /*LCQ data has high mass b and y*/
    {
        maxNumOfIons = ((REAL_4)gParam.peptideMW / (2 * gAvResidueMass)) * 1.0 + 0.5;
    }
    else /*TSQ and
QTOF data generally has only y ions (fewer ions to look at)*/ { maxNumOfIons = ((REAL_4)gParam.peptideMW / (2 * gAvResidueMass)) + 0.5; } /* Generate the list of ions greater than the precursor m/z.*/ averageIntensity = 0; highMZNum = 0; for(i = 0; i < fragNum; i++) { if(fragMOverZ[i] > precursor + gToleranceWide * 2) { averageIntensity += fragIntensity[i]; highMZNum++; } } if(highMZNum == 0) { printf("No high mass ions found for the high m/z filter.\n"); free(highMZFrags); free(highMZInts); free(tempMZFrags); free(tempMZInts); return; } averageIntensity = averageIntensity / highMZNum; averageIntensity = averageIntensity / 2; /*2 is arbitrary; averageIntensity is used below in order to select only the more abundant ions; if this threshold is high then you might consider raising the percentage up from 90% (below)*/ tempMZNum = 0; for(i = 0; i < fragNum; i++) { if(fragMOverZ[i] > precursor + 5 * gMultiplier && fragIntensity[i] > averageIntensity) { test = TRUE; /*make sure I don't use an isotope peak*/ if(tempMZNum != 0) { for(j = 0; j < tempMZNum; j++) { if(tempMZFrags[j] + C13minusC12 >= fragMOverZ[i] - gToleranceWide * 2 && tempMZFrags[j] + C13minusC12 <= fragMOverZ[i] + gToleranceWide * 2) { if(fragIntensity[i] == 0 || tempMZFrags[j] == 0) { printf("HighMOverZFilter: divide by zero"); exit(1); } obsdIsotopeRatio = (REAL_4)tempMZInts[j] / (REAL_4)fragIntensity[i]; calcIsotopeRatio = (REAL_4)(1800 * gMultiplier) / (REAL_4)tempMZFrags[j]; if(obsdIsotopeRatio > calcIsotopeRatio - 0.3 || (obsdIsotopeRatio <= 1.2 && obsdIsotopeRatio >= 0.8)) { test = FALSE; } /*My LCQ gives funny isotope ratios, for some reason*/ if(gParam.fragmentPattern == 'L') { if(obsdIsotopeRatio * 2 > calcIsotopeRatio - 0.3) { test = FALSE; } } } } } if(test) { tempMZFrags[tempMZNum] = fragMOverZ[i]; tempMZInts[tempMZNum] = fragIntensity[i]; tempMZNum++; } } } if(tempMZNum <= maxNumOfIons) { for(i = 0; i < tempMZNum; i++) { highMZFrags[i] = tempMZFrags[i]; highMZInts[i] = tempMZInts[i]; } 
highMZNum = tempMZNum; } else { highMZNum = 0; for(i = 0; i < maxNumOfIons; i++) { greatestInt = tempMZInts[0]; greatestIntIndex = 0; for(j = 0; j < tempMZNum; j++) { if(tempMZInts[j] > greatestInt) { greatestInt = tempMZInts[j]; greatestIntIndex = j; } } tempMZInts[greatestIntIndex] = -1 * tempMZInts[greatestIntIndex]; } highMZNum = 0; for(i = 0; i < tempMZNum; i++) { if(tempMZInts[i] < 0) { highMZFrags[highMZNum] = tempMZFrags[i]; highMZInts[highMZNum] = -1 * tempMZInts[i]; highMZNum++; } } } /* Calculate the total ion intensity for this high m/z ion list.*/ for(i = 0; i < highMZNum; i++) { totalIntensity += highMZInts[i]; } if(totalIntensity == 0) { printf("HighMOverZFilter: totalIntensity = 0\n"); free(highMZFrags); free(highMZInts); free(tempMZFrags); free(tempMZInts); return; } /* Calculate the initial scoreCutoff value by finding the lowest intensity ion, and assume that the best sequences might not account for that one.*/ scoreCutoff = totalIntensity; for(i = 0; i < highMZNum; i++) { if(highMZInts[i] < scoreCutoff) { scoreCutoff = highMZInts[i]; } } scoreCutoff = (totalIntensity - scoreCutoff) / totalIntensity; scoreCutoff = scoreCutoff * 100; /* Start looking at each sequence.*/ while(scoreCutoff > 0) { currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { /*if(currSeqPtr->score == 629 && currSeqPtr->nodeValue == 14603) { i++; i++; } for debugging*/ /*Load the sequence array.*/ LoadSequence(sequence, &seqLength, currSeqPtr); /* Return a REAL_4 value that corresponds to the intensity only score using b, y and -17/18*/ score = AssignHighMZScore(highMZNum, highMZFrags, highMZInts, totalIntensity, sequence, seqLength); /*Skip sequences that have Pro in third position, for LCQ data since these tend to give lower abudance high mass ions.*/ /*score = ProInThirdPosition(score, sequence, seqLength);*/ /*currSeqPtr->score is a INT_4 and intScore is a REAL_4 that is less than 1, so I multiply it by 100 to get an int (%).*/ if(score >= 0) { currSeqPtr->score = 
score * 100 + 0.5; } else { currSeqPtr->score = -1; /*has a proline in the third position*/ } /*Give sequences with scores less than 90% a zero as a signal to trash it later.*/ intScoreCutoff = scoreCutoff + 0.5; if(currSeqPtr->score < intScoreCutoff && currSeqPtr->score > 0) { currSeqPtr->score = 0; } currSeqPtr = currSeqPtr->next; } *countTheSeqs = 0; currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { if(currSeqPtr->score > 0) { *countTheSeqs += 1; } currSeqPtr = currSeqPtr->next; } if(*countTheSeqs < seqLimit) { scoreCutoff = scoreCutoff - 5; /*reduce the cutoff by 5%*/ currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { currSeqPtr->score = 1; currSeqPtr = currSeqPtr->next; } } else { if(gCorrectMass && gParam.fMonitor) { printf("The cutoff for the high m/z ions is %4.1f percent.\n", scoreCutoff); } scoreCutoff = -1; } } /* Get rid of sequences that have a score of zero, but don't free the firstSequencePtr.*/ currSeqPtr = firstSequencePtr->next; previousSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { if(currSeqPtr->score == 0) { previousSeqPtr->next = currSeqPtr->next; free(currSeqPtr); currSeqPtr = previousSeqPtr->next; } else { previousSeqPtr = currSeqPtr; currSeqPtr = currSeqPtr->next; } } /* Count the sequences in the list again. */ *countTheSeqs = 0; currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { *countTheSeqs += 1; currSeqPtr = currSeqPtr->next; } if(gParam.fMonitor && gCorrectMass) { printf("Scoring %5ld sequences after the high m/z filter.\n", *countTheSeqs); } /*Free the arrays.*/ free(highMZFrags); free(highMZInts); free(tempMZFrags); free(tempMZInts); return; } /************************************TossTheLosers************************************* * * The best tryptic peptide sequences have contiguous series of y and, to a lesser extent, * b ions. Sequences that seem to be derived from wild mixtures of odd ions are weeded out * at this point. This function does not operate on non-tryptic peptides. 
* */ void TossTheLosers(struct Sequence *firstSequencePtr, REAL_4 *ionFoundTemplate, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *fragIntensity, INT_4 intensityTotal, REAL_4 *ionFound, INT_4 *countTheSeqs, INT_4 *sequence) { struct Sequence *currSeqPtr, *previousSeqPtr; INT_4 seqLength, i, cleavageSites; INT_4 highestBYScore = 0, averageBYScore = 0; REAL_4 intScore; currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) /*Assign a BY intensity score to each sequence.*/ { if(currSeqPtr->score == 354 && currSeqPtr->nodeValue == 134773 &&currSeqPtr->nodeCorrection == -5) { i++; i++; } /*for debugging*/ LoadSequence(sequence, &seqLength, currSeqPtr); /*Also used below.*/ for(i = 0; i < fragNum; i++) /*Initialize for each sequence.*/ { ionFound[i] = ionFoundTemplate[i]; } cleavageSites = FindBYIons(ionFound, fragNum, fragMOverZ, sequence, seqLength); /*Note: This is not FindABYIons!*/ ProlineInternalFrag(ionFound, fragMOverZ, sequence, seqLength, fragNum); intScore = BYIntensityScorer(fragIntensity, ionFound, cleavageSites, fragNum, seqLength, intensityTotal); /*save the subsequence score to nodeValue*/ /*currSeqPtr->nodeValue = currSeqPtr->score;*/ currSeqPtr->score = (intScore * 1000); /*currSeqPtr->score is a INT_4 and intScore is a REAL_4 that is less than 1, so I multiply it by 1000 to get an int.*/ if(currSeqPtr->score > highestBYScore) { highestBYScore = currSeqPtr->score; } averageBYScore += (intScore * 1000); currSeqPtr = currSeqPtr->next; } if(*countTheSeqs == 0) { printf("TossTheLosers: countTheSeqs = 0\n"); return; } averageBYScore = averageBYScore / *countTheSeqs; /*countTheSeqs is the number of sequences.*/ /* Make the average BYScore be an average of the highest and the average score, ie, I only keep the highest quartile. */ averageBYScore = (averageBYScore + highestBYScore) / 2.05; /* For sequences with below average score, I reassign the score field to zero. 
*/ currSeqPtr = firstSequencePtr->next; while(currSeqPtr != NULL) { if(currSeqPtr->score < averageBYScore) { currSeqPtr->score = 0; } currSeqPtr = currSeqPtr->next; } /* Now I get rid of the zero score sequences. */ currSeqPtr = firstSequencePtr->next; previousSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { if(currSeqPtr->score == 0) { previousSeqPtr->next = currSeqPtr->next; free(currSeqPtr); currSeqPtr = previousSeqPtr->next; } else { previousSeqPtr = currSeqPtr; currSeqPtr = currSeqPtr->next; } } /* Count the sequences in the list again. */ *countTheSeqs = 0; currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { *countTheSeqs += 1; currSeqPtr = currSeqPtr->next; } if(gParam.fMonitor && gCorrectMass) { printf("Scoring %4ld sequences following the b and y filter.\n", *countTheSeqs); } return; } /*******************************CTerminalLysOrArg*************************************** * * This function will change the value of gTrypticCTerm (a char) to TRUE if it finds * an ion at m/z 147 or 175, and if its been specified that the peptide is derived from * tryptic digestion. 
*
*/
void CTerminalLysOrArg(struct MSData *firstMassPtr)
{
    REAL_4 lysYIon, argYIon;
    struct MSData *currPtr;

    lysYIon = gMonoMass_x100[K] + 3 * gElementMass_x100[HYDROGEN] + gElementMass_x100[OXYGEN];
    argYIon = gMonoMass_x100[R] + 3 * gElementMass_x100[HYDROGEN] + gElementMass_x100[OXYGEN];

    currPtr = firstMassPtr;
    gTrypticCterm = FALSE;

    while(currPtr != NULL)
    {
        if(currPtr->mOverZ > argYIon + gToleranceWide)
        {
            break;
        }
        if(currPtr->mOverZ <= (lysYIon + gToleranceWide) &&
           currPtr->mOverZ >= (lysYIon - gToleranceWide))
        {
            gTrypticCterm = TRUE;
            break;
        }
        if(currPtr->mOverZ <= (argYIon + gToleranceWide) &&
           currPtr->mOverZ >= (argYIon - gToleranceWide))
        {
            gTrypticCterm = TRUE;
            break;
        }
        currPtr = currPtr->next;
    }

    return;
}

/******************************BYIntensityScorer*********************************
*
*   BYIntensityScorer inputs fragIntensity (the ion intensities), ionFound (the array that
*   is indexed the same as fragIntensity and holds a nonzero weight for ions that have been
*   identified and 0 for those that were not), cleavageSites (the number of times either a
*   b or y ion of any charge was found to delineate the sequence), fragNum (the number of
*   fragment ions in the CID data), seqLength (the number of residues in the sequence), and
*   intensityTotal (the total ion intensity, excluding the precursor ion region). This
*   function returns a REAL_4 value that is a weighted combination of the fraction of the
*   ion current that can be identified and the fraction of cleavage sites defined by b or y
*   ions, which reflects the idea that correct sequences are usually delineated by series
*   of either b or y ions.
*/
REAL_4 BYIntensityScorer(INT_4 *fragIntensity, REAL_4 *ionFound, INT_4 cleavageSites,
                         INT_4 fragNum, INT_4 seqLength, INT_4 intensityTotal)
{
    REAL_4 intScore = 0;
    REAL_4 numScore = 0;
    REAL_4 attenuation;
    INT_4 i;
    INT_4 intNum = 0;

    /*
    *   Initialize attenuation, which is a fractional multiplier that reflects the number of
    *   times either b or y ions delineate the proposed sequence.
*/ attenuation = cleavageSites; if(seqLength - 1 == 0) { printf("BYIntensityScorer: seqLength - 1 = 0\n"); return(intScore); } attenuation = attenuation / (seqLength - 1); /* Add up the intensity that has been identified, and count the ions. */ for(i = 0; i < fragNum; i++) { if(ionFound[i] != 0) { intScore += fragIntensity[i] * ionFound[i]; intNum++; } } /* Divide by the total ion intensity (which does not include the precursor ion region). */ if(intensityTotal == 0) { printf("BYIntensityScorer: intensityTotal = 0\n"); exit(1); } intScore = intScore / intensityTotal; /* Attenuate the score so that sequences with lots of b and y ions are favored. */ intScore = ((INTENSITY_WEIGHT * intScore) + (ATTENUATION_WEIGHT * attenuation)) / INT_ATT_WEIGHT; return(intScore); } /******************************FindBYIons*********************************** * * FindBYIons identifies b and y ions. * The input is as described in the documentation for the function PEFragments, and it * returns a INT_4 containing the * value "cleavageSites", which is the number of cleavage sites (amide bonds) that are * defined by a b or y ion. * The array ionFound is also modified. 
*/ INT_4 FindBYIons(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength) { INT_4 i, j, nChargeCount, cChargeCount, bCal, yCal, cleavageSites; INT_4 bOrYIon, k, bCalStart, yCalStart; INT_4 bIonMass, bIonMassMinErr, bIonMassPlusErr; INT_4 yIonMass, yIonMassMinErr, yIonMassPlusErr; INT_4 bCount, yCount, bSeries, ySeries, bIon, yIon, massDiff; INT_4 precursor, skipOneY, skipOneB; INT_4 yCalCorrection = 0; INT_4 bCalCorrection = 0; char test, twoAAExtension, maxCharge; BOOLEAN monoToAvYSwitch = TRUE; /*Used to recalculate for average masses.*/ BOOLEAN avToMonoBSwitch = FALSE; /*Used to recalculate for average masses.*/ BOOLEAN twoAANTerm; REAL_4 currentIonFound; /* Initialize a few variables.*/ bCount = 0; /*Current number of b ions in a row.*/ yCount = 0; /*Current number of y ions in a row.*/ bSeries = 0; /*The greatest number of b ions in a row for a given sequence.*/ ySeries = 0; /*The greatest number of y ions in a row for a given sequence.*/ skipOneY = 1; /*Allows for one missed y ion in a series w/o resetting to zero.*/ skipOneB = 1; /*Allows for one missed b ion in a series w/o resetting to zero.*/ cleavageSites = 0; /*Used to count the number of times a y or b ion of any charge state delineates a sequence. */ precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } /* Initialize the b ion starting mass (acetylated, or whatever). */ bCalStart = gParam.modifiedNTerm + 0.5; /* Initialize the y ion starting mass (amidated or unmodified). */ yCalStart = gParam.modifiedCTerm + (2 * gElementMass_x100[HYDROGEN]) + 0.5; /* Count the number of charged residues. nChargeCount is one more that the number of charged residues found in an N-terminal fragment. cChargeCount is one more than the number of charged residues in a C-terminal fragment. 
*/ nChargeCount = FindNCharge(sequence, seqLength); cChargeCount = 1; /* Figure out if the N-terminus is a two amino acid extension. */ twoAANTerm = TwoAAExtFinder(sequence, 0); /* Here's the big loop, where I step through each position in the sequence. */ for(i = (seqLength - 1); i > 0; i--) /*Don't do this loop for i = 0 (doesnt make sense).*/ { /* Initialize some variables for this 'for' loop. */ bOrYIon = 0; /*If any number of b or y ions of any charge are found, then this equals one. Otherwise, it stays at zero.*/ bIon = 0; /*If a b ion is found this equals one.*/ yIon = 0; /*If a y ion is found this equals one.*/ /* Figure out if this is a two amino acid extension. */ twoAAExtension = TwoAAExtFinder(sequence, i); /* Calculate the singly charged y ion mass. */ yCal = YCalculator(i, sequence, seqLength, yCalStart, yCalCorrection); /* Calculate the singly charged b ion mass. */ bCal = BCalculator(i, sequence, bCalStart, bCalCorrection); /* Readjust the number of charges in the C- and N-terminii. 
*/
        if((sequence[i] >= gMonoMass_x100[R] - gToleranceWide &&
            sequence[i] <= gMonoMass_x100[R] + gToleranceWide) ||
           (sequence[i] >= gMonoMass_x100[H] - gToleranceWide &&
            sequence[i] <= gMonoMass_x100[H] + gToleranceWide) ||
           (sequence[i] >= gMonoMass_x100[K] - gToleranceWide &&
            sequence[i] <= gMonoMass_x100[K] + gToleranceWide))
        {
            cChargeCount += 1;
            nChargeCount -= 1;
        }
        else /*Check to see if it's a two amino acid combo that could contain Arg, His, or Lys.*/
        {
            for(j = 0; j < gAminoAcidNumber; j++)
            {
                if((sequence[i] >= gArgPlus[j] - gToleranceWide &&
                    sequence[i] <= gArgPlus[j] + gToleranceWide) ||
                   (sequence[i] >= gHisPlus[j] - gToleranceWide &&
                    sequence[i] <= gHisPlus[j] + gToleranceWide) ||
                   (sequence[i] >= gLysPlus[j] - gToleranceWide &&
                    sequence[i] <= gLysPlus[j] + gToleranceWide))
                {
                    cChargeCount += 1;
                    nChargeCount -= 1;
                    break;
                }
            }
        }

        for(j = 1; j <= maxCharge; j++) /*Check each charge state up to the parent ion charge.*/
        {
            /* Initialize variables within this j loop.*/
            test = FALSE; /*Used to test if b, a, or y ions are found before looking for the
                            corresponding losses of ammonia or water.*/
            bIonMass = (bCal + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j;
            bIonMassMinErr = bIonMass - gToleranceWide;
            bIonMassPlusErr = bIonMass + gToleranceWide;
            yIonMass = (yCal + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j;
            yIonMassMinErr = yIonMass - gToleranceWide;
            yIonMassPlusErr = yIonMass + gToleranceWide;

            /* Search for b ions.*/
            if((bIonMass * j) > ((j-1) * 400 * gMultiplier)) /*Make sure there is enough mass to hold the charge.*/
            {
                k = fragNum - 1;
                /*Check k first so that fragMOverZ[-1] is never read.*/
                while(k >= 0 && fragMOverZ[k] >= bIonMassMinErr)
                {
                    if(fragMOverZ[k] <= bIonMassPlusErr)
                    {
                        if(nChargeCount >= j) /*Make sure enough charges can be attached.*/
                        {
                            bOrYIon = 1; /*A b or y ion has been identified.*/
                            test = TRUE; /*A b ion of charge j has been identified.*/
                            bIon = 1; /*A b ion is present.*/
                            massDiff = abs(bIonMass - fragMOverZ[k]);
                            currentIonFound = ionFound[k];
                            ionFound[k] =
                                CalcIonFound(ionFound[k], massDiff);

                            /* If the charge under consideration "j" is equal to the precursor
                               charge, and if that precursor charge is greater than one, then the
                               b ion intensity is attenuated. Alternatively, if the b ion mass is
                               greater than the precursor, while at the same time the number of
                               basic residues in the b ion is less than the number of charges on
                               the precursor, then that is also grounds for reducing the influence
                               of the ion. */
                            if((j == gParam.chargeState && gParam.chargeState > 1) ||
                               (bIonMass > precursor && nChargeCount < gParam.chargeState))
                            {
                                if(gParam.fragmentPattern != 'L')
                                {
                                    ionFound[k] = ionFound[k] * HIGH_MASS_B_ION_MULTIPLIER;
                                }
                            }
                            if(twoAAExtension)
                            {
                                ionFound[k] = ionFound[k] * TWO_AA_EXTENSION_MULTIPLIER;
                            }
                            if(currentIonFound > ionFound[k])
                            {
                                ionFound[k] = currentIonFound;
                            }
                        }
                    }
                    k--;
                }
            }

            /* Search for the y ion values.*/
            if((yIonMass * j) > ((j-1) * 400 * gMultiplier)) /*Make sure there is enough mass to hold the charge.*/
            {
                test = FALSE;
                k = fragNum - 1;
                /*Check k first so that fragMOverZ[-1] is never read.*/
                while(k >= 0 && fragMOverZ[k] >= yIonMassMinErr)
                {
                    if(fragMOverZ[k] <= yIonMassPlusErr)
                    {
                        if(cChargeCount >= j) /*Make sure enough charges can be attached.*/
                        {
                            bOrYIon = 1; /*A b or y ion has been identified.*/
                            test = TRUE; /*A y ion of charge j has been identified.*/
                            yIon = 1; /*A y ion is present.*/
                            massDiff = abs(yIonMass - fragMOverZ[k]);
                            currentIonFound = ionFound[k];
                            ionFound[k] = CalcIonFound(ionFound[k], massDiff);
                            if(j == gParam.chargeState && gParam.chargeState > 1 &&
                               ((i > 2 && !twoAANTerm) || (i > 1 && twoAANTerm)))
                            {
                                ionFound[k] = ionFound[k] * HIGH_CHARGE_Y_ION_MULTIPLIER;
                            }
                            if(twoAAExtension)
                            {
                                ionFound[k] = ionFound[k] * TWO_AA_EXTENSION_MULTIPLIER;
                            }
                            if(currentIonFound > ionFound[k])
                            {
                                ionFound[k] = currentIonFound;
                            }
                        }
                    }
                    k--;
                }
            } /*if((yIonMass * j) > ((j-1) * 400 * gMultiplier))*/
        } /*for j*/

        if(bIon)
        {
            bCount++; /*If there was a b ion, then increment by one.*/
            skipOneB = 1;
        }
        else
        {
            if(skipOneB == 0)
            {
                bCount = 0; /*Otherwise, reset the counting of b ions to zero.*/
            }
            skipOneB
                = 0; /*if the next time there is no b ion, then bCount is reset to zero*/
        }

        if(yIon)
        {
            yCount++;
            skipOneY = 1;
        }
        else
        {
            if(skipOneY == 0)
            {
                yCount = 0;
            }
            skipOneY = 0; /*if the next time there is no y ion, then yCount is reset to zero*/
        }

        if(bCount > bSeries)
        {
            bSeries = bCount; /*Don't forget what the longest continuous b series was.*/
        }
        if(yCount > ySeries)
        {
            ySeries = yCount; /*Don't forget what the longest continuous y series was.*/
        }

        if(gParam.fragmentPattern == 'L' && (i < 3 || i > (seqLength - 3)))
        {
            cleavageSites++; /*For LCQ data, the ends are not well established and should not
                               be counted against sequences*/
        }
        else
        {
            cleavageSites += bOrYIon; /*Count the number of times a b or y ion defines a cleavage site.*/
        }
    } /*for i*/

    if(gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q')
    {
        if(ySeries > bSeries)
        {
            cleavageSites = ySeries;
        }
        else
        {
            cleavageSites = bSeries;
        }
    }

    return(cleavageSites);
}

/****************************PrintToConsoleAndFile***************************************************
*   This function prints header information plus a list of sequences and scores, ranked according
*   to the massaged score, which combines the intensity score and x-corr in a single value. The
*   file that is created is used as the input for the CIDentify database searching program
*   (a modification of FASTA) written by Alex Taylor.
*/ void PrintToConsoleAndFile(struct SequenceScore *firstScorePtr, REAL_4 quality, INT_4 length, REAL_4 perfectProbScore) { INT_4 i, j, seqNum, skippedOver = 0, wrongIndex; REAL_4 averageWrongXCorrScore, averageWrongIntScore, averageWrongQualityScore, averageWrongProbScore; REAL_4 averageWrongComboScore; REAL_4 sumOfDiffSquared, diffSquared, stDevXCorrScore, stDevProbScore, stDevIntScore; REAL_4 stDevQualityScore, stDevComboScore; REAL_4 stDevAboveAvXCorr, stDevAboveAvProb, stDevAboveAvInt; REAL_4 stDevAboveAvQuality, stDevAboveAvCombo; REAL_4 xCorrRatio, probRatio, intRatio, qualityRatio, comboRatio; REAL_4 wrongXCorrScore[WRONG_SEQ_NUM + 1], wrongIntScore[WRONG_SEQ_NUM + 1]; REAL_4 wrongProbScore[WRONG_SEQ_NUM + 1], wrongQualityScore[WRONG_SEQ_NUM + 1]; REAL_4 wrongComboScore[WRONG_SEQ_NUM + 1]; char *peptideString = NULL; INT_4 peptide[MAX_PEPTIDE_LENGTH]; INT_4 peptideLength = 0; struct SequenceScore *maxPtr; FILE *fp; const time_t theTime = (const time_t)time(NULL); PrintHeaderToFile(); /* Open the output file for appending.*/ fp = fopen(gParam.outputFile, "a"); if(fp == NULL) /*fopen returns NULL if there's a problem.*/ { printf("Cannot open %s to write the output.\n", gParam.outputFile); exit(1); } /* fprintf(fp, "Spectral Quality = %f\n", quality); // fprintf(fp, "Contiguous series of sequence ions defines a sequence of length %2ld\n ", length); // fprintf(fp, "High Probscr %f\n ", perfectProbScore);*/ /*fprintf(fp, "Contiguous series of sequence ions defining a sequence of single amino acids %2ld\n ", gSingleAACleavageSites);*/ fprintf(fp, "\n "); fprintf(fp, "\n "); fprintf(fp, "\n "); /* if (gParam.fMonitor) // { // printf("\nMaximum Spectral Quality = %f\n", quality); // printf("\nLongest contiguous series of sequence ions defining a sequence of single amino acids %2ld\n ", length); // printf("\nHigh Probscr %f\n ", perfectProbScore);*/ /*printf("Contiguous series of sequence ions defining a sequence of single amino acids %2ld\n ", // 
gSingleAACleavageSites);*/ /* }*/ /* Count the sequences and determine the max seq length.*/ seqNum = 0; maxPtr = firstScorePtr; while(maxPtr != NULL) { seqNum++; maxPtr = maxPtr->next; } /* Set up to print some of the output.*/ if(gParam.fMonitor) { printf("\n Sequence Rank Pr(c) PevzScr Quality IntScr X-corr\n"); } fprintf(fp, "\n Sequence Rank Pr(c) PevzScr Quality IntScr X-corr\n"); if(firstScorePtr == NULL) { if(gParam.fMonitor) { printf("\nNo sequences were found exceeding the specified Pr(c) of %f.\n\n", gParam.outputThreshold); } fprintf(fp, "\nNo sequences were found exceeding the specified Pr(c) of %f.\n\n.", gParam.outputThreshold); fclose(fp); return; } /* Determine if any database sequences looked good, and report their sequences*/ if (strlen(gParam.databaseSequences) > 0 && gCorrectMass) /*was a database sequence file opened?*/ { if(gDatabaseSeqCorrect) /*a database derived sequence looked good*/ { maxPtr = firstScorePtr; while(maxPtr != NULL) { if(maxPtr->databaseSeq == 2) /*this specific sequence looked good*/ { /*Change peptide[j] to single letter code.*/ peptideString = NULL; peptideLength = 0; j = 0; while(maxPtr->peptide[j] != 0) { peptide[j] = maxPtr->peptideSequence[j]; peptideLength++; j++; } peptideString = GetDatabaseSeq(peptide, peptideLength); if(peptideString) { strcat(peptideString, " DB"); if(gParam.fMonitor) { printf("%-55.55s %2ld %5.3f %5.3f %5.3f %5.3f %5.3f\n", peptideString, maxPtr->rank - skippedOver, maxPtr->comboScore, maxPtr->probScore, maxPtr->quality, maxPtr->intensityScore, maxPtr->crossDressingScore); } fprintf(fp, "%-55.55s %2ld %5.3f %5.3f %5.3f %5.3f %5.3f\n", peptideString, maxPtr->rank - skippedOver, maxPtr->comboScore, maxPtr->probScore, maxPtr->quality, maxPtr->intensityScore, maxPtr->crossDressingScore); free(peptideString); } } maxPtr = maxPtr->next; } } } for(i = 1; i <= 50; i ++) /*List the top 50 sequences.*/ { if(i > seqNum) /*Break out if there are less than 50 sequences in the entire list.*/ { break; } 
maxPtr = firstScorePtr; while(maxPtr != NULL) { if(maxPtr->rank == i && maxPtr->databaseSeq != 2) { /*Change peptide[j] to single letter code.*/ peptideString = NULL; peptideLength = 0; j = 0; while(maxPtr->peptide[j] != 0) { peptide[j] = maxPtr->peptideSequence[j]; peptideLength++; j++; } if(maxPtr->databaseSeq) { peptideString = GetDatabaseSeq(peptide, peptideLength); } else { peptideString = PeptideString(peptide, peptideLength); } if(peptideString) { if(maxPtr->databaseSeq) { strcat(peptideString, " DB"); } if(gParam.fMonitor) { printf("%-55.55s %2ld %5.3f %5.3f %5.3f %5.3f %5.3f\n", peptideString, i - skippedOver, maxPtr->comboScore, maxPtr->probScore, maxPtr->quality, maxPtr->intensityScore, maxPtr->crossDressingScore); } fprintf(fp, "%-55.55s %2ld %5.3f %5.3f %5.3f %5.3f %5.3f\n", peptideString, i - skippedOver, maxPtr->comboScore, maxPtr->probScore, maxPtr->quality, maxPtr->intensityScore, maxPtr->crossDressingScore); free(peptideString); } else { skippedOver++; } break; } maxPtr = maxPtr->next; } } /* Print out the sequences w/ x-corr greater than 0.9 that were not already listed above.*/ /* maxPtr = firstScorePtr; while(maxPtr != NULL) { if(maxPtr->rank <= 50) //Don't list anything already listed above. { maxPtr = maxPtr->next; continue; } if(maxPtr->crossDressingScore >= 0.9) { //Change peptide[j] to single letter code. 
char *peptideString; INT_4 peptide[MAX_PEPTIDE_LENGTH]; INT_4 peptideLength = 0; j = 0; while(maxPtr->peptide[j] != 0) { peptide[j] = maxPtr->peptideSequence[j]; peptideLength++; j++; } peptideString = PeptideString(peptide, peptideLength); if(maxPtr->databaseSeq) { strcat(peptideString, " DB"); } if(peptideString) { if(gParam.fMonitor) { printf("%-55.55s %2ld %5.3f %5.3f %5.3f %5.3f\n", peptideString, i - skippedOver, maxPtr->intensityScore, maxPtr->crossDressingScore, maxPtr->intensityOnlyScore, maxPtr->quality); } fprintf(fp, "%-55.55s %2ld %5.3f %5.3f %5.3f %5.3f\n", peptideString, i - skippedOver, maxPtr->intensityScore, maxPtr->crossDressingScore, maxPtr->intensityOnlyScore, maxPtr->quality); free(peptideString); } } maxPtr = maxPtr->next; } if(gParam.fragmentPattern == 'Q') { fprintf(fp, "\nThe residue 'm' signifies oxidized Met.\n"); if(gParam.fMonitor) { printf("\nThe residue 'm' signifies oxidized Met.\n"); } }*/ /* Print the elapsed search time */ { div_t theHours; div_t theMin; theHours = div(gParam.searchTime, 3600); theMin = div(theHours.rem, 60); fprintf(fp, "\nSearch time: %2d:", theHours.quot); if(gParam.fMonitor) { printf("\nSearch time: %2d:", theHours.quot); } if (theMin.quot < 10) { fprintf(fp, "0"); if(gParam.fMonitor) { printf("0"); } } fprintf(fp, "%d:", theMin.quot); if(gParam.fMonitor) { printf("%d:", theMin.quot); } if (theMin.rem < 10) { fprintf(fp, "0"); if(gParam.fMonitor) { printf("0"); } } fprintf(fp, "%d\n", theMin.rem); if(gParam.fMonitor) { printf("%d\n", theMin.rem); } } /* Here's where I figure out the statistics for the scores for the wrong answers and compare them to the correct mass sequences. 
*/ /*Adjust the number of wrong sequences to avoid counting ones that just had zero scores*/ wrongIndex = 0; for(i = 0; i < gWrongIndex; i++) { if( gWrongXCorrScore[i] != 0 || gWrongIntScore[i] != 0 || gWrongProbScore[i] != 0 || gWrongQualityScore[i] != 0 || gWrongComboScore[i] != 0) { wrongXCorrScore[wrongIndex] = gWrongXCorrScore[i]; wrongIntScore[wrongIndex] = gWrongIntScore[i]; wrongProbScore[wrongIndex] = gWrongProbScore[i]; wrongQualityScore[wrongIndex] = gWrongQualityScore[i]; wrongComboScore[wrongIndex] = gWrongComboScore[i]; wrongIndex++; } } /*initialize*/ for(i = 0; i < WRONG_SEQ_NUM + 1; i++) { gWrongXCorrScore[i] = 0; gWrongIntScore[i] = 0; gWrongProbScore[i] = 0; gWrongQualityScore[i] = 0; gWrongComboScore[i] = 0; } for(i = 0; i < wrongIndex; i++) { gWrongXCorrScore[i] = wrongXCorrScore[i]; gWrongIntScore[i] = wrongIntScore[i]; gWrongProbScore[i] = wrongProbScore[i]; gWrongQualityScore[i] = wrongQualityScore[i]; gWrongComboScore[i] = wrongComboScore[i]; } gWrongIndex = wrongIndex; /*If there are enough wrong sequences, then figger out the standard deviations, etc*/ if(gWrongIndex < 3) { if(gParam.wrongSeqNum > 0) { printf("\nNot enough wrong sequences to statistically evaluate\n"); } } else { /*initialize*/ averageWrongXCorrScore = 0; averageWrongIntScore = 0; averageWrongProbScore = 0; averageWrongQualityScore = 0; averageWrongComboScore = 0; /*gWrongIndex-1 (the last in the list) contains the correct sequence score*/ for(i = 0; i < gWrongIndex - 1; i++) { averageWrongXCorrScore += gWrongXCorrScore[i]; averageWrongIntScore += gWrongIntScore[i]; averageWrongProbScore += gWrongProbScore[i]; averageWrongQualityScore += gWrongQualityScore[i]; averageWrongComboScore += gWrongComboScore[i]; } /*divide by the number of wrong answers to calc the average wrong scores*/ averageWrongXCorrScore = averageWrongXCorrScore / (gWrongIndex-1); averageWrongIntScore = averageWrongIntScore / (gWrongIndex-1); averageWrongProbScore = averageWrongProbScore / 
(gWrongIndex-1); averageWrongQualityScore = averageWrongQualityScore / (gWrongIndex-1); averageWrongComboScore = averageWrongComboScore / (gWrongIndex-1); /*figure out standard deviations of the wrong cross-correlation scores*/ sumOfDiffSquared = 0; for(i = 0; i < gWrongIndex - 1; i++) { diffSquared = gWrongXCorrScore[i] - averageWrongXCorrScore; diffSquared = diffSquared * diffSquared; sumOfDiffSquared += diffSquared; } stDevXCorrScore = sumOfDiffSquared / (gWrongIndex - 2); /*divide by N-1*/ stDevXCorrScore = sqrt(stDevXCorrScore); /*Figure out how many standard deviations the correct mass sequence cross-correlation score is above the average wrong cross-correlation score*/ if(stDevXCorrScore != 0) { stDevAboveAvXCorr = (gWrongXCorrScore[gWrongIndex-1] - averageWrongXCorrScore) / stDevXCorrScore; } else { stDevAboveAvXCorr = 0; /*avoid a divide by zero*/ } /*figure out standard deviations of the wrong "probability" scores*/ sumOfDiffSquared = 0; for(i = 0; i < gWrongIndex - 1; i++) { diffSquared = gWrongProbScore[i] - averageWrongProbScore; diffSquared = diffSquared * diffSquared; sumOfDiffSquared += diffSquared; } stDevProbScore = sumOfDiffSquared / (gWrongIndex - 2); /*divide by N-1*/ stDevProbScore = sqrt(stDevProbScore); /*Figure out how many standard deviations the correct mass sequence "probability" score is above the average wrong "probability" score*/ if(stDevProbScore != 0) { stDevAboveAvProb = (gWrongProbScore[gWrongIndex-1] - averageWrongProbScore) / stDevProbScore; } else { stDevAboveAvProb = 0; /*avoid a divide by zero*/ } /*figure out standard deviations of the wrong intensity scores*/ sumOfDiffSquared = 0; for(i = 0; i < gWrongIndex - 1; i++) { diffSquared = gWrongIntScore[i] - averageWrongIntScore; diffSquared = diffSquared * diffSquared; sumOfDiffSquared += diffSquared; } stDevIntScore = sumOfDiffSquared / (gWrongIndex - 2); /*divide by N-1*/ stDevIntScore = sqrt(stDevIntScore); /*Figure out how many standard deviations the correct mass 
sequence biased intensity score is above the average wrong biased intensity score*/ if(stDevIntScore != 0) { stDevAboveAvInt = (gWrongIntScore[gWrongIndex-1] - averageWrongIntScore) / stDevIntScore; } else { stDevAboveAvInt = 0; /*avoid a divide by zero*/ } /*figure out standard deviations of the wrong quality scores*/ sumOfDiffSquared = 0; for(i = 0; i < gWrongIndex - 1; i++) { diffSquared = gWrongQualityScore[i] - averageWrongQualityScore; diffSquared = diffSquared * diffSquared; sumOfDiffSquared += diffSquared; } stDevQualityScore = sumOfDiffSquared / (gWrongIndex - 2); /*divide by N-1*/ stDevQualityScore = sqrt(stDevQualityScore); /*Figure out how many standard deviations the correct mass sequence quality score is above the average wrong quality score*/ if(stDevQualityScore != 0) { stDevAboveAvQuality = (gWrongQualityScore[gWrongIndex-1] - averageWrongQualityScore) / stDevQualityScore; } else { stDevAboveAvQuality = 0; /*avoid a divide by zero*/ } /*figure out standard deviations of the wrong combined scores*/ sumOfDiffSquared = 0; for(i = 0; i < gWrongIndex - 1; i++) { diffSquared = gWrongComboScore[i] - averageWrongComboScore; diffSquared = diffSquared * diffSquared; sumOfDiffSquared += diffSquared; } stDevComboScore = sumOfDiffSquared / (gWrongIndex - 2); /*divide by N-1*/ stDevComboScore = sqrt(stDevComboScore); /*Figure out how many standard deviations the correct mass sequence combined score is above the average wrong combined score*/ if(stDevComboScore != 0) { stDevAboveAvCombo = (gWrongComboScore[gWrongIndex-1] - averageWrongComboScore) / stDevComboScore; } else { stDevAboveAvCombo = 0; /*avoid a divide by zero*/ } /*Calculate right/wrong ratios*/ if(averageWrongXCorrScore != 0) { xCorrRatio = gWrongXCorrScore[gWrongIndex-1] / averageWrongXCorrScore; } else { xCorrRatio = 0; /*avoid divide by zero*/ } if(averageWrongProbScore != 0) { probRatio = gWrongProbScore[gWrongIndex-1] / averageWrongProbScore; } else { probRatio = 0; 
/*avoid divide by zero*/ } if(averageWrongIntScore != 0) { intRatio = gWrongIntScore[gWrongIndex-1] / averageWrongIntScore; } else { intRatio = 0; /*avoid divide by zero*/ } if(averageWrongQualityScore != 0) { qualityRatio = gWrongQualityScore[gWrongIndex-1] / averageWrongQualityScore; } else { qualityRatio = 0; /*avoid divide by zero*/ } if(averageWrongComboScore != 0) { comboRatio = gWrongComboScore[gWrongIndex-1] / averageWrongComboScore; } else { comboRatio = 0; /*avoid divide by zero*/ } if(gParam.fMonitor) { printf("\nStatistics based on %2ld wrong sequences:\n", gWrongIndex-1); printf("\n 1st ranked St Deviations Average Wrong Correct/Wrong\n"); printf("xcorr score %5.3f %5.3f %5.3f %5.2f\n", gWrongXCorrScore[gWrongIndex-1], stDevAboveAvXCorr, averageWrongXCorrScore, xCorrRatio); printf("Intensity score %5.3f %5.3f %5.3f %5.2f\n", gWrongIntScore[gWrongIndex-1], stDevAboveAvInt, averageWrongIntScore, intRatio); printf("Quality score %5.3f %5.3f %5.3f %5.2f\n", gWrongQualityScore[gWrongIndex-1], stDevAboveAvQuality, averageWrongQualityScore, qualityRatio); printf("Prob score %5.3f %5.3f %5.3f %5.2f\n", gWrongProbScore[gWrongIndex-1], stDevAboveAvProb, averageWrongProbScore, probRatio); printf("Combined score %5.3f %5.3f %5.3f %5.2f\n", gWrongComboScore[gWrongIndex-1], stDevAboveAvCombo, averageWrongComboScore, comboRatio); } /*The last entry is the correct mass sequence, so the number of wrong sequences is gWrongIndex-1, as in the console output above*/ fprintf(fp, "\nBased on %2ld wrong sequences\n", gWrongIndex-1); fprintf(fp, "\n 1st ranked St Deviations Average Wrong Correct/Wrong\n"); fprintf(fp, "xcorr score %5.3f %5.3f %5.3f %5.2f\n", gWrongXCorrScore[gWrongIndex-1], stDevAboveAvXCorr, averageWrongXCorrScore, xCorrRatio); fprintf(fp, "Intensity score %5.3f %5.3f %5.3f %5.2f\n", gWrongIntScore[gWrongIndex-1], stDevAboveAvInt, averageWrongIntScore, intRatio); fprintf(fp, "Quality score %5.3f %5.3f %5.3f %5.2f\n", gWrongQualityScore[gWrongIndex-1], stDevAboveAvQuality, averageWrongQualityScore, qualityRatio); fprintf(fp, "Prob score %5.3f %5.3f %5.3f %5.2f\n", 
gWrongProbScore[gWrongIndex-1], stDevAboveAvProb, averageWrongProbScore, probRatio); fprintf(fp, "Combined score %5.3f %5.3f %5.3f %5.2f\n", gWrongComboScore[gWrongIndex-1], stDevAboveAvCombo, averageWrongComboScore, comboRatio); } fclose(fp); return; } /*****************************KeepSequence************************************************************** * * If a sequence has an unusually high value for any one of the scores, then it is kept regardless of how * bad the other scores might be. */ BOOLEAN KeepSequence(struct SequenceScore *currPtr, REAL_4 intscrKeep, REAL_4 xcorrKeep, REAL_4 qualityKeep, REAL_4 probscrKeep) { BOOLEAN keep = FALSE; if(currPtr->intensityScore > intscrKeep) { keep = TRUE; } if(currPtr->crossDressingScore > xcorrKeep) { keep = TRUE; } if(currPtr->quality > qualityKeep) { keep = TRUE; } if(currPtr->probScore > probscrKeep) { keep = TRUE; } return(keep); } /*****************************ComboScore**************************************************************** * * ComboScore is derived from an empirical determination of the probability of being wrong given * different input scores. The input scores are combined as a weighted average, and the estimated * probability of being correct is given as an output. 
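The calculation can be sketched for one instrument type. This is a minimal sketch, assuming the Qtof weights and quadratic constants used in the function body below; the function name QtofComboSketch, its argument order and the use of double are illustrative only and are not part of Lutefisk.

```c
// Hypothetical stand-alone version of the Qtof branch.
static double QtofComboSketch(double pevScr, double intScr,
                              double xcorr, double quality)
{
    // Weights and constants copied from the Qtof case in ComboScore.
    const double pevWt = 1.0, qualWt = 1.8, intWt = 2.0, xcorrWt = 1.0;
    const double a = 2.667, b = -4.9901, c = 2.293;
    double avg, probWrong;

    // Weighted average of the four input scores.
    avg = (pevScr * pevWt + intScr * intWt + xcorr * xcorrWt
           + quality * qualWt) / (pevWt + qualWt + intWt + xcorrWt);
    if (avg == 0.0) {
        return 0.0; // all scores were zeroed out earlier
    }
    // Pr(wrong) is modelled as a quadratic in the average score.
    probWrong = a * avg * avg + b * avg + c;
    // Clamp to the range 0.01 to 0.99 before converting.
    if (probWrong > 0.99) {
        probWrong = 0.99;
    } else if (probWrong < 0.01) {
        probWrong = 0.01;
    }
    return 1.0 - probWrong; // probability of being correct
}
```

With all four scores equal to 0.5 the weighted average is 0.5, the modelled Pr(wrong) is about 0.465, and the returned value is about 0.535. With all four scores equal to 1 the quadratic goes slightly negative, so the clamp applies and the result is 0.99.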
* */ REAL_4 ComboScore(struct SequenceScore *currPtr) { REAL_4 averageScore, upperLimit, lowerLimit; REAL_4 aConstant, bConstant, cConstant; REAL_4 probWrong, probRight; REAL_4 pevScrWt, qualScrWt, intScrWt, xcorrScrWt, summedWt; /*Initialize*/ if(gParam.fragmentPattern == 'Q') { pevScrWt = 1; qualScrWt = 1.8; intScrWt = 2; xcorrScrWt = 1; } else if(gParam.fragmentPattern == 'L') { pevScrWt = 1.25; qualScrWt = 0.75; intScrWt = 2; xcorrScrWt = 1.5; } else { pevScrWt = 1; qualScrWt = 1; intScrWt = 1; xcorrScrWt = 1; } summedWt = pevScrWt + qualScrWt + intScrWt + xcorrScrWt; if(summedWt == 0) { printf("Divide by zero in ComboScore"); exit(1); } /*Calculate the weighted average score*/ averageScore = currPtr->probScore * pevScrWt + currPtr->intensityScore * intScrWt + currPtr->crossDressingScore * xcorrScrWt + currPtr->quality * qualScrWt; averageScore = averageScore / summedWt; if(averageScore == 0) { return(0); /*return a comboscr of zero*/ } upperLimit = 0.99; lowerLimit = 0.01; /*Pr(wrong) = Ax2 + Bx + C*/ if(gParam.fragmentPattern == 'Q') { aConstant = 2.667; /*Qtof model derived from data set*/ bConstant = -4.9901; cConstant = 2.293; } else if(gParam.fragmentPattern == 'L') { aConstant = -1.106; /*LCQ model derived from data set*/ bConstant = -0.607; cConstant = 1.467; } else { aConstant = 0; /*For other data use an average of the two constants, since no modeling has been done*/ bConstant = -2.8; /*Use a linear model, too*/ cConstant = 1.9; } /*Calculate probability of being wrong*/ probWrong = aConstant * averageScore * averageScore + bConstant * averageScore + cConstant; /*Set extreme ends of probability*/ if(probWrong > upperLimit) { probWrong = upperLimit; } else if(probWrong < lowerLimit) { probWrong = lowerLimit; } probRight = 1 - probWrong; /*make it probability of being correct*/ return(probRight); } /*****************************DetermineBestCandidates*************************************************** * * Sequences derived from databases are deemed to 
be correct if any of the xcorr, intscr, or probscr * values are at least 0.95 times the maximum (and above the corresponding bottom value). The scoring * then proceeds as follows: * Sequences with x-corr values less than 0.75 times the max are discarded. * Sequences with intensity scores less than 0.75 times the max are discarded. * Sequences with quality scores less than max - qualityLimit (0.2 to 0.4, depending on the fragment * pattern) are discarded. * Sequences with probscr scores less than max - 0.2 are discarded. * A combined score is calculated by ComboScore, and sequences with less than 0.8 of the max, or below * the output threshold, are discarded. * If more than gParam.outputSeqNum sequences are remaining, then they are ranked by the combined score * and the top gParam.outputSeqNum are kept as the final list of candidates. * */ struct SequenceScore *DetermineBestCandidates(struct SequenceScore *firstScorePtr) { REAL_4 intscrLimit, intscrBottom, highestIntScore; REAL_4 xcorrLimit, xcorrBottom, highestXcorr; REAL_4 qualityLimit, qualityBottom, highestQuality; REAL_4 probscrLimit, probscrBottom, highestProbscr; REAL_4 comboLimit, comboBottom, highestCombo; REAL_4 intScore, intOnlyScore, crossDressingScore, stDevErr, calFactor, quality; REAL_4 probScore, comboScore; REAL_4 intscrKeep, xcorrKeep, qualityKeep, probscrKeep; INT_4 maxFinalSequences, countTheSeqs, i, cleavageSites; INT_4 sequence[MAX_PEPTIDE_LENGTH], charSequence[MAX_PEPTIDE_LENGTH]; INT_4 seqLength; char databaseSeq; BOOLEAN keep; struct SequenceScore *currPtr, *massagedSeqListPtr, *previousPtr; /* Initialize variables. 
*/ if(gParam.fragmentPattern == 'Q') { intscrLimit = 0.75; /*0.75 fraction of highest intscr that is used as threshold*/ intscrBottom = 0.4; /*0.4 this is the bottom intscr value*/ intscrKeep = 0.85; /*keep candidates with intscr greater than this*/ highestIntScore = 0; /*this will hold the max intscr value*/ xcorrLimit = 0.75; /*0.75 fraction of highest xcorr that is used as threshold*/ xcorrBottom = 0.3; /*0.3 this is the bottom xcorr value*/ xcorrKeep = 0.75; /*keep candidates with xcorr greater than this*/ highestXcorr = 0; /*this will hold the max xcorr value*/ qualityLimit = 0.35; /*0.3 this is subtracted from max quality to determine the limit*/ qualityBottom = 0.2; /*0.2 this is the bottom quality value*/ qualityKeep = 0.85; /*keep candidates with quality greater than this*/ highestQuality = 0; /*this will hold the max quality value*/ probscrLimit = 0.2; /*0.2 this is subtracted from max probscr to determine the limit*/ probscrBottom = 0.1; /*0.1 this is the bottom probscr value*/ probscrKeep = 0.75; /*keep candidates with probscr greater than this*/ highestProbscr = 0; /*this will hold the max probscr value*/ comboLimit = 0.8; /*0.8 fraction of highest probability estimate that is used as threshold*/ comboBottom = gParam.outputThreshold; /*0.2 this is the bottom combo score value*/ highestCombo = 0; /*this will hold the max combo score value*/ } else if(gParam.fragmentPattern == 'L') { intscrLimit = 0.75; /*0.75 fraction of highest intscr that is used as threshold*/ intscrBottom = 0.45; /*0.45 this is the bottom intscr value*/ intscrKeep = 0.85; /*keep candidates with intscr greater than this*/ highestIntScore = 0; /*this will hold the max intscr value*/ xcorrLimit = 0.75; /*0.75 fraction of highest xcorr that is used as threshold*/ xcorrBottom = 0.35; /*0.35 this is the bottom xcorr value*/ xcorrKeep = 0.85; /*keep candidates with xcorr greater than this*/ highestXcorr = 0; /*this will hold the max xcorr value*/ qualityLimit = 0.4; /*0.4 this is 
subtracted from max quality to determine the limit*/ qualityBottom = 0.25; /*0.25 this is the bottom quality value*/ qualityKeep = 0.85; /*keep candidates with quality greater than this*/ highestQuality = 0; /*this will hold the max quality value*/ probscrLimit = 0.2; /*0.2 this is subtracted from max probscr to determine the limit*/ probscrBottom = 0.25; /*0.25 this is the bottom probscr value*/ probscrKeep = 0.9; /*keep candidates with probscr greater than this*/ highestProbscr = 0; /*this will hold the max probscr value*/ comboLimit = 0.8; /*0.8 fraction of highest probability estimate that is used as threshold*/ comboBottom = gParam.outputThreshold; /*0.2 this is the bottom combo score value*/ highestCombo = 0; /*this will hold the max combo score value*/ } else { intscrLimit = 0.75; /*0.75 fraction of highest intscr that is used as threshold*/ intscrBottom = 0.0; /*0.0 this is the bottom intscr value*/ intscrKeep = 1.0; /*keep candidates with intscr greater than this*/ highestIntScore = 0; /*this will hold the max intscr value*/ xcorrLimit = 0.75; /*0.75 fraction of highest xcorr that is used as threshold*/ xcorrBottom = 0.0; /*0.0 this is the bottom xcorr value*/ xcorrKeep = 1.0; /*keep candidates with xcorr greater than this*/ highestXcorr = 0; /*this will hold the max xcorr value*/ qualityLimit = 0.2; /*0.2 this is subtracted from max quality to determine the limit*/ qualityBottom = 0.0; /*0.0 this is the bottom quality value*/ qualityKeep = 1.0; /*keep candidates with quality greater than this*/ highestQuality = 0; /*this will hold the max quality value*/ probscrLimit = 0.2; /*0.2 this is subtracted from max probscr to determine the limit*/ probscrBottom = 0.0; /*0.0 this is the bottom probscr value*/ probscrKeep = 1.0; /*keep candidates with probscr greater than this*/ highestProbscr = 0; /*this will hold the max probscr value*/ comboLimit = 0.8; /*0.8 fraction of highest probability estimate that is used as threshold*/ comboBottom = gParam.outputThreshold; 
/*0.2 this is the bottom combo score value*/ highestCombo = 0; /*this will hold the max combo score value*/ } maxFinalSequences = gParam.outputSeqNum; /*maximum number of sequences in the final list*/ massagedSeqListPtr = NULL; /*this is the final list of candidate sequences*/ /*Reset bottom limits if a very low output threshold has been selected*/ if(intscrBottom > gParam.outputThreshold) { intscrBottom = gParam.outputThreshold; } if(xcorrBottom > gParam.outputThreshold) { xcorrBottom = gParam.outputThreshold; } if(qualityBottom > gParam.outputThreshold) { qualityBottom = gParam.outputThreshold; } if(probscrBottom > gParam.outputThreshold) { probscrBottom = gParam.outputThreshold; } /* make sure xcorr, intscr, and quality are no greater than one*/ currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->crossDressingScore > 1) { currPtr->crossDressingScore = 1; } if(currPtr->intensityScore > 1) { currPtr->intensityScore = 1; } if(currPtr->quality > 1) { currPtr->quality = 1; } currPtr->comboScore = 0; /*should not have any values in it at this point*/ currPtr = currPtr->next; } /* Count the sequences and find the max scores.*/ countTheSeqs = 0; currPtr = firstScorePtr; while(currPtr != NULL) { countTheSeqs++; if(currPtr->intensityScore > highestIntScore) { highestIntScore = currPtr->intensityScore; } if(currPtr->crossDressingScore > highestXcorr) { highestXcorr = currPtr->crossDressingScore; } if(currPtr->quality > highestQuality) { highestQuality = currPtr->quality; } if(currPtr->probScore > highestProbscr) { highestProbscr = currPtr->probScore; } currPtr = currPtr->next; } /* * Determine if any database derived sequences are correct */ gDatabaseSeqCorrect = FALSE; currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->databaseSeq) { if(currPtr->intensityScore >= highestIntScore * 0.95 && currPtr->intensityScore >= intscrBottom) { gDatabaseSeqCorrect = TRUE; /*global boolean to report back that there was a database sequence present that compared well 
against the de novo sequences*/ /*currPtr->comboScore = ComboScore(currPtr);*/ currPtr->databaseSeq = 2; } else if(currPtr->crossDressingScore >= highestXcorr * 0.95 && currPtr->crossDressingScore >= xcorrBottom) { gDatabaseSeqCorrect = TRUE; /*currPtr->comboScore = ComboScore(currPtr);*/ currPtr->databaseSeq = 2; } else if(currPtr->probScore >= highestProbscr * 0.95 && currPtr->probScore >= probscrBottom) { gDatabaseSeqCorrect = TRUE; /*currPtr->comboScore = ComboScore(currPtr);*/ currPtr->databaseSeq = 2; } } currPtr = currPtr->next; } /* * If the database derived sequences are not present, or did not match well, then find the good sequences. */ /*if(!databaseSeqCorrect) {*/ /*Find sequences below xcorr limits and assign zero score values to them.*/ currPtr = firstScorePtr; while(currPtr != NULL) { /*Does the sequence have unusually high xcorr, intscr, quality or probscr?*/ keep = KeepSequence(currPtr, intscrKeep, xcorrKeep, qualityKeep, probscrKeep); if(!keep) { if(currPtr->crossDressingScore < highestXcorr * xcorrLimit || currPtr->crossDressingScore < xcorrBottom) { if((!gDatabaseSeqCorrect) && currPtr->databaseSeq) /*don't whack the database sequence*/ { currPtr->intensityScore = 0; currPtr->crossDressingScore = 0; currPtr->probScore = 0; currPtr->quality = 0; currPtr->comboScore = 0; currPtr->intensityOnlyScore = 0; } } } currPtr = currPtr->next; } /*Find sequences below intscr limits and assign zero score values to them.*/ /*But first find the newest high intscr (since it might have been weeded out based on xcorr*/ highestIntScore = 0; currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->intensityScore > highestIntScore) { highestIntScore = currPtr->intensityScore; } currPtr = currPtr->next; } currPtr = firstScorePtr; while(currPtr != NULL) { /*Does the sequence have unusually high xcorr, intscr, quality or probscr?*/ keep = KeepSequence(currPtr, intscrKeep, xcorrKeep, qualityKeep, probscrKeep); if(!keep) { if(currPtr->intensityScore < 
highestIntScore * intscrLimit || currPtr->intensityScore < intscrBottom) { if((!gDatabaseSeqCorrect) && currPtr->databaseSeq) /*don't whack the database sequence*/ { currPtr->intensityScore = 0; currPtr->crossDressingScore = 0; currPtr->probScore = 0; currPtr->quality = 0; currPtr->comboScore = 0; currPtr->intensityOnlyScore = 0; } } } currPtr = currPtr->next; } /*Find sequences below quality limits and assign zero score values to them.*/ /*But first find the newest high quality score*/ highestQuality = 0; currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->quality > highestQuality) { highestQuality = currPtr->quality; } currPtr = currPtr->next; } currPtr = firstScorePtr; while(currPtr != NULL) { /*Does the sequence have unusually high xcorr, intscr, quality or probscr?*/ keep = KeepSequence(currPtr, intscrKeep, xcorrKeep, qualityKeep, probscrKeep); if(!keep) { if(currPtr->quality < highestQuality - qualityLimit || currPtr->quality < qualityBottom) { if((!gDatabaseSeqCorrect) && currPtr->databaseSeq) /*don't whack the database sequence*/ { currPtr->intensityScore = 0; currPtr->crossDressingScore = 0; currPtr->probScore = 0; currPtr->quality = 0; currPtr->comboScore = 0; currPtr->intensityOnlyScore = 0; } } } currPtr = currPtr->next; } /*Find sequences below probscr limits and assign zero score values to them.*/ /*But first find the newest high probscr*/ highestProbscr = 0; currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->probScore > highestProbscr) { highestProbscr = currPtr->probScore; } currPtr = currPtr->next; } currPtr = firstScorePtr; while(currPtr != NULL) { /*Does the sequence have unusually high xcorr, intscr, quality or probscr?*/ keep = KeepSequence(currPtr, intscrKeep, xcorrKeep, qualityKeep, probscrKeep); if(!keep) { if(currPtr->probScore < highestProbscr - probscrLimit || currPtr->probScore < probscrBottom) { if((!gDatabaseSeqCorrect) && currPtr->databaseSeq) /*don't whack the database sequence*/ { currPtr->intensityScore = 0; 
currPtr->crossDressingScore = 0; currPtr->probScore = 0; currPtr->quality = 0; currPtr->comboScore = 0; currPtr->intensityOnlyScore = 0; } } } currPtr = currPtr->next; } /*Calculate the comboScore*/ currPtr = firstScorePtr; while(currPtr != NULL) { currPtr->comboScore = ComboScore(currPtr); /*returns score of zero if all other scores are zero*/ currPtr = currPtr->next; } /*Find highest comboscr value*/ currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->comboScore > highestCombo) { highestCombo = currPtr->comboScore; } currPtr = currPtr->next; } /*Find sequences below comboscr limits and assign zero score values to them*/ currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->comboScore < highestCombo * comboLimit || currPtr->comboScore < comboBottom) { if((!gDatabaseSeqCorrect) && currPtr->databaseSeq) /*don't whack the database sequence*/ { currPtr->intensityScore = 0; currPtr->crossDressingScore = 0; currPtr->probScore = 0; currPtr->quality = 0; currPtr->comboScore = 0; currPtr->intensityOnlyScore = 0; } } currPtr = currPtr->next; } /*}*/ /* * Store the remaining sequences. 
*/ currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->comboScore != 0) /*if there is a comboscore, then save it to the massaged list of seq's; if a database-derived sequence was found to be good, then all non- database sequences have comboscores of zero and are not saved*/ { intScore = currPtr->intensityScore; intOnlyScore = currPtr->intensityOnlyScore; crossDressingScore = currPtr->crossDressingScore; stDevErr = currPtr->stDevErr; calFactor = currPtr->calFactor; quality = currPtr->quality; probScore = currPtr->probScore; comboScore = currPtr->comboScore; cleavageSites = currPtr->cleavageSites; databaseSeq = currPtr->databaseSeq; /*Load sequence and charSequence arrays.*/ i = 0; while(currPtr->peptide[i] != 0) { sequence[i] = currPtr->peptide[i]; charSequence[i] = currPtr->peptideSequence[i]; i++; } sequence[i] = 0; charSequence[i] = 0; seqLength = i; /*Put into the massaged sequence list.*/ massagedSeqListPtr = AddToSeqScoreList(massagedSeqListPtr, LoadSeqScoreStruct(intScore, intOnlyScore, sequence, charSequence, seqLength, stDevErr, cleavageSites, calFactor, databaseSeq, crossDressingScore, quality, seqLength, probScore, comboScore)); } currPtr = currPtr->next; } /* Rank the sequences.*/ if(massagedSeqListPtr != NULL) { SeqComboScoreRanker(massagedSeqListPtr); } /* Eliminate sequences ranked more than maxFinalSequences*/ if(massagedSeqListPtr != NULL) { currPtr = massagedSeqListPtr; while(currPtr != NULL && currPtr->rank > maxFinalSequences) { previousPtr = currPtr; currPtr = currPtr->next; free(previousPtr); } massagedSeqListPtr = currPtr; /*found the first sequence ranked less than 6*/ /*now weed out the rest*/ if(massagedSeqListPtr != NULL && massagedSeqListPtr->next != NULL) { currPtr = massagedSeqListPtr->next; previousPtr = massagedSeqListPtr; while(currPtr != NULL) { if(currPtr->rank > maxFinalSequences) { previousPtr->next = currPtr->next; free(currPtr); currPtr = previousPtr->next; } else { previousPtr = previousPtr->next; currPtr = 
currPtr->next; } } } } /* Fill in the scores for the top ranked incorrect sequences*/ if(massagedSeqListPtr == NULL) { gWrongIndex++; return(massagedSeqListPtr); } currPtr = massagedSeqListPtr; while(currPtr != NULL) { if(currPtr->rank == 1) { gWrongXCorrScore[gWrongIndex] = currPtr->crossDressingScore/* * topXCorrScore*/; gWrongProbScore[gWrongIndex] = currPtr->probScore; gWrongIntScore[gWrongIndex] = currPtr->intensityScore; gWrongQualityScore[gWrongIndex] = currPtr->quality; gWrongComboScore[gWrongIndex] = currPtr->comboScore; gWrongIndex++; break; } currPtr = currPtr->next; } return(massagedSeqListPtr); } /****************************CalcIonFound***************************************************** * * This function uses the global value of gTolerance and an input value that corresponds to the * absolute value of the mass difference between a calculated ion m/z and an observed m/z. * gTolerance corresponds to a fraction of gParams.fragmentErr (the input fragment ion * tolerance). The function returns a value between zero and one. If the input mass difference * is less than gTolerance, then a one is immediately returned. If the mass difference is * greater than gTolerance, then a value between zero and one is returned, depending on how far * off the mass difference is. * * * New version of CalcIonFound is to use an exponential decay w/ certain boundaries. If the mass * difference is less than gToleranceNarrow, then it is assumed to be completely correct. If it * is between gToleranceNarrow and gToleranceWide it is an exponential decay. Ions in error * greater than gToleranceWide don't even get to this point. 
*/
REAL_4 CalcIonFound(REAL_4 currentIonFound, INT_4 massDiff)
{
	REAL_4 ionFound;
/*	REAL_4 range;*/

	/*range = gToleranceWide - gToleranceNarrow;
	if(range == 0)
	{
		printf("CalcIonFound: range = 0\n");
		exit(1);
	}*/

	if(massDiff <= gToleranceNarrow)
	{
		ionFound = 1;
	}
	else
	{
		/*ionFound = gToleranceWide - massDiff;
		ionFound = ionFound / range;*/
		/*Evaluate the exponent in floating point so that integer division cannot truncate the decay.*/
		ionFound = exp(-1.0 * ((REAL_8)massDiff * (REAL_8)massDiff)
						/ (2.0 * (REAL_8)gToleranceNarrow * (REAL_8)gToleranceNarrow));
	}

	if(ionFound < currentIonFound)
	{
		ionFound = currentIonFound;	/*don't ever replace with a worse value*/
	}

	return(ionFound);
}

/****************************PrintToConsole***************************************************
*	This function prints a list of sequences and scores, ordered by the rank field.
*/
void PrintToConsole(struct SequenceScore *firstScorePtr)
{
	INT_4 i, j, seqNum;
	REAL_4 xcorrNormalizer;
	struct SequenceScore *maxPtr, *currPtr;

/*	Find the xcorr normalizer.*/
	xcorrNormalizer = 0;
	currPtr = firstScorePtr;
	while(currPtr != NULL)
	{
		if(currPtr->crossDressingScore > xcorrNormalizer)
		{
			xcorrNormalizer = currPtr->crossDressingScore;
		}
		currPtr = currPtr->next;
	}
	xcorrNormalizer = 1;	/*Overrides the value found above, so x-corr scores are printed unnormalized.*/

/*	Set up the screen to print some of the output.*/
	/* printf("\n Rank X-corr IntScr IntOnlyScr Quality PevScr StDevErr CS CalFact Sequence\n");*/
	printf("\n Rank X-corr IntScr Quality PevScr StDevErr CS CalFact Sequence\n");

/*	Count the sequences.*/
	seqNum = 0;
	maxPtr = firstScorePtr;
	while(maxPtr != NULL)
	{
		seqNum++;
		maxPtr = maxPtr->next;
	}

	for(i = 1; i <= 50 && i <= seqNum; i++)	/*List the top 50 sequences.*/
	{
		maxPtr = firstScorePtr;
		while(maxPtr != NULL)
		{
			if(maxPtr->rank == i)
			{
				/*Change peptide[j] to single letter code.*/
				char *peptideString;
				INT_4 peptide[MAX_PEPTIDE_LENGTH];
				INT_4 peptideLength = 0;

				j = 0;
				while(maxPtr->peptide[j] != 0)
				{
					peptide[j] = maxPtr->peptideSequence[j];
					peptideLength++;
					j++;
				}
				peptideString = PeptideString(peptide, peptideLength);
				if(peptideString)	/*Check for NULL before touching the string.*/
				{
					if(maxPtr->databaseSeq)
					{
						/*Assumes PeptideString() leaves room for one extra character.*/
						strcat(peptideString, " ");	/*used to denote this was database sequence*/
					}
					/*printf(" %3ld %5.3f %5.3f %5.3f %5.3f %5.3f %6.4f %2ld %8.6f %s\n", i, maxPtr->crossDressingScore / xcorrNormalizer, maxPtr->intensityScore, maxPtr->intensityOnlyScore, maxPtr->quality, maxPtr->probScore, maxPtr->stDevErr, maxPtr->cleavageSites, maxPtr->calFactor, peptideString);*/
					printf(" %3ld %5.3f %5.3f %5.3f %5.3f %6.4f %2ld %8.6f %s\n",
						i, maxPtr->crossDressingScore / xcorrNormalizer,
						maxPtr->intensityScore, maxPtr->quality, maxPtr->probScore,
						maxPtr->stDevErr, maxPtr->cleavageSites, maxPtr->calFactor, peptideString);
					free(peptideString);
				}
				break;
			}
			maxPtr = maxPtr->next;
		}
	}

/*	Print out the sequences w/ x-corr of at least 0.9 that were not already listed above.*/
	maxPtr = firstScorePtr;
	while(maxPtr != NULL)
	{
		if(maxPtr->rank <= 50)	/*Don't list anything already listed above.*/
		{
			maxPtr = maxPtr->next;
			continue;
		}
		if(maxPtr->crossDressingScore / xcorrNormalizer >= 0.9)
		{
			/*Change peptide[j] to single letter code.*/
			char *peptideString;
			INT_4 peptide[MAX_PEPTIDE_LENGTH];
			INT_4 peptideLength = 0;

			j = 0;
			while(maxPtr->peptide[j] != 0)
			{
				peptide[j] = maxPtr->peptideSequence[j];
				peptideLength++;
				j++;
			}
			peptideString = PeptideString(peptide, peptideLength);
			if(peptideString)	/*Check for NULL before touching the string.*/
			{
				if(maxPtr->databaseSeq)
				{
					/*Assumes PeptideString() leaves room for one extra character.*/
					strcat(peptideString, " ");	/*used to denote this was database sequence*/
				}
				printf(" %3ld %5.3f %5.3f %5.3f %5.3f %5.3f %6.4f %2ld %8.6f %s\n",
					i, maxPtr->crossDressingScore / xcorrNormalizer,
					maxPtr->intensityScore, maxPtr->intensityOnlyScore, maxPtr->quality,
					maxPtr->probScore, maxPtr->stDevErr, maxPtr->cleavageSites,
					maxPtr->calFactor, peptideString);
				free(peptideString);
			}
		}
		maxPtr = maxPtr->next;
	}

	return;
}

/***********************************AddTagBack**************************************************
*
*	This function is called if a specific sequence tag was used.  It inserts the sequence tag back
*	into the sequence found in the peptide field of firstSequencePtr (linked list of structs of
*	type Sequence).
The value of the field peptideLength is correspondingly adjusted. * */ struct Sequence *AddTagBack(struct Sequence *firstSequencePtr) { INT_4 tagLength, i, j; INT_4 tagSequenceMass[MAX_PEPTIDE_LENGTH], peptide[MAX_PEPTIDE_LENGTH]; struct Sequence *currSeqPtr, *previousPtr, *discardPtr; REAL_4 nMass, nMassGroup; char test; REAL_4 thePeptideMW = gParam.peptideMW; /* First I'll find the length of the sequence tag. */ tagLength = 0; while(gParam.tagSequence[tagLength] != 0) { tagLength++; } /* Next I'll find the N-terminal group mass. */ nMassGroup = gParam.modifiedNTerm; /* I also need to convert the sequence tag from a char array to a list of nominal masses. */ i = 0; while(gParam.tagSequence[i] != 0) { for(j = 0; j < gAminoAcidNumber; j++) { if(gGapList[j] != 0) { if(gSingAA[j] == gParam.tagSequence[i]) { tagSequenceMass[i] = gGapList[j]; } } } i++; } /* Don't forget to readjust the peptideMW. */ for(i = 0; i < tagLength; i++) { thePeptideMW = thePeptideMW + tagSequenceMass[i]; } /* * Now I'll start looking at each struct of type Sequence, and modifying the peptide and * peptideLength fields. */ currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { for(i = 0; i < currSeqPtr->peptideLength; i++) /*Save the old sequence.*/ { peptide[i] = currSeqPtr->peptide[i]; } nMass = nMassGroup; /*Initialize nMass, which serves to keep track of the addition of N-terminal amino acids.*/ /* In case the tag sequence is N-terminal, do this first:*/ test = TRUE; /*test becomes FALSE if the sequence tag is N-terminal.*/ if(nMass >= gParam.tagNMass - gToleranceWide) { if(nMass <= gParam.tagNMass + gToleranceWide) { test = FALSE; /*The N-terminal mass is the same as the N-terminal group. 
That is, this sequence tag is an N-terminal sequence.*/ for(j = 0; j < tagLength; j++) { currSeqPtr->peptide[j] = tagSequenceMass[j]; } currSeqPtr->peptideLength = (currSeqPtr->peptideLength) + tagLength; for(j = tagLength; j < currSeqPtr->peptideLength; j++) { currSeqPtr->peptide[j] = peptide[j - tagLength]; } } else { currSeqPtr->peptideLength = 0; /*Set the length to zero as a flag that something is wrong w/ the sequence.*/ } } /* If the tag sequence is not N-terminal, then the following will occur:*/ if(test) { i = 0; /*Used to follow the peptide length.*/ while(i < currSeqPtr->peptideLength) { nMass = nMass + (currSeqPtr->peptide[i]); if(nMass >= gParam.tagNMass - gToleranceWide) { if(nMass <= gParam.tagNMass + gToleranceWide) { for(j = 0; j < tagLength; j++) { currSeqPtr->peptide[i + 1 + j] = tagSequenceMass[j]; } for(j = (i + 1); j < currSeqPtr->peptideLength; j++) { currSeqPtr->peptide[j + tagLength] = peptide[j]; } currSeqPtr->peptideLength = (currSeqPtr->peptideLength) + tagLength; } else { currSeqPtr->peptideLength = 0; /*Set the length to zero as a flag that something is wrong w/ the sequence.*/ } break; } i++; } } currSeqPtr = currSeqPtr->next; } gParam.peptideMW = thePeptideMW; /* Get rid of any sequences that have a peptideLength of zero. 
*/ while(firstSequencePtr != NULL) /*Find first sequence that does not have peptidelength of zero*/ { if(firstSequencePtr->peptideLength != 0) { break; } discardPtr = firstSequencePtr; firstSequencePtr = firstSequencePtr->next; free(discardPtr); } if(firstSequencePtr != NULL) { previousPtr = firstSequencePtr; currSeqPtr = firstSequencePtr->next; while(currSeqPtr != NULL) { if(currSeqPtr->peptideLength == 0) { previousPtr->next = currSeqPtr->next; discardPtr = currSeqPtr; currSeqPtr = currSeqPtr->next; free(discardPtr); } else { currSeqPtr = currSeqPtr->next; previousPtr = previousPtr->next; } } } return(firstSequencePtr); } /*****************************SeqIntensityRanker**************************************************** * * SeqRanker inputs one pointer to a struct of type SequenceScore - firstScorePtr (points * to the first of the scored sequences). This function returns nothing and modifies * the "rank" field of this linked list of structs. The highest ranked sequence score * is ranked "1", etc.. 
*/
void SeqIntensityRanker(struct SequenceScore *firstScorePtr)
{
	struct SequenceScore *currPtr, *maxPtr;
	INT_4 i, j, k, seqNum;
	REAL_4 maxScore;
	BOOLEAN test;

/*	Count the sequences.*/
	currPtr = firstScorePtr;
	seqNum = 0;
	while(currPtr != NULL)
	{
		seqNum += 1;
		currPtr = currPtr->next;
	}

/*	Eliminate any poser database sequences that crept in due to K/Q and the like.*/
	currPtr = firstScorePtr;
	while(currPtr != NULL)
	{
		if(currPtr->databaseSeq)
		{
			test = FALSE;	/*Stays FALSE if gSeqNum is zero, so test is never read uninitialized.*/
			for(i = 0; i < gSeqNum; i++)
			{
				test = TRUE;
				j = 0;
				while(gDatabaseSeq[i][j] != 0)
				{
					k = currPtr->peptideSequence[j];
					if(k != I && k != L)	/*I and L are not distinguished*/
					{
						if(gSingAA[k] != gDatabaseSeq[i][j])
						{
							test = FALSE;	/*a mismatch!*/
							break;
						}
					}
					j++;
				}
				if(test)	/*this sequence matches*/
				{
					break;
				}
			}
			if(!test)	/*none of the database sequences matched*/
			{
				currPtr->intensityScore = 0;	/*strip away designation as a database sequence and give zero score*/
				currPtr->databaseSeq = 0;
			}
		}
		currPtr = currPtr->next;
	}

/*	Add 0.05 to the intensityScore for database-derived sequences.  This 0.05 will be taken away
	down below, so that the score remains unaltered.
However, the ranking will favor the database sequences.*/ currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->databaseSeq) { currPtr->intensityScore += 0.05; } currPtr = currPtr->next; } /* Rank the MAX_X_CORR_NUM or seqNum highest scoring sequences (which ever is smaller).*/ i = 1; while(i <= MAX_X_CORR_NUM && i <= seqNum) { maxPtr = firstScorePtr; maxScore = -1000000; currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->intensityScore > maxScore && currPtr->rank == 0) { maxPtr = currPtr; maxScore = currPtr->intensityScore; } currPtr = currPtr->next; } maxPtr->rank = i; i++; } /* Now that they've been ranked, restore the original intensityScore value.*/ currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->databaseSeq) { currPtr->intensityScore -= 0.05; } currPtr = currPtr->next; } return; } /*****************************SeqComboScoreRanker**************************************************** * * SeqRanker inputs one pointer to a struct of type SequenceScore - firstScorePtr (points * to the first of the scored sequences). This function returns nothing and modifies * the "rank" field of this linked list of structs. The highest ranked sequence score * is ranked "1", etc.. Ranked according to comboScore field value. 
*/ void SeqComboScoreRanker(struct SequenceScore *firstScorePtr) { struct SequenceScore *currPtr, *maxPtr; INT_4 i, seqNum; REAL_4 maxScore, maxIntScore, maxProbScore; /* Count the sequences.*/ currPtr = firstScorePtr; seqNum = 0; while(currPtr != NULL) { seqNum += 1; currPtr = currPtr->next; } /* Rank the MAX_X_CORR_NUM or seqNum highest scoring sequences (which ever is smaller).*/ i = 1; while(i <= MAX_X_CORR_NUM && i <= seqNum) { maxPtr = firstScorePtr; maxScore = -1000000; maxIntScore = -1000000; maxProbScore = -1000000; currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->comboScore > maxScore && currPtr->rank == 0) /*rank by comboscore*/ { maxPtr = currPtr; maxScore = currPtr->comboScore; maxIntScore = currPtr->intensityScore; maxProbScore = currPtr->probScore; } else if(currPtr->comboScore == maxScore && currPtr->rank == 0) { if(currPtr->probScore > maxProbScore && currPtr->rank == 0) /*2nd rank by probScore*/ { maxPtr = currPtr; maxScore = currPtr->comboScore; maxIntScore = currPtr->intensityScore; maxProbScore = currPtr->probScore; } else if(currPtr->probScore == maxProbScore && currPtr->rank == 0) { if(currPtr->intensityScore > maxIntScore && currPtr->rank == 0) /*3rd by intScore*/ { maxPtr = currPtr; maxScore = currPtr->comboScore; maxIntScore = currPtr->intensityScore; maxProbScore = currPtr->probScore; } } } currPtr = currPtr->next; } maxPtr->rank = i; i++; } return; } /**********************FreeAllSequence************************************ * * Used for freeing memory in a linked list. */ void FreeAllSequence(struct Sequence *currPtr) { struct Sequence *freeMePtr; while(currPtr != NULL) { freeMePtr = currPtr; currPtr = currPtr->next; free(freeMePtr); } return; } /****************************** FindLowestScore********************************************** * * FindLowestScore is a function that inputs a pointer to a struct of type SequenceScore * that is the first in a linked list of such structs. 
It searches through the * intensityScore fields of this list of structs and finds the one with the lowest intensity * score. It returns a pointer to this struct. */ struct SequenceScore *FindLowestScore(struct SequenceScore *currPtr) { struct SequenceScore *lowestPtr; lowestPtr = currPtr; while(currPtr != NULL) { if(currPtr->intensityScore < lowestPtr->intensityScore) { lowestPtr = currPtr; } currPtr = currPtr->next; } return(lowestPtr); } /******************************AddToSeqScoreList********************************* * AddToSeqScoreList makes a linked list of structs of type SequenceScore. Input variables * are two pointers * to structs of type SequenceScore. The first pointer is firstScorePtr, which is initially * set to NULL, but * is given the address of the first struct in the linked list and remains unchanged * thereafter. The second * pointer points to the new struct to be added. This second pointer already has the * 'next' field set to NULL. */ struct SequenceScore *AddToSeqScoreList(struct SequenceScore *firstPtr, struct SequenceScore *currPtr) { static struct SequenceScore *lastPtr; if(firstPtr == NULL) { firstPtr = currPtr; } else { lastPtr->next = currPtr; } lastPtr = currPtr; return(firstPtr); } /******************************LoadSeqScoreStruct******************************** * * LoadSeqScoreStruct inputs "intScore" (the intensity-based score for the sequence), * "sequence" (the nominal mass of the extensions), and "seqLength" (the sequence length). * It returns a pointer to a struct of type SequenceScore, which has had its peptide * and intensityScore fields loaded using the input values above. * The remaining fields (crossDressingScore, rank, and next) are NULLed. 
*/
struct SequenceScore *LoadSeqScoreStruct(REAL_4 intScore, REAL_4 intOnlyScore, INT_4 *sequence,
								INT_4 *charSequence, INT_4 seqLength, REAL_4 stDevErr,
								INT_4 cleavageSites, REAL_4 calFactor, char databaseSeq,
								REAL_4 normalXCorScore, REAL_4 quality, REAL_4 length,
								REAL_4 probScore, REAL_4 comboScore)
{
	struct SequenceScore *currPtr;
	INT_4 i;

	currPtr = (struct SequenceScore *) malloc(sizeof(struct SequenceScore));
	if(currPtr == NULL)
	{
		printf("LoadSeqScoreStruct: Out of memory\n");
		exit(1);
	}

	for(i = 0; i < seqLength; i++)
	{
		currPtr->peptide[i] = sequence[i];
		currPtr->peptideSequence[i] = charSequence[i];
	}
	currPtr->peptide[seqLength] = 0;	/*zero-terminate both arrays*/
	currPtr->peptideSequence[seqLength] = 0;

	currPtr->intensityScore = intScore;
	currPtr->calFactor = calFactor;
	currPtr->cleavageSites = cleavageSites;
	currPtr->stDevErr = stDevErr;
	currPtr->databaseSeq = databaseSeq;
	currPtr->probScore = probScore;
	currPtr->intensityOnlyScore = intOnlyScore;
	currPtr->crossDressingScore = normalXCorScore;
	currPtr->comboScore = comboScore;
	currPtr->rank = 0;	/*unranked until one of the ranker functions is run*/
	currPtr->length = length;
	currPtr->quality = quality;
	currPtr->next = NULL;

	return(currPtr);
}

/******************************IntensityOnlyScorer*********************************
*
*	IntensityOnlyScorer inputs fragIntensity (the ion intensities), ionFound (the array that
*	is indexed the same as fragIntensity and is nonzero for ions that have been identified
*	and 0 for those that were not), fragNum (the number of fragment ions in the CID data),
*	and intensityTotal (the total ion intensity).  This function returns a REAL_4 value
*	(ranging from zero to one) corresponding to the fraction of the ion current that can be
*	identified as one of the known ion types.
*/ REAL_4 IntensityOnlyScorer(INT_4 *fragIntensity, REAL_4 *ionFound, INT_4 fragNum, INT_4 intensityTotal) { REAL_4 intOnlyScore = 0; INT_4 i; /* Add up the intensity that has been identified, and count the ions.*/ for(i = 0; i < fragNum; i++) { if(ionFound[i] != 0) { intOnlyScore += (REAL_4)fragIntensity[i]; } } if(intensityTotal == 0) { printf("IntensityOnlyScorer: intensityTotal = 0\n"); exit(1); } intOnlyScore = intOnlyScore / intensityTotal; return(intOnlyScore); } /******************************IntensityScorer********************************* * * IntensityScorer inputs fragIntensity (the ion intensities), ionFound (the array that * is indexed the same as fragIntensity and contains 1 for ions that have been identified * and 0 for those that were not), cleavageSites (the number of times either a b or y ion * of any charge was found to delineate the sequence), and fragNum (the number of fragment * ions in the CID data). This function returns a REAL_4 value (ranging from zero to one) * corresponding to the fraction of the ion current that can be identified times a multiplier * that reflects the idea that correct sequences are usually delineated by series of either * b or y ions. */ REAL_4 IntensityScorer(INT_4 *fragIntensity, REAL_4 *ionFound, INT_4 cleavageSites, INT_4 fragNum, INT_4 seqLength, INT_4 intensityTotal) { REAL_4 intScore = 0; REAL_4 numScore = 0; REAL_4 attenuation, avPeaksPerResidue, avResidueNum, peaksPerResidue, realPerAvRatio; INT_4 i; INT_4 intNum = 0; /* * Initialize attenuation, which is a fractional multiplier that reflects the number of * times either b or y ions delineate * the proposed sequence. 
*/ if(seqLength - 1 == 0) { printf("IntensityScorer: seqLength - 1 = 0\n"); exit(1); } attenuation = (REAL_4)cleavageSites / ((REAL_4)(seqLength - 1)); if(attenuation > 1) { attenuation = 1; /*make sure this never exceeds one*/ } if(attenuation < 0) { attenuation = 0; } /*attenuation = attenuation / (seqLength - 1);*/ /* Add up the intensity that has been identified, and count the ions.*/ for(i = 0; i < fragNum; i++) { if(ionFound[i] != 0) { intScore += (REAL_4)fragIntensity[i] * ionFound[i]; numScore += ionFound[i]; /*numScore++;*/ intNum++; } } /* Figure out the number of peaks per average number of residues.*/ avResidueNum = gParam.peptideMW / (REAL_4)gAvResidueMass; if(avResidueNum == 0) { printf("IntensityScorer: avResidueNum = 0\n"); exit(1); } avPeaksPerResidue = (REAL_4)intNum / (REAL_4)avResidueNum; /* Figure out the number of peaks per actual residue.*/ if(seqLength == 0) { printf("IntensityScorer: seqLength = 0\n"); exit(1); } peaksPerResidue = (REAL_4)intNum / (REAL_4)seqLength; /* Now determine the ratio of peaksPerResidue to avPeaksPerResidue.*/ if(avPeaksPerResidue == 0) { return(0); /*avPeaksPerResidue is 0 if no ions match, so return score of 0*/ } realPerAvRatio = (REAL_4)peaksPerResidue / (REAL_4)avPeaksPerResidue; if(realPerAvRatio > 1) { realPerAvRatio = 1; } if(realPerAvRatio < 0) { realPerAvRatio = 0; } /* Percent of ion current accounted for.*/ if(intensityTotal == 0) /*Avoid divide by zero.*/ { printf("IntensityScorer: intensityTotal = 0\n"); exit(1); } intScore = (REAL_4)intScore / (REAL_4)intensityTotal; if(intScore > 1) { intScore = 1; } if(intScore < 0) { intScore = 0; } /* Percent of the number of ions accounted for regardless of their intensity.*/ if(fragNum == 0) /*Avoid divide by zero.*/ { printf("IntensityScorer: fragNum = 0\n"); exit(1); } numScore = (REAL_4)numScore / (REAL_4)fragNum; if(numScore > 1) { numScore = 1; } if(numScore < 0) { numScore = 0; } /* Put the numScore, intScore, attenuation, and realPerAvRatio all 
together.*/ intScore = ((INTENSITY_WEIGHT * intScore) + (PEAKS_WEIGHT * realPerAvRatio) + (ATTENUATION_WEIGHT * attenuation) + (NUMBER_WEIGHT * numScore)) / ATT_INT_PEAKS_NUM; /*intScore = intScore * intScore * attenuation * numScore;*/ return(intScore); } /************************* ScoreLowMassIons *********************************************** * * ScoreLowMassIons inputs the same information as the function PEFragments (plus the array * "lowMassIons"), and returns nothing. Only the array "ionFound" is modified. * This function determines which amino acids are present in the sequence, and then uses * the array lowMassIons for identification of the low mass amino acid - specific ions. * Only singly charged ions are considered. */ void ScoreLowMassIons(REAL_4 *ionFound, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, INT_4 lowMassIons[][3], INT_4 *ionType) { INT_4 i, j, k, m, n, p; INT_4 lowMassPlusErr, lowMassMinErr, massDiff; REAL_4 currentIonFound; char test; /* Using the array "sequence", identify the low mass ions.*/ for(i = 0; i < seqLength; i++) { for(j = 0; j < 3; j++) { for(m = 0; m < gAminoAcidNumber; m++) { if((sequence[i] <= gMonoMass_x100[m] + gToleranceWide) && (sequence[i] >= gMonoMass_x100[m] - gToleranceWide)) { if(lowMassIons[m][j] != 0) { k = 0; lowMassPlusErr = lowMassIons[m][j] + gToleranceWide; lowMassMinErr = lowMassIons[m][j] - gToleranceWide; while(fragMOverZ[k] <= lowMassPlusErr) { if(fragMOverZ[k] >= lowMassMinErr) { currentIonFound = ionFound[k]; massDiff = abs(lowMassIons[m][j] - fragMOverZ[k]); ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 14; /*low mass ions*/ } } k++; } } } } } } /* Check for two aa extensions */ for(i = 0; i < seqLength; i++) { test = TRUE; /*yes it is a two aa extension*/ for(j = 0; j < gAminoAcidNumber; j++) { if((sequence[i] <= gMonoMass_x100[j] + gToleranceWide) && (sequence[i] >= gMonoMass_x100[j] - 
gToleranceWide)) { test = FALSE; /*no, its not a two aa extension*/ } } if(test) { for(j = 0; j < gAminoAcidNumber; j++) { if(gGapList[j] != 0) { for(m = 0; m < gAminoAcidNumber; m++) { if(gGapList[m] != 0) { if((sequence[i] <= gGapList[j] + gGapList[m] + gToleranceWide) && (sequence[i] >= gGapList[j] + gGapList[m] - gToleranceWide)) { for(n = 0; n < 3; n++) { if(lowMassIons[j][n] != 0) { k = 0; lowMassPlusErr = lowMassIons[j][n] + gToleranceWide; lowMassMinErr = lowMassIons[j][n] - gToleranceWide; while(fragMOverZ[k] <= lowMassPlusErr) { if(fragMOverZ[k] >= lowMassMinErr) { currentIonFound = ionFound[k]; massDiff = abs(lowMassIons[j][n] - fragMOverZ[k]); ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 14; /*low mass ions*/ } } k++; } } } for(n = 0; n < 3; n++) { if(lowMassIons[m][n] != 0) { k = 0; lowMassPlusErr = lowMassIons[m][n] + gToleranceWide; lowMassMinErr = lowMassIons[m][n] - gToleranceWide; while(fragMOverZ[k] <= lowMassPlusErr) { if(fragMOverZ[k] >= lowMassMinErr) { currentIonFound = ionFound[k]; massDiff = abs(lowMassIons[m][n] - fragMOverZ[k]); ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 14; /*low mass ions*/ } } k++; } } } } } } } } } } /* Check for three aa extensions */ for(i = 0; i < seqLength; i++) { test = TRUE; /*yes it is a two or three aa extension*/ for(j = 0; j < gAminoAcidNumber; j++) { if((sequence[i] <= gMonoMass_x100[j] + gToleranceWide) && (sequence[i] >= gMonoMass_x100[j] - gToleranceWide)) { test = FALSE; /*no, its not a two aa extension*/ } } if(test) { for(j = 0; j < gAminoAcidNumber; j++) { if(gGapList[j] != 0) { for(m = 0; m < gAminoAcidNumber; m++) { if(gGapList[m] != 0) { for(p = 0; p < gAminoAcidNumber; p++) { if(gGapList[p] != 0) { if((sequence[i] <= gGapList[j] + gGapList[m] + gGapList[p] + gToleranceWide) && (sequence[i] >= gGapList[j] + 
gGapList[m] + gGapList[p] - gToleranceWide)) { for(n = 0; n < 3; n++) { if(lowMassIons[j][n] != 0) { k = 0; lowMassPlusErr = lowMassIons[j][n] + gToleranceWide; lowMassMinErr = lowMassIons[j][n] - gToleranceWide; while(fragMOverZ[k] <= lowMassPlusErr) { if(fragMOverZ[k] >= lowMassMinErr) { currentIonFound = ionFound[k]; massDiff = abs(lowMassIons[j][n] - fragMOverZ[k]); ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 14; /*low mass ions*/ } } k++; } } } for(n = 0; n < 3; n++) { if(lowMassIons[m][n] != 0) { k = 0; lowMassPlusErr = lowMassIons[m][n] + gToleranceWide; lowMassMinErr = lowMassIons[m][n] - gToleranceWide; while(fragMOverZ[k] <= lowMassPlusErr) { if(fragMOverZ[k] >= lowMassMinErr) { currentIonFound = ionFound[k]; massDiff = abs(lowMassIons[m][n] - fragMOverZ[k]); ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 14; /*low mass ions*/ } } k++; } } } for(n = 0; n < 3; n++) { if(lowMassIons[p][n] != 0) { k = 0; lowMassPlusErr = lowMassIons[p][n] + gToleranceWide; lowMassMinErr = lowMassIons[p][n] - gToleranceWide; while(fragMOverZ[k] <= lowMassPlusErr) { if(fragMOverZ[k] >= lowMassMinErr) { currentIonFound = ionFound[k]; massDiff = abs(lowMassIons[p][n] - fragMOverZ[k]); ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 14; /*low mass ions*/ } } k++; } } } } } } } } } } } } return; } /****************************** ProInThirdPosition ******************************* * * If there is a proline in the third position for LCQ data, then a value of -1 * is assigned as the score, which is used later to prevent tossing this sequence * out. 
*/ REAL_4 ProInThirdPosition(REAL_4 oldScore, INT_4 *sequence, INT_4 seqLength) { INT_4 i, k, aaCount; REAL_4 score; char test; aaCount = 0; for(k = 0; k < seqLength; k++) { if(k < 4) { test = TRUE; for(i = 0; i < gAminoAcidNumber; i++) { if(sequence[k] <= gGapList[i] + gToleranceWide && sequence[k] >= gGapList[i] - gToleranceWide) { test = FALSE; } } if(test) { aaCount++; aaCount++; } else { aaCount++; } if(aaCount >= 3) break; } } if(aaCount == 3) { if(sequence[k] <= gGapList[P] + gToleranceWide && sequence[k] >= gGapList[P] - gToleranceWide) { score = -1; } else { score = oldScore; } } else { score = oldScore; } return(score); } /****************************** AssignHighMZScore ********************************** * * * */ REAL_4 AssignHighMZScore(INT_4 highMZNum, INT_4 *highMZFrags, INT_4 *highMZInts, REAL_4 totalIntensity, INT_4 *sequence, INT_4 seqLength) { INT_4 bCalStart, yCalStart, nChargeCount, cChargeCount; INT_4 i, j, k, m, bIonMass, bIonMassMinErr, bIonMassPlusErr; INT_4 bMinW, bMinWMinErr, bMinA, bMinAPlusErr; INT_4 yMinW, yMinWMinErr, yMinA, yMinAPlusErr; INT_4 yIonMass, yIonMassMinErr, yIonMassPlusErr; INT_4 yCal, bCal; INT_4 yCalCorrection = 0; INT_4 bCalCorrection = 0; REAL_4 score = 0; char test, test1, maxCharge; REAL_4 *ionFound; ionFound = (float *) malloc(MAX_ION_NUM * sizeof(REAL_4)); if(ionFound == NULL) { printf("AssignHighMZScore: Out of memory."); exit(1); } /*Initialize ionFound*/ for(j = 0; j < MAX_ION_NUM; j++) { ionFound[j] = 0; } if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } /* Initialize the starting b ion mass (acetylated, etc). */ bCalStart = gParam.modifiedNTerm + 0.5; /* Initialize the starting mass for y ions. */ yCalStart = gParam.modifiedCTerm + (2 * gElementMass_x100[HYDROGEN]) + 0.5; /* Count the number of charged residues. nChargeCount is one more than the number of charged residues found in an N-terminal fragment. 
cChargeCount is one more than the number of charged residues found in a C-terminal fragment. */ nChargeCount = FindNCharge(sequence, seqLength); cChargeCount = 1; /*Step through the sequence.*/ for(i = (seqLength - 1); i > 0; i--) /*Don't do this loop for i = 0 (doesnt make sense).*/ { /* Calculate the singly charged y ion mass. */ yCal = YCalculator(i, sequence, seqLength, yCalStart, yCalCorrection); /* Calculate the singly charged b ion mass. */ bCal = BCalculator(i, sequence, bCalStart, bCalCorrection); /* Readjust the number of charges in the C- and N-terminii. */ if((sequence[i] >= gMonoMass_x100[R] - gToleranceWide && sequence[i] <= gMonoMass_x100[R] + gToleranceWide) || (sequence[i] >= gMonoMass_x100[H] - gToleranceWide && sequence[i] <= gMonoMass_x100[H] + gToleranceWide) || (sequence[i] >= gMonoMass_x100[K] - gToleranceWide && sequence[i] <= gMonoMass_x100[K] + gToleranceWide)) { cChargeCount += 1; nChargeCount -= 1; } else /*Check to see if its a two amino acid combo that could contain Arg, His, or Lys.*/ { for(j = 0; j < gAminoAcidNumber; j++) { if((sequence[i] >= gArgPlus[j] - gToleranceWide && sequence[i] <= gArgPlus[j] + gToleranceWide) || (sequence[i] >= gHisPlus[j] - gToleranceWide && sequence[i] <= gHisPlus[j] + gToleranceWide) || (sequence[i] >= gLysPlus[j] - gToleranceWide && sequence[i] <= gLysPlus[j] + gToleranceWide)) { cChargeCount += 1; nChargeCount -= 1; break; } } } /* Check each charge state up to the parent ion charge. */ for(j = 1; j <= maxCharge; j++) { /* Initialize variables within this j loop. 
*/
			test = FALSE;	/*Used to test if b, a, or y ions are found before looking for
							the corresponding losses of ammonia or water.*/
			bIonMass = (bCal + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j;
			bIonMassMinErr = bIonMass - gToleranceWide;
			bIonMassPlusErr = bIonMass + gToleranceWide;
			yIonMass = (yCal + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j;
			yIonMassMinErr = yIonMass - gToleranceWide;
			yIonMassPlusErr = yIonMass + gToleranceWide;

/*	Search for b ions.*/
			if((bIonMass * j) > ((j-1) * 400 * gMultiplier))	/*Make sure there is enough mass to hold the charge.*/
			{
				k = highMZNum - 1;
				/*Test k >= 0 first so that highMZFrags[-1] is never read.*/
				while(k >= 0 && highMZFrags[k] >= bIonMassMinErr)
				{
					if(highMZFrags[k] <= bIonMassPlusErr)
					{
						if(nChargeCount >= j)	/*Make sure enough charges can be attached.*/
						{
							test = TRUE;	/*A b ion of charge j has been identified.*/
							ionFound[k] = 1;
						}
					}
					k--;
				}

/*
*	Search for b minus ammonia or water.  The index value of k is carried over from the
*	b ion search, since the while loop breaks out once the mass value is less than the
*	calculated b value (i.e., k is close to b minus ammonia or water).
*/
				test1 = FALSE;
				if(!test)
				{
					if(sequence[0] <= gMonoMass_x100[K] + gParam.fragmentErr &&
						sequence[0] >= gMonoMass_x100[K] - gParam.fragmentErr &&
						gParam.fragmentErr > 0.04 * gMultiplier)
					{
						test1 = TRUE;
					}
					if(sequence[0] <= gMonoMass_x100[Q] + gParam.fragmentErr &&
						sequence[0] >= gMonoMass_x100[Q] - gParam.fragmentErr)
					{
						test1 = TRUE;
					}
					if(!test1)
					{
						for(m = 0; m < gAminoAcidNumber; m++)
						{
							if(sequence[0] <= gGlnPlus[m] + gParam.fragmentErr &&
								sequence[0] >= gGlnPlus[m] - gParam.fragmentErr)
							{
								test1 = TRUE;
								break;
							}
							if(sequence[0] + sequence[1] <= gGlnPlus[m] + gParam.fragmentErr &&
								sequence[0] + sequence[1] >= gGlnPlus[m] - gParam.fragmentErr)
							{
								test1 = TRUE;
								break;
							}
						}
					}
				}
				if(!test1)
				{
					if(sequence[0] <= gMonoMass_x100[E] + gParam.fragmentErr &&
						sequence[0] >= gMonoMass_x100[E] - gParam.fragmentErr)
					{
						test1 = TRUE;
					}
					if(!test1)
					{
						for(m = 0; m < gAminoAcidNumber; m++)
						{
							if(sequence[0] <= gGluPlus[m] + gParam.fragmentErr &&
								sequence[0] >= gGluPlus[m] - gParam.fragmentErr)
							{
								test1 = TRUE;
								break;
							}
							if(sequence[0] + sequence[1] <= gGluPlus[m] + gParam.fragmentErr &&
								sequence[0] + sequence[1] >= gGluPlus[m] - gParam.fragmentErr)
							{
								test1 = TRUE;
								break;
							}
						}
					}
				}
				if(test || test1)
				{
/*	Calculate the b minus water and b minus ammonia values.*/
					bMinW = (bCal - gWater + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j;
					bMinWMinErr = bMinW - gToleranceWide;
					bMinA = (bCal - gAmmonia + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j;
					bMinAPlusErr = bMinA + gToleranceWide;

					if(nChargeCount >= j)
					{
						/*Test k >= 0 first so that highMZFrags[-1] is never read.*/
						while(k >= 0 && highMZFrags[k] >= bMinWMinErr)
						{
							if(highMZFrags[k] <= bMinAPlusErr)
							{
								if(test)
								{
									ionFound[k] = 1;	/*full count if b ion is present*/
								}
								else
								{
									ionFound[k] = 0.75;	/*partial count if Nterm QorE*/
								}
							}
							k--;
						}
					}
				}
			}	/*if((bIonMass * j) > ((j-1) * 50000))*/

/*	Search for the y ion values.*/
			if((yIonMass * j) > ((j-1) * 400 * gMultiplier))	/*Make sure there is enough mass to hold the charge.*/
			{
				test = FALSE;
				k = highMZNum - 1;
				/*Test k >= 0 first so that highMZFrags[-1] is never read.*/
				while(k >= 0 && highMZFrags[k] >= yIonMassMinErr)
				{
					if(highMZFrags[k] <= yIonMassPlusErr)
					{
						if(cChargeCount >= j)	/*Make sure enough charges can be attached.*/
						{
							test = TRUE;	/*A y ion of charge j has been identified.*/
							ionFound[k] = 1;
						}
					}
					k--;
				}

/*
*	Search for y minus ammonia or water.  The index value of k is carried over from the
*	y ion search, since the while loop breaks out once the mass value is less than the
*	calculated y value (i.e., k is close to y minus ammonia or water).
*/
				if(test)
				{
/*	Calculate the y minus water and y minus ammonia values.*/
					yMinW = (yCal - gWater + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j;
					yMinWMinErr = yMinW - gToleranceWide;
					yMinA = (yCal - gAmmonia + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j;
					yMinAPlusErr = yMinA + gToleranceWide;

					if(cChargeCount >= j)
					{
						/*Test k >= 0 first so that highMZFrags[-1] is never read.*/
						while(k >= 0 && highMZFrags[k] >= yMinWMinErr)
						{
							if(highMZFrags[k] <= yMinAPlusErr)
							{
								ionFound[k] = 1;
							}
							k--;
						}
					}
				}
			}	/*if((yIonMass * j) > ((j-1) * 50000))*/
		}	/*for j*/
	}	/*for i*/

/*	Sum the identified intensity, weighted by how well each ion matched.*/
	score = 0;
	for(i = 0; i < highMZNum; i++)
	{
		if(ionFound[i] != 0)
		{
			score += highMZInts[i] * ionFound[i];
		}
	}
	if(totalIntensity == 0)
	{
		printf("AssignHighMZScore: totalIntensity = 0\n");
		exit(1);
	}
	score = score / totalIntensity;

	free(ionFound);
	return(score);
}

/******************************FindABYIons***********************************
*
*	FindABYIons identifies a, b, and y ions, plus losses of ammonia and water from these
*	three ion types.
*	The input is as described in the documentation for the function PEFragments, and it
*	returns an INT_4 containing the
*	value "cleavageSites", which is the number of cleavage sites (amide bonds) that are
*	defined by a b or y ion.
*	The array ionFound is also modified.
*/ INT_4 FindABYIons(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, char argPresent, REAL_4 *yFound, REAL_4 *bFound, REAL_8 *byError, INT_4 *ionType) { INT_4 i, j, nChargeCount, cChargeCount, bCal, yCal, cleavageSites; INT_4 bOrYIon, k, aIonMass, aIonMassMinErr, aIonMassPlusErr; INT_4 aMinW, aMinWMinErr, aMinA, aMinAPlusErr, bCalStart, yCalStart; INT_4 bIonMass, bIonMassMinErr, bIonMassPlusErr; INT_4 bMinW, bMinA, bMinWMinErr, bMinAPlusErr; INT_4 yIonMass, yIonMassMinErr, yIonMassPlusErr; INT_4 yMinW, yMinWMinErr, yMinA, yMinAPlusErr; INT_4 bMin64, lossOf64, bMin64MinErr, bMin64PlusErr; INT_4 yMin64, yMin64MinErr, yMin64PlusErr; INT_4 bCount, yCount, bSeries, ySeries, bIon, yIon, massDiff, massDiffW, massDiffA; INT_4 bCountStringent, yCountStringent, bSeriesStringent, ySeriesStringent; INT_4 precursor, m, skipOneY, skipOneB; INT_4 yCalCorrection = 0; INT_4 bCalCorrection = 0; INT_4 nOxMetCount = 0; INT_4 cOxMetCount = 0; INT_4 bSingleAACount = 0, ySingleAACount = 0, bSingleAASeries = 0, ySingleAASeries = 0; char test, twoAAExtension, maxCharge; BOOLEAN monoToAvYSwitch = TRUE; /*Used to recalculate for average masses.*/ BOOLEAN avToMonoBSwitch = FALSE; /*Used to recalculate for average masses.*/ BOOLEAN twoAANTerm; char testForPro; REAL_4 currentIonFound; /* Initialize a few variables.*/ bCount = 0; /*Current number of b ions in a row.*/ bCountStringent = 0; /*A more stringent count used for quality assessment of the spectrum*/ yCount = 0; /*Ditto.*/ yCountStringent = 0; bSeries = 0; /*The greatest number of b ions in a row for a given sequence.*/ bSeriesStringent = 0; ySeries = 0; /*The greatest number of y ions in a row for a given sequence.*/ ySeriesStringent = 0; skipOneY = 1; /*Allows for one missed y ion in a series w/o resetting to zero.*/ skipOneB = 1; /*Allows for one missed b ion in a series w/o resetting to zero.*/ cleavageSites = 0; /*Used to count the number of times a y or b ion of any charge state delineates a 
sequence. */ gCleavageSiteStringent = 0; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; lossOf64 = gElementMass_x100[CARBON] + gElementMass_x100[SULFUR] + gElementMass_x100[OXYGEN] + 4 * gElementMass_x100[HYDROGEN]; if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } /* Initialize the starting b ion mass (acetylated, etc). */ bCalStart = gParam.modifiedNTerm + 0.5; /*Determine the correction factor for high accuracy*/ i = gParam.modifiedNTerm * 10 + 0.5; bCalCorrection = i - bCalStart * 10; yCalStart = gParam.modifiedCTerm + + (2 * gElementMass_x100[HYDROGEN]) + 0.5; i = (gParam.modifiedCTerm + (gMultiplier * gElementMass[HYDROGEN] * 2)) * 10 + 0.5; yCalCorrection = i - yCalStart * 10; /* Count the number of charged residues. nChargeCount is one more than the number of charged residues found in an N-terminal fragment. cChargeCount is one more than the number of charged residues found in a C-terminal fragment. */ nChargeCount = FindNCharge(sequence, seqLength); cChargeCount = 1; /* Determine if oxidized methionine is present. nOxMetCount and cOxMetCount are integer values, depending on if the b and y ions contain an oxidized methionine. */ nOxMetCount = FindNOxMet(sequence, seqLength); cOxMetCount = 0; /* Figure out if the N-terminus is a two amino acid extension (TRUE or FALSE). */ twoAANTerm = TwoAAExtFinder(sequence, 0); for(i = (seqLength - 1); i > 0; i--) /*Don't do this loop for i = 0 (doesnt make sense).*/ { /* Initialize some variables for this 'for' loop. */ bOrYIon = 0; /*If any number of b or y ions of any charge are found, then this equals one. Otherwise, it stays at zero.*/ bIon = 0; /*If a b ion is found this equals one.*/ yIon = 0; /*If a y ion is found this equals one.*/ /* Figure out if this is a two amino acid extension (TRUE or FALSE). */ twoAAExtension = TwoAAExtFinder(sequence, i); /* Calculate the singly charged y ion mass. 
*/ yCal = YCalculator(i, sequence, seqLength, yCalStart, yCalCorrection); /* Calculate the singly charged b ion mass. */ bCal = BCalculator(i, sequence, bCalStart, bCalCorrection); /* If the N-terminus is not a two amino acid extension, its unlikely that one can find a b1 ion, so give bCal an irrelevant value, so that an accidental match to b1 is not made. */ if(i == 1) { /*find out if first amino acid is a twoAA extension*/ twoAAExtension = TwoAAExtFinder(sequence, i - 1); if(!twoAAExtension) { bCal = 10; } /*get the twoAAextension for the second position again*/ twoAAExtension = TwoAAExtFinder(sequence, i); } /* Readjust the number of oxidized methionines. */ if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0) /*mass accuracy sufficient to determine oxMet*/ { if(sequence[i] >= gMonoMass_x100[9] - gToleranceNarrow && sequence[i] <= gMonoMass_x100[9] + gToleranceNarrow) { nOxMetCount--; /*b ions lost a oxMet*/ cOxMetCount++; /*y ions gained a oxMet*/ } if(nOxMetCount < 0 || cOxMetCount < 0) { printf("LutefiskScore:FindABYIons The number of oxidized Mets went negative."); exit(1); } } else /*mass accuracy not sufficient to differentiate oxMet from Phe*/ { if(sequence[i] >= gMonoMass_x100[F] - gToleranceNarrow && sequence[i] <= gMonoMass_x100[F] + gToleranceNarrow) { nOxMetCount--; /*b ions lost a oxMet*/ cOxMetCount++; /*y ions gained a oxMet*/ } if(nOxMetCount < 0 || cOxMetCount < 0) { printf("LutefiskScore:FindABYIons The number of oxidized Mets went negative."); exit(1); } } /* Readjust the number of charges in the C- and N-terminii. 
*/ if((sequence[i] >= gMonoMass_x100[R] - gToleranceWide && sequence[i] <= gMonoMass_x100[R] + gToleranceWide) || (sequence[i] >= gMonoMass_x100[H] - gToleranceWide && sequence[i] <= gMonoMass_x100[H] + gToleranceWide) || (sequence[i] >= gMonoMass_x100[K] - gToleranceWide && sequence[i] <= gMonoMass_x100[K] + gToleranceWide)) { cChargeCount += 1; nChargeCount -= 1; } else /*Check to see if its a two amino acid combo that could contain Arg, His, or Lys.*/ { for(j = 0; j < gAminoAcidNumber; j++) { if((sequence[i] >= gArgPlus[j] - gToleranceWide && sequence[i] <= gArgPlus[j] + gToleranceWide) || (sequence[i] >= gHisPlus[j] - gToleranceWide && sequence[i] <= gHisPlus[j] + gToleranceWide) || (sequence[i] >= gLysPlus[j] - gToleranceWide && sequence[i] <= gLysPlus[j] + gToleranceWide)) { cChargeCount += 1; nChargeCount -= 1; break; } } } /* Check each charge state up to the parent ion charge. */ for(j = 1; j <= maxCharge; j++) { /* Initialize variables within this j loop. */ test = FALSE; /*Used to test if b, a, or y ions are found before looking for the corresponding losses of ammonia or water.*/ aIonMass = (bCal - gCO + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; aIonMassMinErr = aIonMass - gToleranceWide; aIonMassPlusErr = aIonMass + gToleranceWide; bIonMass = (bCal + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; bIonMassMinErr = bIonMass - gToleranceWide; bIonMassPlusErr = bIonMass + gToleranceWide; yIonMass = (yCal + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; yIonMassMinErr = yIonMass - gToleranceWide; yIonMassPlusErr = yIonMass + gToleranceWide; /* Search for b ions.*/ if((bIonMass * j) > ((j-1) * 400 * gMultiplier)) /*Make sure there is enough mass to hold the charge.*/ { k = fragNum - 1; while(fragMOverZ[k] >= bIonMassMinErr && k >= 0) { if(fragMOverZ[k] <= bIonMassPlusErr) { if(nChargeCount >= j) /*Make sure enough charges can be attached.*/ { bOrYIon = 1; /*A b or y ion has been 
identified.*/ test = TRUE; /*A b ion of charge j has been identified.*/ bIon = 1; /*A b ion is present.*/ massDiff = abs(bIonMass - fragMOverZ[k]); currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); /* If the charge under consideration "j" is equal to the precursor charge, and if that precursor charge is greater than one, then the bion intensity is attenuated. Alternatively, if the b ion mass is greater than the precusor, while at the same time the number of basic residues in the b ion is less than the number of charges on the precursor, then that is also grounds for reducing the influence of the ion. */ if((j == gParam.chargeState && gParam.chargeState > 1) || (bIonMass > precursor && nChargeCount < gParam.chargeState)) { if(gParam.fragmentPattern != 'L') { ionFound[k] = ionFound[k] * HIGH_MASS_B_ION_MULTIPLIER; } } if(twoAAExtension) { ionFound[k] = ionFound[k] * TWO_AA_EXTENSION_MULTIPLIER; } if(currentIonFound > ionFound[k]) /*if old number is bigger than new*/ { ionFound[k] = currentIonFound; } else { ionType[k] = 2; /*b ions are type 2*/ } if(j < gParam.chargeState && (bIonMass < precursor || gParam.fragmentPattern == 'L')) { bFound[k] = 1; byError[k] = AssignError(byError[k], bIonMass, fragMOverZ[k]); } } } k--; } /* * Search for b minus ammonia or water. The index value of k is carried over from the * b ion search, since * the while loop breaks out once the mass value is less than the calculated b value * (ie, k is close to * b minus ammonia or water. 
*/ if(!test) { if(sequence[0] <= gMonoMass_x100[K] + gParam.fragmentErr && sequence[0] >= gMonoMass_x100[K] - gParam.fragmentErr && gParam.fragmentErr > 0.04 * gMultiplier) { test = TRUE; } if(sequence[0] <= gMonoMass_x100[Q] + gParam.fragmentErr && sequence[0] >= gMonoMass_x100[Q] - gParam.fragmentErr) { test = TRUE; } if(!test) { for(m = 0; m < gAminoAcidNumber; m++) { if(sequence[0] <= gGlnPlus[m] + gParam.fragmentErr && sequence[0] >= gGlnPlus[m] - gParam.fragmentErr) { test = TRUE; break; } if(sequence[0] + sequence[1] <= gGlnPlus[m] + gParam.fragmentErr && sequence[0] + sequence[1] >= gGlnPlus[m] - gParam.fragmentErr) { test = TRUE; break; } } } } if(!test) { if(sequence[0] <= gMonoMass_x100[E] + gParam.fragmentErr && sequence[0] >= gMonoMass_x100[E] - gParam.fragmentErr) { test = TRUE; } if(!test) { for(m = 0; m < gAminoAcidNumber; m++) { if(sequence[0] <= gGluPlus[m] + gParam.fragmentErr && sequence[0] >= gGluPlus[m] - gParam.fragmentErr) { test = TRUE; break; } if(sequence[0] + sequence[1] <= gGluPlus[m] + gParam.fragmentErr && sequence[0] + sequence[1] >= gGluPlus[m] - gParam.fragmentErr) { test = TRUE; break; } } } } if(test || argPresent) { /* Calculate the b minus water and b minus ammonia values.*/ bMinW = (bCal - gWater + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; bMinWMinErr = bMinW - gToleranceWide; bMinA = (bCal - gAmmonia + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; bMinAPlusErr = bMinA + gToleranceWide; if(nChargeCount >= j) { while(fragMOverZ[k] >= bMinWMinErr && k >= 0) { if(fragMOverZ[k] <= bMinAPlusErr) { massDiffW = abs(bMinW - fragMOverZ[k]); massDiffA = abs(bMinA - fragMOverZ[k]); if(massDiffW < massDiffA) { massDiff = massDiffW; } else { massDiff = massDiffA; } currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(gParam.chargeState > 1) { ionFound[k] = ionFound[k] * NEUTRAL_LOSS_MULTIPLIER; if((j == gParam.chargeState && gParam.chargeState > 1) || 
(bMinWMinErr > precursor && nChargeCount < gParam.chargeState)) { if(gParam.fragmentPattern != 'L') { ionFound[k] = ionFound[k] * HIGH_MASS_B_ION_MULTIPLIER; } } } if(twoAAExtension) { ionFound[k] = ionFound[k] * TWO_AA_EXTENSION_MULTIPLIER; } if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 3; /*b-17/18 ions are type 3*/ } } k--; } } } } /*if((bIonMass * j) > ((j-1) * 400 * gMultiplier))*/ /* * Search for the a ions. * The k index value is retained from the b ion search, and the search for a ions continues * downward. */ /*Make sure there is enough mass to hold the charge, and that there is a b ion.*/ if((aIonMass * j) > ((j-1) * 400 * gMultiplier) && test) { test = FALSE; while(k >= 0 && fragMOverZ[k] >= aIonMassMinErr) /*check k first so fragMOverZ[-1] is never read*/ { if(fragMOverZ[k] <= aIonMassPlusErr) { if(nChargeCount >= j) /*Make sure enough charges can be attached.*/ { test = TRUE; /*An a ion of charge j has been identified.*/ massDiff = abs(aIonMass - fragMOverZ[k]); currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); if((i > 2 && !twoAANTerm) || (i > 1 && twoAANTerm)) { if((j == gParam.chargeState && gParam.chargeState > 1) || (aIonMass > precursor && nChargeCount < gParam.chargeState)) { ionFound[k] = ionFound[k] * HIGH_MASS_A_ION_MULTIPLIER * HIGH_MASS_A_ION_MULTIPLIER; } else { ionFound[k] = ionFound[k] * HIGH_MASS_A_ION_MULTIPLIER; } } if(twoAAExtension) { ionFound[k] = ionFound[k] * TWO_AA_EXTENSION_MULTIPLIER; } if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 4; /*a ions are type 4*/ } } } k--; } /* * Search for a minus ammonia or water. The index value of k is carried over from the * a ion search, since the while loop breaks out once the mass value is less than the * calculated a value (i.e., k is now close to a minus ammonia or water). 
*/ if(test || argPresent) { /* Calculate the a minus water and a minus ammonia values.*/ aMinW = (bCal - gCO - gWater + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; aMinWMinErr = aMinW - gToleranceWide; aMinA = (bCal - gCO - gAmmonia + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; aMinAPlusErr = aMinA + gToleranceWide; if(nChargeCount >= j) { while(k >= 0 && fragMOverZ[k] >= aMinWMinErr) /*check k first so fragMOverZ[-1] is never read*/ { if(fragMOverZ[k] <= aMinAPlusErr) { massDiffW = abs(aMinW - fragMOverZ[k]); massDiffA = abs(aMinA - fragMOverZ[k]); if(massDiffW < massDiffA) { massDiff = massDiffW; } else { massDiff = massDiffA; } currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); if((i > 2 && !twoAANTerm) || (i > 1 && twoAANTerm) || j != 1) { ionFound[k] = ionFound[k] * HIGH_MASS_A_ION_MULTIPLIER; } if(gParam.chargeState > 1) { ionFound[k] = ionFound[k] * NEUTRAL_LOSS_MULTIPLIER; } if(twoAAExtension) { ionFound[k] = ionFound[k] * TWO_AA_EXTENSION_MULTIPLIER; } if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 5; /*a-17/18 ions are type 5*/ } } k--; } } } } /*if((aIonMass * j) > ((j-1) * 400 * gMultiplier))*/ /* * Search for the b minus 64 ions if oxMet is present. * The k index value is retained from the b and a ion searches, and the search continues * downward. 
*/ if(test && nOxMetCount) /*there's a b ion and oxMet is in the b ion*/ { bMin64 = (bCal - lossOf64 + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; bMin64MinErr = bMin64 - gToleranceWide; bMin64PlusErr = bMin64 + gToleranceWide; while(k >= 0 && fragMOverZ[k] >= bMin64MinErr) /*check k first so fragMOverZ[-1] is never read*/ { if(fragMOverZ[k] <= bMin64PlusErr) { massDiff = abs(bMin64 - fragMOverZ[k]); currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0) { ionFound[k] = ionFound[k] * OXMET_MULTIPLIER; } else { ionFound[k] = ionFound[k] * PHE_MULTIPLIER; } if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 6; /*b minus 64 (oxMet) ions are type 6*/ } } k--; } } /* Search for the y ion values.*/ if((yIonMass * j) > ((j-1) * 400 * gMultiplier)) /*Make sure there is enough mass to hold the charge.*/ { test = FALSE; k = fragNum - 1; while(k >= 0 && fragMOverZ[k] >= yIonMassMinErr) /*check k first so an empty list is never indexed*/ { if(fragMOverZ[k] <= yIonMassPlusErr) { if(cChargeCount >= j) /*Make sure enough charges can be attached.*/ { bOrYIon = 1; /*A b or y ion has been identified.*/ test = TRUE; /*A y ion of charge j has been identified.*/ yIon = 1; /*A y ion is present.*/ massDiff = abs(yIonMass - fragMOverZ[k]); currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(j == gParam.chargeState && gParam.chargeState > 1) { if((((i > 2 && !twoAANTerm) || (i > 1 && twoAANTerm)) && gParam.fragmentPattern != 'L') || (i > (INT_4)seqLength / 3 && gParam.fragmentPattern == 'L')) { ionFound[k] = ionFound[k] * HIGH_CHARGE_Y_ION_MULTIPLIER; } } if(twoAAExtension) { /*see if the two aa extension could contain Pro*/ testForPro = FALSE; for(m = 0; m < gAminoAcidNumber; m++) { if(sequence[i] <= gProPlus[m] + gToleranceWide && sequence[i] >= gProPlus[m] - gToleranceWide) { testForPro = TRUE; } } if(!testForPro) /*if it could contain Pro then don't attenuate*/ { ionFound[k] = ionFound[k] * 
TWO_AA_EXTENSION_MULTIPLIER; } } if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 7; /*y ions are type 7*/ } if(j < gParam.chargeState) { yFound[k] = 1; byError[k] = AssignError(byError[k], yIonMass, fragMOverZ[k]); } } } k--; } /* * Search for y minus ammonia or water. The index value of k is carried over from the * y ion search, since * the while loop breaks out once the mass value is less than the calculated y value * (ie, k is close to * y minus ammonia or water. */ if(test || argPresent) { /* Calculate the y minus water and y minus ammonia values.*/ yMinW = (yCal - gWater + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; yMinWMinErr = yMinW - gToleranceWide; yMinA = (yCal - gAmmonia + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; yMinAPlusErr = yMinA + gToleranceWide; if(cChargeCount >= j) { while(fragMOverZ[k] >= yMinWMinErr && k >= 0) { if(fragMOverZ[k] <= yMinAPlusErr) { massDiffW = abs(yMinW - fragMOverZ[k]); massDiffA = abs(yMinA - fragMOverZ[k]); if(massDiffW < massDiffA) { massDiff = massDiffW; } else { massDiff = massDiffA; } currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(gParam.chargeState > 1) { ionFound[k] = ionFound[k] * NEUTRAL_LOSS_MULTIPLIER; if(j == gParam.chargeState) { ionFound[k] = ionFound[k] * HIGH_CHARGE_Y_ION_MULTIPLIER; } } if(twoAAExtension) { ionFound[k] = ionFound[k] * TWO_AA_EXTENSION_MULTIPLIER; } if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 8; /*y-17/18 ions are type 8*/ } } k--; } } /* * Search for the b minus 64 ions if oxMet is present. * The k index value is retained from the b and a ion search, and continues * downward. 
*/ if(test && cOxMetCount) /*there's a y ion and oxMet is in the y ion*/ { yMin64 = (yCal - lossOf64 + (j * gElementMass_x100[HYDROGEN]) - gElementMass_x100[HYDROGEN]) / j; yMin64MinErr = yMin64 - gToleranceWide; yMin64PlusErr = yMin64 + gToleranceWide; while(k >= 0 && fragMOverZ[k] >= yMin64MinErr) /*check k first so fragMOverZ[-1] is never read*/ { if(fragMOverZ[k] <= yMin64PlusErr) { massDiff = abs(yMin64 - fragMOverZ[k]); currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0) { ionFound[k] = ionFound[k] * OXMET_MULTIPLIER; } else { ionFound[k] = ionFound[k] * PHE_MULTIPLIER; } if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 9; /*y minus 64 (oxMet) ions are type 9*/ } } k--; } } } } /*if((yIonMass * j) > ((j-1) * 400 * gMultiplier))*/ } /*for j*/ if(bIon) { bCount++; /*If there was a b ion, then increment by one.*/ skipOneB = 1; bCountStringent++; /*this count doesn't allow for gaps in the series*/ } else { if(skipOneB == 0) { bCount = 0; /*Otherwise, reset the counting of b ions to zero.*/ } skipOneB = 0; /*Allow one missing b ion before resetting bCount*/ bCountStringent = 0; } /*Sometimes a K or R is forced into the C-terminus even if no fragments are found. 
If the C-terminus is not a gap, then increment bCount, but don't reset the skipOneB, since if the penultimate amino acid lacks cleavage then it should be trashed.*/ if(twoAAExtension == FALSE && i == seqLength - 1 && !bIon) { bCount++; } /*bSingleAACount counts the contiguous b ions differing by single amino acids.*/ if(bIon && !twoAAExtension) { bSingleAACount++; } else if(bIon && twoAAExtension) { bSingleAACount = 1; /*reset to one so that C-terminal cleavage next to dipeptide is counted*/ } else { bSingleAACount = 0; } if(yIon) { yCount++; skipOneY = 1; yCountStringent++; /*this count doesn't allow for gaps in the series*/ } else { if(skipOneY == 0) { yCount = 0; } skipOneY = 0; yCountStringent = 0; } if(twoAAExtension == FALSE && i == seqLength - 1 && !yIon) { yCount++; } /*ySingleAACount counts the contiguous y ions differing by single amino acids.*/ if(yIon && !twoAAExtension) /*if there's a y ion and its a single amino acid*/ { ySingleAACount++; } else if(yIon && twoAAExtension) /*if there's a y ion and its a two aa extension*/ { ySingleAACount = 1; /*reset to one so that C-terminal cleavage next to dipeptide is counted*/ } else { ySingleAACount = 0; } if(bCount > bSeries) { bSeries = bCount; /*Don't forget what the longest continuous b series was.*/ } if(yCount > ySeries) { ySeries = yCount; /*Don't forget what the longest continuous y series was.*/ } if(bCountStringent > bSeriesStringent) { bSeriesStringent = bCountStringent; } if(yCountStringent > ySeriesStringent) { ySeriesStringent = yCountStringent; } /*bSingleAASeries and ySingleAASeries keep track of the longest stretch of single aa seq*/ if(bSingleAACount > bSingleAASeries) { bSingleAASeries = bSingleAACount; } if(ySingleAACount > ySingleAASeries) { ySingleAASeries = ySingleAACount; } /*if(gParam.fragmentPattern == 'L' && (i < 3 || i > (seqLength - 3))) { cleavageSites++;*/ /*For LCQ data, the ends are not well established and should not be counted against sequences*/ /*} else { cleavageSites += 
bOrYIon;*/ /*Count the number of times a b or y ion defines a cleavage site.*/ /*}*/ } /*for i*/ /* * For TSQ data I found that I get better scoring if I look for contiguous series of either * b or y ions. For LCQ data, because of the missing low mass end, I cannot expect a * contiguous series, especially since the high mass b ions will compensate for the loss of * low mass y ions. */ if(gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q' || gParam.fragmentPattern == 'L') { if(ySeries > bSeries) { cleavageSites = ySeries; gCleavageSiteStringent = ySeriesStringent; } else { cleavageSites = bSeries; gCleavageSiteStringent = bSeriesStringent; } /*This is the number of cleavage sites defining a contiguous series of single aa's*/ if(ySingleAASeries > bSingleAASeries) { gSingleAACleavageSites = ySingleAASeries; } else { gSingleAACleavageSites = bSingleAASeries; } } return(cleavageSites); } /******************************InternalFrag************************************************ * * InternalFrag identifies internal fragment ions (where neither the C- nor the N-terminus * is present). Included are the losses of CO, water, and ammonia from the usual b-type * internal fragment. The input is as described in the documentation for the function * PEFragments, and it returns nothing. Only the arrays ionFound and ionType are modified. * Only singly-charged ions are considered here. 
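 *
 * Worked example (a sketch inferred from the code below; the Da values are illustrative
 * and ignore the integer scaling by gMultiplier): for an internal fragment spanning the
 * residues Gly-Ala, intFrag = H + G + A = 1.00783 + 57.02146 + 71.03711 = 129.0664.
 * The loss of CO is then searched near intFrag - gCO = 129.0664 - 27.9949 = 101.0715,
 * and the water and ammonia losses are searched together in the window from
 * (intFrag - gWater - gToleranceWide) to (intFrag - gAmmonia + gToleranceWide).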
*/ void InternalFrag(REAL_4 *ionFound, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, INT_4 fragNum, INT_4 *ionType) { INT_4 i, j, k, intFrag, intFragMinErr, intFragPlusErr, saveIndex; INT_4 intFragMinW, intFragMinWMinErr; INT_4 intFragMinA, intFragMinAPlusErr; INT_4 intFragMinCO, intFragMinCOMinErr, intFragMinCOPlusErr; INT_4 massDiff, massDiffW, massDiffA, precursor; char intFragFound; REAL_4 currentIonFound; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; if(seqLength >= 4) /*Sequences less than 4 amino acids cannot have internal fragment ions.*/ { for(i = 1; i < (seqLength - 3); i++) /*This is the N-terminus of the fragment.*/ { for(j = (i + 1); j < (seqLength - 2); j++) /*C-terminus of the fragment.*/ { intFragFound = 0; /*This tests to see if the standard int frag is present; if it is present, then the losses of CO, water, and ammonia are also searched for.*/ intFrag = gElementMass_x100[HYDROGEN]; /*Calc the mass of the int frag.*/ for(k = i; k <= j; k++) { intFrag += sequence[k]; } intFragMinErr = intFrag - gToleranceWide; intFragPlusErr = intFrag + gToleranceWide; if(intFragMinErr < precursor) /*Only count those matches where the internal frag mass is less than the precursor m/z value.*/ { k = 0; while(fragMOverZ[k] <= intFragPlusErr && k < fragNum) /*Look for this ion.*/ { if(fragMOverZ[k] >= intFragMinErr) { currentIonFound = ionFound[k]; massDiff = abs(intFrag - fragMOverZ[k]); ionFound[k] = CalcIonFound(ionFound[k], massDiff); /*Attenuate the internal ion intensity, unless its a short one w/ P at N-term*/ if(sequence[i] != gGapList[P] || j > i + 2) { ionFound[k] = ionFound[k] * INTERNAL_FRAG_MULTIPLIER ; } intFragFound = 1; /*Its been found.*/ if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { if(sequence[i] != gGapList[P] || j > i + 2) { ionType[k] = 10; /*internal frag w/o Pro*/ } else { ionType[k] = 11; /*internal frag w/ Pro*/ } } } k++; } saveIndex = k; /*Start 
looking for the next fragment types using this index.*/ } if(intFragFound) { /* Calculate the various losses and plus/minus tolerances. */ intFragMinW = intFrag - gWater; intFragMinWMinErr = intFragMinW - gToleranceWide; intFragMinA = intFrag - gAmmonia; intFragMinAPlusErr = intFragMinA + gToleranceWide; intFragMinCO = intFrag - gCO; intFragMinCOMinErr = intFragMinCO - gToleranceWide; intFragMinCOPlusErr = intFragMinCO + gToleranceWide; /* Search for internal fragments minus water or ammonia. */ k = (saveIndex < fragNum) ? saveIndex : fragNum - 1; /*saveIndex equals fragNum if the upward scan ran off the end*/ while(k >= 0 && fragMOverZ[k] >= intFragMinWMinErr) /*check k first so fragMOverZ[-1] is never read*/ { if(fragMOverZ[k] <= intFragMinAPlusErr) { massDiffW = abs(intFragMinW - fragMOverZ[k]); massDiffA = abs(intFragMinA - fragMOverZ[k]); if(massDiffW < massDiffA) { massDiff = massDiffW; } else { massDiff = massDiffA; } currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); ionFound[k] = ionFound[k] * INTERNAL_FRAG_MULTIPLIER * NEUTRAL_LOSS_MULTIPLIER; if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 12; /*internal frag minus 17 or 18*/ } } k--; } /* Search for internal fragments minus carbon monoxide.*/ k = (saveIndex < fragNum) ? saveIndex : fragNum - 1; while(k >= 0 && fragMOverZ[k] >= intFragMinCOMinErr) { if(fragMOverZ[k] <= intFragMinCOPlusErr) { massDiff = abs(intFragMinCO - fragMOverZ[k]); currentIonFound = ionFound[k]; ionFound[k] = CalcIonFound(ionFound[k], massDiff); ionFound[k] = ionFound[k] * INTERNAL_FRAG_MULTIPLIER * NEUTRAL_LOSS_MULTIPLIER; if(currentIonFound > ionFound[k]) { ionFound[k] = currentIonFound; } else { ionType[k] = 13; /*internal frag minus CO*/ } } k--; } } /*if intFragFound*/ } /*for j*/ } /*for i*/ } /*if sequence is long enough*/ return; } /*******************************ArgIons************************************ * * ArgIons takes the usual arguments for identifying ions (see the PEFragments documentation). * It returns a char value of 0 or 1, depending on whether Arg is present in the current * sequence. 
This function looks for an ion I call b - OH, which is unique to singly-charged * precursor ions containing arginine. The ion b-OH-NH3 is also identified. Only * singly-charged fragments are considered here. */ char ArgIons(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength) { INT_4 i, j, bOH, bOHMinErr, bOHPlusErr, bOHMinAmm, bOHMinAmmMinErr, bOHMinAmmPlusErr, massDiff, argCount; char argPresent = FALSE; char bOHPresent = 0; argCount = 0; /*counts the number of arginines in the sequence*/ /* Determine if Arg is in the sequence as a single amino acid. */ for(i = 0; i < seqLength; i++) { if((sequence[i] <= gMonoMass_x100[R] + gToleranceWide) && (sequence[i] >= gMonoMass_x100[R] -gToleranceWide)) { /* argPresent = TRUE;*/ argCount++; } } if(argPresent == FALSE) /*Check to see if Arg is present as a two amino acid gap.*/ { for(i = 0; i < seqLength; i++) /*For each "amino acid" in the array sequence.*/ { for(j = 0; j < gAminoAcidNumber; j++) { if((sequence[i] <= gArgPlus[j] + gToleranceWide) && (sequence[i] >= gArgPlus[j] - gToleranceWide)) /*Is the amino acid one of the two amino acid combinations that contain arginine?*/ { /* argPresent = TRUE;*/ argCount++; /* break;*/ } } /* if(argPresent) // { // break;*/ /*If I found arginine, why look for more?*/ /* }*/ } } /*If the number of arginines exceeds the number of protons, then its a non-mobile proton situation, which is designated by argPresent becoming TRUE. 
Also, the bOH ions are counted here*/ if(argCount >= gParam.chargeState) { argPresent = TRUE; bOH = gParam.peptideMW + gElementMass_x100[HYDROGEN] - sequence[seqLength - 1]; bOHMinErr = bOH - gToleranceWide; bOHPlusErr = bOH + gToleranceWide; i = fragNum - 1; while(i >= 0 && fragMOverZ[i] >= bOHMinErr) /*check i first so an empty list is never indexed*/ { if(fragMOverZ[i] <= bOHPlusErr) { massDiff = abs(bOH - fragMOverZ[i]); ionFound[i] = CalcIonFound(ionFound[i], massDiff); bOHPresent = 1; } i--; } if(bOHPresent) { bOHMinAmm = bOH - gAmmonia; bOHMinAmmMinErr = bOHMinAmm - gToleranceWide; bOHMinAmmPlusErr = bOHMinAmm + gToleranceWide; i = fragNum - 1; while(i >= 0 && fragMOverZ[i] >= bOHMinAmmMinErr) { if(fragMOverZ[i] <= bOHMinAmmPlusErr) { massDiff = abs(bOHMinAmm - fragMOverZ[i]); ionFound[i] = CalcIonFound(ionFound[i], massDiff); } i--; } } } return(argPresent); } /****************************LoadSequence*********************************************** * * LoadSequence takes an INT_4 array called "sequence", a pointer to an INT_4 * "seqLength", and a pointer to a struct Sequence. It reads the INT_4 values found in the * "peptide" field of the struct and fills the "sequence" array with integers. Each integer * is the monoisotopic residue mass of a single amino acid (looked up in gMonoMass_x100 via * gGapList) or, for a two amino acid extension, the mass copied unchanged from "peptide". 
*/ void LoadSequence(INT_4 *sequence, INT_4 *seqLength, struct Sequence *currSeqPtr) { INT_4 i, j, testMass; char test; *seqLength = currSeqPtr->peptideLength; for(i = 0; i < *seqLength; i++) { test = TRUE; /*make sure the correct mass is used*/ testMass = currSeqPtr->peptide[i]; for(j = 0; j < gAminoAcidNumber; j++) { /*if((testMass <= gGapList[j] + gToleranceNarrow) && (testMass >= gGapList[j] - gToleranceNarrow))*/ if(testMass == gGapList[j]) { sequence[i] = gMonoMass_x100[j]; test = FALSE; break; } } if(test) /*if two aa extension*/ { sequence[i] = testMass; } } return; } /********************************WaterLoss************************************************* * * WaterLoss identifies ions that are due to the precursor losing one or two waters, one or * two ammonias, or one of each. The input is the ionFound array (0 = not identified, * 1 = identified), fragNum (fragment ion number), the fragMOverZ array (of fragment ion * masses), and the fragIntensity array. The tolerance and the precursor charge state are * taken from gToleranceWide and gParam.chargeState. It returns nothing, but it does modify * the ionFound array if ions match w/ calculated values or have an intensity of zero. 
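 *
 * Worked example (a sketch inferred from the code below; values in Da, ignoring the
 * integer scaling): each target is (gParam.peptideMW - loss + z * H) / z, where z is
 * gParam.chargeState and H is the hydrogen mass. For peptideMW = 1000.00 and z = 2, the
 * intact precursor is at (1000.00 + 2 * 1.00783) / 2 = 501.01, and the loss of one water
 * (18.01056) is at (1000.00 - 18.01056 + 2 * 1.00783) / 2 = 492.00.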
*/ void WaterLoss(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *fragIntensity) { INT_4 precurMin2W, precurMinWA, precurMin2A, precurMinW, precurMinA; INT_4 precurMin2WMinErr, precurMin2WPlusErr, precurMinWAMinErr, precurMinWAPlusErr, precurMin2AMinErr, precurMin2APlusErr, precurMinWPlusErr, precurMinWMinErr, precurMinAPlusErr, precurMinAMinErr; INT_4 i; precurMin2W = (gParam.peptideMW - gWater - gWater + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; precurMinWA = (gParam.peptideMW - gWater - gAmmonia + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; precurMin2A = (gParam.peptideMW - gAmmonia - gAmmonia + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; precurMinW = (gParam.peptideMW - gWater + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; precurMinA = (gParam.peptideMW - gAmmonia + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; precurMin2WMinErr = precurMin2W - gToleranceWide; precurMin2WPlusErr = precurMin2W + gToleranceWide; precurMinWAMinErr = precurMinWA - gToleranceWide; precurMinWAPlusErr = precurMinWA + gToleranceWide; precurMin2AMinErr = precurMin2A - gToleranceWide; precurMin2APlusErr = precurMin2A + gToleranceWide; precurMinWPlusErr = precurMinW + gToleranceWide; precurMinWMinErr = precurMinW - gToleranceWide; precurMinAPlusErr = precurMinA + gToleranceWide; precurMinAMinErr = precurMinA - gToleranceWide; for(i = 0; i < fragNum; i++) { if(fragMOverZ[i] <= precurMin2APlusErr) { if(fragMOverZ[i] >= precurMin2WMinErr) { if(((fragMOverZ[i] <= precurMin2WPlusErr) && (fragMOverZ[i] >= precurMin2WMinErr)) || ((fragMOverZ[i] <= precurMinWAPlusErr) && (fragMOverZ[i] >= precurMinWAMinErr)) || ((fragMOverZ[i] <= precurMin2APlusErr) && (fragMOverZ[i] >= precurMin2AMinErr)) || ((fragMOverZ[i] <= precurMinWPlusErr) && (fragMOverZ[i] >= precurMinWMinErr)) || ((fragMOverZ[i] <= precurMinAPlusErr) && (fragMOverZ[i] >= 
precurMinAMinErr)))
                {
                    ionFound[i] = 1;
                }
            }
        }
    }

    /*  If an ion was converted to an intensity of zero in the function TotalIntensity,
        it is identified as found here.*/
    for(i = 0; i < fragNum; i++)
    {
        if(fragIntensity[i] == 0)
        {
            ionFound[i] = 1;
        }
    }

    return;
}

/****************************ScoreC1*************************************************
*
*   If the first position is a two amino acid residue, then check to see if it could
*   contain Gln and if the remaining mass corresponds to an amino acid. If so, define the
*   residue as two separate ones, X and Q. If the N-terminus is a single amino acid, then
*   check if the next one is Q. Score the c1 ion.
*/
INT_4 ScoreC1(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength)
{
    INT_4 i, c1Ion, testMass, massDiff;
    REAL_4 currentIonFound;
    BOOLEAN oneAA;

    /*  Initialize*/
    oneAA = FALSE;
    c1Ion = gElementMass_x100[HYDROGEN] * 4 + gElementMass_x100[NITROGEN];

    /*  Test to see if the N-terminal residue contains one amino acid, or more than one.*/
    for(i = 0; i < gAminoAcidNumber; i++)
    {
        if(sequence[0] == gGapList[i])
        {
            oneAA = TRUE;
            break;
        }
    }

    /*  Figure out which c1 ion to use*/
    if(oneAA)
    {
        /*Only look at the second position if the sequence actually has one.*/
        if(seqLength > 1 && sequence[1] == gGapList[Q])
        {
            c1Ion += sequence[0];
        }
        else
        {
            c1Ion = 0;  /*this is the cue that there was no Q*/
        }
    }
    else
    {
        testMass = sequence[0] - gGapList[Q];
        for(i = 0; i < gAminoAcidNumber; i++)
        {
            if(testMass <= gGapList[i] + gToleranceWide &&
                testMass >= gGapList[i] - gToleranceWide)
            {
                c1Ion += gGapList[i];
                break;
            }
        }
        if(c1Ion == gElementMass_x100[HYDROGEN] * 4 + gElementMass_x100[NITROGEN])
        {
            c1Ion = 0;  /*cue that Q is not there*/
        }
    }

    /*  Find the c1 ion in the list of fragment masses*/
    for(i = 0; i < fragNum; i++)
    {
        if(fragMOverZ[i] >= c1Ion - gToleranceWide &&
            fragMOverZ[i] <= c1Ion + gToleranceWide)
        {
            massDiff = abs(c1Ion - fragMOverZ[i]);
            currentIonFound = ionFound[i];
            ionFound[i] = CalcIonFound(ionFound[i], massDiff);
            if(currentIonFound > ionFound[i])
            {
                ionFound[i] = currentIonFound;
            }
        }
    }

    /*  Redefine the N-terminal 2 aa
residue if it contains Q and c1*/
    if(c1Ion && !oneAA)
    {
        for(i = seqLength - 1; i > 0; i--)
        {
            sequence[i + 1] = sequence[i];
        }
        sequence[1] = gGapList[Q];
        sequence[0] = c1Ion - gElementMass_x100[HYDROGEN] * 4 - gElementMass_x100[NITROGEN];
        seqLength++;
    }

    return(seqLength);
}

/****************************PEFragments*********************************************
*
*   PEFragments identifies fragment ions that are due to pyridylethylated cysteines.
*   The inputs are the ionFound array (0 = not identified, 1 = identified), fragNum
*   (the number of fragment ions), the fragMOverZ array (fragment ion m/z values),
*   the sequence array, and seqLength. The tolerance (gToleranceWide) and the precursor
*   charge state (gParam.chargeState) are read from globals rather than passed in. It
*   returns nothing, but it does modify the ionFound array if ions match the calculated
*   values.
*/
void PEFragments(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength)
{
    INT_4 i, j, massDiff;
    INT_4 peMinusErr, pePlusErr, peFragment, peFragMinusErr, peFragPlusErr;
    REAL_4 monoCysPe;
    char test = FALSE;

    monoCysPe = 208.07 * gMultiplier;   /*The nominal residue mass of PE-Cys is 208.*/

    /*  Determine if cys is present in the sequence. */
    for(i = 0; i < seqLength; i++)
    {
        if(sequence[i] >= (monoCysPe - gToleranceWide) &&
            sequence[i] <= (monoCysPe + gToleranceWide))
            /*Check if cys is present as a single amino acid.*/
        {
            test = TRUE;
            break;
        }
    }

    if(test == FALSE)   /*Check to see if cys is present as a two amino acid gap.*/
    {
        for(i = 0; i < seqLength; i++)  /*For each "amino acid" in the array sequence.*/
        {
            for(j = 0; j < gAminoAcidNumber; j++)
            {
                if(sequence[i] >= (gCysPlus[j] - gToleranceWide) &&
                    sequence[i] <= (gCysPlus[j] + gToleranceWide))
                    /*Compare each amino acid to the list of possible two amino acid
                      combinations containing cysteine.*/
                {
                    test = TRUE;
                    break;
                }
            }
            if(test)
            {
                break;  /*Why keep checking for cysteines if one has been found already?*/
            }
        }
    }

    if(test)    /*If there is a PECys in this sequence.*/
    {
        /*  Initialize variables. The pyridylethyl fragment ion is expected at m/z 106.07,
            so the search window is centered there, consistent with the massDiff
            calculation in the loop that follows. */
        peMinusErr = 106.07 * gMultiplier - gToleranceWide;
        pePlusErr = 106.07 * gMultiplier + gToleranceWide;

        /*  Search for fragment ions. */
        for(i = 0; i < fragNum; i++)
        {
            if(fragMOverZ[i] <= pePlusErr)
            {
                if(fragMOverZ[i] >= peMinusErr)
                {
                    massDiff = abs((long)(106.07 * gMultiplier - fragMOverZ[i]));
                    ionFound[i] = CalcIonFound(ionFound[i], massDiff);
                }
            }
        }

        for(j = 1; j <= gParam.chargeState; j++)    /*Calculate the loss of PE for each charge state.*/
        {
            /*  Initialize some variables. */
            peFragment = (gParam.peptideMW + (j * gElementMass_x100[HYDROGEN])
                            - 105.07 * gMultiplier) / j;
            peFragMinusErr = peFragment - gToleranceWide;
            peFragPlusErr = peFragment + gToleranceWide;

            for(i = 0; i < fragNum; i++)    /*Find losses of PE from precursor ion.*/
            {
                if(fragMOverZ[i] >= peFragMinusErr)
                {
                    if(fragMOverZ[i] <= peFragPlusErr)
                    {
                        massDiff = abs(peFragment - fragMOverZ[i]);
                        ionFound[i] = CalcIonFound(ionFound[i], massDiff);
                    }
                }
            }
        }
    }

    return;
}

/*****************************TotalIntensity******************************************
*
*   TotalIntensity calculates the total ion intensity found in the input array called
*   "fragIntensity". The region around the precursor ion is not counted in this value.
*   It returns an INT_4 corresponding to the sum of all appropriate intensities. Also,
*   ion intensity around the precursor ion is reassigned a value of zero.
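*
*   A worked example of the regions that are zeroed, as a sketch only: the numbers are
*   hypothetical, masses are in Da (monoisotopic values assumed) rather than the scaled
*   integers the code uses, and tol stands for gToleranceWide.
*
*       peptideMW = 1000.50, charge = 2, tol = 0.75
*       precursor           = (1000.50 + 2 * 1.00783) / 2 = 501.25783
*       precursor region    = 501.25783 - 3 * tol to 501.25783 + 2 * tol
*                           = 499.00783 to 502.75783
*       precursor - water   = 501.25783 - 18.0106 / 2 = 492.25253 (zeroed within +/- tol)
*       precursor - ammonia = 501.25783 - 17.0265 / 2 = 492.74458 (zeroed within +/- tol)
*       high limit          = 1000.50 + 1.00783 - 57.02146 + 0.75 = 945.23637
*
*   Ions above the high limit (the singly protonated peptide minus a glycine residue,
*   plus tol) are zeroed as well. For Maxent3-processed data the charge is taken as 1.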
*/ INT_4 TotalIntensity(INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *fragIntensity) { INT_4 i; INT_4 totalIntensity = 0; INT_4 precursorMinWater, precursor, precursorMinAmmonia, highLimit; char charge; if(gParam.maxent3) { charge = 1; } else { charge = gParam.chargeState; } precursor = (gParam.peptideMW + (charge * gElementMass_x100[HYDROGEN])) / charge; precursorMinWater = precursor - (gWater / charge); precursorMinAmmonia = precursor - (gAmmonia / charge); highLimit = gParam.peptideMW + gElementMass_x100[HYDROGEN] - gMonoMass_x100[G] + gToleranceWide; for(i = 0; i < fragNum; i++) { if(((fragMOverZ[i] > (precursorMinWater - gToleranceWide)) && (fragMOverZ[i] < (precursorMinWater + gToleranceWide))) || ((fragMOverZ[i] < (precursorMinAmmonia + gToleranceWide)) && (fragMOverZ[i] > (precursorMinAmmonia - gToleranceWide)))) { fragIntensity[i] = 0; } else if(fragMOverZ[i] <= precursor + (gToleranceWide * 2) && fragMOverZ[i] >= precursor - (gToleranceWide * 3)) { fragIntensity[i] = 0; } else if(fragMOverZ[i] > highLimit) { fragIntensity[i] = 0; } else { totalIntensity += fragIntensity[i]; } } return(totalIntensity); } /******************************LoadTheIonArrays******************************************** * * LoadTheIonArrays inputs firstMassPtr, which points to the first element in a linked * list of MSData structs containing the cid data, plus two arrays defined in * ScoreSequences - fragMOverZ, and fragIntensity - each of which contains MAX_ION_NUM * elements. This function initializes these two arrays. If there are less than * MAX_ION_NUM elements in the linked list of MSData structs, then these are copied * directly to the two arrays. "fragNum" is the fragment number. If fragNum exceeds * MAX_ION_NUM, then the most intense ions are loaded into the arrays. 
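*
*   An illustration with made-up values, assuming a hypothetical MAX_ION_NUM of 3:
*
*       input list (m/z, intensity):  (100, 5)   (200, 50)  (300, 20)  (400, 40)
*       arrays after loading:         (200, 50)  (300, 20)  (400, 40)
*
*   The three most intense ions are kept and stored in order of increasing m/z.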
*/
void LoadTheIonArrays(struct MSData *firstMassPtr, INT_4 *fragNum,
                      INT_4 *fragMOverZ, INT_4 *fragIntensity)
{
    struct MSData *currMSPtr, *destroyPtr;
    INT_4 i, j, destroyIndex, *mostIntMass, *mostIntInt;

    mostIntMass = (int *) malloc(MAX_ION_NUM * sizeof(INT_4));
    if(mostIntMass == NULL)
    {
        printf("LoadTheIonArrays: Out of memory.");
        exit(1);
    }

    mostIntInt = (int *) malloc(MAX_ION_NUM * sizeof(INT_4));
    if(mostIntInt == NULL)
    {
        printf("LoadTheIonArrays: Out of memory.");
        exit(1);
    }

    /*  Initialize variables. */
    currMSPtr = firstMassPtr;
    *fragNum = 0;

    /*  Count the number of ions in the linked list. */
    while(currMSPtr != NULL)
    {
        *fragNum += 1;
        currMSPtr = currMSPtr->next;
    }

    /*  If the arrays are large enough, just load the linked list w/o any modifications. */
    if(*fragNum < MAX_ION_NUM)
    {
        currMSPtr = firstMassPtr;
        i = 0;
        while(currMSPtr != NULL)
        {
            fragMOverZ[i] = (REAL_4)currMSPtr->mOverZ + 0.5;
            /*had strange effects on Mac where, for example, 0.5 becomes 0.49999999 which
              then gets rounded to zero. Only noticed when it was looping thru here many
              times*/
            fragIntensity[i] = currMSPtr->intensity;
            i++;
            currMSPtr = currMSPtr->next;
        }
    }
    /*  If there is not enough room in the arrays, then load only the MAX_ION_NUM most
        intense ions, sorted by m/z.
    */
    else
    {
        *fragNum = MAX_ION_NUM;

        for(i = 0; i < MAX_ION_NUM; i++)    /*Find the most intense ions.*/
        {
            mostIntMass[i] = (REAL_4)firstMassPtr->mOverZ + 0.5;
            mostIntInt[i] = firstMassPtr->intensity;
            destroyPtr = firstMassPtr;  /*the first ion is the most intense until a
                                          larger one is found*/
            currMSPtr = firstMassPtr->next;
            while(currMSPtr != NULL)
            {
                if(currMSPtr->intensity > mostIntInt[i])
                {
                    mostIntInt[i] = currMSPtr->intensity;
                    mostIntMass[i] = (REAL_4)currMSPtr->mOverZ + 0.5;
                    destroyPtr = currMSPtr;
                }
                currMSPtr = currMSPtr->next;
            }
            destroyPtr->intensity = 0;
        }

        for(i = 0; i < MAX_ION_NUM; i++)    /*Then sort by m/z.*/
        {
            destroyIndex = 0;
            fragMOverZ[i] = mostIntMass[0];
            fragIntensity[i] = mostIntInt[0];
            for(j = 0; j < MAX_ION_NUM; j++)
            {
                if(mostIntMass[j] < fragMOverZ[i])
                {
                    fragMOverZ[i] = mostIntMass[j];
                    fragIntensity[i] = mostIntInt[j];
                    destroyIndex = j;
                }
            }
            /*Mark this entry as used with the largest INT_4 value (assumes a 32-bit
              INT_4), so that it is never selected as the minimum again.*/
            mostIntMass[destroyIndex] = 2147483647;
        }
    }

    free(mostIntMass);
    free(mostIntInt);

    return;
}

/***************************ScoreSequences************************************
*
*   ScoreSequences is called by main in the file LutefiskMain.c. It builds a linked list
*   of SequenceScore structs containing information on each sequence, its rank, intensity
*   score, and cross-correlation score, prints the best candidates, frees that list, and
*   returns the pointer to the first Sequence struct.
*
*   Input parameters are pointers to the first Sequence struct (containing the sequences
*   to be scored), and the first MSData struct (containing the cid data), and various
*   fields within the global struct gParam - "peptideMW" (the peptide molecular weight),
*   "fragmentErr" (the fragment ion m/z tolerance), "chargeState" (the charge state of the
*   precursor ion), and "cysMW" (the mass of cysteine - used to figure out the type of
*   cysteine fragmentation). The INT_4 "rankedSeqNum" is meant to be the number of
*   sequences in the linked list of ranked and scored sequences (firstScorePtr),
*   BUT IT'S STILL NOT USED.
*/ struct Sequence *ScoreSequences(struct Sequence *firstSequencePtr, struct MSData *firstMassPtr) { char cysPE, addSequence, databaseSeq; char argPresent = 0; INT_4 *sequence, *charSequence, *ionType; INT_4 precursor, averageBYScore = 0, highestBYScore = 0; INT_4 *fragMOverZ, *fragIntensity, *saveFragMOverZ; INT_4 i, j, fragNum, intensityTotal, seqLength = 0, storedSeqNum, countTheSeqs; INT_4 cleavageSites = 0, length = 0; REAL_4 realSeqLengthNoFudgingAtAll = 0; REAL_4 intScore, intOnlyScore, *ionFound, *ionFoundTemplate, *yFound, *bFound; REAL_4 lowMassIonConversion, lowMassCys, residueNumGuess = 0, minQuality = 0; REAL_8 *byError, perfectProbScore = 0; REAL_4 stDevErr, realSeqLength, calFactor = 1, quality = 0; REAL_4 probScore = 0; /*INT_4 m;*/ /*debug*/ BOOLEAN aSequenceFound = FALSE; /*debugging*/ BOOLEAN test; /*debugging*/ INT_4 z = 0; /*debugging*/ struct SequenceScore *firstScorePtr, *lowScorePtr; struct SequenceScore *massagedSeqListPtr = NULL, *currMassagePtr = NULL; struct Sequence *currSeqPtr; /* * lowMassIons contains values corresponding to amino acid immonium ions or other pieces * of amino acids. * There are three ions per amino acid, most amino acids have a single immonium ion m/z. 
*/ INT_4 lowMassIons[AMINO_ACID_NUMBER][3] = { /* A */ 440500, 0, 0, /* R */ 700657, 870922, 1120875, /* N */ 870558, 0, 0, /* D */ 880399, 0, 0, /* C */ 0, 0, 0, /* E */ 1020555, 0, 0, /* Q */ 840450, 1010715, 1290664, /* G */ 0, 0, 0, /* H */ 1100718, 0, 0, /* I */ 860970, 1200483, 0, /*I position also represent oxidized Met for qtof*/ /* L */ 860970, 0, 0, /* K */ 840814, 1011079, 1291028, /* M */ 1040534, 0, 0, /* F */ 1200813, 0, 0, /* P */ 700657, 0, 0, /* S */ 600449, 0, 0, /* T */ 740606, 0, 0, /* W */ 1590922, 0, 0, /* Y */ 1360762, 0, 0, /* V */ 720813, 0, 0, 0,0,0, 0,0,0, 0,0,0, 0,0,0, 0,0,0 }; /*check that the sequences do not exceed the array*/ currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { if(currSeqPtr->peptideLength > MAX_PEPTIDE_LENGTH) { printf("LutefiskScore: peptideLength too long"); exit(1); } currSeqPtr = currSeqPtr->next; } /*Convert the lowMassIons to the appropriate value given the gMultiplier value.*/ lowMassIonConversion = (REAL_4)gMultiplier / 10000; for(i = 0; i < gAminoAcidNumber; i++) { for(j = 0; j < 3; j++) { lowMassIons[i][j] = (REAL_4)lowMassIons[i][j] * lowMassIonConversion + 0.5; } } /*Assign a value to Cys, using gParam.cysMW to account for alkylations*/ lowMassCys = ((REAL_4)gParam.cysMW / gMultiplier); /*convert to real mass value temporarily*/ lowMassCys = lowMassCys - 26.9871; /*this is loss of CO plus H*/ lowMassCys = lowMassCys * gMultiplier; /*convert back*/ lowMassIons[C][0] = lowMassCys + 0.5; /*round it off*/ /* Assign some space for the arrays. 
*/ sequence = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4)); charSequence = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4)); fragMOverZ = (int *) malloc(MAX_ION_NUM * sizeof(INT_4)); saveFragMOverZ = (int *) malloc(MAX_ION_NUM * sizeof(INT_4)); fragIntensity = (int *) malloc(MAX_ION_NUM * sizeof(INT_4)); ionFound = (float *) malloc(MAX_ION_NUM * sizeof(REAL_4)); ionType = (int *) malloc(MAX_ION_NUM * sizeof(INT_4)); byError = (double *) malloc(MAX_ION_NUM * sizeof(REAL_8)); ionFoundTemplate = (float *) malloc(MAX_ION_NUM * sizeof(REAL_4)); bFound = (float *) malloc(MAX_ION_NUM * sizeof(REAL_4)); yFound = (float *) malloc(MAX_ION_NUM * sizeof(REAL_4)); if(gAmIHere) /*debugging*/ { aSequenceFound = CheckItOut(firstSequencePtr); } /* Initialize some more variables. */ /*For high charge states, I won't recalibrate, so use wide tolerance.*/ if(gParam.qtofErr != 0) { if(gParam.chargeState > 3) { gParam.qtofErr = gParam.fragmentErr; } } gToleranceNarrow = gParam.fragmentErr * 0.95; /*0.8This is done for the fuzzy logic.*/ gToleranceWide = gParam.fragmentErr * 1.5; /*1.2This is done for the fuzzy logic.*/ storedSeqNum = 0; firstScorePtr = NULL; /*This is the pointer that is returned by LutefiskScore*/ lowScorePtr = NULL; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; fragNum = 0; gGapListDipeptideIndex = gGapListIndex; /* Check the arrays, count the sequences in the final list of completed sequences, put the sequence tag back into the list of sequences, determine if there is a C-terminal Lys or Arg for tryptic peptides, and multiply several mass variables by 100 so that integers can be used rather than REAL_4s. 
*/ firstSequencePtr = InitLutefiskScore(sequence, fragMOverZ, fragIntensity, ionFound, ionFoundTemplate, &countTheSeqs, firstSequencePtr, yFound, bFound, firstMassPtr, charSequence, byError, ionType); if(gAmIHere) { aSequenceFound = CheckItOut(firstSequencePtr); } /* * Figure out minimum spectrum quality based on the max sequence tag length. */ residueNumGuess = gParam.peptideMW / (AV_RESIDUE_MASS * gMultiplier); if(residueNumGuess != 0) { minQuality = gTagLength / residueNumGuess; } /* * Since the order (forward or backward) orientation of the peptide sequence is usually based * on the presence of characteristic ions (eg, y1=147,175 for tryptic peptides), and since * these ions are not necessarily very intense, and since the scores are based on accounting * for the greatest percentage of ion intensity, this function boosts the ion intensities * for these characteristic ions, if they are present. */ if(gFirstTimeThru) /*do this once; don't keep doing it for each peptide MW*/ { BoostTheCTerminals(firstMassPtr); } /* Load the m/z and intensity arrays containing the cid data; count the ions, too. */ if(firstMassPtr == NULL) { printf("Most distressing! There seems to be no CID data."); exit(1); } LoadTheIonArrays(firstMassPtr, &fragNum, fragMOverZ, fragIntensity); /* * Find the average ion intensity and the standard deviation of the ion intensity, * and if any ions have exceed a 1.64 x stDev these intensities are adjusted * up or down. */ AdjustIonIntensity(fragNum, fragIntensity); /*since this is not affecting firstMassPtr, this can be done repeatedly w/o worrying about getting weird effects*/ /* Initialize ionFoundTemplate elements to zero. */ for(i = 0; i < fragNum; i++) { ionFoundTemplate[i] = 0; } /* Calculate the total ion intensity. */ intensityTotal = TotalIntensity(fragNum, fragMOverZ, fragIntensity); /* Figure out if the peptide is pyridylethylated. 
*/ if((gParam.cysMW >= ((208.07 * gMultiplier) - gToleranceWide)) && (gParam.cysMW <= ((208.07 * gMultiplier) + gToleranceWide))) { cysPE = 1; } else { cysPE = 0; } /* Identify losses of water or ammonia, 2 waters, 2 ammonias, or one of each from the precursor ion. */ WaterLoss(ionFoundTemplate, fragNum, fragMOverZ, fragIntensity); /* I'll first just score for b and y ions. Those sequences that seem to have alot of b and y ions delineating their sequences will be kept, whereas those with few b and y (or alternating b and y) will be discarded. The first sequence in the linked list will always be kept in order to keep things simple. */ if((gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q' || gParam.fragmentPattern == 'L') && gParam.chargeState > 1) /*Only do this for tryptic peptides that have precursor charges greater than one.*/ { if(countTheSeqs > 100) /*don't bother weeding out ridiculous sequences if there's only a few*/ { TossTheLosers(firstSequencePtr, ionFoundTemplate, fragNum, fragMOverZ, fragIntensity, intensityTotal, ionFound, &countTheSeqs, sequence); } if(gAmIHere) { aSequenceFound = CheckItOut(firstSequencePtr); } /*Get rid of sequences that cannot account for most of the higher m/z fragments*/ if(countTheSeqs > 100) /*don't bother weeding out ridiculous sequences if there's only a few*/ { HighMOverZFilter(firstSequencePtr, fragMOverZ, fragIntensity, &countTheSeqs, sequence, fragNum); } if(gAmIHere) { aSequenceFound = CheckItOut(firstSequencePtr); } } /* * Add the database sequence(s) to the list in firstSequencePtr. */ if (strlen(gParam.databaseSequences) > 0 && gCorrectMass) { AddDatabaseSequences(firstSequencePtr); } if(gAmIHere) { aSequenceFound = CheckItOut(firstSequencePtr); } /* * Yank out sequences where a two aa extension matches single aa extensions in another seq. * These are usually cases where a two aa extension happens to match with three amino acids. 
*/ firstSequencePtr = RemoveRedundantSequences(firstSequencePtr); if(gAmIHere) { aSequenceFound = CheckItOut(firstSequencePtr); } /* For Qtof data, if the qtof error value is sufficient to distinguish between Q/K, F/M-O, and isobaric dipeptides, then the number of sequences are expanded to account for all of the possibilities. Also, gGapList is redone to reflect the tighter qtof final score tolerance. Since I/L are isomeric, and since I use L for either I/L, the 'I' position in gGapList is reserved for oxidized Met and the single letter code for this position is changed to 'm'. */ if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0) { MakeNewgGapList(); /*Creates single aa and dipeptides and eliminates only those with exactly the same mass. If any amino acids in the sequence lists are not from single or dipeptides (ie, three amino acids, say) then these masses are added to gGapList.*/ /*Count the sequences, and if too many do an extensive scoring using the wide tolerances of gParam.fragmentErr and remove sequences on that basis*/ countTheSeqs = 0; currSeqPtr = firstSequencePtr; while(currSeqPtr != 0) { countTheSeqs++; currSeqPtr = currSeqPtr->next; } if(countTheSeqs > MAX_QTOF_SEQUENCES) { RescoreAndPrune(firstSequencePtr, ionFound, fragNum, fragMOverZ, sequence, seqLength, argPresent, yFound, bFound, byError, cleavageSites, lowMassIons, ionFoundTemplate, fragIntensity, intensityTotal, ionType); } if(countTheSeqs > 50 && !gCorrectMass) /*start w/ fewer sequences if mass is wrong*/ { RescoreAndPrune(firstSequencePtr, ionFound, fragNum, fragMOverZ, sequence, seqLength, argPresent, yFound, bFound, byError, cleavageSites, lowMassIons, ionFoundTemplate, fragIntensity, intensityTotal, ionType); } ExpandSequences(firstSequencePtr); /*Expands the number of sequences if the qtofErr can differentiate between aa's and dipeptides of the same nominal mass.*/ gToleranceNarrow = gParam.qtofErr * 0.95; /*0.9This is done for the fuzzy logic.*/ gToleranceWide = gParam.qtofErr * 
3; /*2This is done for the fuzzy logic.*/ } /*printf("Prior to AddToGapList\n");*/ AddToGapList(firstSequencePtr); /*add unusual dipeptide masses to gGapList*/ /*printf("After AddToGapList\n");*/ /*===================================================================================== * Here's the giant while loop, that scores each sequence in the linked list of * SequenceData structs. */ /*m = 0;*/ /*debug*/ currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { /*m++;*/ /*debug*/ /*if(z == 347) /*debugging*/ /*{ z++; }*/ /* * Write the sequence to the INT_4 array "sequence", and count the number of amino acids * ("seqLength"). Actually, its not the sequence that is in "sequence", rather its the * nominal mass of a single amino acid or a pair of amino acids times 100 (all mass values * are 100 x actual value in this file). */ LoadSequence(sequence, &seqLength, currSeqPtr); /*debugging*/ /*test = TRUE; if(seqLength == 12) { for(i = 0; i < seqLength; i++) { if(gRightSequence[i] <= sequence[i] - gToleranceWide || gRightSequence[i] >= sequence[i] + gToleranceWide) { test = FALSE; } } if(test) { i++; } }*/ /* Note if this is a database sequence (used to flag such sequences in the output)*/ if(currSeqPtr->gapNum == -100) { databaseSeq = TRUE; } else { databaseSeq = FALSE; } /* Qtof data is recalibrated for each sequence using y ions greater than m/z 500. The original fragMOverZ values are saved in saveFragMOverZ, and restored after the sequence has been scored. Once the data has been recalibrated the tolerances are narrowed to the scorerror values from the .param file. */ if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0 && gParam.chargeState <= 3) { for(i = 0; i < fragNum; i++) /*save the original data*/ { saveFragMOverZ[i] = fragMOverZ[i]; } /*printf("%d Before Recalibrate\n", m);*/ calFactor = Recalibrate(fragNum, fragMOverZ, sequence, seqLength, fragIntensity); /*printf("%d After Recalibrate\n", m);*/ } /* Initialize this variable each time around. 
*/ for(i = 0; i < fragNum; i++) { ionFound[i] = ionFoundTemplate[i]; /*ionFoundTemplate contains identifications of ions that do not change from sequence to sequence. Therefore, these identifications were taken out of the giant while loop, and instead of initializing "ionFound" to zero, its initialized to ionFoundTemplate.*/ yFound[i] = 0; /*yFound and bFound are used to cut the scores for sequences that utilze the same ions for y and b's.*/ bFound[i] = 0; byError[i] = 100; /*Errors in b and y ions are placed in this array, where index matches the fragNum indexing*/ if(ionFoundTemplate[i] != 0) { ionType[i] = 1; /*Precursor ions are of type 1*/ } else { ionType[i] = 0; /*Initialize all other positions as 0 (random) ion types*/ } } /* Identify Arg-related ions, if the precursor is singly-charged and arginine is present in the sequence. Also checks if arg is present (TRUE or FALSE). */ argPresent = ArgIons(ionFound, fragNum, fragMOverZ, sequence, seqLength); /* Identify a, b, and y ions. "cleavageSites" is used in assigning the intensity based score later on. It is the number of times a peptide bond was cleaved or delineated by a b or y type ion. */ cleavageSites = FindABYIons(ionFound, fragNum, fragMOverZ, sequence, seqLength, argPresent, yFound, bFound, byError, ionType); /* Fool with the cleavageSites value and the ionFound values for sequences that used the same series of ions for both y and b. */ cleavageSites = AlterIonFound(ionFound, fragNum, fragMOverZ, sequence, seqLength, yFound, bFound, cleavageSites); /*if gapNum = -100, this signals that the sequence is from the database*/ if(currSeqPtr->gapNum == -100) { cleavageSites = (currSeqPtr->peptideLength) - 1; } /* Find any c1 ions when Q is at the second position from the N-terminus. */ seqLength = ScoreC1(ionFound, fragNum, fragMOverZ, sequence, seqLength); /* Identify pyridylethylated cysteine fragments, but only if PE'ed cysteine is in sequence. 
*/ if(cysPE) { PEFragments(ionFound, fragNum, fragMOverZ, sequence, seqLength); } /* Identify internal fragment ions. */ InternalFrag(ionFound, fragMOverZ, sequence, seqLength, fragNum, ionType); /* Identify low mass amino acid - specific ions. */ ScoreLowMassIons(ionFound, fragMOverZ, sequence, seqLength, lowMassIons, ionType); /* For ions that are one dalton higher than another that has been identified as b or y, assign this as being partially found, too.*/ if(gParam.peakWidth < 0.7 * gMultiplier && !gParam.maxent3) { ScoreBYIsotopes(ionFound, fragMOverZ, fragNum, ionType); } /* Calculate the actual number of amino acids in the sequence where gaps are counted as two amino acids. */ realSeqLength = SequenceLengthCalc(sequence, seqLength); /* * Calculate the number of amino acids, not accounting for Pro mis-cleavages and N-terminal dipeptides. */ realSeqLengthNoFudgingAtAll = SequenceLengthCalcNoFudge(sequence, seqLength); /* Assign the intensity-based score for the sequence.*/ if(databaseSeq) { cleavageSites = realSeqLength - 1; /*don't penalize database-derived sequences in the score*/ } intScore = IntensityScorer(fragIntensity, ionFound, cleavageSites, fragNum, realSeqLength, intensityTotal); /* For qtof data, the intScore is modified so that larger correction factors to the calibration will attenuate the score.*/ /* if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0 && gParam.chargeState <= 2) { intScore = ScoreAttenuationFromCalfactor(calFactor, intScore); } */ /* Determine the standard deviation of the average error.*/ stDevErr = StandardDeviationOfTheBYErrors(byError, fragNum); /* Recalculate the intensity-based score w/o attenuation and other tricks.*/ intOnlyScore = IntensityOnlyScorer(fragIntensity, ionFound, fragNum, intensityTotal); /* * Calculate quality as mass of amino acids defined by contiguous series divided by total residue mass */ quality = MassBasedQuality(sequence, seqLength, fragNum, fragMOverZ, argPresent); /* * Determine Pavel 
probability score.
        */
        if(sequence[0] == 2029)     /*debugging: a place to set a breakpoint*/
        {
            i++;
        }
        probScore = LutefiskProbScorer(sequence, seqLength, fragNum, fragMOverZ, argPresent);

        /*normalize the score by using the sequence length*/
        probScore = probScore / (2 * realSeqLengthNoFudgingAtAll);
        gProbScoreMax = gProbScoreMax / (2 * realSeqLengthNoFudgingAtAll);
        if(gProbScoreMax > perfectProbScore)
        {
            perfectProbScore = gProbScoreMax;
        }

        /*  Qtof data was recalibrated for this sequence, so now that the sequence has been
            scored, restore the original fragMOverZ values that were saved in
            saveFragMOverZ before recalibration. */
        if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0 && gParam.chargeState <= 3)
        {
            for(i = 0; i < fragNum; i++)    /*restore the original data*/
            {
                fragMOverZ[i] = saveFragMOverZ[i];
            }
        }

        /*  Check for quality of spectrum. gCleavageSiteStringent is the longest contiguous
            series of b or y ions, but it counts two aa residues the same as one aa residue.
            realSeqLength is the actual length of the sequence where two aa residues are
            counted as two amino acids rather than a single residue. gSingleAACleavageSites
            is the number of contiguous b or y ions that define a sequence made up entirely
            of single amino acid residues -- no gaps are included. The
            realSeqLengthNoFudgingAtAll value represents the best guess at the number of
            amino acids in the sequence; in contrast, realSeqLength makes allowances for
            gaps that might include a proline, as well as an N-terminal gap -- those are
            counted as single residues.
*/ /*avoid divide by zero*/ /*if(realSeqLengthNoFudgingAtAll > 1) { quality = gSingleAACleavageSites / (realSeqLengthNoFudgingAtAll - 1); length = gSingleAACleavageSites; } else { quality = 0; length = 0; } This is the length based quality, which is not valid unless unsequenced regions are dipeptides*/ /* * Adjust quality value higher if its zero and a sequence tag was found */ if(quality == 0) { if(minQuality > 0) { quality = minQuality; /*minQuality is derived from the fraction of sequence covered by a sequence tag*/ } } /* Store the sequence, intensity score, actual peptide length, and quality.*/ addSequence = IsThisADuplicate(firstScorePtr, sequence, intOnlyScore, intScore, seqLength); if(addSequence || currSeqPtr->gapNum == -100) /*-100 flag for database seq*/ { if(storedSeqNum <= MAX_X_CORR_NUM) { firstScorePtr = AddToSeqScoreList(firstScorePtr, LoadSeqScoreStruct(intScore, intOnlyScore, sequence, charSequence, seqLength, stDevErr, cleavageSites, calFactor, databaseSeq, intOnlyScore, quality, length, probScore, 0.0)); storedSeqNum++; } else { if(lowScorePtr == NULL) /*Find the lowest intensity-based score out of all stored sequences.*/ { lowScorePtr = FindLowestScore(firstScorePtr); } if(intScore > lowScorePtr->intensityScore) { lowScorePtr->intensityScore = intScore; for(i = 0; i < seqLength; i++) { lowScorePtr->peptide[i] = sequence[i]; lowScorePtr->peptideSequence[i] = charSequence[i]; } lowScorePtr->peptide[seqLength] = 0; lowScorePtr->intensityOnlyScore = intOnlyScore; lowScorePtr->stDevErr = stDevErr; lowScorePtr->crossDressingScore = intScore; lowScorePtr->calFactor = calFactor; lowScorePtr->quality = quality; lowScorePtr->length = length; lowScorePtr->probScore = probScore; lowScorePtr->cleavageSites = cleavageSites; lowScorePtr->databaseSeq = databaseSeq; lowScorePtr->rank = 0; lowScorePtr = NULL; /*If this is not NULL, then that means it found the lowScorePtr earlier, but the sequence that was previously under consideration had a lower score. 
                    This keeps the program from searching for the same lowScorePtr again,
                    unless it has been NULLed.*/
                }
            }
        }

        /*debugging*/
        if(gAmIHere)
        {
            aSequenceFound = CheckItOutSequenceScore(firstScorePtr);
            if(!aSequenceFound)
            {
                z++;    /*stop in debugger*/
            }
            else
            {
                z++;
            }
        }/*end debugging*/

        currSeqPtr = currSeqPtr->next;  /*Point to the next struct in the linked list to
                                          continue the giant while loop.*/
    }   /*End of the giant while loop.*/

    if(gAmIHere)
    {
        aSequenceFound = CheckItOutSequenceScore(firstScorePtr);
    }

    /*
    *   Convert relevant mass values back to the real masses (not the gMultiplier-scaled
    *   values). Convert the mass-based peptide sequence to a character-based sequence and
    *   place it in the peptideSequence field of firstScorePtr.
    */
    RevertBackToReals(firstMassPtr, firstScorePtr);

    if(gAmIHere)
    {
        aSequenceFound = CheckItOutSequenceScore(firstScorePtr);
    }

    /*
    *   Here's where a rank is assigned based on the intensity-based score.
    */
    SeqIntensityRanker(firstScorePtr);

    if(gAmIHere)
    {
        aSequenceFound = CheckItOutSequenceScore(firstScorePtr);
    }

    /*
    *   Here's where the cross-correlation scoring is done - after the giant while loop.
    *   This is where the top rankedSeqNum number of scores are found, and the rest are
    *   discarded. Cross-correlation scores are normalized to the auto-correlation of the
    *   actual spectrum. The background associated with tau = 0 is determined by adding
    *   the symmetrical differences around tau = 0. This takes advantage of the fact that
    *   a real match is going to be symmetrical, and bogus matches are not.
    */
    if (gParam.fMonitor && gCorrectMass)
    {
        printf("Cross-dressing.\n");
        fflush(stdout);
    }
    DoCrossCorrelationScoring(firstScorePtr, firstMassPtr);

    /*
    *   Figure out a theoretically perfect probScore for comparison with actual probScores.
    */
    /*perfectProbScore = CalcPerfectProbScore(fragNum, fragMOverZ);*/

    if(gAmIHere)
    {
        aSequenceFound = CheckItOutSequenceScore(firstScorePtr);
    }

    if(gParam.fMonitor && gCorrectMass)
    {
        PrintToConsole(firstScorePtr);
        /* PrintScoreDetailsToXLFile(firstScorePtr, perfectProbScore);*/ /*debugging output*/
    }

    /*  Massage the scores to come up with a short list of most likely sequences. */
    /* massagedSeqListPtr = MassageScores(firstScorePtr);*/
    massagedSeqListPtr = DetermineBestCandidates(firstScorePtr);

    /*  Find max quality and length values from the final list in the output. */
    currMassagePtr = massagedSeqListPtr;
    quality = 0;
    length = 0;
    while(currMassagePtr != NULL)
    {
        if(currMassagePtr->quality > quality)
        {
            quality = currMassagePtr->quality;
            length = currMassagePtr->length;
        }
        currMassagePtr = currMassagePtr->next;
    }

    /*  Stop the clock */
    if(gCorrectMass)
    {
        gParam.searchTime = (clock() - gParam.startTicks)/ CLOCKS_PER_SEC;
    }

    /*  Output is printed to the console and to a file.*/
    if(gCorrectMass)
    {
        PrintToConsoleAndFile(massagedSeqListPtr, quality, length, perfectProbScore);
    }

    /*  Free some linked lists*/
    FreeAllSequenceScore(firstScorePtr);
    FreeAllSequenceScore(massagedSeqListPtr);
    RevertTheRevertBackToReals(firstMassPtr);

    /*  Free the arrays, before I forget.*/
    free(sequence);
    free(fragMOverZ);
    free(fragIntensity);
    free(ionFound);
    free(ionFoundTemplate);
    free(yFound);
    free(bFound);
    free(byError);
    free(charSequence);
    free(saveFragMOverZ);
    free(ionType);  /*allocated along with the other arrays, so release it here too*/

    return(firstSequencePtr);   /*Return the pointer to the list of candidate sequences;
                                  the scored lists were freed above.*/
}

/***************************MassBasedQuality**********************************
*
*   Calculates quality as the mass of amino acids defined by a contiguous ion
*   series, divided by the total mass of all residues.
*/ REAL_4 MassBasedQuality(INT_4 *sequence, INT_4 seqLength, INT_4 fragNum, INT_4 *fragMOverZ, char argPresent) { INT_4 i, j, k, maxCharge; REAL_4 bQualMass, yQualMass, quality, bSinglyCharged, ySinglyCharged, bIon, yIon, totalResidueMass; REAL_4 maxBQualMass, maxYQualMass, lowMassLimit; REAL_4 bMinWater, bMinAmmonia, yMinWater, yMinAmmonia; BOOLEAN singleAA, bIonFound, yIonFound, validTerminus; totalResidueMass = 0; maxBQualMass = 0; maxYQualMass = 0; lowMassLimit = (gParam.peptideMW + gParam.chargeState * gElementMass_x100[HYDROGEN]) / gParam.chargeState; lowMassLimit = lowMassLimit * 0.33333; if(gParam.chargeState > 1) { maxCharge = gParam.chargeState - 1; } else { maxCharge = 1; } /*Calculate quality based on b ions*/ bQualMass = 0; bSinglyCharged = gParam.modifiedNTerm; for(i = 0; i < seqLength; i++) { singleAA = FALSE; validTerminus = FALSE; bSinglyCharged = bSinglyCharged + sequence[i]; totalResidueMass += sequence[i]; for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] <= gMonoMass_x100[j] + gToleranceWide && sequence[i] >= gMonoMass_x100[j] - gToleranceWide) { singleAA = TRUE; validTerminus = TRUE; break; } } bIonFound = FALSE; /*set it to FALSE for each "residue"*/ if(i == 0 && !singleAA) /*Count the two aa at the N-terminus*/ { for(j = 0; j < gGapListDipeptideIndex; j++) { if(sequence[0] == gGapList[j]) /*make sure its really two aa, and not something more*/ { bIonFound = TRUE; validTerminus = TRUE; break; } } } if(singleAA) { for(j = 1; j <= maxCharge; j++) { bIon = (bSinglyCharged + (j-1) * gElementMass_x100[HYDROGEN]) / j; bMinWater = bSinglyCharged - (gElementMass_x100[HYDROGEN] * 2 + gElementMass_x100[OXYGEN]); bMinAmmonia = bSinglyCharged - (gElementMass_x100[HYDROGEN] * 3 + gElementMass_x100[NITROGEN]); bMinWater = (bMinWater + (j-1) * gElementMass_x100[HYDROGEN]) / j; bMinAmmonia = (bMinAmmonia + (j-1) * gElementMass_x100[HYDROGEN]) / j; for(k = 0; k < fragNum; k++) { if(bIon <= fragMOverZ[k] + gToleranceWide && bIon >= fragMOverZ[k] - 
gToleranceWide) { bIonFound = TRUE; break; } if(argPresent) { if(bMinWater <= fragMOverZ[k] + gToleranceWide && bMinWater >= fragMOverZ[k] - gToleranceWide) { bIonFound = TRUE; break; } if(bMinAmmonia <= fragMOverZ[k] + gToleranceWide && bMinAmmonia >= fragMOverZ[k] - gToleranceWide) { bIonFound = TRUE; break; } } } if(bIonFound) { break; } } /*what if its ion trap data and the 1/3 rule predicts that the ions are too low of mass to be seen?*/ if(!bIonFound && gParam.fragmentPattern == 'L' && bSinglyCharged < lowMassLimit) { for(j = 1; j <= maxCharge; j++) { yIon = gParam.peptideMW - bSinglyCharged + 2 * gElementMass_x100[HYDROGEN]; yMinWater = yIon - (gElementMass_x100[OXYGEN] + 2 * gElementMass_x100[HYDROGEN]); yMinAmmonia = yIon - (gElementMass_x100[NITROGEN] + 3 * gElementMass_x100[HYDROGEN]); yIon = (yIon + (j-1) * gElementMass_x100[HYDROGEN]) / j; yMinWater = (yMinWater + (j-1) * gElementMass_x100[HYDROGEN]) / j; yMinAmmonia = (yMinAmmonia + (j-1) * gElementMass_x100[HYDROGEN]) / j; for(k = 0; k < fragNum; k++) { if(yIon <= fragMOverZ[k] + gToleranceWide && yIon >= fragMOverZ[k] - gToleranceWide) { bIonFound = TRUE; break; } if(argPresent) { if(yMinWater <= fragMOverZ[k] + gToleranceWide && yMinWater >= fragMOverZ[k] - gToleranceWide) { bIonFound = TRUE; break; } if(yMinAmmonia <= fragMOverZ[k] + gToleranceWide && yMinAmmonia >= fragMOverZ[k] - gToleranceWide) { bIonFound = TRUE; break; } } } } } } if(bIonFound || (validTerminus && i == seqLength-1)) /*i==seqLenght-1 is the b ion of MH+-18*/ { bQualMass = bQualMass + sequence[i]; } else { if(bQualMass > maxBQualMass) { maxBQualMass = bQualMass; } bQualMass = 0; } } /*Calculate quality based on y ions*/ yQualMass = 0; ySinglyCharged = gParam.modifiedCTerm + 2 * gElementMass_x100[HYDROGEN]; for(i = seqLength - 1; i >= 0; i--) { singleAA = FALSE; validTerminus = FALSE; ySinglyCharged = ySinglyCharged + sequence[i]; for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] <= gMonoMass_x100[j] + gToleranceWide && 
sequence[i] >= gMonoMass_x100[j] - gToleranceWide) { singleAA = TRUE; validTerminus = TRUE; break; } } yIonFound = FALSE; /*set it to FALSE for each "residue"*/ if(i == 0 && !singleAA) /*Count the two aa at the N-terminus*/ { for(j = 0; j < gGapListDipeptideIndex; j++) { if(sequence[0] == gGapList[j]) /*make sure its really two aa, and not something more*/ { yIonFound = TRUE; validTerminus = TRUE; break; } } } if(singleAA) { for(j = 1; j <= maxCharge; j++) { yIon = (ySinglyCharged + (j-1) * gElementMass_x100[HYDROGEN]) / j; yMinWater = ySinglyCharged - (gElementMass_x100[HYDROGEN] * 2 + gElementMass_x100[OXYGEN]); yMinAmmonia = ySinglyCharged - (gElementMass_x100[HYDROGEN] * 3 + gElementMass_x100[NITROGEN]); yMinWater = (yMinWater + (j-1) * gElementMass_x100[HYDROGEN]) / j; yMinAmmonia = (yMinAmmonia + (j-1) * gElementMass_x100[HYDROGEN]) / j; for(k = 0; k < fragNum; k++) { if(yIon <= fragMOverZ[k] + gToleranceWide && yIon >= fragMOverZ[k] - gToleranceWide) { yIonFound = TRUE; break; } if(argPresent) { if(yMinWater <= fragMOverZ[k] + gToleranceWide && yMinWater >= fragMOverZ[k] - gToleranceWide) { yIonFound = TRUE; break; } if(yMinAmmonia <= fragMOverZ[k] + gToleranceWide && yMinAmmonia >= fragMOverZ[k] - gToleranceWide) { yIonFound = TRUE; break; } } } if(yIonFound) { break; } } /*what if its ion trap data and the 1/3 rule predicts that the ions are too low of mass to be seen?*/ if(!yIonFound && gParam.fragmentPattern == 'L' && ySinglyCharged < lowMassLimit) { for(j = 1; j <= maxCharge; j++) { bIon = gParam.peptideMW - ySinglyCharged + 2 * gElementMass_x100[HYDROGEN]; bMinWater = bIon - (gElementMass_x100[OXYGEN] + 2 * gElementMass_x100[HYDROGEN]); bMinAmmonia = bIon - (gElementMass_x100[NITROGEN] + 3 * gElementMass_x100[HYDROGEN]); bIon = (bIon + (j-1) * gElementMass_x100[HYDROGEN]) / j; bMinWater = (bMinWater + (j-1) * gElementMass_x100[HYDROGEN]) / j; bMinAmmonia = (bMinAmmonia + (j-1) * gElementMass_x100[HYDROGEN]) / j; for(k = 0; k < fragNum; k++) { if(bIon 
<= fragMOverZ[k] + gToleranceWide && bIon >= fragMOverZ[k] - gToleranceWide) { yIonFound = TRUE; break; } if(argPresent) { if(bMinWater <= fragMOverZ[k] + gToleranceWide && bMinWater >= fragMOverZ[k] - gToleranceWide) { yIonFound = TRUE; break; } if(bMinAmmonia <= fragMOverZ[k] + gToleranceWide && bMinAmmonia >= fragMOverZ[k] - gToleranceWide) { yIonFound = TRUE; break; } } } } } } if(yIonFound || (validTerminus && i==0)) /*i==0 means that I don't have to look for MH+*/ { yQualMass = yQualMass + sequence[i]; } else { if(yQualMass > maxYQualMass) { maxYQualMass = yQualMass; } yQualMass = 0; } } if(yQualMass > maxYQualMass) { maxYQualMass = yQualMass; } if(bQualMass > maxBQualMass) { maxBQualMass = bQualMass; } if(maxYQualMass > maxBQualMass) { quality = maxYQualMass / totalResidueMass; } else { quality = maxBQualMass / totalResidueMass; } return(quality); } /***************************RescoreAndPrune*********************************** * * */ void RescoreAndPrune(struct Sequence *firstSequencePtr, REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, char argPresent, REAL_4 *yFound, REAL_4 *bFound, REAL_8 *byError, INT_4 cleavageSites, INT_4 lowMassIons[][3], REAL_4 *ionFoundTemplate, INT_4 *fragIntensity, INT_4 intensityTotal, INT_4 *ionType) { struct Sequence *currSeqPtr = NULL; struct Sequence *previousPtr = NULL; INT_4 countTheSeqs = 0; INT_4 i, realSeqLength; REAL_4 maxScoreFraction = 0.45; REAL_4 intScore, maxScore; currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { LoadSequence(sequence, &seqLength, currSeqPtr); for(i = 0; i < fragNum; i++) { ionFound[i] = ionFoundTemplate[i]; yFound[i] = 0; bFound[i] = 0; byError[i] = 100; } cleavageSites = FindABYIons(ionFound, fragNum, fragMOverZ, sequence, seqLength, argPresent, yFound, bFound, byError, ionType); cleavageSites = AlterIonFound(ionFound, fragNum, fragMOverZ, sequence, seqLength, yFound, bFound, cleavageSites); /*if gapNum = -100, this signals that the sequence is from 
the database*/ if(currSeqPtr->gapNum == -100) { cleavageSites = (currSeqPtr->peptideLength) - 1; } InternalFrag(ionFound, fragMOverZ, sequence, seqLength, fragNum, ionType); ScoreLowMassIons(ionFound, fragMOverZ, sequence, seqLength, lowMassIons, ionType); if(gParam.peakWidth < 0.7 * gMultiplier && !gParam.maxent3) { ScoreBYIsotopes(ionFound, fragMOverZ, fragNum, ionType); } realSeqLength = SequenceLengthCalc(sequence, seqLength); intScore = IntensityScorer(fragIntensity, ionFound, cleavageSites, fragNum, realSeqLength, intensityTotal); currSeqPtr->score = intScore * 1000 + 0.5; currSeqPtr = currSeqPtr->next; } /* Find max score*/ currSeqPtr = firstSequencePtr; maxScore = currSeqPtr->score; while(currSeqPtr != NULL) { if(currSeqPtr->score > maxScore) { maxScore = currSeqPtr->score; } currSeqPtr = currSeqPtr->next; } /* Count the sequences*/ countTheSeqs = 0; currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { countTheSeqs++; currSeqPtr = currSeqPtr->next; } /* Remove sequences scoring below maxScoreFraction x maxScore, raising the fraction from 0.45 in steps of 0.02 until no more than MAX_QTOF_SEQUENCES remain; the first sequence in the list is never removed*/ while(countTheSeqs > MAX_QTOF_SEQUENCES) { maxScoreFraction = maxScoreFraction + 0.02; currSeqPtr = firstSequencePtr->next; previousPtr = firstSequencePtr; while(currSeqPtr != NULL) { if(currSeqPtr->score < maxScore * maxScoreFraction) { previousPtr->next = currSeqPtr->next; free(currSeqPtr); currSeqPtr = previousPtr->next; } else { currSeqPtr = currSeqPtr->next; previousPtr = previousPtr->next; } } /* Count the sequences*/ countTheSeqs = 0; currSeqPtr = firstSequencePtr; while(currSeqPtr != NULL) { countTheSeqs++; currSeqPtr = currSeqPtr->next; } } if(gParam.fMonitor && gCorrectMass) { printf("Scoring %4ld sequences found for qtof scoring.\n", countTheSeqs); printf("These had intensity scores in excess of %.3f.\n", maxScoreFraction); } return; } /********************************LutefiskProbScorer***************************************************** * * Assign probability scores to sequences.
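 *
 * A hedged sketch of the update rule that the b, y, internal and immonium
 * ion helpers apply for each expected ion: a matched ion multiplies the
 * score by ionProb / randomProb (scaled by a mass-error term) only when that
 * ratio exceeds 1, and a missing ion multiplies it by
 * (1 - ionProb) / (1 - randomProb) only when that ratio lies between 0 and 1.
 * update_score is a hypothetical name, and the probabilities in the usage
 * note are made-up values rather than the program's constants. The example
 * sits inside this comment and is not compiled:

```c
#include <assert.h>

// Hypothetical helper, not part of Lutefisk: fold one expected ion into a
// running likelihood-ratio score.
static double update_score(double score, double ionProb, double randomProb,
                           double errProb, int found)
{
    assert(randomProb > 0.0 && randomProb < 1.0);
    if (found) {
        // Reward only: a ratio at or below 1 leaves the score unchanged.
        double ratio = ionProb / randomProb * errProb;
        if (ratio > 1.0) {
            score *= ratio;
        }
    } else {
        // Penalize only: the ratio must fall strictly between 0 and 1.
        double ratio = (1.0 - ionProb) / (1.0 - randomProb);
        if (ratio > 0.0 && ratio < 1.0) {
            score *= ratio;
        }
    }
    return score;
}
```

 * With an assumed ionProb of 0.8 and randomProb of 0.1, a matched ion with a
 * perfect mass-error term multiplies the score by 8, and a missing ion
 * multiplies it by 0.2 / 0.9, roughly 0.22.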
* */ REAL_4 LutefiskProbScorer(INT_4 *sequence, INT_4 seqLength, INT_4 fragNum, INT_4 *fragMOverZ, char argPresent) { INT_4 i; REAL_4 *randomProb; REAL_8 probScore = 0; /* Make some space*/ randomProb = malloc(MAX_ION_NUM * sizeof(REAL_4)); if(randomProb == NULL) { printf("LutefiskProbScorer: Out of memory\n"); exit(1); } /*Initialize*/ for(i = 0; i < MAX_ION_NUM; i++) { randomProb[i] = 0; } /*debug*/ if(sequence[0] == 7104) { i++; } /* Calculate random probability for each ion*/ CalcRandomProb(randomProb, fragMOverZ, fragNum); /* Score the sequences*/ /*Get initial probability based on terminal group (Lys and Arg are good; others are not)*/ probScore = InitProbScore(sequence, seqLength); /*Initialize the maximum probability score possible for this sequence (used to normalize later)*/ gProbScoreMax = probScore; /*Find the b ions*/ probScore = FindBIons(fragMOverZ, fragNum, probScore, randomProb, sequence, seqLength, argPresent); /*Find the y ions*/ probScore = FindYIons(fragMOverZ, fragNum, probScore, randomProb, sequence, seqLength, argPresent); /*Find the internal fragment ions*/ probScore = FindInternalIons(fragMOverZ, fragNum, probScore, randomProb, sequence, seqLength); /*Find the immonium ions*/ probScore = FindImmoniumIons(fragMOverZ, fragNum, probScore, randomProb, sequence, seqLength); /*Change probability score to log base 10 scale*/ if(probScore > 1) { probScore = log10(probScore); } else /*keep things positive by only logging things over a value of 1*/ { probScore = 0.0001; } /*gProbScoreMax is based on y and b ion scoring, and assumes all reasonable values were found*/ if(gProbScoreMax > 1) { gProbScoreMax = log10(gProbScoreMax); } else /*keep things positive by only logging things over a value of 1*/ { gProbScoreMax = .0001; } /* probScore = probScore / gProbScoreMax;*/ /* Free array*/ free(randomProb); return(probScore); } /***********************************FindImmoniumIons****************************************** * * Finds and scores the amino acid 
immonium ions. */ REAL_8 FindImmoniumIons(INT_4 *mass, INT_4 ionCount, REAL_8 probScore, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength) { REAL_4 lowMassIons[AMINO_ACID_NUMBER][3] = { /* A */ 0, 0, 0, /* R */ 0, 0, 0, /* N */ 0, 0, 0, /* D */ 0, 0, 0, /* C */ 0, 0, 0, /* E */ 0, 0, 0, /* Q */ 84.0450, 101.0715, 129.0664, /* G */ 0, 0, 0, /* H */ 110.0718, 0, 0, /* I */ 86.0970, 0, 0, /* L */ 86.0970, 0, 0, /* K */ 84.0814, 101.1079, 129.1028, /* M */ 104.0534, 0, 0, /* F */ 120.0813, 0, 0, /* P */ 70.0657, 0, 0, /* S */ 0, 0, 0, /* T */ 0, 0, 0, /* W */ 159.0922, 0, 0, /* Y */ 136.0762, 0, 0, /* V */ 72.0813, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; INT_4 i, j, k, immoniumIndex, aaPresent[AMINO_ACID_NUMBER]; REAL_4 individualProb; REAL_4 massDiff, errProb, testErrProb; BOOLEAN areThereAnyLowMassIons = FALSE; BOOLEAN immoniumFound = FALSE; /*multiply low mass values by gMultiplier*/ for(i = 0; i < gAminoAcidNumber; i++) { for(j = 0; j < 3; j++) { if(lowMassIons[i][j] != 0) { lowMassIons[i][j] *= gMultiplier; } } } /*Check to see if there are any immonium ions at all*/ for(i = 0; i < gAminoAcidNumber; i++) { for(j = 0; j < 3; j++) { if(lowMassIons[i][j] != 0) { for(k = 0; k < ionCount; k++) { if(mass[k] > lowMassIons[W][0] + gToleranceWide) { break; } if(mass[k] >= lowMassIons[i][j] - gToleranceWide && mass[k] <= lowMassIons[i][j] + gToleranceWide) { areThereAnyLowMassIons = TRUE; break; } } } } } /*Figure out which amino acids are present*/ for(i = 0; i < AMINO_ACID_NUMBER; i++) { aaPresent[i] = 0; } for(i = 0; i < seqLength; i++) { for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] <= gMonoMass_x100[j] + gToleranceWide && sequence[i] >= gMonoMass_x100[j] - gToleranceWide) { aaPresent[j] = 1; break; } } } /*Proceed if there are any immonium ions at all*/ if(areThereAnyLowMassIons) { for(i = 0; i < gAminoAcidNumber; i++) { if(aaPresent[i] && lowMassIons[i][0] != 0) { errProb = 0; /*use best match when several ions exist for a given amino acid*/ 
for(j = 0; j < 3; j++) { if(lowMassIons[i][j] > 0) { immoniumIndex = 0; immoniumFound = FALSE; for(k = 0; k < ionCount; k++) { if(mass[k] > lowMassIons[W][0] + gToleranceWide) { break; } if(mass[k] < lowMassIons[i][j] + gToleranceWide && mass[k] > lowMassIons[i][j] - gToleranceWide) { immoniumIndex = k; immoniumFound = TRUE; massDiff = fabs(lowMassIons[i][j] - mass[k]); testErrProb = CalcIonFound(0, massDiff); if(testErrProb > errProb) { errProb = testErrProb; } } } } } if(immoniumFound) /*something was found*/ { individualProb = immoniumProb / randomProb[immoniumIndex]; individualProb *= errProb; if(individualProb > 1) { probScore *= individualProb; } } else /*immonium ions not found, so penalize*/ { individualProb = (1 - immoniumProb) / (1 - randomProb[immoniumIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } } } } return(probScore); } /***********************************FindInternalIons****************************************** * * Finds and scores the internal fragment ions. 
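 *
 * As a rough sketch of which masses the function below tests: an internal
 * fragment starts after the first residue, stops before the last one, spans
 * two to four residues, and carries one extra hydrogen. The helper name
 * internal_fragment_masses and its arguments are hypothetical and not part of
 * Lutefisk, and the sketch leaves out the precursor and lowest-observed-mass
 * cut-offs that the real loop also applies. The example sits inside this
 * comment and is not compiled:

```c
#include <assert.h>

// Hypothetical helper, not part of Lutefisk. Writes the singly charged
// masses of all internal fragments of 2 to 4 residues into out[] and
// returns how many were written.
static int internal_fragment_masses(const double *residueMass, int seqLength,
                                    double hydrogen, double *out, int outCap)
{
    int n = 0;
    for (int i = 1; i < seqLength - 2; i++) {
        double mass = residueMass[i] + hydrogen;
        int residues = 1;
        for (int j = i + 1; j < seqLength - 1; j++) {
            mass += residueMass[j];
            residues++;
            if (residues >= 5) {
                break;              // longer fragments are not considered
            }
            assert(n < outCap);
            out[n++] = mass;
        }
    }
    return n;
}
```

 * For five residues of mass {10, 20, 30, 40, 50} and a hydrogen mass of 1,
 * the fragments are residues 2-3 (51), 2-4 (91) and 3-4 (71).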
*/ REAL_8 FindInternalIons(INT_4 *mass, INT_4 ionCount, REAL_8 probScore, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength) { INT_4 i, j, k, residueCount; REAL_4 testMass, individualProb; REAL_4 massDiff, errProb; REAL_4 precursor = (gParam.peptideMW + gParam.chargeState * gElementMass_x100[HYDROGEN]) / gParam.chargeState; BOOLEAN nTermPro, intFragTest; if(seqLength < 4) { return(probScore); /*need at least four residues for an internal fragment*/ } for(i = 1; i < seqLength - 2; i++) { testMass = sequence[i] + gElementMass_x100[HYDROGEN]; residueCount = 1; if(sequence[i] > gMonoMass_x100[P] - gToleranceWide && sequence[i] < gMonoMass_x100[P] + gToleranceWide) { nTermPro = TRUE; /*The N-terminus of this fragment is proline*/ } else { nTermPro = FALSE; } for(j = i + 1; j < seqLength - 1; j++) { testMass += sequence[j]; residueCount++; if(testMass < precursor - gToleranceWide && residueCount < 5 && testMass > mass[0]) /*dont bother w/ high mass internal frags*/ { intFragTest = FALSE; for(k = 0; k < ionCount; k++) { if(mass[k] > testMass + gToleranceWide) { break; /*I need to save the k value at the point where this occurs*/ } if(testMass < mass[k] + gToleranceWide && testMass > mass[k] - gToleranceWide) { intFragTest = TRUE; massDiff = fabs(testMass - mass[k]); errProb = CalcIonFound(0, massDiff); break; } } /*need to make sure k index is in range*/ if(k >= ionCount) { k = ionCount; } if(k < 0) { k = 1; } /*score the probability*/ if(intFragTest) { if(nTermPro) { individualProb = internalProProb / randomProb[k]; individualProb *= errProb; if(individualProb > 1) { probScore *= individualProb; } } else { individualProb = internalProb / randomProb[k]; individualProb *= errProb; if(individualProb > 1) { probScore *= individualProb; } } } else /*didn't find any, so penalize*/ { if(nTermPro) { individualProb = (1 - internalProProb) / (1 - randomProb[k]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } else { individualProb = (1 - 
internalProb) / (1 - randomProb[k]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } } } } } return(probScore); } /***********************************InitProbScore********************************************** * * If the C-terminus is Arg or Lys, then give higher probability. * */ REAL_4 InitProbScore(INT_4 *sequence, INT_4 seqLength) { REAL_4 score = 0.05; REAL_4 residueMass, testMass; INT_4 i; BOOLEAN test = FALSE; residueMass = sequence[seqLength - 1]; /*Initialize for tryptic peptides*/ if(gParam.proteolysis == 'T') { if(residueMass < gMonoMass_x100[R] + gToleranceWide && residueMass > gMonoMass_x100[R] - gToleranceWide) { score = 0.95; } else if(residueMass < gMonoMass_x100[K] + gToleranceWide && residueMass > gMonoMass_x100[K] - gToleranceWide) { score = 0.95; } else { for(i = 0; i < gAminoAcidNumber; i++) { testMass = residueMass - gMonoMass_x100[i]; if(testMass < gMonoMass_x100[R] + gToleranceWide && testMass > gMonoMass_x100[R] - gToleranceWide) { score = 0.95; break; } else if(testMass < gMonoMass_x100[K] + gToleranceWide && testMass > gMonoMass_x100[K] - gToleranceWide) { score = 0.95; break; } } } } else if(gParam.proteolysis == 'K') { if(residueMass < gMonoMass_x100[K] + gToleranceWide && residueMass > gMonoMass_x100[K] - gToleranceWide) { score = 0.95; } else { for(i = 0; i < gAminoAcidNumber; i++) { testMass = residueMass - gMonoMass_x100[i]; if(testMass < gMonoMass_x100[K] + gToleranceWide && testMass > gMonoMass_x100[K] - gToleranceWide) { score = 0.95; break; } } } } else if(gParam.proteolysis == 'E') { if(residueMass < gMonoMass_x100[E] + gToleranceWide && residueMass > gMonoMass_x100[E] - gToleranceWide) { score = 0.95; } else if(residueMass < gMonoMass_x100[D] + gToleranceWide && residueMass > gMonoMass_x100[D] - gToleranceWide) { score = 0.95; } else { for(i = 0; i < gAminoAcidNumber; i++) { testMass = residueMass - gMonoMass_x100[i]; if(testMass < gMonoMass_x100[E] + gToleranceWide && testMass > 
gMonoMass_x100[E] - gToleranceWide) { score = 0.95; break; } else if(testMass < gMonoMass_x100[D] + gToleranceWide && testMass > gMonoMass_x100[D] - gToleranceWide) { score = 0.95; break; } } } } return(score); } /************************************CalcRandomProb********************************************* * * For each ion, a 400 u window is identified (usually +/- 200 u surrounding it) and the number * of ions is counted within the window. That counted number is divided by the number of possible * ions that could fit in that 400 u window, which depends on the instrument resolution. */ void CalcRandomProb(REAL_4 *randomProb, INT_4 *mass, INT_4 ionCount) { INT_4 i, j, windowCount; REAL_4 lowMass, highMass; /* Initialize*/ lowMass = mass[0]; highMass = mass[ionCount-1]; for(i = 0; i < MAX_ION_NUM; i++) { randomProb[i] = 0; } for(i = 0; i < ionCount; i++) { windowCount = 0; if(mass[i] < lowMass + 200 * gMultiplier) /*bottom 400 u window before it moves*/ { for(j = 0; j < ionCount; j++) { if(mass[j] < lowMass + 400 * gMultiplier) { windowCount++; } } } else if(mass[i] > highMass - 200 * gMultiplier) /*top 400 u window that stops moving*/ { for(j = 0; j < ionCount; j++) { if(mass[j] > highMass - 400 * gMultiplier) { windowCount++; } } } else /*this is the moving window*/ { for(j = 0; j < ionCount; j++) { if(mass[j] > mass[i] - 200 * gMultiplier && mass[j] < mass[i] + 200 * gMultiplier) { windowCount++; } } } /*calculate the randomness of this ion*/ randomProb[i] = (REAL_4) windowCount / 400; /*assuming low resolution of one peak per amu is possible*/ } /*Verify that randomProb never equals zero or one (avoid divide by zero later on)*/ for(i = 0; i < ionCount; i++) { if(randomProb[i] < 0.005) { randomProb[i] = 0.005; /*this is 1 out of 200*/ } if(randomProb[i] > 0.995) { randomProb[i] = 0.995; /*this is 199 out of 200*/ } } return; } /**************************************FindBIons********************************************* * * Find the b ions for the sequence and 
change the ionFound to 1. Return a value that corresponds * to the number of consecutive b ions. */ REAL_8 FindBIons(INT_4 *mass, INT_4 ionCount, REAL_8 probScore, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength, char argPresent) { INT_4 i, j, k, bIonIndex, posResidues, proGapLimit, proGapCount, oxMetCount, maxCharge; INT_4 addBIons; REAL_4 water, ammonia, bIonTemplate, bIonMin17Template, bIonMin18Template, precursor; REAL_4 bIon, bIonMin17, bIonMin18, individualProb, aIon, aIonTemplate, carbonMonoxide; REAL_4 neutralLossProb, highMassLimit, lowMassLimit, bMin64; REAL_4 bErrProb, b17ErrProb, b18ErrProb, aErrProb, massDiff, b64ErrProb; REAL_4 lossOf64, bIonMin64Template; BOOLEAN bIonTest, bMin18Test, bMin17Test, aIonTest, isItAGap, bMin64Test; BOOLEAN nTerminalQ, nTerminalE, TwoAAGap; /* Initialize*/ water = gElementMass_x100[OXYGEN] + gElementMass_x100[HYDROGEN] * 2; ammonia = gElementMass_x100[NITROGEN] + gElementMass_x100[HYDROGEN] * 3; carbonMonoxide = gElementMass_x100[CARBON] + gElementMass_x100[OXYGEN]; lossOf64 = gElementMass_x100[CARBON] + gElementMass_x100[SULFUR] + gElementMass_x100[OXYGEN] + 4 * gElementMass_x100[HYDROGEN]; bIonTemplate = gParam.modifiedNTerm; precursor = (gParam.peptideMW + gParam.chargeState * gElementMass_x100[HYDROGEN]) / gParam.chargeState; posResidues = 1; oxMetCount = 0; if(gParam.peptideMW < 1000 * gMultiplier) { proGapLimit = 1; /*gaps w/ Pro are not counted as gaps, unless there are more than 1 such Pro gap*/ } else { proGapLimit = 2; /*the limit is higher for higher molecular weight peptides*/ } proGapCount = 0; if(gParam.chargeState == 1) { maxCharge = 1; } else { maxCharge = 2; } if(gParam.fragmentPattern == 'L') /*ion masses outside of this range are not penalized if not found*/ { lowMassLimit = precursor * 0.333; /*so-called 1/3 rule*/ highMassLimit = 2000 * gMultiplier; /*mass limit for Deca*/ } else { lowMassLimit = 146 * gMultiplier; /*y1 for Lys*/ highMassLimit = 2 * precursor; /*often the very high mass ions are 
missing*/ } /*Determine if the N-terminal residue is Q or E. If so, then b-17 and b-18 are counted even if no b*/ if(sequence[0] >= gMonoMass_x100[Q] - gToleranceWide && sequence[0] <= gMonoMass_x100[Q] + gToleranceWide) { nTerminalQ = TRUE; } else { nTerminalQ = FALSE; } if(sequence[0] >= gMonoMass_x100[E] - gToleranceWide && sequence[0] <= gMonoMass_x100[E] + gToleranceWide) { nTerminalE = TRUE; } else { nTerminalE = FALSE; } /* Start the calculations and searches*/ for(i = 0; i < seqLength - 1; i++) { /*Count the number of positively charged amino acids in the sequence*/ if((sequence[i] >= gMonoMass_x100[R] - gToleranceWide && sequence[i] <= gMonoMass_x100[R] + gToleranceWide) || (sequence[i] >= gMonoMass_x100[H] - gToleranceWide && sequence[i] <= gMonoMass_x100[H] + gToleranceWide) || (sequence[i] >= gMonoMass_x100[K] - gToleranceWide && sequence[i] <= gMonoMass_x100[K] + gToleranceWide)) { posResidues++; } else /*check to see if its a two aa gap that might have a positive charge*/ { for(j = 0; j < gAminoAcidNumber; j++) { if((sequence[i] >= gArgPlus[j] - gToleranceWide && sequence[i] <= gArgPlus[j] + gToleranceWide) || (sequence[i] >= gHisPlus[j] - gToleranceWide && sequence[i] <= gHisPlus[j] + gToleranceWide) || (sequence[i] >= gLysPlus[j] - gToleranceWide && sequence[i] <= gLysPlus[j] + gToleranceWide)) { posResidues++; break; } } } /*count the number of oxidized Met's, or Phe's (which could be oxidized Met)*/ if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0 && gToleranceNarrow < 5) /*mass accuracy sufficient to determine oxMet*/ { if(sequence[i] >= gMonoMass_x100[9] - gToleranceNarrow && sequence[i] <= gMonoMass_x100[9] + gToleranceNarrow) { oxMetCount++; /*y ions gained a oxMet*/ } if(oxMetCount < 0) { printf("LutefiskScore:FindABYIons The number of oxidized Mets went negative."); exit(1); } } else /*mass accuracy not sufficient to differentiate oxMet from Phe*/ { if(sequence[i] >= gMonoMass_x100[F] - gToleranceNarrow && sequence[i] <= 
gMonoMass_x100[F] + gToleranceNarrow) { oxMetCount++; /*y ions gained a oxMet*/ } if(oxMetCount < 0) { printf("LutefiskScore:FindABYIons The number of oxidized Mets went negative."); exit(1); } } /*Decide if this is a gap*/ isItAGap = TRUE; for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] <= gMonoMass_x100[j] + gToleranceWide && sequence[i] >= gMonoMass_x100[j] - gToleranceWide) { isItAGap = FALSE; break; } if(sequence[i] <= gMonoMass_x100[j] + gMonoMass_x100[P] + gToleranceWide && sequence[i] >= gMonoMass_x100[j] + gMonoMass_x100[P] - gToleranceWide) { if(proGapCount < proGapLimit) { isItAGap = FALSE; /*if the gap could contain Pro, then don't call it a gap*/ proGapCount++; break; } } } /* Don't count b1 ions */ if(!isItAGap && i == 0) { bIonTemplate += sequence[i]; /*need to add the mass to have correct b ion series later*/ continue; } /*Calc b related ions assuming a single charge*/ bIonTemplate += sequence[i]; bIonMin17Template = bIonTemplate - ammonia; bIonMin18Template = bIonTemplate - water; bIonMin64Template = bIonTemplate - lossOf64; aIonTemplate = bIonTemplate - carbonMonoxide; for(j = 1; j <= maxCharge; j++) /*check different charge states*/ { bIon = (bIonTemplate + (j-1)*gElementMass_x100[HYDROGEN]) / j; bIonMin17 = (bIonMin17Template + (j-1)*gElementMass_x100[HYDROGEN]) / j; bIonMin18 = (bIonMin18Template + (j-1)*gElementMass_x100[HYDROGEN]) / j; bMin64 = (bIonMin64Template + (j-1)*gElementMass_x100[HYDROGEN]) / j; aIon = (aIonTemplate + (j-1)*gElementMass_x100[HYDROGEN]) / j; bIonTest = FALSE; aIonTest = FALSE; bMin18Test = FALSE; bMin17Test = FALSE; bMin64Test = FALSE; /*apply constraints to charge and mass*/ if(bIon * j > (j-1) * 400 * gMultiplier && posResidues >= j) { /*Don't mess with the score unless b ion is less than precursor, or its LCQ data*/ if((bIonTemplate < precursor && gParam.fragmentPattern) || gParam.fragmentPattern == 'L') { for(k = 0; k < ionCount; k++) { if(mass[k] > bIon + gToleranceWide) { break; /*don't waste any more 
time looking*/ } if(mass[k] > bIon - gToleranceWide) { bIonTest = TRUE; massDiff = fabs(mass[k] - bIon); bErrProb = CalcIonFound(0, massDiff); } } if(bIonTest || nTerminalQ || nTerminalE || argPresent) /*there's a b ion, so look for the b-17 and b-18*/ { for(k = 0; k < ionCount; k++) { if(bIonTest || nTerminalQ || argPresent) { if(mass[k] > bIonMin17 - gToleranceWide && mass[k] < bIonMin17 + gToleranceWide) { bMin17Test = TRUE; massDiff = fabs(mass[k] - bIonMin17); b17ErrProb = CalcIonFound(0, massDiff); } } if(bIonTest || nTerminalE || argPresent) { if(mass[k] > bIonMin18 - gToleranceWide && mass[k] < bIonMin18 + gToleranceWide) { bMin18Test = TRUE; massDiff = fabs(mass[k] - bIonMin18); b18ErrProb = CalcIonFound(0, massDiff); } } if(bIonTest) { if(mass[k] > aIon - gToleranceWide && mass[k] < aIon + gToleranceWide) { aIonTest = TRUE; massDiff = fabs(mass[k] - aIon); aErrProb = CalcIonFound(0, massDiff); } } if(bIonTest && oxMetCount > 0) { if(mass[k] > bMin64 - gToleranceWide && mass[k] < bMin64 + gToleranceWide) { bMin64Test = TRUE; massDiff = fabs(mass[k] - bMin64); b64ErrProb = CalcIonFound(0, massDiff); } } } } /*Calculate the probability scores*/ /*but first find the approximate index value for the b ion (to get correct randomProb)*/ bIonIndex = ionCount - 1; /*default to the last ion when no peak lies above the b ion, so the index is never left unset*/ for(k = 0; k < ionCount; k++) { if(mass[k] > bIon + gToleranceWide) { bIonIndex = k - 1; break; } } if(bIonIndex >= ionCount) /*check the ends of the array*/ { bIonIndex = ionCount - 1; } if(bIonIndex < 0) { bIonIndex = 0; } if(bIonTest) /*if the calculated b ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = bIonProb / randomProb[bIonIndex]; individualProb *= bErrProb; if(individualProb > 1) /*ion found means normalized prob over 1 so as not to penalize*/ { probScore *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = bIonProb * bDoublyProbMultiplier / randomProb[bIonIndex]; individualProb *= bErrProb; if(individualProb > 1) { probScore *= individualProb; } } } else 
/*if the calculated b ion is not present*/ { if(bIon > lowMassLimit && bIon < highMassLimit) /*dont penalize if outside these limits*/ { if(!(nTerminalQ && bMin17Test)) /*if b-17 is present and N-term Q, then don't penalize*/ { if(!(nTerminalE && bMin18Test)) /*if b-18 is present and N-term E, then don't penalize*/ { if(!(argPresent && (bMin17Test || bMin18Test))) /*if more Arg's than charges, don't penalize*/ { if(j == 1) { individualProb = (1-bIonProb) / (1 - randomProb[bIonIndex]); if(individualProb < 1 && individualProb > 0) /*penalize by being between 0 and 1*/ { probScore *= individualProb; } } else { individualProb = (1 - bIonProb * bDoublyProbMultiplier) / (1 - randomProb[bIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } } } } } } /*if any of the neutral losses from b ions are present*/ if(bMin18Test || bMin17Test || aIonTest) { if(bMin18Test) /*if the calculated b-18 ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = bMinWaterProb / randomProb[bIonIndex]; individualProb *= b18ErrProb; if(individualProb > 1) { probScore *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = bMinWaterProb * bDoublyProbMultiplier / randomProb[bIonIndex]; individualProb *= b18ErrProb; if(individualProb > 1) { probScore *= individualProb; } } } if(bMin17Test) /*if the calculated b-17 ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = bMinAmmoniaProb / randomProb[bIonIndex]; individualProb *= b17ErrProb; if(individualProb > 1) { probScore *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = bMinAmmoniaProb * bDoublyProbMultiplier / randomProb[bIonIndex]; individualProb *= b17ErrProb; if(individualProb > 1) { probScore *= individualProb; } } } if(aIonTest) /*if the calculated a ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = aIonProb / randomProb[bIonIndex]; individualProb *= aErrProb; 
if(individualProb > 1) { probScore *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = aIonProb * bDoublyProbMultiplier / randomProb[bIonIndex]; individualProb *= aErrProb; if(individualProb > 1) { probScore *= individualProb; } } } } else /*missing ion needs to be penalized*/ { neutralLossProb = bMinWaterProb; /*Figure out which neutral loss is least likely*/ if(bMinAmmoniaProb < neutralLossProb) { neutralLossProb = bMinAmmoniaProb; } if(aIonProb < neutralLossProb) { neutralLossProb = aIonProb; } if(aIon > lowMassLimit && bIonMin17 < highMassLimit) { if(j == 1) { individualProb = (1 - neutralLossProb) / (1 - randomProb[bIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } else { individualProb = (1 - neutralLossProb * bDoublyProbMultiplier) / (1 - randomProb[bIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } } } if(bMin64Test) /*don't penalize if oxMet neutral loss is absent*/ { individualProb = bMin64IonProb / randomProb[bIonIndex]; individualProb *= b64ErrProb; if(individualProb > 1) { probScore *= individualProb; } } if(isItAGap && j == 1 && i > 0) /*a gap arises from lack of a b/y ion, so penalize, but only do it once for j=1; also don't penalize a gap at the N-terminus*/ { if(bIon > lowMassLimit && bIon < highMassLimit) { individualProb = (1-bIonProb) / (1 - randomProb[bIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } } /*Now calculate the max prob score*/ if(bIon > lowMassLimit && bIon < highMassLimit) /*dont add to score if outside these limits*/ { /*add b ion contribution*/ if(j == 1) { individualProb = bIonProb / randomProb[bIonIndex]; if(individualProb > 1) { gProbScoreMax *= individualProb; } } else { individualProb = bIonProb * bDoublyProbMultiplier / randomProb[bIonIndex]; if(individualProb > 1) { gProbScoreMax *= individualProb; } } /*add neutral loss contribution (only add one neutral loss per 
residue*/ neutralLossProb = bMinWaterProb; /*Figure out which neutral loss is least likely*/ if(bMinAmmoniaProb < neutralLossProb) { neutralLossProb = bMinAmmoniaProb; } if(aIonProb < neutralLossProb) { neutralLossProb = aIonProb; } if(j == 1) /*for singly charged fragments*/ { individualProb = neutralLossProb / randomProb[bIonIndex]; if(individualProb > 1) { gProbScoreMax *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = neutralLossProb * bDoublyProbMultiplier / randomProb[bIonIndex]; if(individualProb > 1) { gProbScoreMax *= individualProb; } } /*now add extra prob score for ions that should be present within a gap*/ if(isItAGap && j == 1) /*a gap arises from lack of a b/y ion, so penalize, but only do it once for j=1*/ { /*decide if its a legitimate 2 aa gap or something bigger*/ TwoAAGap = FALSE; for(k = 0; k < gGapListIndex; k++) { if(sequence[i] <= gGapList[k] + gToleranceWide && sequence[i] >= gGapList[k] - gToleranceWide) { TwoAAGap = TRUE; /*found it as a 2aa gap*/ break; } } if(TwoAAGap) { addBIons = 1; } else { addBIons = ((REAL_4)sequence[i] / (gMultiplier * AV_RESIDUE_MASS)) + 0.5; addBIons = addBIons - 1; /*Ex: if three residues, then add two y ions*/ if(addBIons < 1) { addBIons = 1; } } if(i == 0) { addBIons = addBIons - 1; /*accounts for complete absence of b1 ions*/ } if(bIon > lowMassLimit && bIon < highMassLimit) { individualProb = bIonProb / randomProb[bIonIndex]; if(individualProb > 0) { for(k = 0; k < addBIons; k++) { gProbScoreMax *= individualProb; } } } } } } } } } return(probScore); } /**************************************FindYIons********************************************* * * Find the y ions for the sequence and change the ionFound to 1. Return a value that corresponds * to the number of consecutive y ions. 
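 *
 * Illustrative sketch (not part of the original source): the function below builds a
 * singly-charged y ion "template" from the C-terminus and then derives the m/z for each
 * charge state. The helper restates that arithmetic in plain daltons. The names
 * SketchYIonMz, SKETCH_HYDROGEN and SKETCH_CTERM_OH are hypothetical, and an unmodified
 * C-terminus is assumed; the original works in scaled integer units instead.
 */

```c
#include <stddef.h>

/* Hypothetical constants in plain daltons; the original works in scaled integer units. */
#define SKETCH_HYDROGEN 1.007825	/* mass of a hydrogen atom */
#define SKETCH_CTERM_OH 17.00274	/* assumed unmodified C-terminus (OH) */

/* m/z of the y ion made of the last `count` residues, at charge `charge` (1 or more). */
double SketchYIonMz(const double *residues, size_t length, size_t count, int charge)
{
	double singly = SKETCH_CTERM_OH + 2.0 * SKETCH_HYDROGEN;	/* plays the role of yIonTemplate */
	size_t i;

	for(i = length - count; i < length; i++)
	{
		singly += residues[i];
	}
	/* each extra charge adds one hydrogen mass, then the total is divided by the charge */
	return (singly + (charge - 1) * SKETCH_HYDROGEN) / charge;
}
```

/* Usage note for the sketch: for residues G (57.02146) and K (128.09496), the y1 ion at
 * charge 1 comes out near 147.113, the familiar y1 value for a C-terminal Lys.
 * The original function follows.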
*/ REAL_8 FindYIons(INT_4 *mass, INT_4 ionCount, REAL_8 probScore, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength, char argPresent) { INT_4 i, j, k, yIonIndex, posResidues, proGapLimit, proGapCount, oxMetCount, maxCharge; INT_4 addYIons; REAL_4 water, ammonia, yIonTemplate, yIonMin17Template, yIonMin18Template; REAL_4 yIon, yIonMin17, yIonMin18, individualProb, neutralLossProb; REAL_4 massDiff, yErrProb, y17ErrProb, y18ErrProb, precursor, lowMassLimit, highMassLimit; REAL_4 yMin64, lossOf64, yIonMin64Template, y64ErrProb; BOOLEAN yIonTest, yMin17Test, yMin18Test, isItAGap, yMin64Test, TwoAAGap; /* Initialize*/ water = gElementMass_x100[OXYGEN] + gElementMass_x100[HYDROGEN] * 2; ammonia = gElementMass_x100[NITROGEN] + gElementMass_x100[HYDROGEN] * 3; lossOf64 = gElementMass_x100[CARBON] + gElementMass_x100[SULFUR] + gElementMass_x100[OXYGEN] + 4 * gElementMass_x100[HYDROGEN]; yIonTemplate = gParam.modifiedCTerm + 2 * gElementMass_x100[HYDROGEN]; posResidues = 1; oxMetCount = 0; if(gParam.peptideMW < 1000 * gMultiplier) { proGapLimit = 1; } else { proGapLimit = 2; } proGapCount = 0; if(gParam.chargeState == 1) { maxCharge = 1; } else { maxCharge = 2; } precursor = (gParam.peptideMW + gParam.chargeState * gElementMass_x100[HYDROGEN]) / gParam.chargeState; if(gParam.fragmentPattern == 'L') /*ion masses outside of this range are not penalized if not found*/ { lowMassLimit = precursor * 0.333; /*so-called 1/3 rule*/ highMassLimit = 2000 * gMultiplier; /*mass limit for Deca*/ } else { lowMassLimit = 146 * gMultiplier; /*y1 for Lys*/ highMassLimit = 2 * precursor; /*often the very high mass ions are missing*/ } for(i = seqLength - 1; i > 0; i--) { /*Determine if its a positive residue*/ if((sequence[i] >= gMonoMass_x100[R] - gToleranceWide && sequence[i] <= gMonoMass_x100[R] + gToleranceWide) || (sequence[i] >= gMonoMass_x100[H] - gToleranceWide && sequence[i] <= gMonoMass_x100[H] + gToleranceWide) || (sequence[i] >= gMonoMass_x100[K] - gToleranceWide && sequence[i] 
<= gMonoMass_x100[K] + gToleranceWide)) { posResidues++; } else /*check to see if its a two aa gap that might have a positive charge*/ { for(j = 0; j < gAminoAcidNumber; j++) { if((sequence[i] >= gArgPlus[j] - gToleranceWide && sequence[i] <= gArgPlus[j] + gToleranceWide) || (sequence[i] >= gHisPlus[j] - gToleranceWide && sequence[i] <= gHisPlus[j] + gToleranceWide) || (sequence[i] >= gLysPlus[j] - gToleranceWide && sequence[i] <= gLysPlus[j] + gToleranceWide)) { posResidues++; break; } } } /*Determine if this is a gap*/ isItAGap = TRUE; for(j = 0; j < gAminoAcidNumber; j++) { if(sequence[i] < gMonoMass_x100[j] + gToleranceWide && sequence[i] > gMonoMass_x100[j] - gToleranceWide) { isItAGap = FALSE; break; } if(sequence[i] <= gMonoMass_x100[j] + gMonoMass_x100[P] + gToleranceWide && sequence[i] >= gMonoMass_x100[j] + gMonoMass_x100[P] - gToleranceWide) { if(proGapCount < proGapLimit) { isItAGap = FALSE; /*if the gap could contain Pro, then don't call it a gap*/ proGapCount++; break; } } } /*count the number of oxidized Met's, or Phe's (which could be oxidized Met)*/ if(gParam.fragmentPattern == 'Q' && gParam.qtofErr != 0 && gToleranceNarrow < 5) /*mass accuracy sufficient to determine oxMet*/ { if(sequence[i] >= gMonoMass_x100[9] - gToleranceNarrow && sequence[i] <= gMonoMass_x100[9] + gToleranceNarrow) { oxMetCount++; /*y ions gained a oxMet*/ } if(oxMetCount < 0) { printf("LutefiskScore:FindABYIons The number of oxidized Mets went negative."); exit(1); } } else /*mass accuracy not sufficient to differentiate oxMet from Phe*/ { if(sequence[i] >= gMonoMass_x100[F] - gToleranceNarrow && sequence[i] <= gMonoMass_x100[F] + gToleranceNarrow) { oxMetCount++; /*y ions gained a oxMet*/ } if(oxMetCount < 0) { printf("LutefiskScore:FindABYIons The number of oxidized Mets went negative."); exit(1); } } yIonTemplate += sequence[i]; yIonMin17Template = yIonTemplate - ammonia; yIonMin18Template = yIonTemplate - water; yIonMin64Template = yIonTemplate - lossOf64; for(j = 1; j <= 
maxCharge; j++)	/*check different charge states*/
		{
			yIon = (yIonTemplate + (j-1)*gElementMass_x100[HYDROGEN]) / j;
			yIonMin17 = (yIonMin17Template + (j-1)*gElementMass_x100[HYDROGEN]) / j;
			yIonMin18 = (yIonMin18Template + (j-1)*gElementMass_x100[HYDROGEN]) / j;
			yMin64 = (yIonMin64Template + (j-1)*gElementMass_x100[HYDROGEN]) / j;
			yIonTest = FALSE;
			yMin17Test = FALSE;
			yMin18Test = FALSE;
			yMin64Test = FALSE;

			/*apply constraints to charge and mass*/
			if(yIon * j > (j-1) * 400 * gMultiplier && posResidues >= j)
			{
				for(k = 0; k < ionCount; k++)
				{
					if(mass[k] > yIon + gToleranceWide)
					{
						break;	/*don't waste any more time looking*/
					}
					if(mass[k] > yIon - gToleranceWide)
					{
						yIonTest = TRUE;
						massDiff = fabs(mass[k] - yIon);
						yErrProb = CalcIonFound(0, massDiff);
					}
				}
				if(yIonTest || argPresent)	/*there's a y ion, so look for the y-17 and y-18, or for non-mobile proton fragmentation*/
				{
					for(k = 0; k < ionCount; k++)
					{
						if(mass[k] > yIonMin17 - gToleranceWide && mass[k] < yIonMin17 + gToleranceWide)
						{
							/*y-17 intensity should be less than the y ion*/
							yMin17Test = TRUE;
							massDiff = fabs(mass[k] - yIonMin17);
							y17ErrProb = CalcIonFound(0, massDiff);
						}
						if(mass[k] > yIonMin18 - gToleranceWide && mass[k] < yIonMin18 + gToleranceWide)
						{
							yMin18Test = TRUE;
							massDiff = fabs(mass[k] - yIonMin18);
							y18ErrProb = CalcIonFound(0, massDiff);
						}
						if(oxMetCount > 0)
						{
							if(mass[k] > yMin64 - gToleranceWide && mass[k] < yMin64 + gToleranceWide)
							{
								yMin64Test = TRUE;
								massDiff = fabs(mass[k] - yMin64);
								y64ErrProb = CalcIonFound(0, massDiff);
							}
						}
					}
				}

				/*Calculate the probability scores*/
				/*but first find the approximate index value for the y ion (to get the correct randomProb);
				  mass[] is in ascending order, so use the last ion that is not above the y ion window*/
				yIonIndex = ionCount - 1;	/*default if the y ion lies above every observed ion*/
				for(k = 0; k < ionCount; k++)
				{
					if(mass[k] > yIon + gToleranceWide)
					{
						yIonIndex = k - 1;
						break;
					}
				}
				if(yIonIndex >= ionCount)
				{
					yIonIndex = ionCount - 1;
				}
				if(yIonIndex < 0)
				{
					yIonIndex = 0;
				}

				if(yIonTest)	/*if the calculated y ion is present*/
				{
					if(j == 1)	/*for singly charged fragments*/
					{
						individualProb = yIonProb / randomProb[yIonIndex];
individualProb *= yErrProb; if(individualProb > 1) { probScore *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = yIonProb * yDoublyProbMultiplier / randomProb[yIonIndex]; individualProb *= yErrProb; if(individualProb > 1) { probScore *= individualProb; } } } else /*if the calculated y ion is not present*/ { if(yIon > lowMassLimit && yIon < highMassLimit && i > 1) { if(!(argPresent && (yMin17Test || yMin18Test))) { if(j == 1) { individualProb = (1-yIonProb) / (1 - randomProb[yIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } else { individualProb = (1 - yIonProb * yDoublyProbMultiplier) / (1 - randomProb[yIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } } } } if(yMin18Test || yMin17Test) /*if the calculated y-18 ion is present*/ { if(yMin18Test) { if(j == 1) /*for singly charged fragments*/ { individualProb = yMinWaterProb / randomProb[yIonIndex]; individualProb *= y18ErrProb; if(individualProb > 1) { probScore *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = yMinWaterProb * yDoublyProbMultiplier / randomProb[yIonIndex]; individualProb *= y18ErrProb; if(individualProb > 1) { probScore *= individualProb; } } } if(yMin17Test) /*if the calculated y-18 ion is present*/ { if(j == 1) /*for singly charged fragments*/ { individualProb = yMinAmmoniaProb / randomProb[yIonIndex]; individualProb *= y17ErrProb; if(individualProb > 1) { probScore *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = yMinAmmoniaProb * yDoublyProbMultiplier / randomProb[yIonIndex]; individualProb *= y17ErrProb; if(individualProb > 1) { probScore *= individualProb; } } } } else /*missing ions need to be penalized*/ { if(yMinWaterProb < yMinAmmoniaProb) { neutralLossProb = yMinWaterProb; } else { neutralLossProb = yMinAmmoniaProb; } if(yIonMin18 > lowMassLimit && yIonMin17 < highMassLimit && i > 1) { if(j == 1) { 
individualProb = (1 - neutralLossProb) / (1 - randomProb[yIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } else { individualProb = (1 - neutralLossProb * yDoublyProbMultiplier) / (1 - randomProb[yIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } } } if(yMin64Test) /*don't penalize if oxMet neutral loss is absent*/ { individualProb = yMin64IonProb / randomProb[yIonIndex]; individualProb *= y64ErrProb; if(individualProb > 1) { probScore *= individualProb; } } if(isItAGap && j == 1 && i > 0) /*a gap arises from lack of a y ion, so penalize, but only do it once for j=1; also, don't penalize for a gap at the N-terminus*/ { if(yIon > lowMassLimit && yIon < highMassLimit) { individualProb = (1-yIonProb) / (1 - randomProb[yIonIndex]); if(individualProb < 1 && individualProb > 0) { probScore *= individualProb; } } } /*calculate y ion contribution to max possible prob score for this sequence*/ if(yIon > lowMassLimit && yIon < highMassLimit) { /*Add y ion contribution*/ if(j == 1) /*for singly charged fragments*/ { individualProb = yIonProb / randomProb[yIonIndex]; if(individualProb > 1) { gProbScoreMax *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = yIonProb * yDoublyProbMultiplier / randomProb[yIonIndex]; if(individualProb > 1) { gProbScoreMax *= individualProb; } } /*add one neutral loss contribution*/ if(yMinWaterProb < yMinAmmoniaProb) { neutralLossProb = yMinWaterProb; } else { neutralLossProb = yMinAmmoniaProb; } if(j == 1) /*for singly charged fragments*/ { individualProb = neutralLossProb / randomProb[yIonIndex]; if(individualProb > 1) { gProbScoreMax *= individualProb; } } else /*for multiply charged fragments*/ { individualProb = neutralLossProb * yDoublyProbMultiplier / randomProb[yIonIndex]; if(individualProb > 1) { gProbScoreMax *= individualProb; } } if(isItAGap && j == 1) /*a gap arises from lack of a y ion, so penalize, but only do it once 
for j=1*/ { /*decide if its a legitimate 2 aa gap or something bigger*/ TwoAAGap = FALSE; for(k = 0; k < gGapListIndex; k++) { if(sequence[i] <= gGapList[k] + gToleranceWide && sequence[i] >= gGapList[k] - gToleranceWide) { TwoAAGap = TRUE; /*found it as a 2aa gap*/ break; } } if(TwoAAGap) { addYIons = 1; } else { addYIons = ((REAL_4)sequence[i] / (gMultiplier * AV_RESIDUE_MASS)) + 0.5; addYIons = addYIons - 1; /*Ex: if three residues, then add two y ions*/ if(addYIons < 1) { addYIons = 1; } } if(yIon > lowMassLimit && yIon < highMassLimit) { individualProb = yIonProb / randomProb[yIonIndex]; if(individualProb > 0) { for(k = 0; k < addYIons; k++) { gProbScoreMax *= individualProb; } } } } } } } } /*Check to see if the N-terminal residue is a gap greater than 2aa's*/ TwoAAGap = FALSE; for(k = 0; k < gGapListIndex; k++) { if(sequence[0] <= gGapList[k] + gToleranceWide && sequence[0] >= gGapList[k] - gToleranceWide) { TwoAAGap = TRUE; /*found it as a 2aa gap or 1aa residue*/ break; } } if(!TwoAAGap) { addYIons = ((REAL_4)sequence[0] / (gMultiplier * AV_RESIDUE_MASS)) + 0.5; addYIons = addYIons - 1; /*Ex: if three residues, then add two y ions*/ if(addYIons < 1) { addYIons = 1; } if(addYIons > 0) { individualProb = yIonProb / randomProb[yIonIndex]; if(individualProb > 0) { for(k = 0; k < addYIons; k++) { gProbScoreMax *= individualProb; } } } } return(probScore); } /****************************PrintScoreDetailsToXLFile*************************************************** * This function prints header information to the output file. 
*/
void PrintScoreDetailsToXLFile(struct SequenceScore *firstScorePtr, REAL_4 perfectProbScore)
{
	FILE *fp;
	INT_4 i, j, seqNum;
	REAL_4 xcorrNormalizer;
	struct SequenceScore *maxPtr, *currPtr;
	const time_t theTime = (const time_t)time(NULL);
	char outputFile[256], fileName[256];
	INT_4 length;
	INT_4 fileCount;

	/*Make up a name for the file*/
	if (strlen(gParam.cidFilename) != 0)
	{
		/* Start from the CID filename */
		strcpy (outputFile, gParam.cidFilename);
		length = strlen(outputFile);
		strcat(outputFile, ".xl");
	}
	else
	{
		printf("not printing details");
		return;
	}

	/* Make sure that the file doesn't already exist. If it does, append a number. */
	strcpy(fileName, outputFile);
	fileCount = 1;
	while (1)
	{
		FILE *existing = fopen(fileName, "r");	/*separate handle so the outer fp is not shadowed*/
		if (NULL == existing) break;
		fclose(existing);
		strcpy(fileName, outputFile);
		sprintf(fileName + strlen(fileName), "%ld", (long)fileCount++);
		if (fileCount > 20)
		{
			printf("Too many old output files! Please clean up a bit first! Quitting.");
			exit(1);
		}
	}

	/* Open a new file.*/
	fp = fopen(fileName, "w");
	if(fp == NULL)	/*fopen returns NULL if there's a problem.*/
	{
		printf("Cannot open %s to write the output.\n", fileName);	/*report the name that was actually tried*/
		exit(1);
	}

	fprintf(fp, "Run Date: %20s", ctime(&theTime));

	/* Print header information from gParam to the file.*/
	fprintf(fp, " Filename: ");	/*Print the CID data file name.*/
	i = 0;
	while(gParam.cidFilename[i] != 0)
	{
		fputc(gParam.cidFilename[i], fp);
		i++;
	}
	fprintf(fp, "\n Molecular Weight: %7.2f", gParam.peptideMW);
	fprintf(fp, " Molecular Weight Tolerance: %5.2f", gParam.peptideErr);
	fprintf(fp, " Fragment Ion Tolerance: %5.2f", gParam.fragmentErr);
	fprintf(fp, "\n Ion Offset: %5.2f", gParam.ionOffset);
	fprintf(fp, " Charge State: %2ld", gParam.chargeState);
	if(gParam.centroidOrProfile == 'P')
	{
		fprintf(fp, " Profile Data \n");
	}
	else
	{
		fprintf(fp, " Centroided or Pre-processed Data \n");
	}
	if(gParam.proteolysis == 'T')
	{
		fprintf(fp, " Tryptic Digest");
	}
	if(gParam.proteolysis == 'K')
	{
		fprintf(fp, " Lys-C Digest");
	}
if(gParam.proteolysis == 'E') { fprintf(fp, " Glu-C Digest"); } if(gParam.proteolysis == 'N') { fprintf(fp, " ??? Digest"); } if(gParam.fragmentPattern == 'G') { fprintf(fp, " Unknown Fragmentation Pattern \n"); } if(gParam.fragmentPattern == 'T') { fprintf(fp, " Tryptic Triple Quadrupole Fragmentation Pattern \n"); } if(gParam.fragmentPattern == 'L') { fprintf(fp, " Tryptic Ion Trap Fragmentation Pattern \n"); } if(gParam.fragmentPattern == 'Q') { fprintf(fp, " Tryptic QTOF Fragmentation Pattern \n"); } fprintf(fp, " Cysteine residue mass: %7.2f", gParam.cysMW); fprintf(fp, " Switch from monoisotopic to average mass at %d \n", gParam.monoToAv); fprintf(fp, " Ions per window: %.1f", gParam.ionsPerWindow); fprintf(fp, " Extension Threshold: %4.2f", gParam.extThresh); fprintf(fp, " Extension Number: %2ld", gParam.maxExtNum); fprintf(fp, "\n Gaps: %2ld", gParam.maxGapNum); fprintf(fp, " Peak Width: %4.1f", ((gParam.peakWidth) * 2)); fprintf(fp, " Data Threshold: %5.2f (%ld)", gParam.ionThreshold, gParam.intThreshold); fprintf(fp, " Ions per residue: %.1f", gParam.ionsPerResidue); fprintf(fp, "\n Amino acids known to be present: "); i = 0; while(gParam.aaPresent[i] != 0) { fputc(gParam.aaPresent[i], fp); i++; } fprintf(fp, "\n Amino acids known to be absent: "); i = 0; while(gParam.aaAbsent[i] != 0) { fputc(gParam.aaAbsent[i], fp); i++; } fprintf(fp, "\n"); fprintf(fp, "\n C-terminal mass: %7.4f", gParam.modifiedCTerm); fprintf(fp, "\n N-terminal mass: %7.4f", gParam.modifiedNTerm); fprintf(fp, "\n N-terminal Tag Mass: %7.2f", gParam.tagNMass); fprintf(fp, " C-terminal Tag Mass: %7.2f", gParam.tagCMass); fprintf(fp, " Sequence Tag: "); i = 0; while(gParam.tagSequence[i] != 0) { fputc(gParam.tagSequence[i], fp); i++; } if(gParam.edmanPresent) { fprintf(fp, "\n Edman data is available. "); } else { fprintf(fp, "\n Edman data is not available. 
"); } if(gParam.autoTag == TRUE) { fprintf(fp, "AutoTag ON"); } else { fprintf(fp, "AutoTag OFF"); } if(gParam.CIDfileType == 'T') { fprintf(fp, " CID data file is tab-delineated"); } if(gParam.CIDfileType == 'F') { fprintf(fp, " CID data file is Finnigan ASCII file"); } /*Print the details to the xl file*/ fprintf(fp, "\n Perfect probability score: %6.2f", perfectProbScore); /* Find the xcorr normalizer.*/ xcorrNormalizer = 0; currPtr = firstScorePtr; while(currPtr != NULL) { if(currPtr->crossDressingScore > xcorrNormalizer) { xcorrNormalizer = currPtr->crossDressingScore; } currPtr = currPtr->next; } xcorrNormalizer = 1; /* Set up the screen to print some of the output.*/ fprintf(fp, "\n Rank X-corr IntScr IntOnlyScr Quality ProbScr StDevErr CS CalFact Sequence\n"); /* Count the sequences.*/ seqNum = 0; maxPtr = firstScorePtr; while(maxPtr != NULL) { seqNum++; maxPtr = maxPtr->next; } for(i = 1; i <= 500 && i <= seqNum; i++) /*List the top 500 sequences.*/ { maxPtr = firstScorePtr; while(maxPtr != NULL) { if(maxPtr->rank == i) { /*Change peptide[j] to single letter code.*/ char *peptideString; INT_4 peptide[MAX_PEPTIDE_LENGTH]; INT_4 peptideLength = 0; j = 0; while(maxPtr->peptide[j] != 0) { peptide[j] = maxPtr->peptideSequence[j]; peptideLength++; j++; } peptideString = PeptideString(peptide, peptideLength); if(maxPtr->databaseSeq) { strcat(peptideString, " "); /*used to denote this was database sequence*/ } if(peptideString) { fprintf(fp, " %3ld %5.3f %5.3f %5.3f %5.3f %5.3f %6.4f %2ld %8.6f %s\n", i, maxPtr->crossDressingScore / xcorrNormalizer, maxPtr->intensityScore, maxPtr->intensityOnlyScore, maxPtr->quality, maxPtr->probScore, maxPtr->stDevErr, maxPtr->cleavageSites, maxPtr->calFactor, peptideString); free(peptideString); } break; } maxPtr = maxPtr->next; } } fclose(fp); return; } /***********************************CalcPerfectProbScore****************************************** * * Determine a theoretical perfect probScore, given the list of ions present. 
*/
REAL_4 CalcPerfectProbScore(INT_4 fragNum, INT_4 *fragMOverZ)
{
	REAL_8 score = 0.95;
	REAL_4 individualProb, probScore;
	REAL_4 massRange = fragMOverZ[fragNum-1] - fragMOverZ[0];
	REAL_4 randomMatchProb = (REAL_4)fragNum / (REAL_4)massRange * gMultiplier;
	REAL_4 residueMass = AV_RESIDUE_MASS * gMultiplier;
	INT_4 seqLength = (gParam.peptideMW * gMultiplier / residueMass) + 0.5;
	INT_4 halfSeqLength = seqLength / 2;
	INT_4 i;

	for(i = seqLength - 1; i > 1; i--)
	{
		individualProb = yIonProb / randomMatchProb;
		score *= individualProb;
	}

	for(i = 2; i < seqLength - 1; i++)
	{
		if(i < halfSeqLength || gParam.fragmentPattern =='L')
		{
			individualProb = bIonProb / randomMatchProb;
			score *= individualProb;
		}
	}

	if(score > 1)
	{
		probScore = log10(score);
	}
	else	/*keep things positive by only logging things over a value of 1*/
	{
		probScore = 0;
	}

	/*normalize by length*/
	probScore = probScore / (2 * seqLength);

	return(probScore);
}

lutefisk-1.0.7+dfsg.orig/src/LutefiskHaggis.c

/*********************************************************************************************
Lutefisk is software for de novo sequencing of peptides from tandem mass spectra.
Copyright (C) 1995 Richard S. Johnson

This program is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program;
if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA.

Contact:

Richard S Johnson
4650 Forest Ave SE
Mercer Island, WA 98040

jsrichar@alum.mit.edu
*********************************************************************************************/

/* ANSI headers */
#include
#include
#include
#include
#include
#include

/* Haggis headers */
#include "LutefiskPrototypes.h"
#include "LutefiskDefinitions.h"
#include "ListRoutines.h"

#if(defined(__MWERKS__) && __dest_os == __mac_os)
#include "getopt.h"
#include
#include
#include
#include
StandardFileReply freply;
Point wpos;
INT_4 tval;
char prompt[256];
#endif

#if(defined(__MWERKS__) && __dest_os == __win32_os)
#include "getopt.h"
#endif

/*Definitions for this file*/
//#define MIN_NUM_IONS 5 /*Minimum number of ions after processing in GetCID*/
//#define MAX_ION_MASS 3000 /*Ions greater than this are deemed too high to not be a mistake*/
//#define MIN_HIGHMASS_INT_RATIO 0.1 /*Ratio of high mass intensity over total intensity*/
//#define HIGH_MASS_RATIO 0.9 /*Ions are counted until this % of high mass ion intensity is reached*/
//#define LCQ_INT_SUM_CUTOFF 500 /*Cutoff for good intensity total for LCQ data*/
//#define QTOF_INT_SUM_CUTOFF 140 /*Cutoff for good intensity total for Qtof data*/
//#define MAX_HIGH_MASS 100 /*Max number of ions greater than precursor*/
//#define MAX_ION_NUM 200 /*Max number of ions*/
#define MAX_SEQUENCES 10000 /*Max number of sequences to store*/
//#define MAX_MASS 2500 /*Peptides above this mass are tossed out.*/
//#define MIN_MASS 800 /*Peptides below this mass are tossed out.*/
//#define LOW_MASS_ION_NUM 19 /*Number of peptide-related low mass ions*/

/*Global variables for this file*/
INT_4 gForwardNodeConnect[MAX_ION_NUM][AMINO_ACID_NUMBER];
INT_4 gBackwardNodeConnect[MAX_ION_NUM][AMINO_ACID_NUMBER];
INT_4 gForwardNum[MAX_ION_NUM], gBackwardNum[MAX_ION_NUM];
INT_4 gIonCount;
INT_4 gEdgeNum;
INT_4 gSequenceNodes[MAX_SEQUENCES][MAX_ION_NUM];
INT_4 gSeqCount;
INT_4 gSequenceNum;
INT_4 gPepLength[MAX_SEQUENCES*2];
INT_4
gPepMassSeq[MAX_SEQUENCES*2][MAX_PEPTIDE_LENGTH]; INT_4 gMatchSeries[MAX_SEQUENCES*2]; INT_4 gAAArray[AMINO_ACID_NUMBER]; INT_4 gAAMonoArray[AMINO_ACID_NUMBER]; INT_4 gAANum, gMassRange, gCTermKIndex, gCTermRIndex, gLutefiskSequenceCount; BOOLEAN gNotTooManySequences = TRUE; /*****************************Haggis************************************************* * * First, Haggis decides how many fragment ion charge states to consider (+1 or +2, * rejecting precursors over +3). * Second, For each charge state Haggis converts the linked list firstMassPtr to a ion mass array * of singly-charged fragments. * Third, for each charge state it finds all singly-charged ions that can be connected to each other * via single amino acid residue mass jumps. * Fourth, it converts the sequences of nodes to sequences of residue masses, assuming that each * sequence of nodes could be either b or y ions. * Fifth, it adds the sequences to the linked list of sequences already produced by subsequencing. * */ struct Sequence *Haggis(struct Sequence *firstSequencePtr , struct MSData *firstMassPtr) { INT_4 *mass; INT_4 j, i, maxCharge; INT_4 peptide[MAX_PEPTIDE_LENGTH]; INT_4 peptideLength; INT_4 score; INT_4 nodeValue; INT_2 nodeCorrection; INT_4 gapNum, lutefiskSequenceCount; struct Sequence *currPtr; /*Count and report the number of Lutefisk-derived sequences*/ lutefiskSequenceCount = 0; currPtr = firstSequencePtr; while(currPtr != NULL) { lutefiskSequenceCount++; currPtr = currPtr->next; } printf("Lutefisk sequences: %ld \n", lutefiskSequenceCount); gLutefiskSequenceCount = lutefiskSequenceCount; /*need to be global for StoreSeq*/ /*Don't bother working on precursor charge states more than 3*/ if(gParam.chargeState > 3) { return(firstSequencePtr); } /*Determine maximum charge state of fragment ions. 
Precursors of +3 have max charge of 2, +1 and +2 have a max charge of only 1*/
	if(gParam.chargeState == 3)
	{
		maxCharge = 2;
	}
	else
	{
		maxCharge = 1;
	}

	/* Make some space*/
	mass = malloc(MAX_ION_NUM * sizeof(INT_4));	/*mass is an INT_4 array, so size it by INT_4*/
	if(mass == NULL)
	{
		printf("Haggis: Out of memory");
		exit(1);
	}

	/*Initialize variables*/
	gSequenceNum = 0;
	for(i = 0; i < MAX_SEQUENCES * 2; i++)
	{
		gPepLength[i] = 0;
		for(j = 0; j < MAX_PEPTIDE_LENGTH; j++)
		{
			gPepMassSeq[i][j] = 0;
		}
	}

	/*Consider different charge states for fragment ions*/
	for(j = 1; j <= maxCharge; j++)
	{
		/*Load mass arrays*/
		mass = LoadMassArrays(mass, firstMassPtr, j);

		/*Set up the backward and forward node connections*/
		SetupBackwardAndForwardNodes(mass);

		/*Find sequences of nodes*/
		FindNodeSequences(mass);

		/*Convert sequences of nodes to sequences of residue masses assuming they are both b and y ions*/
		GetSequenceOfResidues(mass);
	}

	/* Try to connect sequences.*/
	AppendSequences();

	/* Try to fill in the unsequenced ends with reasonable sequences. */
	FleshOutSequenceEnds(firstMassPtr);

	/* To be consistent with the Lutefisk sequences, replace sequence regions that are
	   unsupported by y/b ions w/ bracketed masses.*/
	ModifyHaggisSequences(firstMassPtr);

	/*Find the highest score in the linked list*/
	score = 0;
	currPtr = firstSequencePtr;
	while(currPtr != NULL)
	{
		if(currPtr->score > score)
		{
			score = currPtr->score;
		}
		currPtr = currPtr->next;
	}
	if(score == 0)
	{
		score = 1;	/*if all the Lutefisk sequences have score of zero, then give Haggis sequences a non-zero score*/
	}

	/*Assign some values for the linked list*/
	nodeValue = gParam.peptideMW - gParam.modifiedCTerm + 0.5;
	nodeCorrection = 0;
	gapNum = 0;

	/*Add sequences to linked list*/
	for(i = 0; i < gSequenceNum; i++)
	{
		peptideLength = gPepLength[i];
		if(peptideLength < MAX_PEPTIDE_LENGTH && peptideLength > 3)	/*toss out anything too small or big*/
		{
			for(j= 0; j < peptideLength; j++)
			{
				peptide[j] = gPepMassSeq[i][j];
			}
			firstSequencePtr = LinkHaggisSubsequenceList(firstSequencePtr,
				LoadHaggisSequenceStruct(peptide, peptideLength, score, nodeValue, gapNum, nodeCorrection));
		}
	}

	printf("Haggis sequences: %ld \n", gSequenceNum);

	free(mass);

	return(firstSequencePtr);
}

/********************************ModifyHaggisSequences**********************************************
*
*	Search each sequence to see if there is a b or y ion between each amino acid. If not, combine
*	the amino acids for which no evidence is available.
*/
void ModifyHaggisSequences(struct MSData *firstMassPtr)
{
	INT_4 i, j, k;
	BOOLEAN bIonTest, yIonTest;

	for(i = 0; i < gSequenceNum; i++)
	{
		for(j = 0; j < gPepLength[i] - 1; j++)
		{
			bIonTest = FindBIon(i,j, firstMassPtr);
			yIonTest = FindYIon(i,j, firstMassPtr);
			if(!bIonTest && !yIonTest)
			{
				/*merge residue j+1 into residue j, then close the hole*/
				gPepMassSeq[i][j] += gPepMassSeq[i][j + 1];
				for(k = j + 1; k < gPepLength[i] - 1; k++)	/*stop one short so k + 1 stays inside the sequence*/
				{
					gPepMassSeq[i][k] = gPepMassSeq[i][k + 1];
				}
				gPepMassSeq[i][gPepLength[i] - 1] = 0;	/*clear the vacated last slot*/
				gPepLength[i] -= 1;
				j--;
			}
		}
	}

	return;
}

/*********************************FindBIon***********************************************************
*
*	Look for a b ion. If found, return a TRUE value.
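*
*	Illustrative sketch (not part of the original source): the search below can be pictured
*	as a tolerance-window scan over an ascending peak list, repeated for each fragment charge.
*	The sketch uses a plain array in place of the MSData linked list and plain daltons in
*	place of the scaled units; SketchPeakPresent, SketchFindBIon and SKETCH_HYDROGEN are
*	hypothetical names, and the tolerance is simply passed in by the caller.
*/

```c
#include <stddef.h>

#define SKETCH_HYDROGEN 1.007825	/* hypothetical constant: hydrogen atom mass in daltons */

/* Returns 1 if any peak lies within +/- tol of target; peaks must be sorted ascending. */
int SketchPeakPresent(const double *peaks, size_t n, double target, double tol)
{
	size_t i;

	for(i = 0; i < n; i++)
	{
		if(peaks[i] > target + tol)
		{
			break;	/* sorted list: nothing further can match */
		}
		if(peaks[i] >= target - tol)
		{
			return 1;
		}
	}
	return 0;
}

/* Looks for the b ion that ends at residueIndex, at charges 1..maxCharge. */
int SketchFindBIon(const double *residues, size_t residueIndex, double nTerm,
                   const double *peaks, size_t n, double tol, int maxCharge)
{
	double singly = nTerm;	/* singly-charged b ion mass, built up residue by residue */
	size_t i;
	int z;

	for(i = 0; i <= residueIndex; i++)
	{
		singly += residues[i];
	}
	for(z = 1; z <= maxCharge; z++)
	{
		double mz = (singly + (z - 1) * SKETCH_HYDROGEN) / z;
		if(SketchPeakPresent(peaks, n, mz, tol))
		{
			return 1;
		}
	}
	return 0;
}
```

/*	Usage note for the sketch: with residues A, G, K (71.03711, 57.02146, 128.09496), an
*	N-terminal mass of 1.007825 and a peak at 129.07, the b2 ion (about 129.066) is found
*	with a tolerance of 0.5, while b1 (about 72.045) is not. The original function follows.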
*/ BOOLEAN FindBIon(INT_4 sequenceIndex,INT_4 residueIndex, struct MSData *firstMassPtr) { INT_4 i; INT_4 bIon, maxCharge, bIon1Charge; BOOLEAN bIonPresent; struct MSData *currPtr; /*Initialize*/ bIonPresent = FALSE; bIon1Charge = gParam.modifiedNTerm; if(gParam.chargeState > 2) { maxCharge = 2; } else { maxCharge = 1; } /*Calculate singly-charged b ion mass*/ for(i = 0; i <= residueIndex; i++) { bIon1Charge += gPepMassSeq[sequenceIndex][i]; } /*For each charge, look for the b ion*/ for(i = 1; i <= maxCharge; i++) { bIon = (bIon1Charge + (i - 1) * gElementMass_x100[HYDROGEN]) / i; currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ >= bIon - gParam.fragmentErr * 1.5) /*tolerance is very wide here*/ { if(currPtr->mOverZ > bIon + gParam.fragmentErr * 1.5) { break; } bIonPresent = TRUE; } currPtr = currPtr->next; } } return(bIonPresent); } /*********************************FindYIon*********************************************************** * * Look for a y ion. If found, return a TRUE value. 
*/ BOOLEAN FindYIon(INT_4 sequenceIndex,INT_4 residueIndex, struct MSData *firstMassPtr) { INT_4 i; INT_4 yIon, maxCharge, yIon1Charge; BOOLEAN yIonPresent; struct MSData *currPtr; /*Initialize*/ yIonPresent = FALSE; yIon1Charge = gParam.modifiedCTerm + 2 * gElementMass_x100[HYDROGEN]; if(gParam.chargeState > 2) { maxCharge = 2; } else { maxCharge = 1; } /*Calculate singly-charged y ion mass*/ for(i = gPepLength[sequenceIndex] - 1; i > residueIndex; i--) { yIon1Charge += gPepMassSeq[sequenceIndex][i]; } /*For each charge, look for the y ion*/ for(i = 1; i <= maxCharge; i++) { yIon = (yIon1Charge + (i - 1) * gElementMass_x100[HYDROGEN]) / i; currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ >= yIon - gParam.fragmentErr * 1.5) /*tolerance is very wide here*/ { if(currPtr->mOverZ > yIon + gParam.fragmentErr * 1.5) { break; } yIonPresent = TRUE; } currPtr = currPtr->next; } } return(yIonPresent); } /*********************************AppendSequences**************************************************** * * This function finds two sequences that could be combined into a new one. 
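*
*	Illustrative sketch (not part of the original source): each stored sequence is a list of
*	masses whose first element is the unsequenced N-terminal mass and whose last element is
*	the unsequenced C-terminal mass. Two sequences can be joined when the N-terminal mass of
*	the second exceeds everything but the trailing mass of the first; the difference becomes
*	a new gap residue between them. SketchAppend and its parameters are hypothetical, and the
*	gap-list check that the original applies when there are many sequences is left out.
*/

```c
#include <stddef.h>

/* Joins a (N-terminal part) with b (C-terminal part).
   Returns the length written to out, or 0 if the two cannot be joined. */
size_t SketchAppend(const long *a, size_t aLen, const long *b, size_t bLen,
                    long minGap, long *out, size_t outCap)
{
	long prefix = 0, gap;
	size_t i, n = 0;

	if(aLen < 2 || bLen < 1)
	{
		return 0;
	}
	if(aLen - 1 + bLen > outCap)	/* result needs aLen + bLen - 1 slots */
	{
		return 0;
	}
	for(i = 0; i + 1 < aLen; i++)
	{
		prefix += a[i];	/* N-terminal mass plus sequenced residues of a */
	}
	gap = b[0] - prefix;	/* unsequenced mass left between the two pieces */
	if(gap <= minGap)
	{
		return 0;
	}
	for(i = 0; i + 1 < aLen; i++)
	{
		out[n++] = a[i];
	}
	out[n++] = gap;
	for(i = 1; i < bLen; i++)
	{
		out[n++] = b[i];
	}
	return n;
}
```

/*	Usage note for the sketch: joining {200, 100, 150, 550} with {600, 120, 130, 150} gives
*	{200, 100, 150, 150, 120, 130, 150}; the total mass (1000) is unchanged and the new
*	fourth element is the 150 gap. The original function follows.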
*/
void AppendSequences()
{
	INT_4 i, j, k, testMass, massDiff, newSeqNum, massLimit;
	INT_4 newSeqIndex[MAX_SEQUENCES][2];
	REAL_8 maxSequences = MAX_SEQUENCES;
	BOOLEAN storeIt, saveAll;

	massLimit = gMonoMass_x100[A] * 2 - gParam.fragmentErr;	/*unsequenced mass separating the N- and C-terminal
									sequences has to be more than this value*/

	/*Decide if all appended sequences can be saved*/
	maxSequences = sqrt(maxSequences);
	if(maxSequences > gSequenceNum)
	{
		saveAll = TRUE;	/*there are too few sequences to have to worry about generating too many new ones
				so go ahead and keep all of them*/
	}
	else
	{
		saveAll = FALSE;	/*need to save only the ones that correspond to certain masses*/
	}

	/*Start the search for new sequences derived by sticking two old ones together*/
	newSeqNum = 0;
	for(i = 0; i < gSequenceNum; i++)
	{
		testMass = 0;
		for(j = 0; j < gPepLength[i] - 1; j++)
		{
			testMass += gPepMassSeq[i][j];	/*Find N-terminal mass plus sequence of first one*/
		}
		for(j = i; j < gSequenceNum; j++)
		{
			massDiff = gPepMassSeq[j][0] - testMass;	/*Subtract C-terminal unsequenced mass from N-terminal mass of first one */
			if(massDiff > massLimit)	/*At moment only requirement is that mass diff be more than 2xAla*/
			{
				if(gMatchSeries[j] != i && newSeqNum < MAX_SEQUENCES)	/*don't append a sequence that is the reverse of itself*/
				{
					storeIt = FALSE;	/*assume the worst*/
					if(saveAll)
					{
						storeIt = TRUE;	/*not enough sequences to worry about overflow*/
					}
					else
					{
					//	if(massDiff > gMonoMass_x100[W] * 2 && massDiff < 400 * gMultiplier)
					//	{
					//		storeIt = FALSE; /*if the mass diff is between 372 and 500, save it*/
					//	}
					//	else
					//	{
							for(k = 0; k < gGapListIndex; k++)
							{
								if(massDiff <= gGapList[k] + gParam.fragmentErr &&
									massDiff >= gGapList[k] - gParam.fragmentErr)
								{
									storeIt = TRUE;	/*if its a one or two amino acid mass, save it*/
									break;
								}
							}
					//	}
					}
					if(storeIt)	/*Store the index values of the two old sequences*/
					{
						newSeqIndex[newSeqNum][0] = i;	/*N-terminal bit*/
						newSeqIndex[newSeqNum][1] = j;	/*C-terminal bit*/
newSeqNum++; } } } } } /*Make the new sequences and add them to the global list*/ for(i = 0; i < newSeqNum; i++) { if(gSequenceNum < MAX_SEQUENCES * 2) { testMass = 0; for(j = 0; j < gPepLength[newSeqIndex[i][0]] - 1; j++) { gPepMassSeq[gSequenceNum][j] = gPepMassSeq[newSeqIndex[i][0]][j]; testMass += gPepMassSeq[gSequenceNum][j]; } massDiff = gPepMassSeq[newSeqIndex[i][1]][0] - testMass; gPepMassSeq[gSequenceNum][j] = massDiff; j++; for(k = 1; k < gPepLength[newSeqIndex[i][1]]; k++) { gPepMassSeq[gSequenceNum][j] = gPepMassSeq[newSeqIndex[i][1]][k]; j++; } gPepLength[gSequenceNum] = j; gSequenceNum++; } } return; } /***************************FleshOutSequences******************************************************************** * * First make list of unsequenced masses (no repeats). Then for each unsequenced mass start to find amino * acid combinations that match, but force a K or R if its a C-terminal mass. For each combination, * make all possible sequences (leaving K or R at the C-term), and score them using a simple y/b score. * The best score wins and replaces the unsequenced mass. 
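 *
 * Illustration only (not part of Lutefisk): the first step, building a list
 * of unsequenced masses with no repeats, can be sketched as below. The name
 * add_unique_mass and its parameters are hypothetical; tol plays the role of
 * gParam.fragmentErr, so two masses within the tolerance count as the same.
 *
```c
#include <stddef.h>

// Hypothetical helper, not part of Lutefisk.
// Appends mass to list unless an entry within +/- tol is already present.
// Returns the (possibly unchanged) number of entries in list.
size_t add_unique_mass(int *list, size_t count, size_t cap, int mass, int tol)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (mass >= list[i] - tol && mass <= list[i] + tol) {
            return count;   // already represented within tolerance
        }
    }
    if (count < cap) {
        list[count++] = mass;   // new mass, and there is room for it
    }
    return count;
}
```
 *
 * Calling it once per candidate sequence yields a list in first-seen order,
 * which is all the later per-mass search needs.
 *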
*/ void FleshOutSequenceEnds(struct MSData *firstMassPtr) { INT_4 i, j, k, l, mass; INT_4 cTermMassNum, nTermMassNum, *cTermMasses, *nTermMasses; INT_4 *sequenceToAdd, *sequenceToAppend; char cTerm; /*Assign space to arrays*/ cTermMasses = (int *) malloc(MAX_SEQUENCES * 2 * sizeof(INT_4)); if(cTermMasses == NULL) { printf("Haggis:FleshOutSequences memory error"); exit(1); } nTermMasses = (int *) malloc(MAX_SEQUENCES * 2 * sizeof(INT_4)); if(nTermMasses == NULL) { printf("Haggis:FleshOutSequences memory error"); exit(1); } sequenceToAdd = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4)); if(sequenceToAdd == NULL) { printf("Haggis:FleshOutSequences memory error"); exit(1); } sequenceToAppend = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4)); if(sequenceToAppend == NULL) { printf("Haggis:FleshOutSequences memory error"); exit(1); } /*Initialize*/ cTermMassNum = 0; nTermMassNum = 0; for(i = 0; i < MAX_SEQUENCES*2; i++) { cTermMasses[i] = 0; nTermMasses[i] = 0; } /*Make a list of C-terminal unsequenced masses*/ GetCTermMasses(&cTermMassNum, cTermMasses); /*Make a list of N-terminal unsequenced masses*/ GetNTermMasses(&nTermMassNum, nTermMasses); /*Make a mass ordered list of monoisotopic amino acids */ MakeAAArray(); /*Find best sequence for each n-terminal mass*/ for(i = 0; i < nTermMassNum; i++) { mass = nTermMasses[i]; sequenceToAdd[0] = 0; /*initialize*/ GetBestNtermSeq(sequenceToAdd, mass, firstMassPtr); /*Now stick this bit of sequence onto the appropriate full sequences (replacement)*/ if(sequenceToAdd[0] != 0) { for(j = 0; j < gSequenceNum; j++) { if(mass >= gPepMassSeq[j][0] - gParam.fragmentErr && mass <= gPepMassSeq[j][0] + gParam.fragmentErr) { l = 0; while(sequenceToAdd[l] != 0 && l < MAX_PEPTIDE_LENGTH) { sequenceToAppend[l] = sequenceToAdd[l]; l++; } for(k = 1; k < gPepLength[j]; k++) /*don't add k=0, since thats the unsequenced mass*/ { if(l < MAX_PEPTIDE_LENGTH) { sequenceToAppend[l] = gPepMassSeq[j][k]; l++; } } if(l < MAX_PEPTIDE_LENGTH) { 
gPepLength[j] = l; for(k = 0; k < gPepLength[j]; k++) { gPepMassSeq[j][k] = sequenceToAppend[k]; } } } } } } /*Find best sequence for each c-terminal mass, assuming K or R at C-terminus*/ /*Figure out if there is a C-terminal Lys, Arg, or Both*/ cTerm = CheckCterm(firstMassPtr); /*Now proceed*/ for(i = 0; i < cTermMassNum; i++) { mass = cTermMasses[i]; sequenceToAdd[0] = 0; /*initialize*/ GetBestCtermSeq(sequenceToAdd, mass, cTerm, firstMassPtr); /*Now stick this bit of sequence onto the appropriate full sequences (replacement)*/ if(sequenceToAdd[0] != 0) { for(j = 0; j < gSequenceNum; j++) { if(mass >= gPepMassSeq[j][gPepLength[j] - 1] - gParam.fragmentErr && mass <= gPepMassSeq[j][gPepLength[j] - 1] + gParam.fragmentErr) { l = 0; while(sequenceToAdd[l] != 0) { l++; } if(l < MAX_PEPTIDE_LENGTH) { l = 0; k = gPepLength[j] - 1; while(sequenceToAdd[l] != 0) { gPepMassSeq[j][k] = sequenceToAdd[l]; k++; l++; } gPepLength[j] = k; } } } } } /*free the arrays*/ free(cTermMasses); free(nTermMasses); free(sequenceToAdd); free(sequenceToAppend); return; } /**********************************GetBestCtermSeq****************************************** * * For each input mass of unsequenced C-terminus, find all random sequences that fit the mass. * To limit the computation time, only masses less than an upper limit are examined. Sequences * that fit the mass are scored according to how many y and b ions are matched. A C-terminal * Arg and/or Lys is assumed for tryptic peptides. 
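 *
 * Illustration only (not part of Lutefisk): the search is bounded by how many
 * residues could possibly fit the mass. The sketch below mirrors the
 * arithmetic used in the function body, with the lightest residue (Gly)
 * giving the upper bound and the heaviest (Trp) the lower bound. The name
 * residue_count_bounds and its parameters are hypothetical.
 *
```c
// Hypothetical helper, not part of Lutefisk.
// mass, lightest and heaviest share the same integer units (e.g. mass x 100).
// maxLen caps the result the way MAX_PEPTIDE_LENGTH does in the source.
void residue_count_bounds(int mass, int lightest, int heaviest, int maxLen,
                          int *minResidues, int *maxResidues)
{
    // Most residues: every residue is as light as possible.
    *maxResidues = mass / lightest;
    if (*maxResidues > maxLen) {
        *maxResidues = maxLen;
    }
    // Fewest residues: truncate mass / heaviest and add one, as the source does.
    *minResidues = mass / heaviest + 1;
    if (*minResidues < 2) {
        *minResidues = 2;   // a single residue is assumed to be handled elsewhere
    }
}
```
 *
 * For a 400.00 Da gap with Gly = 5702 and Trp = 18608 (mass x 100), this
 * gives between 3 and 7 residues.
 *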
*/ void GetBestCtermSeq(INT_4 *sequenceToAdd, INT_4 mass, char cTerm, struct MSData *firstMassPtr) { INT_4 maxResidues, minResidues, residueNum, *sequence, i, j, k; INT_4 ratchetMass, nominalMass, loopNumber, newMass; REAL_4 testNum, position, score, bestScore; REAL_4 massLimit = 747; /*largest bit of unsequenced mass to be examined*/ char cTermAA; if((clock() - gParam.startTicks)/ CLOCKS_PER_SEC > 45) { massLimit = 600; } if(mass > massLimit * gMultiplier) { sequenceToAdd[0] = 0; /*signal that nothing was found*/ return; /*only try for short bits of mass*/ } /*Assign space to arrays*/ sequence = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4)); if(sequence == NULL) { printf("Haggis:FleshOutSequences memory error"); exit(1); } /*Initialize*/ for(i = 0; i < MAX_PEPTIDE_LENGTH; i++) { sequence[i] = 0; /*this is the array used to find random sequences*/ sequenceToAdd[i] = 0; /*this is the final best sequence to add for this particular mass */ } bestScore = 0; /*Create a loop that considers Arg, Lys, or both at C-terminus*/ if(cTerm == 'B') { loopNumber = 2; /*both y1 for Arg and Lys were found*/ } else { loopNumber = 1; /*only y1 for either Arg or Lys were found, *or* its not a tryptic peptide*/ } for(k = 0; k < loopNumber; k++) { if(loopNumber == 2) { if(k == 0) { newMass = mass - gMonoMass_x100[K]; cTermAA = 'K'; /*this denotes the particular C-term amino acid this time through the k loop*/ } else { newMass = mass - gMonoMass_x100[R]; cTermAA = 'R'; } } else { if(cTerm == 'K') { newMass = mass - gMonoMass_x100[K]; cTermAA = 'K'; } else if(cTerm == 'R') { newMass = mass - gMonoMass_x100[R]; cTermAA = 'R'; } else { newMass = mass; cTermAA = 'N'; } } /*Find max and min number of residues*/ testNum = (REAL_4)newMass / gMonoMass_x100[G]; maxResidues = testNum; if(maxResidues < 2) { return; /*presumably a single amino acid would have been found already*/ } if(maxResidues > MAX_PEPTIDE_LENGTH) { maxResidues = MAX_PEPTIDE_LENGTH; } testNum = (REAL_4)newMass / 
gMonoMass_x100[W]; testNum = testNum + 1; minResidues = testNum; if(minResidues < 2) { minResidues = 2; /*presumably a single amino acid would have been found already*/ } nominalMass = (REAL_4)newMass / gMultiplier + 0.5; /*use nominal masses now*/ /*start searching for sequences*/ for(residueNum = minResidues; residueNum <= maxResidues; residueNum++) { /*Initialize each time the sequence length changes*/ ratchetMass = 0; for(i = 0; i < residueNum; i++) { sequence[i] = 0; ratchetMass += gAAArray[0]; } sequence[0] = -1; /*The first time through Ratchet moves this to zero*/ position = 0; /*Ratchet produces a new sequence until all have been done, at which point it returns a NULL*/ while(ratchetMass != 0) { /*Ratchets through all possible sequences, and returns mass of sequence*/ ratchetMass = RatchetHaggis(sequence, residueNum, position, ratchetMass, nominalMass); if(ratchetMass == nominalMass) { /*Score the sequence*/ if(cTermAA == 'K') { sequence[residueNum] = gCTermKIndex; score = CSequenceScore(mass, sequence, residueNum + 1, firstMassPtr); } else if(cTermAA == 'R') { sequence[residueNum] = gCTermRIndex; score = CSequenceScore(mass, sequence, residueNum + 1, firstMassPtr); } else { score = CSequenceScore(mass, sequence, residueNum, firstMassPtr); } /*Check if this is the highest scoring sequence so far (save it)*/ if(score > bestScore) { bestScore = score; if(cTermAA == 'K' || cTermAA == 'R') { for(i = 0; i < residueNum + 1; i++) { sequenceToAdd[i] = gAAArray[sequence[i]]; /*puts nominal masses into array*/ } for(i = residueNum + 1; i < MAX_PEPTIDE_LENGTH; i++) { sequenceToAdd[i] = 0; /*backfill*/ } } else { for(i = 0; i < residueNum; i++) { sequenceToAdd[i] = gAAArray[sequence[i]]; /*puts nominal masses into array*/ } for(i = residueNum; i < MAX_PEPTIDE_LENGTH; i++) { sequenceToAdd[i] = 0; /*backfill*/ } } } } } } } /*Replace nominal masses in sequenceToAdd with monoisotopic masses*/ i = 0; while(sequenceToAdd[i] != 0) { for(j = 0; j < gAminoAcidNumber; j++) { 
if(j != K && j != I) { if(sequenceToAdd[i] == gNomMass[j]) { sequenceToAdd[i] = gMonoMass_x100[j]; break; } } } i++; } /*free the arrays*/ free(sequence); return; } /*********************************CSequenceScore********************************************* * * Calculate a score for each candidate c-terminal sequence. */ REAL_4 CSequenceScore(INT_4 mass, INT_4 *sequence, INT_4 residueNum, struct MSData *firstMassPtr) { REAL_4 score, precursor; INT_4 bIons[MAX_PEPTIDE_LENGTH], yIons[MAX_PEPTIDE_LENGTH]; INT_4 i, j, maxCharge; struct MSData *currPtr; /*Initialize*/ score = 0; if(gParam.chargeState > 2) { maxCharge = 2; } else { maxCharge = 1; } precursor = (gParam.peptideMW + gParam.chargeState * gElementMass_x100[HYDROGEN]) / gParam.chargeState; /*Calculate b ions at different charge states and search for them*/ for(j = 1; j <= maxCharge; j++) { /*Calculate b ions*/ bIons[0] = (gParam.peptideMW - gParam.modifiedCTerm - mass + (j-1) * gElementMass_x100[HYDROGEN]) / j; for(i = 1; i < residueNum; i++) { bIons[i] = bIons[i-1] + gAAMonoArray[sequence[i-1]] / j; } /*Look for the b ions*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ >= bIons[1] - gParam.fragmentErr) /*skip over the b1 ions*/ { if(currPtr->mOverZ > bIons[residueNum - 1] + gParam.fragmentErr) { break; /*stop looking above the highest mass b ion*/ } for(i = 1; i < residueNum; i++) /*the highest b ion should not count*/ { if(currPtr->mOverZ >= bIons[i] - gParam.fragmentErr && currPtr->mOverZ <= bIons[i] + gParam.fragmentErr) { if(bIons[i] > 350 * gMultiplier *(j - 1)) { if(bIons[i] < precursor || gParam.fragmentPattern == 'L') { score += currPtr->intensity; } } } } } currPtr = currPtr->next; } } /*Calculate y ions at different charge states and search for them*/ for(j = 1; j <= maxCharge; j++) { /*Calculate y ions*/ yIons[0] = (gParam.modifiedCTerm + 2 * gElementMass_x100[HYDROGEN] + mass + (j-1) * gElementMass_x100[HYDROGEN]) / j; for(i = 1; i < residueNum; i++) { yIons[i] = yIons[i-1] 
- gAAMonoArray[sequence[i - 1]] / j; } /*Look for the y ions*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ >= yIons[residueNum - 1] - gParam.fragmentErr) /*skip over lower masses*/ { if(currPtr->mOverZ > yIons[1] + gParam.fragmentErr) { break; /*stop looking above the highest mass y ion*/ } for(i = 1; i <= residueNum; i++) /*the highest mass y ion should not count*/ { if(currPtr->mOverZ >= yIons[i] - gParam.fragmentErr && currPtr->mOverZ <= yIons[i] + gParam.fragmentErr) { if(yIons[i] > 350 * gMultiplier *(j - 1)) { score += currPtr->intensity; } } } } currPtr = currPtr->next; } } return(score); } /**********************************CheckCterm********************************************** * * */ char CheckCterm(struct MSData *firstMassPtr) { INT_4 yLys, yArg; char cTerm; BOOLEAN yLysFound, yArgFound; struct MSData *currPtr; /*Intialize*/ yLys = gMonoMass_x100[K] + 2 * gElementMass_x100[HYDROGEN] + gParam.modifiedCTerm; yArg = gMonoMass_x100[R] + 2 * gElementMass_x100[HYDROGEN] + gParam.modifiedCTerm; yLysFound = FALSE; yArgFound = FALSE; /*Look for the y1 ions*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ >= yLys - gParam.fragmentErr) { if(currPtr->mOverZ > yArg + gParam.fragmentErr) { break; } if(currPtr->mOverZ >= yLys - gParam.fragmentErr && currPtr->mOverZ <= yLys + gParam.fragmentErr) { yLysFound = TRUE; } if(currPtr->mOverZ >= yArg - gParam.fragmentErr && currPtr->mOverZ <= yArg + gParam.fragmentErr) { yArgFound = TRUE; } } currPtr = currPtr->next; } /*Now decide what to report back*/ if(yArgFound && !yLysFound) { cTerm = 'R'; } else if(yLysFound && !yArgFound) { cTerm = 'K'; } else { cTerm = 'B'; /*neither or both were found, so its ambiguous*/ } if(gParam.proteolysis != 'T') { cTerm = 'N'; /*its not a tryptic cleavage*/ } return(cTerm); } /***********************************MakeAAArray******************************************** * * */ void MakeAAArray(void) { INT_4 i, j, smallestNumber, smallestNumberIndex; 
    BOOLEAN keep;

    /*Initialize*/
    for(i = 0; i < AMINO_ACID_NUMBER; i++)
    {
        gAAArray[i] = 0;
    }
    gAANum = 0;

    /*Repeatedly pick the smallest remaining mass; Q is skipped throughout*/
    for(i = 0; i < gAminoAcidNumber; i++)
    {
        if(i != Q)
        {
            smallestNumber = 100000000;
            for(j = 0; j < gAminoAcidNumber; j++)
            {
                if(j != Q)
                {
                    if(gMonoMass[j] < smallestNumber && gMonoMass[j] > 0)
                    {
                        smallestNumberIndex = j;
                        smallestNumber = gMonoMass[j];
                    }
                }
            }
            gMonoMass[smallestNumberIndex] *= -1; /*mark this mass as used by making it negative*/
            keep = TRUE;
            for(j = 0; j < gAANum; j++)
            {
                if(smallestNumber == gAAArray[j])
                {
                    keep = FALSE; /*this mass is already in the list*/
                    break;
                }
            }
            if(keep)
            {
                gAAArray[gAANum] = smallestNumber;
                gAAMonoArray[gAANum] = gMonoMass_x100[smallestNumberIndex];
                if(smallestNumberIndex == K)
                {
                    gCTermKIndex = gAANum;
                }
                if(smallestNumberIndex == R)
                {
                    gCTermRIndex = gAANum;
                }
                gAANum++;
            }
        }
    }

    /*Set gMonoMass back to positive numbers*/
    for(i = 0; i < gAminoAcidNumber; i++)
    {
        if(gMonoMass[i] < 0)
        {
            gMonoMass[i] *= -1;
        }
    }

    gMassRange = gAAArray[gAANum - 1] - gAAArray[0];

    return;
}

/**********************************GetBestNtermSeq******************************************
 *
 * For each input mass of unsequenced N-terminus, find all candidate sequences that fit the mass.
 * To limit the computation time, only masses less than an upper limit are examined. Sequences
 * that fit the mass are scored according to how many y and b ions are matched.
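 *
 * Illustration only (not part of Lutefisk): finding every sequence of a given
 * length that fits a nominal mass can be sketched with a plain odometer over
 * the residue list. Unlike RatchetHaggis, this version does no pruning; it
 * simply visits every combination. The name count_sequences and the fixed
 * limit of 16 residues are assumptions made for the sketch.
 *
```c
// Hypothetical helper, not part of Lutefisk.
// Counts the ordered sequences of length len, drawn from aaMass[0..aaNum-1],
// whose masses sum to target. Returns 0 for unsupported arguments.
long count_sequences(const int *aaMass, int aaNum, int len, int target)
{
    int idx[16] = {0};   // idx[p] is the residue currently at position p
    long hits = 0;
    int pos, i, sum;

    if (len < 1 || len > 16 || aaNum < 1) {
        return 0;
    }
    for (;;) {
        // Mass of the current sequence.
        sum = 0;
        for (i = 0; i < len; i++) {
            sum += aaMass[idx[i]];
        }
        if (sum == target) {
            hits++;
        }
        // Advance the odometer: bump position 0 and carry while a digit wraps.
        pos = 0;
        while (pos < len && ++idx[pos] == aaNum) {
            idx[pos] = 0;
            pos++;
        }
        if (pos == len) {
            break;   // every position wrapped, so all sequences were visited
        }
    }
    return hits;
}
```
 *
 * With nominal masses 57, 71 and 87 (Gly, Ala, Ser), a target of 128 matches
 * two sequences of length two: Gly-Ala and Ala-Gly.
 *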
*/ void GetBestNtermSeq(INT_4 *sequenceToAdd, INT_4 mass, struct MSData *firstMassPtr) { INT_4 maxResidues, minResidues, residueNum, *sequence, i, j; INT_4 ratchetMass, nominalMass; REAL_4 testNum, position, score, bestScore; REAL_4 massLimit = 600; /*largest bit of unsequenced mass to be examined*/ if(mass > massLimit * gMultiplier) { sequenceToAdd[0] = 0; /*signal that nothing was found*/ return; /*only try for short bits of mass*/ } /*Assign space to arrays*/ sequence = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4)); if(sequence == NULL) { printf("Haggis:FleshOutSequences memory error"); exit(1); } /*Initialize*/ for(i = 0; i < MAX_PEPTIDE_LENGTH; i++) { sequence[i] = 0; /*this is the array used to find random sequences*/ sequenceToAdd[i] = 0; /*this is the final best sequence to add for this particular mass */ } bestScore = 0; /*Find max and min number of residues*/ testNum = (REAL_4)mass / gMonoMass_x100[G]; maxResidues = testNum; if(maxResidues < 2) { return; /*presumably a single amino acid would have been found already*/ } if(maxResidues > MAX_PEPTIDE_LENGTH) { maxResidues = MAX_PEPTIDE_LENGTH; /*this should never happen...*/ } testNum = (REAL_4)mass / gMonoMass_x100[W]; testNum = testNum + 1; minResidues = testNum; if(minResidues < 2) { minResidues = 2; /*presumably a single amino acid would have been found already*/ } nominalMass = mass / gMultiplier; /*use nominal masses now*/ for(residueNum = minResidues; residueNum <= maxResidues; residueNum++) { /*Initialize each time the sequence length changes*/ ratchetMass = 0; for(i = 0; i < residueNum; i++) { sequence[i] = 0; ratchetMass += gAAArray[0]; } sequence[0] = -1; /*The first time through Ratchet moves this to zero*/ position = 0; /*Ratchet produces a new sequence until all have been done, at which point it returns a NULL*/ while(ratchetMass != 0) { /*Ratchets through all possible sequences, and returns mass of sequence*/ ratchetMass = RatchetHaggis(sequence, residueNum, position, ratchetMass, 
nominalMass); if(ratchetMass == nominalMass) { /*Score the sequence*/ score = NSequenceScore(mass, sequence, residueNum, firstMassPtr); /*Check if this is the highest scoring sequence so far (save it)*/ if(score > bestScore) { bestScore = score; for(i = 0; i < residueNum; i++) { sequenceToAdd[i] = gAAArray[sequence[i]]; /*puts nominal masses into array*/ } for(i = residueNum; i < MAX_PEPTIDE_LENGTH; i++) { sequenceToAdd[i] = 0; /*backfill*/ } } } } } /*Replace nominal masses in sequenceToAdd with monoisotopic masses*/ i = 0; while(sequenceToAdd[i] != 0) { for(j = 0; j < gAminoAcidNumber; j++) { if(j != K && j != I) { if(sequenceToAdd[i] == gNomMass[j] && i < MAX_PEPTIDE_LENGTH) { sequenceToAdd[i] = gMonoMass_x100[j]; break; } } } i++; } /*free the arrays*/ free(sequence); return; } /*********************************NSequenceScore********************************************* * * */ REAL_4 NSequenceScore(INT_4 mass, INT_4 *sequence, INT_4 residueNum, struct MSData *firstMassPtr) { REAL_4 score, precursor; INT_4 bIons[MAX_PEPTIDE_LENGTH], yIons[MAX_PEPTIDE_LENGTH]; INT_4 i, j, maxCharge; struct MSData *currPtr; /*Initialize*/ score = 0; if(gParam.chargeState > 2) { maxCharge = 2; } else { maxCharge = 1; } precursor = (gParam.peptideMW + gParam.chargeState * gElementMass_x100[HYDROGEN]) / gParam.chargeState; /*Calculate b ions at different charge states and search for them*/ for(j = 1; j <= maxCharge; j++) { /*Calculate b ions*/ bIons[0] = (gParam.modifiedNTerm + gAAMonoArray[sequence[0]] + (j-1) * gElementMass_x100[HYDROGEN]) / j; for(i = 1; i < residueNum; i++) { bIons[i] = bIons[i-1] + gAAMonoArray[sequence[i]] / j; } /*Look for the b ions*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ >= bIons[1] - gParam.fragmentErr) /*skip over the b1 ions*/ { if(currPtr->mOverZ > bIons[residueNum - 1] + gParam.fragmentErr) { break; /*stop looking above the highest mass b ion*/ } for(i = 1; i < residueNum - 1; i++) /*the highest b ion should not count*/ { 
if(currPtr->mOverZ >= bIons[i] - gParam.fragmentErr && currPtr->mOverZ <= bIons[i] + gParam.fragmentErr) { if(bIons[i] > 350 * gMultiplier *(j - 1)) { if(bIons[i] < precursor || gParam.fragmentPattern == 'L') { score += currPtr->intensity; } } } } } currPtr = currPtr->next; } } /*Calculate y ions at different charge states and search for them*/ for(j = 1; j <= maxCharge; j++) { /*Calculate y ions*/ yIons[0] = (gParam.peptideMW - gParam.modifiedNTerm + 2 * gElementMass_x100[HYDROGEN] + (j-1) * gElementMass_x100[HYDROGEN]) / j; for(i = 1; i <= residueNum; i++) { yIons[i] = yIons[i-1] - gAAMonoArray[sequence[i - 1]] / j; } /*Look for the y ions*/ currPtr = firstMassPtr; while(currPtr != NULL) { if(currPtr->mOverZ >= yIons[residueNum] - gParam.fragmentErr) /*skip over lower masses*/ { if(currPtr->mOverZ > yIons[1] + gParam.fragmentErr) { break; /*stop looking above the highest mass b ion*/ } for(i = 1; i < residueNum; i++) /*the lowest mass y ion should not count*/ { if(currPtr->mOverZ >= yIons[i] - gParam.fragmentErr && currPtr->mOverZ <= yIons[i] + gParam.fragmentErr) { if(yIons[i] > 350 * gMultiplier *(j - 1)) { score += currPtr->intensity; } } } } currPtr = currPtr->next; } } return(score); } /*********************************RatchetHaggis********************************************** * * */ INT_4 RatchetHaggis(INT_4 *sequence, INT_4 residueNum, INT_4 position, INT_4 ratchetMass, INT_4 correctMass) { INT_4 i; if(ratchetMass == 0) { return(0); /*make sure that this recursive call unwinds itself*/ } if(sequence[0] == -1) /*just starting out*/ { sequence[0] += 1; } else if(sequence[position] < gAANum - 1) /*changing the right most amino acid*/ { ratchetMass = ratchetMass - gAAArray[sequence[position]]; sequence[position] += 1; ratchetMass += gAAArray[sequence[position]]; } else /*need to ratchet*/ { if(sequence[position] + 1 >= gAANum && position < residueNum) { ratchetMass = ratchetMass - gAAArray[gAANum - 1]; ratchetMass += gAAArray[0]; sequence[position] = 0; 
position += 1; if(position >= residueNum) { return(0); } ratchetMass = RatchetHaggis(sequence, residueNum, position, ratchetMass, correctMass); } else { return(0); /*you've reached the end of the road*/ } } /*Try to skip if nowhere near the right mass*/ if(/*sequence[0] == 0 &&*/ ratchetMass != 0 && position < residueNum) { if(ratchetMass + gMassRange < correctMass) /*if the mass is nowhere near close enough*/ { ratchetMass = ratchetMass - gAAArray[sequence[0]]; ratchetMass += gAAArray[gAANum - 1]; sequence[0] = gAANum - 1; position = 0; ratchetMass = RatchetHaggis(sequence, residueNum, position, ratchetMass, correctMass); } if(ratchetMass > correctMass) /*if the mass is already to much*/ { for(i = 0; i <= position; i++) { ratchetMass = ratchetMass - gAAArray[sequence[i]]; ratchetMass += gAAArray[gAANum - 1]; sequence[i] = gAANum - 1; } position = 0; ratchetMass = RatchetHaggis(sequence, residueNum, position, ratchetMass, correctMass); } } return(ratchetMass); } /**********************GetCTermMasses**************************************************** * * Make a list of c-terminal unsequenced masses. */ void GetCTermMasses(INT_4 *cTermMassNum, INT_4 *cTermMasses) { INT_4 i, j; BOOLEAN newMassTest; for(i = 0; i < gSequenceNum; i++) { newMassTest = TRUE; for(j = 0; j < *cTermMassNum; j++) { if(gPepMassSeq[i][gPepLength[i] - 1] <= cTermMasses[j] + gParam.fragmentErr && gPepMassSeq[i][gPepLength[i] - 1] >= cTermMasses[j] - gParam.fragmentErr) { newMassTest = FALSE; break; } } if(newMassTest && *cTermMassNum < MAX_SEQUENCES * 2) { cTermMasses[*cTermMassNum] = gPepMassSeq[i][gPepLength[i] - 1]; *cTermMassNum += 1; } } return; } /**********************GetNTermMasses**************************************************** * * Make a list of n-terminal unsequenced masses. 
*/
void GetNTermMasses(INT_4 *nTermMassNum, INT_4 *nTermMasses)
{
    INT_4 i, j;
    BOOLEAN newMassTest;

    for(i = 0; i < gSequenceNum; i++)
    {
        newMassTest = TRUE;
        for(j = 0; j < *nTermMassNum; j++)
        {
            if(gPepMassSeq[i][0] <= nTermMasses[j] + gParam.fragmentErr &&
                gPepMassSeq[i][0] >= nTermMasses[j] - gParam.fragmentErr)
            {
                newMassTest = FALSE; /*this mass is already in the list*/
                break;
            }
        }
        if(newMassTest && *nTermMassNum < MAX_SEQUENCES * 2)
        {
            nTermMasses[*nTermMassNum] = gPepMassSeq[i][0];
            *nTermMassNum += 1;
        }
    }

    return;
}

/******************LoadHaggisSequenceStruct********************************************
 *
 * LoadHaggisSequenceStruct puts the residue masses in the peptide[] field, and the peptide
 * length, score, nodeValue, gapNum, and nodeCorrection in their fields. To do this the
 * function allocates memory for a struct of type Sequence, fills it in, and returns a
 * pointer to it.
 */
struct Sequence *LoadHaggisSequenceStruct(INT_4 *peptide, INT_4 peptideLength,
                    INT_4 score, INT_4 nodeValue, INT_4 gapNum, INT_2 nodeCorrection)
{
    struct Sequence *currPtr;
    INT_4 i;

    currPtr = (struct Sequence *) malloc(sizeof(struct Sequence));
    if(currPtr == NULL)
    {
        printf("LoadHaggisSequenceStruct in Haggis: Out of memory");
        exit(1);
    }

    for(i = 0; i < peptideLength; i++)
    {
        currPtr->peptide[i] = peptide[i];
    }
    currPtr->peptideLength = peptideLength;
    currPtr->score = score;
    currPtr->gapNum = gapNum;
    currPtr->nodeValue = nodeValue;
    currPtr->nodeCorrection = nodeCorrection;
    currPtr->next = NULL;

    return(currPtr);
}

/****************LinkHaggisSubsequenceList**********************************************************
 *
 * This function adds a subsequence onto the existing linked list of structs of type Sequence.
 * The new struct is appended to the end of the list; the list is not reordered by score.
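 *
 * Illustration only (not part of Lutefisk): for comparison with the tail
 * append performed by the function body below, a list kept in descending
 * score order could be maintained as sketched here. struct ScoredNode is a
 * hypothetical, pared-down stand-in for struct Sequence, and insert_by_score
 * is a hypothetical name.
 *
```c
#include <stddef.h>

// Hypothetical, pared-down stand-in for struct Sequence.
struct ScoredNode {
    int score;
    struct ScoredNode *next;
};

// Inserts newPtr so that scores stay in descending order and returns the
// (possibly new) head of the list. Equal scores keep insertion order.
struct ScoredNode *insert_by_score(struct ScoredNode *firstPtr,
                                   struct ScoredNode *newPtr)
{
    struct ScoredNode *prev;

    // Empty list, or the new node outranks the current head.
    if (firstPtr == NULL || newPtr->score > firstPtr->score) {
        newPtr->next = firstPtr;
        return newPtr;
    }
    // Walk to the last node whose score is not lower than the new one.
    prev = firstPtr;
    while (prev->next != NULL && prev->next->score >= newPtr->score) {
        prev = prev->next;
    }
    newPtr->next = prev->next;
    prev->next = newPtr;
    return firstPtr;
}
```
 *
 * The walk costs time proportional to the list length per insertion, the
 * same as finding the tail, so ordering the list this way adds no extra pass.
 *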
* */ struct Sequence *LinkHaggisSubsequenceList(struct Sequence *firstPtr, struct Sequence *newPtr) { struct Sequence *lastPtr; char test = TRUE; if(firstPtr == NULL) /*If this is the first struct of the list then do this.*/ firstPtr = newPtr; else { /*Find the last sequence in the list*/ lastPtr = firstPtr; while(lastPtr->next != NULL) { lastPtr = lastPtr->next; } lastPtr->next = newPtr; } return(firstPtr); } /***************************************GetSequenceOfResidues****************************** * * Convert sequence of nodes to sequence of amino acid residue masses. If a mass does * not equal a known residue mass then save the mass as is. */ void GetSequenceOfResidues(INT_4 *mass) { INT_4 i, j, testMass, k, y1R, y1K, residueMass; BOOLEAN y1Found, weirdMass; /*Initialize*/ for(i = 0; i < MAX_SEQUENCES*2; i++) { gMatchSeries[i] = -1; } //Create sequences for(i = 0; i < gSeqCount; i++) { //Assume y ions j = 0; while(gSequenceNodes[i][j] != 0) { j++; //find the end of the sequence } gPepLength[gSequenceNum] = j + 1; testMass = mass[gSequenceNodes[i][0]] - gParam.modifiedCTerm - gElementMass_x100[HYDROGEN]*2; gPepMassSeq[gSequenceNum][j] = ResidueMass(testMass); /*provide mass from gMonoMass_x100 if possible*/ j--; k = 1; while(j > 0) { testMass = mass[gSequenceNodes[i][k]] - mass[gSequenceNodes[i][k-1]]; gPepMassSeq[gSequenceNum][j] = ResidueMass(testMass); /*provide mass from gMonoMass_x100 if possible*/ j--; k++; } testMass = gParam.peptideMW + gElementMass_x100[HYDROGEN] - mass[gSequenceNodes[i][k-1]]; gPepMassSeq[gSequenceNum][0] = ResidueMass(testMass); /*provide mass from gMonoMass_x100 if possible*/ gSequenceNum++; //Assume b ions y1R = gMonoMass_x100[R] + gElementMass_x100[HYDROGEN]*2 + gParam.modifiedCTerm; y1K = gMonoMass_x100[K] + gElementMass_x100[HYDROGEN]*2 + gParam.modifiedCTerm; testMass = mass[gSequenceNodes[i][0]]; if((testMass > y1K - gParam.fragmentErr && testMass < y1K + gParam.fragmentErr) || (testMass > y1R - gParam.fragmentErr && testMass < 
y1R + gParam.fragmentErr)) { y1Found = TRUE; } else { y1Found = FALSE; } if(!y1Found) //Don't even store if a y1 ion for Arg or Lys found { testMass = mass[gSequenceNodes[i][0]] - gParam.modifiedNTerm; gPepMassSeq[gSequenceNum][0] = ResidueMass(testMass); gMatchSeries[gSequenceNum] = gSequenceNum - 1; /*denotes that this is a b ion series repeat*/ j = 1; while(gSequenceNodes[i][j] != 0) { testMass = mass[gSequenceNodes[i][j]] - mass[gSequenceNodes[i][j-1]]; gPepMassSeq[gSequenceNum][j] = ResidueMass(testMass); j++; } testMass = gParam.peptideMW - gParam.modifiedCTerm - mass[gSequenceNodes[i][j-1]]; gPepMassSeq[gSequenceNum][j] = ResidueMass(testMass); gPepLength[gSequenceNum] = j + 1; gSequenceNum++; } } /* * Clean up the N- and C-terminal ends to make sure they do not contain odd masses, like "53", or something */ for(i = 0; i < gSequenceNum; i++) { residueMass = gPepMassSeq[i][gPepLength[i] - 1]; /*Check to see if the C-terminal mass makes any sense (think about adding this later)*/ if(residueMass < 184.121 * gMultiplier - gParam.fragmentErr) { weirdMass = TRUE; if((residueMass < 174.064 * gMultiplier + gParam.fragmentErr && residueMass > 174.064 * gMultiplier - gParam.fragmentErr) || (residueMass < 172.048 * gMultiplier + gParam.fragmentErr && residueMass > 172.048 * gMultiplier - gParam.fragmentErr) || (residueMass < 172.085 * gMultiplier + gParam.fragmentErr && residueMass > 172.085 * gMultiplier - gParam.fragmentErr) || (residueMass < 171.064 * gMultiplier + gParam.fragmentErr && residueMass > 171.064 * gMultiplier - gParam.fragmentErr) || (residueMass < 170.106 * gMultiplier + gParam.fragmentErr && residueMass > 170.106 * gMultiplier - gParam.fragmentErr) || (residueMass < 168.090 * gMultiplier + gParam.fragmentErr && residueMass > 168.090 * gMultiplier - gParam.fragmentErr) || (residueMass < 158.069 * gMultiplier + gParam.fragmentErr && residueMass > 158.069 * gMultiplier - gParam.fragmentErr) || (residueMass < 156.09 * gMultiplier + gParam.fragmentErr && 
residueMass > 156.09 * gMultiplier - gParam.fragmentErr) || (residueMass < 154.074 * gMultiplier + gParam.fragmentErr && residueMass > 154.074 * gMultiplier - gParam.fragmentErr) || (residueMass < 144.055 * gMultiplier + gParam.fragmentErr && residueMass > 144.055 * gMultiplier - gParam.fragmentErr) || (residueMass < 142.074 * gMultiplier + gParam.fragmentErr && residueMass > 142.074 * gMultiplier - gParam.fragmentErr)) { weirdMass = FALSE; /*these are low mass two aa residue masses*/ } for(j = 0; j < gAminoAcidNumber; j++) { if(residueMass < gMonoMass_x100[j] + gParam.fragmentErr && residueMass > gMonoMass_x100[j] - gParam.fragmentErr) { weirdMass = FALSE; break; } } } else { weirdMass = FALSE; } if(weirdMass) /*add the weird c-term mass to the penultimate c-term mass*/ { gPepMassSeq[i][gPepLength[i] - 2] += gPepMassSeq[i][gPepLength[i] - 1]; gPepLength[i] -= 1; } /*Check to see if the N-terminal mass makes any sense*/ residueMass = gPepMassSeq[i][0]; if(residueMass < 184.121 * gMultiplier - gParam.fragmentErr) { weirdMass = TRUE; if((residueMass < 174.064 * gMultiplier + gParam.fragmentErr && residueMass > 174.064 * gMultiplier - gParam.fragmentErr) || (residueMass < 172.048 * gMultiplier + gParam.fragmentErr && residueMass > 172.048 * gMultiplier - gParam.fragmentErr) || (residueMass < 172.085 * gMultiplier + gParam.fragmentErr && residueMass > 172.085 * gMultiplier - gParam.fragmentErr) || (residueMass < 171.064 * gMultiplier + gParam.fragmentErr && residueMass > 171.064 * gMultiplier - gParam.fragmentErr) || (residueMass < 170.106 * gMultiplier + gParam.fragmentErr && residueMass > 170.106 * gMultiplier - gParam.fragmentErr) || (residueMass < 168.090 * gMultiplier + gParam.fragmentErr && residueMass > 168.090 * gMultiplier - gParam.fragmentErr) || (residueMass < 158.069 * gMultiplier + gParam.fragmentErr && residueMass > 158.069 * gMultiplier - gParam.fragmentErr) || (residueMass < 156.09 * gMultiplier + gParam.fragmentErr && residueMass > 156.09 * gMultiplier 
- gParam.fragmentErr) || (residueMass < 154.074 * gMultiplier + gParam.fragmentErr && residueMass > 154.074 * gMultiplier - gParam.fragmentErr) || (residueMass < 144.055 * gMultiplier + gParam.fragmentErr && residueMass > 144.055 * gMultiplier - gParam.fragmentErr) || (residueMass < 142.074 * gMultiplier + gParam.fragmentErr && residueMass > 142.074 * gMultiplier - gParam.fragmentErr)) { weirdMass = FALSE; /*these are low mass two aa residue masses*/ } for(j = 0; j < gAminoAcidNumber; j++) { if(residueMass < gMonoMass_x100[j] + gParam.fragmentErr && residueMass > gMonoMass_x100[j] - gParam.fragmentErr) { weirdMass = FALSE; break; } } } else { weirdMass = FALSE; } if(weirdMass) //add the weird c-term mass to the penultimate c-term mass { gPepMassSeq[i][0] += gPepMassSeq[i][1]; for(j = 2; j < gPepLength[i]; j++) { gPepMassSeq[i][j - 1] = gPepMassSeq[i][j]; } gPepLength[i] -= 1; } } return; } /******************************FindNodeSequences********************************* * * Given the ways that the ions can be connected (gForwardNodeConnect and gBackwardNodeConnect), * find all possible pathways through the nodes. 
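 *
 * Illustration only (not part of Lutefisk): because every forward connection
 * leads to a higher-mass node, the graph has no cycles, and the pathways can
 * be walked with a simple depth-first recursion. The sketch below only counts
 * the pathways; it does not record them or use the sign-flipping bookkeeping
 * of NodeStep. The names count_paths, fwd, fwdNum and MAX_EDGES are
 * hypothetical.
 *
```c
#define MAX_EDGES 8   // hypothetical cap on forward connections per node

// Hypothetical helper, not part of Lutefisk.
// fwd[i][k] is the k-th node reachable from node i, and fwdNum[i] is the
// number of such nodes. Every edge must point to a higher node index.
// Returns the number of distinct pathways that start at node and run until a
// node with no forward connection is reached.
long count_paths(int fwd[][MAX_EDGES], const int *fwdNum, int node)
{
    long total = 0;
    int k;

    if (fwdNum[node] == 0) {
        return 1;   // a terminal node ends exactly one pathway
    }
    for (k = 0; k < fwdNum[node]; k++) {
        total += count_paths(fwd, fwdNum, fwd[node][k]);
    }
    return total;
}
```
 *
 * The count can grow very quickly with the number of ions, which is why the
 * real function stops once its sequence array is full.
 *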
*/ void FindNodeSequences(INT_4 *mass) { INT_4 i, j, k, inputIon; BOOLEAN test; /*Initialize*/ gSeqCount = 0; for(i = 0; i < MAX_SEQUENCES; i++) { for(j = 0; j < MAX_ION_NUM; j++) { gSequenceNodes[i][j] = 0; } } /* Step through the nodes from low mass to high mass*/ for(i = 1; i < gIonCount; i++) /*This is the first node, which is incremented up to the top*/ { if(gForwardNum[i] != 0 && gNotTooManySequences) /*gNotTooManySequences signals that array is filled*/ { gEdgeNum = 0; /*Counts the edges as the tree is searched*/ test = TRUE; /*Becomes FALSE when all paths are searched from node i*/ inputIon = i; /*Need to get forward/backward arrays with correct positive/negative values for starting over*/ for(j = 0; j < gIonCount; j++) { for(k = 0; k < gBackwardNum[j]; k++) { if(gBackwardNodeConnect[j][k] > 0) { gBackwardNodeConnect[j][k] = -1 * gBackwardNodeConnect[j][k]; } } for(k = 0; k < gForwardNum[j]; k++) { if(gForwardNodeConnect[j][k] < 0) { gForwardNodeConnect[j][k] = -1 * gForwardNodeConnect[j][k]; } } } /*test is positive until the low mass terminal node has no remaining pathways*/ while(test) { test = NodeStep(&inputIon, mass); } } } return; } /******************************SetupBackwardAndForwardNodes*************************** * */ void SetupBackwardAndForwardNodes(INT_4 *mass) { INT_4 i, j, k, l, massDiff; /*Initialize variables*/ for(i = 0; i < MAX_ION_NUM; i++) { gForwardNum[i] = 0; gBackwardNum[i] = 0; for(j = 0; j < AMINO_ACID_NUMBER; j++) { gForwardNodeConnect[i][j] = 0; gBackwardNodeConnect[i][j] = 0; } } /* First assume fragment ions are all singly charged.*/ for(i = 0; i < gIonCount; i++) { for(j = i; j < gIonCount; j++) { massDiff = mass[j] - mass[i]; if(massDiff >= gMonoMass_x100[G] - gParam.fragmentErr && massDiff <= gMonoMass_x100[W] + gParam.fragmentErr) { for(k = 0; k < gAminoAcidNumber; k++) { if(massDiff <= gMonoMass_x100[k] + gParam.fragmentErr && massDiff>= gMonoMass_x100[k] - gParam.fragmentErr) { gBackwardNodeConnect[j][gBackwardNum[j]] = 
-i; gBackwardNum[j]++; gForwardNodeConnect[i][gForwardNum[i]] = j; gForwardNum[i]++; break; } } } } } /* Clean up the node connections. If a connection is made that is comprised of a larger jump that could also be two smaller connections (ie, two Gly to Gly jumps versus a single Asn jump), then the larger connection is eliminated.*/ for(i = 0; i < gIonCount; i++) { if(gForwardNum[i] > 1) { for(j = 0; j < gForwardNum[i] - 1; j++) { for(l = j + 1; l < gForwardNum[i]; l++) { if(gForwardNodeConnect[i][l] > 0 && gForwardNodeConnect[i][j] > 0) { massDiff = mass[gForwardNodeConnect[i][l]] - mass[gForwardNodeConnect[i][j]]; if(massDiff < gParam.fragmentErr) { gForwardNodeConnect[i][l] = 0; /*G+V=R, for example*/ } else { for(k = 0; k < gAminoAcidNumber; k++) { if(massDiff <= gMonoMass_x100[k] + gParam.fragmentErr && massDiff>= gMonoMass_x100[k] - gParam.fragmentErr) { gForwardNodeConnect[i][l] = 0; } } } } } } } } /*Get rid of the zero value node forward connections.*/ for(i = 0; i < gIonCount; i++) { for(j = 0; j < gForwardNum[i]; j++) { if(gForwardNodeConnect[i][j] == 0) { for(l = j; l < gForwardNum[i]; l++) { gForwardNodeConnect[i][l] = gForwardNodeConnect[i][l+1]; } gForwardNum[i] -= 1; } } } return; } /******************************LoadMassArrays*************************************** * * Fills mass array with ion masses. 
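 *
 * Worked example of the charge conversion done below (a sketch only: it ignores the
 * integer scaling by gMultiplier, and 1.008 is an assumed round figure for hydrogen).
 * Each ion that passes the filters is stored as its singly charged equivalent:
 *     mass = mOverZ * charge - (charge - 1) * hydrogen
 * With charge = 2, an ion at m/z 500.5 is stored as 2 * 500.5 - 1.008 = 999.992.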
*/
INT_4 *LoadMassArrays(INT_4 *mass, struct MSData *firstMassPtr, INT_4 charge)
{
    INT_4 i;
    struct MSData *currPtr;

/*Initialize variables*/
    gIonCount = 1;
    for(i = 0; i < MAX_ION_NUM; i++)
    {
        mass[i] = 0;
    }

/*Load mass array*/
    currPtr = firstMassPtr;
    if(currPtr == NULL)
    {
        printf("Problem in LoadMassArrays: Haggis");
        exit(1);    /*No ions!!*/
    }
    while(currPtr != NULL)
    {
        if(currPtr->mOverZ > 400 * gMultiplier * (charge - 1))    /*make sure big enough to hold the charge*/
        {
            if(currPtr->mOverZ * charge < gParam.peptideMW - gMonoMass_x100[G])
            {
                if(currPtr->mOverZ >= gMonoMass_x100[K] + gParam.modifiedCTerm
                    + 2*gElementMass_x100[HYDROGEN] - gParam.fragmentErr)
                {
                    if(gIonCount >= MAX_ION_NUM)    /*check before writing, so mass[] is never overrun*/
                    {
                        printf("Problem in LoadMassArrays: Haggis");
                        exit(1);    /*Too many ions; I'll exceed the array sizes.*/
                    }
                    mass[gIonCount] = currPtr->mOverZ * charge - ((charge-1)*gElementMass_x100[HYDROGEN]);
                    gIonCount++;
                }
            }
        }
        currPtr = currPtr->next;
    }

    return(mass);
}

/*****************************ResidueMass***********************************************
*
*   Given an input mass (inputMass), determine whether it matches a residue mass in the
*   array gMonoMass_x100 to within gParam.fragmentErr. If it does, the gMonoMass_x100
*   value is returned. If it's not found, then the original value is returned.
*
*/
INT_4 ResidueMass(INT_4 inputMass)
{
    INT_4 i, outputMass;
    BOOLEAN massFound = FALSE;

    for(i = 0; i < gAminoAcidNumber; i++)
    {
        if(inputMass <= gMonoMass_x100[i] + gParam.fragmentErr &&
            inputMass >= gMonoMass_x100[i] - gParam.fragmentErr)
        {
            massFound = TRUE;
            outputMass = gMonoMass_x100[i];
            break;
        }
    }

    if(!massFound)
    {
        outputMass = inputMass;
    }

    return(outputMass);
}

/*****************************NodeStep*************************************************
*
*   NodeStep is a recursive function that takes in a node position, and then moves forward
*   or backward in the graph, all the while counting the number of edges in the path and
*   keeping track of the longest path. It returns either a TRUE or FALSE value.
*   The default return is TRUE until all paths have been followed from a given starting node.
*   Once all paths have been followed, FALSE is returned to signal the end of the path-finding
*   from that starting node. Given a node position, the function steps forward one edge if
*   that edge has not previously been followed, and gEdgeNum is incremented by one (the
*   comparison against gMaxEdgeNum is currently commented out). The forward edge that was
*   used is made impassable (given a negative value), but the matching backward edge is made
*   positive and passable. If a node position has no passable forward edges, the sequence is
*   stored via StoreSeq, and then the function follows a passable backward edge; the edge used
*   to travel backwards is made impassable again. The edge count is decremented and the
*   function calls itself again. Eventually, the program gets back to the starting node, no
*   edges are passable from that node, and the function returns FALSE to signal that it's
*   time to move on to another starting node.
*/
BOOLEAN NodeStep(INT_4 *nodeNum, INT_4 *nodeMass)
{
    BOOLEAN wayForward = FALSE;     /*assume that there are no edges towards high mass*/
    BOOLEAN wayBackward = FALSE;    /*assume that there are no edges leading backwards*/
    BOOLEAN keepGoing = TRUE;       /*becomes FALSE when no more paths to follow*/
    BOOLEAN failureTest;
    INT_4 i, j, newNode, oldNode;

    if(gIonCount > MAX_ION_NUM)    /*array boundary*/
    {
        printf("gIonCount > MAX_ION_NUM");
        exit(1);
    }
    if(gAminoAcidNumber > AMINO_ACID_NUMBER)
    {
        printf("gAminoAcidNumber > AMINO_ACID_NUMBER");
        exit(1);
    }

/*8/20/03 to allow for using parts of the tree that were used before, but connected via a different
  node at the bottom; might need to get rid of this if it causes problems later*/
    for(i = *nodeNum + 1; i < gIonCount/*MAX_ION_NUM*/; i++)
    {
        for(j = 0; j < gForwardNum[*nodeNum]; j++)
        {
            if(gForwardNodeConnect[i][j] < 0)
            {
                gForwardNodeConnect[i][j] *= -1;
            }
        }
    }

/* Figure out if I can go up in mass or not*/
    if(gForwardNum[*nodeNum] != 0 && *nodeNum < gIonCount)
    {
        for(i = 0; i < gForwardNum[*nodeNum]; i++)
        {
            if(gForwardNodeConnect[*nodeNum][i] > 0)
            {
                wayForward = TRUE;    /*There is an edge available that leads to higher mass nodes*/
                break;
            }
        }
    }

/* Now that it's known whether there is an edge to higher mass, we can proceed.*/
    if(wayForward)
    {
        for(i = 0; i < gForwardNum[*nodeNum]; i++)
        {
            if(gForwardNodeConnect[*nodeNum][i] > 0 && i < gAminoAcidNumber && *nodeNum < gIonCount)
            {
                gEdgeNum++;
                newNode = gForwardNodeConnect[*nodeNum][i];
                /*if(gEdgeNum > gMaxEdgeNum)
                {
                    gMaxEdgeNum = gEdgeNum;
                    SaveLongestSequence(*nodeNum, newNode);
                }*/
                gForwardNodeConnect[*nodeNum][i] = -1 * gForwardNodeConnect[*nodeNum][i];
                failureTest = TRUE;
                for(j = 0; j < gBackwardNum[newNode]; j++)
                {
                    if(*nodeNum == -1 * gBackwardNodeConnect[newNode][j])
                    {
                        gBackwardNodeConnect[newNode][j] = -1 * gBackwardNodeConnect[newNode][j];
                        failureTest = FALSE;
                        break;
                    }
                }
                if(failureTest)
                {
                    printf("Problem in function NodeStep");
                    exit(1);
                }
                break;
            }
        }
        oldNode = *nodeNum;    /*debug*/
        *nodeNum = newNode;
        keepGoing = NodeStep(nodeNum, nodeMass);
        if(!keepGoing)
        {
            return(FALSE);
        }
    }
    else    /*need to back-track, if possible*/
    {
    /* Store the sequences that cannot be extended (in a global array)*/
        if(gSeqCount >= MAX_SEQUENCES)
        {
            gNotTooManySequences = FALSE;
            printf("Haggis had to quit early, because there were too many sequences found.\n");
            return(FALSE);    /*too many sequences, so stop now*/
        }
        else
        {
            StoreSeq(*nodeNum, nodeMass);    /*there is room for more sequences*/
        }

    /* Figure out if I can go down in mass or not*/
        if(gBackwardNum[*nodeNum] != 0)
        {
            for(i = 0; i < gBackwardNum[*nodeNum]; i++)
            {
                if(gBackwardNodeConnect[*nodeNum][i] > 0)
                {
                    wayBackward = TRUE;    /*There is an edge available that leads to lower mass nodes*/
                    break;
                }
            }
        }

        keepGoing = FALSE;    /*will keep going if there is an edge that leads backwards*/
        if(wayBackward)
        {
            for(i = 0; i < gBackwardNum[*nodeNum]; i++)
            {
                if(gBackwardNodeConnect[*nodeNum][i] > 0 && *nodeNum < gIonCount/*MAX_ION_NUM*/
                    && i < gAminoAcidNumber)
                {
                    keepGoing = TRUE;
                    gEdgeNum--;
                    newNode = gBackwardNodeConnect[*nodeNum][i];
gBackwardNodeConnect[*nodeNum][i] = -1 * gBackwardNodeConnect[*nodeNum][i]; *nodeNum = newNode; break; } } keepGoing = NodeStep(nodeNum, nodeMass); } if(!keepGoing) { return(FALSE); } } return(keepGoing); } /**********************************StoreSeq************************************************************************ * * */ void StoreSeq(INT_4 nodeNum, INT_4 *nodeMass) { INT_4 i, j, k, index, gapNum, minEdgeNum; REAL_4 highMass, lowMass, testMass; BOOLEAN foundBottomNode = FALSE; BOOLEAN keepTheSeq = TRUE; BOOLEAN testForGap; /*Quit before running out of space*/ if(gSeqCount >= MAX_SEQUENCES) { printf("Way too many sequences."); exit(1); } if(gEdgeNum >= MAX_ION_NUM) /*check array boundaries*/ { printf("Problem in StoreSeq"); exit(1); } testMass = gParam.peptideMW / gMultiplier; /*peptide mass*/ testMass = testMass / AV_RESIDUE_MASS; /*guess at the number of residues*/ if(gLutefiskSequenceCount > 10000) { minEdgeNum = testMass / 2 + 0.5; } else if(gLutefiskSequenceCount > 1000) { minEdgeNum = testMass / 3 + 0.5; /*need a series that covers a third of the sequence*/ } else { minEdgeNum = testMass / 4 + 0.5; } if(minEdgeNum < 4) minEdgeNum = 4; /*bottom limit*/ if(gEdgeNum < minEdgeNum) /*Anything with fewer than 4 edges is a useless sequence*/ { return; } /* Initialize the gSequenceNodes*/ for(i = 0; i < MAX_ION_NUM; i++) { gSequenceNodes[gSeqCount][i] = 0; } /* Fill in the sequence*/ gSequenceNodes[gSeqCount][gEdgeNum] = nodeNum; i = gEdgeNum; gapNum = 0; while(!foundBottomNode && i > 0) { foundBottomNode = TRUE; index = gSequenceNodes[gSeqCount][i]; for(j = 0; j < gBackwardNum[index]; j++) { if(gBackwardNodeConnect[index][j] > 0) { foundBottomNode = FALSE; highMass = nodeMass[gSequenceNodes[gSeqCount][i]]; i--; gSequenceNodes[gSeqCount][i] = gBackwardNodeConnect[index][j]; lowMass = nodeMass[gSequenceNodes[gSeqCount][i]]; testMass = highMass - lowMass; testForGap = TRUE; /*start by assuming its a gap*/ for(k = 0; k < gAminoAcidNumber; k++) { if(testMass < 
                        gMonoMass_x100[k] + gParam.fragmentErr &&
                        testMass > gMonoMass_x100[k] - gParam.fragmentErr)
                    {
                        testForGap = FALSE;    /*it's not a gap*/
                        break;
                    }
                }
                if(testForGap)
                {
                    gapNum++;
                }
                break;
            }
        }
    }

/* Test for too many gaps*/
    if(gapNum > gParam.maxGapNum)
    {
        keepTheSeq = FALSE;
    }
    else
    {
        keepTheSeq = TRUE;
    }

/* Test to see if it's a subset of a previous sequence*/
    if(keepTheSeq)
    {
        for(i = 0; i < gSeqCount; i++)
        {
            j = 0;
            while(gSequenceNodes[i][j] != 0 || j == 0)
            {
                if(gSequenceNodes[gSeqCount][0] == gSequenceNodes[i][j])
                {
                    k = 0;
                    while(gSequenceNodes[gSeqCount][k] != 0)
                    {
                        if(gSequenceNodes[gSeqCount][k] != gSequenceNodes[i][j+k])
                        {
                            break;    /*break out if they are not the same, then check
                                        below to see if it reached the end*/
                        }
                        k++;
                    }
                    if(gSequenceNodes[gSeqCount][k] == 0)    /*if the end was reached, then it's a subset*/
                    {
                        keepTheSeq = FALSE;
                        break;
                    }
                }
                j++;
            }
            if(!keepTheSeq)
            {
                break;
            }
        }
    }

    if(keepTheSeq)    /*if the sequence is kept, then the sequence counter is incremented by one*/
    {
        gSeqCount++;
    }
    else
    {
        i = 0;
        while(gSequenceNodes[gSeqCount][i] != 0)
        {
            gSequenceNodes[gSeqCount][i] = 0;    /*reinitialize to zero*/
            i++;
        }
    }

    return;
}

/* File: lutefisk-1.0.7+dfsg.orig/src/LutefiskMain.c */

/*********************************************************************************************
    Lutefisk is software for de novo sequencing of peptides from tandem mass spectra.
    Copyright (C) 1995 Richard S. Johnson

    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License
    as published by the Free Software Foundation; either version 2
    of the License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

    Contact:

    Richard S Johnson
    4650 Forest Ave SE
    Mercer Island, WA 98040

    jsrichar@alum.mit.edu

*********************************************************************************************

    Lutefisk is a program designed to aid in the interpretation of CID data of peptides.
    The main assumptions are that the data is of reasonable quality, the N- and C-terminal
    modifications (if any) are known, and the precursor ion charge (and therefore the
    peptide molecular weight) is known. The ultimate goal here is to develop code that
    can utilize msms data in conjunction with ambiguous and incomplete Edman sequencing data,
    sequence tags, peptide derivatization, and protein or EST database searches.

    An older version of Lutefisk has been written in FORTRAN and runs on 68K Macs that have
    an FPU (1991, 39th ASMS Conference on Mass Spectrometry and Allied Topics, Nashville, TN,
    pp 1233-1234). This is a different and improved algorithm partly inspired by
    Fernandez-de-Cossio, et al. (1995) CABIOS Vol. 11 No. 4 pp 427-434. Combining this msms
    interpretation algorithm with Edman sequencing, database searches, and derivatization is
    entirely of my own design; J. Alex Taylor implemented the changes in the FASTA code
    (Bill Pearson, U. of VA) so that the Lutefisk output can be read directly by the modified
    FASTA program. In addition, there were a number of additional critical changes made to
    FASTA to make it more compatible with msms sequencing data.

    The trademark Lutefisk was chosen at random, and is not meant to imply any similarity
    between this computer program and the partially base-hydrolyzed cod fish of the same name.
********************************************************************************************/

/* ANSI headers */
#include
#include
#include
#include
#include
#include

#if(defined(__MWERKS__) && __dest_os == __mac_os)
/* Some Macintosh specific things */
#include "getopt.h"
#include
#include
#include
StandardFileReply freply;
Point wpos;
INT_4 tval;
char prompt[256];
#endif

#if(defined(__MWERKS__) && __dest_os == __win32_os)
/* Some Windoze specific things */
#include "getopt.h"
#endif

/* Lutefisk headers */
#include "LutefiskPrototypes.h"
#include "LutefiskDefinitions.h"

char versionString[256] = "LutefiskXP v1.0.7\nCopyright 1996-1906 Richard S. Johnson\n\n";

/*
//--------------------------------------------------------------------------------
//  main()
//--------------------------------------------------------------------------------
    There are six main steps to be performed by Lutefisk:

    1. Import the information from the editable text files Lutefisk.params and
       Lutefisk.details. Lutefisk.params contains information like the CID data file
       name, the peptide molecular weight, and so on. Lutefisk.details contains more
       esoteric information that will not typically be modified, but that would be nice
       to be able to alter. For example, the .details file contains info relating to the
       types of ions to be considered, and the scoring values for those ions. In addition,
       there is the file Lutefisk.edmans, which contains the Edman data, if it's available.

    2. Import the CID data. This is currently done by making an ASCII file of m/z versus
       intensity using Finnigan's "List" program. Then this ASCII file is imported by
       Lutefisk. It's also possible to import a tab-delimited ASCII file not produced by
       Finnigan.

    3. Find the nodes on the sequence graph. The CID data is converted to a sequence
       graph, in which each ion is assumed to be one of the ion types under consideration
       (b, or y, or whatever) and then mathematically converted to the corresponding
       b ion value.
For example, an ion at m/z 800 for a peptide of molecular weight 999 when assumed to be a y ion would mathematically be equivalent to a b ion of m/z 200. The nodes on this sequence graph are integer values, so that differences between nodes are directly comparable to nominal mass values of amino acid residues. Each node will have two scores - Nterm and Cterm - derived from the number of ions that provide evidence for each node (C-terminal ions and N-terminal ions). If a sequence tag is available, then it can be used to reduce the size of the sequence graph in that those nodes corresponding to the tag are removed. There are three ways to make the sequence graph: a general peptide graph, a tryptic triple quad graph, and a tryptic ion trap graph. 4. Calculate the Summed Node Scores starting from the C-terminus, and identify all of the one-edged nodes. Beginning with the C-terminus, start connecting the nodes, and assign to each connected node an additional bonus score. One edge nodes are those that are connected to the C-terminus but cannot be extended toward the N-terminus. These one-edge nodes are used in step #5 for jumping two amino acid gaps in the fragmentation pattern. 5. Build the subsequences beginning at the N-terminus, and keep those with the highest subsequence scores that are derived from the node values established in step #4. If a subsequence cannot be extended, then the program looks for any two amino acid jumps to one of the one-edge nodes from step #4. Unmodified peptides begin at mass 1 (hydrogen), otherwise they start at 43 (acetylation), 44 (carbamylation), or 112 (pyroglutamic acid). In any case, the first jump from the N-terminus can involve either one or two amino acids. Thereafter, only one amino acid is added at a time. For ion trap data, the initial jump can be more than two amino acids. 6. Score the completed sequences obtained from step #5. 
       The score will be the usual percentage of ion current accounted for, plus it might
       be good to add the option of including the cross-correlation score.
*************************************************************************************************/

int main(int argc, char **argv)
{
    INT_4 i;
    const time_t theTime = (const time_t)time(NULL);
    extern INT_4 optind;

#if(defined(__MWERKS__) && __dest_os == __mac_os)
    argc = ccommand(&argv);
#endif

/*  gParam.startTicks = clock();*/

    if (!SystemCheck()) exit(1);

    BuildPgmState(argc, argv);    /*Read in any line commands*/

    if (gParam.fMonitor)
    {
        printf("%s", versionString);    /*never pass a non-literal string as the format*/
        printf("Run Date: %20s", ctime(&theTime));
    }

    ReadParamsFile();
    ReadResidueFile();

/*
*   Import the information from Lutefisk.details.
*
*   The various ion values are read from Lutefisk.details and are used to assign values to the
*   nodes in the sequence graph. The value "fragmentPattern" is needed in order to figger out
*   which of the three columns should be read in as the set of ion values. The function returns
*   a value that corresponds to the sum of all of the ion values for b, y, etc ions.
*/
    gIonTypeWeightingTotal = ReadDetailsFile();

/*  Reassign values for Cys to account for changes in mass due to alkylation.*/
    gMonoMass[C] = gParam.cysMW;
    gAvMass[C]   = gParam.cysMW;
    gNomMass[C]  = gParam.cysMW;

/*  Assign values to the globals H2O and NH3.*/
    H2O = 2 * gElementMass[HYDROGEN] + gElementMass[OXYGEN];
    NH3 = gElementMass[NITROGEN] + 3 * gElementMass[HYDROGEN];

/*
*   The Edman data is read from the file Lutefisk.edman if the parameter "edmanPresent" equals
*   'Y'. The data is read into an array gEdmanData[cycle number][amino acids in the cycle], and
*   it contains the nominal mass values of the amino acids found in the file Lutefisk.edman
*   (rather than a character listing of the amino acid single letter code).
*/
    if (gParam.edmanPresent)
    {
        ReadEdmanFile();
    }

    /* Total hack because we mess with these values later on.
    */
    gParam.peptideMW_orig     = gParam.peptideMW;
    gParam.peptideErr_orig    = gParam.peptideErr;
    gParam.fragmentErr_orig   = gParam.fragmentErr;
    gParam.peakWidth_orig     = gParam.peakWidth;
    gParam.monoToAv_orig      = gParam.monoToAv;
    gParam.qtofErr_orig       = gParam.qtofErr;
    gParam.ionOffset_orig     = gParam.ionOffset;
    gParam.cysMW_orig         = gParam.cysMW;
    gParam.tagNMass_orig      = gParam.tagNMass;
    gParam.tagCMass_orig      = gParam.tagCMass;
    gParam.maxGapNum_orig     = gParam.maxGapNum;
    gParam.modifiedNTerm_orig = gParam.modifiedNTerm;
    gParam.modifiedCTerm_orig = gParam.modifiedCTerm;
    gParam.topSeqNum_orig     = gParam.topSeqNum;
    strcpy(gParam.outputFile_orig, gParam.outputFile);

//  optind = 0;    /*debug*/
//  argc = 3;      /*debug*/
    if (optind < argc)
    {
        for (i = optind; i < argc; i++)
        {
            strcpy(gParam.cidFilename, argv[i]);
            Run();
        }
    }
    else if (strlen(gParam.cidFilename) > 0)
    {
        /* A filename was specified in the Lutefisk.params file */
        Run();
    }

    sleep(1);    /* Why sleep before quitting? Well, on our 500 MHz alpha some results
                    were being lost when calling the program from a child process via
                    a pipe; seemingly because the pipe was being terminated before all
                    the data had gotten through. */

    return(0);    /* All done */
}

/*************************************************************************************************/
void Run()
{
    REAL_4 actualPeptideMW, actualTopSeqNum, actualFinalSeqNum;
    INT_4 i, massChange, posNeg;
    SCHAR *sequenceNode = NULL;
    SCHAR *sequenceNodeC = NULL;
    SCHAR *sequenceNodeN = NULL;
    INT_4 oneEdgeNodesIndex, *oneEdgeNodes = NULL;
    struct MSData *firstMassPtr = NULL, *firstRawDataPtr = NULL;
    struct Sequence *firstSequencePtr = NULL;
    const time_t theTime = (const time_t)time(NULL);

    gParam.startTicks = clock();

    /* Total hack because we mess with these values later on.
    */
    gParam.peptideMW     = gParam.peptideMW_orig;
    gParam.peptideErr    = gParam.peptideErr_orig;
    gParam.fragmentErr   = gParam.fragmentErr_orig;
    gParam.peakWidth     = gParam.peakWidth_orig;
    gParam.monoToAv      = gParam.monoToAv_orig;
    gParam.qtofErr       = gParam.qtofErr_orig;
    gParam.ionOffset     = gParam.ionOffset_orig;
    gParam.cysMW         = gParam.cysMW_orig;
    gParam.tagNMass      = gParam.tagNMass_orig;
    gParam.tagCMass      = gParam.tagCMass_orig;
    gParam.maxGapNum     = gParam.maxGapNum_orig;
    gParam.modifiedNTerm = gParam.modifiedNTerm_orig;
    gParam.modifiedCTerm = gParam.modifiedCTerm_orig;
    gParam.topSeqNum     = gParam.topSeqNum_orig;
    strcpy(gParam.outputFile, gParam.outputFile_orig);

    gFirstTimeThru = TRUE;

/*
*   GetCidData opens an ASCII file containing lists of m/z values and intensities for the
*   CID data. This file is produced using the Finnigan program called "LIST", by using the
*   "Print..." command found in the "File" menu. A dialog box appears where you tell it that
*   the saved format is "ASCII" for both the text and graph displays, and then you provide
*   a file name and click on the "save to file" button.
*   The pointer firstMassPtr points to the first
*   element in the linked list of ion values (m/z and intensity).
*
*   GetCidData uses gElementMass and gMonoMass instead of gElementMass_x100 and gMonoMass_x100,
*   which haven't even been assigned values yet.
*/
    firstMassPtr = GetCidData();
    if (NULL == firstMassPtr)
    {
        PrintPartingGiftToFile();
        exit(0);
    }

/*
*   Adjust peptideMW for LCQ data using fragment ion pairs.
*/
    if(gParam.fragmentPattern == 'L')
    {
        AdjustPeptideMW(firstMassPtr);
    }

/*
*   Change output filename if no output filename specified - start name plus ".lut".
*   martin 98/8/27
*/
    ChangeOutputName();

/*
*   If the peptideMW is obtained from the data file header, then the sequence tag cannot be
*   set up properly until peptideMW is known. Now that the data file has been read, I can
*   set this up correctly.
*/ SetupSequenceTag(); /* Assign space to the various arrays.*/ sequenceNode = (SCHAR *) malloc(gGraphLength * sizeof(SCHAR )); /*Will contain summary of evidence.*/ if (sequenceNode == NULL) { printf("main: Out of memory"); exit(1); } sequenceNodeC = (SCHAR *) malloc(gGraphLength * sizeof(char )); /*Will contain C-terminal evidence.*/ if (sequenceNodeC == NULL) { printf("main: Out of memory"); exit(1); } sequenceNodeN = (SCHAR *) malloc(gGraphLength * sizeof(char )); /*Will contain N-terminal evidence.*/ if (sequenceNodeN == NULL) { printf("main: Out of memory"); exit(1); } oneEdgeNodes = (int *) malloc(gGraphLength * sizeof(INT_4 )); /*Will contain evidence that only connects w/ the C-terminus.*/ if (oneEdgeNodes == NULL) { printf("main: Out of memory"); exit(1); } /* If the gParam.maxGapNum is equal to -1, then assign a value based on gParam.peptideMW.*/ if (gParam.maxGapNum == -1) { if (gParam.peptideMW < 1400) { gParam.maxGapNum = 1; } else if (gParam.peptideMW >= 1400 && gParam.peptideMW < 2000) { gParam.maxGapNum = 2; } else if (gParam.peptideMW >= 2000) { gParam.maxGapNum = 3; } } /* * Multiply the gElementMass and gMonoMass values to give integer numbers for the corresponding * arrays of gElementMass_x100 and gMonoMass_x100. These latter arrays are used to represent the * fractional mass values of the elemental and amino acid masses. The defined value of GRAPH_LENGTH * is divided by 10, 100, 1000, 10,000 and 100,000 until a value less than 10,000 is obtained. Also, * this is where gParam fields that are mass-related are multiplied by the gMultiplier value. */ CreateGlobalIntegerMassArrays(firstMassPtr); /* * Make and modify the array gGapList so that it incorporates information about amino acids * that are missing, plus it alters the mass of cysteine based on the value of cysMW. The first * gAminoAcidNumber positions in gGapList contain the nominal residue mass values for the amino * acids, except that the masses of Gln and Ile are assigned zero. 
Hence, I need to be careful * about this throughout the program, and I often have an if statement that prevents using * gGapList values of zero. Positions after gAminoAcidNumber contain residue masses for * two amino acids. * * The other global arrays (gNomMass, gSingAA, etc.) remain intact. Later, in * ScoreSequences I modify some globals that are global only to the functions within that file. * * SetupGapList uses gElementMass_x100 and gMonoMass_x100. */ SetupGapList(); /* * MakeSequenceGraph assigns values to the array of INT_4 's sequenceNodeC[GRAPH_LENGTH] * and sequenceNodeN[GRAPH_LENGTH]. * The indexing of these arrays corresponds to nominal mass values of hypothetical b-type * ions, and the values assigned to each node is an estimation of the likelihood that * there is a real cleavage at that mass. * * MakeSequenceGraph uses gElementMass_x100 and gMonoMass_x100. * * Here's where I would start looping at different pParam.peptideMW values in order to * determine average scores for incorrect sequences. */ for (i = 0; i <= gParam.wrongSeqNum; i++) /*initialize*/ { gWrongXCorrScore[i] = 0; /*holds the best wrong cross-correlation scores*/ gWrongIntScore[i] = 0; /*holds the best wrong intensity scores*/ gWrongProbScore[i] = 0; /*holds the best wrong Pevzner score*/ gWrongQualityScore[i] = 0; /*holds the best wrong quality*/ gWrongComboScore[i] = 0; /*holds the best combined score*/ } actualPeptideMW = gParam.peptideMW; /*save the real peptide mass*/ actualTopSeqNum = gParam.topSeqNum; /*save the real max subsequence number*/ actualFinalSeqNum = gParam.finalSeqNum; /*save the real max final candidate sequence number*/ posNeg = 1; /*goes back and forth between +1 and -1, see below*/ /*Here's the loop where i is negative and works towards zero, which represents the correct mass. If i is -10 then -9, massChange becomes -5 and -5. However, each time thru the loop posNeg goes back and forth from +1 to -1, so in the end massChange is -5 then +5. 
These are the numbers that get multiplied by a methylene mass and then added to the correct peptide mass.*/ for (i = -1 * gParam.wrongSeqNum; i <= 0; i++) { if (i != 0) { massChange = (REAL_4)i / 2 - 0.5; /*since i is neg I need to subtract 0.5 to round down*/ massChange = massChange * posNeg; /*go pos and neg*/ posNeg = posNeg * -1; /*go back and forth -1 +1 -1 +1 on and on*/ gCorrectMass = FALSE; /*for wrong masses use a smaller seqNum to speed the processing*/ gParam.topSeqNum = 1000; gParam.finalSeqNum = 5000; } else { massChange = 0; /*no mass change when i = 0, cuz thats the loop for the correct MW*/ gCorrectMass = TRUE; /*loop is for correct mass*/ gParam.topSeqNum = actualTopSeqNum; /*use correct subseq num for correct mass*/ gParam.finalSeqNum = actualFinalSeqNum;/*use correct final sequence num */ } /*gParam.peptideMW gets changed for the remainder of the loop*/ gParam.peptideMW = actualPeptideMW + massChange * (/*2 * gElementMass_x100[HYDROGEN]*/ + gElementMass_x100[CARBON]); /*differences of a methylene is debatable; I think it might be bad idea now*/ MakeSequenceGraph(firstMassPtr, sequenceNode, sequenceNodeC, sequenceNodeN, gIonTypeWeightingTotal); /* * SummedNodeScore connects the nodes starting from the C-terminal node(s) that differ by the * nominal mass of an amino acid residue. There may be several C-terminal nodes if the peptide * mass error is sufficiently large, all of which are used independently of each other. Those * nodes that can be connected to the C-terminus are given a bonus score to differentiate them * from those nodes that do not. This is also * where the C-terminal one-edge nodes are found and stored in the array oneEdgeNodes, and * has a maximum of oneEdgeNodesIndex (ie, if oneEdgeNodesIndex is 26, then oneEdgeNodes has values * for [0 to 25]). The altered node values are held in the array sequenceNode. 
The arrays * sequenceNodeN and sequenceNodeC were obtained from the function MakeSequenceGraph and are used * as the input information for SummedNodeScore. At one time, this function summed the node * scores that lead up to it, but I found that this tended to overly dominate the subsequencing * scores in a bad way. What worked best is to add a bonus score to any node that connects to the * C-terminus. However, the name of the function remains - SummedNodeScore - even though it * doesn't sum the node scores. Those bits of evidence in either SequenceNodeC or SequenceNodeN * that cannot connect to the C-terminus are added together to give a relatively low node * value. Also, this is only done for stretches of consecutive nodes where none of them * are able to connect to the C-terminus (ie, nodes adjacent to one that does connect are not * included in the final array sequenceNode). * * SummedNodeScore uses gElementMass_x100 and gMonoMass_x100. */ SummedNodeScore(sequenceNode, sequenceNodeC, sequenceNodeN, oneEdgeNodes, &oneEdgeNodesIndex, gIonTypeWeightingTotal); /* * GetAutoTag finds bits of sequences using only the m/z region between the precursor ion and * above. Ions of type y are only considered unless there are pairs of ions that differ by 28 * where the higher m/z pair is of greater intensity. The only charge state considered is * one less than the precursor charge. For LCQ data, where both b and y ions are usually seen * above the precursor, this procedure can be a bit dangerous. I've found situations where * the correct sequence is eliminated because the correct sequence tag is not found. Due * to the ambiguity of LCQ data w/ respect to it being a b or a y, I don't use the auto-tag * feature for trap data. It works great for TSQ data, though. * * GetAutoTag uses gElementMass_x100 and gMonoMass_x100. 
*/ if ((gParam.fragmentPattern == 'L' || gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q') && gParam.chargeState > 1 && gParam.autoTag) { GetAutoTag(firstMassPtr, sequenceNode); } /* * Now that the node scores have been finalized (in the array sequenceNode) and the C-terminal * one-edged nodes have been identified, it is time to start building up subsequences from the * N-terminus. Again, I connect the nodes that are spaced one or two amino acid residues apart, * but now I need to remember how the nodes were connected. In SummedNodeScore, all that was * important was knowing that there was some way to connect the nodes from the C-terminus, whereas * here I need to keep track of the pathway from the N-terminus. The output is the final list of * completed sequences, which is contained in a struct of type Sequence. The function * SubsequenceMaker returns a pointer to a struct of type Sequence, which is the first element in * a linked list of completed sequences plus the associated subsequence score. If there were * no completed sequences, then the function returns a NULL value. * * SubsequenceMaker uses gElementMass_x100 and gMonoMass_x100. */ firstSequencePtr = SubsequenceMaker(oneEdgeNodes, oneEdgeNodesIndex, sequenceNode); /* * Next add subsequences that do not necessarily connect to either termini (process called Haggis). * Mass scrambles for statistics has to be turned off, since changes in peptide MW will not alter the * results. */ if(gParam.wrongSeqNum == 0) { firstSequencePtr = Haggis(firstSequencePtr, firstMassPtr); } /* * Next, the list of sequences in the linked list starting w/ firstSequencePtr are scored. * To do this, I need the CID data (firstMassPtr), the peptide molecular weight, the fragment * ion error, the charge state of the precursor ion, and the mass of cysteine. The latter is * used to determine if certain alkylating groups are present that give rise to specific types * of fragment ions. 
The return is a pointer to the ranked and scored list of sequences. * The linked list starting with firstMassPtr is intact, but all lists of sequences, except * for the returned linked list, are freed. In addition, I need to know the monoToAv mass * switch, the sequence tag information (to add the sequence tag back in), and the N-terminal * modification. Intensity-based scores are determined for each sequence, and the top * MAX_X_CORR_NUM sequences are assigned cross-correlation scores. In the end, some combination * of these two scores will provide a list of sequences to be submitted for FASTA database * analysis. * * ScoreSequences uses gElementMass and gMonoMass instead of the _x100 arrays. */ if (firstSequencePtr != NULL) { firstSequencePtr = ScoreSequences(firstSequencePtr, firstMassPtr); gFirstTimeThru = FALSE; /*forever false after first time thru loop*/ SetupGapList(); /*The gGapList can get changed in the scoring, so it's returned to the original values*/ FreeSequence(firstSequencePtr); /*Get rid of the sequences, cuz you'll be getting a new set soon*/ } else if (gCorrectMass) { PrintPartingGiftToFile(); } else { gWrongIndex++; /*No sequences to score, so the best score is zero, which is what the gScore arrays were normalized to*/ } } /*end of gParam.peptideMW looping*/ /*trash these things*/ free(sequenceNodeC); free(sequenceNodeN); free(oneEdgeNodes); free(sequenceNode); /* Free up the linked lists.*/ /* JAT - Why not free if win32?
*/
#if (__dest_os != __win32_os)
/*List of completed sequences from subsequencing routine*/ FreeMassList(firstMassPtr); /*List of ions and intensities*/
#endif
fflush(stdout); } /* //-------------------------------------------------------------------------------- // BuildPgmState() //-------------------------------------------------------------------------------- */ static void BuildPgmState(INT_4 argc, CHAR **argv) { INT_4 c; extern CHAR *optarg; extern INT_4 optind; /* initialize parameters */ gParam.fMonitor = TRUE; gParam.fVerbose = TRUE; strcpy(gParam.paramFile,"Lutefisk.params"); strcpy(gParam.outputFile,""); strcpy(gParam.detailsFilename,"Lutefisk.details"); strcpy(gParam.residuesFilename,"Lutefisk.residues"); /* get command-line parameters */ /* strncpy does not null-terminate when the source fills the buffer, so each copy below leaves room for, and writes, the terminator. */ while ((c = getopt(argc, argv, "?hqvd:o:m:p:r:s:")) != -1) { switch (c) { case 'o': /* output file name */ strncpy(gParam.outputFile, optarg, sizeof(gParam.outputFile) - 1); gParam.outputFile[sizeof(gParam.outputFile) - 1] = '\0'; break; case 'd': /* details file name */ strncpy(gParam.detailsFilename, optarg, sizeof(gParam.detailsFilename) - 1); gParam.detailsFilename[sizeof(gParam.detailsFilename) - 1] = '\0'; break; case 'm': /* peptide MW */ gParam.peptideMW = atof(optarg); break; case 'p': /* param file name */ strncpy(gParam.paramFile, optarg, sizeof(gParam.paramFile) - 1); gParam.paramFile[sizeof(gParam.paramFile) - 1] = '\0'; break; case 'q': /* QUIET!
*/ gParam.fMonitor = FALSE; gParam.fVerbose = FALSE; break; case 'r': /* residues file name */ strncpy(gParam.residuesFilename, optarg, sizeof(gParam.residuesFilename) - 1); gParam.residuesFilename[sizeof(gParam.residuesFilename) - 1] = '\0'; /* strncpy may not terminate */ break; case 's': /* database sequences file */ strncpy(gParam.databaseSequences, optarg, sizeof(gParam.databaseSequences) - 1); gParam.databaseSequences[sizeof(gParam.databaseSequences) - 1] = '\0'; /* strncpy may not terminate */ break; case 'v': /* verbose */ gParam.fVerbose = TRUE; break; case '?': case 'h': /* print usage */ puts("\nUSAGE: lutefisk [options] [CID file pathname]\n"); puts( " -o = output file pathname"); puts( " -q = quiet mode ON (default OFF)"); puts( " -m = precursor ion mass"); puts( " -d = details file pathname"); puts( " -p = params file pathname"); puts( " -r = residues file pathname"); puts( " -s = pathname of file with database sequences to score"); puts( " -v = verbose mode ON (default OFF)"); puts( " -h = print this help text"); puts( "" ); puts("\n"); exit(1); break; } } /* report flag state */ if (gParam.fMonitor) { printf("Verbose mode %s\n", gParam.fVerbose ? "ON" : "OFF"); } /* get the parameter file name if one is specified */ /*XXXX if (optind < argc) { strcpy(gParam.paramFile, argv[argc-1]); } */ return; } /* //-------------------------------------------------------------------------------- // FindTheMultiplier() //-------------------------------------------------------------------------------- FindTheMultiplier uses the value of gParam.fragmentErr to determine the value of gMultiplier. This is also where a value is determined for GRAPH_LENGTH, which will replace the #defined value of GRAPH_LENGTH.
*/ void FindTheMultiplier(void) { INT_4 multiplier; REAL_4 testMass; multiplier = 1; testMass = multiplier * gParam.fragmentErr; if (testMass > MULTIPLIER_SWITCH) /* A value of two implies that there are a total of 5 nodes (2 on each side)*/ { gMultiplier = 1; } else { multiplier = 10; testMass = multiplier * gParam.fragmentErr; if (testMass > MULTIPLIER_SWITCH) { gMultiplier = 10; } else { multiplier = 100; testMass = multiplier * gParam.fragmentErr; if (testMass > MULTIPLIER_SWITCH) { gMultiplier = 100; } else { /* 1000 is the largest multiplier used, whether or not the scaled error exceeds MULTIPLIER_SWITCH.*/ gMultiplier = 1000; } } } /* Now calculate GRAPH_LENGTH, which will be a bit larger than required for the peptide mass.*/ gGraphLength = gMultiplier * (gParam.peptideMW + gParam.wrongSeqNum * (2 * gElementMass[HYDROGEN] + gElementMass[CARBON])) * 1.1; return; } /* //-------------------------------------------------------------------------------- // ChangeOutputName() //-------------------------------------------------------------------------------- If no output filename specified, set the output filename to the CID filename + ".lut". martin 98/8/27 modified 000310 JAT */ void ChangeOutputName(void) { if (strlen(gParam.outputFile) == 0) { /* There wasn't an output filename specified. */ char outputFile[256]; INT_4 length; INT_4 fileCount; /* Start from the CID filename */ strcpy (outputFile, gParam.cidFilename); length = strlen(outputFile); /* Add ".lut" to the end of the name (replacing .dta, etc.)
*/ if ((length > 4) && (!strncmp(outputFile + length - 4, ".dta", 4) || !strncmp(outputFile + length - 4, ".DTA", 4) || !strncmp(outputFile + length - 4, ".dat", 4) || !strncmp(outputFile + length - 4, ".DAT", 4) || !strncmp(outputFile + length - 4, ".txt", 4) || !strncmp(outputFile + length - 4, ".TXT", 4)) ) { strcpy(outputFile + length - 4, ".lut"); } else { strcat(outputFile, ".lut"); } /* Make sure that the file doesn't already exist. If it does, append a number. */ strcpy(gParam.outputFile, outputFile); fileCount = 1; while (1) { FILE *fp = fopen(gParam.outputFile, "r"); if (NULL == fp) break; fclose(fp); strcpy(gParam.outputFile, outputFile); sprintf(gParam.outputFile + strlen(gParam.outputFile), "%d\0", fileCount++); if (fileCount > 20) { printf("Too many old output files! Please clean up a bit first! Quitting."); exit(1); } } } return; } /* //-------------------------------------------------------------------------------- // CreateGlobalIntegerMassArrays() //-------------------------------------------------------------------------------- The value of GRAPH_LENGTH is divided by 10, 100, 1000, 10000, and 100000 until a value less than 10,000 is obtained. The divisor obtained is then used to multiply the float values in gElementMass and gMonoMass (correctly rounded) to be placed in the arrays gElementMass_x100 and gMonoMass_x100 (so-named because initially I will use monoisotopic masses down to 2 decimal points). 
*/ void CreateGlobalIntegerMassArrays(struct MSData *firstMassPtr) { INT_4 i; REAL_4 correction; struct MSData *currPtr; /* Create integer values for the monoisotopic masses of amino acids.*/ for (i = 0; i < gAminoAcidNumber; i++) { gMonoMass_x100[i] = (gMonoMass[i] * gMultiplier) + 0.5; } /* Create integer values for gNodeCorrection.*/ for (i = 0; i < gAminoAcidNumber; i++) { correction = (gMonoMass[i] * gMultiplier * 10) - (gMonoMass_x100[i] * 10); if (correction >= 0) { gNodeCorrection[i] = correction + 0.5; } else { gNodeCorrection[i] = correction - 0.5; } } for (i = gAminoAcidNumber; i < MAX_GAPLIST; i++) { gNodeCorrection[i] = 0; } /* Create integer values for the monoisotopic masses of the six elements used by the program.*/ for (i = 0; i < ELEMENT_NUMBER; i++) { gElementMass_x100[i] = (gElementMass[i] * gMultiplier) + 0.5; } /* Create integer values for gElementCorrection.*/ for (i = 0; i < ELEMENT_NUMBER; i++) { correction = (gElementMass[i] * gMultiplier * 10) - (gElementMass_x100[i] * 10); if (correction >= 0) { gElementCorrection[i] = correction + 0.5; } else { gElementCorrection[i] = correction - 0.5; } } /* Create the appropriate integer values for the mass variables found in the .params file.*/ gParam.peptideMW = gParam.peptideMW * gMultiplier; gParam.monoToAv = gParam.monoToAv * gMultiplier; gParam.peptideErr = gParam.peptideErr * gMultiplier; gParam.fragmentErr = gParam.fragmentErr * gMultiplier; gParam.qtofErr = gParam.qtofErr * gMultiplier; gParam.ionOffset = gParam.ionOffset * gMultiplier; gParam.cysMW = gParam.cysMW * gMultiplier; gParam.tagNMass = gParam.tagNMass * gMultiplier; gParam.tagCMass = gParam.tagCMass * gMultiplier; gParam.peakWidth = gParam.peakWidth * gMultiplier; gParam.modifiedNTerm = gParam.modifiedNTerm * gMultiplier; gParam.modifiedCTerm = gParam.modifiedCTerm * gMultiplier; gAvMonoTransition = AV_MONO_TRANSITION * gMultiplier; gWater = WATER * gMultiplier; gAmmonia = AMMONIA * gMultiplier; gCO = CO * gMultiplier; 
gAvResidueMass = AV_RESIDUE_MASS * gMultiplier; /* Convert the list of real data peaks to integer data peaks.*/ currPtr = firstMassPtr; while (currPtr != NULL) { currPtr->mOverZ = (INT_4)(currPtr->mOverZ * gMultiplier + 0.5); currPtr = currPtr->next; } return; } /* //-------------------------------------------------------------------------------- // FreeSequenceScore() //-------------------------------------------------------------------------------- Free linked list of SequenceScore structs. */ void FreeSequenceScore(struct SequenceScore *currPtr) { struct SequenceScore *freeMePtr; while (currPtr != NULL) { freeMePtr = currPtr; currPtr = currPtr->next; free(freeMePtr); } return; } /* //-------------------------------------------------------------------------------- // FreeMassList() //-------------------------------------------------------------------------------- Free linked list of MSData structs. */ void FreeMassList(struct MSData *currPtr) { struct MSData *freeMePtr; while (currPtr != NULL) { freeMePtr = currPtr; currPtr = currPtr->next; free((INT_4*)freeMePtr); /****(INT_4*)****/ } return; } /* //-------------------------------------------------------------------------------- // FreeSequence() //-------------------------------------------------------------------------------- Free linked list of Sequence structs. */ void FreeSequence(struct Sequence *currPtr) { struct Sequence *freeMePtr; while (currPtr != NULL) { freeMePtr = currPtr; currPtr = currPtr->next; free(freeMePtr); } return; } /* //-------------------------------------------------------------------------------- // ReadDetailsFile() //-------------------------------------------------------------------------------- This function reads in the ion values from the Lutefisk.details editable ASCII file. The ion values are used to determine the values to assign to the nodes in the sequence graph that is to be developed later in the program.
In addition to assigning values to each of the ion types to be used, there are three possible ways of setting up the sequence graph. The first is called "General", and is an all-purpose fragmentation pattern where very little is assumed about the peptide. The second is called "Tryptic", and it assumes that the CID data is for a tryptic multiply charged precursor and that the CID was performed in a quadrupole instrument under low energy CID conditions (this might also work for ion trap data - I don't know, I really just don't know). The third type of fragmentation pattern is called "Arg+1", and it is for singly charged precursor ions that contain arginine. The type of fragmentation pattern is passed using gParam.fragmentPattern. As the code below stands, 'G' reads the first column of Lutefisk.details, 'T' or 'Q' reads the second column, and 'L' (ion trap tryptic) reads the third column; an 'R' code for "Arg+1" is not handled here. */ INT_4 ReadDetailsFile(void) { FILE *fp; char stringBuffer[256]; INT_4 totalIonVal = 0; INT_4 i = 0; INT_4 value = 0; /* initialized so an unhandled fragmentPattern cannot store garbage */ REAL_4 valueMultiplier; fp = fopen(gParam.detailsFilename, "r"); if (fp == NULL) { printf("Cannot open Lutefisk details file."); exit(1); } while (!feof(fp)) { if (my_fgets(stringBuffer, 256, fp) == NULL) { continue; } i += 1; if (gParam.fragmentPattern == 'G') { /* Read in ion values for the general fragmentation pattern.*/ sscanf(stringBuffer, "%d %*d %*d", &value); } else if (gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q') { /* Read in ion values for the triple quad tryptic fragmentation pattern.*/ sscanf(stringBuffer, "%*d %d %*d", &value); } else if (gParam.fragmentPattern == 'L') { /* Read in ion values for the ion trap tryptic fragmentation pattern.*/ sscanf(stringBuffer, "%*d %*d %d", &value); } if (i == 1) { gWeightedIonValues.b = (INT_4)value; } else if (i == 2) { gWeightedIonValues.a = (INT_4)value; } else if (i == 3) { gWeightedIonValues.c = (INT_4)value; } else if (i == 4) { gWeightedIonValues.d = (INT_4)value; }
else if (i == 5) { gWeightedIonValues.b_minus17or18 = (INT_4)value; } else if (i == 6) { gWeightedIonValues.a_minus17or18 = (INT_4)value; } else if (i == 7) { gWeightedIonValues.y = (INT_4)value; } else if (i == 8) { gWeightedIonValues.y_minus2 = (INT_4)value; } else if (i == 9) { gWeightedIonValues.y_minus17or18 = (INT_4)value; } else if (i == 10) { gWeightedIonValues.x = (INT_4)value; } else if (i == 11) { gWeightedIonValues.z_plus1 = (INT_4)value; } else if (i == 12) { gWeightedIonValues.w = (INT_4)value; } else if (i == 13) { gWeightedIonValues.v = (INT_4)value; } else if (i == 14) { gWeightedIonValues.b_minusOH = (INT_4)value; } else if (i == 15) { gWeightedIonValues.b_minusOH_minus17 = (INT_4)value; } } fclose(fp); totalIonVal = gWeightedIonValues.b + gWeightedIonValues.a + gWeightedIonValues.c + gWeightedIonValues.d + gWeightedIonValues.b_minus17or18 + gWeightedIonValues.a_minus17or18 + gWeightedIonValues.y + gWeightedIonValues.y_minus2 + gWeightedIonValues.y_minus17or18 + gWeightedIonValues.x + gWeightedIonValues.z_plus1 + gWeightedIonValues.w + gWeightedIonValues.v + gWeightedIonValues.b_minusOH + gWeightedIonValues.b_minusOH_minus17; if (totalIonVal > 30) /*can't exceed value of a char, and 30 is well below 127*/ { valueMultiplier = 30 / (REAL_4)totalIonVal; gWeightedIonValues.b = (REAL_4)gWeightedIonValues.b * valueMultiplier + 0.5; gWeightedIonValues.a = (REAL_4)gWeightedIonValues.a * valueMultiplier + 0.5; gWeightedIonValues.c = (REAL_4)gWeightedIonValues.c * valueMultiplier + 0.5; gWeightedIonValues.d = (REAL_4)gWeightedIonValues.d * valueMultiplier + 0.5; gWeightedIonValues.b_minus17or18 = (REAL_4)gWeightedIonValues.b_minus17or18 * valueMultiplier + 0.5; gWeightedIonValues.a_minus17or18 = (REAL_4)gWeightedIonValues.a_minus17or18 * valueMultiplier + 0.5; gWeightedIonValues.y = (REAL_4)gWeightedIonValues.y * valueMultiplier + 0.5; gWeightedIonValues.y_minus2 = (REAL_4)gWeightedIonValues.y_minus2 * valueMultiplier + 0.5; gWeightedIonValues.y_minus17or18 
= (REAL_4)gWeightedIonValues.y_minus17or18 * valueMultiplier + 0.5; gWeightedIonValues.x = (REAL_4)gWeightedIonValues.x * valueMultiplier + 0.5; gWeightedIonValues.z_plus1 = (REAL_4)gWeightedIonValues.z_plus1 * valueMultiplier + 0.5; gWeightedIonValues.w = (REAL_4)gWeightedIonValues.w * valueMultiplier + 0.5; gWeightedIonValues.v = (REAL_4)gWeightedIonValues.v * valueMultiplier + 0.5; gWeightedIonValues.b_minusOH = (REAL_4)gWeightedIonValues.b_minusOH * valueMultiplier + 0.5; gWeightedIonValues.b_minusOH_minus17 = (REAL_4)gWeightedIonValues.b_minusOH_minus17 * valueMultiplier + 0.5; totalIonVal = 30; } return totalIonVal; } /* //-------------------------------------------------------------------------------- // ReadEdmanFile() //-------------------------------------------------------------------------------- This function reads the data from the file Lutefisk.edman into the INT_4 array called gEdmanData[MAX_PEPTIDE_LENGTH][gAminoAcidNumber]. The array gEdmanData contains the nominal masses of the amino acids listed in each cycle, and gMaxCycleNum contains the number of cycles listed in lutefisk.edman. */ void ReadEdmanFile() { FILE *fp; char stringBuffer[256], test; char edmanChar[MAX_PEPTIDE_LENGTH][AMINO_ACID_NUMBER]; INT_4 i, j, k; /* Initialize some variables.*/ test = TRUE; gMaxCycleNum = 0; for (i = 0; i < MAX_PEPTIDE_LENGTH; i++) { for (j = 0; j < gAminoAcidNumber; j++) { gEdmanData[i][j] = 0; edmanChar[i][j] = 0; } } /*Open the file.*/ fp = fopen(gParam.edmanFilename, "r"); if (fp == NULL) { printf("Could not open the Edman file '%s'. 
Quitting.", gParam.edmanFilename); exit(1); } /*Read the information into a character array of single letter amino acid codes.*/ while (!feof(fp) && test) { test = FALSE; if (my_fgets(stringBuffer, 256, fp) == NULL) { continue; } j = 0; while (stringBuffer[j] >= 65 && stringBuffer[j] <= 121 && j <= 19) { test = TRUE; if (stringBuffer[j] >= 97) { stringBuffer[j] = stringBuffer[j] - 32; } edmanChar[gMaxCycleNum][j] = stringBuffer[j]; j+=1; } edmanChar[gMaxCycleNum][j] = 0; gMaxCycleNum += 1; } if (test == FALSE) /*If it got out of the while loop by having test = FALSE, then the value of gMaxCycleNum will be one too many.*/ { gMaxCycleNum = gMaxCycleNum - 1; } fclose(fp); /* Convert single letter code characters to nominal mass values.*/ for (i = 0; i < gMaxCycleNum; i++) { j = 0; if (edmanChar[i][j] == 'X') { k = 0; for (j = 0; j < gAminoAcidNumber; j++) { if (gGapList[j] != 0) { gEdmanData[i][k] = gGapList[j]; k++; } } } else { while (edmanChar[i][j] != 0) { for (k = 0; k < gAminoAcidNumber; k++) { if (gSingAA[k] == edmanChar[i][j]) { if (gGapList[k] == 0) { if (gSingAA[k] == 'Q' || gSingAA[k] == 'I') /*If gGapList is 0, because its Ile or Gln, then I need to make special provision for these two amino acids. In the end, these will come out looking like they are Lys and Leu, so maybe I'll need to fix that minor problem later on.*/ { if (gSingAA[k] == 'Q') { gEdmanData[i][j] = gGapList[K]; } if (gSingAA[k] == 'I') { gEdmanData[i][j] = gGapList[L]; } } else { printf("An absent amino acid was found in the Edman data file."); exit(1); } } else { gEdmanData[i][j] = gGapList[k]; } } } j++; } } } return; } /* //-------------------------------------------------------------------------------- // ReadParamsFile() //-------------------------------------------------------------------------------- This function reads parameters from the Lutefisk.params editable text file. Fields set with command-line options override the values found in the Lutefisk.params file. 
The values are stored in various fields of a global struct called gParam. The meanings of the
fields are as follows:

	fMonitor          = TRUE or FALSE and turns simulated teletype interface on/off (default = on)
	fVerbose          = TRUE or FALSE and spits out extra info if on (default = off)
	paramFile         = string (default = lutefisk.params)
	outputFile        = string (default = lutefisk.out)
	detailsFilename   = string (default = lutefisk.details)
	cidFilename       = name of the ASCII file containing the CID data.
	peptideMW         = molecular weight of the peptide (Da).
	chargeState       = charge on the precursor ion.
	fragmentErr       = fragment ion mass tolerance (Da).
	ionOffset         = any error in mass measurement that is consistent throughout the spectrum.
	                    This value is added to the observed m/z values in cidFilename.
	fragmentPattern   = G, T, L, or D for general, tryptic triple quad, tryptic LCQ fragmentations
	                    or default, where the program determines if it's T or L automatically.
	cysMW             = molecular weight of cysteine, including any possible modification.
	proteolysis       = T, K, E or D for tryptic, Lys-C, V8 digestion, or Asp-N. This is used
	                    in setting up the graph.
	centroidOrProfile = C, P, or D depending on if the CID data is centroid or profile data.
	                    D is the default value where the program determines this automatically.
	monoToAv          = the mass (Da) above which the observed masses are assumed to be average
	                    mass and below which the observed masses are assumed to be monoisotopic.
	                    The cutoff is gradual, and occurs over a range extending 400 Da below
	                    monoToAv.
	ionsPerWindow     = the number of ions in a SPECTRAL_WINDOW_WIDTH Da wide window; for
	                    multiply charged ions the window size becomes correspondingly narrower.
	aaPresent         = a character string of amino acid single letter codes w/o spaces that
	                    represent the amino acids known to be present in the peptide.
	aaAbsent          = a character string of amino acid single letter codes w/o spaces that
	                    represent the amino acids known to be absent in the peptide.
	modifiedNTerm     = N, A, C, or P for none, acetylated, carbamylated, or pyroglutamylated
	                    N-terminus.
	modifiedCTerm     = N or A for none or amidated C-terminal modification.
	tagNMass          = mass (Da) of the amino acid residues N-terminal to the sequence tag.
	tagSequence       = a character string of amino acid single letter codes w/o spaces that
	                    represent the amino acid sequence of the sequence tag.
	tagCMass          = mass (Da) of the amino acid residues C-terminal to the sequence tag.
	finalSeqNum       = the number of completed sequences that will be saved for final scoring.
	topSeqNum         = the number of subsequences allowed.
	extThresh         = a fraction applied to the top scoring sequence extension. All other
	                    extensions must have scores exceeding this value.
	maxExtNum         = the maximum number of extensions allowed per subsequence.
	maxGapNum         = the maximum number of two amino acid gaps in the subsequence.
	peakWidth         = the width of the ions at 10% of the height. A default of zero will
	                    cause the program to automatically determine the peak width if the data
	                    is in profile mode. If the data is centroided, then it defaults to a
	                    value of 2.
	ionThreshold      = the ion threshold times the average intensity in the spectrum is the
	                    threshold below which signals are discarded. The m/z's above the
	                    precursor use a threshold that is one-half of ionThreshold.
	autoTag           = Y or N. Yes will initiate the automatic sequence tag finder.
	peptideErr        = the peptide molecular weight tolerance (in Da).
	edmanDataFile     = the filename where Edman data is located.
	ionsPerResidue    = once the CID data is input, there is an upper limit on the total number
	                    of ions that will be used in the analysis. This upper limit is based on
	                    the ionsPerResidue x the number of "average residues" in the peptide.
	                    This "average residue" number is determined from the AV_RESIDUE_MASS
	                    value found in Lutefisk.h and peptideMW.
	CIDfileType       = F or T, depending on if the file is ASCII generated by the Finnigan TSQ
	                    List program, or a simple tab-delineated list.
*/ void ReadParamsFile(void) { FILE * fp; char stringBuffer[256], Leu = FALSE, Ile = FALSE, Gln = FALSE, Lys = FALSE; INT_4 i, j, length; char * setting; char * value; gParam.edmanPresent = FALSE; /*set to true if an edman file is present*/ fp = fopen(gParam.paramFile, "r"); if (fp == NULL) { /* fopen() returns NULL if it can't find the file. */ printf("Cannot open the parameter file '%s'\n", gParam.paramFile); goto problem; } while (!feof(fp)) /*While not at the end of file.*/ { if (my_fgets(stringBuffer, 256, fp) == NULL) { /* fgets returns NULL at the end of a file. Drop out of the current iteration, so that the while condition will terminate the loop.*/ continue; } setting = strtok(stringBuffer, ":"); /* Skip comment lines */ if (setting[0] == '/') continue; value = strtok(NULL, " \t"); if (NULL == value) continue; /* Skip settings w/o values. */ if (value[0] == '|') continue; /* printf("Token: '%s' Value: '%s'\n", setting, value); */ if (!strcmp(setting, "CID Filename")) /*----------------------------------*/ { if (!strlen(gParam.cidFilename)) { strcpy(gParam.cidFilename, value); } if (gParam.fVerbose) printf("CID file name = %s\n", gParam.cidFilename); } else if (!strcmp(setting, "CID Quality")) { gParam.quality = toupper(value[0]); if (gParam.quality == 'Y') { gParam.quality = TRUE; } else { gParam.quality = FALSE; } if (gParam.fVerbose) printf("Quality = %d\n", gParam.quality); } else if (!strcmp(setting, "Peptide MW")) /*------------------------------*/ { gParam.peptideMW = atof(value); if (gParam.fVerbose) printf("peptide MW = %f\n", gParam.peptideMW); } else if (!strcmp(setting, "Charge-state")) /*----------------------------*/ { gParam.chargeState = atoi(value); if (gParam.fVerbose) printf("charge state = %d\n", gParam.chargeState); } else if (!strcmp(setting, "MaxEnt3")) /*---------------------------------*/ { gParam.maxent3 = toupper(value[0]); if (gParam.maxent3 == 'Y') { gParam.maxent3 = TRUE; } else { gParam.maxent3 = FALSE; } if (gParam.fVerbose) 
printf("MaxEnt3 = %d\n", gParam.maxent3); } else if (!strcmp(setting, "Peptide Error (u)")) /*-----------------------*/ { gParam.peptideErr = atof(value); if (gParam.peptideErr < 0 || gParam.peptideErr > 5) { printf("The peptide error should be positive and less than or equal to 5.\n"); goto problem; } if (gParam.fVerbose) printf("peptide error = %f\n", gParam.peptideErr); } else if (!strcmp(setting, "Fragment Error (u)")) /*-----------------------*/ { gParam.fragmentErr = atof(value); if (gParam.fragmentErr < 0 || gParam.fragmentErr > 5) { printf("The fragment ion error should be positive and less than or equal to 5.\n"); goto problem; } if (gParam.fVerbose) printf("Fragment ion error = %f\n", gParam.fragmentErr); } else if (!strcmp(setting, "Final Fragment Err (u)")) /*------------------*/ { gParam.qtofErr = atof(value); if (gParam.qtofErr < 0 || gParam.qtofErr > 0.4) { printf("The qtof final fragment error should be less than 0.4.\n"); goto problem; } if (gParam.fVerbose) printf("Qtof fragment error = %f\n", gParam.qtofErr); } else if(!strcmp(setting, "Score threshold")) { gParam.outputThreshold = atof(value); if(gParam.outputThreshold <= 0 || gParam.outputThreshold >= 1) { printf("The output score threshold should be between 0 and 1\n"); goto problem; } if(gParam.fVerbose) printf("Final output score threshold = %f\n", gParam.outputThreshold); } else if (!strcmp(setting, "Max. 
Final Sequences")) /*------------------*/ { gParam.finalSeqNum = atoi(value); if (gParam.finalSeqNum < 0) { printf("The number of completed sequences must be positive.\n"); goto problem; } if (gParam.fVerbose) printf("Number of seqs to store = %d\n", gParam.finalSeqNum); } else if (!strcmp(setting, "Mass Scrambles for Statistics")) { gParam.wrongSeqNum = atoi(value); if (gParam.wrongSeqNum < 0) { printf("The number of mass scrambles is zero or higher.\n"); goto problem; } } else if (!strcmp(setting, "Number of sequences")) { gParam.outputSeqNum = atoi(value); if(gParam.outputSeqNum <= 0 || gParam.outputSeqNum > 50) { printf("The number of output sequences must be between 0 and 50.\n"); goto problem; } if(gParam.fVerbose) printf("Number of output sequences to display in final report = %d\n", gParam.outputSeqNum); } else if(!strcmp(setting, "Shoe Size (US)")) { gParam.shoeSize = atoi(value); if(gParam.shoeSize > 15) printf("Your feet are enormous.\n"); if(gParam.shoeSize < 5) printf("Your shoes are probably too tight.\n"); } else if (!strcmp(setting, "Max. Subsequences")) /*----------------------*/ { gParam.topSeqNum = atoi(value); if (gParam.topSeqNum < 0) { printf("The number of subsequences must be positive.\n"); goto problem; } if (gParam.fVerbose) printf("Number of seqs to display = %d\n", gParam.topSeqNum); } else if (!strcmp(setting, "CID File Type")) /*------------------------*/ { gParam.CIDfileType = toupper(value[0]); /* If the input file ends w/ .dat then it's Native and if .dta it's a dta file (unless it's been set as (Q) a Micromass .dta file which is a screwy variant). 
*/ length = strlen(gParam.cidFilename); if ((length > 4) && (!strncmp(gParam.cidFilename + length - 4, ".dta", 4)) && gParam.CIDfileType != 'Q') { gParam.CIDfileType = 'D'; } else if ((length > 4) && (!strncmp(gParam.cidFilename + length - 4, ".dat", 4))) { gParam.CIDfileType = 'N'; } if (gParam.CIDfileType != 'F' /* ICIS text file */ && gParam.CIDfileType != 'T' /* tab text file */ && gParam.CIDfileType != 'L' /* LCQ text file */ && gParam.CIDfileType != 'N' /* Finnigan '.dat' file */ && gParam.CIDfileType != 'D' /* Finnigan '.dta' file */ && gParam.CIDfileType != 'Q') /* Micromass pkl pseudo '.dta' format */ { printf("Unrecognized CID file type '%c'\n", gParam.CIDfileType); goto problem; } if (gParam.fVerbose) printf("CID file type = %c\n", gParam.CIDfileType); } else if (!strcmp(setting, "Profile/Centroid")) /*--------------------*/ { gParam.centroidOrProfile = toupper(value[0]); if (gParam.centroidOrProfile != 'C' && gParam.centroidOrProfile != 'P' && gParam.centroidOrProfile != 'A') { printf("Illegal value '%c' for the data centroid or profile type.\n", gParam.centroidOrProfile); goto problem; } if (gParam.fVerbose) printf("Centroid/Profile = %c\n", gParam.centroidOrProfile); } else if (!strcmp(setting, "Peak Width (u)")) /*----------------------*/ { gParam.peakWidth = atof(value); gParam.peakWidth = gParam.peakWidth / 2; /* Previously this was hard-coded as HALF_WINDOW, but it seemed like a good idea to make this a parameter in Lutefisk.gParam. However, the concept of "peakwidth" was more intuitive than "half-window", hence the change. To compensate, I divide the peak width by two to get the half-window.
Thus, I can simply replace HALF_WINDOW in the old code with peakWidth in the newer code without needing to worry about the differences.*/ if (gParam.peakWidth < 0 || gParam.peakWidth > 5) { printf("Peakwidth of zero invokes the autopeak find.\n"); printf("Otherwise choose a positive value less than or equal to 5\n"); goto problem; } if (gParam.fVerbose) printf("Peak width = %f\n", gParam.peakWidth); } else if (!strcmp(setting, "Ion Threshold")) /*-----------------------*/ { gParam.ionThreshold = atof(value); if (gParam.ionThreshold < 0 || gParam.ionThreshold >= 10) { printf("The ion threshold should be positive and less than 10.\n"); goto problem; } if (gParam.fVerbose) printf("Ion threshold = %f\n", gParam.ionThreshold); } else if (!strcmp(setting, "Mass Offset (u)")) /*---------------------*/ { gParam.ionOffset = atof(value); if (gParam.ionOffset < -2 || gParam.ionOffset > 2) { printf("I think your ion offset is kind of weird.\n"); goto problem; } if (gParam.fVerbose) printf("Ion offset = %f\n", gParam.ionOffset); } else if (!strcmp(setting, "Ions Per Window")) /*---------------------*/ { gParam.ionsPerWindow = atof(value); if (gParam.ionsPerWindow < 0) { printf("The ions per window should be positive.\n"); goto problem; } if (gParam.fVerbose) printf("Ions per window = %.1f\n", gParam.ionsPerWindow); } else if (!strcmp(setting, "Ions Per Residue")) /*--------------------*/ { gParam.ionsPerResidue = atof(value); if (gParam.ionsPerResidue < 0 || gParam.ionsPerResidue > 20) { printf("The ions per residue should be positive and no greater than 20.\n"); goto problem; } if (gParam.fVerbose) printf("Ions per residue = %.1f\n", gParam.ionsPerResidue); } else if (!strcmp(setting, "Transition Mass (u)")) /*-----------------*/ { gParam.monoToAv = atoi(value); if (gParam.monoToAv < 0) { printf("The switch mass should be positive.\n"); goto problem; } if (gParam.fVerbose) printf("Switch mass = %d\n", gParam.monoToAv); } else if (!strcmp(setting, "Fragmentation Pattern"))
/*---------------*/ { gParam.fragmentPattern = toupper(value[0]); if (gParam.fragmentPattern != 'L' /* Tryptic ion trap data */ && gParam.fragmentPattern != 'G' /* Currently G = general pattern and is not really supported or recommended*/ && gParam.fragmentPattern != 'T' /* Tryptic triple quad data */ && gParam.fragmentPattern != 'D' /* Default that signals the program to decide on its own whether the data is from a triple quad or an ion trap*/ && gParam.fragmentPattern != 'Q') /* QTOF data */ { printf("Lutefisk.gParam: triple quad, qtof or ion trap tryptic fragmentation pattern?\n"); goto problem; } if (gParam.fVerbose) printf("Fragmentation pattern = %c\n", gParam.fragmentPattern); } else if (!strcmp(setting, "Max. Gaps")) /*---------------------------*/ { gParam.maxGapNum = atoi(value); if (gParam.maxGapNum < -1 || gParam.maxGapNum > 5) { printf("The number of gaps per sequence should be less than 5.\n"); printf("A value of -1 signals an automatic gap determination based on peptide mass.\n"); goto problem; } if (gParam.fVerbose) printf("Max gaps = %d\n", gParam.maxGapNum); } else if (!strcmp(setting, "Extension Threshold")) /*----------------*/ { gParam.extThresh = atof(value); if (gParam.extThresh < 0 || gParam.extThresh >= 1) { printf("The extension threshold is a value from 0 to 1.\n"); goto problem; } if (gParam.fVerbose) printf("Extension threshold = %f\n", gParam.extThresh); } else if (!strcmp(setting, "Max. 
Extensions")) /*--------------------*/ { gParam.maxExtNum = atoi(value); if (gParam.maxExtNum < 0 || gParam.maxExtNum > 10) { printf("The number of extensions is a positive number less than 10.\n"); goto problem; } if (gParam.fVerbose) printf("Max extensions = %d\n", gParam.maxExtNum); } else if (!strcmp(setting, "Cysteine Mass")) /*----------------------*/ { gParam.cysMW = atof(value); if (gParam.cysMW < 0) { printf("Lutefisk.gParam: The cysteine molecular weight must be positive.\n"); goto problem; } if (gParam.fVerbose) printf("Cysteine MW = %f\n", gParam.cysMW); } else if (!strcmp(setting, "Proteolysis")) /*------------------------*/ { gParam.proteolysis = toupper(value[0]); if (gParam.proteolysis != 'T' && gParam.proteolysis != 'K' && gParam.proteolysis != 'E' && gParam.proteolysis != 'D' && gParam.proteolysis != 'N') { printf("Lutefisk.gParam: Type of proteolysis must be specified as:\n" "tryptic (T), Lys-c (K), V8 (E), Asp-N (D), or none (N).\n"); goto problem; } if (gParam.fVerbose) printf("Proteolysis = %c\n", gParam.proteolysis); } else if (!strcmp(setting, "Modified N-terminus")) /*---------------*/ { gParam.modifiedNTerm = atof(value); if (gParam.modifiedNTerm < 0 || (gParam.modifiedNTerm > gParam.peptideMW && gParam.peptideMW > 0)) { printf("The N-terminal mass is unreasonable. It is the R in R-NH-\n"); goto problem; } if (gParam.fVerbose) printf("Modified N-terminus = %f\n", gParam.modifiedNTerm); } else if (!strcmp(setting, "Modified C-terminus")) /*---------------*/ { gParam.modifiedCTerm = atof(value); if (gParam.modifiedCTerm < 0 || (gParam.modifiedCTerm > gParam.peptideMW && gParam.peptideMW > 0)) { printf("The C-terminal mass is unreasonable. 
It is the R in -CO-R\n"); goto problem; } if (gParam.fVerbose) printf("Modified C-terminus = %f\n", gParam.modifiedCTerm); } else if (!strcmp(setting, "Auto Tag")) /*--------------------------*/ { gParam.autoTag = toupper(value[0]); if (gParam.autoTag == 'Y') { gParam.autoTag = TRUE; } else { gParam.autoTag = FALSE; } if (gParam.fVerbose) printf("Auto tag? = %d\n", gParam.autoTag); } else if (!strcmp(setting, "Tag Low Mass y Ion")) /*----------------*/ { gParam.tagCMass = atof(value); if (gParam.tagCMass < 0 || (gParam.tagCMass > gParam.peptideMW && gParam.peptideMW > 0)) { printf("The sequence tag low mass y ion is unreasonable.\n"); goto problem; } if (gParam.fVerbose) printf("Low mass y ion = %f\n", gParam.tagCMass); } else if (!strcmp(setting, "Sequence Tag")) /*-----------------------*/ { strcpy(gParam.tagSequence, value); for (i = 0; i < strlen(gParam.tagSequence); i++) { gParam.tagSequence[i] = toupper(gParam.tagSequence[i]); /* If Q is entered as part of the tag and the fragment error exceeds the K-Q mass difference, change it to the nearly isobaric K. If I is entered, change it to the isomeric L.*/ if (gParam.tagSequence[i] == 'Q' && gParam.fragmentErr > gMonoMass[K] - gMonoMass[Q]) { gParam.tagSequence[i] = 'K'; } if (gParam.tagSequence[i] == 'I') { gParam.tagSequence[i] = 'L'; } } if (gParam.fVerbose) printf("Sequence tag = %s\n", gParam.tagSequence); } else if (!strcmp(setting, "Tag High Mass y Ion")) /*----------------*/ { gParam.tagNMass = atof(value); if (gParam.tagNMass < gParam.tagCMass || (gParam.peptideMW > 0 && gParam.tagNMass > gParam.peptideMW + gElementMass[HYDROGEN] + gParam.fragmentErr)) { if (gParam.tagSequence[0] != '*') { printf("The sequence tag high mass y ion is unreasonable.\n"); goto problem; } } if (gParam.fVerbose) printf("High mass y ion = %f\n", gParam.tagNMass); } else if (!strcmp(setting, "Present Amino Acids")) /*----------------*/ { strcpy(gParam.aaPresent, value); /* Force to uppercase */ for (i = 0; i < strlen(gParam.aaPresent); i++) { gParam.aaPresent[i] = toupper(gParam.aaPresent[i]); } if
(strchr(gParam.aaPresent, 'I') && !strchr(gParam.aaPresent, 'L')) { strcat(gParam.aaPresent, "L"); } else if (strchr(gParam.aaPresent, 'L') && !strchr(gParam.aaPresent, 'I')) { /* strcat(gParam.aaPresent, "I");*/ } if (gParam.fragmentErr > 0.4) { if (strchr(gParam.aaPresent, 'K') && !strchr(gParam.aaPresent, 'Q')) { strcat(gParam.aaPresent, "Q"); } else if (strchr(gParam.aaPresent, 'Q') && !strchr(gParam.aaPresent, 'K')) { strcat(gParam.aaPresent, "K"); } } if (gParam.fVerbose) printf("Amino acids present = %s\n", gParam.aaPresent); } else if (!strcmp(setting, "Absent Amino Acids")) /*----------------*/ { strcpy(gParam.aaAbsent, value); /* Force to uppercase */ for (i = 0; i < strlen(gParam.aaAbsent); i++) { gParam.aaAbsent[i] = toupper(gParam.aaAbsent[i]); } if (strchr(gParam.aaAbsent, 'I') && !strchr(gParam.aaAbsent, 'L')) { strcat(gParam.aaAbsent, "L"); } else if (strchr(gParam.aaAbsent, 'L') && !strchr(gParam.aaAbsent, 'I')) { strcat(gParam.aaAbsent, "I"); } if (gParam.fragmentErr > 0.4) { if (strchr(gParam.aaAbsent, 'K') && !strchr(gParam.aaAbsent, 'Q')) { strcat(gParam.aaAbsent, "Q"); } else if (strchr(gParam.aaAbsent, 'Q') && !strchr(gParam.aaAbsent, 'K')) { strcat(gParam.aaAbsent, "K"); } } if (gParam.fVerbose) printf("Amino acids absent = %s\n", gParam.aaAbsent); } else if (!strcmp(setting, "Edman Data File")) { /* * The data is read into an array gEdmanData[cycle number][amino acids in the cycle], and * it contains the nominal mass values of the amino acids found in the specified file * (rather than a character listing of the amino acid single letter code). 
*/ gParam.edmanPresent = TRUE; strcpy(gParam.edmanFilename, value); if (gParam.fVerbose) printf("Edman present = %d\n", gParam.edmanPresent); if (gParam.fVerbose) printf("Edman file name = %s\n", gParam.edmanFilename); } else if (!strcmp(setting, "DB Sequence File")) { strcpy(gParam.databaseSequences, value); if (gParam.fVerbose) printf("Database sequence file name = %s\n", gParam.databaseSequences); } else { printf("Unrecognized token '%s' in %s.\n", setting, gParam.paramFile); goto problem; } } fclose(fp); /* Do multi-field interaction checks. */ /* * Make sure that the amino acids that are absent (aaAbsent) do not match with the ones that * are present (aaPresent) or in the sequence tag (tagSequence). */ if (gParam.aaAbsent[0] != '*') { if (gParam.aaPresent[0] != '*') { i = 0; while (gParam.aaAbsent[i] != 0) { j = 0; while (gParam.aaPresent[j] != 0) { if (gParam.aaPresent[j] == gParam.aaAbsent[i]) { printf("Amino acid '%c' has been listed as both present and absent.", gParam.aaPresent[j]); goto problem; } j++; } i++; } } if (gParam.tagSequence[0] != '*') { i = 0; while (gParam.aaAbsent[i] != 0) { j = 0; while (gParam.tagSequence[j] != 0) { if (gParam.tagSequence[j] == gParam.aaAbsent[i]) { printf("An amino acid has been listed both in the sequence tag and as absent."); goto problem; } j++; } i++; } } } /* Turn autotag off if a single tag is entered */ if ((gParam.tagSequence[0] != '*' && gParam.tagCMass != 0 && gParam.tagNMass != 0) && gParam.autoTag == TRUE) { gParam.autoTag = FALSE; } if ((gParam.CIDfileType == 'D' || gParam.CIDfileType == 'X') && gParam.centroidOrProfile != 'C') { printf("Forcing .dta file to be read as centroid data.\n"); gParam.centroidOrProfile = 'C'; /*force it to read centroid for dta files*/ } /* To keep the qtof final scoring in sync with the fragment error*/ if (gParam.fragmentErr > MULTIPLIER_SWITCH / 10) { gParam.qtofErr = 0; } if (gParam.qtofErr >= gParam.fragmentErr) { gParam.qtofErr = 0; } if (gParam.fragmentPattern != 'Q') {
gParam.qtofErr = 0; } /* Make sure maxent3 processing only allowed for qtof data*/ if (gParam.maxent3) { if (gParam.fragmentPattern != 'Q') { gParam.maxent3 = FALSE; } } /* Make sure that if the peak width is zero, that no auto peakfinding is done for centroided data, and instead give reasonable values for different instruments*/ if (gParam.centroidOrProfile == 'C' && gParam.peakWidth == 0) { if (gParam.fragmentPattern == 'T') { gParam.peakWidth = 1.5; } else if (gParam.fragmentPattern == 'Q') { gParam.peakWidth = 0.375; } else { gParam.peakWidth = 0.5; } } /* Check that the number of wrong sequences is reasonable, and then guess as to what was meant.*/ if (gParam.wrongSeqNum < 3) { gParam.wrongSeqNum = 0; /*neg nums no good, and 1 or 2 is not statistically significant*/ } /*For odd numbers, round up*/ gParam.wrongSeqNum = (REAL_4)gParam.wrongSeqNum / 2 + 0.5; gParam.wrongSeqNum = gParam.wrongSeqNum * 2; return; problem: printf("\nQuitting.\n"); exit(1); } /* //-------------------------------------------------------------------------------- // ReadResidueFile() //-------------------------------------------------------------------------------- This file reads the amino acid residue masses. Up to 25 residues are possible, including the twenty common ones. Although Q/K have the same nominal masses, they can be differentiated if the error tolerances are less than about 0.04 u. Although I and L are isomeric, I use the space for L to hold Leu or Ile and use the I position later in the program to store the values for oxidized Met, which can be differentiated from Phe if the error is less than 0.03 or so. Modifications to Cys can be made via an entry in Lutefisk.params. This leaves 5 positions -- J, O, U, X, and Z for additional modified amino acids. 
*/ void ReadResidueFile(void) { FILE *fp; char stringBuffer[256]; char singleAA; REAL_4 monoisotopic, average; INT_4 nominal; INT_4 i = 0; INT_4 monoToNom; fp = fopen(gParam.residuesFilename, "r"); if (fp == NULL) { printf("Cannot open Lutefisk residues file."); exit(1); } gAminoAcidNumber = 0; while (!feof(fp)) { if (my_fgets(stringBuffer, 256, fp) == NULL) { continue; } sscanf(stringBuffer, "%c %f %f %d", &singleAA, &monoisotopic, &average, &nominal); if (monoisotopic != 0 && average != 0 && nominal != 0) { gSingAA[gAminoAcidNumber] = singleAA; gMonoMass[gAminoAcidNumber] = monoisotopic; gAvMass[gAminoAcidNumber] = average; gNomMass[gAminoAcidNumber] = nominal; gAminoAcidNumber++; } } /*Check for mistakes*/ for (i = 0; i < gAminoAcidNumber; i++) { if (gMonoMass[i] < gAvMass[i] - 1 || gMonoMass[i] > gAvMass[i] + 1) { printf("*************************************************\n"); printf("You may have a typo in Lutefisk.residues for %c.\n", gSingAA[i]); printf("*************************************************\n"); } monoToNom = gMonoMass[i]; if (monoToNom != gNomMass[i]) { printf("*************************************************\n"); printf("You may have a typo in Lutefisk.residues for %c.\n", gSingAA[i]); printf("*************************************************\n"); } } return; } /* //-------------------------------------------------------------------------------- // SetupGapList() //-------------------------------------------------------------------------------- This function uses the values of cysMW and gNomMass to obtain a list of one and two amino acid extensions. Positions 0-19 contain the standard single amino acid extensions, where the values correspond to the nominal residue mass of the amino acid. Subsequent index values contain the two amino acid extensions. If any amino acids are listed in aaAbsent, then these are not used to generate either the two or one amino acid extensions. 
The number of values in the array gGapList is variable and depends on the number of amino acids that are absent - amino acids that are absent are not used to make two amino acid extensions. The number of extensions in the array gGapList is gGapListIndex. */ void SetupGapList() { INT_4 i, j, k, sum; char delAmAcid; INT_4 absentFlag; INT_4 duplicateFlag; INT_4 massToAdd; INT_4 lysGlnDiff = (gMonoMass_x100[K] - gMonoMass_x100[Q]) * 1.5; /*mass diff between Q and K plus 50%*/ struct MSData *currPtr = NULL; struct MSData *nextPtr = NULL; /* char singleAAPresent[AMINO_ACID_NUMBER]; REAL_4 massDiff; */ gGapListIndex = -1; for (i = 0; i < gAminoAcidNumber; i++) /*Copy the single aa extension masses.*/ { absentFlag = FALSE; if (gParam.aaAbsent[0] != '*') /* Check to see if the AA is on the absent list. */ { /* (We won't add it to the gap list if it is.) */ delAmAcid = gParam.aaAbsent[0]; j = 0; while (delAmAcid != 0 && (delAmAcid >= 'A' && delAmAcid <= 'Y')) { if (gSingAA[i] == delAmAcid) { absentFlag = TRUE; break; } j++; delAmAcid = gParam.aaAbsent[j]; } } if (absentFlag || i == I || (i == Q && gParam.fragmentErr >= lysGlnDiff)) /* Ile and Gln, which are represented by Leu and Lys.*/ { massToAdd = 0; } else if (i == C && gParam.cysMW != 0) { massToAdd = (gParam.cysMW) + 0.5; /*Change the mass for cysteine (in case its alkylated).*/ } else { massToAdd = gMonoMass_x100[i]; } gGapList[i] = massToAdd; } gGapListIndex = gAminoAcidNumber - 1; for (i = 0; i < gAminoAcidNumber; i++) /*Fill in the masses of the 2 AA extensions.*/ { for (j = i; j < gAminoAcidNumber; j++) { if (gGapList[i] == 0 || gGapList[j] == 0) continue; sum = gGapList[i] + gGapList[j]; /*sum = ((gMonoMass[i] + gMonoMass[j]) * gMultiplier) + 0.5;*/ duplicateFlag = FALSE; for (k = 0; k <= gGapListIndex; k++) { if (gGapList[k] > sum - gParam.fragmentErr && gGapList[k] < sum + gParam.fragmentErr) { /* We already have this mass so don't add it to the list. 
*/ duplicateFlag = TRUE; break; } } /*for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { for(m = gAminoAcidNumber; m <= gGapListIndex; m++) { if(sum - gGapList[k] <= gGapList[m] + gParam.fragmentErr && sum - gGapList[k] >= gGapList[m] - gParam.fragmentErr) { duplicateFlag = true; break; } } } }*/ if (!duplicateFlag) { gGapListIndex = gGapListIndex + 1; gGapList[gGapListIndex] = sum; } } } /*get rid of single amino acids that aren't there (save time later)*/ /* if(firstMassPtr == NULL || firstMassPtr->next == NULL) { printf("LutefiskMain: firstMassPtr = NULL"); exit(1); } for(i = 0; i < gAminoAcidNumber; i++) { singleAAPresent[i] = 0; } currPtr = firstMassPtr; while(currPtr->next != NULL) { nextPtr = currPtr->next; while(nextPtr != NULL) { massDiff = nextPtr->mOverZ - currPtr->mOverZ; if(massDiff <= gMonoMass_x100[W] + gParam.fragmentErr && massDiff >= gMonoMass_x100[G] - gParam.fragmentErr ) { for(i = 0; i < gAminoAcidNumber; i++) { if(gMonoMass_x100[i] != 0) { if(massDiff <= gMonoMass_x100[i] + gParam.fragmentErr && massDiff >= gMonoMass_x100[i] - gParam.fragmentErr) { singleAAPresent[i] = 1; break; } } } } nextPtr = nextPtr->next; } currPtr = currPtr->next; } for(i = 0; i < gAminoAcidNumber; i++) { gGapList[i] = gGapList[i] * singleAAPresent[i]; }*/ return; } /* //-------------------------------------------------------------------------------- // SetupSequenceTag() //-------------------------------------------------------------------------------- Convert from highY, lowY, tag to gParam.tagNMass, gParam.tagCMass, and gParam.tagSequence. Initially I had the user enter the unsequenced N-terminal mass, unsequenced C-terminal mass, and the sequence tag; however, in order to simplify the use of tags, I now mimic the PeptideSearch protocol where the low mass y ion is entered as a float, the high mass y ion is entered as a float, and the sequence tag is entered from low mass to high mass, which is C-terminal to N-terminal direction. 
For practical reasons (I've never used a b ion sequence tag), I only consider y ion tags. */ void SetupSequenceTag() { char tag[256]; REAL_4 highY, lowY, tagMass, aToMFactor; INT_4 i, k, j; /*Save old values for recalculations*/ highY = gParam.tagNMass; lowY = gParam.tagCMass; i = 0; while (gParam.tagSequence[i] != 0) { tag[i] = gParam.tagSequence[i]; i++; } tag[i] = 0; if (tag[0] == '*' || highY == 0 || lowY == 0) { gParam.tagSequence[0] = '*'; /*Make sure all of the tag parameters are set properly if there is no tag.*/ gParam.tagSequence[1] = '0'; gParam.tagCMass = 0; gParam.tagNMass = 0; } else { /*Make some quick checks to see if I should abort the process*/ if (highY > gParam.peptideMW) { printf("The high mass y ion of your sequence tag is greater than the peptide mass.\n"); exit(1); } if (highY < 0) { printf("Curiously, you have chosen a negative mass value for the high mass y ion.\n"); exit(1); } if (lowY > gParam.peptideMW) { printf("The low mass y ion of your sequence tag is greater than the peptide mass.\n"); exit(1); } if (lowY < 0) { printf("Curiously, you have chosen a negative mass value for the low mass y ion.\n"); exit(1); } printf("Specified sequence tag: [%8.3f] %s [%8.3f]\n", lowY, tag, highY); gParam.tagCMass = lowY - (2 * gElementMass[HYDROGEN]); gParam.tagNMass = gParam.peptideMW - (highY - (2 * gElementMass[HYDROGEN])); j = 0; k = 0; while (tag[k] != 0) { k++; } k--; while (tag[j] != 0) { gParam.tagSequence[j] = tag[k]; k--; j++; } } if (gParam.tagSequence[0] != '*') /*Make sure tag matches peptide MW.*/ { tagMass = gParam.tagCMass + gParam.tagNMass; j = 0; while (gParam.tagSequence[j] != 0) { for (k = 0; k < gAminoAcidNumber; k++) { if (gSingAA[k] == gParam.tagSequence[j]) { if (k == C) { tagMass = tagMass + gParam.cysMW; } else { tagMass = tagMass + gMonoMass[k]; } break; } } j++; } if (tagMass > gParam.monoToAv) { tagMass = tagMass * MONO_TO_AV; } else if (tagMass > gParam.monoToAv - AV_MONO_TRANSITION) { aToMFactor = (tagMass -
(gParam.monoToAv - AV_MONO_TRANSITION)) / AV_MONO_TRANSITION; aToMFactor = (MONO_TO_AV - 1) * aToMFactor; aToMFactor = 1 + aToMFactor; tagMass = tagMass * aToMFactor; } if ((tagMass > gParam.peptideMW + gParam.peptideErr) || (tagMass < gParam.peptideMW - gParam.peptideErr)) { printf("Your peptide tag does not match the peptide molecular weight.\n"); exit(1); } } return; } /* //-------------------------------------------------------------------------------- // SystemCheck() //-------------------------------------------------------------------------------- */ BOOLEAN SystemCheck(void) { BOOLEAN testValue = TRUE; BOOLEAN big_endian;
#if defined __BIG_ENDIAN
big_endian = TRUE;
#else
big_endian = FALSE;
#endif
{ UINT_4 test[2] = {0x41424344, 0x0}; /* ASCII "ABCD" in big endian */ /* printf ("%s\n", (char *)&test); */ if (big_endian) { if (strcmp((char *)&test, "ABCD")) { printf("Program should be set to __LITTLE_ENDIAN in LutefiskDefinitions.h\n"); testValue = FALSE; } } else if (!strcmp((char *)&test, "ABCD")) { printf("Program should be set to __BIG_ENDIAN in LutefiskDefinitions.h\n"); testValue = FALSE; } } /* sizeof yields a size_t, so cast to int to match the %d conversion. */ if (sizeof(REAL_4) != 4) { printf("REAL_4 is %d bytes instead of 4.\n", (int)sizeof(REAL_4)); testValue = FALSE; } if (sizeof(REAL_8) != 8) { printf("REAL_8 is %d bytes instead of 8.\n", (int)sizeof(REAL_8)); testValue = FALSE; } if (sizeof(INT_2) != 2) { printf("INT_2 is %d bytes instead of 2.\n", (int)sizeof(INT_2)); testValue = FALSE; } if (sizeof(UINT_2) != 2) { printf("UINT_2 is %d bytes instead of 2.\n", (int)sizeof(UINT_2)); testValue = FALSE; } if (sizeof(INT_4) != 4) { printf("INT_4 is %d bytes instead of 4.\n", (int)sizeof(INT_4)); testValue = FALSE; } if (sizeof(UINT_4) != 4) { printf("UINT_4 is %d bytes instead of 4.\n", (int)sizeof(UINT_4)); testValue = FALSE; } if (sizeof(CHAR) != 1) { printf("CHAR is %d bytes instead of 1.\n", (int)sizeof(CHAR)); testValue = FALSE; } if (sizeof(BOOLEAN) != 1) { printf("BOOLEAN is %d bytes instead of 1.\n", (int)sizeof(BOOLEAN)); testValue = FALSE; }
return(testValue); } /****************************PrintPartingGiftToFile********************************************** * This function writes the header to the output file and then appends a note that no candidate * sequences could be found. */ void PrintPartingGiftToFile() { FILE *fp; PrintHeaderToFile(); /* Open the file for appending.*/ fp = fopen(gParam.outputFile, "a"); if(fp == NULL) /*fopen returns NULL if there's a problem.*/ { printf("Cannot open %s for appending.\n", gParam.outputFile); exit(1); } printf("\n\nNo potential candidate sequences could be found.\n\n"); fprintf(fp, "No potential candidate sequences could be found.\n\n"); fclose(fp); } /****************************PrintHeaderToFile*************************************************** * This function prints header information to the output file. */ void PrintHeaderToFile() { INT_4 i; FILE *fp; const time_t theTime = (const time_t)time(NULL); /* Open a new file.*/ fp = fopen(gParam.outputFile, "w"); if(fp == NULL) /*fopen returns NULL if there's a problem.*/ { printf("Cannot open %s to write the output.\n", gParam.outputFile); exit(1); } fprintf(fp, "%s", versionString); /*Never pass a non-literal string as the format argument.*/ fprintf(fp, "Run Date: %20s", ctime(&theTime)); /* Print header information from gParam to the file.*/ fprintf(fp, " Filename: "); /*Print the CID data file name.*/ i = 0; while(gParam.cidFilename[i] != 0) { fputc(gParam.cidFilename[i], fp); i++; } fprintf(fp, "\n Molecular Weight: %7.2f", gParam.peptideMW); fprintf(fp, " Molecular Weight Tolerance: %5.2f", gParam.peptideErr); fprintf(fp, " Fragment Ion Tolerance: %5.2f", gParam.fragmentErr); fprintf(fp, "\n Ion Offset: %5.2f", gParam.ionOffset); fprintf(fp, " Charge State: %2ld", gParam.chargeState); if(gParam.centroidOrProfile == 'P') { fprintf(fp, " Profile Data \n"); } else { fprintf(fp, " Centroided or Pre-processed Data \n"); } if(gParam.proteolysis == 'T') { fprintf(fp, " Tryptic Digest"); } if(gParam.proteolysis == 'K') { fprintf(fp, " Lys-C Digest"); } if(gParam.proteolysis == 'E') { fprintf(fp, " Glu-C Digest"); }
if(gParam.proteolysis == 'N') { fprintf(fp, " ??? Digest"); } if(gParam.fragmentPattern == 'G') { fprintf(fp, " Unknown Fragmentation Pattern \n"); } if(gParam.fragmentPattern == 'T') { fprintf(fp, " Tryptic Triple Quadrupole Fragmentation Pattern \n"); } if(gParam.fragmentPattern == 'L') { fprintf(fp, " Tryptic Ion Trap Fragmentation Pattern \n"); } if(gParam.fragmentPattern == 'Q') { fprintf(fp, " Tryptic QTOF Fragmentation Pattern \n"); } fprintf(fp, " Cysteine residue mass: %7.2f", gParam.cysMW); fprintf(fp, " Switch from monoisotopic to average mass at %d \n", gParam.monoToAv); fprintf(fp, " Ions per window: %.1f", gParam.ionsPerWindow); fprintf(fp, " Extension Threshold: %4.2f", gParam.extThresh); fprintf(fp, " Extension Number: %2ld", gParam.maxExtNum); fprintf(fp, "\n Gaps: %2ld", gParam.maxGapNum); fprintf(fp, " Peak Width: %4.1f", ((gParam.peakWidth) * 2)); fprintf(fp, " Data Threshold: %5.2f (%ld)", gParam.ionThreshold, gParam.intThreshold); fprintf(fp, " Ions per residue: %.1f", gParam.ionsPerResidue); fprintf(fp, "\n Amino acids known to be present: "); i = 0; while(gParam.aaPresent[i] != 0) { fputc(gParam.aaPresent[i], fp); i++; } fprintf(fp, "\n Amino acids known to be absent: "); i = 0; while(gParam.aaAbsent[i] != 0) { fputc(gParam.aaAbsent[i], fp); i++; } fprintf(fp, "\n"); fprintf(fp, "\n C-terminal mass: %7.4f", gParam.modifiedCTerm); fprintf(fp, "\n N-terminal mass: %7.4f", gParam.modifiedNTerm); fprintf(fp, "\n N-terminal Tag Mass: %7.2f", gParam.tagNMass); fprintf(fp, " C-terminal Tag Mass: %7.2f", gParam.tagCMass); fprintf(fp, " Sequence Tag: "); i = 0; while(gParam.tagSequence[i] != 0) { fputc(gParam.tagSequence[i], fp); i++; } if(gParam.edmanPresent) { fprintf(fp, "\n Edman data is available. "); } else { fprintf(fp, "\n Edman data is not available. 
"); } if(gParam.autoTag == TRUE) { fprintf(fp, "AutoTag ON"); } else { fprintf(fp, "AutoTag OFF"); } if(gParam.CIDfileType == 'T') { fprintf(fp, " CID data file is tab-delineated"); } if(gParam.CIDfileType == 'F') { fprintf(fp, " CID data file is Finnigan ASCII file"); } /* Pad the output with a few blank lines for later use.*/ fprintf(fp, "\n\n\n"); fclose(fp); return; } /******************************AdjustPeptideMW**************************************** * * Ion trap data usually has complementary pairs of ions (b and y ions for the same * cleavage. These pairs can be used to adjust the peptide MW, which is often not * very accurately determined in the MS scan. */ void AdjustPeptideMW(struct MSData *firstMassPtr) { REAL_4 mass[MAX_ION_NUM], peptide[MAX_ION_NUM], testMass, maxPeptideMass; REAL_4 pairMass[MAX_ION_NUM], mass2[MAX_ION_NUM]; REAL_4 lowMassIon[MAX_ION_NUM], water, ammonia, avePairMass; REAL_8 stDev; INT_4 ionNum, i, j, peptideNum, nominalPeptide[MAX_ION_NUM], maxPeptideNum; INT_4 nominalCount, requiredPairs, pairNum; BOOLEAN test; struct MSData *currPtr; /*initialize*/ water = gElementMass[OXYGEN] + 2 * gElementMass[HYDROGEN]; ammonia = gElementMass[NITROGEN] + 3 * gElementMass[HYDROGEN]; peptideNum = 0; ionNum = 0; maxPeptideNum = 0; maxPeptideMass = 0; pairNum = 0; stDev = 0; avePairMass = 0; for(i = 0; i < MAX_ION_NUM; i++) { peptide[i] = 0; nominalPeptide[i] = 0; pairMass[i] = 0; mass[i] = 0; mass2[i] = 0; } if(gParam.peptideMW < 750) { requiredPairs = 1; /*need more than this number of pairs*/ } else if(gParam.peptideMW < 1500) { requiredPairs = 2; } else if(gParam.peptideMW < 2250) { requiredPairs = 3; } else { requiredPairs = 4; } /* Fill in the mass array assuming singly-charged ions.*/ currPtr = firstMassPtr; while(currPtr != NULL) { mass[ionNum] = currPtr->mOverZ - gElementMass[HYDROGEN]; ionNum++; currPtr = currPtr->next; } /* Fill in the mass array assuming doubly-charged ions.*/ if(gParam.chargeState > 2) { for(i = 0; i < ionNum; i++) { 
testMass = (mass[i] + gElementMass[HYDROGEN]) * 2 - 2 * gElementMass[HYDROGEN]; if(testMass < gParam.peptideMW - gMonoMass[G] + gParam.fragmentErr && testMass > 700) /*doubly charged ions have to be in the right mass range*/ { mass2[i] = testMass; } else { mass2[i] = 0; /*zero is a flag that it could not be a doubly-charged ion*/ } } } /* Find a suitable error*/ /*assume all ions are singly-charged*/ for(i = 0; i < ionNum - 1; i++) { for(j = i + 1; j < ionNum; j++) { testMass = mass[i] + mass[j]; if(testMass <= gParam.peptideMW + gParam.peptideErr * 2 && testMass >= gParam.peptideMW - gParam.peptideErr * 2) { pairMass[pairNum] = testMass; /*collect the data*/ pairNum++; } } } /*now assume that one of the pair is doubly-charged*/ if(gParam.chargeState > 2) { for(i = 0; i < ionNum - 1; i++) { for(j = i + 1; j < ionNum; j++) { if(mass2[j] > mass[i]) { testMass = mass[i] + mass2[j]; if(testMass <= gParam.peptideMW + gParam.peptideErr * 2 && testMass >= gParam.peptideMW - gParam.peptideErr * 2) { pairMass[pairNum] = testMass; /*collect the data*/ pairNum++; } } } } } if(pairNum < 3) { stDev = gParam.peptideErr; /*not enough pairs of ions to determine standard deviation so it gets defined as the peptide error from the params file*/ } else /*enough data to take a stab at finding standard deviation*/ { for(i = 0; i < pairNum; i++) { avePairMass += pairMass[i]; } avePairMass = avePairMass / pairNum; for(i = 0; i < pairNum; i++) { stDev += ((pairMass[i] - avePairMass) * (pairMass[i] - avePairMass)); } stDev = stDev / (pairNum - 1); stDev = sqrt(stDev); } /*reality checks*/ if(stDev > 2 * gParam.peptideErr) { stDev = 2 * gParam.peptideErr; /*don't let the error be too big*/ } else if(stDev < 0.5 * gParam.peptideErr) { stDev = 0.5 * gParam.peptideErr; /*or too small*/ } /*find pairs of masses that are close to the peptide molecular weight*/ /*first assume the ions are all singly-charged*/ for(i = 0; i < ionNum - 1; i++) { for(j = i + 1; j < ionNum; j++) { testMass = mass[i] + 
mass[j]; if(testMass <= gParam.peptideMW + stDev && testMass >= gParam.peptideMW - stDev) { peptide[peptideNum] = testMass; lowMassIon[peptideNum] = mass[i]; peptideNum++; } } } /*now assume that one of them is doubly-charged*/ if(gParam.chargeState > 2) { for(i = 0; i < ionNum - 1; i++) { for(j = i + 1; j < ionNum; j++) { if(mass2[j] > mass[i]) { testMass = mass[i] + mass2[j]; if(testMass <= gParam.peptideMW + stDev && testMass >= gParam.peptideMW - stDev) { peptide[peptideNum] = testMass; lowMassIon[peptideNum] = mass[i]; peptideNum++; } } } } } /*Find the correct nominal masses of the peptides that were identified*/ for(i = 0; i < peptideNum; i++) { nominalPeptide[i] = peptide[i] - (peptide[i] * 0.00050275) + 0.5; } /*Wipe out nominalPeptides that are derived from ions that are too close in mass */ for(i = 0; i < peptideNum; i++) { testMass = nominalPeptide[i]; if(testMass != 0) { for(j = 0; j < peptideNum; j++) { if(nominalPeptide[j] == testMass && i != j) { if(fabs(lowMassIon[i] - lowMassIon[j]) <= gElementMass[HYDROGEN] + gParam.fragmentErr) { test = TRUE; } else { test = FALSE; } /* test = TRUE; // if(fabs(lowMassIon[i] - lowMassIon[j]) >= ammonia - gParam.fragmentErr && // fabs(lowMassIon[i] - lowMassIon[j]) <= ammonia + gParam.fragmentErr) // { // test = FALSE; // } // if(fabs(lowMassIon[i] - lowMassIon[j]) >= water - gParam.fragmentErr && // fabs(lowMassIon[i] - lowMassIon[j]) <= water + gParam.fragmentErr) // { // test = FALSE; // } // if(fabs(lowMassIon[i] - lowMassIon[j]) >= gMonoMass[G] - gParam.fragmentErr) // { // test = FALSE; // }*/ if(test) /*if the two ions differ by a small mass thats not water or ammonia*/ { nominalPeptide[j] = 0; } } } } } /*Count the numbers of each nominal peptide mass*/ for(i = 0; i < peptideNum; i++) { testMass = nominalPeptide[i]; if(testMass != 0) { nominalCount = 0; for(j = 0; j < peptideNum; j++) { if(nominalPeptide[j] == testMass) { nominalCount++; } } if(nominalCount > maxPeptideNum) { maxPeptideNum = nominalCount; 
maxPeptideMass = testMass; } } } /*decide what to do with this information*/ if(maxPeptideNum > requiredPairs) /*need at least 3 pairs of ions to change the peptide MW*/ { maxPeptideMass = maxPeptideMass + (maxPeptideMass * 0.00050275); /*add estimated mass defect*/ testMass = fabs(maxPeptideMass - gParam.peptideMW); if(testMass <= gParam.peptideErr * 1.5) { maxPeptideNum = maxPeptideNum - requiredPairs; /*maxPeptideNum is now the number in excess of requiredPairs*/ testMass = maxPeptideMass * maxPeptideNum + gParam.peptideMW; testMass = testMass / (maxPeptideNum + 1); /*calculate the average of the theoretical and obsv'd MW*/ printf("Peptide MW was adjusted from %f to ", gParam.peptideMW); gParam.peptideMW = testMass; printf("%f using %ld ion pairs\n", gParam.peptideMW, maxPeptideNum + requiredPairs); } } return; } lutefisk-1.0.7+dfsg.orig/src/Makefile.sun0000644000175000017500000000351410124104201020170 0ustar rusconirusconi# #sun (bsd) # for mips, also use: -mips2 -O2 # CC= gcc -O CFLAGS= -D__SOLARIS LFLAGS= -lm -o BIN = /seqprg/slib/bin #NRAND= nrand #IBM RS/6000 NRAND= nrand48 RANFLG= -DRAND32 #HZ=60 for sun, mips, 100 for rs/6000, SGI, LINUX HZ=60 PROGS= lutefisk SPROGS= lutefisk .c.o: $(CC) $(CFLAGS) -c $< all : $(PROGS) sall : $(SPROGS) install : cp $(PROGS) $(BIN) clean-up : rm *.o $(PROGS) lutefisk : LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o $(CC) LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o $(LFLAGS) lutefisk LutefiskGlobalDeclarations.o : LutefiskGlobalDeclarations.c $(CC) $(CFLAGS) -c LutefiskGlobalDeclarations.c LutefiskMain.o : LutefiskMain.c $(CC) $(CFLAGS) -c LutefiskMain.c 
LutefiskGetCID.o : LutefiskGetCID.c $(CC) $(CFLAGS) -c LutefiskGetCID.c LutefiskHaggis.o : LutefiskHaggis.c $(CC) $(CFLAGS) -c LutefiskHaggis.c LutefiskMakeGraph.o : LutefiskMakeGraph.c $(CC) $(CFLAGS) -c LutefiskMakeGraph.c LutefiskSummedNode.o : LutefiskSummedNode.c $(CC) $(CFLAGS) -c LutefiskSummedNode.c LutefiskSubseqMaker.o : LutefiskSubseqMaker.c $(CC) $(CFLAGS) -c LutefiskSubseqMaker.c LutefiskScore.o : LutefiskScore.c $(CC) $(CFLAGS) -c LutefiskScore.c LutefiskXCorr.o : LutefiskXCorr.c $(CC) $(CFLAGS) -c LutefiskXCorr.c LutefiskFourier.o : LutefiskFourier.c $(CC) $(CFLAGS) -c LutefiskFourier.c LutefiskGetAutoTag.o : LutefiskGetAutoTag.c $(CC) $(CFLAGS) -c LutefiskGetAutoTag.c ListRoutines.o : ListRoutines.c $(CC) $(CFLAGS) -c ListRoutines.c lutefisk-1.0.7+dfsg.orig/src/LutefiskXCorr.c0000644000175000017500000007626310474603032020665 0ustar rusconirusconi/********************************************************************************************* Lutefisk is software for de novo sequencing of peptides from tandem mass spectra. Copyright (C) 1995 Richard S. Johnson This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. Contact: Richard S Johnson 4650 Forest Ave SE Mercer Island, WA 98040 jsrichar@alum.mit.edu *********************************************************************************************/ /* Richard S. Johnson 6/96 - ? 
LutefiskXP is a program designed to aid in the interpretation of CID data of peptides. The main assumptions are that the data is of reasonable quality, the N- and C-terminal modifications (if any) are known, and the precursor ion charge (and therefore the peptide molecular weight) is known. The ultimate goal here is to develop code that can utilize msms data in conjunction with ambiguous and incomplete Edman sequencing data, sequence tags, peptide derivatization, and protein or EST database searches. An older version of LutefiskXP has been written in FORTRAN and runs on 68K Macs that have an FPU (1991, 39th ASMS Conference on Mass Spectrometry and Allied Topics, Nashville, TN, pp 1233-1234). This is a different and improved algorithm partly inspired by Fernandez-de-Cossio, et al. (1995) CABIOS Vol. 11 No. 4 pp 427-434. Combining this msms interpretation algorithm with Edman sequencing, database searches, and derivatization is entirely of my own design; J. Alex Taylor implemented the changes in the FASTA code (Bill Pearson, U. of VA) so that the LutefiskXP output can be read directly by the modified FASTA program. In addition, a number of other critical changes were made to FASTA to make it more compatible with msms sequencing data. The trademark LutefiskXP was chosen at random, and is not meant to imply any similarity between this computer program and the partially base-hydrolyzed cod fish of the same name (minus XP). 
*/ #include #include #include #include /* Lutefisk headers*/ #include "LutefiskPrototypes.h" #include "LutefiskDefinitions.h" #define MAXPARENTCHARGE 9 #define SIDE_PEAK_ATT 0.75 /*Peak heights on sides of main peak in mock spectrum.*/ #define PLUS1_NEUT_LOSS_ATT 0.5 /*Peak height for neutral losses of singly charged precursors.*/ #define NEUT_LOSS_ATT 0.1 /*Peak height for neutral losses of multiply charged precursors.*/ #define BAD_B_ATT 0.05 /*Peak heights for b ions that are not very likely.*/ #define A_ATT 0.5 /*Peak heights for a ions.*/ #define BAD_A_ATT 0.05 /*Peak heights for a ions that are not very likely.*/ #define INT_FRAG_ATT 0.1 /*peak heights for internal fragment ions.*/ #define BAD_Y_ATT 0.05 /*Peak heights for y ions that are not very likely.*/ extern REAL_4 *spectrum1; extern REAL_4 *spectrum2; extern REAL_4 *tau; UINT_4 SIZEOF_SPECTRA; /* Smallest power of 2 (for cross-correlation)*/ REAL_4 gSidePeakAtt = SIDE_PEAK_ATT; /*Here's some globals that are specific to this file. They are two amino acid nominal masses *times 100 for Cys, Arg, His, and Lys. These get modified at the start of the ScoreSequences *function in order to accomodate different alkyl groups on cysteine. 
*/ INT_4 gCysPlusXCorr[AMINO_ACID_NUMBER] = { 174, 259, 217, 218, 206, 232, 231, 160, 240, 216, 216, 231, 234, 250, 200, 190, 204, 289, 266, 202, 0,0,0,0,0 }; INT_4 gArgPlusXCorr[AMINO_ACID_NUMBER] = { 227, 312, 270, 271, 259, 285, 284, 213, 293, 269, 269, 284, 287, 303, 253, 243, 257, 342, 319, 255, 0,0,0,0,0 }; INT_4 gHisPlusXCorr[AMINO_ACID_NUMBER] = { 208, 293, 251, 252, 240, 266, 265, 194, 274, 250, 250, 265, 268, 284, 234, 224, 238, 323, 300, 236, 0,0,0,0,0 }; INT_4 gLysPlusXCorr[AMINO_ACID_NUMBER] = { 199, 284, 242, 243, 231, 257, 256, 185, 265, 241, 241, 256, 259, 275, 225, 215, 229, 314, 291, 227, 0,0,0,0,0 }; INT_4 lowMassIonMass[AMINO_ACID_NUMBER] = { 44, 112, 87, 88, 76, 102, 102, 30, 110, 86, 86, 129, 104, 120, 70, 60, 74, 159, 136, 72, 0,0,0,0,0 }; REAL_4 lowMassIonIntFactor[AMINO_ACID_NUMBER] = { 0.1, 0.1, 0.1, 0.1, 0.0, 0.1, 0.1, 0.0, 0.5, 0.5, 0.5, 0.25, 0.1, 0.5, 0.3, 0.1, 0.1, 0.2, 0.3, 0.2, 0,0,0,0,0 }; extern void CrossCorrScoreTheSeq(struct SequenceScore *currScorePtr); /***************************YXCorrCalc***************************************************** * * This function calculates the mass of singly charged y ions for a given cleavage, * and a given sequence. It returns a REAL_4. 
* */ REAL_4 YXCorrCalc(INT_4 i, struct SequenceScore *currScorePtr, REAL_4 YionStart, INT_4 seqLength) { REAL_4 Yion; INT_4 j, k; char test; REAL_8 mToAFactor; Yion = YionStart; for(j = i; j < seqLength; j++) { test = TRUE; for(k = 0; k < gAminoAcidNumber; k++) { if(currScorePtr->peptide[j] == gNomMass[k]) { test = FALSE; Yion += gMonoMass[k]; break; } } if(test) /*its a 2 aa extension*/ { Yion = Yion + ((currScorePtr->peptide[j])) + 0.07; /*0.07 is a guess at the mass defect*/ } } if(Yion > gParam.monoToAv) { mToAFactor = 0; } else { if(Yion >= (gParam.monoToAv - AV_MONO_TRANSITION)) { mToAFactor = (gParam.monoToAv - Yion) / AV_MONO_TRANSITION; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(Yion >= (gParam.monoToAv - AV_MONO_TRANSITION)) { Yion = Yion * mToAFactor; } return(Yion); } /******************************BXCorrCalc*************************************************** * * This function calculates the mass of singly charged b ions for a given cleavage, * and a given sequence. It returns a REAL_4. 
* */ REAL_4 BXCorrCalc(INT_4 i, struct SequenceScore *currScorePtr, REAL_4 BionStart) { REAL_4 Bion; INT_4 j, k; char test; REAL_8 mToAFactor; Bion = BionStart; for(j = 0; j < i; j++) { test = TRUE; for(k = 0; k < gAminoAcidNumber; k++) { if(currScorePtr->peptide[j] == gNomMass[k]) { test = FALSE; Bion += gMonoMass[k]; break; } } if(test) /*its a 2 aa extension*/ { Bion = Bion + ((currScorePtr->peptide[j])) + 0.07; /*0.07 is a guess at the mass defect*/ } } if(Bion > gParam.monoToAv) { mToAFactor = 0; } else { if(Bion >= (gParam.monoToAv - AV_MONO_TRANSITION)) { mToAFactor = (gParam.monoToAv - Bion) / AV_MONO_TRANSITION; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(Bion >= (gParam.monoToAv - AV_MONO_TRANSITION)) { Bion = Bion * mToAFactor; } return(Bion); } /*********************************FindNChargeXCorr******************************************** * * Counts the number of charged residues in the sequence. * */ INT_4 FindNChargeXCorr(struct SequenceScore *currScorePtr) { INT_4 j, k, seqLength, nChargeCount, *pSeq; /* Initialize.*/ nChargeCount = 1; seqLength = 0; /* Determine the sequence length. */ pSeq = &currScorePtr->peptide[0]; while(*pSeq != NULL) { seqLength++; pSeq++; } /* Find one aa extensions that are either R, H, or K. */ for(j = 0; j < seqLength; j++) { if((currScorePtr->peptide[j] == gNomMass[R]) || (currScorePtr->peptide[j] == gNomMass[H]) || (currScorePtr->peptide[j] == gNomMass[K])) { nChargeCount += 1; } } /* Here I look for two amino acid extensions containing Arg, His, or Lys. 
*/ for(j = 0; j < seqLength; j++) { for(k = 0; k < gAminoAcidNumber; k++) { if(currScorePtr->peptide[j] == gArgPlusXCorr[k] || currScorePtr->peptide[j] == gHisPlusXCorr[k] || currScorePtr->peptide[j] == gLysPlusXCorr[k]) { nChargeCount += 1; break; /*Count each two amino acid step only once; several table entries share a mass (e.g. Leu/Ile).*/ } } } return(nChargeCount); } /************************** DoCrossCorrelationScoring******************************************* * * Autocorrelates the experimental spectrum, cross-correlates the top-ranked candidate * sequences against it, and normalizes each score by the autocorrelation value. * */ void DoCrossCorrelationScoring(struct SequenceScore *firstScorePtr, struct MSData *firstMassPtr) { INT_4 i, seqNum; REAL_4 normalizedScore; /* The max absolute cross-correlation value * (Later used as the normalizing factor)*/ REAL_4 tauDiff, intensityAccountedFor, autocorrelation; struct SequenceScore *currSeqPtr; if (!firstScorePtr) { return; } /*For high accuracy data, make peaks w/ no extra width*/ if(gParam.qtofErr != 0) { if(gParam.qtofErr < 0.25) { gSidePeakAtt = 0; } else { gSidePeakAtt = SIDE_PEAK_ATT; } } else { gSidePeakAtt = SIDE_PEAK_ATT; } /*Do the funny normalization of intensities that Eng does in Sequest.*/ CalcNormalizedExptPeaks(firstMassPtr); /*Set aside memory for the spectra that are cross-correlated.*/ SetupCrossCorrelation(); if (!spectrum1 || !spectrum2 || !tau) { /* Do not proceed with cross-correlation because we couldn't get the memory*/ return; } else { FillInSpectrum1(firstMassPtr); /*Using the list of ions from firstMassPtr, generate a "real" spectrum for cross-correlating. These mass values will be 2 times the actual number in order to add to the specificity of the cross-correlation.*/ } /*Do an autocorrelation of the spectrum. 
This is the normalizing factor used later*/ CrossCorrelate(spectrum1-1, spectrum1-1, (UINT_4) SIZEOF_SPECTRA, tau-1); for(i = 0; i < SIZEOF_SPECTRA; i++) { if(tau[i] < 1) { tau[i] = 0; } } intensityAccountedFor = 0.0; for(i = 1; i < 250; i++) { tauDiff = tau[i] - tau[SIZEOF_SPECTRA - i]; if(tauDiff < 0) { tauDiff = tauDiff * -1; } intensityAccountedFor += tauDiff; } autocorrelation = tau[0] - intensityAccountedFor/250; /*should be equal to tau(0)*/ /*debug spectrum1*/ /*printf("data spectrum \n"); for(i = 0; i < SIZEOF_SPECTRA; i++) { if(spectrum1[i] != 0) { printf("%f %f \n",(REAL_4)i/2,spectrum1[i]); } }*/ /*Count the number of sequences.*/ seqNum = 0; currSeqPtr = firstScorePtr; while(currSeqPtr != NULL) { seqNum++; currSeqPtr = currSeqPtr->next; } /*If there are too many, then cut the number of sequences to be cross-correlated.*/ if(seqNum > MAX_X_CORR_NUM) { seqNum = MAX_X_CORR_NUM; } /*Cross-correlate the sequences. Only do the top intensity-scorers.*/ for(i = 1; i <= seqNum; i++) { currSeqPtr = firstScorePtr; while(currSeqPtr != NULL) { if(i == currSeqPtr->rank) { CrossCorrScoreTheSeq(currSeqPtr); } currSeqPtr = currSeqPtr->next; } } /* Normalize the cross-correlation results to 1.0*/ normalizedScore = 0.0 ; /* First find the highest score*/ currSeqPtr = firstScorePtr; while(currSeqPtr != NULL) { if (currSeqPtr->crossDressingScore > normalizedScore) { normalizedScore = currSeqPtr->crossDressingScore; } currSeqPtr = currSeqPtr->next; /* Point to the next struct in the linked list to * continue the while loop. 
*/ } /* Use the highest score to normalize them all*/ if (normalizedScore > 0.0) { normalizedScore = 1/normalizedScore; } /*normalizedScore = 1;*/ /*to keep from normalizing*/ /*normalizedScore = 1 / gParam.peptideMW;*/ if(autocorrelation == 0) { printf("Avoiding divide by zero."); autocorrelation += 0.0001; /* JAT 2006.08.28 */ /* exit(1); */ } normalizedScore = 1 / autocorrelation; /*another way to normalize*/ currSeqPtr = firstScorePtr; while(currSeqPtr != NULL) { currSeqPtr->crossDressingScore = normalizedScore * currSeqPtr->crossDressingScore; currSeqPtr = currSeqPtr->next; /* Point to the next struct in the linked list to * continue the while loop. */ } if (spectrum1) free(spectrum1); spectrum1 = NULL; if (tau) free(tau); tau = NULL; return; } /************************** CrossCorrScoreTheSeq******************************************* * * This function calculates the cross-correlation score for each peptide passed to it. */ void CrossCorrScoreTheSeq(struct SequenceScore *currScorePtr) { INT_4 i, j, k; /* Loop indices. */ INT_4 chargeLimit; /* Max # of daughter charges to consider */ REAL_4 intensityAccountedFor = 0.0;/* Sum of the intensities of the ions matched with fragment ions. */ REAL_4 Bion; /* Mass of the singly charged B ion. */ REAL_4 Yion; /* Mass of the singly charged Y ion. */ REAL_4 BionStart; /* Mass of N-terminal group (H, Ac, etc)*/ REAL_4 YionStart; /* Mass of C-terminal group (free or amidated)*/ REAL_4 BionMass; /* Mass of the B ion in a particular charge state. */ REAL_4 AionMass; /* Mass of the A ion (B ion - CO) in a particular charge state. */ REAL_4 YionMass; /* Mass of the Y ion in a particular charge state. */ REAL_4 proAttenuation; /* Attenuates intensity for y ions next to Pro*/ REAL_4 glyAttenuation; /* Attenuates intensity for y ions next to Gly*/ REAL_4 fragmentMass; /* Mass of the internal fragment ion. */ REAL_4 parent; /* The m/z of the parent ion. */ REAL_4 tolerance; /* The daughter ion error tolerance. 
*/ REAL_4 offset; /* The mass offset for daughter ions. */ REAL_4 parentMinH2O, parentMinNH3, parentMin2H2O, parentMin2NH3; REAL_4 tauDiff; REAL_4 highMassRange, lowMassRange; REAL_4 fullIntensity; /* Intensity used by cross-correlation when creating dummy spectra. */ INT_4 seqLength, nChargeCount, cChargeCount; INT_4 *pSeq; char NTermTest, widePeak; parent = (gParam.peptideMW + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; tolerance = gParam.fragmentErr; if(gParam.fragmentPattern == 'L') /*ion masses outside of this range are not penalized if not found*/ { lowMassRange = parent * 0.333; /*so-called 1/3 rule*/ highMassRange = 2000; /*mass limit for Deca*/ } else { lowMassRange = 146; /*y1 for Lys*/ highMassRange = 2 * parent; /*often the very high mass ions are missing*/ } offset = 0; /*the offset has already been applied*/ chargeLimit = gParam.chargeState; if(gParam.fragmentErr <= 0.75) { widePeak = FALSE; /*mock peak widths are 1.5 dalton*/ } else { widePeak = TRUE; /*mock peak widths are 2.5 daltons*/ } if (spectrum2) { free(spectrum2); /* Throw away the old data (if any exists) */ spectrum2 = NULL; } spectrum2 = (REAL_4 *) malloc(SIZEOF_SPECTRA*sizeof(REAL_4)); if(spectrum2 == NULL) { printf("Out of memory"); exit(1); } for(i = 0; i < SIZEOF_SPECTRA; i++) { spectrum2[i] = 0; } if (!spectrum2) return; seqLength = 0; /* Determine the mass of the N and C-termini. */ BionStart = gParam.modifiedNTerm; YionStart = gParam.modifiedCTerm + (2 * gElementMass[HYDROGEN]); /* Setup the values for nChargeCount and cChargeCount; FindNChargeXCorr also removes the factor of 100 from the sequence masses. */ cChargeCount = 1; nChargeCount = FindNChargeXCorr(currScorePtr); /* Test if N-terminal amino acid is one or two residues. */ NTermTest = TRUE; for(k = 0; k < gAminoAcidNumber; k++) { if(currScorePtr->peptide[0] == gNomMass[k]) { NTermTest = FALSE; /*TRUE if a two amino acid step.*/ } } /* Find the sequence length, so I can loop through it. 
*/ pSeq = &currScorePtr->peptide[0]; seqLength = 0; while(*pSeq != 0) { seqLength++; pSeq++; } /* Here's where I loop through the sequence. ================================================================= */ for (i = seqLength - 1; i > 0; i--) { fullIntensity = 50; /*Arbitrary, but it should match the spectrum1 max value.*/ /* Figure out what Bion and Yion should be.*/ Bion = BXCorrCalc(i, currScorePtr, BionStart); Yion = YXCorrCalc(i, currScorePtr, YionStart, seqLength); /* Figure out what cChargeCount and nChargeCount should be.*/ if((currScorePtr->peptide[i] == gNomMass[R]) || (currScorePtr->peptide[i] == gNomMass[H]) || (currScorePtr->peptide[i] == gNomMass[K])) { cChargeCount += 1; nChargeCount -= 1; } else /*Check to see if it's a two amino acid combo that could contain Arg, His, or Lys.*/ { for(j = 0; j < gAminoAcidNumber; j++) { if(currScorePtr->peptide[i] == gArgPlusXCorr[j] || currScorePtr->peptide[i] == gHisPlusXCorr[j] || currScorePtr->peptide[i] == gLysPlusXCorr[j]) { cChargeCount += 1; nChargeCount -= 1; break; } } } /* Look for each possible charge state.>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> */ for (j = 1; j <= chargeLimit; j++) { BionMass = (Bion + (j - 1))/j; AionMass = (Bion - CO + (j - 1))/j; YionMass = (Yion + (j - 1))/j; /* Fragment ions of charge 3 or above are considered less likely, so from j == 3 onward * the mock peaks get one quarter of the intensity used for a charge 1 or 2 fragment ion. */ if (j == 3) fullIntensity = fullIntensity / 4; /* Dummy up a theoretical spectrum for the peptide. 
* Make sure there are chargeable amino acids for the given charge - j - and don't * bother with b and a ions that have just a proton or have just a proton and one aa.*/ if(nChargeCount >= j && i != seqLength && (i != 1 || NTermTest)) { if((BionMass * j) > (j-1) * 500) /*Make sure there is enough mass for the charge.*/ { if (BionMass < msms.scanMassHigh && BionMass < highMassRange && BionMass > lowMassRange) { if(gParam.chargeState == 1) /*If the precursor is singly charged.*/ { BionMass = BionMass * 2; NH3 = NH3 * 2; H2O = H2O * 2; AddPeakToSpectrum(spectrum2, BionMass, fullIntensity); AddPeakToSpectrum(spectrum2, BionMass - NH3/j, fullIntensity * PLUS1_NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, BionMass - H2O/j, fullIntensity * PLUS1_NEUT_LOSS_ATT); BionMass = BionMass * 0.5; NH3 = NH3 * 0.5; H2O = H2O * 0.5; } else /*For multiply charged precursors.*/ { /*If the +1 b ion has an m/z less than the precursor ion or if the number of chargeable amino acids in the +1 b ion equals the charge state of the precursor, then give these guys full intensity.*/ if((j == 1 || j <= (gParam.chargeState - 1)) && (BionMass < parent || nChargeCount == gParam.chargeState || gParam.fragmentPattern == 'L')) { BionMass = BionMass * 2; NH3 = NH3 * 2; H2O = H2O * 2; AddPeakToSpectrum(spectrum2, BionMass, fullIntensity); AddPeakToSpectrum(spectrum2, BionMass - NH3/j, fullIntensity * NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, BionMass - H2O/j, fullIntensity * NEUT_LOSS_ATT); BionMass = BionMass * 0.5; NH3 = NH3 * 0.5; H2O = H2O * 0.5; } else /*Otherwise... 
(i.e., b ions with charge > +1, or +1 b ions above the precursor m/z)*/ { BionMass = BionMass * 2; NH3 = NH3 * 2; H2O = H2O * 2; AddPeakToSpectrum(spectrum2, BionMass, fullIntensity * BAD_B_ATT); AddPeakToSpectrum(spectrum2, BionMass - NH3/j, fullIntensity * BAD_B_ATT * NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, BionMass - H2O/j, fullIntensity * BAD_B_ATT * NEUT_LOSS_ATT); BionMass = BionMass * 0.5; NH3 = NH3 * 0.5; H2O = H2O * 0.5; } } } if (AionMass < msms.scanMassHigh && AionMass < highMassRange && AionMass > lowMassRange) { if(gParam.chargeState == 1) { AionMass = AionMass * 2; AddPeakToSpectrum(spectrum2, AionMass, fullIntensity * A_ATT); AionMass = AionMass * 0.5; } else { /* Give higher intensity for singly charged a2 ions. Skip if LCQ data (?).*/ if(j == 1 && ((NTermTest == TRUE && i == 1) || (NTermTest == FALSE && i == 2))) { AionMass = AionMass * 2; AddPeakToSpectrum(spectrum2, AionMass, fullIntensity * A_ATT); AionMass = AionMass * 0.5; } else { if((j == 1 || j <= (gParam.chargeState - 1)) && (BionMass < parent || nChargeCount == gParam.chargeState)) { AionMass = AionMass * 2; AddPeakToSpectrum(spectrum2, AionMass, fullIntensity * A_ATT * BAD_A_ATT); AionMass = AionMass * 0.5; } else { AionMass = AionMass * 2; AddPeakToSpectrum(spectrum2, AionMass, fullIntensity * A_ATT * BAD_A_ATT * BAD_A_ATT); AionMass = AionMass * 0.5; } } } } } } if(cChargeCount >= j && i != 0) { if((YionMass * j) > (j-1) * 500) { if (YionMass < msms.scanMassHigh && YionMass < highMassRange && YionMass > lowMassRange) { proAttenuation = 1.0; glyAttenuation = 1.0; if(i > 2 && i < seqLength - 2) /*don't attenuate at the ends*/ { if(currScorePtr->peptide[i-1] == gNomMass[P]) { proAttenuation = 0.2; /*Attenuates y ions following P*/ } if(currScorePtr->peptide[i-1] == gNomMass[G]) { /*glyAttenuation = 0.2;*/ /*Attenuates y ions following G*/ } } fullIntensity = fullIntensity * proAttenuation * glyAttenuation; if(gParam.chargeState == 1) { YionMass = YionMass * 2; NH3 = NH3 * 2; H2O = H2O * 2; 
AddPeakToSpectrum(spectrum2, YionMass, fullIntensity); AddPeakToSpectrum(spectrum2, YionMass - NH3/j, fullIntensity * PLUS1_NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, YionMass - H2O/j, fullIntensity * PLUS1_NEUT_LOSS_ATT); YionMass = YionMass * 0.5; NH3 = NH3 * 0.5; H2O = H2O * 0.5; } else { /*LCQ data has a preponderance of multiply charged y ions*/ if(j == 1 || j <= (gParam.chargeState - 1) || (i <= 2 && !NTermTest) || (i <= 1 && NTermTest) || (i <= (INT_4)seqLength / 4 && gParam.fragmentPattern == 'L')) { YionMass = YionMass * 2; NH3 = NH3 * 2; H2O = H2O * 2; if(j == 1 || j <= (gParam.chargeState - 1) || (i <= 2 && !NTermTest) || (i <= 1 && NTermTest)) { AddPeakToSpectrum(spectrum2, YionMass, fullIntensity); /*add some width to multiply charged*/ if(j > 1) { AddPeakToSpectrum(spectrum2, YionMass - 1, fullIntensity); AddPeakToSpectrum(spectrum2, YionMass + 1, fullIntensity); } AddPeakToSpectrum(spectrum2, YionMass - NH3/j, fullIntensity * NEUT_LOSS_ATT); if(j > 1) { AddPeakToSpectrum(spectrum2, YionMass - NH3/j - 1, fullIntensity * NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, YionMass - NH3/j + 1, fullIntensity * NEUT_LOSS_ATT); } AddPeakToSpectrum(spectrum2, YionMass - H2O/j, fullIntensity * NEUT_LOSS_ATT); if(j > 1) { AddPeakToSpectrum(spectrum2, YionMass - H2O/j - 1, fullIntensity * NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, YionMass - H2O/j + 1, fullIntensity * NEUT_LOSS_ATT); } } else { AddPeakToSpectrum(spectrum2, YionMass, fullIntensity * 1/i); /*add some width to multiply charged*/ if(j > 1) { AddPeakToSpectrum(spectrum2, YionMass - 1, fullIntensity * 1/(i+ 1)); AddPeakToSpectrum(spectrum2, YionMass + 1, fullIntensity * 1/(i+ 1)); } AddPeakToSpectrum(spectrum2, YionMass - NH3/j, fullIntensity * NEUT_LOSS_ATT); if(j > 1) { AddPeakToSpectrum(spectrum2, YionMass - NH3/j - 1, fullIntensity * 1/(i+ 1) * NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, YionMass - NH3/j + 1, fullIntensity * 1/(i+ 1) * NEUT_LOSS_ATT); } AddPeakToSpectrum(spectrum2, YionMass - 
H2O/j, fullIntensity * NEUT_LOSS_ATT); if(j > 1) { AddPeakToSpectrum(spectrum2, YionMass - H2O/j - 1, fullIntensity * 1/(i+ 1) * NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, YionMass - H2O/j + 1, fullIntensity * 1/(i+ 1) * NEUT_LOSS_ATT); } } YionMass = YionMass * 0.5; NH3 = NH3 * 0.5; H2O = H2O * 0.5; } else { YionMass = YionMass * 2; NH3 = NH3 * 2; H2O = H2O * 2; AddPeakToSpectrum(spectrum2, YionMass, fullIntensity * BAD_Y_ATT); AddPeakToSpectrum(spectrum2, YionMass - NH3/j, fullIntensity * BAD_Y_ATT * NEUT_LOSS_ATT); AddPeakToSpectrum(spectrum2, YionMass - H2O/j, fullIntensity * BAD_Y_ATT * NEUT_LOSS_ATT); YionMass = YionMass * 0.5; NH3 = NH3 * 0.5; H2O = H2O * 0.5; } } if(proAttenuation == 0) exit(1); fullIntensity = fullIntensity / proAttenuation; } } } } } /* If the peptide is longer than 4 residues, look for internal fragment ions. * (Only +1 right now). Skip if LCQ data.*/ fullIntensity = 50; if (seqLength > 4 && gParam.fragmentPattern != 'L') { for (i = 1; i < seqLength - 2; i++) { for (j = i+1; j < seqLength - 1; j++) { if(j <= i + 3) { fragmentMass = gElementMass[HYDROGEN]; for (k = i; k <= j; k++) fragmentMass += currScorePtr->peptide[k]; if (fragmentMass < msms.scanMassHigh && fragmentMass < parent) { if(fragmentMass > lowMassRange && fragmentMass < highMassRange) { fragmentMass = fragmentMass * 2; AddPeakToSpectrum(spectrum2, fragmentMass, fullIntensity * INT_FRAG_ATT); fragmentMass = fragmentMass * 0.5; } } } } } } /* Add the low mass immonium ions.*/ for (i = 0; i < seqLength; i++) { for(j = 0; j < gAminoAcidNumber; j++) { if(gNomMass[j] == currScorePtr->peptide[i]) { if(lowMassIonMass[j] > lowMassRange) { lowMassIonMass[j] = lowMassIonMass[j] * 2; AddPeakToSpectrum(spectrum2, lowMassIonMass[j], fullIntensity * lowMassIonIntFactor[j]); lowMassIonMass[j] = lowMassIonMass[j] * 0.5; } break; } } } /* Wipe out region where the precursor and derivatives are located; these don't count.*/ parent = (gParam.peptideMW + (gParam.chargeState * 
gElementMass[HYDROGEN])) / gParam.chargeState; parentMinH2O = (gParam.peptideMW - H2O + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; parentMinNH3 = (gParam.peptideMW - NH3 + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; parentMin2H2O = (gParam.peptideMW - H2O - H2O + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; parentMin2NH3 = (gParam.peptideMW - NH3 - NH3 + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; parent = parent * 2; parentMinH2O = parentMinH2O * 2; parentMinNH3 = parentMinNH3 * 2; parentMin2H2O = parentMin2H2O * 2; parentMin2NH3 = parentMin2NH3 * 2; spectrum2[((INT_4)(parent + 0.5))] = 0; spectrum2[((INT_4)(parent + 0.5)) - 1] = 0; spectrum2[((INT_4)(parent + 0.5)) + 1] = 0; spectrum2[((INT_4)(parentMinH2O + 0.5))] = 0; spectrum2[((INT_4)(parentMinH2O + 0.5)) - 1] = 0; spectrum2[((INT_4)(parentMinH2O + 0.5)) + 1] = 0; spectrum2[((INT_4)(parentMinNH3 + 0.5))] = 0; spectrum2[((INT_4)(parentMinNH3 + 0.5)) - 1] = 0; spectrum2[((INT_4)(parentMinNH3 + 0.5)) + 1] = 0; spectrum2[((INT_4)(parentMin2H2O + 0.5))] = 0; spectrum2[((INT_4)(parentMin2H2O + 0.5)) - 1] = 0; spectrum2[((INT_4)(parentMin2H2O + 0.5)) + 1] = 0; spectrum2[((INT_4)(parentMin2NH3 + 0.5))] = 0; spectrum2[((INT_4)(parentMin2NH3 + 0.5)) - 1] = 0; spectrum2[((INT_4)(parentMin2NH3 + 0.5)) + 1] = 0; if(widePeak) { spectrum2[((INT_4)(parent + 0.5)) - 2] = 0; spectrum2[((INT_4)(parent + 0.5)) + 2] = 0; spectrum2[((INT_4)(parentMinH2O + 0.5)) - 2] = 0; spectrum2[((INT_4)(parentMinH2O + 0.5)) + 2] = 0; spectrum2[((INT_4)(parentMinNH3 + 0.5)) - 2] = 0; spectrum2[((INT_4)(parentMinNH3 + 0.5)) + 2] = 0; spectrum2[((INT_4)(parentMin2H2O + 0.5)) - 2] = 0; spectrum2[((INT_4)(parentMin2H2O + 0.5)) + 2] = 0; spectrum2[((INT_4)(parentMin2NH3 + 0.5)) - 2] = 0; spectrum2[((INT_4)(parentMin2NH3 + 0.5)) + 2] = 0; } /* Wipe out regions outside of scan range.*/ for(i = 0; i < SIZEOF_SPECTRA; i++) { if(i < 
((INT_4)msms.scanMassLow)*2 - 1) { spectrum2[i] = 0; } if(i > ((INT_4)msms.scanMassHigh)*2 + 1) { spectrum2[i] = 0; } } /*debug spectrum2*/ /*if(currScorePtr->rank == 2) { printf("sequence spectrum \n"); for(i = 0; i < SIZEOF_SPECTRA; i++) { if(spectrum2[i] != 0) { printf("%f %f \n",(REAL_4)i/2,spectrum2[i]); } } } */ /* Cross-correlation analysis */ CrossCorrelate(spectrum2-1, spectrum1-1,(UINT_4) SIZEOF_SPECTRA, tau-1); /* The cross-correlation score is tau[0] minus the mean of -75 < tau < 75. tau[-1 to -75] are stored in wrapped around order at the end of tau. */ /* memcpy(&tau[76], &tau[SIZEOF_SPECTRA - 75], 75 * sizeof(REAL_4)); intensityAccountedFor = 0.0; for (i = 0; i < 150; i++) { intensityAccountedFor += tau[i]; } currScorePtr->crossDressingScore = tau[0] - intensityAccountedFor/150;*/ for(i = 0; i < SIZEOF_SPECTRA; i++) { if(tau[i] < 1) { tau[i] = 0; } } /*Since exact matches are exactly symmetrical, I no longer subtract out the average tau from -75 to 75, but instead subtract the sum of the differences between points that differ by a factor of -1.*/ intensityAccountedFor = 0; for(i = 1; i < 250; i++) /*compare +1/-1 up to +250/-250*/ { tauDiff = tau[i] - tau[SIZEOF_SPECTRA - i]; /*tau(-250) minus tau(250), etc*/ if(tauDiff < 0) { tauDiff = tauDiff * -1; /*absolute value*/ } intensityAccountedFor += tauDiff; /*add it all up*/ } currScorePtr->crossDressingScore = tau[0] - intensityAccountedFor/250;/*divide by 250*/ if (spectrum2) free(spectrum2); spectrum2 = NULL; } /************************** AddPeakToSpectrum ******************************************* * * */ void AddPeakToSpectrum( REAL_4 *spectrum, REAL_4 mass, REAL_4 intensity) { char widePeak; if(intensity < 2) return; /*don't sweat the small stuff*/ if(gParam.fragmentErr <= 0.75) { widePeak = FALSE; /*mock peak widths are 1.5 dalton*/ } else { widePeak = TRUE; /*mock peak widths are 2.5 daltons*/ } /* Make sure that the mass is within the spectrum's range */ if ( (INT_4)(mass + 0.5) > 2 && 
(INT_4)(mass + 0.5) < SIZEOF_SPECTRA - 2 ) { /* Set the intensity of the peak center */ if (spectrum[(INT_4) (mass + 0.5)] < intensity) spectrum[(INT_4) (mass + 0.5)] = intensity; /* Set the intensity of the peak sides */ if (spectrum[((INT_4) (mass + 0.5)) - 1] < intensity * gSidePeakAtt) spectrum[((INT_4) (mass + 0.5)) - 1] = intensity * gSidePeakAtt; if (spectrum[((INT_4) (mass + 0.5)) + 1] < intensity * gSidePeakAtt) spectrum[((INT_4) (mass + 0.5)) + 1] = intensity * gSidePeakAtt; if(widePeak) { /* Set the intensity of the peak side's side*/ if (spectrum[((INT_4) (mass + 0.5)) - 2] < intensity * gSidePeakAtt * gSidePeakAtt) spectrum[((INT_4) (mass + 0.5)) - 2] = intensity * gSidePeakAtt * gSidePeakAtt; if (spectrum[((INT_4) (mass + 0.5)) + 2] < intensity * gSidePeakAtt * gSidePeakAtt) spectrum[((INT_4) (mass + 0.5)) + 2] = intensity * gSidePeakAtt * gSidePeakAtt; } } } lutefisk-1.0.7+dfsg.orig/src/getopt.c0000644000175000017500000000314710102256715017412 0ustar rusconirusconi#include #include "getopt.h" /*LINTLIBRARY*/ #ifndef NULL #define NULL 0 #endif extern short strcmp(); extern char *strchr(); INT_4 opterr = 1; INT_4 optind = 1; INT_4 optopt; char *optarg; INT_4 getopt(INT_4 argc, CHAR **argv, CHAR *opts) { static INT_4 sp = 1; register INT_4 c; register CHAR *cp; if(sp == 1) { if(optind >= argc || argv[optind][0] != '-' || argv[optind][1] == '\0') return(EOF); else if(strcmp(argv[optind], "--") == 0) /*strcmp returns an integer, so compare with 0 rather than NULL*/ { optind++; return(EOF); } } optopt = c = argv[optind][sp]; if(c == ':' || (cp=strchr(opts, c)) == NULL) { fprintf(stderr,"illegal command-line option: %c\n",c); if(argv[optind][++sp] == '\0') { optind++; sp = 1; } return('?'); } if(*++cp == ':') { if(argv[optind][sp+1] != '\0') optarg = &argv[optind++][sp+1]; else if(++optind >= argc) { fprintf(stderr,"command-line option %c requires an argument\n",c); sp = 1; return('?'); } else optarg = argv[optind++]; sp = 1; } else { if(argv[optind][++sp] == '\0') { sp = 1; optind++; } optarg = NULL; } return(c); } 
lutefisk-1.0.7+dfsg.orig/src/Makefile.osx0000644000175000017500000000343210124104201020173 0ustar rusconirusconi CC= cc -O4 CFLAGS= -D__OS_X LFLAGS= -lm -o BIN = /seqprg/slib/bin #NRAND= nrand #IBM RS/6000 NRAND= nrand48 RANFLG= -DRAND32 #HZ=60 for sun, mips, 100 for rs/6000, SGI, LINUX HZ=60 PROGS= lutefisk SPROGS= lutefisk .c.o: $(CC) $(CFLAGS) -c $< all : $(PROGS) sall : $(SPROGS) install : cp $(PROGS) $(BIN) clean-up : rm *.o $(PROGS) lutefisk : LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o $(CC) LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o $(LFLAGS) lutefisk LutefiskGlobalDeclarations.o : LutefiskGlobalDeclarations.c $(CC) $(CFLAGS) -c LutefiskGlobalDeclarations.c LutefiskMain.o : LutefiskMain.c $(CC) $(CFLAGS) -c LutefiskMain.c LutefiskGetCID.o : LutefiskGetCID.c $(CC) $(CFLAGS) -c LutefiskGetCID.c LutefiskHaggis.o : LutefiskHaggis.c $(CC) $(CFLAGS) -c LutefiskHaggis.c LutefiskMakeGraph.o : LutefiskMakeGraph.c $(CC) $(CFLAGS) -c LutefiskMakeGraph.c LutefiskSummedNode.o : LutefiskSummedNode.c $(CC) $(CFLAGS) -c LutefiskSummedNode.c LutefiskSubseqMaker.o : LutefiskSubseqMaker.c $(CC) $(CFLAGS) -c LutefiskSubseqMaker.c LutefiskScore.o : LutefiskScore.c $(CC) $(CFLAGS) -c LutefiskScore.c LutefiskXCorr.o : LutefiskXCorr.c $(CC) $(CFLAGS) -c LutefiskXCorr.c LutefiskFourier.o : LutefiskFourier.c $(CC) $(CFLAGS) -c LutefiskFourier.c LutefiskGetAutoTag.o : LutefiskGetAutoTag.c $(CC) $(CFLAGS) -c LutefiskGetAutoTag.c ListRoutines.o : ListRoutines.c $(CC) $(CFLAGS) -c ListRoutines.c lutefisk-1.0.7+dfsg.orig/src/Makefile.linux0000644000175000017500000000335110124104201020521 0ustar rusconirusconiCC= gcc -O CFLAGS= 
-D__LINUX LFLAGS= -lm -o NRAND= nrand48 RANFLG= -DRAND32 #HZ=60 for sun, mips, 100 for rs/6000, SGI, LINUX HZ=60 PROGS= lutefisk SPROGS= lutefisk .c.o: $(CC) $(CFLAGS) -c $< all : $(PROGS) sall : $(SPROGS) install : cp $(PROGS) $(BIN) clean-up : rm *.o $(PROGS) lutefisk : LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o $(CC) LutefiskGlobalDeclarations.o LutefiskMain.o LutefiskGetCID.o LutefiskHaggis.o LutefiskMakeGraph.o LutefiskSummedNode.o LutefiskSubseqMaker.o LutefiskScore.o LutefiskXCorr.o LutefiskFourier.o LutefiskGetAutoTag.o ListRoutines.o $(LFLAGS) lutefisk LutefiskGlobalDeclarations.o : LutefiskGlobalDeclarations.c $(CC) $(CFLAGS) -c LutefiskGlobalDeclarations.c LutefiskMain.o : LutefiskMain.c $(CC) $(CFLAGS) -c LutefiskMain.c LutefiskGetCID.o : LutefiskGetCID.c $(CC) $(CFLAGS) -c LutefiskGetCID.c LutefiskHaggis.o : LutefiskHaggis.c $(CC) $(CFLAGS) -c LutefiskHaggis.c LutefiskMakeGraph.o : LutefiskMakeGraph.c $(CC) $(CFLAGS) -c LutefiskMakeGraph.c LutefiskSummedNode.o : LutefiskSummedNode.c $(CC) $(CFLAGS) -c LutefiskSummedNode.c LutefiskSubseqMaker.o : LutefiskSubseqMaker.c $(CC) $(CFLAGS) -c LutefiskSubseqMaker.c LutefiskScore.o : LutefiskScore.c $(CC) $(CFLAGS) -c LutefiskScore.c LutefiskXCorr.o : LutefiskXCorr.c $(CC) $(CFLAGS) -c LutefiskXCorr.c LutefiskFourier.o : LutefiskFourier.c $(CC) $(CFLAGS) -c LutefiskFourier.c LutefiskGetAutoTag.o : LutefiskGetAutoTag.c $(CC) $(CFLAGS) -c LutefiskGetAutoTag.c ListRoutines.o : ListRoutines.c $(CC) $(CFLAGS) -c ListRoutines.c lutefisk-1.0.7+dfsg.orig/src/LutefiskFourier.c0000644000175000017500000003517410303626356021244 0ustar rusconirusconi/********************************************************************************************* Lutefisk is software for de novo sequencing of peptides from tandem mass spectra. 
Copyright (C) 1995 Richard S. Johnson This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. Contact: Richard S Johnson 4650 Forest Ave SE Mercer Island, WA 98040 jsrichar@alum.mit.edu *********************************************************************************************/ /* LutefiskXP is a program designed to aid in the interpretation of CID data of peptides. The main assumptions are that the data is of reasonable quality, the N- and C-terminal modifications (if any) are known, and the precursor ion charge (and therefore the peptide molecular weight) are known. The ultimate goal here is to develop code that can utilize msms data in conjunction with ambiguous and incomplete Edman sequencing data, sequence tags, peptide derivatization, and protein or est database searches. An older version of LutefiskXP has been written in FORTRAN and runs on 68K Macs that have an fpu (1991, 39th ASMS Conference on Mass Spectrometry and Allied Topics, Nashville, TN, pp 1233- 1234). This is a different and improved algorithm partly inspired by Fernandez-de-Cossjo, et al. (1995) CABIOS Vol. 11 No. 4 pp 427-434. Combining this msms interpretation algorithm with Edman sequencing, database searches, and derivatization is entirely of my own design; J. Alex Taylor implemented the changes in the FASTA code (Bill Pearson, U. 
of VA) so that the LutefiskXP output can be read directly by the modified FASTA program. In addition, a number of other critical changes were made to FASTA to make it more compatible with msms sequencing data. The trademark LutefiskXP was chosen at random, and is not meant to imply any similarity between this computer program and the partially base-hydrolyzed cod fish of the same name. */ #include #include #include #include #include "LutefiskPrototypes.h" #include "LutefiskDefinitions.h" REAL_4 * spectrum1 = NULL; REAL_4 * spectrum2 = NULL; REAL_4 * tau = NULL; extern UINT_4 SIZEOF_SPECTRA; static void FastFourier(REAL_4 *data, UINT_4 nn, INT_4 isign); static void twofft(REAL_4 data1[], REAL_4 data2[], REAL_4 fft1[], REAL_4 fft2[], UINT_4 n); static void realft(REAL_4 data[], UINT_4 n, int isign); /*************************************CalcNormalizedExptPeaks********************************** * * It takes the peaks from the current MS/MS spectrum and, using the precursor charge state * (gParam.chargeState), divides the scan range into equal segments and normalizes the peaks * in each segment to a maximum intensity of 50. The number of segments is the scan range * divided by AV_RESIDUE_MASS, plus one. The parent (and potential parent derivatives in that * charge state) are excluded from this normalization. */ void CalcNormalizedExptPeaks(struct MSData *firstMassPtr) { INT_4 segment; /* Index of the current segment of the spectrum. */ REAL_4 tolerance; /* The daughter ion error tolerance. */ REAL_4 offset; /* The mass offset for daughter ions. */ REAL_4 segmentSize; /* The size (in m/z) of a segment */ REAL_4 maxIntensity; /* The max peak intensity in a segment */ REAL_4 normalizingFactor; /* The normalizing factor for a segment */ struct MSData *pPeak; /* Pointer to the current MS/MS peak. */ struct MSData *pSegment; /* Pointer to an MS/MS peak in the segment. */ REAL_4 precursor; /* The m/z of the parent ion. 
*/ INT_4 charge, segmentNum; charge = gParam.chargeState; precursor = (gParam.peptideMW + (charge * gElementMass[HYDROGEN])) / charge; tolerance = gParam.fragmentErr; offset = 0; /*this has already been incorporated into the list of ions*/ segmentNum = ((msms.scanMassHigh - msms.scanMassLow) / AV_RESIDUE_MASS) + 1; if(segmentNum == 0) exit(1); segmentSize = (msms.scanMassHigh - msms.scanMassLow) / segmentNum; segment = 0; /* Index to the segment */ maxIntensity = 0; pPeak = firstMassPtr; while(pPeak != NULL && segment < segmentNum) { if (pPeak->mOverZ > msms.scanMassLow - gParam.fragmentErr + (segment * segmentSize)) { pSegment = pPeak; /* Find the peak with the max intensity in this segment */ while (pPeak != NULL && pPeak->mOverZ < msms.scanMassLow + ((segment+1) * segmentSize)) { if (pPeak->mOverZ > (precursor - 2*H2O - 2*tolerance) && pPeak->mOverZ < (precursor + (tolerance * 2))) { /* Count as max only if it is not the parent or a parent derivative. */ if (!closeEnough((pPeak->mOverZ - (precursor + offset - H2O/charge)), tolerance * 2) && !closeEnough((pPeak->mOverZ - (precursor + offset - NH3/charge)), tolerance * 2) && !closeEnough((pPeak->mOverZ - (precursor + offset - 2*H2O/charge)), tolerance * 2) && !closeEnough((pPeak->mOverZ - (precursor + offset - 2*NH3/charge)), tolerance * 2) && !closeEnough((pPeak->mOverZ - precursor + offset), tolerance * 2)) { if(pPeak->intensity > maxIntensity) { maxIntensity = pPeak->intensity; } } else { pPeak->normIntensity = -1; /* Flag it as a parent or derivative */ } } else { if (pPeak->intensity > maxIntensity) { maxIntensity = pPeak->intensity; } } pPeak = pPeak->next; } /* Normalize the peaks in this segment */ if ((pPeak == NULL) || (pPeak->mOverZ > pSegment->mOverZ)) { if (maxIntensity > 0) { normalizingFactor = 50.0/maxIntensity; } else { normalizingFactor = 1; } while (pSegment != NULL) { if (pSegment->normIntensity == -1) { /* The peak should have a normalized intensity of 0 if it is flagged * as a parent or 
parent derivative */ pSegment->normIntensity = 0; } else { pSegment->normIntensity = normalizingFactor * pSegment->intensity; } pSegment = pSegment->next; if(pPeak == NULL) { if(pSegment == NULL) { break; } } else if(pSegment == NULL || pSegment->mOverZ >= pPeak->mOverZ) { break; } } } } segment++; maxIntensity = 0; } } /*************************************FillInSpectrum1********************************** * * This function takes the normalized peak intensities from the current MS/MS spectrum * and creates a dummied up spectrum. */ void FillInSpectrum1(struct MSData *firstMassPtr) { struct MSData *pPeak; /* Pointer to the current MS/MS peak. */ INT_4 massInt, lowEnd, highEnd, i; REAL_4 precursor; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; lowEnd = (precursor - 35) * 2; /*From lowEnd to highEnd, I attenuate the spectrum1 values*/ highEnd = (precursor + 2) * 2; if (!spectrum1) { return; } /* For data with higher mass accuracy, the spectrum1 is made narrower. Since I bin every 0.5 Da, three bins would give a peak width of about 1.5 Da. */ if(gParam.fragmentErr <= 0.75) { pPeak = firstMassPtr; while (pPeak != NULL) { massInt = (((pPeak->mOverZ) * 2) + 0.5); if (massInt > 2 && massInt < (SIZEOF_SPECTRA - 2) ) { spectrum1[massInt] = pPeak->normIntensity; if(gParam.qtofErr == 0 || gParam.qtofErr >= 0.25) { if (0.5 * pPeak->normIntensity > spectrum1[massInt - 1]) { spectrum1[massInt - 1] = 0.75 * pPeak->normIntensity; } if(0.5 * pPeak->normIntensity > spectrum1[massInt + 1]) { spectrum1[massInt + 1] = 0.75 * pPeak->normIntensity; } } } pPeak = pPeak->next; } } /* For data with worse errors, I go for a peak that is 5 bins wide. 
*/ else { pPeak = firstMassPtr; while (pPeak != NULL) { massInt = (((pPeak->mOverZ) * 2) + 0.5); if (massInt > 2 && massInt < (SIZEOF_SPECTRA - 2) ) { spectrum1[massInt] = pPeak->normIntensity; if (0.5 * pPeak->normIntensity > spectrum1[massInt - 1]) { spectrum1[massInt - 1] = 0.5 * pPeak->normIntensity; } if(0.25 * pPeak->normIntensity > spectrum1[massInt - 2]) { spectrum1[massInt - 2] = 0.25 * pPeak->normIntensity; } if(0.5 * pPeak->normIntensity > spectrum1[massInt + 1]) { spectrum1[massInt + 1] = 0.5 * pPeak->normIntensity; } if(0.25 * pPeak->normIntensity > 0.25 * spectrum1[massInt + 2]) { spectrum1[massInt + 2] = 0.25 * pPeak->normIntensity; } } pPeak = pPeak->next; } } for(i = lowEnd; i < highEnd; i++) { spectrum1[i] = spectrum1[i] * 0.5; } } /*************************************SetupCrossCorrelation********************************** * * Sets aside memory blocks for the two spectra that will be dummied up and padded, and for * the array to contain the cross-correlation results, tau. */ void SetupCrossCorrelation(void) { INT_4 i; /* The spectral arrays must be of the same power of 2. So, if the spectrum was not * acquired above m/z 2048 we will use that as the array size because it will run * much faster than if we use 4096. (The cross-correlation is done at unit resolution.) 
*/ SIZEOF_SPECTRA = SIZEOF_SPECTRA_BIG; if (spectrum1) { free(spectrum1); /* Throw away the old data (if any exists) */ } spectrum1 = (REAL_4 *) malloc(SIZEOF_SPECTRA*sizeof(REAL_4)); if (NULL == spectrum1) { printf("Not enough memory to allocate spectrum1"); exit(1); } for(i = 0; i < SIZEOF_SPECTRA; i++) { spectrum1[i] = 0; } if (spectrum2) { free(spectrum2); /* Throw away the old data (if any exists) */ } spectrum2 = (REAL_4 *) malloc(SIZEOF_SPECTRA*sizeof(REAL_4)); if (NULL == spectrum2) { printf("Not enough memory to allocate spectrum2"); exit(1); } for(i = 0; i < SIZEOF_SPECTRA; i++) { spectrum2[i] = 0; } if (tau) { /* tau will be the array for the results of the cross-correlation * and it needs to be twice as large as the spectra. */ free(tau); /* Throw away the old data (if any exists) */ } tau = (REAL_4 *) malloc(SIZEOF_SPECTRA * 2 * sizeof(REAL_4)); if (NULL == tau) { printf("Not enough memory to allocate tau"); exit(1); } for(i = 0; i < (SIZEOF_SPECTRA) * 2; i++) { tau[i] = 0; } } /*************************************CrossCorrelate********************************** * */ void CrossCorrelate(REAL_4 *array1, REAL_4 *array2, UINT_4 n, REAL_4 *result) { UINT_4 i; /* Loop index. */ REAL_4 temp; REAL_4 *workSpace; workSpace = (REAL_4 *) malloc((n * 2) * sizeof(REAL_4)); if (workSpace) { /* Zero the work space only after confirming that the allocation succeeded. */ for(i = 0; i < (n * 2); i++) { workSpace[i] = 0; } workSpace--; /* This is done so the array is not treated as 0 based. 
*/ twofft(array1, array2, workSpace, result, n); for (i = 2; i <= n+2; i += 2) { result[i-1] = (workSpace[i-1] * (temp = result[i-1]) + workSpace[i] * result[i])/(n * 2); result[i] = (workSpace[i] * temp - workSpace[i-1] * result[i])/(n * 2); } result[2] = result[n+1]; realft(result, n, -1); workSpace++; free(workSpace); } } /*************************************FastFourier********************************** * */ static void FastFourier(REAL_4 *array, UINT_4 nn, INT_4 isign) { UINT_4 n; UINT_4 mmax; UINT_4 m; UINT_4 j; UINT_4 istep; UINT_4 i; REAL_4 tempr; REAL_4 tempi; REAL_8 wtemp; REAL_8 wr; REAL_8 wpr; REAL_8 wpi; REAL_8 wi; REAL_8 theta; n = nn << 1; j = 1; for (i = 1; i < n; i += 2) { if (j > i) { SWAP(array[j],array[i]); SWAP(array[j+1],array[i+1]); } m = n >> 1; while (m >= 2 && j > m) { j -= m; m >>= 1; } j += m; } mmax = 2; while (n> mmax) { istep = mmax << 1; theta = isign * (6.28318530717959/mmax); wpr = -2.0 * pow(sin(0.5 * theta),2); wpi = sin(theta); wr = 1.0; wi = 0.0; for (m=1; m>1); if (isign == 1) { c2 = -0.5; FastFourier(array, n>>1, 1); } else { c2 = 0.5; theta = -theta; } wtemp = sin(0.5 * theta); wpr = -2.0 * wtemp * wtemp; wpi = sin(theta); wr = 1.0 + wpr; wi = wpi; np3 = n + 3; for (i=2; i<=(n>>2); i++) { i4 = 1 + (i3 = np3 - (i2 = 1 + (i1 = i + i - 1))); h1r = c1 * (array[i1] + array[i3]); h1i = c1 * (array[i2] - array[i4]); h2r = -c2 * (array[i2] + array[i4]); h2i = c2 * (array[i1] - array[i3]); array[i1] = h1r + wr * h2r - wi * h2i; array[i2] = h1i + wr * h2i + wi * h2r; array[i3] = h1r - wr * h2r + wi * h2i; array[i4] = -h1i + wr * h2i + wi * h2r; wr = (wtemp = wr) * wpr - wi * wpi + wr; wi = wi * wpr + wtemp * wpi + wi; } if (isign == 1) { array[1] = (h1r = array[1]) + array[2]; array[2] = h1r - array[2]; } else { array[1] = c1 * ((h1r = array[1]) + array[2]); array[2] = c1 * (h1r - array[2]); FastFourier(array, n>>1, -1); } } lutefisk-1.0.7+dfsg.orig/src/LutefiskSubseqMaker.c0000644000175000017500000020240710303626624022044 0ustar 
rusconirusconi/********************************************************************************************* Lutefisk is software for de novo sequencing of peptides from tandem mass spectra. Copyright (C) 1995 Richard S. Johnson This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. Contact: Richard S Johnson 4650 Forest Ave SE Mercer Island, WA 98040 jsrichar@alum.mit.edu *********************************************************************************************/ /* Richard S. Johnson 6/96 LutefiskSubseqMaker is a file containing the function SubsequenceMaker plus its associated functions. It was written to be used as part of a program called "LutefiskXP", which is used to aid in the interpretation of CID data of peptides. The general aim of this file (and the function SubsequenceMaker) is to use the graph sequenceNode to derive a list of completed sequences that account for some of the CID data. It uses a subsequencing approach. Rather than searching each limb and twig of the tree, I ignore those branches that do not appear to lead to anything interesting. 
*/ #include #include #include "LutefiskPrototypes.h" #include "LutefiskDefinitions.h" /*Globals for this file only.*/ struct Sequence *gFinalSequencePtr; INT_4 gSubseqNum = 0; INT_4 gAA1Max, gAA1Min, gAA2Max, gAA2Min, gAA1, gAA2; char gCheckItOut = FALSE; /*Equals TRUE if I want to follow the subsequence buildup, or FALSE if I want it to run in the normal mode.*/ INT_4 gCorrectSequence[50] = { /*Used for checking if the correct sequence is remaining.*/ 20011, 8703, 11308, 9907, 5702, 25009, 9907, 17106, 15610, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; /********************************** ClearLowestScore ******************************* * * This function finds the lowest scoring subsequence (the last one in the list), * and removes all subsequences of that score. */ void ClearLowestScore(struct Sequence *newSubsequencePtr, INT_4 *subseqNum, INT_4 maxLastNode, INT_4 minLastNode) { struct Sequence *currPtr, *previousPtr, *trashPtr, *lastPtr; INT_4 lowScore; /*Return if only zero or one new subsequence. The NULL test on the list head comes first so that it is never dereferenced when the list is empty.*/ if(newSubsequencePtr == NULL || newSubsequencePtr->next == NULL) return; /*Find the end of the list.*/ currPtr = newSubsequencePtr->next; previousPtr = newSubsequencePtr; while(currPtr != NULL) { previousPtr = previousPtr->next; currPtr = currPtr->next; } /*Find the lowest score, which is from the subsequence at the end of the list.*/ lowScore = previousPtr->score; /*If the first new subsequence also has the low score, then get rid of one subsequence.*/ if(newSubsequencePtr->score == lowScore) { currPtr = newSubsequencePtr->next; previousPtr = newSubsequencePtr; if(currPtr->next == NULL) /*Return if only two subsequences.*/ return; while(currPtr->next != NULL) { previousPtr = previousPtr->next; currPtr = currPtr->next; } previousPtr->next = NULL; free(currPtr); /*Free one subsequence so that the new one can be added on.*/ *subseqNum = *subseqNum - 1; } /*If the first subsequence has a 
higher score than the low score, get rid of all subsequences with scores equal to the low score.*/ else { currPtr = newSubsequencePtr->next; previousPtr = newSubsequencePtr; while(currPtr != NULL && currPtr->score != lowScore) { previousPtr = previousPtr->next; currPtr = currPtr->next; } previousPtr->next = NULL; /*terminate the list just prior to the low score subseqs*/ lastPtr = previousPtr; /*remember the last good subsequence pointer*/ trashPtr = currPtr; /* if(gParam.proteolysis == 'T') /*keep subsequences that are about to finish*/ /* { while(currPtr != NULL) { if(currPtr->nodeValue + gMonoMass_x100[R] >= minLastNode) { if(currPtr->nodeValue + gMonoMass_x100[R] <= maxLastNode) { currPtr->nodeValue = (currPtr->nodeValue) * -1; } if(currPtr->nodeValue + gMonoMass_x100[K] >= minLastNode) { if(currPtr->nodeValue + gMonoMass_x100[K] <= maxLastNode) { currPtr->nodeValue = (currPtr->nodeValue) * -1; } } } currPtr = currPtr->next; } currPtr = trashPtr; while(currPtr != NULL) { if(currPtr->nodeValue < 0) { currPtr->nodeValue = (currPtr->nodeValue) * -1; lastPtr->next = currPtr; lastPtr = currPtr; currPtr = currPtr->next; } else { trashPtr = currPtr; currPtr = currPtr->next; free(trashPtr); } } } else {*/ FreeSequenceStructs(trashPtr); /*trash the remaining ones.*/ /* } */ /*Count the subsequences*/ *subseqNum = 0; currPtr = newSubsequencePtr; while(currPtr != NULL) { *subseqNum = *subseqNum + 1; currPtr = currPtr->next; } } return; } /**********************************LCQNterminalSubsequences**************************************** * * This function sets up the first batch of subsequences. It starts with a single subsequence * containing the N-terminal group (usually hydrogen of mass 1) and then tries to connect with * nodes that are up to 559 units higher. If any higher mass nodes can be connected to lower * mass nodes, then the higher mass nodes are eliminated because these will be incorporated * into the sequence as the subsequencing progresses. 
A linked list of structs of type * Sequence is generated where the first struct in the list has the highest score, and the last * struct in the list has the lowest score. */ struct Sequence *LCQNterminalSubsequences(SCHAR *sequenceNode, INT_4 maxLastNode, INT_4 lowSuperNode, INT_4 highSuperNode) { struct Sequence *subsequencePtr; INT_4 i, j, k, m, testValue, nTerminus; INT_4 *extensions, *extScore, extNum; INT_4 *bestExtensions, *bestExtScore, bestExtNum; INT_4 highestExtensionScore; INT_4 threshold; INT_4 threeAALimit = gAminoAcidNumber*gAminoAcidNumber*gAminoAcidNumber; INT_4 score, *peptide, peptideLength, nodeValue, gapNum; INT_4 *threeAA, threeAANum, sum; INT_4 sameNum, averageExtension, *sameExtension; INT_2 nodeCorrection; char nTerminusPossible, duplicateFlag; char sameTest, doIt; sameExtension = (int *) malloc(gParam.fragmentErr * 20 * sizeof(INT_4)); if(sameExtension == NULL) { printf("LCQNterminalSubsequences: Out of memory."); exit(1); } threeAA = (int *) malloc(threeAALimit * sizeof(INT_4)); if(threeAA == NULL) { printf("LCQNterminalSubsequences: Out of memory."); exit(1); } extensions = (int *) malloc(MAX_GAPLIST * sizeof(INT_4 )); if(extensions == NULL) { printf("LCQNterminalSubsequences: Out of memory."); exit(1); } extScore = (int *) malloc(MAX_GAPLIST * sizeof(INT_4 )); if(extScore == NULL) { printf("LCQNterminalSubsequences: Out of memory."); exit(1); } bestExtensions = (int *) malloc(MAX_GAPLIST * sizeof(INT_4 )); if(bestExtensions == NULL) { printf("LCQNterminalSubsequences: Out of memory."); exit(1); } bestExtScore = (int *) malloc(MAX_GAPLIST * sizeof(INT_4 )); if(bestExtScore == NULL) { printf("LCQNterminalSubsequences: Out of memory."); exit(1); } peptide = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4 )); if(peptide == NULL) { printf("LCQNterminalSubsequences: Out of memory."); exit(1); } /* Fill in the masses for three amino acids.*/ threeAANum = 0; for(i = 0; i < gAminoAcidNumber; i++) /*Fill in the masses of the 3 AA extensions.*/ { 
for(j = 0; j < gAminoAcidNumber; j++) { for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[i] != 0 && gGapList[j] != 0 && gGapList[k] != 0) { sum = gGapList[i] + gGapList[j] + gGapList[k]; duplicateFlag = FALSE; for(m = 0; m < threeAANum; m++) { if(threeAA[m] == sum) { /* We already have this mass in threeAA so don't add it to the list. */ duplicateFlag = TRUE; break; } } if(duplicateFlag == FALSE) { for(m = 0; m <= gGapListIndex; m++) { if(gGapList[m] == sum) { /* We already have this mass so don't add it to the list. */ duplicateFlag = TRUE; break; } } } if(!duplicateFlag && threeAANum < threeAALimit - 1) { threeAA[threeAANum] = sum; threeAANum++; } } } } } /* Now start finding the N-terminal pieces.*/ extNum = 0; subsequencePtr = NULL; nTerminus = gParam.modifiedNTerm; /* Find the one, two, and three amino acid jumps from gGapList and threeAA.*/ for(i = nTerminus + gMonoMass_x100[G]; i < gMonoMass_x100[W] * 3; i++) /*step thru each node*/ { if(sequenceNode[i] != 0 && i < gGraphLength) /*ignore the nodes w/ zero evidence*/ { nTerminusPossible = FALSE; /*start assuming that this is not an extension*/ /* Check for one and two amino acid extensions using the gGapList array.*/ for(j = 0; j <= gGapListIndex; j++) { if(gGapList[j] != 0) { if(i - nTerminus == gGapList[j]) { nTerminusPossible = TRUE; /*its a possible extension*/ break; } } } /* Now check for three amino acid extensions if it wasn't a one or two aa extension.*/ if(nTerminusPossible == FALSE) { for(j = 0; j < threeAANum; j++) { if(threeAA[j] != 0) { if(i - nTerminus == threeAA[j]) { nTerminusPossible = TRUE; break; } } } } doIt = TRUE; /*check for superNodes when sequencetag specified*/ if(nTerminus < lowSuperNode && i > highSuperNode) { doIt = FALSE; } if(nTerminusPossible && i <= maxLastNode && doIt) /*save as an extension?*/ { extensions[extNum] = i - nTerminus; extScore[extNum] = sequenceNode[i]; extNum++; if(extNum >= MAX_GAPLIST) { printf("LCQNTerminalSubsequences: extNum >= MAX_GAPLIST\n"); 
exit(1); } } } } /* Find extensions that are 1 node unit apart, and consolidate them.*/ for(i = 0; i < extNum; i++) { if(extScore[i] != 0) { sameNum = 0; averageExtension = extensions[i]; for(j = 0; j < extNum; j++) { if(j != i && extScore[j] != 0) { sameTest = FALSE; for(k = 0; k < sameNum; k++) { if(extensions[sameExtension[k]] - extensions[j] == 1 || extensions[j] - extensions[sameExtension[k]] == 1) { sameTest = TRUE; } } if(extensions[i] - extensions[j] == 1 || extensions[j] - extensions[i] == 1 || sameTest) { sameExtension[sameNum] = j; averageExtension += extensions[j]; sameNum++; } } } if(sameNum != 0) { averageExtension = ((float)averageExtension / (sameNum + 1)) + 0.5; /*count the i extension and round the value*/ extensions[i] = averageExtension; for(j = 0; j < sameNum; j++) { extScore[sameExtension[j]] = 0; } } } } /* Get rid of the extensions that are within gParam.fragmentErr of each other.*/ for(i = 0; i < extNum; i++) { if(extScore[i] != 0) { for(j = 0; j < extNum; j++) { if(extScore[j] != 0 && i != j) { if(extensions[i] <= extensions[j] + gParam.fragmentErr && extensions[i] >= extensions[j] - gParam.fragmentErr) { if(extScore[i] >= extScore[j]) { extScore[j] = 0; } else { extScore[i] = 0; } } } } } } /* If two extensions differ by the mass of an amino acid, then the higher mass one's intensity is assigned a zero so that it is removed in the section below.*/ if(extNum > 0) { for(i = 0; i < extNum; i++) { for(j = i + 1; j < extNum; j++) { testValue = extensions[j] - extensions[i]; for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { if(testValue <= gGapList[k] + gParam.fragmentErr && testValue >= gGapList[k] - gParam.fragmentErr) { extScore[j] = 0; } } } } } } /* * Now I need to find the best extensions, ie, the top maxExtNum of them and only if these * extensions are greater than the product of the highest score and extThresh. 
* bestExtensions[MAX_GAPLIST], bestExtScore[MAX_GAPLIST], bestExtNum; */ highestExtensionScore = extScore[0]; /*Find the highest extension score.*/ for(i = 0; i < extNum; i++) { if(extScore[i] > highestExtensionScore) { highestExtensionScore = extScore[i]; } } threshold = highestExtensionScore * gParam.extThresh; /*Set the extension score threshold.*/ /* * If a peptide tag has been entered and if the highestExtensionScore is over 100, then * that means that the highest scoring extension is a superNode. In this case, don't * use an extension score threshold. */ if(gParam.tagNMass != 0 && gParam.tagCMass != 0) { if(highestExtensionScore > 100) { threshold = 0; } } bestExtNum = 0; for(i = 0; i < extNum; i++) { if(extScore[i] >= threshold) { bestExtensions[bestExtNum] = extensions[i]; bestExtScore[bestExtNum] = extScore[i]; bestExtNum++; if(bestExtNum >= MAX_GAPLIST) { printf("LCQNTerminalSubsequences: bestExtNum >= MAX_GAPLIST\n"); exit(1); } } } /* Make sure there are enough subsequences allowed.*/ if(bestExtNum > gParam.topSeqNum) { bestExtNum = gParam.topSeqNum; } /* Store this information in the linked list of Sequence structs.*/ for(i = 0; i < bestExtNum; i++) { score = bestExtScore[i]; peptide[0] = bestExtensions[i]; peptideLength = 1; gapNum = 0; nodeValue = nTerminus + bestExtensions[i]; j = gParam.modifiedNTerm * 10 + 0.5; k = gParam.modifiedNTerm + 0.5; k = k * 10; nodeCorrection = j - k; subsequencePtr = LinkSubsequenceList(subsequencePtr, LoadSequenceStruct(peptide, peptideLength, score, nodeValue, gapNum, nodeCorrection)); } /* Free the arrays*/ free(extensions); free(extScore); free(bestExtensions); free(bestExtScore); free(peptide); free(sameExtension); free(threeAA); return(subsequencePtr); } /*****************************amIHere****************************************************** * * This function is used to determine at each subsequence extension if the correct sequence * is present. By setting gCheckItOut to be FALSE, this stuff is always skipped. 
If * gCheckItOut is TRUE, then this function is activated. * */ void amIHere(INT_4 correctPeptideLength, struct Sequence *subsequencePtr) { struct Sequence *currPtr, *correctPtr; INT_4 i, totalSubsequences, rank; INT_4 j = 0; INT_4 lowestScore; char test; if(subsequencePtr == NULL) { j++; /*set debugger here*/ j++; return; } /*gCheckItOut = FALSE;*/ /*If sequence is found then its made TRUE later. This keeps it from running this function when the sequence drops out.*/ currPtr = subsequencePtr; /*Find the lowest score here.*/ while(currPtr->next != NULL) { currPtr = currPtr->next; } lowestScore = currPtr->score; currPtr = subsequencePtr; while(currPtr != NULL) { if(currPtr->nodeValue == 13322) { j++; j++; } test = TRUE; for(i = 0; i < correctPeptideLength; i++) { if(currPtr->peptide[i] <= gCorrectSequence[i] - gParam.fragmentErr || currPtr->peptide[i] >= gCorrectSequence[i] + gParam.fragmentErr) { test = FALSE; break; } } if(test) /*If this is the correct subsequence, then..*/ { gCheckItOut = TRUE; totalSubsequences = 0; rank = 1; correctPtr = currPtr; currPtr = subsequencePtr; while(currPtr != NULL) /*Count the subsequences and determine the rank.*/ { totalSubsequences++; if(currPtr->score > correctPtr->score) { rank++; } currPtr = currPtr->next; } j++; /*Stop here in the debugger.*/ j++; break; } currPtr = currPtr->next; } if(test != TRUE) { j++; /*Stop here in the debugger for when it doesn't match anymore.*/ j++; } } /*****************************CorrectMass************************************************** * * This function figures out if the final sequence that is about to be stored, is in fact * of the correct mass. It returns a FALSE if its not correct, and a TRUE if it is correct. 
*/ char CorrectMass(INT_4 *peptide, INT_4 peptideLength, INT_4 *aaPresentMass) { char correct = TRUE; char test, peptideChar[MAX_PEPTIDE_LENGTH]; REAL_4 calcMass = 0; REAL_4 nMass; REAL_8 mToAFactor; INT_4 i, j, k, diff, tagSequenceMass; calcMass += gParam.modifiedNTerm; nMass = gParam.modifiedNTerm; calcMass += gParam.modifiedCTerm; for(i = 0; i < peptideLength; i++) /*Figure out the nominal mass of the peptide.*/ { calcMass += peptide[i]; } if(calcMass >= gParam.monoToAv) /*Convert from mono to average mass*/ { mToAFactor = 0; } else { if(calcMass >= (gParam.monoToAv - gAvMonoTransition)) { mToAFactor = (gParam.monoToAv - calcMass) / gAvMonoTransition; } else { mToAFactor = 1; } } mToAFactor = MONO_TO_AV - ((MONO_TO_AV - 1) * mToAFactor); if(calcMass > (gParam.monoToAv - gAvMonoTransition)) /*Convert from average to monoisotopic mass if necessary.*/ { calcMass = calcMass * mToAFactor; } /* Now decide if the peptide mass is correct.*/ if(calcMass <= (gParam.peptideMW + gParam.peptideErr) && calcMass >= (gParam.peptideMW - gParam.peptideErr)) { correct = TRUE; } else { correct = FALSE; } /* Now decide if the sequence tag is present before storing it as a complete sequence.*/ if(correct) { if(gParam.tagSequence[0] != '*') { test = TRUE; if(nMass >= (gParam.tagNMass - gParam.fragmentErr)) /*This is if the sequence tag starts at the N-terminus.*/ { if(nMass <= (gParam.tagNMass + gParam.fragmentErr)) { correct = TRUE; test = FALSE; } } if(test) { for(i = 0; i < peptideLength; i++) { nMass = nMass + (peptide[i]); if(nMass >= (gParam.tagNMass - gParam.fragmentErr)) { if(nMass <= (gParam.tagNMass + gParam.fragmentErr)) { correct = TRUE; } else { correct = FALSE; } break; } } } } } /* Now check to see if the amino acids that are supposed to be in the peptide are part of the sequence about to be stored away.*/ if(correct) /*If it's not a correct sequence, then don't bother proving it's even more incorrect.*/ { if(aaPresentMass[0] != -1) /*Do this if there were any amino 
acids listed as being present in the sequence.*/ { /* Identify 2 amino acid gaps in the sequence.*/ for(i = 0; i < peptideLength; i++) { test = TRUE; for(j = 0; j < gAminoAcidNumber; j++) { if(gGapList[j] != 0) { if(gGapList[j] == peptide[i]) { test = FALSE; break; } } } if(test) { peptideChar[i] = TRUE; /*It is a 2 amino acid gap.*/ } else { peptideChar[i] = FALSE; } } /* Now start comparing aaPresent and the sequence in 'peptide'.*/ i = 0; while(aaPresentMass[i] != -1) { if(aaPresentMass[i] != 0) { test = TRUE; for(j = 0; j < peptideLength; j++) { if(peptide[j] == aaPresentMass[i]) { test = FALSE; break; } } if(test) /*I didn't find it as a 1 amino acid, but maybe its part of 2 aa.*/ { for(j = 0; j < peptideLength; j++) { if(peptideChar[j]) { diff = peptide[j] - aaPresentMass[i]; for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { if(diff >= gGapList[k] - gParam.fragmentErr && diff <= gGapList[k] + gParam.fragmentErr) { test = FALSE; break; } } } } } } if(test) /*I didn't find it in the sequence. Maybe its in the sequence tag.*/ { j = 0; while(gParam.tagSequence[j] != 0) { tagSequenceMass = 0; for(k = 0; k < gAminoAcidNumber; k++) { if(gSingAA[k] == gParam.tagSequence[j]) { if(gGapList[k] != 0) { tagSequenceMass = gGapList[k]; } break; } } if(tagSequenceMass == aaPresentMass[i]) { test = FALSE; break; } j++; } } if(test) { correct = FALSE; break; } } i++; } } } return(correct); } /****************************AlterSubsequenceList********************************************** * * This function adds a subsequence onto the existing linked list of structs of type Sequence. * It adds structs in order of their score fields, so that the first in the list has the * highest score and the last in the list has the lowest score. 
* */ struct Sequence *AlterSubsequenceList(struct Sequence *firstPtr, struct Sequence *newPtr) { struct Sequence *currPtr, *previousPtr; char test = TRUE; INT_4 i = 0; previousPtr = firstPtr; /*This is to make sure currPtr and previousPtr are initialized, so that in the end I can find the last element in the list to free.*/ if(firstPtr->next != NULL) /*Take care of situations where there is only one subsequence.*/ { currPtr = firstPtr->next; } else { if(newPtr->score > firstPtr->score) { free(firstPtr); firstPtr = newPtr; return(firstPtr); } else { free(newPtr); return(firstPtr); } } /*Much of this is the same as from LinkSubsequenceList.*/ if(firstPtr == NULL) /*If this is the first struct of the list then do this.*/ { firstPtr = newPtr; return(firstPtr); } else { if(newPtr->score > firstPtr->score) /*If the struct to be added has the best score then*/ { newPtr->next = firstPtr; firstPtr = newPtr; } else /*Otherwise, go find the position for the new struct (based on its score field.*/ { while(currPtr->next != NULL) { if(newPtr->score > currPtr->score) /*I found the place.*/ { previousPtr->next = newPtr; newPtr->next = currPtr; test = FALSE; break; } previousPtr = currPtr; currPtr = currPtr->next; } if(test) /*Ok, I didn't find the place, and I'm at the end of the linked list, so this must have the lowest score of all. I'll leave the list alone and return. This is different from LinkSubsequenceList, which added the new struct on to the end of the list.*/ { free(newPtr); return(firstPtr); } } } /* Here's what's different from LinkSubsequenceList - I find the lowest score (or last in the * list) and I eliminate it. I set the previousPtr's next field to NULL to make it the new * last element in the list, and I free the currPtr. 
    */
    while(currPtr->next != NULL)
    {
        previousPtr = currPtr;
        currPtr = currPtr->next;
    }
    previousPtr->next = NULL;
    /*if(currPtr->nodeValue == 13322 && currPtr->score == 236)
    {
        i++;
        i++;
    }
    for debugging*/
    free(currPtr);

    return(firstPtr);
}

/**********************FreeSequenceStructs****************************************
*
*   Used for freeing memory in a linked list.  Bob DuBose tells me it's best to free
*   space in the reverse order that the space was malloc'ed.  Note that this routine
*   does not do that: it walks the list iteratively from the head to the tail, saving
*   each struct's next pointer before freeing the struct itself.
*
*/
void FreeSequenceStructs(struct Sequence *s)
{
    struct Sequence *currPtr, *nextPtr;

    currPtr = s;
    while(currPtr != NULL)
    {
        nextPtr = currPtr->next;    /*Save the link before the struct is freed.*/
        free(currPtr);
        currPtr = nextPtr;
    }

    return; /*The whole list has been freed.*/
}

/************************************* StoreSubsequences ********************************************
*
*   Store this information in the linked list of Sequence structs.  The values placed in the
*   extensionList are used to determine values for the variables peptide[], score,
*   peptideLength, gapNum, and nodeValue.
These * variables are passed to a few functions that are used to set up the linked list of structs * of type Sequence, which contain the next set of subsequences (newSubsequencePtr */ struct Sequence *StoreSubsequences( struct Sequence *newSubsequencePtr, struct extension *extensionList, struct Sequence *currentSubsequence, INT_4 *lastNode, INT_4 lastNodeNum, INT_4 maxLastNode, INT_4 minLastNode, INT_4 *aaPresentMass, INT_4 *seqNum, INT_4 *subseqNum, SCHAR *sequenceNode) { INT_4 i, j; /* Loop index */ BOOLEAN test; INT_4 *peptide, score, peptideLength, nodeValue, gapNum; INT_2 nodeCorrection; peptide = (int *) calloc(MAX_PEPTIDE_LENGTH, sizeof(INT_4 )); if(peptide == NULL) { printf("StoreSubsequences: Out of memory"); exit(1); } /*Set up the peptide field so that it includes the previous values contained in the peptide field of currPtr (the subsequence currently under investigation.*/ for(i = 0; i < currentSubsequence->peptideLength; i++) { peptide[i] = currentSubsequence->peptide[i]; } i = 0; while(extensionList[i].mass > 0 && i < MAX_GAPLIST) { test = TRUE; /*Becomes FALSE if the data is stored as a final sequence.*/ score = currentSubsequence->score + extensionList[i].score; peptideLength = currentSubsequence->peptideLength + 1; if(peptideLength > MAX_PEPTIDE_LENGTH) { printf("StoreSubsequences: peptideLength > MAX_PEPTIDE_LENGTH\n"); exit(1); } peptide[peptideLength - 1] = extensionList[i].mass; gapNum = currentSubsequence->gapNum + extensionList[i].gapSize; nodeValue = currentSubsequence->nodeValue + extensionList[i].mass; nodeCorrection = currentSubsequence->nodeCorrection + extensionList[i].nodeCorrection; if(nodeCorrection >= 10) { nodeCorrection = nodeCorrection - 10; nodeValue = nodeValue + 1; } else if(nodeCorrection <= -10) { nodeCorrection = nodeCorrection + 10; nodeValue = nodeValue - 1; } if(nodeValue >= gAA1Min) /*can be terminated by known C-terminal aa*/ { if(nodeValue <= gAA1Max) { score = score + sequenceNode[maxLastNode]; peptideLength++; 
if(peptideLength > MAX_PEPTIDE_LENGTH) { printf("StoreSubsequences: peptideLength > MAX_PEPTIDE_LENGTH\n"); exit(1); } peptide[peptideLength - 1] = gAA1; nodeValue = nodeValue + gAA1; } } if(nodeValue >= gAA2Min) /*can be terminated by known C-terminal aa*/ { if(nodeValue <= gAA2Max) { score = score + sequenceNode[maxLastNode]; peptideLength++; if(peptideLength > MAX_PEPTIDE_LENGTH) { printf("StoreSubsequences: peptideLength > MAX_PEPTIDE_LENGTH\n"); exit(1); } peptide[peptideLength - 1] = gAA2; nodeValue = nodeValue + gAA2; } } /*Here's where the completed sequences are stored.*/ if(nodeValue >= minLastNode && nodeValue <= maxLastNode) { for(j = 0; j < lastNodeNum; j++) { if(nodeValue == lastNode[j]) { test = FALSE; /*Even if its not stored, I won't continue seq'ing this.*/ if((gapNum <= gParam.maxGapNum) && CorrectMass(peptide, peptideLength, aaPresentMass)) { if(*seqNum < gParam.finalSeqNum) { gFinalSequencePtr = LinkSubsequenceList(gFinalSequencePtr, LoadFinalSequenceStruct(peptide, peptideLength, score, nodeValue, gapNum, nodeCorrection)); *seqNum = *seqNum + 1; } else { gFinalSequencePtr = AlterSubsequenceList(gFinalSequencePtr, LoadFinalSequenceStruct(peptide, peptideLength, score, nodeValue, gapNum, nodeCorrection)); } } } } } if(test && nodeValue < minLastNode) { if(gapNum <= gParam.maxGapNum) /*Don't store if there are too many gaps.*/ { if(*subseqNum < gParam.topSeqNum) /*If there are not too many subsequences stored, then do this.*/ { newSubsequencePtr = LinkSubsequenceList(newSubsequencePtr, LoadSequenceStruct(peptide, peptideLength, score, nodeValue, gapNum, nodeCorrection)); *subseqNum = *subseqNum + 1; } else /*If I have the max allowed number of subsequence, then do this.*/ { /*This is the new way, which removes all subsequences with the same low score*/ /*ClearLowestScore(newSubsequencePtr, subseqNum, maxLastNode, minLastNode); newSubsequencePtr = LinkSubsequenceList(newSubsequencePtr, LoadSequenceStruct(peptide, peptideLength, score, nodeValue, 
                        gapNum, nodeCorrection)); *subseqNum = *subseqNum + 1;*/

                    /*This is the old way which was to replace subsequences one at a time*/
                    newSubsequencePtr = AlterSubsequenceList(newSubsequencePtr,
                                            LoadSequenceStruct(peptide, peptideLength,
                                            score, nodeValue, gapNum, nodeCorrection));
                }
            }
        }
        i++;
    }

    free(peptide);

    return newSubsequencePtr;
}

/************************************* ExtensionsSortDescend ********************************************
*
*   qsort sorting subroutine used in SortExtensions().  Single amino acid extensions sort
*   ahead of two amino acid extensions; within each group, higher scores come first.
*/
int ExtensionsSortDescend(const void *n1, const void *n2)
{
    const struct extension *n3 = (const struct extension *)n1;
    const struct extension *n4 = (const struct extension *)n2;

    if (n3->singleAAFLAG > n4->singleAAFLAG)
    {
        return -1;
    }
    else if (n3->singleAAFLAG < n4->singleAAFLAG)
    {
        return 1;
    }
    else
    {
        if (n3->score > n4->score)
        {
            return -1;
        }
        else if (n3->score < n4->score)
        {
            return 1;
        }
        else return 0;
    }
}

/************************************* SortExtensions ********************************************
*
*/
struct extension *SortExtensions(struct extension *inExtensionList)
{
    INT_4 i;
    INT_4 extensionCount;
    INT_4 highestExtensionScore, threshold;
    INT_4 outIndex, lastOneInScore;
    struct extension *outExtensionList;

    outExtensionList = (struct extension *) calloc(MAX_GAPLIST, sizeof(struct extension));  /* Note: memory is zeroed */
    if(outExtensionList == NULL)
    {
        printf("SortExtensions: Out of memory");
        exit(1);
    }

    /*Find the highest extension score.*/
    highestExtensionScore = inExtensionList[0].score;
    extensionCount = 0;
    i = 0;
    /*Test the index before reading the element so the loop can never read past the array.*/
    while ( i < MAX_GAPLIST && inExtensionList[i].mass > 0 )
    {
        if(inExtensionList[i].score > highestExtensionScore)
        {
            highestExtensionScore = inExtensionList[i].score;
        }
        i++;
        extensionCount++;
    }

    /*Set the extension score threshold.*/
    threshold = highestExtensionScore * gParam.extThresh;

    /*
    *   If a peptide tag has been entered and if the highestExtensionScore is over 100, then
    *   that means that the highest scoring extension is to a superNode.  In this case, don't
    *   use an extension score threshold.
*/ if(gParam.tagNMass != 0 && gParam.tagCMass != 0) { if(highestExtensionScore > 100) { threshold = 0; } } /*If the number of extensions found are less than the specified maximum number of extensions per subsequence, then just put them in the outgoing list.*/ if(extensionCount <= gParam.maxExtNum) { outIndex = 0; for(i = 0; i < extensionCount; i++) { if(inExtensionList[i].score >= threshold && i < gParam.topSeqNum) { outExtensionList[outIndex] = inExtensionList[i]; outIndex++; if(outIndex >= MAX_GAPLIST) { printf("FATAL ERROR in SortExtensions(): Overflowed the outExtensionList ?????"); exit(1); } } } } if(extensionCount > gParam.maxExtNum) { /* There are too many extensions. Throw out the worst. */ /* First, sort the extensions by whether they are singleAA or not and then by score. */ qsort(inExtensionList, extensionCount, sizeof(struct extension), ExtensionsSortDescend); outIndex = 0; i = 0; while (i < extensionCount && outIndex < gParam.maxExtNum && outIndex < gParam.topSeqNum) { if (inExtensionList[i].score >= threshold) { outExtensionList[outIndex] = inExtensionList[i]; outIndex++; if(outIndex >= MAX_GAPLIST) { printf("FATAL ERROR in SortExtensions(): Overflowed the outExtensionList ?????"); exit(1); } } i++; } lastOneInScore = outExtensionList[outIndex - 1].score; /*now stick in the extensions with the same score as the lowest scoring extension*/ while(i < extensionCount && outIndex < gParam.topSeqNum) { if(inExtensionList[i].score == lastOneInScore) { outExtensionList[outIndex] = inExtensionList[i]; outIndex++; if(outIndex >= MAX_GAPLIST) { printf("FATAL ERROR in SortExtensions(): Overflowed the outExtensionList ?????"); exit(1); } } i++; } } free(inExtensionList); return outExtensionList; } /************************************* Score2aaExtension ******************************************** * * Examine some special cases and then calculate the score for the 2 AA extension */ struct extension Score2aaExtension( struct extension io2aaExtension, INT_4 
startingNodeMass, INT_4 *oneEdgeNodes, INT_4 oneEdgeNodesIndex, BOOLEAN oneEdgeNNode, char sequenceNodeValue) { INT_4 i, j; /* Loop index */ INT_4 massDiff; INT_4 endingNodeMass = startingNodeMass + io2aaExtension.mass; BOOLEAN prolinePossible = FALSE; /* BOOLEAN glycinePossible = FALSE;*/ BOOLEAN oneEdgeCNode = FALSE; BOOLEAN precursorRegion = FALSE; REAL_4 precursor; INT_4 precursorMinErr, precursorPlusErr; precursor = (gParam.peptideMW + gParam.chargeState * gElementMass[HYDROGEN]) / gParam.chargeState; precursorPlusErr = precursor + 2 * gParam.fragmentErr + 0.5; precursorMinErr = precursor - 2 * gParam.fragmentErr; massDiff = io2aaExtension.mass - gGapList[P]; /*Subtract the mass of proline.*/ for(i = 0; i < gAminoAcidNumber; i++) { if(gGapList[i] != 0) { if(massDiff <= gGapList[i] + gParam.fragmentErr && massDiff >= gGapList[i] - gParam.fragmentErr) { prolinePossible = TRUE; break; } } } /* massDiff = io2aaExtension.mass - gGapList[G]; for(i = 0; i < gAminoAcidNumber; i++) { if(gGapList[i] != 0) { if(massDiff == gGapList[i]) { prolinePossible = true; break; } } } */ /*Find out if this node is a C-terminal one-edger.*/ for(i = 0; i < oneEdgeNodesIndex; i++) { if(endingNodeMass == oneEdgeNodes[i]) { oneEdgeCNode = TRUE; break; } } if(gParam.chargeState == 2) /*For +2 precursors, if the 2 aa extension extends over the precursor ion, then don't attenuate the score so much.*/ { if(startingNodeMass + gMonoMass_x100[G] <= precursor && endingNodeMass - gMonoMass_x100[G] >= precursor) { for(i = 0; i < gAminoAcidNumber; i++) { if((startingNodeMass + gGapList[i] >= precursorMinErr) && (startingNodeMass + gGapList[i] <= precursorPlusErr)) { if(endingNodeMass - precursorMinErr >= gMonoMass_x100[G] && endingNodeMass - precursorPlusErr <= gMonoMass_x100[W]) { for(j = 0; j < gAminoAcidNumber; j++) { if(endingNodeMass - gGapList[j] >= precursorMinErr && endingNodeMass - gGapList[j] <= precursorPlusErr) { precursorRegion = TRUE; break; } } } break; } } } } /* Now score the two 
amino acid extension depending on if its connecting two one-edge * nodes, one regular node and a one-edge node, or two regular nodes. */ if(oneEdgeCNode && oneEdgeNNode) { /* Don't penalize as heavily for making a 2 aa jump that ends at the C-terminus. */ io2aaExtension.score = sequenceNodeValue * EDGE_EDGE_PENALTY; io2aaExtension.gapSize = 0; } else { if(prolinePossible) { io2aaExtension.score = sequenceNodeValue * PROLINE_PENALTY; io2aaExtension.gapSize = 0; } else { if(precursorRegion) { io2aaExtension.score = sequenceNodeValue * PRECURSOR_PENALTY; } else { /*if(glycinePossible) { io2aaExtension.score = sequenceNodeValue * GLYCINE_PENALTY; } else {*/ if(oneEdgeCNode || oneEdgeNNode) { io2aaExtension.score = sequenceNodeValue * NODE_EDGE_PENALTY; } else { if(oneEdgeCNode == FALSE && oneEdgeNNode == FALSE) { io2aaExtension.score = sequenceNodeValue * NODE_NODE_PENALTY; } } /*}*/ } } } return io2aaExtension; } /*************************************AddExtensions******************************************** * * This function adds single amino acid extensions onto the subsequences found in the linked * list starting w/ subsequencePtr. As a result, a new linked list starting with * newSubsequencePtr is generated. When all of the extensions have been made and the list * starting with newSubsequencePtr is complete, then the list starting with subsequencePtr * is free'ed, and newSubsequencePtr is returned. In addition to making one amino acid * jumps between nodes, this function performs two amino acid jumps for subsequences that * cannot be extended any further, which leads to no penalty in the extension score. If * a jump is made between a one-edge node and a regular node, then this extension score is * penalized by 0.75. If a jump is made between two nodes that can be extended by single * amino acid jumps, then that extension score is penalized by 0.5. 
*   The scores for the subsequences are derived from the array
*   sequenceNode, and are a summation of the scores for the nodes within a subsequence.  The
*   array gGapList and gGapListIndex describe the possible extensions that are allowed.
*   The array aaPresentMass lists the masses of the amino acids known to be present.  If a
*   completed sequence lacks one or more of these amino acids then it is not stored.
*   extThresh is the extension threshold as described earlier, and
*   maxExtNum is the upper limit on the number of extensions allowed per subsequence.  The
*   maxGapNum is the upper limit on the number of two amino acid extensions allowed per
*   sequence (excluding the N-terminal two amino acids and certain C-terminal cases).  The
*   finalSeqNum is the upper limit on the number of completed sequences that will be stored
*   in the linked list of Sequence structs starting w/ gFinalSequencePtr.  The topSeqNum
*   is the maximum number of subsequences that are in the linked list of sequences (both
*   the list starting with subsequencePtr and newSubsequencePtr).
*/
struct Sequence *AddExtensions(struct Sequence *subsequencePtr, SCHAR *sequenceNode,
                               INT_4 *oneEdgeNodes, INT_4 oneEdgeNodesIndex,
                               INT_4 *aaPresentMass, INT_4 topSeqNum,
                               INT_4 *lastNode, INT_4 lastNodeNum, INT_4 *seqNum,
                               INT_4 maxLastNode, INT_4 minLastNode,
                               INT_4 lowSuperNode, INT_4 highSuperNode)
{
    struct Sequence *newSubsequencePtr;
    struct Sequence *currPtr;
    char doIt;
    BOOLEAN oneEdgeNNode, test;
    INT_4 i, j, k, testValue, subseqNum, z=0, subseqCount;
    INT_4 extNum;
    INT_4 massDiff;
    struct extension clearExtension;
    struct extension *extensionList;

    clearExtension.gapSize = 0;
    clearExtension.mass = 0;
    clearExtension.singleAAFLAG = 0;
    clearExtension.score = 0;
    clearExtension.nodeCorrection = 0;

    extensionList = (struct extension *) calloc(MAX_GAPLIST, sizeof(struct extension));  /* Note: memory is zeroed */
    if(extensionList == NULL)
    {
        printf("AddExtensions: Out of memory");
        exit(1);
    }

    subseqCount = 0;    /*This is used to count the number of subsequences in the old list*/
    newSubsequencePtr = NULL;
    currPtr = subsequencePtr;
    subseqNum = 0;  /*This is used to count the number of subsequences in the new list and is used
                      in the StoreSubsequences function to ensure that only gParam.topSeqNum
                      subsequences are stored.*/

    while(currPtr != NULL)
    {
        /* Clear the extensionList */
        for (i = 0; i < MAX_GAPLIST; i++)
        {
            extensionList[i] = clearExtension;
        }

        subseqCount++;
        extNum = 0;
        oneEdgeNNode = TRUE;    /*If it remains TRUE, then no one amino acid extensions found.*/

        for(i = 0; i < gAminoAcidNumber; i++)   /*Find the one amino acid extensions.*/
        {
            if(gGapList[i] != 0)
            {
                testValue = currPtr->nodeValue + gGapList[i];
                doIt = TRUE;
                if(currPtr->nodeValue < lowSuperNode && testValue > highSuperNode)
                {
                    doIt = FALSE;
                }
                if(testValue >= gGraphLength)
                {
                    doIt = FALSE;   /*you've gone past the graph length*/
                }
                /*Test doIt first so that sequenceNode[] is never indexed past gGraphLength.*/
                if(doIt && sequenceNode[testValue] != 0 && testValue <= maxLastNode)
                {
                    /* Add the single AA extension to the extensionList */
                    extensionList[extNum].mass = gGapList[i];
                    extensionList[extNum].gapSize = 0;
extensionList[extNum].singleAAFLAG = 1; extensionList[extNum].score = sequenceNode[testValue]; extensionList[extNum].nodeCorrection = gNodeCorrection[i]; extNum++; if(extNum >= MAX_GAPLIST) { printf("AddExtensions: extNum >= MAX_GAPLIST\n"); exit(1); } oneEdgeNNode = FALSE; } } } /* * Now find the two amino acid extensions. Make sure that the two amino acid extension does not * include one of the one amino acid extensions. IE, if I find an edge for Ala, then I cannot * use a two amino acid edge of 199, since 128 + 71 = 199. */ for(i = gAminoAcidNumber; i <= gGapListIndex; i++)/*Start at the end of the one aa extensions and move up from there.*/ { testValue = currPtr->nodeValue + gGapList[i];/*This is the test mass (nominal).*/ doIt = TRUE; /*check that no superNodes are involved*/ if(currPtr->nodeValue < lowSuperNode && testValue > highSuperNode) { doIt = FALSE; } if(testValue >= gGraphLength) { doIt = FALSE; /*you've gone past the graph length*/ } if(doIt && sequenceNode[testValue] != 0 && testValue <= maxLastNode) { /* If there is any evidence at that mass, and if the node is less than the peptide mass.*/ test = TRUE; /*test is set to true, and if it continues to be true (see below), then this extension is allowed.*/ j = 0; while (extensionList[j].singleAAFLAG == 1 && j < extNum) { /*Look at the one aa extensions.*/ massDiff = gGapList[i] - extensionList[j].mass; for(k = 0; k < gAminoAcidNumber; k++) /*k = 0; k < gGapListIndex; k++*/ { if(gGapList[k] != 0) { if(massDiff <= gGapList[k] + gParam.fragmentErr && massDiff >= gGapList[k] - gParam.fragmentErr) { test = FALSE; /*If the mass difference is equal to an amino acid mass then its not allowed and test = FALSE.*/ break; } } } j++; } if(test) /*If test is still true then go ahead and save these as extensions.*/ { /*Add the 2 AA extension.*/ extensionList[extNum].mass = gGapList[i]; extensionList[extNum].gapSize = 1; extensionList[extNum].singleAAFLAG = 0; extensionList[extNum].nodeCorrection = 0; /*Examine some 
special cases and then calculate the score for the 2 AA extension*/ extensionList[extNum] = Score2aaExtension(extensionList[extNum], currPtr->nodeValue, oneEdgeNodes, oneEdgeNodesIndex, oneEdgeNNode, sequenceNode[testValue]); extNum++; /*Increment the number of extensions.*/ if(extNum >= MAX_GAPLIST) { printf("AddExtensions: extNum >= MAX_GAPLIST\n"); exit(1); } } } } /* * Now I need to find the best extensions, ie, the top maxExtNum of them and only if these * extensions are greater than the product of the highest score and extThresh. * bestExtensions[MAX_GAPLIST], bestExtScore[MAX_GAPLIST], bestExtNum; */ if(extNum > 0) { if(currPtr->peptideLength >= MAX_PEPTIDE_LENGTH) { printf("LutefiskSubseqMaker:AddExtensions peptide length exceeds array length"); exit(1); } extensionList = SortExtensions( extensionList ); /* Store this information in the linked list of Sequence structs. The values placed in the * arrays bestExtensions[], bestExtScore[], and bestExtGapNum[] are used to determine * values for the variables peptide[], score, peptideLength, gapNum, and nodeValue. These * variables are passed to a few functions that are used to set up the linked list of structs * of type Sequence, which contain the next set of subsequences (newSubsequencePtr*/ newSubsequencePtr = StoreSubsequences(newSubsequencePtr, extensionList, currPtr, lastNode, lastNodeNum, maxLastNode, minLastNode, aaPresentMass, seqNum, &subseqNum, sequenceNode); } currPtr = currPtr->next; /*Go to the subsequence.*/ } /* * Hack back the number of subsequences to be less than gParam.topSeqNum. 
*/ /*while(subseqNum > gParam.topSeqNum) { ClearLowestScore(newSubsequencePtr, &subseqNum, maxLastNode, minLastNode); }*/ if(subseqCount > gSubseqNum) { gSubseqNum = subseqCount; /*keep track of the most numbers of subsequences used*/ } free(extensionList); FreeSequenceStructs(subsequencePtr); return(newSubsequencePtr); } /****************LinkSubsequenceList********************************************************** * * This function adds a subsequence onto the existing linked list of structs of type Sequence. * It adds structs in order of their score fields, so that the first in the list has the * highest score and the last in the list has the lowest score. * */ struct Sequence *LinkSubsequenceList(struct Sequence *firstPtr, struct Sequence *newPtr) { struct Sequence *currPtr, *previousPtr; char test = TRUE; if(firstPtr == NULL) /*If this is the first struct of the list then do this.*/ firstPtr = newPtr; else { if(newPtr->score > firstPtr->score) /*If the struct to be added has the best score then*/ { newPtr->next = firstPtr; firstPtr = newPtr; } else /*Otherwise, go find the position for the new struct (based on its score field.*/ { previousPtr = firstPtr; currPtr = firstPtr->next; while(currPtr != NULL) { if(newPtr->score > currPtr->score) /*I found the place.*/ { previousPtr->next = newPtr; newPtr->next = currPtr; test = FALSE; break; } previousPtr = currPtr; currPtr = currPtr->next; } if(test) /*Ok, I didn't find the place, and I'm at the end of the linked list, so this must have the lowest score of all. I'll stick it on the end.*/ { previousPtr->next = newPtr; newPtr->next = NULL; } } } return(firstPtr); } /******************LoadFinalSequenceStruct******************************************** * * LoadStruct puts the nominal extension mass in the peptide[] field, and increments the * value of peptideLength by one. The fields score and nodeValue are also modified. 
 Of
*   course, to do this the function finds some memory, and this value is returned as a pointer
*   to a struct of type Sequence (which now contains all of this data).
*       struct Sequence
*       {
*           INT_4 peptide[MAX_PEPTIDE_LENGTH];
*           INT_4 peptideLength;
*           INT_4 score;
*           INT_4 nodeValue;
*           struct Sequence *next;
*       };
*   (The struct also has gapNum and nodeCorrection fields, which are filled in below.)
*/
struct Sequence *LoadFinalSequenceStruct(INT_4 *peptide, INT_4 peptideLength,
                                         INT_4 score, INT_4 nodeValue, INT_4 gapNum,
                                         INT_2 nodeCorrection)
{
    struct Sequence *currPtr;
    INT_4 i;
    REAL_4 scoreAdjuster;

    currPtr = (struct Sequence *) malloc(sizeof(struct Sequence));
    if(currPtr == NULL)
    {
        printf("LoadFinalSequenceStruct: Out of memory\n");
        exit(1);
    }

    /*Here's the expected average peptide length: the node mass over the average residue mass.*/
    scoreAdjuster = nodeValue;
    scoreAdjuster = scoreAdjuster / gAvResidueMass;
    if(peptideLength == 0)
    {
        printf("LoadFinalSequenceStruct: peptideLength == 0\n");
        exit(1);
    }
    /*Here's the ratio between the average expected and actual length.*/
    scoreAdjuster = scoreAdjuster / peptideLength;

    for(i = 0; i < peptideLength; i++)
    {
        currPtr->peptide[i] = peptide[i];
    }
    currPtr->peptideLength = peptideLength;
    currPtr->score = score * scoreAdjuster;
    currPtr->gapNum = gapNum;
    currPtr->nodeValue = nodeValue;
    currPtr->nodeCorrection = nodeCorrection;
    currPtr->next = NULL;

    return(currPtr);
}

/******************LoadSequenceStruct********************************************
*
*   LoadSequenceStruct finds memory for a new struct of type Sequence, copies the peptide[]
*   masses into it, and fills in the peptideLength, score, gapNum, nodeValue, and
*   nodeCorrection fields from its arguments.  The caller has already added the extension
*   mass and incremented peptideLength; nothing is recomputed here.  A pointer to the new
*   struct (which now contains all of this data) is returned.
*       struct Sequence
*       {
*           INT_4 peptide[MAX_PEPTIDE_LENGTH];
*           INT_4 peptideLength;
*           INT_4 score;
*           INT_4 nodeValue;
*           struct Sequence *next;
*       };
*   (The struct also has gapNum and nodeCorrection fields, which are filled in below.)
*/
struct Sequence *LoadSequenceStruct(INT_4 *peptide, INT_4 peptideLength,
                                    INT_4 score, INT_4 nodeValue, INT_4 gapNum,
                                    INT_2 nodeCorrection)
{
    struct Sequence *currPtr;
    INT_4 i;

    currPtr = (struct Sequence *)calloc(1, sizeof(struct Sequence));
    if(currPtr == NULL)
    {
        printf("LoadSequenceStruct: Out of memory\n");
        exit(1);
    }

    for(i = 0; i < peptideLength; i++)
    {
        currPtr->peptide[i] = peptide[i];
    }
    currPtr->peptideLength = peptideLength;
    currPtr->score = score;
    currPtr->gapNum = gapNum;
    currPtr->nodeValue = nodeValue;
    currPtr->nodeCorrection = nodeCorrection;
    currPtr->next = NULL;

    return(currPtr);
}

/**********************************NterminalSubsequences****************************************
*
*   This function sets up the first batch of subsequences.  It starts with a single subsequence
*   containing the N-terminal group (usually hydrogen of mass 1) and then tries to connect with
*   nodes that are one, two, or three amino acids higher in mass.  It uses the array gGapList
*   (plus a locally built list of three amino acid masses) to determine what is an acceptable
*   mass difference.  A linked list of structs of type
*   Sequence is generated where the first struct in the list has the highest score, and the last
*   struct in the list has the lowest score.  There is an upper limit on the number of
*   N-terminal extensions - maxExtNum - and the scores for these must be greater than the
*   product of the highest scoring subsequence and the value of extThresh.  If the peptide
*   mass is small enough, then the sequences are stored in a linked list of final sequences,
*   of which there will be an upper limit of finalSeqNum.  This function returns a pointer to
*   a struct of type Sequence, which is the first element in a linked list of subsequences.
*/ struct Sequence *NterminalSubsequences(SCHAR *sequenceNode, INT_4 maxLastNode, INT_4 lowSuperNode, INT_4 highSuperNode) { struct Sequence *subsequencePtr; INT_4 i, j, k, m, testValue, nTerminus; INT_4 *extensions, *extScore, extNum; INT_4 *bestExtensions, *bestExtScore, bestExtNum; INT_4 highestExtensionScore; INT_4 threshold; INT_4 score, *peptide, peptideLength, nodeValue, gapNum; INT_4 *threeAA, threeAANum, sum; INT_4 sameNum, averageExtension, *sameExtension; INT_2 nodeCorrection; char nTerminusPossible, duplicateFlag; char doIt; char sameTest; sameExtension = (int *) malloc(gParam.fragmentErr * 20 * sizeof(INT_4)); if(sameExtension == NULL) { printf("NterminalSubsequences: Out of memory."); exit(1); } threeAA = (int *) malloc(gAminoAcidNumber*gAminoAcidNumber*gAminoAcidNumber*sizeof(INT_4)); if(threeAA == NULL) { printf("NterminalSubsequences: Out of memory."); exit(1); } extensions = (int *) malloc(MAX_GAPLIST * sizeof(INT_4 )); if(extensions == NULL) { printf("NterminalSubsequences: Out of memory."); exit(1); } extScore = (int *) malloc(MAX_GAPLIST * sizeof(INT_4 )); if(extScore == NULL) { printf("NterminalSubsequences: Out of memory."); exit(1); } bestExtensions = (int *) malloc(MAX_GAPLIST * sizeof(INT_4 )); if(bestExtensions == NULL) { printf("NterminalSubsequences: Out of memory."); exit(1); } bestExtScore = (int *) malloc(MAX_GAPLIST * sizeof(INT_4 )); if(bestExtScore == NULL) { printf("NterminalSubsequences: Out of memory."); exit(1); } peptide = (int *) malloc(MAX_PEPTIDE_LENGTH * sizeof(INT_4 )); if(peptide == NULL) { printf("NterminalSubsequences: Out of memory."); exit(1); } /* Fill in the masses for three amino acids.*/ threeAANum = 0; for(i = 0; i < gAminoAcidNumber; i++) /*Fill in the masses of the 3 AA extensions.*/ { for(j = 0; j < gAminoAcidNumber; j++) { for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[i] != 0 && gGapList[j] != 0 && gGapList[k] != 0) { sum = gGapList[i] + gGapList[j] + gGapList[k]; duplicateFlag = FALSE; for(m = 0; m < 
threeAANum; m++) { if(threeAA[m] == sum) { /* We already have this mass in threeAA so don't add it to the list. */ duplicateFlag = TRUE; break; } } if(duplicateFlag == FALSE) { for(m = 0; m <= gGapListIndex; m++) { if(gGapList[m] == sum) { /* We already have this mass so don't add it to the list. */ duplicateFlag = TRUE; break; } } } if(!duplicateFlag) { threeAA[threeAANum] = sum; threeAANum++; } } } } } extNum = 0; subsequencePtr = NULL; nTerminus = gParam.modifiedNTerm + 0.5; /* Find the one, two, and three amino acid jumps from gGapList and threeAA.*/ for(i = nTerminus + gMonoMass_x100[G]; i < gMonoMass_x100[W] * 3; i++) /*step thru each node*/ { if(sequenceNode[i] != 0) /*ignore the nodes w/ zero evidence*/ { nTerminusPossible = FALSE; /*start assuming that this is not an extension*/ /* Check for one and two amino acid extensions using the gGapList array.*/ for(j = 0; j <= gGapListIndex; j++) { if(gGapList[j] != 0) { if(i - nTerminus == gGapList[j]) { nTerminusPossible = TRUE; /*its a possible extension*/ break; } } } /* Now check for three amino acid extensions if it wasn't a one or two aa extension.*/ if(nTerminusPossible == FALSE) { for(j = 0; j < threeAANum; j++) { if(threeAA[j] != 0) { if(i - nTerminus == threeAA[j]) { nTerminusPossible = TRUE; break; } } } } doIt = TRUE; /*check for superNodes when sequencetag specified*/ if(nTerminus < lowSuperNode && i > highSuperNode) { doIt = FALSE; } if(nTerminusPossible && i <= maxLastNode && doIt) /*save as an extension?*/ { extensions[extNum] = i - nTerminus; extScore[extNum] = sequenceNode[i]; extNum++; } } } /* Find extensions that are 1 node unit apart, and consolidate them.*/ for(i = 0; i < extNum; i++) { if(extScore[i] != 0) { sameNum = 0; averageExtension = extensions[i]; for(j = 0; j < extNum; j++) { if(j != i && extScore[j] != 0) { sameTest = FALSE; for(k = 0; k < sameNum; k++) { if(extensions[sameExtension[k]] - extensions[j] == 1 || extensions[j] - extensions[sameExtension[k]] == 1) { sameTest = TRUE; } 
} if(extensions[i] - extensions[j] == 1 || extensions[j] - extensions[i] == 1 || sameTest) { sameExtension[sameNum] = j; averageExtension += extensions[j]; sameNum++; } } } if(sameNum != 0) { averageExtension = ((float)averageExtension / (sameNum + 1)) + 0.5; /*count the i extension and round the value*/ extensions[i] = averageExtension; for(j = 0; j < sameNum; j++) { extScore[sameExtension[j]] = 0; } } } } /* Get rid of the extensions that are within gParam.fragmentErr of each other.*/ for(i = 0; i < extNum; i++) { if(extScore[i] != 0) { for(j = 0; j < extNum; j++) { if(extScore[j] != 0 && i != j) { if(extensions[i] <= extensions[j] + gParam.fragmentErr && extensions[i] >= extensions[j] - gParam.fragmentErr) { if(extScore[i] >= extScore[j]) { extScore[j] = 0; } else { extScore[i] = 0; } } } } } } /* If two extensions differ by the mass of an amino acid, then the higher mass one's intensity is assigned a zero so that it is removed in the section below.*/ if(extNum > 0) { for(i = 0; i < extNum; i++) { for(j = i + 1; j < extNum; j++) { testValue = extensions[j] - extensions[i]; for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { if(testValue <= gGapList[k] + gParam.fragmentErr && testValue >= gGapList[k] - gParam.fragmentErr) { extScore[j] = 0; } } } } } } /* * Now I need to find the best extensions, ie, the top maxExtNum of them and only if these * extensions are greater than the product of the highest score and extThresh. * bestExtensions[MAX_GAPLIST], bestExtScore[MAX_GAPLIST], bestExtNum; */ highestExtensionScore = extScore[0]; /*Find the highest extension score.*/ for(i = 0; i < extNum; i++) { if(extScore[i] > highestExtensionScore) { highestExtensionScore = extScore[i]; } } threshold = highestExtensionScore * gParam.extThresh; /*Set the extension score threshold.*/ /* * If a peptide tag has been entered and if the highestExtensionScore is over 100, then * that means that the highest scoring extension is to a superNode. 
In this case, don't * use an extension score threshold. */ if(gParam.tagNMass != 0 && gParam.tagCMass != 0) { if(highestExtensionScore > 100) { threshold = 0; } } bestExtNum = 0; for(i = 0; i < extNum; i++) { if(extScore[i] >= threshold) { bestExtensions[bestExtNum] = extensions[i]; bestExtScore[bestExtNum] = extScore[i]; bestExtNum++; } } /* Make sure there are enough subsequences allowed.*/ if(bestExtNum > gParam.topSeqNum) { bestExtNum = gParam.topSeqNum; } /* Store this information in the linked list of Sequence structs.*/ for(i = 0; i < bestExtNum; i++) { score = bestExtScore[i]; peptide[0] = bestExtensions[i]; peptideLength = 1; gapNum = 0; nodeValue = nTerminus + bestExtensions[i]; j = gParam.modifiedNTerm * 10 + 0.5; k = gParam.modifiedNTerm + 0.5; k = k * 10; nodeCorrection = j - k; subsequencePtr = LinkSubsequenceList(subsequencePtr, LoadSequenceStruct(peptide, peptideLength, score, nodeValue, gapNum, nodeCorrection)); } free(extensions); free(extScore); free(bestExtensions); free(bestExtScore); free(peptide); free(sameExtension); free(threeAA); return(subsequencePtr); } /**********************************SubsequenceMaker******************************************** * * This function uses a subsequencing approach to derive a list of completed peptide sequences. * The array sequenceNode contains the information used to build the subsequences. The indexing * of this array corresponds to nominal masses of b ions, and the information contained * relates to the probability that a cleavage was actually present at that nominal mass. The * array oneEdgeNodes lists the nodes that could not be connected to any other node N-terminal * to it, and oneEdgeNodesIndex is the number of such nodes listed in the array. The arrays * aaPresent and aaAbsent contain the amino acids (single letter code) that are either present * or absent in the peptide. The value of cysMW allows for alteration of the residue mass of * cysteine, which can be alkylated w/ various reagents. 
The extThresh is the fractional * value of the highest ranked extension to a particular subsequence that can be used in * the formation of the next list of subsequences. The value of maxExtNum is the maximum * number of extensions per subsequence that is allowed. The finalSeqNum is the upper limit * on the number of completed sequences that will be stored and scored later. The topSeqNum * is the upper limit on the number of subsequences that are to be allowed. If I run out * of space, I may make these last two parameters changeable by the program - ie, if there * is no space left then it will automatically purge 100 or so sequences and reset these * upper limits to 100 less than before. The maxGapNum is the maximum number of gaps that * are allowed - its either zero or one. * */ struct Sequence *SubsequenceMaker(INT_4 *oneEdgeNodes, INT_4 oneEdgeNodesIndex, SCHAR *sequenceNode) { struct Sequence *subsequencePtr; INT_4 *lastNode, lastNodeNum, aaPresentMass[AMINO_ACID_NUMBER]; INT_4 maxLastNodeNum; /*max num of lastnodes*/ INT_4 maxLastNode, minLastNode; /*the highest and lowest last node value*/ INT_4 i, j, seqNum, finalSeqNum, topSeqNum, correctPeptideLength; INT_4 highSuperNode, lowSuperNode; /*used when a specific sequence tag is to be used*/ INT_4 halfAsManySubsequences, quarterAsManySubsequences; char test; gFinalSequencePtr = NULL; subsequencePtr = NULL; seqNum = 0; gSubseqNum = 0; finalSeqNum = gParam.finalSeqNum; topSeqNum = gParam.topSeqNum; halfAsManySubsequences = gParam.topSeqNum / 2; quarterAsManySubsequences = gParam.topSeqNum / 4; /* Determine the values of highSuperNode and lowSuperNode*/ highSuperNode = gGraphLength -1; lowSuperNode = 0; if(gParam.tagSequence[0] != '*') { for(i = gGraphLength - 1; i >= 0; i--) { if(sequenceNode[i] == -1) { highSuperNode = i; break; } } for(i = 0; i < gGraphLength; i++) { if(sequenceNode[i] == -1) { lowSuperNode = i; break; } } for(i = 0; i < gGraphLength; i++) { if(sequenceNode[i] == -1) { sequenceNode[i] = 127; } } 
} /* Calculate the maximum number of last nodes (this was a source of a memory overrun).*/ if(gParam.fragmentErr > gParam.peptideErr) { maxLastNodeNum = 4 * gParam.fragmentErr; } else { maxLastNodeNum = 4 * gParam.peptideErr; } lastNode = (INT_4 *) malloc(maxLastNodeNum * sizeof(INT_4 )); /*Will contain C-terminal evidence.*/ if(lastNode == NULL) { printf("SubsequenceMaker: Out of memory\n"); exit(1); } /* Determine values for aaPresentMass (nominal masses for the single letter codes found in aaPresent).*/ if(gParam.aaPresent[0] == '*') { aaPresentMass[0] = -1; } else { i = 0; while(gParam.aaPresent[i] != 0) { for(j = 0; j < gAminoAcidNumber; j++) { if(gParam.aaPresent[i] == gSingAA[j]) { if(gGapList[j] != 0) { aaPresentMass[i] = gGapList[j]; } break; } } i++; } aaPresentMass[i] = -1; } /* * Find the C-terminal nodes, so that I'll know when I'm finished. */ lastNodeNum = 0; i = gGraphLength - 1; test = TRUE; while(i > 0 && test) { if(sequenceNode[i] > 0) { test = FALSE; maxLastNode = i; while(i >= 0 && sequenceNode[i] > 0) { if(lastNodeNum >= maxLastNodeNum) /*Check before writing so that lastNode is never over-run.*/ { printf("lastNodeNum exceeds maxLastNodeNum.\n"); exit(1); } lastNode[lastNodeNum] = i; minLastNode = i; lastNodeNum++; i--; } } i--; } if(gParam.proteolysis == 'T') { gAA1Max = maxLastNode - gMonoMass_x100[R]; gAA1Min = minLastNode - gMonoMass_x100[R]; gAA1 = gMonoMass_x100[R]; gAA2Max = maxLastNode - gMonoMass_x100[K]; gAA2Min = minLastNode - gMonoMass_x100[K]; gAA2 = gMonoMass_x100[K]; } else if(gParam.proteolysis == 'K') { gAA1Max = maxLastNode - gMonoMass_x100[K]; gAA1Min = minLastNode - gMonoMass_x100[K]; gAA1 = gMonoMass_x100[K]; gAA2Max = 0; gAA2Min = 0; gAA2 = 0; } else if(gParam.proteolysis == 'E') { gAA1Max = maxLastNode - gMonoMass_x100[E]; gAA1Min = minLastNode - gMonoMass_x100[E]; gAA1 = gMonoMass_x100[E]; gAA2Max = maxLastNode - gMonoMass_x100[D]; gAA2Min = minLastNode - gMonoMass_x100[D]; gAA2 = gMonoMass_x100[D]; } /* * The function NterminalSubsequences starts at the N-terminal node (usually a value of one), *
and jumps by one amino acid, or by two amino acids, to generate the first set of subsequences. * This linked list of subsequences is returned as a pointer to the first element in the list. */ if(gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q') { subsequencePtr = NterminalSubsequences(sequenceNode, maxLastNode, lowSuperNode, highSuperNode); } if(gParam.fragmentPattern == 'L') { subsequencePtr = LCQNterminalSubsequences(sequenceNode, maxLastNode, lowSuperNode, highSuperNode); } if(gCheckItOut) /*If I want to see how well my subsequencing is going gCheckItOut is TRUE.*/ { correctPeptideLength = 1; amIHere(correctPeptideLength, subsequencePtr); } /* * The function AddExtensions primarily adds one amino acid at a time to the existing list * of subsequences. If a subsequence cannot be extended, then a two amino acid gap is jumped * to one of the one-edged nodes from the C-terminus. No penalty is exacted as a consequence * of jumping, since it's only allowed between two nodes that cannot be extended. Other * two amino acid jumps are penalized so that the extension score is reduced by factors * NODE_NODE_PENALTY, NODE_EDGE_PENALTY, and EDGE_EDGE_PENALTY as defined in LutefiskDefinitions.h. * Once there are no more nodes remaining, then the function returns a NULL value.
*/ while(subsequencePtr != NULL) { if((clock() - gParam.startTicks)/ CLOCKS_PER_SEC > 30) { gParam.topSeqNum = halfAsManySubsequences; /*it's taking too long, so speed it up*/ } if((clock() - gParam.startTicks)/ CLOCKS_PER_SEC > 60) { gParam.topSeqNum = quarterAsManySubsequences; /*this is really taking too long*/ } subsequencePtr = AddExtensions(subsequencePtr, sequenceNode, oneEdgeNodes, oneEdgeNodesIndex, aaPresentMass, topSeqNum, lastNode, lastNodeNum, &seqNum, maxLastNode, minLastNode, lowSuperNode, highSuperNode); if(gCheckItOut) { correctPeptideLength++; amIHere(correctPeptideLength, subsequencePtr); } } if(gCheckItOut) { correctPeptideLength = 14; amIHere(correctPeptideLength, gFinalSequencePtr); } if(gParam.fMonitor && gCorrectMass) { printf("Subsequencing is finished.\n"); printf("Max subsequences: %d \n", gSubseqNum); } free(lastNode); return(gFinalSequencePtr); }
lutefisk-1.0.7+dfsg.orig/src/LutefiskGlobalDeclarations.c
/********************************************************************************************* Lutefisk is software for de novo sequencing of peptides from tandem mass spectra. Copyright (C) 1995 Richard S. Johnson This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
Contact: Richard S Johnson 4650 Forest Ave SE Mercer Island, WA 98040 jsrichar@alum.mit.edu *********************************************************************************************/ #include "LutefiskDefinitions.h" /* Global variables that have been declared in LutefiskGlobals.h */ tmsms msms; /* Structure to hold data about the CID file. */ tionWeights gWeightedIonValues; tParam gParam; /* Structure for the Lutefisk parameters loaded from Lutefisk.param */ char gSingAA[AMINO_ACID_NUMBER]; REAL_4 gMonoMass[AMINO_ACID_NUMBER]; REAL_4 gAvMass[AMINO_ACID_NUMBER]; INT_4 gNomMass[AMINO_ACID_NUMBER]; REAL_4 gWrongXCorrScore[WRONG_SEQ_NUM + 1]; REAL_4 gWrongIntScore[WRONG_SEQ_NUM + 1]; REAL_4 gWrongProbScore[WRONG_SEQ_NUM + 1]; REAL_4 gWrongQualityScore[WRONG_SEQ_NUM + 1]; REAL_4 gWrongComboScore[WRONG_SEQ_NUM + 1]; INT_4 gWrongIndex = 0; INT_4 gTagLength; INT_4 gSingleAACleavageSites = 0; BOOLEAN gCorrectMass; BOOLEAN gFirstTimeThru; REAL_4 gElementMass[ELEMENT_NUMBER] = { 1.007825035, /* H */ 12.00, /* C */ 14.003074002, /* N */ 15.99491463, /* O */ 30.973762, /* P */ 31.972070698 /* S */ }; INT_4 gAminoAcidNumber = AMINO_ACID_NUMBER; INT_4 gMonoMass_x100[AMINO_ACID_NUMBER]; /*Values assigned in CreateGlobalIntegerMassArrays*/ INT_4 gElementMass_x100[ELEMENT_NUMBER]; /*Values assigned in CreateGlobalIntegerMassArrays*/ INT_4 gMultiplier; /*Value assigned in CreateGlobalIntegerMassArrays*/ INT_4 gNodeCorrection[AMINO_ACID_NUMBER * AMINO_ACID_NUMBER]; /*Value assigned in CreateGlobalIntegerMassArrays*/ INT_4 gElementCorrection[ELEMENT_NUMBER]; /*Value assigned in CreateGlobalIntegerMassArrays*/ INT_4 gAvMonoTransition, gWater, gAmmonia, gCO, gAvResidueMass, gGraphLength; INT_4 gGapList[MAX_GAPLIST]; INT_4 gGapListIndex; INT_4 gEdmanData[MAX_PEPTIDE_LENGTH][AMINO_ACID_NUMBER]; INT_4 gMaxCycleNum; REAL_4 H2O; REAL_4 NH3; REAL_4 gIonTypeWeightingTotal; BOOLEAN gDatabaseSeqCorrect; 
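/*
 * Illustrative sketch only; not part of the original Lutefisk source.
 * The element table above lists monoisotopic masses for H, C, N, O, P and S,
 * and LutefiskDefinitions.h builds its WATER and AMMONIA macros from that
 * table. The helpers below repeat the same arithmetic on a local copy of the
 * table so that they can be compiled on their own. ExampleScaledMass is an
 * assumption: it shows one plausible way of producing "_x100"-style integer
 * masses (scale, then round to nearest). The real logic lives in
 * CreateGlobalIntegerMassArrays, which is not shown here, and the real
 * multiplier may differ from 100. Every "Example" name is hypothetical.
 */
static const double kExampleElementMass[6] = {
    1.007825035,  /* hydrogen */
    12.00,        /* carbon */
    14.003074002, /* nitrogen */
    15.99491463,  /* oxygen */
    30.973762,    /* phosphorus */
    31.972070698  /* sulfur */
};

/* Water: two hydrogens plus one oxygen (about 18.0106). */
static double ExampleWaterMass(void)
{
    return (kExampleElementMass[0] * 2) + kExampleElementMass[3];
}

/* Ammonia: one nitrogen plus three hydrogens (about 17.0265). */
static double ExampleAmmoniaMass(void)
{
    return kExampleElementMass[2] + (kExampleElementMass[0] * 3);
}

/* Assumed scaling: multiply by the multiplier, then round to the nearest integer. */
static int ExampleScaledMass(double mass, int multiplier)
{
    return (int)(mass * multiplier + 0.5);
}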
lutefisk-1.0.7+dfsg.orig/src/LutefiskSummedNode.c
/********************************************************************************************* Lutefisk is software for de novo sequencing of peptides from tandem mass spectra. Copyright (C) 1995 Richard S. Johnson This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. Contact: Richard S Johnson 4650 Forest Ave SE Mercer Island, WA 98040 jsrichar@alum.mit.edu *********************************************************************************************/ /* Richard S. Johnson 6/96 LutefiskSummedNode is a file containing the function SummedNodeScore plus its associated functions. It was written to be used as part of a program called "LutefiskXP", which is used to aid in the interpretation of CID data of peptides. The general aim of this file (and of the function SummedNodeScore) is to connect the nodes starting from the C-terminal node(s), where the difference between connected nodes corresponds to the nominal mass of an amino acid residue. If a node that is connected to the C-terminus is found that cannot be extended any further towards the N-terminus, then it is saved as a "one-edged node". These special nodes are used later for making two amino acid jumps (for those situations where there is no fragmentation at particular peptide bonds).
*/
#include <stdio.h>
#include <stdlib.h>
#include "LutefiskPrototypes.h"
#include "LutefiskDefinitions.h"
/*Declare a global that is defined in LutefiskGlobalDeclarations.c.*/ extern INT_4 gNomMass[AMINO_ACID_NUMBER]; /********************************AddExtraNodes*********************************************** * * This function compares the arrays "sequenceNode" and "evidence" to see if there are * nodes that were not connected to the C-terminus. It ignores nodes that are adjacent * to nodes that had been connected. It adds the values in sequenceNodeN and sequenceNodeC * and puts them into the array sequenceNode. */ void AddExtraNodes(SCHAR *sequenceNode, SCHAR *sequenceNodeN, SCHAR *sequenceNodeC, char *evidence) { INT_4 i, j, k, newNodeValue; char test; i = gGraphLength - 1; /*Start out at the maximum node position.*/ while(i >= 0) /*Don't let the index value go negative (there are no negative b ions).*/ { if(evidence[i] != 0) /*If I hit a node that has any cleavage evidence, then do this.*/ { test = TRUE; /*Test is used to determine if consecutive evidence nodes have been represented already in the sequenceNode array. In other words, if there is a series of three consecutive nodes w/ non-zero evidence, it may be that the middle one actually can be connected to the C-terminus (as above), and therefore that node position or index value for the array sequenceNode is non-zero. In such situations, I don't bother adding the adjacent nodes to the array sequenceNode.*/ j = i; /*I need a new index to step through the consecutive non-zero nodes.*/ while(j >= 0 && evidence[j] != 0) /*Stop at index zero so that evidence[-1] is never read.*/ { if(sequenceNode[j] != 0) { test = FALSE; /*I've found a node that has already been included in sequenceNode.*/ } j--; /*Keep stepping down so that I can find the end of this series of nodes.*/ } if(test || gParam.fragmentErr <= 0.5 * gMultiplier) /*Alter the relevant values of sequenceNode.
If the fragment tolerance is less than 0.5, then adjacent nodes are not due to slop in mass assignments, and should therefore be taken seriously.*/ { for(k = j + 1; k <= i; k++) /*evidence[j] is zero, so just go up to it.*/ { newNodeValue = sequenceNodeC[k] + sequenceNodeN[k]; if(newNodeValue > 127 || newNodeValue < -127) { newNodeValue = 127; } if(newNodeValue > sequenceNode[k]) { sequenceNode[k] = newNodeValue; } } } i = j + 1; /*Must be j+1, cuz its incremented down before starting the while loop.*/ } i--; } return; } /********************************SortOneEdgeNodes******************************************* * * This function sorts the list of one edge nodes and removes the redundancies and sorts by * increasing mass. */ void SortOneEdgeNodes(INT_4 *oneEdgeNodes, INT_4 *oneEdgeNodesIndex) { INT_4 i; char test = TRUE; INT_4 *tempNode, tempIndex, smallestTemp, smallestTempValue; tempNode = (int *) malloc(gGraphLength * sizeof(INT_4)); /*Will contain C-terminal evidence.*/ if(tempNode == NULL) { printf("SortOneEdgeNodes: Out of memory"); exit(1); } if(*oneEdgeNodesIndex >= gGraphLength) { printf("SortOneEdgeNodes: *oneEdgeNodesIndex >= gGraphLength\n"); exit(1); } tempIndex = 0; while(test) { test = FALSE; i = 0; while(i < *oneEdgeNodesIndex) /*Find a positive number.*/ { if(oneEdgeNodes[i] > 0) { test = TRUE; smallestTemp = i; smallestTempValue = oneEdgeNodes[i]; break; } i++; } if(test) { for(i = 0; i < *oneEdgeNodesIndex; i++) /*Find the smallest value that is not zero.*/ { if(oneEdgeNodes[i] > 0) { if(oneEdgeNodes[i] < smallestTempValue) { smallestTemp = i; smallestTempValue = oneEdgeNodes[i]; } } } for(i = 0; i < *oneEdgeNodesIndex; i++) /*Find the ones that are identical.*/ { if(i != smallestTemp) { if(smallestTempValue == oneEdgeNodes[i]) { oneEdgeNodes[i] = 0; } } } tempNode[tempIndex] = smallestTempValue; /*Set the temporary value.*/ tempIndex++; /*Increment the number of values.*/ if(tempIndex >= gGraphLength) { printf("gGraphLength is too small."); 
exit(1); } oneEdgeNodes[smallestTemp] = 0; /*Set this to zero so that its not used again.*/ } } *oneEdgeNodesIndex = tempIndex; /*Transfer to the oneEdgeNodes array.*/ if(*oneEdgeNodesIndex >= gGraphLength) { printf("SortOneEdgeNodes: *oneEdgeNodesIndex >= gGraphLength\n"); exit(1); } for(i = 0; i < tempIndex; i++) { oneEdgeNodes[i] = tempNode[i]; } free(tempNode); return; } /********************************FindCurrentNode********************************************* * * This function takes the currentNode from which amino acid residue masses have already been * subtracted, and finds the next node lower in mass that can be connected to the C-terminus. * That new node serves as the next currentNode. */ INT_4 FindCurrentNode(SCHAR *sequenceNode, INT_4 currentNode) { INT_4 i; i = currentNode; while(i > 0) { i--; if(i < currentNode - 190 * gMultiplier) /* If i goes below the mass of currentNode - Trp, then terminate the search. I used 190 as the mass of Trp plus an inordinately large error.*/ { currentNode = 0; return(currentNode); } if(sequenceNode[i] > 0) { currentNode = i; return(currentNode); } } currentNode = i; return(currentNode); } /********************************AssignProNodeValue********************************************* * * This function figures out how to score the node. If this function has been called, * then its already known that this is one of the connectable nodes, but the score to * be placed in the array sequenceNode now needs to be determined. This is based on the * type of evidence for the nextNode and the currentNode. If the nextNode evidence is * that both N- and C-terminal ions were present at that point, then the values from * sequenceNodeC,sequenceNodeN, and sequenceNode[currentNOde] are summed. If the * evidence for the nextNode does not match w/ the currentNode, then the values for * sequenceNodeC and sequenceNodeN only are summed. 
*/ void AssignProNodeValue(INT_4 nextNode, INT_4 currentNode, char *evidence, SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 totalIonVal) { INT_4 nodeScore = 0; /* The 2 aa extension w/ proline is only used if it's a tryptic peptide and therefore only y ions would be expected if there are no ions delineating pro.*/ if((evidence[nextNode] == 'B' || evidence[nextNode] == 'C') && (evidence[currentNode] == 'B' || evidence[currentNode] == 'C')) { nodeScore = sequenceNodeC[nextNode] + sequenceNodeN[nextNode] + (totalIonVal * TOTALIONVAL_MULTIPLIER * 0.5); } if(sequenceNode[nextNode] < 0) /*If the nextNode value is negative, make it positive.*/ { sequenceNode[nextNode] = -1 * sequenceNode[nextNode]; } if(nodeScore > 127 || nodeScore < -127) { nodeScore = 127; } if(nodeScore > sequenceNode[nextNode]) /*If the new nodeScore is greater, then assign its value to sequenceNode[nextNode], otherwise leave the original value as a positive number.*/ { sequenceNode[nextNode] = nodeScore; } return; } /********************************AssignNodeValue********************************************* * * This function figures out how to score the node. If this function has been called, * then it's already known that this is one of the connectable nodes, but the score to * be placed in the array sequenceNode now needs to be determined. This is based on the * type of evidence for the nextNode and the currentNode. If either node has evidence * from both N- and C-terminal ions ('B'), or if both nodes have the same type of evidence, * then the values from sequenceNodeC and sequenceNodeN are summed and a bonus of * totalIonVal * TOTALIONVAL_MULTIPLIER is added. If the evidence for the nextNode does * not match that of the currentNode, then only the values from sequenceNodeC and * sequenceNodeN are summed. The result is then scaled slightly by the size of the mass jump.
*/ void AssignNodeValue(INT_4 nextNode, INT_4 currentNode, char *evidence, SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 totalIonVal) { INT_4 nodeScore = 0; INT_4 i = 0; REAL_4 scoreAdjuster; /*if(nextNode >= 10428 && nextNode <= 10434) { i++; i++; } for debugging*/ scoreAdjuster = currentNode - nextNode; /*Nominal amino acid mass of this connection.*/ scoreAdjuster = scoreAdjuster / gAvResidueMass; /*Ratio between this connection and the average connection.*/ scoreAdjuster = (scoreAdjuster + 99) / 100; /*Reduce scoreAdjuster effect.*/ if(evidence[nextNode] == 'B' || evidence[currentNode] == 'B') { nodeScore = sequenceNodeC[nextNode] + sequenceNodeN[nextNode] + (totalIonVal * TOTALIONVAL_MULTIPLIER); /*((nodeScore + totalIonVal) + (nodeScore * totalIonVal)) / 2;*/ /*(totalIonVal * TOTALIONVAL_MULTIPLIER);*/ /*+ sequenceNode[currentNode] * TOTALIONVAL_MULTIPLIER;*/ } else { if(evidence[nextNode] != evidence[currentNode]) { nodeScore = sequenceNodeC[nextNode] + sequenceNodeN[nextNode]; } else { nodeScore = sequenceNodeC[nextNode] + sequenceNodeN[nextNode] + (totalIonVal * TOTALIONVAL_MULTIPLIER); /*((nodeScore + totalIonVal) + (nodeScore * totalIonVal)) / 2;*/ /*(totalIonVal * TOTALIONVAL_MULTIPLIER);*/ /*+ sequenceNode[currentNode] * TOTALIONVAL_MULTIPLIER;*/ } } nodeScore = (nodeScore * scoreAdjuster) + 0.5; /*Adjust score for size of extension.*/ if(sequenceNode[nextNode] < 0) /*If the nextNode value is negative, make it positive.*/ { sequenceNode[nextNode] = -1 * sequenceNode[nextNode]; } if(nodeScore > 127 || nodeScore < -127) { nodeScore = 127; } if(nodeScore > sequenceNode[nextNode]) /*If the new nodeScore is greater, then assign its value to sequenceNode[nextNode], otherwise leave the original value as a positive number.*/ { sequenceNode[nextNode] = nodeScore; } return; } /***************************************InitSummedNodeArrays******************************** * * This function initializes some of the arrays used by the function
SummedNodeScore. */ void InitSummedNodeArrays(SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 *oneEdgeNodes, char *evidence) { INT_4 i; for(i = 0; i < gGraphLength; i++) /*Initialize these arrays.*/ { sequenceNode[i] = 0; evidence[i] = 0; } for(i = 0; i < gGraphLength; i++) { oneEdgeNodes[i] = 0; } for(i = 0; i < gGraphLength; i++) /*Set up the evidence array.*/ { if(sequenceNodeC[i] != 0 && sequenceNodeN[i] != 0) { evidence[i] = 'B'; } if(sequenceNodeC[i] != 0 && sequenceNodeN[i] == 0) { evidence[i] = 'C'; } if(sequenceNodeC[i] == 0 && sequenceNodeN[i] != 0) { evidence[i] = 'N'; } } for(i = 0; i < gGraphLength; i++) /*-1 implies a sequencetag*/ { if(sequenceNodeC[i] == -1 && sequenceNodeN[i] == -1) { evidence[i] = 'B'; } } return; } /******************************SummedNodeScore********************************************** * * This function is used to connect the nodes that were identified in the function * MakeSequenceGraph. The input data is stored in the SCHAR arrays sequenceNodeC * and sequenceNodeN. The array sequenceNodeC contains all of the evidence for C-terminal * ions, and the array sequenceNodeN contains the evidence for N-terminal ions. Both * arrays were formed by assuming that each observed ion could be one of several possibilities - * b, a, y, etc. - and then mathematically converting to the corresponding b ion value. * Here the program starts to connect the nodes, beginning at the C-terminus. It does * not remember exactly how it connects the nodes, rather the point is to assign high score * values to the array sequenceNode for those nodes where it's possible to connect with the * C-terminus. Later, the program will start making subsequences starting at the N-terminus, * and it will use the scores held in sequenceNode to guide the way towards the C-terminus. * In addition, this function keeps track of those nodes that can be connected to the * C-terminus, but do not connect to any other nodes N-terminal to them.
These so-called one- * edged nodes will be used for making two amino acid jumps between nodes. */ void SummedNodeScore(SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 *oneEdgeNodes, INT_4 *oneEdgeNodesIndex, INT_4 totalIonVal) { INT_4 i, k; INT_4 j, currentNode, nextNode, lowSuperNode, highSuperNode; INT_4 highArgNode, highLysNode, lowArgNode, lowLysNode, lowMassCutoff, y2; char *evidence; char test = TRUE; char anotherTest, doIt; evidence = (char *) malloc(gGraphLength * sizeof(char )); /*Will contain C-terminal evidence.*/ if(evidence == NULL) { printf("SummedNodeScore: Out of memory"); exit(1); } *oneEdgeNodesIndex = 0; /* Find the low mass cutoff to be used for LCQ data.*/ lowMassCutoff = (gParam.peptideMW + (gParam.chargeState * gElementMass_x100[HYDROGEN])) / gParam.chargeState; lowMassCutoff = lowMassCutoff * 0.333; /*Use the rule of 1/3 for the low mass end cutoff.*/ /* Find the lowSuperNode and highSuperNode*/ highSuperNode = gGraphLength -1; /*default values when no specific sequencetag is used*/ lowSuperNode = 0; if(gParam.tagSequence[0] != '*') { for(i = gGraphLength - 1; i >= 0; i--) { if(sequenceNodeN[i] == -1 && sequenceNodeC[i] == -1) { highSuperNode = i; break; } } for( i = 0; i < gGraphLength; i++) { if(sequenceNodeN[i] == -1 && sequenceNodeC[i] == -1) { lowSuperNode = i; break; } } } /* * Find the nodes corresponding to the C-terminal node minus R or K. This is used for * a section below that compensates for the absence of y2 ions in higher mass peptides * obtained from ion traps w/ a low mass cutoff of 1/3 of the precursor ion. */ i = gGraphLength - 1; while(i > 0 && sequenceNodeC[i] == 0) { i--; } highArgNode = i - gGapList[R]; highLysNode = i - gGapList[K]; while(i > 0 && sequenceNodeC[i] != 0) { i--; } i = i + 1; /*Go back up to where evidence is not zero.*/ lowArgNode = i - gGapList[R]; lowLysNode = i - gGapList[K]; /* Initialize sequenceNode, oneEdgeNodes, and evidence to zero. 
Then assign values of * either B, N, or C to those nodes that have any evidence derived from both the N- and * C-termini, the N-terminus only, or the C-terminus only. */ InitSummedNodeArrays(sequenceNode, sequenceNodeC, sequenceNodeN, oneEdgeNodes, evidence); /* Make sure that the index is greater than zero. The first time evidence is not zero, * the value of test becomes FALSE, which means that thereafter the while loop will continue * only as long as evidence does not equal zero, which is only possible for those nodes * that are due to multiple C-terminal nodes resulting from a large error in the peptide * mass measurement. */ i = gGraphLength - 1; while(i > 0 && (test || evidence[i] != 0)) { i--; if(evidence[i] != 0) { test = FALSE; currentNode = i; /* Assign the value to the C-terminal node of sequenceNode.*/ if(sequenceNodeC[currentNode] + sequenceNodeN[currentNode] < 127) { sequenceNode[currentNode] = sequenceNodeC[currentNode] + sequenceNodeN[currentNode]; } else { sequenceNode[currentNode] = 127; } /* Now that I've found a C-terminal Node, let's start connecting the dots.*/ /* * A connection is made * when the difference between the currentNode and a lower value node equals the nominal mass * of an amino acid residue. That lower value node must have evidence of being a real cleavage * in that the value of evidence[node] is not zero (ie, is B, C, or N). Once a connection * is made, then a value for that node is assigned to the array sequenceNode[node]. If a * previous connection has made an assignment to the array sequenceNode at the node under * consideration, then the node with the greatest absolute value is kept (and is made positive, * since it is now part of the current set of node connections). Once all * of the connections have been made and sequenceNode assignments have been completed, then * all values are given a negative value in order to distinguish old node connections from * any new ones using different C-termini.
For a given currentNode, the program checks the * twenty possible nextNode values. After that it finds the next currentNode, which will be * the next node down that has a positive value for sequenceNode[node]. */ while(currentNode != 0) { anotherTest = TRUE; for(j = 0; j < gAminoAcidNumber; j++) { if(gGapList[j] != 0) { nextNode = currentNode - gGapList[j]; doIt = TRUE; if(currentNode > highSuperNode && nextNode < lowSuperNode) { doIt = FALSE; /*you skipped a superNode*/ } if(nextNode >= 0 && evidence[nextNode] != 0 && doIt) { anotherTest = FALSE; AssignNodeValue(nextNode, currentNode, evidence, sequenceNode, sequenceNodeC, sequenceNodeN, totalIonVal); } } } if(gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q' || gParam.fragmentPattern == 'L') { for(j = 0; j < gAminoAcidNumber; j++) /*check for 2aa's w/ proline*/ { if(gGapList[j] != 0) { nextNode = currentNode - gGapList[j] - gGapList[P]; doIt = TRUE; if(currentNode > highSuperNode && nextNode < lowSuperNode) { doIt = FALSE; /*you skipped a superNode*/ } if(nextNode >= 0 && evidence[nextNode] != 0 && doIt) { anotherTest = FALSE; AssignProNodeValue(nextNode, currentNode, evidence, sequenceNode, sequenceNodeC, sequenceNodeN, totalIonVal); } } } } /*For LCQ data the y2 ions may be missing for ions greater than 1200 (1200 is chosen because GK y2 would be below the ion trap low mass cutoff), so 2 amino acid extensions are allowed for tryptic peptides below R or K.*/ if(gParam.fragmentPattern == 'L' && gParam.peptideMW > 1200 * gMultiplier && gParam.proteolysis == 'T' && gParam.chargeState <= 2) { if((currentNode <= highArgNode && currentNode >= lowArgNode) || (currentNode <= highLysNode && currentNode >= lowLysNode)) { for(j = 0; j < gAminoAcidNumber; j++) /*check for 2aa's*/ { if(gGapList[j] != 0) { for(k = j; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { nextNode = currentNode - gGapList[j] - gGapList[k]; doIt = TRUE; if(currentNode > highSuperNode && nextNode < lowSuperNode) { doIt = FALSE; /*you 
skipped a superNode*/ } y2 = 147 * gMultiplier; /*y1 ion for c-term Lys*/ if(gGapList[j] < gGapList[k]) { y2 += gGapList[j]; /*add the lowest mass aa*/ } else { y2 += gGapList[k]; } if(y2 > lowMassCutoff) { doIt = FALSE; /*this y2 should be within range*/ } if(nextNode >= 0 && evidence[nextNode] != 0 && doIt) { anotherTest = FALSE; AssignNodeValue(nextNode, currentNode, evidence, sequenceNode, sequenceNodeC, sequenceNodeN, totalIonVal); } } } } } } } if(anotherTest) { if(*oneEdgeNodesIndex < gGraphLength - 1) { oneEdgeNodes[*oneEdgeNodesIndex] = currentNode; *oneEdgeNodesIndex += 1; if(*oneEdgeNodesIndex >= gGraphLength) { printf("SummedNodeScore: *oneEdgeNodesIndex >= gGraphLength\n"); exit(1); } } else { printf("gGraphLength is too small."); exit(1); } } /* * If I reach the N-terminus, or no further connections can be made, then FindCurrentNode * returns a value of zero, which terminates the while loop. */ currentNode = FindCurrentNode(sequenceNode, currentNode); if(currentNode > gGraphLength || currentNode < 0) { printf("SummedNodeScore: currentNode > gGraphLength || currentNode < 0\n"); exit(1); } } for(j = 0; j < gGraphLength; j++) /*Make all of the positive values negative.*/ { if(sequenceNode[j] > 0) { sequenceNode[j] = -1 * (INT_2)sequenceNode[j]; } } } } for(i = 0; i < gGraphLength; i++) /*Make everything positive.*/ { if(sequenceNode[i] < 0) { sequenceNode[i] = -1 * sequenceNode[i]; } } SortOneEdgeNodes(oneEdgeNodes, oneEdgeNodesIndex); AddExtraNodes(sequenceNode, sequenceNodeN, sequenceNodeC, evidence); /* Add -1 to the superNode positions of sequenceNode*/ for(i = 0; i < gGraphLength; i++) { if(sequenceNodeN[i] == -1 && sequenceNodeC[i] == -1) { sequenceNode[i] = -1; } } if(gParam.fMonitor && gCorrectMass) { printf("Graph is finished. 
\n"); } free(evidence); return; }
lutefisk-1.0.7+dfsg.orig/src/getopt.h
#ifndef __GETOPT__
#define __GETOPT__
#pragma once
#include "LutefiskDefinitions.h"
/* Public function prototypes */
INT_4 getopt(INT_4 argc, CHAR **argv, CHAR *opts);
#endif /* __GETOPT__ */
lutefisk-1.0.7+dfsg.orig/src/LutefiskDefinitions.h
/********************************************************************************************* Lutefisk is software for de novo sequencing of peptides from tandem mass spectra. Copyright (C) 1995 Richard S. Johnson This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
Contact: Richard S Johnson 4650 Forest Ave SE Mercer Island, WA 98040 jsrichar@alum.mit.edu *********************************************************************************************/ #ifndef _LUTEFISK_DEFS_ #define _LUTEFISK_DEFS_ #include #define DEBUG /* Uncomment this line for debugging */ #ifndef TRUE #define TRUE 1 #endif #ifndef FALSE #define FALSE 0 #endif #ifndef NULL #define NULL 0 #endif #ifdef __LITTLE_ENDIAN #undef __LITTLE_ENDIAN #endif #ifdef __BIG_ENDIAN #undef __BIG_ENDIAN #endif #if (defined __MWERKS__ && __dest_os == __mac_os) || defined __OS_X #define __BIG_ENDIAN #define CHAR char #define SCHAR signed char #define INT_2 signed short int #define UINT_2 unsigned short int #define INT_4 signed int #define UINT_4 unsigned int #define REAL_4 float #define REAL_8 double #define BOOLEAN unsigned char #endif #if defined __MWERKS__ && __dest_os == __win32_os #define __LITTLE_ENDIAN #define CHAR char #define SCHAR signed char #define INT_2 signed short int #define UINT_2 unsigned short int #define INT_4 signed int #define UINT_4 unsigned int #define REAL_4 float #define REAL_8 double #define BOOLEAN unsigned char #endif #if defined __SOLARIS #define __BIG_ENDIAN #define CHAR char #define SCHAR signed char #define INT_2 signed short int #define UINT_2 unsigned short int #define INT_4 signed int #define UINT_4 unsigned int #define REAL_4 float #define REAL_8 double #define BOOLEAN unsigned char #endif #if defined __ALPHA #define __LITTLE_ENDIAN #define CHAR char #define SCHAR signed char #define INT_2 signed short int #define UINT_2 unsigned short int #define INT_4 signed int #define UINT_4 unsigned int #define REAL_4 float #define REAL_8 double #define BOOLEAN unsigned char #endif #if defined __IRIX #define __BIG_ENDIAN #define CHAR char #define SCHAR signed char #define INT_2 signed short int #define UINT_2 unsigned short int #define INT_4 signed int #define UINT_4 unsigned int #define REAL_4 float #define REAL_8 double #define BOOLEAN unsigned char 
#endif

#if defined __AIX
#define __BIG_ENDIAN
#define CHAR char
#define SCHAR signed char
#define INT_2 signed short int
#define UINT_2 unsigned short int
#define INT_4 signed int
#define UINT_4 unsigned int
#define REAL_4 float
#define REAL_8 double
#define BOOLEAN unsigned char
#endif

#if defined __LINUX
#define __LITTLE_ENDIAN
#define CHAR char
#define SCHAR signed char
#define INT_2 signed short int
#define UINT_2 unsigned short int
#define INT_4 signed int
#define UINT_4 unsigned int
#define REAL_4 float
#define REAL_8 double
#define BOOLEAN unsigned char
#endif

/* Define a few constants. */
#define AMINO_ACID_NUMBER 25 /*Number of amino acids.*/

/* AMINO ACIDS */
#define A 0
#define R 1
#define N 2
#define D 3
#define C 4
#define E 5
#define Q 6
#define G 7
#define H 8
#define I 9
#define L 10
#define K 11
#define M 12
#define F 13
#define P 14
#define S 15
#define T 16
#define W 17
#define Y 18
#define V 19

#define ELEMENT_NUMBER 6 /*Number of elements - H, C, N, O, P, and S.*/
#define HYDROGEN 0
#define CARBON 1
#define NITROGEN 2
#define OXYGEN 3
#define PHOSPHORUS 4
#define SULFUR 5

#define WATER ((gElementMass[HYDROGEN] * 2) + gElementMass[OXYGEN]) /*Mass of water.*/
#define AMMONIA (gElementMass[NITROGEN] + (gElementMass[HYDROGEN] * 3)) /*Mass of ammonia.*/
#define CO (gElementMass[CARBON] + gElementMass[OXYGEN]) /*Mass of carbon monoxide.*/

#define MAX_ION_NUM 500 /*500 Maximum number of fragment ions used in the final scoring.*/
#define MAX_DATA_POINTS_PER_GROUP 2500 /*The max number of data points per group of ions that exceed an ion threshold.
Each group is passed on to a function that sorts the ions.*/
#define SPECTRAL_WINDOW_WIDTH 120 /*Width of spectrum that has a limit on the number of ions.*/
#define MULTIPLIER_SWITCH 2.5 /*Used in determining gMultiplier*/
#define GRAPH_LENGTH 400000 /*The maximum graph size (ie, peptides must be less than 4000 Da).*/
#define AV_TO_MONO 0.999371395 /*The weighted average ratio between average and monoisotopic amino acid masses.*/
#define MONO_TO_NOMINAL 0.99949725 /*The weighted average ratio between monoisotopic and nominal masses.*/
#define MONO_TO_AV 1.000629 /*To convert from monoisotopic to average mass.*/
#define NOMINAL_TO_MONO 1.000503003 /*To convert from nominal to monoisotopic mass.*/
#define NOMINAL_TO_AV 1.001132319 /*To convert from nominal to average mass.*/
#define C_NODE_VALUE 10 /*The value placed at the C-terminal nodes.*/
#define N_NODE_VALUE 10 /*The value placed at the N-terminal nodes.*/
#define ONE_EDGE_NODE_MAX 4000 /*Max number of one edged nodes.*/
#define MIN_MASS_PER_CHARGE 300 /*For charge states > 1 there is a minimum amount of mass required to hold that charge.*/
#define MAX_PEPTIDE_LENGTH 60 /*35 Maximum peptide length.*/
#define MAX_GAPLIST (AMINO_ACID_NUMBER * AMINO_ACID_NUMBER)
#define AV_RESIDUE_MASS 119 /*This is the weighted average amino acid residue mass.*/
#define TAG_MULTIPLIER 10
#define AV_MONO_TRANSITION 400 /*The factor that converts average to mono is transitioned over this mass range below the gParam.monoToAv value.*/
#define OVER_USED_IONS 0.9 /*For ions that are used for both y and b ions, the ionFound value is attenuated by this much.*/

/*The following affects the node values and/or subsequencing.*/
#define EDGE_EDGE_PENALTY 0.9 /*From a N-terminal one-edge node to a C-terminal one-edge node.*/
#define PROLINE_PENALTY 0.75 /*For gaps that might contain proline.*/
#define PRECURSOR_PENALTY 0.65 /*For gaps that encompass the precursor ion.*/
#define GLYCINE_PENALTY 0.5 /*For gaps that might contain glycine.*/
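/*
A minimal sketch of how AV_TO_MONO and AV_MONO_TRANSITION are applied together in the
graph-building code, assuming masses in plain Da. AvToMonoScaled and its monoToAv argument
(standing in for gParam.monoToAv) are hypothetical names, and AV_TO_MONO is repeated
locally as SKETCH_AV_TO_MONO so that the sketch compiles on its own.

```c
// Hypothetical helper, not part of Lutefisk.
#define SKETCH_AV_TO_MONO 0.999371395

// Masses at or above monoToAv are treated as average masses and fully converted to
// monoisotopic. Over the transition range just below monoToAv the conversion is
// blended in linearly, and masses below that range are returned unchanged.
double AvToMonoScaled(double mass, double monoToAv, double transition)
{
    double blend;

    if (mass <= monoToAv - transition) {
        return mass;
    }
    if (mass >= monoToAv) {
        blend = 0.0;
    } else {
        blend = (monoToAv - mass) / transition;
    }
    return mass * ((1.0 - SKETCH_AV_TO_MONO) * blend + SKETCH_AV_TO_MONO);
}
```

With monoToAv = 2000 and a transition of 400, a mass of 1800 is scaled by a factor halfway
between 1 and AV_TO_MONO.
*/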
#define NODE_EDGE_PENALTY 0.4 /*From node to a C-terminal one-edge node.*/
#define NODE_NODE_PENALTY 0.2 /*These are multiplied against an extension score for an extension that uses two amino acids.*/
#define TOTALIONVAL_MULTIPLIER 1 /*Affects the node values that connect to C-terminus.*/
#define HIGH_CHARGE_MULT 0.5 /*Affects the node value if fragment has high charge (0-1).*/
#define HIGH_MASS_B_MULT 0.5 /*Affects the node value if b ions are more than precursor m/z (0-1).*/
#define HIGH_MASS_A_MULT 0.1 /*Affects the node value if a ions are more than 350 Da (0-1).*/

/* These weighting values are used to weight the importance of various scoring parameters
when calculating the final score. Used in LutefiskScore.*/
#define ATTENUATION_WEIGHT 0 /*1Presence of b and y ions.*/
#define INTENSITY_WEIGHT 1 /*2Ion current accounted for.*/
#define PEAKS_WEIGHT 0 /*0Peaks per residue divided by peaks per average residue.*/
#define NUMBER_WEIGHT 0 /*1Number of ions accounted for.*/
#define ATT_INT_PEAKS_NUM (ATTENUATION_WEIGHT + INTENSITY_WEIGHT + PEAKS_WEIGHT + NUMBER_WEIGHT)
#define INT_NUM_WEIGHT (NUMBER_WEIGHT + INTENSITY_WEIGHT)
#define INT_ATT_WEIGHT (INTENSITY_WEIGHT + ATTENUATION_WEIGHT)
#define INT_PEAKS_WEIGHT (INTENSITY_WEIGHT + PEAKS_WEIGHT)

#define MAX_X_CORR_NUM 5000 /*The maximum number of sequences to cross-corr score.*/
#define NEUTRAL_LOSS_MULTIPLIER 1 /*0.85 Fraction of ion signal for neutral loss ions (y-17, b-17, etc) that is counted in intensity score. Must be 0-1.*/
#define INTERNAL_FRAG_MULTIPLIER 1 /*0.5 Fraction of ion signal for internal fragment ions that is counted in intensity score. Must be 0-1.*/
#define HIGH_MASS_B_ION_MULTIPLIER 1 /*0.3 Fraction of ion signal for certain high mass b ions that is counted in intensity score. Must be 0-1.*/
#define HIGH_MASS_A_ION_MULTIPLIER 1 /*0.1Fraction of ion signal for certain high mass a ions that is counted in intensity score.
Must be 0-1.*/
#define HIGH_CHARGE_Y_ION_MULTIPLIER 1 /*0.65 Fraction of ion signal for y ions that have the same charge as the precursor. Must be 0-1.*/
#define TWO_AA_EXTENSION_MULTIPLIER 1 /*1Fraction of ion signal for two amino acid extensions. Must be 0-1.*/
#define OXMET_MULTIPLIER 1 /*1Fraction of ion signal for losses of 46 u when mass accuracy is sufficient to differentiate oxMet.*/
#define PHE_MULTIPLIER 1 /*0.5Fraction of ion signal for losses of 46 u when mass accuracy is not sufficient to differentiate oxMet.*/

#define SIZEOF_SPECTRA_SMALL 2048 /* Arrays for FFT analysis must be a power of 2 in size */
#define SIZEOF_SPECTRA_BIG 4096

#define TAG_CUTOFF 50 /*Percentage of total ion current in tag region that must be accounted for before the sequence is considered a tag.*/
#define GOLDEN_BOY_CUTOFF 0.55 /*Fraction of average goldenBoy intensity as cutoff.*/
#define GOLDEN_BOY_MAX 600 /*Golden boys cannot be greater than this m/z*/
#define IONFOUND_ISOTOPE 0.5 /*Value in ionFound for positions that could be isotopes of possible b or y ions.*/
#define SIGNAL_NOISE 4 /*This is the signal to noise required for an ion to be used.*/
#define MAX_QTOF_SEQUENCES 150 /*Max number of sequences for scoring using qtofErr*/
#define MAX_DATABASE_SEQ_NUM 50 /*Max number of sequences derived from database match*/
#define WRONG_SEQ_NUM 100 /*Number of wrong masses (and sequences) for comparing to correct*/

/* MACROS */
#define closeEnough(x,tol) (((x)<0)? (-(x)<=tol):(x)<=tol)
#define MolWeightOf_Pos(mz,z) ((mz - monoisotopicElementMass[HYDROGEN]) * (REAL_4)(z))
#define MolWeightOf_Neg(mz,z) ((mz + monoisotopicElementMass[HYDROGEN]) * (REAL_4)(z))
#define mzOf_Pos(MW,z) (((MW)/(REAL_4)(z)) + monoisotopicElementMass[HYDROGEN])
#define mzOf_Neg(MW,z) (((MW)/(REAL_4)(z)) - monoisotopicElementMass[HYDROGEN])
#define chargeOf_Pos(MW,mz) ((INT_4)(MW)/(mz - monoisotopicElementMass[HYDROGEN]) + 0.1)
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

/* Define a few structs.
*/

extern struct MSData /*Used to hold ion information.*/
{
    REAL_4 mOverZ;
    INT_4 intensity;
    INT_4 normIntensity;
    struct MSData *next;
} MSData;

typedef struct {
    REAL_4 mOverZ;
    INT_4 intensity;
}tRawMSData;

typedef struct {
    INT_4 numObjects;
    INT_4 limit;
    INT_4 sizeofobject;
    INT_4 growNum;
    tRawMSData *mass;
}tRawMSDataList;

typedef struct {
    INT_4 index;
    REAL_4 mOverZ;
    INT_4 intensity;
    INT_4 normIntensity;
}tMSData;

typedef struct {
    INT_4 numObjects;
    INT_4 limit;
    INT_4 sizeofobject;
    INT_4 growNum;
    tMSData *mass;
}tMSDataList;

typedef struct /*Structure to hold data about the CID file.*/
{
    REAL_4 scanMassLow;
    REAL_4 scanMassHigh;
    char instrument[10];
    char centroidOrProfile;
}tmsms;

extern tmsms msms;

typedef struct {
    INT_4 a;
    INT_4 a_minus17or18;
    INT_4 b;
    INT_4 b_minus17or18;
    INT_4 b_minusOH;
    INT_4 b_minusOH_minus17;
    INT_4 c;
    INT_4 d;
    INT_4 v;
    INT_4 w;
    INT_4 x;
    INT_4 y;
    INT_4 y_minus2;
    INT_4 y_minus17or18;
    INT_4 z_plus1;
}tionWeights;

extern tionWeights gWeightedIonValues;

typedef struct /*Used to hold Lutefisk's parameters.*/
{
    char fMonitor;
    char fVerbose;
    clock_t startTicks;
    clock_t searchTime;
    char paramFile[256];
    char outputFile[256];
    char cidFilename[256];
    char detailsFilename[256];
    char residuesFilename[256];
    REAL_4 peptideMW;
    INT_4 chargeState;
    BOOLEAN maxent3;
    REAL_4 fragmentErr;
    REAL_4 qtofErr;
    REAL_4 ionOffset;
    char fragmentPattern;
    REAL_4 cysMW;
    char proteolysis;
    char centroidOrProfile;
    INT_4 monoToAv;
    REAL_4 ionsPerWindow;
    char aaPresent[AMINO_ACID_NUMBER];
    char aaAbsent[AMINO_ACID_NUMBER];
    REAL_4 modifiedNTerm;
    REAL_4 modifiedCTerm;
    REAL_4 tagNMass;
    INT_4 shoeSize;
    char tagSequence[MAX_PEPTIDE_LENGTH];
    REAL_4 tagCMass;
    REAL_4 outputThreshold;
    INT_4 finalSeqNum;
    INT_4 topSeqNum;
    REAL_4 extThresh;
    INT_4 maxExtNum;
    INT_4 maxGapNum;
    INT_4 outputSeqNum;
    REAL_4 peakWidth;
    REAL_4 ionThreshold; /* Fraction */
    INT_4 intThreshold; /* Actual intensity threshold */
    REAL_4 peptideErr;
    BOOLEAN autoTag;
    BOOLEAN edmanPresent;
    char edmanFilename[256];
    REAL_4 ionsPerResidue;
    char CIDfileType;
    char databaseSequences[256];
    BOOLEAN quality;
    INT_4 wrongSeqNum;
    INT_4 topSeqNum_orig;
    REAL_4 peptideMW_orig;
    REAL_4 peptideErr_orig;
    REAL_4 fragmentErr_orig;
    REAL_4 peakWidth_orig;
    INT_4 monoToAv_orig;
    REAL_4 qtofErr_orig;
    REAL_4 ionOffset_orig;
    REAL_4 cysMW_orig;
    REAL_4 tagNMass_orig;
    REAL_4 tagCMass_orig;
    REAL_4 modifiedNTerm_orig;
    REAL_4 modifiedCTerm_orig;
    INT_4 maxGapNum_orig;
    char outputFile_orig[256];
}tParam;

extern tParam gParam;

extern struct Sequence /*Used to hold sequence info during subsequencing.*/
{
    INT_4 peptide[MAX_PEPTIDE_LENGTH];
    INT_4 peptideLength;
    INT_4 score;
    INT_4 nodeValue;
    INT_2 nodeCorrection;
    INT_4 gapNum;
    struct Sequence *next;
} Sequence;

typedef struct {
    INT_4 peptide[MAX_PEPTIDE_LENGTH];
    INT_4 peptideLength;
    INT_4 score;
    INT_4 nodeValue;
    INT_4 gapNum;
}tsequence;

typedef struct {
    INT_4 numObjects;
    INT_4 limit;
    INT_4 sizeofobject;
    INT_4 growNum;
    tsequence *seq;
}tsequenceList;

extern struct SequenceScore /*Used to hold sequence info during final scoring procedure.*/
{
    INT_4 peptide[MAX_PEPTIDE_LENGTH];
    REAL_4 intensityScore;
    REAL_4 intensityOnlyScore;
    REAL_4 crossDressingScore;
    REAL_4 stDevErr;
    REAL_4 calFactor;
    REAL_4 quality; /*new 6/20/01 this is contiguous single aa divided by actual peptide length*/
    REAL_4 length; /*new 6/20/01 this is the unfudged unadulterated peptide length, counting dipeptide residues as two amino acids*/
    REAL_8 probScore; /*7/30/03 added Pevzner probability score*/
    REAL_4 comboScore; /*final combined score*/
    INT_4 cleavageSites;
    INT_2 peptideSequence[MAX_PEPTIDE_LENGTH];
    char databaseSeq;
    INT_4 rank;
    struct SequenceScore *next;
} SequenceScore;

typedef struct extension {
    char gapSize;
    BOOLEAN singleAAFLAG;
    INT_4 mass;
    INT_4 score;
    INT_2 nodeCorrection;
} extension;

extern char gSingAA[AMINO_ACID_NUMBER];
extern REAL_4 gMonoMass[AMINO_ACID_NUMBER];
extern REAL_4 gAvMass[AMINO_ACID_NUMBER];
extern INT_4 gNomMass[AMINO_ACID_NUMBER];
extern REAL_4 gElementMass[ELEMENT_NUMBER];
extern INT_4 gGapList[MAX_GAPLIST];
extern INT_4 gAminoAcidNumber;
extern INT_4 gGapListIndex;
extern INT_4 gEdmanData[MAX_PEPTIDE_LENGTH][AMINO_ACID_NUMBER], gMaxCycleNum;
extern REAL_4 H2O, NH3;
extern INT_4 gElementMass_x100[ELEMENT_NUMBER]; /*Values assigned in CreateGlobalIntegerMassArrays*/
extern INT_4 gMonoMass_x100[AMINO_ACID_NUMBER]; /*Values assigned in CreateGlobalIntegerMassArrays*/
extern INT_4 gMultiplier; /*Value determined in CreateGlobalIntegerMassArrays*/
extern INT_4 gNodeCorrection[MAX_GAPLIST]; /*Value determined in CreateGlobalIntegerMassArrays*/
extern INT_4 gElementCorrection[ELEMENT_NUMBER]; /*Value determined in CreateGlobalIntegerMassArrays*/
extern INT_4 gAvMonoTransition, gWater, gAmmonia, gCO, gAvResidueMass, gGraphLength;
extern REAL_4 gWrongXCorrScore[WRONG_SEQ_NUM + 1];
extern REAL_4 gWrongIntScore[WRONG_SEQ_NUM + 1];
extern REAL_4 gWrongProbScore[WRONG_SEQ_NUM + 1];
extern REAL_4 gWrongQualityScore[WRONG_SEQ_NUM + 1];
extern REAL_4 gWrongComboScore[WRONG_SEQ_NUM + 1];
extern INT_4 gSingleAACleavageSites;
extern INT_4 gWrongIndex;
extern INT_4 gTagLength;
extern BOOLEAN gCorrectMass;
extern BOOLEAN gFirstTimeThru;
extern REAL_4 gIonTypeWeightingTotal;
extern BOOLEAN gDatabaseSeqCorrect;

#endif /* _LUTEFISK_DEFS_ */

lutefisk-1.0.7+dfsg.orig/src/LutefiskMakeGraph.c

/*********************************************************************************************
Lutefisk is software for de novo sequencing of peptides from tandem mass spectra.
Copyright (C) 1995 Richard S. Johnson

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

Contact: Richard S Johnson
4650 Forest Ave SE
Mercer Island, WA 98040

jsrichar@alum.mit.edu
*********************************************************************************************/

/* Richard S. Johnson 6/96

LutefiskMakeGraph is a file containing the function MakeSequenceGraph plus its associated
functions. It was written to be used as part of a program called "LutefiskXP", which is used
to aid in the interpretation of CID data of peptides. The general aim of this file (and the
function MakeSequenceGraph) is to convert the CID data into a graph of integer values
corresponding to the nominal masses of singly charged b ions. There are three ways that are
being developed for making this conversion (otherwise known as three templates), which
depend on the type of data that is under investigation. Currently only one of these
templates has been completed. Who can guess when the others might be done?
*/

#include <stdio.h>  /* printf */
#include <stdlib.h> /* exit */
#include "LutefiskPrototypes.h"
#include "LutefiskDefinitions.h"

/********************************RemoveSillyNodes**********************************************
*
* RemoveSillyNodes removes all nodes below 239 that cannot be made of any combination of
* amino acids. After writing this function I came to doubt that it would really do much
* good, since the silly nodes are near the N-terminus and won't connect. Hence, it probably
* doesn't matter if there are these silly nodes or not.
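*
* A minimal sketch of the reachability test described above, assuming nominal (integer)
* residue masses in Da and no N-terminal modification. The names kResidueNominal and
* MadeFromOneOrTwoResidues are hypothetical; the function below works on gGapList and
* gMultiplier-scaled masses instead.
*
```c
#include <stddef.h>

// Hypothetical table, not part of Lutefisk: nominal residue masses of the 20 common
// amino acids (Leu/Ile share 113 and Gln/Lys share 128, so 18 distinct values).
static const int kResidueNominal[] = {
    57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
    128, 129, 131, 137, 147, 156, 163, 186
};

// Returns 1 if mass equals one residue mass or the sum of two, otherwise 0.
int MadeFromOneOrTwoResidues(int mass)
{
    const size_t n = sizeof(kResidueNominal) / sizeof(kResidueNominal[0]);
    size_t j, k;

    for (j = 0; j < n; j++) {
        if (kResidueNominal[j] == mass) {
            return 1;
        }
        for (k = j; k < n; k++) {
            if (kResidueNominal[j] + kResidueNominal[k] == mass) {
                return 1;
            }
        }
    }
    return 0;
}
```
*
* With this table, 142 (2xAla) is reachable while 239 is not, so a node at 239 needs at
* least three residues (for example Ala + Ala + Pro).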
*/ void RemoveSillyNodes(SCHAR *sequenceNodeC, SCHAR *sequenceNodeN) { INT_4 i, j, k, firstNode, testMass; char test; /* Find the firstNode, based on the N-terminal modification.*/ firstNode = gParam.modifiedNTerm + 0.5; /* * Zero out any non-zero nodes that are less than firstNode + Gly. */ for(i = 0; i < firstNode + gMonoMass_x100[G]; i++) { if(i != firstNode) { sequenceNodeC[i] = 0; sequenceNodeN[i] = 0; } } /* * Nodes between 57 and less than 142 (2xAla) (plus firstNode) can only be due to * single amino acids. 142 is the smallest number that can be made up from two amino * acids that cannot also be made up from one amino acid. These get zero'ed out. */ for(i = firstNode + gMonoMass_x100[G]; i < firstNode + gMonoMass_x100[A] * 2; i++) { if(sequenceNodeC[i] != 0 || sequenceNodeN[i] != 0) { test = TRUE; for(j = 0; j < gAminoAcidNumber; j++) { if(gGapList[j] != 0) { testMass = firstNode + gGapList[j]; if(testMass == i) { test = FALSE; } } } if(test) { sequenceNodeC[i] = 0; sequenceNodeN[i] = 0; } } } /* * Step through each node from firstNode + 142 to firstNode + 239. 142 is the smallest number * that can be made from two amino acids (2xAla) and 239 (I think) is the smallest number that * cannot be made up from two amino acids, but can be made up from three amino acids. 
*/ for(i = firstNode + 142 * gMultiplier; i < firstNode + 239 * gMultiplier; i++) { if(sequenceNodeC[i] != 0 || sequenceNodeN[i] != 0) { test = TRUE; for(j = 0; j < gAminoAcidNumber; j++) { if(gGapList[j] != 0) { for(k = j; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { testMass = firstNode + gGapList[j] + gGapList[k]; if(testMass == i) { test = FALSE; } } } } } for(j = 0; j < gAminoAcidNumber; j++) { if(gGapList[j] != 0) { testMass = firstNode + gGapList[j]; if(testMass == i) { test = FALSE; } } } if(test) { sequenceNodeC[i] = 0; sequenceNodeN[i] = 0; } } } return; } /********************************FindTrypticLCQY17Ions******************************************** * * This function assumes that the CID ions are all of type y-17 or y-18. The nominal mass * values are determined and the corresponding positions in the array sequenceNodeC are * assigned the additional value of gWeightedIonValues.y_minus17or18. Only y-17 or y-18 ions that have a * corresponding y ion are counted. */ void FindTrypticLCQY17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeC) { struct MSData *currPtr; INT_4 i, j, testForChar; INT_4 y17MassMin, y17MassMax; REAL_4 y17Mass, peptideMass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) /*Figure out the most likely charge state of a fragment ion. 
This is used to determine the gWeightedIonValues.y.*/ { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; /*maxent3 data is converted to +1 fragments*/ } /*the max fragment charge state for maxent3 is one*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { y17Mass = currPtr->mOverZ; test = IsThisPossible(y17Mass, i); if(test) { y17Mass = (y17Mass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ if(y17Mass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(y17Mass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - y17Mass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(y17Mass > (gParam.monoToAv - gAvMonoTransition)) { y17Mass = y17Mass * aToMFactor; } peptideMass = gParam.peptideMW; if(peptideMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - peptideMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) { peptideMass = peptideMass * aToMFactor; } y17Mass = peptideMass - y17Mass + (2 * gElementMass_x100[HYDROGEN]); /*Partially convert to b ion.*/ /*The two extremes for the possible b ion nodes are identified first.*/ /*First I'll find the highest mass node.*/ y17Mass = y17Mass - gAmmonia + gParam.fragmentErr; y17MassMax = y17Mass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ y17Mass = y17Mass - gParam.fragmentErr - gParam.fragmentErr - gWater + gAmmonia + 0.5; y17MassMin = y17Mass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(y17MassMax >= gGraphLength) { 
printf("FindTrypticLCQY17Ions: y17MassMax >= gGraphLength\n"); exit(1); } for(j = y17MassMin; j <= y17MassMax; j++) { if(sequenceNodeC[j] != 0) { testForChar = sequenceNodeC[j]; if(i <= mostLikelyFragCharge) { testForChar += (INT_4)gWeightedIonValues.y_minus17or18; } else { testForChar += (INT_4)(gWeightedIonValues.y_minus17or18 * HIGH_CHARGE_MULT); } if(testForChar < 127 && testForChar > -127) { sequenceNodeC[j] = testForChar; } else { sequenceNodeC[j] = 63; } } } } } currPtr = currPtr->next; } return; } /********************************* FindTrypticLCQYIons*************************************************** * * This function assumes that the CID ions are all of type y. The nominal mass values are * determined and the corresponding positions in the array sequenceNodeC are assigned the * additional value of gWeightedIonValues.y. */ void FindTrypticLCQYIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeC) { struct MSData *currPtr; INT_4 yMassMin, yMassMax; INT_4 i, j, testForChar; REAL_4 yMass, peptideMass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) /*Figure out the most likely charge state of a fragment ion.
This is used to determine the gWeightedIonValues.y.*/ { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { yMass = currPtr->mOverZ; test = IsThisPossible(yMass, i); if(test) { yMass = (yMass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ if(yMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(yMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - yMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(yMass > (gParam.monoToAv - gAvMonoTransition)) { yMass = yMass * aToMFactor; } peptideMass = gParam.peptideMW; if(peptideMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - peptideMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) { peptideMass = peptideMass * aToMFactor; } yMass = peptideMass - yMass + (2 * gElementMass_x100[HYDROGEN]); /*Convert to b ion.*/ /*The two extremes for the possible b ion nodes are identified first.*/ /*First I'll find the highest mass node, and convert to the b ion mass.*/ if(yMass <= 372 * gMultiplier) /*Allow greater range for high mass y+2 ions*/ { yMass = yMass + gParam.fragmentErr * i; } else { yMass = yMass + gParam.fragmentErr; } yMassMax = yMass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ if(yMass <= 372 * gMultiplier) /*Allow greater range for high mass y+2 ions*/ { yMass = yMass - (i * gParam.fragmentErr) - (i * gParam.fragmentErr) + 0.5; } else { yMass = yMass - 
gParam.fragmentErr - gParam.fragmentErr + 0.5; } yMassMin = yMass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(yMassMax >= gGraphLength) { printf("FindTrypticLCQYIons: yMassMax >= gGraphLength\n"); exit(1); } for(j = yMassMin; j <= yMassMax; j++) { testForChar = sequenceNodeC[j]; /*make sure value fits in a char*/ if(i <= mostLikelyFragCharge || yMass <= 373 * gMultiplier) { testForChar += (INT_4)gWeightedIonValues.y; } else { testForChar += (INT_4)(gWeightedIonValues.y * HIGH_CHARGE_MULT); } if(testForChar < 127 && testForChar > -127) { sequenceNodeC[j] = testForChar; } else { sequenceNodeC[j] = 63; } } } } currPtr = currPtr->next; } /* If a sequenceNodeC is assigned a non-zero value that is less than the full gWeightedIonValues.y, * then that means that it was because an ion was assumed to be of an unlikely charge state. * For example, if a doubly charged precursor had an ion that was assumed to be a doubly * charged fragment, but the corresponding singly charged ion was absent, then the value * for that node would be less than gWeightedIonValues.y. I remove these from the list here. * * For the LCQ, I've noticed that there are often doubly charged y ions from +2 precursors, * where the fragment ions are due to the loss of the N-terminal one, two, or three amino * acids. So don't zero out the first 373 positions (this would be two tryptophans). */ for(i = 373 * gMultiplier; i < gGraphLength; i++) { if(sequenceNodeC[i] < gWeightedIonValues.y) { sequenceNodeC[i] = 0; } } return; } /********************************FindTrypticLCQA17Ions******************************************** * * This function assumes that the CID ions are all of type a-17 or a-18. The nominal mass * values are determined and the corresponding positions in the array sequenceNodeN are * assigned the additional value of gWeightedIonValues.a_minus17or18. Only a-17 or a-18 ions that have a * corresponding a ion are counted. 
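*
* A minimal sketch of how an ion weight is added to a node while keeping the result
* inside a signed char, following the pattern used throughout these functions. AddToNode
* is a hypothetical helper, and the 0.5 attenuation mirrors HIGH_CHARGE_MULT.
*
```c
// Hypothetical helper, not part of Lutefisk.
// Adds weight to *node. If the fragment charge is above the most likely charge,
// the weight is halved first. Sums outside the open range (-127, 127) are replaced
// by 63, as in the original code.
void AddToNode(signed char *node, int weight, int charge, int mostLikelyCharge)
{
    int sum = *node;

    if (charge <= mostLikelyCharge) {
        sum += weight;
    } else {
        sum += (int)(weight * 0.5);
    }

    if (sum < 127 && sum > -127) {
        *node = (signed char)sum;
    } else {
        *node = 63;
    }
}
```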
*/ void FindTrypticLCQA17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeN, char *ionPresent) { struct MSData *currPtr; INT_4 a17MassMin, a17MassMax, i, j, testForChar; REAL_4 a17Mass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { a17Mass = currPtr->mOverZ; test = IsThisPossible(a17Mass, i); if(test) { a17Mass = (a17Mass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ /*Alter the values so that they are closer to the expected nominal masses.*/ if(a17Mass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(a17Mass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - a17Mass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(a17Mass > (gParam.monoToAv - gAvMonoTransition)) { a17Mass = a17Mass * aToMFactor; } /*The two extremes for the possible a ion nodes are identified first.*/ /*First I'll find the highest mass node.*/ a17Mass = a17Mass + gWater + gCO + gParam.fragmentErr; a17MassMax = a17Mass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ a17Mass = a17Mass - gParam.fragmentErr - gParam.fragmentErr + 0.5; a17MassMin = a17Mass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(a17MassMax >= gGraphLength) { printf("FindTrypticLCQA17Ions: a17MassMax >= gGraphLength\n"); exit(1); } for(j = a17MassMin; j <= a17MassMax; j++) { if(ionPresent[j - gCO] != 0) { testForChar = sequenceNodeN[j]; if(i <= mostLikelyFragCharge) { if(a17Mass < 350 * gMultiplier) { testForChar += 
(INT_4)gWeightedIonValues.a_minus17or18; } else { testForChar += (INT_4)(gWeightedIonValues.a_minus17or18 * HIGH_MASS_A_MULT); } } else { if(a17Mass < 350 * gMultiplier) { testForChar += (INT_4)(gWeightedIonValues.a_minus17or18 * HIGH_CHARGE_MULT); } else { testForChar += (INT_4)(gWeightedIonValues.a_minus17or18 * HIGH_MASS_A_MULT * HIGH_CHARGE_MULT); } } if(testForChar < 127 && testForChar > -127) { sequenceNodeN[j] = testForChar; } else { sequenceNodeN[j] = 63; } } } } } currPtr = currPtr->next; } return; } /********************************FindTrypticLCQAIons******************************************** * * This function assumes that the CID ions are all of type a. The nominal mass * values are determined and the corresponding positions in the array sequenceNodeN are * assigned the additional value of gWeightedIonValues.a. Only those a ions that have a * corresponding b ion are counted. Using nominal masses as the index, the number one is * placed in the array ionPresent at the nominal mass of the a ion. */ void FindTrypticLCQAIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeN, char *ionPresent) { struct MSData *currPtr; INT_4 aMassMin, aMassMax, i, j, testForChar; REAL_4 aMass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) /*Figure out the most likely charge state of a fragment ion. 
This is used to determine the gWeightedIonValues.y.*/ { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { aMass = currPtr->mOverZ; test = IsThisPossible(aMass, i); if(test) { aMass = (aMass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ /*Alter the values so that they are closer to the expected nominal masses.*/ if(aMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(aMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - aMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(aMass > (gParam.monoToAv - gAvMonoTransition)) { aMass = aMass * aToMFactor; } /*The two extremes for the possible a ion nodes are identified first. The integer mass value of the a ion is the index number of the array ionPresent, which is initially set to zero for all of its values. If a b ion is present, then the a ion is counted. The fact that the a ion has been counted is noted by changing ionPresent[whatever the a ion mass is] to 1. This is used when deciding if there are any a-17 ions to be included in the scores for the nodes.
*/ /*First I'll find the highest mass node.*/ aMass = aMass + gCO + gParam.fragmentErr; aMassMax = aMass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ aMass = aMass - gParam.fragmentErr - gParam.fragmentErr + 0.5; aMassMin = aMass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(aMassMax >= gGraphLength) { printf("FindTrypticLCQAIons: aMassMax >= gGraphLength\n"); exit(1); } for(j = aMassMin; j <= aMassMax; j++) { if(sequenceNodeN[j] != 0) { testForChar = sequenceNodeN[j]; /*make sure value fits in a char*/ if(i <= mostLikelyFragCharge) { if(aMass < 350 * gMultiplier) { testForChar += (INT_4)gWeightedIonValues.a; } else { testForChar += (INT_4)(gWeightedIonValues.a * HIGH_MASS_A_MULT); } } else { if(aMass < 350 * gMultiplier) { testForChar += (INT_4)(gWeightedIonValues.a * HIGH_CHARGE_MULT); } else { testForChar += (INT_4)(gWeightedIonValues.a * HIGH_CHARGE_MULT * HIGH_MASS_A_MULT); } } if(testForChar < 127 && testForChar > -127) { sequenceNodeN[j] = testForChar; } else { sequenceNodeN[j] = 63; } ionPresent[j - 28 * gMultiplier] = 1; /*The a ion is found.*/ } } } } currPtr = currPtr->next; } return; } /********************************FindTrypticLCQB17Ions******************************************** * * This function assumes that the CID ions are all of type b-17 or b-18. The nominal mass * values are determined and the corresponding positions in the array sequenceNodeN are * assigned the additional value of gWeightedIonValues.b_minus17or18. Only b-17 or b-18 ions that have a * corresponding b ion are counted. 
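*
* A minimal sketch of the node window that this function computes for each ion, assuming
* plain Da units with nominal water (18) and ammonia (17) masses rather than the
* gMultiplier-scaled gWater and gAmmonia. B17NodeWindow is a hypothetical name.
*
```c
// Hypothetical helper, not part of Lutefisk.
// For an observed b-17 or b-18 ion mass (already converted to a +1 ion), find the
// range of integer b ion nodes it could support. The upper bound assumes a water
// loss, the lower bound an ammonia loss, each widened by the fragment error.
void B17NodeWindow(double ionMass, double fragmentErr, int *nodeMin, int *nodeMax)
{
    const double water = 18.0;
    const double ammonia = 17.0;

    *nodeMax = (int)(ionMass + water + fragmentErr);          // truncate
    *nodeMin = (int)(ionMass + ammonia - fragmentErr + 0.5);  // add 0.5, then truncate
}
```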
*/ void FindTrypticLCQB17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeN) { struct MSData *currPtr; INT_4 b17MassMin, b17MassMax, i, j, testForChar; REAL_4 b17Mass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { b17Mass = currPtr->mOverZ; test = IsThisPossible(b17Mass, i); if(test) { b17Mass = (b17Mass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ /*Alter the values so that they are closer to the expected nominal masses.*/ if(b17Mass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(b17Mass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - b17Mass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(b17Mass > (gParam.monoToAv - gAvMonoTransition)) { b17Mass = b17Mass * aToMFactor; } /*The two extremes for the possible b ion nodes are identified first.*/ /*First I'll find the highest mass node.*/ b17Mass = b17Mass + gWater + gParam.fragmentErr; b17MassMax = b17Mass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ b17Mass = b17Mass - gParam.fragmentErr - gParam.fragmentErr - gWater + gAmmonia + 0.5; b17MassMin = b17Mass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(b17MassMax >= gGraphLength) { printf("FindTrypticLCQB17Ions: b17MassMax >= gGraphLength\n"); exit(1); } for(j = b17MassMin; j <= b17MassMax; j++) { testForChar = sequenceNodeN[j]; /*make sure value fits in a char*/ if(sequenceNodeN[j] != 0) { if(i <= mostLikelyFragCharge) /*Charge state of the fragment is ok.*/ { 
testForChar += (INT_4)gWeightedIonValues.b_minus17or18; } else /*Charge state of the fragment is probably too high.*/ { testForChar += (INT_4)(gWeightedIonValues.b_minus17or18 * HIGH_CHARGE_MULT); } } if(testForChar < 127 && testForChar > -127) { sequenceNodeN[j] = testForChar; } else { sequenceNodeN[j] = 63; } } } } currPtr = currPtr->next; } return; } /********************************* FindTrypticLCQBIons*************************************************** * * This function assumes that the CID ions are all of type b. The nominal mass values are * determined and the corresponding positions in the array sequenceNodeN are assigned the * additional value of gWeightedIonValues.b. Rules are for tryptic peptides fragmented in * an ion trap. */ void FindTrypticLCQBIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeN) { struct MSData *currPtr; INT_4 bMassMin, bMassMax, i, j, testForChar; REAL_4 bMass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) /*Figure out the most likely charge state of a fragment ion. 
This is used to determine the gWeightedIonValues.y.*/ { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { bMass = currPtr->mOverZ; test = IsThisPossible(bMass, i); if(test) { bMass = (bMass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ /*Alter the values so that they are closer to the expected nominal masses.*/ if(bMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(bMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - bMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(bMass > (gParam.monoToAv - gAvMonoTransition)) { bMass = bMass * aToMFactor; } /*The two extremes for the possible b ion nodes are identified first.*/ /*First I'll find the highest mass node.*/ bMass = bMass + gParam.fragmentErr; bMassMax = bMass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ bMass = bMass - gParam.fragmentErr - gParam.fragmentErr + 0.5; bMassMin = bMass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(bMassMax >= gGraphLength) { printf("FindTrypticLCQBIons: bMassMax >= gGraphLength\n"); exit(1); } for(j = bMassMin; j <= bMassMax; j++) { testForChar = sequenceNodeN[j]; /*test to make sure value fits a char*/ if(i <= mostLikelyFragCharge) { if(bMass > 147 * gMultiplier) { testForChar += (INT_4)gWeightedIonValues.b; } } else { testForChar += (INT_4)(gWeightedIonValues.b * HIGH_CHARGE_MULT); } if(testForChar < 127 && testForChar > -127) { sequenceNodeN[j] = testForChar; } else { sequenceNodeN[j] = 63; } } } } currPtr = currPtr->next; } /* If a sequenceNodeN is assigned a non-zero value that is less than the 
full gWeightedIonValues.b,
*   then the ion was assumed to be of an unlikely charge state.
*   For example, if a doubly charged precursor had an ion that was assumed to be a doubly
*   charged fragment, but the corresponding singly charged ion was absent, then the value
*   for that node would be less than gWeightedIonValues.b.  I remove these from the list here.
*
*   So far, I've not seen a lot of multiply charged b ions in LCQ data of tryptic peptides where
*   the lower charge states (primarily +1 fragments of +2 precursors) are totally absent.
*/
    for(i = 0; i < gGraphLength; i++)
    {
        if(sequenceNodeN[i] < gWeightedIonValues.b)
        {
            sequenceNodeN[i] = 0;
        }
    }

    return;
}

/*********************************TrypticLCQTemplate***********************************************
*
*   This function takes the initialized sequenceNodeC and sequenceNodeN arrays and modifies them
*   according to empirical rules developed from days of experience interpreting CID spectra of
*   multiply-charged tryptic peptides using the LCQ.  The rules are as follows:
*   Assume each ion is both a b and a y ion.  Look for an a ion if a b ion is present.
*   Look for a b-17 ion if a b ion is present.  Look for an a-17 ion if an a ion is present
*   (and the corresponding b ion is also present).  Look for a y-17 ion if a y ion is present.
*   Look for multiply-charged b and y ions if sufficient mass is available.
*
*/
void TrypticLCQTemplate(struct MSData *firstMassPtr, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN)
{
    char *ionPresent;
    INT_4 i;

    ionPresent = (char *) malloc(gGraphLength * sizeof(char ));  /*Marks where a ions were found.*/
    if(ionPresent == NULL)
    {
        printf("TrypticLCQTemplate:  Out of memory\n");
        exit(1);
    }

    for(i = 0; i < gGraphLength; i++)  /*Init this array to zero.
This array is used to keep track of where the a ions are.*/
    {
        ionPresent[i] = 0;
    }

    if(gWeightedIonValues.b != 0)
    {
        FindTrypticLCQBIons(firstMassPtr, sequenceNodeN);
    }
    if(gWeightedIonValues.b_minus17or18 != 0)
    {
        FindTrypticLCQB17Ions(firstMassPtr, sequenceNodeN);
    }
    if(gWeightedIonValues.a != 0)
    {
        FindTrypticLCQAIons(firstMassPtr, sequenceNodeN, ionPresent);
    }
    if(gWeightedIonValues.a_minus17or18 != 0)
    {
        FindTrypticLCQA17Ions(firstMassPtr, sequenceNodeN, ionPresent);
    }
    if(gWeightedIonValues.y != 0)
    {
        FindTrypticLCQYIons(firstMassPtr, sequenceNodeC);
    }
    if(gWeightedIonValues.y_minus17or18 != 0)
    {
        FindTrypticLCQY17Ions(firstMassPtr, sequenceNodeC);
    }

    free(ionPresent);

    return;
}

/****************************RatchetIt************************************
*
*   RatchetIt increases sequence[cycle] by one.  If the value of sequence[cycle] reaches
*   the aaNum for that cycle position, then sequence[cycle] is reset to zero, cycle is
*   increased by one, and sequence[cycle] (old cycle value + 1) is increased by one.  The
*   input is 'aaNum' (the number of amino acids in each cycle), cycle, the array 'sequence'
*   and 'seqLength' (the maximum length of this sequence).  It returns a value of TRUE until
*   'cycle' exceeds the 'seqLength' (at which point it returns a FALSE char).  By way
*   of example, the following shows how RatchetIt should handle a list containing three
*   amino acids per cycle that is two cycles long:
*       0   0
*       1   0
*       2   0
*       0   1
*       1   1
*       2   1
*       0   2
*       1   2
*       2   2   ---> test = 0 (exit the while loop)
*/
char RatchetIt (INT_4 *aaNum, char cycle, char *sequence, INT_4 seqLength)
{
    char test = TRUE;

/*
*   "cycle" is initialized to zero prior to calling RatchetIt for the first time, and is
*   increased only if the zero index of sequence exceeds the number of amino acids in the
*   zero-th Edman cycle (which is really the first cycle of Edman degradation).
*   "sequence[cycle]" is always increased by one, but if it reaches the number of amino acids
*   in that "cycle" then it is reset to zero and cycle is increased by one.
*/
    sequence[cycle] += 1;  /*Sometimes, this is all that RatchetIt does.*/

    if(sequence[cycle] >= aaNum[cycle] && cycle <= seqLength)  /*Sometimes, it does more.*/
    {
        sequence[cycle] = 0;  /*Reset the array 'sequence' to zero at the current 'cycle'.*/
        cycle += 1;  /*Go to the next higher cycle.*/
        if(cycle > seqLength)  /*If I've gone too far w/ 'cycle', then shut the whole thing
                                 down by returning a FALSE char value.*/
        {
            test = FALSE;
            return(test);
        }
        test = RatchetIt(aaNum, cycle, sequence, seqLength);  /*Recursive call.*/
    }

    return(test);
}

/*******************************AddEdmanData***************************************************
*
*   This function alters the arrays sequenceNodeC and sequenceNodeN so as to incorporate the
*   additional data provided from Edman sequencing.  All permutations of the Edman data are
*   determined, and the corresponding nodes (assuming the N-terminus is unmodified) are altered.
*   If a node is not zero, then a value corresponding to one-half of totalIonVal is added to the
*   existing node value.
*/
void AddEdmanData(SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 totalIonVal)
{
    INT_4 i, j, aaNum[MAX_PEPTIDE_LENGTH], edmanNode, halfTotalIonVal;
    char sequence[MAX_PEPTIDE_LENGTH], cycle;

    halfTotalIonVal = totalIonVal * 0.5;  /*This is added to the nodes.*/

/*Determine the number of amino acids in each cycle.*/
    for(i = 0; i < gMaxCycleNum; i++)
    {
        aaNum[i] = 0;
        j = 0;
        while(gEdmanData[i][j] != 0)  /*If there's a zero, then that signals the end of the list.*/
        {
            aaNum[i]++;
            j++;
        }
    }

/*
*   Here's the major loop in this function - it increments through each Edman cycle that was
*   entered in the file Lutefisk.edman.  What I'm doing here is finding all combinations of
*   Edman-derived amino acids of varying lengths.
So I start out w/ i = 0, which is all
*   combinations that are one amino acid long (ie, just the first cycle).  Next I look for i = 1,
*   which is all combinations from the first and second cycle to give a two amino acid long
*   segment.  This continues until I get to the maximum number of Edman cycles available.
*/
    for(i = 0; i < gMaxCycleNum; i++)  /*Nodes will contain from one to gMaxCycleNum amino acids.*/
    {
/*  Initialize some variables for each time through this loop.*/
        for(j = 0; j < MAX_PEPTIDE_LENGTH; j++)
        {
            sequence[j] = 0;
        }
        sequence[0] = -1;
        cycle = 0;

/*
*   RatchetIt will alter the values found in the array "sequence".  This array corresponds to
*   the indexing of one of the dimensions in the array gEdmanData[i][sequence[i]].  Once cycle
*   exceeds i, RatchetIt returns a char value of zero; otherwise it returns one.
*/
        while(RatchetIt(aaNum, cycle, sequence, i))
        {
/*  Calculate the node for this particular Edman-derived sequence.*/
            edmanNode = gElementMass_x100[HYDROGEN];
            for(j = 0; j <= i; j++)
            {
                edmanNode += gEdmanData[j][sequence[j]];
            }

/*  If that node is non-zero for 'sequenceNodeC', then add the value 'halfTotalIonVal' to it.*/
            if(sequenceNodeC[edmanNode] != 0)
            {
                sequenceNodeC[edmanNode] += halfTotalIonVal;
            }
            else
            {
                if(i == 0)
                {
                    sequenceNodeC[edmanNode] = 1;  /*If it's one amino acid long and zero value, then do this.*/
                }
                else
                {
                    sequenceNodeC[edmanNode] = 0;  /*If it's over one amino acid long and zero value,
                                                     then do this.  Currently it does nothing.*/
                }
            }

            if(sequenceNodeN[edmanNode] != 0)  /*Same as above, but for sequenceNodeN.*/
            {
                sequenceNodeN[edmanNode] += halfTotalIonVal;
            }
            else
            {
                if(i == 0)
                {
                    sequenceNodeN[edmanNode] = 1;
                }
                else
                {
                    sequenceNodeN[edmanNode] = 0;
                }
            }
        }
    }

    return;
}

/***************************************AddCTermResidue****************************************
*
*   This function locates all of the C-terminal Nodes and subtracts the nominal mass of the
*   most likely C-terminal amino acids for a given proteolysis.
If that node has a value of zero,
*   then a small positive value (10 for tryptic cleavage, otherwise 1) is placed there, so that
*   it might be used for subsequence building.
*/
void AddCTermResidue(SCHAR *sequenceNodeC, SCHAR *sequenceNodeN)
{
    INT_4 i = gGraphLength - 1;
    char test = TRUE;

/*
*   The INT_4 'i' is used to index the arrays 'sequenceNodeC' and 'sequenceNodeN'.  Obviously,
*   it must not be a negative number.  The char 'test' is initialized as TRUE, and becomes
*   FALSE once a non-zero value of sequenceNodeC[i] is encountered (as i is decremented from
*   its maximum value, gGraphLength - 1).  Once 'test' becomes FALSE, then this while loop can
*   continue only as long as sequenceNodeC[i] remains non-zero.  Remember that due to mass
*   measurement errors, there can be several C-terminal nodes, but that they will always be
*   adjacent and separated from any other nodes by a string of zero-valued nodes.
*/
    while(i > 0 && (test || sequenceNodeC[i] != 0))
    {
        i--;
        if(sequenceNodeC[i] != 0)
        {
            test = FALSE;
            if(gParam.proteolysis == 'T')  /*If it's a tryptic cleavage.*/
            {
                /*If LCQ data, give boost to b ions w/ loss of K or R*/
                if(gParam.fragmentPattern == 'L')
                {
                    sequenceNodeN[i - gMonoMass_x100[K]] = sequenceNodeN[i - gMonoMass_x100[K]] * 4;
                    sequenceNodeN[i - gMonoMass_x100[R]] = sequenceNodeN[i - gMonoMass_x100[R]] * 4;
                }
                else  /*if qtof or triple quad, boost the y2 ions*/
                {
                    sequenceNodeC[i - gMonoMass_x100[K]] = sequenceNodeC[i - gMonoMass_x100[K]] * 4;
                    sequenceNodeC[i - gMonoMass_x100[R]] = sequenceNodeC[i - gMonoMass_x100[R]] * 4;
                }
                if(sequenceNodeC[i - gMonoMass_x100[K]] == 0 &&
                    sequenceNodeN[i - gMonoMass_x100[K]] == 0)  /*Lys*/
                {
                    sequenceNodeC[i - gMonoMass_x100[K]] = 10;
                    sequenceNodeN[i - gMonoMass_x100[K]] = 10;
                }
                if(sequenceNodeC[i - gMonoMass_x100[R]] == 0 &&
                    sequenceNodeN[i - gMonoMass_x100[R]] == 0)  /*Arg*/
                {
                    sequenceNodeC[i - gMonoMass_x100[R]] = 10;
                    sequenceNodeN[i - gMonoMass_x100[R]] = 10;
                }
            }
            if(gParam.proteolysis == 'K')  /*If it's a Lys-C cleavage.*/
            {
                if(sequenceNodeC[i - gMonoMass_x100[K]] == 0 &&
                    sequenceNodeN[i - gMonoMass_x100[K]] == 0)  /*Lys*/
                {
                    sequenceNodeC[i - gMonoMass_x100[K]] = 1;
                    sequenceNodeN[i - gMonoMass_x100[K]] = 1;
                }
            }
            if(gParam.proteolysis == 'E')  /*If it's a Staph v8 cleavage.*/
            {
                if(sequenceNodeC[i - gMonoMass_x100[E]] == 0 &&
                    sequenceNodeN[i - gMonoMass_x100[E]] == 0)  /*Glu*/
                {
                    sequenceNodeC[i - gMonoMass_x100[E]] = 1;
                    sequenceNodeN[i - gMonoMass_x100[E]] = 1;
                }
                if(sequenceNodeC[i - gMonoMass_x100[D]] == 0 &&
                    sequenceNodeN[i - gMonoMass_x100[D]] == 0)  /*Asp*/
                {
                    sequenceNodeC[i - gMonoMass_x100[D]] = 1;
                    sequenceNodeN[i - gMonoMass_x100[D]] = 1;
                }
            }
        }
    }

    if(gParam.proteolysis == 'D')  /*If it's an Asp-N cleavage.*/
    {
        if(sequenceNodeC[gMonoMass_x100[D] + gElementMass_x100[HYDROGEN]] == 0 &&
            sequenceNodeN[gMonoMass_x100[D] + gElementMass_x100[HYDROGEN]] == 0)
        {
            sequenceNodeC[gMonoMass_x100[D] + gElementMass_x100[HYDROGEN]] = 1;
            sequenceNodeN[gMonoMass_x100[D] + gElementMass_x100[HYDROGEN]] = 1;
        }
    }

    return;
}

/*****************************AddTag**********************************************************
*
*   This function inputs the sequence tag (a character string of single letter code amino
*   acids), and the unsequenced mass at the N- and C-termini (both REAL_4s).  In addition,
*   the mono to average mass switch mass, the peptideMW, fragmentErr, peptideErr are input.
*   Also, the arrays sequenceNodeC and sequenceNodeN are used to figure out which nodes
*   contain any sequence evidence.  There is no output, except that these two arrays
*   (sequenceNodeN and sequenceNodeC) are modified so that the intervening sequence tag
*   region is cut out and replaced by "superNodes".  The superNode values are the sum of the
*   intervening node values that correspond to the sequence tag.  The value of peptideMW is altered
*   so that it is reduced by the mass of the sequence tag.  The modified arrays continue on
*   to be processed in the usual manner, but when a completed sequence is reached, then the
*   tag is re-inserted in the appropriate location.
*/ void AddTag(SCHAR *sequenceNodeC, SCHAR *sequenceNodeN) { INT_4 minSuperNode, maxSuperNode, i, j, k, minCTerm, maxCTerm; INT_4 nextNode, oldValue, newValue; REAL_4 superNodeMass, cTermMass; REAL_8 aToMFactor; char test; /*Convert the C-terminal tag mass to monoisotopic mass.*/ cTermMass = gParam.tagCMass; if(cTermMass >= gParam.monoToAv) { aToMFactor = 0; } else { if(cTermMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - cTermMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(cTermMass > (gParam.monoToAv - gAvMonoTransition)) { cTermMass = cTermMass * aToMFactor; } /* * The C-terminal mass 'tagCMass' includes the mass of the residues C-terminal to the * 'tagSequence' plus the C-terminal group (CO-R), which is either -OH or -NH2. So I * need to change the C-terminal mass to reflect the type of C-terminus. This will make * it so that the thus modified C-terminal mass added to the N-terminal mass plus the masses * of the sequence tag will equal the C-terminal node values (which are sort of like b ions). 
*/ cTermMass = cTermMass - gParam.modifiedCTerm; /* Find the maximum c terminal mass.*/ cTermMass = cTermMass + gParam.fragmentErr; maxCTerm = cTermMass; /* Find the minimum c terminal mass.*/ cTermMass = cTermMass - gParam.fragmentErr - gParam.fragmentErr + 0.5; minCTerm = cTermMass; superNodeMass = gParam.tagNMass; if(superNodeMass >= gParam.monoToAv) { aToMFactor = 0; } else { if(superNodeMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - superNodeMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(superNodeMass > (gParam.monoToAv - gAvMonoTransition)) { superNodeMass = superNodeMass * aToMFactor; } /* Find the maximum super node mass.*/ superNodeMass = superNodeMass + gParam.fragmentErr; maxSuperNode = superNodeMass; /* Find the minimum super node mass.*/ superNodeMass = superNodeMass - gParam.fragmentErr - gParam.fragmentErr + 0.5; minSuperNode = superNodeMass; if(minSuperNode < gElementMass_x100[HYDROGEN]) /*If tag includes N-terminal amino acids, but mass is off a bit*/ { minSuperNode = gElementMass_x100[HYDROGEN]; } for(i = minSuperNode; i <= maxSuperNode; i++) { nextNode = i; /*Start with the node for the N-terminal mass.*/ j = 0; /* * Calculate the node (or mass) of the sequence tag plus the N-terminal mass. Set test to * FALSE, and add to the superNode positions of sequenceNodeN and sequenceNodeC the values * found in the nodes that make up the sequence tag. 
*/ while(gParam.tagSequence[j] != 0) { test = TRUE; for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { if(gSingAA[k] == gParam.tagSequence[j]) { test = FALSE; nextNode = nextNode + gGapList[k]; } } } if(test) /*TRUE only if a sequence tag entry was not a valid amino acid.*/ { printf("There is something wrong with the sequence tag."); exit(1); } j++; } test = FALSE; for(j = nextNode + minCTerm; j <= nextNode + maxCTerm; j++) { if(sequenceNodeN[j] != 0) { test = TRUE; break; } } /* Change the supernode values to a negative one.*/ if(test) { if(i >= gGraphLength) { printf("AddTag: i >= gGraphLength\n"); exit(1); } sequenceNodeN[i] = -1; sequenceNodeC[i] = -1; } } /* Now I reassign the node values so that the sequence tag region is cut out.*/ for(i = gGraphLength - 1; i > 0; i--) /*Find the next node above the superNode series*/ { if(sequenceNodeN[i] == -1) { oldValue = i + 1; break; } } j = 0; /*Find the node that is the mass of the tagSequence higher than the oldValue*/ newValue = oldValue; while(gParam.tagSequence[j] != 0) { test = TRUE; for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { if(gSingAA[k] == gParam.tagSequence[j]) { test = FALSE; newValue = newValue + gGapList[k]; } } } if(test) /*TRUE only if a sequence tag entry was not a valid amino acid.*/ { printf("There is something wrong with the sequence tag."); exit(1); } j++; } while(newValue < gGraphLength) { sequenceNodeN[oldValue] = sequenceNodeN[newValue]; sequenceNodeC[oldValue] = sequenceNodeC[newValue]; newValue++; oldValue++; } while(oldValue < gGraphLength) { sequenceNodeN[oldValue] = 0; sequenceNodeC[oldValue] = 0; oldValue++; } /* Reset the peptideMW to reflect the removal of the sequence tag mass.*/ j = 0; while(gParam.tagSequence[j] != 0) { for(k = 0; k < gAminoAcidNumber; k++) { if(gGapList[k] != 0) { if(gSingAA[k] == gParam.tagSequence[j]) { gParam.peptideMW = gParam.peptideMW - gGapList[k]; } } } j++; } return; } 
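
/*****************************SketchAvToMono (editorial sketch)********************************
*
*   This is an illustrative sketch and is not part of the original Lutefisk source; nothing
*   in this file calls it.  The ion-finding functions in this file, and AddTag above, each
*   repeat the same inline scaling of a mass from average toward monoisotopic.  Under the
*   assumption that the repeated block is meant to behave identically everywhere, it could be
*   written once as below.  The name SketchAvToMono and its parameter list are hypothetical.
*   A caller would presumably pass gParam.monoToAv, gAvMonoTransition and AV_TO_MONO, eg
*       bMass = SketchAvToMono(bMass, gParam.monoToAv, gAvMonoTransition, AV_TO_MONO);
*   Plain doubles are used instead of REAL_4/REAL_8 so that the sketch stands on its own.
*/
double SketchAvToMono(double mass, double monoToAv, double transition, double avToMono)
{
    double aToMFactor;

    if(mass >= monoToAv)  /*At or above the switch mass: apply the full conversion.*/
    {
        aToMFactor = 0;
    }
    else
    {
        if(mass > (monoToAv - transition))  /*Inside the transition range: ramp linearly.*/
        {
            aToMFactor = (monoToAv - mass) / transition;
        }
        else  /*Below the transition range: no conversion.*/
        {
            aToMFactor = 1;
        }
    }
    aToMFactor = ((1 - avToMono) * aToMFactor) + avToMono;

    if(mass > (monoToAv - transition))  /*Only masses above the range's lower edge are scaled.*/
    {
        mass = mass * aToMFactor;
    }

    return(mass);
}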
/********************************FindTrypticY17Ions******************************************** * * This function assumes that the CID ions are all of type y-17 or y-18. The nominal mass * values are determined and the corresponding positions in the array sequenceNodeC are * assigned the additional value of gWeightedIonValues.y_minus17or18. Only y-17 or y-18 ions that have a * corresponding y ion are counted. */ void FindTrypticY17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeC) { struct MSData *currPtr; INT_4 i, j, testForChar; INT_4 y17MassMin, y17MassMax; REAL_4 y17Mass, peptideMass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) /*Figure out the most likely charge state of a fragment ion. This is used to determine the gWeightedIonValues.y.*/ { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { y17Mass = currPtr->mOverZ; test = IsThisPossible(y17Mass, i); if(test) { y17Mass = (y17Mass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ if(y17Mass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(y17Mass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - y17Mass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(y17Mass > (gParam.monoToAv - gAvMonoTransition)) { y17Mass = y17Mass * aToMFactor; } peptideMass = gParam.peptideMW; if(peptideMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - peptideMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * 
aToMFactor) + AV_TO_MONO; if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) { peptideMass = peptideMass * aToMFactor; } y17Mass = peptideMass - y17Mass + (2 * gElementMass_x100[HYDROGEN]); /*Partially convert to b ion.*/ /*The two extremes for the possible b ion nodes are identified first.*/ /*First I'll find the highest mass node.*/ y17Mass = y17Mass - gAmmonia + gParam.fragmentErr; y17MassMax = y17Mass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ y17Mass = y17Mass - gParam.fragmentErr - gParam.fragmentErr - gWater + gAmmonia + 0.5; y17MassMin = y17Mass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(y17MassMax >= gGraphLength) { printf("FindTrypticY17Ions: y17MassMax >= gGraphLength\n"); exit(1); } for(j = y17MassMin; j <= y17MassMax; j++) { if(sequenceNodeC[j] != 0) { testForChar = sequenceNodeC[j]; if(i <= mostLikelyFragCharge) { testForChar += (INT_4)gWeightedIonValues.y_minus17or18; } else { testForChar += (INT_4)(gWeightedIonValues.y_minus17or18 * HIGH_CHARGE_MULT); } if(testForChar < 127 && testForChar > -127) { sequenceNodeC[j] = testForChar; } else { sequenceNodeC[j] = 63; } } } } } currPtr = currPtr->next; } return; } /********************************* FindTrypticYIons*************************************************** * * This function assumes that the CID ions are all of type y. The nominal mass values are * determined and the corresponding positions in the array sequenceNodeC are assigned the * additional value of gWeightedIonValues.y. */ void FindTrypticYIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeC) { struct MSData *currPtr; INT_4 yMassMin, yMassMax; INT_4 i, j, testForChar, firstNode, massDiff; REAL_4 yMass, peptideMass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) /*Figure out the most likely charge state of a fragment ion. 
This is used to determine the gWeightedIonValues.y.*/ { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { yMass = currPtr->mOverZ; test = IsThisPossible(yMass, i); if(test) { yMass = (yMass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ if(yMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(yMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - yMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(yMass > (gParam.monoToAv - gAvMonoTransition)) { yMass = yMass * aToMFactor; } peptideMass = gParam.peptideMW; if(peptideMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - peptideMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(peptideMass > (gParam.monoToAv - gAvMonoTransition)) { peptideMass = peptideMass * aToMFactor; } yMass = peptideMass - yMass + (2 * gElementMass_x100[HYDROGEN]); /*Convert to b ion.*/ /*The two extremes for the possible b ion nodes are identified first.*/ /*First I'll find the highest mass node, and convert to the b ion mass.*/ yMass = yMass + gParam.fragmentErr; yMassMax = yMass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ yMass = yMass - gParam.fragmentErr - gParam.fragmentErr + 0.5; yMassMin = yMass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(yMassMax >= gGraphLength) { printf("FindTrypticYIons: yMassMax >= gGraphLength\n"); exit(1); } for(j = yMassMin; j <= yMassMax; j++) { 
testForChar = sequenceNodeC[j]; /*make sure value fits in a char*/ if(i <= mostLikelyFragCharge) { testForChar += (INT_4)gWeightedIonValues.y; } else { testForChar += (INT_4)(gWeightedIonValues.y * HIGH_CHARGE_MULT); } if(testForChar < 127 && testForChar > -127) { sequenceNodeC[j] = testForChar; } else { sequenceNodeC[j] = 63; } } } } currPtr = currPtr->next; } /* If a sequenceNodeC is assigned a non-zero value that is less than the full gWeightedIonValues.y, * then that means that it was because an ion was assumed to be of an unlikely charge state. * For example, if a doubly charged precursor had an ion that was assumed to be a doubly * charged fragment, but the corresponding singly charged ion was absent, then the value * for that node would be less than gWeightedIonValues.y. I remove these from the list here. */ firstNode = 0; for(i = 0; i < gGraphLength; i++) { if(firstNode == 0) { if(sequenceNodeC[i] != 0) { firstNode = i; /*find the N-terminus*/ } } /*if +2 y ion present indicating N-terminal amino acid, then it will have a node value less than the max*/ if(sequenceNodeC[i] > 0 && sequenceNodeC[i] < gWeightedIonValues.y && i != firstNode) { massDiff = i - firstNode; for(j = 0; j < gAminoAcidNumber; j++) { if(massDiff == gGapList[j]) { sequenceNodeC[i] = gWeightedIonValues.y; /*give it higher value so that it can be retained below*/ } } } if(sequenceNodeC[i] < gWeightedIonValues.y) { sequenceNodeC[i] = 0; } } return; } /********************************FindTrypticA17Ions******************************************** * * This function assumes that the CID ions are all of type a-17 or a-18. The nominal mass * values are determined and the corresponding positions in the array sequenceNodeN are * assigned the additional value of gWeightedIonValues.a_minus17or18. Only a-17 or a-18 ions that have a * corresponding a ion are counted. 
*/ void FindTrypticA17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeN, char *ionPresent) { struct MSData *currPtr; INT_4 a17MassMin, a17MassMax, i, j, testForChar; REAL_4 a17Mass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { a17Mass = currPtr->mOverZ; test = IsThisPossible(a17Mass, i); if(test) { a17Mass = (a17Mass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ /*Alter the values so that they are closer to the expected nominal masses.*/ if(a17Mass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(a17Mass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - a17Mass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(a17Mass > (gParam.monoToAv - gAvMonoTransition)) { a17Mass = a17Mass * aToMFactor; } /*The two extremes for the possible a ion nodes are identified first.*/ /*First I'll find the highest mass node.*/ a17Mass = a17Mass + gWater + gCO + gParam.fragmentErr; a17MassMax = a17Mass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ a17Mass = a17Mass - gParam.fragmentErr - gParam.fragmentErr + 0.5; a17MassMin = a17Mass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(a17MassMax >= gGraphLength) { printf("FindTrypticA17Ions: a17MassMax >= gGraphLength\n"); exit(1); } for(j = a17MassMin; j <= a17MassMax; j++) { if(ionPresent[j - 28 * gMultiplier] != 0) { testForChar = sequenceNodeN[j]; if(i <= mostLikelyFragCharge) { if(a17Mass < 350 * gMultiplier) { testForChar += 
(INT_4)gWeightedIonValues.a_minus17or18; } else { testForChar += (INT_4)(gWeightedIonValues.a_minus17or18 * HIGH_MASS_A_MULT); } } else { if(a17Mass < 350 * gMultiplier) { testForChar += (INT_4)(gWeightedIonValues.a_minus17or18 * HIGH_CHARGE_MULT); } else { testForChar += (INT_4)(gWeightedIonValues.a_minus17or18 * HIGH_MASS_A_MULT * HIGH_CHARGE_MULT); } } if(testForChar < 127 && testForChar > -127) { sequenceNodeN[j] = testForChar; } else { sequenceNodeN[j] = 63; } } } } } currPtr = currPtr->next; } return; } /********************************FindTrypticAIons******************************************** * * This function assumes that the CID ions are all of type a. The nominal mass * values are determined and the corresponding positions in the array sequenceNodeN are * assigned the additional value of gWeightedIonValues.a. Only those a ions that have a * corresponding b ion are counted. Using nominal masses as the index, the number one is * placed in the array ionPresent at the nominal mass of the a ion. */ void FindTrypticAIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeN, char *ionPresent) { struct MSData *currPtr; INT_4 aMassMin, aMassMax, i, j, testForChar; REAL_4 aMass; REAL_8 aToMFactor; char test, mostLikelyFragCharge, maxCharge; currPtr = firstMassPtr; if(gParam.chargeState == 1) /*Figure out the most likely charge state of a fragment ion. 
This is used to determine the gWeightedIonValues.y.*/ { mostLikelyFragCharge = 1; } else { mostLikelyFragCharge = gParam.chargeState - 1; } if(gParam.maxent3) { mostLikelyFragCharge = 1; } /*max charge state for maxent3 is +1*/ if(gParam.maxent3) { maxCharge = 1; } else { maxCharge = gParam.chargeState; } while(currPtr != NULL) { for(i = 1; i <= maxCharge; i++) { aMass = currPtr->mOverZ; test = IsThisPossible(aMass, i); if(test) { aMass = (aMass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]); /*Convert to +1 ion.*/ /*Alter the values so that they are closer to the expected nominal masses.*/ if(aMass >= gParam.monoToAv) /*convert to monoisotopic*/ { aToMFactor = 0; } else { if(aMass > (gParam.monoToAv - gAvMonoTransition)) { aToMFactor = (gParam.monoToAv - aMass) / gAvMonoTransition; } else { aToMFactor = 1; } } aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO; if(aMass > (gParam.monoToAv - gAvMonoTransition)) { aMass = aMass * aToMFactor; } /*The two extremes for the possible a ion nodes are identified first. The integer mass value of the a ion is the index number of the array ionPresent, which is initially set to zero for all of its values. If a b ion is present, then the a ion is counted. The fact that the a ion has been counted is noted by changing ionPresent[whatever the a ion mass is] to 1. This is use when deciding if there are any a-17 ions to be included in the scores for the nodes. 
*/ /*First I'll find the highest mass node.*/ aMass = aMass + gCO + gParam.fragmentErr; aMassMax = aMass; /*Truncate w/o rounding up.*/ /*Next I'll find the lowest mass node.*/ aMass = aMass - gParam.fragmentErr - gParam.fragmentErr + 0.5; aMassMin = aMass; /*Truncate after rounding up.*/ /*Now fill in the middle (if there is anything in the middle.*/ if(aMassMax >= gGraphLength) { printf("FindTrypticAIons: aMassMax >= gGraphLength\n"); exit(1); } for(j = aMassMin; j <= aMassMax; j++) { if(sequenceNodeN[j] != 0) { testForChar = sequenceNodeN[j]; /*make sure value fits in a char*/ if(i <= mostLikelyFragCharge) { if(aMass < 350 * gMultiplier) { testForChar += (INT_4)gWeightedIonValues.a; } else { testForChar += (INT_4)(gWeightedIonValues.a * HIGH_MASS_A_MULT); } } else { if(aMass < 350 * gMultiplier) { testForChar += (INT_4)(gWeightedIonValues.a * HIGH_CHARGE_MULT); } else { testForChar += (INT_4)(gWeightedIonValues.a * HIGH_CHARGE_MULT * HIGH_MASS_A_MULT); } } if(testForChar < 127 && testForChar > -127) { sequenceNodeN[j] = testForChar; } else { sequenceNodeN[j] = 63; } ionPresent[j - 28 * gMultiplier] = 1; /*The a ion is found.*/ } } } } currPtr = currPtr->next; } return; } /********************************FindTrypticB17Ions******************************************** * * This function assumes that the CID ions are all of type b-17 or b-18. The nominal mass * values are determined and the corresponding positions in the array sequenceNodeN are * assigned the additional value of gWeightedIonValues.b_minus17or18. Only b-17 or b-18 ions that have a * corresponding b ion are counted. 
 */
void FindTrypticB17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeN)
{
    struct MSData *currPtr;
    INT_4 b17MassMin, b17MassMax, i, j, testForChar;
    REAL_4 b17Mass, precursor;
    REAL_8 aToMFactor;
    char test, mostLikelyFragCharge, maxCharge;

    currPtr = firstMassPtr;
    precursor = gParam.peptideMW / gParam.chargeState;    /*Not exactly the precursor, but close enough.*/

    if(gParam.chargeState == 1)
    {
        mostLikelyFragCharge = 1;
    }
    else
    {
        mostLikelyFragCharge = gParam.chargeState - 1;
    }
    if(gParam.maxent3)
    {
        mostLikelyFragCharge = 1;
    }

    /*max charge state for maxent3 is +1*/
    if(gParam.maxent3)
    {
        maxCharge = 1;
    }
    else
    {
        maxCharge = gParam.chargeState;
    }

    while(currPtr != NULL)
    {
        for(i = 1; i <= maxCharge; i++)
        {
            b17Mass = currPtr->mOverZ;
            test = IsThisPossible(b17Mass, i);
            if(test)
            {
                b17Mass = (b17Mass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]);    /*Convert to +1 ion.*/

                /*Alter the values so that they are closer to the expected nominal masses.*/
                if(b17Mass >= gParam.monoToAv)    /*convert to monoisotopic*/
                {
                    aToMFactor = 0;
                }
                else
                {
                    if(b17Mass > (gParam.monoToAv - gAvMonoTransition))
                    {
                        aToMFactor = (gParam.monoToAv - b17Mass) / gAvMonoTransition;
                    }
                    else
                    {
                        aToMFactor = 1;
                    }
                }
                aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO;
                if(b17Mass > (gParam.monoToAv - gAvMonoTransition))
                {
                    b17Mass = b17Mass * aToMFactor;
                }

                /*The two extremes for the possible b ion nodes are identified first.*/
                /*First I'll find the highest mass node.*/
                b17Mass = b17Mass + gWater + gParam.fragmentErr;
                b17MassMax = b17Mass;    /*Truncate w/o rounding up.*/

                /*Next I'll find the lowest mass node.*/
                b17Mass = b17Mass - gParam.fragmentErr - gParam.fragmentErr - gWater + gAmmonia + 0.5;
                b17MassMin = b17Mass;    /*Truncate after rounding up.*/

                /*Now fill in the middle (if there is anything in the middle).*/
                if(b17MassMax >= gGraphLength)
                {
                    printf("FindTrypticB17Ions: b17MassMax >= gGraphLength\n");
                    exit(1);
                }
                for(j = b17MassMin; j <= b17MassMax; j++)
                {
                    testForChar = sequenceNodeN[j];    /*make sure value fits in a
                                                         char*/
                    if(sequenceNodeN[j] != 0)
                    {
                        if(i <= mostLikelyFragCharge)    /*Charge state of the fragment is ok.*/
                        {
                            if(b17Mass < precursor)    /*If the m/z is less than the precursor, that's good.*/
                            {
                                testForChar += (INT_4)gWeightedIonValues.b_minus17or18;
                            }
                            else    /*Otherwise, there's a penalty.*/
                            {
                                testForChar += (INT_4)(gWeightedIonValues.b_minus17or18 * HIGH_MASS_B_MULT);
                            }
                        }
                        else    /*Charge state of the fragment is probably too high.*/
                        {
                            if(b17Mass < precursor)
                            {
                                testForChar += (INT_4)(gWeightedIonValues.b_minus17or18 * HIGH_CHARGE_MULT);
                            }
                            else
                            {
                                testForChar += (INT_4)(gWeightedIonValues.b_minus17or18 * HIGH_CHARGE_MULT * HIGH_MASS_B_MULT);
                            }
                        }
                        if(testForChar < 127 && testForChar > -127)
                        {
                            sequenceNodeN[j] = testForChar;
                        }
                        else
                        {
                            sequenceNodeN[j] = 63;
                        }
                    }
                }
            }
        }
        currPtr = currPtr->next;
    }

    return;
}

/********************************IsThisStillPossible*******************************************
 *
 * This function is called if an ion under consideration is of greater m/z than the precursor.
 * If there is a possible a ion present at 28/charge below the bMass, then a TRUE is returned;
 * otherwise a FALSE is returned.
 *
 */
char IsThisStillPossible(REAL_4 bMass, INT_4 currentCharge, struct MSData *firstMassPtr)
{
    char test = FALSE;
    REAL_4 testMass;
    struct MSData *currPtr;

    testMass = (bMass - gCO) / (REAL_4)currentCharge;

    currPtr = firstMassPtr;
    while(currPtr != NULL)
    {
        if(currPtr->mOverZ > bMass)
        {
            break;
        }
        if(currPtr->mOverZ >= testMass - gParam.fragmentErr &&
            currPtr->mOverZ <= testMass + gParam.fragmentErr)
        {
            test = TRUE;
        }
        currPtr = currPtr->next;
    }

    return(test);
}

/********************************IsThisPossible************************************************
 *
 * This function tests to see if the ion is the precursor, precursor - water, below m/z 115,
 * or equal to 120, 136, 147, 159, or 175. It also checks to make sure that there is sufficient
 * mass to hold multiple charges and is less than the molecular weight of the peptide.
 * currentCharge = current charge.
 * chargeState = charge on the precursor ion.
 */
char IsThisPossible(REAL_4 bMass, INT_4 currentCharge)
{
    REAL_4 minMOverZ, precursor, minWater;
    INT_4 minMassPerCharge = MIN_MASS_PER_CHARGE * gMultiplier + 0.5;
    char maxCharge;
    char test = TRUE;

    if(gParam.maxent3)
    {
        maxCharge = 1;
    }
    else
    {
        maxCharge = gParam.chargeState;
    }

    minMOverZ = (currentCharge - 1) * minMassPerCharge;
    if(bMass < minMOverZ)    /*Check that the ion is more than MIN_MASS_PER_CHARGE.*/
    {
        test = FALSE;
    }

    /* Check to see if the ion is the precursor or the precursor minus water ion.*/
    precursor = (gParam.peptideMW + (maxCharge * gElementMass_x100[HYDROGEN])) / maxCharge;
    if(bMass <= (precursor + gParam.fragmentErr) &&
        bMass >= (precursor - gParam.fragmentErr))
    {
        test = FALSE;
    }
    minWater = (gParam.peptideMW - gWater + (maxCharge * gElementMass_x100[HYDROGEN])) / maxCharge;
    if(bMass <= (minWater + gParam.fragmentErr) &&
        bMass >= (minWater - gParam.fragmentErr))
    {
        test = FALSE;
    }

    /* Don't use any ions less than 115 Da.*/
    if(bMass < 115 * gMultiplier)
    {
        test = FALSE;
    }

    /* Don't use specific ions that are usually immonium ions or tryptic y ions.*/
    if(bMass <= ((gMonoMass_x100[F] - gCO + gElementMass_x100[HYDROGEN]) + gParam.fragmentErr) &&
        bMass >= ((gMonoMass_x100[F] - gCO + gElementMass_x100[HYDROGEN]) - gParam.fragmentErr))
    {
        test = FALSE;
    }
    if(bMass <= (129 * gMultiplier + gParam.fragmentErr) &&
        bMass >= (129 * gMultiplier - gParam.fragmentErr))
    {
        test = FALSE;
    }
    if(bMass <= ((gMonoMass_x100[Y] - gCO + gElementMass_x100[HYDROGEN]) + gParam.fragmentErr) &&
        bMass >= ((gMonoMass_x100[Y] - gCO + gElementMass_x100[HYDROGEN]) - gParam.fragmentErr))
    {
        test = FALSE;
    }
    if(bMass <= ((gMonoMass_x100[W] - gCO + gElementMass_x100[HYDROGEN]) + gParam.fragmentErr) &&
        bMass >= ((gMonoMass_x100[W] - gCO + gElementMass_x100[HYDROGEN]) - gParam.fragmentErr))
    {
        test = FALSE;
    }

    bMass = (bMass * currentCharge) - ((currentCharge - 1) * gElementMass_x100[HYDROGEN]);    /*Convert to singly charged ion.*/
    if(bMass > (gParam.peptideMW
        - gMonoMass_x100[G]))    /*Check that the calculated singly charged b ion is
                                   less than the molecular weight of the peptide
                                   minus the mass of glycine.*/
    {
        test = FALSE;
    }

    return(test);
}

/********************************* FindTrypticBIons***************************************************
 *
 * This function assumes that the CID ions are all of type b. The nominal mass values are
 * determined and the corresponding positions in the array sequenceNodeN are assigned the
 * additional value of gWeightedIonValues.b.
 */
void FindTrypticBIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeN)
{
    struct MSData *currPtr;
    INT_4 bMassMin, bMassMax, i, j, testForChar;
    REAL_4 bMass, precursor;
    REAL_8 aToMFactor;
    char test, mostLikelyFragCharge, maxCharge;

    currPtr = firstMassPtr;
    precursor = gParam.peptideMW / gParam.chargeState;    /*Not exactly the precursor, but close enough.*/

    if(gParam.chargeState == 1)    /*Figure out the most likely charge state of a fragment ion.
                                     This is used to determine the gWeightedIonValues.y.*/
    {
        mostLikelyFragCharge = 1;
    }
    else
    {
        mostLikelyFragCharge = gParam.chargeState - 1;
    }
    if(gParam.maxent3)
    {
        mostLikelyFragCharge = 1;
    }

    /*max charge state for maxent3 is +1*/
    if(gParam.maxent3)
    {
        maxCharge = 1;
    }
    else
    {
        maxCharge = gParam.chargeState;
    }

    while(currPtr != NULL)
    {
        for(i = 1; i <= maxCharge; i++)
        {
            bMass = currPtr->mOverZ;
            test = IsThisPossible(bMass, i);

            /*For ions > precursor, check if there is an a ion.*/
            if(test && currPtr->mOverZ > precursor + gParam.fragmentErr * 2)
            {
                test = IsThisStillPossible(bMass, i, firstMassPtr);
            }
            if(test)
            {
                bMass = (bMass * i) - ((i - 1) * gElementMass_x100[HYDROGEN]);    /*Convert to +1 ion.*/

                /*Alter the values so that they are closer to the expected nominal masses.*/
                if(bMass >= gParam.monoToAv)    /*convert to monoisotopic*/
                {
                    aToMFactor = 0;
                }
                else
                {
                    if(bMass > (gParam.monoToAv - gAvMonoTransition))
                    {
                        aToMFactor = (gParam.monoToAv - bMass) / gAvMonoTransition;
                    }
                    else
                    {
                        aToMFactor = 1;
                    }
                }
                aToMFactor = ((1 - AV_TO_MONO) *
                    aToMFactor) + AV_TO_MONO;
                if(bMass > (gParam.monoToAv - gAvMonoTransition))
                {
                    bMass = bMass * aToMFactor;
                }

                /*The two extremes for the possible b ion nodes are identified first.*/
                /*First I'll find the highest mass node.*/
                bMass = bMass + gParam.fragmentErr;
                bMassMax = bMass;    /*Truncate w/o rounding up.*/

                /*Next I'll find the lowest mass node.*/
                bMass = bMass - gParam.fragmentErr - gParam.fragmentErr + 0.5;
                bMassMin = bMass;    /*Truncate after rounding up.*/

                /*Now fill in the middle (if there is anything in the middle).*/
                if(bMassMax >= gGraphLength)
                {
                    printf("FindTrypticBIons: bMassMax >= gGraphLength\n");
                    exit(1);
                }
                for(j = bMassMin; j <= bMassMax; j++)
                {
                    testForChar = sequenceNodeN[j];    /*test to make sure the value fits in a char*/
                    if(i <= mostLikelyFragCharge)
                    {
                        if(bMass < precursor && bMass > 147 * gMultiplier)
                        {
                            testForChar += (INT_4)gWeightedIonValues.b;
                        }
                        else
                        {
                            testForChar += (INT_4)(gWeightedIonValues.b * HIGH_MASS_B_MULT);
                        }
                    }
                    else
                    {
                        if(bMass < precursor && bMass > 147 * gMultiplier)
                        {
                            testForChar += (INT_4)(gWeightedIonValues.b * HIGH_CHARGE_MULT);
                        }
                        else
                        {
                            testForChar += (INT_4)(gWeightedIonValues.b * HIGH_CHARGE_MULT * HIGH_MASS_B_MULT);
                        }
                    }
                    if(testForChar < 127 && testForChar > -127)
                    {
                        sequenceNodeN[j] = testForChar;
                    }
                    else
                    {
                        sequenceNodeN[j] = 63;    /*if 63 is in seqNodeN and C then sum is still <127*/
                    }
                }
            }
        }
        currPtr = currPtr->next;
    }

    /* If a sequenceNodeN is assigned a non-zero value that is less than the full gWeightedIonValues.b,
     * then that means that it was because an ion was assumed to be of an unlikely charge state.
     * For example, if a doubly charged precursor had an ion that was assumed to be a doubly
     * charged fragment, but the corresponding singly charged ion was absent, then the value
     * for that node would be less than gWeightedIonValues.b. I remove these from the list here.
     */
    for(i = 0; i < gGraphLength; i++)
    {
        if(sequenceNodeN[i] < gWeightedIonValues.b)
        {
            sequenceNodeN[i] = 0;
        }
    }

    return;
}

/*********************************TrypticTemplate***********************************************
 *
 * This function takes the initialized sequenceNode and modifies it according to empirical
 * rules developed from years of experience interpreting CID spectra of multiply-charged
 * tryptic peptides. The rules are as follows:
 *     Assume each ion is both a b and a y ion. Look for an a ion if a b ion is present.
 *     Look for b-17 ion if a b ion is present. Look for an a-17 ion if an a ion is present
 *     (and the corresponding b ion is also present). Look for a y-17 ion if a y ion is present.
 *     Look for multiply-charged b and y ions if sufficient mass is available.
 *
 */
void TrypticTemplate(struct MSData *firstMassPtr, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN)
{
    char *ionPresent;
    INT_4 i;

    ionPresent = (char *) malloc(gGraphLength * sizeof(char));    /*Will contain C-terminal evidence.*/
    if(ionPresent == NULL)
    {
        printf("TrypticTemplate: Out of memory");
        exit(1);
    }

    for(i = 0; i < gGraphLength; i++)    /*Init this array to zero. These arrays are used to
                                           keep track of where the a ions are.*/
    {
        ionPresent[i] = 0;
    }

    FindTrypticBIons(firstMassPtr, sequenceNodeN);

    FindTrypticB17Ions(firstMassPtr, sequenceNodeN);

    FindTrypticAIons(firstMassPtr, sequenceNodeN, ionPresent);

    FindTrypticA17Ions(firstMassPtr, sequenceNodeN, ionPresent);

    FindTrypticYIons(firstMassPtr, sequenceNodeC);

    FindTrypticY17Ions(firstMassPtr, sequenceNodeC);

    free(ionPresent);

    return;
}

/*******************************SequenceNodeInit************************************************
 *
 * This function initializes the array sequenceNode by first assigning the value of zero
 * to each element in the array (from 0 -> 3999). Next an N-terminal value is assigned to
 * position number 1, and then the possible C-terminal nodes are found and given an N-terminal
 * value.
 * There may be more than one C-terminal node, depending on the mass and the error.
 */
void SequenceNodeInit(SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN)
{
    INT_4 i, firstNode;
    REAL_4 lastNode;
    REAL_8 aToMFactor;
    INT_4 lastNodeHigh, lastNodeLow;

    for(i = 0; i < gGraphLength; i++)    /*Initialize sequenceNode to zeros.*/
    {
        sequenceNode[i] = 0;
        sequenceNodeC[i] = 0;
        sequenceNodeN[i] = 0;
    }

    /* Find the N-terminal node. This will equal the nominal mass of the N-terminal group R-NH-.*/
    firstNode = gParam.modifiedNTerm;

    sequenceNodeC[firstNode] = N_NODE_VALUE;    /*Give the N-terminal node an arbitrary value.*/
    sequenceNodeN[firstNode] = N_NODE_VALUE;
    sequenceNode[firstNode] = sequenceNodeC[firstNode] + sequenceNodeN[firstNode];

    /*Figure out what the C-terminal node(s) are.*/
    lastNode = gParam.peptideMW - gParam.modifiedCTerm;

    /*Alter the values so that they are closer to the expected nominal masses.*/
    if(lastNode >= gParam.monoToAv)    /*convert to monoisotopic*/
    {
        aToMFactor = 0;
    }
    else
    {
        if(lastNode > (gParam.monoToAv - gAvMonoTransition))
        {
            aToMFactor = (gParam.monoToAv - lastNode) / gAvMonoTransition;
        }
        else
        {
            aToMFactor = 1;
        }
    }
    aToMFactor = ((1 - AV_TO_MONO) * aToMFactor) + AV_TO_MONO;
    if(lastNode > (gParam.monoToAv - gAvMonoTransition))
    {
        lastNode = lastNode * aToMFactor;
    }

    /*The two extremes for the possible C-terminal nodes are identified first.*/
    /*First I'll find the highest mass node.*/
    lastNode = lastNode + gParam.peptideErr;
    lastNodeHigh = lastNode;    /*Truncate w/o rounding up.*/
    if(lastNodeHigh >= gGraphLength)
    {
        printf("SequenceNodeInit: lastNodeHigh >= gGraphLength\n");
        exit(1);
    }
    sequenceNodeN[lastNodeHigh] = C_NODE_VALUE;
    sequenceNodeC[lastNodeHigh] = C_NODE_VALUE;
    sequenceNode[lastNodeHigh] = sequenceNodeN[lastNodeHigh] + sequenceNodeC[lastNodeHigh];

    /*Next I'll find the lowest mass node.*/
    lastNode = lastNode - gParam.peptideErr - gParam.peptideErr;
    lastNodeLow = lastNode;    /*Truncate after rounding up.*/
    sequenceNodeN[lastNodeLow] =
        C_NODE_VALUE;
    sequenceNodeC[lastNodeLow] = C_NODE_VALUE;
    sequenceNode[lastNodeLow] = sequenceNodeN[lastNodeLow] + sequenceNodeC[lastNodeLow];

    /*Now fill in the middle (if there is anything in the middle).*/
    if((lastNodeHigh - lastNodeLow) > 1)
    {
        for(i = lastNodeLow; i <= lastNodeHigh; i++)
        {
            sequenceNodeN[i] = C_NODE_VALUE;
            sequenceNodeC[i] = C_NODE_VALUE;
            sequenceNode[i] = sequenceNodeC[i] + sequenceNodeN[i];
        }
    }

    return;
}

/**********************************MakeSequenceGraph******************************************
 *
 * The general idea here is to assume that each observed ion in the linked list "firstMassPtr"
 * is of one of several possibilities - a, b, c, x, y, z+1, y-2, d, v, w, etc. Making
 * each of these assumptions for each ion, it is possible to mathematically convert the CID
 * data into a "sequence graph". Here, the sequence graph's index numbers correspond to the
 * nominal mass values of b-type ions. The value held at each node (or index number) of the
 * sequence graph is an estimation of the likelihood that a fragmentation is present at that
 * mass. For example, if there are two ions at 200 and 801, and the peptide molecular weight
 * is 999, then 200 could be a b ion, and 801 would be the corresponding y ion. The value
 * for the node 200 would reflect the fact that at least two ions suggest that a cleavage
 * occurred at that mass.
 *
 * There are three ways to calculate the node values - a general peptide template, a tryptic
 * peptide template, and a template where there is an arginine present and the precursor
 * has a single charge. Each template has certain rules that are followed (and are hard-coded)
 * in order to derive the final sequence graph.
 *
 * If a sequence tag is known, then this information is used to derive the sequence graph.
 *
 * Here is a list of the input variables used by this function:
 *     - firstMassPtr has the CID data.
 *     - sequenceNode[GRAPH_LENGTH] is the array that eventually contains the final node values.
 *       It is derived from sequenceNodeN and sequenceNodeC.
 *     - sequenceNodeN[] is the array that contains the N-terminal fragment info.
 *     - sequenceNodeC[] is the array that contains the C-terminal fragment info.
 *     - proteolysis indicates if the peptide was cleaved by trypsin, lys-c, or v8. If so,
 *       then the nodes for a C-terminal lysine, arginine, etc. are given higher values.
 *       These values will be equal to gWeightedIonValues.y (see below).
 *     - fragmentPattern tells whether to use a general, tryptic, or arg+1 template.
 *     - modifiedCTerm is needed to convert a putative C-terminal ion into a b ion node.
 *     - tagSequence contains a single letter code sequence for the sequence tag.
 *     - tagCMass and tagNMass are the terminal unsequenced masses surrounding the sequence tag.
 *     - peptideMWPtr is the pointer to the REAL_4 containing the molecular weight of the peptide.
 *     - fragmentErr is the CID fragment ion tolerance.
 *     - peptideErr is the tolerance for the peptide mass measurement (in Da).
 *     - chargeState is the charge state of the precursor ion.
 *     - monoToAv is the mass above which average masses are assumed.
 *     - xxxIonVal are the scores used for each ion type.
 *     - edmanPresent indicates (TRUE or FALSE) if there is any Edman data available.
 *     - totalIonVal contains the sum of all of the gWeightedIonValues.
 */
void MakeSequenceGraph(struct MSData *firstMassPtr, SCHAR *sequenceNode,
                       SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 totalIonVal)
{
    /*
     * Initialize the arrays sequenceNode, sequenceNodeC, and sequenceNodeN so that all values
     * are zero, except for the N and C terminal nodes. This is the only thing that this entire
     * file/function does to the array 'sequenceNode'. Most of the action occurs to the other two
     * arrays - sequenceNodeN and sequenceNodeC.
     */
    SequenceNodeInit(sequenceNode, sequenceNodeC, sequenceNodeN);

    /*
     * The TrypticTemplate contains the hard-coded rules for interpreting tryptic multiply charged
     * ions (from triple quads).
     * It uses the various xxxIonVal's to assign values to positions in the arrays called
     * sequenceNodeC and sequenceNodeN, where the indexing of these arrays corresponds to the nominal
     * masses of real and hypothetical (ie, calculated) b type ions. The hypothetical b type ions
     * derived from those real ions that are assumed to be C-terminal are given node values that are
     * placed in sequenceNodeC. Likewise, sequenceNodeN contains values for ions that are assumed to
     * be N-terminal ions.
     */
    if(gParam.fragmentPattern == 'T' || gParam.fragmentPattern == 'Q')
    {
        TrypticTemplate(firstMassPtr, sequenceNodeC, sequenceNodeN);
    }

    /* To be written at a later date.*/
    /* if(fragmentPattern == 'G')
    {
        GeneralTemplate(firstMassPtr, sequenceNode, peptideMW, fragmentErr,
            chargeState, monoToAv, gWeightedIonValues.b, gWeightedIonValues.a,
            gWeightedIonValues.c, gWeightedIonValues.d, gWeightedIonValues.b_minus17or18,
            gWeightedIonValues.a_minus17or18, gWeightedIonValues.y, gWeightedIonValues.y_minus2,
            gWeightedIonValues.y_minus17or18, gWeightedIonValues.x, gWeightedIonValues.z_plus1,
            gWeightedIonValues.w, gWeightedIonValues.v, gWeightedIonValues.b_minusOH,
            gWeightedIonValues.b_minusOH_minus17);
    }*/

    /*
     * The TrypticLCQTemplate contains the hard-coded rules for interpreting tryptic multiply charged
     * ions (from ion traps). It uses the various xxxIonVal's to assign values to positions in the arrays called
     * sequenceNodeC and sequenceNodeN, where the indexing of these arrays corresponds to the nominal
     * masses of real and hypothetical (ie, calculated) b type ions. The hypothetical b type ions
     * derived from those real ions that are assumed to be C-terminal are given node values that are
     * placed in sequenceNodeC. Likewise, sequenceNodeN contains values for ions that are assumed to
     * be N-terminal ions.
     */
    if(gParam.fragmentPattern == 'L')
    {
        TrypticLCQTemplate(firstMassPtr, sequenceNodeC, sequenceNodeN);
    }

    /*
     * RemoveSillyNodes removes all nodes below 260 that cannot be made of any combination of
     * amino acids.
     */
    RemoveSillyNodes(sequenceNodeC, sequenceNodeN);

    /*
     * If a type of proteolysis has been entered by the user (via the char 'proteolysis'), then
     * it is assumed that all possible cleavages characteristic of that proteolysis are present.
     * For example, if the peptide is derived from a tryptic digest, then the node positions
     * corresponding to the C-terminal node minus 128 (lysine) and 156 (arginine) are calculated.
     * If the values in sequenceNodeN[C-term minus residue] and sequenceNodeC[C-term minus residue]
     * are both zero (ie, no real ions have indicated a cleavage at that site), then these arrays
     * are assigned a value of one at that node value. If the data is of poor quality, this permits
     * the possibility of continuing a sequence to the C-terminus even if the cleavage info is
     * absent. By assigning a value of one to those nodes, I do not weight these sequences very
     * heavily.
     */
    if(gParam.proteolysis != 'N')
    {
        AddCTermResidue(sequenceNodeC, sequenceNodeN);
    }

    /*
     * If 'edmanPresent' is TRUE and the N-terminus is not modified, then call the function
     * AddEdmanData. AddEdmanData uses the globals gMaxCycleNum and
     * gEdmanData[MAX_PEPTIDE_LENGTH][gAminoAcidNumber] to alter the arrays 'sequenceNodeC'
     * and 'sequenceNodeN'.
     */
    if(gParam.edmanPresent)
    {
        AddEdmanData(sequenceNodeC, sequenceNodeN, totalIonVal);
    }

    /*
     * If a sequence tag has been provided, then the region containing the sequence is excised
     * and the graph closed over the wound. The peptideMW loses the mass of the sequence tag.
     */
    if(gParam.tagSequence[0] != '*')
    {
        AddTag(sequenceNodeC, sequenceNodeN);
    }

    return;
}


/* File: lutefisk-1.0.7+dfsg.orig/src/LutefiskGetCID.c */

/*********************************************************************************************
Lutefisk is software for de novo sequencing of peptides from tandem mass spectra.
Copyright (C) 1995  Richard S. Johnson

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

Contact:

Richard S Johnson
4650 Forest Ave SE
Mercer Island, WA 98040

jsrichar@alum.mit.edu
*********************************************************************************************/

/* ANSI Headers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

/* Lutefisk Headers */
#include "LutefiskPrototypes.h"
#include "LutefiskDefinitions.h"
#include "ListRoutines.h"

/* Function prototypes */
tMSDataList *ReadFinniganFile(char *filename);

tMSDataList *ReadFinniganFile(char *filename)
{
    printf("That would be nice, wouldn't it.\nNative file reading not implemented.\n");
    return NULL;
    exit(1);    /*Not reached.*/
}

/* Declare pointers to a struct that is global only within LutefiskGetCID.c.
*/
static struct MSData *gGroupingPtr;
static struct MSData *gLastDataPtr;

/*Definitions for this file*/
#define MIN_NUM_IONS            5       /*Minimum number of ions after processing in GetCID*/
#define MAX_ION_MASS            3000    /*Ions greater than this are deemed too high to not be a mistake*/
#define MIN_HIGHMASS_INT_RATIO  0.05    /*Ratio of high mass intensity over total intensity*/
#define HIGH_MASS_RATIO         0.95    /*Ions are counted until this % of high mass ion intensity is reached*/
#define LCQ_INT_SUM_CUTOFF      500     /*Cutoff for good intensity total for LCQ data*/
#define QTOF_INT_SUM_CUTOFF     140     /*Cutoff for good intensity total for Qtof data*/
#define MAX_HIGH_MASS           100     /*Max number of ions greater than precursor*/
#define MAX_MASS                4000    /*Peptides above this mass are tossed out.*/
#define MIN_MASS                800     /*Peptides below this mass are tossed out.*/
#define LOW_MASS_ION_NUM        19      /*Number of peptide-related low mass ions*/

/*
//--------------------------------------------------------------------------------
//  GetCidData()
//--------------------------------------------------------------------------------
    GetCidData uses an ASCII input file named cidFilename, where cidFilename is produced using
    the 'Print...' command in the Finnigan program LIST. When cidFilename is placed
    in the same folder as Lutefisk, it finds weighted average masses within a resolution
    window defined by the variable 'peakWidth'. 'peakWidth' is half of the peak width at
    its base, and for high sensitivity (low resolution) applications peakWidth is typically
    4-6 Da, but is best set to 3.0 for unit resolution (this helps get rid of the C13 peaks).
    GetCidData returns the value of the MSData struct called firstAvMassPtr, which is the
    final list of ion m/z and intensity values used for the rest of the program.

    Modified 03.13.00 JAT - Split out the file reading code into ReadCIDFile().
*/
struct MSData *GetCidData(void)
{
    tMSDataList *MSDataList = NULL;
    tMSDataList *peakList = NULL;
    struct MSData *firstAvMassPtr = NULL;
    INT_4 i, finalIonCount;
    REAL_4 excessIonRatio;
    REAL_4 generalQuality = 0;
    REAL_4 lowMassQuality = 0;
    REAL_4 highMassQuality = 0;
    REAL_4 quality = 0;

    if(gParam.fMonitor)
    {
        printf("Processing CID datafile '%s'\n", gParam.cidFilename);
    }

    MSDataList = ReadCIDFile(gParam.cidFilename);

    if (MSDataList->numObjects == 0)
    {
        printf("There doesn't seem to be any data in the firstDataPtr linked list.\n");
        exit(1);
    }

    TrimList(MSDataList);

    /*
     * Find the global multiplier, which is the value that determines the number of decimal points
     * for which the data will be considered significant. For example, if two decimal points are
     * to be used, then the multiplier will be 100 (ie, 100 x the monoisotopic mass will be the
     * node values used in the graph).
     */
    FindTheMultiplier();

    /*
     * If the 'Autodetect' value for the centroidOrProfile variable in the .params file was
     * selected, the program tries to figure out what type of file is being used here.
     */
    if(gParam.centroidOrProfile == 'A')
    {
        CentroidOrProfile(MSDataList);
    }

    /*
     * If the default value for the fragmentation pattern was selected, then the program
     * decides if the data is TSQ or LCQ data. The differentiation is not foolproof, but
     * should work most of the time. The idea is that LCQ msms data starts at a mass that is
     * 1/3 of the precursor ion m/z.
     */
    if(gParam.fragmentPattern == 'D')
    {
        GuessAtTheFragmentPattern();
    }

    /*
     * Here's where profile data is smoothed once with a 5 point smoothing routine
     * (using Finnigan's coefficients).
     */
    if(gParam.centroidOrProfile == 'P')
    {
        SmoothCID(MSDataList);
    }

    /* Here is where the auto-peakWidth is determined. The lowest mass ions (less than 500,
       and no more than ten and most intense) are chosen from the raw data in firstDataPtr.
       The peak tops in firstDataPtr are found, and the peak widths at 50% are calculated.
       The peakWidth is 2x this 50% peakWidth, since peakWidth assumes a 10% valley. Peaks
       used must be >10% of the most intense ion below 500. An average of these peakWidths
       is determined, and outliers are discarded and the average is recalculated. This
       recalculated value is the peakWidth in the automated mode.*/
    if(gParam.peakWidth == 0)
    {
        if(gParam.centroidOrProfile == 'P')
        {
            gParam.peakWidth = GetPeakWidth(MSDataList);
        }
        else
        {
            if(gParam.fragmentPattern == 'L')
            {
                gParam.peakWidth = 0.5;    /*for lcq, assume width of 1 da*/
            }
            else
            {
                gParam.peakWidth = 1;    /*otherwise, assume width of 2 da*/
            }
        }
        printf("The peak width is %5.2f\n", gParam.peakWidth * 2);
    }

    /*
     * Find the signal threshold, where the noise is the average of all ions and the
     * signal/noise threshold is defined by ionThreshold.
     */
    gParam.intThreshold = FindThreshold(MSDataList);

    /* Alternatively, you could use the median rather than an average.*/
    /*threshold = FindMedian(firstDataPtr); TOO SLOW */

    /*
     * Identify groups of ions, where a group of ions are above the "threshold" and
     * consecutive ions are less than a peakWidth apart in m/z value (this is of particular
     * concern when importing centroided data rather than profile data). Groups of ions
     * are placed in a separate linked list which is passed to the function 'IonSorter'.
     * The global struct pointer gGroupingPtr is used to remember the position in the linked
     * list of the complete CID data where IonGrouper is to start finding a new group. If
     * IonGrouper is no longer able to find ion groups (ie, end of list), it passes
     * a NULL value. This signals IonSorter to also pass a NULL value, which signals
     * the while loop to terminate ('test' becomes FALSE). Otherwise, IonSorter returns a pointer
     * to a linked list of MSData structs, which contains a short list of ions that have been
     * weight averaged. This list of ions is appended onto the linked list that is pointed to by
     * firstAvMassPtr.
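The grouping and weighted averaging described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the program's
IonCondenser: it works on a plain array sorted by m/z rather than the linked
list, and the names SketchPeak and sketch_condense_group are hypothetical.

```c
#include <stddef.h>

// Hypothetical peak record; the program uses struct MSData instead.
struct SketchPeak { double mOverZ; double intensity; };

// Condense one run of peaks starting at index 'start'. A peak joins the
// run while it is above 'threshold' and closer than 'peakWidth' to the
// previous peak. The intensity-weighted mean m/z and the summed intensity
// are written to 'out'; the return value is the index after the run.
size_t sketch_condense_group(const struct SketchPeak *peaks, size_t count,
                             size_t start, double peakWidth,
                             double threshold, struct SketchPeak *out)
{
    double weighted = 0.0, total = 0.0;
    size_t i = start;

    while (i < count && peaks[i].intensity > threshold &&
           (i == start || peaks[i].mOverZ - peaks[i - 1].mOverZ < peakWidth))
    {
        weighted += peaks[i].mOverZ * peaks[i].intensity;
        total    += peaks[i].intensity;
        i++;
    }
    out->mOverZ    = (total > 0.0) ? weighted / total : 0.0;
    out->intensity = total;

    // Skip a sub-threshold peak so the caller always makes progress.
    return (i == start) ? start + 1 : i;
}
```

A caller would loop from index 0, keep each result whose intensity is non-zero,
and continue from the returned index until it reaches the end of the array.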
     */
    peakList = IonCondenser(MSDataList);

    if(peakList->numObjects < 5)
    {
        printf("Too few data points remaining after ion condensation!");
        return NULL;
    }

    /*
     * After generating all of the mass averaged ion values, some ions may still be closer than
     * the value of 'peakWidth'. This happens because some ions are just outside of the peakWidth
     * tolerance, but after weighted averaging to determine their new masses, they get too close.
     * For pairs of ions that are too close together, those with the lowest intensity are removed.
     */

    /* XXXXXXXXXXX JAT - Is this still necessary? */
    /* firstAvMassPtr = ZeroTheIons(firstAvMassPtr); */

    /* Eliminate ions below the threshold.*/
    /* currPtr = firstAvMassPtr->next;
    previousPtr = firstAvMassPtr;
    while(currPtr != NULL)
    {
        if(currPtr->mOverZ > precursor + gParam.fragmentErr)
            break;
        if(currPtr->intensity == threshold)
        {
            if((currPtr->mOverZ < 147.5) ||
                (currPtr->mOverZ < 175.5 && currPtr->mOverZ > 174.5) ||
                (currPtr->mOverZ < 159.5 && currPtr->mOverZ > 158.5))
            {
                currPtr = currPtr->next;
                previousPtr = previousPtr->next;
            }
            else
            {
                previousPtr->next = currPtr->next;
                free(currPtr);
                currPtr = previousPtr->next;
            }
        }
        else
        {
            currPtr = currPtr->next;
            previousPtr = previousPtr->next;
        }
    }
    */

    /*
     * Next the summed intensity of the linked list of ions is checked to see if it is less
     * than 2 billion. If the sum exceeds this then each ion is attenuated ten fold. This
     * is to make sure that I don't try to use a number too big for a INT_4 to hold later
     * on.
     */
    CheckTheIntensity(peakList);

    /* Next add the ion offset to the m/z values in the linked list of ions.*/
    AddTheIonOffset(peakList);

    /* Remove low mass ions that are not due to amino acids.*/
    LowMassIonRemoval(peakList);

    /* For Qtof data, convert any intense doubly charged ions to singly-charged ones.*/

    /*
     * Next the program checks to see if ions that are 1 Da apart are due to isotopes. The
     * ions that seem to be due to isotopes are removed.
     * This is not done for data processed
     * using maxent3, since the data would already have been de-isotoped.
     */
    if(!gParam.maxent3)
    {
        RemoveIsotopes(peakList);
    }

    /*
     * Remove the precursor ions.
     */
    RemovePrecursors(peakList);

    /*
     * Find the ions that might be y ions that can be connected via single amino acids to the
     * y1 ions of 147 and 175 (if they are present). A value of 1 is placed in normIntensity
     * if the ion is a golden boy (from CA).
     */
    if(gParam.proteolysis == 'T' && (gParam.fragmentPattern == 'Q' || gParam.fragmentPattern == 'T'))
    {
        FindTheGoldenBoys(peakList);
    }
    if(gParam.fragmentPattern =='L' && peakList->numObjects < 100)
    {
        FindBYGoldenBoys(peakList);    /*spare the b/y pairs if there are not many ions at this point*/
    }

    /*
     * Next the program checks to see if there are too many ions clustered together. It does
     * this by counting the number of ions within windows of width SPECTRAL_WINDOW_WIDTH Da and
     * making sure that only a certain number of ions (ionsPerWindow) are present within any
     * given window. If there are too many ions, it throws out those with the lowest intensity.
     */
    WindowFilter(peakList);

    /*
     * Find high mass ions that could not be either b or y ions.
     */
    EliminateBadHighMassIons(peakList);

    /*
     * Verify that the selected ions have a signal to noise ratio greater than SIGNAL_NOISE that
     * is #defined in LutefiskDefinitions. The dta files from qtof data are also checked for s/n problems.
     */
    if(gParam.centroidOrProfile == 'P' || gParam.CIDfileType == 'X' || gParam.CIDfileType == 'D')
    {
        CheckSignalToNoise(peakList, MSDataList);
    }

    /*
     * Check to see if ions can be connected to other ions with very high mass accuracy
     * (compared to the calculated amino acid masses). For LCQ data I've decided (from 50
     * measurements) that the average error should be better than 0.15. For QTOF data, this
     * should be much higher. For triple quad data, I don't care to speculate.
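A rough sketch of the kind of check described above: does the spacing between
two singly charged ions match one amino acid residue within a tolerance? It
is not the program's CheckConnections. The table holds standard monoisotopic
residue masses in Da (the program keeps its own scaled table, gMonoMass_x100),
and the names kSketchResidues and sketch_connection_error are hypothetical.

```c
#include <stddef.h>

// Monoisotopic residue masses (Da) for the common amino acids:
// G, A, S, P, V, T, C, L/I, N, D, Q, K, E, M, H, F, R, Y, W.
static const double kSketchResidues[] = {
    57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768,
    103.00919, 113.08406, 114.04293, 115.02694, 128.05858, 128.09496,
    129.04259, 131.04049, 137.05891, 147.06841, 156.10111, 163.06333,
    186.07931
};

// Return the smallest absolute error between (highMass - lowMass) and
// any residue mass, or -1.0 if no residue lies within 'tolerance'.
double sketch_connection_error(double lowMass, double highMass,
                               double tolerance)
{
    double best = -1.0;
    double diff = highMass - lowMass;
    size_t i;
    size_t n = sizeof(kSketchResidues) / sizeof(kSketchResidues[0]);

    for (i = 0; i < n; i++) {
        double err = diff - kSketchResidues[i];
        if (err < 0.0) {
            err = -err;
        }
        if (err <= tolerance && (best < 0.0 || err < best)) {
            best = err;
        }
    }
    return best;
}
```

For example, ions at 175.119 and 246.156 differ by 71.037, which matches
alanine to well under the 0.15 quoted for LCQ data, so the pair would count
as connected.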
*/ if(gParam.fragmentPattern == 'L' || gParam.fragmentPattern == 'Q') { CheckConnections(peakList); } /* * For LCQ data, find b/y pairs and flag them as being golden boys that are difficult to delete on * the basis of ion intensity. */ if(gParam.fragmentPattern =='L') { FindBYGoldenBoys(peakList); } /* * Next I count the ions, and if there are more ions per residue than stipulated by the value * of "ionsPerResidue" (from Lutefisk.params), then the lowest intensity ions are removed. * For example if the peptide MW is 1200 and there are supposed to be 5 ions per residue, * then only 50 of the most intense ions are kept. */ finalIonCount = (gParam.peptideMW / AV_RESIDUE_MASS) + 0.5; /*Num of average residues.*/ finalIonCount = finalIonCount * gParam.ionsPerResidue; /*Num of ions allowed.*/ /*lowMassIons = 0; currPtr = &peakList->mass[0]; ptrOfNoReturn = &peakList->mass[peakList->numObjects]; while (currPtr < ptrOfNoReturn) { if (currPtr->mOverZ < 146.5) { lowMassIons++; } currPtr++; } finalIonCount += lowMassIons;*/ if (peakList->numObjects > finalIonCount) { WeedTheIons(peakList, finalIonCount, TRUE); } /* The use of "golden boy" ions (ions that connect to 147 and 175 in tryptic peptides) can produce too many ions. The following procedure is done in order to reduce excessive numbers. The golden boys are lumped in with the regular ions here, ie, no special treatment.
*/ excessIonRatio = (2000 - gParam.peptideMW) / 1000; excessIonRatio = (excessIonRatio * 0.25) + 1; if(excessIonRatio < 1) excessIonRatio = 1; finalIonCount = finalIonCount * excessIonRatio; /*Allow 25% more ions for every 1000 Da that the peptide MW falls below 2000; no increase at or above 2000.*/ if (peakList->numObjects > finalIonCount) { WeedTheIons(peakList, finalIonCount, FALSE); } /* Print out the number of ions in the final linked list of averaged ions.*/ if(gParam.fMonitor) { printf("Number of ions: %ld \n", peakList->numObjects); } if(peakList->numObjects < 5) { printf("Too few data points remaining after preprocessing!"); return NULL; } /* * For QTof data, use y1 ions for R and K and immonium ions to obtain a mass offset correction. */ /*if(gParam.fragmentPattern == 'Q') { CalibrationCorrection(peakList); }*/ /* * For precursor charges of two or less, most of the fragment ions will be singly-charged. * Thus, one can compare theoretical mass defects with the observed defects, and make * corrections. */ if(gParam.chargeState <= 2 && gParam.fragmentPattern != 'Q') { DefectCorrection(peakList); } /* * For LCQ data, normalize the intensity to the fourth most intense ion. It seems that * the ion trap will produce one or two whopper ions, so it seems reasonable to avoid * having a large percentage of ion current in one or two ions.
*/ if(gParam.fragmentPattern == 'L') { NormalizeIntensity(peakList); } /* Now get rid of the raw CID data.*/ DisposeList(MSDataList); /* Print the final list*/ if(gParam.fVerbose) { DumpMSData(peakList); } /* KLUDGE: Dump back into linked list - JAT */ /* for (i = 0; i < MSDataList->numObjects; i++) { firstDataPtr = AddToCIDList(firstDataPtr, LoadMSDataStruct(MSDataList->mass[i].mOverZ, MSDataList->mass[i].intensity)); } */ for (i = 0; i < peakList->numObjects; i++) { firstAvMassPtr = AddToCIDList(firstAvMassPtr, LoadMSDataStruct(peakList->mass[i].mOverZ, peakList->mass[i].intensity)); } /* Check the CID data quality here, but only if there is a monitor for output.*/ if(gParam.quality && gParam.fMonitor) { printf("\n"); printf("Quality assessment:"); /* Check the charge state, peptide mass, number of ions, total intensity of ions, and distribution of ions.*/ generalQuality = GeneralEval(peakList); /* Check the low mass ions for qtof data*/ if(gParam.fragmentPattern == 'Q') { lowMassQuality = LowMassIonCheck(peakList); } else { lowMassQuality = 1; /*LCQ data doesn't contain the low mass end*/ } /* Check for series of ions above the precursor that can be connected by amino acids*/ highMassQuality = HighMassIonCheck(peakList); /* Combine the scores via multiplication; all values are between 0 and 1, so the combined quality value is also between 0 and 1*/ quality = generalQuality * lowMassQuality * highMassQuality; /* Print some more info*/ if(quality) { printf("\nThis spectrum exceeds the minimal spectral quality parameters.\n"); } else { printf("\nThis spectrum stinks.\n"); } /*printf("General appearance (scale of 0 to 1): %f\n", generalQuality); if(gParam.fragmentPattern != 'L') { printf("Low mass ions (scale of 0 to 1): %f\n", lowMassQuality); } printf("High mass ions above the precursor (scale of 0 to 1): %f\n", highMassQuality); if(quality == 1) { printf("Overall data quality rating (A to F scale): A\n\n"); } else if (quality < 1 && quality > 0.8) { 
printf("Overall data quality rating (A to F scale): B\n\n"); } else if(quality <= 0.8 && quality >= 0.5) { printf("Overall data quality rating (A to F scale): C\n\n"); } else if(quality < 0.5 && quality > 0) { printf("Overall data quality rating (A to F scale): D\n\n"); } else { printf("Overall data quality rating (A to F scale): F\n"); printf("Don't waste your time on this one.\n\n"); exit(0); }*/ } DisposeList(peakList); return(firstAvMassPtr); } /*****************************GeneralEval****************************************************** * * Check that the charge state of the precursor is either +2 or +3, and that the mass is not * a weird value. Also checks that this is not a virtually empty file (ie, that there are more * than a handful of ions). Checks that fragment ions are not weird values. Makes sure that a * minimum percentage of the total ion abundance is above the precursor, and that of those above * the precursor the abundance is not isolated to just a few ions. If all of these tests are * passed, then a quality value of 1 is returned; otherwise the value is 0.
* */ REAL_4 GeneralEval(tMSDataList *peakList) { INT_4 i; INT_4 highMassIonCount = 0; REAL_4 quality = 0; REAL_4 precursor = (gParam.peptideMW + gParam.chargeState * gElementMass[HYDROGEN]) / gParam.chargeState; REAL_4 totIntensity = 0; REAL_4 highMassInt = 0; REAL_4 highMassRatio = 0; REAL_4 highMassInt2 = 0; REAL_4 minPeakNum = gParam.peptideMW / AV_RESIDUE_MASS; tMSDataList * intensityOrderedList = NULL; /* Do some basic checks on the charge state and peptide mass*/ if(gParam.peptideMW < MIN_MASS) { return(0); /*Too small*/ } if(gParam.peptideMW > MAX_MASS) { return(0); /*Too big*/ } if(gParam.chargeState < 2) { return(0); /*Not enough charge*/ } if(gParam.chargeState > 3) { return(0); /*Too much charge*/ } /* Check the number of ions*/ if(peakList->numObjects < minPeakNum) { return(0); /*Not enough ions*/ } if(peakList->numObjects < MIN_NUM_IONS) { return(0); } /* Check for weird ions (negative mass or very high)*/ for (i = 0; i < peakList->numObjects; i++) { if(peakList->mass[i].mOverZ < -1) { return(0); } if(peakList->mass[i].mOverZ > MAX_ION_MASS) { return(0); } } /* Check for ions above the precursor*/ for(i = 0; i < peakList->numObjects; i++) { totIntensity += peakList->mass[i].intensity; if(peakList->mass[i].mOverZ > precursor + gParam.fragmentErr * 3) { highMassInt += peakList->mass[i].intensity; } } if(totIntensity == 0) { return(0); /*No ion intensity, so quality of zero is returned*/ } /* Calculate the percentage of total ion current above the precursor*/ highMassRatio = highMassInt / totIntensity; if(highMassRatio < MIN_HIGHMASS_INT_RATIO) { return(0); /*there's less than 5% of total intensity above the precursor*/ } /* Now I count the number of ions that comprise 95% of the high mass intensity*/ /*The next few lines are directly from bits of jat's code*/ intensityOrderedList = (tMSDataList *) CopyList( peakList ); if (!intensityOrderedList) { printf("Ran out of memory in GeneralEval()!\n"); exit(1); } /* Sort the intensityOrderedList in order of 
decreasing intensity */ qsort(intensityOrderedList->mass,(size_t)intensityOrderedList->numObjects, (size_t)sizeof(tMSData),IntensityDescendSortFunc); /* Count the number of ions required to account for most of the high mass intensity.*/ for(i = 0; i < intensityOrderedList->numObjects; i++) { if(intensityOrderedList->mass[i].mOverZ > precursor + 3 * gParam.fragmentErr) { highMassIonCount++; highMassInt2 += intensityOrderedList->mass[i].intensity; } if((highMassInt2 / totIntensity) > HIGH_MASS_RATIO * highMassRatio) { break; /*For example if HIGH_MASS_RATIO = 0.9 and highMassRatio was 0.5 (ie, half of the ion current in a spectrum was above the precursor), then this bit of code counts the number of ions required to account for 0.45 of the total intensity (ie, 90% of the current above the precursor). The idea is if only one or two ions account for most of the ion current, then this is not as good as if the current were spread out among several ions. The latter situation would imply better sequence data, than just having a few massive ions.*/ } } /* Assign quality values based on the number of major high mass ions, and peptide mass.*/ if(highMassIonCount == 1 && gParam.peptideMW > 850) { DisposeList(intensityOrderedList); /*get rid of this ion list*/ return(0); /*Only one major ion above the precursor*/ } DisposeList(intensityOrderedList); /*get rid of this ion list*/ return(1); /*return a value of 1 if the spectrum passes this test*/ } /*****************************LowMassIonCheck************************************************** * * First, low mass ions are checked against an approved list of peptide-related ions. * If any are found, then the low mass quality is not zero, and might be as much as 1 * depending on the number of low mass ions found. If no peptide-related low mass ions * are found, then the number of weird ions are counted. If any weird non-peptide ions * are located then the quality is zero. 
* */ REAL_4 LowMassIonCheck(tMSDataList *peakList) { REAL_4 lowMassIons[LOW_MASS_ION_NUM] = { 60.0, 70.1, 72.1, 74.1, 84.0, 86.1, 87.1, 88.0, 101.1, 102.1, 104.1, 110.1, 112.1, 120.1, 129.1, 136.1, 159.1, 147.1, 175.1 }; REAL_4 quality = 1; /*default for no low mass ions, and no weird ions*/ INT_4 i, j; INT_4 lowMassIonCount = 0; INT_4 weirdIons = 0; char test; /* First look for immonium and other peptide-related low mass ions*/ for (i = 0; i < peakList->numObjects; i++) { if(peakList->mass[i].mOverZ > 175.1 + gParam.fragmentErr) { break; } for(j = 0; j < LOW_MASS_ION_NUM; j++) { if(peakList->mass[i].mOverZ <= lowMassIons[j] + gParam.fragmentErr && peakList->mass[i].mOverZ >= lowMassIons[j] - gParam.fragmentErr) { lowMassIonCount++; break; } } } /* If no peptide-related low mass ions found, then look for weird low mass ions.*/ if(lowMassIonCount == 0) { for (i = 0; i < peakList->numObjects; i++) { if(peakList->mass[i].mOverZ > 147.1 - gParam.fragmentErr) { break; } test = TRUE; /*test is TRUE if it's a weird peak*/ for(j = 0; j < 17; j++) /*the last two entries (147.1 and 175.1) lie above the cutoff and are skipped*/ { if(peakList->mass[i].mOverZ <= lowMassIons[j] + gParam.fragmentErr && peakList->mass[i].mOverZ >= lowMassIons[j] - gParam.fragmentErr) { test = FALSE; break; } } if(test) { weirdIons++; } } } /* Now assign a low mass ion quality.*/ if(lowMassIonCount == 0) { if(weirdIons != 0) { quality = 0; /*Weird ions in the absence of immoniums is not good at all*/ } } return(quality); } /*****************************HighMassIonCheck************************************************* * * Looks for ions above the precursor m/z that can be connected to each other by single amino * acid residue masses, first assuming singly-charged fragments and then (for triply-charged * precursors that were not processed with maxent3) doubly-charged fragments. Returns 0 if * there are no high mass ions, too many of them, or (for peptides of 950 Da or more) none * that connect; otherwise returns 1. * */ REAL_4 HighMassIonCheck(tMSDataList *peakList) { INT_4 i, highCount, j, k, totalSinglyChargedConnectNum, totalDoublyChargedConnectNum; INT_4 nodeConnect[MAX_HIGH_MASS][AMINO_ACID_NUMBER]; INT_4 connectNum[MAX_HIGH_MASS]; INT_4 m, n, firstAA, secondAA, thirdAA, fourthAA; REAL_4 highMass[MAX_HIGH_MASS], highInt[MAX_HIGH_MASS]; REAL_4 precursor; REAL_4 totalHighInt; REAL_4 averageInt, massDiff; REAL_4 threshold, quality; char runOfTwo, runOfThree, runOfFour;
/* initialize*/ averageInt = 0; totalHighInt = 0; quality = 0; precursor = (gParam.peptideMW + gParam.chargeState * gElementMass[HYDROGEN]) / gParam.chargeState; highCount = 0; totalSinglyChargedConnectNum = 0; totalDoublyChargedConnectNum = 0; for(i = 0; i < MAX_HIGH_MASS; i++) { connectNum[i] = 0; for(j = 0; j < AMINO_ACID_NUMBER; j++) { nodeConnect[i][j] = 0; } } for(i = 0; i < MAX_HIGH_MASS; i++) { highMass[i] = 0; highInt[i] = 0; } /*Identify high mass ions*/ for (i = 0; i < peakList->numObjects; i++) { if(peakList->mass[i].mOverZ > precursor + 4 * gParam.fragmentErr) { if(highCount >= MAX_HIGH_MASS) { return(0); /*Too many high mass ions (>100); storing another one would exceed the array sizes, so bail out before writing. Also, it's unlikely to have this many high mass ions unless the file is screwed up somehow.*/ } highMass[highCount] = peakList->mass[i].mOverZ; highInt[highCount] = peakList->mass[i].intensity; totalHighInt += peakList->mass[i].intensity; highCount++; } } if(highCount == 0) { return(0); /*No high mass ions, so return a zero quality value*/ } averageInt = totalHighInt / highCount; /*average intensity*/ threshold = averageInt / 5; /*threshold is 1/5 of the average; there is no good reason to use 1/5 rather than, say, 1/4.*/ /* First assume fragment ions are all singly charged. This first bit of nested for loops determines which nodes connect to each other. The indexing for connectNum matches with highMass, and contains the number of connections that can be made to lower mass nodes. The array nodeConnect has two indices, where one matches connectNum and highMass and the other index corresponds to the number of connections that can be made to lower mass nodes. The intensity of the higher mass node always has to be above the threshold, whereas the lower mass node can be below threshold, but only if the mass difference corresponds to Pro or Gly.
The actual value stored by nodeConnect is the index value that it can connect with.*/ for(i = 0; i < highCount; i++) { for(j = i; j < highCount; j++) { if(highInt[j] > threshold) { massDiff = highMass[j] - highMass[i]; if(massDiff >= gMonoMass[G] - gParam.fragmentErr) { for(k = 0; k < AMINO_ACID_NUMBER; k++) { if(massDiff <= gMonoMass[k] + gParam.fragmentErr && massDiff >= gMonoMass[k] - gParam.fragmentErr) { if(highInt[i] > threshold || k == G || k == P) { nodeConnect[j][connectNum[j]] = i; connectNum[j]++; break; } } } } } } } /* Count the singly-charged connections; this number does not tell you the longest sequence obtainable. This is determined below....*/ totalSinglyChargedConnectNum = 0; for(i = 0; i < highCount; i++) { if(connectNum[i] != 0) { totalSinglyChargedConnectNum++; } } /* Can I make runs of two, three, or four amino acids (ie, can I link three, four, or five ions)?*/ runOfTwo = FALSE; /*two amino acids defined*/ runOfThree = FALSE; /*three amino acids defined*/ runOfFour = FALSE; /*four amino acids defined*/ if(gParam.peptideMW > 1000) /*smaller peptides might not give data with long series of nodes*/ { for(i = highCount - 1; i >= 0; i--) /*start at the high mass end and work down*/ { if(connectNum[i] != 0) /*keep looking at the next node down if the current one lacks any connections to lower nodes*/ { for(j = 0; j < connectNum[i]; j++) /*loop through all of the possible connections*/ { firstAA = nodeConnect[i][j]; /*firstAA is the next node down*/ if(connectNum[firstAA] == 0) { continue; /*if the next node down does not connect to anything, then stop following this pathway*/ } for(k = 0; k < connectNum[firstAA]; k++) { secondAA = nodeConnect[firstAA][k]; /*secondAA is the second node down*/ runOfTwo = TRUE; /*you can connect at least three ions, and this will remain TRUE for the rest of this function*/ if(connectNum[secondAA] == 0) { continue; /*if the second node down does not connect to anything, then stop following this pathway*/ } for(m =
0; m < connectNum[secondAA]; m++) /*etc etc*/ { thirdAA = nodeConnect[secondAA][m]; runOfThree = TRUE; if(connectNum[thirdAA] == 0) { continue; } for(n = 0; n < connectNum[thirdAA]; n++) { fourthAA = nodeConnect[thirdAA][n]; runOfFour = TRUE; } } } } } } } /* Now assume fragment ions are doubly-charged, which is only possible if the precursor is triply-charged, and the data was not derived from any maxent3 treatment (conversion of all ions to singly charged, and de-isotoped).*/ if(gParam.chargeState == 3 && !gParam.maxent3) { /*initialize the node arrays to get rid of the singly charged info*/ for(i = 0; i < MAX_HIGH_MASS; i++) { connectNum[i] = 0; for(j = 0; j < AMINO_ACID_NUMBER; j++) { nodeConnect[i][j] = 0; } } for(i = 0; i < highCount; i++) { for(j = i; j < highCount; j++) { if(highInt[j] > threshold) { massDiff = (highMass[j] - highMass[i]) * 2; if(massDiff >= gMonoMass[G] - gParam.fragmentErr) { for(k = 0; k < AMINO_ACID_NUMBER; k++) { if(massDiff <= gMonoMass[k] + gParam.fragmentErr && massDiff>= gMonoMass[k] - gParam.fragmentErr) { if(highInt[i] > threshold || k == G || k == P) { nodeConnect[j][connectNum[j]] = i; connectNum[j]++; break; } } } } } } } /*Count the doubly-charged connections*/ totalDoublyChargedConnectNum = 0; for(i = 0; i < highCount; i++) { if(connectNum[i] != 0) { totalDoublyChargedConnectNum++; } } /*Can I make runs of two, three, or four amino acids?*/ if(gParam.peptideMW > 1000) { for(i = highCount - 1; i >= 0; i--) { if(connectNum[i] != 0) { for(j = 0; j < connectNum[i]; j++) { firstAA = nodeConnect[i][j]; if(connectNum[firstAA] == 0) { continue; } for(k = 0; k < connectNum[firstAA]; k++) { secondAA = nodeConnect[firstAA][k]; runOfTwo = TRUE; if(connectNum[secondAA] == 0) { continue; } for(m = 0; m < connectNum[secondAA]; m++) { thirdAA = nodeConnect[secondAA][m]; runOfThree = TRUE; if(connectNum[thirdAA] == 0) { continue; } for(n = 0; n < connectNum[thirdAA]; n++) { fourthAA = nodeConnect[thirdAA][n]; runOfFour = TRUE; } } } } } } } 
} /*Print relevant info*/ /*if(runOfFour) { printf("\n"); printf("At least five ions above the precursor can be connected\n"); } else if(runOfThree) { printf("\n"); printf("At least four ions above the precursor can be connected\n"); } else if(runOfTwo) { printf("\n"); printf("At least three ions above the precursor can be connected\n"); } else { if(totalSinglyChargedConnectNum == 0 && totalDoublyChargedConnectNum == 0) { printf("\n"); printf("No ions above the precursor can be connected\n"); } else { printf("\n"); printf("Only pairs of ions above the precursor can be connected\n"); } }*/ /* First see if any connections can be made*/ if(totalSinglyChargedConnectNum == 0 && totalDoublyChargedConnectNum == 0) { if(gParam.peptideMW < 950) { quality = 1; /*low mw peptides may not have many ions above the precursor*/ } else { quality = 0; /*bigger peptides are therefore crap*/ } } else /*If any connections can be made, then start out w/ a high quality*/ { quality = 1; } /* Quality is adjusted downward if insufficient sequence lengths obtained for certain mass ranges*/ /*if(gParam.peptideMW > 1150) /*Anything less than 1150 is not attenuated further*/ /*{ if(gParam.peptideMW < 1300) { if(!runOfTwo) { quality *= 0.5; } } else if(gParam.peptideMW < 1450) { if(!runOfThree) { quality *= 0.5; } } else { if(!runOfFour) { quality *= 0.75; } } }*/ return(quality); } /* //-------------------------------------------------------------------------------- // ReadCIDFile() //-------------------------------------------------------------------------------- Modified 03.13.00 JAT - Split this function out from GetCidData(). 
*/ tMSDataList *ReadCIDFile(char *inFilename) { FILE * fp; INT_4 i; INT_4 j; tMSDataList * MSDataList = NULL; tMSData massToAdd; REAL_4 massValue = 0.0; REAL_4 oldMassValue = 0.0; INT_4 ionIntensity = 0; INT_4 oldIonIntensity = 0; REAL_4 intensityAsReal = 0.0; char * stringBuffer = NULL; char * stringBuffer2 = NULL; BOOLEAN firstNumberFlag = true; BOOLEAN headerFlag = true; msms.scanMassHigh = -1; stringBuffer = (char *)malloc(258); if (NULL == stringBuffer) { printf("Outa memory in ReadCIDFile()!\n"); goto problem; } stringBuffer2 = (char *)malloc(258); if (NULL == stringBuffer2) { printf("Outa memory in ReadCIDFile()!\n"); goto problem; } for (i = 0; i < 258; i++) { stringBuffer2[i] = 0; } if (gParam.CIDfileType == 'N') { /* Open the native data file and make a linked list of m/z and intensity values.*/ if (gParam.fVerbose) printf("Reading the native CID file '%s'\n", inFilename); MSDataList = ReadFinniganFile(inFilename); } else { /* Open the ASCII data file and make a linked list of m/z and intensity values.*/ MSDataList = (tMSDataList *) CreateNewList( sizeof(tMSData), 500, 500 ); if ( NULL == MSDataList ) { printf("Outa memory in ReadCIDFile()!\n"); goto problem; } fp = fopen(inFilename,"r"); if (fp == NULL) { printf("Cannot open the CID file '%s'.\n", inFilename); goto problem; } i=0; oldMassValue = 0; oldIonIntensity = 0; while (my_fgets(stringBuffer, 256, fp) != NULL) { i+=1; /* Skip blank lines */ if (!strcmp(stringBuffer, "\r")) continue; /*PC files are screwy*/ if (!strcmp(stringBuffer, "\n")) continue; /* Deal with the headers first -------------------------------------- */ if (headerFlag) { if(gParam.CIDfileType == 'T') { headerFlag = FALSE; /*tab text has no header*/ } else if (gParam.CIDfileType == 'F') /* Finnigan's ICIS text format */ { /* headerFlag is true until the data is being read w/ in the Finnigan file.*/ sscanf(stringBuffer, "%f %d", &massValue, &ionIntensity); /* The number 1 should appear on the first line of data in a Finnigan ASCII
file.*/ if(massValue == 1.0) { headerFlag = FALSE; } else if (massValue > 1.0 && massValue < 2000.0 && ionIntensity > 0) { /* Catch potential problem caused by forgetting to change from 'F' to 'T' in the .params file. */ gParam.CIDfileType = 'T'; } else continue; /* Still in the header, so go back to the start of the loop. */ } else if (gParam.CIDfileType == 'L') /* Finnigan's LCQ text format */ { sscanf(stringBuffer, "%[DataPeaks]", stringBuffer2); if(!strcmp(stringBuffer2, "DataPeaks")) { headerFlag = FALSE; continue; } else { continue; /*read another line*/ } } else if (gParam.CIDfileType == 'D') /* Finnigan's '.dta' text format */ { /*First line is MH+ and charge: read if these values are zero from the params file*/ if(gParam.peptideMW == 0 || gParam.chargeState == 0) { sscanf(stringBuffer, "%f %d", &gParam.peptideMW, &gParam.chargeState); /*adjust for the fact that the value in the dta file is MH+*/ gParam.peptideMW -= gElementMass[HYDROGEN]; if (gParam.fVerbose) { printf(" Precursor Mass: %.3f\n", gParam.peptideMW); printf("Precursor Charge: %d\n", gParam.chargeState); } } else { sscanf(stringBuffer, "%*f %*d"); } headerFlag = FALSE; continue; /*the next line is the start of the data*/ } else if (gParam.CIDfileType == 'Q') /* Micromass' pkl '.dta' text format */ { /*First line is the precursor m/z, followed by intensity (float) and charge: read if these values are zero from the params file*/ if(gParam.peptideMW == 0 || gParam.chargeState == 0) { sscanf(stringBuffer, "%f %*f %d", &gParam.peptideMW, &gParam.chargeState); /*adjust for the fact that the value in the pkl file is the precursor m/z, not the peptide mass*/ gParam.peptideMW = (gParam.peptideMW * gParam.chargeState) - (gParam.chargeState * gElementMass[HYDROGEN]); } else { sscanf(stringBuffer, "%*f %*f %*d"); } headerFlag = FALSE; continue; /*the next line is the start of the data*/ } else { printf("Whoa!
Pleading ignorance of CID file type '%c'\n", gParam.CIDfileType); goto problem; } } /* Read the data */ massToAdd.mOverZ = -1; /*test to see if real data entered later*/ massToAdd.intensity = -1; if (gParam.CIDfileType == 'F') /* Finnigan's ICIS text format */ { sscanf(stringBuffer, "%*d %f %d", &massToAdd.mOverZ, &massToAdd.intensity); } else if (gParam.CIDfileType == 'T') /* Tab text format */ { sscanf(stringBuffer, "%f %d", &massToAdd.mOverZ, &massToAdd.intensity); if (i == 1 && (massToAdd.mOverZ <= 1 || massToAdd.mOverZ > 1000)) { printf("The datafile does not appear to be in tab-delimited (T) format\n" "Be sure the CID file type is set correctly in the Lutefisk.params file.\n"); goto problem; } } else if (gParam.CIDfileType == 'Q') /* Micromass' pkl '.dta' text format */ { sscanf(stringBuffer, "%f %f", &massToAdd.mOverZ, &intensityAsReal); massToAdd.intensity = (INT_4) intensityAsReal; if (i == 1 && (massToAdd.mOverZ <= 1 || massToAdd.mOverZ > 1000)) { printf("The datafile does not appear to be in Micromass' pkl '.dta' (Q) format\n" "Be sure the CID file type is set correctly in the Lutefisk.params file.\n"); goto problem; } } else if (gParam.CIDfileType == 'L') /* Finnigan's LCQ text format */ { if (stringBuffer[0] == '\n') { continue; /*skip any blank lines*/ } sscanf(stringBuffer, "%s", stringBuffer2); if (!strcmp(stringBuffer2, "saturated")) { continue; /*Finnigan sticks this into every other line for some reason*/ } for (j = 0; j < 258; j++) { if(stringBuffer[j] == ',') /*replaces commas with spaces*/ { stringBuffer[j] = ' '; } } sscanf(stringBuffer, "%*[Packet] %*[#] %*d %*[intensity] %*[=] %f %*[mass/position] %*[=] %f", &intensityAsReal, &massToAdd.mOverZ); massToAdd.intensity = (INT_4) intensityAsReal; } else if (gParam.CIDfileType == 'D') /* Finnigan's '.dta' text format */ { /* In case the intensity is a real value, we will read it this way.
*/ sscanf(stringBuffer, "%f %f", &massToAdd.mOverZ, &intensityAsReal); massToAdd.intensity = (INT_4) intensityAsReal; } if (massToAdd.mOverZ < -1 || massToAdd.intensity < 0) { printf("There is something wrong with the data file.\n"); goto problem; } if (firstNumberFlag) { msms.scanMassLow = massToAdd.mOverZ; firstNumberFlag = false; } if(massToAdd.mOverZ > msms.scanMassHigh) msms.scanMassHigh = massToAdd.mOverZ; if (oldMassValue == massToAdd.mOverZ) { if (massToAdd.intensity >= oldIonIntensity) { MSDataList->mass[MSDataList->numObjects - 1] = massToAdd; oldMassValue = massToAdd.mOverZ; oldIonIntensity = massToAdd.intensity; } } else { if (!AddToList(&massToAdd, MSDataList)) { printf("Ran out of room for datapoints!\n"); goto problem; } oldMassValue = massToAdd.mOverZ; oldIonIntensity = massToAdd.intensity; } } fclose(fp); } free(stringBuffer); free(stringBuffer2); return MSDataList; problem: printf("Quitting."); exit(1); return(NULL); } /************************************ CalibrationCorrection ******************************** * * If sufficiently intense immonium ions and y1 ions (for tryptic peptides) are available, * determine a mass correction to be applied to the list of ions. 
* */ void CalibrationCorrection(tMSDataList *inPeakList) { REAL_4 immoniumIons[13], y1Arg, y1Lys, intensityCutoff, calibrationMass[15], calibrationIntensity[15]; REAL_4 offsetMass, totalOffsetIntensity, lysErr, argErr; INT_4 i, ionNum; tMSData *currPtr = NULL; tMSData *ptrOfNoReturn = NULL; /*Calculate the values for y1Arg, y1Lys, and immoniumIons.*/ y1Arg = gMonoMass[R] + 3 * gElementMass[HYDROGEN] + gElementMass[OXYGEN]; y1Lys = gMonoMass[K] + 3 * gElementMass[HYDROGEN] + gElementMass[OXYGEN]; immoniumIons[0] = gMonoMass[P] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[1] = gMonoMass[V] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[2] = gMonoMass[L] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[3] = gMonoMass[M] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[4] = gMonoMass[H] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[5] = gMonoMass[F] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[6] = gMonoMass[Y] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[7] = gMonoMass[W] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[8] = gMonoMass[T] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[9] = gMonoMass[S] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[10] = gMonoMass[N] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[11] = gMonoMass[D] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; immoniumIons[12] = gMonoMass[E] + gElementMass[HYDROGEN] - gElementMass[OXYGEN] - gElementMass[CARBON]; if (inPeakList->numObjects == 0) return; /*Calculate the intensity cutoff.*/ intensityCutoff = 0; currPtr = &inPeakList->mass[0]; ptrOfNoReturn = 
&inPeakList->mass[inPeakList->numObjects]; while(currPtr < ptrOfNoReturn) { intensityCutoff += currPtr->intensity; currPtr++; } intensityCutoff = intensityCutoff / inPeakList->numObjects; /*calc the average intensity*/ intensityCutoff = intensityCutoff / 4; /*Only use ions that are greater than 1/4 of the average intensity*/ /*Check if any of the immonium ions or tryptic y1 ions are present.*/ ionNum = 0; /*reset the ion counter*/ if(gParam.proteolysis == 'T') /*look for y1 ions for Arg and Lys if a tryptic peptide.*/ { currPtr = &inPeakList->mass[0]; while(currPtr < ptrOfNoReturn) { if(currPtr->mOverZ > y1Arg + gParam.fragmentErr) { break; } if(currPtr->mOverZ <= y1Lys + gParam.fragmentErr && currPtr->mOverZ >= y1Lys - gParam.fragmentErr && currPtr->intensity > intensityCutoff) { calibrationMass[ionNum] = y1Lys - currPtr->mOverZ; calibrationIntensity[ionNum] = currPtr->intensity; ionNum++; } if(currPtr->mOverZ <= y1Arg + gParam.fragmentErr && currPtr->mOverZ >= y1Arg - gParam.fragmentErr && currPtr->intensity > intensityCutoff) { calibrationMass[ionNum] = y1Arg - currPtr->mOverZ; calibrationIntensity[ionNum] = currPtr->intensity; ionNum++; } currPtr++; } if(ionNum == 2) /*if both K and R y1 ions found, then choose the one with the least error*/ { lysErr = calibrationMass[0]; /*calibrationMass already holds the error (calculated minus observed), so nothing more is subtracted*/ argErr = calibrationMass[1]; if(lysErr < 0) { lysErr = lysErr * -1; } if(argErr < 0) { argErr = argErr * -1; } if(lysErr < argErr) { ionNum = 1; } else { calibrationMass[0] = calibrationMass[1]; calibrationIntensity[0] = calibrationIntensity[1]; ionNum = 1; } } } if(gParam.proteolysis == 'K') /*Look for y1 ion of Lys if a Lys-C peptide.*/ { currPtr = &inPeakList->mass[0]; while(currPtr < ptrOfNoReturn) { if(currPtr->mOverZ > y1Lys + gParam.fragmentErr) { break; } if(currPtr->mOverZ <= y1Lys + gParam.fragmentErr && currPtr->mOverZ >= y1Lys - gParam.fragmentErr && currPtr->intensity > intensityCutoff) { calibrationMass[ionNum] = y1Lys - currPtr->mOverZ;
calibrationIntensity[ionNum] = currPtr->intensity; ionNum++; } currPtr++; } } /*Now look for the immonium ions.*/ for(i = 0; i < 13; i++) { currPtr = &inPeakList->mass[0]; while(currPtr < ptrOfNoReturn) { if(currPtr->mOverZ > immoniumIons[i] + gParam.fragmentErr) { break; } if(currPtr->mOverZ >= immoniumIons[i] - gParam.fragmentErr && currPtr->mOverZ <= immoniumIons[i] + gParam.fragmentErr && currPtr->intensity > intensityCutoff) { calibrationMass[ionNum] = immoniumIons[i] - currPtr->mOverZ; calibrationIntensity[ionNum] = currPtr->intensity; ionNum++; } currPtr++; } } /*Calculate the offset values*/ offsetMass = 0; totalOffsetIntensity = 0; for(i = 0; i < ionNum; i++) { offsetMass += calibrationMass[i] * calibrationIntensity[i]; totalOffsetIntensity += calibrationIntensity[i]; } if(totalOffsetIntensity == 0) return; /*nothing was found to adjust calibration, so return w/o modifying the masses.*/ offsetMass = offsetMass / totalOffsetIntensity; /*obtain the average offset weighted for intensity*/ offsetMass = offsetMass * 0.5; /*adjust by only half as much as calculated (dont be too radical)*/ /*Apply the calculated offset value to all of the ion masses.*/ currPtr = &inPeakList->mass[0]; while(currPtr < ptrOfNoReturn) { currPtr->mOverZ = currPtr->mOverZ + offsetMass; currPtr++; } msms.scanMassLow = msms.scanMassLow + offsetMass; msms.scanMassHigh = msms.scanMassHigh + offsetMass; printf("The QTof calibration offset is %f \n", offsetMass); return; } /***********************************CheckConnections**************************************** * * This is where I make sure that ions can be connected to other ions via single amino * acid jumps. The mass accuracy required is increased for this determination. 
*/ void CheckConnections(tMSDataList *inPeakList) { tMSData *currPtr; tMSData *ptrOfNoReturn; tMSData *nextPtr; REAL_4 massDiff, error; REAL_4 highMassYIon, lowMassYIon, highMassBIon; REAL_4 precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; INT_4 j, i, maxCharge; char test; if(gParam.fragmentErr >= 0.75 || (gParam.chargeState > 2 && gParam.maxent3 == FALSE)) { return; /*Invoke this routine only by setting the fragment error to less than 0.75.*/ } error = gParam.fragmentErr * 0.5; /*9/23/03 changed rsj*/ /* make all indexes negative, later any ions that can connect are made positive, and in the end the ions that continue to have negative indexes are removed */ currPtr = &inPeakList->mass[0]; ptrOfNoReturn = &inPeakList->mass[inPeakList->numObjects]; while(currPtr < ptrOfNoReturn) { currPtr->index = -1; currPtr++; } /* For doubly charged precursors from ion trap data, don't eliminate fragment ions that could be doubly-charged.*/ if(gParam.fragmentPattern == 'L' && gParam.chargeState == 2) { currPtr = &inPeakList->mass[0]; ptrOfNoReturn = &inPeakList->mass[inPeakList->numObjects]; while(currPtr < ptrOfNoReturn) { if(currPtr->mOverZ >= precursor - gMonoMass[W] - gParam.fragmentErr && currPtr->mOverZ <= precursor - 2 * gParam.fragmentErr) { currPtr->index = 1; } currPtr++; } } /*now find connecting ions*/ currPtr = &inPeakList->mass[0]; while(currPtr < ptrOfNoReturn - 1) { nextPtr = currPtr + 1; while(nextPtr < ptrOfNoReturn) { massDiff = nextPtr->mOverZ - currPtr->mOverZ; if(massDiff > gMonoMass[W] + error) { break; /*stop looking if the difference is more than Trp*/ } test = FALSE; for(i = 0; i < gAminoAcidNumber; i++) { if(massDiff <= gMonoMass[i] + error && massDiff >= gMonoMass[i] - error) { test = TRUE; break; } } if(test) { if(nextPtr->index == -1) { nextPtr->index = 1; } if(currPtr->index == -1) { currPtr->index = 1; } } nextPtr++; } currPtr++; } /*Keep ions that could be terminal y ions or high mass b ions.*/ 
if(gParam.maxent3) { maxCharge = 1; /*maxCharge is one for maxent3 data, since all ions converted to +1*/ } else { maxCharge = gParam.chargeState; } currPtr = &inPeakList->mass[0]; while(currPtr < ptrOfNoReturn) { for(i = 1; i <= maxCharge; i++) { for(j = 0; j < gAminoAcidNumber; j++) { /*Calculate the high mass y ion.*/ highMassYIon = gParam.peptideMW - gMonoMass[j] + gParam.modifiedNTerm - gElementMass[HYDROGEN]; highMassYIon = (highMassYIon + (gElementMass[HYDROGEN] * i)) / i; /*Calculate the low mass y ion.*/ lowMassYIon = gMonoMass[j] + gElementMass[HYDROGEN] + gParam.modifiedCTerm; lowMassYIon = (lowMassYIon + (gElementMass[HYDROGEN] * i)) / i; /*Calculate the high mass b ion.*/ highMassBIon = gParam.peptideMW - gMonoMass[j] - gParam.modifiedCTerm; highMassBIon = (highMassBIon + (gElementMass[HYDROGEN] * (i - 1))) / i; if(currPtr->mOverZ >= highMassYIon - gParam.fragmentErr && currPtr->mOverZ <= highMassYIon + gParam.fragmentErr) { if(currPtr->index == -1) { currPtr->index = 1; /**Fixed by RSJ, JAT had currptr->intensity*/ } } if(currPtr->mOverZ >= lowMassYIon - gParam.fragmentErr && currPtr->mOverZ <= lowMassYIon + gParam.fragmentErr) { if(currPtr->index == -1) { currPtr->index = 1; /**Fixed by RSJ, JAT had currptr->intensity*/ } } if(currPtr->mOverZ >= highMassBIon - gParam.fragmentErr && currPtr->mOverZ <= highMassBIon + gParam.fragmentErr) { if(currPtr->index == -1) { currPtr->index = 1; /**Fixed by RSJ, JAT had currptr->intensity*/ } } } } currPtr++; } /*Keep the low mass ions.*/ currPtr = &inPeakList->mass[0]; while(currPtr < ptrOfNoReturn) { if(currPtr->mOverZ < 148 || (currPtr->mOverZ > 158.5 && currPtr->mOverZ < 159.5)) { if(currPtr->index == -1) { currPtr->index = 1; } } currPtr++; } /*get rid of the peaks with neg indexes*/ currPtr = &inPeakList->mass[0]; while(currPtr < ptrOfNoReturn) { if(currPtr->index == -1) { RemoveFromList(currPtr - &inPeakList->mass[0], inPeakList); ptrOfNoReturn--; currPtr--; } currPtr++; } /* Reset index values */ for (i = 
0; i < inPeakList->numObjects; i++) { inPeakList->mass[i].index = i; } return; } /***********************************CheckSignalToNoise************************************** * * This function verifies that the remaining ions selected as "interesting" have an adequate * signal to noise ratio. This ratio is #defined in LutefiskDefinitions. The noise is * calculated as the median intensity of the non-zero data points within 50 u above and * 50 u below the ion, excluding points within one peak width of the ion itself. For simplicity, * the first element in the linked list is never deleted. */ void CheckSignalToNoise(tMSDataList *inPeakList, tMSDataList *inMSDataList) { tMSDataList *neighborhoodList = NULL; tMSData *currPtr = NULL; tMSData *previousPtr = NULL; tMSData *ptrOfNoReturn = NULL; tMSData *currPeakPtr = NULL; tMSData *peakPtrOfNoReturn = NULL; INT_4 dataNum; INT_4 range = 50; /*This is the range (+/- this num) over which the noise is determined*/ INT_4 deltaMassNum; INT_4 calcDataNum; REAL_4 signalToNoise, intensity, noise, deltaMass, deltaMassSum, firstDataPoint; /* I ran into trouble by assuming that profile data would have a continuous non-zero intensity. However, it seems that Sciex data, and probably data from other instruments, have long stretches of zeroed data. This had the effect of producing a higher calculated noise value, which caused real ions to be excluded using the original s/n exclusion criteria established for continuous non-zero data. To overcome this, I need to figure out what the average data spacing is, and then use this to figure out the noise level. 
*/ if (inMSDataList->numObjects < 2) return; if (inPeakList->numObjects < 2) return; previousPtr = &inMSDataList->mass[0]; currPtr = &inMSDataList->mass[1]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; deltaMass = 0; deltaMassSum = 0; deltaMassNum = 0; if(gParam.CIDfileType == 'X' || gParam.CIDfileType == 'D') /*QTof .dta data*/ { while(currPtr < ptrOfNoReturn) { if(currPtr->intensity != 0 && previousPtr->intensity != 0) /*only look at non-zero data points*/ { deltaMass = currPtr->mOverZ - previousPtr->mOverZ; if(deltaMass <= 2) /*dta files seem to have "ions" spaced every u or so*/ { deltaMassSum += deltaMass; deltaMassNum++; } } currPtr++; previousPtr++; } if(deltaMass <= 0 || deltaMassNum < 25) { return; /*if the dta data has been created so that there are few ions less than 2 da apart, then its difficult to assess the noise level*/ } deltaMass = deltaMassSum / (REAL_4)deltaMassNum; } else { while(currPtr < ptrOfNoReturn) { if(currPtr->intensity != 0 && previousPtr->intensity != 0) /*only look at non-zero data points*/ { deltaMass = currPtr->mOverZ - previousPtr->mOverZ; if(deltaMass <= (gParam.peakWidth * 0.5)) { deltaMassSum += deltaMass; deltaMassNum++; } } currPtr++; previousPtr++; } if(deltaMassNum <= 0) /*if deltaMass not found try setting the window a bit wider*/ { previousPtr = &inMSDataList->mass[0]; currPtr = &inMSDataList->mass[1]; deltaMass = 0; deltaMassSum = 0; deltaMassNum = 0; while(currPtr < ptrOfNoReturn) { if(currPtr->intensity != 0 && previousPtr->intensity != 0) /*only look at non-zero data points*/ { deltaMass = currPtr->mOverZ - previousPtr->mOverZ; if(deltaMass <= gParam.peakWidth) { deltaMassSum += deltaMass; deltaMassNum++; } } currPtr++; previousPtr++; } } if(deltaMassNum == 0) return; /*avoid divide by zero*/ deltaMass = deltaMassSum / (REAL_4)deltaMassNum; } if(deltaMass <= 0) return; /*to prevent divide by zero below*/ neighborhoodList = (tMSDataList *) CreateNewList( sizeof(tMSData), 1000, 1000 ); if 
(!neighborhoodList) { printf("Ran out of memory in CheckSignalToNoise()!\n"); exit(1); } /* Start looking at each peak*/ currPeakPtr = &inPeakList->mass[0]; peakPtrOfNoReturn = &inPeakList->mass[inPeakList->numObjects]; while(currPeakPtr < peakPtrOfNoReturn) { /* First find the ion's neighbors, excluding the ion itself*/ firstDataPoint = 0; dataNum = 0; calcDataNum = 0; intensity = 0; signalToNoise = 0; currPtr = &inMSDataList->mass[0]; while(currPtr < ptrOfNoReturn) { if(currPtr->intensity != 0) /*dont look at data points w/ zero intensity*/ { if(currPtr->mOverZ > currPeakPtr->mOverZ + range) { break; } else { if (currPtr->mOverZ > currPeakPtr->mOverZ - range && (currPtr->mOverZ < currPeakPtr->mOverZ - gParam.peakWidth || currPtr->mOverZ > currPeakPtr->mOverZ + gParam.peakWidth)) { if (!AddToList(currPtr, neighborhoodList)) { printf("Ran out of memory in CheckSignalToNoise()!\n"); exit(1); } } } } currPtr++; } if(neighborhoodList->numObjects > 0) { /* Sort the neighborhoodList in order of decreasing intensity */ qsort(neighborhoodList->mass,(size_t)neighborhoodList->numObjects, (size_t)sizeof(tMSData),IntensityDescendSortFunc); /* The noise is the median intensity in the list */ noise = neighborhoodList->mass[(INT_4)(neighborhoodList->numObjects/2)].intensity; signalToNoise = (currPeakPtr->intensity) / noise; } else { signalToNoise = 100; /*if there are no data points for measuring noise then give high s/n*/ } /* less than 50 data points used to determine the noise level is considered insufficient */ if ((signalToNoise < SIGNAL_NOISE && neighborhoodList->numObjects > 50) || signalToNoise == 0) { /*dont get rid of immonium ions*/ if(currPeakPtr->mOverZ > 148 && (currPeakPtr->mOverZ < 158.5 || currPeakPtr->mOverZ > 159.5)) { RemoveFromList(currPeakPtr - &inPeakList->mass[0], inPeakList); peakPtrOfNoReturn--; currPeakPtr--; } } neighborhoodList->numObjects = 0; currPeakPtr++; } if (neighborhoodList) DisposeList(neighborhoodList); return; } 
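/* Illustration only, not part of the original source: a minimal sketch of the
   median-based noise estimate used by CheckSignalToNoise() above. The function
   names and the plain array of neighbor intensities are assumptions made for
   this example; the routine itself collects tMSData neighbors with AddToList()
   and sorts them with IntensityDescendSortFunc. As in the routine, the
   neighbors are assumed to be non-zero, and an ion with no neighbors is given a
   caller-supplied fallback ratio.

```c
#include <assert.h>
#include <stdlib.h>

// qsort comparator: descending order of intensity.
int CompareDescending(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    if (x < y) return 1;
    if (x > y) return -1;
    return 0;
}

// Returns signal divided by the median neighbor intensity, or fallback when
// there are no neighbors. neighbors[] is sorted in place and must be non-zero.
double SignalToMedianNoise(double signal, double *neighbors, size_t n, double fallback)
{
    if (n == 0) return fallback;
    qsort(neighbors, n, sizeof(double), CompareDescending);
    // Element n/2 of the sorted list is taken as the median, as in the routine.
    return signal / neighbors[n / 2];
}
```

   A median is less sensitive than a mean to a few intense fragment ions that
   happen to fall inside the +/- 50 u window.
*/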
/***********************************GuessAtTheFragmentPattern******************************* * * If the default ("D") is used for the fragment pattern, then the program takes a stab at * figuring out if the data is from an LCQ or a TSQ. It does this by assuming that TSQ * data always starts at a mass lower than 0.2 x the precursor m/z. If changes occur to * the LCQ that permit lower start masses in MS/MS data, then this will need to be changed. * */ void GuessAtTheFragmentPattern() { REAL_4 precursor; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; precursor = precursor * 0.2; if(precursor > msms.scanMassLow) { if(gParam.fragmentErr >= 0.15 * gMultiplier) { gParam.fragmentPattern = 'T'; /* TSQ (triple quad) */ } else { gParam.fragmentPattern = 'Q'; /* Q-TOF */ } } else { gParam.fragmentPattern = 'L'; /* LCQ (ion trap) */ } return; } /***********************************DefectCorrection**************************************** * * The observed mass defect is compared to the theoretical mass defect, and corrections are * made. I've found that my LCQ data can have slightly lower than expected masses at higher * m/z; for example, an ion at 1100.2 is really at 1100.6. Since the low m/z end is often * ok, I cannot use a mass offset across the entire m/z range. This more intelligent defect * correction should allow for the use of tighter error tolerances (+/- 0.5 Da). 
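 *
 * Illustration only, not part of the routine below: a minimal sketch of the
 * straight-line fit (defect = b * mass + a) that the correction relies on. The
 * function name FitLine and its return convention are assumptions made for this
 * example; the routine computes the same sums inline and compares the fitted
 * line with a theoretical slope (bTheory) and zero intercept.

```c
#include <assert.h>
#include <stddef.h>

// Fits y = b*x + a by ordinary least squares.
// Returns 0 on success, or -1 when the fit is undefined (for example n < 2).
int FitLine(const double *x, const double *y, size_t n, double *a, double *b)
{
    double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0, denominator;
    size_t i;
    for (i = 0; i < n; i++) {
        sumX += x[i];
        sumY += y[i];
        sumXX += x[i] * x[i];
        sumXY += x[i] * y[i];
    }
    denominator = (double)n * sumXX - sumX * sumX;
    if (denominator == 0) return -1;
    *a = (sumXX * sumY - sumX * sumXY) / denominator;      // intercept
    *b = ((double)n * sumXY - sumX * sumY) / denominator;  // slope
    return 0;
}
```

 * Returning an error code instead of exiting is a design choice of this sketch;
 * it lets a caller skip the correction when too few ions are available.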
* */ void DefectCorrection(tMSDataList *inPeakList) { tMSData *currPtr = NULL; tMSData *previousPtr = NULL; tMSData *ptrOfNoReturn = NULL; INT_4 integerMass[200], ionNum, i; REAL_4 mass[200], defect[200], precursor; REAL_8 aObserved, bObserved, sumOfXSquared, sumOfY, sumOfX; REAL_8 sumOfXTimesY, aTheory = 0, bTheory = 0.00050275; REAL_8 numerator, denominator, observedDefect, theoryDefect, additionalDefect, testMass; char doublyCharged[200]; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; /* Create mass and integerMass arrays from linked list data*/ ionNum = 0; currPtr = &inPeakList->mass[0]; ptrOfNoReturn = &inPeakList->mass[inPeakList->numObjects]; while(currPtr < ptrOfNoReturn) { mass[ionNum] = currPtr->mOverZ; theoryDefect = bTheory * mass[ionNum]; /*added 8/5/03*/ integerMass[ionNum] = mass[ionNum] - theoryDefect + 0.5; /*added -theoryDefect + 0.5 8/5/03*/ defect[ionNum] = mass[ionNum] - integerMass[ionNum]; ionNum++; if(ionNum >= 200) return; /*if too many ions, then return w/o making corrections*/ currPtr++; } /* Find potential doubly-charged ions, and mark them so that they are not changed.*/ for(i = 0; i < ionNum; i++) { doublyCharged[i] = 0; /*initialize*/ } if(gParam.chargeState == 2) { for(i = 0; i < ionNum; i++) { if(mass[i] > precursor - (3 * gMonoMass[W] / 2) - gParam.fragmentErr && mass[i] < precursor + gParam.fragmentErr) { doublyCharged[i] = 1; } } /*for(j = 0; j < ionNum; j++) { if(((mass[i] * 2) - gElementMass[HYDROGEN] <= mass[j] + gParam.fragmentErr) && ((mass[i] * 2) - gElementMass[HYDROGEN] >= mass[j] - gParam.fragmentErr)) { doublyCharged[i] = 1; } }*/ } /* Calculated the mass defect for each data point*/ /* for(i = 0; i < ionNum; i++) { defect[i] = mass[i] - integerMass[i]; if(integerMass[i] < 700 && doublyCharged[i] == 0) { if(defect[i] > 0.7) //for low mass, if error is in other direction, the defect should be negative. 
{ defect[i] = 0; integerMass[i] = mass[i] + 0.5; mass[i] = integerMass[i]; //bump the value up to at least the integer value } } }*/ /* Now for the least squares calculation of a straight line.*/ sumOfX = 0; /*calc sumOfX*/ for(i = 0; i < ionNum; i++) { sumOfX += mass[i]; } sumOfY = 0; /*calc sumOfY*/ for(i = 0; i < ionNum; i++) { sumOfY += defect[i]; } sumOfXSquared = 0; /*calc sumOfXSquared*/ for(i = 0; i < ionNum; i++) { sumOfXSquared = sumOfXSquared + (mass[i] * mass[i]); } sumOfXTimesY = 0; /*calc of sumOfXTimesY*/ for(i = 0; i < ionNum; i++) { sumOfXTimesY = sumOfXTimesY + (mass[i] * defect[i]); } /* From equation y = bx + a, calc a first*/ numerator = (sumOfXSquared * sumOfY) - (sumOfX * sumOfXTimesY); denominator = (ionNum * sumOfXSquared) - (sumOfX * sumOfX); if(denominator == 0) { printf("DefectCorrection: denominator = 0! Quitting.\n"); exit(1); } aObserved = numerator / denominator; /* Now calculate b*/ numerator = (ionNum * sumOfXTimesY) - (sumOfX * sumOfY); denominator = (ionNum * sumOfXSquared) - (sumOfX * sumOfX); if(denominator == 0) { printf("DefectCorrection: denominator = 0\n"); exit(1); } bObserved = numerator / denominator; /* Now make the mass corrections.*/ for(i = 0; i < ionNum; i++) { if(mass[i] > 0 && doublyCharged[i] == 0) { testMass = (mass[i] * bTheory) - defect[i]; if(testMass > 0.15) /*if already close, then don't bother*/ { observedDefect = (bObserved * mass[i]) + aObserved; theoryDefect = (bTheory * mass[i]) + aTheory; additionalDefect = theoryDefect - observedDefect; if(additionalDefect < 0) { additionalDefect = (theoryDefect - defect[i]) / 2; } if(additionalDefect > 0) { mass[i] = mass[i] + additionalDefect; } } testMass = defect[i] - (mass[i] * bTheory); if(testMass > 0.15) { observedDefect = (bObserved * mass[i]) + aObserved; theoryDefect = (bTheory * mass[i]) + aTheory; additionalDefect = theoryDefect - observedDefect; if(additionalDefect > 0) { additionalDefect = (theoryDefect - defect[i]) / 2; } if(additionalDefect < 0) { 
mass[i] = mass[i] + additionalDefect; } } } } currPtr = &inPeakList->mass[0]; i = 0; while(currPtr < ptrOfNoReturn) { currPtr->mOverZ = mass[i]; i++; currPtr++; } return; } /***********************************NormalizeIntensity************************************** * * This function normalizes the CID data intensity to the fourth most intense ion. It * is not unusual for LCQ data to have a few favored fragmentation pathways that * result in a couple of intense ions. In order to increase the spread in the scoring of * candidate sequences, these intense ions are reduced to the intensity of the fourth most * abundant ion. * */ void NormalizeIntensity(tMSDataList *inMSDataList) { /* With fewer than four ions there is no fourth most intense ion to normalize to. */ if (inMSDataList->numObjects < 4) return; /* Sort the MSData in order of decreasing intensity */ qsort(inMSDataList->mass,(size_t)inMSDataList->numObjects, (size_t)sizeof(tMSData),IntensityDescendSortFunc); inMSDataList->mass[0].intensity = inMSDataList->mass[3].intensity; inMSDataList->mass[1].intensity = inMSDataList->mass[3].intensity; inMSDataList->mass[2].intensity = inMSDataList->mass[3].intensity; /* Re-sort the MSData in order of increasing mass */ qsort(inMSDataList->mass,(size_t)inMSDataList->numObjects, (size_t)sizeof(tMSData),MassAscendSortFunc); return; } /***********************************CentroidOrProfile*************************************** * * The mass differences between adjacent data points are calculated, and a standard deviation * for these differences is used to establish if the data is profile (very little deviation) * or centroided data (large deviation). 
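 *
 * Illustration only, not part of the routine below: a minimal sketch of the
 * spacing test. The function name ClassifySpacing is an assumption, as is the
 * choice to return 'C' for fewer than two points. Unlike the routine, the sketch
 * uses every adjacent pair (no intensity threshold) and compares the variance
 * with 0.25 rather than the standard deviation with 0.5, which is the same test
 * without the square root.

```c
#include <assert.h>
#include <stddef.h>

// Returns 'P' (profile) or 'C' (centroid) for m/z values sorted in ascending order.
char ClassifySpacing(const double *mz, size_t n)
{
    double mean = 0, variance = 0;
    size_t i, count;
    if (n < 2) return 'C';  // assumption: too little data is treated as centroid
    count = n - 1;
    for (i = 1; i < n; i++) mean += mz[i] - mz[i - 1];
    mean /= (double)count;
    for (i = 1; i < n; i++) {
        double d = (mz[i] - mz[i - 1]) - mean;
        variance += d * d;
    }
    variance /= (double)count;  // population variance of the spacings
    // A standard deviation below 0.5 is equivalent to a variance below 0.25.
    return (mean < 1.0 && variance < 0.25) ? 'P' : 'C';
}
```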
* */ void CentroidOrProfile(tMSDataList *inMSDataList) { tMSData *currPtr; tMSData *prevPtr; tMSData *ptrOfNoReturn; INT_4 threshold = 0; INT_4 numberOfMassDiffs = 0; INT_4 i; REAL_8 massDiff[100]; REAL_8 massDiffAv = 0; REAL_8 standardDeviation = 0; if (inMSDataList->numObjects < 2) return; threshold = FindThreshold(inMSDataList); prevPtr = &inMSDataList->mass[0]; currPtr = &inMSDataList->mass[1]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while (currPtr < ptrOfNoReturn && numberOfMassDiffs < 100) { if(currPtr->intensity > threshold && prevPtr->intensity > threshold) { massDiff[numberOfMassDiffs] = (currPtr->mOverZ) - (prevPtr->mOverZ); massDiffAv += massDiff[numberOfMassDiffs]; numberOfMassDiffs++; } currPtr++; prevPtr++; } if(numberOfMassDiffs == 0) return; /* Avoid potential divide-by-zero */ massDiffAv = massDiffAv / numberOfMassDiffs; for(i = 0; i < numberOfMassDiffs; i++) { standardDeviation = standardDeviation + ((massDiff[i] - massDiffAv) * (massDiff[i] - massDiffAv)); } standardDeviation = standardDeviation / numberOfMassDiffs; standardDeviation = sqrt(standardDeviation); if(massDiffAv < 1) { if(standardDeviation < 0.5) { gParam.centroidOrProfile = 'P'; } else { gParam.centroidOrProfile = 'C'; } } else { gParam.centroidOrProfile = 'C'; } return; } /***********************************FindBYGoldenBoys**************************************** * * For LCQ data, find b/y ion pair complements that represent cleavage at the same amide * bond. Designate these as golden boy ions that are difficult to get rid of simply on the * basis of ion intensity. 
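 *
 * Illustration only, not part of the routine below: a minimal sketch of the
 * complement test for singly-charged ions, where a b/y pair from the same amide
 * bond satisfies b + y - 2H = peptide MW within a tolerance. The function name,
 * the HYDROGEN_MASS constant, and the fixed tolerance argument are assumptions
 * made for this example; the routine uses gElementMass[HYDROGEN], estimates the
 * tolerance from the data, and also considers doubly-charged partners.

```c
#include <assert.h>
#include <stddef.h>

#define HYDROGEN_MASS 1.007825  // assumed monoisotopic hydrogen mass in Da

// Flags every ion that belongs to at least one complementary pair and returns
// the number of pairs found. All ions are treated as singly charged.
size_t FlagComplementPairs(const double *mz, int *isPair, size_t n,
                           double peptideMW, double tol)
{
    size_t i, j, pairs = 0;
    for (i = 0; i < n; i++) isPair[i] = 0;
    for (i = 0; i + 1 < n; i++) {
        for (j = i + 1; j < n; j++) {
            double sum = mz[i] + mz[j] - 2.0 * HYDROGEN_MASS;
            if (sum >= peptideMW - tol && sum <= peptideMW + tol) {
                isPair[i] = 1;
                isPair[j] = 1;
                pairs++;
            }
        }
    }
    return pairs;
}
```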
* */ void FindBYGoldenBoys(tMSDataList *inMSDataList) { REAL_4 *massList, *mass2List, testMass, *pairMass, avePairMass; REAL_8 stDev; INT_4 maxIonNum = gGraphLength / gMultiplier; INT_4 ionNum, i, j, pairNum; char *goodOrBad; tMSData *currPtr = NULL; tMSData *ptrOfNoReturn = NULL; /* Set aside some space for these arrays.*/ massList = (float *) malloc(maxIonNum * sizeof(REAL_4)); if(massList == NULL) { printf("FindBYGoldenBoys: Out of memory."); exit(1); } mass2List = (float *) malloc(maxIonNum * sizeof(REAL_4)); if(mass2List == NULL) { printf("FindBYGoldenBoys: Out of memory."); exit(1); } pairMass = (float *) malloc(maxIonNum * sizeof(REAL_4)); if(pairMass == NULL) { printf("FindBYGoldenBoys: Out of memory."); exit(1); } goodOrBad = (char *) malloc(maxIonNum * sizeof(char)); if(goodOrBad == NULL) { printf("FindBYGoldenBoys: Out of memory."); exit(1); } /* Initialize some variables.*/ currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while(currPtr < ptrOfNoReturn) { currPtr->normIntensity = 0; /*set normIntensity field to zero*/ currPtr++; } ionNum = 0; pairNum = 0; avePairMass = 0; stDev = 0; for(i = 0; i < maxIonNum; i++) { massList[i] = 0; mass2List[i] = 0; goodOrBad[i] = 0; } /* Fill in the mass array assuming singly-charged ions; stop when the arrays are full.*/ currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while(currPtr < ptrOfNoReturn && ionNum < maxIonNum) { massList[ionNum] = currPtr->mOverZ; ionNum++; currPtr++; } /* Fill in the mass array assuming doubly-charged ions.*/ if(gParam.chargeState > 2) { for(i = 0; i < ionNum; i++) { testMass = massList[i] * 2 - gElementMass[HYDROGEN]; if(testMass < gParam.peptideMW - gMonoMass[G] + gParam.fragmentErr && testMass > 700) /*doubly charged ions have to be in the right mass range*/ { mass2List[i] = testMass; } else { mass2List[i] = 0; /*zero is a flag that it could not be a doubly-charged ion*/ } } } /* Find a suitable error*/ /*assume all ions are singly-charged*/ 
for(i = 0; i < ionNum - 1; i++) { for(j = i + 1; j < ionNum; j++) { testMass = massList[i] + massList[j] - 2 * gElementMass[HYDROGEN]; if(pairNum < maxIonNum /*pairMass holds at most maxIonNum entries*/ && testMass <= gParam.peptideMW + gParam.peptideErr * 2 && testMass >= gParam.peptideMW - gParam.peptideErr * 2) { pairMass[pairNum] = testMass; /*collect the data*/ pairNum++; } } } /*now assume that one of the pair is doubly-charged*/ if(gParam.chargeState > 2) { for(i = 0; i < ionNum - 1; i++) { for(j = i + 1; j < ionNum; j++) { if(mass2List[j] > massList[i]) { testMass = massList[i] + mass2List[j] - 2 * gElementMass[HYDROGEN]; if(pairNum < maxIonNum /*pairMass holds at most maxIonNum entries*/ && testMass <= gParam.peptideMW + gParam.peptideErr * 2 && testMass >= gParam.peptideMW - gParam.peptideErr * 2) { pairMass[pairNum] = testMass; /*collect the data*/ pairNum++; } } } } } /*now calculate the stDev error*/ if(pairNum < 3) { stDev = gParam.peptideErr; /*not enough pairs of ions to determine standard deviation so it gets defined as the peptide error from the params file*/ } else /*enough data to take a stab at finding standard deviation*/ { for(i = 0; i < pairNum; i++) { avePairMass += pairMass[i]; } avePairMass = avePairMass / pairNum; for(i = 0; i < pairNum; i++) { stDev += ((pairMass[i] - avePairMass) * (pairMass[i] - avePairMass)); } stDev = stDev / (pairNum - 1); stDev = sqrt(stDev); } /*reality checks*/ if(stDev > 2 * gParam.peptideErr) { stDev = 2 * gParam.peptideErr; /*don't let the error be too big*/ } else if(stDev < 0.5 * gParam.peptideErr) { stDev = 0.5 * gParam.peptideErr; /*or too small*/ } /*find pairs of masses that are close to the peptide molecular weight*/ /*first assume the ions are all singly-charged*/ pairNum = 0; for(i = 0; i < ionNum - 1; i++) { for(j = i + 1; j < ionNum; j++) { testMass = massList[i] + massList[j] - 2 * gElementMass[HYDROGEN]; if(testMass <= gParam.peptideMW + stDev && testMass >= gParam.peptideMW - stDev) { goodOrBad[i] = 1; goodOrBad[j] = 1; pairNum++; } } } /*now assume that one of them is doubly-charged*/ if(gParam.chargeState > 2) { for(i = 0; i < 
ionNum - 1; i++) { for(j = i + 1; j < ionNum; j++) { if(mass2List[j] > massList[i]) { testMass = massList[i] + mass2List[j] - 2 * gElementMass[HYDROGEN]; if(testMass <= gParam.peptideMW + stDev && testMass >= gParam.peptideMW - stDev) { goodOrBad[i] = 1; goodOrBad[j] = 1; pairNum++; } } } } } /* The normIntensity field for the ms data pointers contain 0 if not a goldenBoy and a 1 if it is. */ for(i = 0; i < ionNum; i++) { if(goodOrBad[i] != 0) { currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while(currPtr < ptrOfNoReturn) { if(massList[i] == currPtr->mOverZ) { currPtr->normIntensity = 1; break; } currPtr++; } } } /* free the variables*/ free(massList); free(mass2List); free(pairMass); free(goodOrBad); return; } /***********************************FindTheGoldenBoys*************************************** * * Two arrays are set up, one that contains the ion m/z and another that is either 1 or 0, * depending on whether the ion is a golden boy or not (golden boys are ions that can be * connected by single amino acid jumps to either 147 or 175, and are m/z less than the * precursor ion). By searching through the ms data list, I start at 147 and make one aa jumps. * If an ion is present then the same index position for the second array is reset to 1. 
* */ void FindTheGoldenBoys(tMSDataList *inMSDataList) { REAL_4 *massList, precursor, lys, arg, plusArgLys[2 * AMINO_ACID_NUMBER]; REAL_4 testMass, err; char *goodOrBad, test; INT_4 i, j, k, ionNum, *intensityList, cutoff, goldenBoyNum; INT_4 maxIonNum = gGraphLength / gMultiplier; /*don't need GRAPH_LENGTH numbers of ions for the arrays of massList, goodOrBad, and intensityList*/ tMSData *currPtr = NULL; tMSData *ptrOfNoReturn = NULL; /* Set aside some space for these arrays.*/ massList = (float *) malloc(maxIonNum * sizeof(REAL_4)); if(massList == NULL) { printf("FindTheGoldenBoys: Out of memory."); exit(1); } goodOrBad = (char *) malloc(maxIonNum * sizeof(char)); if(goodOrBad == NULL) { printf("FindTheGoldenBoys: Out of memory."); exit(1); } intensityList = (int *) malloc(maxIonNum * sizeof(INT_4)); if(intensityList == NULL) { printf("FindTheGoldenBoys: Out of memory."); exit(1); } /* Initialize some variables.*/ lys = gMonoMass[K] + 3 * gElementMass[HYDROGEN] + gElementMass[OXYGEN]; arg = gMonoMass[R] + 3 * gElementMass[HYDROGEN] + gElementMass[OXYGEN]; currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while(currPtr < ptrOfNoReturn) { currPtr->normIntensity = 0; /*set normIntensity field to zero*/ currPtr++; } for(i = 0; i < maxIonNum; i++) { massList[i] = 0; intensityList[i] = 0; goodOrBad[i] = 0; } for(i = 0; i < gAminoAcidNumber; i++) { plusArgLys[i] = lys + gMonoMass[i]; } j = 0; for(i = gAminoAcidNumber; i < 2 * gAminoAcidNumber; i++) { plusArgLys[i] = arg + gMonoMass[j]; j++; } precursor = (gParam.peptideMW + gParam.chargeState * gElementMass[HYDROGEN]) / gParam.chargeState; ionNum = 0; err = gParam.fragmentErr / 4; /*The error between ions is less than the error of the calc vs obsd masses.*/ /* Fill in the mass array.*/ currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while(currPtr < ptrOfNoReturn) { if(currPtr->mOverZ > lys - gParam.fragmentErr && 
currPtr->mOverZ < precursor - gParam.fragmentErr * 2 && currPtr->mOverZ < GOLDEN_BOY_MAX && ionNum < maxIonNum) { massList[ionNum] = currPtr->mOverZ; intensityList[ionNum] = currPtr->intensity; ionNum++; } currPtr++; } /* Seed the array goodOrBad by putting a 1 in the positions that hold 147 or 175. If both are absent, then I seed with the array plusArgLys. */ test = 1; for(i = 0; i < ionNum; i++) { if((massList[i] <= lys + gParam.fragmentErr) && (massList[i] >= lys - gParam.fragmentErr)) { goodOrBad[i] = 1; /*you found 147*/ test = 0; break; } } for(i = 0; i < ionNum; i++) { if((massList[i] <= arg + gParam.fragmentErr) && (massList[i] >= arg - gParam.fragmentErr)) { goodOrBad[i] = 1; /*you found 175*/ test = 0; break; } } if(test) { for(i = 0; i < ionNum; i++) { for(j = 0; j < 2 * gAminoAcidNumber; j++) { if((massList[i] <= plusArgLys[j] + gParam.fragmentErr) && (massList[i] >= plusArgLys[j] - gParam.fragmentErr)) { goodOrBad[i] = 1; } } } } /* Start at the low mass end and work up trying to connect y ions. Anything that can be connected to 147 or 175 is given a value of 1 in the goodOrBad array. 
*/ for(i = 0; i < ionNum - 1; i++) { if(goodOrBad[i] != 0) { for(j = i + 1; j < ionNum; j++) { testMass = massList[j] - massList[i]; if((testMass <= gMonoMass[W] + err) && (testMass >= gMonoMass[G] - err)) { for(k = 0; k < gAminoAcidNumber; k++) { if((testMass <= gMonoMass[k] + err) && (testMass >= gMonoMass[k] - err)) { if(massList[i] <= lys + err && massList[i] >= lys - err) { goodOrBad[j] = -1; /*tag y2 ions so that they don't get tossed*/ } else if(massList[i] <= arg + err && massList[i] >= arg - err) { goodOrBad[j] = -1; /*tag y2 ions so that they don't get tossed*/ } else { goodOrBad[j] = 1; } } } } } } } /* Make sure we don't lose the y1 ions by marking them as -1*/ for(i = 0; i < ionNum; i++) { if(massList[i] <= lys + err && massList[i] >= lys - err) { goodOrBad[i] = -1; /*tag y1 ions so that they don't get tossed*/ } if(massList[i] <= arg + err && massList[i] >= arg - err) { goodOrBad[i] = -1; /*tag y1 ions so that they don't get tossed*/ } } /* Eliminate low intensity goldenBoys from the goldenBoy list. */ cutoff = 0; goldenBoyNum = 0; for(i = 0; i < ionNum; i++) { if(goodOrBad[i] == 1) { cutoff += intensityList[i]; goldenBoyNum++; } } if(goldenBoyNum == 0) { /*free the arrays before the early return so they are not leaked*/ free(massList); free(goodOrBad); free(intensityList); return; } cutoff = (cutoff / goldenBoyNum) * GOLDEN_BOY_CUTOFF; for(i = 0; i < ionNum; i++) { if(goodOrBad[i] == 1) { if(intensityList[i] < cutoff) { goodOrBad[i] = 0; } } } /*change the -1 values that mark y1 and y2 ions back to +1 values so that they get counted as goldenboys*/ for(i = 0; i < ionNum; i++) { if(goodOrBad[i] == -1) { goodOrBad[i] = 1; } } /* The normIntensity field for the ms data pointers contains 0 if not a goldenBoy and a 1 if it is. 
*/ for(i = 0; i < ionNum; i++) { if(goodOrBad[i] != 0) { currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while(currPtr < ptrOfNoReturn) { if(massList[i] == currPtr->mOverZ) { currPtr->normIntensity = 1; break; } currPtr++; } } } /* Free the arrays.*/ free(massList); free(goodOrBad); free(intensityList); return; } /********************************GetPeakWidth*********************************************** * * GetPeakWidth finds the peak width when the auto-peakWidth option is chosen by setting * the peakWidth to zero in the .params file. * */ REAL_4 GetPeakWidth(tMSDataList *inMSDataList) { tMSDataList *bigTreeList = NULL; INT_4 i; tMSData *currPtr; tMSData *ptrOfNoReturn; REAL_4 precursor; INT_4 topIndex; INT_4 halfIntensity; REAL_4 slope; REAL_4 intercept; REAL_4 leadingMass; REAL_4 trailingMass; REAL_4 peakWidth; REAL_4 peakWidthSum = 0; REAL_4 peakWidthSquaredSum = 0; REAL_4 avgPeakWidth; REAL_4 stdDev; REAL_4 tolerance; if (inMSDataList->numObjects == 0) { printf("GetPeakWidth: no data in inMSDataList\n"); exit(1); } precursor = (gParam.peptideMW + gParam.chargeState) / gParam.chargeState; /* Sort the MSData in order of decreasing intensity */ qsort(inMSDataList->mass,(size_t)inMSDataList->numObjects, (size_t)sizeof(tMSData),IntensityDescendSortFunc); bigTreeList = (tMSDataList *) CreateNewList( sizeof(tMSData), 10, 1 ); if (!bigTreeList) { printf("Ran out of memory in GetPeakWidth()!\n"); exit(1); } /* Make bigTrees the top ten most intense peaks that are not the precursor. */ currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while (currPtr < ptrOfNoReturn && bigTreeList->numObjects < 10) { /* Only gather peaks below 600 and don't take the precursor */ if (currPtr->mOverZ < 600 && (currPtr->mOverZ < precursor - 2.0 || currPtr->mOverZ > precursor + 2.0)) /* XXXXXX WHICH SHOULD I USE????? 
JAT // && (currPtr->mOverZ < precursor - gParam.peptideErr // || currPtr->mOverZ > precursor + gParam.peptideErr)) */ { /* Don't take a peak that overlaps one already on the list */ for (i = 0; i < bigTreeList->numObjects; i++) { if (currPtr->mOverZ > bigTreeList->mass[i].mOverZ - 5.0 && currPtr->mOverZ < bigTreeList->mass[i].mOverZ + 5.0) { break; /* Too close to a tree we already have */ } } if (i == bigTreeList->numObjects) { if(!AddToList(currPtr, bigTreeList)) { printf("Ran out of memory in GetPeakWidth()!\n"); exit(1); } } } currPtr++; } /* Resort the MSData in order of increasing mass */ qsort(inMSDataList->mass,(size_t)inMSDataList->numObjects, (size_t)sizeof(tMSData),MassAscendSortFunc); /* Remove big tree peaks if they are < 10% of the highest peak. */ for (i = 1; i < bigTreeList->numObjects; i++) { if (bigTreeList->mass[i].intensity < 0.10 * bigTreeList->mass[0].intensity) { RemoveFromList(i, bigTreeList); i--; } } /* Now find the half-height peak width of each remaining bigTree. */ for (i = 0; i < bigTreeList->numObjects; i++) { currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; while (currPtr < ptrOfNoReturn) { if (currPtr->mOverZ == bigTreeList->mass[i].mOverZ) break; currPtr++; } topIndex = currPtr - &inMSDataList->mass[0]; halfIntensity = (INT_4)(0.5 * bigTreeList->mass[i].intensity); /* // peak top // | // leading . trailing // edge . . edge // . . // ______________ . ___ . ______________ 50% peak height // . . // . . // . . // . . . . . |-----| . . . . . // peak width */ /* ----------- Find the leading edge 50% mass value ----------- */ while (currPtr >= &inMSDataList->mass[0]) { if (currPtr->intensity < halfIntensity) break; currPtr--; } /* Use the two points that flank the half-intensity value to calculate the mass at exactly half intensity. 
*/ slope = (REAL_4)(((currPtr + 1)->intensity - currPtr->intensity)/ ((currPtr + 1)->mOverZ - currPtr->mOverZ)); intercept = (REAL_4)((currPtr->intensity) - (currPtr->mOverZ * slope)); leadingMass = (REAL_4)((halfIntensity - intercept)/slope); /* ----------- Find the trailing edge 50% mass value ----------- */ currPtr = &inMSDataList->mass[topIndex]; while (currPtr < ptrOfNoReturn) { if (currPtr->intensity < halfIntensity) break; currPtr++; } /* Use the two points that flank the half-intensity value to calculate the mass at exactly half intensity. */ slope = (REAL_4)((currPtr->intensity - (currPtr - 1)->intensity)/ (currPtr->mOverZ - (currPtr - 1)->mOverZ)); intercept = (REAL_4)(((currPtr - 1)->intensity) - ((currPtr - 1)->mOverZ * slope)); trailingMass = (REAL_4)((halfIntensity - intercept)/slope); /* XXXXXXXXX Why multiply by two? - JAT */ peakWidth = (REAL_4)(trailingMass - leadingMass) * 2; peakWidthSum += peakWidth; peakWidthSquaredSum += peakWidth * peakWidth; /* Replace the mass with the peak width (for throwing out outliers) */ bigTreeList->mass[i].mOverZ = (REAL_4)(trailingMass - leadingMass) * 2; } if (bigTreeList->numObjects > 0) { /* Calculate the average peak width for the big trees. */ avgPeakWidth = (REAL_4)(peakWidthSum/bigTreeList->numObjects); /* Calculate the standard deviation. */ stdDev = sqrt((peakWidthSquaredSum/bigTreeList->numObjects) - (avgPeakWidth * avgPeakWidth)); /* Throw away peaks too far away from the average. 
*/ tolerance = 1.5 * stdDev; peakWidthSum = 0; for (i = 0; i < bigTreeList->numObjects; i++) { /* Remember that the big tree mass in now really the peak width */ if (bigTreeList->mass[i].mOverZ < (avgPeakWidth - tolerance) || bigTreeList->mass[i].mOverZ > (avgPeakWidth + tolerance)) { RemoveFromList(i, bigTreeList); i--; } else { peakWidthSum += bigTreeList->mass[i].mOverZ; } } avgPeakWidth = (REAL_4)(peakWidthSum/bigTreeList->numObjects); } else { avgPeakWidth = 3; } if(avgPeakWidth >= 2.5) { /*For the broad peaks, I force the fragment error to be at least 1 Da.*/ if(gParam.fragmentErr < 1.0) { gParam.fragmentErr = 1; } } else { if(avgPeakWidth >= 1.5) { if(gParam.fragmentErr < 0.75) { gParam.fragmentErr = 0.75; } } else { /* XXXXXXX Should this be done when data is high res? JAT */ if(gParam.fragmentErr < 0.5) { gParam.fragmentErr = 0.5; } } } avgPeakWidth = avgPeakWidth / 2; /* the program anticipates a number that is half of the actual peak width */ if (bigTreeList) DisposeList(bigTreeList); return(avgPeakWidth); } /* //-------------------------------------------------------------------------------- // IntensityDescendSortFunc() //-------------------------------------------------------------------------------- // Modified 03.13.00 JAT - Added the secondary mass key to eliminate problems with // platform specific differences. // */ INT_4 IntensityDescendSortFunc(const void *n1, const void *n2) { tMSData *n3, *n4; n3 = (tMSData *)n1; n4 = (tMSData *)n2; if (n3->intensity != n4->intensity) { return (INT_4)(n3->intensity < n4->intensity)? 1:-1; } else { if (n3->mOverZ != n4->mOverZ) { return (INT_4)(n3->mOverZ < n4->mOverZ)? 
1:-1;
        }
        return 0;
    }
}

/*
//--------------------------------------------------------------------------------
//  MassAscendSortFunc()
//--------------------------------------------------------------------------------
*/
INT_4 MassAscendSortFunc(const void *n1, const void *n2)
{
    tMSData *n3, *n4;

    n3 = (tMSData *)n1;
    n4 = (tMSData *)n2;

    if(n3->mOverZ != n4->mOverZ)
    {
        return (INT_4)(n3->mOverZ > n4->mOverZ)? 1:-1;
    }
    else
    {
        return 0;
    }
}

/***********************************RemoveIsotopes***************************************
*
*   This function removes peaks that differ by one dalton and appear to be due to
*   the presence of isotopes. The algorithm is very simple and basic and only
*   worries about whether the next ion up is an isotopic peak; it doesn't worry
*   about two daltons up because I'm assuming that those ions will be weeded out based on
*   intensity. This function won't be suitable for use on high resolution data
*   obtained from, say, a Q-TOF. I'll burn that bridge when I get to it.
*/
void RemoveIsotopes(tMSDataList *inMSDataList)
{
    tMSData *currPtr = NULL;
    tMSData *isotopePtr = NULL;
    tMSData *isotope2Ptr = NULL;
    tMSData *ptrOfNoReturn = NULL;
    REAL_4 upperLimit;
    REAL_4 lowerLimit;
    REAL_4 massDiff;
    REAL_4 obsdIntensityRatio;
    REAL_4 calcIntensityRatio;

    /* Use a tighter error since the difference is relative. */
    upperLimit = 1 + (gParam.fragmentErr / 2);
    lowerLimit = 1 - (gParam.fragmentErr / 2);

    currPtr = &inMSDataList->mass[0];
    ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects];

    while (currPtr < ptrOfNoReturn - 1)
    {
        isotopePtr = currPtr + 1;
        while (isotopePtr < ptrOfNoReturn
               && isotopePtr->mOverZ <= currPtr->mOverZ + upperLimit)
        {
            massDiff = isotopePtr->mOverZ - currPtr->mOverZ;

            /* Are the peaks 1 Da apart? */
            if (massDiff < upperLimit && massDiff > lowerLimit)
            {
                /* Calculate the theoretical isotope ratio.
                 * (The 0.2 is a fudge factor.)
*/ calcIntensityRatio = ((currPtr->mOverZ) / 1800) + 0.2; obsdIntensityRatio = (REAL_4)(isotopePtr->intensity) / (REAL_4)(currPtr->intensity); /* Does a comparison of the intensities make it look like an isotope peak? * Give 25% leeway. */ if (obsdIntensityRatio <= calcIntensityRatio)/* + 0.25 && obsdIntensityRatio >= calcIntensityRatio - 0.25) Fixed by RSJ*/ { /* We found what looks a +1 isotope peak, is there a peak that * looks like a +2 isotope? */ isotope2Ptr = isotopePtr + 1; while (isotope2Ptr < ptrOfNoReturn && isotope2Ptr->mOverZ <= isotopePtr->mOverZ + upperLimit) { massDiff = isotope2Ptr->mOverZ - isotopePtr->mOverZ; /* Are the peaks 1 Da apart? */ if (massDiff < upperLimit && massDiff > lowerLimit) { /* XXXXXX What should the equation for this ratio be? - JAT */ /* Calculate the theoretical isotope ratio. * (The 0.2 is a fudge factor.) */ calcIntensityRatio = ((isotopePtr->mOverZ) / 1800) + 0.2; obsdIntensityRatio = (REAL_4)(isotope2Ptr->intensity) / (REAL_4)(isotopePtr->intensity); /* Does a comparison of the intensities make it look like an isotope peak? * Give 25% leeway. */ if (obsdIntensityRatio <= calcIntensityRatio)/* + 0.25 && obsdIntensityRatio >= calcIntensityRatio - 0.25) Fixed by RSJ*/ { /* We found what looks a +2 isotope peak, Whack it. */ RemoveFromList((isotope2Ptr - &inMSDataList->mass[0]), inMSDataList); ptrOfNoReturn--; isotope2Ptr--; } } isotope2Ptr++; } /* Whack the +1 isotope. */ RemoveFromList((isotopePtr - &inMSDataList->mass[0]), inMSDataList); ptrOfNoReturn--; isotopePtr--; } } isotopePtr++; } currPtr++; } return; } /***********************************FindMedian********************************************** * * FindMedian finds the median threshold value. 
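A minimal sketch of the median idea on a plain array, for comparison with the
linked-list code below. It assumes the intensities are copied and sorted with
qsort instead of being sign-flipped in place, and it computes the textbook
median, which may differ slightly from the value the loops below arrive at.
The names cmp_long and median_intensity are hypothetical and are not part of
Lutefisk.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical comparator: ascending order for long values.
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

// Hypothetical helper: returns the median of n intensities, or -1 if
// n is 0 or the working copy cannot be allocated.
static long median_intensity(const long *intensity, size_t n)
{
    long *copy;
    long median;

    if (n == 0) return -1;
    copy = malloc(n * sizeof *copy);
    if (copy == NULL) return -1;

    // Sort a copy so the caller's data keeps its order and sign.
    memcpy(copy, intensity, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_long);

    if (n % 2 == 1)
        median = copy[n / 2];
    else
        median = (copy[n / 2 - 1] + copy[n / 2]) / 2;

    free(copy);
    return median;
}
```

The signal threshold would then be this median multiplied by the user's
ionThreshold, as in the last step of the function below.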
*/ INT_4 FindMedian(struct MSData *firstPtr) { struct MSData *currPtr, *biggestPtr; INT_4 biggestIntensity, lowIntensityValue, highIntensityValue; INT_4 numberOfIons, targetNumOfIons, median, signal; /* Count the number of datapoints, and divide in half.*/ targetNumOfIons = 0; currPtr = firstPtr; while(currPtr != NULL) { targetNumOfIons += 1; currPtr = currPtr->next; } targetNumOfIons = targetNumOfIons / 2; /* Make the highest intensity datapoints negative.*/ numberOfIons = 0; while(numberOfIons <= targetNumOfIons) { currPtr = firstPtr; biggestPtr = currPtr; biggestIntensity = currPtr->intensity; numberOfIons += 1; while(currPtr != NULL) { if(currPtr->intensity > biggestIntensity) { biggestIntensity = currPtr->intensity; biggestPtr = currPtr; } currPtr = currPtr->next; } biggestPtr->intensity = biggestPtr->intensity * -1; } /* Find the highest intensity datapoint from the low intensity half of the set.*/ currPtr = firstPtr; biggestIntensity = currPtr->intensity; while(currPtr != NULL) { if(currPtr->intensity > biggestIntensity) { biggestIntensity = currPtr->intensity; } currPtr = currPtr->next; } lowIntensityValue = biggestIntensity; /* Find the highest intensity datapoint from the high intensity half of the set. 
Recall that the high intensity half of the data set is negative, so I'll actually be finding the lowest intensity datapoint.*/ currPtr = firstPtr; while(currPtr != NULL) /*First find a negative intensity.*/ { if(currPtr->intensity < 0) { biggestIntensity = currPtr->intensity; break; } currPtr = currPtr->next; } currPtr = firstPtr; /*Now go look for the correct value.*/ while(currPtr != NULL) { if(currPtr->intensity > biggestIntensity && currPtr->intensity < 0) { biggestIntensity = currPtr->intensity; } currPtr = currPtr->next; } highIntensityValue = biggestIntensity * -1; /* Take an average of the two datapoints, which will be the median.*/ median = (lowIntensityValue + highIntensityValue) / 2; /* The signal threshold is determined from the median and the user input "ionThreshold".*/ signal = median * gParam.ionThreshold; return(signal); } /************************ZeroTheIons*************************************************** * * This function inputs the linked list of weight averaged ions (a struct of type MSData), * plus the REAL_4 'peakWidth', which is one-half of the width of a peak near its base. * It finds ions that are too close together (less than 'peakWidth') and zero's the intensity * field of the ion with the lowest intensity. Those ions w/ zero intensity are free'd and * the list is re-linked. It returns a pointer to this list of structs of type MSData. 
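A minimal sketch of the same pairwise rule on a plain array, assuming a
simplified peak record. The Peak type, the drop_close_peaks name and the
'window' parameter are hypothetical; in the function below the window is
gParam.peakWidth times an empirical multiplier, and the data live in a
linked list rather than an array.

```c
#include <stddef.h>

// Hypothetical, simplified peak record.
typedef struct {
    double mOverZ;
    long intensity;
} Peak;

// For every pair of peaks closer than 'window', zero the weaker one,
// then compact the array. Returns the number of peaks kept.
static size_t drop_close_peaks(Peak *p, size_t n, double window)
{
    size_t i, j, kept = 0;

    for (i = 0; i < n; i++) {
        if (p[i].intensity == 0) continue;   // already discarded
        for (j = i + 1; j < n; j++) {
            double d = p[j].mOverZ - p[i].mOverZ;
            if (d < 0) d = -d;
            if (d <= window) {
                if (p[i].intensity < p[j].intensity)
                    p[i].intensity = 0;
                else
                    p[j].intensity = 0;
            }
        }
    }

    // Keep only the peaks that still have non-zero intensity.
    for (i = 0; i < n; i++) {
        if (p[i].intensity != 0) p[kept++] = p[i];
    }
    return kept;
}
```

With peaks at 100.0, 100.4, 101.5 and 103.0 and a window of 0.75, only the
weaker of the first two is dropped.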
*/
struct MSData *ZeroTheIons(struct MSData *firstAvMassPtr)
{
    struct MSData *currPtr, *nextPtr, *previousPtr, *structToFreePtr;
    REAL_8 diff;

    currPtr = firstAvMassPtr;
    while(currPtr != NULL)
    {
        nextPtr = currPtr->next;
        while(nextPtr != NULL)
        {
            diff = fabs((nextPtr->mOverZ) - (currPtr->mOverZ));
            if(diff <= gParam.peakWidth * 1.5)    /*The multiplier was empirically derived; an earlier note here cited 1.75.*/
            {
                if(currPtr->intensity < nextPtr->intensity)    /*Which ion should be zeroed?*/
                {
                    currPtr->intensity = 0;
                }
                else
                {
                    nextPtr->intensity = 0;
                }
            }
            nextPtr = nextPtr->next;
        }
        /*
        *   currPtr is moved up to the next position, and if the intensity of that value
        *   has not been zeroed, then it breaks out of the while loop and becomes the next
        *   currPtr. If currPtr reaches the NULL value, the while loop terminates, and the
        *   NULL currPtr also terminates the next loop up in the hierarchy.
        */
        while(currPtr != NULL)
        {
            currPtr = currPtr->next;
            if(currPtr == NULL || currPtr->intensity != 0)
            {
                break;    /*Break out - you've either reached the end of the list or found an ion with non-zero intensity.*/
            }
        }
    }

    /*
    *   Next I weed out the zero intensity ions, and re-link the non-zero ions.
    *   The zero intensity structs are free'ed.
    */
    currPtr = firstAvMassPtr;
    while(currPtr != NULL && currPtr->intensity == 0)    /*Find the first ion that has a non-zero intensity.*/
    {
        structToFreePtr = currPtr;
        currPtr = currPtr->next;
        free(structToFreePtr);
    }
    firstAvMassPtr = currPtr;    /*Set the new firstAvMassPtr; this is the return value.*/
    if(currPtr == NULL)
    {
        return(NULL);    /*Every ion had zero intensity, so there is nothing left to re-link.*/
    }

    previousPtr = currPtr;
    currPtr = currPtr->next;
    while(currPtr != NULL && previousPtr != NULL)
    {
        if(currPtr->intensity == 0)
        {
            previousPtr->next = currPtr->next;
            free(currPtr);
            currPtr = previousPtr->next;
        }
        else
        {
            previousPtr = currPtr;
            currPtr = currPtr->next;
        }
    }

    return(firstAvMassPtr);
}

/*******************************WeedTheIons****************************************
*
*   This function is called when the actual number of ions in a linked list of structs
*   of type MSData exceeds the value "finalIonCount".
The most intense ions are saved, * and the linked list is modified to remove the low intensity ions. The discarded * structs are free'ed. */ void WeedTheIons(tMSDataList *inMSDataList, INT_4 finalIonCount, BOOLEAN spareGoldenBoys) { tMSData *currPtr; tMSData *ptrOfNoReturn; BOOLEAN thumbsDown; REAL_4 immonium[15]; REAL_4 precursor; INT_4 i; /*initialize immonium ions*/ immonium[0] = gMonoMass[D] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[1] = gMonoMass[N] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[2] = gMonoMass[E] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[3] = gMonoMass[Q] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[4] = gMonoMass[H] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[5] = gMonoMass[L] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[6] = gMonoMass[M] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[7] = gMonoMass[F] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[8] = gMonoMass[P] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[9] = gMonoMass[S] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[10] = gMonoMass[T] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[11] = gMonoMass[W] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[12] = gMonoMass[Y] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[13] = gMonoMass[V] - gElementMass[1] - gElementMass[3] + gElementMass[0]; immonium[14] = gMonoMass[K] - gElementMass[1] - gElementMass[3] - 2 * gElementMass[0] - gElementMass[2]; precursor = (gParam.peptideMW + gParam.chargeState * gElementMass[HYDROGEN]) / gParam.chargeState; currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; /* Increase the intensity of ions larger than the precursor as befitting their importance so they are less likely to be purged. 
*/ while (currPtr < ptrOfNoReturn) { if (currPtr->mOverZ > precursor) { currPtr->intensity *= 2.5; } currPtr++; } /* Sort the MSData in order of decreasing intensity */ qsort(inMSDataList->mass,(size_t)inMSDataList->numObjects, (size_t)sizeof(tMSData),IntensityDescendSortFunc); /* Now that the ions are sorted, take the intensity increase given to the ions above the precursor back out. */ currPtr = &inMSDataList->mass[0]; while (currPtr < ptrOfNoReturn) { if (currPtr->mOverZ > precursor) { currPtr->intensity /= 2.5; } currPtr++; } /* Keep the flowers and remove the weeds. */ currPtr = &inMSDataList->mass[finalIonCount]; while (currPtr < ptrOfNoReturn) { thumbsDown = TRUE; if (TRUE == spareGoldenBoys && currPtr->normIntensity == 1 && currPtr->intensity > 0) { /* Spare the golden boys */ thumbsDown = FALSE; } else if (currPtr->mOverZ < 160) { /* Spare potential immonium ions */ for(i = 0; i < 15; i++) { if (currPtr->mOverZ <= immonium[i] + gParam.fragmentErr && currPtr->mOverZ >= immonium[i] - gParam.fragmentErr) { thumbsDown = FALSE; break; } } } if (TRUE == thumbsDown) { RemoveFromList(currPtr - &inMSDataList->mass[0], inMSDataList); ptrOfNoReturn--; currPtr--; } currPtr++; } /* Resort the MSData in order of increasing mass */ qsort(inMSDataList->mass,(size_t)inMSDataList->numObjects, (size_t)sizeof(tMSData),MassAscendSortFunc); /* Update the index values */ for (i = 0; i < inMSDataList->numObjects; i++) { inMSDataList->mass[i].index = i; } return; } /****************************countIons********************************************* * This function counts the number of ions in the linked list of structs of type * MSData. It returns a INT_4 corresponding to the number of ions counted. 
*/ INT_4 countIons(struct MSData *firstAvMassPtr) { struct MSData *currPtr; INT_4 count = 0; currPtr = firstAvMassPtr; while(currPtr != NULL) { if(currPtr->normIntensity == 0/* && currPtr->mOverZ > 146.5*/) { count++; } currPtr = currPtr->next; } return(count); } /****************************countIonsAgain********************************************* * This function counts the number of ions in the linked list of structs of type * MSData. It returns a INT_4 corresponding to the number of ions counted. */ INT_4 countIonsAgain(struct MSData *firstAvMassPtr) { struct MSData *currPtr; INT_4 count = 0; currPtr = firstAvMassPtr; while(currPtr != NULL) { currPtr = currPtr->next; count++; } return(count); } /****************************RemovePrecursors************************************** * * I don't see why I keep these ions around. Lets get rid of them here. * */ void RemovePrecursors(tMSDataList *inMSDataList) { tMSData *currPtr; tMSData *ptrOfNoReturn; REAL_4 precursor, precurMin2W, precurMinWA, precurMin2A, precurMinW, precurMinA; REAL_4 tolerance; currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects]; tolerance = gParam.fragmentErr; precursor = (gParam.peptideMW + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; precurMin2W = (gParam.peptideMW - WATER - WATER + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; precurMinWA = (gParam.peptideMW - WATER - AMMONIA + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; precurMin2A = (gParam.peptideMW - AMMONIA - AMMONIA + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; precurMinW = (gParam.peptideMW - WATER + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; precurMinA = (gParam.peptideMW - AMMONIA + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState; while (currPtr < ptrOfNoReturn) { if((currPtr->mOverZ <= precursor + tolerance && currPtr->mOverZ >= precursor - 
tolerance) || (currPtr->mOverZ <= precurMin2W + tolerance && currPtr->mOverZ >= precurMin2W - tolerance) || (currPtr->mOverZ <= precurMinWA + tolerance && currPtr->mOverZ >= precurMinWA - tolerance) || (currPtr->mOverZ <= precurMin2A + tolerance && currPtr->mOverZ >= precurMin2A - tolerance) || (currPtr->mOverZ <= precurMinW + tolerance && currPtr->mOverZ >= precurMinW - tolerance) || (currPtr->mOverZ <= precurMinA + tolerance && currPtr->mOverZ >= precurMinA - tolerance)) { if(currPtr->intensity > 1) { currPtr->intensity = 1; /*if there are too many ions, then the low intensity will cause this to be removed*/ } else { currPtr->intensity = 0; /*usually intensity is a large integer, but in case it ever becomes a REAL_4 between 0 and 1, I'll stick this condition in here - it probably won't ever be used.*/ } } currPtr++; } return; } /****************************EliminateBadHighMassIons****************************** * * This function removes high mass ions that could not be either b or y ions, via the * loss of one or two amino acids. Since ions may be due to the loss of three amino acids * this function stops eliminating ions that could be due to the loss of one glycine * and two alanines. The combination of three glycines matches one Asn and one Gly. 
* */ void EliminateBadHighMassIons(tMSDataList *inPeakList) { tMSData *currPtr = NULL; tMSData *windowStartPtr = NULL; tMSData *ptrOfNoReturn = NULL; REAL_4 stopSearch, bIon, yIon; INT_4 i, j; char test; if(gParam.peptideMW > gParam.monoToAv) return; /*this function designed for monoisotopic masses only*/ stopSearch = gParam.peptideMW + gElementMass[HYDROGEN] - 2 * gMonoMass[A] - gMonoMass[G] + gParam.fragmentErr; /* make all indexes positive; bad ions are later given negative index values*/ currPtr = &inPeakList->mass[0]; ptrOfNoReturn = &inPeakList->mass[inPeakList->numObjects]; while(currPtr < ptrOfNoReturn) { currPtr->index = 1; currPtr++; } /*look at ions from high mass to low mass*/ currPtr = &inPeakList->mass[ (inPeakList->numObjects) - 1 ]; ptrOfNoReturn = &inPeakList->mass[0]; while(currPtr >= ptrOfNoReturn && currPtr->mOverZ > stopSearch) { test = TRUE; for(i = 0; i < gAminoAcidNumber; i++) { bIon = gParam.peptideMW - gParam.modifiedCTerm - gMonoMass[i]; if(currPtr->mOverZ <= bIon + gParam.fragmentErr && currPtr->mOverZ >= bIon - gParam.fragmentErr) { test = FALSE; } yIon = gParam.peptideMW + gElementMass[HYDROGEN] - gMonoMass[i]; if(currPtr->mOverZ <= yIon + gParam.fragmentErr && currPtr->mOverZ >= yIon - gParam.fragmentErr) { test = FALSE; } for(j = 0; j < gAminoAcidNumber; j++) { bIon = gParam.peptideMW - gParam.modifiedCTerm - gMonoMass[i] - gMonoMass[j]; if(currPtr->mOverZ <= bIon + gParam.fragmentErr && currPtr->mOverZ >= bIon - gParam.fragmentErr) { test = FALSE; } yIon = gParam.peptideMW + gElementMass[HYDROGEN] - gMonoMass[i] - gMonoMass[j]; if(currPtr->mOverZ <= yIon + gParam.fragmentErr && currPtr->mOverZ >= yIon - gParam.fragmentErr) { test = FALSE; } } } if(test) currPtr->index = -1; currPtr--; } /*get rid of the peaks with neg indexes*/ currPtr = &inPeakList->mass[0]; ptrOfNoReturn = &inPeakList->mass[inPeakList->numObjects]; while(currPtr < ptrOfNoReturn) { if(currPtr->index == -1) { RemoveFromList(currPtr - &inPeakList->mass[0], 
inPeakList); ptrOfNoReturn--; currPtr--; } currPtr++; } return; } /****************************WindowFilter****************************************** * * Next the program checks to see if there are too many ions clustered together. It does * this by counting the number of ions within windows of width 120 Da and making sure that * only a certain number of ions (ionsPerWindow) are present within any given window. If * there are too many ions, it throws out those with the lowest intensity. */ void WindowFilter(tMSDataList *inMSDataList) { tMSData *currPtr = NULL; tMSData *windowStartPtr = NULL; tMSData *ptrOfNoReturn = NULL; INT_4 ionsInWindow; INT_4 endingMass; INT_4 charge; INT_4 ionsRemoved; REAL_4 nextChargeWindowStart; if(gParam.maxent3) { charge = 1; } else { charge = gParam.chargeState; /*This is decremented to 1.*/ } nextChargeWindowStart = (gParam.peptideMW + charge) / charge; /* Find the start of the first window at a mass greater than 176 Da. * Below this mass there is no filtering of ions. y1 for arg is 175 */ currPtr = &inMSDataList->mass[0]; ptrOfNoReturn = &inMSDataList->mass[ inMSDataList->numObjects ]; while (currPtr < ptrOfNoReturn && currPtr->mOverZ < 176) currPtr++; if (currPtr == ptrOfNoReturn) return; windowStartPtr = currPtr; /* If the mass of the window exceeds the nextWindowStart, then the * value of charge is decremented and a nextWindowStart is calculated. * Charge can never be less than one. The variable 'nextWindowStart' * contains the m/z of the point where the window width should be changed. * For example, if the chargeState is 2 then the region below the precursor * ion has a width of SPECTRAL_WINDOW_WIDTH / 2, and the region above has * a window width of SPECTRAL_WINDOW_WIDTH / 1. 
*/ while (charge > 0 && currPtr < ptrOfNoReturn && windowStartPtr < ptrOfNoReturn) { ionsInWindow = 0; if (currPtr->mOverZ > nextChargeWindowStart) { charge--; if (charge == 0) break; nextChargeWindowStart = (gParam.peptideMW + charge) / charge; } endingMass = currPtr->mOverZ + (REAL_4)(SPECTRAL_WINDOW_WIDTH / charge); /* Ions are counted up to the endingMass value. */ while (currPtr < ptrOfNoReturn && currPtr->mOverZ < endingMass) { /* Only count ions that are not goldenBoys */ if (currPtr->normIntensity == 0) ionsInWindow++; currPtr++; } if (ionsInWindow > gParam.ionsPerWindow) { /* If there are too many ions, then purge them. */ ionsRemoved = PurgeTheWindow(inMSDataList, windowStartPtr, ionsInWindow, endingMass); /*currPtr -= ionsRemoved; */ ptrOfNoReturn -= ionsRemoved; /*windowStartPtr += gParam.ionsPerWindow;*/ /*Fixed by RSJ*/ } /*Fixed by RSJ*/ /*else /*Fixed by RSJ*/ /*{ /*Fixed by RSJ*/ /*windowStartPtr = currPtr; /*Fixed by RSJ*/ /*}*/ windowStartPtr++; currPtr = windowStartPtr; } return; } /****************************PurgeTheWindow**************************************** * * This function finds the lowest intensity ions within a particular m/z window, * and purges from the linked list of mass spectral CID data those ions of lowest * intensity. It relinks the list and free's the space that is no longer used. 
*/
INT_4 PurgeTheWindow(tMSDataList *inMSDataList, tMSData *windowStartPtr,
                     INT_4 ionsInWindow, INT_4 endingMass)
{
    tMSData *currPtr = NULL;
    tMSData *ptrOfNoReturn = NULL;
    INT_4 excessIonNum, smallestIntensity, indexToRemove;
    REAL_4 precursor, precurMinW, precurMinA, precurMin2W, precurMin2A, precurMinWA;
    REAL_4 tolerance;

    /* The neutral losses are taken from the peptide mass before converting to m/z,
     * the same way RemovePrecursors() does it. Subtracting them from the precursor
     * m/z and dividing by the charge again only gives the right answer for singly
     * charged precursors.
     */
    precursor   = (gParam.peptideMW + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState;
    precurMinW  = (gParam.peptideMW - WATER + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState;
    precurMinA  = (gParam.peptideMW - AMMONIA + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState;
    precurMin2W = (gParam.peptideMW - (2 * WATER) + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState;
    precurMin2A = (gParam.peptideMW - (2 * AMMONIA) + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState;
    precurMinWA = (gParam.peptideMW - WATER - AMMONIA + (gParam.chargeState * gElementMass[HYDROGEN])) / gParam.chargeState;

    excessIonNum = ionsInWindow - gParam.ionsPerWindow;
    tolerance = gParam.fragmentErr;

    currPtr = windowStartPtr;
    ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects];

    /* Whack any precursor ions first */
    if ((currPtr->mOverZ < precurMin2W - tolerance && endingMass > precurMin2W - tolerance)
        || (currPtr->mOverZ < precursor + tolerance && endingMass > precursor + tolerance))
    {
        while (currPtr < ptrOfNoReturn && (currPtr->mOverZ < endingMass))
        {
            if (currPtr->mOverZ > precurMin2W - tolerance)
            {
                if ((currPtr->mOverZ >= precursor - tolerance && currPtr->mOverZ <= precursor + tolerance)
                    || (currPtr->mOverZ >= precurMinW - tolerance && currPtr->mOverZ <= precurMinW + tolerance)
                    || (currPtr->mOverZ >= precurMinA - tolerance && currPtr->mOverZ <= precurMinA + tolerance)
                    || (currPtr->mOverZ >= precurMin2W - tolerance && currPtr->mOverZ <= precurMin2W + tolerance)
                    || (currPtr->mOverZ >= precurMin2A - tolerance && currPtr->mOverZ <= precurMin2A + tolerance)
                    || (currPtr->mOverZ >= precurMinWA - tolerance && currPtr->mOverZ <= precurMinWA + tolerance))
                {
                    RemoveFromList(currPtr - &inMSDataList->mass[0], inMSDataList);
                    ptrOfNoReturn--;
                    currPtr--;
                    excessIonNum--;
                }
            }
            currPtr++;
        }
    }

    /* Now shed remaining excess ions.
    */
    while (excessIonNum > 0)
    {
        currPtr = windowStartPtr;
        ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects];

        /* Don't wipe a golden boy. The bounds test comes first so that
         * currPtr is never dereferenced past the end of the list.
         */
        while (currPtr < ptrOfNoReturn && currPtr->normIntensity == 1) currPtr++;
        if (currPtr == ptrOfNoReturn || currPtr->mOverZ > endingMass) break;

        smallestIntensity = currPtr->intensity;
        indexToRemove = currPtr - &inMSDataList->mass[0];

        while (currPtr < ptrOfNoReturn && (currPtr->mOverZ < endingMass))
        {
            if (currPtr->normIntensity != 1 && currPtr->intensity < smallestIntensity)
            {
                smallestIntensity = currPtr->intensity;
                indexToRemove = currPtr - &inMSDataList->mass[0];
            }
            currPtr++;
        }

        RemoveFromList(indexToRemove, inMSDataList);
        excessIonNum--;
    }

    /* Return the number of ions removed. */
    return((ionsInWindow - gParam.ionsPerWindow) - excessIonNum);
}

/***********************AddTheIonOffset********************************************
*
*   This function adds ionOffset (a user controlled variable) to the m/z field of
*   every ion in a tMSDataList.
*
*/
void AddTheIonOffset(tMSDataList *inMSDataList)
{
    tMSData *currPtr;
    tMSData *ptrOfNoReturn;

    if (gParam.ionOffset == 0) return;

    currPtr = &inMSDataList->mass[0];
    ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects];

    while (currPtr < ptrOfNoReturn)
    {
        currPtr->mOverZ = currPtr->mOverZ + gParam.ionOffset;
        currPtr++;
    }

    return;
}

/***********************CheckTheIntensity******************************************
*
*   This function starts adding up the total intensity of the ions in the
*   tMSDataList that is passed in.
*   If the sum exceeds 2 billion, then all of the ions are attenuated ten fold.
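A minimal sketch of the attenuation rule described above, assuming 32-bit
intensities in a plain array and a 64-bit running sum so the total cannot
overflow while it is being accumulated. The name attenuate_to_limit and its
parameters are hypothetical and are not part of Lutefisk.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: divide every intensity by ten, repeatedly, until
// the summed intensity is no larger than 'limit'.
static void attenuate_to_limit(int32_t *intensity, size_t n, int64_t limit)
{
    if (limit < 0) return;   // a negative limit could never be satisfied

    for (;;) {
        int64_t sum = 0;
        size_t i;

        // The 64-bit accumulator holds any sum of 32-bit values that
        // would realistically fit in memory.
        for (i = 0; i < n; i++) sum += intensity[i];
        if (sum <= limit) return;

        for (i = 0; i < n; i++) intensity[i] /= 10;
    }
}
```

Calling it with a limit of 2000000000 on two ions of 1500000000 each leaves
both at 150000000.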
*/
void CheckTheIntensity(tMSDataList *inMSDataList)
{
    INT_4 intensitySum = 0;
    tMSData *currPtr;
    tMSData *ptrOfNoReturn;

    currPtr = &inMSDataList->mass[0];
    ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects];

    while (currPtr < ptrOfNoReturn)
    {
        /* Test before adding, so that intensitySum itself can never overflow. */
        if(currPtr->intensity > 2000000000 - intensitySum)
        {
            currPtr = &inMSDataList->mass[0];
            while (currPtr < ptrOfNoReturn)
            {
                currPtr->intensity = currPtr->intensity / 10;
                currPtr++;
            }
            currPtr = &inMSDataList->mass[0];    /*Reset the currPtr to the beginning.*/
            intensitySum = 0;    /*Reinitialize the summed intensity to zero.*/
            continue;    /*Start summing again from the first ion.*/
        }
        intensitySum += currPtr->intensity;
        currPtr++;    /*If nothing happens, then move on to the next ion.*/
    }

    return;
}

/***********************SortByMass*************************************************
*
*   This function takes in a pointer to the first element in the linked list of
*   structures of type MSData that contains the smoothed and average mass values
*   plus intensities. It sorts the contents by mass - the lowest mass ions
*   are placed first in the list.
*/
void SortByMass(struct MSData *firstAvMassPtr)
{
    struct MSData *currPtr, *testPtr;
    INT_4 lowMassIntensity, highMassIntensity;
    char test;
    REAL_4 lowMOverZ, highMOverZ;

    currPtr = firstAvMassPtr;    /*Two pointers are compared, currPtr is supposed to contain the lowest mass ion.*/

    while(currPtr != NULL)    /*If currPtr is NULL then the end of the list has been reached.*/
    {
        testPtr = currPtr->next;    /*testPtr contains the high mass ion.*/
        test = TRUE;
        while(testPtr != NULL)
        {
            if(testPtr->mOverZ < currPtr->mOverZ)    /*Make the comparison.*/
            {
                lowMassIntensity = testPtr->intensity;    /*Store the ion values in temporaries.*/
                lowMOverZ = testPtr->mOverZ;
                highMassIntensity = currPtr->intensity;
                highMOverZ = currPtr->mOverZ;
                currPtr->intensity = lowMassIntensity;    /*Swap the values back into the structs.*/
                currPtr->mOverZ = lowMOverZ;
                testPtr->intensity = highMassIntensity;
                testPtr->mOverZ = highMOverZ;
                test = FALSE;    /*Do not allow the currPtr to be incremented.*/
                break;    /*Break to the outer loop.*/
            }
            testPtr = testPtr->next;    /*Get set to test the next ion.*/
        }
        if(test)    /*If no testPtr ion was found to be lower in mass, then test is still TRUE.
                      Proceed to the next ion up. If there was a rearrangement, then test is
                      FALSE and I need to retest the new value that was swapped in.*/
        {
            currPtr = currPtr->next;
        }
    }
    return;
}

/***********************SmoothCID**************************************
*
*   SmoothCID is a five point digital filter that uses Finnigan's coefficients,
*   which are 13, 27, 37, 27, and 13. The first two data points are not smoothed,
*   and as the loop below is written the last three are left unsmoothed as well.
*
*/
void SmoothCID(tMSDataList *inMSDataList)
{
    tMSData *currPtr;
    tMSData *ptrOfNoReturn;
    tMSData *ptrOne, *ptrTwo, *ptrThree, *ptrFour, *ptrFive;    /*These are the five points.*/
    INT_4 smoothedDataPoint;    /*This INT_4 is used to calculate the smoothed data point.*/

    if (inMSDataList->numObjects < 6) return;

    currPtr = &inMSDataList->mass[0];
    ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects];

    /* Initialize the five points.
*/ ptrOne = currPtr; ptrTwo = currPtr + 1; ptrThree = currPtr + 2; ptrFour = currPtr + 3; ptrFive = currPtr + 4; currPtr = currPtr + 5; /* Do the five point average. currPtr always points to the next data point * following the five * that are currently being smoothed. If currPtr is NULL then the smoothing is stopped. */ while(currPtr < ptrOfNoReturn) { if((ptrOne->intensity > 7000000) || (ptrTwo->intensity > 7000000) || (ptrThree->intensity > 7000000) || (ptrFour->intensity > 7000000) || (ptrFive->intensity > 7000000)) /*if intensity exceeds INT_4 int*/ { ptrOne->intensity = ptrOne->intensity * 0.001; ptrTwo->intensity = ptrTwo->intensity * 0.001; ptrThree->intensity = ptrThree->intensity * 0.001; ptrFour->intensity = ptrFour->intensity * 0.001; ptrFive->intensity = ptrFive->intensity * 0.001; smoothedDataPoint = (13 * ptrOne->intensity) + (27 * ptrTwo->intensity) + (37 * ptrThree->intensity) + (27 * ptrFour->intensity) + (13 * ptrFive->intensity); smoothedDataPoint = smoothedDataPoint / 117; smoothedDataPoint = smoothedDataPoint * 1000; ptrOne->intensity = ptrOne->intensity * 1000; ptrTwo->intensity = ptrTwo->intensity * 1000; ptrThree->intensity = ptrThree->intensity * 1000; ptrFour->intensity = ptrFour->intensity * 1000; ptrFive->intensity = ptrFive->intensity * 1000; } else { smoothedDataPoint = (13 * ptrOne->intensity) + (27 * ptrTwo->intensity) + (37 * ptrThree->intensity) + (27 * ptrFour->intensity) + (13 * ptrFive->intensity); smoothedDataPoint = smoothedDataPoint / 117; } ptrThree->intensity = smoothedDataPoint; ptrOne = ptrTwo; /* Shift the five points up by one position.*/ ptrTwo = ptrThree; ptrThree = ptrFour; ptrFour = ptrFive; ptrFive = currPtr; currPtr++; } return; } /**********************FreeAllMSData*************************************************** * * An alternate way to free the data in a linked list. Here the structs are free'ed * in non-reverse order. I'll see if this crashes. 
*/
void FreeAllMSData(struct MSData *currPtr)
{
    struct MSData *freeMePtr;

    while(currPtr != NULL)
    {
        freeMePtr = currPtr;
        currPtr = currPtr->next;
        free(freeMePtr);
    }
    return;
}

/******************LoadMSDataStruct********************************************
*
*   LoadMSDataStruct puts mass to charge and intensity values into the appropriate struct field
*   and returns the pointer to that struct.
*
*/
struct MSData *LoadMSDataStruct(REAL_4 massValue, INT_4 ionIntensity)
{
    struct MSData *currPtr = NULL;

    currPtr = (struct MSData *)malloc(sizeof(struct MSData));
    if(currPtr == NULL)
    {
        printf("LoadMSDataStruct: Out of memory\n");
        exit(1);
    }
    currPtr->mOverZ = massValue;
    currPtr->intensity = ionIntensity;
    currPtr->normIntensity = 0;
    currPtr->next = NULL;

    return(currPtr);
}

/****************AddToListNoNull**********************************************************
*
*   AddToListNoNull adds m/z and intensity values in an MSData struct to a linked list.
*   The first parameter is a pointer to a struct
*   of type MSData containing the first bit of data in the list. The second parameter
*   is a pointer to a struct of type MSData
*   containing a new piece of data to be added to the list. The end of this linked
*   list is not signaled by the presence of a zero in the next field.
*
*/
struct MSData *AddToListNoNull(struct MSData *firstPtr, struct MSData *currPtr)
{
    struct MSData *lastPtr;

    lastPtr = firstPtr;

    if(firstPtr == NULL)
        firstPtr = currPtr;
    else
    {
        while(lastPtr->next != NULL)
            lastPtr = lastPtr->next;
        lastPtr->next = currPtr;
    }

    return(firstPtr);
}

/****************AddToCIDList**********************************************************
*
*   AddToCIDList adds m/z and intensity values in an MSData struct to a linked list.
*   The first parameter is a pointer to a struct
*   of type MSData containing the first bit of data in the list. The second parameter
*   is a pointer to a struct of type MSData
*   containing a new piece of data to be added to the list.
*   The end of this linked
*   list is signaled by the presence of a zero in the
*   next field.
*
*/
struct MSData *AddToCIDList(struct MSData *firstPtr, struct MSData *currPtr)
{
    if(firstPtr == NULL)
    {
        firstPtr = currPtr;
    }
    else
    {
        gLastDataPtr->next = currPtr;
    }

    gLastDataPtr = currPtr;

    return(firstPtr);
}

/***************ModifyList**********************************************************
*   Replaces most recently added element in a linked list with a new element. The
*   first parameter is a pointer to a struct of
*   type MSData containing the first bit of data in the list. The second parameter
*   is a pointer to a struct of type MSData
*   containing a new piece of data to replace the last element in the list. NULL
*   is placed in the next field of this struct element. A list holding fewer than
*   two elements is left unchanged.
*
*/
void ModifyList(struct MSData *firstPtr, struct MSData *currPtr)
{
    struct MSData *lastPtr, *nextToLastPtr;

    lastPtr = firstPtr;

    if(firstPtr == NULL)
    {
        firstPtr = currPtr;    /*firstPtr is passed by value, so the caller's pointer is not changed.*/
        return;
    }
    if(firstPtr->next == NULL)
    {
        return;
    }
    else
    {
        while(lastPtr->next != NULL)
        {
            nextToLastPtr = lastPtr;
            lastPtr = lastPtr->next;
        }
        free(lastPtr);
        nextToLastPtr->next = currPtr;
    }
    currPtr->next = NULL;
    gLastDataPtr = currPtr;
}

/***************FindThreshold*****************************************
*
*   FindThreshold sums the intensities of all of the ions in the list
*   of tMSData structs whose intensity is not zero. This sum
*   is then divided by the total number of non-zero ions present to give the
*   average intensity (treated as the noise level). The average is multiplied
*   by the user input "ionThreshold" and returned as an INT_4.
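A minimal sketch of that calculation on a plain array, assuming the
intensities are held as long values and the multiplier is passed in
directly. The name mean_threshold and its parameters are hypothetical and
are not part of Lutefisk; in the function below the multiplier comes from
gParam.ionThreshold.

```c
#include <stddef.h>

// Hypothetical helper: mean of the non-zero intensities times
// 'multiplier'. Returns 0 when there are no non-zero points, and never
// returns 0 otherwise (a computed 0 is bumped to 1).
static long mean_threshold(const long *intensity, size_t n, double multiplier)
{
    double sum = 0.0;
    size_t nonZero = 0;
    size_t i;
    long signal;

    for (i = 0; i < n; i++) {
        if (intensity[i] != 0) {   // skip the zero intensity data points
            sum += (double)intensity[i];
            nonZero++;
        }
    }
    if (nonZero == 0) return 0;    // nothing to average

    signal = (long)((sum / (double)nonZero) * multiplier);
    return signal == 0 ? 1 : signal;
}
```

For intensities 10, 0, 20, 0, 30 and a multiplier of 0.5, the mean of the
three non-zero points is 20 and the threshold is 10.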
*
******/
INT_4 FindThreshold(tMSDataList *inMSDataList)
{
    tMSData *currPtr;
    tMSData *ptrOfNoReturn;
    INT_4 signal = 0;
    INT_4 numOfNonZeroPts = 0;
    REAL_4 intensitySum = 0;
    REAL_4 noise = 0;

    if (inMSDataList->numObjects != 0)
    {
        currPtr = &inMSDataList->mass[0];
        ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects];
        while (currPtr < ptrOfNoReturn)
        {
            if(currPtr->intensity != 0) /*dont count the zero intensity data points*/
            {
                numOfNonZeroPts++;
                intensitySum += currPtr->intensity; /*Add the intensity.*/
            }
            currPtr++; /*Increment to the next data point.*/
        }

        if(numOfNonZeroPts != 0) /*avoid dividing by zero when every point has zero intensity*/
        {
            noise = intensitySum / numOfNonZeroPts;
        }
        printf("Average signal (noise) = %.2f\n",noise); /*debug*/
        printf("Data points = %ld\n", (long)numOfNonZeroPts); /*debug*/

        signal = noise * gParam.ionThreshold;
        if(signal == 0) signal = 1; /*added 11/10/98 for qtof data*/
    }
    return(signal);
}

/***********************************LowMassIonRemoval****************************************
*
* Low mass ions that could not be due to amino acid immonium ions and other small fragments
* derived from amino acids are removed. For tryptic peptides, ions in the mass range below
* y1 for Lys (m/z 147) are removed, and for other cleavages, ions below m/z 90 (y1 for Ala)
* are eliminated. As implemented, only ions below m/z 74.5 are examined, and those within
* gParam.fragmentErr of a mass in lowMassIons are kept.
*
*/
void LowMassIonRemoval(tMSDataList *inMSDataList)
{
    REAL_4 lowMassIons[23] = {44.0500, 70.0657, 87.0922, 112.0875,
                              87.0558, 88.0399, 102.0555, 84.0450,
                              101.0715, 129.0664, 110.0718, 86.0970,
                              86.0970, 84.0814, 101.1079, 129.1028,
                              104.0534, 120.0813, 70.0657, 60.0449,
                              74.0606, 136.0762, 72.0813 };
    INT_4 i;
    tMSData *currPtr;
    tMSData *ptrOfNoReturn;
    BOOLEAN thumbsDown;

    currPtr = &inMSDataList->mass[0];
    ptrOfNoReturn = &inMSDataList->mass[inMSDataList->numObjects];
    while (currPtr < ptrOfNoReturn)
    {
        if(currPtr->mOverZ < 74.5)
        {
            thumbsDown = TRUE;
            /* Spare potential immonium ions */
            for(i = 0; i < 23; i++)
            {
                if (currPtr->mOverZ <= lowMassIons[i] + gParam.fragmentErr
                    && currPtr->mOverZ >= lowMassIons[i] - gParam.fragmentErr)
                {
                    thumbsDown = FALSE;
                    break;
                }
            }
            if (TRUE == thumbsDown)
            {
                RemoveFromList(currPtr - &inMSDataList->mass[0], inMSDataList);
                ptrOfNoReturn--;
                currPtr--;
            }
        }
        currPtr++;
    }

    return;
}

/***********************************IonCondenser**********************************************
*
* This function was originally designed with centroided data in mind, but it seemed to work
* well for profile data, too. The idea is that groups of ions that are above the 'threshold'
* of ion intensity, and are close in m/z (as defined by 'peakWidth'), are condensed into a
* single peak whose m/z is the intensity-weighted average of the group. The value of 'precursor'
* is the m/z value of the precursor ion, and it is used to determine when to switch the
* threshold value to one-half of what was read from the file Lutefisk.params.
* */ tMSDataList *IonCondenser(tMSDataList *inMSDataList) { INT_4 i, j; REAL_4 halfWidth = gParam.peakWidth/2; tMSDataList * intensityOrderedList = NULL; tMSDataList * peakList = NULL; tMSData potentialPeak; tMSData peak; tMSData msData; REAL_8 intensitySum = 0.0; INT_4 prevIntensity = 0; REAL_4 avgIonMass = 0; /* Add index values */ for (i = 0; i < inMSDataList->numObjects; i++) { inMSDataList->mass[i].index = i; } intensityOrderedList = (tMSDataList *) CopyList( inMSDataList ); if (!intensityOrderedList) { printf("Ran out of memory in IonCondenser()!\n"); exit(1); } /* Sort the intensityOrderedList in order of decreasing intensity */ qsort(intensityOrderedList->mass,(size_t)intensityOrderedList->numObjects, (size_t)sizeof(tMSData),IntensityDescendSortFunc); peakList = (tMSDataList *) CreateNewList( sizeof(tMSData), 1000, 1000 ); if (!peakList) { printf("Ran out of memory in IonCondenser()!\n"); exit(1); } /* Find all the peak tops that are at least a peakWidth apart. */ for (i = 0; i < intensityOrderedList->numObjects; i++) { potentialPeak = intensityOrderedList->mass[i]; /* Quit when we hit the zero intensity points */ if (potentialPeak.intensity <= 0) break; /* Get rid of ions below mass 43 (44 is the immonium ion of alanine). */ if (potentialPeak.mOverZ < 43.0) continue; /* Don't try to make a peak out of an ion whose intensity is less than the threshold. */ if (potentialPeak.intensity < gParam.intThreshold) { if((potentialPeak.mOverZ > 43.0 && potentialPeak.mOverZ < 147.5) || (potentialPeak.mOverZ > 174.5 && potentialPeak.mOverZ < 175.5) || (potentialPeak.mOverZ > 158.5 && potentialPeak.mOverZ < 159.5)) { /* Let immoniums thru even if they are low intensity */ } else continue; } /* Would the potential peak overlap the domain of an existing peak? */ for (j = 0; j < peakList->numObjects; j++) { peak = peakList->mass[j]; if (potentialPeak.mOverZ >= peak.mOverZ - gParam.peakWidth && potentialPeak.mOverZ <= peak.mOverZ + gParam.peakWidth) break; /**Fixed by RSJ. 
Try to keep peaks closer to 1 Da apart.**/ if(gParam.peakWidth < 0.8) { if (potentialPeak.mOverZ >= peak.mOverZ - 0.8 && potentialPeak.mOverZ <= peak.mOverZ + 0.8) break; } } if (j == peakList->numObjects) { /* Add a new peak. */ intensitySum = 0; avgIonMass = 0; /* Calculate weighted average mass for the peak. */ /* Leading edge... */ prevIntensity = potentialPeak.intensity; j = potentialPeak.index; while (j >= 0) { msData = inMSDataList->mass[j]; if (msData.mOverZ < potentialPeak.mOverZ - halfWidth) break; if (msData.intensity >= gParam.intThreshold) { intensitySum += msData.intensity; avgIonMass += (msData.intensity * msData.mOverZ); } else if (prevIntensity >= msData.intensity && ((msData.mOverZ > 43.0 && msData.mOverZ < 147.5) || (msData.mOverZ > 174.5 && msData.mOverZ < 175.5) || (msData.mOverZ > 158.5 && msData.mOverZ < 159.5))) { /* else if (prevIntensity >= msData.intensity || (msData.mOverZ > 43.0 && msData.mOverZ < 147.5) || (msData.mOverZ > 174.5 && msData.mOverZ < 175.5) || (msData.mOverZ > 158.5 && msData.mOverZ < 159.5)) { */ intensitySum += msData.intensity; avgIonMass += (msData.intensity * msData.mOverZ); } prevIntensity = msData.intensity; j--; } /* Trailing edge... 
*/ prevIntensity = potentialPeak.intensity; j = potentialPeak.index + 1; while (j < inMSDataList->numObjects) { msData = inMSDataList->mass[j]; if (msData.mOverZ > potentialPeak.mOverZ + halfWidth) break; if (msData.intensity >= gParam.intThreshold) { intensitySum += (INT_4)msData.intensity; avgIonMass += (REAL_4)(msData.intensity * msData.mOverZ); } else if (prevIntensity >= msData.intensity && ((msData.mOverZ > 43.0 && msData.mOverZ < 147.5) || (msData.mOverZ > 174.5 && msData.mOverZ < 175.5) || (msData.mOverZ > 158.5 && msData.mOverZ < 159.5))) { /* else if (prevIntensity >= msData.intensity || (msData.mOverZ > 43.0 && msData.mOverZ < 147.5) || (msData.mOverZ > 174.5 && msData.mOverZ < 175.5) || (msData.mOverZ > 158.5 && msData.mOverZ < 159.5)) { */ intensitySum += msData.intensity; avgIonMass += (msData.intensity * msData.mOverZ); } prevIntensity = msData.intensity; j++; } if (intensitySum == 0) { printf("ionCondenser(): intensitySum = 0.\n" "Threshold: %d, potential peak: %6.2f %d (index %d)\n", gParam.intThreshold, potentialPeak.mOverZ, potentialPeak.intensity, potentialPeak.index); /* Leading edge... */ prevIntensity = potentialPeak.intensity; j = potentialPeak.index; while (j >= 0) { msData = inMSDataList->mass[j]; printf("Leading: %6.2f %d (%d)\n", msData.mOverZ, msData.intensity, msData.index); if (msData.mOverZ < potentialPeak.mOverZ - halfWidth) break; j--; } /* Trailing edge... 
*/
                prevIntensity = potentialPeak.intensity;
                j = potentialPeak.index + 1;
                while (j < inMSDataList->numObjects)
                {
                    msData = inMSDataList->mass[j];
                    printf("Trailing: %6.2f %d (%d)\n", msData.mOverZ, msData.intensity, msData.index);
                    if (msData.mOverZ > potentialPeak.mOverZ + halfWidth) break;
                    j++;
                }
                /**************/
                for (j = 0; j < inMSDataList->numObjects; j++)
                {
                    inMSDataList->mass[j].index = j;
                    printf("AFTER: %6.2f %d\n", inMSDataList->mass[j].mOverZ, inMSDataList->mass[j].intensity);
                }
                printf("pointer: %p\n", (void *)inMSDataList);
                /***********/
                exit(1);
            }
            potentialPeak.mOverZ = avgIonMass / intensitySum;

            /* Now double check to be sure that the mass still is not
               within a peakwidth of another peak. */
            for (j = 0; j < peakList->numObjects; j++)
            {
                peak = peakList->mass[j];
                if (potentialPeak.mOverZ >= peak.mOverZ - gParam.peakWidth
                    && potentialPeak.mOverZ <= peak.mOverZ + gParam.peakWidth) break;
            }
            if (j == peakList->numObjects)
            {
                /* Add a new peak. */
                if(!AddToList(&potentialPeak, peakList))
                {
                    printf("Ran out of memory in IonCondenser()!\n");
                    exit(1);
                }
            }
        }
    }

    if (intensityOrderedList) DisposeList(intensityOrderedList);

    /* Resort the MSData in order of increasing mass */
    qsort(peakList->mass,(size_t)peakList->numObjects,
          (size_t)sizeof(tMSData),MassAscendSortFunc);

    /* Add index values */
    for (i = 0; i < peakList->numObjects; i++)
    {
        peakList->mass[i].index = i;
        peakList->mass[i].normIntensity = 0;
    }

    return peakList;
}

/* -------------------------------------------------------------------------
// my_fgets
*/
char *my_fgets(char *s, INT_4 n, FILE *fp)
{
    register char *t = s;
    register long c;

    if (n < 1)
        return(NULL);

    while (--n)
    {
        if ((c = getc(fp)) < 0)
        {
            if (feof(fp) && t != s)
                break;
            return(NULL);
        }

        *t++ = c;
        if (c == '\n' || c == '\32')
            break;
        /* This is to handle stupid windows files that end each line with \r\n */
        else if (c == '\r')
        {
            if ((c = getc(fp)) < 0)
            {
                if (feof(fp) && t != s)
                    break;
                return(NULL);
            }
            if (c != '\n')
            {
                /* Roll back the file pointer or we will miss
                   characters. */
                fseek(fp, -1, SEEK_CUR);
            }
            break;
        }
    }

    *t = '\0';
    return(s);
}

#ifdef DEBUG
/**************************** DumpMassList **************************************
*
* For debugging purposes only.
*
*/
void DumpMassList(struct MSData *firstPtr)
{
    struct MSData *currPtr = NULL;
    INT_4 ionNum;
    INT_4 intensity[3000];
    REAL_4 mass[3000];

    /* Create mass and integerMass arrays from linked list data */
    ionNum = 0;
    currPtr = firstPtr;
    while(currPtr != NULL)
    {
        mass[ionNum] = currPtr->mOverZ;
        intensity[ionNum] = currPtr->intensity;
        ionNum++;
        if(ionNum >= 3000) break;
        currPtr = currPtr->next;
    }
}

/**************************** DumpMSData **************************************
*
* For debugging purposes only.
*
*/
void DumpMSData(tMSDataList *inList)
{
    INT_4 i;

    printf("\n\nMASS LIST:\n");
    for (i = 0; i < inList->numObjects; i++)
    {
        if (i >= 3000) break;
        printf("%ld %7.3f %ld\n", i, inList->mass[i].mOverZ, inList->mass[i].intensity);
    }
}
#endif
lutefisk-1.0.7+dfsg.orig/src/ListRoutines.h0000644000175000017500000000251410102256716020557 0ustar rusconirusconi
/* ------------------------------------------------------------------------------------
// ListRoutines.h -- definitions and prototypes for ListRoutines.c
//
// ------------------------------------------------------------------------------------ */
#pragma once

#ifndef __LISTROUTINES__
#define __LISTROUTINES__

#include "LutefiskDefinitions.h"

#ifdef __cplusplus
extern "C" {
#endif

#ifndef true
#define true 1
#endif

#ifndef false
#define false 0
#endif

#ifndef nil
#define nil 0
#endif

/* Generic list structures */
typedef char toObject;

typedef struct {
    INT_4 numObjects;
    INT_4 limit;
    INT_4 sizeofobject;
    INT_4 growNum;
    toObject *object;
} tlist;

typedef struct {
    INT_4 numObjects;
    INT_4 limit;
    INT_4 sizeofobject;
    INT_4 growNum;
    INT_4 *entry;
} tINT_2list;

/* Public function prototypes ---~---~---~---~---~---~---~---~---~---~---~---~---~ */
extern void *CreateNewList(INT_4 sizeofobject, INT_4 initialNum, INT_4 growNum);
extern
void *CopyList(void *voidlist); extern char AddToList(void *voidobject, void *voidlist); extern void RemoveFromList(INT_4 index, void *voidlist); extern void TrimList(void *voidlist); extern void DisposeList(void *voiddeadlist); #ifdef __cplusplus } #endif #endif /* __LISTROUTINES__ */ lutefisk-1.0.7+dfsg.orig/src/LutefiskPrototypes.h0000644000175000017500000005352010303627360022015 0ustar rusconirusconi/********************************************************************************************* Lutefisk is software for de novo sequencing of peptides from tandem mass spectra. Copyright (C) 1995 Richard S. Johnson This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. Contact: Richard S Johnson 4650 Forest Ave SE Mercer Island, WA 98040 jsrichar@alum.mit.edu *********************************************************************************************/ #ifndef __LUTEFISK__ #define __LUTEFISK__ #include "LutefiskDefinitions.h" /* Prototypes for LutefiskMain. 
*/ void Run(void); void ReadParamsFile(void); INT_4 ReadDetailsFile(void); void SetupGapList(); void ReadEdmanFile(); void FreeSequenceScore(struct SequenceScore *currPtr); void FreeMassList(struct MSData *currPtr); void FreeSequence(struct Sequence *currPtr); void CreateGlobalIntegerMassArrays(struct MSData *firstMassPtr); void ChangeOutputName(void); void FindTheMultiplier(void); static void BuildPgmState(int argc, char **argv); BOOLEAN SystemCheck(void); void ReadResidueFile(void); void SetupSequenceTag(); void PrintPartingGiftToFile(void); void PrintHeaderToFile(void); void AdjustPeptideMW(struct MSData *firstMassPtr); /*Prototypes for Haggis*/ struct Sequence *Haggis(struct Sequence *firstSequencePtr , struct MSData *firstMassPtr); BOOLEAN NodeStep(INT_4 *nodeNum, INT_4 *nodeMass); void StoreSeq(INT_4 nodeNum, INT_4 *nodeMass); INT_4 ResidueMass(INT_4 inputMass); INT_4 *LoadMassArrays(INT_4 *mass, struct MSData *firstMassPtr, INT_4 charge); void SetupBackwardAndForwardNodes(INT_4 *mass); void FindNodeSequences(INT_4 *mass); void GetSequenceOfResidues(INT_4 *mass); struct Sequence *LinkHaggisSubsequenceList(struct Sequence *firstPtr, struct Sequence *newPtr); struct Sequence *LoadHaggisSequenceStruct(INT_4 *peptide, INT_4 peptideLength, INT_4 score, INT_4 nodeValue, INT_4 gapNum, INT_2 nodeCorrection); void FleshOutSequenceEnds(struct MSData *firstMassPtr); void GetCTermMasses(INT_4 *cTermMassNum, INT_4 *cTermMasses); void GetNTermMasses(INT_4 *nTermMassNum, INT_4 *nTermMasses); void GetBestNtermSeq(INT_4 *sequenceToAdd, INT_4 mass, struct MSData *firstMassPtr); INT_4 RatchetHaggis(INT_4 *sequence, INT_4 residueNum, INT_4 position, INT_4 ratchetMass, INT_4 correctMass); void AppendSequences(void); void MakeAAArray(void); REAL_4 NSequenceScore(INT_4 mass, INT_4 *sequence, INT_4 residueNum, struct MSData *firstMassPtr); void GetBestCtermSeq(INT_4 *sequenceToAdd, INT_4 mass, char cTerm, struct MSData *firstMassPtr); char CheckCterm(struct MSData *firstMassPtr); 
REAL_4 CSequenceScore(INT_4 mass, INT_4 *sequence, INT_4 residueNum, struct MSData *firstMassPtr); void ModifyHaggisSequences(struct MSData *firstMassPtr); BOOLEAN FindBIon(INT_4 sequenceIndex,INT_4 residueIndex, struct MSData *firstMassPtr); BOOLEAN FindYIon(INT_4 sequenceIndex,INT_4 residueIndex, struct MSData *firstMassPtr); /* Prototypes for LutefiskGetCID.*/ tMSDataList *ReadCIDFile(char *inFilename); void AddTheIonOffset(tMSDataList *inMSDataList); void CalibrationCorrection(tMSDataList *inPeakList); void CentroidOrProfile(tMSDataList *inMSDataList); void CheckConnections(tMSDataList *inPeakList); void CheckSignalToNoise(tMSDataList *inPeakList, tMSDataList *inMSDataList); void CheckTheIntensity(tMSDataList *inMSDataList); void DefectCorrection(tMSDataList *inPeakList); void FindTheGoldenBoys(tMSDataList *inMSDataList); INT_4 FindThreshold(tMSDataList *inMSDataList); REAL_4 GetPeakWidth(tMSDataList *inMSDataList); void GuessAtTheFragmentPattern(); tMSDataList *IonCondenser(tMSDataList *inMSDataList); INT_4 IntensityDescendSortFunc(const void *n1, const void *n2); INT_4 MassAscendSortFunc(const void *n1, const void *n2); void NormalizeIntensity(tMSDataList *inMSDataList); INT_4 PurgeTheWindow(tMSDataList *inMSDataList, tMSData *windowStartPtr, INT_4 ionsInWindow, INT_4 endingMass/*, INT_4 maxIonsPerWindow*/); void RemoveIsotopes(tMSDataList *inMSDataList); void RemovePrecursors(tMSDataList *inMSDataList); void SmoothCID(tMSDataList *inMSDataList); void WeedTheIons(tMSDataList *inMSDataList, INT_4 finalIonCount, BOOLEAN spareGoldenBoys); void WindowFilter(tMSDataList *inMSDataList); void FreeAllMSData(struct MSData *currPtr); void free_all(struct MSData *s); struct MSData *LoadMSDataStruct(REAL_4 massValue, INT_4 ionIntensity); struct MSData *AddToCIDList(struct MSData *firstPtr, struct MSData *currPtr); void ModifyList(struct MSData *firstPtr, struct MSData *currPtr); INT_4 FindMedian(struct MSData *firstPtr); char *my_fgets(char *s, INT_4 n, FILE *fp); struct 
MSData *GetCidData(void); struct MSData *AddToListNoNull(struct MSData *firstPtr, struct MSData *currPtr); void SortByMass(struct MSData *firstAvMassPtr); INT_4 countIons(struct MSData *firstAvMassPtr); struct MSData *ZeroTheIons(struct MSData *firstAvMassPtr); INT_4 countIonsAgain(struct MSData *firstAvMassPtr); void EliminateBadHighMassIons(tMSDataList *inPeakList); void LowMassIonRemoval(tMSDataList *peakList); REAL_4 GeneralEval(tMSDataList *peakList); REAL_4 LowMassIonCheck(tMSDataList *peakList); REAL_4 HighMassIonCheck(tMSDataList *peakList); void FindBYGoldenBoys(tMSDataList *inMSDataList); #ifdef DEBUG void DumpMassList(struct MSData *firstPtr); void DumpMSData(tMSDataList *inList); #endif /*Prototypes for LutefiskMakeGraph.*/ void MakeSequenceGraph(struct MSData *firstMassPtr, SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 totalIonVal); void SequenceNodeInit(SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN); void TrypticTemplate(struct MSData *firstMassPtr, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN); void FindTrypticBIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeN); char IsThisPossible(REAL_4 bMass, INT_4 currentCharge); void FindTrypticB17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeN); void FindTrypticAIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeN, char *ionPresent); void FindTrypticA17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeN, char *ionPresent); void FindTrypticYIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeC); void FindTrypticY17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeC); void AddTag(SCHAR *sequenceNodeC, SCHAR *sequenceNodeN); void AddCTermResidue(SCHAR *sequenceNodeC, SCHAR *sequenceNodeN); char RatchetIt(INT_4 *aaNum, char cycle, char *sequence, INT_4 seqLength); void AddEdmanData(SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 totalIonVal); char IsThisStillPossible(REAL_4 bMass, INT_4 currentCharge, struct MSData *firstMassPtr); void 
TrypticLCQTemplate(struct MSData *firstMassPtr, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN); void FindTrypticLCQBIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeN); void FindTrypticLCQB17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeN); void FindTrypticLCQAIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeN, char *ionPresent); void FindTrypticLCQA17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeN, char *ionPresent); void FindTrypticLCQYIons(struct MSData *firstMassPtr, SCHAR *sequenceNodeC); void FindTrypticLCQY17Ions(struct MSData *firstMassPtr, SCHAR *sequenceNodeC); void RemoveSillyNodes(SCHAR *sequenceNodeC, SCHAR *sequenceNodeN); /*Prototypes for LutefiskSummedNode.*/ void SummedNodeScore(SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 *oneEdgeNodes, INT_4 *oneEdgeNodesIndex, INT_4 totalIonVal); void InitSummedNodeArrays(SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 *oneEdgeNodes, char *evidence); void AssignNodeValue(INT_4 nextNode, INT_4 currentNode, char *evidence, SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 totalIonVal); INT_4 FindCurrentNode(SCHAR *sequenceNode, INT_4 currentNode); void SortOneEdgeNodes(INT_4 *oneEdgeNodes, INT_4 *oneEdgeNodesIndex); void AddExtraNodes(SCHAR *sequenceNode, SCHAR *sequenceNodeN, SCHAR *sequenceNodeC, char *evidence); void AssignProNodeValue(INT_4 nextNode, INT_4 currentNode, char *evidence, SCHAR *sequenceNode, SCHAR *sequenceNodeC, SCHAR *sequenceNodeN, INT_4 totalIonVal); /*Prototypes for SubsequenceMaker.*/ struct Sequence *SubsequenceMaker(INT_4 *oneEdgeNodes, INT_4 oneEdgeNodesIndex, SCHAR *sequenceNode); struct Sequence *NterminalSubsequences(SCHAR *sequenceNode, INT_4 maxLastNode, INT_4 lowSuperNode, INT_4 highSuperNode); struct Sequence *LoadSequenceStruct(INT_4 *peptide, INT_4 peptideLength, INT_4 score, INT_4 nodeValue, INT_4 gapNum, INT_2 nodeCorrection); struct Sequence *LoadFinalSequenceStruct(INT_4 *peptide, INT_4 
peptideLength, INT_4 score, INT_4 nodeValue, INT_4 gapNum, INT_2 nodeCorrection); struct Sequence *LinkSubsequenceList(struct Sequence *firstPtr, struct Sequence *newPtr); struct Sequence *StoreSubsequences(struct Sequence *newSubsequencePtr, struct extension *extensionList, struct Sequence *currentSubsequence,INT_4 *lastNode, INT_4 lastNodeNum, INT_4 maxLastNode, INT_4 minLastNode, INT_4 *aaPresentMass, INT_4 *seqNum, INT_4 *subseqNum, SCHAR *sequenceNode); int ExtensionsSortDescend(const void *n1, const void *n2); struct extension *SortExtensions(struct extension *inExtensionList); struct extension Score2aaExtension(struct extension io2aaExtension, INT_4 startingNodeMass, INT_4 *oneEdgeNodes, INT_4 oneEdgeNodesIndex, BOOLEAN oneEdgeNNode, char sequenceNodeValue); struct Sequence *AddExtensions(struct Sequence *subsequencePtr, SCHAR *sequenceNode, INT_4 *oneEdgeNodes, INT_4 oneEdgeNodesIndex, INT_4 *aaPresentMass, INT_4 topSeqNum, INT_4 *lastNode, INT_4 lastNodeNum, INT_4 *seqNum, INT_4 maxLastNode, INT_4 minLastNode, INT_4 lowSuperNode, INT_4 highSuperNode); void FreeSequenceStructs(struct Sequence *s); struct Sequence *AlterSubsequenceList(struct Sequence *firstPtr, struct Sequence *newPtr); char CorrectMass(INT_4 *peptide, INT_4 peptideLength, INT_4 *aaPresentMass); void amIHere(INT_4 correctPeptideLength, struct Sequence *subsequencePtr); struct Sequence *LCQNterminalSubsequences(SCHAR *sequenceNode, INT_4 maxLastNode, INT_4 lowSuperNode, INT_4 highSuperNode); void ClearLowestScore(struct Sequence *newSubsequencePtr, INT_4 *subseqNum, INT_4 maxLastNode, INT_4 minLastNode); /*Prototypes for LutefiskScore.*/ struct Sequence *ScoreSequences(struct Sequence *finalSequencePtr, struct MSData *firstMassPtr); void LoadTheIonArrays(struct MSData *firstMassPtr, INT_4 *fragNum, INT_4 *fragMOverZ, INT_4 *fragIntensity); INT_4 TotalIntensity(INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *fragIntensity); void WaterLoss(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 
*fragIntensity); void LoadSequence(INT_4 *sequence, INT_4 *seqLength, struct Sequence *currSeqPtr); void PEFragments(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength); char ArgIons(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength); void InternalFrag(REAL_4 *ionFound, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, INT_4 fragNum, INT_4 *ionType); INT_4 FindABYIons(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, char argPresent, REAL_4 *yFound, REAL_4 *bFound, REAL_8 *byError, INT_4 *ionType); void ScoreLowMassIons(REAL_4 *ionFound, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, INT_4 lowMassIons[][3], INT_4 *ionType); REAL_4 IntensityScorer(INT_4 *fragIntensity, REAL_4 *ionFound, INT_4 cleavageSites, INT_4 fragNum, INT_4 seqLength, INT_4 intensityTotal); void FreeAllSequence(struct Sequence *currPtr); void SeqIntensityRanker(struct SequenceScore *firstScorePtr); void SeqComboScoreRanker(struct SequenceScore *firstScorePtr); struct Sequence *AddTagBack(struct Sequence *firstSequencePtr); void PrintToConsole(struct SequenceScore *firstScorePtr); REAL_4 CalcIonFound(REAL_4 currentIonFound, INT_4 massDiff); void PrintToConsoleAndFile(struct SequenceScore *firstScorePtr, REAL_4 quality, INT_4 length, REAL_4 perfectProbScore); INT_4 FindBYIons(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength); REAL_4 BYIntensityScorer(INT_4 *fragIntensity, REAL_4 *ionFound, INT_4 cleavageSites, INT_4 fragNum, INT_4 seqLength, INT_4 intensityTotal); REAL_4 IntensityOnlyScorer(INT_4 *fragIntensity, REAL_4 *ionFound, INT_4 fragNum, INT_4 intensityTotal); void CTerminalLysOrArg(struct MSData *firstMassPtr); struct Sequence *InitLutefiskScore(INT_4 *sequence, INT_4 *fragMOverZ, INT_4 *fragIntensity, REAL_4 *ionFound, REAL_4 *ionFoundTemplate, INT_4 *countTheSeqs, struct Sequence *firstSequencePtr, REAL_4 *yFound, REAL_4 *bFound, struct 
MSData *firstMassPtr, INT_4 *charSequence, REAL_8 *byError, INT_4 *ionType); void TossTheLosers(struct Sequence *firstSequencePtr, REAL_4 *ionFoundTemplate, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *fragIntensity, INT_4 intensityTotal, REAL_4 *ionFound, INT_4 *countTheSeqs, INT_4 *sequence); INT_4 FindNCharge(INT_4 *sequence, INT_4 seqLength); char TwoAAExtFinder(INT_4 *sequence, INT_4 i); INT_4 BCalculator(INT_4 i, INT_4 *sequence, INT_4 bCalStart, INT_4 bCalCorrection); INT_4 YCalculator(INT_4 i, INT_4 *sequence, INT_4 seqLength, INT_4 yCalStart, INT_4 yCalCorrection); BOOLEAN CheckItOut(struct Sequence *firstSequencePtr); INT_4 AlterIonFound(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, REAL_4 *yFound, REAL_4 *bFound, INT_4 newCleavageSites); void FreeAllSequenceScore(struct SequenceScore *currPtr); void ScoreBYIsotopes(REAL_4 *ionFound, INT_4 *fragMOverZ, INT_4 fragNum, INT_4 *ionType); void AdjustIonIntensity(INT_4 fragNum, INT_4 *fragIntensity); void RevertBackToReals(struct MSData *firstMassPtr, struct SequenceScore *firstScorePtr); BOOLEAN CheckItOutSequenceScore(struct SequenceScore *firstSequencePtr); void HighMOverZFilter(struct Sequence *firstSequencePtr, INT_4 *fragMOverZ, INT_4 *fragIntensity, INT_4 *countTheSeqs, INT_4 *sequence, INT_4 fragNum); REAL_4 AssignHighMZScore(INT_4 highMZNum, INT_4 *highMZFrags, INT_4 *highMZInts, REAL_4 totalIntensity, INT_4 *sequence, INT_4 seqLength); struct Sequence *RemoveRedundantSequences(struct Sequence *firstSequencePtr); char SingleAA(INT_4 aminoAcidMass); char *PeptideString(INT_4 *peptide, INT_4 peptideLength); REAL_4 AssignError(REAL_4 currentError, INT_4 calculatedMass, INT_4 observedMass); REAL_4 StandardDeviationOfTheBYErrors(REAL_8 *byError, INT_4 fragNum); void BoostTheCTerminals(struct MSData *firstMassPtr); REAL_4 ProInThirdPosition(REAL_4 oldScore, INT_4 *sequence, INT_4 seqLength); INT_4 SequenceLengthCalc(INT_4 *sequence, INT_4 seqLength); void 
MakeNewgGapList(void); void ExpandSequences(struct Sequence *firstSequencePtr); char Ratchet(INT_4 *aaNum, INT_4 cycle, INT_4 *sequence, INT_4 seqLength); char GoodSequence(struct Sequence *firstSequencePtr, INT_4 *peptide, INT_4 *peptideCorrection, INT_4 peptideLength); void AddToGapList(struct Sequence *firstSequencePtr); char IsThisADuplicate(struct SequenceScore *firstScorePtr, INT_4 *sequence, REAL_4 intOnlyScore, REAL_4 intScore, INT_4 seqLength); REAL_4 Recalibrate(INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, INT_4 *fragIntensity); void ProlineInternalFrag(REAL_4 *ionFound, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, INT_4 fragNum); void RescoreAndPrune(struct Sequence *firstSequencePtr, REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength, char argPresent, REAL_4 *yFound, REAL_4 *bFound, REAL_8 *byError, INT_4 cleavageSites, INT_4 lowMassIons[][3], REAL_4 *ionFoundTemplate, INT_4 *fragIntensity, INT_4 intensityTotal, INT_4 *ionType); REAL_4 ScoreAttenuationFromCalfactor(REAL_4 calFactor, REAL_4 intScore); void AddDatabaseSequences(struct Sequence *firstSequencePtr); char *GetDatabaseSeq(INT_4 *peptide, INT_4 peptideLength); void AddPyroGlu(); INT_4 FindNOxMet(INT_4 *sequence, INT_4 seqLength); void RevertTheRevertBackToReals(struct MSData *firstMassPtr); INT_4 SequenceLengthCalcNoFudge(INT_4 *sequence, INT_4 seqLength); REAL_4 MassBasedQuality(INT_4 *sequence, INT_4 seqLength, INT_4 fragNum, INT_4 *fragMOverZ, char argPresent); BOOLEAN KeepSequence(struct SequenceScore *currPtr, REAL_4 intscrKeep, REAL_4 xcorrKeep, REAL_4 qualityKeep, REAL_4 probscrKeep); REAL_4 ComboScore(struct SequenceScore *currPtr); struct SequenceScore *LoadSeqScoreStruct(REAL_4 intScore, REAL_4 intOnlyScore, INT_4 *sequence, INT_4 *charSequence, INT_4 seqLength, REAL_4 stDevErr, INT_4 cleavageSites, REAL_4 calFactor, char databaseSeq, REAL_4 normalXCorScore, REAL_4 quality, REAL_4 length, REAL_4 probScore, REAL_4 
comboScore); struct SequenceScore *AddToSeqScoreList(struct SequenceScore *firstPtr, struct SequenceScore *currPtr); struct SequenceScore *FindLowestScore(struct SequenceScore *currPtr); struct SequenceScore *MassageScores(struct SequenceScore *firstScorePtr); INT_4 ScoreC1(REAL_4 *ionFound, INT_4 fragNum, INT_4 *fragMOverZ, INT_4 *sequence, INT_4 seqLength); /*Prototypes for LutefiskXCorr.*/ extern void DoCrossCorrelationScoring(struct SequenceScore *firstScorePtr, struct MSData *firstMassPtr) ; void CrossCorrelate(REAL_4 *array1, REAL_4 *array2, UINT_4 n, REAL_4 *result); REAL_4 YXCorrCalc(INT_4 i, struct SequenceScore *currScorePtr, REAL_4 YionStart, INT_4 seqLength); REAL_4 BXCorrCalc(INT_4 i, struct SequenceScore *currScorePtr, REAL_4 BionStart); INT_4 FindNChargeXCorr(struct SequenceScore *currScorePtr); void AddPeakToSpectrum( REAL_4 *spectrum, REAL_4 mass, REAL_4 intensity); /*Prototypes for LutefiskFourier.*/ void CalcNormalizedExptPeaks(struct MSData *firstMassPtr); void FillInSpectrum1(struct MSData *firstMassPtr); void SetupCrossCorrelation(void); /*Prototypes for LutefiskGetAutoTag.*/ void MaskSequenceNodeWithTags(SCHAR *sequenceNode, char *tagNode); INT_4 TagExtensions(char *tagNode, INT_4 topSeqNum, INT_4 *tagNodeIntensity, INT_4 totalIonIntensity); INT_4 NterminalTags(char *tagNode, INT_4 *tagNodeIntensity); void TagMaker(char *tagNode, INT_4 *tagNodeIntensity, INT_4 totalIonIntensity); void FindTagYIons(char *tagNode, INT_4 charge, INT_4 *tagNodeIntensity); void TagNodeInit(char *tagNode, INT_4 *tagNodeIntensity); void GetAutoTag(struct MSData *firstMassPtr, SCHAR *sequenceNode); struct Sequence *AlterTagList(struct Sequence *firstPtr, struct Sequence *newPtr); void FreeTagStructs(struct Sequence *currPtr); struct Sequence *LoadFinalTagStruct(INT_4 *peptide, INT_4 peptideLength, INT_4 score, INT_4 nodeValue, INT_4 gapNum); struct Sequence *LinkTagList(struct Sequence *firstPtr, struct Sequence *newPtr); struct Sequence *LoadTagStruct(INT_4 
*peptide, INT_4 peptideLength, INT_4 score, INT_4 nodeValue, INT_4 gapNum); void FreeMSData(struct MSData *currPtr); struct MSData *LoadMSData(REAL_4 massValue, INT_4 ionIntensity); struct MSData *AddCIDList(struct MSData *firstPtr, struct MSData *currPtr); void FindTagB2Ions(struct MSData *firstMassPtr, INT_4 *totalIntensity, char *tagNode, INT_4 *tagNodeIntensity); void FindMoreB2Ions(struct MSData *firstMassPtr, INT_4 *totalIntensity, char *tagNode, INT_4 *tagNodeIntensity); INT_4 AlternateNterminalTags(char *tagNode, INT_4 *tagNodeIntensity); void AlternateTagMaker(char *tagNode, INT_4 *tagNodeIntensity, INT_4 totalIonIntensity); void FilterTagMasses(INT_4 charge); int SequenceScoreDescendSortFunc(const void *n1, const void *n2); char *ComposePeptideString(INT_4 *peptide, INT_4 peptideLength); void RemoveNeutralLosses(INT_4 charge); void PrintScoreDetailsToXLFile(struct SequenceScore *firstScorePtr, REAL_4 perfectProbScore); REAL_4 CalcPerfectProbScore(INT_4 fragNum, INT_4 *fragMOverZ); struct SequenceScore *DetermineBestCandidates(struct SequenceScore *firstScorePtr); struct extension *SortExtension(struct extension *inExtensionList); /*Prototypes for LutefiskProbScorer*/ REAL_4 LutefiskProbScorer(INT_4 *sequence, INT_4 seqLength, INT_4 fragNum, INT_4 *fragMOverZ, char argPresent); REAL_8 FindImmoniumIons(INT_4 *mass, INT_4 ionCount, REAL_8 probScore, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength); REAL_8 FindInternalIons(INT_4 *mass, INT_4 ionCount, REAL_8 probScore, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength); REAL_4 InitProbScore(INT_4 *sequence, INT_4 seqLength); REAL_8 FindBIons(INT_4 *mass, INT_4 ionCount, REAL_8 probScore, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength, char argPresent); REAL_8 FindYIons(INT_4 *mass, INT_4 ionCount, REAL_8 probScore, REAL_4 *randomProb, INT_4 *sequence, INT_4 seqLength, char argPresent); void CalcRandomProb(REAL_4 *randomProb, INT_4 *mass, INT_4 ionCount); #endif /* __LUTEFISK__*/ 
lutefisk-1.0.7+dfsg.orig/src/ListRoutines.c0000644000175000017500000001203110102256716020545 0ustar rusconirusconi
/* ------------------------------------------------------------------------------------
// ListRoutines.c -- General list routines. Adapted from an idea from Ben Halpern.
// Written by JAT 1995;
// ------------------------------------------------------------------------------------ */

#include <stdlib.h>   /* malloc, realloc, free */
#include <string.h>   /* memcpy, memmove */
#include "ListRoutines.h"

/*------------------------------------------------------------------------------------
// CREATE NEW LIST -- Allocate and initialize a list of objects of size sizeofobject.
// For tfoolist *foolist, use: foolist = CreateNewList(sizeof(tfoo), 10, 10);
// A pointer to the list is returned. If the list could not be created then NULL is returned. */
void *CreateNewList(INT_4 sizeofobject, INT_4 initialNum, INT_4 growNum)
{
    tlist *list = nil;

    list = (tlist *) malloc(sizeof(tlist));
    if ( nil == list ) return nil;

    list->numObjects = 0;
    list->limit = initialNum;
    list->sizeofobject = sizeofobject;
    list->growNum = growNum;
    list->object = (char*) malloc( (size_t)(initialNum * sizeofobject) );
    if ( nil == list->object )
    {
        free( list );
        return nil;
    }
    return list;
}

/*------------------------------------------------------------------------------------
// COPY LIST -- Make a copy of the list. */
void *CopyList(void *voidlist)
{
    tlist *inList = nil;
    tlist *outList = nil;

    inList = (tlist *) voidlist;
    outList = (tlist *) CreateNewList(inList->sizeofobject, inList->numObjects, inList->growNum);
    if ( nil == outList ) return nil;

    outList->numObjects = inList->numObjects;
    memcpy(&(outList->object[0]), &(inList->object[0]), (size_t)(outList->sizeofobject * outList->numObjects));
    return outList;
}

/*------------------------------------------------------------------------------------
// ADD TO LIST -- Add an object of the appropriate size to the list. For tfoolist
// *foolist which was obtained with CreateNewList, and tfoo *foo,
// use: if ( !AddToList(foo, foolist) ) goto finishUp; The function returns true if
// the object was successfully added to the list and false if it was not. */
char AddToList(void *voidobject, void *voidlist)
{
    char *newobjects;
    toObject *object;
    tlist *list;
    INT_4 sizeoflist;

    object = (toObject *) voidobject;
    list = (tlist *) voidlist;
    sizeoflist = list->numObjects * list->sizeofobject;

    if (list->numObjects >= list->limit)
    {
        /* Boost the size of the list if we are going to overflow. */
        newobjects = (char*) malloc((list->limit + list->growNum) * list->sizeofobject);
        if ( newobjects )
        {
            memcpy(newobjects, list->object, (size_t)sizeoflist);
            list->limit += list->growNum;
            free( list->object );
            list->object = newobjects;
        }
        else return false;
    }
    memcpy(&(list->object[0]) + (size_t)sizeoflist, object, (size_t)(list->sizeofobject));
    list->numObjects++;
    return true;
}

/*------------------------------------------------------------------------------------
// REMOVE FROM LIST -- Remove the object of the given index from the list. */
void RemoveFromList(INT_4 index, void *voidlist)
{
    tlist *list;
    INT_4 sizeofobjectstomove;
    INT_4 sourceoffset, targetoffset;

    list = (tlist *) voidlist;
    targetoffset = index * list->sizeofobject;
    sourceoffset = targetoffset + list->sizeofobject;
    sizeofobjectstomove = list->numObjects * list->sizeofobject - sourceoffset;

    /* Slide the objects after the removed one down by one slot. */
    memmove(list->object + targetoffset, list->object + sourceoffset, (size_t)sizeofobjectstomove);
    list->numObjects--;
}

/*------------------------------------------------------------------------------------
// TRIM LIST -- Free up unused memory and reset limit accordingly. */
void TrimList(void *voidlist)
{
    tlist *list;
    char *trimmed;
    INT_4 sizeoflist;

    list = (tlist *) voidlist;
    sizeoflist = list->numObjects * list->sizeofobject;

    /* Leave an empty list alone: realloc() to zero bytes is not portable. */
    if ( sizeoflist <= 0 ) return;

    /* Keep the old buffer if realloc() fails instead of losing the pointer. */
    trimmed = (char *) realloc(list->object, (size_t)sizeoflist );
    if ( nil == trimmed ) return;

    list->object = trimmed;
    list->limit = list->numObjects;
}

/*------------------------------------------------------------------------------------
// DISPOSE LIST -- Dispose of the list. */
void DisposeList(void *voiddeadlist)
{
    tlist *deadlist;

    if ( nil == voiddeadlist ) return; /* Don't try to dispose of it if it doesn't exist. Nasty things could happen. */

    deadlist = (tlist *) voiddeadlist;
    free( deadlist->object );
    free( deadlist );
}
lutefisk-1.0.7+dfsg.orig/Lutefisk.details0000644000175000017500000000111010307630031020270 0ustar rusconirusconi
4 9 9 / b ion values for nodes of type general, triple quad tryptic, and ion trap tryptic
2 0 0 / a ion values
0 0 0 / c ion values                Note: The sum of the ion values must be less
0 0 0 / d ion values                      than 30 (memory issues).
2 1 1 / b-17 or b-18 ion values
2 0 0 / a-17 or a-18 ion values
4 9 9 / y ion values
0 0 0 / y-2 ion values
2 1 1 / y-17 or y-18 ion values
0 0 0 / x ion values
0 0 0 / z+1 ion values
0 0 0 / w ion values
0 0 0 / v ion values
0 0 0 / b-OH ion values
0 0 0 / b-OH-17 ion values
lutefisk-1.0.7+dfsg.orig/Lutefisk.lcq_params0000644000175000017500000001244110303453352021003 0ustar rusconirusconi
// Lutefisk parameters file
//
// If this file is present in the directory from which Lutefisk is invoked,
// then the value of the parameters listed in the 'VALUE' column below
// will override the program defaults.
//
// TITLE VALUE DEFAULT
CID Filename: | CID Filename.
CID Quality: N | Check for CID data quality. (Y/N)
Peptide MW: 0 | Peptide molecular weight. Zero will calc. from input file.
Charge-state: 0 | Number of charges on the precursor ion. Zero will calc. from input file.
MaxEnt3: N | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u): 0.75 | Peptide molecular weight tolerance.
Fragment Error (u): 0.5 | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u): 0 | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences: 20000 | Number of final sequences stored.
Max. Subsequences: 5000 | Number of subsequences allowed.
Mass Scrambles for Statistics: 0 | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type: D | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid: C | Is this CID data in profile or centroid form? P=Profile, C=Centroid, A=Autodetect.
Peak Width (u): 1 | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold: 0.01 | Ion threshold. (Ions > average intensity x Ion threshold are utilized.)
Mass Offset (u): 0.0 | Mass offset.
Ions Per Window: 6 | Ions per input window (windows are 60 Da wide). 8 for Qtof, 6 for LCQ
Ions Per Residue: 4 | Number of ions per average residue. 6 for Qtof, 4 for LCQ
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u): 5000 | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern: L | Fragmentation pattern (T=triple quad tryptic, L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps: -1 | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold: 0.15 | Extension threshold.
Max. Extensions: 6 | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass: 160.03065 | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis: T | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus: 1.0078 | N-terminal mass [1.0078(unmod), 43.0184(acetyl), 44.0136(carbamyl)]
Modified C-terminus: 17.0027 | C-terminal mass [17.0027(unmod), 16.0187(amide), 31.0184(methyl)]
Present Amino Acids: * | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids: * | Amino acids known to be absent from the peptide. * means none.
Auto Tag: N | Auto-tag (Y/N).
Tag Low Mass y Ion: 0 | Sequence tag - low mass y ion
Sequence Tag: * | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion: 0 | Sequence tag - high mass y ion
DB Sequence File: | File with sequences to score with the final results.
Shoe Size (US): 9.5 | US shoe size. Default of 17.
// Output --------------------------------------------------------------------------------
Number of sequences: 10 | Number of output sequences listed. A good bet is 10
Score threshold: 0.2 | Pr(c) is the approximate probability that at least half of the sequence is correct. A good bet is 0.20.
lutefisk-1.0.7+dfsg.orig/Lutefisk.qtof_params0000644000175000017500000001247310307633112021200 0ustar rusconirusconi
// Lutefisk parameters file
//
// If this file is present in the directory from which Lutefisk is invoked,
// then the value of the parameters listed in the 'VALUE' column below
// will override the program defaults.
//
// TITLE VALUE DEFAULT
CID Filename: Qtof_ELVISLIVESK.dta | CID Filename.
CID Quality: N | Check for CID data quality. (Y/N)
Peptide MW: 0 | Peptide molecular weight. Zero will calc. from input file.
Charge-state: 0 | Number of charges on the precursor ion. Zero will calc. from input file.
MaxEnt3: N | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u): 0.45 | Peptide molecular weight tolerance.
Fragment Error (u): 0.25 | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u): 0.04 | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences: 20000 | Number of final sequences stored.
Max. Subsequences: 5000 | Number of subsequences allowed.
Mass Scrambles for Statistics: 0 | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type: D | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid: C | Is this CID data in profile or centroid form? P=Profile, C=Centroid, A=Autodetect.
Peak Width (u): 0.75 | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold: 0.01 | Ion threshold. (Ions > average intensity x Ion threshold are utilized.)
Mass Offset (u): 0.0 | Mass offset.
Ions Per Window: 8 | Ions per input window (windows are 60 Da wide). 8 for Qtof, 6 for LCQ
Ions Per Residue: 6 | Number of ions per average residue. 6 for Qtof, 4 for LCQ
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u): 5000 | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern: Q | Fragmentation pattern (T=triple quad tryptic, L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps: -1 | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold: 0.15 | Extension threshold.
Max. Extensions: 6 | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass: 160.03065 | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis: T | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus: 1.0078 | N-terminal mass [1.0078(unmod), 43.0184(acetyl), 44.0136(carbamyl)]
Modified C-terminus: 17.0027 | C-terminal mass [17.0027(unmod), 16.0187(amide), 31.0184(methyl)]
Present Amino Acids: * | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids: * | Amino acids known to be absent from the peptide. * means none.
Auto Tag: Y | Auto-tag (Y/N).
Tag Low Mass y Ion: 0 | Sequence tag - low mass y ion
Sequence Tag: * | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion: 0 | Sequence tag - high mass y ion
DB Sequence File: | File with sequences to score with the final results.
Shoe Size (US): 9.5 | US shoe size. Default of 17.
// Output --------------------------------------------------------------------------------
Number of sequences: 10 | Number of output sequences listed. A good bet is 10
Score threshold: 0.2 | Pr(c) is the approximate probability that at least half of the sequence is correct. A good bet is 0.20.
lutefisk-1.0.7+dfsg.orig/Lutefisk.paramsLCQ0000644000175000017500000001260610102222720020475 0ustar rusconirusconi
// Lutefisk parameters file
//
// If this file is present in the directory from which Lutefisk is invoked,
// then the value of the parameters listed in the 'VALUE' column below
// will override the program defaults.
//
// TITLE VALUE DEFAULT
CID Filename: data\databaseTestQtof\LKPDPNTLCDEFKADEK_3_qtof.dta | CID Filename.
CID Quality: N | Check for CID data quality. (Y/N)
Peptide MW: 0 | Peptide molecular weight. Zero will calc. from input file.
Charge-state: 2 | Number of charges on the precursor ion. Zero will calc. from input file.
MaxEnt3: N | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u): 0.65 | Peptide molecular weight tolerance.
Fragment Error (u): 0.65 | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u): 0 | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences: 20000 | Number of final sequences stored.
Max. Subsequences: 5000 | Number of subsequences allowed.
Mass Scrambles for Statistics: 0 | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type: D | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid: C | Is this CID data in profile or centroid form? P=Profile, C=Centroid, A=Autodetect.
Peak Width (u): 1 | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold: 0.01 | Ion threshold. (Ions > average intensity x Ion threshold are utilized.)
Mass Offset (u): 0.0 | Mass offset.
Ions Per Window: 6 | Ions per input window (windows are 60 Da wide). 8 for Qtof, 6 for LCQ
Ions Per Residue: 4 | Number of ions per average residue. 6 for Qtof, 4 for LCQ
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u): 5000 | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern: L | Fragmentation pattern (T=triple quad tryptic, L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps: -1 | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold: 0.15 | Extension threshold.
Max. Extensions: 6 | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass: 160.03065 | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis: T | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus: 1.0078 | N-terminal mass [1.0078(unmod), 43.0184(acetyl), 44.0136(carbamyl)]
Modified C-terminus: 17.0027 | C-terminal mass [17.0027(unmod), 16.0187(amide), 31.0184(methyl)]
Present Amino Acids: * | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids: * | Amino acids known to be absent from the peptide. * means none.
Auto Tag: N | Auto-tag (Y/N).
Tag Low Mass y Ion: 0 | Sequence tag - low mass y ion
Sequence Tag: * | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion: 0 | Sequence tag - high mass y ion
DB Sequence File: | File with sequences to score with the final results.
Shoe Size (US): 30 | US shoe size. Default of 17.
// Output --------------------------------------------------------------------------------
Number of sequences: 5 | Number of output sequences listed. A good bet is 5
Score threshold: 0.2 | Pr(c) is the approximate probability that at least half of the sequence is correct. A good bet is 0.20.
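Each entry in the parameter files above has the shape `Title: value | description`, with `//` lines acting as comments and section dividers. How Lutefisk itself reads these files is not shown in this section, so the following is only a sketch of one way to split such a line. The helper names `trim_copy` and `parse_param_line` are hypothetical, and the sketch assumes the title never contains a colon and the value never contains a `|`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy [begin, end) into dst without surrounding whitespace, truncating
   to fit dstSize and always NUL-terminating. (Hypothetical helper.) */
static void trim_copy(char *dst, size_t dstSize, const char *begin, const char *end)
{
    size_t len;

    assert(dstSize > 0);
    while (begin < end && isspace((unsigned char)*begin)) begin++;
    while (end > begin && isspace((unsigned char)end[-1])) end--;
    len = (size_t)(end - begin);
    if (len >= dstSize) len = dstSize - 1;
    memcpy(dst, begin, len);
    dst[len] = '\0';
}

/* Split "Title: value | description" into title and value.
   Returns 1 if the line held a parameter, 0 for blank lines, "//" comment
   lines, and lines without a colon. (Hypothetical, not the Lutefisk parser.) */
int parse_param_line(const char *line, char *title, size_t titleSize,
                     char *value, size_t valueSize)
{
    const char *colon;
    const char *bar;

    while (isspace((unsigned char)*line)) line++;
    if (*line == '\0' || (line[0] == '/' && line[1] == '/')) return 0;

    /* The first colon ends the title; later colons belong to the value
       or to the description. */
    colon = strchr(line, ':');
    if (colon == NULL) return 0;

    /* The value runs up to the '|' separator, or to end of line if absent. */
    bar = strchr(colon + 1, '|');
    if (bar == NULL) bar = colon + 1 + strlen(colon + 1);

    trim_copy(title, titleSize, line, colon);
    trim_copy(value, valueSize, colon + 1, bar);
    return 1;
}
```

With this sketch, `Fragment Error (u): 0.25 | Fragment ion tolerance.` yields the title `Fragment Error (u)` and the value `0.25`, while an entry such as `CID Filename: | CID Filename.` yields an empty value that a caller could treat as "use the program default".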
lutefisk-1.0.7+dfsg.orig/Qtof_ELVISLIVESK.dta0000644000175000017500000006402710023114374020440 0ustar rusconirusconi1229.74 2 68.6871 4.0 68.7309 3.0 69.0631 5.0 71.0826 3.0 71.7839 2.0 72.0778 1924.0 73.0807 81.0 73.5243 1.0 74.1006 5.0 82.3521 3.0 82.4109 4.0 84.0695 48.0 85.0655 3.0 86.0933 4805.0 86.3840 1.0 86.4551 2.0 86.6412 4.0 86.9153 2.0 86.9976 2.0 87.0948 310.0 87.1513 4.0 87.2612 5.0 87.3162 3.0 87.4427 4.0 88.1012 12.0 89.0547 14.0 100.0770 7.0 101.1110 8.0 102.0514 225.0 102.1206 6.0 103.0504 6.0 112.0871 5.0 113.0656 15.0 113.1237 1.0 114.0770 7.0 117.1011 3.0 118.0847 43.0 119.0770 2.0 124.0794 5.0 129.0891 247.0 129.1732 4.0 129.2247 2.0 130.0772 125.0 130.1697 8.0 131.0941 9.0 132.0935 25.0 133.0906 11.0 134.9733 2.0 139.0370 2.0 141.0549 56.0 141.1194 5.0 142.0465 7.0 142.2331 2.0 144.0424 3.0 147.1050 798.0 147.1959 19.0 148.0246 2.0 148.1123 68.0 148.2251 1.0 149.1148 1.0 152.0973 4.0 152.1917 3.0 153.0714 4.0 154.1070 2.0 155.0947 13.0 156.1003 4.0 157.1314 6.0 158.0843 1.0 159.0848 2.0 166.1077 2.0 168.0931 3.0 169.1093 2.0 170.1124 26.0 171.1132 12.0 173.1198 135.0 174.1189 11.0 174.2210 1.0 175.1231 3.0 177.0910 7.0 177.2633 1.0 179.0786 40.0 179.7162 5.0 180.0793 3.0 181.0756 3.0 181.7969 2.0 181.9001 1.0 183.1062 81.0 183.1960 2.0 184.1003 7.0 185.1545 1261.0 185.2709 41.0 185.7456 1.0 186.1590 116.0 186.2673 4.0 187.1920 5.0 189.0888 2.0 189.7610 3.0 190.1503 1.0 191.0846 4.0 192.0783 2.0 195.1324 5.0 197.1172 869.0 197.2310 36.0 198.1206 124.0 199.0316 1.0 199.1089 42.0 199.1644 21.0 200.1156 5.0 201.1137 610.0 201.2126 19.0 202.0147 2.0 202.1193 67.0 203.1083 9.0 203.2293 4.0 204.1364 2.0 208.1495 1.0 209.1015 1.0 209.1866 2.0 209.2972 10.0 210.3627 3.0 213.0512 2.0 213.1505 1289.0 213.2652 56.0 213.8249 1.0 214.1497 159.0 214.6001 3.0 215.1276 5957.0 215.2298 237.0 215.6876 1.0 215.9037 4.0 216.1288 862.0 216.2652 36.0 216.4748 2.0 216.6066 9.0 216.7433 3.0 216.9599 2.0 217.1257 127.0 217.4282 2.0 217.5063 3.0 217.7928 2.0 217.8970 
4.0 218.1056 16.0 218.2707 1.0 218.9581 4.0 219.1166 20.0 219.2717 2.0 220.2923 1.0 220.4671 3.0 221.1316 1.0 222.1654 1.0 223.1313 5.0 224.0375 2.0 225.1113 2245.0 225.2141 89.0 225.5816 5.0 226.1146 281.0 226.2653 15.0 226.7320 3.0 227.1419 64.0 227.2640 1.0 227.6722 1.0 227.9920 5.0 228.1608 12.0 228.9793 2.0 229.1082 58.0 229.6387 4.0 230.0848 4.0 230.2187 6.0 231.0230 1.0 231.1615 90.0 231.2646 15.0 232.0617 2.0 232.1513 14.0 233.0668 1.0 233.1566 3.0 234.1315 3378.0 234.2549 163.0 234.7678 4.0 235.1341 437.0 235.5799 4.0 235.7155 1.0 235.9956 4.0 236.1330 45.0 236.2578 3.0 236.6922 1.0 236.8190 2.0 236.8914 16.0 237.1269 3.0 238.1792 3.0 238.6154 3.0 238.7426 1.0 239.1611 1.0 239.4160 3.0 240.2159 15.0 240.3091 1.0 241.1606 27.0 242.1553 8.0 242.2560 2.0 242.5034 3.0 242.7325 2.0 242.8609 2.0 243.1209 12474.0 243.2404 569.0 243.5492 2.0 243.7329 8.0 243.9122 13.0 244.1237 1710.0 244.3212 41.0 244.5604 5.0 244.6524 3.0 244.7482 7.0 244.8734 6.0 245.0023 7.0 245.1268 227.0 245.2331 16.0 245.2972 27.0 245.9058 2.0 246.1130 23.0 246.3767 4.0 246.5060 4.0 246.6816 1.0 246.9682 6.0 247.1394 11.0 247.7687 4.0 247.9772 3.0 248.0513 1.0 248.1625 3.0 248.5984 2.0 249.1090 1.0 249.5922 3.0 251.1700 9.0 252.1553 4.0 253.2498 3.0 254.1777 4.0 255.1674 14.0 256.1046 2.0 256.2509 52.0 257.0284 2.0 257.2454 10.0 258.1333 4.0 258.2373 8.0 258.8807 1.0 259.0701 2.0 259.1649 8.0 260.1605 4.0 261.0345 2.0 265.3111 2.0 266.1649 1.0 266.8085 2.0 268.2032 14.0 269.1743 23.0 270.2026 2.0 270.9285 3.0 272.2097 25.0 272.3057 1.0 273.1609 1.0 274.2025 2.0 279.1454 5.0 279.2241 1.0 280.1885 3.0 280.9968 1.0 281.2084 24.0 281.3718 1.0 282.1633 174.0 282.2609 11.0 283.1681 29.0 284.0134 2.0 284.1622 6.0 284.2714 2.0 285.1076 14.0 285.1948 1.0 285.3041 4.0 285.3737 1.0 286.1986 22.0 287.2057 5.0 288.1539 2.0 289.3338 2.0 290.3055 3.0 291.6305 1.0 292.1734 1.0 294.1885 11.0 295.1482 2.0 296.0486 1.0 296.1813 2596.0 296.3065 110.0 296.9605 9.0 297.1776 681.0 297.3109 28.0 297.5185 2.0 
297.9551 1.0 298.1789 104.0 298.3453 7.0 298.5547 2.0 298.7480 5.0 299.1829 11.0 299.3687 2.0 299.5113 4.0 300.1795 225.0 300.3775 11.0 300.8264 3.0 301.1707 45.0 301.7661 3.0 301.9399 2.0 302.1795 15.0 302.9019 2.0 303.1477 8.0 303.3936 4.0 304.0192 3.0 304.1731 4.0 304.2860 3.0 305.4573 1.0 306.1777 1.0 308.1475 2.0 309.1399 4.0 310.1858 3.0 311.0154 2.0 311.1861 13.0 312.0333 2.0 312.1407 52.0 312.3348 2.0 313.1764 9.0 314.0531 1.0 314.1883 503.0 314.3144 17.0 314.4286 8.0 315.1902 80.0 315.3684 1.0 316.1434 52.0 316.4248 4.0 316.8122 2.0 317.0007 8.0 317.1322 9.0 318.1752 3.0 318.3012 3.0 320.1727 3.0 321.0578 3.0 321.2581 4.0 321.7119 1.0 323.1065 3.0 323.9321 1.0 324.0381 5.0 324.1737 6364.0 324.3201 263.0 324.6209 2.0 325.0459 8.0 325.1765 1217.0 325.3322 56.0 325.5439 6.0 325.7138 3.0 325.8307 4.0 326.0113 2.0 326.1914 260.0 326.4151 9.0 326.7661 4.0 326.8405 1.0 326.9469 3.0 327.0320 2.0 327.1875 86.0 327.4153 7.0 327.8733 2.0 328.1599 24.0 328.5662 1.0 329.0037 2.0 329.2065 9.0 330.1433 65.0 330.2858 4.0 331.1526 8.0 331.8812 1.0 332.1856 7.0 332.3315 3.0 334.1496 11.0 334.7653 17.0 335.7192 2.0 336.1614 3.0 338.2036 8.0 338.6583 2.0 341.5884 2.0 342.1829 7867.0 342.3175 359.0 342.6878 3.0 342.8730 4.0 343.1876 1464.0 343.3199 85.0 343.6252 5.0 343.7125 1.0 344.0399 6.0 344.1903 234.0 344.5421 2.0 344.7002 33.0 344.9245 4.0 345.1587 807.0 345.3618 30.0 345.6353 9.0 345.8322 3.0 346.0232 3.0 346.1635 139.0 346.5357 8.0 346.7740 1.0 346.9931 2.0 347.1466 28.0 347.2782 1.0 347.4866 4.0 348.0352 4.0 348.1956 7.0 348.4855 1.0 348.9580 2.0 349.4088 3.0 349.5739 4.0 350.0251 1.0 350.2123 4.0 351.2151 1.0 351.9433 3.0 352.9486 4.0 354.0882 2.0 354.2165 52.0 355.0854 2.0 355.2185 17.0 355.5458 8.0 356.2043 92.0 357.1854 19.0 357.4401 3.0 358.2082 4.0 358.9770 3.0 359.0885 1.0 359.2000 1.0 359.9811 1.0 360.1885 14.0 360.8190 2.0 361.1768 2.0 362.2064 4.0 362.3968 1.0 362.7441 2.0 363.0243 16.0 363.1680 3470.0 363.3022 200.0 363.6300 1.0 363.7422 3.0 363.8882 2.0 
364.0004 2.0 364.1691 646.0 364.5170 6.0 364.9441 6.0 365.1670 112.0 365.3601 5.0 365.8214 2.0 366.0803 4.0 366.1879 19.0 366.3619 2.0 366.5421 2.0 366.6660 3.0 366.8238 3.0 367.2442 15.0 368.2249 14.0 368.4372 2.0 369.2622 5.0 370.2024 15.0 371.0963 2.0 371.2437 7.0 371.6633 1.0 372.1172 1.0 373.1692 25.0 373.8446 3.0 374.1177 5.0 374.3226 3.0 375.1884 8.0 375.6216 1.0 378.2264 6.0 379.1195 5.0 379.2456 2.0 379.6697 1.0 380.1744 5.0 380.3121 1.0 380.7253 1.0 381.2422 2.0 382.3114 3.0 383.2061 21.0 384.1919 47.0 384.4545 3.0 385.0778 2.0 385.2538 49.0 385.4012 1.0 386.0715 3.0 386.2103 10.0 387.2518 3.0 388.1091 2.0 388.2096 29.0 388.3642 1.0 388.5497 2.0 388.7238 19.0 389.1648 6.0 389.3157 5.0 390.1870 2.0 391.2000 12.0 392.2150 11.0 393.2408 136.0 394.1841 14.0 394.2975 20.0 394.4178 1.0 395.2513 174.0 396.0790 2.0 396.2620 33.0 396.8523 1.0 397.0281 4.0 397.2378 14.0 397.7437 1.0 398.2294 10.0 399.1767 5.0 399.2817 56.0 399.4707 4.0 399.8353 1.0 400.1884 2.0 400.2826 16.0 401.0600 1.0 401.2270 32.0 402.2157 5.0 402.3219 2.0 403.0068 4.0 403.1604 5.0 404.2067 4.0 405.2662 3.0 407.1620 11.0 407.7696 2.0 408.1618 5.0 409.2606 3813.0 409.4034 194.0 410.2592 1021.0 410.3999 42.0 410.5310 35.0 410.8172 4.0 410.9246 2.0 411.2347 297.0 411.4257 8.0 411.5809 9.0 411.7958 2.0 412.0466 2.0 412.2193 56.0 412.5006 1.0 412.8831 2.0 413.2549 302.0 413.5704 21.0 414.0797 1.0 414.2572 73.0 414.4630 3.0 414.5828 5.0 414.8704 3.0 415.2300 20.0 415.9018 3.0 416.0698 2.0 416.2138 9.0 417.2229 5.0 419.2446 3.0 420.0161 3.0 422.2380 3.0 422.6613 2.0 423.2180 6.0 423.3995 1.0 424.2112 4.0 425.0722 5.0 425.2163 120.0 425.3997 2.0 425.5727 8.0 425.7275 1.0 426.2039 48.0 426.4077 5.0 426.7480 3.0 427.0884 4.0 427.2662 751.0 427.5872 13.0 427.7941 1.0 428.2647 209.0 428.4743 13.0 428.7195 2.0 429.0364 4.0 429.2179 286.0 429.7924 13.0 430.0364 1.0 430.2096 64.0 430.3977 7.0 430.8055 4.0 431.2415 13.0 431.4042 3.0 431.9789 4.0 432.1134 5.0 432.4926 2.0 432.7375 6.0 433.0802 2.0 433.2272 4.0 
433.7295 3.0 434.1463 3.0 434.4160 3.0 434.9558 1.0 435.2115 9.0 435.5942 3.0 435.7536 113.0 436.2402 74.0 436.7527 22.0 436.9708 1.0 437.2526 5065.0 437.9431 4.0 438.0540 4.0 438.2549 1276.0 438.5468 27.0 438.7932 3.0 438.9165 5.0 439.0645 2.0 439.2549 188.0 439.4098 16.0 439.9774 5.0 440.1255 2.0 440.2600 39.0 440.4465 3.0 440.7430 5.0 440.9654 1.0 441.2428 28.0 441.4351 1.0 441.6947 8.0 442.0287 5.0 442.1895 6.0 442.3504 5.0 442.5113 2.0 442.7713 3.0 443.2360 117.0 443.7872 1.0 444.0104 2.0 444.2169 141.0 444.4196 6.0 444.7583 1949.0 445.2556 963.0 445.4126 48.0 445.5492 20.0 445.7637 270.0 445.8971 16.0 446.0338 6.0 446.2520 66.0 446.4937 7.0 446.7176 13.0 447.0658 5.0 447.2401 7.0 447.4765 4.0 447.6508 2.0 448.0418 5.0 448.3358 1.0 448.9091 1.0 449.2084 5.0 449.7074 1.0 450.3815 1.0 452.2567 3.0 453.2458 8.0 453.6468 8.0 454.2862 11.0 455.2606 1670.0 455.4156 48.0 455.5915 26.0 455.6920 12.0 455.8930 1.0 456.2678 456.0 456.4962 12.0 456.6057 18.0 457.0370 6.0 457.2674 75.0 457.4825 5.0 458.2667 13.0 458.6991 4.0 458.9638 3.0 459.9855 2.0 460.6041 1.0 461.2502 20.0 461.5644 3.0 461.6782 4.0 461.8678 1.0 462.2319 5284.0 462.3991 134.0 462.5257 130.0 462.8041 1.0 463.2357 1238.0 463.4626 27.0 463.6273 16.0 463.8174 5.0 463.9441 2.0 464.0709 7.0 464.2437 218.0 464.4640 16.0 464.6415 12.0 464.9460 4.0 465.1109 6.0 465.2534 36.0 465.4917 1.0 465.9362 4.0 466.2411 8.0 466.4825 1.0 466.6605 1.0 467.2581 4.0 467.5507 3.0 467.9453 2.0 468.3017 2.0 468.4290 2.0 468.6074 4.0 468.9259 1.0 469.2318 5.0 469.3592 3.0 469.9458 1.0 470.2827 7.0 470.7371 3.0 471.0435 1.0 471.2351 10.0 471.7718 2.0 472.1999 11.0 472.3215 1.0 473.1403 2.0 473.2683 8.0 474.2033 2.0 475.0879 1.0 475.3829 2.0 475.6395 4.0 476.2814 4.0 476.7439 4.0 477.1679 1.0 477.3479 3.0 478.0939 1.0 478.2684 37.0 478.9949 1.0 479.2654 38.0 480.2878 9.0 480.4384 1.0 481.2665 17.0 481.5741 1.0 481.7291 1.0 482.1166 1.0 482.2588 4.0 483.2413 8.0 483.7201 3.0 483.8236 1.0 484.1602 3.0 484.3415 5.0 484.9503 2.0 
485.0670 2.0 485.2862 191.0 485.4558 16.0 485.5855 5.0 485.7852 113.0 486.0784 2.0 486.2795 40.0 486.7850 7.0 487.2726 3.0 487.3506 1.0 487.7404 3.0 488.0263 3.0 488.2603 3.0 488.3774 4.0 488.5464 3.0 488.7285 2.0 489.0538 2.0 489.2230 7.0 489.6786 2.0 490.0824 3.0 490.6818 1.0 491.2946 4.0 492.0644 5.0 492.1689 1.0 493.0256 5.0 493.2507 9.0 493.8151 3.0 494.2882 1394.0 494.4430 81.0 494.7902 848.0 494.9403 30.0 495.1460 33.0 495.2910 267.0 495.4903 6.0 495.7939 64.0 495.9620 2.0 496.0930 4.0 496.2853 155.0 496.6698 1.0 496.8272 3.0 496.9453 4.0 497.1158 2.0 497.2791 68.0 497.5095 7.0 497.9427 4.0 498.1134 2.0 498.3140 49.0 498.9806 1.0 499.0857 2.0 499.3224 19.0 499.7564 4.0 500.0853 3.0 500.3354 6.0 500.4539 1.0 501.1781 4.0 501.3099 4.0 501.5602 1.0 502.1930 5.0 502.4832 1.0 502.6678 3.0 503.0109 3.0 503.2694 10.0 503.7766 1.0 504.2654 4.0 504.3844 1.0 504.7677 4.0 505.1115 2.0 505.2582 12.0 505.4421 3.0 505.7332 3.0 506.2710 1537.0 506.5540 36.0 506.9249 2.0 507.2689 476.0 507.9323 3.0 508.0384 5.0 508.3015 134.0 508.5558 6.0 508.7947 7.0 509.1000 2.0 509.2798 26.0 509.5383 2.0 509.7774 2.0 509.9501 2.0 510.3571 15.0 510.5616 1.0 511.2665 12.0 511.6258 1.0 511.7855 3.0 512.0251 1.0 512.1849 2.0 512.3065 30.0 512.5047 1.0 513.3442 4.0 514.0778 1.0 514.2989 97.0 514.5315 1.0 514.8252 5.0 515.0388 2.0 515.2926 36.0 515.9473 3.0 516.2014 7.0 516.3178 14.0 516.5624 3.0 516.8032 1.0 517.2139 5.0 519.1066 5.0 519.2540 2.0 519.8173 4.0 520.3138 3.0 521.0792 4.0 522.0737 4.0 522.3292 8.0 523.0018 1.0 523.3248 15.0 523.9980 1.0 524.2794 2168.0 524.5983 48.0 524.9549 4.0 525.1033 12.0 525.2803 644.0 525.4539 38.0 525.9395 2.0 526.1015 5.0 526.3170 298.0 526.5470 9.0 526.6956 1.0 526.9117 1.0 527.0468 3.0 527.3140 67.0 527.5331 9.0 527.9387 3.0 528.1010 2.0 528.2903 14.0 528.4797 1.0 528.6962 5.0 528.9126 7.0 529.2916 7.0 529.6166 2.0 530.2126 5.0 530.4565 1.0 531.2158 5.0 531.3243 1.0 532.4099 1.0 532.9395 4.0 533.1569 6.0 534.0813 2.0 534.3397 3.0 534.5301 3.0 536.2593 
4.0 538.2097 10.0 538.3188 34.0 539.0564 3.0 539.1383 2.0 539.3077 15.0 539.7534 2.0 540.3336 10.0 541.2908 11.0 542.2905 791.0 542.7515 15.0 542.8611 4.0 543.0806 2.0 543.2934 236.0 543.5743 2.0 543.7567 11.0 544.0546 2.0 544.2955 54.0 544.6999 3.0 545.2882 10.0 545.4830 4.0 546.1566 5.0 546.3216 3.0 546.4592 1.0 546.8444 1.0 547.3398 3.0 548.0698 4.0 549.2965 3.0 549.7517 4.0 550.2347 3.0 550.8230 100.0 551.1461 2.0 551.3242 76.0 551.5468 2.0 551.6711 3.0 551.8333 23.0 552.3184 11.0 553.3586 9.0 553.5247 1.0 554.1617 2.0 554.3557 8.0 554.5081 4.0 555.1595 3.0 555.3259 5.0 555.8253 5.0 556.3050 46.0 556.5607 2.0 556.8523 1.0 557.1023 2.0 557.3049 104.0 557.4774 2.0 557.6441 2.0 558.2923 30.0 558.6172 4.0 558.8120 2.0 559.2910 9.0 559.4659 3.0 560.2817 32.0 560.5241 1.0 560.7053 2.0 561.2568 12.0 562.3367 1.0 563.2581 3.0 564.2360 1.0 564.8232 5.0 565.2149 1.0 565.3828 1.0 565.7131 7.0 566.7828 1.0 567.3994 3.0 567.8901 2.0 568.0724 1.0 568.2407 2.0 568.3529 6.0 569.2932 6.0 570.2062 11.0 570.7122 3.0 571.1902 1.0 571.3871 5.0 571.9780 1.0 572.2314 3.0 572.3441 1.0 572.5129 1.0 572.6960 4.0 572.9073 3.0 573.4426 3.0 573.7949 4.0 574.0627 1.0 574.2037 3.0 574.3306 4.0 575.3078 5496.0 576.0870 12.0 576.3138 1618.0 576.4904 36.0 576.6238 52.0 577.0838 5.0 577.3154 316.0 577.5646 8.0 577.9464 7.0 578.1870 15.0 578.3170 64.0 578.4841 8.0 578.7955 2.0 578.8947 1.0 579.1496 2.0 579.3535 14.0 579.4611 6.0 579.7870 4.0 580.3113 8.0 580.4106 6.0 580.8785 1.0 581.1622 3.0 581.3891 5.0 582.4396 1.0 582.5815 3.0 583.3488 7.0 583.8036 1.0 584.3647 16.0 585.3401 5.0 585.7674 3.0 587.0926 2.0 587.3350 5.0 587.4633 2.0 588.0054 2.0 588.1053 2.0 588.3194 23.0 588.7762 14.0 589.0476 1.0 589.2046 4.0 589.3760 4.0 589.6332 4.0 590.4908 4.0 591.2203 6.0 591.3659 49.0 591.5781 2.0 592.0648 2.0 592.2223 2.0 592.3566 31.0 592.7236 2.0 592.9385 3.0 593.0532 2.0 593.2974 29.0 593.6695 4.0 593.9563 2.0 594.3671 11.0 594.9462 1.0 595.3289 17.0 595.7359 2.0 596.1525 2.0 596.3250 9.0 596.4973 
3.0 596.7704 4.0 596.9286 2.0 597.3356 481.0 597.4894 48.0 597.8336 343.0 598.0230 12.0 598.3397 199.0 598.6694 6.0 598.8415 36.0 599.0294 3.0 599.3049 31.0 599.8362 10.0 600.0092 1.0 600.2487 16.0 600.7589 3.0 600.9897 3.0 601.3118 9.0 601.6823 5.0 602.0288 5.0 602.3320 4.0 603.4156 1.0 603.8927 2.0 604.5436 1.0 604.7607 4.0 605.3419 25.0 605.5424 6.0 605.8809 35.0 606.3403 838.0 606.5129 30.0 606.8423 541.0 607.1507 16.0 607.3460 262.0 607.6729 19.0 607.8469 64.0 608.1661 9.0 608.3257 29.0 608.5289 7.0 608.7467 2.0 608.9064 17.0 609.3729 71.0 609.9668 5.0 610.1994 7.0 610.3593 46.0 610.8682 5.0 611.0718 2.0 611.1883 5.0 611.3201 41.0 611.5811 4.0 612.3310 17.0 612.6439 1.0 613.1974 5.0 613.3723 8.0 613.7044 7.0 614.0370 7.0 614.1595 2.0 614.3958 96.0 614.8844 89.0 615.3416 14334.0 615.8432 9612.0 616.0859 183.0 616.3431 4397.0 616.8410 1393.0 617.0357 60.0 617.3344 835.0 617.8293 176.0 618.0154 24.0 618.3298 210.0 618.5129 16.0 618.6299 22.0 618.8323 28.0 618.9813 22.0 619.3500 1962.0 619.5085 103.0 619.6841 45.0 619.8879 37.0 620.3509 725.0 620.7100 11.0 620.8419 27.0 621.1205 11.0 621.3539 179.0 621.7220 13.0 621.8687 12.0 622.0594 2.0 622.3377 47.0 622.7788 2.0 622.8815 2.0 623.3344 30.0 623.7190 4.0 623.9101 5.0 624.1600 4.0 624.2628 2.0 624.3909 11.0 624.6893 6.0 625.0423 6.0 625.2631 8.0 625.4020 57.0 625.6457 1.0 626.2051 2.0 626.3588 19.0 626.6321 4.0 627.0447 4.0 627.2657 12.0 627.3901 48.0 627.5311 4.0 627.7964 7.0 627.9438 1.0 628.3802 18.0 628.6519 5.0 628.9174 3.0 629.3134 46.0 629.6259 1.0 629.7884 2.0 629.9213 1.0 630.3053 9.0 630.7929 4.0 631.2068 4.0 631.4286 3.0 633.0562 3.0 633.8856 3.0 635.2789 3.0 636.2878 3.0 637.3605 1217.0 637.8176 18.0 638.0109 2.0 638.3672 435.0 638.5609 11.0 638.6946 11.0 639.1409 8.0 639.3761 109.0 639.6320 6.0 639.8701 2.0 640.0562 4.0 640.1976 2.0 640.3971 35.0 640.9572 3.0 641.3259 24.0 642.2092 2.0 642.3732 10.0 642.5223 2.0 643.2833 5.0 644.3583 1.0 647.3342 4.0 647.7535 2.0 648.3077 3.0 649.2969 3.0 651.3738 7.0 
652.3654 5.0 652.5247 2.0 653.3065 6.0 653.4420 1.0 654.9319 2.0 655.3743 487.0 655.5495 18.0 655.7454 15.0 656.1522 2.0 656.3839 210.0 656.8609 2.0 657.3739 41.0 658.2341 2.0 658.3548 7.0 658.9589 3.0 659.6086 4.0 667.4298 6.0 668.4031 2.0 669.3858 20.0 670.2452 7.0 670.3772 36.0 670.5499 1.0 671.0376 3.0 671.1901 2.0 671.3882 20.0 672.4102 5.0 673.3565 3.0 676.6435 4.0 676.9037 3.0 677.7614 3.0 678.9874 3.0 681.3815 5.0 683.2875 1.0 683.5029 3.0 683.8567 2.0 684.3339 3.0 684.9805 1.0 685.5813 2.0 688.3877 3107.0 688.7603 95.0 689.3925 1169.0 689.6549 43.0 689.8977 36.0 690.1650 9.0 690.3967 247.0 690.8298 10.0 691.2166 5.0 691.3983 51.0 691.6343 4.0 691.8200 2.0 691.9681 9.0 692.1451 1.0 692.3818 15.0 693.1517 2.0 693.4771 5.0 693.9575 4.0 694.3762 5.0 694.8724 3.0 696.3624 5.0 697.3254 10.0 697.4809 1.0 700.5146 2.0 701.3714 3.0 703.5704 2.0 704.4135 4.0 705.2415 2.0 705.4067 16.0 706.4453 8.0 707.4467 4.0 708.4487 4.0 709.4514 4.0 710.4016 7.0 711.2001 4.0 711.3648 2.0 713.4852 3.0 714.3657 3.0 715.3883 2.0 717.2310 3.0 719.4073 1.0 721.4606 3.0 722.3145 5.0 722.4599 37.0 722.7256 1.0 723.4379 21.0 723.8967 1.0 724.2134 3.0 724.4230 24.0 724.9420 1.0 725.1639 2.0 725.3381 12.0 726.3370 7.0 727.3680 2.0 727.4951 2.0 729.3533 3.0 729.5281 4.0 730.2593 2.0 732.3121 2.0 732.4396 21.0 732.5988 1.0 732.9174 1.0 733.4272 5.0 733.5706 2.0 734.4156 7.0 735.2929 2.0 736.2027 2.0 736.4156 49.0 737.2568 2.0 737.4260 32.0 738.2159 5.0 738.4331 14.0 739.4482 18.0 740.3597 6.0 740.4971 25.0 740.7281 4.0 741.2406 4.0 741.4697 15.0 741.7214 4.0 742.2503 2.0 742.4553 15.0 742.7634 3.0 743.4050 10.0 744.8979 2.0 745.0746 3.0 745.2673 3.0 747.9363 3.0 748.3709 4.0 750.0621 2.0 750.4379 1254.0 750.7714 27.0 751.0295 5.0 751.1584 19.0 751.4369 534.0 751.6423 27.0 752.0296 4.0 752.4346 174.0 752.9335 2.0 753.2565 1.0 753.4308 26.0 754.1611 2.0 754.4359 150.0 755.1311 2.0 755.4182 51.0 755.6486 1.0 756.0692 3.0 756.4659 15.0 757.1375 1.0 757.4020 281.0 758.0769 2.0 758.4164 114.0 
758.5954 1.0 758.9196 9.0 759.2438 5.0 759.3968 29.0 759.6006 3.0 760.4280 10.0 761.0934 3.0 761.3207 5.0 761.5156 5.0 761.7916 4.0 762.1165 4.0 762.8152 1.0 764.1484 8.0 764.3274 12.0 765.0922 1.0 765.3038 6.0 766.2319 2.0 766.3785 7.0 767.3074 1.0 768.4410 354.0 768.7098 8.0 768.9709 6.0 769.1014 1.0 769.4554 137.0 769.6237 11.0 769.7870 3.0 770.3095 5.0 770.4728 44.0 770.7179 2.0 771.1427 2.0 771.4368 4.0 772.2053 3.0 772.3687 6.0 772.5323 3.0 773.0557 5.0 773.8740 5.0 774.3488 4.0 774.7746 6.0 775.4133 11799.0 775.8724 145.0 776.0691 137.0 776.4183 4924.0 777.4233 1148.0 777.7585 40.0 778.0211 24.0 778.4270 261.0 778.7329 31.0 778.9897 19.0 779.1869 8.0 779.4277 56.0 779.6140 8.0 779.7947 10.0 780.0412 4.0 780.4850 12.0 780.8303 9.0 781.0770 6.0 781.3895 9.0 781.7350 10.0 782.1134 5.0 782.4097 3.0 782.7719 3.0 782.9035 3.0 783.3646 1.0 783.9510 5.0 784.3531 6.0 784.8311 4.0 785.0289 4.0 785.3623 7.0 786.4474 4.0 787.7187 1.0 788.1316 2.0 788.7925 2.0 791.1246 3.0 791.2900 3.0 791.4887 3.0 792.0020 2.0 792.8799 1.0 793.9738 1.0 794.5377 3.0 795.5167 2.0 796.4374 11.0 797.2935 3.0 797.3932 1.0 798.4404 4.0 799.9543 1.0 801.4861 3.0 802.3860 1.0 810.4404 3.0 813.0726 2.0 821.3826 4.0 822.4961 8.0 823.0868 1.0 823.4752 10.0 824.3872 4.0 826.3991 3.0 831.4817 7.0 837.4311 1.0 839.3224 4.0 839.5270 16.0 840.4898 9.0 841.2158 6.0 841.4889 3.0 842.3597 4.0 842.5475 3.0 844.8038 1.0 845.6591 3.0 848.2965 1.0 849.1193 2.0 849.4958 432.0 849.8911 14.0 850.3201 27.0 850.5065 196.0 851.0409 5.0 851.3500 5.0 851.5154 69.0 852.1743 3.0 852.2774 2.0 852.4911 13.0 853.1881 2.0 853.4734 7.0 853.6522 3.0 853.7553 1.0 854.3057 3.0 854.9938 3.0 855.2690 3.0 858.5247 2.0 865.4456 26.0 865.9722 1.0 866.5263 8.0 867.1326 4.0 867.2887 5.0 867.5120 179.0 868.0166 1.0 868.2940 10.0 868.5108 87.0 868.8837 3.0 869.0919 1.0 869.5082 24.0 869.9074 2.0 870.1330 2.0 870.4593 105.0 871.0185 2.0 871.3138 2.0 871.4685 49.0 872.1825 1.0 872.4606 8.0 873.5734 5.0 874.3389 3.0 874.5128 3.0 875.2786 
3.0 876.5674 7.0 877.1948 1.0 877.5433 5.0 878.0838 2.0 878.6418 2.0 879.9156 3.0 880.5792 1.0 883.1653 1.0 883.4706 32.0 884.1100 5.0 884.4949 17.0 885.1951 1.0 885.6241 4.0 885.8605 1.0 886.4911 3.0 887.1570 5.0 887.5076 3.0 887.7706 4.0 887.9459 4.0 888.4924 6352.0 889.0160 51.0 889.4954 3108.0 890.4965 945.0 891.0992 29.0 891.4913 211.0 892.0187 13.0 892.1945 10.0 892.5194 31.0 892.9328 10.0 893.2318 5.0 893.4500 7.0 893.6540 8.0 893.8826 6.0 894.1465 6.0 894.3401 4.0 894.7045 6.0 894.8858 3.0 895.4668 4.0 895.7662 1.0 896.1888 1.0 896.9818 6.0 897.5282 9.0 898.3041 3.0 898.6568 3.0 899.2920 3.0 899.5253 7.0 900.3159 1.0 904.4345 2.0 904.5760 2.0 905.3727 1.0 906.3645 3.0 910.6924 1.0 911.0031 4.0 912.5046 1.0 919.1648 1.0 934.4639 3.0 935.6155 1.0 950.5785 3.0 951.5676 4.0 952.3574 3.0 952.5571 2.0 960.5651 6.0 961.4772 2.0 968.5517 1.0 969.5580 26.0 969.8707 1.0 970.2373 2.0 970.4940 18.0 970.9706 1.0 971.5208 7.0 974.9721 3.0 975.1926 1.0 975.7807 1.0 978.2823 4.0 978.5509 90.0 978.7609 4.0 979.0739 4.0 979.2764 2.0 979.5319 54.0 980.5366 22.0 980.7502 1.0 986.3421 4.0 986.7857 4.0 986.9890 2.0 987.5515 2875.0 988.5592 1628.0 988.8755 44.0 989.1107 32.0 989.5505 524.0 989.9492 33.0 990.2640 10.0 990.5545 103.0 991.0789 7.0 991.2642 6.0 991.5701 20.0 991.9786 13.0 992.2650 2.0 992.4874 2.0 992.6913 3.0 993.0066 2.0 993.2476 3.0 993.6371 4.0 993.9896 5.0 994.2122 3.0 994.4720 5.0 995.1587 4.0 996.2543 10.0 996.5561 70.0 997.2761 11.0 997.5792 52.0 998.2056 2.0 998.5079 19.0 998.8005 2.0 999.5446 4.0 1000.6799 3.0 1009.5228 5.0 1009.9716 1.0 1010.5327 7.0 1018.5174 3.0 1029.4393 1.0 1037.5740 3.0 1065.3948 2.0 1065.6023 28.0 1066.3556 2.0 1066.5669 11.0 1067.4703 1.0 1068.1433 2.0 1083.3719 2.0 1083.5669 25.0 1084.2241 1.0 1084.5343 9.0 1084.8636 2.0 1085.5421 3.0 1086.2015 1.0 1100.4056 2.0 1100.6428 27.0 1101.6748 16.0 1102.4172 4.0 1102.6908 2.0 1110.5831 7.0 1111.6589 2.0 1210.3501 3.0 
lutefisk-1.0.7+dfsg.orig/database.sequence0000644000175000017500000000001410102256715020441 0ustar rusconirusconi
SLHTLFGDELCK
lutefisk-1.0.7+dfsg.orig/Lutefisk.params0000644000175000017500000001244510106463371020153 0ustar rusconirusconi
// Lutefisk parameters file
//
// If this file is present in the directory from which Lutefisk is invoked,
// then the value of the parameters listed in the 'VALUE' column below
// will override the program defaults.
//
// TITLE                          VALUE       DEFAULT
CID Filename:                                 | CID Filename.
CID Quality:                      N           | Check for CID data quality. (Y/N)
Peptide MW:                       0           | Peptide molecular weight. Zero will calc. from input file.
Charge-state:                     0           | Number of charges on the precursor ion. Zero will calc. from input file.
MaxEnt3:                          N           | Data file processed using MaxEnt 3 (Qtof only) (Y/N)
// Mass Tolerances ----------------------------------------------------------------------
Peptide Error (u):                0.45        | Peptide molecular weight tolerance.
Fragment Error (u):               0.25        | Fragment ion tolerance. Must be 0.25 or less for qtof scoring to take effect.
Final Fragment Err (u):           0.04        | Fragment ion tolerance for final scoring of Qtof data. Zero will skip qtof scoring.
// Memory and Speed ---------------------------------------------------------------------
Max. Final Sequences:             20000       | Number of final sequences stored.
Max. Subsequences:                5000        | Number of subsequences allowed.
Mass Scrambles for Statistics:    0           | Number of times to use a wrong precursor mass (for calculating score significance).
// Spectral Processing ------------------------------------------------------------------
CID File Type:                    D           | CID file type: D='.dta', F=ICIS text file, L=LCQ "text", T=tab text, N='.dat'
Profile/Centroid:                 C           | Is this CID data in profile or centroid form? P=Profile, C=Centroid, A=Autodetect.
Peak Width (u):                   0.75        | Peak width at about 10%. A value of 0 (zero) activates the auto-peak width mode.
Ion Threshold:                    0.01        | Ion threshold. (Ions > average intensity x Ion threshold are utilized.)
Mass Offset (u):                  0.0         | Mass offset.
Ions Per Window:                  8           | Ions per input window (windows are 60 Da wide). 8 for Qtof, 6 for LCQ
Ions Per Residue:                 6           | Number of ions per average residue. 6 for Qtof, 4 for LCQ
// Subsequencing ------------------------------------------------------------------------
Transition Mass (u):              5000        | Cutoff for monoisotopic to average mass calculations.
Fragmentation Pattern:            Q           | Fragmentation pattern (T=triple quad tryptic, L=ion trap tryptic, Q=Qtof tryptic)
Max. Gaps:                        -1          | Maximum number of gaps per subsequence. -1 implies a default value.
Extension Threshold:              0.15        | Extension threshold.
Max. Extensions:                  6           | Maximum number of extensions per subsequence.
// Extras -------------------------------------------------------------------------------
Cysteine Mass:                    160.03065   | Residue mass of cysteine. (160.03065, 161.01466, 208.06703 = carbamidomethyl, carboxymethyl and pyridylethyl)
Proteolysis:                      T           | Type of proteolysis? T=tryptic, K=Lys-C, E=V8, D=AspN, and N=none of the above
Modified N-terminus:              1.0078      | N-terminal mass [1.0078(unmod), 43.0184(acetyl), 44.0136(carbamyl)]
Modified C-terminus:              17.0027     | C-terminal mass [17.0027(unmod), 16.0187(amide), 31.0184(methyl)]
Present Amino Acids:              *           | Amino acids known to be present in the peptide. * means none.
Absent Amino Acids:               *           | Amino acids known to be absent from the peptide. * means none.
Auto Tag:                         Y           | Auto-tag (Y/N).
Tag Low Mass y Ion:               0           | Sequence tag - low mass y ion
Sequence Tag:                     *           | Sequence tag - single letter code, no spaces, from low mass to high mass y ion
Tag High Mass y Ion:              0           | Sequence tag - high mass y ion
DB Sequence File:                             | File with sequences to score with the final results.
Shoe Size (US):                   9.5         | US shoe size. Default of 17.
// Output --------------------------------------------------------------------------------
Number of sequences:              10          | Number of output sequences listed. A good bet is 5.
Score threshold:                  0.02        | Pr(c) is the approximate probability that at least half of the sequence is correct. A good bet is 0.20.