./PaxHeaders.5815/SFST0000644000000000000000000000013212463674134011406 xustar0030 mtime=1422882908.740362703 30 atime=1448530778.645029745 30 ctime=1448530762.276792346 SFST/0000750007467500724610000000000012463674134013220 5ustar00schmidcisintern00000000000000SFST/PaxHeaders.5815/data0000644000000000000000000000007410256273442012162 xustar0030 atime=1448530749.008599996 30 ctime=1448530762.092789678 SFST/data/0000750007467500724610000000000010256273442014124 5ustar00schmidcisintern00000000000000SFST/data/PaxHeaders.5815/XMOR0000644000000000000000000000007311707521002012734 xustar0030 atime=1448530749.008599996 29 ctime=1448530762.04878904 SFST/data/XMOR/0000750007467500724610000000000011707521002014677 5ustar00schmidcisintern00000000000000SFST/data/XMOR/PaxHeaders.5815/prefixfilter.fst0000644000000000000000000000007410256277455016255 xustar0030 atime=1448530762.012788517 30 ctime=1448530762.012788517 SFST/data/XMOR/prefixfilter.fst0000640007467500724610000000120510256277455020141 0ustar00schmidcisintern00000000000000%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: prefixfilter.fst % Author: Helmut Schmid; IMS, University of Stuttgart % Content: enforcement of derivational constraints % Modified: Wed Jun 22 17:04:56 2005 (schmid) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% #include "symbols.fst" % Check agreement of the word class and origin features % Delete the agreement features of the prefix ALPHABET = [#Letter#] #=wc# = #WordClass# #=orig# = #Origin# .* \ [#=wc#]:<> [#=orig#]:<> [#BDKStem#] .* [#=wc#] [#StemType#] [#=orig#] \ [#InflClass#]? 
SFST/data/XMOR/PaxHeaders.5815/symbols.fst0000644000000000000000000000007410410733557015232 xustar0030 atime=1448530762.016788577 30 ctime=1448530762.016788577 SFST/data/XMOR/symbols.fst0000640007467500724610000000567310410733557017133 0ustar00schmidcisintern00000000000000%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: symbols.fst % Author: Helmut Schmid; IMS, Universitaet Stuttgart % Content: definition of symbol classes % Modified: Fri Mar 24 10:10:07 2006 (schmid) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % All symbols used by the morphology should be defined here %%% Single Character Symbols %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % lower case consonants #cons# = bcdfghjklmnpqrstvwxyz % upper case consonants #CONS# = BCDFGHJKLMNPQRSTVWXYZ % all consonants #Cons# = #cons# #CONS# % lower case vowels #vowel# = aeiou % upper case vowels #VOWEL# = AEIOU % all vowels #Vowel# = #vowel# #VOWEL# % lower case letters #letter# = #cons# #vowel# % upper case letters #LETTER# = #CONS# #VOWEL# % all letters #Letter# = #Cons# #Vowel# %%% Lexicon Entry Markers %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % affix type features #Affix# = % stem type features (internally used) #BDKStem# = % all stem types including the general stem feature % used in the lexicon #EntryType# = #BDKStem# #Affix# %%% Agreement Features %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % word class features #WordClass# = % stem type feature #StemType# = % classic origin features #classic# = % all origin features #Origin# = #classic# % origin features including the internally used feature % which represents the disjunction stored in #classic# #Origin-cl# = #Origin# % complexity features #Complex# = % inflection class features #InflClass# = % all agreement features #AgrFeat# = #WordClass# #StemType# #Origin-cl# % all agreement features + inflection class features #AgrFeatInfl# = #AgrFeat# #InflClass# %%% Analysis Features 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % number feature #Number# = % gender feature #Gender# = % case feature #Case# = % Person Feature #Person# = <1><2><3> % degree feature #Degree# = % verbal features #VerbFeat# = % affix markers #AFF# = % Morphosyntactic Features #MorphSyn# = #Number# #Gender# #Case# #Person#\ #Degree# #VerbFeat# #AFF# %%% Trigger Features %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % capitalisation feature: lower case, capitalized or fixed #Cap# = % Features used to mark the boundaries of morphemes and inflection #Boundary# = % all triggers % marks lexicon entries without default stems #Trigger# = #Cap# #Boundary# %%% General Symbol Classes %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% #Tag# = #EntryType# #AgrFeatInfl# #MorphSyn# #Trigger# #AllSym# = #Letter# #Tag# SFST/data/XMOR/PaxHeaders.5815/HOWTO0000644000000000000000000000007410521347522013643 xustar0030 atime=1448530762.016788577 30 ctime=1448530762.016788577 SFST/data/XMOR/HOWTO0000640007467500724610000000441110521347522015531 0ustar00schmidcisintern00000000000000 How to develop a new morphology =============================== 1. Get acquainted with XMOR - Read the SFST manual - Read the article on the German SMOR morphology - Look at the XMOR files and try to understand how XMOR works 2. Do the linguistic work, i.e. a) Define the set of word classes b) Define the inflectional classes c) What are the derivational prefixes and their features? d) What are the derivational suffixes and their features? e) Which features will be used in the morphological analyses? 3. Modify symbols.fst to include your word classes, inflectional classes and morphosyntactic features 4. Write lexicon entries and store them in the file "lexicon". Add entries for derivational prefixes and suffixes. The Perl script morph-match.perl can be used to do the necessary alignment of lemma and surface form automatically. Try for instance: echo "mice\tmouse\tN base native NounPl" | ./morph-match.perl 5. 
Define the different inflectional paradigms in "inflection.fst".

6. Write the morpho-phonological rules which transform the morpheme
   sequences into the correct surface strings and store them in "phon.fst".

7. Compile and debug the morphology.


Debugging
=========

If the result is not what you expected, you can use the following
tricks:

Check whether the compiler produces warnings of the form
"assignment of empty transducer to:". If so, check whether it is okay
that the transducer generated at this point is empty. You will find
the respective source code line at the beginning of the line with the
compiler warning.

In order to check an intermediate result of the compilation, which is
stored in $X$, you can proceed as follows:

1. Insert the command
     $X$ >> "ValueofX.a"
   in the source code after the commands which compute $X$
2. Compile
3. Examine the file "ValueofX.a":
   You can print the transducer with the command
     fst-print ValueofX.a
   You can generate with the transducer using the command
     fst-generate ValueofX.a
   You can use the command
     fst-mor ValueofX.a
   to check interactively how the transducer maps the input strings.
4. Often it is also useful to reduce the lexicon to the smallest number
   of entries needed to test something.
   (But don't forget to make a backup of the full lexicon first!)
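
The backup-and-reduce trick from item 4 of the debugging list could be
scripted roughly as follows. This is only a sketch: reduce_lexicon and
restore_lexicon are hypothetical helper names (they are not part of SFST),
and the backup file name "lexicon.full" is an assumption.

```shell
# Hypothetical helpers, not part of SFST.
# reduce_lexicon PATTERN [FILE]: keep only the entries matching PATTERN,
# after saving the full lexicon as FILE.full (assumed backup name).
reduce_lexicon() {
    pattern=$1
    lexicon=${2:-lexicon}
    # Never overwrite an existing backup of the full lexicon.
    [ -e "$lexicon.full" ] || cp "$lexicon" "$lexicon.full" || return 1
    # Always filter from the backup, so that repeated calls do not
    # shrink an already reduced lexicon any further.
    grep -e "$pattern" "$lexicon.full" > "$lexicon.tmp" || {
        # Nothing matched: leave the current lexicon untouched.
        rm -f "$lexicon.tmp"
        return 1
    }
    mv "$lexicon.tmp" "$lexicon"
}

# restore_lexicon [FILE]: put the full lexicon back in place.
restore_lexicon() {
    lexicon=${1:-lexicon}
    [ -e "$lexicon.full" ] && mv "$lexicon.full" "$lexicon"
}
```

A session might then consist of "reduce_lexicon mouse", recompiling and
testing the morphology as usual, and finally "restore_lexicon".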
SFST/data/XMOR/Makefile

SOURCES = morph.fst symbols.fst map1.fst map2.fst defaultstems.fst \
	suffixfilter.fst prefixfilter.fst compoundfilter.fst inflection.fst \
	inflectionfilter.fst phon.fst

.PHONY: targets

targets: morph.a

morph.a: phon.a map1.a map2.a prefixfilter.a suffixfilter.a compoundfilter.a \
	inflection.a inflectionfilter.a lexicon

%.a: %.fst
	fst-compiler $< $@

%.ca: %.a
	fst-compact $< $@

archive:
	gtar -zcf VERSION-`date '+%y%m%d'`.tar.gz Makefile lexicon *.fst

clean:
	-rm *.a *~ Makefile.bak 2>&- > /dev/null

Makefile: *.fst
	-makedepend -Y -o.a $(SOURCES) 2>/dev/null

# DO NOT DELETE

morph.a: symbols.fst defaultstems.fst
map1.a: symbols.fst
map2.a: symbols.fst
suffixfilter.a: symbols.fst
prefixfilter.a: symbols.fst
compoundfilter.a: symbols.fst
inflection.a: symbols.fst
inflectionfilter.a: symbols.fst
phon.a: symbols.fst

SFST/data/XMOR/compoundfilter.fst

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% File: compoundfilter.fst
% Author: Helmut Schmid; IMS, University of Stuttgart
% Content: enforcement of compounding constraints
% Modified: Fri Jun 17 14:14:08 2005 (schmid)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

#include "symbols.fst"

% Compounding filter
% Compounds are restricted to nouns and adjectives

$org$ = [#Origin#]:<>

% symbols occurring in non-compounds
$T$ = [#Letter# #EntryType#] | [#WordClass#]:<> | $org$

% expression matching non-compounds
$TS$ = $T$*

% expression matching compounds
$TC$ =
($T$ | :<>)* ($TS$ [] |\ $TC$ []) \ :<> $org$ [#InflClass#] SFST/data/XMOR/PaxHeaders.5815/lexicon0000644000000000000000000000007410601001053014365 xustar0030 atime=1448530762.024788693 30 ctime=1448530762.024788693 SFST/data/XMOR/lexicon0000640007467500724610000000074110601001053016255 0ustar00schmidcisintern00000000000000dark red easy house story wish mouse mo:iu:<>s:ce hike strike un able SFST/data/XMOR/PaxHeaders.5815/FILES0000644000000000000000000000007310256277453013616 xustar0030 atime=1448530762.024788693 29 ctime=1448530762.02878875 SFST/data/XMOR/FILES0000640007467500724610000000137710256277453015515 0ustar00schmidcisintern00000000000000Source code files of the XMOR morphology morph.fst main file symbols.fst definition of the single- and multi-character symbols (tags) map1.fst mapping of the analysis symbols of the lexicon entries map2.fst mapping of the surface symbols of the lexicon entries defaultstems.fst generation of default compounding and derivational stems from base stems suffixfilter.fst filter checking agreement of the suffix features prefixfilter.fst filter checking agreement of the prefix features compoundfilter.fst filter checking the structure of compounds inflection.fst definition of the inflectional endings inflectionfilter.fst filter checking agreement of the inflection feature phon.fst morpho-phonological rules lexicon contains the list of lexicon entries SFST/data/XMOR/PaxHeaders.5815/morph-match.perl0000644000000000000000000000007211707521002016113 xustar0029 atime=1448530762.02878875 29 ctime=1448530762.02878875 SFST/data/XMOR/morph-match.perl0000750007467500724610000000553411707521002020014 0ustar00schmidcisintern00000000000000#!/usr/bin/perl # Input format: # enwraps enwrap V 3sg PRES use Getopt::Std; getopts('uh'); use Encode; if (defined $opt_h) { print " Usage: morph-match.perl [file] OPTIONS: -u use UTF8 character encoding -h print usage information The input file contains lines with three columns such as write wrote V past 
house houses N pl

The output is

wro:ite
houses:<>

";
    exit(1);
}

my $N;
while (<>) {
    $_ = decode("utf-8",$_) if defined $opt_u;
    print STDERR "\r",$N if (++$N % 10 == 0);
    chomp;
    my($w,$l,@f) = split;
    print "";
    output(match($l,$w));
    foreach $l (@f) {
        output("<$l>");
    }
    print "\n";
}

sub output {
    # use a named lexical: "my $_" is no longer valid in current Perl versions
    my $s = shift;
    $s = encode("utf-8",$s) if defined $opt_u;
    print $s;
}

######################################################################
# alignment functions

sub match {
    my $lemma = shift;
    my $word = shift;
    my @w = mysplit($word);
    my @l = mysplit($lemma);
    unshift(@w, undef);
    unshift(@l, undef);

    my($i,$k,$s,@score,@action,$result);
    $score[0][0] = 0.0;
    for( $i=0; $i<=$#w; $i++ ) {
        for( $k=0; $k<=$#l; $k++ ) {
            next if $i==0 && $k == 0;
            $score[$i][$k] = 10000000000000;

            # matching
            $s = $score[$i-1][$k-1] + cost($w[$i], $l[$k]);
            if ($score[$i][$k] >= $s) {
                $score[$i][$k] = $s;
                $action[$i][$k] = "m";
            }

            # delete character in word
            $s = $score[$i-1][$k] + cost($w[$i], '');
            if ($score[$i][$k] >= $s) {
                $score[$i][$k] = $s;
                $action[$i][$k] = "w";
            }

            # delete character in lemma
            $s = $score[$i][$k-1] + cost('', $l[$k]);
            if ($score[$i][$k] >= $s) {
                $score[$i][$k] = $s;
                $action[$i][$k] = "l";
            }
        }
    }

    # trace back the cheapest alignment from the end of both strings
    $i = $#w;
    $k = $#l;
    while (defined $action[$i][$k]) {
        if ($action[$i][$k] eq "m") {
            if ($w[$i] eq $l[$k]) {
                $result = quote($w[$i]).$result;
            }
            else {
                $result = quote($l[$k]).":".quote($w[$i]).$result;
            }
            $i--;
            $k--;
        }
        elsif ($action[$i][$k] eq "w") {
            $result = "<>:".quote($w[$i]).$result;
            $i--;
        }
        else {
            $result = quote($l[$k]).":<>".$result;
            $k--;
        }
    }
    return $result;
}

sub mysplit {
    my $s = shift;
    $s =~ s/(<[A-za-z0-9-]*>|.)/$1 /g;
    $s =~ s/ $//;
    return split(/ /,$s);
}

sub quote {
    my $s = shift;
    $s =~ s/^([\!\"\'\(\)\,\-\.\:\?\~\<])$/\\$1/;
    return $s;
}

sub cost {
    my($w,$l) = @_;
    return 1e30 unless (defined $w && defined $l);
    return 0 if ($w eq $l);
    return 0.2 if (lc($w) eq lc($l));
    return 0.5 if ($w eq '' && $l eq 'a');
    return 0.5 if ($w eq '' && $l eq 'o');
    return 0.5 if ($w eq '' && $l eq 'u');
return 0.5 if ($w eq 's' && $l eq ''); return 0.7 if ($w eq 'c' && $l eq 's'); return 0.7 if ($w eq 's' && $l eq 'c'); return 0.7 if ($w eq ''); return 0.8 if ($l eq ''); return 0.95 if ($w =~ /[aeiou]/ && $l =~ /[aeiou]/); return 1; } SFST/data/XMOR/PaxHeaders.5815/map2.fst0000644000000000000000000000007310256277454014407 xustar0029 atime=1448530762.02878875 30 ctime=1448530762.032788808 SFST/data/XMOR/map2.fst0000640007467500724610000000140110256277454016272 0ustar00schmidcisintern00000000000000%************************************************************************** % File: map2.fst % Author: Helmut Schmid; IMS, University of Stuttgart % Content: deletes tags on the surface string % Modified: Fri Jun 17 14:34:43 2005 (schmid) %************************************************************************** % definition of the symbol classes #include "symbols.fst" % delete unwanted symbols on the "surface" % and map the feature to the more specific features % and ALPHABET = [#Letter# #WordClass# #StemType# #Origin# #Complex# #InflClass#] \ :[#classic#] [#Affix#] .* |\ ? 
(: .* |\ : .* |\ : .* ) .* SFST/data/XMOR/PaxHeaders.5815/morph.fst0000644000000000000000000000007410256471103014661 xustar0030 atime=1448530762.032788808 30 ctime=1448530762.032788808 SFST/data/XMOR/morph.fst0000640007467500724610000000433710256471103016556 0ustar00schmidcisintern00000000000000%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: morph.fst % Author: Helmut Schmid; IMS, Universitaet Stuttgart % Content: main file of the morphology % Modified: Thu Jun 23 10:23:15 2005 (schmid) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % definition of the symbol classes #include "symbols.fst" % read the morphemes $LEX$ = "lexicon" % apply the transducers which deletes unwanted symbols in the analysis % string (map1) and the surface string (map2) of the lexicon entries $LEX$ = "" || $LEX$ || "" % creation of sublexica for the different types of entries % stems $BDKStem$ = $LEX$ || :<>? [#BDKStem#] [#AllSym#]* % prefixes $Prefix$ = $LEX$ || [#AllSym#]* % suffixes combining with simplex stems $SimplexSuffix$ = $LEX$ || :<> [#AllSym#]* % suffixes combining with suffix derivation stems $SuffDerivSuffix$ = $LEX$ || :<> [#AllSym#]* % suffixes combining with prefix derivation stems $PrefDerivSuffix$ = $LEX$ || :<> [#AllSym#]* % generation of default derivational and compounding stems #include "defaultstems.fst" %%% Derivation and Composition %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % sequence of derivational suffixes to be added to simplex stems $SimplexSuffixes$ = ($SimplexSuffix$ $SuffDerivSuffix$*)? % sequence of derivational suffixes to be added to prefixed stems $PrefDerivSuffixes$ = ($PrefDerivSuffix$ $SuffDerivSuffix$*)? 
% suffix derivation with a simplex base $SuffixFilter$ = "" $S0$ = $BDKStem$ $SimplexSuffixes$ || $SuffixFilter$ % prefix derivation $P1$ = $Prefix$ $S0$ || "" % suffix derivation with a "prefderiv" base $S1$ = $P1$ $PrefDerivSuffixes$ || $SuffixFilter$ % combination of the different derivations $Morph$ = $S0$ | $S1$ %%% Compounding %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % $Morph$ = $Morph$* $Morph$ || "" $Morph$ = $Morph$ || "" %%% Inflection %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% $Morph$ = $Morph$ "" || "" %%% Morpho-Phonological Rules %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% $Morph$ || "" SFST/data/XMOR/PaxHeaders.5815/inflection.fst0000644000000000000000000000007410256277454015703 xustar0030 atime=1448530762.032788808 30 ctime=1448530762.036788866 SFST/data/XMOR/inflection.fst0000640007467500724610000000267210256277454017600 0ustar00schmidcisintern00000000000000%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: inflection.fst % Author: Helmut Schmid; IMS, Universitaet Stuttgart % Content: definition of inflectional classes % Modified: Fri Jun 17 16:04:22 2005 (schmid) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % definition of the symbol classes #include "symbols.fst" %%% definition of the inflectional classes %%%%%%%%%%%%%%%%%%%%%%%%%%%% $AdjReg$ = {}:{} |\ {}:{er} |\ {}:{est} $AdvReg$ = <> $NounReg$ = {}:{} |\ {}:{s} $NounSg$ = :<> $NounPl$ = :<> $VerbReg$ = {<3>}:{s} |\ {[<1><2>]}:{} |\ {[<1><2><3>]}:{} |\ {[][<1><2><3>]}:{ed} |\ {}:{ed} |\ {}:{ing} % adding a tag for the inflectional class $LCInfl$ = <>: $AdjReg$ |\ <>: $AdvReg$ |\ <>: $NounReg$ |\ <>: $NounSg$ |\ <>: $NounPl$ |\ <>: $VerbReg$ % no capitalized or fixed word forms yet % $UCInfl$ = ... % $FixInfl$ = ... 
% The capitalization of the resulting word form is indicated by % the three feature tags (lower case the first character), % (capitalize the first character) and (do nothing) $LCInfl$ = $LCInfl$ <>: % $UCInfl$ = $UCInfl$ <>: % $FixInfl$ = $FixInfl$ <>: $LCInfl$ % | $UCInfl$ | $FixInfl$ SFST/data/XMOR/PaxHeaders.5815/map1.fst0000644000000000000000000000007410256471022014372 xustar0030 atime=1448530762.036788866 30 ctime=1448530762.036788866 SFST/data/XMOR/map1.fst0000640007467500724610000000120710256471022016260 0ustar00schmidcisintern00000000000000%************************************************************************** % File: map1.fst % Author: Helmut Schmid; IMS, University of Stuttgart % Content: deletes tags in the analysis string % Modified: Thu Jun 23 10:22:28 2005 (schmid) %************************************************************************** % definition of the symbol classes #include "symbols.fst" % delete unwanted symbols in the analysis ALPHABET = [#Letter# #WordClass#] \ <>:[#StemType# #Origin-cl# #InflClass#] <>:? 
<>: .* |\ <>: <>:[#Complex#] <>:[#WordClass#] .* :<> |\ <>: .* <>:[#WordClass#] .* :<> SFST/data/XMOR/PaxHeaders.5815/defaultstems.fst0000644000000000000000000000007410256277453016250 xustar0030 atime=1448530762.040788924 30 ctime=1448530762.040788924 SFST/data/XMOR/defaultstems.fst0000640007467500724610000000171010256277453020135 0ustar00schmidcisintern00000000000000%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: defaultstems.fst % Author: Helmut Schmid; IMS, Universitaet Stuttgart % Content: generation of derivational and compounding stems % from base stems by means of default rules % Modified: Fri Jun 17 16:03:22 2005 (schmid) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Default derivation and compounding stems are not generated for % lexicon entries which start with ALPHABET = [#Letter#] % Rule which turns a adjectival base stem into a compounding stems % The inflection feature is deleted; the morpheme itself is unchanged. 
$DefCompAdj$ = $LEX$ ||\ : .* : [#Origin#] [#InflClass#]:<> $DefDerivVerb$ = $LEX$ ||\ : .* : [#Origin#] [#InflClass#]:<> % Add the new stems to the set of stems $BDKStem$ = $BDKStem$ | $DefCompAdj$ | $DefDerivVerb$ SFST/data/XMOR/PaxHeaders.5815/phon.fst0000644000000000000000000000007410523570403014500 xustar0030 atime=1448530762.040788924 30 ctime=1448530762.040788924 SFST/data/XMOR/phon.fst0000640007467500724610000000334310523570403016371 0ustar00schmidcisintern00000000000000%************************************************************************** % File: phon.fst % Author: Helmut Schmid; IMS, University of Stuttgart % Content: morphophonological rules % Modified: Mon Nov 6 09:23:31 2006 (schmid) %************************************************************************** #include "symbols.fst" %%% adjective rules %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % easy+er -> easier % easy+est -> easiest % late+er -> later % red -> redder ALPHABET = [#Letter# #EntryType# #WordClass# #Cap#] \ y:i e:<> : $C$ = [] $T$ = ([hwxaioue] $C$) e <=> <> (r | st) &\ [#Cons#] y <=> i ($C$ e(r|st)) &\ ([#Cons#][#vowel#][#cons#]) <=> (e(r|st)) ALPHABET = [#Letter# #EntryType# #WordClass# #Cap#] #=1# = #cons# $X$ = [#=1#] <>: :[#=1#] $Rule1$ = $T$ || (.* $X$)* .* %%% noun rules %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % story -> stories % wish -> wishes (also verbs) ALPHABET = [#Letter# #EntryType# #WordClass# #Cap#] $T$ = {y}:{ie} ^-> ([#cons#] __ [] s) $Rule2$ = $T$ || <>:e ^-> (([szx]|[cs]h) __ [] s) %%% verb rules %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % delete -> deleting ALPHABET = [#Letter# #EntryType# #WordClass# #Cap#] e:<> y:i $Rule3$ = e <=> <> ( (ing|ed)) &\ [#cons#] y <=> i ( ed) %%% capitalisation %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ALPHABET = [#Letter#] [#WordClass#]:<> $T$ = .* \ ([#EntryType#]:<> ([#LETTER#]:[#letter#] | [#letter#]) .*)* \ [#Cap#] $Capitalisation$ = $T$ || (\ ([#LETTER#]:[#letter#] | 
[#letter#]) .* :<> |\ ([#letter#]:[#LETTER#] | [#LETTER#]) .* :<> |\ .* :<>) $Rule1$ || $Rule2$ || $Rule3$ || $Capitalisation$ SFST/data/XMOR/PaxHeaders.5815/inflectionfilter.fst0000644000000000000000000000007410256277454017111 xustar0030 atime=1448530762.044788982 30 ctime=1448530762.044788982 SFST/data/XMOR/inflectionfilter.fst0000640007467500724610000000114010256277454020773 0ustar00schmidcisintern00000000000000%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: inflectionfilter.fst % Author: Helmut Schmid; IMS, Universitaet Stuttgart % Content: definition of the inflectional filter % Modified: Fri Jun 17 14:29:02 2005 (schmid) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % definition of the symbol classes #include "symbols.fst" % The following inflection filter ensures that the base stems % are combined with the correct inflectional endings ALPHABET = [#Letter# #EntryType# #WordClass#] $=1$ = [#InflClass#]:<> .* $=1$ $=1$ .* [#Cap#] SFST/data/XMOR/PaxHeaders.5815/README0000644000000000000000000000007410521346250013675 xustar0030 atime=1448530762.044788982 30 ctime=1448530762.044788982 SFST/data/XMOR/README0000640007467500724610000000067610521346250015574 0ustar00schmidcisintern00000000000000 The morphology is compiled by running make (GNU version) After compiling, you are ready to analyze words: > fst-mor morph.a reading automaton... finished. 
analyse> mouse mouse analyse> mice mouse analyse> houses house analyse> mouses no result for mouses analyse> unstrikable unstrikeable analyse> redder red analyse> generate> red reddest generate> q SFST/data/XMOR/PaxHeaders.5815/suffixfilter.fst0000644000000000000000000000007210256277456016263 xustar0029 atime=1448530762.04878904 29 ctime=1448530762.04878904 SFST/data/XMOR/suffixfilter.fst0000640007467500724610000000251610256277456020157 0ustar00schmidcisintern00000000000000%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: suffixfilter.fst % Author: Helmut Schmid; IMS, University of Stuttgart % Content: enforcement of derivational constraints % Modified: Fri Jun 17 16:31:36 2005 (schmid) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% #include "symbols.fst" % Definition of an expression which matches either a simplex word form % or the features of the last morpheme of a complex word form % expression matching prefixes $C1$ = [#Letter# #BDKStem# ] % expression matching the last morpheme and its agreement features $C2$ = [#Letter# #AgrFeat#] $Tail$ = $C1$* $C2$* [#InflClass#]? 
%%% Feature Checking %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % The agreement features are deleted $=1$ = [#WordClass#]:<> $=2$ = [#StemType#]:<> $=3$ = [#Origin#]:<> $T$ = $=1$ $=2$ $=3$ $=1$ $=2$ $=3$ ALPHABET = [#Letter# #BDKStem#] $Filter$ = .* (.* $T$)* $Tail$ %%% Phonological Rules %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % morpho-phonological rules accompanying derivational processes could % be added here ALPHABET = [#AllSym#] e:<> % write+able => writable $PhonRules$ = e <=> <> ( a) %%% Resulting Transducer %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% $Filter$ || $PhonRules$ SFST/data/PaxHeaders.5815/easy0000644000000000000000000000013212546721220013112 xustar0030 mtime=1436263056.534053589 30 atime=1448530749.036600375 30 ctime=1448530762.056789156 SFST/data/easy/0000750007467500724610000000000012546721220015061 5ustar00schmidcisintern00000000000000SFST/data/easy/PaxHeaders.5815/adj0000644000000000000000000000007210533571007013653 xustar0029 atime=1448530762.04878904 29 ctime=1448530762.04878904 SFST/data/easy/adj0000640007467500724610000000004710533571007015544 0ustar00schmidcisintern00000000000000easy late early happy white black dark SFST/data/easy/PaxHeaders.5815/easy.fst0000644000000000000000000000013212546722222014651 xustar0030 mtime=1436263570.745030937 30 atime=1448530762.052789098 30 ctime=1448530762.052789098 SFST/data/easy/easy.fst0000640007467500724610000000172712546722222016553 0ustar00schmidcisintern00000000000000% Define the set of valid symbol pairs for the two-level rules. % The symbol # is used to mark the boundary between the stem and % the inflectional suffix. It is deleted here. 
ALPHABET = [A-Za-z] y:i [e\#]:<> % Read the lexical items from a separate file % each line of which contains a form like "dark" $WORDS$ = "adj" % Define a rule which replaces y with i % if a morpheme boundary and an e follows % easy#er -> easier $R1$ = y<=>i (\#:<> e) % Define a rule which eliminates e before "#e" % late#er -> later $R2$ = e<=><> (\#:<> e) % Compute the intersection of the two rule transducers $R$ = $R1$ & $R2$ % Define a transducer for the inflectional endings $INFL$ = :<> (:<> | :{er} | :{est}) % Concatenate the lexical forms and the inflectional endings and % put a morpheme boundary in between which is not printed in the analysis $S$ = $WORDS$ <>:\# $INFL$ % Apply the two level rules % The result transducer is stored in the output file $S$ || $R$ SFST/data/easy/PaxHeaders.5815/README0000644000000000000000000000007410533575247014066 xustar0030 atime=1448530762.052789098 30 ctime=1448530762.052789098 SFST/data/easy/README0000640007467500724610000000123710533575247015757 0ustar00schmidcisintern00000000000000Simple transducer example Compilation >>> fst-compiler easy.fst easy.a Test of the interactive analysis tool >>> fst-mor easy.a reading automaton... finished. analyse> easiest easy analyse> latest late analyse> late late // entering an empty line switches to generation mode analyse> generate> late late generate> q Test of the batch-mode analysis tool echo easier| fst-infl easy.a reading automaton... finished. > easier easy Test of the second batch mode analysis tool (usually faster) >>> fst-compact easy.a easy.ca >>> echo later| fst-infl2 easy.ca reading automaton... finished. 
> later late SFST/data/PaxHeaders.5815/SMOR0000644000000000000000000000007410601001320012714 xustar0030 atime=1448530749.040600434 30 ctime=1448530762.092789678 SFST/data/SMOR/0000750007467500724610000000000010601001320014656 5ustar00schmidcisintern00000000000000SFST/data/SMOR/PaxHeaders.5815/Makefile0000644000000000000000000000007410430654722014455 xustar0030 atime=1448530762.056789156 30 ctime=1448530762.056789156 SFST/data/SMOR/Makefile0000640007467500724610000000042610430654722016345 0ustar00schmidcisintern00000000000000 SOURCES = smor.fst OTHER = lexicon smor.a: phon.a $(OTHER) %.a: %.fst fst-compiler $< $@ %.ca: %.a fst-compact $< $@ Makefile: *.fst -makedepend -Y -o.a $(SOURCES) 2>/dev/null # DO NOT DELETE smor.a: map.fst NUM.fst deko.fst flexion.fst defaults.fst FIX.fst PRO.fst SFST/data/SMOR/PaxHeaders.5815/defaults.fst0000644000000000000000000000007410075520064015336 xustar0030 atime=1448530762.056789156 30 ctime=1448530762.060789214 SFST/data/SMOR/defaults.fst0000640007467500724610000001004010075520064017217 0ustar00schmidcisintern00000000000000%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % File: defaults.fst % Author: Helmut Schmid; IMS, University of Stuttgart % Date: July 2003 % Content: generation of default base, derivation and composition stems %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% $TMP$ = $VPrefStems$ $BaseStems$ || $NoDef2NULL$ || $PREFFILTER$ $TMP$ = ($BaseStems$ | $TMP$) || $KOMPOSFILTER$ $ANY$ = [\!-\~- <~n>