debian/0000755000000000000000000000000012215424167007172 5ustar debian/kytea.install0000644000000000000000000000004212215423526011671 0ustar usr/bin/kytea usr/bin/train-kytea debian/kytea.manpages0000644000000000000000000000004412215423526012020 0ustar debian/kytea.1 debian/train-kytea.1 debian/compat0000644000000000000000000000000212215423526010366 0ustar 9 debian/changelog0000644000000000000000000000065512215423526011050 0ustar kytea (0.4.6+dfsg-2) unstable; urgency=low * debian/libkytea0.shlibs - control version using shlibs instead of symbols file (Closes: #722287) * debian/libkytea0.symbols - remove symbols -- Koichi Akabe Wed, 11 Sep 2013 20:57:36 +0900 kytea (0.4.6+dfsg-1) unstable; urgency=low * Initial release. (Closes: #721985) -- Koichi Akabe Sat, 07 Sep 2013 10:10:49 +0900 debian/watch0000644000000000000000000000015012215423526010215 0ustar version=3 opts=dversionmangle=s/\+dfsg// \ http://www.phontron.com/kytea/download/kytea-(\d.*)\.tar\.gz debian/kytea.10000644000000000000000000000560512215423526010375 0ustar .TH "KYTEA" "1" .SH "NAME" kytea \(em a word segmentation/pronunciation estimation tool .SH "SYNOPSIS" .PP \fBkytea\fR [\fBoptions\fP] .SH "DESCRIPTION" .PP This manual page briefly documents the \fBkytea\fR command. .PP This manual page was written for the \fBDebian\fP distribution because the original program does not have a manual page. Instead, it has documentation in the GNU \fBInfo\fP format; see below. .PP \fBkytea\fR is a morphological analysis system based on pointwise predictors. It separates sentences into words, tags them, and predicts pronunciations. The pronunciation of KyTea is the same as cutie. .PP .SH "OPTIONS" .PP A summary of options is included below. 
.SS Analysis Options: .IP "\fB-model\fP" 11 The model file to use when analyzing text .IP "\fB-nows\fP" 11 Don't do word segmentation (raw input cannot be accepted) .IP "\fB-notags\fP" 11 Do only word segmentation, no tagging .IP "\fB-notag\fP" 11 Skip the nth tag (n starts at 1) .IP "\fB-nounk\fP" 11 Don't estimate the pronunciation of unknown words .IP "\fB-wsconst\fP" 11 Specifies character types to not be segmented (e.g. D for digits) .IP "\fB-unkbeam\fP" 11 The width of the beam to use in beam search for unknown words (default 50, 0 for full search) .IP "\fB-debug\fP" 11 The debugging level (0=silent, 1=simple, 2=detailed) .SS Format Options: .IP "\fB-in\fP" 11 The formatting of the input (raw/tok/full/part/conf, default raw) .IP "\fB-out\fP" 11 The formatting of the output (full/part/conf/eda/tags, default full) .IP "\fB-tagmax\fP" 11 The maximum number of tags to print for one word (default 3, 0 implies no limit) .IP "\fB-deftag\fP" 11 A tag for words that cannot be given any tag (for example, unknown words that contain a character not in the subword dictionary) .IP "\fB-unktag\fP" 11 A tag to append to indicate words not in the dictionary .SS Format Options (for advanced users): .IP "\fB-wordbound\fP" 11 The separator for words in full annotation (" ") .IP "\fB-tagbound\fP" 11 The separator for tags in full/partial annotation ("/") .IP "\fB-elembound\fP" 11 The separator for candidates in full/partial annotation ("&") .IP "\fB-unkbound\fP" 11 Indicates unannotated boundaries in partial annotation (" ") .IP "\fB-skipbound\fP" 11 Indicates skipped boundaries in partial annotation ("?") .IP "\fB-nobound\fP" 11 Indicates non-existence of boundaries in partial annotation ("\-") .IP "\fB-hasbound\fP" 11 Indicates existence of boundaries in partial annotation ("|") .PP .RE .SH "AUTHOR" .PP This manual page was written by Koichi Akabe vbkaisetsu@gmail.com for the \fBDebian\fP system (and may be used by others). 
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation. .PP On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL. debian/train-kytea.10000644000000000000000000000744212215423526011511 0ustar .TH "TRAIN-KYTEA" "1" .SH "NAME" train\-kytea \(em a model training tool for the kytea word segmentation/pronunciation estimation tool .SH "SYNOPSIS" .PP \fBtrain\-kytea\fR [\fBoptions\fP] .SH "DESCRIPTION" .PP This manual page briefly documents the \fBtrain\-kytea\fR command. .PP This manual page was written for the \fBDebian\fP distribution because the original program does not have a manual page. Instead, it has documentation in the GNU \fBInfo\fP format; see below. .PP \fBkytea\fR is a morphological analysis system based on pointwise predictors. It separates sentences into words, tags them, and predicts pronunciations. The pronunciation of KyTea is the same as cutie. .PP .SH "OPTIONS" .PP A summary of options is included below. .SS Input/Output Options: .IP "\fB-encode\fP" 11 The text encoding to be used (utf8/euc/sjis; default: utf8) .IP "\fB-full\fP" 11 A fully annotated training corpus (multiple possible) .IP "\fB-tok\fP" 11 A training corpus that is tokenized with no tags (multiple possible) .IP "\fB-part\fP" 11 A partially annotated training corpus (multiple possible) .IP "\fB-conf\fP" 11 A confidence annotated training corpus (multiple possible) .IP "\fB-feat\fP" 11 A file containing features generated by \-featout .IP "\fB-dict\fP" 11 A dictionary file (one 'word/pron' entry per line, multiple possible) .IP "\fB-subword\fP" 11 A file of subword units. This will enable unknown word PE. 
.IP "\fB-model\fP" 11 The file to write the trained model to .IP "\fB-modtext\fP" 11 Print a text model (instead of the default binary) .IP "\fB-featout\fP" 11 Write the features used in training the model to this file .SS Model Training Options (basic) .IP "\fB-nows\fP" 11 Don't train a word segmentation model .IP "\fB-notags\fP" 11 Skip the training of tagging, do only word segmentation .IP "\fB-global\fP" 11 Train the nth tag with a global model (good for POS, bad for PE) .IP "\fB-debug\fP" 11 The debugging level during training (0=silent, 1=normal, 2=detailed) .SS Model Training Options (for advanced users): .IP "\fB-charw\fP" 11 The character window to use for WS (3) .IP "\fB-charn\fP" 11 The character n\-gram length to use for WS (3) .IP "\fB-typew\fP" 11 The character type window to use for WS (3) .IP "\fB-typen\fP" 11 The character type n\-gram length to use for WS (3) .IP "\fB-dictn\fP" 11 Dictionary words greater than \-dictn will be grouped together (4) .IP "\fB-unkn\fP" 11 Language model n\-gram order for unknown words (3) .IP "\fB-eps\fP" 11 The epsilon stopping criterion for classifier training .IP "\fB-cost\fP" 11 The cost hyperparameter for classifier training .IP "\fB-nobias\fP" 11 Don't use a bias value in classifier training .IP "\fB-solver\fP" 11 The solver (1=SVM, 7=logistic regression, etc.; default 1, see LIBLINEAR documentation for more details) .SS Format Options (for advanced users): .IP "\fB-wordbound\fP" 11 The separator for words in full annotation (" ") .IP "\fB-tagbound\fP" 11 The separator for tags in full/partial annotation ("/") .IP "\fB-elembound\fP" 11 The separator for candidates in full/partial annotation ("&") .IP "\fB-unkbound\fP" 11 Indicates unannotated boundaries in partial annotation (" ") .IP "\fB-skipbound\fP" 11 Indicates skipped boundaries in partial annotation ("?") .IP "\fB-nobound\fP" 11 Indicates non-existence of boundaries in partial annotation ("-") .IP "\fB-hasbound\fP" 11 Indicates existence of 
boundaries in partial annotation ("|") .PP .RE .SH "AUTHOR" .PP This manual page was written by Koichi Akabe vbkaisetsu@gmail.com for the \fBDebian\fP system (and may be used by others). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation. .PP On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL. debian/libkytea-dev.install0000644000000000000000000000007012215423526013135 0ustar usr/include/ usr/lib/*/pkgconfig/ usr/lib/*/libkytea.so debian/control0000644000000000000000000000260212215423526010573 0ustar Source: kytea Section: misc Priority: optional Maintainer: Koichi Akabe Build-Depends: debhelper (>= 9.0.0), dh-autoreconf, liblinear-dev, libboost-dev Standards-Version: 3.9.4 Homepage: http://www.phontron.com/kytea/ Package: libkytea0 Section: libs Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends} Description: library of KyTea KyTea is a morphological analysis system based on pointwise predictors. It separates sentences into words, tags them, and predicts pronunciations. The pronunciation of KyTea is the same as cutie. . This package contains shared libraries of KyTea. Package: kytea Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends} Description: morphological analysis system with pointwise predictors KyTea is a morphological analysis system based on pointwise predictors. It separates sentences into words, tags them, and predicts pronunciations. The pronunciation of KyTea is the same as cutie. . This package contains the predictor and the training tool. Package: libkytea-dev Section: libdevel Architecture: any Depends: libkytea0 (= ${binary:Version}), ${misc:Depends} Description: library of KyTea : development files KyTea is a morphological analysis system based on pointwise predictors. It separates sentences into words, tags them, and predicts pronunciations. 
The pronunciation of KyTea is the same as cutie. . This package contains development files of KyTea. debian/README.source0000644000000000000000000000034712215423526011353 0ustar This is a repacked source of the upstream package. - The upstream package contains a non-free model file. It was stripped from this package. - The upstream package bundles liblinear, which duplicates Debian's packaged liblinear and was removed. debian/copyright0000644000000000000000000000173112215423526011125 0ustar Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ Upstream-Name: kytea Source: http://www.phontron.com/kytea/ Files: * Copyright: 2009-2013, KyTea Development Team License: Apache-2.0 Files: debian/* Copyright: 2013 Koichi Akabe License: Apache-2.0 License: Apache-2.0 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at . http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. . On Debian systems, the complete text of the Apache version 2.0 license can be found in "/usr/share/common-licenses/Apache-2.0". debian/libkytea0.install0000644000000000000000000000003012215423526012435 0ustar usr/lib/*/libkytea.so.* debian/rules0000755000000000000000000000241312215423526010250 0ustar #!/usr/bin/make -f %: dh "$@" --with autoreconf upstream_version ?= $(shell dpkg-parsechangelog | sed -rne 's/^Version: ([0-9.]+)(\+dfsg)?.*$$/\1/p') dfsg_version = $(upstream_version)+dfsg upstream_pkg = "kytea" pkg = $(shell dpkg-parsechangelog | sed -ne 's/^Source: //p') get-orig-source: uscan --rename --download-current-version --destdir=. 
tar -xzf $(pkg)_$(upstream_version).orig.tar.gz rm -f $(pkg)_$(upstream_version).orig.tar.gz mv $(upstream_pkg)-$(upstream_version) $(pkg)_$(dfsg_version).orig rm -r $(pkg)_$(dfsg_version).orig/data rm -r $(pkg)_$(dfsg_version).orig/src/lib/liblinear rm -r $(pkg)_$(dfsg_version).orig/missing rm -r $(pkg)_$(dfsg_version).orig/ltmain.sh rm -r $(pkg)_$(dfsg_version).orig/install-sh rm -r $(pkg)_$(dfsg_version).orig/depcomp rm -r $(pkg)_$(dfsg_version).orig/configure rm -r $(pkg)_$(dfsg_version).orig/config.sub rm -r $(pkg)_$(dfsg_version).orig/config.guess rm -r $(pkg)_$(dfsg_version).orig/aclocal.m4 rm -r $(pkg)_$(dfsg_version).orig/Makefile.in rm -r $(pkg)_$(dfsg_version).orig/INSTALL tar -czf $(CURDIR)/../$(pkg)_$(dfsg_version).orig.tar.gz $(pkg)_$(dfsg_version).orig rm -r $(pkg)_$(dfsg_version).orig ifeq (,$(findstring nocheck,$(DEB_BUILD_OPTIONS))) override_dh_auto_test: $(CURDIR)/src/test/test-kytea endif debian/source/0000755000000000000000000000000012215423526010470 5ustar debian/source/format0000644000000000000000000000001412215423526011676 0ustar 3.0 (quilt) debian/patches/0000755000000000000000000000000012215423526010617 5ustar debian/patches/10_remove-data-dir.patch0000644000000000000000000000114712215423526015123 0ustar Description: remove data directory The upstream data directory contains a non-free model file. This patch removes it from the build files (Makefile.am and configure.ac). 
Author: Koichi Akabe Last-Update: 2013-08-23 --- kytea-0.4.6+dfsg.orig/Makefile.am +++ kytea-0.4.6+dfsg/Makefile.am @@ -1,4 +1,4 @@ -SUBDIRS = src data +SUBDIRS = src pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = kytea.pc --- kytea-0.4.6+dfsg.orig/configure.ac +++ kytea-0.4.6+dfsg/configure.ac @@ -16,7 +16,6 @@ AC_CONFIG_FILES([ src/bin/Makefile src/test/Makefile src/api/Makefile - data/Makefile ]) # disable shared libraries debian/patches/20_use-shared-liblinear.patch0000644000000000000000000000333312215423526016142 0ustar Description: use shared liblinear This patch uses the shared liblinear instead of the bundled one. Author: Koichi Akabe Last-Update: 2013-08-23 --- kytea-0.4.6+dfsg.orig/configure.ac +++ kytea-0.4.6+dfsg/configure.ac @@ -11,8 +11,6 @@ AC_CONFIG_FILES([ src/Makefile src/include/Makefile src/lib/Makefile - src/lib/liblinear/Makefile - src/lib/liblinear/blas/Makefile src/bin/Makefile src/test/Makefile src/api/Makefile --- kytea-0.4.6+dfsg.orig/src/lib/Makefile.am +++ kytea-0.4.6+dfsg/src/lib/Makefile.am @@ -1,4 +1,3 @@ -LLLIBS = liblinear/liblinear.la KYTCPP = kytea.cpp general-io.cpp corpus-io-prob.cpp corpus-io-eda.cpp corpus-io-full.cpp corpus-io-part.cpp corpus-io-tokenized.cpp corpus-io-raw.cpp corpus-io.cpp model-io.cpp string-util.cpp kytea-model.cpp kytea-config.cpp kytea-lm.cpp feature-io.cpp dictionary.cpp feature-lookup.cpp kytea-util.cpp kytea-string.cpp kytea-struct.cpp # KYTH = kytea.h corpus-io.h model-io.h string-util.h \ # kytea-model.h kytea-string.h kytea-struct.h dictionary.h general-io.h \ @@ -6,10 +5,7 @@ KYTCPP = kytea.cpp general-io.cpp corpu AM_CPPFLAGS = -I$(srcdir)/../include -DPKGDATADIR='"$(pkgdatadir)"' -SUBDIRS = liblinear - lib_LTLIBRARIES = libkytea.la libkytea_la_SOURCES = ${KYTCPP} -libkytea_la_LIBADD = ${LLLIBS} -libkytea_la_LDFLAGS = -version-info 0:0:0 +libkytea_la_LDFLAGS = -llinear -version-info 0:0:0 --- kytea-0.4.6+dfsg.orig/src/lib/kytea-model.cpp +++ kytea-0.4.6+dfsg/src/lib/kytea-model.cpp @@ -3,7 +3,7 @@ 
#include #include #include -#include "liblinear/linear.h" +#include #include #include #include debian/patches/series0000644000000000000000000000006712215423526012037 0ustar 10_remove-data-dir.patch 20_use-shared-liblinear.patch debian/libkytea0.shlibs0000644000000000000000000000004112215423526012255 0ustar libkytea 0 libkytea0 (>= 0.4.6)