pax_global_header00006660000000000000000000000064135203027670014517gustar00rootroot0000000000000052 comment=b6356c877f4af28d90bda2373b3b71bd371a3273 morfessor-2.0.6/000077500000000000000000000000001352030276700135435ustar00rootroot00000000000000morfessor-2.0.6/.gitignore000066400000000000000000000005161352030276700155350ustar00rootroot00000000000000*.py[cod] # C extensions *.so # Packages *.egg *.egg-info dist build eggs parts bin var sdist develop-eggs .installed.cfg lib lib64 MANIFEST env* # Installer logs pip-log.txt # Unit test / coverage reports .coverage .tox nosetests.xml # Translations *.mo #Idea IDE .idea # Mr Developer .mr.developer.cfg .project .pydevproject morfessor-2.0.6/LICENSE000066400000000000000000000024761352030276700145610ustar00rootroot00000000000000Copyright (c) 2012-2019, Sami Virpioja, Peter Smit, and Stig-Arne Grönroos. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.morfessor-2.0.6/MANIFEST.in000066400000000000000000000001071352030276700152770ustar00rootroot00000000000000include LICENSE include ez_setup.py include docs/build_requirements.txtmorfessor-2.0.6/README000066400000000000000000000021621352030276700144240ustar00rootroot00000000000000Morfessor 2.0 - Quick start =========================== Installation ------------ Morfessor 2.0 is installed using setuptools library for Python. To build and install the module and scripts to default paths, type python setup.py install For details, see http://docs.python.org/install/ Documentation ------------- User instructions for Morfessor 2.0 are available in the docs directory as Sphinx source files (see http://sphinx-doc.org/). Instructions how to build the documentation can be found in docs/README. The documentation is also available on-line at http://morfessor.readthedocs.org/ Details of the implemented algorithms and methods and a set of experiments are described in the following technical report: Sami Virpioja, Peter Smit, Stig-Arne Grönroos, and Mikko Kurimo. Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline. Aalto University publication series SCIENCE + TECHNOLOGY, 25/2013. Aalto University, Helsinki, 2013. ISBN 978-952-60-5501-5. The report is available online at http://urn.fi/URN:ISBN:978-952-60-5501-5 Contact ------- Questions or feedback? 
Email: morpho@aalto.fi morfessor-2.0.6/docs/000077500000000000000000000000001352030276700144735ustar00rootroot00000000000000morfessor-2.0.6/docs/Makefile000066400000000000000000000151771352030276700161460ustar00rootroot00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = build # User-friendly check for sphinx-build ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) endif # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " latexpdfja to make LaTeX files and run them through 
platex/dvipdfmx" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " xml to make Docutils-native XML files" @echo " pseudoxml to make pseudoxml-XML files for display purposes" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Morfessor.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Morfessor.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." 
@echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/Morfessor" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Morfessor" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." $(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." latexpdfja: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through platex and dvipdfmx..." $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 
changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." xml: $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml @echo @echo "Build finished. The XML files are in $(BUILDDIR)/xml." pseudoxml: $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml @echo @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." morfessor-2.0.6/docs/README000066400000000000000000000014161352030276700153550ustar00rootroot00000000000000Generating Documentation ------------------------ The user instructions for Morfessor 2.0 are available as Sphinx source files (see http://sphinx-doc.org/). To build the documentation you need both the 'sphinx' and the 'sphinxcontrib-napoleon' package. With a recent version of pip you could do:: pip install -e .[docs] to automatically install the required dependencies for making the docs. After installing Sphinx, you can generate the documentation in different formats using the Makefile or make.bat in the directory "docs". For example, to generate a PDF file, type "make latexpdf", and to generate a single HTML file, type "make singlehtml". Type "make help" to see all available formats. 
The documentation can also be read online on http://morfessor.readthedocs.org/morfessor-2.0.6/docs/build_requirements.txt000066400000000000000000000000361352030276700211350ustar00rootroot00000000000000sphinx sphinxcontrib-napoleon morfessor-2.0.6/docs/make.bat000066400000000000000000000150741352030276700161070ustar00rootroot00000000000000@ECHO OFF REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) set BUILDDIR=build set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% source set I18NSPHINXOPTS=%SPHINXOPTS% source if NOT "%PAPER%" == "" ( set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% ) if "%1" == "" goto help if "%1" == "help" ( :help echo.Please use `make ^` where ^ is one of echo. html to make standalone HTML files echo. dirhtml to make HTML files named index.html in directories echo. singlehtml to make a single large HTML file echo. pickle to make pickle files echo. json to make JSON files echo. htmlhelp to make HTML files and a HTML help project echo. qthelp to make HTML files and a qthelp project echo. devhelp to make HTML files and a Devhelp project echo. epub to make an epub echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter echo. text to make text files echo. man to make manual pages echo. texinfo to make Texinfo files echo. gettext to make PO message catalogs echo. changes to make an overview over all changed/added/deprecated items echo. xml to make Docutils-native XML files echo. pseudoxml to make pseudoxml-XML files for display purposes echo. linkcheck to check all external links for integrity echo. doctest to run all doctests embedded in the documentation if enabled goto end ) if "%1" == "clean" ( for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i del /q /s %BUILDDIR%\* goto end ) %SPHINXBUILD% 2> nul if errorlevel 9009 ( echo. echo.The 'sphinx-build' command was not found. 
Make sure you have Sphinx echo.installed, then set the SPHINXBUILD environment variable to point echo.to the full path of the 'sphinx-build' executable. Alternatively you echo.may add the Sphinx directory to PATH. echo. echo.If you don't have Sphinx installed, grab it from echo.http://sphinx-doc.org/ exit /b 1 ) if "%1" == "html" ( %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/html. goto end ) if "%1" == "dirhtml" ( %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. goto end ) if "%1" == "singlehtml" ( %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. goto end ) if "%1" == "pickle" ( %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the pickle files. goto end ) if "%1" == "json" ( %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the JSON files. goto end ) if "%1" == "htmlhelp" ( %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can run HTML Help Workshop with the ^ .hhp project file in %BUILDDIR%/htmlhelp. goto end ) if "%1" == "qthelp" ( %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can run "qcollectiongenerator" with the ^ .qhcp project file in %BUILDDIR%/qthelp, like this: echo.^> qcollectiongenerator %BUILDDIR%\qthelp\Morfessor.qhcp echo.To view the help file: echo.^> assistant -collectionFile %BUILDDIR%\qthelp\Morfessor.qhc goto end ) if "%1" == "devhelp" ( %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp if errorlevel 1 exit /b 1 echo. 
echo.Build finished. goto end ) if "%1" == "epub" ( %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub if errorlevel 1 exit /b 1 echo. echo.Build finished. The epub file is in %BUILDDIR%/epub. goto end ) if "%1" == "latex" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex if errorlevel 1 exit /b 1 echo. echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. goto end ) if "%1" == "latexpdf" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex cd %BUILDDIR%/latex make all-pdf cd %BUILDDIR%/.. echo. echo.Build finished; the PDF files are in %BUILDDIR%/latex. goto end ) if "%1" == "latexpdfja" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex cd %BUILDDIR%/latex make all-pdf-ja cd %BUILDDIR%/.. echo. echo.Build finished; the PDF files are in %BUILDDIR%/latex. goto end ) if "%1" == "text" ( %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text if errorlevel 1 exit /b 1 echo. echo.Build finished. The text files are in %BUILDDIR%/text. goto end ) if "%1" == "man" ( %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man if errorlevel 1 exit /b 1 echo. echo.Build finished. The manual pages are in %BUILDDIR%/man. goto end ) if "%1" == "texinfo" ( %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo if errorlevel 1 exit /b 1 echo. echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. goto end ) if "%1" == "gettext" ( %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale if errorlevel 1 exit /b 1 echo. echo.Build finished. The message catalogs are in %BUILDDIR%/locale. goto end ) if "%1" == "changes" ( %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes if errorlevel 1 exit /b 1 echo. echo.The overview file is in %BUILDDIR%/changes. goto end ) if "%1" == "linkcheck" ( %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck if errorlevel 1 exit /b 1 echo. echo.Link check complete; look for any errors in the above output ^ or in %BUILDDIR%/linkcheck/output.txt. 
goto end ) if "%1" == "doctest" ( %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest if errorlevel 1 exit /b 1 echo. echo.Testing of doctests in the sources finished, look at the ^ results in %BUILDDIR%/doctest/output.txt. goto end ) if "%1" == "xml" ( %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml if errorlevel 1 exit /b 1 echo. echo.Build finished. The XML files are in %BUILDDIR%/xml. goto end ) if "%1" == "pseudoxml" ( %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml if errorlevel 1 exit /b 1 echo. echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. goto end ) :end morfessor-2.0.6/docs/source/000077500000000000000000000000001352030276700157735ustar00rootroot00000000000000morfessor-2.0.6/docs/source/cmdtools.rst000066400000000000000000000317271352030276700203630ustar00rootroot00000000000000Command line tools ================== The installation process installs 4 scripts in the appropriate PATH. morfessor --------- The morfessor command is a full-featured script for training, updating models and segmenting test data. Loading existing model ~~~~~~~~~~~~~~~~~~~~~~ ``-l `` load :ref:`binary-model-def` ``-L `` load :ref:`morfessor1-model-def` Loading data ~~~~~~~~~~~~ ``-t , --traindata `` Input corpus file(s) for training (text or bz2/gzipped text; use '-' for standard input; add several times in order to append multiple files). Standard, all sentences are split on whitespace and the tokens are used as compounds. The ``--traindata-list`` option can be used to read all input files as a list of compounds, one compound per line optionally prefixed by a count. See :ref:`data-format-options` for changing the delimiters used for separating compounds and atoms. ``--traindata-list`` Interpret all training files as list files instead of corpus files. A list file contains one compound per line with optionally a count as prefix. 
``-T , --testdata `` Input corpus file(s) to analyze (text or bz2/gzipped text; use '-' for standard input; add several times in order to append multiple files). The file is read in the same manner as an input corpus file. See :ref:`data-format-options` for changing the delimiters used for separating compounds and atoms. Training model options ~~~~~~~~~~~~~~~~~~~~~~ ``-m , --mode `` Morfessor can run in different modes, each doing different actions on the model. The modes are: none Do not initialize or train a model. Can be used when just loading a model for segmenting new data init Create a new model and load input data. Does not train the model batch Load an existing model (which is already initialized with training data) and run :ref:`batch-training` init+batch Create a new model, load input data and run :ref:`batch-training`. **Default** online Create a new model, read and train the model concurrently as described in :ref:`online-training` online+batch First read and train the model concurrently as described in :ref:`online-training` and after that retrain the model using :ref:`batch-training` ``-a , --algorithm `` Algorithm to use for training: recursive Recursive as described in :ref:`recursive-training` **Default** viterbi Viterbi as described in :ref:`viterbi-training` ``-d , --dampening `` Method for changing the compound counts in the input data. Options: none Do not alter the counts of compounds (token based training) log Change the count :math:`x` of a compound to :math:`\log(x)` (log-token based training) ones Treat all compounds as if they only occurred once (type based training) ``-f , --forcesplit `` A list of atoms that always cause the compound to be split. By default only hyphens (``-``) force a split. Note the notation of the argument list. To have no force split characters, use an empty string as the argument (``-f ""``). 
To split, for example, on both hyphen (``-``) and apostrophe (``'``) use ``-f "-'"`` ``-F , --finish-threshold `` Stopping threshold. Training stops when the decrease in model cost of the last iteration is smaller than finish_threshold * #boundaries (default '0.005') ``-r , --randseed `` Seed for random number generator ``-R , --randsplit `` Initialize new words by random splitting using the given split probability (default no splitting). See :ref:`rand-init` ``--skips`` Use random skips for frequently seen compounds to speed up training. See :ref:`rand-init` ``--batch-minfreq `` Compound frequency threshold for batch training (default 1) ``--max-epochs `` Hard maximum number of epochs in training ``--nosplit-re `` If the expression matches the two surrounding characters, do not allow splitting (default None) ``--online-epochint `` Epoch interval for online training (default 10000) ``--viterbi-smoothing `` Additive smoothing parameter for Viterbi training and segmentation (default 0). ``--viterbi-maxlen `` Maximum construction length in Viterbi training and segmentation (default 30) Saving model ~~~~~~~~~~~~ ``-s `` save :ref:`binary-model-def` ``-S `` save :ref:`morfessor1-model-def` ``--save-reduced`` save :ref:`binary-reduced-model-def` Examples ~~~~~~~~ Training a model from inputdata.txt, saving a :ref:`morfessor1-model-def` and segmenting the test.txt set: :: morfessor -t inputdata.txt -S model.segm -T test.txt morfessor-train --------------- The morfessor-train command is a convenience command that enables easier training for morfessor models. The basic command structure is: :: morfessor-train [arguments] traindata-file [traindata-file ...] The arguments are identical to the ones for the `morfessor`_ command. 
The most relevant are: ``-s `` save binary model ``-S `` save Morfessor 1.0 style model ``--save-reduced`` save reduced binary model Examples ~~~~~~~~ Train a morfessor model from a wordcount list in ISO_8859-15, doing type based training, writing the log to log.log and saving the model as model.bin: :: morfessor-train --encoding=ISO_8859-15 --traindata-list --logfile=log.log -s model.bin -d ones traindata.txt morfessor-segment ----------------- The morfessor-segment command is a convenience command that enables easier segmentation of test data with a morfessor model. The basic command structure is: :: morfessor-segment [arguments] testcorpus-file [testcorpus-file ...] The arguments are identical to the ones for the `morfessor`_ command. The most relevant are: ``-l `` load binary model (normal or reduced) ``-L `` load Morfessor 1.0 style model Examples ~~~~~~~~ Loading a binary model and segmenting the words in testdata.txt: :: morfessor-segment -l model.bin testdata.txt morfessor-evaluate ------------------ The morfessor-evaluate command is used for evaluating a morfessor model against a gold-standard. If multiple models are evaluated, it reports statistically significant differences between them. The basic command structure is: :: morfessor-evaluate [arguments] [ ...] Positional arguments ~~~~~~~~~~~~~~~~~~~~ ```` gold standard file in standard annotation format ```` model files to segment (either binary or Morfessor 1.0 style segmentation models). Optional arguments ~~~~~~~~~~~~~~~~~~ ``-t TEST_SEGMENTATIONS, --testsegmentation TEST_SEGMENTATIONS`` Segmentation of the test set. Note that all words in the gold-standard must be segmented ``--num-samples `` number of samples to take for testing ``--sample-size `` size of each testing sample ``--format-string `` Python new style format string used to report evaluation results. The variables are a value and an action separated by an underscore. E.g. fscore_avg for the average f-score. 
The available values are "precision", "recall", "fscore", "samplesize" and the available actions: "avg", "max", "min", "values", "count". A last meta-data variable (without action) is "name", the filename of the model. See also the format-template option for predefined strings. ``--format-template