debian/0000775000000000000000000000000011770066203007172 5ustar debian/compat0000664000000000000000000000000211316732633010373 0ustar 7 debian/ocr4gamera.rst0000664000000000000000000000273511670011125011747 0ustar ========== ocr4gamera ========== ------------------------------------- OCR system using the Gamera framework ------------------------------------- :Manual section: 1 Usage ----- **ocr4gamera** -x `` [`options`] `` Options ------- -v , --verbosity= Set verbosity level to ``. Possible values are 0 (default): silent operation; 1: information on progress; >2: segmentation info is written to PNG files with prefix ``debug_``. -h, --help Display help and exit. -d, --deskew Do a skew correction (recommended). -f, --filter Filter out very large (images) and very small components (noise). -a, --automatic-group Autogroup glyphs with classifier. -x , --xmlfile= Read training data from ``. -o , --output= Write recognized text to file `` (otherwise it is written to stdout). -c , --extra_chars_csvfile= Read additional class name conversions from file ``. `` must contain one conversion per line. -R , --heuristic_rules= Apply heuristic rules `` for disambiguation of some chars. `` can be ``roman`` (default) or ``none`` (for no rules). -D, --dictionary-correction Correct words using a dictionary (requires aspell or ispell). -L , --dictionary-language= Use `` as language for aspell (when option ``-D`` is set). -e , --edit-distance= Correct words only when edit distance not more than ``. 
debian/control0000664000000000000000000000275311721422202010573 0ustar Source: ocr4gamera Section: python Priority: optional Maintainer: Jakub Wilk Uploaders: Debian Python Modules Team Vcs-Svn: svn://svn.debian.org/python-modules/packages/ocr4gamera/trunk/ Vcs-Browser: http://svn.debian.org/viewsvn/python-modules/packages/ocr4gamera/trunk/ Build-Depends: debhelper (>= 7), python-support (>= 0.90), python-all-dev, python-gamera (>= 3.2.6), python-gamera-dev, python-docutils (>= 0.6), python-pygments (>= 0.6) Standards-Version: 3.9.3 Homepage: http://gamera.informatik.hsnr.de/addons/ocr4gamera/ XS-Python-Version: >= 2.4 Package: python-gamera.toolkits.ocr Architecture: all Depends: ${misc:Depends}, ${python:Depends}, python-gamera (>= 3.2.6) Suggests: aspell | ispell Enhances: python-gamera Description: toolkit for building OCR systems The Gamera OCR Toolkit is meant to help build optical character recognition (OCR) systems for standard text documents. Even though it can be used as is, it is specifically designed to make individual steps of the recognition system customizable and replaceable. It provides: * a flexible mechanism for plugging in custom page segmentation algorithms * heuristic rules for dealing with diacritics, and for disambiguation of commonly confused roman characters (like comma and apostrophe, or lower and upper case ‘W’) * a ready-to-run script ocr4gamera which acts as a basic OCR-system. . Note that the toolkit does not include any training data. debian/patches/0000775000000000000000000000000011770066203010621 5ustar debian/patches/no-wx-import.diff0000664000000000000000000000066211716302454014040 0ustar Description: Don't import the wx module, it's not used by anything. 
Author: Jakub Wilk Bug: http://tech.groups.yahoo.com/group/gamera-devel/message/2209 Forwarded: yes Last-Update: 2012-02-13 --- a/gamera/toolkits/ocr/__init__.py +++ b/gamera/toolkits/ocr/__init__.py @@ -9,7 +9,6 @@ from gamera import toolkit import plugins -import wx # You can inherit from toolkit.CustomMenu to create a menu debian/patches/series0000664000000000000000000000013411716300561012032 0ustar no-wx-import.diff doc-plugin-import.diff doc-fix-hyperlink-target.diff doc-build-local.diff debian/patches/doc-build-local.diff0000664000000000000000000000212311716302454014404 0ustar Description: Allow building documentation from local source. Normally it's only possible to build documentation if the toolkit is installed system-wide. This patch allows the documentation to be built from local source. . Also, abort if importing the toolkit didn't succeed. Author: Jakub Wilk Forwarded: not-needed Last-Update: 2012-02-13 --- a/doc/gendoc.py +++ b/doc/gendoc.py @@ -1,8 +1,20 @@ #!/usr/bin/env python +import os +import sys + from gamera import gendoc if __name__ == '__main__': + + import gamera.toolkits + gamera.toolkits.__path__[:0] = [os.path.join( + sys.path[0], + os.pardir, + 'gamera', + 'toolkits' + )] + # Step 1: # Import all of the plugins to document. # Be careful not to load the core plugins, or they @@ -12,6 +24,7 @@ try: from gamera.toolkits.ocr.plugins import bbox_merging_mcmillan except ImportError: + raise print "WARNING:" print "This `ocr` toolkit must be installed before generating" print "the documentation. For now, the system will skip generating" debian/patches/doc-plugin-import.diff0000664000000000000000000000117211716302454015026 0ustar Description: Import correct plugin when building documentation. 
Author: Jakub Wilk Bug: http://tech.groups.yahoo.com/group/gamera-devel/message/2211 Forwarded: yes Last-Update: 2012-02-13 --- a/doc/gendoc.py +++ b/doc/gendoc.py @@ -10,7 +10,7 @@ # If the plugins are not already installed, we'll just ignore # them and generate the narrative documentation. try: - from gamera.toolkits.ocr.plugins import clear + from gamera.toolkits.ocr.plugins import bbox_merging_mcmillan except ImportError: print "WARNING:" print "This `ocr` toolkit must be installed before generating" debian/patches/doc-fix-hyperlink-target.diff0000664000000000000000000000114011716302454016270 0ustar Description: Fix hyperlink target in the documentation. Author: Jakub Wilk Bug: http://tech.groups.yahoo.com/group/gamera-devel/message/2210 Forwarded: yes Last-Update: 2012-02-13 --- a/gamera/toolkits/ocr/ocr_toolkit.py +++ b/gamera/toolkits/ocr/ocr_toolkit.py @@ -55,7 +55,7 @@ to a `standard unicode character name`_, as in the examples of the following table: -.. _`standard unicode character names`: http://www.unicode.org/charts/ +.. _`standard unicode character name`: http://www.unicode.org/charts/ +-----------+----------------------------+----------------------------+ debian/copyright0000664000000000000000000000300211721422202011107 0ustar Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ Upstream-Name: ocr4gamera Upstream-Contact: Rene Baston, Christoph Dalitz Source: http://gamera.informatik.hsnr.de/addons/ocr4gamera/ Files: * Copyright: 2009-2010, Rene Baston 2009-2012, Christoph Dalitz 2010, Robert Butz License: GPL-2+ This toolkit is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, either version 2 of the license, or (at your option) any later version. . This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the file LICENSE for more details. . 
On Debian systems, the complete text of the GNU General Public License version 2 can be found in the /usr/share/common-licenses/GPL-2 file. Files: debian/* Copyright: 2009-2012, Jakub Wilk License: GPL-2 This package is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 dated June, 1991. . This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. . On Debian systems, the complete text of the GNU General Public License version 2 can be found in the /usr/share/common-licenses/GPL-2 file. debian/watch0000664000000000000000000000012611316732633010225 0ustar version=3 http://gamera.informatik.hsnr.de/addons/ocr4gamera/ocr-([0-9.]+)[.]tar[.]gz debian/links0000664000000000000000000000013211316732633010234 0ustar usr/share/doc/python-gamera.toolkits.ocr/src usr/share/doc/python-gamera.toolkits.ocr/rst debian/docs0000664000000000000000000000003011316732633010041 0ustar README doc/html doc/src debian/changelog0000664000000000000000000000422211770066175011054 0ustar ocr4gamera (1.0.6-3) unstable; urgency=low * Don't include *.egg-info in the binary package, as distribution name is too generic. * Do not pass explicit debian/control path to pyversions. -- Jakub Wilk Tue, 19 Jun 2012 14:11:39 +0200 ocr4gamera (1.0.6-2) unstable; urgency=low * Upload to unstable. * Add lintian override for build-depends-on-python-dev-with-no-arch-any. * Use build-stamp in build* targets. * Reduce build-dependency on debhelper to >= 7. * Run dh_testdir in the clean target. * Bump standards version to 3.9.3. + Update debian/copyright URI. -- Jakub Wilk Sun, 26 Feb 2012 17:09:24 +0100 ocr4gamera (1.0.6-1) experimental; urgency=low * New upstream release. * Update copyright file. 
* Build-depend on python-all-dev to test-build for all supported Python versions. * Rewrite debian/rules from scratch, without using dh. * Fix hyperlink target in the documentation. (doc-fix-hyperlink-target.diff) * Build documentation from source. + Update debian/rules. + Add doc/html/ directory to dpkg-source's extend-diff-ignore. + Patch gendoc script to import correct plugin. (doc-plugin-import.diff) + Patch gendoc script to use local copy of sources. (doc-build-local.diff) + Add python-pygments to Build-Depends. * Update patch headers. -- Jakub Wilk Mon, 13 Feb 2012 22:48:50 +0100 ocr4gamera (1.0.5-1) experimental; urgency=low * New upstream release. * Update debian/copyright to the latest DEP-5 version. * Bump standards version to 3.9.2 (no changes needed). * Don't import wx, it's not used by anything. (no-wx-import.diff) * Use versioned format URI in debian/copyright. * Provide a simple manual page. + Build-depend on python-docutils, to build it from reStructuredText format. * Eliminate files with duplicate contents from the documentation using a dedicated script. -- Jakub Wilk Sat, 10 Dec 2011 19:45:06 +0100 ocr4gamera (1.0.4-1) experimental; urgency=low * Initial release (closes: #553044). -- Jakub Wilk Thu, 26 Aug 2010 19:33:48 +0200 debian/clean0000664000000000000000000000004211316732633010176 0ustar gamera/toolkits/ocr/plugins/*.cpp debian/source/0000775000000000000000000000000011770066203010472 5ustar debian/source/format0000664000000000000000000000001411353227514011701 0ustar 3.0 (quilt) debian/source/lintian-overrides0000664000000000000000000000025011716315202014044 0ustar # We really need python(-all)-dev for compilation, even though binary packages # are architecture-independent. 
ocr4gamera: build-depends-on-python-dev-with-no-arch-any debian/source/options0000664000000000000000000000004011716267322012107 0ustar extend-diff-ignore = ^doc/html/ debian/symlink-helper0000775000000000000000000000310011670011570012051 0ustar #!/usr/bin/python import hashlib import os import sys def file_hash(path): hashsum = hashlib.sha256() file = open(path, 'rb') try: hashsum.update(file.read()) finally: file.close() return hashsum.digest() def main(): data = {} if len(sys.argv) != 3: print >>sys.stderr, 'Usage: %s ' sys.exit(1) _, src_dir, dst_dir = sys.argv for root, dirs, files in os.walk(src_dir): for filename in files: path = os.path.join(root, filename) path = os.path.normpath(path) data[file_hash(path)] = path for root, dirs, files in os.walk(dst_dir): for filename in files: if not '_generic' in filename: continue path = os.path.join(root, filename) path = os.path.normpath(path) data[file_hash(path)] = path for root, dirs, files in os.walk(dst_dir): for filename in files: if '_generic' in filename: continue path = os.path.join(root, filename) path = os.path.normpath(path) try: sympath = data[file_hash(path)] except LookupError: pass else: sympath = os.path.join('../' * root.count('/'), sympath) sympath = os.path.normpath(sympath) print >>sys.stderr, 'symlinking %s -> %s' % (path, sympath) os.unlink(path) os.symlink(sympath, path) if __name__ == '__main__': main() # vim:ts=4 sw=4 et debian/rules0000775000000000000000000000212111770057334010253 0ustar #!/usr/bin/make -f python_all = pyversions -r | tr ' ' '\n' | xargs -t -I {} env {} .PHONY: clean clean: dh_testdir dh_clean rm -rf build debian/*.[1-9] find -name '*.py[co]' -delete .PHONY: build build-arch build-indep build build-indep: build-stamp build-stamp: dh_testdir $(python_all) setup.py build rst2man debian/ocr4gamera.rst > debian/ocr4gamera.1 rm -rf doc/html/ cd doc && python gendoc.py touch $(@) .PHONY: binary binary-arch binary-indep binary binary-indep: build-stamp dh_testdir dh_testroot 
dh_prep $(python_all) setup.py install --prefix=/usr --root=debian/python-gamera.toolkits.ocr/ find debian/*/ -name '_bbox_*.so' -delete find debian/*/ -name '*.egg-info' -delete cd debian/*/usr/bin/ && \ rename.ul '.py' '' ocr4gamera.py && \ sed -i -e '1s,^#!.*,#!/usr/bin/python,' ocr4gamera dh_installdocs cd debian/*/usr/share/doc/*/html/ && \ $(CURDIR)/debian/symlink-helper ../src/ . dh_installchangelogs dh_installman debian/*.[1-9] dh_pysupport dh_link dh_compress dh_fixperms dh_installdeb dh_gencontrol dh_md5sums dh_builddeb # vim:ts=4 sw=4 noet debian/doc-base0000664000000000000000000000071311316732633010576 0ustar Document: ocr4gamera-documentation Title: OCR toolkit for Gamera Author: Rene Baston, Christoph Dalitz Section: Programming Format: HTML Index: /usr/share/doc/python-gamera.toolkits.ocr/html/index.html Files: /usr/share/doc/python-gamera.toolkits.ocr/html/*.html Format: Text Index: /usr/share/doc/python-gamera.toolkits.ocr/rst/index.txt.gz Files: /usr/share/doc/python-gamera.toolkits.ocr/rst/*.txt /usr/share/doc/python-gamera.toolkits.ocr/rst/*.txt.gz
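The duplicate-elimination idea behind debian/symlink-helper (hash every file under a source directory, then find byte-identical files in a destination directory) can be sketched in Python 3 as follows. This is a simplified illustration under stated assumptions, not the packaged script: the name `find_duplicates` is hypothetical, the sketch only reports duplicates instead of unlinking and symlinking them, and it leaves out the `_generic` filename handling of the original.

```python
import hashlib
import os


def file_hash(path):
    """Return the SHA-256 digest of the file's contents."""
    with open(path, 'rb') as fh:
        return hashlib.sha256(fh.read()).digest()


def find_duplicates(src_dir, dst_dir):
    """Map each file in dst_dir to a byte-identical file in src_dir, if any.

    Hypothetical helper for illustration; it does not modify the filesystem.
    """
    # Index every file under src_dir by the hash of its contents.
    by_hash = {}
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.normpath(os.path.join(root, name))
            by_hash[file_hash(path)] = path

    # Look up every file under dst_dir in that index.
    duplicates = {}
    for root, _dirs, files in os.walk(dst_dir):
        for name in files:
            path = os.path.normpath(os.path.join(root, name))
            original = by_hash.get(file_hash(path))
            if original is not None:
                duplicates[path] = original
    return duplicates
```

A caller could then replace each key of the returned mapping with a relative symlink to its value, which is roughly what the packaged helper does for the HTML documentation against `../src/`.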