greekocr-1.0.1/TODO

Tasks for future versions of the GreekOCR toolkit
-------------------------------------------------

- optionally enable grouping for the separatistic approach
- add new wholistic recognition which attaches accents to characters
  with Gamera's grouping algorithm
- do thorough tests for the different approaches and measure the
  performance on various documents
- Why does the recognition result contain spurious additional lines?
- integrate the improved attachment of diacritical signs to characters
  into the basic OCR toolkit
- disambiguate between quotes and accents

greekocr-1.0.1/MANIFEST.in

recursive-include src *.cpp *.c *.h makefile.* *.hpp *.hxx *.cxx *.txt ANNOUNCE CHANGES INSTALL KNOWNBUG LICENSE README TODO
recursive-include include *.cpp *.c *.h makefile.* *.hpp *.hxx *.cxx *.txt ANNOUNCE CHANGES INSTALL KNOWNBUG LICENSE README TODO
recursive-include scripts greekocr
include ACKNOWLEDGEMENTS CHANGES TODO INSTALL LICENSE README KNOWN_BUGS MANIFEST.in version
recursive-include doc *.txt *.html *.css *.py *.jpg *.jpeg *.png *.gif *.fig

greekocr-1.0.1/LICENSE

GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.
This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. 
We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. 
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. 
But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. 
For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. 
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. 
Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS

greekocr-1.0.1/setup.py

#!/usr/bin/env python

from distutils.core import setup, Extension
from gamera import gamera_setup

# This constant should be the name of the toolkit
TOOLKIT_NAME = "greekocr"
VERSION = open("version", 'r').readlines()[0].strip()
AUTHOR = "Christian Brandt and Christoph Dalitz"
HOMEPAGE = "http://gamera.sourceforge.net/"
DESCRIPTION = "An addon Greek OCR toolkit for the Gamera framework for document analysis and recognition."
LICENSE = "GNU GPL version 2"

# ----------------------------------------------------------------------------
# You should not usually have to edit anything below, but it is
# implemented here and not in the Gamera core so that you can edit it
# if you need to do something more complicated (for example, building
# and linking to a third-party library).
# ----------------------------------------------------------------------------

PLUGIN_PATH = 'gamera/toolkits/%s/plugins/' % TOOLKIT_NAME
PACKAGE = 'gamera.toolkits.%s' % TOOLKIT_NAME
PLUGIN_PACKAGE = PACKAGE + ".plugins"
plugins = gamera_setup.get_plugin_filenames(PLUGIN_PATH)
plugin_extensions = gamera_setup.generate_plugins(plugins, PLUGIN_PACKAGE)

# This is a standard distutils setup initializer. If you need to do
# anything more complex here, refer to the Python distutils documentation.
setup(name=TOOLKIT_NAME,
      version=VERSION,
      license=LICENSE,
      url=HOMEPAGE,
      author=AUTHOR,
      description=DESCRIPTION,
      ext_modules=plugin_extensions,
      packages=[PACKAGE, PLUGIN_PACKAGE],
      scripts=['scripts/greekocr4gamera.py'])

greekocr-1.0.1/gamera/toolkits/greekocr/singlediacritics.py

# -*- mode: python; indent-tabs-mode: nil; tab-width: 3 -*-
# vim: set tabstop=3 shiftwidth=3 expandtab:

# Copyright (C) 2010-2011 Christian Brandt
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

from gamera.core import *
from gamera.plugins.pagesegmentation import textline_reading_order
from gamera.toolkits.ocr.ocr_toolkit import *
from gamera.toolkits.ocr.classes import Textline, Page, ClassifyCCs
import gamera.kdtree as kdtree
import unicodedata
import sys

class SinglePage(Page):
   def lines_to_chars(self):
      subccs = self.img.sub_cc_analysis(self.ccs_lines)
      for i, segment in enumerate(self.ccs_lines):
         self.textlines.append(SingleTextline(segment, subccs[1][i]))


class Character(object):
   """A base character glyph together with the combining diacritics
   (accents, breathings) attached to it."""
   def __init__(self, glyph):
      self.maincharacter = glyph
      self.unicodename = glyph.get_main_id()
      self.unicodename = self.unicodename.replace(".", " ").upper()
      self.combinedwith = []

   def addCombiningDiacritics(self, diacrit):
      self.combinedwith.append(diacrit)

   def toUnicodeString(self):
      try:
         text = u""
         # a class name may denote several characters joined by ".and."
         mainids = self.maincharacter.get_main_id().split(".and.")
         for char in mainids:
            if char == "skip" or char == "unclassified":
               continue
            text = text + u"%c" % return_char(char)
         # append the attached diacritics as combining characters
         for diacrit in self.combinedwith:
            for char in diacrit.get_main_id().split(".and."):
               if char == "skip":
                  continue
               text = text + u"%c" % return_char(char)
         return unicodedata.normalize('NFD', text)
      except Exception:
         # class name without a Unicode equivalent: output an error marker
         return u"E"


class SingleTextline(Textline):
   def sort_glyphs(self):
      self.glyphs.sort(key=lambda g: g.ul_x)

      # begin calculating threshold for word-spacing:
      # twice the median gap between neighboring non-combining glyphs
      glyphs = []
      for g in self.glyphs:
         if self.is_combining_glyph(g):
            continue
         glyphs.append(g)
      spacelist = []
      for i in range(len(glyphs) - 1):
         spacelist.append(glyphs[i + 1].ul_x - glyphs[i].lr_x)
      if len(spacelist) > 0:
         threshold = median(spacelist) * 2.0
      else:
         threshold = 0
      # end calculating threshold for word-spacing

      self.words = chars_make_words(self.glyphs, threshold)

   def is_combining_glyph(self, glyph):
      return glyph.get_main_id().find("combining") != -1

   def to_string(self):
      k = 3        # number of nearest neighbors searched per diacritic
      max_k = 10   # upper limit for k in the exhaustive search
      fast = True  # attach each diacritic to its nearest character
      output = ""

      for word in self.words:
         glyphs_combining = []
         characters = []
         nodes_normal = []
         skipids = ["manual.xi.upper", "manual.xi.lower", "manual.theta.outer", "_split.splitx", "skip"]
         for glyph in word:
            mainid = glyph.get_main_id()
            if mainid in skipids:
               continue
            elif mainid == "manual.xi.middle":
               glyph.classify_automatic("greek.capital.letter.xi")
            elif mainid == "manual.theta.inner":
               glyph.classify_automatic("greek.capital.letter.theta")
            elif mainid == "comma" or mainid == "combining.comma.above":
               # a comma below the vertical center of the text line is a
               # punctuation comma, otherwise a smooth breathing
               if glyph.center.y > self.bbox.center.y:
                  glyph.classify_automatic("comma")
               else:
                  glyph.classify_automatic("combining.comma.above")
            elif mainid.find("manual") != -1 or mainid.find("split") != -1:
               continue

            if self.is_combining_glyph(glyph):
               glyphs_combining.append(glyph)
            else:
               c = Character(glyph)
               characters.append(c)
               nodes_normal.append(kdtree.KdNode((glyph.center.x, glyph.center.y), c))

         if not nodes_normal:
            continue
         tree = kdtree.KdTree(nodes_normal)

         for g in glyphs_combining:
            if fast:
               knn = tree.k_nearest_neighbors((g.center.x, g.center.y), k)
               knn[0].data.addCombiningDiacritics(g)
            else:
               # attach to the nearest Greek letter, widening the search
               # (a local copy keeps k unchanged for the next diacritic)
               found = False
               kk = k
               while (not found) and kk < max_k:
                  knn = tree.k_nearest_neighbors((g.center.x, g.center.y), kk)
                  for nn in knn:
                     if nn.data.maincharacter.get_main_id().split(".").count("greek") > 0:
                        nn.data.addCombiningDiacritics(g)
                        found = True
                        break
                  kk = kk + 2

         for c in characters:
            output = output + c.toUnicodeString()
         output = output + " "
      return output

greekocr-1.0.1/gamera/toolkits/greekocr/compare.py

# -*- mode: python; indent-tabs-mode: nil; tab-width: 3 -*-
# vim: set tabstop=3 shiftwidth=3 expandtab:

# Copyright (C) 2010-2011 Christian Brandt
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

import unicodedata
import codecs

def levenshtein(s1, s2):
   """Computes the Levenshtein distance (aka edit distance) between the
two Unicode strings *s1* and *s2*.

Signature:

  ``levenshtein(s1, s2)``

This implementation differs from the plugin *edit_distance* in the
Gamera core in two points:

  - the Gamera core function is implemented in C++ and currently only
    works with ASCII strings

  - this implementation is written in pure Python and therefore somewhat
    slower, but it works with Unicode strings.

For details about the algorithm see
http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance
"""
   if len(s1) < len(s2):
      return levenshtein(s2, s1)
   if not s1:
      return len(s2)

   previous_row = xrange(len(s2) + 1)
   for i, c1 in enumerate(s1):
      current_row = [i + 1]
      for j, c2 in enumerate(s2):
         # j+1 instead of j since previous_row and current_row are
         # one character longer than s2
         insertions = previous_row[j + 1] + 1
         deletions = current_row[j] + 1
         substitutions = previous_row[j] + (c1 != c2)
         current_row.append(min(insertions, deletions, substitutions))
      previous_row = current_row

   return previous_row[-1]

def levenshtein_multi_unicode(str1, str2):
   # remove linebreaks
   str1 = str1.replace("\n", "")
   str2 = str2.replace("\n", "")
   # remove spaces
   str1 = str1.replace(" ", "")
   str2 = str2.replace(" ", "")

   # decompose so that each accent is a separate character
   str1_n = unicodedata.normalize("NFD", str1)
   str2_n = unicodedata.normalize("NFD", str2)

   return levenshtein(str1_n, str2_n), len(str1_n), len(str2_n)

def errorrate(groundtruth, ocr):
   """Returns the error rate of *ocr* with respect to *groundtruth*: the
edit distance between the two Unicode strings divided by the length of
the ground truth. Before the comparison, spaces and line breaks are
removed from both strings and both are converted to NFD normal form, so
that each accent counts as a separate character. The counts are also
printed.

Signature:

  ``errorrate(groundtruth, ocr)``
"""
   errorcount, gtlength, ocrlength = levenshtein_multi_unicode(groundtruth, ocr)
   rate = float(errorcount) / gtlength
   print "Errorcount: %d" % errorcount
   print "Characters in GT: %d" % gtlength
   print "Characters in OCR: %d" % ocrlength
   print "Error Rate: %.2f %%" % (rate * 100)
   return rate

greekocr-1.0.1/gamera/toolkits/greekocr/unicode_teubner.py

#encoding: utf-8
# -*- mode: python; indent-tabs-mode: nil; tab-width: 3 -*-
# vim: set tabstop=3 shiftwidth=3 expandtab:

# Copyright (C) 2010-2011 Christian Brandt
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
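Returning to `compare.py` above: the following standalone sketch (no Gamera required, runs under Python 2 and 3) restates `levenshtein` and the preprocessing of `levenshtein_multi_unicode`/`errorrate` to show why the NFD step matters for polytonic Greek. The helper names `normalized` and `error_rate` are hypothetical illustrations, not part of the toolkit. A precomposed ἔ (U+1F14) and its decomposed spelling ε + U+0313 + U+0301 are the same text, yet differ by three edits until both are normalized; after NFD a single missing accent costs exactly one edit.

```python
import unicodedata


def levenshtein(s1, s2):
    """Two-row dynamic-programming edit distance (same idea as compare.py)."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


def normalized(text):
    """Hypothetical helper: drop line breaks and spaces, then decompose (NFD)."""
    text = text.replace("\n", "").replace(" ", "")
    return unicodedata.normalize("NFD", text)


def error_rate(groundtruth, ocr):
    """Hypothetical helper: edit distance divided by the normalized GT length."""
    gt, hyp = normalized(groundtruth), normalized(ocr)
    return float(levenshtein(gt, hyp)) / len(gt)


precomposed = u"\u1f14"             # GREEK SMALL LETTER EPSILON WITH PSILI AND OXIA
decomposed = u"\u03b5\u0313\u0301"  # epsilon + comma above + acute accent
```

With this, `levenshtein(precomposed, decomposed)` is 3 while the distance of the normalized strings is 0, and recognizing ἔθαψε as ἐθαψε (acute accent lost) gives an error rate of 1/7, because the normalized ground truth has seven code points.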
import sys

# maps Unicode character names to Teubner/LGR input letters
charactermap = {
   "GREEK CAPITAL LETTER ALPHA": "A", "GREEK CAPITAL LETTER BETA": "B",
   "GREEK CAPITAL LETTER GAMMA": "G", "GREEK CAPITAL LETTER DELTA": "D",
   "GREEK CAPITAL LETTER EPSILON": "E", "GREEK CAPITAL LETTER ZETA": "Z",
   "GREEK CAPITAL LETTER ETA": "H", "GREEK CAPITAL LETTER THETA": "J",
   "GREEK CAPITAL LETTER IOTA": "I", "GREEK CAPITAL LETTER KAPPA": "K",
   "GREEK CAPITAL LETTER LAMDA": "L", "GREEK CAPITAL LETTER MU": "M",
   "GREEK CAPITAL LETTER NU": "N", "GREEK CAPITAL LETTER XI": "X",
   "GREEK CAPITAL LETTER OMICRON": "O", "GREEK CAPITAL LETTER PI": "P",
   "GREEK CAPITAL LETTER RHO": "R", "GREEK CAPITAL LETTER SIGMA": "C",
   "GREEK CAPITAL LETTER TAU": "T", "GREEK CAPITAL LETTER UPSILON": "U",
   "GREEK CAPITAL LETTER PHI": "F", "GREEK CAPITAL LETTER CHI": "Q",
   "GREEK CAPITAL LETTER PSI": "Y", "GREEK CAPITAL LETTER OMEGA": "W",
   "GREEK SMALL LETTER ALPHA": "a", "GREEK SMALL LETTER BETA": "b",
   "GREEK SMALL LETTER GAMMA": "g", "GREEK SMALL LETTER DELTA": "d",
   "GREEK SMALL LETTER EPSILON": "e", "GREEK SMALL LETTER ZETA": "z",
   "GREEK SMALL LETTER ETA": "h", "GREEK SMALL LETTER THETA": "j", "GREEK THETA SYMBOL": "j",
   "GREEK SMALL LETTER IOTA": "i", "GREEK SMALL LETTER KAPPA": "k",
   "GREEK SMALL LETTER LAMDA": "l", "GREEK SMALL LETTER MU": "m",
   "GREEK SMALL LETTER NU": "n", "GREEK SMALL LETTER XI": "x",
   "GREEK SMALL LETTER OMICRON": "o", "GREEK SMALL LETTER PI": "p",
   "GREEK SMALL LETTER RHO": "r", "GREEK SMALL LETTER FINAL SIGMA": "c",
   "GREEK SMALL LETTER SIGMA": "s", "GREEK SMALL LETTER TAU": "t",
   "GREEK SMALL LETTER UPSILON": "u", "GREEK SMALL LETTER PHI": "f",
   "GREEK SMALL LETTER CHI": "q", "GREEK SMALL LETTER PSI": "y",
   "GREEK SMALL LETTER OMEGA": "w",
   "SPACE": " ", "FULL STOP": ".", "COMMA": ",", "HYPHEN-MINUS": "-"
}

# maps alphabetically sorted lists of combining characters (Unicode names
# in lowercase with dots instead of spaces) to Teubner format strings
accentmap = [
   ['\\`%c', ['combining.grave.accent']],
   ['\\\'%c', ['combining.acute.accent']],
   ['\\~%c', ['combining.greek.perispomeni']],
   ['\\"%c', ['combining.diaeresis']],
   ['\\u{%c}', ['combining.breve']],
   ['\\U{%c%c}', ['combining.double.breve']],
   ['\\=%c', ['combining.overline']],
   ['\\r{%c}', ['combining.comma.above']],
   ['\\s{%c}', ['combining.reversed.comma.above']],
   ['\\Ad{%c}', ['combining.acute.accent', 'combining.diaeresis']],
   ['\\Gd{%c}', ['combining.diaeresis', 'combining.grave.accent']],
   ['\\Cd{%c}', ['combining.diaeresis', 'combining.greek.perispomeni']],
   ['\\Ar{%c}', ['combining.acute.accent', 'combining.reversed.comma.above']],
   ['\\Gr{%c}', ['combining.grave.accent', 'combining.reversed.comma.above']],
   ['\\Cr{%c}', ['combining.greek.perispomeni', 'combining.reversed.comma.above']],
   ['\\As{%c}', ['combining.acute.accent', 'combining.comma.above']],
   ['\\Gs{%c}', ['combining.comma.above', 'combining.grave.accent']],
   ['\\Cs{%c}', ['combining.comma.above', 'combining.greek.perispomeni']],
   ['\\c{%c}', ['combining.inverted.breve.below']],
   ['\\ut{%cw}', ['combining.double.breve.below']],
   ['\\Ab{%c}', ['combining.acute.accent', 'combining.breve']],
   ['\\Gb{%c}', ['combining.breve', 'combining.grave.accent']],
   ['\\Arb{%c}', ['combining.acute.accent', 'combining.breve', 'combining.reversed.comma.above']],
   ['\\Grb{%c}', ['combining.breve', 'combining.grave.accent', 'combining.reversed.comma.above']],
   ['\\Asb{%c}', ['combining.acute.accent', 'combining.breve', 'combining.comma.above']],
   ['\\Gsb{%c}', ['combining.breve', 'combining.comma.above', 'combining.grave.accent']],
   ['\\Am{%c}', ['combining.acute.accent', 'combining.overline']],
   ['\\Gm{%c}', ['combining.grave.accent', 'combining.overline']],
   ['\\Cm{%c}', ['combining.greek.perispomeni', 'combining.overline']],
   ['\\Arm{%c}', ['combining.acute.accent', 'combining.overline', 'combining.reversed.comma.above']],
   ['\\Grm{%c}', ['combining.grave.accent', 'combining.overline', 'combining.reversed.comma.above']],
   ['\\Crm{%c}', ['combining.greek.perispomeni', 'combining.overline', 'combining.reversed.comma.above']],
   ['\\Asm{%c}', ['combining.acute.accent', 'combining.comma.above', 'combining.overline']],
   ['\\Gsm{%c}', ['combining.comma.above', 'combining.grave.accent', 'combining.overline']],
   ['\\Csm{%c}', ['combining.comma.above', 'combining.greek.perispomeni', 'combining.overline']],
   ['\\Sm{%c}', ['combining.comma.above', 'combining.overline']],
   ['\\Rm{%c}', ['combining.overline', 'combining.reversed.comma.above']],
   ['\\iS{%c}', ['combining.greek.ypogegrammeni']],
   ['\\d{%c}', ['combining.dot.below']],
   ['\\bd{%c}', ['combining.breve', 'combining.diaeresis']],
   ['\\ring{%c}', ['combining.ring.below']]
]
accentmap.sort(key=lambda s: s[1])

def _teubner_code(maincharacter, combinewith):
   """Returns the Teubner code for the character with the Unicode name
*maincharacter*, combined with the accents in *combinewith*.
"""
   try:
      letter = charactermap[maincharacter]
   except KeyError:
      sys.stderr.write("Teubner: Unknown character '%s'\n" % maincharacter)
      return u""
   if len(combinewith) == 0:
      return letter
   # the combinations in accentmap are alphabetically sorted lists
   combination_sorted = sorted(combinewith)
   for format, combination in accentmap:
      if combination == combination_sorted:
         return format % letter
   # unknown accent combination: the character is dropped
   return u""

def unicode_to_teubner(unicode_text):
   """Converts the given Unicode string to a LaTeX document body using the
Teubner style for representing Greek characters and accents. The input
should be in NFD (decomposed) form, so that accents occur as separate
combining characters.

Signature:

  ``unicode_to_teubner (unicode_text)``

The returned LaTeX code does not contain the LaTeX header. To create a
complete LaTeX document, you can use the following code:

.. code:: Python

  # LaTeX header
  print "\documentclass[10pt]{article}"
  print "\usepackage[polutonikogreek]{babel}"
  print "\usepackage[or]{teubner}"
  print "\\\\begin{document}"
  print "\selectlanguage{greek}"
  # document body
  print unicode_to_teubner(unicode_string)
  # LaTeX footer
  print "\end{document}"
"""
   import unicodedata
   output = u""
   combinewith = []
   maincharacter = None
   for character in unicode_text:
      try:
         name_unicode = unicodedata.name(character)
      except ValueError:
         # characters without a Unicode name (e.g. line breaks) end the
         # pending character; a line break is output as a space
         if maincharacter is not None:
            output += _teubner_code(maincharacter, combinewith)
         maincharacter = None
         combinewith = []
         if character == "\n":
            output += " "
         continue
      name = name_unicode.lower().replace(" ", ".")
      if name.find("combining") == -1:
         # non-combining character: output the pending character with
         # its accents, then remember the new one
         if maincharacter is not None:
            output += _teubner_code(maincharacter, combinewith)
         maincharacter = name_unicode
         combinewith = []
      else:
         # combining character: attach it to the pending character
         if maincharacter is not None:
            combinewith.append(name)
         else:
            # accent without a base character
            output += "e"
   # output the last pending character, too
   if maincharacter is not None:
      output += _teubner_code(maincharacter, combinewith)
   return output

if __name__ == "__main__":
   import unicodedata
   teststr = u"ἔθαψε, ὡς οἰκὸς ἦν"
   print unicode_to_teubner(unicodedata.normalize("NFD", teststr))

greekocr-1.0.1/gamera/toolkits/greekocr/plugins/__init__.py

# You need to have some sort of __init__.py file here
# in order to import modules in this directory.
# It is deliberately empty.

greekocr-1.0.1/gamera/toolkits/greekocr/greekocr.py

# -*- mode: python; indent-tabs-mode: nil; tab-width: 3 -*-
# vim: set tabstop=3 shiftwidth=3 expandtab:

# Copyright (C) 2010-2011 Christian Brandt
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

from gamera.core import *
init_gamera()

from gamera import knn
from gamera.plugins import pagesegmentation
from gamera.plugins.pagesegmentation import textline_reading_order
from gamera.classify import ShapedGroupingFunction
from gamera.plugins.image_utilities import union_images
from gamera.toolkits.ocr.ocr_toolkit import *
from gamera.toolkits.ocr.classes import Textline,Page,ClassifyCCs
from gamera.gamera_xml import glyphs_to_xml
from gamera.knn_editing import edit_cnn
from gamera.toolkits.greekocr.singlediacritics import *
from gamera.toolkits.greekocr.wholisticdiacritics import *

import unicodedata
import codecs
import time


def clean_classifier(cknn):
   """Removes duplicate training glyphs (identical run-length encoding)
   and then reduces the training set with condensed nearest neighbour
   (CNN) editing."""
   glyphs = cknn.get_glyphs()
   print "old %d" % len(glyphs)
   # sorting by RLE makes identical glyphs neighbours
   sorted_glyphs = sorted(glyphs, key=lambda g: g.to_rle())
   new_glyphs = []
   last_rle = None
   for glyph in sorted_glyphs:
      rle = glyph.to_rle()
      if rle != last_rle:
         new_glyphs.append(glyph)
         last_rle = rle
      else:
         # duplicate of the previously kept glyph
         print glyph.get_main_id()
   print "new after removing duplicates: %d" % len(new_glyphs)
   cknn.set_glyphs(new_glyphs)
   cknn = edit_cnn(cknn)
   print "new after cnn: %d" % len(cknn.get_glyphs())
   return cknn


class GreekOCR(object):
   """Provides the functionality for GreekOCR. The following parameters
   control the recognition process:

   **cknn**
      The kNNInteractive classifier.

   **mode**
      The mode for dealing with accents. Can be ``wholistic`` or
      ``separatistic``.
   """

   def __init__(self, mode="wholistic"):
      """Signature:

         ``init (mode="wholistic")``

      where *mode* can be "wholistic" or "separatistic".
      """
      self.optimizeknn = False
      self.debug = False
      self.cknn = knn.kNNInteractive([],
                                     ["aspect_ratio", "volume64regions",
                                      "moments", "nholes_extended"],
                                     0)
      self.autogroup = None
      self.output = ""
      self.mode = mode

   def load_trainingdata(self, trainfile):
      """Loads the training data. Signature:

         ``load_trainingdata (trainfile)``

      where *trainfile* is a Gamera XML file containing training data.
      Make sure that the training file matches the *mode* (wholistic or
      separatistic).
      """
      self.cknn.from_xml_filename(trainfile)
      if self.optimizeknn:
         self.cknn = clean_classifier(self.cknn)

   def segment_page(self):
      if self.mode == "separatistic":
         self.page = SinglePage(self.img)
      else:
         self.page = WholisticPage(self.img)
      if self.debug:
         print "start page segmentation..."
         t = time.time()
      self.page.segment()
      if self.debug:
         t = time.time() - t
         print "\t segmentation done [",t,"sec]"

   def get_page_glyphs(self, image):
      """Returns a list of segmented CCs using the selected segmentation
      approach on the given image. This list can be used for creating
      training data. Signature:

         ``get_page_glyphs (image)``

      where *image* is a Gamera image.
      """
      if image.data.pixel_type != ONEBIT:
         image = image.to_onebit()
      self.img = image
      self.segment_page()
      glyphs = []
      for line in self.page.textlines:
         for g in line.glyphs:
            glyphs.append(g)
      return glyphs

   def save_debug_images(self):
      """Saves the following images to the current working directory:

      **debug_lines.png**
         Has a frame drawn around each detected line.

      **debug_chars.png**
         Has a frame drawn around each detected character.

      **debug_words.png**
         Has a frame drawn around each detected word.
      """
      rgbfilename = "debug_lines.png"
      rgb = self.page.show_lines()
      rgb.save_PNG(rgbfilename)
      print "file '%s' written" % rgbfilename
      rgbfilename = "debug_chars.png"
      rgb = self.page.show_glyphs()
      rgb.save_PNG(rgbfilename)
      print "file '%s' written" % rgbfilename
      rgbfilename = "debug_words.png"
      rgb = self.page.show_words()
      rgb.save_PNG(rgbfilename)
      print "file '%s' written" % rgbfilename

   def classify_text(self):
      self.output = ""
      for line in self.page.textlines:
         line.glyphs = \
            self.cknn.classify_and_update_list_automatic(line.glyphs)
         line.sort_glyphs()
         self.output = self.output + line.to_string() + "\n"
      self.output = self._normalize(self.output)

   def get_text(self):
      return self.output

   def process_image(self, image):
      """Recognizes the given image and returns the recognized text as a
      Unicode string. Signature:

         ``process_image (image)``

      where *image* is a Gamera image.

      The recognized text is additionally stored in the ``GreekOCR``
      property *output*, which can subsequently be written to a file with
      save_text_unicode_ or save_text_teubner_.

      Make sure that you have called load_trainingdata_ before!
      """
      if image.data.pixel_type != ONEBIT:
         image = image.to_onebit()
      self.img = image
      if self.debug:
         print "Doing page Segmentation"
      self.segment_page()
      if self.debug:
         print "Classifying Text"
      self.classify_text()
      if self.debug:
         print "Returning Output"
      return self.get_text()

   def save_text_xetex(self, filename):
      """Stores the recognized text to the given *filename* as a XeLaTeX
      document using the font GFS Porson."""
      data = \
'''\documentclass[11pt]{article}
\usepackage{xltxtra}
\setmainfont[Mapping=tex-text]{GFS Porson}
\\begin{document}
%s
\end{document}''' % self.output.replace("\n", "\n\n")
      f = codecs.open(filename, "w", encoding='utf-8')
      f.write(data)
      f.close()

   def save_text_unicode(self, filename):
      """Stores the recognized text to the given *filename* as a UTF-8
      encoded Unicode file. Signature:

         ``save_text_unicode(filename)``

      Make sure that you have called process_image_ before!
      """
      f = codecs.open(filename, "w", encoding='utf-8')
      f.write(self.output)
      f.close()

   def save_text_teubner(self, filename):
      """Stores the recognized text to the given *filename* as a LaTeX
      document utilizing the Teubner style for representing Greek
      characters and accents. Signature:

         ``save_text_teubner(filename)``

      Make sure that you have called process_image_ before!
      """
      from gamera.toolkits.greekocr.unicode_teubner import unicode_to_teubner
      # unicode_to_teubner turns line breaks into spaces, so each line is
      # converted separately and the lines are joined as LaTeX paragraphs
      body = u"\n\n".join([unicode_to_teubner(line)
                           for line in self.output.split("\n")])
      data = '''
\documentclass[10pt]{article}
\usepackage[polutonikogreek]{babel}
\usepackage[or]{teubner}
\\begin{document}
\selectlanguage{greek}
%s
\end{document}
''' % body
      f = codecs.open(filename, "w", encoding='utf-8')
      f.write(data)
      f.close()

   def _normalize(self, text):
      # decompose the text and put the combining marks that follow each
      # base character into a sorted order
      text = unicodedata.normalize("NFD", text)
      output = u""
      combined = []
      for i in text:
         try:
            is_combining = unicodedata.combining(i) > 0 or \
                           unicodedata.name(i).find("ACCENT") >= 0
         except ValueError:
            # characters without a Unicode name, e.g. "\n"
            is_combining = False
         if not is_combining:
            for j in sorted(combined):
               output += j
            combined = []
            output += i
         else:
            combined.append(i)
      if len(combined) > 0:
         for j in sorted(combined):
            output += j
      return unicodedata.normalize("NFD", output)
greekocr-1.0.1/gamera/toolkits/greekocr/wholisticdiacritics.py
# -*- mode: python; indent-tabs-mode: nil; tab-width: 3 -*-
# vim: set tabstop=3 shiftwidth=3 expandtab:

# Copyright (C) 2010-2011 Christian Brandt
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

from gamera.core import *
from gamera.plugins.pagesegmentation import textline_reading_order
from gamera.plugins.image_utilities import union_images
from gamera.toolkits.ocr.ocr_toolkit import *
from gamera.toolkits.ocr.classes import Textline,Page,ClassifyCCs
import gamera.kdtree as kdtree

import unicodedata
import sys


class WholisticPage(Page):

   def __init__(self, img):
      self.img = img
      #cknn = knn.kNNInteractive([], ["aspect_ratio", "volume64regions", "moments", "nholes_extended"], 0)
      #cknn.from_xml_filename("x01/classifier-all-2/classifier_glyphs.xml")
      #if(opt.ccsfilter):
      #   the_ccs = ccs
      #else:
      the_ccs = img.cc_analysis()
      self.median_cc = int(median([cc.nrows for cc in the_ccs]))
      #autogroup = ClassifyCCs(cknn)
      #autogroup.parts_to_group = 3
      #autogroup.grouping_distance = max([2,median_cc / 8])
      Page.__init__(self, img)#, classify_ccs=autogroup)
      #print "autogrouping glyphs activated."
      #print "maximal autogroup distance:", autogroup.grouping_distance

   def lines_to_chars(self):
      self.textlines = self.get_line_glyphs(self.img, self.ccs_lines)

   def check_glyph_greek_accent(self, item, glyph):
      """Returns [add, remove]: if *glyph* and *item* belong together (same
      bounding box, horizontal overlap, or close and side by side), *add*
      contains their union and *remove* the two parts; otherwise both lists
      are empty."""
      remove = []
      add = []
      result = []
      if ((glyph.ul_x == item.ul_x and glyph.ul_y == item.ul_y and
           glyph.lr_x == item.lr_x and glyph.lr_y == item.lr_y) or
          glyph.intersects_x(item) or
          (glyph.distance_bb(item) < 3 and
           (glyph.distance_cx(item) < (self.median_cc / 2) or
            2*glyph.nrows < item.nrows or
            2*item.nrows < glyph.nrows))):
         ## side by side?
         # print "y"
         remove.append(glyph)
         remove.append(item)
         new = union_images([item,glyph])
         add.append(new)
      result.append(add) #result[0] == ADD
      result.append(remove) #result[1] == REMOVE
      return result

   def get_line_glyphs(self, image, textlines):
      i = 0
      show = []
      lines = []
      ret,sub_ccs = image.sub_cc_analysis(textlines)
      #print "doc has %d lines" % len(sub_ccs)
      linenumber = 0
      for ccs in sub_ccs:
         linenumber = linenumber + 1
         #print "line %d" % linenumber
         line_bbox = Rect(textlines[i])
         i = i + 1
         glyphs = ccs[:]
         newlist = []

         remove = []
         add = []
         result = []
         glyphs.sort(key=lambda g: g.ul_x)
         #print "first run"
         for position, item in enumerate(glyphs):
            olditem = item
            left = max(0,position - 5)
            right = min(position + 5, len(glyphs))
            checklist = glyphs[left:right]
            for glyph in checklist:
               if item == glyph:
                  continue
               result = self.check_glyph_greek_accent(item,glyph)
               if len(result[0]) > 0: #something has been joined...
                  item = result[0][0]
                  #add.append(result[0][0]) #joined glyph
                  remove.append(result[1][0]) #first part of joined one
                  remove.append(result[1][1]) #second part of joined one
            if olditem != item:
               add.append(item)
         for elem in remove:
            if elem in glyphs:
               glyphs.remove(elem)
         for elem in add:
            glyphs.append(elem)
         remove = []
         add = []

         glyphs = textline_reading_order(glyphs)
         glyphs = list(set(glyphs))
         #print len(glyphs)

         new_line = WholisticTextline(line_bbox)
         final = []
         if len(glyphs) > 0:
            for glyph in glyphs:
               final.append(glyph)
         new_line.add_glyphs(final,False)
         #new_line.sort_glyphs() #reading order -- from left to right
         lines.append(new_line)
         for glyph in glyphs:
            show.append(glyph)
      return lines


class WholisticTextline(Textline):

   #called after classification
   def sort_glyphs(self):
      self.glyphs = textline_reading_order(self.glyphs)

      #begin calculating threshold for word-spacing
      spacelist = []
      for i in range(len(self.glyphs) - 1):
         spacelist.append(self.glyphs[i + 1].ul_x - self.glyphs[i].lr_x)
      if len(spacelist) > 0:
         threshold = median(spacelist)
         threshold = threshold * 2.0
      else:
         threshold = 0
      #end calculating threshold for word-spacing

      self.words = chars_make_words(self.glyphs, threshold)

   def to_string(self):
      k = 3
      max_k = 10
      output = u""
      for word in self.words:
         characters = []
         skipids = ["manual.xi.upper", "manual.xi.lower", "manual.theta.outer"]
         for glyph in word:
            mainid = glyph.get_main_id()
            if mainid == "comma" or mainid == "combining.comma.above":
               #print "%s - center_y: %d - med_center: %d" % (mainid, glyph.center.y, med_center)
               if glyph.center.y > self.bbox.center.y:
                  glyph.classify_automatic("comma")
               else:
                  glyph.classify_automatic("combining.comma.above")
               mainid = glyph.get_main_id()
            mainid = mainid.split(".and.")
            for a in mainid:
               char = return_char(a)
               #print "added %s to output" % char
               output = output + char#unicodedata.normalize('NFD', char)
         output = output + " "
      return output
greekocr-1.0.1/gamera/toolkits/greekocr/__init__.py
# -*- mode: python; indent-tabs-mode: nil; tab-width: 3 -*-
# vim: set tabstop=3 shiftwidth=3 expandtab:

# Copyright (C) 2010-2011 Christian Brandt
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

"""
Toolkit setup

This file is run on importing anything within this directory.
Its purpose is only to help with the Gamera GUI shell,
and may be omitted if you are not concerned with that.
""" from gamera import toolkit from gamera.toolkits.greekocr import compare from gamera.toolkits.greekocr.greekocr import * # Let's import all our plugins here so that when this toolkit # is imported using the "Toolkit" menu in the Gamera GUI # everything works. greekocr-1.0.1/PKG-INFO0000644000175000017500000000051011635564234013754 0ustar dalitzdalitzMetadata-Version: 1.0 Name: greekocr Version: 1.0.1 Summary: An addon Greek OCR toolkit for the Gamera framework for document analysis and recognition. Home-page: http://gamera.sourceforge.net/ Author: Christian Brandt and Christoph Dalitz Author-email: UNKNOWN License: GNU GPL version 2 Description: UNKNOWN Platform: UNKNOWN greekocr-1.0.1/doc/0000755000175000017500000000000011635564234013430 5ustar dalitzdalitzgreekocr-1.0.1/doc/html/0000755000175000017500000000000011635564234014374 5ustar dalitzdalitzgreekocr-1.0.1/doc/html/functions.html0000644000175000017500000001256211535422120017262 0ustar dalitzdalitz GreekOCR Toolkit: Global Functions

GreekOCR Toolkit: Global Functions

Last modified: March 08, 2011


The toolkit defines a number of free functions that are not image methods. They are defined in different modules and can be imported in a Python script with:

from gamera.toolkits.greekocr.compare import levenshtein, errorrate
from gamera.toolkits.greekocr.unicode_teubner import unicode_to_teubner

Conversion of Greek Unicode to LaTeX

unicode_to_teubner

Converts the given Unicode string into a LaTeX document body, using the Teubner style for representing Greek characters and accents. Signature:

unicode_to_teubner (unicode_text)

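The input is expected in Unicode normal form NFD, i.e. with accents as separate combining characters. The returned LaTeX code contains neither the LaTeX header nor the footer. To create a complete LaTeX document, you can use the following code: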

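import unicodedata
from gamera.toolkits.greekocr.unicode_teubner import unicode_to_teubner

# the input must be in normal form NFD (accents as combining characters)
unicode_string = unicodedata.normalize("NFD", u"ἔθαψε, ὡς οἰκὸς ἦν")

# LaTeX header
print(r"\documentclass[10pt]{article}")
print(r"\usepackage[polutonikogreek]{babel}")
print(r"\usepackage[or]{teubner}")
print(r"\begin{document}")
print(r"\selectlanguage{greek}")

# document body
print(unicode_to_teubner(unicode_string))

# LaTeX footer
print(r"\end{document}")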

Measuring recognition rates with ground truth data

levenshtein

Computes the Levenshtein distance (aka edit distance) between the two Unicode strings s1 and s2. Signature:

levenshtein(s1, s2)

This implementation differs from the plugin edit_distance in the Gamera core in two respects:

  • the Gamera core function is implemented in C++ and currently only works with ASCII strings
  • this implementation is written in pure Python and therefore somewhat slower, but it works with Unicode strings.

For details about the algorithm see http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance
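The algorithm on the linked page is the classic dynamic-programming recurrence over insertions, deletions and substitutions. A compact sketch that keeps only two rows of the table is shown below. It is written for Python 3 and is not necessarily identical to levenshtein in gamera.toolkits.greekocr.compare. Note that it compares code points, so a decomposed (NFD) accented letter counts as a base letter plus separate marks.

```python
def levenshtein(s1, s2):
    """Edit distance between two (Unicode) strings, computed with two rows."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the shorter string in the inner loop
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (c1 != c2),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("λόγος", "λογος"))     # 1
```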

errorrate

Returns the edit distance between the two given Unicode strings, divided by the length of the first string (the ground truth). Signature:

errorrate(groundtruth, ocr)
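Following the definition above, errorrate is the edit distance scaled by the length of the ground truth. Here is a minimal self-contained sketch under that definition, written for Python 3. Rejecting an empty ground truth is our own assumption, since the documentation does not say what happens in that case. One practical point: GreekOCR normalizes its output to NFD, so a precomposed ground truth would inflate the error rate; normalizing both strings the same way first avoids this.

```python
import unicodedata


def levenshtein(s1, s2):
    """Edit distance, two-row dynamic programming."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (c1 != c2)))
        previous = current
    return previous[-1]


def errorrate(groundtruth, ocr):
    """Edit distance divided by the length of the ground truth."""
    if not groundtruth:
        # assumption: an empty ground truth is rejected explicitly
        raise ValueError("groundtruth must not be empty")
    return levenshtein(groundtruth, ocr) / float(len(groundtruth))


print(errorrate("kitten", "sitting"))  # 0.5

# bring both texts into the same normal form as the OCR output (NFD)
gt = unicodedata.normalize("NFD", "λόγος")
print(errorrate(gt, unicodedata.normalize("NFD", "λογος")))
```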
greekocr-1.0.1/doc/html/html4css1.css
/* :Author: David Goodger :Contact: goodger@users.sourceforge.net :Date: $Date: 2007/08/10 14:44:25 $ :Revision: $Revision: 1.1 $ :Copyright: This stylesheet has been placed in the public domain. Default cascading style sheet for the HTML output of Docutils. See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to customize this style sheet. */ /* used to remove borders from tables and images */ .borderless, table.borderless td, table.borderless th { border: 0 } table.borderless td, table.borderless th { /* Override padding for "table.docutils td" with "! important". The right padding separates the table cells. */ padding: 0 0.5em 0 0 ! important } .first { /* Override more specific margin styles with "! important". */ margin-top: 0 ! important } .last, .with-subtitle { margin-bottom: 0 ! important } .hidden { display: none } a.toc-backref { text-decoration: none ; color: black } blockquote.epigraph { margin: 2em 5em ; } dl.docutils dd { margin-bottom: 0.5em } /* Uncomment (and remove this text!) to get bold-faced definition list terms dl.docutils dt { font-weight: bold } */ div.abstract { margin: 2em 5em } div.abstract p.topic-title { font-weight: bold ; text-align: center } div.admonition, div.attention, div.caution, div.danger, div.error, div.hint, div.important, div.note, div.tip, div.warning { margin: 2em ; border: medium outset ; padding: 1em } div.admonition p.admonition-title, div.hint p.admonition-title, div.important p.admonition-title, div.note p.admonition-title, div.tip p.admonition-title { font-weight: bold ; font-family: sans-serif } div.attention p.admonition-title, div.caution p.admonition-title, div.danger p.admonition-title, div.error p.admonition-title, div.warning p.admonition-title { color: red ; font-weight: bold ; font-family: sans-serif } /* Uncomment (and remove this text!)
to get reduced vertical space in compound paragraphs. div.compound .compound-first, div.compound .compound-middle { margin-bottom: 0.5em } div.compound .compound-last, div.compound .compound-middle { margin-top: 0.5em } */ div.dedication { margin: 2em 5em ; text-align: center ; font-style: italic } div.dedication p.topic-title { font-weight: bold ; font-style: normal } div.figure { margin-left: 2em ; margin-right: 2em } div.footer, div.header { clear: both; font-size: smaller } div.line-block { display: block ; margin-top: 1em ; margin-bottom: 1em } div.line-block div.line-block { margin-top: 0 ; margin-bottom: 0 ; margin-left: 1.5em } div.sidebar { margin-left: 1em ; border: medium outset ; padding: 1em ; background-color: #ffffee ; width: 40% ; float: right ; clear: right } div.sidebar p.rubric { font-family: sans-serif ; font-size: medium } div.system-messages { margin: 5em } div.system-messages h1 { color: red } div.system-message { border: medium outset ; padding: 1em } div.system-message p.system-message-title { color: red ; font-weight: bold } div.topic { margin: 2em } h1.section-subtitle, h2.section-subtitle, h3.section-subtitle, h4.section-subtitle, h5.section-subtitle, h6.section-subtitle { margin-top: 0.4em } h1.title { text-align: center } h2.subtitle { text-align: center } hr.docutils { width: 75% } img.align-left { clear: left } img.align-right { clear: right } ol.simple, ul.simple { margin-bottom: 1em } ol.arabic { list-style: decimal } ol.loweralpha { list-style: lower-alpha } ol.upperalpha { list-style: upper-alpha } ol.lowerroman { list-style: lower-roman } ol.upperroman { list-style: upper-roman } p.attribution { text-align: right ; margin-left: 50% } p.caption { font-style: italic } p.credits { font-style: italic ; font-size: smaller } p.label { white-space: nowrap } p.rubric { font-weight: bold ; font-size: larger ; color: maroon ; text-align: center } p.sidebar-title { font-family: sans-serif ; font-weight: bold ; font-size: larger } 
p.sidebar-subtitle { font-family: sans-serif ; font-weight: bold } p.topic-title { font-weight: bold } pre.address { margin-bottom: 0 ; margin-top: 0 ; font-family: serif ; font-size: 100% } pre.literal-block, pre.doctest-block { margin-left: 2em ; margin-right: 2em ; background-color: #eeeeee } span.classifier { font-family: sans-serif ; font-style: oblique } span.classifier-delimiter { font-family: sans-serif ; font-weight: bold } span.interpreted { font-family: sans-serif } span.option { white-space: nowrap } span.pre { white-space: pre } span.problematic { color: red } span.section-subtitle { /* font-size relative to parent (h1..h6 element) */ font-size: 80% } table.citation { border-left: solid 1px gray; margin-left: 1px } table.docinfo { margin: 2em 4em } table.docutils { margin-top: 0.5em ; margin-bottom: 0.5em; background-color: #f7fffd; border-color: #72ada8; border: solid thin #aaaaaa; } table.footnote { border-left: solid 1px black; margin-left: 1px } table.docutils td, table.docutils th, table.docinfo td, table.docinfo th { padding-left: 0.5em ; padding-right: 0.5em ; vertical-align: top } table.docutils th.field-name, table.docinfo th.docinfo-name { font-weight: bold ; text-align: left ; white-space: nowrap ; } td.field-body, th.field-name { padding: 0.5em; border: solid thin #aaaaaa; } h1 tt.docutils, h2 tt.docutils, h3 tt.docutils, h4 tt.docutils, h5 tt.docutils, h6 tt.docutils { font-size: 100% } tt.docutils { } ul.auto-toc { list-style-type: none }
greekocr-1.0.1/doc/html/pygments.css
.c { color: #408080; font-style: italic } /* Comment */ .err { border: 1px solid #FF0000 } /* Error */ .k { color: #008000; font-weight: bold } /* Keyword */ .o { color: #666666 } /* Operator */ .cm { color: #408080; font-style: italic } /* Comment.Multiline */ .cp { color: #BC7A00 } /* Comment.Preproc */ .c1 { color: #408080; font-style: italic } /* Comment.Single */ .cs { color: #408080; font-style:
italic } /* Comment.Special */ .gd { color: #A00000 } /* Generic.Deleted */ .ge { font-style: italic } /* Generic.Emph */ .gr { color: #FF0000 } /* Generic.Error */ .gh { color: #000080; font-weight: bold } /* Generic.Heading */ .gi { color: #00A000 } /* Generic.Inserted */ .go { color: #808080 } /* Generic.Output */ .gp { color: #000080; font-weight: bold } /* Generic.Prompt */ .gs { font-weight: bold } /* Generic.Strong */ .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ .gt { color: #0040D0 } /* Generic.Traceback */ .kc { color: #008000; font-weight: bold } /* Keyword.Constant */ .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */ .kp { color: #008000 } /* Keyword.Pseudo */ .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */ .kt { color: #B00040 } /* Keyword.Type */ .m { color: #666666 } /* Literal.Number */ .s { color: #BA2121 } /* Literal.String */ .na { color: #7D9029 } /* Name.Attribute */ .nb { color: #008000 } /* Name.Builtin */ .nc { color: #0000FF; font-weight: bold } /* Name.Class */ .no { color: #880000 } /* Name.Constant */ .nd { color: #AA22FF } /* Name.Decorator */ .ni { color: #999999; font-weight: bold } /* Name.Entity */ .ne { color: #D2413A; font-weight: bold } /* Name.Exception */ .nf { color: #0000FF } /* Name.Function */ .nl { color: #A0A000 } /* Name.Label */ .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */ .nt { color: #008000; font-weight: bold } /* Name.Tag */ .nv { color: #19177C } /* Name.Variable */ .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */ .w { color: #bbbbbb } /* Text.Whitespace */ .mf { color: #666666 } /* Literal.Number.Float */ .mh { color: #666666 } /* Literal.Number.Hex */ .mi { color: #666666 } /* Literal.Number.Integer */ .mo { color: #666666 } /* Literal.Number.Oct */ .sb { color: #BA2121 } /* Literal.String.Backtick */ .sc { color: #BA2121 } /* Literal.String.Char */ .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */ .s2 { 
color: #BA2121 } /* Literal.String.Double */ .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */ .sh { color: #BA2121 } /* Literal.String.Heredoc */ .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */ .sx { color: #008000 } /* Literal.String.Other */ .sr { color: #BB6688 } /* Literal.String.Regex */ .s1 { color: #BA2121 } /* Literal.String.Single */ .ss { color: #19177C } /* Literal.String.Symbol */ .bp { color: #008000 } /* Name.Builtin.Pseudo */ .vc { color: #19177C } /* Name.Variable.Class */ .vg { color: #19177C } /* Name.Variable.Global */ .vi { color: #19177C } /* Name.Variable.Instance */ .il { color: #666666 } /* Literal.Number.Integer.Long */
5E$2A,"`MUKM RDDr!'}ӭ_߷v7;bF*'x>}.gD@wk︮N1s.bDPD zwQϿD[;bϢjEC#6vhsqM5NPZ6LS\aM7W4@̌s+~ݮ+ MD2@UQE9IY%dќ l .c@TUywΩJ]Z˪?Sj\JU!1+.kUdVqyԷ CwuG/si+:EoWneH.R@$!ݮi%弘$BebL5 ek^˔KL m6i*ϧuιRr^xN#@zXy{* }7 =kScrJ~7.t}mn4/2<rV:VeV~\X١IeZ scAFDbv`VJ]'q/ݖ0 !,Ta,ܧ}P;4N?|4͛ŔiU y.˒4buf;[Jee)ah)6k7J.)A%,G[mRc9Be217C߬z_W@&b+$"n}^ކxJԂC1v.x %RbebT"04n̎{D\$b1"XÇ)Bچy&]]VjL#*޼ڽoV.tW%)#s5ZYqχqFt$Pۦnwmk8H̝sc҆5O+E~S.9e^#p 8"v"49rc7P.0Nxs,sV8zD)9咳d,,1h@UO^Q*2Vo̶ہn,r%ca(9}B㼯ݪW&U"";ݶ%OqH bƵmzݬn73-%}?8~~<^.Q }pumC@y!&H1eSEǪb<ydooooqQ_Au9$%_%Eb4ݦ]3撃mǘUBTc$"о7_OACԢ|>cF׷ͺ<;YtPpm&%C@Eͷm j7ˋ?<S|6nn ܖb?=zsNCdhӜrQ5lW^}*˂)f)fU`GKɂ!;L}6.RG9|s C-kaGjvA"BhIƇZȗ")SF&׺kAD%_TsIJ2IVBtlMvw۷oovog7_fx0|>/rVa٬j$"9eiVM[ۻW>CFd*NjO?1erk@|$?A17umZT₢KRc˘&9ݶ-m.&>y״t }:ﺡOU${uS8liLVoyY%kvR PNI .SE. }=8, ݶ|8N9A) PV*AkԶ1%jU-u*"<"JoyY2O9ϧa[4W25B d- 5%%AMV&YuaF`BGTr*) ɋt6]p}/?qxh_Oow;&b d L﻾ZU8פn:<~Ǚq1ۻy]Ӵ~=Pp:jn#&b]JwЙ#Db fտ}jC<N4w{k"1KWYow^AFc%=}zJ˲[+G܅R z1%_.nכ;,iAE#*\$e10~,]r(aws=$q8>14"|U5ˮ&$ vRTg-9W1 11bh()%unެw_mo^w]f 9Tj${bӴ<ܬv __9)BvA-E5bمJ/#AJq9.30&-c.~w[sf"RۛcT5'H 97+ײ6xo;x5 ԋk00vwthIQ'GWecZd9o~E@uVB/@*$˧j /Vpׯ-c)0}j6w!8dιsK^r6TLIr913HsT'+˒bsMA9>hw *[A`6SUq("ZK2O\%9㧹'*8Fo~߮ז \^Ej"=ܬ~C.ݗw "orJ>.E|͏imsʏir ۻts]/aowU~UfoWc\b5qFF:vLUlH̞_ F7cKܲL_zfYRr]kݠRbO*;a"CD&yHlfj u[Y9:G7_l|nspx?}j]ET{r_b]8rc.vӷ7SIUy?Z70N\e"Y LEĠ&A)ZwUZyw7+x:E_=jERgDv} L%Wh/K`hE;Zak%p#&Yq[j߼~m;f @ jfR^Ĭ5"?-;ngU~?7]鯏S8Tdj]omfX RW5h-/|9]NOq^c?|W7mrtٹZJYۻor\J+j%4.!kۮ8 @ V yoAuJMG˥pA[d}ŐuLMoݫ7Rt\y;8z[Y*D60WS j5`^nwg(R=5*lﮎ[4d皶y2Cf\6uD]^o7]"2jP34]UM7m 46Mh1-O>"qm5\qm/t헯6ۛ,ԔG~fεAD㇛ot滯z 7JzJ4OQXiop; slj:d .0.9G4/) CW_]}p*CBRӾ풖x|bb%i֘LA.SZ1frN)5MU%(RVskmc/z w>4URrIU R$DbC"\U_@-5͊wa߷)s;=TD+IJm>?EmjR_ilw&AmФ#;,xw}_mJu^F7@&шT, h;}߿o.?}:owhf9 |x:W1Ʋ}Wʱ-SDصJ/D%? 
z{arV| >/ ͆1.|:)&}>(eNSKf%v=׿IA# (Jlj vWۻWw"9y\>}6F"yBfps.`@**bDܵ  aE{'jK.7U #&v`8Eq֯_m#n*U(&b9\|y|zinj+@Υ~>L}^dJ>\ p;ii~v_/"V՗ye9/^slR=؋Ю$:įݶ*b$pIYӯt1+s\bx-m@Dծ$/Obj׽[жYc}Б--)jɖr&) c|;$k!MfRAȰHXՁ}5ƃmW劐M ̈]] KL^B"d+ż70k??_η/):ϯ֎YRf X1y?}$"OOGV}]?8*ѠfN2罹AE!a%X.EꉫaHqV*R$g8cВ%"ȲĪ]2MSIe\ac$+_mVmrtul@/zR*tN4~>>O9~ӯs7kx״|~~԰(ln1'$FB{bfc['Y\(}@G MT.2ڶCPU9AUj]`*AVC%M,1/)%6$v)iT 40:߶,7?q:؇_L IbyjoC@RH9>}X`ďeuP<榠#Np!\b?dޅmnmY㪫aNWFUqLC ,UDX0 RL8\p!m<Jf|*Ч䜋D/4OeypEmCfm/R3.x)?~>6M٭ޫcPMTIDXU `0tSF2Y7âx?nXi:/2\ #%rudyQr4縌H1 ʎBrޑQN9.s5Ƴs"3L b,&z&(]qXr9jZODDaš|~>]1;4.4:;YTSRcȍ`(-ҊUmNGI "0U)K1ojoݚ|f:% IJ"SqH۠zﺾ!G5w>\zQEi؇# c.IUX)DԹFԖq1-1?F&Xv߮iiHࡹxY4ۮ c23B GĜD 98IպݬW 4)rě5%ط7^ kz4O)-So)%iB8$%$XUsTD15$L(dޡAHG T'z R$IM 0 LLH@$%F9&Z0 $9>t<.(q-;|?חwxVJr8/qеl技V3+*iV fb jETt0i?4g/+a@UE1R[sUMSb*_4ۭtHJmBk(1IŘ9HEb"C &4m3/T ;F L,9e.cQ@DvDSg@928Ӹ]f7lnni=dxn{ͰEDt.hZҒcGR,a{c.GMsƜbX¤"I$I"\ԐKU]e9 x˒ųyNdRJV9(Z !(LYZrH. `CiQe%4iC>lu2.\ʮ hnBZ4ҋLyu@c4/xDz[QN>}bfjUτa^ g2FcqDm*дڮ2qͥSqr,LxnDbj`DPunR CT,1 VnhA$mBwx<>_www7[dw?=|h34?|yVCk:UIKhjL!C׶^,&RK*ɑFiIY ><,irsӭ0`MpsLWat:L8$r1Ike,t)FDD gPE҅r,8 4XT:ݥU$]A#cQPS0Z _ 00:F $10l9Pƃdi Q@4UQy9X])Bznn^3ZWc*ʂƮ02MQƝ?\YZc@g: IDATXJT(Re>=Ē!8vs96xfNkrKWW۸ĜS.j!4 SCS9`vu{׭VYӿ4 yv~w_6Cߴo׮LcY.UHJΉ.s6KY|`(Yq) "N9\ Qɪjه777 AWD"fK̹Ih$TRܰ  !-4$x8&S Wڦq}S fH8ry.1NrL?pT>rpp8G31h8"J)yYC&bhilfHJI U(8tfLZT(MCڢR- .`K5G$1$kLγw;6Z[rhzS*%xʹ<=);3;eIBDC#QknGPEƵ1j{+4dUچTt9"(3 Ym.QÇSZmzjw;.Ʋ|X.g0YJGZ,k=/e05Gmmַ<x9_?>OO4ܶK.GD%8M񰌗y<*mO䏧g83ճ HU*WCWDns>=~z 1R@ToC6U˪Z'Ս3!Z :$ ʆ!H"Tgen*|HU/7m{缚]cjCbf# &EwjPQ2Hx˜SS/C 5sLC^~<~jr*L`&(b4 Vj(X`r*uefyvYK]U_͉Z|??)FBٶSCpaJFeE9fL̊aAs"T@PБCa8$22GSeMBAQsyïM0 5my&d`&3NIQTMJ1n7Sg`^Mr%WVGH PUdͱCf&Y,"9 hthI ^6~fov(qm|x<5sR-yK \+s~sp{usƔRꙌe=hkѦ9R+K}:$݌ ω~ r^^?xlJcmN<%t9/#Gb̻! 
]ݏGD8|{WdsnokUA @0Qy> ~:z:?r~~**P̑!LEa̔ vAW-/|xzk nb@IWWW^o|<'7ùp0i j|_>nۄ{W@SJ(BksH$H9S*!x#:*IG`pFGo=f06efzsz]_> .͐Z@H@IK uS& yhiws{uv&Ue&a(s J,ǦJ y;cFXrk1H fY #2 &5Fb hc@G'atMP" FǠKڍЫVBDq]MM×S׵5-_·,kӓjy??l7H<,43"ty3lԕno??cBx.r3K~nߗmNqXrD1i\}/ጜGa6GMtL25#At@"pIbāo=۫ۛOa Ø;.%PDD<6+gYe$LIA{9%150.@`$͈&YOmtt]nQX=-z0$U @ƻz;ΣhY 1<凷ww7F$ff@ǧ_qr9 {yrPeUQ,,$]npwtz ExTHfjaA\UkmͦQqn_jU[Y×z^( zÿo޼Dq0ũ w7DeBAr~zZBNӘ << 5F@#BCactƗ!>y3@z`nVݞ}3FmFH>sU^ɂBˡ RD̴wWwWVl-H!9ԣo{2'%`(ս1SBj_;_Hf^Nc&"!3'JYĚvS3%1Z)-g @D6㩖j)BiŘ{ &I|~3r U- (yufus! _X,LRw<&4R4]J53j b Cu5v3 S)7ԗR5"h|>ǵL??od!kM-qa]I(n@}N=׿~tWĒ8o$gI1_lnff'aBmA7LD '澸֢KQ@z)6i:,O>?Y=RZ٧Rj|4H(e9%yNˉ\ϕD`H~{{?w[mC< Lh""c9z]kUBC".#!7CBn{wDhe <zBnNoy.=!@HL2Zl%CbHIz鲙$Їc?tƗ Tm9D,U[=Q7Qv^j@ԦkQX@bʄ`=pp3īR}~\t-w#-x8mDW۫;&DSDy7cٝHc&~J̗šx\΋YV_K|nE[y?S}aQK{9FIh h;O7w߼oLza_  9?yQf3×ZpBJ zr|:>~;IRNDx  Hc^M` jӔEE9c0$@=~kaЉe>{X54ȼ@qnjR8` @[):MǤ֪9Bx#7sUD7 ,=;6%ܣZRZ4 p.KA7E m<`eQ wiHL4Ep"̃fzxRe޼8@I׻+Psf!ٻ#C![] C831B@OrO X8 /ހp7Es;Ú7f3dnfa-5 .AبUU71N=BM{6kQblͦ!10]Z/Ys$Ukffjͽ`jH㔾fӇ,.)6@MRH8vf"d$4WYc11qO(""a7ƴi۳7ۿyȟݯ #0zP8bCmOk]~y)}5me`dDaapEqWL   J(.%ί_}jhع,Oǧχ/_Ɵjq 3oM+gY;Cx"\K[J=Zs rf֍t#b91&p;W[^zjOCPj=Ava{3Z_ek]n}9zO›bXGk>9I9"fRͣ5;/WPTՙvb,OZKmMu2˜~;` iaĪƙ% 젪8?urtĞdK¤r7/$ IDAT.< l`fHDQUiTiV9!fOVs 3vyFG4Fy74!a|5Uӗe]ӊA! aٲީX4 1E8Rstw'cinwPfC0@R7pP%ST@(%,t#:y iN Yk\`{n}|~и# zU!y8n!;$6el7rY$ϟʹ.UNk0v!eq@B:;؅uXtE Vs 9Kb$ ],y< Wm vC㞈0 fto޹H0P@Y ln̄j֊\*m4䔧am7w']Kܣ8SpfDGh`kQ5!w0꾬6ANep ؃y8UOOg p}soÛ돟>?O'+b뢏?j3΃nm4#" wӅgt9?}||n9-Uۧ,^o< Lb˞/=絝`4L i[ɑ!!w2><j5Њ y4 |-"|9aP @ =5 Jd&HUNY$BbfJI _ioJ{&@σtXzBa[*LC朄h?O4n{%^;axf~ǧ'73C-f>g.%"ZFYum3eԅ%"B# k1 vקv0l&9_~&_#|-fA,Krje@A|w5Mx3m- 1RY×w,iu=<~~_6@y8 Ml) SS|(U ϟ'I#"R=njn50Vͪ/yϟ?Yi^m 4 O"B_KS$!vj6i@>tjJbd<2f`*T`f"$ ^4&b,|!  
ܝ%Y $Q>=q3 ~iaj Rfoh7>ӏ???>!#yNݎ%KSKhΜ"bSO4d^+ͼ~s}~湇՟n]Lj6NB"_ (<-˧i3pb$ݘVwsߒ[Y|)Yh B'.EXyV/.)oK)aJmE:xԻkg/3vfD:_.و5QOd~鸬ꨑGsLD/n-D !Du@Hĉ+Y@iU$$DE(Db~}#ꖆ2_Lb ~gbÌDnڕZ͚&f3>}/>}t:Z7);jp,Bj$"'kUuG1 րüzW77{&/h`DI „o^>~+ƫZF$K8jzr<.4py{QYu]O_?}xyz,M%1ː(,Vk=8o%tK ڪcX93'f 'ĂGf&{8x ?qޥRX㫛*JOn!104!ڔ '@rB4D``laa*/k"铓fLBD qқ|w-֣@ν'wK9ptLI ic8[Z+u]LԔb}};]ϯ|X֦m;̉;7sZ'em!HL) f]nvfla"" "):C7nv߿>|?~:ЎC.`xj@@s @50PFAV{6xXJ*< ḌY?lv7w- __~p>[Xʐ-d ZI$ ,yJn}A@77u]Սx(DC&o -8 .4EG"PXf Ej}h RUܻu:j<8N09'cL)1 ]iČ|#"u6[,%a~/OO>5kifIr:>m]e]G*g~h@ϵUO~>Tonvn3%I",1nƜv{y>B5 P(kUmL G$`K/D@T0̄ (woOvz<5yp¾="Fl.ajml|أ"Lѥ53>`05moަ1" &ۛ{sey8^.!F ߐutV jA<;B4~B{HN@iI6ow{ba&tȅ!H4yP jk]|n||yi.k!.[К=G0C hx@HNEO65X9B(mf<c"f>Ga}nٍ-RNiUsI BNfuϬ77h<z)Խ*LBw~s ޿- 5=XBAW`aMM- 9]z3Ƌ]#[mK7)nۛۿ|M9E"+#F @؃=_ o53 "\6$F F@UEwkqNr(RjuGLĴ1{o7)EP^(:ԽS8x$ p2ᰁJq%Ln= M8t6άLrmWOϚYتsƈf!<|Y%3};O3 H-"]#znv\SJ$oJ___~>X]&+,W' 47/VŔz^>|~_v4tqW7|0(ge 4b~%>c6\)H2Yi9wҊyLv?mhe1ܗ Xy/_^ ujGZƳfҠѹ@}#pۆk֯x?Y{$2<^rCl79d"31|hYYj?oc6Š|;ʪfάcNq}|1ǜuao¦Vi!1mcnϟvl&%leu]WUKyOK﷫R-s*`{xG1݃գ#6r^?}|*U55eش4JLq\ESMu~_%cYFz6S-#@nU~ >x0=^7@DcΓ &kT˖* ٴ}0_3vv,Z3(Qro$%e*@F Sf~3p9|zδ"jQ6k#392YGG엍4aim\o{xn}|p\m tG`r YOľo~ٷJr"$'u=!z;vj#z#\̠( skO_>?~//u6R L dV0.u۞OឨdP`14Ѐ B93I.ׇcly,[5v/__~}z8ȗ|y;PX[|4݄re{fg8Kz`@(̳ƷC4b3blofYg.|O"iЦ357 /6v_I\n28ܶqlml#b Aw @W ̊)3MyNwwJv."b,(5clpn EyEVR5laKҚ{m9U1BUE]w.pO^^q͈z9f4`p>ٟzjWzX^,̷_wOׇٺۿ$064˗_~Ԯ|{{{=W<&Ϥ;b'46< n`BM@R b.Y6Ik@Έi8g;݃f#< m1>>}]/16.Ս1̪#<0DM MӕX xHZOeG 5Z4R^f %bfckAP%NȲ.qIݻT8n8VTնY6֐hEVmX9í<|y29+[6/qxzܮط5fd ;_NCe~y~|}y.*fi0{՝r*ct%p1x~y~~yضșyfeV9|z3 Ɛ6B4zf0sn˶_mxq0|_/]G#rH6SqU%b̙ v׊RI$[E R\% 2лBnÅ&qxiUv.#~AMЯr1`Ա4xxz:Tb@糵B͞Z"=Bqxr|~<5iUN>omRPїt+x=`'L3{cĦm 6;ASy1"~Ì%x<JQN5~'IDAT> ѧ2M]&_ Z. 
greekocr-1.0.1/doc/html/images/overview.fig0000644000175000017500000000360611564155126020200 0ustar dalitzdalitz#FIG 3.2 Landscape Center Inches Letter 100.00 Single -2 1200 2 6 4650 3300 5850 4500 5 1 0 2 20 7 50 -1 -1 0.000 0 1 0 0 5250.000 3364.286 4800 4275 5235 4380 5700 4275 1 2 0 2 20 7 50 -1 -1 0.000 1 0.0000 5250 3525 450 150 4800 3375 5700 3675 2 1 0 2 20 7 50 -1 -1 0.000 0 0 -1 0 0 2 4800 3525 4800 4275 2 1 0 2 20 7 50 -1 -1 0.000 0 0 -1 0 0 2 5700 3525 5700 4275 -6 1 4 0 0 0 7 50 -1 -1 0.000 1 0.0000 9675 1912 155 155 9525 1875 9825 1950 2 2 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 975 1500 1725 1500 1725 2475 975 2475 975 1500 2 2 0 2 0 7 40 -1 20 0.000 0 0 -1 0 0 5 1125 1575 1875 1575 1875 2550 1125 2550 1125 1575 2 2 0 2 0 7 30 -1 20 0.000 0 0 -1 0 0 5 1275 1650 2025 1650 2025 2625 1275 2625 1275 1650 2 2 0 2 20 7 10 -1 20 0.000 0 0 -1 0 0 5 1575 1800 2325 1800 2325 2775 1575 2775 1575 1800 2 2 0 2 20 7 20 -1 20 0.000 0 0 -1 0 0 5 1425 1725 2175 1725 2175 2700 1425 2700 1425 1725 2 1 0 2 20 7 50 -1 -1 0.000 0 0 -1 1 0 3 1 1 2.00 120.00 240.00 1800 3000 1800 3900 4500 3900 2 2 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 7875 1650 8625 1650 8625 2625 7875 2625 7875 1650 2 1 0 2 -1 7 50 -1 -1 0.000 0 0 -1 1 0 2 1 1 2.00 120.00 240.00 2850 2175 7650 2175 2 1 0 2 -1 7 50 -1 -1 0.000 0 0 -1 1 0 2 1 1 2.00 120.00 240.00 5175 3150 5175 2400 2 2 0 2 0 7 40 -1 20 0.000 0 0 -1 0 0 5 8025 1725 8775 1725 8775 2700 8025 2700 8025 1725 2 2 0 2 0 7 30 -1 20 0.000 0 0 -1 0 0 5 8175 1800 8925 1800 8925 2775 8175 2775 8175 1800 4 0 0 50 -1 0 20 0.0000 4 255 885 1275 1200 Images\001 4 0 20 50 -1 0 20 0.0000 4 195 585 6000 4200 Data\001 4 0 20 50 -1 0 20 0.0000 4 255 1050 5850 3900 Training\001 4 0 -1 50 -1 0 20 0.0000 4 195 885 7800 1050 Text as\001 4 0 -1 50 -1 0 20 0.0000 4 210 1980 7800 1380 Unicode/LaTeX\001 4 0 20 50 -1 1 20 0.0000 4 255 1065 2325 3750 Training\001 4 0 0 50 -1 0 20 0.0000 4 195 1275 1125 900 Document\001 4 0 -1 50 -1 1 20 0.0000 4 270 1710 4575 1950 Classification\001
greekocr-1.0.1/doc/html/default.css0000644000175000017500000001104411564155126016527 0ustar dalitzdalitz@import url(html4css1.css); @import url(pygments.css); body { margin: 2em 2em 2em 2em; background-color: #effffd; } a.toc-backref { text-decoration: none ; color: black } h1 { background-color: #e1f0ee; color: #29493c; border-top-color: #72ada8; border-top-style: solid; border-top-width: 4px } h2 { background-color: #e1f0ee; color: #29493c; border-top-color: #72ada8; border-top-style: solid; border-top-width: 2px } h3 { background-color: #e1f0ee; color: #29493c; border-top-color: #72ada8; border-top-style: solid; border-top-width: 1px } h4 { background-color: #e1f0ee; color: #29493c; border-top-color: #72ada8; border-top-style: solid; border-top-width: 1px } h5 { background-color: #e1f0ee; color: #29493c; border-top-color: #72ada8; border-top-style: solid; border-top-width: 0.5px } div.code-block, div.highlight { margin-left: 2em; margin-right: 2em; background-color: #f0f0e0; font-family: "Andale Mono", "Bitstream Vera Sans Mono", monospace; border-color: #e0e0d0; border-style: solid; border-width: 1px; font-size: 10pt; padding: 1em; } /* The following is for SilverCity syntax highlighting */ .code_default { FONT-FAMILY: "Andale Mono", "Bitstream Vera Sans Mono", monospace; FONT-SIZE: 10pt; } .c_character { color: olive; } .c_comment { color: green; font-style: italic; } .c_commentdoc { color: green; font-style: italic; } .c_commentdockeyword { color: navy; font-weight: bold; } .c_commentdockeyworderror { color: red; font-weight: bold; } .c_commentline { color: green; font-style: italic; } .c_commentlinedoc { color: green; font-style: italic; } .c_default { } .c_identifier { color: black; } .c_number { color: #009999; } .c_operator { color: black; } .c_preprocessor { color: navy; font-weight: bold; } .c_regex { color: olive; } .c_string {
color: olive; } .c_stringeol { color: olive; } .c_uuid { color: olive; } .c_verbatim { color: olive; } .c_word { color: navy; font-weight: bold; } .c_word2 { color: navy; font-weight: bold; } .h_asp { color: #ffff00; } .h_aspat { color: #ffdf00; } .h_attribute { color: #008080; } .h_attributeunknown { color: #ff0000; } .h_cdata { color: #ffdf00; } .h_comment { color: #808000; } .h_default { } .h_doublestring { color: olive; } .h_entity { color: #800080; } .h_number { color: #009999; } .h_other { color: #800080; } .h_script { color: #000080; } .h_singlestring { color: olive; } .h_tag { color: #000080; } .h_tagend { color: #000080; } .h_tagunknown { color: #ff0000; } .h_xmlend { color: #0000ff; } .h_xmlstart { color: #0000ff; } .pl_array { color: black; } .pl_backticks { color: olive; } .pl_character { color: olive; } .pl_commentline { color: green; font-style: italic; } .pl_datasection { color: olive; } .pl_default { } .pl_error { color: red; font-weight: bold; } .pl_hash { color: black; } .pl_here_delim { color: olive; } .pl_here_q { color: olive; } .pl_here_qq { color: olive; } .pl_here_qx { color: olive; } .pl_identifier { color: black; } .pl_longquote { color: olive; } .pl_number { color: #009999; } .pl_operator { color: black; } .pl_pod { color: black; font-style: italic; } .pl_preprocessor { color: navy; font-weight: bold; } .pl_punctuation { color: black; } .pl_regex { color: olive; } .pl_regsubst { color: olive; } .pl_scalar { color: black; } .pl_string { color: olive; } .pl_string_q { color: olive; } .pl_string_qq { color: olive; } .pl_string_qr { color: olive; } .pl_string_qw { color: olive; } .pl_string_qx { color: olive; } .pl_symboltable { color: black; } .pl_word { color: navy; font-weight: bold; } .p_character { color: olive; } .p_classname { color: blue; font-weight: bold; } .p_commentblock { color: gray; font-style: italic; } .p_commentline { color: green; font-style: italic; } .p_default { } .p_defname { color: #009999; font-weight: bold; } 
.p_identifier { color: black; } .p_number { color: #009999; } .p_operator { color: black; } .p_string { color: olive; } .p_stringeol { color: olive; } .p_triple { color: olive; } .p_tripledouble { color: olive; } .p_word { color: navy; font-weight: bold; } .yaml_comment { color: #008800; font-style: italic; } .yaml_default { } .yaml_document { color: #808080; font-style: italic; } .yaml_identifier { color: navy; font-weight: bold; } .yaml_keyword { color: #880088; } .yaml_number { color: #880000; } .yaml_reference { color: #008888; } greekocr-1.0.1/doc/html/index.html0000644000175000017500000004053511564155127016376 0ustar dalitzdalitz GreekOCR toolkit for Gamera

GreekOCR toolkit for Gamera

Last modified: May 16, 2011

Editor: Christian Brandt, Christoph Dalitz
Version: 1.0.0

Use the 'Addons' section on the Gamera home page for access to file releases of this toolkit.

Overview

The purpose of the GreekOCR Toolkit is to help build optical character recognition (OCR) systems for text documents with polytonal Greek text, i.e. classical Greek with a wide variety of accents. It can be used as is, but it can also serve as a building block for a custom OCR system for polytonal Greek.

The toolkit is based on and requires the Gamera framework for document analysis and recognition. Moreover, it requires the OCR toolkit for Gamera. As an addon package for Gamera, it provides

  • a ready-to-run Python script greekocr4gamera.py that acts as a basic GreekOCR system
  • Python library functions for building a custom GreekOCR system

Please note that the toolkit currently does not include any training data. This means that you must create a training database of Greek characters before you can use the script greekocr4gamera.py.

Approaches for recognizing accents

Compared to texts in Roman letters or in modern (monotonal) Greek, classical (polytonal) Greek uses a large number of accents that can appear in a wide range of combinations. Beyond the ordinary OCR process, this requires special treatment both for attaching accents to characters and for recognizing the resulting combinations. In general, two different approaches are possible:

Wholistic approach:
Identify each character as a whole, including its accents. This approach requires that all possible character/accent combinations have been predefined and are present as samples in the training data.
Separatistic approach:
Identify core characters and accents separately and combine them subsequently. In this case, the training data contains the core characters and the individual accents.

This toolkit offers both possibilities. You must therefore make sure that your training data matches the chosen recognition approach.

Output code

The toolkit can generate the OCR result in two different codes:

  • Unicode as specified in the Unicode standards Greek (Unicode range 0370-03FF) and Combining Diacritical Marks (Unicode range 0300-036F).
  • LaTeX code with the Teubner style for representing polytonal Greek accents in combination with the Babel style option polutonikogreek.

The latter option is intended for generating a human-readable typeset version as a PostScript or PDF file via LaTeX.
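The following sketch shows how the two Unicode ranges work together. A base letter followed by combining marks can be turned into a single precomposed character, where one exists, with Unicode NFC normalization. It uses only Python 3's standard `unicodedata` module. It illustrates the Unicode encoding and is not necessarily what the toolkit itself emits. The helper name `compose` is hypothetical.

```python
import unicodedata

def compose(base_name, *accent_names):
    """Hypothetical helper: build a base letter plus combining marks from their
    Unicode names and return the NFC-normalized (precomposed if possible) string."""
    chars = [unicodedata.lookup(base_name)]
    chars += [unicodedata.lookup(name) for name in accent_names]
    return unicodedata.normalize("NFC", "".join(chars))

s = compose("GREEK SMALL LETTER ALPHA",
            "COMBINING COMMA ABOVE", "COMBINING ACUTE ACCENT")
print(len(s), unicodedata.name(s))
# 1 GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
```

The decomposed form (letter plus combining marks) and the NFC form are canonically equivalent. Which one is preferable depends on the fonts and tools that consume the output.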

Limitations

As the segmentation of the individual characters is based on a connected component analysis, the toolkit cannot deal with touching characters unless they have been trained as combinations. It is therefore generally applicable only to printed documents, not to handwritten ones.
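A toy connected component count shows why touching glyphs cannot be separated: one connecting pixel merges two symbols into a single component. The sketch below is a plain Python 3 breadth-first flood fill over a 0/1 grid and assumes 8-connectivity. It is not Gamera's own connected component implementation, only an illustration of the principle.

```python
from collections import deque

def count_components(grid):
    """Count 8-connected groups of black (1) pixels in a 2D list of 0/1 values."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                count += 1
                # breadth-first flood fill starting at this unvisited black pixel
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and grid[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count

separate = [[1, 1, 0, 1],
            [1, 1, 0, 1]]
touching = [[1, 1, 0, 1],
            [1, 1, 1, 1]]  # one extra pixel joins both "glyphs"
print(count_components(separate), count_components(touching))  # 2 1
```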

From a user's perspective, be aware of the following points:

  • It does not include methods for preprocessing like skew correction or noise removal. For this purpose, the standard routines shipped with Gamera must be used beforehand, e.g. rotation_angle_projections for skew correction, or despeckle for noise removal.
  • It does not provide prototypes of the Greek characters and accents. This means that characters must be trained on sample pages before using the toolkit.
  • The standard page segmentation algorithm for textline separation is currently very basic.

User's Manual

This documentation is written for those who want to use the toolkit for OCR, but are not interested in extending the toolkit itself.

Developer's Documentation

This documentation is for those who want to extend the functionality of the GreekOCR toolkit, or who want to write their own recognition script.

  • Classes: reference for the classes involved in the segmentation process
  • Functions: the global functions defined by the toolkit

Installation

We have only tested the toolkit on Linux and MacOS X, but as the toolkit is written entirely in Python, the following instructions should work for any operating system.

Prerequisites

First you will need a working installation of the following software:

  • Gamera 3.x, as available from the Gamera website. It is strongly recommended that you use a recent version, preferably from SVN.
  • The OCR toolkit for Gamera, as available from the "Addons" section of the Gamera website.

If you want to generate the documentation, you will need two additional third-party Python libraries:

  • docutils for handling reStructuredText documents.
  • pygments for colorizing source code.

Note

It is generally not necessary to generate the documentation because it is included in file releases of the toolkit.

Building and Installing

To build and install this toolkit, go to the base directory of the toolkit distribution and run the setup.py script as follows:

# 1) compile
python setup.py build

# 2) install
sudo python setup.py install

Command 1) compiles the toolkit from the sources and command 2) installs it. As the latter requires root privileges, you need to use sudo on Linux and MacOS X. On Windows, sudo is not necessary.

Note that the script greekocr4gamera.py is installed into /usr/bin or /usr/local/bin on Linux and newer versions of MacOS X, but into /System/Library/Frameworks/Python.framework/Versions/2.x/bin on older MacOS X versions. As the latter directory is not in the standard search path, you can either add it to your search path or additionally install the scripts into /usr/bin on MacOS X with:

# install scripts into standard path (older MacOS X, optional)
sudo python setup.py install_scripts -d /usr/bin

If you want to regenerate the documentation, go to the doc directory and run the gendoc.py script. The output will be placed in the doc/html/ directory. The contents of this directory can be placed on a webserver for convenient viewing.

Note

Before building the documentation you must install the toolkit. Otherwise gendoc.py will not find the plugin documentation.

Installing without root privileges

The above installation with python setup.py install installs the toolkit system-wide and thus requires root privileges. If you do not have root access (Linux) or are not a sudoer (MacOS X), you can install the GreekOCR toolkit into your home directory. Note, however, that this also requires Gamera to be installed into your home directory: it is currently not possible to install Gamera globally and only toolkits locally.

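Here are the steps to install both Gamera and a toolkit into ~/python (repeat the toolkit steps for the OCR toolkit and for the GreekOCR toolkit):

# install Gamera locally (run in the Gamera source directory)
mkdir ~/python
python setup.py install --prefix=$HOME/python

# build and install a toolkit locally (run in the toolkit's base directory)
# "~" is not expanded after "-I", hence $HOME; adjust python2.3 to your Python version
export CFLAGS=-I$HOME/python/include/python2.3/gamera
python setup.py build
python setup.py install --prefix=$HOME/python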

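Moreover, you should set the following environment variables in your ~/.profile:

# search path for python modules (--prefix installs into lib/python2.x/site-packages;
# adjust python2.3 to your Python version)
export PYTHONPATH=$HOME/python/lib/python2.3/site-packages

# search path for executables (e.g. greekocr4gamera.py)
export PATH=$HOME/python/bin:$PATH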

Uninstallation

The installation uses the Python distutils, which do not support uninstallation. Thus you need to remove the installed files manually:

  • the installed Python library files of the toolkit
  • the installed standalone scripts

Python Library Files

All Python library files of this toolkit are installed into the gamera/toolkits/greekocr subdirectory of the Python library folder. Thus, removing this directory is sufficient to uninstall them.

The location of the Python library folder depends on your system and Python version. Here are the folders that you need to remove on MacOS X and Debian Linux ("2.x" stands for the Python version; replace it with your actual version):

  • MacOS X: /Library/Python/2.x/site-packages/gamera/toolkits/greekocr
  • Debian Linux: /usr/lib/python2.x/site-packages/gamera/toolkits/greekocr
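If you are unsure where the toolkit was installed, Python can report the directory of an importable package. This is a minimal sketch. It assumes a Python 3 interpreter that can import Gamera (the instructions above target Python 2.x, where `importlib.util.find_spec` is not available), and the helper name `package_dir` is hypothetical.

```python
import importlib.util
import os

def package_dir(name):
    """Hypothetical helper: return the directory of an installed package,
    or None if it cannot be found or is a plain module."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # raised when a parent package (e.g. "gamera") is not installed
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.abspath(list(spec.submodule_search_locations)[0])

print(package_dir("gamera.toolkits.greekocr"))  # None if the toolkit is not importable
```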

Standalone Scripts

The standalone scripts are installed into /usr/bin or /usr/local/bin (Linux) or /System/Library/Frameworks/Python.framework/Versions/2.x/bin (older MacOS X), unless you have explicitly chosen a different location with the options --prefix or --home during installation.

To uninstall, remove the following script:

  • greekocr4gamera.py

About this documentation

The documentation was written by Christoph Dalitz. Permission is granted to copy, distribute and/or modify this documentation under the terms of the Creative Commons Attribution Share-Alike License (CC-BY-SA) v3.0. In addition, permission is granted to use and/or modify the code snippets from the documentation without restrictions.

greekocr-1.0.1/doc/html/usermanual.html0000644000175000017500000004340311535675613017445 0ustar dalitzdalitz GreekOCR Toolkit User's Manual

GreekOCR Toolkit User's Manual

Last modified: March 09, 2011

This documentation is for those who want to use the toolkit for polytonal Greek OCR, but are not interested in extending the toolkit itself.

Overview

The toolkit provides the functionality to segment an image page into text lines, words and characters, to sort them in reading-order, and to generate an output string.

Before you can use the GreekOCR toolkit, you must first train characters from sample pages, which will then be used by the toolkit for classifying characters:

images/overview.png

Hence the proper use of this toolkit requires the following two steps:

  • training of sample characters on representative document images. This step is interactive and is done with the Gamera GUI, as described in the Gamera training tutorial.
  • recognition of documents with the aid of this training data. This step usually runs automatically without user interaction. For this purpose, the tools from the present toolkit can be used.

There are two ways to use this toolkit: you can either use the script greekocr4gamera.py provided by the toolkit, or you can build your own recognition scripts with the aid of the Python library functions provided by the toolkit. Both alternatives are described below.

Training

As explained in the GreekOCR toolkit overview, you must create different training data, depending on the approach for dealing with accents:

  • for the wholistic approach, you must train all possible (or at least all frequent) combinations of characters and accents
  • for the separatistic approach, you must train characters and accents separately

The wholistic approach has the disadvantage that the training data will generally be incomplete, because rare combinations are unlikely to appear in the samples used for training. Moreover, it requires much more training effort. Depending on the documents under consideration, however, either approach may yield better results, so testing both can pay off.

A list of connected components (CCs) for training with the wholistic or separatistic approach can be extracted from a page image and opened in Gamera's interactive classifier with:

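from gamera.core import init_gamera, load_image
from gamera.toolkits.greekocr import GreekOCR
from gamera import knn
init_gamera()  # load Gamera's plugins
image = load_image("page.png")  # hypothetical file name of a sample page
classifier = knn.kNNInteractive()
g = GreekOCR("wholistic")  # or "separatistic"
ccs = g.get_page_glyphs(image)
classifier.display(ccs, image)  # opens the training dialog (requires the Gamera GUI)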

Note

When accents frequently touch the characters, you should train these combinations even for the separatistic approach, because the glyph segmentation is based on a connected component analysis, which cannot split touching symbols.

Symbol names for "separatistic" recognition

For "separatistic" recognition, the characters and accents must be trained separately. The class names for the characters must correspond to the names in the Unicode table Greek, and the names for the accents must correspond to the Unicode table Combining Diacritical Marks. The latter typically start with the word COMBINING. For punctuation marks like "full stop", the names from the Unicode table Basic Latin can be used.

The following table lists some examples. For touching characters or accents, you can combine their Unicode names with AND, as demonstrated in the table for the touching sigma and tau and for the touching comma and acute:

Character        Unicode Name(s)                                   Class Name
images/sep1.png  GREEK CAPITAL LETTER TAU                          greek.capital.letter.tau
images/sep2.png  GREEK SMALL LETTER DELTA                          greek.small.letter.delta
images/sep4.png  COMBINING GREEK PERISPOMENI                       combining.greek.perispomeni
images/sep5.png  COMBINING COMMA ABOVE                             combining.comma.above
images/sep7.png  HYPHEN-MINUS                                      hyphen-minus
images/sep3.png  GREEK SMALL LETTER SIGMA, GREEK SMALL LETTER TAU  greek.small.letter.sigma.and.greek.small.letter.tau
images/sep6.png  COMBINING COMMA ABOVE, COMBINING ACUTE ACCENT     combining.comma.above.and.combining.acute.accent
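The class names in the table above are Unicode names in lowercase, with dots instead of spaces and ".and." joining touching symbols. The corresponding text can therefore be recovered mechanically. The helper below is a hypothetical Python 3 sketch and not part of the toolkit's API. Splitting on ".and." is safe for the decomposed names used in training. It would break for precomposed Unicode names that themselves contain the word AND, such as GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA.

```python
import unicodedata

def class_name_to_text(class_name):
    """Hypothetical helper: convert a class name such as
    'combining.comma.above.and.combining.acute.accent' into its Unicode string."""
    parts = class_name.split(".and.")
    # each part is a Unicode character name, lowercased, with dots for spaces
    return "".join(unicodedata.lookup(p.replace(".", " ").upper()) for p in parts)

print([hex(ord(ch)) for ch in class_name_to_text(
    "greek.small.letter.sigma.and.greek.small.letter.tau")])
# ['0x3c3', '0x3c4']
```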

Symbol names for "wholistic" recognition

For "wholistic" recognition, no isolated accents are trained. Instead, each character is trained in all occurring combinations with accents. The Unicode names of the character and the accents are concatenated with the word and, as shown in the following examples:

Character Class Name
images/who1.png greek.small.letter.alpha
images/who2.png greek.small.letter.alpha.and.combining.acute.accent
images/who3.png greek.small.letter.alpha.and.combining.comma.above
images/who4.png greek.small.letter.alpha.and.combining.comma.above.and.combining.acute.accent
images/who5.png greek.small.letter.alpha.and.combining.greek.perispomeni

The order of the accents in the class names is not important, because the accent order will be normalized automatically during the recognition process.
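In the opposite direction, a class name can be mapped back to a Unicode string. The sketch below is hypothetical and does not reproduce GreekOCR's own code. In particular, the mark order in MARK_RANK is an assumption: breathings first, then diaeresis, then tonal accents, then iota subscript. The explicit sort is needed because Unicode NFC normalization alone does not reorder a breathing and an acute, since both have canonical combining class 230. Without the sort, two spellings of the same class would yield different strings.

```python
import unicodedata

# Assumed canonical order of combining marks (illustration only;
# GreekOCR's internal order may differ).
MARK_RANK = {
    "COMBINING COMMA ABOVE": 0,           # smooth breathing
    "COMBINING REVERSED COMMA ABOVE": 0,  # rough breathing
    "COMBINING DIAERESIS": 1,
    "COMBINING ACUTE ACCENT": 2,
    "COMBINING GRAVE ACCENT": 2,
    "COMBINING GREEK PERISPOMENI": 2,
    "COMBINING GREEK YPOGEGRAMMENI": 3,
}


def class_name_to_text(class_name):
    """Convert a class name such as
    'greek.small.letter.alpha.and.combining.acute.accent' to Unicode."""
    names = [part.replace(".", " ").upper() for part in class_name.split(".and.")]
    bases = [n for n in names if not n.startswith("COMBINING")]
    # Sort the marks into the assumed order; unknown marks go last.
    marks = sorted((n for n in names if n.startswith("COMBINING")),
                   key=lambda n: MARK_RANK.get(n, len(MARK_RANK)))
    text = u"".join(unicodedata.lookup(n) for n in bases + marks)
    # NFC composes letter + marks into a precomposed character where one exists.
    return unicodedata.normalize("NFC", text)


a = class_name_to_text("greek.small.letter.alpha.and.combining.comma.above.and.combining.acute.accent")
b = class_name_to_text("greek.small.letter.alpha.and.combining.acute.accent.and.combining.comma.above")
print(a == b == u"\u1f04")  # True (GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA)
```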

Using the script greekocr4gamera.py

The greekocr4gamera.py script takes an image and previously created training data, and segments the image into single glyphs. These glyphs are classified with the training data and converted into the output, which can be a Unicode string or a LaTeX document utilizing the Teubner style. The output is written to stdout or can optionally be stored in a file.

The end-user application greekocr4gamera.py is installed to /usr/bin or /usr/local/bin unless you have explicitly chosen a different location. Its synopsis is:

greekocr4gamera.py -x <trainingdata> [options] <imagefile>

Options can be in short (one dash, one character) or long form (two dashes, string). When called with -h, -? or any other invalid option, a usage message will be printed. The valid options are:

-x trainingdata, --xml-file=trainingdata
This option is required. trainingdata must be an xml file created with Gamera's training dialog.
-u outfile, --unicode=outfile
Writes the Unicode output to outfile. When neither -u nor -t is specified, the Unicode output is written to stdout.
-t outfile, --teubner=outfile
Writes the LaTeX output to outfile.
-s, --separatistic
Use the separatistic approach for recognition.
-w, --wholistic
Use the wholistic approach for recognition (default).
--deskew
Do a skew correction.
--filter
Filter out very large (images) and very small (noise) components.
--debug
Write images debug_lines.png, debug_words.png and debug_chars.png to working directory for debugging purposes.
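In the greekocr4gamera.py script as distributed, --filter compares the black area of every connected component with the median black area of all components. It blanks out any component more than ten times larger than that median, or smaller than a tenth of it. The plain-Python sketch below reproduces only this decision rule on a list of areas, so you can try it without Gamera. Two details are assumptions of this sketch and not part of the toolkit: the function name components_to_remove, and the averaging of the two middle values for even-length input.

```python
def components_to_remove(areas, factor=10.0):
    """Return the indices of components whose black area exceeds
    factor * median (likely images) or falls below median / factor
    (likely noise)."""
    if not areas:
        return []
    ordered = sorted(areas)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        median = float(ordered[mid])
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2.0
    return [i for i, area in enumerate(areas)
            if area > median * factor or area < median / factor]


# Five glyph-sized components, one tiny speck and one large illustration.
areas = [50, 48, 52, 2, 5000, 47, 51]
print(components_to_remove(areas))  # [3, 4]
```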

Writing custom scripts

If you want to write your own scripts for recognition, you can use greekocr4gamera.py as a good starting point.

The GreekOCR functionality is implemented in the class GreekOCR, which must be imported at the beginning of your script:

from gamera.toolkits.greekocr import GreekOCR

After that, you can instantiate a GreekOCR object and recognize an image with the following methods:

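from gamera.core import *   # provides init_gamera(), load_image() and ONEBIT
init_gamera()

g = GreekOCR()
g.mode = "wholistic"  # or "separatistic"
g.load_trainingdata("wholistic.xml")  # the training data must match g.mode
image = load_image("imagefile.png")
if image.data.pixel_type != ONEBIT:  # greekocr4gamera.py converts to onebit as well
    image = image.to_onebit()
output = g.process_image(image)
print output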

This will print the Unicode result to stdout. To save it to a file either in Unicode or LaTeX with the Teubner style, use the following methods:

g.save_text_unicode("unicode-output.txt")
g.save_text_teubner("teubner-output.tex")
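If you would rather store the string returned by process_image yourself, a minimal sketch could look like the following. The helper write_unicode is hypothetical and not part of GreekOCR. Normalizing to NFC, which yields precomposed characters where Unicode defines them, is a choice of this sketch and not necessarily what save_text_unicode does.

```python
import io
import unicodedata


def write_unicode(text, filename):
    """Write recognized text to filename as UTF-8, normalized to NFC
    (hypothetical helper, not part of GreekOCR)."""
    with io.open(filename, "w", encoding="utf-8") as f:
        f.write(unicodedata.normalize("NFC", text))


# Usage with the result of process_image (requires Gamera):
#   output = g.process_image(image)
#   write_unicode(output, "unicode-output.txt")
```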

For more information on how to fine-tune the recognition process, see the developer's documentation.

greekocr-1.0.1/doc/html/gamera.toolkits.greekocr.greekocr.GreekOCR.html0000644000175000017500000001543411564155126025371 0ustar dalitzdalitz class GreekOCR

class GreekOCR

Last modified: May 16, 2011

GreekOCR

In module gamera.toolkits.greekocr.greekocr

Provides the functionality for GreekOCR. The following parameters control the recognition process:

cknn
The kNNInteractive classifier.
mode
The mode for dealing with accents. Can be wholistic or separatistic.

__init__

Signature:

__init__ (mode="wholistic")

where mode can be "wholistic" or "separatistic".

load_trainingdata

Loads the training data. Signature:

load_trainingdata (trainfile)

where trainfile is a Gamera XML file containing training data. Make sure that the training data matches the mode (wholistic or separatistic).

get_page_glyphs

Returns a list of segmented CCs using the selected segmentation approach on the given image. This list can be used for creating training data. Signature:

get_page_glyphs (image)

where image is a Gamera image.

process_image

Recognizes the given image and returns the recognized text as Unicode string. Signature:

process_image (image)

where image is a Gamera image. The recognized text is additionally stored in the GreekOCR property output, which can subsequently be written to a file with save_text_unicode or save_text_teubner.

Make sure that you have called load_trainingdata before!

save_debug_images

Saves the following images to the current working directory:

debug_lines.png
Has a frame drawn around each detected line.
debug_chars.png
Has a frame drawn around each detected character.
debug_words.png
Has a frame drawn around each detected word.

save_text_unicode

Stores the recognized text as a Unicode string in the given file. Signature:

save_text_unicode(filename)

Make sure that you have called process_image before!

save_text_teubner

Stores the recognized text in the given file as a LaTeX document utilizing the Teubner style for representing Greek characters and accents. Signature:

save_text_teubner(filename)

Make sure that you have called process_image before!

greekocr-1.0.1/doc/src/0000755000175000017500000000000011635564234014217 5ustar dalitzdalitzgreekocr-1.0.1/doc/src/gamera.toolkits.greekocr.greekocr.GreekOCR.txt0000644000175000017500000000054611564155126025065 0ustar dalitzdalitzclass ``GreekOCR`` ================== ``GreekOCR`` ------------ In module ``gamera.toolkits.greekocr.greekocr`` .. docstring:: gamera.toolkits.greekocr.greekocr GreekOCR :no_title: .. docstring:: gamera.toolkits.greekocr.greekocr GreekOCR __init__ load_trainingdata get_page_glyphs process_image save_debug_images save_text_unicode save_text_teubner greekocr-1.0.1/doc/src/usermanual.txt0000700000175000017500000002562011535675604017135 0ustar dalitzdalitz============================== GreekOCR Toolkit User's Manual ============================== This documentation is for those who want to use the toolkit for polytonal Greek OCR, but are not interested in extending the toolkit itself. Overview '''''''' The toolkit provides the functionality to segment an image page into text lines, words and characters, to sort them in reading-order, and to generate an output string. Before you can use the OCR toolkit, you must first train characters from sample pages, which will then be used by the toolkit for classifying characters: .. image:: images/overview.png Hence the proper use of this toolkit requires the following two steps: - training of sample characters on representative document images. This step is interactive and is done with the Gamera GUI, as described in the `Gamera training tutorial`__ - recognition of documents with the aid of this training data. This step usually runs automatically without user interaction. For this purpose, the tools from the present toolkit can be used. .. 
__: http://gamera.sourceforge.net/doc/html/training_tutorial.html There are two options to use this toolkit: you can either use the script ``greekocr4gamera.py`` as provided by the toolkit, or you can build your own recognition scripts with the aid of the Python library functions provided by the toolkit. Both alternatives are described below. Training '''''''' As explained in the `GreekOCR toolkit overview`__, you must create different training data, depending on the approach for dealing with accents: - for the *wholistic* approach, you must train all possible (or frequent) combinations of characters and accents - for the *separatistic* approach, you must train characters and accents separately .. __: index.html#approaches-for-recognizing-accents The wholistic approach has the disadvantage that the training data will generally be incomplete because rare combinations are unlikely to appear in the samples used for training. Moreover, it requires much more training effort. Depending on the documents under consideration, however, either approach may yield better results; testing both approaches might therefore pay off. A list of CCs for training using the *wholistic* or *separatistic* algorithms on *image* can be created with: .. code:: Python from gamera.toolkits.greekocr import GreekOCR from gamera import knn classifier = knn.kNNInteractive() g = GreekOCR("wholistic") # or "separatistic" ccs = g.get_page_glyphs(image) classifier.display(ccs, image) .. note:: When accents frequently touch the characters, you should train these combinations even for the *separatistic* approach, because the glyph segmentation is based on a connected component analysis, which cannot split touching symbols. Symbol names for "separatistic" recognition ------------------------------------------- For "separatistic" recognition, the characters and accents must be trained separately.
The class names for the characters must correspond to the names in the Unicode table `Greek`_, and the names for the accents must correspond to the Unicode table `Combining Diacritical Marks`_. The latter typically start with the word ``COMBINING``. For punctuation marks like "full stop", the names from the Unicode table `Basic Latin`_ can be used. .. _Greek: http://unicode.org/charts/PDF/U0370.pdf .. _`Combining Diacritical Marks`: http://unicode.org/charts/PDF/U0300.pdf .. _`Basic Latin`: http://unicode.org/charts/PDF/U0000.pdf The following table lists some examples. For touching characters or accents, you can combine their Unicode names with ``AND``, as demonstrated in the following table for the touching *sigma* and *tau* and for the touching *comma* and *acute*: +----------------------------+-------------------------------+-------------------------------------------------------+ | Character | Unicode Name(s) | Class Name | +============================+===============================+=======================================================+ | .. image:: images/sep1.png | ``GREEK CAPITAL LETTER TAU`` | ``greek.capital.letter.tau`` | +----------------------------+-------------------------------+-------------------------------------------------------+ | .. image:: images/sep2.png | ``GREEK SMALL LETTER DELTA`` | ``greek.small.letter.delta`` | +----------------------------+-------------------------------+-------------------------------------------------------+ | .. image:: images/sep4.png |``COMBINING GREEK PERISPOMENI``| ``combining.greek.perispomeni`` | +----------------------------+-------------------------------+-------------------------------------------------------+ | .. image:: images/sep5.png | ``COMBINING COMMA ABOVE`` | ``combining.comma.above`` | +----------------------------+-------------------------------+-------------------------------------------------------+ | ..
image:: images/sep7.png | ``HYPHEN-MINUS`` | ``hyphen-minus`` | +----------------------------+-------------------------------+-------------------------------------------------------+ | .. image:: images/sep3.png | ``GREEK SMALL LETTER SIGMA``, |``greek.small.letter.sigma.and.greek.small.letter.tau``| | | ``GREEK SMALL LETTER TAU`` | | +----------------------------+-------------------------------+-------------------------------------------------------+ | .. image:: images/sep6.png | ``COMBINING COMMA ABOVE``, | ``combining.comma.above.and.combining.acute.accent`` | | | ``COMBINING ACUTE ACCENT`` | | +----------------------------+-------------------------------+-------------------------------------------------------+ Symbol names for "wholistic" recognition ---------------------------------------- For "wholistic" recognition, no isolated accents are trained. Instead, each character is trained in all occurring combinations with accents. The Unicode names of the character and the accents are concatenated with the word ``and``, as shown in the following examples: +----------------------------+---------------------------------------------------------------------------------------+ | Character | Class Name | +============================+=======================================================================================+ | .. image:: images/who1.png | ``greek.small.letter.alpha`` | +----------------------------+---------------------------------------------------------------------------------------+ | .. image:: images/who2.png | ``greek.small.letter.alpha.and.combining.acute.accent`` | +----------------------------+---------------------------------------------------------------------------------------+ | .. image:: images/who3.png | ``greek.small.letter.alpha.and.combining.comma.above`` | +----------------------------+---------------------------------------------------------------------------------------+ | ..
image:: images/who4.png | ``greek.small.letter.alpha.and.combining.comma.above.and.combining.acute.accent`` | +----------------------------+---------------------------------------------------------------------------------------+ | .. image:: images/who5.png | ``greek.small.letter.alpha.and.combining.greek.perispomeni`` | +----------------------------+---------------------------------------------------------------------------------------+ The order of the accents in the class names is not important, because the accent order will be normalized automatically during the recognition process. Using the script ``greekocr4gamera.py`` ''''''''''''''''''''''''''''''''''''''' The *greekocr4gamera.py* script takes an image and previously created training data, and segments the image into single glyphs. These glyphs are classified with the training data and converted into the output, which can be a Unicode string or a LaTeX document utilizing the `Teubner style`_. The output is written to standard output or can optionally be stored in a file. .. _`Teubner style`: http://www.ctan.org/tex-archive/macros/latex/contrib/teubner/ The end-user application *greekocr4gamera.py* is installed to ``/usr/bin`` or ``/usr/local/bin`` unless you have explicitly chosen a different location. Its synopsis is:: greekocr4gamera.py -x <trainingdata> [options] <imagefile> Options can be in short (one dash, one character) or long form (two dashes, string). When called with ``-h``, ``-?`` or any other invalid option, a usage message will be printed. The valid options are: ``-x`` *trainingdata*, ``--xml-file``\ =\ *trainingdata* This option is required. *trainingdata* must be an xml file created with `Gamera's training dialog`__. .. __: http://gamera.sourceforge.net/doc/html/training_tutorial.html ``-u`` *outfile*, ``--unicode``\ =\ *outfile* Writes the Unicode output to *outfile*. When neither ``-u`` nor ``-t`` is specified, the Unicode output is written to stdout.
``-t`` *outfile*, ``--teubner``\ =\ *outfile* Writes the LaTeX output to *outfile*. ``-s``, ``--separatistic`` Use the separatistic approach for recognition. ``-w``, ``--wholistic`` Use the wholistic approach for recognition (default). ``--deskew`` Do a skew correction. ``--filter`` Filter out very large (images) and very small (noise) components. ``--debug`` Write images *debug_lines.png*, *debug_words.png* and *debug_chars.png* to working directory for debugging purposes. Writing custom scripts '''''''''''''''''''''' If you want to write your own scripts for recognition, you can use ``greekocr4gamera.py`` as a good starting point. The GreekOCR functionality is implemented in the class GreekOCR__, which must be imported at the beginning of your script: .. __: gamera.toolkits.greekocr.greekocr.GreekOCR.html .. code:: Python from gamera.toolkits.greekocr import GreekOCR After that, you can instantiate a *GreekOCR* object and recognize an image with the following methods: .. code:: Python g = GreekOCR() g.mode = "wholistic" # or "separatistic" g.load_trainingdata("wholistic.xml") image = load_image("imagefile.png") output = g.process_image(image) print output This will print the Unicode result to stdout. To save it to a file either in Unicode or LaTeX with the Teubner style, use the following methods: .. code:: Python g.save_text_unicode("unicode-output.txt") g.save_text_teubner("teubner-output.tex") For more information on how to fine-tune the recognition process, see the `developer's documentation`__. .. __: index.html#developer-s-documentation greekocr-1.0.1/doc/src/html4css1.css0000700000175000017500000001301511535422107016533 0ustar dalitzdalitz/* :Author: David Goodger :Contact: goodger@users.sourceforge.net :Date: $Date: 2007/08/10 14:44:25 $ :Revision: $Revision: 1.1 $ :Copyright: This stylesheet has been placed in the public domain. Default cascading style sheet for the HTML output of Docutils.
See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to customize this style sheet. */ /* used to remove borders from tables and images */ .borderless, table.borderless td, table.borderless th { border: 0 } table.borderless td, table.borderless th { /* Override padding for "table.docutils td" with "! important". The right padding separates the table cells. */ padding: 0 0.5em 0 0 ! important } .first { /* Override more specific margin styles with "! important". */ margin-top: 0 ! important } .last, .with-subtitle { margin-bottom: 0 ! important } .hidden { display: none } a.toc-backref { text-decoration: none ; color: black } blockquote.epigraph { margin: 2em 5em ; } dl.docutils dd { margin-bottom: 0.5em } /* Uncomment (and remove this text!) to get bold-faced definition list terms dl.docutils dt { font-weight: bold } */ div.abstract { margin: 2em 5em } div.abstract p.topic-title { font-weight: bold ; text-align: center } div.admonition, div.attention, div.caution, div.danger, div.error, div.hint, div.important, div.note, div.tip, div.warning { margin: 2em ; border: medium outset ; padding: 1em } div.admonition p.admonition-title, div.hint p.admonition-title, div.important p.admonition-title, div.note p.admonition-title, div.tip p.admonition-title { font-weight: bold ; font-family: sans-serif } div.attention p.admonition-title, div.caution p.admonition-title, div.danger p.admonition-title, div.error p.admonition-title, div.warning p.admonition-title { color: red ; font-weight: bold ; font-family: sans-serif } /* Uncomment (and remove this text!) to get reduced vertical space in compound paragraphs. 
div.compound .compound-first, div.compound .compound-middle { margin-bottom: 0.5em } div.compound .compound-last, div.compound .compound-middle { margin-top: 0.5em } */ div.dedication { margin: 2em 5em ; text-align: center ; font-style: italic } div.dedication p.topic-title { font-weight: bold ; font-style: normal } div.figure { margin-left: 2em ; margin-right: 2em } div.footer, div.header { clear: both; font-size: smaller } div.line-block { display: block ; margin-top: 1em ; margin-bottom: 1em } div.line-block div.line-block { margin-top: 0 ; margin-bottom: 0 ; margin-left: 1.5em } div.sidebar { margin-left: 1em ; border: medium outset ; padding: 1em ; background-color: #ffffee ; width: 40% ; float: right ; clear: right } div.sidebar p.rubric { font-family: sans-serif ; font-size: medium } div.system-messages { margin: 5em } div.system-messages h1 { color: red } div.system-message { border: medium outset ; padding: 1em } div.system-message p.system-message-title { color: red ; font-weight: bold } div.topic { margin: 2em } h1.section-subtitle, h2.section-subtitle, h3.section-subtitle, h4.section-subtitle, h5.section-subtitle, h6.section-subtitle { margin-top: 0.4em } h1.title { text-align: center } h2.subtitle { text-align: center } hr.docutils { width: 75% } img.align-left { clear: left } img.align-right { clear: right } ol.simple, ul.simple { margin-bottom: 1em } ol.arabic { list-style: decimal } ol.loweralpha { list-style: lower-alpha } ol.upperalpha { list-style: upper-alpha } ol.lowerroman { list-style: lower-roman } ol.upperroman { list-style: upper-roman } p.attribution { text-align: right ; margin-left: 50% } p.caption { font-style: italic } p.credits { font-style: italic ; font-size: smaller } p.label { white-space: nowrap } p.rubric { font-weight: bold ; font-size: larger ; color: maroon ; text-align: center } p.sidebar-title { font-family: sans-serif ; font-weight: bold ; font-size: larger } p.sidebar-subtitle { font-family: sans-serif ; font-weight: 
bold } p.topic-title { font-weight: bold } pre.address { margin-bottom: 0 ; margin-top: 0 ; font-family: serif ; font-size: 100% } pre.literal-block, pre.doctest-block { margin-left: 2em ; margin-right: 2em ; background-color: #eeeeee } span.classifier { font-family: sans-serif ; font-style: oblique } span.classifier-delimiter { font-family: sans-serif ; font-weight: bold } span.interpreted { font-family: sans-serif } span.option { white-space: nowrap } span.pre { white-space: pre } span.problematic { color: red } span.section-subtitle { /* font-size relative to parent (h1..h6 element) */ font-size: 80% } table.citation { border-left: solid 1px gray; margin-left: 1px } table.docinfo { margin: 2em 4em } table.docutils { margin-top: 0.5em ; margin-bottom: 0.5em; background-color: #f7fffd; border-color: #72ada8; border: solid thin #aaaaaa; } table.footnote { border-left: solid 1px black; margin-left: 1px } table.docutils td, table.docutils th, table.docinfo td, table.docinfo th { padding-left: 0.5em ; padding-right: 0.5em ; vertical-align: top } table.docutils th.field-name, table.docinfo th.docinfo-name { font-weight: bold ; text-align: left ; white-space: nowrap ; } td.field-body, th.field-name { padding: 0.5em; border: solid thin #aaaaaa; } h1 tt.docutils, h2 tt.docutils, h3 tt.docutils, h4 tt.docutils, h5 tt.docutils, h6 tt.docutils { font-size: 100% } tt.docutils { } ul.auto-toc { list-style-type: none } greekocr-1.0.1/doc/src/classes.txt0000644000175000017500000000026211564155126016412 0ustar dalitzdalitz======= Classes ======= Alphabetical ------------- **G** GreekOCR_ (gamera.toolkits.greekocr.greekocr.GreekOCR) .. 
_GreekOCR: gamera.toolkits.greekocr.greekocr.GreekOCR.html
, 2011 Please contact Christoph Dalitz for questions about this toolkit. License ------- This toolkit is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, either version 2 of the license, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the file LICENSE for more details. greekocr-1.0.1/version0000644000175000017500000000000611635564152014266 0ustar dalitzdalitz1.0.1 greekocr-1.0.1/scripts/0000755000175000017500000000000011635564234014352 5ustar dalitzdalitzgreekocr-1.0.1/scripts/greekocr4gamera.py0000755000175000017500000001152611535675422017777 0ustar dalitzdalitz
#!/usr/bin/env python
# -*- mode: python; indent-tabs-mode: nil; tab-width: 3 -*-
# vim: set tabstop=3 shiftwidth=3 expandtab:

# Copyright (C) 2010-2011 Christian Brandt
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

# This script simply runs the GreekOCR toolkit's recognition on one image.

import sys

def usage():
   usage = "Usage:\n"
   usage += "  greekocr4gamera.py -x <trainingdata> [options] <imagefile>\n"
   usage += "\n"
   usage += "  Options:\n"
   usage += "    --wholistic     wholistic segmentation mode (default)\n"
   usage += "    -w              short for --wholistic\n"
   usage += "    --separatistic  separatistic segmentation mode\n"
   usage += "    -s              short for --separatistic\n"
   usage += "\n"
   usage += "    --unicode       specify filename for unicode output\n"
   usage += "    -u              short for --unicode\n"
   usage += "    --teubner       specify filename for teubner TeX output\n"
   usage += "    -t              short for --teubner\n"
   usage += "\n"
   usage += "    --deskew        do a skew correction (recommended)\n"
   usage += "    --filter        filter out very large (images) and very\n"
   usage += "                    small components (noise)\n"
   usage += "\n"
   usage += "    --debug         save debug-images\n"
   usage += "                    debug_lines.png debug_words.png debug_chars.png\n"
   usage += "    -d              short for --debug\n"
   sys.stderr.write(usage)

options = {}
args = sys.argv[1:]
i = 0
while i < len(args):
   if args[i] in ("-x", "--trainingdata"):
      i += 1
      options["trainingdata"] = args[i]
   elif args[i] in ("--help", "-h", "-?"):
      usage()
      sys.exit(0)
   elif args[i] in ("--wholistic", "-w"):
      options["mode"] = "wholistic"
   elif args[i] in ("--separatistic", "-s"):
      options["mode"] = "separatistic"
   elif args[i] in ("-u", "--unicode"):
      i += 1
      options["unicodeoutfile"] = args[i]
   elif args[i] in ("-t", "--teubner"):
      i += 1
      options["teubneroutfile"] = args[i]
   elif args[i] in ("-d", "--debug"):
      options["debug"] = True
   elif args[i] == "--deskew":
      options["deskew"] = True
   elif args[i] == "--filter":
      options["filter"] = True
   elif args[i].startswith("-"):
      # any other option is invalid: print the usage message
      sys.stderr.write("Unknown option: %s\n" % args[i])
      usage()
      sys.exit(1)
   else:
      options["imagefile"] = args[i]
   i += 1

if "trainingdata" not in options:
   print "No Trainingdata given"
   usage()
   sys.exit(1)
if "mode" not in options:
   options["mode"] = "wholistic"
if "imagefile" not in options:
   print "No filename given"
   usage()
   sys.exit(2)

from gamera.core import *
from gamera.plugins.listutilities import median
from gamera.toolkits.greekocr import GreekOCR

debug = options.get("debug", False)

g = GreekOCR()
g.mode = options["mode"]
g.load_trainingdata(options["trainingdata"])
image = load_image(options["imagefile"])
if image.data.pixel_type != ONEBIT:
   image = image.to_onebit()

if options.get("filter", False):
   count = 0
   ccs = image.cc_analysis()
   if debug:
      print "filter started on", len(ccs), "elements..."
   median_black_area = median([cc.black_area()[0] for cc in ccs])
   for cc in ccs:
      area = cc.black_area()[0]
      # blank out very large (images) and very small (noise) components
      if area > median_black_area * 10 or area < median_black_area / 10.0:
         cc.fill_white()
         count += 1
   if debug:
      print "filter done.", len(ccs) - count, "elements left."

if options.get("deskew", False):
   if debug:
      print "\ntry to skew correct..."
   rotation = image.rotation_angle_projections(-10, 10)[0]
   # continue with the rotated (deskewed) image
   image = image.rotate(rotation, 0)
   if debug:
      print "rotated with", rotation, "angle"

output = g.process_image(image)

if debug:
   g.save_debug_images()
if "unicodeoutfile" in options:
   g.save_text_unicode(options["unicodeoutfile"])
if "teubneroutfile" in options:
   g.save_text_teubner(options["teubneroutfile"])
if "unicodeoutfile" not in options and "teubneroutfile" not in options:
   print output