unicode-0.9.7/0000775000000000000000000000000012054366765010063 5ustar unicode-0.9.7/README0000644000000000000000000000440412054171456010732 0ustar This file is in UTF-8 encoding.

To use the unicode utility, you need:
- python >=2.2 (generators are needed), preferably a wide unicode build,
- the python optparse library (part of python2.3),
- the UnicodeData.txt file (http://www.unicode.org/Public), which you should put into /usr/share/unicode/, ~/.unicode/ or the current working directory,
- if you want to see UniHan properties, you also need the Unihan.txt file, which should be put into /usr/share/unicode/, ~/.unicode/ or the current working directory.

Enter a regular expression, a hexadecimal number or some characters as an argument. unicode will try to guess what you want to look up; see the manpage if you want to force other behaviour (the manpage is also the best documentation). In particular, -r forces searching for a regular expression in the names of characters, and -s forces unicode to display information about the given characters.

Here are some examples of how to use this script:

$ unicode.py euro
U+20A0 EURO-CURRENCY SIGN
UTF-8: e2 82 a0  UTF-16BE: 20a0  Decimal: &#8352;
₠
Category: Sc (Symbol, Currency)
Bidi: ET (European Number Terminator)

U+20AC EURO SIGN
UTF-8: e2 82 ac  UTF-16BE: 20ac  Decimal: &#8364;
€
Category: Sc (Symbol, Currency)
Bidi: ET (European Number Terminator)

$ unicode.py 00c0
U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
UTF-8: c3 80  UTF-16BE: 00c0  Decimal: &#192;
À (à)
Lowercase: U+00E0
Category: Lu (Letter, Uppercase)
Bidi: L (Left-to-Right)
Decomposition: 0041 0300

You can specify a range of characters as arguments; unicode will show these characters in a nice tabular format, aligned to 256-codepoint boundaries. Use two dots ".." to indicate the range, e.g.

unicode 0450..0520

will display the whole cyrillic, armenian and hebrew blocks (characters from U+0400 to U+05FF), and

unicode 0400..

will display just the characters from U+0400 up to U+04FF.

Use --fromcp to query codepoints from other encodings:

$ unicode --fromcp cp1250 -d 200
U+010C LATIN CAPITAL LETTER C WITH CARON
UTF-8: c4 8c  UTF-16BE: 010c  Decimal: &#268;
Č (Č)
Uppercase: U+010C
Category: Lu (Letter, Uppercase)
Bidi: L (Left-to-Right)
Decomposition: 0043 030C

Multibyte encodings are supported:

$ unicode --fromcp big5 -x aff3

and multi-char strings are supported, too:

$ unicode --fromcp utf-8 -x c599c3adc5a5
unicode-0.9.7/unicode.10000644000000000000000000000727512054171456011573 0ustar .\" Hey, EMACS: -*- nroff -*-
.TH UNICODE 1 "2003-01-31"
.SH NAME
unicode \- command line unicode database query tool
.SH SYNOPSIS
.B unicode
.RI [ options ] string
.SH DESCRIPTION
This manual page documents the
.B unicode
command.
.PP
\fBunicode\fP is a command line unicode database query tool.
.SH OPTIONS
.TP
.BI \-h
.BI \-\-help
Show help and exit.
.TP
.BI \-x
.BI \-\-hexadecimal
Assume
.I string
to be a hexadecimal number.
.TP
.BI \-d
.BI \-\-decimal
Assume
.I string
to be a decimal number.
.TP
.BI \-o
.BI \-\-octal
Assume
.I string
to be an octal number.
.TP
.BI \-b
.BI \-\-binary
Assume
.I string
to be a binary number.
.TP
.BI \-r
.BI \-\-regexp
Assume
.I string
to be a regular expression.
.TP
.BI \-s
.BI \-\-string
Assume
.I string
to be a sequence of characters.
.TP
.BI \-a
.BI \-\-auto
Try to guess the type of
.I string
from one of the above (default).
.TP
.BI \-mMAXCOUNT
.BI \-\-max=MAXCOUNT
Maximal number of codepoints to display, default: 10; use 0 for unlimited.
.TP
.BI \-iCHARSET
.BI \-\-io=IOCHARSET
I/O character set.
For maximal pleasure, run \fBunicode\fP on a UTF-8 capable terminal and specify IOCHARSET to be UTF-8. \fBunicode\fP tries to guess this value from your locale, so with a properly set up locale you should not need to specify it.
.TP
.BI \-\-fcp=CHARSET
.BI \-\-fromcp=CHARSET
Convert numerical arguments from this encoding, default: no conversion. Multibyte encodings are supported. This is ignored for non-numerical arguments.
.TP
.BI \-cADDCHARSET
.BI \-\-charset\-add=ADDCHARSET
Show the hexadecimal representation of displayed characters in this additional charset.
.TP
.BI \-CUSE_COLOUR
.BI \-\-colour=USE_COLOUR
USE_COLOUR is one of
.I on
.I off
.I auto
.B \-\-colour=on
will use ANSI colour codes to colourise the output,
.B \-\-colour=off
won't use colours, and
.B \-\-colour=auto
will test if standard output is a tty and use colours only when it is.
.BI \-\-color
is a synonym of
.BI \-\-colour
.TP
.BI \-v
.BI \-\-verbose
Be more verbose about displayed characters, e.g. display Unihan information, if available.
.TP
.BI \-w
.BI \-\-wikipedia
Spawn a browser pointing to the Wikipedia entry about the character.
.TP
.BI \-\-list
List (approximately) all known encodings.
.SH USAGE
\fBunicode\fP tries to guess the type of an argument. In particular, if the argument looks like a valid hexadecimal representation of a Unicode codepoint, it will be considered to be such. Using
\fBunicode\fP face
will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE, and it will not search for 'face' in character descriptions \- for the latter, use:
\fBunicode\fP -r face
For example, you can use any of the following to display information about U+00E1 LATIN SMALL LETTER A WITH ACUTE (\('a):
\fBunicode\fP 00E1
\fBunicode\fP U+00E1
\fBunicode\fP \('a
\fBunicode\fP 'latin small letter a with acute'
You can specify a range of characters as arguments; \fBunicode\fP will show these characters in a nice tabular format, aligned to 256-codepoint boundaries. Use two dots ".." to indicate the range, e.g.
\fBunicode\fP 0450..0520
will display the whole cyrillic, armenian and hebrew blocks (characters from U+0400 to U+05FF), and
\fBunicode\fP 0400..
will display just the characters from U+0400 up to U+04FF.
Use --fromcp to query codepoints from other encodings:
\fBunicode\fP --fromcp cp1250 -d 200
Multibyte encodings are supported:
\fBunicode\fP --fromcp big5 -x aff3
and multi-char strings are supported, too:
\fBunicode\fP --fromcp utf-8 -x c599c3adc5a5
.SH BUGS
The tabular format does not deal well with full-width, combining, control and RTL characters.
.SH SEE ALSO
ascii(1)
.SH AUTHOR
Radovan Garab\('ik
unicode-0.9.7/changelog0000777000000000000000000000000012010413116014773 2debian/changelogustar unicode-0.9.7/README-paracode0000644000000000000000000000227112010413116012476 0ustar Written by Radovan Garabík.
For new versions, look at http://kassiopeia.juls.savba.sk/~garabik/software/unicode/

-------------------

paracode exploits the full power of the Unicode standard to convert text into a visually similar stream of glyphs while using completely different codepoints. It is an excellent didactic tool demonstrating the principles and advanced use of the Unicode standard.

paracode is a command line tool working as a filter, reading standard input in UTF-8 encoding and writing to standard output.

Use the optional -t switch to select which tables to use. The special name 'all' selects all the tables. Note that selecting the 'other', 'cyrillic_plus' and 'cherokee' tables (and 'all') makes use of rather esoteric characters, and not all fonts contain them.
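The conversion itself is a plain per-codepoint substitution through a lookup table, wrapped in Unicode normalization. A minimal sketch of the idea in Python (illustrative only - the names 'table' and 'convert' are made up here; the real tables and the complete implementation are in the paracode script included further below):

    import unicodedata

    # a tiny stand-in table; paracode ships much larger ones
    table = {
        'A': u'\N{GREEK CAPITAL LETTER ALPHA}',
        'e': u'\N{CYRILLIC SMALL LETTER IE}',
    }

    def convert(s):
        # decompose, substitute codepoint by codepoint, then recompose
        s = unicodedata.normalize('NFKD', s)
        s = u''.join(table.get(c, c) for c in s)
        return unicodedata.normalize('NFKC', s)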
The special table 'mirror' uses a quite different character substitution, is not selected automatically with 'all', and does not work well with anything except plain ASCII alphabetical characters.

Example:

paracode -t cyrillic+greek+cherokee
paracode -t cherokee <input >output
paracode -r -t mirror <input >output

Possible tables are: cyrillic cyrillic_plus greek other cherokee all
unicode-0.9.7/paracode0000755000000000000000000001473412010413116011545 0ustar #!/usr/bin/python import sys, unicodedata from optparse import OptionParser table_cyrillic = { 'A' : u'\N{CYRILLIC CAPITAL LETTER A}', 'B' : u'\N{CYRILLIC CAPITAL LETTER VE}', 'C' : u'\N{CYRILLIC CAPITAL LETTER ES}', 'E' : u'\N{CYRILLIC CAPITAL LETTER IE}', 'H' : u'\N{CYRILLIC CAPITAL LETTER EN}', 'I' : u'\N{CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I}', 'J' : u'\N{CYRILLIC CAPITAL LETTER JE}', 'K' : u'\N{CYRILLIC CAPITAL LETTER KA}', 'M' : u'\N{CYRILLIC CAPITAL LETTER EM}', 'O' : u'\N{CYRILLIC CAPITAL LETTER O}', 'P' : u'\N{CYRILLIC CAPITAL LETTER ER}', 'S' : u'\N{CYRILLIC CAPITAL LETTER DZE}', 'T' : u'\N{CYRILLIC CAPITAL LETTER TE}', 'X' : u'\N{CYRILLIC CAPITAL LETTER HA}', 'Y' : u'\N{CYRILLIC CAPITAL LETTER U}', 'a' : u'\N{CYRILLIC SMALL LETTER A}', 'c' : u'\N{CYRILLIC SMALL LETTER ES}', 'e' : u'\N{CYRILLIC SMALL LETTER IE}', 'i' : u'\N{CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I}', 'j' : u'\N{CYRILLIC SMALL LETTER JE}', 'o' : u'\N{CYRILLIC SMALL LETTER O}', 'p' : u'\N{CYRILLIC SMALL LETTER ER}', 's' : u'\N{CYRILLIC SMALL LETTER DZE}', 'x' : u'\N{CYRILLIC SMALL LETTER HA}', 'y' : u'\N{CYRILLIC SMALL LETTER U}', } table_cyrillic_plus = { 'Y' : u'\N{CYRILLIC CAPITAL LETTER STRAIGHT U}', 'h' : u'\N{CYRILLIC SMALL LETTER SHHA}', } table_greek = { 'A' : u'\N{GREEK CAPITAL LETTER ALPHA}', 'B' : u'\N{GREEK CAPITAL LETTER BETA}', 'E' : u'\N{GREEK CAPITAL LETTER EPSILON}', 'H' : u'\N{GREEK CAPITAL LETTER ETA}', 'I' : u'\N{GREEK CAPITAL LETTER IOTA}', 'K' : u'\N{GREEK CAPITAL LETTER KAPPA}', 'M' : u'\N{GREEK CAPITAL LETTER MU}', 'N' : u'\N{GREEK CAPITAL LETTER NU}', 'O' : u'\N{GREEK CAPITAL LETTER OMICRON}', 'P' : u'\N{GREEK CAPITAL LETTER RHO}', 'T' : u'\N{GREEK CAPITAL LETTER TAU}', 'X' : u'\N{GREEK CAPITAL LETTER CHI}', 'Y' : u'\N{GREEK CAPITAL LETTER UPSILON}', 'Z' : u'\N{GREEK CAPITAL LETTER ZETA}', 'o' : u'\N{GREEK SMALL LETTER OMICRON}', } table_other = { '!'
: u'\N{LATIN LETTER RETROFLEX CLICK}', 'O' : u'\N{ARMENIAN CAPITAL LETTER OH}', 'S' : u'\N{ARMENIAN CAPITAL LETTER TIWN}', 'o' : u'\N{ARMENIAN SMALL LETTER OH}', 'n' : u'\N{ARMENIAN SMALL LETTER VO}', } table_cherokee = { 'A' : u'\N{CHEROKEE LETTER GO}', 'B' : u'\N{CHEROKEE LETTER YV}', 'C' : u'\N{CHEROKEE LETTER TLI}', 'D' : u'\N{CHEROKEE LETTER A}', 'E' : u'\N{CHEROKEE LETTER GV}', 'G' : u'\N{CHEROKEE LETTER NAH}', 'H' : u'\N{CHEROKEE LETTER MI}', 'J' : u'\N{CHEROKEE LETTER GU}', 'K' : u'\N{CHEROKEE LETTER TSO}', 'L' : u'\N{CHEROKEE LETTER TLE}', 'M' : u'\N{CHEROKEE LETTER LU}', 'P' : u'\N{CHEROKEE LETTER TLV}', 'R' : u'\N{CHEROKEE LETTER SV}', 'S' : u'\N{CHEROKEE LETTER DU}', 'T' : u'\N{CHEROKEE LETTER I}', 'V' : u'\N{CHEROKEE LETTER DO}', 'W' : u'\N{CHEROKEE LETTER LA}', 'Y' : u'\N{CHEROKEE LETTER GI}', 'Z' : u'\N{CHEROKEE LETTER NO}', } table_mirror = { 'A' : u'\N{FOR ALL}', 'B' : u'\N{CANADIAN SYLLABICS CARRIER KHA}', 'C' : u'\N{LATIN CAPITAL LETTER OPEN O}', 'D' : u'\N{CANADIAN SYLLABICS CARRIER PA}', 'E' : u'\N{LATIN CAPITAL LETTER REVERSED E}', 'F' : u'\N{TURNED CAPITAL F}', 'G' : u'\N{TURNED SANS-SERIF CAPITAL G}', 'H' : u'H', 'I' : u'I', 'J' : u'\N{LATIN SMALL LETTER LONG S}', 'K' : u'\N{LATIN SMALL LETTER TURNED K}', # fixme 'L' : u'\N{TURNED SANS-SERIF CAPITAL L}', 'M' : u'W', 'N' : u'N', 'O' : u'O', 'P' : u'\N{CYRILLIC CAPITAL LETTER KOMI DE}', 'R' : u'\N{CANADIAN SYLLABICS TLHO}', 'S' : u'S', 'T' : u'\N{UP TACK}', 'U' : u'\N{ARMENIAN CAPITAL LETTER VO}', 'V' : u'\N{N-ARY LOGICAL AND}', 'W' : u'M', 'X' : u'X', 'Y' : u'\N{TURNED SANS-SERIF CAPITAL Y}', 'Z' : u'Z', 'a' : u'\N{LATIN SMALL LETTER TURNED A}', 'b' : u'q', 'c' : u'\N{LATIN SMALL LETTER OPEN O}', 'd' : u'p', 'e' : u'\N{LATIN SMALL LETTER SCHWA}', 'f' : u'\N{LATIN SMALL LETTER DOTLESS J WITH STROKE}', 'g' : u'\N{LATIN SMALL LETTER B WITH HOOK}', 'h' : u'\N{LATIN SMALL LETTER TURNED H}', 'i' : u'\N{LATIN SMALL LETTER DOTLESS I}' + u'\N{COMBINING DOT BELOW}', 'j' : u'\N{LATIN SMALL LETTER LONG S}' + u'\N{COMBINING DOT BELOW}', 'k' : u'\N{LATIN SMALL LETTER TURNED K}', 'l' : u'l', 'm' : u'\N{LATIN SMALL LETTER TURNED M}', 'n' : u'u', 'o' : u'o', 'p' : u'd', 'q' : u'b', 'r' : u'\N{LATIN SMALL LETTER TURNED R}', 's' : u's', 't' : u'\N{LATIN SMALL LETTER TURNED T}', 'u' : u'n', 'v' : u'\N{LATIN SMALL LETTER TURNED V}', 'w' : u'\N{LATIN SMALL LETTER TURNED W}', 'x' : u'x', 'y' : u'\N{LATIN SMALL LETTER TURNED Y}', 'z' : u'z', '0' : '0', '1' : u'I', '2' : u'\N{INVERTED QUESTION MARK}\N{COMBINING MACRON}', '3' : u'\N{LATIN CAPITAL LETTER OPEN E}', '4' : u'\N{LATIN SMALL LETTER LZ DIGRAPH}', '6' : '9', '7' : u'\N{LATIN CAPITAL LETTER L WITH STROKE}', '8' : '8', '9' : '6', ',' : "'", "'" : ',', '.' : u'\N{DOT ABOVE}', '?' : u'\N{INVERTED QUESTION MARK}', '!' : u'\N{INVERTED EXCLAMATION MARK}', } tables_names = ['cyrillic', 'cyrillic_plus', 'greek', 'other', 'cherokee'] table_default = table_cyrillic table_default.update(table_greek) table_all={} for t in tables_names: table_all.update(globals()['table_'+t]) parser = OptionParser(usage="usage: %prog [options]") parser.add_option("-t", "--tables", action="store", default='default', dest="tables", type="string", help="""list of tables to use, separated by a plus sign. Possible tables are: """+'+'.join(tables_names)+""" and a special name 'all' to specify all these tables joined together. There is another table, 'mirror', that is not selected in 'all'.""") parser.add_option("-r", "--reverse", action="count", dest="reverse", default=0, help="Reverse the text after conversion. 
Best used with the 'mirror' table.") (options, args) = parser.parse_args() if args: to_convert = ' '.join(args).decode('utf-8') else: to_convert = None tables = options.tables.split('+') tables = ['table_'+x for x in tables] tables = [globals()[x] for x in tables] table = {} for t in tables: table.update(t) def reverse_string(s): l = list(s) l.reverse() r = ''.join(l) return r def do_convert(s, reverse=0): if reverse: s = reverse_string(s) l = unicodedata.normalize('NFKD', s) out = [] for c in l: out.append(table.get(c, c)) out = ''.join(out) out = unicodedata.normalize('NFKC', out) return out if not to_convert: if options.reverse: lines = sys.stdin.readlines() lines.reverse() else: lines = sys.stdin for line in lines: l = line.decode('utf-8') out = do_convert(l, options.reverse) sys.stdout.write(out.encode('utf-8')) else: out = do_convert(to_convert, options.reverse) sys.stdout.write(out.encode('utf-8')) sys.stdout.write('\n') unicode-0.9.7/debian/0000755000000000000000000000000012054366765011303 5ustar unicode-0.9.7/debian/dirs0000644000000000000000000000001012010413116012125 0ustar usr/bin unicode-0.9.7/debian/docs0000644000000000000000000000003012010413116012116 0ustar README README-paracode unicode-0.9.7/debian/control0000644000000000000000000000070612054152166012675 0ustar Source: unicode Section: utils Priority: optional Maintainer: Radovan Garabík Build-Depends: debhelper (>= 4) Standards-Version: 3.8.0 Package: unicode Architecture: all Depends: python (>= 2.3) Recommends: unicode-data Description: display unicode character properties unicode is a simple command line utility that displays properties for a given unicode character, or searches unicode database for a given name. unicode-0.9.7/debian/rules0000755000000000000000000000355612010413116012343 0ustar #!/usr/bin/make -f # Sample debian/rules that uses debhelper. # GNU copyright 1997 to 1999 by Joey Hess. # Uncomment this to turn on verbose mode. #export DH_VERBOSE=1 # This is the debhelper compatibility version to use. #export DH_COMPAT=4 CFLAGS = -Wall -g ifneq (,$(findstring noopt,$(DEB_BUILD_OPTIONS))) CFLAGS += -O0 else CFLAGS += -O2 endif ifeq (,$(findstring nostrip,$(DEB_BUILD_OPTIONS))) INSTALL_PROGRAM += -s endif configure: configure-stamp configure-stamp: dh_testdir # Add here commands to configure the package. touch configure-stamp build: build-stamp build-stamp: configure-stamp dh_testdir # Add here commands to compile the package. #$(MAKE) #/usr/bin/docbook-to-man debian/unicode.sgml > unicode.1 touch build-stamp clean: dh_testdir dh_testroot rm -f build-stamp configure-stamp # Add here commands to clean up after the build process. #-$(MAKE) clean dh_clean install: build dh_testdir dh_testroot dh_clean -k dh_installdirs # Add here commands to install the package into debian/unicode. #$(MAKE) install DESTDIR=$(CURDIR)/debian/unicode cp unicode paracode $(CURDIR)/debian/unicode/usr/bin # Build architecture-dependent files here. #binary-arch: build install # We have nothing to do by default. # Build architecture-independent files here. 
binary-indep: build install dh_testdir dh_testroot # dh_installdebconf dh_installdocs # dh_installexamples # dh_installmenu # dh_installlogrotate # dh_installemacsen # dh_installpam # dh_installmime # dh_installinit # dh_installcron dh_installman unicode.1 paracode.1 # dh_installinfo # dh_undocumented dh_installchangelogs # dh_link dh_strip dh_compress dh_fixperms # dh_makeshlibs dh_installdeb # dh_perl # dh_shlibdeps # dh_python dh_gencontrol dh_md5sums dh_builddeb binary: binary-indep binary-arch .PHONY: build clean binary-indep binary-arch binary install configure
unicode-0.9.7/debian/changelog0000644000000000000000000001361212054171456013147 0ustar unicode (0.9.7) unstable; urgency=low

  * add option to recognise binary input numerical codes
  * do not suggest console-data
  * change Suggest to Recommend for unicode-data (closes: #683852), both this and the above suggested by Tollef Fog Heen
  * do not throw an exception when run under an undefined locale
  * on error, exit with a nonzero exit status
  * preliminary python3 support
  * mention -s and -r in the README (closes: #664277)
  * other minor tweaks and improvements

 -- Radovan Garabík  Sat, 24 Nov 2012 11:18:06 +0200

unicode (0.9.6) unstable; urgency=low

  * add option to recognise octal input numerical codes
  * add option to convert input numerical codes from an arbitrary charset
  * don't suggest perl-modules anymore (closes: #651479), thanks to mike castleman
  * clarify searching for hexadecimal codepoints in the manpage (closes: #643284)
  * better error messages if the codepoint exceeds sys.maxunicode

 -- Radovan Garabík  Sun, 29 Jul 2012 13:46:18 +0200

unicode (0.9.5) unstable; urgency=low

  * do not raise an exception on empty string argument (closes: #601503), thanks to Etienne Millon for reporting the bug

 -- Radovan Garabík  Sun, 21 Nov 2010 14:50:29 +0100

unicode (0.9.4) unstable; urgency=low

  * recognise split unihan files (closes: #551789)

 -- Radovan Garabík  Sun, 07 Feb 2010 18:36:29 +0100

unicode (0.9.3) unstable; urgency=low

  * run pylint & pychecker – fix some previously unnoticed bugs

 -- Radovan Garabík  Mon, 04 May 2009 22:40:51 +0200

unicode (0.9.2) unstable; urgency=low

  * giving "latin alpha" as an argument will now search for all the character names containing the "latin.*alpha" regular expression, not _either_ "latin" or "alpha" strings (closes: #439146), idea from martin f. krafft.
  * added forgotten README-paracode to the docfiles

 -- Radovan Garabík  Thu, 30 Oct 2008 18:58:48 +0100

unicode (0.9.1) unstable; urgency=low

  * add package URL to debian/copyright and debian/README.Debian (closes: #495555)

 -- Radovan Garabík  Sat, 23 Aug 2008 10:28:02 +0200

unicode (0.9) unstable; urgency=low

  * include paracode utility
  * clarify GPL version (v3)

 -- Radovan Garabík  Wed, 19 Sep 2007 19:01:55 +0100

unicode (0.8) unstable; urgency=low

  * fix traceback when letter has no uppercase or lowercase forms

 -- Radovan Garabík  Sun, 1 Oct 2006 21:42:33 +0200

unicode (0.7) unstable; urgency=low

  * updated to use unicode-data (closes: #386853)
  * data files can be bzip2'ed now
  * use data from unicode data files, not from the python unicodedata module (the latter tends to be obsolete)

 -- Radovan Garabík  Sat, 16 Sep 2006 21:44:34 +0200

unicode (0.6) unstable; urgency=low

  * fix stupid undeclared options bug (thanks to Tim Hatch)
  * remove absolute path from z?grep, rely on the OS's default PATH to execute the command(s)
  * add default path to UnicodeData.txt for MacOSX systems

 -- Radovan Garabík  Wed, 4 Jan 2006 19:57:54 +0100

unicode (0.5) unstable; urgency=low

  * work around browser invocations that cannot handle UTF-8 in URLs

 -- Radovan Garabík  Sun, 1 Jan 2006 00:59:60 +0100

unicode (0.4.9) unstable; urgency=low

  * better directional overriding for RTL characters
  * query wikipedia with the -w switch
  * better heuristics guessing argument type

 -- Radovan Garabík  Sun, 11 Sep 2005 18:30:59 +0200

unicode (0.4.8) unstable; urgency=low

  * catch an exception if locale.nl_langinfo is not present (thanks to Michael Weir)
  * default to no colour if the system is MS Windows
  * put back the accidentally disabled left-to-right mark - as a result, tabular display of arabic, hebrew and other RTL scripts is much better (the bug manifested itself only on powerful i18n terminals, such as mlterm)

 -- Radovan Garabík  Fri, 26 Aug 2005 14:25:58 +0200

unicode (0.4.7) unstable; urgency=low

  * some UniHan support (closes: #187214)
  * --color added as a synonym for --colour (closes: #273503)

 -- Radovan Garabík  Thu, 4 Aug 2005 16:36:07 +0200

unicode (0.4.6) unstable; urgency=low

  * change charset guessing (closes: #241889), thanks to Євгeнiй Meщepяĸoв (Eugeniy Meshcheryakov) for the patch
  * closes: #229857 - it has been closed together with 215267

 -- Radovan Garabík  Tue, 20 Apr 2004 15:39:34 +0200

unicode (0.4.5) unstable; urgency=low

  * catch an exception if the input sequence is invalid in the given encoding (closes: #188438)
  * automatically find and symlink UnicodeData.txt from perl, if installed (thanks to LarstiQ for the patch) (closes: #215267)
  * change architecture to 'all' (closes: #215264)

 -- Radovan Garabík  Wed, 21 Jan 2004 10:30:38 +0100

unicode (0.4) unstable; urgency=low

  * added option to choose colour output (closes: #187215)

 -- Radovan Garabík  Wed, 9 Apr 2003 16:37:39 +0200

unicode (0.3.1) unstable; urgency=low

  * added python to Build-Depends (closes: #183662)
  * properly quote hyphens in the manpage (closes: #186151)
  * do not use UTF-8 in the manpage (closes: #186193)
  * added a versioned dependency for python2.3 (closes: #186444)

 -- Radovan Garabík  Mon, 24 Mar 2003 14:39:31 +0100

unicode (0.3) unstable; urgency=low

  * Initial Release.
 -- Radovan Garabík  Fri, 7 Feb 2003 15:09:19 +0100
unicode-0.9.7/debian/compat0000644000000000000000000000000212010413116012450 0ustar 5
unicode-0.9.7/debian/README.Debian0000644000000000000000000000035612010413116013317 0ustar unicode for Debian
------------------

packaged as a native package, the source resides at
http://kassiopeia.juls.savba.sk/~garabik/software/unicode/

 -- Radovan Garabík, Fri, 7 Feb 2003 15:09:19 +0100
unicode-0.9.7/debian/copyright0000644000000000000000000000055512010413116013212 0ustar This program was written by Radovan Garabík on Fri, 7 Feb 2003 15:09:19 +0100, and packaged for Debian as a native package.

The sources and package can be downloaded from:
http://kassiopeia.juls.savba.sk/~garabik/software/unicode/

Copyright: © Radovan Garabík, released under GPL v3, see /usr/share/common-licenses/GPL
unicode-0.9.7/paracode.10000644000000000000000000000305212010413116011670 0ustar .\" Hey, EMACS: -*- nroff -*-
.TH PARACODE 1 "2005-04-16"
.SH NAME
paracode \- command line Unicode conversion tool
.SH SYNOPSIS
.B paracode
.RI [ -t tables ] string
.SH DESCRIPTION
This manual page documents the
.B paracode
command.
.PP
\fBparacode\fP exploits the full power of the Unicode standard to convert text into a visually similar stream of glyphs while using completely different codepoints. It is an excellent didactic tool demonstrating the principles and advanced use of the Unicode standard.
.PP
\fBparacode\fP is a command line tool working as a filter, reading standard input in UTF-8 encoding and writing to standard output.
.SH OPTIONS
.TP
.BI \-t tables
.BI \-\-tables
Use the given list of conversion tables, separated by a plus sign. The special name 'all' selects all the tables. Note that selecting the 'other', 'cyrillic_plus' and 'cherokee' tables (and 'all') makes use of rather esoteric characters, and not all fonts contain them. The special table 'mirror' uses a quite different character substitution, is not selected automatically with 'all' and does not work well with anything except plain ASCII alphabetical characters.
Example:
paracode -t cyrillic+greek+cherokee
paracode -t cherokee <input >output
paracode -r -t mirror <input >output
Possible tables are: cyrillic cyrillic_plus greek other cherokee all
.TP
.BI \-r
Display text in reverse order after conversion, best used together with -t mirror.
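.PP
Example of a possible session (a sketch; the exact glyphs depend on the selected tables and on font support \- with the tables shipped in this version, 'hello' maps to 'olləɥ', i.e. the word rotated by 180 degrees):
.br
paracode \-r \-t mirror hello
.br
olləɥ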
.SH SEE ALSO iconv(1) .SH AUTHOR Radovan Garab\('ik unicode-0.9.7/unicode0000755000000000000000000006377712054365372011452 0ustar #!/usr/bin/python import os, glob, sys, unicodedata, locale, gzip, re, traceback, encodings import urllib, webbrowser, textwrap # bz2 was introduced in 2.3, we want this to work also with earlier versions try: import bz2 except ImportError: bz2 = None # for python3 try: unicode except NameError: unicode = str # 'any' and 'all' were introduced in python2.5 # dummy replacement for older versions try: all except NameError: all = lambda x: False PY3 = sys.version_info[0] >= 3 if PY3: import subprocess as cmd def is_ascii(s): "test is string s consists completely of ascii characters (python 3)" try: s.encode('ascii') except UnicodeEncodeError: return False return True def out(*args): "pring args, converting them to output charset" for i in args: sys.stdout.flush() sys.stdout.buffer.write(i.encode(options.iocharset, 'replace')) # ord23 is used to convert elements of byte array in python3, which are integers ord23 = lambda x: x # unichr is not in python3 unichr = chr else: # python2 # getoutput() and getstatusoutput() methods have # been moved from commands to the subprocess module # with Python >= 3.x import commands as cmd def is_ascii(s): "test is string s consists completely of ascii characters (python 2)" try: unicode(s, 'ascii') except UnicodeDecodeError: return False return True def out(*args): "pring args, converting them to output charset" for i in args: sys.stdout.write(i.encode(options.iocharset, 'replace')) ord23 = ord from optparse import OptionParser VERSION='0.9.7' # list of terminals that support bidi biditerms = ['mlterm'] try: locale.setlocale(locale.LC_ALL, '') except locale.Error: pass # guess terminal charset try: iocharsetguess = locale.nl_langinfo(locale.CODESET) or "ascii" except locale.Error: iocharsetguess = "ascii" if os.environ.get('TERM') in biditerms and iocharsetguess.lower().startswith('utf'): LTR = u'\u202d' # left to right override else: LTR = '' colours = { 'none' : "", 'default' : "\033[0m", 'bold' : "\033[1m", 'underline' : "\033[4m", 'blink' : "\033[5m", 'reverse' : "\033[7m", 'concealed' : "\033[8m", 'black' : "\033[30m", 'red' : "\033[31m", 'green' : "\033[32m", 'yellow' : "\033[33m", 'blue' : "\033[34m", 'magenta' : "\033[35m", 'cyan' : "\033[36m", 'white' : "\033[37m", 'on_black' : "\033[40m", 'on_red' : "\033[41m", 'on_green' : "\033[42m", 'on_yellow' : "\033[43m", 'on_blue' : "\033[44m", 'on_magenta' : "\033[45m", 'on_cyan' : "\033[46m", 'on_white' : "\033[47m", 'beep' : "\007", } general_category = { 'Lu': 'Letter, Uppercase', 'Ll': 'Letter, Lowercase', 'Lt': 'Letter, Titlecase', 'Lm': 'Letter, Modifier', 'Lo': 'Letter, Other', 'Mn': 'Mark, Non-Spacing', 'Mc': 'Mark, Spacing Combining', 'Me': 'Mark, Enclosing', 'Nd': 'Number, Decimal Digit', 'Nl': 'Number, Letter', 'No': 'Number, Other', 'Pc': 'Punctuation, Connector', 'Pd': 'Punctuation, Dash', 'Ps': 'Punctuation, Open', 'Pe': 'Punctuation, Close', 'Pi': 'Punctuation, Initial quote', 'Pf': 'Punctuation, Final quote', 'Po': 'Punctuation, Other', 'Sm': 'Symbol, Math', 'Sc': 'Symbol, Currency', 'Sk': 'Symbol, Modifier', 'So': 'Symbol, Other', 'Zs': 'Separator, Space', 'Zl': 'Separator, Line', 'Zp': 'Separator, Paragraph', 'Cc': 'Other, Control', 'Cf': 'Other, Format', 'Cs': 'Other, Surrogate', 'Co': 'Other, Private Use', 'Cn': 'Other, Not Assigned', } bidi_category = { 'L' : 'Left-to-Right', 'LRE' : 'Left-to-Right Embedding', 'LRO' : 'Left-to-Right Override', 'R' : 
'Right-to-Left', 'AL' : 'Right-to-Left Arabic', 'RLE' : 'Right-to-Left Embedding', 'RLO' : 'Right-to-Left Override', 'PDF' : 'Pop Directional Format', 'EN' : 'European Number', 'ES' : 'European Number Separator', 'ET' : 'European Number Terminator', 'AN' : 'Arabic Number', 'CS' : 'Common Number Separator', 'NSM' : 'Non-Spacing Mark', 'BN' : 'Boundary Neutral', 'B' : 'Paragraph Separator', 'S' : 'Segment Separator', 'WS' : 'Whitespace', 'ON' : 'Other Neutrals', } comb_classes = { 0: 'Spacing, split, enclosing, reordrant, and Tibetan subjoined', 1: 'Overlays and interior', 7: 'Nuktas', 8: 'Hiragana/Katakana voicing marks', 9: 'Viramas', 10: 'Start of fixed position classes', 199: 'End of fixed position classes', 200: 'Below left attached', 202: 'Below attached', 204: 'Below right attached', 208: 'Left attached (reordrant around single base character)', 210: 'Right attached', 212: 'Above left attached', 214: 'Above attached', 216: 'Above right attached', 218: 'Below left', 220: 'Below', 222: 'Below right', 224: 'Left (reordrant around single base character)', 226: 'Right', 228: 'Above left', 230: 'Above', 232: 'Above right', 233: 'Double below', 234: 'Double above', 240: 'Below (iota subscript)', } def get_unicode_properties(ch): properties = {} if ch in linecache: fields = linecache[ch].strip().split(';') proplist = ['codepoint', 'name', 'category', 'combining', 'bidi', 'decomposition', 'dummy', 'digit_value', 'numeric_value', 'mirrored', 'unicode1name', 'iso_comment', 'uppercase', 'lowercase', 'titlecase'] for i, prop in enumerate(proplist): if prop!='dummy': properties[prop] = fields[i] if properties['lowercase']: properties['lowercase'] = unichr(int(properties['lowercase'], 16)) if properties['uppercase']: properties['uppercase'] = unichr(int(properties['uppercase'], 16)) if properties['titlecase']: properties['titlecase'] = unichr(int(properties['titlecase'], 16)) properties['combining'] = int(properties['combining']) properties['mirrored'] = properties['mirrored']=='Y' else: properties['codepoint'] = '%04X' % ord(ch) properties['name'] = unicodedata.name(ch, '') properties['category'] = unicodedata.category(ch) properties['combining'] = unicodedata.combining(ch) properties['bidi'] = unicodedata.bidirectional(ch) properties['decomposition'] = unicodedata.decomposition(ch) properties['digit_value'] = unicodedata.digit(ch, '') properties['numeric_value'] = unicodedata.numeric(ch, '') properties['mirrored'] = unicodedata.mirrored(ch) properties['unicode1name'] = '' properties['iso_comment'] = '' properties['uppercase'] = ch.upper() properties['lowercase'] = ch.lower() properties['titlecase'] = '' return properties def do_init(): HomeDir = os.path.expanduser('~/.unicode') HomeUnicodeData = os.path.join(HomeDir, "UnicodeData.txt") global UnicodeDataFileNames UnicodeDataFileNames = [HomeUnicodeData, '/usr/share/unicode/UnicodeData.txt', '/usr/share/unidata/UnicodeData.txt', './UnicodeData.txt'] + \ glob.glob('/usr/share/unidata/UnicodeData*.txt') + \ glob.glob('/usr/share/perl/*/unicore/UnicodeData.txt') + \ glob.glob('/System/Library/Perl/*/unicore/UnicodeData.txt') # for MacOSX HomeUnihanData = os.path.join(HomeDir, "Unihan*") global UnihanDataGlobs UnihanDataGlobs = [HomeUnihanData, '/usr/share/unidata/Unihan*', '/usr/share/unicode/Unihan*', './Unihan*'] def get_unihan_files(): fos = [] # list of file names for Unihan data file(s) for gl in UnihanDataGlobs: fnames = glob.glob(gl) fos += fnames return fos def get_unihan_properties_internal(ch): properties = {} ch = ord(ch) global unihan_fs 
for f in unihan_fs: fo = OpenGzip(f) for l in fo: if l.startswith('#'): continue line = l.strip() if not line: continue char, key, value = line.strip().split('\t') if int(char[2:], 16) == ch: properties[key] = unicode(value, 'utf-8') elif int(char[2:], 16)>ch: break return properties def get_unihan_properties_zgrep(ch): properties = {} global unihan_fs ch = ord(ch) chs = 'U+%X' % ch for f in unihan_fs: if f.endswith('.gz'): grepcmd = 'zgrep' elif f.endswith('.bz2'): grepcmd = 'bzgrep' else: grepcmd = 'grep' cmdline = grepcmd+' ^'+chs+r'\\b '+f status, output = cmd.getstatusoutput(cmdline) output = output.split('\n') for l in output: if not l: continue char, key, value = l.strip().split('\t') if int(char[2:], 16) == ch: if PY3: properties[key] = value else: properties[key] = unicode(value, 'utf-8') elif int(char[2:], 16)>ch: break return properties # basic sanity check, if e.g. you run this on MS Windows... if os.path.exists('/bin/grep'): get_unihan_properties = get_unihan_properties_zgrep else: get_unihan_properties = get_unihan_properties_internal def error(txt): out(txt) out('\n') sys.exit(1) def get_gzip_filename(fname): "return fname, if it does not exist, return fname+.gz, if neither that, fname+bz2, if neither that, return None" if os.path.exists(fname): return fname if os.path.exists(fname+'.gz'): return fname+'.gz' if os.path.exists(fname+'.bz2') and bz2 is not None: return fname+'.bz2' return None def OpenGzip(fname): "open fname, try fname.gz or fname.bz2 if fname does not exist, return file object or GzipFile or BZ2File object" if os.path.exists(fname) and not (fname.endswith('.gz') or fname.endswith('.bz2')): return open(fname) if os.path.exists(fname+'.gz'): fname = fname+'.gz' elif os.path.exists(fname+'.bz2') and bz2 is not None: fname = fname+'.bz2' if fname.endswith('.gz'): return gzip.GzipFile(fname) elif fname.endswith('.bz2'): return bz2.BZ2File(fname) return None def GrepInNames(pattern, fillcache=False): p = re.compile(pattern, re.I) f = None for name in UnicodeDataFileNames: f = OpenGzip(name) if f != None: break if not fillcache: if not f: out( """ Cannot find UnicodeData.txt, please place it into /usr/share/unidata/UnicodeData.txt, /usr/share/unicode/UnicodeData.txt, ~/.unicode/ or current working directory (optionally you can gzip it). Without the file, searching will be much slower. 
""" ) for i in xrange(sys.maxunicode): try: name = unicodedata.name(unichr(i)) if re.search(p, name): yield myunichr(i) except ValueError: pass else: for l in f: if re.search(p, l): r = myunichr(int(l.split(';')[0], 16)) linecache[r] = l yield r f.close() else: if f: for l in f: if re.search(p, l): r = myunichr(int(l.split(';')[0], 16)) linecache[r] = l f.close() def valfromcp(n, cp=None): "if fromcp is defined, then the 'n' is considered to be from that codepage and is converted accordingly" if cp: xh = '%x' %n if len(xh) % 2: # pad hexadecimal representation with a zero xh = '0'+xh cps = ( [xh[i:i+2] for i in range(0,len(xh),2)] ) cps = ( chr(int(i, 16)) for i in cps) cps = ''.join(cps) """ if 0 <= n <= 255: s = chr(n) elif 256 <= n <= 65535: s = struct.pack('>H', n) elif 65536 <= n <= sys.maxint: s = struct.pack('>H', n) else: # bad character code, either negative or too big raise ValueError("Bad character code %s" %n) print 'ee',`s` n = unicode(s, cp) """ s = unicode(cps, cp) ns = [ord(x) for x in s] return ns else: return [n] def myunichr(n): try: r = unichr(n) return r except OverflowError: traceback.print_exc() error("The codepoint is too big - it does not fit into an int.") except ValueError: traceback.print_exc() err = "The codepoint is too big." if sys.maxunicode <= 0xffff: err += "\nPerhaps your python interpreter is not compiled with wide unicode characters." error(err) def guesstype(arg): if not arg: # empty string return 'empty string', arg elif not is_ascii(arg): return 'string', arg elif arg[:2]=='U+' or arg[:2]=='u+': # it is hexadecimal number try: val = int(arg[2:], 16) if val>sys.maxunicode: return 'regexp', arg else: return 'hexadecimal', arg[2:] except ValueError: return 'regexp', arg elif arg[0] in "Uu" and len(arg)>4: try: val = int(arg[1:], 16) if val>sys.maxunicode: return 'regexp', arg else: return 'hexadecimal', arg except ValueError: return 'regexp', arg elif len(arg)>=4: if len(arg) in (8, 16, 24, 32): if all(x in '01' for x in arg): val = int(arg, 2) if val<=sys.maxunicode: return 'binary', arg try: val = int(arg, 16) if val>sys.maxunicode: return 'regexp', arg else: return 'hexadecimal', arg except ValueError: return 'regexp', arg else: return 'string', arg def process(arglist, t, fromcp=None): # build a list of values, so that we can combine queries like # LATIN ALPHA and search for LATIN.*ALPHA and not names that # contain either LATIN or ALPHA result = [] names_query = [] # reserved for queries in names - i.e. -r for arg_i in arglist: if t==None: tp, arg = guesstype(arg_i) if tp == 'regexp': # if the first argument is guessed to be a regexp, add # all the following arguments to the regular expression - # this is probably what you wanted, e.g. 
# 'unicode cyrillic be' will now search for the 'cyrillic.*be' regular expression t = 'regexp' else: tp, arg = t, arg_i if tp=='hexadecimal': val = int(arg, 16) vals = valfromcp(val, fromcp) for val in vals: r = myunichr(val) list(GrepInNames('%04X'%val, fillcache=True)) # fill the table with character properties result.append(r) elif tp=='decimal': val = int(arg, 10) vals = valfromcp(val, fromcp) for val in vals: r = myunichr(val) list(GrepInNames('%04X'%val, fillcache=True)) # fill the table with character properties result.append(r) elif tp=='octal': val = int(arg, 8) vals = valfromcp(val, fromcp) for val in vals: r = myunichr(val) list(GrepInNames('%04X'%val, fillcache=True)) # fill the table with character properties result.append(r) elif tp=='binary': val = int(arg, 2) vals = valfromcp(val, fromcp) for val in vals: r = myunichr(val) list(GrepInNames('%04X'%val, fillcache=True)) # fill the table with character properties result.append(r) elif tp=='regexp': names_query.append(arg) elif tp=='string': try: if PY3: # argv is automatically decoded into unicode, even padded with bogus character if it is not encodable unirepr = arg else: unirepr = unicode(arg, options.iocharset) except UnicodeDecodeError: error ("Sequence %s is not valid in charset '%s'." % (repr(arg), options.iocharset)) unilist = ['%04X'%ord(x) for x in unirepr] unireg = '|'.join(unilist) list(GrepInNames(unireg, fillcache=True)) for r in unirepr: result.append(r) elif tp=='empty string': pass # do not do anything for an empty string if names_query: query = '.*'.join(names_query) for r in GrepInNames(query): result.append(r) return result def maybe_colours(colour): if use_colour: return colours[colour] else: return "" # format key and value def printkv(*l): for i in range(0, len(l), 2): if i options.maxcount: out("\nToo many characters to display, more than %s, use --max option to change it\n" % options.maxcount) return properties = get_unicode_properties(c) out(maybe_colours('bold')) out('U+%04X '% ord(c)) if properties['name']: out(properties['name']) else: out(maybe_colours('default')) out(" - No such unicode character name in database") out(maybe_colours('default')) out('\n') ar = ["UTF-8", ' '.join([("%02x" % ord23(x)) for x in c.encode('utf-8')]) , "UTF-16BE", ''.join([("%02x" % ord23(x)) for x in c.encode('utf-16be')]), "Decimal", "&#%s;" % ord(c) ] if options.addcharset: try: rep = ' '.join([("%02x" % ord(x)) for x in c.encode(options.addcharset)] ) except UnicodeError: rep = "NONE" ar.extend( [options.addcharset, rep] ) printkv(*ar) if properties['combining']: pc = " "+c else: pc = c out(pc) uppercase = properties['uppercase'] lowercase = properties['lowercase'] if uppercase: out(" (%s)" % uppercase) out('\n') printkv( "Uppercase", 'U+%04X'% ord(properties['uppercase']) ) elif lowercase: out(" (%s)" % properties['lowercase']) out('\n') printkv( "Lowercase", 'U+%04X'% ord(properties['lowercase']) ) else: out('\n') printkv( 'Category', properties['category']+ " (%s)" % general_category[properties['category']] ) if properties['numeric_value']: printkv( 'Numeric value', properties['numeric_value']) if properties['digit_value']: printkv( 'Digit value', properties['digit_value']) bidi = properties['bidi'] if bidi: printkv( 'Bidi', bidi+ " (%s)" % bidi_category[bidi] ) mirrored = properties['mirrored'] if mirrored: out('Character is mirrored\n') comb = properties['combining'] if comb: printkv( 'Combining', str(comb)+ " (%s)" % (comb_classes.get(comb, '?')) ) decomp = properties['decomposition'] if decomp: printkv( 
'Decomposition', decomp ) if options.verbosity>0: uhp = get_unihan_properties(c) for key in uhp: printkv(key, uhp[key]) out('\n') def print_block(block): #header out(" "*10) for i in range(16): out(".%X " % i) out('\n') #body for i in range(block*16, block*16+16): hexi = "%X" % i if len(hexi)>3: hexi = "%07X" % i hexi = hexi[:4]+" "+hexi[4:] else: hexi = " %03X" % i out(LTR+hexi+". ") for j in range(16): c = unichr(i*16+j) if unicodedata.combining(c): c = " "+c out(c) out(' ') out('\n') out('\n') def print_blocks(blocks): for block in blocks: print_block(block) def is_range(s, typ): sp = s.split('..') if len(sp)!=2: return False if not sp[1]: sp[1] = sp[0] elif not sp[0]: sp[0] = sp[1] if not sp[0]: return False low = list(process([sp[0]], typ)) # intentionally no fromcp here, ranges are only of unicode characters high = list(process([sp[1]], typ)) if len(low)!=1 or len(high)!=1: return False low = ord(low[0]) high = ord(high[0]) low = low // 256 high = high // 256 + 1 return range(low, high) parser = OptionParser(usage="usage: %prog [options] arg") parser.add_option("-x", "--hexadecimal", action="store_const", const='hexadecimal', dest="type", help="Assume arg to be hexadecimal number") parser.add_option("-o", "--octal", action="store_const", const='octal', dest="type", help="Assume arg to be octal number") parser.add_option("-b", "--binary", action="store_const", const='binary', dest="type", help="Assume arg to be binary number") parser.add_option("-d", "--decimal", action="store_const", const='decimal', dest="type", help="Assume arg to be decimal number") parser.add_option("-r", "--regexp", action="store_const", const='regexp', dest="type", help="Assume arg to be regular expression") parser.add_option("-s", "--string", action="store_const", const='string', dest="type", help="Assume arg to be a sequence of characters") parser.add_option("-a", "--auto", action="store_const", const=None, dest="type", help="Try to guess arg type (default)") parser.add_option("-m", "--max", action="store", default=10, dest="maxcount", type="int", help="Maximal number of codepoints to display, default: 10; 0=unlimited") parser.add_option("-i", "--io", action="store", default=iocharsetguess, dest="iocharset", type="string", help="I/O character set, I am guessing %s" % iocharsetguess) parser.add_option("--fcp", "--fromcp", action="store", default='', dest="fromcp", type="string", help="Convert numerical arguments from this encoding, default: no conversion") parser.add_option("-c", "--charset-add", action="store", dest="addcharset", type="string", help="Show hexadecimal reprezentation in this additional charset") parser.add_option("-C", "--colour", action="store", dest="use_colour", type="string", default="auto", help="Use colours, on, off or auto") parser.add_option('', "--color", action="store", dest="use_colour", type="string", default="auto", help="synonym for --colour") parser.add_option("-v", "--verbose", action="count", dest="verbosity", default=0, help="Increase verbosity (reads Unihan properties - slow!)") parser.add_option("-w", "--wikipedia", action="count", dest="query_wiki", default=0, help="Query wikipedia for the character") parser.add_option("--list", action="store_const", dest="list_all_encodings", const=True, help="List (approximately) all known encodings") (options, arguments) = parser.parse_args() linecache = {} do_init() if options.list_all_encodings: all_encodings = os.listdir(os.path.dirname(encodings.__file__)) all_encodings = set([os.path.splitext(x)[0] for x in all_encodings]) 
all_encodings = list(all_encodings) all_encodings.sort() print (textwrap.fill(' '.join(all_encodings))) sys.exit() if len(arguments)==0: parser.print_help() sys.exit() if options.use_colour.lower() in ("on", "1", "true", "yes"): use_colour = True elif options.use_colour.lower() in ("off", "0", "false", "no"): use_colour = False else: use_colour = sys.stdout.isatty() if sys.platform == 'win32': use_colour = False l_args = [] # list of non range arguments to process for argum in arguments: is_r = is_range(argum, options.type) if is_r: print_blocks(is_r) else: l_args.append(argum) if l_args: unihan_fs = [] if options.verbosity>0: unihan_fs = get_unihan_files() # list of file names for Unihan data file(s), empty if not available if not unihan_fs: out( """ Unihan_*.txt files not found. In order to view Unihan properties, please place the file into /usr/share/unidata/, /usr/share/unicode/, ~/.unicode/ or current working directory (optionally you can gzip or bzip2 them). You can get the files by unpacking ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip Warning, listing UniHan Properties is rather slow. """) options.verbosity = 0 try: print_characters(process(l_args, options.type, options.fromcp), options.maxcount, options.query_wiki) except IOError: # e.g. broken pipe pass unicode-0.9.7/COPYING0000644000000000000000000000001012010413116011052 0ustar GPL v3