ReadMe for ICU4J

International Components for Unicode for Java (ICU4J)

Read Me for ICU4J 4.2.1.1


Date
April 8, 2010

Note: This is an update release of ICU4J 4.2. This release contains several bug fixes and updated data, but does not introduce any new APIs or functionality.

For the most recent release, see the ICU4J download site.

Contents

Introduction to ICU4J

The International Components for Unicode (ICU) library provides robust and full-featured Unicode services on a wide variety of platforms. ICU supports the most current version of the Unicode standard, including support for supplementary characters (needed for GB 18030 repertoire support).

Java provides a strong foundation for global programs, and IBM and the ICU team played a key role in providing globalization technology to Java. But because of its long release schedule, Java cannot always keep up with evolving standards. The ICU team continues to extend Java's Unicode and internationalization support, focusing on improving performance, keeping current with the Unicode standard, and providing richer APIs, while remaining as compatible as possible with the original Java text and internationalization API design.

ICU4J is an add-on to the regular JRE that provides:

Note: We continue to provide assistance to Sun, and in some cases, ICU4J support has been rolled into a later release of Java. For example, the Thai word-break is now in Java 1.4. However, the most current and complete version is always found in ICU4J.

What Is New In This Release?

See the ICU 4.2 download page about new features in this release. The list of API changes since the previous ICU4J release is available here.

Summary of changes in 4.2.1.1 (April 8, 2010)

Time zone data update

ICU4J 4.2.1.1 updates the time zone data to version 2010h from the Olson tz database.

Bug fixes

This release contains the following bug fixes ported from the ICU4J 4.4 development code stream:

Summary of changes in 4.2.1 (July 1, 2009)

Locale data update

ICU4J 4.2.1 updates the CLDR version to 1.7.1.

Time zone data update

ICU4J 4.2.1 updates the time zone data to version 2009j from the Olson tz database.

Bug fixes

This release contains the following bug fixes (not a complete list):

License Information

The ICU projects (ICU4C and ICU4J) use the X license. The X license is suitable for commercial use and is a recommended free software license that is compatible with the GNU GPL license. This became effective with release 1.8.1 of ICU4C and release 1.3.1 of ICU4J in mid-2001. All new ICU releases will adopt the X license; previous ICU releases continue to utilize the IPL (IBM Public License). Users of previous releases of ICU who want to adopt new ICU releases will need to accept the terms and conditions of the X license.

The main effect of the change is to provide GPL compatibility. The X license is listed as GPL compatible; see the GNU page at http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses. This means that GPL projects can now use ICU code. It does not mean that projects using ICU become subject to the GPL.

The IBM version contains the essential text of the license, omitting the X-specific trademarks and copyright notices. The full copy of ICU's license is included in the download package.

Platform Dependencies

By default, ICU4J depends on functionality available in J2SE 5 or later releases. The binary distribution of the ICU4J jar file may not work with older JRE versions. We provide the ability to build a variant of ICU4J for JRE 1.3 or 1.4. If you want to run ICU4J on JRE 1.4, you should build the ICU4J libraries from the source package with JDK 1.4. With older JDK releases, some build targets may not work.

The table below shows operating systems and JRE/JDK versions currently used by the ICU development team.

Operating System                 | Sun Java SE                                                  | IBM Java SE
                                 | 1.6.0              | 1.5.0              | 1.4.2              | 1.6.0              | 1.5.0              | 1.4.2
AIX 5.2                          | -                  | -                  | -                  | Regularly tested   | Regularly tested   | Rarely tested
AIX 5.3                          | -                  | -                  | -                  | Regularly tested   | Regularly tested   | Rarely tested
AIX 6.1                          | -                  | -                  | -                  | Reference platform | Regularly tested   | Rarely tested
HP-UX 11 (PA-RISC)               | Regularly tested   | Regularly tested   | Rarely tested      | -                  | -                  | -
HP-UX 11 (IA64)                  | Regularly tested   | Regularly tested   | Rarely tested      | -                  | -                  | -
Redhat Enterprise Linux 4 (x86)  | Regularly tested   | Regularly tested   | Rarely tested      | Regularly tested   | Regularly tested   | Rarely tested
Redhat Enterprise Linux 5 (x86)  | Regularly tested   | Regularly tested   | Rarely tested      | Regularly tested   | Regularly tested   | Rarely tested
Solaris 9 (SPARC)                | Regularly tested   | Regularly tested   | Rarely tested      | -                  | -                  | -
Solaris 10 (SPARC)               | Reference platform | Regularly tested   | Rarely tested      | -                  | -                  | -
Windows XP                       | Regularly tested   | Regularly tested   | Rarely tested      | Regularly tested   | Regularly tested   | Rarely tested
Windows Vista                    | Regularly tested   | Regularly tested   | Rarely tested      | Reference platform | Regularly tested   | Rarely tested
Windows 2008 Server              | Regularly tested   | Regularly tested   | Rarely tested      | Regularly tested   | Regularly tested   | Rarely tested

How to Download ICU4J

There are two ways to download the ICU4J releases.

For more details on how to download ICU4J directly from the web site, please see the ICU downloads page at http://www.icu-project.org/download/

The Structure and Contents of ICU4J

Below, $icu4j_root is the location of the icu4j directory in your file system, like "drive:\...\icu4j" in your environment. "drive:\..." stands for any drive and any directory on that drive that you chose to install icu4j into.

Information and build files:

readme.html (this file) A description of ICU4J (International Components for Unicode for Java)
license.html The X license, used by ICU4J
build.xml Ant build file. See How to Install and Build for more information

The source directories mirror the package structure of the code.
Core packages become part of the ICU4J jar file.
Charset packages become part of the ICU4J charset jar file.
LocaleSPI packages become part of the ICU4J Locale service provider jar file.
RichText classes are Core and API, but can be removed from icu4j.jar, and can be built into their own jar.
API packages contain classes with supported API.

$icu4j_root/src/com/ibm/icu/charset
Charset, API
Packages that provide Charset conversion
$icu4j_root/src/com/ibm/icu/dev
Non-Core, Non-API
Packages used for internal development:
  • Data: data used by tests and in building ICU
  • Demos: Calendar, Holiday, Break Iterator, Rule-based Number Format, Transformations
    (See below for more information about the demos.)
  • Tests: API and coverage tests of all functionality.
    For information about running the tests, see $icu4j_root/src/com/ibm/icu/dev/test/TestAll.java.
  • Tools: tools used to build data tables, etc.
$icu4j_root/src/com/ibm/icu/impl
Core, Non-API
These are utility classes used from different ICU4J core packages.
$icu4j_root/src/com/ibm/icu/lang
Core, API
Character properties package.
$icu4j_root/src/com/ibm/icu/math
Core, API
Additional math classes.
$icu4j_root/src/com/ibm/icu/text
Core, API
Additional text classes. These add to, and in some cases replace, related core Java classes:
  • Arabic shaping
  • Bidirectional text manipulation
  • Break iteration
  • Date formatting
  • Number formatting
  • Transliteration
  • Normalization
  • String manipulation
  • Collation
  • String search
  • Unicode compression
  • Unicode sets
$icu4j_root/src/com/ibm/icu/util
Core, API
Additional utility classes:
  • Calendars - Gregorian, Buddhist, Coptic, Ethiopic, Hebrew, Islamic, Japanese, Chinese and others
  • Holiday
  • TimeZone
  • VersionInfo
  • Iteration
  • Currency
$icu4j_root/src/com/ibm/richtext
RichText,Non-API
Styled text editing package. This includes demos, tests, and GUIs for editing and displaying styled text. The richtext package provides a scrollable display, typing, arrow-key support, tabs, alignment and justification, word- and sentence-selection (by double-clicking and triple-clicking, respectively), text styles, clipboard operations (cut, copy and paste) and a log of changes for undo-redo. Richtext uses Java's TextLayout and complex text support (provided to Sun by the ICU4J team).
$icu4j_root/localespi/src/com/ibm/icu/impl
LocaleSPI,Non-API
Packages for ICU4J Locale Service Provider runtime code implementing the Java SE 6 locale sensitive service provider interfaces.
$icu4j_root/localespi/src/com/ibm/icu/dev
LocaleSPI,Non-API
Packages used for internal development for ICU4J Locale Service Provider, including test cases.

Building ICU4J creates and populates the following directories:

$icu4j_root/classes contains all class files
$icu4j_root/doc contains JavaDoc for all packages

ICU4J data is stored in the following locations:

com.ibm.icu.impl.data Holds data used by the ICU4J core and charset packages (com.ibm.icu.lang, com.ibm.icu.text, com.ibm.icu.util, com.ibm.icu.math and com.ibm.icu.charset). In particular, all resource information is stored here.
com.ibm.icu.dev.data Holds data that is not part of ICU4J core, but rather part of a test, sample, or demo.

Where to get Documentation

The ICU user's guide contains lots of general information about ICU, in its C, C++, and Java incarnations.

The complete API documentation for ICU4J (javadoc) is available on the ICU4J web site, and can be built from the sources:

How to Install and Build

To install ICU4J, simply place the prebuilt jar file icu4j.jar on your Java CLASSPATH. If you need Charset API support please place icu4j-charsets.jar on your class path. No other files are needed.
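One way to confirm that the jar file really is on the class path is to try to locate an ICU4J class at run time. The following is a minimal sketch, not part of ICU4J: the class name IcuClasspathCheck and its method are hypothetical, and com.ibm.icu.util.ULocale is used only because it is a core class mentioned elsewhere in this document.

```java
public class IcuClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, IcuClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: ULocale lives in the core jar (icu4j.jar), so finding it
        // suggests the core jar is on the class path.
        System.out.println("ICU4J core on class path: "
                + isClassAvailable("com.ibm.icu.util.ULocale"));
    }
}
```

Run it with the same -cp or CLASSPATH setting as your application; a result of false suggests the jar file is not being picked up.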

To build ICU4J, you will need a J2SE SDK and the Ant build system. We strongly recommend using the Ant build system to build ICU4J. It's recommended to install both the J2SE SDK and Ant somewhere outside the ICU4J directory. For example, on Linux you might install these in /usr/local.

Once the J2SE SDK and Ant are installed, building is just a matter of typing ant in the ICU4J root directory. This causes the Ant build system to perform a build as specified by the file build.xml, located in the ICU4J root directory. You can give Ant options like -verbose, and you can specify targets. Ant will only build what's been changed and will resolve dependencies properly. For example:

C:\icu4j>ant
Buildfile: build.xml

checkAntVersion:

warnAntVersion:

initBase:
    [mkdir] Created dir: C:\icu4j\classes
     [echo] java home: C:\jdk1.6.0
     [echo] java version: 1.6.0
     [echo] ant java version: 1.6
     [echo] Apache Ant version 1.7.0 compiled on December 13 2006
     [echo] ICU4JDEV with Windows XP 5.1 build 2600 Service Pack 3 on x86
     [echo] clover initstring = '${clover.initstring}'
     [echo] target runtime environment: JAVASE6
     [echo] Initialized at 2009-04-25 at 05:25:03 EDT

buildMangle:
    [javac] Compiling 1 source file to C:\icu4j\classes

initSrc:

displayBuildEnvWarning:

doMangle:
     [echo] Running source code preprocessor [java com.ibm.icu.dev.tool.docs.Cod
eMangler -dJAVASE6 -n @preprocessor.txt]

init:

coreData:
     [copy] Copying 1 file to C:\icu4j\classes\com\ibm\icu

icudata:
    [unjar] Expanding: C:\icu4j\src\com\ibm\icu\impl\data\icudata.jar into c:\ic
u4j\classes
     [copy] Copying 1 file to C:\icu4j\classes\META-INF

durationdata:
     [copy] Copying 16 files to C:\icu4j\classes\com\ibm\icu\impl\duration\i
mpl\data

core:
    [javac] Compiling 351 source files to C:\icu4j\classes
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

BUILD SUCCESSFUL
Total time: 44 seconds
Note: The above output is an example. The numbers are likely to be different with the current version of ICU4J.

The following are some targets that you can provide to ant. For more targets run ant -projecthelp or see the build.xml file.

all Build all targets.
core Build the main class files in the subdirectory classes. If no target is specified, core is assumed.
tests Build the test class files.
demos Build the demos.
tools Build the tools.
docs Run javadoc over the main class files, generating an HTML documentation tree in the subdirectory doc.
jar Create a jar archive icu4j.jar in the root ICU4J directory containing the main class files.
jarSrc Like the jar target, but containing only the source files.
jarDocs Like the jar target, but containing only the docs.
richedit Build the richedit core class files and tests.
richeditJar Create the richedit jar file (which contains only the richedit core class files). The file richedit.jar will be created in the ./richedit subdirectory. Any existing file of that name will be overwritten.
richeditZip Create a zip archive of the richedit docs and jar file for distribution. The zip file richedit.zip will be created in the ./richedit subdirectory. Any existing file of that name will be overwritten.
clean Remove all built targets, leaving the source.

For more information, read the Ant documentation and the build.xml file.

After doing a build it is a good idea to run all the icu4j tests by typing
"ant check" or "java -classpath classes com.ibm.icu.dev.test.TestAll -nothrow".

Eclipse users: See the ICU4J site for information on how to configure Eclipse to build and develop ICU4J in the Eclipse IDE.

Note: To install or build the ICU4J Locale Service Provider, please refer to the Read Me for ICU4J Locale Service Provider.

How to modularize ICU4J

Some clients may not wish to ship all of ICU4J with their application, since the application might only use a small part of ICU4J. ICU4J release 2.6 and later provide build options to build individual ICU4J 'modules' for a more compact distribution. For more details, please refer to the section Modularization of ICU4J in the ICU user's guide article Packaging ICU.

Trying Out ICU4J

Note: the demos provided with ICU4J are for the most part undocumented. This list can show you where to look, but you'll have to experiment a bit. The demos (with the exception of richedit) are unsupported and may change or disappear without notice.

The icu4j.jar file contains only the core ICU4J classes, not the demo classes, so unless you build ICU4J there is little to try out.

Charset

To try out the Charset package, build icu4j.jar and icu4j-charsets.jar using the 'jar' target. You can use the charsets by placing these files on your classpath.
java -cp $icu4j_root/icu4j.jar:$icu4j_root/icu4j-charsets.jar <your program>

Rich Edit

To try out the richedit package, first build the richeditJar target. This is a 'runnable' jar file. To run the richedit demo, type:
java -jar $icu4j_root/richedit/richedit.jar
This will present an empty edit pane with an awt interface.

With a fuller command line you can try out other options, for example:

java -classpath $icu4j_root/richedit/richedit.jar com.ibm.richtext.demo.EditDemo [-swing][file]

This will use an awt GUI, or a swing GUI if -swing is passed on the command line. It will open a text file if one is provided, otherwise it will open a blank page. Click to type.

You can add tabs to the tab ruler by clicking in the ruler while holding down the control key. Clicking on an existing tab changes between left, right, center, and decimal tabs. Dragging a tab moves it; dragging it off the ruler removes it.

You can experiment with complex text by using the keymap functions. Please note that these are mainly for demo purposes; for real work with Arabic or Hebrew you will want to use an input method. You will need to use a font that supports Arabic or Hebrew; 'Lucida Sans' (provided with Java) supports these languages.

Other demos

The other demo programs are not supported and exist only to let you experiment with the ICU4J classes. First, build ICU4J using ant all. Then try one of the following:

ICU4J Resource Information

Starting with release 2.1, ICU4J includes its own resource information which is completely independent of the JRE resource information. (Note: in ICU4J 2.8 to 3.4, time zone information depends on the underlying JRE.) The ICU4J resource information is equivalent to the information in ICU4C and many resources are, in fact, the same binary files that ICU4C uses.

By default the ICU4J distribution includes all of the standard resource information. It is located under the directory com/ibm/icu/impl/data. Depending on the service, the data is in different locations and in different formats. Note: This will continue to change from release to release, so clients should not depend on the exact organization of the data in ICU4J.

Some of the data files alias or otherwise reference data from other data files. One reason for this is that some locale names have changed. For example, he_IL used to be iw_IL. In order to support both names but not duplicate the data, one of the resource files refers to the other file's data. In other cases, a file may alias a portion of another file's data in order to save space. Currently ICU4J provides no tool for revealing these dependencies.

Note: Java's Locale class silently converts the language code "he" to "iw" when you construct the Locale (for versions of Java through Java 5). Thus Java cannot be used to locate resources that use the "he" language code. ICU, on the other hand, does not perform this conversion in ULocale, and instead uses aliasing in the locale data to represent the same set of data under different locale ids.
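To see how the JRE you are running treats the "he" language code, you can construct a Locale and read the language back. This is a minimal sketch using only java.util.Locale; the class and method names are hypothetical, and the result depends on the Java runtime in use, so no expected output is shown.

```java
import java.util.Locale;

public class HebrewLanguageCodeSketch {

    /** Returns the language code that java.util.Locale reports for the given input code. */
    public static String jdkLanguageCode(String language) {
        return new Locale(language).getLanguage();
    }

    public static void main(String[] args) {
        // On runtimes that perform the conversion described above, "he" comes back as "iw".
        System.out.println("he -> " + jdkLanguageCode("he"));
        System.out.println("iw -> " + jdkLanguageCode("iw"));
    }
}
```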

Resource files that use locale ids form a hierarchy, with up to four levels: a root, language, region (country), and variant. Searches for locale data attempt to match as far down the hierarchy as possible. For example, "he_IL" will match he_IL, but "he_US" will match he (since there is no US variant for he), and "xx_YY" will match root (the default fallback locale) since there is no xx language code in the locale hierarchy. See java.util.ResourceBundle for more information.
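The lookup order described above can be sketched in a few lines. This is an illustration of the hierarchy only, not ICU4J's actual lookup code; the class LocaleFallbackSketch and the set of available ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LocaleFallbackSketch {

    /** Builds the lookup chain for a locale id, most specific first, ending in "root". */
    public static List<String> fallbackChain(String localeId) {
        List<String> chain = new ArrayList<>();
        String current = localeId;
        while (!current.isEmpty()) {
            chain.add(current);
            int cut = current.lastIndexOf('_');
            // Drop the last "_xx" component, or stop when only the language was left.
            current = (cut < 0) ? "" : current.substring(0, cut);
        }
        chain.add("root");
        return chain;
    }

    /** Returns the first id in the chain that is present in the available set, or null. */
    public static String resolve(String localeId, Set<String> available) {
        for (String candidate : fallbackChain(localeId)) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical data set containing only root, he and he_IL.
        Set<String> available = new HashSet<>(Arrays.asList("root", "he", "he_IL"));
        for (String id : new String[] {"he_IL", "he_US", "xx_YY"}) {
            System.out.println(id + " -> " + resolve(id, available));
        }
        // prints:
        // he_IL -> he_IL
        // he_US -> he
        // xx_YY -> root
    }
}
```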

Currently ICU4J provides no tool for revealing these dependencies between data files, so trimming the data directly in the ICU4J project is a hit-or-miss affair. The key point when you remove data is to make sure to remove all dependencies on that data as well. For example, if you remove he.res, you need to remove he_IL.res, since it is lower in the hierarchy, and you must remove iw.res, since it references he.res, and iw_IL.res, since it depends on it (and also references he_IL.res).
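The he.res example can be worked through mechanically. The sketch below is not an ICU4J tool: it assumes you have written down the alias relationships yourself (here iw refers to he, and iw_IL refers to he_IL, as in the example above), and it uses locale ids without the .res extension. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ResourceRemovalSketch {

    /**
     * Returns every resource id that has to go when 'removed' is deleted:
     * ids lower in the locale hierarchy, and ids that alias an affected id.
     */
    public static Set<String> affectedBy(String removed, Set<String> all,
                                         Map<String, String> aliasTargets) {
        Set<String> affected = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(removed);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            if (!affected.add(current)) {
                continue; // already handled
            }
            for (String candidate : all) {
                boolean isChild = candidate.startsWith(current + "_");
                boolean isAlias = current.equals(aliasTargets.get(candidate));
                if (isChild || isAlias) {
                    pending.add(candidate);
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Set<String> all = new TreeSet<>(
                Arrays.asList("root", "he", "he_IL", "iw", "iw_IL", "fr"));
        // Hand-maintained assumption: which id each alias refers to.
        Map<String, String> aliasTargets = new HashMap<>();
        aliasTargets.put("iw", "he");
        aliasTargets.put("iw_IL", "he_IL");
        System.out.println(affectedBy("he", all, aliasTargets));
        // prints [he, he_IL, iw, iw_IL]
    }
}
```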

Unfortunately, the jar tool in the JDK provides no way to remove items from a jar file. Thus you have to extract the resources, remove the ones you don't want, and then create a new jar file with the remaining resources. See the jar tool information for how to do this. Before 'rejarring' the files, be sure to thoroughly test your application with the remaining resources, making sure each required resource is present.
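As an alternative to extracting and rejarring by hand, the same result can be produced programmatically by copying a jar file entry by entry and skipping the unwanted ones. This is a sketch of that different technique, not the procedure described above and not an ICU4J tool. It treats the jar as a plain zip archive using java.util.zip, and the class name and file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JarTrimSketch {

    /** Copies source to target, skipping entries whose name matches 'drop'. Returns the number kept. */
    public static int copyWithout(Path source, Path target, Predicate<String> drop)
            throws IOException {
        int kept = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (drop.test(entry.getName())) {
                    continue; // getNextEntry() skips the rest of this entry
                }
                // A fresh ZipEntry lets the output stream compute sizes and CRC itself.
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java JarTrimSketch in.jar out.jar he.res
        final String suffix = args[2];
        int kept = copyWithout(Paths.get(args[0]), Paths.get(args[1]),
                name -> name.endsWith(suffix));
        System.out.println("Entries kept: " + kept);
    }
}
```

The advice above still applies: test the application against the trimmed data before shipping the new jar file.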

Using additional resource files with ICU4J

Warning: Resource file formats can change across releases of ICU4J!
The format of ICU4J resources is not part of the API. Clients who develop their own resources for use with ICU4J should be prepared to regenerate them when they move to new releases of ICU4J.

We are still developing ICU4J's resource mechanism. Currently it is not possible to mix ICU's new binary .res resources with traditional Java-style .class or .txt resources. We might allow for this in a future release, but since the resource data and format is not formally supported, you run the risk of incompatibilities with future releases of ICU4J.

Resource data in ICU4J is checked in to the repository as a jar file containing the resource binaries, icudata.jar. This means that inspecting the contents of these resources is difficult. They currently are compiled from ICU4C .txt file data. You can view the contents of the ICU4C text resource files to understand the contents of the ICU4J resources.

The files in icudata.jar get extracted to com/ibm/icu/impl/data in the build directory when the 'core' target is built. Building the 'resources' target will force the resources to once again be extracted. Extraction will overwrite any corresponding resource files already in that directory.

Building ICU4J Resources from ICU4C

ICU4J data is built by ICU4C tools. Please see "icu4j-readme.txt" in $icu4c_root/source/data for the procedures.
Generating Data from CLDR
Note: This procedure assumes that all 3 sources are in sibling directories
  1. Check out CLDR. $cldr_root in the following steps is the root directory where the CLDR source files are checked out.
  2. Update $cldr_root/common to 'release-1-7-0' tag
  3. Update $cldr_root/tools to 'release-1-7-0' tag
  4. Check out ICU4C with tag 'release-4-2'
  5. Check out ICU4J with tag 'release-4-2'
  6. Build ICU4J
  7. Build ICU4C
  8. Change to $cldr_root/tools/java directory
  9. Build CLDR using ant after pointing ICU4J_CLASSES env var to the newly built ICU4J
  10. cd to $icu4c_root/source/data directory
  11. Follow the instructions in the cldr-icu-readme.txt
  12. Build ICU4C data from CLDR
  13. Build ICU4J data from ICU4C data by following the procedures in $icu4c_root/source/data/icu4j-readme.txt
  14. cd to $icu4j_root dir
  15. Build and test icu4j

About ICU4J Time Zone

ICU4J 4.2 includes time zone data version 2009g, which is the latest one as of the release date. However, time zone data is frequently updated in response to changes made by local governments around the world. If you need to update the time zone data, please refer to the ICU user guide topic Updating the Time Zone Data.

Starting with ICU4J 4.0, you can optionally configure ICU4J date and time service classes to use the underlying JDK TimeZone implementation (see the ICU4J API reference TimeZone for the details). When this configuration is enabled, ICU's own time zone data won't be used and you have to get time zone data patches from the JRE vendor.

Where to Find More Information

http://www.icu-project.org/ is the home page of the International Components for Unicode development project.

http://www.ibm.com/software/globalization/icu/ is a pointer to general information about the International Components for Unicode hosted by IBM

http://www.ibm.com/software/globalization/ is a pointer to information on how to make applications global.

Submitting Comments, Requesting Features and Reporting Bugs

Your comments are important to making ICU4J successful. We are committed to investigating any bug reports or suggestions, and will use your feedback to help plan future releases.

To submit comments, request features and report bugs, please see ICU bug database information or contact us through the ICU Support mailing list. While we are not able to respond individually to each comment, we do review all comments.



Thank you for your interest in ICU4J!



Copyright © 2002-2010 International Business Machines Corporation and others. All Rights Reserved.
4400 North First Street, San José, CA 95193, USA

ReadMe for ICU4J Locale Service Provider

ICU4J Locale Service Provider

Read Me for ICU4J Locale Service Provider


Note: This is a technical preview of the ICU4J Locale Service Provider. This component may be changed or removed in a future release of ICU4J.

Contents

Introduction

Java SE 6 introduced a new feature which allows Java user code to extend locale support in the Java runtime environment. JREs shipped by Sun or IBM come with decent locale coverage, but some users may want more locale support. Java SE 6 includes abstract classes extending java.util.spi.LocaleServiceProvider. Java SE 6 users can create subclasses of these abstract classes to supply their own locale support for text break, collation, date/number formatting or providing translations for currency, locale and time zone names.

ICU4J has been providing more comprehensive locale coverage than standard JREs. However, Java programmers have to use ICU4J's own internationalization service APIs (com.ibm.icu.*) to utilize the rich locale support. Sometimes, the migration is not an option for various reasons. For example, your code may depend on existing Java libraries utilizing JDK internationalization service APIs, but you have no access to the source code. In this case, it is not possible to modify the libraries to use ICU4J APIs.

ICU4J Locale Service Provider is a component consisting of classes implementing the Java SE 6 locale sensitive service provider interfaces. The available service providers are:

ICU4J Locale Service Provider is designed to work as an installed extension in a JRE. Once the component is configured properly, a Java application running on the JRE automatically picks up ICU4J's internationalization service implementation when a requested locale is not available in the JRE.

Using ICU4J Locale Service Provider

Java SE 6 locale sensitive service providers use the Java Extension Mechanism. An implementation of a locale sensitive service provider is installed as an optional package to extend the functionality of the Java core platform. To install an optional package, its JAR files must be placed in the Java extension directory. The standard location is <java-home>/lib/ext. You can alternatively use the system property java.ext.dirs to specify one or more locations where optional packages are installed. For example, if the JRE root directory is JAVA_HOME and you put the ICU4J Locale Service Provider files in ICU_SPI_DIR, the ICU4J Locale Service Provider is enabled by the following command.

  java -Djava.ext.dirs=%JAVA_HOME%\lib\ext;%ICU_SPI_DIR% <your_java_app>  [Microsoft Windows]
  java -Djava.ext.dirs=$JAVA_HOME/lib/ext:$ICU_SPI_DIR <your_java_app>    [Linux, Solaris and other unix like platforms]

ICU4J's implementations of the Java SE 6 locale sensitive service provider interfaces and the configuration files are packaged in a single JAR file (icu4j-localespi-<version>.jar). But the actual implementation of the service classes and data are in the ICU4J core JAR file (icu4j-<version>.jar). So you need to put the localespi JAR file along with the core JAR file in the Java extension directory.

Once the ICU4J Locale Service Provider is installed properly, factory methods in JDK internationalization classes look for the implementation provided by ICU4J when a requested locale is not supported by the JDK service class. For example, locale af_ZA (Afrikaans - South Africa) is not supported by JDK DateFormat in Sun Java SE 6. The following code snippet returns an instance of DateFormat from ICU4J Locale Service Provider and prints out the current date localized for af_ZA.

  import java.text.DateFormat;
  import java.util.*;   // Date, Locale
  DateFormat df = DateFormat.getDateInstance(DateFormat.LONG, new Locale("af", "ZA"));
  System.out.println(df.format(new Date()));
Sample output:
  2008 Junie 19     [With ICU4J Locale Service Provider enabled]
  June 19, 2008     [Without ICU4J Locale Service Provider]
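A quick way to check whether a locale is offered to JDK DateFormat, either by the JRE itself or by an installed provider, is to look for it in DateFormat.getAvailableLocales(). The following sketch uses only JDK classes; the class name is hypothetical, and the result for af_ZA depends on the JRE and on whether a provider such as the ICU4J Locale Service Provider is installed.

```java
import java.text.DateFormat;
import java.util.Arrays;
import java.util.Locale;

public class DateFormatLocaleCheck {

    /** Returns true if the locale appears in DateFormat.getAvailableLocales(). */
    public static boolean isListed(Locale locale) {
        return Arrays.asList(DateFormat.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        // The answer for af_ZA varies with the runtime and installed providers.
        System.out.println("af_ZA listed: " + isListed(new Locale("af", "ZA")));
        System.out.println("en_US listed: " + isListed(Locale.US));
    }
}
```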

Optional Configuration

Enabling or disabling individual service

By default, all Java SE 6 locale sensitive service providers are enabled in the ICU4J Locale Service Provider JAR file. If you want to disable specific providers supported by ICU4J, you can remove the corresponding provider configuration files from META-INF/services in the localespi JAR file. For example, if you do not want to use ICU's time zone name service at all, you can remove the file: META-INF/services/java.util.spi.TimeZoneNameProvider from the JAR file.
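Each provider configuration file under META-INF/services is a small text file: lines starting with # are comments, and each remaining line names an implementation class. The sketch below shows how such a file can be read. It is an illustration of the file format only, not the JRE's loading code; the class ProviderConfigSketch is hypothetical, and the sample text mirrors the java.util.spi.TimeZoneNameProvider file shipped in the localespi source tree.

```java
import java.util.ArrayList;
import java.util.List;

public class ProviderConfigSketch {

    /** Extracts provider class names from the text of a provider configuration file. */
    public static List<String> providerClassNames(String fileContent) {
        List<String> names = new ArrayList<>();
        for (String line : fileContent.split("\r?\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop the comment part
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String sample =
                "# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved.\n"
              + "# icu4j time zone name\n"
              + "com.ibm.icu.impl.javaspi.util.TimeZoneNameProviderICU\n";
        System.out.println(providerClassNames(sample));
        // prints [com.ibm.icu.impl.javaspi.util.TimeZoneNameProviderICU]
    }
}
```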

Note: In the current implementation, disabling DateFormatSymbolsProvider/DecimalFormatSymbolsProvider does not affect the localized symbols actually used by DateFormatProvider/NumberFormatProvider. These services are implemented independently.

Configuring the behavior of ICU4J Locale Service Provider

com/ibm/icu/impl/javaspi/ICULocaleServiceProviderConfig.properties in the localespi JAR file is used for configuring the behavior of the ICU4J Locale Service Provider implementation. There are some configuration properties available. See the table below for each configuration in detail.

Property: com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.enableIcuVariants
Value: "true" or "false"
Default: "true"
Description: Whether Locales with ICU's variant suffix are included in getAvailableLocales. The current Java SE 6 locale sensitive service does not allow user-provided provider implementations to override locales supported by the JRE itself. When this property is "true" (the default), the ICU4J Locale Service Provider includes Locales with the suffix (com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.icuVariantSuffix) in the variant field. For example, the ICU4J provider includes locales fr_FR and fr_FR_ICU in the available locale list, so a JDK API user can still access the internationalization service object created by the ICU4J provider through the special locale fr_FR_ICU.

Property: com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.icuVariantSuffix
Value: Any String
Default: "ICU"
Description: Suffix string used in the Locale's variant field to specify the ICU implementation.

Property: com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.enableIso3Languages
Value: "true" or "false"
Default: "true"
Description: Whether 3-letter language Locales are included in getAvailableLocales. Use of 3-letter language codes in java.util.Locale is not supported by the API reference document. However, the implementation does not check the length of the language code, so there is no practical problem with it.

Property: com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.useDecimalFormat
Value: "true" or "false"
Default: "false"
Description: Whether a java.text.DecimalFormat subclass is used for NumberFormat#getXXXInstance. DecimalFormat#format(Object,StringBuffer,FieldPosition) is declared as final, so ICU cannot override the implementation. As a result, some number types such as BigInteger/BigDecimal are not handled by the ICU implementation. If a client expects NumberFormat#getXXXInstance to return a DecimalFormat (for example, because it needs to manipulate decimal format patterns), this property can be set to "true". However, in this case, BigInteger/BigDecimal support is not provided by ICU's implementation.
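The properties above can be pictured as an ordinary java.util.Properties file with built-in defaults. The sketch below is illustrative only and is not the provider's own loading code: it assumes the file uses the standard key=value properties syntax, the class SpiConfigSketch is hypothetical, and only the property names and defaults come from the table above. It also shows how the variant suffix yields a locale such as fr_FR_ICU.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

public class SpiConfigSketch {

    static final String PREFIX = "com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.";

    /** Parses properties text, falling back to the documented defaults for missing keys. */
    public static Properties load(String text) throws IOException {
        Properties defaults = new Properties();
        defaults.setProperty(PREFIX + "enableIcuVariants", "true");
        defaults.setProperty(PREFIX + "icuVariantSuffix", "ICU");
        defaults.setProperty(PREFIX + "enableIso3Languages", "true");
        defaults.setProperty(PREFIX + "useDecimalFormat", "false");
        Properties config = new Properties(defaults);
        config.load(new StringReader(text));
        return config;
    }

    /** Builds the locale that carries the configured suffix in its variant field. */
    public static Locale icuVariantOf(Locale base, Properties config) {
        String suffix = config.getProperty(PREFIX + "icuVariantSuffix");
        return new Locale(base.getLanguage(), base.getCountry(), suffix);
    }

    public static void main(String[] args) throws IOException {
        // Override a single property; the others keep their defaults.
        Properties config = load(PREFIX + "useDecimalFormat=true\n");
        System.out.println(config.getProperty(PREFIX + "useDecimalFormat"));  // prints true
        System.out.println(config.getProperty(PREFIX + "enableIcuVariants")); // prints true
        System.out.println(icuVariantOf(Locale.FRANCE, config));              // prints fr_FR_ICU
    }
}
```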

Building and Testing ICU4J Locale Service Provider

ICU4J Locale Service Provider classes depend on the ICU4J core files. To build the component, you should build the ICU4J core JAR file first. Please refer to the ReadMe for ICU4J to set up the build environment. The steps for building the ICU4J Locale Service Provider JAR file are described below. The examples used in these steps assume the ICU4J source package is extracted to C:\icu4j on Microsoft Windows.

1. Build the ICU4J core JAR file (icu4j.jar).

  C:\icu4j>ant jar

2. Set JAVA_HOME to JDK 6.0. The ICU4J Locale Service Provider requires JDK 6.0. If you used JDK 6.0 for building the ICU4J core JAR file, this step is not necessary.

  C:\icu4j>set JAVA_HOME=C:\jdk1.6.0

3. Change directory to a subdirectory named "localespi"

4. Run ant with the default target ("jar").

  C:\icu4j\localespi>ant
  Buildfile: build.xml

  check-env-java6:

  icu4j-jar:

  compile:
      [mkdir] Created dir: C:\icu4j\localespi\classes
      [javac] Compiling 22 source files to C:\icu4j\localespi\classes

  jar:
        [jar] Building jar: C:\icu4j\localespi\icu4j-localespi.jar

  build-jar:

  BUILD SUCCESSFUL
  Total time: 5 seconds

With the above steps, icu4j-localespi.jar is created in the localespi subdirectory. To verify the ICU4J Locale Service Provider component, you can run ant with the target "check". (Note: The target "check" actually creates icu4j-localespi.jar, so you can simply run this target to build and test the component.)

  C:\icu4j\localespi>ant check
After compiling the classes used for testing, the test cases for ICU4J Locale Service Provider are executed. The test output looks like the following.
     [java] TestAll {
     [java]   BreakIteratorTest {
     [java]     TestGetInstance Passed
     [java]     TestICUEquivalent Passed
     [java]   } Passed
     [java]   CollatorTest {
     [java]     TestGetInstance Passed
     [java]     TestICUEquivalent Passed
     [java]   } Passed
     [java]   CurrencyNameTest {
     [java]     TestCurrencySymbols Passed
     [java]   } Passed
     [java]   DateFormatSymbolsTest {
     [java]     TestGetInstance Passed
     [java]     TestICUEquivalent Passed
     [java]     TestNynorsk Passed
     [java]     TestSetSymbols Passed
     [java]   } Passed
     [java]   DateFormatTest {
     [java]     TestGetInstance Passed
     [java]     TestICUEquivalent Passed
     [java]     TestThaiDigit Passed
     [java]   } Passed
     [java]   DecimalFormatSymbolsTest {
     [java]     TestGetInstance Passed
     [java]     TestICUEquivalent Passed
     [java]     TestSetSymbols Passed
     [java]   } Passed
     [java]   LocaleNameTest {
     [java]     TestCountryNames Passed
     [java]     TestLanguageNames Passed
     [java]     TestVariantNames Passed
     [java]   } Passed
     [java]   NumberFormatTest {
     [java]     TestGetInstance Passed
     [java]     TestICUEquivalent Passed
     [java]   } Passed
     [java]   TimeZoneNameTest {
     [java]     TestTimeZoneNames Passed
     [java]   } Passed
     [java] } Passed
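Besides running the test suite, a quick manual check is possible. The sketch below assumes the convention used by the test sources in this package (TestUtil.toICUExtendedLocale), where an "ICU" suffix in the locale variant requests the ICU implementation explicitly; the class name ProviderCheck is hypothetical. If the provider JAR files are not installed, the JDK implementation class is printed instead.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, for illustration only.
public class ProviderCheck {

    // Mirrors the variant convention of the test utility: xx_YY -> xx_YY_ICU,
    // xx_YY_VAR -> xx_YY_VAR_ICU. Locales already marked are returned as-is.
    public static Locale toIcuLocale(Locale locale) {
        String variant = locale.getVariant();
        if (variant.equals("ICU") || variant.endsWith("_ICU")) {
            return locale;
        }
        variant = variant.length() == 0 ? "ICU" : variant + "_ICU";
        return new Locale(locale.getLanguage(), locale.getCountry(), variant);
    }

    public static void main(String[] args) {
        Locale icuLocale = toIcuLocale(new Locale("en", "US"));
        Collator coll = Collator.getInstance(icuLocale);
        // Prints the implementation class actually selected by the JRE.
        System.out.println(icuLocale + " -> " + coll.getClass().getName());
    }
}
```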

Copyright © 2008-2009 International Business Machines Corporation and others. All Rights Reserved.
4400 North First Street, San José, CA 95193, USA

icu4j-4.2/localespi/src/0000755000175000017500000000000011361046444015100 5ustar twernertwernericu4j-4.2/localespi/src/META-INF/0000755000175000017500000000000011361046444016240 5ustar twernertwernericu4j-4.2/localespi/src/META-INF/services/0000755000175000017500000000000011361046444020063 5ustar twernertwernericu4j-4.2/localespi/src/META-INF/services/java.util.spi.TimeZoneNameProvider0000644000175000017500000000026311361046444026543 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. # icu4j time zone name com.ibm.icu.impl.javaspi.util.TimeZoneNameProviderICU icu4j-4.2/localespi/src/META-INF/services/java.text.spi.BreakIteratorProvider0000644000175000017500000000026411361046444026756 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. # icu4j break iterator com.ibm.icu.impl.javaspi.text.BreakIteratorProviderICU icu4j-4.2/localespi/src/META-INF/services/java.util.spi.CurrencyNameProvider0000644000175000017500000000026211361046444026602 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. # icu4j currency name com.ibm.icu.impl.javaspi.util.CurrencyNameProviderICU icu4j-4.2/localespi/src/META-INF/services/java.text.spi.CollatorProvider0000644000175000017500000000025111361046444025773 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. # icu4j collator com.ibm.icu.impl.javaspi.text.CollatorProviderICU icu4j-4.2/localespi/src/META-INF/services/java.text.spi.DateFormatSymbolsProvider0000644000175000017500000000027611361046444027622 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. 
# icu4j date format symbols com.ibm.icu.impl.javaspi.text.DateFormatSymbolsProviderICU icu4j-4.2/localespi/src/META-INF/services/java.text.spi.NumberFormatProvider0000644000175000017500000000026511361046444026622 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. # icu4j number formatter com.ibm.icu.impl.javaspi.text.NumberFormatProviderICU icu4j-4.2/localespi/src/META-INF/services/java.text.spi.DateFormatProvider0000644000175000017500000000026111361046444026243 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. # icu4j date formatter com.ibm.icu.impl.javaspi.text.DateFormatProviderICU icu4j-4.2/localespi/src/META-INF/services/java.text.spi.DecimalFormatSymbolsProvider0000644000175000017500000000030311361046444030272 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. # icu4j decimal format symbols com.ibm.icu.impl.javaspi.text.DecimalFormatSymbolsProviderICU icu4j-4.2/localespi/src/META-INF/services/java.util.spi.LocaleNameProvider0000644000175000017500000000025611361046444026212 0ustar twernertwerner# Copyright (C) 2008, International Business Machines Corporation and others. All Rights Reserved. 
# icu4j locale name com.ibm.icu.impl.javaspi.util.LocaleNameProviderICU icu4j-4.2/localespi/src/com/0000755000175000017500000000000011361046444015656 5ustar twernertwernericu4j-4.2/localespi/src/com/ibm/0000755000175000017500000000000011361046444016425 5ustar twernertwernericu4j-4.2/localespi/src/com/ibm/icu/0000755000175000017500000000000011361046446017207 5ustar twernertwernericu4j-4.2/localespi/src/com/ibm/icu/dev/0000755000175000017500000000000011361046446017765 5ustar twernertwernericu4j-4.2/localespi/src/com/ibm/icu/dev/test/0000755000175000017500000000000011361046446020744 5ustar twernertwernericu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/0000755000175000017500000000000011361046446022717 5ustar twernertwernericu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/LocaleNameTest.java0000644000175000017500000002023611361046446026425 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.util.Locale; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.util.ULocale; public class LocaleNameTest extends TestFmwk { public static void main(String[] args) throws Exception { new LocaleNameTest().run(args); } public void TestLanguageNames() { Locale[] locales = Locale.getAvailableLocales(); StringBuffer icuid = new StringBuffer(); for (Locale inLocale : locales) { if (TestUtil.isProblematicIBMLocale(inLocale)) { logln("Skipped " + inLocale); continue; } ULocale inULocale = ULocale.forLocale(inLocale); Locale inLocaleICU = TestUtil.toICUExtendedLocale(inLocale); for (Locale forLocale : locales) { if (forLocale.getLanguage().length() == 0) { continue; } icuid.setLength(0); icuid.append(forLocale.getLanguage()); String country = forLocale.getCountry(); String variant = forLocale.getVariant(); if (country.length() != 0) { icuid.append("_"); icuid.append(country); } if (variant.length() != 0) { if (country.length() == 0) { icuid.append("_"); } icuid.append("_"); icuid.append(variant); } ULocale forULocale = new ULocale(icuid.toString()); String icuname = ULocale.getDisplayLanguage(forULocale.getLanguage(), inULocale); if (icuname.equals(forULocale.getLanguage()) || icuname.length() == 0) { continue; } String name = forLocale.getDisplayLanguage(inLocale); if (TestUtil.isICUExtendedLocale(inLocale)) { // The name should be taken from ICU if (!name.equals(icuname)) { errln("FAIL: Language name by ICU is " + icuname + ", but got " + name + " for locale " + forLocale + " in locale " + inLocale); } } else { if (!name.equals(icuname)) { logln("INFO: Language name by JDK is " + name + ", but " + icuname + " by ICU, for locale " + forLocale + " in locale " + inLocale); } // Try explicit ICU locale (xx_yy_ICU) name = forLocale.getDisplayLanguage(inLocaleICU); if (!name.equals(icuname)) { errln("FAIL: Language name by ICU is " + 
icuname + ", but got " + name + " for locale " + forLocale + " in locale " + inLocaleICU); } } } } } public void TestCountryNames() { Locale[] locales = Locale.getAvailableLocales(); for (Locale inLocale : locales) { if (TestUtil.isProblematicIBMLocale(inLocale)) { logln("Skipped " + inLocale); continue; } ULocale inULocale = ULocale.forLocale(inLocale); Locale inLocaleICU = TestUtil.toICUExtendedLocale(inLocale); for (Locale forLocale : locales) { if (forLocale.getCountry().length() == 0) { continue; } // ULocale#forLocale preserves country always ULocale forULocale = ULocale.forLocale(forLocale); String icuname = ULocale.getDisplayCountry(forULocale.getCountry(), inULocale); if (icuname.equals(forULocale.getCountry()) || icuname.length() == 0) { continue; } String name = forLocale.getDisplayCountry(inLocale); if (TestUtil.isICUExtendedLocale(inLocale)) { // The name should be taken from ICU if (!name.equals(icuname)) { errln("FAIL: Country name by ICU is " + icuname + ", but got " + name + " for locale " + forLocale + " in locale " + inLocale); } } else { // The name might be taken from JDK if (!name.equals(icuname)) { logln("INFO: Country name by JDK is " + name + ", but " + icuname + " in ICU, for locale " + forLocale + " in locale " + inLocale); } // Try explicit ICU locale (xx_yy_ICU) name = forLocale.getDisplayCountry(inLocaleICU); if (!name.equals(icuname)) { errln("FAIL: Country name by ICU is " + icuname + ", but got " + name + " for locale " + forLocale + " in locale " + inLocaleICU); } } } } } public void TestVariantNames() { Locale[] locales = Locale.getAvailableLocales(); StringBuffer icuid = new StringBuffer(); for (Locale inLocale : locales) { if (TestUtil.isProblematicIBMLocale(inLocale)) { logln("Skipped " + inLocale); continue; } ULocale inULocale = ULocale.forLocale(inLocale); Locale inLocaleICU = TestUtil.toICUExtendedLocale(inLocale); for (Locale forLocale : locales) { if (forLocale.getVariant().length() == 0) { continue; } icuid.setLength(0); 
icuid.append(forLocale.getLanguage()); String country = forLocale.getCountry(); String variant = forLocale.getVariant(); if (country.length() != 0) { icuid.append("_"); icuid.append(country); } if (variant.length() != 0) { if (country.length() == 0) { icuid.append("_"); } icuid.append("_"); icuid.append(variant); } ULocale forULocale = new ULocale(icuid.toString()); String icuname = ULocale.getDisplayVariant(forULocale.getVariant(), inULocale); if (icuname.equals(forULocale.getVariant()) || icuname.length() == 0) { continue; } String name = forLocale.getDisplayVariant(inLocale); if (TestUtil.isICUExtendedLocale(inLocale)) { // The name should be taken from ICU if (!name.equals(icuname)) { errln("FAIL: Variant name by ICU is " + icuname + ", but got " + name + " for locale " + forLocale + " in locale " + inLocale); } } else { if (!name.equals(icuname)) { logln("INFO: Variant name by JDK is " + name + ", but " + icuname + " in ICU, for locale " + forLocale + " in locale " + inLocale); } // Try explicit ICU locale (xx_yy_ICU) name = forLocale.getDisplayVariant(inLocaleICU); if (!name.equals(icuname)) { errln("FAIL: Variant name by ICU is " + icuname + ", but got " + name + " for locale " + forLocale + " in locale " + inLocaleICU); } } } } } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/BreakIteratorTest.java0000644000175000017500000002500111361046446027156 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.text.BreakIterator; import java.util.Locale; import com.ibm.icu.dev.test.TestFmwk; public class BreakIteratorTest extends TestFmwk { public static void main(String[] args) throws Exception { new BreakIteratorTest().run(args); } private static final int CHARACTER_BRK = 0; private static final int WORD_BRK = 1; private static final int LINE_BRK = 2; private static final int SENTENCE_BRK = 3; /* * Check if getInstance returns the ICU implementation. */ public void TestGetInstance() { for (Locale loc : BreakIterator.getAvailableLocales()) { if (TestUtil.isProblematicIBMLocale(loc)) { logln("Skipped " + loc); continue; } checkGetInstance(CHARACTER_BRK, loc); checkGetInstance(WORD_BRK, loc); checkGetInstance(LINE_BRK, loc); checkGetInstance(SENTENCE_BRK, loc); } } private void checkGetInstance(int type, Locale loc) { BreakIterator brkitr = null; String method = null; switch (type) { case CHARACTER_BRK: brkitr = BreakIterator.getCharacterInstance(loc); method = "getCharacterInstance"; break; case WORD_BRK: brkitr = BreakIterator.getWordInstance(loc); method = "getWordInstance"; break; case LINE_BRK: brkitr = BreakIterator.getLineInstance(loc); method = "getLineInstance"; break; case SENTENCE_BRK: brkitr = BreakIterator.getSentenceInstance(loc); method = "getSentenceInstance"; break; default: errln("FAIL: Unknown break iterator type"); return; } boolean isIcuImpl = (brkitr instanceof com.ibm.icu.impl.jdkadapter.BreakIteratorICU); if (TestUtil.isICUExtendedLocale(loc)) { if (!isIcuImpl) { errln("FAIL: " + method + " returned JDK BreakIterator for locale " + loc); } } else { if (isIcuImpl) { logln("INFO: " + method + " returned ICU BreakIterator for locale " + loc); } BreakIterator brkitrIcu = null; Locale iculoc = TestUtil.toICUExtendedLocale(loc); switch (type) { case CHARACTER_BRK: brkitrIcu = BreakIterator.getCharacterInstance(iculoc); 
break; case WORD_BRK: brkitrIcu = BreakIterator.getWordInstance(iculoc); break; case LINE_BRK: brkitrIcu = BreakIterator.getLineInstance(iculoc); break; case SENTENCE_BRK: brkitrIcu = BreakIterator.getSentenceInstance(iculoc); break; } if (isIcuImpl) { if (!brkitr.equals(brkitrIcu)) { // BreakIterator.getXXXInstance returns a cached BreakIterator instance. // BreakIterator does not override Object#equals, so the result may not be // consistent. // logln("INFO: " + method + " returned ICU BreakIterator for locale " + loc // + ", but different from the one for locale " + iculoc); } } else { if (!(brkitrIcu instanceof com.ibm.icu.impl.jdkadapter.BreakIteratorICU)) { errln("FAIL: " + method + " returned JDK BreakIterator for locale " + iculoc); } } } } /* * Testing the behavior of text break between ICU instance and its * equivalent created via the Locale SPI framework. */ public void TestICUEquivalent() { Locale[] TEST_LOCALES = { new Locale("en", "US"), new Locale("fr", "FR"), new Locale("th", "TH"), new Locale("zh", "CN"), }; String[] TEST_DATA = { "International Components for Unicode (ICU) is an open source project of mature " + "C/C++ and Java libraries for Unicode support, software internationalization and " + "software globalization. ICU is widely portable to many operating systems and " + "environments. It gives applications the same results on all platforms and between " + "C/C++ and Java software. The ICU project is an open source development project " + "that is sponsored, supported and used by IBM and many other companies.", "L'International Components for Unicode (ICU) est un projet open source qui fourni " + "des biblioth\u00e8ques pour les langages informatique C/C++ et Java pour supporter " + "Unicode, l'internationalisation et la mondialisation des logiciels. ICU est largement " + "portable vers beaucoup de syst\u00e8mes d'exploitations et d'environnements. 
Il " + "donne aux applications les m\u00eames comportements et r\u00e9sultats sur toutes " + "les plateformes et entre les logiciels C/C++ et Java. Le projet ICU est un projet " + "dont les code sources sont disponibles qui est sponsoris\u00e9, support\u00e9 et " + "utilis\u00e9 par IBM et beaucoup d'autres entreprises.", "\u5728IBM\u7b49\u4f01\u696d\u4e2d\uff0c\u56fd\u9645\u5316\u7ecf\u5e38\u7b80\u5199" + "\u4e3aI18N (\u6216i18n\u6216I18n)\uff0c\u5176\u4e2d18\u4ee3\u8868\u4e86\u4e2d\u95f4" + "\u7701\u7565\u768418\u4e2a\u5b57\u6bcd\uff1b\u800c\u201c\u672c\u5730\u5316\u201d" + "\u540c\u53ef\u7b80\u5199\u4e3al10n\u3002\u9019\u4e24\u4e2a\u6982\u5ff5\u6709\u65f6" + "\u5408\u79f0\u5168\u7403\u5316\uff08g11n\uff09\uff0c\u4f46\u662f\u5168\u7403\u5316" + "\u7684\u6db5\u4e49\u66f4\u4e3a\u4e00\u822c\u5316\u3002\u53e6\u5916\u5076\u5c14\u4f1a" + "\u51fa\u73b0\u201cp13n\u201d\uff0c\u4ee3\u8868\u4e2a\u4eba\u5316\uff08personalization" + "\uff09\u3002", "\u0e01\u0e23\u0e38\u0e07\u0e40\u0e17\u0e1e\u0e21\u0e2b\u0e32\u0e19\u0e04\u0e23" + "\u0e43\u0e19\u0e1b\u0e31\u0e08\u0e08\u0e38\u0e1a\u0e31\u0e19\u0e40\u0e1b\u0e47" + "\u0e19\u0e28\u0e39\u0e19\u0e22\u0e4c\u0e01\u0e25\u0e32\u0e07\u0e01\u0e32\u0e23" + "\u0e1b\u0e01\u0e04\u0e23\u0e2d\u0e07 \u0e01\u0e32\u0e23\u0e28\u0e36\u0e01\u0e29" + "\u0e32 \u0e01\u0e32\u0e23\u0e04\u0e21\u0e19\u0e32\u0e04\u0e21\u0e02\u0e19\u0e2a" + "\u0e48\u0e07 \u0e01\u0e32\u0e23\u0e40\u0e07\u0e34\u0e19\u0e01\u0e32\u0e23\u0e18" + "\u0e19\u0e32\u0e04\u0e32\u0e23 \u0e01\u0e32\u0e23\u0e1e\u0e32\u0e13\u0e34\u0e0a" + "\u0e22\u0e4c \u0e01\u0e32\u0e23\u0e2a\u0e37\u0e48\u0e2d\u0e2a\u0e32\u0e23 \u0e2f" + "\u0e25\u0e2f \u0e42\u0e14\u0e22\u0e21\u0e35\u0e1e\u0e37\u0e49\u0e19\u0e17\u0e35" + "\u0e48\u0e17\u0e31\u0e49\u0e07\u0e2b\u0e21\u0e14 1,562.2 \u0e15\u0e32\u0e23\u0e32" + "\u0e07\u0e01\u0e34\u0e42\u0e25\u0e40\u0e21\u0e15\u0e23 \u0e1e\u0e34\u0e01\u0e31" + "\u0e14\u0e17\u0e32\u0e07\u0e20\u0e39\u0e21\u0e34\u0e28\u0e32\u0e2a\u0e15\u0e23" + "\u0e4c\u0e04\u0e37\u0e2d 
\u0e25\u0e30\u0e15\u0e34\u0e08\u0e39\u0e14 13\u00b0 45" + "\u2019 \u0e40\u0e2b\u0e19\u0e37\u0e2d \u0e25\u0e2d\u0e07\u0e08\u0e34\u0e08\u0e39" + "\u0e14 100\u00b0 31\u2019 \u0e15\u0e30\u0e27\u0e31\u0e19\u0e2d\u0e2d\u0e01" }; BreakIterator[] jdkBrkItrs = new BreakIterator[4]; com.ibm.icu.text.BreakIterator[] icuBrkItrs = new com.ibm.icu.text.BreakIterator[4]; for (Locale loc : TEST_LOCALES) { Locale iculoc = TestUtil.toICUExtendedLocale(loc); jdkBrkItrs[0] = BreakIterator.getCharacterInstance(iculoc); jdkBrkItrs[1] = BreakIterator.getWordInstance(iculoc); jdkBrkItrs[2] = BreakIterator.getLineInstance(iculoc); jdkBrkItrs[3] = BreakIterator.getSentenceInstance(iculoc); icuBrkItrs[0] = com.ibm.icu.text.BreakIterator.getCharacterInstance(iculoc); icuBrkItrs[1] = com.ibm.icu.text.BreakIterator.getWordInstance(iculoc); icuBrkItrs[2] = com.ibm.icu.text.BreakIterator.getLineInstance(iculoc); icuBrkItrs[3] = com.ibm.icu.text.BreakIterator.getSentenceInstance(iculoc); for (String text : TEST_DATA) { for (int i = 0; i < 4; i++) { compareBreaks(text, jdkBrkItrs[i], icuBrkItrs[i]); } } } } private void compareBreaks(String text, BreakIterator jdkBrk, com.ibm.icu.text.BreakIterator icuBrk) { jdkBrk.setText(text); icuBrk.setText(text); // Forward int jidx = jdkBrk.first(); int iidx = icuBrk.first(); if (jidx != iidx) { errln("FAIL: Different first boundaries (jdk=" + jidx + ",icu=" + iidx + ") for text:\n" + text); } while (true) { jidx = jdkBrk.next(); iidx = icuBrk.next(); if (jidx != iidx) { errln("FAIL: Different boundaries (jdk=" + jidx + ",icu=" + iidx + "direction=forward) for text:\n" + text); } if (jidx == BreakIterator.DONE) { break; } } // Backward jidx = jdkBrk.last(); iidx = icuBrk.last(); if (jidx != iidx) { errln("FAIL: Different last boundaries (jdk=" + jidx + ",icu=" + iidx + ") for text:\n" + text); } while (true) { jidx = jdkBrk.previous(); iidx = icuBrk.previous(); if (jidx != iidx) { errln("FAIL: Different boundaries (jdk=" + jidx + ",icu=" + iidx + 
"direction=backward) for text:\n" + text); } if (jidx == BreakIterator.DONE) { break; } } } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/TestUtil.java0000644000175000017500000000677611361046446025357 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008-2009, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.util.Locale; public class TestUtil { private static final String ICU_VARIANT = "ICU"; private static final String ICU_VARIANT_SUFFIX = "_ICU"; public static Locale toICUExtendedLocale(Locale locale) { if (isICUExtendedLocale(locale)) { return locale; } String variant = locale.getVariant(); variant = variant.length() == 0 ? ICU_VARIANT : variant + ICU_VARIANT_SUFFIX; return new Locale(locale.getLanguage(), locale.getCountry(), variant); } public static boolean isICUExtendedLocale(Locale locale) { String variant = locale.getVariant(); if (variant.equals(ICU_VARIANT) || variant.endsWith(ICU_VARIANT_SUFFIX)) { return true; } return false; } public static boolean equals(Object o1, Object o2) { if (o1 == null && o2 == null) { return true; } if (o1 == null || o2 == null) { return false; } return o1.equals(o2); } private static final boolean SUNJRE; private static final boolean IBMJRE; static { String javaVendor = System.getProperty("java.vendor"); if (javaVendor != null) { if (javaVendor.indexOf("Sun") >= 0) { SUNJRE = true; IBMJRE = false; } else if (javaVendor.indexOf("IBM") >= 0) { SUNJRE = false; IBMJRE = true; } else { SUNJRE = false; IBMJRE = false; } } else { SUNJRE = false; IBMJRE = false; } } public static boolean isSUNJRE() { return SUNJRE; } public static boolean isIBMJRE() { return IBMJRE; } /* * Ticket#6368 * * The ICU4J locale spi test cases reports many errors on IBM Java 6. 
There are two kinds * of problems observed and both of them look like implementation problems in IBM Java 6. * * - When a locale has variant field (for example, sr_RS_Cyrl, de_DE_PREEURO), adding ICU * suffix in the variant field (for example, sr_RS_Cyrl_ICU, de_DE_PREEURO_ICU) has no effects. * For these locales, IBM JRE 6 ignores installed Locale providers. * * - For "sh" sublocales with "ICU" variant (for example, sh__ICU, sh_CS_ICU), IBM JRE 6 also * ignores installed ICU locale providers. Probably, "sh" is internally mapped to "sr_RS_Cyrl" * internally before locale look up. * * For now, we exclude these problematic locales from locale spi test cases on IBM Java 6. */ public static boolean isProblematicIBMLocale(Locale loc) { if (!isIBMJRE()) { return false; } if (loc.getLanguage().equals("sh")) { return true; } String variant = loc.getVariant(); if (variant.startsWith("EURO") || variant.startsWith("PREEURO") || variant.startsWith("Cyrl") || variant.startsWith("Latn")) { return true; } return false; } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/CurrencyNameTest.java0000644000175000017500000000673111361046446027024 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.util.Currency; import java.util.HashSet; import java.util.Locale; import com.ibm.icu.dev.test.TestFmwk; public class CurrencyNameTest extends TestFmwk { public static void main(String[] args) throws Exception { new CurrencyNameTest().run(args); } public void TestCurrencySymbols() { // Make a set of unique currencies HashSet<Currency> currencies = new HashSet<Currency>(); for (Locale l : Locale.getAvailableLocales()) { if (l.getCountry().length() == 0) { continue; } Currency currency = Currency.getInstance(l); if (currency == null) { continue; } currencies.add(currency); } for (Currency currency : currencies) { String currencyCode = currency.getCurrencyCode(); com.ibm.icu.util.Currency currencyIcu = com.ibm.icu.util.Currency.getInstance(currencyCode); if (currencyIcu == null) { logln("INFO: Currency code " + currencyCode + " is not supported by ICU"); continue; } for (Locale loc : Locale.getAvailableLocales()) { if (TestUtil.isProblematicIBMLocale(loc)) { logln("Skipped " + loc); continue; } String curSymbol = currency.getSymbol(loc); String curSymbolIcu = currencyIcu.getSymbol(loc); if (curSymbolIcu.equals(currencyCode)) { // No data in ICU if (!curSymbol.equals(currencyCode)) { logln("INFO: JDK has currency symbol " + curSymbol + " for locale " + loc + ", but ICU does not"); } continue; } if (TestUtil.isICUExtendedLocale(loc)) { if (!curSymbol.equals(curSymbolIcu)) { errln("FAIL: Currency symbol for " + currencyCode + " by ICU is " + curSymbolIcu + ", but got " + curSymbol + " in locale " + loc); } } else { if (!curSymbol.equals(curSymbolIcu)) { logln("INFO: Currency symbol for " + currencyCode + " by ICU is " + curSymbolIcu + ", but " + curSymbol + " by JDK in locale " + loc); } // Try explicit ICU locale (xx_yy_ICU) Locale locIcu = TestUtil.toICUExtendedLocale(loc); curSymbol = currency.getSymbol(locIcu); 
if (!curSymbol.equals(curSymbolIcu)) { errln("FAIL: Currency symbol for " + currencyCode + " by ICU is " + curSymbolIcu + ", but got " + curSymbol + " in locale " + locIcu); } } } } } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/CollatorTest.java0000644000175000017500000001231311361046446026201 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.text.CollationKey; import java.text.Collator; import java.util.Locale; import com.ibm.icu.dev.test.TestFmwk; public class CollatorTest extends TestFmwk { public static void main(String[] args) throws Exception { new CollatorTest().run(args); } /* * Check if getInstance returns the ICU implementation. */ public void TestGetInstance() { for (Locale loc : Collator.getAvailableLocales()) { if (TestUtil.isProblematicIBMLocale(loc)) { logln("Skipped " + loc); continue; } Collator coll = Collator.getInstance(loc); boolean isIcuImpl = (coll instanceof com.ibm.icu.impl.jdkadapter.CollatorICU); if (TestUtil.isICUExtendedLocale(loc)) { if (!isIcuImpl) { errln("FAIL: getInstance returned JDK Collator for locale " + loc); } } else { if (isIcuImpl) { logln("INFO: getInstance returned ICU Collator for locale " + loc); } Locale iculoc = TestUtil.toICUExtendedLocale(loc); Collator collIcu = Collator.getInstance(iculoc); if (isIcuImpl) { if (!coll.equals(collIcu)) { errln("FAIL: getInstance returned ICU Collator for locale " + loc + ", but different from the one for locale " + iculoc); } } else { if (!(collIcu instanceof com.ibm.icu.impl.jdkadapter.CollatorICU)) { errln("FAIL: getInstance returned JDK Collator for locale " + iculoc); } } } } } /* * Testing the behavior of text collation between ICU instance and its * equivalent created via the 
Locale SPI framework. */ public void TestICUEquivalent() { Locale[] TEST_LOCALES = { new Locale("en", "US"), new Locale("de", "DE"), new Locale("ja", "JP"), }; String[] TEST_DATA = { "Cafe", "cafe", "CAFE", "caf\u00e9", "cafe\u0301", "\u304b\u3075\u3047", "\u304c\u3075\u3047", "\u304b\u3075\u3048", "\u30ab\u30d5\u30a7", "\uff76\uff8c\uff6a", }; for (Locale loc : TEST_LOCALES) { Locale iculoc = TestUtil.toICUExtendedLocale(loc); Collator jdkColl = Collator.getInstance(iculoc); com.ibm.icu.text.Collator icuColl = com.ibm.icu.text.Collator.getInstance(loc); // Default strength = TERTIARY checkCollation(jdkColl, icuColl, TEST_DATA, "TERTIARY", loc); // PRIMARY jdkColl.setStrength(Collator.PRIMARY); icuColl.setStrength(com.ibm.icu.text.Collator.PRIMARY); checkCollation(jdkColl, icuColl, TEST_DATA, "PRIMARY", loc); // SECONDARY jdkColl.setStrength(Collator.SECONDARY); icuColl.setStrength(com.ibm.icu.text.Collator.SECONDARY); checkCollation(jdkColl, icuColl, TEST_DATA, "SECONDARY", loc); } } private void checkCollation(Collator jdkColl, com.ibm.icu.text.Collator icuColl, String[] data, String strength, Locale loc) { for (String text1 : data) { for (String text2 : data) { int jdkRes = jdkColl.compare(text1, text2); int icuRes = icuColl.compare(text1, text2); if (jdkRes != icuRes) { errln("FAIL: Different results for [text1=" + text1 + ",text2=" + text2 + "] for locale " + loc + " with strength " + strength + " - Result (jdk=" + jdkRes + ",icu=" + icuRes + ")"); } // Evaluate collationKey CollationKey jdkKey1 = jdkColl.getCollationKey(text1); CollationKey jdkKey2 = jdkColl.getCollationKey(text2); com.ibm.icu.text.CollationKey icuKey1 = icuColl.getCollationKey(text1); com.ibm.icu.text.CollationKey icuKey2 = icuColl.getCollationKey(text2); int jdkKeyRes = jdkKey1.compareTo(jdkKey2); int icuKeyRes = icuKey1.compareTo(icuKey2); if (jdkKeyRes != icuKeyRes) { errln("FAIL: Different collationKey comparison results for [text1=" + text1 + ",text2=" + text2 + "] for locale " + loc 
+ " with strength " + strength + " - Result (jdk=" + jdkKeyRes + ",icu=" + icuKeyRes + ")"); } } } } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/TestAll.java0000644000175000017500000000166011361046446025135 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import com.ibm.icu.dev.test.TestFmwk.TestGroup; public class TestAll extends TestGroup { public static void main(String[] args) { new TestAll().run(args); } public TestAll() { super(new String[] { "BreakIteratorTest", "CollatorTest", "DateFormatSymbolsTest", "DateFormatTest", "DecimalFormatSymbolsTest", "NumberFormatTest", "CurrencyNameTest", "LocaleNameTest", "TimeZoneNameTest", }); } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/TimeZoneNameTest.java0000644000175000017500000001341211361046446026756 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008-2009, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.util.Locale; import java.util.TimeZone; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.util.ULocale; public class TimeZoneNameTest extends TestFmwk { public static void main(String[] args) throws Exception { new TimeZoneNameTest().run(args); } public void TestTimeZoneNames() { Locale[] locales = Locale.getAvailableLocales(); String[] tzids = TimeZone.getAvailableIDs(); for (Locale loc : locales) { boolean warningOnly = false; if (TestUtil.isProblematicIBMLocale(loc)) { warningOnly = true; } else if ((TestUtil.isSUNJRE() || TestUtil.isIBMJRE()) && loc.toString().startsWith("eu")) { warningOnly = true; } for (String tzid : tzids) { TimeZone tz = TimeZone.getTimeZone(tzid); com.ibm.icu.util.TimeZone tzIcu = com.ibm.icu.util.TimeZone.getTimeZone(tzid); checkDisplayNamePair(TimeZone.SHORT, tz, tzIcu, loc, warningOnly); checkDisplayNamePair(TimeZone.LONG, tz, tzIcu, loc, warningOnly); } } } private void checkDisplayNamePair(int style, TimeZone tz, com.ibm.icu.util.TimeZone icuTz, Locale loc, boolean warnOnly) { /* Note: There are two problems here. * * It looks Java 6 requires a TimeZoneNameProvider to return both standard name and daylight name * for a zone. If the provider implementation only returns either of them, Java 6 also ignore * the other. In ICU, there are zones which do not have daylight names, especially zones which * do not use daylight time. This test case does not check a standard name if its daylight name * is not available because of the Java 6 implementation problem. * * Another problem is that ICU always use a standard name for a zone which does not use daylight * saving time even daylight name is requested. 
*/ String icuStdName = getIcuDisplayName(icuTz, false, style, loc); String icuDstName = getIcuDisplayName(icuTz, true, style, loc); if (icuStdName != null && icuDstName != null && !icuStdName.equals(icuDstName)) { checkDisplayName(false, style, tz, loc, icuStdName, warnOnly); checkDisplayName(true, style, tz, loc, icuDstName, warnOnly); } } private String getIcuDisplayName(com.ibm.icu.util.TimeZone icuTz, boolean daylight, int style, Locale loc) { ULocale uloc = ULocale.forLocale(loc); boolean shortStyle = (style == TimeZone.SHORT); String icuname = icuTz.getDisplayName(daylight, (shortStyle ? com.ibm.icu.util.TimeZone.SHORT : com.ibm.icu.util.TimeZone.LONG), uloc); int numDigits = 0; for (int i = 0; i < icuname.length(); i++) { if (UCharacter.isDigit(icuname.charAt(i))) { numDigits++; } } if (numDigits >= 3) { // ICU does not have the localized name return null; } return icuname; } private void checkDisplayName(boolean daylight, int style, TimeZone tz, Locale loc, String icuname, boolean warnOnly) { String styleStr = (style == TimeZone.SHORT) ? 
"SHORT" : "LONG"; String name = tz.getDisplayName(daylight, style, loc); if (TestUtil.isICUExtendedLocale(loc)) { // The name should be taken from ICU if (!name.equals(icuname)) { if (warnOnly) { logln("WARNING: TimeZone name by ICU is " + icuname + ", but got " + name + " for time zone " + tz.getID() + " in locale " + loc + " (daylight=" + daylight + ", style=" + styleStr + ")"); } else { errln("FAIL: TimeZone name by ICU is " + icuname + ", but got " + name + " for time zone " + tz.getID() + " in locale " + loc + " (daylight=" + daylight + ", style=" + styleStr + ")"); } } } else { if (!name.equals(icuname)) { logln("INFO: TimeZone name by ICU is " + icuname + ", but got " + name + " for time zone " + tz.getID() + " in locale " + loc + " (daylight=" + daylight + ", style=" + styleStr + ")"); } // Try explicit ICU locale (xx_yy_ICU) Locale icuLoc = TestUtil.toICUExtendedLocale(loc); name = tz.getDisplayName(daylight, style, icuLoc); if (!name.equals(icuname)) { if (warnOnly) { logln("WARNING: TimeZone name by ICU is " + icuname + ", but got " + name + " for time zone " + tz.getID() + " in locale " + icuLoc + " (daylight=" + daylight + ", style=" + styleStr + ")"); } else { errln("FAIL: TimeZone name by ICU is " + icuname + ", but got " + name + " for time zone " + tz.getID() + " in locale " + icuLoc + " (daylight=" + daylight + ", style=" + styleStr + ")"); } } } } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/DecimalFormatSymbolsTest.java0000644000175000017500000002075411361046446030512 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.text.DecimalFormatSymbols; import java.util.Currency; import java.util.Locale; import com.ibm.icu.dev.test.TestFmwk; public class DecimalFormatSymbolsTest extends TestFmwk { public static void main(String[] args) throws Exception { new DecimalFormatSymbolsTest().run(args); } /* * Check if getInstance returns the ICU implementation. */ public void TestGetInstance() { for (Locale loc : DecimalFormatSymbols.getAvailableLocales()) { if (TestUtil.isProblematicIBMLocale(loc)) { logln("Skipped " + loc); continue; } DecimalFormatSymbols decfs = DecimalFormatSymbols.getInstance(loc); boolean isIcuImpl = (decfs instanceof com.ibm.icu.impl.jdkadapter.DecimalFormatSymbolsICU); if (TestUtil.isICUExtendedLocale(loc)) { if (!isIcuImpl) { errln("FAIL: getInstance returned JDK DecimalFormatSymbols for locale " + loc); } } else { if (isIcuImpl) { logln("INFO: getInstance returned ICU DecimalFormatSymbols for locale " + loc); } Locale iculoc = TestUtil.toICUExtendedLocale(loc); DecimalFormatSymbols decfsIcu = DecimalFormatSymbols.getInstance(iculoc); if (isIcuImpl) { if (!decfs.equals(decfsIcu)) { errln("FAIL: getInstance returned ICU DecimalFormatSymbols for locale " + loc + ", but different from the one for locale " + iculoc); } } else { if (!(decfsIcu instanceof com.ibm.icu.impl.jdkadapter.DecimalFormatSymbolsICU)) { errln("FAIL: getInstance returned JDK DecimalFormatSymbols for locale " + iculoc); } } } } } /* * Testing the contents of DecimalFormatSymbols between ICU instance and its * equivalent created via the Locale SPI framework. 
     */
    public void TestICUEquivalent() {
        Locale[] TEST_LOCALES = {
                new Locale("en", "US"),
                new Locale("pt", "BR"),
                new Locale("ko", "KR"),
        };

        for (Locale loc : TEST_LOCALES) {
            Locale iculoc = TestUtil.toICUExtendedLocale(loc);
            DecimalFormatSymbols jdkDecfs = DecimalFormatSymbols.getInstance(iculoc);
            com.ibm.icu.text.DecimalFormatSymbols icuDecfs = com.ibm.icu.text.DecimalFormatSymbols.getInstance(loc);

            Currency jdkCur = jdkDecfs.getCurrency();
            com.ibm.icu.util.Currency icuCur = icuDecfs.getCurrency();
            // Fail if exactly one side is null, or if both are set but the currency codes differ.
            // Two null currencies are treated as equal (the original check threw a
            // NullPointerException in that case).
            if ((jdkCur == null) != (icuCur == null)
                    || (jdkCur != null && !jdkCur.getCurrencyCode().equals(icuCur.getCurrencyCode()))) {
                errln("FAIL: Different results returned by getCurrency for locale " + loc);
            }

            checkEquivalence(jdkDecfs.getCurrencySymbol(),
                    icuDecfs.getCurrencySymbol(), loc, "getCurrencySymbol");
            checkEquivalence(jdkDecfs.getDecimalSeparator(),
                    icuDecfs.getDecimalSeparator(), loc, "getDecimalSeparator");
            checkEquivalence(jdkDecfs.getDigit(),
                    icuDecfs.getDigit(), loc, "getDigit");
            checkEquivalence(jdkDecfs.getExponentSeparator(),
                    icuDecfs.getExponentSeparator(), loc, "getExponentSeparator");
            checkEquivalence(jdkDecfs.getGroupingSeparator(),
                    icuDecfs.getGroupingSeparator(), loc, "getGroupingSeparator");
            checkEquivalence(jdkDecfs.getInfinity(),
                    icuDecfs.getInfinity(), loc, "getInfinity");
            checkEquivalence(jdkDecfs.getInternationalCurrencySymbol(),
                    icuDecfs.getInternationalCurrencySymbol(), loc, "getInternationalCurrencySymbol");
            checkEquivalence(jdkDecfs.getMinusSign(),
                    icuDecfs.getMinusSign(), loc, "getMinusSign");
            checkEquivalence(jdkDecfs.getMonetaryDecimalSeparator(),
                    icuDecfs.getMonetaryDecimalSeparator(), loc, "getMonetaryDecimalSeparator");
            checkEquivalence(jdkDecfs.getNaN(),
                    icuDecfs.getNaN(), loc, "getNaN");
            checkEquivalence(jdkDecfs.getPatternSeparator(),
                    icuDecfs.getPatternSeparator(), loc, "getPatternSeparator");
            checkEquivalence(jdkDecfs.getPercent(),
                    icuDecfs.getPercent(), loc, "getPercent");
            checkEquivalence(jdkDecfs.getPerMill(),
                    icuDecfs.getPerMill(), loc, "getPerMill");
            checkEquivalence(jdkDecfs.getZeroDigit(),
                    icuDecfs.getZeroDigit(), loc, "getZeroDigit");
        }
    }

    private void checkEquivalence(Object jo, Object io, Locale loc, String method) {
        if (!jo.equals(io)) {
            errln("FAIL: Different results returned by " + method + " for locale "
                    + loc + " (jdk=" + jo + ",icu=" + io + ")");
        }
    }

    /*
     * Testing setters
     */
    public void TestSetSymbols() {
        // ICU's JDK DecimalFormatSymbols implementation for de_DE locale
        DecimalFormatSymbols decfs = DecimalFormatSymbols.getInstance(new Locale("de", "DE", "ICU"));

        // en_US is supported by JDK, so this is the JDK's own DecimalFormatSymbols
        Locale loc = new Locale("en", "US");
        DecimalFormatSymbols decfsEnUS = DecimalFormatSymbols.getInstance(loc);

        // Copying over all symbols
        decfs.setCurrency(decfsEnUS.getCurrency());
        decfs.setCurrencySymbol(decfsEnUS.getCurrencySymbol());
        decfs.setDecimalSeparator(decfsEnUS.getDecimalSeparator());
        decfs.setDigit(decfsEnUS.getDigit());
        decfs.setExponentSeparator(decfsEnUS.getExponentSeparator());
        decfs.setGroupingSeparator(decfsEnUS.getGroupingSeparator());
        decfs.setInfinity(decfsEnUS.getInfinity());
        decfs.setInternationalCurrencySymbol(decfsEnUS.getInternationalCurrencySymbol());
        decfs.setMinusSign(decfsEnUS.getMinusSign());
        decfs.setMonetaryDecimalSeparator(decfsEnUS.getMonetaryDecimalSeparator());
        decfs.setNaN(decfsEnUS.getNaN());
        decfs.setPatternSeparator(decfsEnUS.getPatternSeparator());
        decfs.setPercent(decfsEnUS.getPercent());
        decfs.setPerMill(decfsEnUS.getPerMill());
        decfs.setZeroDigit(decfsEnUS.getZeroDigit());

        // Check
        Currency cur = decfs.getCurrency();
        Currency curEnUS = decfsEnUS.getCurrency();
        // Fail if exactly one side is null, or if both are set but differ.
        // Two null currencies are treated as equal instead of causing a NullPointerException.
        if ((cur == null) != (curEnUS == null)
                || (cur != null && !cur.equals(curEnUS))) {
            errln("FAIL: Different results returned by getCurrency");
        }

        checkEquivalence(decfs.getCurrencySymbol(),
                decfsEnUS.getCurrencySymbol(), loc, "getCurrencySymbol");
        checkEquivalence(decfs.getDecimalSeparator(),
decfsEnUS.getDecimalSeparator(), loc, "getDecimalSeparator"); checkEquivalence(decfs.getDigit(), decfsEnUS.getDigit(), loc, "getDigit"); checkEquivalence(decfs.getExponentSeparator(), decfsEnUS.getExponentSeparator(), loc, "getExponentSeparator"); checkEquivalence(decfs.getGroupingSeparator(), decfsEnUS.getGroupingSeparator(), loc, "getGroupingSeparator"); checkEquivalence(decfs.getInfinity(), decfsEnUS.getInfinity(), loc, "getInfinity"); checkEquivalence(decfs.getInternationalCurrencySymbol(), decfsEnUS.getInternationalCurrencySymbol(), loc, "getInternationalCurrencySymbol"); checkEquivalence(decfs.getMinusSign(), decfsEnUS.getMinusSign(), loc, "getMinusSign"); checkEquivalence(decfs.getMonetaryDecimalSeparator(), decfsEnUS.getMonetaryDecimalSeparator(), loc, "getMonetaryDecimalSeparator"); checkEquivalence(decfs.getNaN(), decfsEnUS.getNaN(), loc, "getNaN"); checkEquivalence(decfs.getPatternSeparator(), decfsEnUS.getPatternSeparator(), loc, "getPatternSeparator"); checkEquivalence(decfs.getPercent(), decfsEnUS.getPercent(), loc, "getPercent"); checkEquivalence(decfs.getPerMill(), decfsEnUS.getPerMill(), loc, "getPerMill"); checkEquivalence(decfs.getZeroDigit(), decfsEnUS.getZeroDigit(), loc, "getZeroDigit"); } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/NumberFormatTest.java0000644000175000017500000002745511361046446027040 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.math.BigDecimal; import java.math.BigInteger; import java.text.NumberFormat; import java.text.ParseException; import java.util.Locale; import com.ibm.icu.dev.test.TestFmwk; public class NumberFormatTest extends TestFmwk { public static void main(String[] args) throws Exception { new NumberFormatTest().run(args); } private static final int DEFAULT_TYPE = 0; private static final int NUMBER_TYPE = 1; private static final int INTEGER_TYPE = 2; private static final int PERCENT_TYPE = 3; private static final int CURRENCY_TYPE = 4; /* * Check if getInstance returns the ICU implementation. */ public void TestGetInstance() { for (Locale loc : NumberFormat.getAvailableLocales()) { if (TestUtil.isProblematicIBMLocale(loc)) { logln("Skipped " + loc); continue; } checkGetInstance(DEFAULT_TYPE, loc); checkGetInstance(NUMBER_TYPE, loc); checkGetInstance(INTEGER_TYPE, loc); checkGetInstance(PERCENT_TYPE, loc); checkGetInstance(CURRENCY_TYPE, loc); } } private void checkGetInstance(int type, Locale loc) { NumberFormat nf; String[] method = new String[1]; nf = getJDKInstance(type, loc, method); boolean isIcuImpl = (nf instanceof com.ibm.icu.impl.jdkadapter.DecimalFormatICU) || (nf instanceof com.ibm.icu.impl.jdkadapter.NumberFormatICU); if (TestUtil.isICUExtendedLocale(loc)) { if (!isIcuImpl) { errln("FAIL: " + method[0] + " returned JDK NumberFormat for locale " + loc); } } else { if (isIcuImpl) { logln("INFO: " + method[0] + " returned ICU NumberFormat for locale " + loc); } Locale iculoc = TestUtil.toICUExtendedLocale(loc); NumberFormat nfIcu = null; nfIcu = getJDKInstance(type, iculoc, null); if (isIcuImpl) { if (!nf.equals(nfIcu)) { errln("FAIL: " + method[0] + " returned ICU NumberFormat for locale " + loc + ", but different from the one for locale " + iculoc); } } else { if (!(nfIcu instanceof com.ibm.icu.impl.jdkadapter.DecimalFormatICU) && 
!(nfIcu instanceof com.ibm.icu.impl.jdkadapter.NumberFormatICU)) { errln("FAIL: " + method[0] + " returned JDK NumberFormat for locale " + iculoc); } } } } private NumberFormat getJDKInstance(int type, Locale loc, String[] methodName) { NumberFormat nf = null; String method = null; switch (type) { case DEFAULT_TYPE: nf = NumberFormat.getInstance(loc); method = "getInstance"; break; case NUMBER_TYPE: nf = NumberFormat.getNumberInstance(loc); method = "getNumberInstance"; break; case INTEGER_TYPE: nf = NumberFormat.getIntegerInstance(loc); method = "getIntegerInstance"; break; case PERCENT_TYPE: nf = NumberFormat.getPercentInstance(loc); method = "getPercentInstance"; break; case CURRENCY_TYPE: nf = NumberFormat.getCurrencyInstance(loc); method = "getCurrencyInstance"; break; } if (methodName != null) { methodName[0] = method; } return nf; } private com.ibm.icu.text.NumberFormat getICUInstance(int type, Locale loc, String[] methodName) { com.ibm.icu.text.NumberFormat icunf = null; String method = null; switch (type) { case DEFAULT_TYPE: icunf = com.ibm.icu.text.NumberFormat.getInstance(loc); method = "getInstance"; break; case NUMBER_TYPE: icunf = com.ibm.icu.text.NumberFormat.getNumberInstance(loc); method = "getNumberInstance"; break; case INTEGER_TYPE: icunf = com.ibm.icu.text.NumberFormat.getIntegerInstance(loc); method = "getIntegerInstance"; break; case PERCENT_TYPE: icunf = com.ibm.icu.text.NumberFormat.getPercentInstance(loc); method = "getPercentInstance"; break; case CURRENCY_TYPE: icunf = com.ibm.icu.text.NumberFormat.getCurrencyInstance(loc); method = "getCurrencyInstance"; break; } if (methodName != null) { methodName[0] = method; } return icunf; } /* * Testing the behavior of number format between ICU instance and its * equivalent created via the Locale SPI framework. 
     */
    public void TestICUEquivalent() {
        Locale[] TEST_LOCALES = {
                new Locale("en", "US"),
                new Locale("de", "DE"),
                new Locale("zh"),
        };

        long[] TEST_LONGS = {
                40L,
                -1578L,
                112233445566778899L,
        };

        double[] TEST_DOUBLES = {
                0.0451D,
                -1.679D,
                124578.369D,
        };

        Object[] TEST_NUMBERS = {
                Byte.valueOf((byte)13),
                Integer.valueOf(3961),
                Long.valueOf(-3451237890000L),
                Float.valueOf(1.754F),
                Double.valueOf(-129.942362353D),
                new BigInteger("-15253545556575859505"),
                new BigDecimal("3.14159265358979323846264338"),
        };

        String[] methodName = new String[1];
        for (Locale loc : TEST_LOCALES) {
            for (int type = 0; type <= 4; type++) {
                Locale iculoc = TestUtil.toICUExtendedLocale(loc);
                NumberFormat nf = getJDKInstance(type, iculoc, methodName);
                com.ibm.icu.text.NumberFormat icunf = getICUInstance(type, loc, null);

                String s1, s2;
                Number n1, n2;
                boolean pe1, pe2;
                for (long l : TEST_LONGS) {
                    // Format
                    s1 = nf.format(l);
                    s2 = icunf.format(l);

                    if (!s1.equals(s2)) {
                        errln("FAIL: Different results for formatting long " + l + " by NumberFormat("
                                + methodName[0] + ") in locale " + loc + " - JDK:" + s1 + " ICU:" + s2);
                    }

                    // Parse the formatted strings back and compare the outcomes
                    pe1 = false;
                    n1 = n2 = null;
                    try {
                        n1 = nf.parse(s1);
                    } catch (ParseException e) {
                        pe1 = true;
                    }
                    pe2 = false;
                    try {
                        n2 = icunf.parse(s2);
                    } catch (ParseException e) {
                        pe2 = true;
                    }
                    if ((pe1 && !pe2) || (!pe1 && pe2)) {
                        errln("FAIL: ParseException thrown by " + (pe1 ? "JDK" : "ICU")
                                + " NumberFormat(" + methodName[0] + ") for parsing long " + l
                                + " in locale " + loc);
                    } else if (!pe1 && !pe2 && !n1.equals(n2)) {
                        errln("FAIL: Different results for parsing long " + l + " by NumberFormat("
                                + methodName[0] + ") in locale " + loc + " - JDK:" + n1 + " ICU:" + n2);
                    } else if (pe1 && pe2) {
                        logln("INFO: ParseException thrown by both JDK and ICU NumberFormat("
                                + methodName[0] + ") for parsing long " + l + " in locale " + loc);
                    }
                }

                for (double d : TEST_DOUBLES) {
                    // Format
                    s1 = nf.format(d);
                    s2 = icunf.format(d);

                    if (!s1.equals(s2)) {
                        errln("FAIL: Different results for formatting double " + d + " by NumberFormat("
                                + methodName[0] + ") in locale " + loc + " - JDK:" + s1 + " ICU:" + s2);
                    }

                    // Parse the formatted strings back and compare the outcomes
                    pe1 = false;
                    n1 = n2 = null;
                    try {
                        n1 = nf.parse(s1);
                    } catch (ParseException e) {
                        pe1 = true;
                    }
                    pe2 = false;
                    try {
                        n2 = icunf.parse(s2);
                    } catch (ParseException e) {
                        pe2 = true;
                    }
                    if ((pe1 && !pe2) || (!pe1 && pe2)) {
                        errln("FAIL: ParseException thrown by " + (pe1 ? "JDK" : "ICU")
                                + " NumberFormat(" + methodName[0] + ") for parsing double " + d
                                + " in locale " + loc);
                    } else if (!pe1 && !pe2 && !n1.equals(n2)) {
                        errln("FAIL: Different results for parsing double " + d + " by NumberFormat("
                                + methodName[0] + ") in locale " + loc + " - JDK:" + n1 + " ICU:" + n2);
                    } else if (pe1 && pe2) {
                        logln("INFO: ParseException thrown by both JDK and ICU NumberFormat("
                                + methodName[0] + ") for parsing double " + d + " in locale " + loc);
                    }
                }

                for (Object o : TEST_NUMBERS) {
                    // Format
                    s1 = nf.format(o);
                    s2 = icunf.format(o);

                    if (!s1.equals(s2)) {
                        errln("FAIL: Different results for formatting " + o.getClass().getName() + " by NumberFormat("
                                + methodName[0] + ") in locale " + loc + " - JDK:" + s1 + " ICU:" + s2);
                    }

                    // Parse the formatted strings back and compare the outcomes
                    pe1 = false;
                    n1 = n2 = null;
                    try {
                        n1 = nf.parse(s1);
                    } catch (ParseException e) {
                        pe1 = true;
                    }
                    pe2 = false;
                    try {
                        n2 = icunf.parse(s2);
                    } catch (ParseException e) {
                        pe2 = true;
                    }
                    if ((pe1 && !pe2) || (!pe1 && pe2)) {
                        errln("FAIL: ParseException thrown by " + (pe1 ?
"JDK" : "ICU") + " NumberFormat(" + methodName[0] + ") for parsing " + o.getClass().getName() + " in locale " + loc); } else if (!pe1 && !pe2 && !n1.equals(n2)) { errln("FAIL: Different results for parsing " + o.getClass().getName() + " by NumberFormat(" + methodName[0] + ") in locale " + loc + " - JDK:" + n1 + " ICU:" + n2); } else if (pe1 && pe2) { logln("INFO: ParseException thrown by both JDK and ICU NumberFormat(" + methodName[0] + ") for parsing " + o.getClass().getName() + " in locale " + loc); } } } } } } icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/DateFormatSymbolsTest.java0000644000175000017500000002033511361046446030024 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.localespi; import java.text.DateFormatSymbols; import java.util.Locale; import com.ibm.icu.dev.test.TestFmwk; public class DateFormatSymbolsTest extends TestFmwk { public static void main(String[] args) throws Exception { new DateFormatSymbolsTest().run(args); } /* * Check if getInstance returns the ICU implementation. 
*/ public void TestGetInstance() { for (Locale loc : DateFormatSymbols.getAvailableLocales()) { if (TestUtil.isProblematicIBMLocale(loc)) { logln("Skipped " + loc); continue; } DateFormatSymbols dfs = DateFormatSymbols.getInstance(loc); boolean isIcuImpl = (dfs instanceof com.ibm.icu.impl.jdkadapter.DateFormatSymbolsICU); if (TestUtil.isICUExtendedLocale(loc)) { if (!isIcuImpl) { errln("FAIL: getInstance returned JDK DateFormatSymbols for locale " + loc); } } else { if (isIcuImpl) { logln("INFO: getInstance returned ICU DateFormatSymbols for locale " + loc); } Locale iculoc = TestUtil.toICUExtendedLocale(loc); DateFormatSymbols dfsIcu = DateFormatSymbols.getInstance(iculoc); if (isIcuImpl) { if (!dfs.equals(dfsIcu)) { errln("FAIL: getInstance returned ICU DateFormatSymbols for locale " + loc + ", but different from the one for locale " + iculoc); } } else { if (!(dfsIcu instanceof com.ibm.icu.impl.jdkadapter.DateFormatSymbolsICU)) { errln("FAIL: getInstance returned JDK DateFormatSymbols for locale " + iculoc); } } } } } /* * Testing the contents of DateFormatSymbols between ICU instance and its * equivalent created via the Locale SPI framework. 
     */
    public void TestICUEquivalent() {
        Locale[] TEST_LOCALES = {
                new Locale("en", "US"),
                new Locale("es", "ES"),
                new Locale("ja", "JP", "JP"),
                new Locale("th", "TH"),
        };

        for (Locale loc : TEST_LOCALES) {
            Locale iculoc = TestUtil.toICUExtendedLocale(loc);
            DateFormatSymbols jdkDfs = DateFormatSymbols.getInstance(iculoc);
            com.ibm.icu.text.DateFormatSymbols icuDfs = com.ibm.icu.text.DateFormatSymbols.getInstance(loc);

            compareArrays(jdkDfs.getAmPmStrings(), icuDfs.getAmPmStrings(), loc, "getAmPmStrings");
            compareArrays(jdkDfs.getEras(), icuDfs.getEras(), loc, "getEras");
            compareArrays(jdkDfs.getMonths(), icuDfs.getMonths(), loc, "getMonths");
            compareArrays(jdkDfs.getShortMonths(), icuDfs.getShortMonths(), loc, "getShortMonths");
            compareArrays(jdkDfs.getShortWeekdays(), icuDfs.getShortWeekdays(), loc, "getShortWeekdays");
            compareArrays(jdkDfs.getWeekdays(), icuDfs.getWeekdays(), loc, "getWeekdays");
            compareArrays(jdkDfs.getZoneStrings(), icuDfs.getZoneStrings(), loc, "getZoneStrings");
        }
    }

    /*
     * Testing setters
     */
    public void TestSetSymbols() {
        // ICU's JDK DateFormatSymbols implementation for ja_JP locale
        DateFormatSymbols dfs = DateFormatSymbols.getInstance(new Locale("ja", "JP", "ICU"));

        // en_US is supported by JDK, so this is the JDK's own DateFormatSymbols
        Locale loc = new Locale("en", "US");
        DateFormatSymbols dfsEnUS = DateFormatSymbols.getInstance(loc);

        // Copying over all symbols
        dfs.setAmPmStrings(dfsEnUS.getAmPmStrings());
        dfs.setEras(dfsEnUS.getEras());
        dfs.setMonths(dfsEnUS.getMonths());
        dfs.setShortMonths(dfsEnUS.getShortMonths());
        dfs.setShortWeekdays(dfsEnUS.getShortWeekdays());
        dfs.setWeekdays(dfsEnUS.getWeekdays());
        dfs.setZoneStrings(dfsEnUS.getZoneStrings());

        compareArrays(dfs.getAmPmStrings(), dfsEnUS.getAmPmStrings(), loc, "getAmPmStrings");
        compareArrays(dfs.getEras(), dfsEnUS.getEras(), loc, "getEras");
        compareArrays(dfs.getMonths(), dfsEnUS.getMonths(), loc, "getMonths");
        compareArrays(dfs.getShortMonths(), dfsEnUS.getShortMonths(), loc, "getShortMonths");
        compareArrays(dfs.getShortWeekdays(), dfsEnUS.getShortWeekdays(), loc, "getShortWeekdays");
        compareArrays(dfs.getWeekdays(), dfsEnUS.getWeekdays(), loc, "getWeekdays");
        compareArrays(dfs.getZoneStrings(), dfsEnUS.getZoneStrings(), loc, "getZoneStrings");
    }

    private void compareArrays(Object jarray, Object iarray, Locale loc, String method) {
        if (jarray instanceof String[][]) {
            String[][] jaa = (String[][])jarray;
            String[][] iaa = (String[][])iarray;

            if (jaa.length != iaa.length || jaa[0].length != iaa[0].length) {
                errln("FAIL: Different array size returned by " + method + " for locale "
                        + loc + " (jdksize=" + jaa.length + "x" + jaa[0].length
                        + ",icusize=" + iaa.length + "x" + iaa[0].length + ")");
            } else {
                // Compare elements only when the sizes match, so a size mismatch
                // is reported once instead of running past the end of the shorter array.
                for (int i = 0; i < jaa.length; i++) {
                    for (int j = 0; j < jaa[i].length; j++) {
                        if (!TestUtil.equals(jaa[i][j], iaa[i][j])) {
                            errln("FAIL: Different symbols returned by " + method + " for locale "
                                    + loc + " at index " + i + "," + j
                                    + " (jdk=" + jaa[i][j] + ",icu=" + iaa[i][j] + ")");
                        }
                    }
                }
            }
        } else {
            String[] ja = (String[])jarray;
            String[] ia = (String[])iarray;

            if (ja.length != ia.length) {
                errln("FAIL: Different array size returned by " + method + " for locale "
                        + loc + " (jdksize=" + ja.length
                        + ",icusize=" + ia.length + ")");
            } else {
                for (int i = 0; i < ja.length; i++) {
                    if (!TestUtil.equals(ja[i], ia[i])) {
                        errln("FAIL: Different symbols returned by " + method + " for locale "
                                + loc + " at index " + i
                                + " (jdk=" + ja[i] + ",icu=" + ia[i] + ")");
                    }
                }
            }
        }
    }

    /*
     * Testing Nynorsk locales
     */
    public void TestNynorsk() {
        Locale nnNO = new Locale("nn", "NO");
        Locale noNONY = new Locale("no", "NO", "NY");

        DateFormatSymbols dfs_nnNO = DateFormatSymbols.getInstance(nnNO);
        DateFormatSymbols dfs_nnNO_ICU = DateFormatSymbols.getInstance(TestUtil.toICUExtendedLocale(nnNO));
        DateFormatSymbols dfs_noNONY_ICU = DateFormatSymbols.getInstance(TestUtil.toICUExtendedLocale(noNONY));

        // Weekday names should be identical for these three.
        // If data is taken from no/nb, then this check will fail.
        String[] dow_nnNO = dfs_nnNO.getWeekdays();
        String[] dow_nnNO_ICU = dfs_nnNO_ICU.getWeekdays();
        String[] dow_noNONY_ICU = dfs_noNONY_ICU.getWeekdays();

        // Index 0 of the weekday array is unused, so the comparison starts at 1.
        for (int i = 1; i < dow_nnNO.length; i++) {
            if (!dow_nnNO[i].equals(dow_nnNO_ICU[i])) {
                errln("FAIL: Different weekday name - index=" + i
                        + ", nn_NO:" + dow_nnNO[i] + ", nn_NO_ICU:" + dow_nnNO_ICU[i]);
            }
        }
        for (int i = 1; i < dow_nnNO.length; i++) {
            if (!dow_nnNO[i].equals(dow_noNONY_ICU[i])) {
                errln("FAIL: Different weekday name - index=" + i
                        + ", nn_NO:" + dow_nnNO[i] + ", no_NO_NY_ICU:" + dow_noNONY_ICU[i]);
            }
        }
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/dev/test/localespi/DateFormatTest.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.dev.test.localespi;

import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

import com.ibm.icu.dev.test.TestFmwk;

public class DateFormatTest extends TestFmwk {
    public static void main(String[] args) throws Exception {
        new DateFormatTest().run(args);
    }

    /*
     * Check if getInstance returns the ICU implementation.
*/ public void TestGetInstance() { for (Locale loc : DateFormat.getAvailableLocales()) { if (TestUtil.isProblematicIBMLocale(loc)) { logln("Skipped " + loc); continue; } checkGetInstance(DateFormat.FULL, DateFormat.LONG, loc); checkGetInstance(DateFormat.MEDIUM, -1, loc); checkGetInstance(1, DateFormat.SHORT, loc); } } private void checkGetInstance(int dstyle, int tstyle, Locale loc) { String method[] = new String[1]; DateFormat df = getJDKInstance(dstyle, tstyle, loc, method); boolean isIcuImpl = (df instanceof com.ibm.icu.impl.jdkadapter.SimpleDateFormatICU); if (TestUtil.isICUExtendedLocale(loc)) { if (!isIcuImpl) { errln("FAIL: " + method[0] + " returned JDK DateFormat for locale " + loc); } } else { if (isIcuImpl) { logln("INFO: " + method[0] + " returned ICU DateFormat for locale " + loc); } Locale iculoc = TestUtil.toICUExtendedLocale(loc); DateFormat dfIcu = getJDKInstance(dstyle, tstyle, iculoc, null); if (isIcuImpl) { if (!df.equals(dfIcu)) { errln("FAIL: " + method[0] + " returned ICU DateFormat for locale " + loc + ", but different from the one for locale " + iculoc); } } else { if (!(dfIcu instanceof com.ibm.icu.impl.jdkadapter.SimpleDateFormatICU)) { errln("FAIL: " + method[0] + " returned JDK DateFormat for locale " + iculoc); } } } } private DateFormat getJDKInstance(int dstyle, int tstyle, Locale loc, String[] methodName) { DateFormat df; String method; if (dstyle < 0) { df = DateFormat.getTimeInstance(tstyle, loc); method = "getTimeInstance"; } else if (tstyle < 0) { df = DateFormat.getDateInstance(dstyle, loc); method = "getDateInstance"; } else { df = DateFormat.getDateTimeInstance(dstyle, tstyle, loc); method = "getDateTimeInstance"; } if (methodName != null) { methodName[0] = method; } return df; } private com.ibm.icu.text.DateFormat getICUInstance(int dstyle, int tstyle, Locale loc, String[] methodName) { com.ibm.icu.text.DateFormat icudf; String method; if (dstyle < 0) { icudf = com.ibm.icu.text.DateFormat.getTimeInstance(tstyle, loc); 
method = "getTimeInstance"; } else if (tstyle < 0) { icudf = com.ibm.icu.text.DateFormat.getDateInstance(dstyle, loc); method = "getDateInstance"; } else { icudf = com.ibm.icu.text.DateFormat.getDateTimeInstance(dstyle, tstyle, loc); method = "getDateTimeInstance"; } if (methodName != null) { methodName[0] = method; } return icudf; } /* * Testing the behavior of date format between ICU instance and its * equivalent created via the Locale SPI framework. */ public void TestICUEquivalent() { Locale[] TEST_LOCALES = { new Locale("en", "US"), new Locale("it", "IT"), new Locale("iw", "IL"), new Locale("ja", "JP", "JP"), new Locale("th", "TH"), new Locale("zh", "TW"), }; long[] TEST_DATES = { 1199499330543L, // 2008-01-05T02:15:30.543Z 1217001308085L, // 2008-07-25T15:55:08.085Z }; for (Locale loc : TEST_LOCALES) { for (int dstyle = -1; dstyle <= 3; dstyle++) { for (int tstyle = -1; tstyle <= 3; tstyle++) { if (tstyle == -1 && dstyle == -1) { continue; } Locale iculoc = TestUtil.toICUExtendedLocale(loc); DateFormat df = getJDKInstance(dstyle, tstyle, iculoc, null); com.ibm.icu.text.DateFormat icudf = getICUInstance(dstyle, tstyle, loc, null); for (long t : TEST_DATES) { // Format Date d = new Date(t); String dstr1 = df.format(d); String dstr2 = icudf.format(d); if (!dstr1.equals(dstr2)) { errln("FAIL: Different format results for locale " + loc + " (dstyle=" + dstyle + ",tstyle=" + tstyle + ") at time " + t + " - JDK:" + dstr1 + " ICU:" + dstr2); continue; } // Parse Date d1, d2; try { d1 = df.parse(dstr1); } catch (ParseException e) { errln("FAIL: ParseException thrown for JDK DateFormat for string " + dstr1 + "(locale=" + iculoc + ",dstyle=" + dstyle + ",tstyle=" + tstyle + ")"); continue; } try { d2 = icudf.parse(dstr1); } catch (ParseException e) { errln("FAIL: ParseException thrown for ICU DateFormat for string " + dstr1 + "(locale=" + loc + ",dstyle=" + dstyle + ",tstyle=" + tstyle + ")"); continue; } if (!d1.equals(d2)) { errln("FAIL: Different parse results for 
locale " + loc + " for date string " + dstr1
                                    + " (dstyle=" + dstyle + ",tstyle=" + tstyle + ") at time " + t
                                    // Report the parsed dates; the formatted strings are already known to be equal here.
                                    + " - JDK:" + d1 + " ICU:" + d2);
                        }
                    }
                }
            }
        }
    }

    /*
     * Check if ICU DateFormatProvider uses Thai native digit for Locale
     * th_TH_TH.
     */
    public void TestThaiDigit() {
        Locale thTHTH = new Locale("th", "TH", "TH");
        String pattern = "yyyy-MM-dd";

        DateFormat dfmt = DateFormat.getDateInstance(DateFormat.FULL, thTHTH);
        DateFormat dfmtIcu = DateFormat.getDateInstance(DateFormat.FULL, TestUtil.toICUExtendedLocale(thTHTH));

        ((java.text.SimpleDateFormat)dfmt).applyPattern(pattern);
        ((java.text.SimpleDateFormat)dfmtIcu).applyPattern(pattern);

        Date d = new Date();
        String str1 = dfmt.format(d);
        String str2 = dfmtIcu.format(d);

        if (!str1.equals(str2)) {
            errln("FAIL: ICU DateFormat returned a result different from JDK for th_TH_TH");
        }
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/util/LocaleNameProviderICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.
* ******************************************************************************* */ package com.ibm.icu.impl.javaspi.util; import java.util.Locale; import java.util.spi.LocaleNameProvider; import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider; import com.ibm.icu.util.ULocale; public class LocaleNameProviderICU extends LocaleNameProvider { @Override public String getDisplayCountry(String countryCode, Locale locale) { String id = "und_" + countryCode; String disp = ULocale.getDisplayCountry(id, ULocale.forLocale(ICULocaleServiceProvider.canonicalize(locale))); if (disp.length() == 0 || disp.equals(countryCode)) { return null; } return disp; } @Override public String getDisplayLanguage(String languageCode, Locale locale) { String disp = ULocale.getDisplayLanguage(languageCode, ULocale.forLocale(ICULocaleServiceProvider.canonicalize(locale))); if (disp.length() == 0 || disp.equals(languageCode)) { return null; } return disp; } @Override public String getDisplayVariant(String variant, Locale locale) { // ICU does not support JDK Locale variant names return null; } @Override public Locale[] getAvailableLocales() { return ICULocaleServiceProvider.getAvailableLocales(); } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/util/TimeZoneNameProviderICU.java0000644000175000017500000000400211361046446030030 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */
package com.ibm.icu.impl.javaspi.util;

import java.util.Locale;

import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider;
import com.ibm.icu.lang.UCharacter;
import com.ibm.icu.util.TimeZone;

public class TimeZoneNameProviderICU extends java.util.spi.TimeZoneNameProvider {

    @Override
    public String getDisplayName(String ID, boolean daylight, int style, Locale locale) {
        TimeZone tz = TimeZone.getTimeZone(ID);
        Locale actualLocale = ICULocaleServiceProvider.canonicalize(locale);
        String disp = tz.getDisplayName(daylight, style, actualLocale);
        if (disp.length() == 0) {
            return null;
        }
        // This is an ugly hack, but there is no simple way to check whether
        // the localized name was picked up.
        int numDigits = 0;
        for (int i = 0; i < disp.length(); i++) {
            char c = disp.charAt(i);
            if (UCharacter.isDigit(c)) {
                numDigits++;
            }
        }
        // If the name contains 3 or more digits, this code assumes the GMT format was used.
        if (numDigits >= 3) {
            return null;
        }

        if (daylight) {
            // ICU uses the standard name as the daylight name when the zone does not use
            // daylight saving time.

            // This is yet another ugly hack to support the JDK's behavior
            String stdDisp = tz.getDisplayName(false, style, actualLocale);
            if (disp.equals(stdDisp)) {
                return null;
            }
        }
        return disp;
    }

    @Override
    public Locale[] getAvailableLocales() {
        return ICULocaleServiceProvider.getAvailableLocales();
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/util/CurrencyNameProviderICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.
* ******************************************************************************* */ package com.ibm.icu.impl.javaspi.util; import java.util.Locale; import java.util.spi.CurrencyNameProvider; import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider; import com.ibm.icu.util.Currency; public class CurrencyNameProviderICU extends CurrencyNameProvider { @Override public String getSymbol(String currencyCode, Locale locale) { Currency cur = Currency.getInstance(currencyCode); String sym = cur.getSymbol(ICULocaleServiceProvider.canonicalize(locale)); if (sym.length() == 0 || sym.equals(currencyCode)) { return null; } return sym; } @Override public Locale[] getAvailableLocales() { return ICULocaleServiceProvider.getAvailableLocales(); } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/text/0000755000175000017500000000000011361046446022571 5ustar twernertwernericu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/text/BreakIteratorProviderICU.java0000644000175000017500000000374311361046446030255 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.impl.javaspi.text; import java.text.BreakIterator; import java.text.spi.BreakIteratorProvider; import java.util.Locale; import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider; import com.ibm.icu.impl.jdkadapter.BreakIteratorICU; public class BreakIteratorProviderICU extends BreakIteratorProvider { @Override public BreakIterator getCharacterInstance(Locale locale) { com.ibm.icu.text.BreakIterator icuBrkItr = com.ibm.icu.text.BreakIterator.getCharacterInstance( ICULocaleServiceProvider.canonicalize(locale)); return BreakIteratorICU.wrap(icuBrkItr); } @Override public BreakIterator getLineInstance(Locale locale) { com.ibm.icu.text.BreakIterator icuBrkItr = com.ibm.icu.text.BreakIterator.getLineInstance( ICULocaleServiceProvider.canonicalize(locale)); return BreakIteratorICU.wrap(icuBrkItr); } @Override public BreakIterator getSentenceInstance(Locale locale) { com.ibm.icu.text.BreakIterator icuBrkItr = com.ibm.icu.text.BreakIterator.getSentenceInstance( ICULocaleServiceProvider.canonicalize(locale)); return BreakIteratorICU.wrap(icuBrkItr); } @Override public BreakIterator getWordInstance(Locale locale) { com.ibm.icu.text.BreakIterator icuBrkItr = com.ibm.icu.text.BreakIterator.getWordInstance( ICULocaleServiceProvider.canonicalize(locale)); return BreakIteratorICU.wrap(icuBrkItr); } @Override public Locale[] getAvailableLocales() { return ICULocaleServiceProvider.getAvailableLocales(); } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/text/DateFormatSymbolsProviderICU.java0000644000175000017500000000215111361046446031106 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.impl.javaspi.text; import java.text.DateFormatSymbols; import java.text.spi.DateFormatSymbolsProvider; import java.util.Locale; import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider; import com.ibm.icu.impl.jdkadapter.DateFormatSymbolsICU; public class DateFormatSymbolsProviderICU extends DateFormatSymbolsProvider { @Override public DateFormatSymbols getInstance(Locale locale) { com.ibm.icu.text.DateFormatSymbols icuDfs = com.ibm.icu.text.DateFormatSymbols.getInstance( ICULocaleServiceProvider.canonicalize(locale)); return DateFormatSymbolsICU.wrap(icuDfs); } @Override public Locale[] getAvailableLocales() { return ICULocaleServiceProvider.getAvailableLocales(); } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/text/DecimalFormatSymbolsProviderICU.java0000644000175000017500000000222111361046446031565 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.impl.javaspi.text; import java.text.DecimalFormatSymbols; import java.text.spi.DecimalFormatSymbolsProvider; import java.util.Locale; import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider; import com.ibm.icu.impl.jdkadapter.DecimalFormatSymbolsICU; public class DecimalFormatSymbolsProviderICU extends DecimalFormatSymbolsProvider { @Override public DecimalFormatSymbols getInstance(Locale locale) { com.ibm.icu.text.DecimalFormatSymbols icuDecfs = com.ibm.icu.text.DecimalFormatSymbols.getInstance( ICULocaleServiceProvider.canonicalize(locale)); return DecimalFormatSymbolsICU.wrap(icuDecfs); } @Override public Locale[] getAvailableLocales() { return ICULocaleServiceProvider.getAvailableLocales(); } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/text/NumberFormatProviderICU.java0000644000175000017500000000565211361046446030121 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.impl.javaspi.text; import java.text.NumberFormat; import java.text.spi.NumberFormatProvider; import java.util.Locale; import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider; import com.ibm.icu.impl.jdkadapter.DecimalFormatICU; import com.ibm.icu.impl.jdkadapter.NumberFormatICU; public class NumberFormatProviderICU extends NumberFormatProvider { private final int NUMBER = 0; private final int INTEGER = 1; private final int CURRENCY = 2; private final int PERCENT = 3; @Override public NumberFormat getCurrencyInstance(Locale locale) { return getInstance(CURRENCY, locale); } @Override public NumberFormat getIntegerInstance(Locale locale) { return getInstance(INTEGER, locale); } @Override public NumberFormat getNumberInstance(Locale locale) { return getInstance(NUMBER, locale); } @Override public NumberFormat getPercentInstance(Locale locale) { return getInstance(PERCENT, locale); } @Override public Locale[] getAvailableLocales() { return ICULocaleServiceProvider.getAvailableLocales(); } private NumberFormat getInstance(int type, Locale locale) { com.ibm.icu.text.NumberFormat icuNfmt; Locale actual = ICULocaleServiceProvider.canonicalize(locale); switch (type) { case NUMBER: icuNfmt = com.ibm.icu.text.NumberFormat.getNumberInstance(actual); break; case INTEGER: icuNfmt = com.ibm.icu.text.NumberFormat.getIntegerInstance(actual); break; case CURRENCY: icuNfmt = com.ibm.icu.text.NumberFormat.getCurrencyInstance(actual); break; case PERCENT: icuNfmt = com.ibm.icu.text.NumberFormat.getPercentInstance(actual); break; default: return null; } if (!(icuNfmt instanceof com.ibm.icu.text.DecimalFormat)) { // icuNfmt must be always DecimalFormat return null; } NumberFormat nf = null; if (ICULocaleServiceProvider.useDecimalFormat()) { nf = DecimalFormatICU.wrap((com.ibm.icu.text.DecimalFormat)icuNfmt); } else { nf = NumberFormatICU.wrap(icuNfmt); } 
com.ibm.icu.text.DecimalFormatSymbols decfs = ICULocaleServiceProvider.getDecimalFormatSymbolsForLocale(actual); if (decfs != null) { ((com.ibm.icu.text.DecimalFormat)icuNfmt).setDecimalFormatSymbols(decfs); } return nf; } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/text/CollatorProviderICU.java0000644000175000017500000000204211361046446027265 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.impl.javaspi.text; import java.text.Collator; import java.text.spi.CollatorProvider; import java.util.Locale; import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider; import com.ibm.icu.impl.jdkadapter.CollatorICU; public class CollatorProviderICU extends CollatorProvider { @Override public Collator getInstance(Locale locale) { com.ibm.icu.text.Collator icuCollator = com.ibm.icu.text.Collator.getInstance( ICULocaleServiceProvider.canonicalize(locale)); return CollatorICU.wrap(icuCollator); } @Override public Locale[] getAvailableLocales() { return ICULocaleServiceProvider.getAvailableLocales(); } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/text/DateFormatProviderICU.java0000644000175000017500000000530511361046446027541 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.impl.javaspi.text; import java.text.DateFormat; import java.text.spi.DateFormatProvider; import java.util.Locale; import com.ibm.icu.impl.javaspi.ICULocaleServiceProvider; import com.ibm.icu.impl.jdkadapter.SimpleDateFormatICU; public class DateFormatProviderICU extends DateFormatProvider { private static final int NONE = -1; @Override public DateFormat getDateInstance(int style, Locale locale) { return getInstance(style, NONE, locale); } @Override public DateFormat getDateTimeInstance(int dateStyle, int timeStyle, Locale locale) { return getInstance(dateStyle, timeStyle, locale); } @Override public DateFormat getTimeInstance(int style, Locale locale) { return getInstance(NONE, style, locale); } @Override public Locale[] getAvailableLocales() { return ICULocaleServiceProvider.getAvailableLocales(); } private DateFormat getInstance(int dstyle, int tstyle, Locale locale) { com.ibm.icu.text.DateFormat icuDfmt; Locale actual = ICULocaleServiceProvider.canonicalize(locale); if (dstyle == NONE) { icuDfmt = com.ibm.icu.text.DateFormat.getTimeInstance(tstyle, actual); } else if (tstyle == NONE) { icuDfmt = com.ibm.icu.text.DateFormat.getDateInstance(dstyle, actual); } else { icuDfmt = com.ibm.icu.text.DateFormat.getDateTimeInstance(dstyle, tstyle, actual); } if (!(icuDfmt instanceof com.ibm.icu.text.SimpleDateFormat)) { // icuDfmt must be always SimpleDateFormat return null; } com.ibm.icu.text.DecimalFormatSymbols decfs = ICULocaleServiceProvider.getDecimalFormatSymbolsForLocale(actual); if (decfs != null) { com.ibm.icu.text.NumberFormat icuNfmt = icuDfmt.getNumberFormat(); if (icuNfmt instanceof com.ibm.icu.text.DecimalFormat) { ((com.ibm.icu.text.DecimalFormat)icuNfmt).setDecimalFormatSymbols(decfs); } else if (icuNfmt instanceof com.ibm.icu.impl.DateNumberFormat) { ((com.ibm.icu.impl.DateNumberFormat)icuNfmt).setZeroDigit(decfs.getDigit()); } 
icuDfmt.setNumberFormat(icuNfmt); } return SimpleDateFormatICU.wrap((com.ibm.icu.text.SimpleDateFormat)icuDfmt); } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/ICULocaleServiceProvider.java0000644000175000017500000001536311361046446027254 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.impl.javaspi; import java.io.IOException; import java.io.InputStream; import java.util.Arrays; import java.util.HashSet; import java.util.Locale; import java.util.Properties; import java.util.Set; import com.ibm.icu.impl.ICUResourceBundle; import com.ibm.icu.text.DecimalFormatSymbols; import com.ibm.icu.util.ULocale; public class ICULocaleServiceProvider { private static final String SPI_PROP_FILE = "com/ibm/icu/impl/javaspi/ICULocaleServiceProviderConfig.properties"; private static final String SUFFIX_KEY = "com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.icuVariantSuffix"; private static final String ENABLE_VARIANTS_KEY = "com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.enableIcuVariants"; private static final String ENABLE_ISO3_LANG_KEY = "com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.enableIso3Languages"; private static final String USE_DECIMALFORMAT_KEY = "com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.useDecimalFormat"; private static boolean configLoaded = false; private static String suffix = "ICU"; private static boolean enableVariants = true; private static boolean enableIso3Lang = true; private static boolean useDecimalFormat = false; private static final Locale[] SPECIAL_LOCALES = { new Locale("ja", "JP", "JP"), new Locale("no"), new Locale("no", "NO"), new Locale("no", "NO", "NY"), new Locale("sr", "CS"), new Locale("th", "TH", "TH"), }; private static Locale[] LOCALES = null; public static 
Locale[] getAvailableLocales() { Locale[] all = getLocales(); return Arrays.copyOf(all, all.length); } public static Locale canonicalize(Locale locale) { Locale result = locale; String variant = locale.getVariant(); String suffix = getIcuSuffix(); if (variant.equals(suffix)) { result = new Locale(locale.getLanguage(), locale.getCountry()); } else if (variant.endsWith(suffix) && variant.charAt(variant.length() - suffix.length() - 1) == '_') { variant = variant.substring(0, variant.length() - suffix.length() - 1); result = new Locale(locale.getLanguage(), locale.getCountry(), variant); } return result; } public static boolean useDecimalFormat() { loadConfiguration(); return useDecimalFormat; } private static final Locale THAI_NATIVE_DIGIT_LOCALE = new Locale("th", "TH", "TH"); private static final char THAI_NATIVE_ZERO = '\u0E50'; private static DecimalFormatSymbols THAI_NATIVE_DECIMAL_SYMBOLS = null; /* * Returns a DecimalFormatSymbols if the given locale requires * non-standard symbols, more specifically, native digits used * by JDK Locale th_TH_TH. If the locale does not require special * symbols, null is returned. 
*/ public static synchronized DecimalFormatSymbols getDecimalFormatSymbolsForLocale(Locale loc) { if (loc.equals(THAI_NATIVE_DIGIT_LOCALE)) { if (THAI_NATIVE_DECIMAL_SYMBOLS == null) { THAI_NATIVE_DECIMAL_SYMBOLS = new DecimalFormatSymbols(new ULocale("th_TH")); THAI_NATIVE_DECIMAL_SYMBOLS.setDigit(THAI_NATIVE_ZERO); } return (DecimalFormatSymbols)THAI_NATIVE_DECIMAL_SYMBOLS.clone(); } return null; } private static synchronized Locale[] getLocales() { if (LOCALES != null) { return LOCALES; } Set<Locale> localeSet = new HashSet<Locale>(); ULocale[] icuLocales = ICUResourceBundle.getAvailableULocales(); for (ULocale uloc : icuLocales) { String language = uloc.getLanguage(); String country = uloc.getCountry(); String variant = uloc.getVariant(); if (language.length() >= 3 && !enableIso3Languages()) { continue; } addLocale(new Locale(language, country, variant), localeSet); } for (Locale l : SPECIAL_LOCALES) { addLocale(l, localeSet); } LOCALES = localeSet.toArray(new Locale[0]); return LOCALES; } private static void addLocale(Locale loc, Set<Locale> locales) { locales.add(loc); if (enableIcuVariants()) { // Add ICU variant String language = loc.getLanguage(); String country = loc.getCountry(); String variant = loc.getVariant(); StringBuffer var = new StringBuffer(variant); if (var.length() != 0) { var.append("_"); } var.append(getIcuSuffix()); locales.add(new Locale(language, country, var.toString())); } } private static boolean enableIso3Languages() { loadConfiguration(); return enableIso3Lang; } private static boolean enableIcuVariants() { loadConfiguration(); return enableVariants; } private static String getIcuSuffix() { loadConfiguration(); return suffix; } private static synchronized void loadConfiguration() { if (configLoaded) { return; } Properties spiConfigProps = new Properties(); try { InputStream is = ClassLoader.getSystemResourceAsStream(SPI_PROP_FILE); if (is == null) { /* Configuration file not found: report it as an I/O error so the defaults are kept */ throw new IOException("Resource not found: " + SPI_PROP_FILE); } spiConfigProps.load(is); String val = (String)spiConfigProps.get(SUFFIX_KEY); if (val != null && val.length() > 0) { suffix = val; } enableVariants 
= parseBooleanString((String)spiConfigProps.get(ENABLE_VARIANTS_KEY), enableVariants); enableIso3Lang = parseBooleanString((String)spiConfigProps.get(ENABLE_ISO3_LANG_KEY), enableIso3Lang); useDecimalFormat = parseBooleanString((String)spiConfigProps.get(USE_DECIMALFORMAT_KEY), useDecimalFormat); } catch (IOException ioe) { // Ignore any I/O errors and keep the defaults } configLoaded = true; } private static boolean parseBooleanString(String str, boolean defaultVal) { if (str == null) { return defaultVal; } if (str.equalsIgnoreCase("true")) { return true; } else if (str.equalsIgnoreCase("false")) { return false; } return defaultVal; } } icu4j-4.2/localespi/src/com/ibm/icu/impl/javaspi/ICULocaleServiceProviderConfig.properties0000644000175000017500000000317711361046446031655 0ustar twernertwerner#* #******************************************************************************* #* Copyright (C) 2008, International Business Machines Corporation and * #* others. All Rights Reserved. * #******************************************************************************* #* This properties file is used for configuring the ICU locale service provider #* implementation. #* # Whether Locales with ICU's variant suffix are included in getAvailableLocales. # [default: true] com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.enableIcuVariants = true # Suffix string used in Locale's variant field to specify the ICU implementation. # [default: ICU] com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.icuVariantSuffix = ICU # Whether 3-letter language Locales are included in getAvailableLocales. # [default: true] com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.enableIso3Languages = true # Whether a java.text.DecimalFormat subclass is used for NumberFormat#getXXXInstance. # DecimalFormat#format(Object,StringBuffer,FieldPosition) is declared as final, so # ICU cannot override the implementation. As a result, some number types such as # BigInteger/BigDecimal are not handled by the ICU implementation. 
If a client expects # NumberFormat#getXXXInstance to return a DecimalFormat (for example, to manipulate # decimal format patterns), the client can set this property to true. However, in this case, # BigInteger/BigDecimal are not handled by ICU's implementation. # [default: false] com.ibm.icu.impl.javaspi.ICULocaleServiceProvider.useDecimalFormat = false icu4j-4.2/localespi/src/com/ibm/icu/impl/icuadapter/0000755000175000017500000000000011361046444022267 5ustar twernertwernericu4j-4.2/localespi/src/com/ibm/icu/impl/icuadapter/NumberFormatJDK.java0000644000175000017500000002126611361046444026073 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.impl.icuadapter; import java.math.RoundingMode; import java.text.FieldPosition; import java.text.NumberFormat; import java.text.ParseException; import java.text.ParsePosition; import com.ibm.icu.impl.jdkadapter.NumberFormatICU; import com.ibm.icu.math.BigDecimal; import com.ibm.icu.util.Currency; import com.ibm.icu.util.CurrencyAmount; /** * NumberFormatJDK is an adapter class which wraps java.text.NumberFormat and * implements ICU4J NumberFormat APIs. 
*/ public class NumberFormatJDK extends com.ibm.icu.text.NumberFormat { private static final long serialVersionUID = -1739846528146803964L; private NumberFormat fJdkNfmt; private NumberFormatJDK(NumberFormat jdkNfmt) { fJdkNfmt = jdkNfmt; } public static com.ibm.icu.text.NumberFormat wrap(NumberFormat jdkNfmt) { if (jdkNfmt instanceof NumberFormatICU) { return ((NumberFormatICU)jdkNfmt).unwrap(); } return new NumberFormatJDK(jdkNfmt); } public NumberFormat unwrap() { return fJdkNfmt; } @Override public Object clone() { NumberFormatJDK other = (NumberFormatJDK)super.clone(); other.fJdkNfmt = (NumberFormat)fJdkNfmt.clone(); return other; } @Override public boolean equals(Object obj) { if (obj instanceof NumberFormatJDK) { return ((NumberFormatJDK)obj).fJdkNfmt.equals(fJdkNfmt); } return false; } //public String format(java.math.BigDecimal number) //public String format(BigDecimal number) @Override public StringBuffer format(java.math.BigDecimal number, StringBuffer toAppendTo, FieldPosition pos) { return fJdkNfmt.format(number, toAppendTo, pos); } @Override public StringBuffer format(BigDecimal number, StringBuffer toAppendTo, FieldPosition pos) { return fJdkNfmt.format(number.toBigDecimal(), toAppendTo, pos); } @Override public StringBuffer format(java.math.BigInteger number, StringBuffer toAppendTo, FieldPosition pos) { return fJdkNfmt.format(number, toAppendTo, pos); } //public String format(java.math.BigInteger number) //String format(CurrencyAmount currAmt) @Override public StringBuffer format(CurrencyAmount currAmt, StringBuffer toAppendTo, FieldPosition pos) { java.util.Currency save = fJdkNfmt.getCurrency(); String currCode = currAmt.getCurrency().getCurrencyCode(); boolean same = save.getCurrencyCode().equals(currCode); if (!same) { fJdkNfmt.setCurrency(java.util.Currency.getInstance(currCode)); } fJdkNfmt.format(currAmt.getNumber(), toAppendTo, pos); if (!same) { fJdkNfmt.setCurrency(save); } return toAppendTo; } //public String format(double number) 
@Override public StringBuffer format(double number, StringBuffer toAppendTo, FieldPosition pos) { return fJdkNfmt.format(number, toAppendTo, pos); } //public String format(long number) @Override public StringBuffer format(long number, StringBuffer toAppendTo, FieldPosition pos) { return fJdkNfmt.format(number, toAppendTo, pos); } @Override public StringBuffer format(Object number, StringBuffer toAppendTo, FieldPosition pos) { return fJdkNfmt.format(number, toAppendTo, pos); } @Override public Currency getCurrency() { java.util.Currency jdkCurrency = fJdkNfmt.getCurrency(); if (jdkCurrency == null) { return null; } return Currency.getInstance(jdkCurrency.getCurrencyCode()); } //protected Currency getEffectiveCurrency() @Override public int getMaximumFractionDigits() { return fJdkNfmt.getMaximumFractionDigits(); } @Override public int getMaximumIntegerDigits() { return fJdkNfmt.getMaximumIntegerDigits(); } @Override public int getMinimumFractionDigits() { return fJdkNfmt.getMinimumFractionDigits(); } @Override public int getMinimumIntegerDigits() { return fJdkNfmt.getMinimumIntegerDigits(); } @Override public int getRoundingMode() { RoundingMode jdkMode = fJdkNfmt.getRoundingMode(); int icuMode = BigDecimal.ROUND_UP; if (jdkMode.equals(RoundingMode.CEILING)) { icuMode = BigDecimal.ROUND_CEILING; } else if (jdkMode.equals(RoundingMode.DOWN)) { icuMode = BigDecimal.ROUND_DOWN; } else if (jdkMode.equals(RoundingMode.FLOOR)) { icuMode = BigDecimal.ROUND_FLOOR; } else if (jdkMode.equals(RoundingMode.HALF_DOWN)) { icuMode = BigDecimal.ROUND_HALF_DOWN; } else if (jdkMode.equals(RoundingMode.HALF_EVEN)) { icuMode = BigDecimal.ROUND_HALF_EVEN; } else if (jdkMode.equals(RoundingMode.HALF_UP)) { icuMode = BigDecimal.ROUND_HALF_UP; } else if (jdkMode.equals(RoundingMode.UNNECESSARY)) { icuMode = BigDecimal.ROUND_UNNECESSARY; } else if (jdkMode.equals(RoundingMode.UP)) { icuMode = BigDecimal.ROUND_UP; } return icuMode; } @Override public int hashCode() { return fJdkNfmt.hashCode(); } 
@Override public boolean isGroupingUsed() { return fJdkNfmt.isGroupingUsed(); } @Override public boolean isParseIntegerOnly() { return fJdkNfmt.isParseIntegerOnly(); } @Override public boolean isParseStrict() { // JDK NumberFormat does not support strict parsing return false; } @Override public Number parse(String text) throws ParseException { return fJdkNfmt.parse(text); } @Override public Number parse(String text, ParsePosition parsePosition) { return fJdkNfmt.parse(text, parsePosition); } //public Object parseObject(String source, ParsePosition parsePosition) @Override public void setCurrency(Currency theCurrency) { if (theCurrency == null) { fJdkNfmt.setCurrency(null); return; } else { fJdkNfmt.setCurrency(java.util.Currency.getInstance(theCurrency.getCurrencyCode())); } } @Override public void setGroupingUsed(boolean newValue) { fJdkNfmt.setGroupingUsed(newValue); } @Override public void setMaximumFractionDigits(int newValue) { fJdkNfmt.setMaximumFractionDigits(newValue); } @Override public void setMaximumIntegerDigits(int newValue) { fJdkNfmt.setMaximumIntegerDigits(newValue); } @Override public void setMinimumFractionDigits(int newValue) { fJdkNfmt.setMinimumFractionDigits(newValue); } @Override public void setMinimumIntegerDigits(int newValue) { fJdkNfmt.setMinimumIntegerDigits(newValue); } @Override public void setParseIntegerOnly(boolean value) { fJdkNfmt.setParseIntegerOnly(value); } @Override public void setParseStrict(boolean value) { // JDK NumberFormat does not support strict parsing - ignore this operation } @Override public void setRoundingMode(int roundingMode) { RoundingMode mode = null; switch (roundingMode) { case BigDecimal.ROUND_CEILING: mode = RoundingMode.CEILING; break; case BigDecimal.ROUND_DOWN: mode = RoundingMode.DOWN; break; case BigDecimal.ROUND_FLOOR: mode = RoundingMode.FLOOR; break; case BigDecimal.ROUND_HALF_DOWN: mode = RoundingMode.HALF_DOWN; break; case BigDecimal.ROUND_HALF_EVEN: mode = RoundingMode.HALF_EVEN; break; case 
BigDecimal.ROUND_HALF_UP: mode = RoundingMode.HALF_UP; break; case BigDecimal.ROUND_UNNECESSARY: mode = RoundingMode.UNNECESSARY; break; case BigDecimal.ROUND_UP: mode = RoundingMode.UP; break; } if (mode == null) { throw new IllegalArgumentException("Invalid rounding mode: " + roundingMode); } fJdkNfmt.setRoundingMode(mode); } }icu4j-4.2/localespi/src/com/ibm/icu/impl/icuadapter/TimeZoneJDK.java0000644000175000017500000001321311361046444025215 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.impl.icuadapter; import java.util.Calendar; import java.util.Date; import java.util.GregorianCalendar; import java.util.Locale; import java.util.TimeZone; import com.ibm.icu.impl.Grego; import com.ibm.icu.impl.jdkadapter.TimeZoneICU; import com.ibm.icu.util.ULocale; /** * TimeZoneJDK is an adapter class which wraps java.util.TimeZone and * implements ICU4J TimeZone APIs. 
*/ public class TimeZoneJDK extends com.ibm.icu.util.TimeZone { private static final long serialVersionUID = -1137052823551791933L; private TimeZone fJdkTz; private transient Calendar fJdkCal; private TimeZoneJDK(TimeZone jdkTz) { fJdkTz = jdkTz; } public static com.ibm.icu.util.TimeZone wrap(TimeZone jdkTz) { if (jdkTz instanceof TimeZoneICU) { return ((TimeZoneICU)jdkTz).unwrap(); } return new TimeZoneJDK(jdkTz); } public TimeZone unwrap() { return fJdkTz; } @Override public Object clone() { TimeZoneJDK other = (TimeZoneJDK)super.clone(); other.fJdkTz = (TimeZone)fJdkTz.clone(); return other; } @Override public boolean equals(Object obj) { if (obj instanceof TimeZoneJDK) { return (((TimeZoneJDK)obj).fJdkTz).equals(fJdkTz); } return false; } //public String getDisplayName() //public String getDisplayName(boolean daylight, int style) //public String getDisplayName(Locale locale) //public String getDisplayName(ULocale locale) @Override public String getDisplayName(boolean daylight, int style, Locale locale) { return fJdkTz.getDisplayName(daylight, style, locale); } @Override public String getDisplayName(boolean daylight, int style, ULocale locale) { return fJdkTz.getDisplayName(daylight, style, locale.toLocale()); } @Override public int getDSTSavings() { return fJdkTz.getDSTSavings(); } @Override public String getID() { return fJdkTz.getID(); } @Override public int getOffset(int era, int year, int month, int day, int dayOfWeek, int milliseconds) { return fJdkTz.getOffset(era, year, month, day, dayOfWeek, milliseconds); } @Override public int getOffset(long date) { return fJdkTz.getOffset(date); } @Override public void getOffset(long date, boolean local, int[] offsets) { synchronized(this) { if (fJdkCal == null) { fJdkCal = new GregorianCalendar(fJdkTz); } if (local) { int fields[] = new int[6]; Grego.timeToFields(date, fields); int hour, min, sec, mil; int tmp = fields[5]; mil = tmp % 1000; tmp /= 1000; sec = tmp % 60; tmp /= 60; min = tmp % 60; hour = tmp / 60; 
fJdkCal.clear(); fJdkCal.set(fields[0], fields[1], fields[2], hour, min, sec); fJdkCal.set(java.util.Calendar.MILLISECOND, mil); int doy1, hour1, min1, sec1, mil1; doy1 = fJdkCal.get(java.util.Calendar.DAY_OF_YEAR); hour1 = fJdkCal.get(java.util.Calendar.HOUR_OF_DAY); min1 = fJdkCal.get(java.util.Calendar.MINUTE); sec1 = fJdkCal.get(java.util.Calendar.SECOND); mil1 = fJdkCal.get(java.util.Calendar.MILLISECOND); if (fields[4] != doy1 || hour != hour1 || min != min1 || sec != sec1 || mil != mil1) { // Calendar field(s) were changed due to the adjustment for non-existing time // Note: This code does not support non-existing local time at year boundary properly. // But, it should work fine for real timezones. int dayDelta = Math.abs(doy1 - fields[4]) > 1 ? 1 : doy1 - fields[4]; int delta = ((((dayDelta * 24) + hour1 - hour) * 60 + min1 - min) * 60 + sec1 - sec) * 1000 + mil1 - mil; // In this case, we use the offsets before the transition fJdkCal.setTimeInMillis(fJdkCal.getTimeInMillis() - delta - 1); } } else { fJdkCal.setTimeInMillis(date); } offsets[0] = fJdkCal.get(java.util.Calendar.ZONE_OFFSET); offsets[1] = fJdkCal.get(java.util.Calendar.DST_OFFSET); } } @Override public int getRawOffset() { return fJdkTz.getRawOffset(); } @Override public int hashCode() { return fJdkTz.hashCode(); } @Override public boolean hasSameRules(com.ibm.icu.util.TimeZone other) { return other.hasSameRules(TimeZoneJDK.wrap(fJdkTz)); } @Override public boolean inDaylightTime(Date date) { return fJdkTz.inDaylightTime(date); } @Override public void setID(String ID) { fJdkTz.setID(ID); } @Override public void setRawOffset(int offsetMillis) { fJdkTz.setRawOffset(offsetMillis); } @Override public boolean useDaylightTime() { return fJdkTz.useDaylightTime(); } } icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/0000755000175000017500000000000011361046444022257 5ustar 
twernertwernericu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/NumberFormatICU.java0000644000175000017500000001635011361046444026071 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.impl.jdkadapter; import java.math.RoundingMode; import java.text.FieldPosition; import java.text.ParseException; import java.text.ParsePosition; import java.util.Currency; import com.ibm.icu.impl.icuadapter.NumberFormatJDK; import com.ibm.icu.text.NumberFormat; /** * NumberFormatICU is an adapter class which wraps ICU4J NumberFormat and * implements java.text.NumberFormat APIs. */ public class NumberFormatICU extends java.text.NumberFormat { private static final long serialVersionUID = 4892903815641574060L; private NumberFormat fIcuNfmt; private NumberFormatICU(NumberFormat icuNfmt) { fIcuNfmt = icuNfmt; } public static java.text.NumberFormat wrap(NumberFormat icuNfmt) { if (icuNfmt instanceof NumberFormatJDK) { return ((NumberFormatJDK)icuNfmt).unwrap(); } return new NumberFormatICU(icuNfmt); } public NumberFormat unwrap() { return fIcuNfmt; } @Override public Object clone() { NumberFormatICU other = (NumberFormatICU)super.clone(); other.fIcuNfmt = (NumberFormat)fIcuNfmt.clone(); return other; } @Override public boolean equals(Object obj) { if (obj instanceof NumberFormatICU) { return ((NumberFormatICU)obj).fIcuNfmt.equals(fIcuNfmt); } return false; } //public String format(double number) @Override public StringBuffer format(double number, StringBuffer toAppendTo, FieldPosition pos) { return fIcuNfmt.format(number, toAppendTo, pos); } //public String format(long number); @Override public StringBuffer format(long number, StringBuffer toAppendTo, FieldPosition pos) { return fIcuNfmt.format(number, toAppendTo, pos); } @Override 
    public StringBuffer format(Object number, StringBuffer toAppendTo, FieldPosition pos) {
        return fIcuNfmt.format(number, toAppendTo, pos);
    }

    @Override
    public Currency getCurrency() {
        com.ibm.icu.util.Currency icuCurrency = fIcuNfmt.getCurrency();
        if (icuCurrency == null) {
            return null;
        }
        return Currency.getInstance(icuCurrency.getCurrencyCode());
    }

    @Override
    public int getMaximumFractionDigits() {
        return fIcuNfmt.getMaximumFractionDigits();
    }

    @Override
    public int getMaximumIntegerDigits() {
        return fIcuNfmt.getMaximumIntegerDigits();
    }

    @Override
    public int getMinimumFractionDigits() {
        return fIcuNfmt.getMinimumFractionDigits();
    }

    @Override
    public int getMinimumIntegerDigits() {
        return fIcuNfmt.getMinimumIntegerDigits();
    }

    @Override
    public RoundingMode getRoundingMode() {
        int icuMode = fIcuNfmt.getRoundingMode();
        RoundingMode mode = RoundingMode.UP;
        switch (icuMode) {
        case com.ibm.icu.math.BigDecimal.ROUND_CEILING:
            mode = RoundingMode.CEILING;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_DOWN:
            mode = RoundingMode.DOWN;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_FLOOR:
            mode = RoundingMode.FLOOR;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_HALF_DOWN:
            mode = RoundingMode.HALF_DOWN;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_HALF_EVEN:
            mode = RoundingMode.HALF_EVEN;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_HALF_UP:
            mode = RoundingMode.HALF_UP;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_UNNECESSARY:
            mode = RoundingMode.UNNECESSARY;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_UP:
            mode = RoundingMode.UP;
            break;
        }
        return mode;
    }

    @Override
    public int hashCode() {
        return fIcuNfmt.hashCode();
    }

    @Override
    public boolean isGroupingUsed() {
        return fIcuNfmt.isGroupingUsed();
    }

    @Override
    public boolean isParseIntegerOnly() {
        return fIcuNfmt.isParseIntegerOnly();
    }

    @Override
    public Number parse(String source) throws ParseException {
        return fIcuNfmt.parse(source);
    }

    @Override
    public Number parse(String source, ParsePosition parsePosition) {
        return
            fIcuNfmt.parse(source, parsePosition);
    }

    //public Object parseObject(String source, ParsePosition pos)

    @Override
    public void setCurrency(Currency currency) {
        if (currency == null) {
            fIcuNfmt.setCurrency(null);
        } else {
            fIcuNfmt.setCurrency(com.ibm.icu.util.Currency.getInstance(currency.getCurrencyCode()));
        }
    }

    @Override
    public void setGroupingUsed(boolean newValue) {
        fIcuNfmt.setGroupingUsed(newValue);
    }

    @Override
    public void setMaximumFractionDigits(int newValue) {
        fIcuNfmt.setMaximumFractionDigits(newValue);
    }

    @Override
    public void setMaximumIntegerDigits(int newValue) {
        fIcuNfmt.setMaximumIntegerDigits(newValue);
    }

    @Override
    public void setMinimumFractionDigits(int newValue) {
        fIcuNfmt.setMinimumFractionDigits(newValue);
    }

    @Override
    public void setMinimumIntegerDigits(int newValue) {
        fIcuNfmt.setMinimumIntegerDigits(newValue);
    }

    @Override
    public void setParseIntegerOnly(boolean value) {
        fIcuNfmt.setParseIntegerOnly(value);
    }

    @Override
    public void setRoundingMode(RoundingMode roundingMode) {
        if (roundingMode.equals(RoundingMode.CEILING)) {
            fIcuNfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_CEILING);
        } else if (roundingMode.equals(RoundingMode.DOWN)) {
            fIcuNfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_DOWN);
        } else if (roundingMode.equals(RoundingMode.FLOOR)) {
            fIcuNfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_FLOOR);
        } else if (roundingMode.equals(RoundingMode.HALF_DOWN)) {
            fIcuNfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_HALF_DOWN);
        } else if (roundingMode.equals(RoundingMode.HALF_EVEN)) {
            fIcuNfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_HALF_EVEN);
        } else if (roundingMode.equals(RoundingMode.HALF_UP)) {
            fIcuNfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_HALF_UP);
        } else if (roundingMode.equals(RoundingMode.UNNECESSARY)) {
            fIcuNfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_UNNECESSARY);
        } else if (roundingMode.equals(RoundingMode.UP)) {
            fIcuNfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_UP);
        }
        else {
            throw new IllegalArgumentException("Invalid rounding mode was specified.");
        }
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/BreakIteratorICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.impl.jdkadapter;

import java.text.CharacterIterator;

import com.ibm.icu.text.BreakIterator;

/**
 * BreakIteratorICU is an adapter class which wraps ICU4J BreakIterator and
 * implements java.text.BreakIterator APIs.
 */
public class BreakIteratorICU extends java.text.BreakIterator {

    private BreakIterator fIcuBrkItr;

    private BreakIteratorICU(BreakIterator icuBrkItr) {
        fIcuBrkItr = icuBrkItr;
    }

    public static java.text.BreakIterator wrap(BreakIterator icuBrkItr) {
        return new BreakIteratorICU(icuBrkItr);
    }

    public BreakIterator unwrap() {
        return fIcuBrkItr;
    }

    @Override
    public Object clone() {
        BreakIteratorICU other = (BreakIteratorICU)super.clone();
        other.fIcuBrkItr = (BreakIterator)fIcuBrkItr.clone();
        return other;
    }

    @Override
    public int current() {
        return fIcuBrkItr.current();
    }

    @Override
    public int first() {
        return fIcuBrkItr.first();
    }

    @Override
    public int following(int offset) {
        return fIcuBrkItr.following(offset);
    }

    @Override
    public CharacterIterator getText() {
        return fIcuBrkItr.getText();
    }

    @Override
    public boolean isBoundary(int offset) {
        return fIcuBrkItr.isBoundary(offset);
    }

    @Override
    public int last() {
        return fIcuBrkItr.last();
    }

    @Override
    public int next() {
        return fIcuBrkItr.next();
    }

    @Override
    public int next(int n) {
        return fIcuBrkItr.next(n);
    }

    @Override
    public int preceding(int offset) {
        return fIcuBrkItr.preceding(offset);
    }

    @Override
    public int previous() {
        return fIcuBrkItr.previous();
    }

    @Override
    public void setText(CharacterIterator
            newText) {
        fIcuBrkItr.setText(newText);
    }

    @Override
    public void setText(String newText) {
        fIcuBrkItr.setText(newText);
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/CalendarICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.impl.jdkadapter;

import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

import com.ibm.icu.impl.icuadapter.TimeZoneJDK;
import com.ibm.icu.text.DateFormatSymbols;
import com.ibm.icu.util.Calendar;

/**
 * CalendarICU is an adapter class which wraps ICU4J Calendar and
 * implements java.util.Calendar APIs.
 */
public class CalendarICU extends java.util.Calendar {

    private static final long serialVersionUID = -8641226371713600671L;

    private Calendar fIcuCal;

    private CalendarICU(Calendar icuCal) {
        fIcuCal = icuCal;
        init();
    }

    public static java.util.Calendar wrap(Calendar icuCal) {
        return new CalendarICU(icuCal);
    }

    public Calendar unwrap() {
        sync();
        return fIcuCal;
    }

    @Override
    public void add(int field, int amount) {
        sync();
        fIcuCal.add(field, amount);
    }

    // Note: We do not need to override the following methods. They
    // call int compareTo(Calendar anotherCalendar) and we
    // override that method.
    //public boolean after(Object when)
    //public boolean before(Object when)

    // Note: Jeez! These methods are final and we cannot override them.
    // We do not want to rewrite ICU Calendar implementation classes
    // as subclasses of java.util.Calendar. This adapter class
    // wraps an ICU Calendar instance and the calendar calculation
    // is actually done independently from the java.util.Calendar
    // implementation.
    // Thus, we need to monitor the status of
    // superclass fields in some methods and call ICU Calendar's
    // clear if the superclass clear updates the status of the superclass's
    // calendar fields. See private void sync().
    //public void clear()
    //public void clear(int field)

    @Override
    public Object clone() {
        sync();
        CalendarICU other = (CalendarICU)super.clone();
        other.fIcuCal = (Calendar)fIcuCal.clone();
        return other;
    }

    // The parameter type is fully qualified so that this method overrides
    // java.util.Calendar.compareTo; the unqualified name Calendar resolves
    // to the imported com.ibm.icu.util.Calendar in this file.
    @Override
    public int compareTo(java.util.Calendar anotherCalendar) {
        sync();
        long thisMillis = getTimeInMillis();
        long otherMillis = anotherCalendar.getTimeInMillis();
        return thisMillis > otherMillis ? 1 : (thisMillis == otherMillis ? 0 : -1);
    }

    // Note: These methods are supposed to be implemented by java.util.Calendar
    // subclasses. But because we actually use an instance of ICU Calendar
    // for all calendar calculation, we do nothing here.
    @Override
    protected void complete() {}

    @Override
    protected void computeFields() {}

    @Override
    protected void computeTime() {}

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof CalendarICU) {
            sync();
            return ((CalendarICU)obj).fIcuCal.equals(fIcuCal);
        }
        return false;
    }

    @Override
    public int get(int field) {
        sync();
        return fIcuCal.get(field);
    }

    @Override
    public int getActualMaximum(int field) {
        return fIcuCal.getActualMaximum(field);
    }

    @Override
    public int getActualMinimum(int field) {
        return fIcuCal.getActualMinimum(field);
    }

    @Override
    public String getDisplayName(int field, int style, Locale locale) {
        if (field < 0 || field >= FIELD_COUNT || (style != SHORT && style != LONG && style != ALL_STYLES)) {
            throw new IllegalArgumentException("Bad field or style.");
        }
        DateFormatSymbols dfs = DateFormatSymbols.getInstance(locale);
        String[] array = getFieldStrings(field, style, dfs);
        if (array != null) {
            int fieldVal = get(field);
            if (fieldVal < array.length) {
                return array[fieldVal];
            }
        }
        return null;
    }

    @Override
    public Map<String,Integer> getDisplayNames(int field, int style, Locale locale) {
        if (field < 0 || field >= FIELD_COUNT || (style != SHORT && style != LONG
                && style != ALL_STYLES)) {
            throw new IllegalArgumentException("Bad field or style.");
        }
        DateFormatSymbols dfs = DateFormatSymbols.getInstance(locale);
        if (style != ALL_STYLES) {
            return getFieldStringsMap(field, style, dfs);
        }

        Map<String,Integer> result = getFieldStringsMap(field, SHORT, dfs);
        if (result == null) {
            return null;
        }
        if (field == MONTH || field == DAY_OF_WEEK) {
            Map<String,Integer> longMap = getFieldStringsMap(field, LONG, dfs);
            if (longMap != null) {
                result.putAll(longMap);
            }
        }
        return result;
    }

    @Override
    public int getGreatestMinimum(int field) {
        return fIcuCal.getGreatestMinimum(field);
    }

    @Override
    public int getLeastMaximum(int field) {
        return fIcuCal.getLeastMaximum(field);
    }

    @Override
    public int getMaximum(int field) {
        return fIcuCal.getMaximum(field);
    }

    @Override
    public int getMinimalDaysInFirstWeek() {
        return fIcuCal.getMinimalDaysInFirstWeek();
    }

    @Override
    public int getMinimum(int field) {
        return fIcuCal.getMinimum(field);
    }

    // Note: getTime() calls getTimeInMillis()
    //public Date getTime()

    @Override
    public long getTimeInMillis() {
        sync();
        return fIcuCal.getTimeInMillis();
    }

    @Override
    public TimeZone getTimeZone() {
        return TimeZoneICU.wrap(fIcuCal.getTimeZone());
    }

    @Override
    public int hashCode() {
        sync();
        return fIcuCal.hashCode();
    }

    //protected int internalGet(int field)

    @Override
    public boolean isLenient() {
        return fIcuCal.isLenient();
    }

    //public boolean isSet(int field)

    @Override
    public void roll(int field, boolean up) {
        sync();
        fIcuCal.roll(field, up);
    }

    @Override
    public void roll(int field, int amount) {
        sync();
        fIcuCal.roll(field, amount);
    }

    @Override
    public void set(int field, int value) {
        sync();
        fIcuCal.set(field, value);
    }

    // Note: These set methods call set(int field, int value) for each field.
    // These are final, so we cannot override them, but we override
    // set(int field, int value), so the superclass implementations
    // still work as we want.
    //public void set(int year, int month, int date)
    //public void set(int year, int month, int date, int hourOfDay, int minute)
    //public void set(int year, int month, int date, int hourOfDay, int minute, int second)

    @Override
    public void setFirstDayOfWeek(int value) {
        fIcuCal.setFirstDayOfWeek(value);
    }

    @Override
    public void setLenient(boolean lenient) {
        fIcuCal.setLenient(lenient);
    }

    @Override
    public void setMinimalDaysInFirstWeek(int value) {
        fIcuCal.setMinimalDaysInFirstWeek(value);
    }

    // Note: This method calls setTimeInMillis(long millis).
    // This method is final, so we cannot override it, but we
    // override setTimeInMillis(long millis), so the superclass
    // implementation still works as we want.
    //public void setTime(Date date)

    @Override
    public void setTimeInMillis(long millis) {
        fIcuCal.setTimeInMillis(millis);
    }

    @Override
    public void setTimeZone(TimeZone value) {
        fIcuCal.setTimeZone(TimeZoneJDK.wrap(value));
    }

    @Override
    public String toString() {
        sync();
        return "CalendarICU: " + fIcuCal.toString();
    }

    private void sync() {
        // Check if clear is called for each JDK Calendar field.
        // If it was, then call clear for the field in the wrapped
        // ICU Calendar.
        for (int i = 0; i < isSet.length; i++) {
            if (!isSet[i]) {
                isSet[i] = true;
                try {
                    fIcuCal.clear(i);
                } catch (ArrayIndexOutOfBoundsException e) {
                    // More fields in JDK calendar, which is unlikely
                }
            }
        }
    }

    private void init() {
        // Mark "set" for all fields, so we can detect the invocation of
        // clear() later.
        for (int i = 0; i < isSet.length; i++) {
            isSet[i] = true;
        }
    }

    private static String[] getFieldStrings(int field, int style, DateFormatSymbols dfs) {
        String[] result = null;
        switch (field) {
        case AM_PM:
            result = dfs.getAmPmStrings();
            break;
        case DAY_OF_WEEK:
            result = (style == LONG) ? dfs.getWeekdays() : dfs.getShortWeekdays();
            break;
        case ERA:
            //result = (style == LONG) ? dfs.getEraNames() : dfs.getEras();
            result = dfs.getEras();
            break;
        case MONTH:
            result = (style == LONG) ?
                    dfs.getMonths() : dfs.getShortMonths();
            break;
        }
        return result;
    }

    private static Map<String,Integer> getFieldStringsMap(int field, int style, DateFormatSymbols dfs) {
        String[] strings = getFieldStrings(field, style, dfs);
        if (strings == null) {
            return null;
        }
        Map<String,Integer> res = new HashMap<String,Integer>();
        for (int i = 0; i < strings.length; i++) {
            if (strings[i].length() != 0) {
                res.put(strings[i], Integer.valueOf(i));
            }
        }
        return res;
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/TimeZoneICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.impl.jdkadapter;

import java.util.Date;
import java.util.Locale;

import com.ibm.icu.impl.icuadapter.TimeZoneJDK;
import com.ibm.icu.util.TimeZone;

/**
 * TimeZoneICU is an adapter class which wraps ICU4J TimeZone and
 * implements java.util.TimeZone APIs.
 */
public class TimeZoneICU extends java.util.TimeZone {

    private static final long serialVersionUID = 6019030618408620277L;

    private TimeZone fIcuTz;

    private TimeZoneICU(TimeZone icuTz) {
        fIcuTz = icuTz;
    }

    public static java.util.TimeZone wrap(TimeZone icuTz) {
        if (icuTz instanceof TimeZoneJDK) {
            return ((TimeZoneJDK)icuTz).unwrap();
        }
        return new TimeZoneICU(icuTz);
    }

    public TimeZone unwrap() {
        return fIcuTz;
    }

    @Override
    public Object clone() {
        TimeZoneICU other = (TimeZoneICU)super.clone();
        other.fIcuTz = (TimeZone)fIcuTz.clone();
        return other;
    }

    //public String getDisplayName()
    //public String getDisplayName(boolean daylight, int style)
    //public String getDisplayName(Locale locale)

    @Override
    public String getDisplayName(boolean daylight, int style, Locale locale) {
        return fIcuTz.getDisplayName(daylight, style, locale);
    }

    @Override
    public int getDSTSavings() {
        return fIcuTz.getDSTSavings();
    }

    @Override
    public String getID() {
        return fIcuTz.getID();
    }

    @Override
    public int getOffset(int era, int year, int month, int day, int dayOfWeek, int milliseconds) {
        return fIcuTz.getOffset(era, year, month, day, dayOfWeek, milliseconds);
    }

    @Override
    public int getOffset(long date) {
        return fIcuTz.getOffset(date);
    }

    @Override
    public int getRawOffset() {
        return fIcuTz.getRawOffset();
    }

    @Override
    public boolean hasSameRules(java.util.TimeZone other) {
        return other.hasSameRules(TimeZoneICU.wrap(fIcuTz));
    }

    @Override
    public boolean inDaylightTime(Date date) {
        return fIcuTz.inDaylightTime(date);
    }

    @Override
    public void setID(String ID) {
        fIcuTz.setID(ID);
    }

    @Override
    public void setRawOffset(int offsetMillis) {
        fIcuTz.setRawOffset(offsetMillis);
    }

    @Override
    public boolean useDaylightTime() {
        return fIcuTz.useDaylightTime();
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/CollationKeyICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008,
 * International Business Machines Corporation and
 * others. All Rights Reserved.
 *******************************************************************************
 */
package com.ibm.icu.impl.jdkadapter;

import com.ibm.icu.text.CollationKey;

/**
 * CollationKeyICU is an adapter class which wraps ICU4J CollationKey and
 * implements java.text.CollationKey APIs.
 */
public class CollationKeyICU extends java.text.CollationKey {

    private CollationKey fIcuCollKey;

    private CollationKeyICU(CollationKey icuCollKey) {
        super(icuCollKey.getSourceString());
        fIcuCollKey = icuCollKey;
    }

    public static java.text.CollationKey wrap(CollationKey icuCollKey) {
        return new CollationKeyICU(icuCollKey);
    }

    public CollationKey unwrap() {
        return fIcuCollKey;
    }

    @Override
    public int compareTo(java.text.CollationKey target) {
        if (target instanceof CollationKeyICU) {
            return fIcuCollKey.compareTo(((CollationKeyICU)target).fIcuCollKey);
        }
        return 0;
    }

    @Override
    public String getSourceString() {
        return fIcuCollKey.getSourceString();
    }

    @Override
    public byte[] toByteArray() {
        return fIcuCollKey.toByteArray();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof CollationKeyICU) {
            return ((CollationKeyICU)obj).fIcuCollKey.equals(fIcuCollKey);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return fIcuCollKey.hashCode();
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/CollatorICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.impl.jdkadapter;

import java.text.CollationKey;

import com.ibm.icu.text.Collator;

/**
 * CollatorICU is an adapter class which wraps ICU4J Collator and
 * implements java.text.Collator APIs.
 */
public class CollatorICU extends java.text.Collator {

    private Collator fIcuCollator;

    private CollatorICU(Collator icuCollator) {
        fIcuCollator = icuCollator;
    }

    public static java.text.Collator wrap(Collator icuCollator) {
        return new CollatorICU(icuCollator);
    }

    public Collator unwrap() {
        return fIcuCollator;
    }

    public Object clone() {
        CollatorICU other = (CollatorICU)super.clone();
        try {
            other.fIcuCollator = (Collator)fIcuCollator.clone();
        } catch (CloneNotSupportedException e) {
            // ICU Collator clone() may throw CloneNotSupportedException,
            // but JDK does not. We use UnsupportedOperationException instead
            // as a workaround.
            throw new UnsupportedOperationException("clone() is not supported by this ICU Collator.");
        }
        return other;
    }

    public int compare(Object o1, Object o2) {
        return fIcuCollator.compare(o1, o2);
    }

    public int compare(String source, String target) {
        return fIcuCollator.compare(source, target);
    }

    public boolean equals(Object that) {
        if (that instanceof CollatorICU) {
            return ((CollatorICU)that).fIcuCollator.equals(fIcuCollator);
        }
        return false;
    }

    public boolean equals(String source, String target) {
        return fIcuCollator.equals(source, target);
    }

    public CollationKey getCollationKey(String source) {
        com.ibm.icu.text.CollationKey icuCollKey = fIcuCollator.getCollationKey(source);
        return CollationKeyICU.wrap(icuCollKey);
    }

    public int getDecomposition() {
        int mode = java.text.Collator.NO_DECOMPOSITION;

        if (fIcuCollator.getStrength() == Collator.IDENTICAL) {
            return java.text.Collator.FULL_DECOMPOSITION;
        }
        int icuMode = fIcuCollator.getDecomposition();
        if (icuMode == Collator.CANONICAL_DECOMPOSITION) {
            mode = java.text.Collator.CANONICAL_DECOMPOSITION;
        }
        // else if (icuMode == Collator.NO_DECOMPOSITION) {
        //     mode = java.text.Collator.NO_DECOMPOSITION;
        // }
        // else {
        //     throw new IllegalStateException("Unknown decomposition mode is used by the ICU Collator.");
        // }

        return mode;
    }

    public int getStrength() {
        int strength;
        int icuStrength = fIcuCollator.getStrength();
        switch (icuStrength) {
        case Collator.IDENTICAL:
            strength = java.text.Collator.IDENTICAL;
            break;
        case Collator.PRIMARY:
            strength = java.text.Collator.PRIMARY;
            break;
        case Collator.SECONDARY:
            strength = java.text.Collator.SECONDARY;
            break;
        case Collator.TERTIARY:
            strength = java.text.Collator.TERTIARY;
            break;
        case Collator.QUATERNARY:
            // Note: No quaternary support in Java..
            // Return tertiary instead for now.
            strength = java.text.Collator.TERTIARY;
            break;
        default:
            throw new IllegalStateException("Unknown strength is used by the ICU Collator.");
        }
        return strength;
    }

    public int hashCode() {
        return fIcuCollator.hashCode();
    }

    public void setDecomposition(int decompositionMode) {
        switch (decompositionMode) {
        case java.text.Collator.CANONICAL_DECOMPOSITION:
            fIcuCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
            break;
        case java.text.Collator.NO_DECOMPOSITION:
            fIcuCollator.setDecomposition(Collator.NO_DECOMPOSITION);
            break;
        case java.text.Collator.FULL_DECOMPOSITION:
            // Not supported by ICU.
            // This option is interpreted as IDENTICAL strength.
            fIcuCollator.setStrength(Collator.IDENTICAL);
            break;
        default:
            throw new IllegalArgumentException("Invalid decomposition mode.");
        }
    }

    public void setStrength(int newStrength) {
        switch (newStrength) {
        case java.text.Collator.IDENTICAL:
            fIcuCollator.setStrength(Collator.IDENTICAL);
            break;
        case java.text.Collator.PRIMARY:
            fIcuCollator.setStrength(Collator.PRIMARY);
            break;
        case java.text.Collator.SECONDARY:
            fIcuCollator.setStrength(Collator.SECONDARY);
            break;
        case java.text.Collator.TERTIARY:
            fIcuCollator.setStrength(Collator.TERTIARY);
            break;
        default:
            throw new IllegalArgumentException("Invalid strength.");
        }
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/DateFormatSymbolsICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others.
 * All Rights Reserved.
 *******************************************************************************
 */
package com.ibm.icu.impl.jdkadapter;

import com.ibm.icu.text.DateFormatSymbols;

/**
 * DateFormatSymbolsICU is an adapter class which wraps ICU4J DateFormatSymbols and
 * implements java.text.DateFormatSymbols APIs.
 */
public class DateFormatSymbolsICU extends java.text.DateFormatSymbols {

    private static final long serialVersionUID = -7313618555550964943L;

    private DateFormatSymbols fIcuDfs;

    private DateFormatSymbolsICU(DateFormatSymbols icuDfs) {
        fIcuDfs = icuDfs;
    }

    public static java.text.DateFormatSymbols wrap(DateFormatSymbols icuDfs) {
        return new DateFormatSymbolsICU(icuDfs);
    }

    public DateFormatSymbols unwrap() {
        return fIcuDfs;
    }

    @Override
    public Object clone() {
        DateFormatSymbolsICU other = (DateFormatSymbolsICU)super.clone();
        other.fIcuDfs = (DateFormatSymbols)this.fIcuDfs.clone();
        return other;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof DateFormatSymbolsICU) {
            return ((DateFormatSymbolsICU)obj).fIcuDfs.equals(this.fIcuDfs);
        }
        return false;
    }

    @Override
    public String[] getAmPmStrings() {
        return fIcuDfs.getAmPmStrings();
    }

    @Override
    public String[] getEras() {
        return fIcuDfs.getEras();
    }

    // Named getLocalPatternChars (not getLocalePatternChars) so that it
    // overrides the java.text.DateFormatSymbols method.
    @Override
    public String getLocalPatternChars() {
        return fIcuDfs.getLocalPatternChars();
    }

    @Override
    public String[] getMonths() {
        return fIcuDfs.getMonths();
    }

    @Override
    public String[] getShortMonths() {
        return fIcuDfs.getShortMonths();
    }

    @Override
    public String[] getShortWeekdays() {
        return fIcuDfs.getShortWeekdays();
    }

    @Override
    public String[] getWeekdays() {
        return fIcuDfs.getWeekdays();
    }

    @Override
    public String[][] getZoneStrings() {
        return fIcuDfs.getZoneStrings();
    }

    @Override
    public int hashCode() {
        return fIcuDfs.hashCode();
    }

    @Override
    public void setAmPmStrings(String[] newAmpms) {
        fIcuDfs.setAmPmStrings(newAmpms);
    }

    @Override
    public void setEras(String[] newEras) {
        fIcuDfs.setEras(newEras);
    }

    @Override
    public void setLocalPatternChars(String
            newLocalPatternChars) {
        fIcuDfs.setLocalPatternChars(newLocalPatternChars);
    }

    @Override
    public void setMonths(String[] newMonths) {
        fIcuDfs.setMonths(newMonths);
    }

    @Override
    public void setShortMonths(String[] newShortMonths) {
        fIcuDfs.setShortMonths(newShortMonths);
    }

    @Override
    public void setShortWeekdays(String[] newShortWeekdays) {
        fIcuDfs.setShortWeekdays(newShortWeekdays);
    }

    @Override
    public void setWeekdays(String[] newWeekdays) {
        fIcuDfs.setWeekdays(newWeekdays);
    }

    @Override
    public void setZoneStrings(String[][] newZoneStrings) {
        fIcuDfs.setZoneStrings(newZoneStrings);
    }
}
icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/DecimalFormatICU.java
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.impl.jdkadapter;

import java.math.RoundingMode;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;
import java.text.CharacterIterator;
import java.text.DecimalFormatSymbols;
import java.text.FieldPosition;
import java.text.ParsePosition;
import java.util.Currency;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import com.ibm.icu.text.DecimalFormat;
import com.ibm.icu.text.NumberFormat;

/**
 * DecimalFormatICU is an adapter class which wraps ICU4J DecimalFormat and
 * implements java.text.DecimalFormat APIs.
 */
public class DecimalFormatICU extends java.text.DecimalFormat {

    private static final long serialVersionUID = 6441573352964019403L;

    private DecimalFormat fIcuDecfmt;

    private DecimalFormatICU(DecimalFormat icuDecfmt) {
        fIcuDecfmt = icuDecfmt;
    }

    public static java.text.DecimalFormat wrap(DecimalFormat icuDecfmt) {
        return new DecimalFormatICU(icuDecfmt);
    }

    public DecimalFormat unwrap() {
        return fIcuDecfmt;
    }

    // Methods overriding java.text.DecimalFormat
    @Override
    public void applyLocalizedPattern(String pattern) {
        fIcuDecfmt.applyLocalizedPattern(pattern);
    }

    @Override
    public void applyPattern(String pattern) {
        fIcuDecfmt.applyPattern(pattern);
    }

    @Override
    public Object clone() {
        DecimalFormatICU other = (DecimalFormatICU)super.clone();
        other.fIcuDecfmt = (DecimalFormat)fIcuDecfmt.clone();
        return other;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof DecimalFormatICU) {
            return ((DecimalFormatICU)obj).fIcuDecfmt.equals(fIcuDecfmt);
        }
        return false;
    }

    @Override
    public StringBuffer format(double number, StringBuffer result, FieldPosition fieldPosition) {
        return fIcuDecfmt.format(number, result, fieldPosition);
    }

    @Override
    public StringBuffer format(long number, StringBuffer result, FieldPosition fieldPosition) {
        return fIcuDecfmt.format(number, result, fieldPosition);
    }

    @Override
    public AttributedCharacterIterator formatToCharacterIterator(Object obj) {
        AttributedCharacterIterator aci = fIcuDecfmt.formatToCharacterIterator(obj);

        // Create a new AttributedString
        StringBuilder sb = new StringBuilder(aci.getEndIndex() - aci.getBeginIndex());
        char c = aci.first();
        while (true) {
            sb.append(c);
            c = aci.next();
            if (c == CharacterIterator.DONE) {
                break;
            }
        }
        AttributedString resstr = new AttributedString(sb.toString());

        // Mapping attributes
        Map<AttributedCharacterIterator.Attribute,Object> attributes = null;
        int index = aci.getBeginIndex();
        int residx = 0;
        while (true) {
            if (aci.setIndex(index) == CharacterIterator.DONE) {
                break;
            }
            attributes = aci.getAttributes();
            if (attributes != null) {
                int end =
                        aci.getRunLimit();
                Map<AttributedCharacterIterator.Attribute,Object> jdkAttributes =
                    new HashMap<AttributedCharacterIterator.Attribute,Object>();
                Set<AttributedCharacterIterator.Attribute> keys = attributes.keySet();
                for (AttributedCharacterIterator.Attribute key : keys) {
                    AttributedCharacterIterator.Attribute jdkKey = mapAttribute(key);
                    Object jdkVal = attributes.get(key);
                    if (jdkVal instanceof AttributedCharacterIterator.Attribute) {
                        jdkVal = mapAttribute((AttributedCharacterIterator.Attribute)jdkVal);
                    }
                    jdkAttributes.put(jdkKey, jdkVal);
                }
                int resend = residx + (end - index);
                resstr.addAttributes(jdkAttributes, residx, resend);

                index = end;
                residx = resend;
            }
        }
        return resstr.getIterator();
    }

    @Override
    public Currency getCurrency() {
        com.ibm.icu.util.Currency icuCurrency = fIcuDecfmt.getCurrency();
        if (icuCurrency == null) {
            return null;
        }
        return Currency.getInstance(icuCurrency.getCurrencyCode());
    }

    @Override
    public DecimalFormatSymbols getDecimalFormatSymbols() {
        return DecimalFormatSymbolsICU.wrap(fIcuDecfmt.getDecimalFormatSymbols());
    }

    @Override
    public int getGroupingSize() {
        return fIcuDecfmt.getGroupingSize();
    }

    @Override
    public int getMaximumFractionDigits() {
        return fIcuDecfmt.getMaximumFractionDigits();
    }

    @Override
    public int getMaximumIntegerDigits() {
        return fIcuDecfmt.getMaximumIntegerDigits();
    }

    @Override
    public int getMinimumFractionDigits() {
        return fIcuDecfmt.getMinimumFractionDigits();
    }

    @Override
    public int getMinimumIntegerDigits() {
        return fIcuDecfmt.getMinimumIntegerDigits();
    }

    @Override
    public int getMultiplier() {
        return fIcuDecfmt.getMultiplier();
    }

    @Override
    public String getNegativePrefix() {
        return fIcuDecfmt.getNegativePrefix();
    }

    @Override
    public String getNegativeSuffix() {
        return fIcuDecfmt.getNegativeSuffix();
    }

    @Override
    public String getPositivePrefix() {
        return fIcuDecfmt.getPositivePrefix();
    }

    @Override
    public String getPositiveSuffix() {
        return fIcuDecfmt.getPositiveSuffix();
    }

    @Override
    public RoundingMode getRoundingMode() {
        int icuMode = fIcuDecfmt.getRoundingMode();
        RoundingMode mode = RoundingMode.UP;
        switch (icuMode) {
        case
            com.ibm.icu.math.BigDecimal.ROUND_CEILING:
            mode = RoundingMode.CEILING;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_DOWN:
            mode = RoundingMode.DOWN;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_FLOOR:
            mode = RoundingMode.FLOOR;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_HALF_DOWN:
            mode = RoundingMode.HALF_DOWN;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_HALF_EVEN:
            mode = RoundingMode.HALF_EVEN;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_HALF_UP:
            mode = RoundingMode.HALF_UP;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_UNNECESSARY:
            mode = RoundingMode.UNNECESSARY;
            break;
        case com.ibm.icu.math.BigDecimal.ROUND_UP:
            mode = RoundingMode.UP;
            break;
        }
        return mode;
    }

    @Override
    public int hashCode() {
        return fIcuDecfmt.hashCode();
    }

    @Override
    public boolean isDecimalSeparatorAlwaysShown() {
        return fIcuDecfmt.isDecimalSeparatorAlwaysShown();
    }

    @Override
    public boolean isParseBigDecimal() {
        return fIcuDecfmt.isParseBigDecimal();
    }

    @Override
    public Number parse(String text, ParsePosition pos) {
        return fIcuDecfmt.parse(text, pos);
    }

    @Override
    public void setCurrency(Currency currency) {
        if (currency == null) {
            fIcuDecfmt.setCurrency(null);
        } else {
            fIcuDecfmt.setCurrency(com.ibm.icu.util.Currency.getInstance(currency.getCurrencyCode()));
        }
    }

    @Override
    public void setDecimalFormatSymbols(DecimalFormatSymbols newSymbols) {
        com.ibm.icu.text.DecimalFormatSymbols icuDecfs = null;
        if (newSymbols instanceof DecimalFormatSymbolsICU) {
            icuDecfs = ((DecimalFormatSymbolsICU)newSymbols).unwrap();
        } else {
            icuDecfs = fIcuDecfmt.getDecimalFormatSymbols();

            Currency currency = newSymbols.getCurrency();
            if (currency == null) {
                icuDecfs.setCurrency(null);
            } else {
                icuDecfs.setCurrency(com.ibm.icu.util.Currency.getInstance(currency.getCurrencyCode()));
            }

            // Copy symbols
            icuDecfs.setCurrencySymbol(newSymbols.getCurrencySymbol());
            icuDecfs.setDecimalSeparator(newSymbols.getDecimalSeparator());
            icuDecfs.setDigit(newSymbols.getDigit());
icuDecfs.setExponentSeparator(newSymbols.getExponentSeparator()); icuDecfs.setGroupingSeparator(newSymbols.getGroupingSeparator()); icuDecfs.setInfinity(newSymbols.getInfinity()); icuDecfs.setInternationalCurrencySymbol(newSymbols.getInternationalCurrencySymbol()); icuDecfs.setMinusSign(newSymbols.getMinusSign()); icuDecfs.setMonetaryDecimalSeparator(newSymbols.getMonetaryDecimalSeparator()); icuDecfs.setNaN(newSymbols.getNaN()); icuDecfs.setPatternSeparator(newSymbols.getPatternSeparator()); icuDecfs.setPercent(newSymbols.getPercent()); icuDecfs.setPerMill(newSymbols.getPerMill()); icuDecfs.setZeroDigit(newSymbols.getZeroDigit()); } fIcuDecfmt.setDecimalFormatSymbols(icuDecfs); } @Override public void setDecimalSeparatorAlwaysShown(boolean newValue) { if (fIcuDecfmt != null) { fIcuDecfmt.setDecimalSeparatorAlwaysShown(newValue); } } @Override public void setGroupingSize(int newValue) { if (fIcuDecfmt != null) { fIcuDecfmt.setGroupingSize(newValue); } } @Override public void setMaximumFractionDigits(int newValue) { if (fIcuDecfmt != null) { fIcuDecfmt.setMaximumFractionDigits(newValue); } } @Override public void setMaximumIntegerDigits(int newValue) { if (fIcuDecfmt != null) { fIcuDecfmt.setMaximumIntegerDigits(newValue); } } @Override public void setMinimumFractionDigits(int newValue) { if (fIcuDecfmt != null) { fIcuDecfmt.setMinimumFractionDigits(newValue); } } @Override public void setMinimumIntegerDigits(int newValue) { if (fIcuDecfmt != null) { fIcuDecfmt.setMinimumIntegerDigits(newValue); } } @Override public void setMultiplier(int newValue) { fIcuDecfmt.setMultiplier(newValue); } @Override public void setNegativePrefix(String newValue) { fIcuDecfmt.setNegativePrefix(newValue); } @Override public void setNegativeSuffix(String newValue) { fIcuDecfmt.setNegativeSuffix(newValue); } @Override public void setParseBigDecimal(boolean newValue) { fIcuDecfmt.setParseBigDecimal(newValue); } @Override public void setPositivePrefix(String newValue) { 
fIcuDecfmt.setPositivePrefix(newValue); } @Override public void setPositiveSuffix(String newValue) { fIcuDecfmt.setPositiveSuffix(newValue); } @Override public void setRoundingMode(RoundingMode roundingMode) { if (roundingMode.equals(RoundingMode.CEILING)) { fIcuDecfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_CEILING); } else if (roundingMode.equals(RoundingMode.DOWN)) { fIcuDecfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_DOWN); } else if (roundingMode.equals(RoundingMode.FLOOR)) { fIcuDecfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_FLOOR); } else if (roundingMode.equals(RoundingMode.HALF_DOWN)) { fIcuDecfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_HALF_DOWN); } else if (roundingMode.equals(RoundingMode.HALF_EVEN)) { fIcuDecfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_HALF_EVEN); } else if (roundingMode.equals(RoundingMode.HALF_UP)) { fIcuDecfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_HALF_UP); } else if (roundingMode.equals(RoundingMode.UNNECESSARY)) { fIcuDecfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_UNNECESSARY); } else if (roundingMode.equals(RoundingMode.UP)) { fIcuDecfmt.setRoundingMode(com.ibm.icu.math.BigDecimal.ROUND_UP); } else { throw new IllegalArgumentException("Invalid rounding mode was specified."); } } @Override public String toLocalizedPattern() { return fIcuDecfmt.toLocalizedPattern(); } @Override public String toPattern() { return fIcuDecfmt.toPattern(); } // Methods overriding java.text.NumberFormat @Override public boolean isGroupingUsed() { return fIcuDecfmt.isGroupingUsed(); } @Override public boolean isParseIntegerOnly() { return fIcuDecfmt.isParseIntegerOnly(); } @Override public void setGroupingUsed(boolean newValue) { if (fIcuDecfmt != null) { fIcuDecfmt.setGroupingUsed(newValue); } } @Override public void setParseIntegerOnly(boolean value) { fIcuDecfmt.setParseIntegerOnly(value); } private static AttributedCharacterIterator.Attribute 
mapAttribute(AttributedCharacterIterator.Attribute icuAttribute) { AttributedCharacterIterator.Attribute jdkAttribute = icuAttribute; if (icuAttribute == NumberFormat.Field.CURRENCY) { jdkAttribute = java.text.NumberFormat.Field.CURRENCY; } else if (icuAttribute == NumberFormat.Field.DECIMAL_SEPARATOR) { jdkAttribute = java.text.NumberFormat.Field.DECIMAL_SEPARATOR; } else if (icuAttribute == NumberFormat.Field.EXPONENT) { jdkAttribute = java.text.NumberFormat.Field.EXPONENT; } else if (icuAttribute == NumberFormat.Field.EXPONENT_SIGN) { jdkAttribute = java.text.NumberFormat.Field.EXPONENT_SIGN; } else if (icuAttribute == NumberFormat.Field.EXPONENT_SYMBOL) { jdkAttribute = java.text.NumberFormat.Field.EXPONENT_SYMBOL; } else if (icuAttribute == NumberFormat.Field.FRACTION) { jdkAttribute = java.text.NumberFormat.Field.FRACTION; } else if (icuAttribute == NumberFormat.Field.GROUPING_SEPARATOR) { jdkAttribute = java.text.NumberFormat.Field.GROUPING_SEPARATOR; } else if (icuAttribute == NumberFormat.Field.INTEGER) { jdkAttribute = java.text.NumberFormat.Field.INTEGER; } else if (icuAttribute == NumberFormat.Field.PERCENT) { jdkAttribute = java.text.NumberFormat.Field.PERCENT; } else if (icuAttribute == NumberFormat.Field.PERMILLE) { jdkAttribute = java.text.NumberFormat.Field.PERMILLE; } else if (icuAttribute == NumberFormat.Field.SIGN) { jdkAttribute = java.text.NumberFormat.Field.SIGN; } return jdkAttribute; } } icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/SimpleDateFormatICU.java0000644000175000017500000003503311361046444026667 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.impl.jdkadapter; import java.text.AttributedCharacterIterator; import java.text.AttributedString; import java.text.CharacterIterator; import java.text.DateFormatSymbols; import java.text.FieldPosition; import java.text.NumberFormat; import java.text.ParsePosition; import java.util.Calendar; import java.util.Date; import java.util.GregorianCalendar; import java.util.HashMap; import java.util.Map; import java.util.Set; import java.util.TimeZone; import com.ibm.icu.impl.icuadapter.NumberFormatJDK; import com.ibm.icu.impl.icuadapter.TimeZoneJDK; import com.ibm.icu.text.DateFormat; import com.ibm.icu.text.SimpleDateFormat; /** * SimpleDateFormatICU is an adapter class which wraps ICU4J SimpleDateFormat and * implements java.text.SimpleDateFormat APIs. */ public class SimpleDateFormatICU extends java.text.SimpleDateFormat { private static final long serialVersionUID = -2060890659010258983L; private SimpleDateFormat fIcuSdf; private SimpleDateFormatICU(SimpleDateFormat icuSdf) { fIcuSdf = icuSdf; } public static java.text.SimpleDateFormat wrap(SimpleDateFormat icuSdf) { return new SimpleDateFormatICU(icuSdf); } // Methods overriding java.text.SimpleDateFormat @Override public void applyLocalizedPattern(String pattern) { fIcuSdf.applyLocalizedPattern(pattern); } @Override public void applyPattern(String pattern) { fIcuSdf.applyPattern(pattern); } @Override public Object clone() { SimpleDateFormatICU other = (SimpleDateFormatICU)super.clone(); other.fIcuSdf = (SimpleDateFormat)this.fIcuSdf.clone(); return other; } @Override public boolean equals(Object obj) { if (obj instanceof SimpleDateFormatICU) { return ((SimpleDateFormatICU)obj).fIcuSdf.equals(this.fIcuSdf); } return false; } @Override public StringBuffer format(Date date, StringBuffer toAppendTo, FieldPosition pos) { return fIcuSdf.format(date, toAppendTo, pos); } @Override public AttributedCharacterIterator 
formatToCharacterIterator(Object obj) { AttributedCharacterIterator aci = fIcuSdf.formatToCharacterIterator(obj); // Create a new AttributedString StringBuilder sb = new StringBuilder(aci.getEndIndex() - aci.getBeginIndex()); char c = aci.first(); while (true) { sb.append(c); c = aci.next(); if (c == CharacterIterator.DONE) { break; } } AttributedString resstr = new AttributedString(sb.toString()); // Mapping attributes Map<AttributedCharacterIterator.Attribute,Object> attributes = null; int index = aci.getBeginIndex(); int residx = 0; while (true) { if (aci.setIndex(index) == CharacterIterator.DONE) { break; } attributes = aci.getAttributes(); if (attributes != null) { int end = aci.getRunLimit(); Map<AttributedCharacterIterator.Attribute,Object> jdkAttributes = new HashMap<AttributedCharacterIterator.Attribute,Object>(); Set<AttributedCharacterIterator.Attribute> keys = attributes.keySet(); for (AttributedCharacterIterator.Attribute key : keys) { AttributedCharacterIterator.Attribute jdkKey = mapAttribute(key); Object jdkVal = attributes.get(key); if (jdkVal instanceof AttributedCharacterIterator.Attribute) { jdkVal = mapAttribute((AttributedCharacterIterator.Attribute)jdkVal); } jdkAttributes.put(jdkKey, jdkVal); } int resend = residx + (end - index); resstr.addAttributes(jdkAttributes, residx, resend); index = end; residx = resend; } } return resstr.getIterator(); } @Override public Date get2DigitYearStart() { return fIcuSdf.get2DigitYearStart(); } @Override public DateFormatSymbols getDateFormatSymbols() { return DateFormatSymbolsICU.wrap(fIcuSdf.getDateFormatSymbols()); } @Override public int hashCode() { return fIcuSdf.hashCode(); } @Override public Date parse(String text, ParsePosition pos) { return fIcuSdf.parse(text, pos); } @Override public void set2DigitYearStart(Date startDate) { fIcuSdf.set2DigitYearStart(startDate); } @Override public void setDateFormatSymbols(DateFormatSymbols newFormatSymbols) { com.ibm.icu.text.DateFormatSymbols icuDfs = null; if (newFormatSymbols instanceof DateFormatSymbolsICU) { icuDfs = ((DateFormatSymbolsICU)newFormatSymbols).unwrap(); } else if (fIcuSdf.getCalendar() instanceof 
com.ibm.icu.util.GregorianCalendar) { // Java 6 uses DateFormatSymbols exclusively for Gregorian // calendar. String[] newJDK, curICU, newICU; icuDfs = fIcuSdf.getDateFormatSymbols(); // Eras newJDK = newFormatSymbols.getEras(); curICU = icuDfs.getEras(); newICU = copySymbols(newJDK, curICU, true); icuDfs.setEras(newICU); // Months newJDK = newFormatSymbols.getMonths(); curICU = icuDfs.getMonths(); newICU = copySymbols(newJDK, curICU, false); icuDfs.setMonths(newICU); // ShortMonths newJDK = newFormatSymbols.getShortMonths(); curICU = icuDfs.getShortMonths(); newICU = copySymbols(newJDK, curICU, false); icuDfs.setShortMonths(newICU); // Weekdays newJDK = newFormatSymbols.getWeekdays(); curICU = icuDfs.getWeekdays(); newICU = copySymbols(newJDK, curICU, false); icuDfs.setWeekdays(newICU); // ShortWeekdays newJDK = newFormatSymbols.getShortWeekdays(); curICU = icuDfs.getShortWeekdays(); newICU = copySymbols(newJDK, curICU, false); icuDfs.setShortWeekdays(newICU); // AmPm newJDK = newFormatSymbols.getAmPmStrings(); curICU = icuDfs.getAmPmStrings(); newICU = copySymbols(newJDK, curICU, false); icuDfs.setAmPmStrings(newICU); } else { // For other calendars, JDK's standard DateFormatSymbols // cannot be used. 
throw new UnsupportedOperationException("JDK DateFormatSymbols cannot be used for the calendar type."); } fIcuSdf.setDateFormatSymbols(icuDfs); } @Override public String toLocalizedPattern() { return fIcuSdf.toLocalizedPattern(); } @Override public String toPattern() { return fIcuSdf.toPattern(); } // Methods overriding java.text.DateFormat @Override public Calendar getCalendar() { return CalendarICU.wrap(fIcuSdf.getCalendar()); } @Override public NumberFormat getNumberFormat() { com.ibm.icu.text.NumberFormat nfmt = fIcuSdf.getNumberFormat(); if (nfmt instanceof NumberFormatJDK) { return ((NumberFormatJDK)nfmt).unwrap(); } if (nfmt instanceof com.ibm.icu.text.DecimalFormat) { return DecimalFormatICU.wrap((com.ibm.icu.text.DecimalFormat)nfmt); } return NumberFormatICU.wrap(nfmt); } @Override public TimeZone getTimeZone() { return getCalendar().getTimeZone(); } @Override public boolean isLenient() { return fIcuSdf.isLenient(); } private static final long SAMPLE_TIME = 962409600000L; //2000-07-01T00:00:00Z private static final int JAPANESE_YEAR = 12; // Japanese calendar year @ SAMPLE_TIME private static final int THAI_YEAR = 2543; // Thai Buddhist calendar year @ SAMPLE_TIME @Override public void setCalendar(Calendar newCalendar) { com.ibm.icu.util.Calendar icuCal = null; if (newCalendar instanceof CalendarICU) { icuCal = ((CalendarICU)newCalendar).unwrap(); } else { // Note: There is no easy way to implement ICU Calendar with // JDK Calendar implementation. For now, this code assumes // the given calendar is either Gregorian, Buddhist or // JapaneseImperial. Once the type is detected, this code // creates an instance of ICU Calendar with the same type. 
com.ibm.icu.util.TimeZone icuTz = TimeZoneJDK.wrap(newCalendar.getTimeZone()); if (newCalendar instanceof GregorianCalendar) { icuCal = new com.ibm.icu.util.GregorianCalendar(icuTz); } else { // Probe a clone so that the caller's calendar time is not modified Calendar sampleCal = (Calendar)newCalendar.clone(); sampleCal.setTimeInMillis(SAMPLE_TIME); int year = sampleCal.get(Calendar.YEAR); if (year == JAPANESE_YEAR) { icuCal = new com.ibm.icu.util.JapaneseCalendar(icuTz); } else if (year == THAI_YEAR) { icuCal = new com.ibm.icu.util.BuddhistCalendar(icuTz); } else { // This calendar type is not supported throw new UnsupportedOperationException("Unsupported calendar type by ICU Calendar adapter."); } } // Copy the original calendar settings icuCal.setFirstDayOfWeek(newCalendar.getFirstDayOfWeek()); icuCal.setLenient(newCalendar.isLenient()); icuCal.setMinimalDaysInFirstWeek(newCalendar.getMinimalDaysInFirstWeek()); } fIcuSdf.setCalendar(icuCal); } @Override public void setLenient(boolean lenient) { fIcuSdf.setLenient(lenient); } @Override public void setNumberFormat(NumberFormat newNumberFormat) { if (newNumberFormat instanceof DecimalFormatICU) { fIcuSdf.setNumberFormat(((DecimalFormatICU)newNumberFormat).unwrap()); } else if (newNumberFormat instanceof NumberFormatICU) { fIcuSdf.setNumberFormat(((NumberFormatICU)newNumberFormat).unwrap()); } else { fIcuSdf.setNumberFormat(NumberFormatJDK.wrap(newNumberFormat)); } } @Override public void setTimeZone(TimeZone zone) { fIcuSdf.setTimeZone(TimeZoneJDK.wrap(zone)); } private String[] copySymbols(String[] newData, String[] curData, boolean alignEnd) { if (newData.length >= curData.length) { return newData; } int startOffset = alignEnd ? 
curData.length - newData.length : 0; System.arraycopy(newData, 0, curData, startOffset, newData.length); return curData; } private static AttributedCharacterIterator.Attribute mapAttribute(AttributedCharacterIterator.Attribute icuAttribute) { AttributedCharacterIterator.Attribute jdkAttribute = icuAttribute; if (icuAttribute == DateFormat.Field.AM_PM) { jdkAttribute = java.text.DateFormat.Field.AM_PM; } else if (icuAttribute == DateFormat.Field.DAY_OF_MONTH) { jdkAttribute = java.text.DateFormat.Field.DAY_OF_MONTH; } else if (icuAttribute == DateFormat.Field.DAY_OF_WEEK) { jdkAttribute = java.text.DateFormat.Field.DAY_OF_WEEK; } else if (icuAttribute == DateFormat.Field.DAY_OF_WEEK_IN_MONTH) { jdkAttribute = java.text.DateFormat.Field.DAY_OF_WEEK_IN_MONTH; } else if (icuAttribute == DateFormat.Field.DAY_OF_YEAR) { jdkAttribute = java.text.DateFormat.Field.DAY_OF_YEAR; } else if (icuAttribute == DateFormat.Field.ERA) { jdkAttribute = java.text.DateFormat.Field.ERA; } else if (icuAttribute == DateFormat.Field.HOUR_OF_DAY0) { jdkAttribute = java.text.DateFormat.Field.HOUR_OF_DAY0; } else if (icuAttribute == DateFormat.Field.HOUR_OF_DAY1) { jdkAttribute = java.text.DateFormat.Field.HOUR_OF_DAY1; } else if (icuAttribute == DateFormat.Field.HOUR0) { jdkAttribute = java.text.DateFormat.Field.HOUR0; } else if (icuAttribute == DateFormat.Field.HOUR1) { jdkAttribute = java.text.DateFormat.Field.HOUR1; } else if (icuAttribute == DateFormat.Field.MILLISECOND) { jdkAttribute = java.text.DateFormat.Field.MILLISECOND; } else if (icuAttribute == DateFormat.Field.MINUTE) { jdkAttribute = java.text.DateFormat.Field.MINUTE; } else if (icuAttribute == DateFormat.Field.MONTH) { jdkAttribute = java.text.DateFormat.Field.MONTH; } else if (icuAttribute == DateFormat.Field.SECOND) { jdkAttribute = java.text.DateFormat.Field.SECOND; } else if (icuAttribute == DateFormat.Field.TIME_ZONE) { jdkAttribute = java.text.DateFormat.Field.TIME_ZONE; } else if (icuAttribute == 
DateFormat.Field.WEEK_OF_MONTH) { jdkAttribute = java.text.DateFormat.Field.WEEK_OF_MONTH; } else if (icuAttribute == DateFormat.Field.WEEK_OF_YEAR) { jdkAttribute = java.text.DateFormat.Field.WEEK_OF_YEAR; } else if (icuAttribute == DateFormat.Field.YEAR) { jdkAttribute = java.text.DateFormat.Field.YEAR; } // ICU4J's DateFormat defines additional DateFormat.Field // constants, listed below. // // DOW_LOCAL // EXTENDED_YEAR // JULIAN_DAY // MILLISECONDS_IN_DAY // QUARTER // YEAR_WOY // // However, the corresponding pattern characters are not used by // the default factory methods (getXXXInstance). So these constants // are only used when a user intentionally sets a pattern including // these ICU4J-specific pattern letters. Even if that happens, // ICU4J's DateFormat.Field extends java.text.Format.Field, so // it does not break the contract of formatToCharacterIterator. return jdkAttribute; } } icu4j-4.2/localespi/src/com/ibm/icu/impl/jdkadapter/DecimalFormatSymbolsICU.java0000644000175000017500000001265011361046444027547 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.impl.jdkadapter; import java.util.Currency; import com.ibm.icu.text.DecimalFormatSymbols; /** * DecimalFormatSymbolsICU is an adapter class which wraps ICU4J DecimalFormatSymbols and * implements java.text.DecimalFormatSymbols APIs. 
*/ public class DecimalFormatSymbolsICU extends java.text.DecimalFormatSymbols { private static final long serialVersionUID = -8226875908479009580L; private DecimalFormatSymbols fIcuDecfs; private DecimalFormatSymbolsICU(DecimalFormatSymbols icuDecfs) { fIcuDecfs = icuDecfs; } public static java.text.DecimalFormatSymbols wrap(DecimalFormatSymbols icuDecfs) { return new DecimalFormatSymbolsICU(icuDecfs); } public DecimalFormatSymbols unwrap() { return fIcuDecfs; } @Override public Object clone() { DecimalFormatSymbolsICU other = (DecimalFormatSymbolsICU)super.clone(); other.fIcuDecfs = (DecimalFormatSymbols)fIcuDecfs.clone(); return other; } @Override public boolean equals(Object obj) { if (obj instanceof DecimalFormatSymbolsICU) { return ((DecimalFormatSymbolsICU)obj).fIcuDecfs.equals(fIcuDecfs); } return false; } @Override public Currency getCurrency() { com.ibm.icu.util.Currency icuCurrency = fIcuDecfs.getCurrency(); if (icuCurrency == null) { return null; } return Currency.getInstance(icuCurrency.getCurrencyCode()); } @Override public String getCurrencySymbol() { return fIcuDecfs.getCurrencySymbol(); } @Override public char getDecimalSeparator() { return fIcuDecfs.getDecimalSeparator(); } @Override public char getDigit() { return fIcuDecfs.getDigit(); } @Override public String getExponentSeparator() { return fIcuDecfs.getExponentSeparator(); } @Override public char getGroupingSeparator() { return fIcuDecfs.getGroupingSeparator(); } @Override public String getInfinity() { return fIcuDecfs.getInfinity(); } @Override public String getInternationalCurrencySymbol() { return fIcuDecfs.getInternationalCurrencySymbol(); } @Override public char getMinusSign() { return fIcuDecfs.getMinusSign(); } @Override public char getMonetaryDecimalSeparator() { return fIcuDecfs.getMonetaryDecimalSeparator(); } @Override public String getNaN() { return fIcuDecfs.getNaN(); } @Override public char getPatternSeparator() { return fIcuDecfs.getPatternSeparator(); } @Override public char 
getPercent() { return fIcuDecfs.getPercent(); } @Override public char getPerMill() { return fIcuDecfs.getPerMill(); } @Override public char getZeroDigit() { return fIcuDecfs.getZeroDigit(); } @Override public void setCurrency(Currency currency) { com.ibm.icu.util.Currency icuCurrency = null; if (currency != null) { icuCurrency = com.ibm.icu.util.Currency.getInstance(currency.getCurrencyCode()); } fIcuDecfs.setCurrency(icuCurrency); } @Override public void setCurrencySymbol(String currency) { fIcuDecfs.setCurrencySymbol(currency); } @Override public void setDecimalSeparator(char decimalSeparator) { fIcuDecfs.setDecimalSeparator(decimalSeparator); } @Override public void setDigit(char digit) { fIcuDecfs.setDigit(digit); } @Override public void setExponentSeparator(String exp) { fIcuDecfs.setExponentSeparator(exp); } @Override public void setGroupingSeparator(char groupingSeparator) { fIcuDecfs.setGroupingSeparator(groupingSeparator); } @Override public void setInfinity(String infinity) { fIcuDecfs.setInfinity(infinity); } @Override public void setInternationalCurrencySymbol(String currencyCode) { fIcuDecfs.setInternationalCurrencySymbol(currencyCode); } @Override public void setMinusSign(char minusSign) { fIcuDecfs.setMinusSign(minusSign); } @Override public void setMonetaryDecimalSeparator(char sep) { fIcuDecfs.setMonetaryDecimalSeparator(sep); } @Override public void setNaN(String NaN) { fIcuDecfs.setNaN(NaN); } @Override public void setPatternSeparator(char patternSeparator) { fIcuDecfs.setPatternSeparator(patternSeparator); } @Override public void setPercent(char percent) { fIcuDecfs.setPercent(percent); } @Override public void setPerMill(char perMill) { fIcuDecfs.setPerMill(perMill); } @Override public void setZeroDigit(char zeroDigit) { fIcuDecfs.setZeroDigit(zeroDigit); } @Override public int hashCode() { return fIcuDecfs.hashCode(); } } icu4j-4.2/localespi/build.xml0000644000175000017500000001157411361046446016144 0ustar twernertwerner
icu4j-4.2/localespi/.settings/0000755000175000017500000000000011361046446016231 5ustar twernertwernericu4j-4.2/localespi/.settings/org.eclipse.jdt.ui.prefs0000644000175000017500000001315711361046446022707 0ustar twernertwerner#Tue Jun 03 11:18:15 EDT 2008 eclipse.preferences.version=1 formatter_profile=_ICU4J standard formatter_settings_version=11 org.eclipse.jdt.ui.exception.name=e org.eclipse.jdt.ui.gettersetter.use.is=true org.eclipse.jdt.ui.javadoc=false org.eclipse.jdt.ui.keywordthis=false org.eclipse.jdt.ui.overrideannotation=true org.eclipse.jdt.ui.text.custom_code_templates= icu4j-4.2/localespi/.settings/org.eclipse.core.resources.prefs0000644000175000017500000000013311361046446024441 0ustar twernertwerner#Tue Jun 03 11:37:05 EDT 2008 eclipse.preferences.version=1 encoding/=US-ASCII icu4j-4.2/localespi/.settings/org.eclipse.jdt.core.prefs0000644000175000017500000005450111361046446023220 0ustar twernertwerner#Tue May 20 15:13:23 EDT 2008 eclipse.preferences.version=1 org.eclipse.jdt.core.codeComplete.argumentPrefixes= org.eclipse.jdt.core.codeComplete.argumentSuffixes= org.eclipse.jdt.core.codeComplete.fieldPrefixes= org.eclipse.jdt.core.codeComplete.fieldSuffixes= org.eclipse.jdt.core.codeComplete.localPrefixes= org.eclipse.jdt.core.codeComplete.localSuffixes= org.eclipse.jdt.core.codeComplete.staticFieldPrefixes= org.eclipse.jdt.core.codeComplete.staticFieldSuffixes= org.eclipse.jdt.core.compiler.codegen.inlineJsrBytecode=enabled org.eclipse.jdt.core.compiler.codegen.targetPlatform=1.6 org.eclipse.jdt.core.compiler.codegen.unusedLocal=preserve org.eclipse.jdt.core.compiler.compliance=1.6 org.eclipse.jdt.core.compiler.debug.lineNumber=generate org.eclipse.jdt.core.compiler.debug.localVariable=generate org.eclipse.jdt.core.compiler.debug.sourceFile=generate org.eclipse.jdt.core.compiler.problem.assertIdentifier=error org.eclipse.jdt.core.compiler.problem.enumIdentifier=error org.eclipse.jdt.core.compiler.source=1.6 
org.eclipse.jdt.core.formatter.align_type_members_on_columns=false org.eclipse.jdt.core.formatter.alignment_for_arguments_in_allocation_expression=16 org.eclipse.jdt.core.formatter.alignment_for_arguments_in_enum_constant=16 org.eclipse.jdt.core.formatter.alignment_for_arguments_in_explicit_constructor_call=16 org.eclipse.jdt.core.formatter.alignment_for_arguments_in_method_invocation=16 org.eclipse.jdt.core.formatter.alignment_for_arguments_in_qualified_allocation_expression=16 org.eclipse.jdt.core.formatter.alignment_for_assignment=0 org.eclipse.jdt.core.formatter.alignment_for_binary_expression=16 org.eclipse.jdt.core.formatter.alignment_for_compact_if=16 org.eclipse.jdt.core.formatter.alignment_for_conditional_expression=80 org.eclipse.jdt.core.formatter.alignment_for_enum_constants=0 org.eclipse.jdt.core.formatter.alignment_for_expressions_in_array_initializer=16 org.eclipse.jdt.core.formatter.alignment_for_multiple_fields=16 org.eclipse.jdt.core.formatter.alignment_for_parameters_in_constructor_declaration=16 org.eclipse.jdt.core.formatter.alignment_for_parameters_in_method_declaration=16 org.eclipse.jdt.core.formatter.alignment_for_selector_in_method_invocation=16 org.eclipse.jdt.core.formatter.alignment_for_superclass_in_type_declaration=16 org.eclipse.jdt.core.formatter.alignment_for_superinterfaces_in_enum_declaration=16 org.eclipse.jdt.core.formatter.alignment_for_superinterfaces_in_type_declaration=16 org.eclipse.jdt.core.formatter.alignment_for_throws_clause_in_constructor_declaration=16 org.eclipse.jdt.core.formatter.alignment_for_throws_clause_in_method_declaration=16 org.eclipse.jdt.core.formatter.blank_lines_after_imports=1 org.eclipse.jdt.core.formatter.blank_lines_after_package=1 org.eclipse.jdt.core.formatter.blank_lines_before_field=0 org.eclipse.jdt.core.formatter.blank_lines_before_first_class_body_declaration=0 org.eclipse.jdt.core.formatter.blank_lines_before_imports=1 org.eclipse.jdt.core.formatter.blank_lines_before_member_type=1 
org.eclipse.jdt.core.formatter.blank_lines_before_method=1 org.eclipse.jdt.core.formatter.blank_lines_before_new_chunk=1 org.eclipse.jdt.core.formatter.blank_lines_before_package=0 org.eclipse.jdt.core.formatter.blank_lines_between_import_groups=1 org.eclipse.jdt.core.formatter.blank_lines_between_type_declarations=1 org.eclipse.jdt.core.formatter.brace_position_for_annotation_type_declaration=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_anonymous_type_declaration=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_array_initializer=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_block=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_block_in_case=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_constructor_declaration=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_enum_constant=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_enum_declaration=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_method_declaration=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_switch=end_of_line org.eclipse.jdt.core.formatter.brace_position_for_type_declaration=end_of_line org.eclipse.jdt.core.formatter.comment.clear_blank_lines_in_block_comment=false org.eclipse.jdt.core.formatter.comment.clear_blank_lines_in_javadoc_comment=false org.eclipse.jdt.core.formatter.comment.format_block_comments=true org.eclipse.jdt.core.formatter.comment.format_header=false org.eclipse.jdt.core.formatter.comment.format_html=true org.eclipse.jdt.core.formatter.comment.format_javadoc_comments=true org.eclipse.jdt.core.formatter.comment.format_line_comments=true org.eclipse.jdt.core.formatter.comment.format_source_code=true org.eclipse.jdt.core.formatter.comment.indent_parameter_description=true org.eclipse.jdt.core.formatter.comment.indent_root_tags=true org.eclipse.jdt.core.formatter.comment.insert_new_line_before_root_tags=insert 
org.eclipse.jdt.core.formatter.comment.insert_new_line_for_parameter=insert org.eclipse.jdt.core.formatter.comment.line_length=80 org.eclipse.jdt.core.formatter.compact_else_if=true org.eclipse.jdt.core.formatter.continuation_indentation=2 org.eclipse.jdt.core.formatter.continuation_indentation_for_array_initializer=2 org.eclipse.jdt.core.formatter.format_guardian_clause_on_one_line=false org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_annotation_declaration_header=true org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_enum_constant_header=true org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_enum_declaration_header=true org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_type_header=true org.eclipse.jdt.core.formatter.indent_breaks_compare_to_cases=true org.eclipse.jdt.core.formatter.indent_empty_lines=false org.eclipse.jdt.core.formatter.indent_statements_compare_to_block=true org.eclipse.jdt.core.formatter.indent_statements_compare_to_body=true org.eclipse.jdt.core.formatter.indent_switchstatements_compare_to_cases=true org.eclipse.jdt.core.formatter.indent_switchstatements_compare_to_switch=false org.eclipse.jdt.core.formatter.indentation.size=4 org.eclipse.jdt.core.formatter.insert_new_line_after_annotation=insert org.eclipse.jdt.core.formatter.insert_new_line_after_opening_brace_in_array_initializer=do not insert org.eclipse.jdt.core.formatter.insert_new_line_at_end_of_file_if_missing=do not insert org.eclipse.jdt.core.formatter.insert_new_line_before_catch_in_try_statement=do not insert org.eclipse.jdt.core.formatter.insert_new_line_before_closing_brace_in_array_initializer=do not insert org.eclipse.jdt.core.formatter.insert_new_line_before_else_in_if_statement=do not insert org.eclipse.jdt.core.formatter.insert_new_line_before_finally_in_try_statement=do not insert org.eclipse.jdt.core.formatter.insert_new_line_before_while_in_do_statement=do not insert 
org.eclipse.jdt.core.formatter.insert_new_line_in_empty_annotation_declaration=insert org.eclipse.jdt.core.formatter.insert_new_line_in_empty_anonymous_type_declaration=insert org.eclipse.jdt.core.formatter.insert_new_line_in_empty_block=insert org.eclipse.jdt.core.formatter.insert_new_line_in_empty_enum_constant=insert org.eclipse.jdt.core.formatter.insert_new_line_in_empty_enum_declaration=insert org.eclipse.jdt.core.formatter.insert_new_line_in_empty_method_body=insert org.eclipse.jdt.core.formatter.insert_new_line_in_empty_type_declaration=insert org.eclipse.jdt.core.formatter.insert_space_after_and_in_type_parameter=insert org.eclipse.jdt.core.formatter.insert_space_after_assignment_operator=insert org.eclipse.jdt.core.formatter.insert_space_after_at_in_annotation=do not insert org.eclipse.jdt.core.formatter.insert_space_after_at_in_annotation_type_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_after_binary_operator=insert org.eclipse.jdt.core.formatter.insert_space_after_closing_angle_bracket_in_type_arguments=insert org.eclipse.jdt.core.formatter.insert_space_after_closing_angle_bracket_in_type_parameters=insert org.eclipse.jdt.core.formatter.insert_space_after_closing_brace_in_block=insert org.eclipse.jdt.core.formatter.insert_space_after_closing_paren_in_cast=insert org.eclipse.jdt.core.formatter.insert_space_after_colon_in_assert=insert org.eclipse.jdt.core.formatter.insert_space_after_colon_in_case=insert org.eclipse.jdt.core.formatter.insert_space_after_colon_in_conditional=insert org.eclipse.jdt.core.formatter.insert_space_after_colon_in_for=insert org.eclipse.jdt.core.formatter.insert_space_after_colon_in_labeled_statement=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_allocation_expression=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_annotation=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_array_initializer=insert 
org.eclipse.jdt.core.formatter.insert_space_after_comma_in_constructor_declaration_parameters=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_constructor_declaration_throws=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_enum_constant_arguments=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_enum_declarations=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_explicitconstructorcall_arguments=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_for_increments=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_for_inits=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_declaration_parameters=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_declaration_throws=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_invocation_arguments=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_multiple_field_declarations=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_multiple_local_declarations=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_parameterized_type_reference=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_superinterfaces=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_type_arguments=insert org.eclipse.jdt.core.formatter.insert_space_after_comma_in_type_parameters=insert org.eclipse.jdt.core.formatter.insert_space_after_ellipsis=insert org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_parameterized_type_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_type_arguments=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_type_parameters=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_brace_in_array_initializer=insert 
org.eclipse.jdt.core.formatter.insert_space_after_opening_bracket_in_array_allocation_expression=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_bracket_in_array_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_annotation=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_cast=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_catch=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_constructor_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_enum_constant=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_for=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_if=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_method_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_method_invocation=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_parenthesized_expression=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_switch=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_synchronized=do not insert org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_while=do not insert org.eclipse.jdt.core.formatter.insert_space_after_postfix_operator=do not insert org.eclipse.jdt.core.formatter.insert_space_after_prefix_operator=do not insert org.eclipse.jdt.core.formatter.insert_space_after_question_in_conditional=insert org.eclipse.jdt.core.formatter.insert_space_after_question_in_wildcard=do not insert org.eclipse.jdt.core.formatter.insert_space_after_semicolon_in_for=insert org.eclipse.jdt.core.formatter.insert_space_after_unary_operator=do not insert org.eclipse.jdt.core.formatter.insert_space_before_and_in_type_parameter=insert 
org.eclipse.jdt.core.formatter.insert_space_before_assignment_operator=insert org.eclipse.jdt.core.formatter.insert_space_before_at_in_annotation_type_declaration=insert org.eclipse.jdt.core.formatter.insert_space_before_binary_operator=insert org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_parameterized_type_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_type_arguments=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_type_parameters=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_brace_in_array_initializer=insert org.eclipse.jdt.core.formatter.insert_space_before_closing_bracket_in_array_allocation_expression=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_bracket_in_array_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_annotation=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_cast=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_catch=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_constructor_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_enum_constant=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_for=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_if=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_method_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_method_invocation=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_parenthesized_expression=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_switch=do not insert 
org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_synchronized=do not insert org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_while=do not insert org.eclipse.jdt.core.formatter.insert_space_before_colon_in_assert=insert org.eclipse.jdt.core.formatter.insert_space_before_colon_in_case=do not insert org.eclipse.jdt.core.formatter.insert_space_before_colon_in_conditional=insert org.eclipse.jdt.core.formatter.insert_space_before_colon_in_default=do not insert org.eclipse.jdt.core.formatter.insert_space_before_colon_in_for=insert org.eclipse.jdt.core.formatter.insert_space_before_colon_in_labeled_statement=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_allocation_expression=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_annotation=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_array_initializer=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_constructor_declaration_parameters=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_constructor_declaration_throws=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_enum_constant_arguments=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_enum_declarations=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_explicitconstructorcall_arguments=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_for_increments=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_for_inits=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_declaration_parameters=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_declaration_throws=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_invocation_arguments=do not insert 
org.eclipse.jdt.core.formatter.insert_space_before_comma_in_multiple_field_declarations=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_multiple_local_declarations=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_parameterized_type_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_superinterfaces=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_type_arguments=do not insert org.eclipse.jdt.core.formatter.insert_space_before_comma_in_type_parameters=do not insert org.eclipse.jdt.core.formatter.insert_space_before_ellipsis=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_parameterized_type_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_type_arguments=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_type_parameters=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_annotation_type_declaration=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_anonymous_type_declaration=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_array_initializer=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_block=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_constructor_declaration=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_enum_constant=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_enum_declaration=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_method_declaration=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_switch=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_type_declaration=insert 
org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_allocation_expression=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_type_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_annotation=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_annotation_type_member_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_catch=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_constructor_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_enum_constant=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_for=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_if=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_method_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_method_invocation=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_parenthesized_expression=do not insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_switch=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_synchronized=insert org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_while=insert org.eclipse.jdt.core.formatter.insert_space_before_parenthesized_expression_in_return=insert org.eclipse.jdt.core.formatter.insert_space_before_parenthesized_expression_in_throw=insert org.eclipse.jdt.core.formatter.insert_space_before_postfix_operator=do not insert org.eclipse.jdt.core.formatter.insert_space_before_prefix_operator=do not insert org.eclipse.jdt.core.formatter.insert_space_before_question_in_conditional=insert 
org.eclipse.jdt.core.formatter.insert_space_before_question_in_wildcard=do not insert org.eclipse.jdt.core.formatter.insert_space_before_semicolon=do not insert org.eclipse.jdt.core.formatter.insert_space_before_semicolon_in_for=do not insert org.eclipse.jdt.core.formatter.insert_space_before_unary_operator=do not insert org.eclipse.jdt.core.formatter.insert_space_between_brackets_in_array_type_reference=do not insert org.eclipse.jdt.core.formatter.insert_space_between_empty_braces_in_array_initializer=do not insert org.eclipse.jdt.core.formatter.insert_space_between_empty_brackets_in_array_allocation_expression=do not insert org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_annotation_type_member_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_constructor_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_enum_constant=do not insert org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_method_declaration=do not insert org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_method_invocation=do not insert org.eclipse.jdt.core.formatter.keep_else_statement_on_same_line=false org.eclipse.jdt.core.formatter.keep_empty_array_initializer_on_one_line=false org.eclipse.jdt.core.formatter.keep_imple_if_on_one_line=false org.eclipse.jdt.core.formatter.keep_then_statement_on_same_line=false org.eclipse.jdt.core.formatter.lineSplit=80 org.eclipse.jdt.core.formatter.never_indent_block_comments_on_first_column=false org.eclipse.jdt.core.formatter.never_indent_line_comments_on_first_column=false org.eclipse.jdt.core.formatter.number_of_blank_lines_at_beginning_of_method_body=0 org.eclipse.jdt.core.formatter.number_of_empty_lines_to_preserve=1 org.eclipse.jdt.core.formatter.put_empty_statement_on_new_line=true org.eclipse.jdt.core.formatter.tabulation.char=space org.eclipse.jdt.core.formatter.tabulation.size=4 
org.eclipse.jdt.core.formatter.use_tabs_only_for_leading_indentations=false org.eclipse.jdt.core.formatter.wrap_before_binary_operator=true icu4j-4.2/localespi/.externalToolBuilders/0000755000175000017500000000000011361046446020543 5ustar twernertwernericu4j-4.2/localespi/.externalToolBuilders/localespi_jar.launch0000644000175000017500000000325211361046446024550 0ustar twernertwerner icu4j-4.2/localespi/.project0000644000175000017500000000133011361046446015757 0ustar twernertwerner icu4j-localespi org.eclipse.jdt.core.javabuilder org.eclipse.ui.externaltools.ExternalToolBuilder auto,full,incremental, LaunchConfigHandle <project>/.externalToolBuilders/localespi_jar.launch org.eclipse.jdt.core.javanature icu4j-4.2/eclipseProjectMisc/0000755000175000017500000000000011361045622016122 5ustar twernertwernericu4j-4.2/eclipseProjectMisc/normSrc.launch0000644000175000017500000000456511361045622020753 0ustar twernertwerner icu4j-4.2/eclipseProjectMisc/initSrc.launch0000644000175000017500000000456511361045622020743 0ustar twernertwerner icu4j-4.2/META-INF/0000755000175000017500000000000011361050734013473 5ustar twernertwernericu4j-4.2/META-INF/MANIFEST.MF0000644000175000017500000000101011361050732015113 0ustar twernertwernerManifest-Version: 1.0 Ant-Version: Apache Ant 1.7.1 Created-By: 2.4 (IBM Corporation) Built-By: IBM Corporation Name: common Specification-Title: ICU4J Sources Specification-Version: 4.2 Specification-Vendor: ICU Implementation-Title: ICU for Java source files Implementation-Version: 4.2.1.1 Implementation-Vendor: IBM Corporation Implementation-Vendor-Id: com.ibm Copyright-Info: Copyright (c) 2000-2010, International Business Machin es Corporation and others. All Rights Reserved. 
Sealed: false icu4j-4.2/src/0000755000175000017500000000000011361045622013122 5ustar twernertwernericu4j-4.2/src/META-INF/0000755000175000017500000000000011361045622014262 5ustar twernertwernericu4j-4.2/src/META-INF/services/0000755000175000017500000000000011361045622016105 5ustar twernertwernericu4j-4.2/src/META-INF/services/java.nio.charset.spi.CharsetProvider0000644000175000017500000000024311361045622025061 0ustar twernertwerner# Copyright (C) 2006, International Business Machines Corporation and others. All Rights Reserved. # icu4j converters com.ibm.icu.charset.CharsetProviderICU icu4j-4.2/src/com/0000755000175000017500000000000011361045622013700 5ustar twernertwernericu4j-4.2/src/com/ibm/0000755000175000017500000000000011361046432014447 5ustar twernertwernericu4j-4.2/src/com/ibm/icu/0000755000175000017500000000000011361046432015227 5ustar twernertwernericu4j-4.2/src/com/ibm/icu/lang/0000755000175000017500000000000011361050726016151 5ustar twernertwernericu4j-4.2/src/com/ibm/icu/lang/UCharacterDirection.java0000644000175000017500000000557411361046134022707 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 1996-2004, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.lang; import com.ibm.icu.lang.UCharacterEnums.ECharacterDirection; /** * Enumerated Unicode character linguistic direction constants. * Used as return results from UCharacter *

* This class is not subclassable *

* @author Syn Wee Quek * @stable ICU 2.1 */ public final class UCharacterDirection implements ECharacterDirection { // private constructor ========================================= ///CLOVER:OFF /** * Private constructor to prevent initialisation */ private UCharacterDirection() { } ///CLOVER:ON /** * Gets the name of the argument direction * @param dir direction type to retrieve name * @return directional name * @stable ICU 2.1 */ public static String toString(int dir) { switch(dir) { case LEFT_TO_RIGHT : return "Left-to-Right"; case RIGHT_TO_LEFT : return "Right-to-Left"; case EUROPEAN_NUMBER : return "European Number"; case EUROPEAN_NUMBER_SEPARATOR : return "European Number Separator"; case EUROPEAN_NUMBER_TERMINATOR : return "European Number Terminator"; case ARABIC_NUMBER : return "Arabic Number"; case COMMON_NUMBER_SEPARATOR : return "Common Number Separator"; case BLOCK_SEPARATOR : return "Paragraph Separator"; case SEGMENT_SEPARATOR : return "Segment Separator"; case WHITE_SPACE_NEUTRAL : return "Whitespace"; case OTHER_NEUTRAL : return "Other Neutrals"; case LEFT_TO_RIGHT_EMBEDDING : return "Left-to-Right Embedding"; case LEFT_TO_RIGHT_OVERRIDE : return "Left-to-Right Override"; case RIGHT_TO_LEFT_ARABIC : return "Right-to-Left Arabic"; case RIGHT_TO_LEFT_EMBEDDING : return "Right-to-Left Embedding"; case RIGHT_TO_LEFT_OVERRIDE : return "Right-to-Left Override"; case POP_DIRECTIONAL_FORMAT : return "Pop Directional Format"; case DIR_NON_SPACING_MARK : return "Non-Spacing Mark"; case BOUNDARY_NEUTRAL : return "Boundary Neutral"; } return "Unassigned"; } } icu4j-4.2/src/com/ibm/icu/lang/UProperty.java0000644000175000017500000006523511361046134020776 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 1996-2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.lang; /** *

Selection constants for Unicode properties.

*

These constants are used in functions like * UCharacter.hasBinaryProperty(int) to select one of the Unicode properties. *

*

The properties APIs are intended to reflect Unicode properties as * defined in the Unicode Character Database (UCD) and Unicode Technical * Reports (UTR).

*

For details about the properties see * http://www.unicode.org.

*

For names of Unicode properties see the UCD file PropertyAliases.txt. *

*

Important: If ICU is built with UCD files from a Unicode version below * 3.2, then properties marked with "new" are either unavailable or only * partially available. Check UCharacter.getUnicodeVersion() to be sure.
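
The following self-contained sketch illustrates how a selector constant picks a property. It mimics the shape of a call such as UCharacter.hasBinaryProperty, but uses only JDK methods. The class name PropertySelectorSketch, the two-argument signature, and the dispatch itself are assumptions made for illustration. The constant values are copied from this interface, and the JDK predicates are stand-ins whose answers may differ from ICU4J's for some code points.

```java
// Hypothetical stand-in, not ICU4J code: dispatches on a few selector values
// copied from UProperty and answers with the closest JDK predicate.
final class PropertySelectorSketch {
    static final int ALPHABETIC = 0;
    static final int IDEOGRAPHIC = 17;
    static final int LOWERCASE = 22;
    static final int UPPERCASE = 30;

    private PropertySelectorSketch() {}

    // Returns whether codePoint has the binary property named by the selector.
    // Only the four selectors above are covered by this sketch.
    static boolean hasBinaryProperty(int codePoint, int property) {
        switch (property) {
            case ALPHABETIC:
                return Character.isAlphabetic(codePoint);
            case IDEOGRAPHIC:
                return Character.isIdeographic(codePoint);
            case LOWERCASE:
                return Character.isLowerCase(codePoint);
            case UPPERCASE:
                return Character.isUpperCase(codePoint);
            default:
                throw new IllegalArgumentException(
                        "selector not covered by this sketch: " + property);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasBinaryProperty('a', LOWERCASE));
        System.out.println(hasBinaryProperty(0x4E2D, IDEOGRAPHIC));
    }
}
```

Passing the property as an int selector keeps one entry point for all binary properties, which is the design the constants in this interface suggest.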

* @author Syn Wee Quek * @stable ICU 2.6 * @see com.ibm.icu.lang.UCharacter */ public interface UProperty { // public data member -------------------------------------------------- /** *

Binary property Alphabetic.

*

Property for UCharacter.isUAlphabetic(), different from the property * in UCharacter.isalpha().

*

Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic.

* @stable ICU 2.6 */ public static final int ALPHABETIC = 0; /** * First constant for binary Unicode properties. * @stable ICU 2.6 */ public static final int BINARY_START = ALPHABETIC; /** * Binary property ASCII_Hex_Digit (0-9 A-F a-f). * @stable ICU 2.6 */ public static final int ASCII_HEX_DIGIT = 1; /** *

Binary property Bidi_Control.

*

Format controls which have specific functions in the Bidi Algorithm. *

* @stable ICU 2.6 */ public static final int BIDI_CONTROL = 2; /** *

Binary property Bidi_Mirrored.

*

Characters that may change display in RTL text.

*

Property for UCharacter.isMirrored().

*

See Bidi Algorithm; UTR 9.

* @stable ICU 2.6 */ public static final int BIDI_MIRRORED = 3; /** *

Binary property Dash.

*

Variations of dashes.

* @stable ICU 2.6 */ public static final int DASH = 4; /** *

Binary property Default_Ignorable_Code_Point (new). *

*

Property that indicates codepoint is ignorable in most processing. *

*

Codepoints (2060..206F, FFF0..FFFB, E0000..E0FFF) + * Other_Default_Ignorable_Code_Point + (Cf + Cc + Cs - White_Space)

* @stable ICU 2.6 */ public static final int DEFAULT_IGNORABLE_CODE_POINT = 5; /** *

Binary property Deprecated (new).

*

The usage of deprecated characters is strongly discouraged.

* @stable ICU 2.6 */ public static final int DEPRECATED = 6; /** *

Binary property Diacritic.

*

Characters that linguistically modify the meaning of another * character to which they apply.

* @stable ICU 2.6 */ public static final int DIACRITIC = 7; /** *

Binary property Extender.

*

Extend the value or shape of a preceding alphabetic character, e.g. * length and iteration marks.

* @stable ICU 2.6 */ public static final int EXTENDER = 8; /** *

Binary property Full_Composition_Exclusion.

*

CompositionExclusions.txt + Singleton Decompositions + * Non-Starter Decompositions.

* @stable ICU 2.6 */ public static final int FULL_COMPOSITION_EXCLUSION = 9; /** *

Binary property Grapheme_Base (new).

*

For programmatic determination of grapheme cluster boundaries. * [0..10FFFF]-Cc-Cf-Cs-Co-Cn-Zl-Zp-Grapheme_Link-Grapheme_Extend-CGJ

* @stable ICU 2.6 */ public static final int GRAPHEME_BASE = 10; /** *

Binary property Grapheme_Extend (new).

*

For programmatic determination of grapheme cluster boundaries.

*

Me+Mn+Mc+Other_Grapheme_Extend-Grapheme_Link-CGJ

* @stable ICU 2.6 */ public static final int GRAPHEME_EXTEND = 11; /** *

Binary property Grapheme_Link (new).

*

For programmatic determination of grapheme cluster boundaries.

* @stable ICU 2.6 */ public static final int GRAPHEME_LINK = 12; /** *

Binary property Hex_Digit.

*

Characters commonly used for hexadecimal numbers.

* @stable ICU 2.6 */ public static final int HEX_DIGIT = 13; /** *

Binary property Hyphen.

*

Dashes used to mark connections between pieces of words, plus the * Katakana middle dot.

* @stable ICU 2.6 */ public static final int HYPHEN = 14; /** *

Binary property ID_Continue.

*

Characters that can continue an identifier.

*

ID_Start+Mn+Mc+Nd+Pc

* @stable ICU 2.6 */ public static final int ID_CONTINUE = 15; /** *

Binary property ID_Start.

*

Characters that can start an identifier.

*

Lu+Ll+Lt+Lm+Lo+Nl
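
The ID_Start and ID_Continue descriptions above can be sketched as an identifier check: the first code point must be a start character and every later code point must be a continue character. This sketch assumes the JDK methods Character.isUnicodeIdentifierStart and Character.isUnicodeIdentifierPart as approximations of the two properties; they are not the ICU4J implementation and may disagree with it for some code points (the JDK "part" test also accepts ignorable characters). The class name IdentifierSketch is hypothetical.

```java
// Approximation using JDK predicates; ICU4J's ID_Start / ID_Continue data may differ.
final class IdentifierSketch {
    private IdentifierSketch() {}

    // True if s is non-empty, starts with an identifier-start code point,
    // and continues only with identifier-part code points.
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isUnicodeIdentifierStart(first)) {
            return false;
        }
        // Step by code point so supplementary characters are handled as one unit.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("foo1"));
        System.out.println(isIdentifier("1foo"));
    }
}
```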

* @stable ICU 2.6 */ public static final int ID_START = 16; /** *

Binary property Ideographic.

*

CJKV ideographs.

* @stable ICU 2.6 */ public static final int IDEOGRAPHIC = 17; /** *

Binary property IDS_Binary_Operator (new).

*

For programmatic determination of Ideographic Description Sequences. *

* @stable ICU 2.6 */ public static final int IDS_BINARY_OPERATOR = 18; /** *

Binary property IDS_Trinary_Operator (new).

* * @stable ICU 2.6 */ public static final int IDS_TRINARY_OPERATOR = 19; /** *

Binary property Join_Control.

*

Format controls for cursive joining and ligation.

* @stable ICU 2.6 */ public static final int JOIN_CONTROL = 20; /** *

Binary property Logical_Order_Exception (new).

*

Characters that do not use logical order and require special * handling in most processing.

* @stable ICU 2.6 */ public static final int LOGICAL_ORDER_EXCEPTION = 21; /** *

Binary property Lowercase.

*

Same as UCharacter.isULowercase(), different from * UCharacter.islower().

*

Ll+Other_Lowercase

* @stable ICU 2.6 */ public static final int LOWERCASE = 22; /**

Binary property Math.

*

Sm+Other_Math

* @stable ICU 2.6 */ public static final int MATH = 23; /** *

Binary property Noncharacter_Code_Point.

*

Code points that are explicitly defined as illegal for the encoding * of characters.

* @stable ICU 2.6 */ public static final int NONCHARACTER_CODE_POINT = 24; /** *

Binary property Quotation_Mark.

* @stable ICU 2.6 */ public static final int QUOTATION_MARK = 25; /** *

Binary property Radical (new).

*

For programmatic determination of Ideographic Description * Sequences.

* @stable ICU 2.6 */ public static final int RADICAL = 26; /** *

Binary property Soft_Dotted (new).

*

Characters with a "soft dot", like i or j.

*

An accent placed on these characters causes the dot to disappear.

* @stable ICU 2.6 */ public static final int SOFT_DOTTED = 27; /** *

Binary property Terminal_Punctuation.

*

Punctuation characters that generally mark the end of textual * units.

* @stable ICU 2.6 */ public static final int TERMINAL_PUNCTUATION = 28; /** *

Binary property Unified_Ideograph (new).

*

For programmatic determination of Ideographic Description * Sequences.

* @stable ICU 2.6 */ public static final int UNIFIED_IDEOGRAPH = 29; /** *

Binary property Uppercase.

*

Same as UCharacter.isUUppercase(), different from * UCharacter.isUpperCase().

*

Lu+Other_Uppercase

* @stable ICU 2.6 */ public static final int UPPERCASE = 30; /** *

Binary property White_Space.

*

Same as UCharacter.isUWhiteSpace(), different from * UCharacter.isSpace() and UCharacter.isWhitespace().

* Space characters+TAB+CR+LF-ZWSP-ZWNBSP

* @stable ICU 2.6 */ public static final int WHITE_SPACE = 31; /** *

Binary property XID_Continue.

*

ID_Continue modified to allow closure under normalization forms * NFKC and NFKD.

* @stable ICU 2.6 */ public static final int XID_CONTINUE = 32; /** *

Binary property XID_Start.

*

ID_Start modified to allow closure under normalization forms NFKC * and NFKD.

* @stable ICU 2.6 */ public static final int XID_START = 33; /** *

Binary property Case_Sensitive.

*

Either the source of a case * mapping or _in_ the target of a case mapping. Not the same as * the general category Cased_Letter.

* @stable ICU 2.6 */ public static final int CASE_SENSITIVE = 34; /** * Binary property STerm (new in Unicode 4.0.1). * Sentence Terminal. Used in UAX #29: Text Boundaries * (http://www.unicode.org/reports/tr29/) * @stable ICU 3.0 */ public static final int S_TERM = 35; /** * Binary property Variation_Selector (new in Unicode 4.0.1). * Indicates all those characters that qualify as Variation Selectors. * For details on the behavior of these characters, * see StandardizedVariants.html and 15.6 Variation Selectors. * @stable ICU 3.0 */ public static final int VARIATION_SELECTOR = 36; /** * Binary property NFD_Inert. * ICU-specific property for characters that are inert under NFD, * i.e., they do not interact with adjacent characters. * Used for example in normalizing transforms in incremental mode * to find the boundary of safely normalizable text despite possible * text additions. * * There is one such property per normalization form. * These properties are computed as follows - an inert character is: * a) unassigned, or ALL of the following: * b) of combining class 0. * c) not decomposed by this normalization form. * AND if NFC or NFKC, * d) can never compose with a previous character. * e) can never compose with a following character. * f) can never change if another character is added. * Example: a-breve might satisfy all but f, but if you * add an ogonek it changes to a-ogonek + breve * * See also com.ibm.text.UCD.NFSkippable in the ICU4J repository, * and icu/source/common/unormimp.h . * @stable ICU 3.0 */ public static final int NFD_INERT = 37; /** * Binary property NFKD_Inert. * ICU-specific property for characters that are inert under NFKD, * i.e., they do not interact with adjacent characters. * Used for example in normalizing transforms in incremental mode * to find the boundary of safely normalizable text despite possible * text additions. * @see #NFD_INERT * @stable ICU 3.0 */ public static final int NFKD_INERT = 38; /** * Binary property NFC_Inert. 
* ICU-specific property for characters that are inert under NFC, * i.e., they do not interact with adjacent characters. * Used for example in normalizing transforms in incremental mode * to find the boundary of safely normalizable text despite possible * text additions. * @see #NFD_INERT * @stable ICU 3.0 */ public static final int NFC_INERT = 39; /** * Binary property NFKC_Inert. * ICU-specific property for characters that are inert under NFKC, * i.e., they do not interact with adjacent characters. * Used for example in normalizing transforms in incremental mode * to find the boundary of safely normalizable text despite possible * text additions. * @see #NFD_INERT * @stable ICU 3.0 */ public static final int NFKC_INERT = 40; /** * Binary Property Segment_Starter. * ICU-specific property for characters that are starters in terms of * Unicode normalization and combining character sequences. * They have ccc=0 and do not occur in non-initial position of the * canonical decomposition of any character * (like " in NFD(a-umlaut) and a Jamo T in an NFD(Hangul LVT)). * ICU uses this property for segmenting a string for generating a set of * canonically equivalent strings, e.g. for canonical closure while * processing collation tailoring rules. * @stable ICU 3.0 */ public static final int SEGMENT_STARTER = 41; /** * Binary property Pattern_Syntax (new in Unicode 4.1). * See UAX #31 Identifier and Pattern Syntax * (http://www.unicode.org/reports/tr31/) * @stable ICU 3.4 */ public static final int PATTERN_SYNTAX = 42; /** * Binary property Pattern_White_Space (new in Unicode 4.1). * See UAX #31 Identifier and Pattern Syntax * (http://www.unicode.org/reports/tr31/) * @stable ICU 3.4 */ public static final int PATTERN_WHITE_SPACE = 43; /** * Binary property alnum (a C/POSIX character class). * Implemented according to the UTS #18 Annex C Standard Recommendation. * See the UCharacter class documentation. 
* @stable ICU 3.4 */ public static final int POSIX_ALNUM = 44; /** * Binary property blank (a C/POSIX character class). * Implemented according to the UTS #18 Annex C Standard Recommendation. * See the UCharacter class documentation. * @stable ICU 3.4 */ public static final int POSIX_BLANK = 45; /** * Binary property graph (a C/POSIX character class). * Implemented according to the UTS #18 Annex C Standard Recommendation. * See the UCharacter class documentation. * @stable ICU 3.4 */ public static final int POSIX_GRAPH = 46; /** * Binary property print (a C/POSIX character class). * Implemented according to the UTS #18 Annex C Standard Recommendation. * See the UCharacter class documentation. * @stable ICU 3.4 */ public static final int POSIX_PRINT = 47; /** * Binary property xdigit (a C/POSIX character class). * Implemented according to the UTS #18 Annex C Standard Recommendation. * See the UCharacter class documentation. * @stable ICU 3.4 */ public static final int POSIX_XDIGIT = 48; /** *

One more than the last constant for binary Unicode properties.
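
The START and LIMIT constants in this interface delimit half-open ranges, so a selector can be classified by range checks alone. The sketch below is a hypothetical helper (PropertyRangeSketch and kindOf are not part of ICU4J) with the values copied from this file. The string range is left out because its limit does not appear in this part of the file.

```java
// Hypothetical helper: classifies a property selector by the value ranges
// declared in UProperty. Each range is half-open: START <= p < LIMIT.
final class PropertyRangeSketch {
    static final int BINARY_START = 0;
    static final int BINARY_LIMIT = 49;
    static final int INT_START = 0x1000;
    static final int INT_LIMIT = 0x1015;
    static final int MASK_START = 0x2000;
    static final int MASK_LIMIT = 0x2001;
    static final int DOUBLE_START = 0x3000;
    static final int DOUBLE_LIMIT = 0x3001;

    enum Kind { BINARY, INT, MASK, DOUBLE, UNKNOWN }

    private PropertyRangeSketch() {}

    static Kind kindOf(int property) {
        if (property >= BINARY_START && property < BINARY_LIMIT) {
            return Kind.BINARY;
        }
        if (property >= INT_START && property < INT_LIMIT) {
            return Kind.INT;
        }
        if (property >= MASK_START && property < MASK_LIMIT) {
            return Kind.MASK;
        }
        if (property >= DOUBLE_START && property < DOUBLE_LIMIT) {
            return Kind.DOUBLE;
        }
        // Anything else, including string properties, is not covered here.
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Walk every binary selector the same way a caller could loop
        // from BINARY_START up to, but not including, BINARY_LIMIT.
        int count = 0;
        for (int p = BINARY_START; p < BINARY_LIMIT; p++) {
            if (kindOf(p) == Kind.BINARY) {
                count++;
            }
        }
        System.out.println(count + " binary selectors");
    }
}
```

Because each LIMIT is one more than the last valid constant, a loop of the form `for (p = START; p < LIMIT; p++)` visits every selector in a range without needing to know the last constant's name.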

* @stable ICU 2.6 */ public static final int BINARY_LIMIT = 49; /** * Enumerated property Bidi_Class. * Same as UCharacter.getDirection(int), returns UCharacterDirection values. * @stable ICU 2.4 */ public static final int BIDI_CLASS = 0x1000; /** * First constant for enumerated/integer Unicode properties. * @stable ICU 2.4 */ public static final int INT_START = BIDI_CLASS; /** * Enumerated property Block. * Same as UCharacter.UnicodeBlock.of(int), returns UCharacter.UnicodeBlock * values. * @stable ICU 2.4 */ public static final int BLOCK = 0x1001; /** * Enumerated property Canonical_Combining_Class. * Same as UCharacter.getCombiningClass(int), returns 8-bit numeric values. * @stable ICU 2.4 */ public static final int CANONICAL_COMBINING_CLASS = 0x1002; /** * Enumerated property Decomposition_Type. * Returns UCharacter.DecompositionType values. * @stable ICU 2.4 */ public static final int DECOMPOSITION_TYPE = 0x1003; /** * Enumerated property East_Asian_Width. * See http://www.unicode.org/reports/tr11/ * Returns UCharacter.EastAsianWidth values. * @stable ICU 2.4 */ public static final int EAST_ASIAN_WIDTH = 0x1004; /** * Enumerated property General_Category. * Same as UCharacter.getType(int), returns UCharacterCategory values. * @stable ICU 2.4 */ public static final int GENERAL_CATEGORY = 0x1005; /** * Enumerated property Joining_Group. * Returns UCharacter.JoiningGroup values. * @stable ICU 2.4 */ public static final int JOINING_GROUP = 0x1006; /** * Enumerated property Joining_Type. * Returns UCharacter.JoiningType values. * @stable ICU 2.4 */ public static final int JOINING_TYPE = 0x1007; /** * Enumerated property Line_Break. * Returns UCharacter.LineBreak values. * @stable ICU 2.4 */ public static final int LINE_BREAK = 0x1008; /** * Enumerated property Numeric_Type. * Returns UCharacter.NumericType values. * @stable ICU 2.4 */ public static final int NUMERIC_TYPE = 0x1009; /** * Enumerated property Script. 
* Same as UScript.getScript(int), returns UScript values. * @stable ICU 2.4 */ public static final int SCRIPT = 0x100A; /** * Enumerated property Hangul_Syllable_Type, new in Unicode 4. * Returns HangulSyllableType values. * @stable ICU 2.6 */ public static final int HANGUL_SYLLABLE_TYPE = 0x100B; /** * Enumerated property NFD_Quick_Check. * Returns numeric values compatible with Normalizer.QuickCheckResult. * @stable ICU 3.0 */ public static final int NFD_QUICK_CHECK = 0x100C; /** * Enumerated property NFKD_Quick_Check. * Returns numeric values compatible with Normalizer.QuickCheckResult. * @stable ICU 3.0 */ public static final int NFKD_QUICK_CHECK = 0x100D; /** * Enumerated property NFC_Quick_Check. * Returns numeric values compatible with Normalizer.QuickCheckResult. * @stable ICU 3.0 */ public static final int NFC_QUICK_CHECK = 0x100E; /** * Enumerated property NFKC_Quick_Check. * Returns numeric values compatible with Normalizer.QuickCheckResult. * @stable ICU 3.0 */ public static final int NFKC_QUICK_CHECK = 0x100F; /** * Enumerated property Lead_Canonical_Combining_Class. * ICU-specific property for the ccc of the first code point * of the decomposition, or lccc(c)=ccc(NFD(c)[0]). * Useful for checking for canonically ordered text; * see Normalizer.FCD and http://www.unicode.org/notes/tn5/#FCD . * Returns 8-bit numeric values like CANONICAL_COMBINING_CLASS. * @stable ICU 3.0 */ public static final int LEAD_CANONICAL_COMBINING_CLASS = 0x1010; /** * Enumerated property Trail_Canonical_Combining_Class. * ICU-specific property for the ccc of the last code point * of the decomposition, or tccc(c)=ccc(NFD(c)[last]). * Useful for checking for canonically ordered text; * see Normalizer.FCD and http://www.unicode.org/notes/tn5/#FCD . * Returns 8-bit numeric values like CANONICAL_COMBINING_CLASS. * @stable ICU 3.0 */ public static final int TRAIL_CANONICAL_COMBINING_CLASS = 0x1011; /** * Enumerated property Grapheme_Cluster_Break (new in Unicode 4.1). 
* Used in UAX #29: Text Boundaries * (http://www.unicode.org/reports/tr29/) * Returns UGraphemeClusterBreak values. * @stable ICU 3.4 */ public static final int GRAPHEME_CLUSTER_BREAK = 0x1012; /** * Enumerated property Sentence_Break (new in Unicode 4.1). * Used in UAX #29: Text Boundaries * (http://www.unicode.org/reports/tr29/) * Returns USentenceBreak values. * @stable ICU 3.4 */ public static final int SENTENCE_BREAK = 0x1013; /** * Enumerated property Word_Break (new in Unicode 4.1). * Used in UAX #29: Text Boundaries * (http://www.unicode.org/reports/tr29/) * Returns UWordBreakValues values. * @stable ICU 3.4 */ public static final int WORD_BREAK = 0x1014; /** * One more than the last constant for enumerated/integer Unicode * properties. * @stable ICU 2.4 */ public static final int INT_LIMIT = 0x1015; /** * Bitmask property General_Category_Mask. * This is the General_Category property returned as a bit mask. * When used in UCharacter.getIntPropertyValue(c), * returns bit masks for UCharacterCategory values where exactly one bit is set. * When used with UCharacter.getPropertyValueName() and UCharacter.getPropertyValueEnum(), * a multi-bit mask is used for sets of categories like "Letters". * @stable ICU 2.4 */ public static final int GENERAL_CATEGORY_MASK = 0x2000; /** * First constant for bit-mask Unicode properties. * @stable ICU 2.4 */ public static final int MASK_START = GENERAL_CATEGORY_MASK; /** * One more than the last constant for bit-mask Unicode properties. * @stable ICU 2.4 */ public static final int MASK_LIMIT = 0x2001; /** * Double property Numeric_Value. * Corresponds to UCharacter.getUnicodeNumericValue(int). * @stable ICU 2.4 */ public static final int NUMERIC_VALUE = 0x3000; /** * First constant for double Unicode properties. * @stable ICU 2.4 */ public static final int DOUBLE_START = NUMERIC_VALUE; /** * One more than the last constant for double Unicode properties. 
* @stable ICU 2.4 */ public static final int DOUBLE_LIMIT = 0x3001; /** * String property Age. * Corresponds to UCharacter.getAge(int). * @stable ICU 2.4 */ public static final int AGE = 0x4000; /** * First constant for string Unicode properties. * @stable ICU 2.4 */ public static final int STRING_START = AGE; /** * String property Bidi_Mirroring_Glyph. * Corresponds to UCharacter.getMirror(int). * @stable ICU 2.4 */ public static final int BIDI_MIRRORING_GLYPH = 0x4001; /** * String property Case_Folding. * Corresponds to UCharacter.foldCase(String, boolean). * @stable ICU 2.4 */ public static final int CASE_FOLDING = 0x4002; /** * String property ISO_Comment. * Corresponds to UCharacter.getISOComment(int). * @stable ICU 2.4 */ public static final int ISO_COMMENT = 0x4003; /** * String property Lowercase_Mapping. * Corresponds to UCharacter.toLowerCase(String). * @stable ICU 2.4 */ public static final int LOWERCASE_MAPPING = 0x4004; /** * String property Name. * Corresponds to UCharacter.getName(int). * @stable ICU 2.4 */ public static final int NAME = 0x4005; /** * String property Simple_Case_Folding. * Corresponds to UCharacter.foldCase(int, boolean). * @stable ICU 2.4 */ public static final int SIMPLE_CASE_FOLDING = 0x4006; /** * String property Simple_Lowercase_Mapping. * Corresponds to UCharacter.toLowerCase(int). * @stable ICU 2.4 */ public static final int SIMPLE_LOWERCASE_MAPPING = 0x4007; /** * String property Simple_Titlecase_Mapping. * Corresponds to UCharacter.toTitleCase(int). * @stable ICU 2.4 */ public static final int SIMPLE_TITLECASE_MAPPING = 0x4008; /** * String property Simple_Uppercase_Mapping. * Corresponds to UCharacter.toUpperCase(int). * @stable ICU 2.4 */ public static final int SIMPLE_UPPERCASE_MAPPING = 0x4009; /** * String property Titlecase_Mapping. * Corresponds to UCharacter.toTitleCase(String). * @stable ICU 2.4 */ public static final int TITLECASE_MAPPING = 0x400A; /** * String property Unicode_1_Name. 
* Corresponds to UCharacter.getName1_0(int). * @stable ICU 2.4 */ public static final int UNICODE_1_NAME = 0x400B; /** * String property Uppercase_Mapping. * Corresponds to UCharacter.toUpperCase(String). * @stable ICU 2.4 */ public static final int UPPERCASE_MAPPING = 0x400C; /** * One more than the last constant for string Unicode properties. * @stable ICU 2.4 */ public static final int STRING_LIMIT = 0x400D; /** * Selector constants for UCharacter.getPropertyName() and * UCharacter.getPropertyValueName(). These selectors are used to * choose which name is returned for a given property or value. * All properties and values have a long name. Most have a short * name, but some do not. Unicode allows for additional names, * beyond the long and short name, which would be indicated by * LONG + i, where i=1, 2,... * * @see UCharacter#getPropertyName * @see UCharacter#getPropertyValueName * @stable ICU 2.4 */ public interface NameChoice { /** * Selector for the abbreviated name of a property or value. * Most properties and values have a short name; those that do * not return null. * @stable ICU 2.4 */ static final int SHORT = 0; /** * Selector for the long name of a property or value. All * properties and values have a long name. * @stable ICU 2.4 */ static final int LONG = 1; /** * The number of predefined property name choices. Individual * properties or values may have more than COUNT aliases. * @stable ICU 2.4 */ static final int COUNT = 2; } } icu4j-4.2/src/com/ibm/icu/lang/UCharacterTypeIterator.java0000644000175000017500000000424011361046134023407 0ustar twernertwerner/* ****************************************************************************** * Copyright (C) 1996-2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ****************************************************************************** */ package com.ibm.icu.lang; import com.ibm.icu.impl.TrieIterator; import com.ibm.icu.impl.UCharacterProperty; /** * Class enabling iteration of the codepoints according to their types. * Result of each iteration contains the interval of codepoints that have * the same type. * Example of use:
*
 * RangeValueIterator iterator = UCharacter.getTypeIterator();
 * RangeValueIterator.Element element = new RangeValueIterator.Element();
 * while (iterator.next(element)) {
 *     System.out.println("Codepoint \\u" + 
 *                        Integer.toHexString(element.start) + 
 *                        " to codepoint \\u" +
 *                        Integer.toHexString(element.limit - 1) + 
 *                        " has the character type " + 
 *                        element.value);
 * }
 * 
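The range-by-type walk shown above can also be sketched without ICU4J on the class path. This is only an illustration under stated assumptions: java.lang.Character.getType stands in for the trie lookup that UCharacterTypeIterator performs, the names TypeRangeDemo and runLimit are hypothetical, and the JDK may implement a different Unicode version than ICU4J, so the reported ranges can differ.

```java
// Sketch only: java.lang.Character.getType stands in for ICU4J's trie lookup.
// TypeRangeDemo and runLimit are hypothetical names, not part of ICU4J.
final class TypeRangeDemo {

    /**
     * Returns the exclusive end of the run of code points, starting at
     * start and not going past max, that share start's general category.
     */
    static int runLimit(int start, int max) {
        int type = Character.getType(start);
        int limit = start + 1;
        while (limit <= max && Character.getType(limit) == type) {
            limit++;
        }
        return limit;
    }

    public static void main(String[] args) {
        int max = 0x7F; // restrict the walk to ASCII to keep the output short
        for (int start = 0; start <= max; ) {
            int limit = runLimit(start, max);
            System.out.println("U+" + Integer.toHexString(start)
                    + " to U+" + Integer.toHexString(limit - 1)
                    + " has the character type " + Character.getType(start));
            start = limit;
        }
    }
}
```

As in the ICU4J example, each reported interval is half-open: limit is one past the last code point of the run, which is why the loop prints limit - 1.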
* @author synwee * @see com.ibm.icu.impl.TrieIterator * @since release 2.1, Jan 24 2002 */ class UCharacterTypeIterator extends TrieIterator { // protected constructor --------------------------------------------- /** * TrieEnumeration constructor * @param property the unicode character properties to be used */ protected UCharacterTypeIterator(UCharacterProperty property) { super(property.m_trie_); } // protected methods ---------------------------------------------- /** * Called by nextElement() to extract a 32 bit value from a trie value * used for comparison. * This method is to be overridden if special manipulation is to be done * to retrieve a relevant comparison. * The default function is to return the value as it is. * @param value a value from the trie * @return extracted value */ protected int extract(int value) { return value & UCharacterProperty.TYPE_MASK; } }icu4j-4.2/src/com/ibm/icu/lang/UCharacter.java0000644000175000017500000073055111361050726021050 0ustar twernertwerner//##header /** ******************************************************************************* * Copyright (C) 1996-2009, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.lang; import java.io.IOException; import java.lang.ref.SoftReference; import java.util.HashMap; import java.util.Locale; import java.util.Map; import java.util.MissingResourceException; import com.ibm.icu.impl.UBiDiProps; import com.ibm.icu.impl.UCaseProps; import com.ibm.icu.impl.NormalizerImpl; import com.ibm.icu.impl.UCharacterUtility; import com.ibm.icu.impl.UCharacterName; import com.ibm.icu.impl.UCharacterNameChoice; import com.ibm.icu.impl.UPropertyAliases; import com.ibm.icu.lang.UCharacterEnums.*; import com.ibm.icu.text.BreakIterator; import com.ibm.icu.text.UTF16; import com.ibm.icu.impl.UCharacterProperty; import com.ibm.icu.util.RangeValueIterator; import com.ibm.icu.util.ULocale; import com.ibm.icu.util.ValueIterator; import com.ibm.icu.util.VersionInfo; /** *

* The UCharacter class provides extensions to the * java.lang.Character class. These extensions provide support for * more Unicode properties and, together with the UTF16 * class, support for supplementary characters (those with code * points above U+FFFF). * Each ICU release supports the latest version of Unicode available at that time. *

*

* Code points are represented in this API using ints. While it would be * more convenient in Java to have a separate primitive datatype for them, * ints suffice in the meantime. *
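A minimal sketch of what representing code points as ints means in practice. It uses only java.lang so that it runs without ICU4J; CodePointDemo and countCodePoints are hypothetical names chosen for this illustration.

```java
// Sketch only: walks a String by int code points rather than by char.
// CodePointDemo and countCodePoints are hypothetical names.
final class CodePointDemo {

    /** Counts code points (not chars) by advancing one code point at a time. */
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // joins a surrogate pair into one int
            i += Character.charCount(cp);   // 1 char for BMP, 2 for supplementary
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'A' followed by the supplementary code point U+1D504
        String s = "A" + new String(Character.toChars(0x1D504));
        System.out.println(s.length());                             // 3 chars
        System.out.println(countCodePoints(s));                     // 2 code points
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1d504
    }
}
```

A char cannot hold U+1D504, so any per-character API that is meant to cover supplementary characters has to take and return int.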

*

* To use this class, add the jar file icu4j.jar to the * class path, since it contains the data files that supply the information used * by this class.
* E.g., on Windows:
* set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/icu4j.jar
* Alternatively, copy the files uprops.dat and * unames.icu from the icu4j source subdirectory * $ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory * $ICU4J_CLASS/com.ibm.icu.impl.data. *

*

* Aside from the additions for UTF-16 support, and the updated Unicode * properties, the main differences between UCharacter and Character are: *

    *
  • UCharacter is not designed to be a char wrapper and does not have * APIs that manage a single wrapped char.
    * These include: *
      *
    • char charValue(), *
    • int compareTo(java.lang.Character, java.lang.Character), etc. *
    *
  • UCharacter does not include Character APIs that are deprecated, nor * does it include the Java-specific character information, such as * boolean isJavaIdentifierPart(char ch). *
  • Character maps characters 'A' - 'Z' and 'a' - 'z' to the numeric * values '10' - '35'. UCharacter also does this in digit and * getNumericValue, to adhere to the Java semantics of these * methods. The new methods unicodeDigit and * getUnicodeNumericValue do not treat the above code points * as having numeric values. This is a semantic change from ICU4J 1.3.1. *
*
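The Java semantics mentioned in the last item can be observed with java.lang.Character alone. The sketch below shows only the JDK side, which the text says UCharacter.digit and UCharacter.getNumericValue follow; DigitSemanticsDemo and letterValue are hypothetical names.

```java
// Sketch only: shows the java.lang.Character letter-to-number mapping.
// DigitSemanticsDemo and letterValue are hypothetical names.
final class DigitSemanticsDemo {

    /** Returns the Java numeric value of a character, or -1 if it has none. */
    static int letterValue(char ch) {
        return Character.getNumericValue(ch);
    }

    public static void main(String[] args) {
        System.out.println(letterValue('A'));          // 10
        System.out.println(letterValue('z'));          // 35
        System.out.println(Character.digit('F', 16));  // 15
        System.out.println(Character.digit('G', 16));  // -1, not a hex digit
    }
}
```

Code that needs the Unicode Numeric_Value, under which Latin letters have no numeric value, would use getUnicodeNumericValue instead, as described above.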

* Further detailed differences can be determined by running the program * com.ibm.icu.dev.test.lang.UCharacterCompare *

*

* In addition to Java compatibility functions, which calculate derived properties, * this API provides low-level access to the Unicode Character Database. *

*

* Unicode assigns each code point (not just assigned character) values for * many properties. * Most of them are simple boolean flags, or constants from a small enumerated list. * For some properties, values are strings or other relatively more complex types. *

*

* For more information see * "About the Unicode Character Database" (http://www.unicode.org/ucd/) * and the ICU User Guide chapter on Properties (http://www.icu-project.org/userguide/properties.html). *

*

* There are also functions that provide easy migration from C/POSIX functions * like isblank(). Their use is generally discouraged because the C/POSIX * standards do not define their semantics beyond the ASCII range, which means * that different implementations exhibit very different behavior. * Instead, Unicode properties should be used directly. *

*

* There are also only a few, broad C/POSIX character classes, and they tend * to be used for conflicting purposes. For example, the "isalpha()" class * is sometimes used to determine word boundaries, while a more sophisticated * approach would at least distinguish initial letters from continuation * characters (the latter including combining marks). * (In ICU, BreakIterator is the most sophisticated API for word boundaries.) * Another example: There is no "istitle()" class for titlecase characters. *

*

* ICU 3.4 and later provide API access for all twelve C/POSIX character classes. * ICU implements them according to the Standard Recommendations in * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions * (http://www.unicode.org/reports/tr18/#Compatibility_Properties). *

*

* API access for C/POSIX character classes is as follows:
* - alpha: isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
* - lower: isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
* - upper: isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
* - punct: ((1< *
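The punct entry starts with a shifted bit, which suggests a general-category bit-mask test. The sketch below shows that technique with java.lang.Character constants only. It is an assumption about the shape of the test, not the expression from the ICU4J source, and PosixPunctDemo, PUNCT_MASK and isPunct are hypothetical names.

```java
// Sketch only: a general-category bit-mask test for punctuation.
// Assumption: "punct" means the seven P* general categories; the names
// PosixPunctDemo, PUNCT_MASK and isPunct are hypothetical.
final class PosixPunctDemo {

    /** One bit per punctuation general category. */
    private static final int PUNCT_MASK =
            (1 << Character.DASH_PUNCTUATION)
          | (1 << Character.START_PUNCTUATION)
          | (1 << Character.END_PUNCTUATION)
          | (1 << Character.CONNECTOR_PUNCTUATION)
          | (1 << Character.OTHER_PUNCTUATION)
          | (1 << Character.INITIAL_QUOTE_PUNCTUATION)
          | (1 << Character.FINAL_QUOTE_PUNCTUATION);

    /** Tests whether the general category of cp is one of the punctuation categories. */
    static boolean isPunct(int cp) {
        return ((1 << Character.getType(cp)) & PUNCT_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isPunct('!'));  // true, Other_Punctuation
        System.out.println(isPunct('$'));  // false, Currency_Symbol
        System.out.println(isPunct('a'));  // false, Lowercase_Letter
    }
}
```

Shifting 1 by the category value turns a category into a single bit, so one AND against a precomputed mask replaces a chain of equality comparisons.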

* The C/POSIX character classes are also available in UnicodeSet patterns, * using patterns like [:graph:] or \p{graph}. *

*

* Note: There are several ICU (and Java) whitespace functions.
* Comparison:
* - isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
*   most of general categories "Z" (separators) + most whitespace ISO controls
*   (including no-break spaces, but excluding IS1..IS4 and ZWSP)
* - isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
* - isSpaceChar: just Z (including no-break spaces)
*
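The difference between the last two functions in the comparison can be checked with java.lang.Character, whose isWhitespace and isSpaceChar carry the Java semantics described above. This sketch does not cover isUWhiteSpace, which needs ICU4J; WhitespaceDemo and describe are hypothetical names.

```java
// Sketch only: contrasts Java isWhitespace with isSpaceChar.
// WhitespaceDemo and describe are hypothetical names.
final class WhitespaceDemo {

    /** Formats both whitespace tests for one code point. */
    static String describe(int cp) {
        return "U+" + Integer.toHexString(cp)
                + " isWhitespace=" + Character.isWhitespace(cp)
                + " isSpaceChar=" + Character.isSpaceChar(cp);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0020)); // space: true, true
        System.out.println(describe(0x00A0)); // no-break space: false, true
        System.out.println(describe(0x0009)); // tab (ISO control): true, false
    }
}
```

The no-break space and the tab are the two cases that separate the functions: the first is in category Z but excluded from isWhitespace, the second is an ISO control and therefore outside isSpaceChar.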

*

* This class is not subclassable. *

* @author Syn Wee Quek * @stable ICU 2.1 * @see com.ibm.icu.lang.UCharacterEnums */ public final class UCharacter implements ECharacterCategory, ECharacterDirection { // public inner classes ---------------------------------------------- /** * A family of character subsets representing the character blocks in the * Unicode specification, generated from Unicode Data file Blocks.txt. * Character blocks generally define characters used for a specific script * or purpose. A character is contained by at most one Unicode block. * @stable ICU 2.4 */ public static final class UnicodeBlock extends Character.Subset { // block id corresponding to icu4c ----------------------------------- /** * @stable ICU 2.4 */ public static final int INVALID_CODE_ID = -1; /** * @stable ICU 2.4 */ public static final int BASIC_LATIN_ID = 1; /** * @stable ICU 2.4 */ public static final int LATIN_1_SUPPLEMENT_ID = 2; /** * @stable ICU 2.4 */ public static final int LATIN_EXTENDED_A_ID = 3; /** * @stable ICU 2.4 */ public static final int LATIN_EXTENDED_B_ID = 4; /** * @stable ICU 2.4 */ public static final int IPA_EXTENSIONS_ID = 5; /** * @stable ICU 2.4 */ public static final int SPACING_MODIFIER_LETTERS_ID = 6; /** * @stable ICU 2.4 */ public static final int COMBINING_DIACRITICAL_MARKS_ID = 7; /** * Unicode 3.2 renames this block to "Greek and Coptic". 
* @stable ICU 2.4 */ public static final int GREEK_ID = 8; /** * @stable ICU 2.4 */ public static final int CYRILLIC_ID = 9; /** * @stable ICU 2.4 */ public static final int ARMENIAN_ID = 10; /** * @stable ICU 2.4 */ public static final int HEBREW_ID = 11; /** * @stable ICU 2.4 */ public static final int ARABIC_ID = 12; /** * @stable ICU 2.4 */ public static final int SYRIAC_ID = 13; /** * @stable ICU 2.4 */ public static final int THAANA_ID = 14; /** * @stable ICU 2.4 */ public static final int DEVANAGARI_ID = 15; /** * @stable ICU 2.4 */ public static final int BENGALI_ID = 16; /** * @stable ICU 2.4 */ public static final int GURMUKHI_ID = 17; /** * @stable ICU 2.4 */ public static final int GUJARATI_ID = 18; /** * @stable ICU 2.4 */ public static final int ORIYA_ID = 19; /** * @stable ICU 2.4 */ public static final int TAMIL_ID = 20; /** * @stable ICU 2.4 */ public static final int TELUGU_ID = 21; /** * @stable ICU 2.4 */ public static final int KANNADA_ID = 22; /** * @stable ICU 2.4 */ public static final int MALAYALAM_ID = 23; /** * @stable ICU 2.4 */ public static final int SINHALA_ID = 24; /** * @stable ICU 2.4 */ public static final int THAI_ID = 25; /** * @stable ICU 2.4 */ public static final int LAO_ID = 26; /** * @stable ICU 2.4 */ public static final int TIBETAN_ID = 27; /** * @stable ICU 2.4 */ public static final int MYANMAR_ID = 28; /** * @stable ICU 2.4 */ public static final int GEORGIAN_ID = 29; /** * @stable ICU 2.4 */ public static final int HANGUL_JAMO_ID = 30; /** * @stable ICU 2.4 */ public static final int ETHIOPIC_ID = 31; /** * @stable ICU 2.4 */ public static final int CHEROKEE_ID = 32; /** * @stable ICU 2.4 */ public static final int UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_ID = 33; /** * @stable ICU 2.4 */ public static final int OGHAM_ID = 34; /** * @stable ICU 2.4 */ public static final int RUNIC_ID = 35; /** * @stable ICU 2.4 */ public static final int KHMER_ID = 36; /** * @stable ICU 2.4 */ public static final int MONGOLIAN_ID = 37; 
/** * @stable ICU 2.4 */ public static final int LATIN_EXTENDED_ADDITIONAL_ID = 38; /** * @stable ICU 2.4 */ public static final int GREEK_EXTENDED_ID = 39; /** * @stable ICU 2.4 */ public static final int GENERAL_PUNCTUATION_ID = 40; /** * @stable ICU 2.4 */ public static final int SUPERSCRIPTS_AND_SUBSCRIPTS_ID = 41; /** * @stable ICU 2.4 */ public static final int CURRENCY_SYMBOLS_ID = 42; /** * Unicode 3.2 renames this block to "Combining Diacritical Marks for * Symbols". * @stable ICU 2.4 */ public static final int COMBINING_MARKS_FOR_SYMBOLS_ID = 43; /** * @stable ICU 2.4 */ public static final int LETTERLIKE_SYMBOLS_ID = 44; /** * @stable ICU 2.4 */ public static final int NUMBER_FORMS_ID = 45; /** * @stable ICU 2.4 */ public static final int ARROWS_ID = 46; /** * @stable ICU 2.4 */ public static final int MATHEMATICAL_OPERATORS_ID = 47; /** * @stable ICU 2.4 */ public static final int MISCELLANEOUS_TECHNICAL_ID = 48; /** * @stable ICU 2.4 */ public static final int CONTROL_PICTURES_ID = 49; /** * @stable ICU 2.4 */ public static final int OPTICAL_CHARACTER_RECOGNITION_ID = 50; /** * @stable ICU 2.4 */ public static final int ENCLOSED_ALPHANUMERICS_ID = 51; /** * @stable ICU 2.4 */ public static final int BOX_DRAWING_ID = 52; /** * @stable ICU 2.4 */ public static final int BLOCK_ELEMENTS_ID = 53; /** * @stable ICU 2.4 */ public static final int GEOMETRIC_SHAPES_ID = 54; /** * @stable ICU 2.4 */ public static final int MISCELLANEOUS_SYMBOLS_ID = 55; /** * @stable ICU 2.4 */ public static final int DINGBATS_ID = 56; /** * @stable ICU 2.4 */ public static final int BRAILLE_PATTERNS_ID = 57; /** * @stable ICU 2.4 */ public static final int CJK_RADICALS_SUPPLEMENT_ID = 58; /** * @stable ICU 2.4 */ public static final int KANGXI_RADICALS_ID = 59; /** * @stable ICU 2.4 */ public static final int IDEOGRAPHIC_DESCRIPTION_CHARACTERS_ID = 60; /** * @stable ICU 2.4 */ public static final int CJK_SYMBOLS_AND_PUNCTUATION_ID = 61; /** * @stable ICU 2.4 */ public static 
final int HIRAGANA_ID = 62; /** * @stable ICU 2.4 */ public static final int KATAKANA_ID = 63; /** * @stable ICU 2.4 */ public static final int BOPOMOFO_ID = 64; /** * @stable ICU 2.4 */ public static final int HANGUL_COMPATIBILITY_JAMO_ID = 65; /** * @stable ICU 2.4 */ public static final int KANBUN_ID = 66; /** * @stable ICU 2.4 */ public static final int BOPOMOFO_EXTENDED_ID = 67; /** * @stable ICU 2.4 */ public static final int ENCLOSED_CJK_LETTERS_AND_MONTHS_ID = 68; /** * @stable ICU 2.4 */ public static final int CJK_COMPATIBILITY_ID = 69; /** * @stable ICU 2.4 */ public static final int CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A_ID = 70; /** * @stable ICU 2.4 */ public static final int CJK_UNIFIED_IDEOGRAPHS_ID = 71; /** * @stable ICU 2.4 */ public static final int YI_SYLLABLES_ID = 72; /** * @stable ICU 2.4 */ public static final int YI_RADICALS_ID = 73; /** * @stable ICU 2.4 */ public static final int HANGUL_SYLLABLES_ID = 74; /** * @stable ICU 2.4 */ public static final int HIGH_SURROGATES_ID = 75; /** * @stable ICU 2.4 */ public static final int HIGH_PRIVATE_USE_SURROGATES_ID = 76; /** * @stable ICU 2.4 */ public static final int LOW_SURROGATES_ID = 77; /** * Same as public static final int PRIVATE_USE. * Until Unicode 3.1.1; the corresponding block name was "Private Use"; * and multiple code point ranges had this block. * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" * and adds separate blocks for the supplementary PUAs. * @stable ICU 2.4 */ public static final int PRIVATE_USE_AREA_ID = 78; /** * Same as public static final int PRIVATE_USE_AREA. * Until Unicode 3.1.1; the corresponding block name was "Private Use"; * and multiple code point ranges had this block. * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" * and adds separate blocks for the supplementary PUAs. 
* @stable ICU 2.4 */ public static final int PRIVATE_USE_ID = PRIVATE_USE_AREA_ID; /** * @stable ICU 2.4 */ public static final int CJK_COMPATIBILITY_IDEOGRAPHS_ID = 79; /** * @stable ICU 2.4 */ public static final int ALPHABETIC_PRESENTATION_FORMS_ID = 80; /** * @stable ICU 2.4 */ public static final int ARABIC_PRESENTATION_FORMS_A_ID = 81; /** * @stable ICU 2.4 */ public static final int COMBINING_HALF_MARKS_ID = 82; /** * @stable ICU 2.4 */ public static final int CJK_COMPATIBILITY_FORMS_ID = 83; /** * @stable ICU 2.4 */ public static final int SMALL_FORM_VARIANTS_ID = 84; /** * @stable ICU 2.4 */ public static final int ARABIC_PRESENTATION_FORMS_B_ID = 85; /** * @stable ICU 2.4 */ public static final int SPECIALS_ID = 86; /** * @stable ICU 2.4 */ public static final int HALFWIDTH_AND_FULLWIDTH_FORMS_ID = 87; /** * @stable ICU 2.4 */ public static final int OLD_ITALIC_ID = 88; /** * @stable ICU 2.4 */ public static final int GOTHIC_ID = 89; /** * @stable ICU 2.4 */ public static final int DESERET_ID = 90; /** * @stable ICU 2.4 */ public static final int BYZANTINE_MUSICAL_SYMBOLS_ID = 91; /** * @stable ICU 2.4 */ public static final int MUSICAL_SYMBOLS_ID = 92; /** * @stable ICU 2.4 */ public static final int MATHEMATICAL_ALPHANUMERIC_SYMBOLS_ID = 93; /** * @stable ICU 2.4 */ public static final int CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B_ID = 94; /** * @stable ICU 2.4 */ public static final int CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT_ID = 95; /** * @stable ICU 2.4 */ public static final int TAGS_ID = 96; // New blocks in Unicode 3.2 /** * Unicode 4.0.1 renames the "Cyrillic Supplementary" block to "Cyrillic Supplement". * @stable ICU 2.4 */ public static final int CYRILLIC_SUPPLEMENTARY_ID = 97; /** * Unicode 4.0.1 renames the "Cyrillic Supplementary" block to "Cyrillic Supplement". 
* @stable ICU 3.0 */ public static final int CYRILLIC_SUPPLEMENT_ID = 97; /** * @stable ICU 2.4 */ public static final int TAGALOG_ID = 98; /** * @stable ICU 2.4 */ public static final int HANUNOO_ID = 99; /** * @stable ICU 2.4 */ public static final int BUHID_ID = 100; /** * @stable ICU 2.4 */ public static final int TAGBANWA_ID = 101; /** * @stable ICU 2.4 */ public static final int MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A_ID = 102; /** * @stable ICU 2.4 */ public static final int SUPPLEMENTAL_ARROWS_A_ID = 103; /** * @stable ICU 2.4 */ public static final int SUPPLEMENTAL_ARROWS_B_ID = 104; /** * @stable ICU 2.4 */ public static final int MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B_ID = 105; /** * @stable ICU 2.4 */ public static final int SUPPLEMENTAL_MATHEMATICAL_OPERATORS_ID = 106; /** * @stable ICU 2.4 */ public static final int KATAKANA_PHONETIC_EXTENSIONS_ID = 107; /** * @stable ICU 2.4 */ public static final int VARIATION_SELECTORS_ID = 108; /** * @stable ICU 2.4 */ public static final int SUPPLEMENTARY_PRIVATE_USE_AREA_A_ID = 109; /** * @stable ICU 2.4 */ public static final int SUPPLEMENTARY_PRIVATE_USE_AREA_B_ID = 110; /** * @stable ICU 2.6 */ public static final int LIMBU_ID = 111; /*[1900]*/ /** * @stable ICU 2.6 */ public static final int TAI_LE_ID = 112; /*[1950]*/ /** * @stable ICU 2.6 */ public static final int KHMER_SYMBOLS_ID = 113; /*[19E0]*/ /** * @stable ICU 2.6 */ public static final int PHONETIC_EXTENSIONS_ID = 114; /*[1D00]*/ /** * @stable ICU 2.6 */ public static final int MISCELLANEOUS_SYMBOLS_AND_ARROWS_ID = 115; /*[2B00]*/ /** * @stable ICU 2.6 */ public static final int YIJING_HEXAGRAM_SYMBOLS_ID = 116; /*[4DC0]*/ /** * @stable ICU 2.6 */ public static final int LINEAR_B_SYLLABARY_ID = 117; /*[10000]*/ /** * @stable ICU 2.6 */ public static final int LINEAR_B_IDEOGRAMS_ID = 118; /*[10080]*/ /** * @stable ICU 2.6 */ public static final int AEGEAN_NUMBERS_ID = 119; /*[10100]*/ /** * @stable ICU 2.6 */ public static final int UGARITIC_ID = 120; 
/*[10380]*/ /** * @stable ICU 2.6 */ public static final int SHAVIAN_ID = 121; /*[10450]*/ /** * @stable ICU 2.6 */ public static final int OSMANYA_ID = 122; /*[10480]*/ /** * @stable ICU 2.6 */ public static final int CYPRIOT_SYLLABARY_ID = 123; /*[10800]*/ /** * @stable ICU 2.6 */ public static final int TAI_XUAN_JING_SYMBOLS_ID = 124; /*[1D300]*/ /** * @stable ICU 2.6 */ public static final int VARIATION_SELECTORS_SUPPLEMENT_ID = 125; /*[E0100]*/ /* New blocks in Unicode 4.1 */ /** * @stable ICU 3.4 */ public static final int ANCIENT_GREEK_MUSICAL_NOTATION_ID = 126; /*[1D200]*/ /** * @stable ICU 3.4 */ public static final int ANCIENT_GREEK_NUMBERS_ID = 127; /*[10140]*/ /** * @stable ICU 3.4 */ public static final int ARABIC_SUPPLEMENT_ID = 128; /*[0750]*/ /** * @stable ICU 3.4 */ public static final int BUGINESE_ID = 129; /*[1A00]*/ /** * @stable ICU 3.4 */ public static final int CJK_STROKES_ID = 130; /*[31C0]*/ /** * @stable ICU 3.4 */ public static final int COMBINING_DIACRITICAL_MARKS_SUPPLEMENT_ID = 131; /*[1DC0]*/ /** * @stable ICU 3.4 */ public static final int COPTIC_ID = 132; /*[2C80]*/ /** * @stable ICU 3.4 */ public static final int ETHIOPIC_EXTENDED_ID = 133; /*[2D80]*/ /** * @stable ICU 3.4 */ public static final int ETHIOPIC_SUPPLEMENT_ID = 134; /*[1380]*/ /** * @stable ICU 3.4 */ public static final int GEORGIAN_SUPPLEMENT_ID = 135; /*[2D00]*/ /** * @stable ICU 3.4 */ public static final int GLAGOLITIC_ID = 136; /*[2C00]*/ /** * @stable ICU 3.4 */ public static final int KHAROSHTHI_ID = 137; /*[10A00]*/ /** * @stable ICU 3.4 */ public static final int MODIFIER_TONE_LETTERS_ID = 138; /*[A700]*/ /** * @stable ICU 3.4 */ public static final int NEW_TAI_LUE_ID = 139; /*[1980]*/ /** * @stable ICU 3.4 */ public static final int OLD_PERSIAN_ID = 140; /*[103A0]*/ /** * @stable ICU 3.4 */ public static final int PHONETIC_EXTENSIONS_SUPPLEMENT_ID = 141; /*[1D80]*/ /** * @stable ICU 3.4 */ public static final int SUPPLEMENTAL_PUNCTUATION_ID = 142; /*[2E00]*/ 
/** * @stable ICU 3.4 */ public static final int SYLOTI_NAGRI_ID = 143; /*[A800]*/ /** * @stable ICU 3.4 */ public static final int TIFINAGH_ID = 144; /*[2D30]*/ /** * @stable ICU 3.4 */ public static final int VERTICAL_FORMS_ID = 145; /*[FE10]*/ /* New blocks in Unicode 5.0 */ /** * @stable ICU 3.6 */ public static final int NKO_ID = 146; /*[07C0]*/ /** * @stable ICU 3.6 */ public static final int BALINESE_ID = 147; /*[1B00]*/ /** * @stable ICU 3.6 */ public static final int LATIN_EXTENDED_C_ID = 148; /*[2C60]*/ /** * @stable ICU 3.6 */ public static final int LATIN_EXTENDED_D_ID = 149; /*[A720]*/ /** * @stable ICU 3.6 */ public static final int PHAGS_PA_ID = 150; /*[A840]*/ /** * @stable ICU 3.6 */ public static final int PHOENICIAN_ID = 151; /*[10900]*/ /** * @stable ICU 3.6 */ public static final int CUNEIFORM_ID = 152; /*[12000]*/ /** * @stable ICU 3.6 */ public static final int CUNEIFORM_NUMBERS_AND_PUNCTUATION_ID = 153; /*[12400]*/ /** * @stable ICU 3.6 */ public static final int COUNTING_ROD_NUMERALS_ID = 154; /*[1D360]*/ /** * @stable ICU 4.0 */ public static final int SUNDANESE_ID = 155; /* [1B80] */ /** * @stable ICU 4.0 */ public static final int LEPCHA_ID = 156; /* [1C00] */ /** * @stable ICU 4.0 */ public static final int OL_CHIKI_ID = 157; /* [1C50] */ /** * @stable ICU 4.0 */ public static final int CYRILLIC_EXTENDED_A_ID = 158; /* [2DE0] */ /** * @stable ICU 4.0 */ public static final int VAI_ID = 159; /* [A500] */ /** * @stable ICU 4.0 */ public static final int CYRILLIC_EXTENDED_B_ID = 160; /* [A640] */ /** * @stable ICU 4.0 */ public static final int SAURASHTRA_ID = 161; /* [A880] */ /** * @stable ICU 4.0 */ public static final int KAYAH_LI_ID = 162; /* [A900] */ /** * @stable ICU 4.0 */ public static final int REJANG_ID = 163; /* [A930] */ /** * @stable ICU 4.0 */ public static final int CHAM_ID = 164; /* [AA00] */ /** * @stable ICU 4.0 */ public static final int ANCIENT_SYMBOLS_ID = 165; /* [10190] */ /** * @stable ICU 4.0 */ public static 
final int PHAISTOS_DISC_ID = 166; /* [101D0] */ /** * @stable ICU 4.0 */ public static final int LYCIAN_ID = 167; /* [10280] */ /** * @stable ICU 4.0 */ public static final int CARIAN_ID = 168; /* [102A0] */ /** * @stable ICU 4.0 */ public static final int LYDIAN_ID = 169; /* [10920] */ /** * @stable ICU 4.0 */ public static final int MAHJONG_TILES_ID = 170; /* [1F000] */ /** * @stable ICU 4.0 */ public static final int DOMINO_TILES_ID = 171; /* [1F030] */ /** * @stable ICU 2.4 */ public static final int COUNT = 172; // blocks objects --------------------------------------------------- /** * @stable ICU 2.6 */ public static final UnicodeBlock NO_BLOCK = new UnicodeBlock("NO_BLOCK", 0); /** * @stable ICU 2.4 */ public static final UnicodeBlock BASIC_LATIN = new UnicodeBlock("BASIC_LATIN", BASIC_LATIN_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock LATIN_1_SUPPLEMENT = new UnicodeBlock("LATIN_1_SUPPLEMENT", LATIN_1_SUPPLEMENT_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock LATIN_EXTENDED_A = new UnicodeBlock("LATIN_EXTENDED_A", LATIN_EXTENDED_A_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock LATIN_EXTENDED_B = new UnicodeBlock("LATIN_EXTENDED_B", LATIN_EXTENDED_B_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock IPA_EXTENSIONS = new UnicodeBlock("IPA_EXTENSIONS", IPA_EXTENSIONS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SPACING_MODIFIER_LETTERS = new UnicodeBlock("SPACING_MODIFIER_LETTERS", SPACING_MODIFIER_LETTERS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS = new UnicodeBlock("COMBINING_DIACRITICAL_MARKS", COMBINING_DIACRITICAL_MARKS_ID); /** * Unicode 3.2 renames this block to "Greek and Coptic". 
* @stable ICU 2.4 */ public static final UnicodeBlock GREEK = new UnicodeBlock("GREEK", GREEK_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CYRILLIC = new UnicodeBlock("CYRILLIC", CYRILLIC_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ARMENIAN = new UnicodeBlock("ARMENIAN", ARMENIAN_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HEBREW = new UnicodeBlock("HEBREW", HEBREW_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ARABIC = new UnicodeBlock("ARABIC", ARABIC_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SYRIAC = new UnicodeBlock("SYRIAC", SYRIAC_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock THAANA = new UnicodeBlock("THAANA", THAANA_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock DEVANAGARI = new UnicodeBlock("DEVANAGARI", DEVANAGARI_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock BENGALI = new UnicodeBlock("BENGALI", BENGALI_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock GURMUKHI = new UnicodeBlock("GURMUKHI", GURMUKHI_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock GUJARATI = new UnicodeBlock("GUJARATI", GUJARATI_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ORIYA = new UnicodeBlock("ORIYA", ORIYA_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock TAMIL = new UnicodeBlock("TAMIL", TAMIL_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock TELUGU = new UnicodeBlock("TELUGU", TELUGU_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock KANNADA = new UnicodeBlock("KANNADA", KANNADA_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MALAYALAM = new UnicodeBlock("MALAYALAM", MALAYALAM_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SINHALA = new UnicodeBlock("SINHALA", SINHALA_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock THAI = new UnicodeBlock("THAI", THAI_ID); /** * @stable ICU 2.4 */ public static final 
UnicodeBlock LAO = new UnicodeBlock("LAO", LAO_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock TIBETAN = new UnicodeBlock("TIBETAN", TIBETAN_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MYANMAR = new UnicodeBlock("MYANMAR", MYANMAR_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock GEORGIAN = new UnicodeBlock("GEORGIAN", GEORGIAN_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HANGUL_JAMO = new UnicodeBlock("HANGUL_JAMO", HANGUL_JAMO_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ETHIOPIC = new UnicodeBlock("ETHIOPIC", ETHIOPIC_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CHEROKEE = new UnicodeBlock("CHEROKEE", CHEROKEE_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS = new UnicodeBlock("UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS", UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock OGHAM = new UnicodeBlock("OGHAM", OGHAM_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock RUNIC = new UnicodeBlock("RUNIC", RUNIC_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock KHMER = new UnicodeBlock("KHMER", KHMER_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MONGOLIAN = new UnicodeBlock("MONGOLIAN", MONGOLIAN_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock LATIN_EXTENDED_ADDITIONAL = new UnicodeBlock("LATIN_EXTENDED_ADDITIONAL", LATIN_EXTENDED_ADDITIONAL_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock GREEK_EXTENDED = new UnicodeBlock("GREEK_EXTENDED", GREEK_EXTENDED_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock GENERAL_PUNCTUATION = new UnicodeBlock("GENERAL_PUNCTUATION", GENERAL_PUNCTUATION_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SUPERSCRIPTS_AND_SUBSCRIPTS = new UnicodeBlock("SUPERSCRIPTS_AND_SUBSCRIPTS", SUPERSCRIPTS_AND_SUBSCRIPTS_ID); /** * @stable ICU 2.4 */ public static 
final UnicodeBlock CURRENCY_SYMBOLS = new UnicodeBlock("CURRENCY_SYMBOLS", CURRENCY_SYMBOLS_ID); /** * Unicode 3.2 renames this block to "Combining Diacritical Marks for * Symbols". * @stable ICU 2.4 */ public static final UnicodeBlock COMBINING_MARKS_FOR_SYMBOLS = new UnicodeBlock("COMBINING_MARKS_FOR_SYMBOLS", COMBINING_MARKS_FOR_SYMBOLS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock LETTERLIKE_SYMBOLS = new UnicodeBlock("LETTERLIKE_SYMBOLS", LETTERLIKE_SYMBOLS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock NUMBER_FORMS = new UnicodeBlock("NUMBER_FORMS", NUMBER_FORMS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ARROWS = new UnicodeBlock("ARROWS", ARROWS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MATHEMATICAL_OPERATORS = new UnicodeBlock("MATHEMATICAL_OPERATORS", MATHEMATICAL_OPERATORS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MISCELLANEOUS_TECHNICAL = new UnicodeBlock("MISCELLANEOUS_TECHNICAL", MISCELLANEOUS_TECHNICAL_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CONTROL_PICTURES = new UnicodeBlock("CONTROL_PICTURES", CONTROL_PICTURES_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock OPTICAL_CHARACTER_RECOGNITION = new UnicodeBlock("OPTICAL_CHARACTER_RECOGNITION", OPTICAL_CHARACTER_RECOGNITION_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ENCLOSED_ALPHANUMERICS = new UnicodeBlock("ENCLOSED_ALPHANUMERICS", ENCLOSED_ALPHANUMERICS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock BOX_DRAWING = new UnicodeBlock("BOX_DRAWING", BOX_DRAWING_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock BLOCK_ELEMENTS = new UnicodeBlock("BLOCK_ELEMENTS", BLOCK_ELEMENTS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock GEOMETRIC_SHAPES = new UnicodeBlock("GEOMETRIC_SHAPES", GEOMETRIC_SHAPES_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MISCELLANEOUS_SYMBOLS = new 
UnicodeBlock("MISCELLANEOUS_SYMBOLS", MISCELLANEOUS_SYMBOLS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock DINGBATS = new UnicodeBlock("DINGBATS", DINGBATS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock BRAILLE_PATTERNS = new UnicodeBlock("BRAILLE_PATTERNS", BRAILLE_PATTERNS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_RADICALS_SUPPLEMENT = new UnicodeBlock("CJK_RADICALS_SUPPLEMENT", CJK_RADICALS_SUPPLEMENT_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock KANGXI_RADICALS = new UnicodeBlock("KANGXI_RADICALS", KANGXI_RADICALS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock IDEOGRAPHIC_DESCRIPTION_CHARACTERS = new UnicodeBlock("IDEOGRAPHIC_DESCRIPTION_CHARACTERS", IDEOGRAPHIC_DESCRIPTION_CHARACTERS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_SYMBOLS_AND_PUNCTUATION = new UnicodeBlock("CJK_SYMBOLS_AND_PUNCTUATION", CJK_SYMBOLS_AND_PUNCTUATION_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HIRAGANA = new UnicodeBlock("HIRAGANA", HIRAGANA_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock KATAKANA = new UnicodeBlock("KATAKANA", KATAKANA_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock BOPOMOFO = new UnicodeBlock("BOPOMOFO", BOPOMOFO_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HANGUL_COMPATIBILITY_JAMO = new UnicodeBlock("HANGUL_COMPATIBILITY_JAMO", HANGUL_COMPATIBILITY_JAMO_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock KANBUN = new UnicodeBlock("KANBUN", KANBUN_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock BOPOMOFO_EXTENDED = new UnicodeBlock("BOPOMOFO_EXTENDED", BOPOMOFO_EXTENDED_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ENCLOSED_CJK_LETTERS_AND_MONTHS = new UnicodeBlock("ENCLOSED_CJK_LETTERS_AND_MONTHS", ENCLOSED_CJK_LETTERS_AND_MONTHS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_COMPATIBILITY = new 
UnicodeBlock("CJK_COMPATIBILITY", CJK_COMPATIBILITY_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A = new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A", CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS = new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS", CJK_UNIFIED_IDEOGRAPHS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock YI_SYLLABLES = new UnicodeBlock("YI_SYLLABLES", YI_SYLLABLES_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock YI_RADICALS = new UnicodeBlock("YI_RADICALS", YI_RADICALS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HANGUL_SYLLABLES = new UnicodeBlock("HANGUL_SYLLABLES", HANGUL_SYLLABLES_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HIGH_SURROGATES = new UnicodeBlock("HIGH_SURROGATES", HIGH_SURROGATES_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HIGH_PRIVATE_USE_SURROGATES = new UnicodeBlock("HIGH_PRIVATE_USE_SURROGATES", HIGH_PRIVATE_USE_SURROGATES_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock LOW_SURROGATES = new UnicodeBlock("LOW_SURROGATES", LOW_SURROGATES_ID); /** * Same as public static final int PRIVATE_USE. * Until Unicode 3.1.1; the corresponding block name was "Private Use"; * and multiple code point ranges had this block. * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" * and adds separate blocks for the supplementary PUAs. * @stable ICU 2.4 */ public static final UnicodeBlock PRIVATE_USE_AREA = new UnicodeBlock("PRIVATE_USE_AREA", 78); /** * Same as public static final int PRIVATE_USE_AREA. * Until Unicode 3.1.1; the corresponding block name was "Private Use"; * and multiple code point ranges had this block. * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" * and adds separate blocks for the supplementary PUAs. 
* @stable ICU 2.4 */ public static final UnicodeBlock PRIVATE_USE = PRIVATE_USE_AREA; /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS = new UnicodeBlock("CJK_COMPATIBILITY_IDEOGRAPHS", CJK_COMPATIBILITY_IDEOGRAPHS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ALPHABETIC_PRESENTATION_FORMS = new UnicodeBlock("ALPHABETIC_PRESENTATION_FORMS", ALPHABETIC_PRESENTATION_FORMS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_A = new UnicodeBlock("ARABIC_PRESENTATION_FORMS_A", ARABIC_PRESENTATION_FORMS_A_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock COMBINING_HALF_MARKS = new UnicodeBlock("COMBINING_HALF_MARKS", COMBINING_HALF_MARKS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_COMPATIBILITY_FORMS = new UnicodeBlock("CJK_COMPATIBILITY_FORMS", CJK_COMPATIBILITY_FORMS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SMALL_FORM_VARIANTS = new UnicodeBlock("SMALL_FORM_VARIANTS", SMALL_FORM_VARIANTS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_B = new UnicodeBlock("ARABIC_PRESENTATION_FORMS_B", ARABIC_PRESENTATION_FORMS_B_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SPECIALS = new UnicodeBlock("SPECIALS", SPECIALS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HALFWIDTH_AND_FULLWIDTH_FORMS = new UnicodeBlock("HALFWIDTH_AND_FULLWIDTH_FORMS", HALFWIDTH_AND_FULLWIDTH_FORMS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock OLD_ITALIC = new UnicodeBlock("OLD_ITALIC", OLD_ITALIC_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock GOTHIC = new UnicodeBlock("GOTHIC", GOTHIC_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock DESERET = new UnicodeBlock("DESERET", DESERET_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock BYZANTINE_MUSICAL_SYMBOLS = new UnicodeBlock("BYZANTINE_MUSICAL_SYMBOLS", 
BYZANTINE_MUSICAL_SYMBOLS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MUSICAL_SYMBOLS = new UnicodeBlock("MUSICAL_SYMBOLS", MUSICAL_SYMBOLS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MATHEMATICAL_ALPHANUMERIC_SYMBOLS = new UnicodeBlock("MATHEMATICAL_ALPHANUMERIC_SYMBOLS", MATHEMATICAL_ALPHANUMERIC_SYMBOLS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B = new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B", CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT = new UnicodeBlock("CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT", CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock TAGS = new UnicodeBlock("TAGS", TAGS_ID); // New blocks in Unicode 3.2 /** * Unicode 4.0.1 renames the "Cyrillic Supplementary" block to "Cyrillic Supplement". * @stable ICU 2.4 */ public static final UnicodeBlock CYRILLIC_SUPPLEMENTARY = new UnicodeBlock("CYRILLIC_SUPPLEMENTARY", CYRILLIC_SUPPLEMENTARY_ID); /** * Unicode 4.0.1 renames the "Cyrillic Supplementary" block to "Cyrillic Supplement". 
* @stable ICU 3.0 */ public static final UnicodeBlock CYRILLIC_SUPPLEMENT = new UnicodeBlock("CYRILLIC_SUPPLEMENT", CYRILLIC_SUPPLEMENT_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock TAGALOG = new UnicodeBlock("TAGALOG", TAGALOG_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock HANUNOO = new UnicodeBlock("HANUNOO", HANUNOO_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock BUHID = new UnicodeBlock("BUHID", BUHID_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock TAGBANWA = new UnicodeBlock("TAGBANWA", TAGBANWA_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A = new UnicodeBlock("MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A", MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SUPPLEMENTAL_ARROWS_A = new UnicodeBlock("SUPPLEMENTAL_ARROWS_A", SUPPLEMENTAL_ARROWS_A_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SUPPLEMENTAL_ARROWS_B = new UnicodeBlock("SUPPLEMENTAL_ARROWS_B", SUPPLEMENTAL_ARROWS_B_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B = new UnicodeBlock("MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B", MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SUPPLEMENTAL_MATHEMATICAL_OPERATORS = new UnicodeBlock("SUPPLEMENTAL_MATHEMATICAL_OPERATORS", SUPPLEMENTAL_MATHEMATICAL_OPERATORS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock KATAKANA_PHONETIC_EXTENSIONS = new UnicodeBlock("KATAKANA_PHONETIC_EXTENSIONS", KATAKANA_PHONETIC_EXTENSIONS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock VARIATION_SELECTORS = new UnicodeBlock("VARIATION_SELECTORS", VARIATION_SELECTORS_ID); /** * @stable ICU 2.4 */ public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_A = new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_A", SUPPLEMENTARY_PRIVATE_USE_AREA_A_ID); /** * @stable ICU 2.4 */ public 
static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_B = new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_B", SUPPLEMENTARY_PRIVATE_USE_AREA_B_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock LIMBU = new UnicodeBlock("LIMBU", LIMBU_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock TAI_LE = new UnicodeBlock("TAI_LE", TAI_LE_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock KHMER_SYMBOLS = new UnicodeBlock("KHMER_SYMBOLS", KHMER_SYMBOLS_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock PHONETIC_EXTENSIONS = new UnicodeBlock("PHONETIC_EXTENSIONS", PHONETIC_EXTENSIONS_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_AND_ARROWS = new UnicodeBlock("MISCELLANEOUS_SYMBOLS_AND_ARROWS", MISCELLANEOUS_SYMBOLS_AND_ARROWS_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock YIJING_HEXAGRAM_SYMBOLS = new UnicodeBlock("YIJING_HEXAGRAM_SYMBOLS", YIJING_HEXAGRAM_SYMBOLS_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock LINEAR_B_SYLLABARY = new UnicodeBlock("LINEAR_B_SYLLABARY", LINEAR_B_SYLLABARY_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock LINEAR_B_IDEOGRAMS = new UnicodeBlock("LINEAR_B_IDEOGRAMS", LINEAR_B_IDEOGRAMS_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock AEGEAN_NUMBERS = new UnicodeBlock("AEGEAN_NUMBERS", AEGEAN_NUMBERS_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock UGARITIC = new UnicodeBlock("UGARITIC", UGARITIC_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock SHAVIAN = new UnicodeBlock("SHAVIAN", SHAVIAN_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock OSMANYA = new UnicodeBlock("OSMANYA", OSMANYA_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock CYPRIOT_SYLLABARY = new UnicodeBlock("CYPRIOT_SYLLABARY", CYPRIOT_SYLLABARY_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock TAI_XUAN_JING_SYMBOLS = new UnicodeBlock("TAI_XUAN_JING_SYMBOLS", 
TAI_XUAN_JING_SYMBOLS_ID); /** * @stable ICU 2.6 */ public static final UnicodeBlock VARIATION_SELECTORS_SUPPLEMENT = new UnicodeBlock("VARIATION_SELECTORS_SUPPLEMENT", VARIATION_SELECTORS_SUPPLEMENT_ID); /* New blocks in Unicode 4.1 */ /** * @stable ICU 3.4 */ public static final UnicodeBlock ANCIENT_GREEK_MUSICAL_NOTATION = new UnicodeBlock("ANCIENT_GREEK_MUSICAL_NOTATION", ANCIENT_GREEK_MUSICAL_NOTATION_ID); /*[1D200]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock ANCIENT_GREEK_NUMBERS = new UnicodeBlock("ANCIENT_GREEK_NUMBERS", ANCIENT_GREEK_NUMBERS_ID); /*[10140]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock ARABIC_SUPPLEMENT = new UnicodeBlock("ARABIC_SUPPLEMENT", ARABIC_SUPPLEMENT_ID); /*[0750]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock BUGINESE = new UnicodeBlock("BUGINESE", BUGINESE_ID); /*[1A00]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock CJK_STROKES = new UnicodeBlock("CJK_STROKES", CJK_STROKES_ID); /*[31C0]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS_SUPPLEMENT = new UnicodeBlock("COMBINING_DIACRITICAL_MARKS_SUPPLEMENT", COMBINING_DIACRITICAL_MARKS_SUPPLEMENT_ID); /*[1DC0]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock COPTIC = new UnicodeBlock("COPTIC", COPTIC_ID); /*[2C80]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock ETHIOPIC_EXTENDED = new UnicodeBlock("ETHIOPIC_EXTENDED", ETHIOPIC_EXTENDED_ID); /*[2D80]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock ETHIOPIC_SUPPLEMENT = new UnicodeBlock("ETHIOPIC_SUPPLEMENT", ETHIOPIC_SUPPLEMENT_ID); /*[1380]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock GEORGIAN_SUPPLEMENT = new UnicodeBlock("GEORGIAN_SUPPLEMENT", GEORGIAN_SUPPLEMENT_ID); /*[2D00]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock GLAGOLITIC = new UnicodeBlock("GLAGOLITIC", GLAGOLITIC_ID); /*[2C00]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock KHAROSHTHI = new 
UnicodeBlock("KHAROSHTHI", KHAROSHTHI_ID); /*[10A00]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock MODIFIER_TONE_LETTERS = new UnicodeBlock("MODIFIER_TONE_LETTERS", MODIFIER_TONE_LETTERS_ID); /*[A700]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock NEW_TAI_LUE = new UnicodeBlock("NEW_TAI_LUE", NEW_TAI_LUE_ID); /*[1980]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock OLD_PERSIAN = new UnicodeBlock("OLD_PERSIAN", OLD_PERSIAN_ID); /*[103A0]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock PHONETIC_EXTENSIONS_SUPPLEMENT = new UnicodeBlock("PHONETIC_EXTENSIONS_SUPPLEMENT", PHONETIC_EXTENSIONS_SUPPLEMENT_ID); /*[1D80]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock SUPPLEMENTAL_PUNCTUATION = new UnicodeBlock("SUPPLEMENTAL_PUNCTUATION", SUPPLEMENTAL_PUNCTUATION_ID); /*[2E00]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock SYLOTI_NAGRI = new UnicodeBlock("SYLOTI_NAGRI", SYLOTI_NAGRI_ID); /*[A800]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock TIFINAGH = new UnicodeBlock("TIFINAGH", TIFINAGH_ID); /*[2D30]*/ /** * @stable ICU 3.4 */ public static final UnicodeBlock VERTICAL_FORMS = new UnicodeBlock("VERTICAL_FORMS", VERTICAL_FORMS_ID); /*[FE10]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock NKO = new UnicodeBlock("NKO", NKO_ID); /*[07C0]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock BALINESE = new UnicodeBlock("BALINESE", BALINESE_ID); /*[1B00]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock LATIN_EXTENDED_C = new UnicodeBlock("LATIN_EXTENDED_C", LATIN_EXTENDED_C_ID); /*[2C60]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock LATIN_EXTENDED_D = new UnicodeBlock("LATIN_EXTENDED_D", LATIN_EXTENDED_D_ID); /*[A720]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock PHAGS_PA = new UnicodeBlock("PHAGS_PA", PHAGS_PA_ID); /*[A840]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock PHOENICIAN = new UnicodeBlock("PHOENICIAN", 
PHOENICIAN_ID); /*[10900]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock CUNEIFORM = new UnicodeBlock("CUNEIFORM", CUNEIFORM_ID); /*[12000]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock CUNEIFORM_NUMBERS_AND_PUNCTUATION = new UnicodeBlock("CUNEIFORM_NUMBERS_AND_PUNCTUATION", CUNEIFORM_NUMBERS_AND_PUNCTUATION_ID); /*[12400]*/ /** * @stable ICU 3.6 */ public static final UnicodeBlock COUNTING_ROD_NUMERALS = new UnicodeBlock("COUNTING_ROD_NUMERALS", COUNTING_ROD_NUMERALS_ID); /*[1D360]*/ /** * @stable ICU 4.0 */ public static final UnicodeBlock SUNDANESE = new UnicodeBlock("SUNDANESE", SUNDANESE_ID); /* [1B80] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock LEPCHA = new UnicodeBlock("LEPCHA", LEPCHA_ID); /* [1C00] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock OL_CHIKI = new UnicodeBlock("OL_CHIKI", OL_CHIKI_ID); /* [1C50] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock CYRILLIC_EXTENDED_A = new UnicodeBlock("CYRILLIC_EXTENDED_A", CYRILLIC_EXTENDED_A_ID); /* [2DE0] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock VAI = new UnicodeBlock("VAI", VAI_ID); /* [A500] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock CYRILLIC_EXTENDED_B = new UnicodeBlock("CYRILLIC_EXTENDED_B", CYRILLIC_EXTENDED_B_ID); /* [A640] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock SAURASHTRA = new UnicodeBlock("SAURASHTRA", SAURASHTRA_ID); /* [A880] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock KAYAH_LI = new UnicodeBlock("KAYAH_LI", KAYAH_LI_ID); /* [A900] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock REJANG = new UnicodeBlock("REJANG", REJANG_ID); /* [A930] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock CHAM = new UnicodeBlock("CHAM", CHAM_ID); /* [AA00] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock ANCIENT_SYMBOLS = new UnicodeBlock("ANCIENT_SYMBOLS", ANCIENT_SYMBOLS_ID); /* [10190] */ /** * @stable ICU 4.0 */ public static final 
UnicodeBlock PHAISTOS_DISC = new UnicodeBlock("PHAISTOS_DISC", PHAISTOS_DISC_ID); /* [101D0] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock LYCIAN = new UnicodeBlock("LYCIAN", LYCIAN_ID); /* [10280] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock CARIAN = new UnicodeBlock("CARIAN", CARIAN_ID); /* [102A0] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock LYDIAN = new UnicodeBlock("LYDIAN", LYDIAN_ID); /* [10920] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock MAHJONG_TILES = new UnicodeBlock("MAHJONG_TILES", MAHJONG_TILES_ID); /* [1F000] */ /** * @stable ICU 4.0 */ public static final UnicodeBlock DOMINO_TILES = new UnicodeBlock("DOMINO_TILES", DOMINO_TILES_ID); /* [1F030] */ /** * @stable ICU 2.4 */ public static final UnicodeBlock INVALID_CODE = new UnicodeBlock("INVALID_CODE", INVALID_CODE_ID); // public methods -------------------------------------------------- /** * Gets the only instance of the UnicodeBlock with the argument ID. * If no such ID exists, an INVALID_CODE UnicodeBlock will be returned. * @param id UnicodeBlock ID * @return the only instance of the UnicodeBlock with the argument ID * if it exists, otherwise an INVALID_CODE UnicodeBlock will be * returned. * @stable ICU 2.4 */ public static UnicodeBlock getInstance(int id) { if (id >= 0 && id < BLOCKS_.length) { return BLOCKS_[id]; } return INVALID_CODE; } /** * Returns the Unicode allocation block that contains the code point, * or NO_BLOCK if the code point is not a member of a defined block. * INVALID_CODE is returned if the code point is greater than MAX_VALUE. * @param ch code point to be tested * @return the Unicode allocation block that contains the code point * @stable ICU 2.4 */ public static UnicodeBlock of(int ch) { if (ch > MAX_VALUE) { return INVALID_CODE; } return UnicodeBlock.getInstance((PROPERTY_.getAdditional(ch, 0) & BLOCK_MASK_) >> BLOCK_SHIFT_); } /** * Internal function returning of(ch).getID(). 
* * @param ch * @return numeric block value * @internal */ static int idOf(int ch) { if (ch < 0 || ch > MAX_VALUE) { return -1; } return (PROPERTY_.getAdditional(ch, 0) & BLOCK_MASK_) >> BLOCK_SHIFT_; } /** * Cover the JDK 1.5 API. Return the Unicode block with the * given name.
Note: Unlike JDK 1.5, this only matches * against the official UCD name and the Java block name * (ignoring case). * @param blockName the name of the block to match * @return the UnicodeBlock with that name * @throws IllegalArgumentException if the blockName could not be matched * @stable ICU 3.0 */ public static final UnicodeBlock forName(String blockName) { Map m = null; if (mref != null) { m = (Map)mref.get(); } if (m == null) { m = new HashMap(BLOCKS_.length); for (int i = 0; i < BLOCKS_.length; ++i) { UnicodeBlock b = BLOCKS_[i]; String name = trimBlockName(getPropertyValueName(UProperty.BLOCK, b.getID(), UProperty.NameChoice.LONG)); m.put(name, b); } mref = new SoftReference(m); } UnicodeBlock b = (UnicodeBlock)m.get(trimBlockName(blockName)); if (b == null) { throw new IllegalArgumentException(); } return b; } private static SoftReference mref; private static String trimBlockName(String name) { String upper = name.toUpperCase(); StringBuffer result = new StringBuffer(upper.length()); for (int i = 0; i < upper.length(); i++) { char c = upper.charAt(i); if (c != ' ' && c != '_' && c != '-') { result.append(c); } } return result.toString(); } /** * Returns the type ID of this Unicode block * @return integer type ID of this Unicode block * @stable ICU 2.4 */ public int getID() { return m_id_; } // private data members --------------------------------------------- /** * Array of UnicodeBlocks, for easy access in getInstance(int) */ private final static UnicodeBlock BLOCKS_[] = { NO_BLOCK, BASIC_LATIN, LATIN_1_SUPPLEMENT, LATIN_EXTENDED_A, LATIN_EXTENDED_B, IPA_EXTENSIONS, SPACING_MODIFIER_LETTERS, COMBINING_DIACRITICAL_MARKS, GREEK, CYRILLIC, ARMENIAN, HEBREW, ARABIC, SYRIAC, THAANA, DEVANAGARI, BENGALI, GURMUKHI, GUJARATI, ORIYA, TAMIL, TELUGU, KANNADA, MALAYALAM, SINHALA, THAI, LAO, TIBETAN, MYANMAR, GEORGIAN, HANGUL_JAMO, ETHIOPIC, CHEROKEE, UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS, OGHAM, RUNIC, KHMER, MONGOLIAN, LATIN_EXTENDED_ADDITIONAL, GREEK_EXTENDED, 
GENERAL_PUNCTUATION, SUPERSCRIPTS_AND_SUBSCRIPTS, CURRENCY_SYMBOLS, COMBINING_MARKS_FOR_SYMBOLS, LETTERLIKE_SYMBOLS, NUMBER_FORMS, ARROWS, MATHEMATICAL_OPERATORS, MISCELLANEOUS_TECHNICAL, CONTROL_PICTURES, OPTICAL_CHARACTER_RECOGNITION, ENCLOSED_ALPHANUMERICS, BOX_DRAWING, BLOCK_ELEMENTS, GEOMETRIC_SHAPES, MISCELLANEOUS_SYMBOLS, DINGBATS, BRAILLE_PATTERNS, CJK_RADICALS_SUPPLEMENT, KANGXI_RADICALS, IDEOGRAPHIC_DESCRIPTION_CHARACTERS, CJK_SYMBOLS_AND_PUNCTUATION, HIRAGANA, KATAKANA, BOPOMOFO, HANGUL_COMPATIBILITY_JAMO, KANBUN, BOPOMOFO_EXTENDED, ENCLOSED_CJK_LETTERS_AND_MONTHS, CJK_COMPATIBILITY, CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A, CJK_UNIFIED_IDEOGRAPHS, YI_SYLLABLES, YI_RADICALS, HANGUL_SYLLABLES, HIGH_SURROGATES, HIGH_PRIVATE_USE_SURROGATES, LOW_SURROGATES, PRIVATE_USE_AREA, CJK_COMPATIBILITY_IDEOGRAPHS, ALPHABETIC_PRESENTATION_FORMS, ARABIC_PRESENTATION_FORMS_A, COMBINING_HALF_MARKS, CJK_COMPATIBILITY_FORMS, SMALL_FORM_VARIANTS, ARABIC_PRESENTATION_FORMS_B, SPECIALS, HALFWIDTH_AND_FULLWIDTH_FORMS, OLD_ITALIC, GOTHIC, DESERET, BYZANTINE_MUSICAL_SYMBOLS, MUSICAL_SYMBOLS, MATHEMATICAL_ALPHANUMERIC_SYMBOLS, CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B, CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT, TAGS, CYRILLIC_SUPPLEMENT, TAGALOG, HANUNOO, BUHID, TAGBANWA, MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A, SUPPLEMENTAL_ARROWS_A, SUPPLEMENTAL_ARROWS_B, MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B, SUPPLEMENTAL_MATHEMATICAL_OPERATORS, KATAKANA_PHONETIC_EXTENSIONS, VARIATION_SELECTORS, SUPPLEMENTARY_PRIVATE_USE_AREA_A, SUPPLEMENTARY_PRIVATE_USE_AREA_B, LIMBU, TAI_LE, KHMER_SYMBOLS, PHONETIC_EXTENSIONS, MISCELLANEOUS_SYMBOLS_AND_ARROWS, YIJING_HEXAGRAM_SYMBOLS, LINEAR_B_SYLLABARY, LINEAR_B_IDEOGRAMS, AEGEAN_NUMBERS, UGARITIC, SHAVIAN, OSMANYA, CYPRIOT_SYLLABARY, TAI_XUAN_JING_SYMBOLS, VARIATION_SELECTORS_SUPPLEMENT, /* New blocks in Unicode 4.1 */ ANCIENT_GREEK_MUSICAL_NOTATION, ANCIENT_GREEK_NUMBERS, ARABIC_SUPPLEMENT, BUGINESE, CJK_STROKES, COMBINING_DIACRITICAL_MARKS_SUPPLEMENT, COPTIC, 
ETHIOPIC_EXTENDED, ETHIOPIC_SUPPLEMENT, GEORGIAN_SUPPLEMENT, GLAGOLITIC, KHAROSHTHI, MODIFIER_TONE_LETTERS, NEW_TAI_LUE, OLD_PERSIAN, PHONETIC_EXTENSIONS_SUPPLEMENT, SUPPLEMENTAL_PUNCTUATION, SYLOTI_NAGRI, TIFINAGH, VERTICAL_FORMS, NKO, BALINESE, LATIN_EXTENDED_C, LATIN_EXTENDED_D, PHAGS_PA, PHOENICIAN, CUNEIFORM, CUNEIFORM_NUMBERS_AND_PUNCTUATION, COUNTING_ROD_NUMERALS, /* New blocks in Unicode 5.1 */ SUNDANESE, LEPCHA, OL_CHIKI, CYRILLIC_EXTENDED_A, VAI, CYRILLIC_EXTENDED_B, SAURASHTRA, KAYAH_LI, REJANG, CHAM, ANCIENT_SYMBOLS, PHAISTOS_DISC, LYCIAN, CARIAN, LYDIAN, MAHJONG_TILES, DOMINO_TILES, }; static { if (COUNT!=BLOCKS_.length) { throw new java.lang.IllegalStateException("UnicodeBlock fields are inconsistent!"); } } /** * Identification code for this UnicodeBlock */ private int m_id_; // private constructor ---------------------------------------------- /** * UnicodeBlock constructor * @param name name of this UnicodeBlock * @param id unique id of this UnicodeBlock * @exception NullPointerException if name is null */ private UnicodeBlock(String name, int id) { super(name); m_id_ = id; } } /** * East Asian Width constants. * @see UProperty#EAST_ASIAN_WIDTH * @see UCharacter#getIntPropertyValue * @stable ICU 2.4 */ public static interface EastAsianWidth { /** * @stable ICU 2.4 */ public static final int NEUTRAL = 0; /** * @stable ICU 2.4 */ public static final int AMBIGUOUS = 1; /** * @stable ICU 2.4 */ public static final int HALFWIDTH = 2; /** * @stable ICU 2.4 */ public static final int FULLWIDTH = 3; /** * @stable ICU 2.4 */ public static final int NARROW = 4; /** * @stable ICU 2.4 */ public static final int WIDE = 5; /** * @stable ICU 2.4 */ public static final int COUNT = 6; } /** * Decomposition Type constants. 
* @see UProperty#DECOMPOSITION_TYPE * @stable ICU 2.4 */ public static interface DecompositionType { /** * @stable ICU 2.4 */ public static final int NONE = 0; /** * @stable ICU 2.4 */ public static final int CANONICAL = 1; /** * @stable ICU 2.4 */ public static final int COMPAT = 2; /** * @stable ICU 2.4 */ public static final int CIRCLE = 3; /** * @stable ICU 2.4 */ public static final int FINAL = 4; /** * @stable ICU 2.4 */ public static final int FONT = 5; /** * @stable ICU 2.4 */ public static final int FRACTION = 6; /** * @stable ICU 2.4 */ public static final int INITIAL = 7; /** * @stable ICU 2.4 */ public static final int ISOLATED = 8; /** * @stable ICU 2.4 */ public static final int MEDIAL = 9; /** * @stable ICU 2.4 */ public static final int NARROW = 10; /** * @stable ICU 2.4 */ public static final int NOBREAK = 11; /** * @stable ICU 2.4 */ public static final int SMALL = 12; /** * @stable ICU 2.4 */ public static final int SQUARE = 13; /** * @stable ICU 2.4 */ public static final int SUB = 14; /** * @stable ICU 2.4 */ public static final int SUPER = 15; /** * @stable ICU 2.4 */ public static final int VERTICAL = 16; /** * @stable ICU 2.4 */ public static final int WIDE = 17; /** * @stable ICU 2.4 */ public static final int COUNT = 18; } /** * Joining Type constants. * @see UProperty#JOINING_TYPE * @stable ICU 2.4 */ public static interface JoiningType { /** * @stable ICU 2.4 */ public static final int NON_JOINING = 0; /** * @stable ICU 2.4 */ public static final int JOIN_CAUSING = 1; /** * @stable ICU 2.4 */ public static final int DUAL_JOINING = 2; /** * @stable ICU 2.4 */ public static final int LEFT_JOINING = 3; /** * @stable ICU 2.4 */ public static final int RIGHT_JOINING = 4; /** * @stable ICU 2.4 */ public static final int TRANSPARENT = 5; /** * @stable ICU 2.4 */ public static final int COUNT = 6; } /** * Joining Group constants. 
* @see UProperty#JOINING_GROUP * @stable ICU 2.4 */ public static interface JoiningGroup { /** * @stable ICU 2.4 */ public static final int NO_JOINING_GROUP = 0; /** * @stable ICU 2.4 */ public static final int AIN = 1; /** * @stable ICU 2.4 */ public static final int ALAPH = 2; /** * @stable ICU 2.4 */ public static final int ALEF = 3; /** * @stable ICU 2.4 */ public static final int BEH = 4; /** * @stable ICU 2.4 */ public static final int BETH = 5; /** * @stable ICU 2.4 */ public static final int DAL = 6; /** * @stable ICU 2.4 */ public static final int DALATH_RISH = 7; /** * @stable ICU 2.4 */ public static final int E = 8; /** * @stable ICU 2.4 */ public static final int FEH = 9; /** * @stable ICU 2.4 */ public static final int FINAL_SEMKATH = 10; /** * @stable ICU 2.4 */ public static final int GAF = 11; /** * @stable ICU 2.4 */ public static final int GAMAL = 12; /** * @stable ICU 2.4 */ public static final int HAH = 13; /** * @stable ICU 2.4 */ public static final int HAMZA_ON_HEH_GOAL = 14; /** * @stable ICU 2.4 */ public static final int HE = 15; /** * @stable ICU 2.4 */ public static final int HEH = 16; /** * @stable ICU 2.4 */ public static final int HEH_GOAL = 17; /** * @stable ICU 2.4 */ public static final int HETH = 18; /** * @stable ICU 2.4 */ public static final int KAF = 19; /** * @stable ICU 2.4 */ public static final int KAPH = 20; /** * @stable ICU 2.4 */ public static final int KNOTTED_HEH = 21; /** * @stable ICU 2.4 */ public static final int LAM = 22; /** * @stable ICU 2.4 */ public static final int LAMADH = 23; /** * @stable ICU 2.4 */ public static final int MEEM = 24; /** * @stable ICU 2.4 */ public static final int MIM = 25; /** * @stable ICU 2.4 */ public static final int NOON = 26; /** * @stable ICU 2.4 */ public static final int NUN = 27; /** * @stable ICU 2.4 */ public static final int PE = 28; /** * @stable ICU 2.4 */ public static final int QAF = 29; /** * @stable ICU 2.4 */ public static final int QAPH = 30; /** * @stable ICU 2.4 
*/ public static final int REH = 31; /** * @stable ICU 2.4 */ public static final int REVERSED_PE = 32; /** * @stable ICU 2.4 */ public static final int SAD = 33; /** * @stable ICU 2.4 */ public static final int SADHE = 34; /** * @stable ICU 2.4 */ public static final int SEEN = 35; /** * @stable ICU 2.4 */ public static final int SEMKATH = 36; /** * @stable ICU 2.4 */ public static final int SHIN = 37; /** * @stable ICU 2.4 */ public static final int SWASH_KAF = 38; /** * @stable ICU 2.4 */ public static final int SYRIAC_WAW = 39; /** * @stable ICU 2.4 */ public static final int TAH = 40; /** * @stable ICU 2.4 */ public static final int TAW = 41; /** * @stable ICU 2.4 */ public static final int TEH_MARBUTA = 42; /** * @stable ICU 2.4 */ public static final int TETH = 43; /** * @stable ICU 2.4 */ public static final int WAW = 44; /** * @stable ICU 2.4 */ public static final int YEH = 45; /** * @stable ICU 2.4 */ public static final int YEH_BARREE = 46; /** * @stable ICU 2.4 */ public static final int YEH_WITH_TAIL = 47; /** * @stable ICU 2.4 */ public static final int YUDH = 48; /** * @stable ICU 2.4 */ public static final int YUDH_HE = 49; /** * @stable ICU 2.4 */ public static final int ZAIN = 50; /** * @stable ICU 2.6 */ public static final int FE = 51; /** * @stable ICU 2.6 */ public static final int KHAPH = 52; /** * @stable ICU 2.6 */ public static final int ZHAIN = 53; /** * @stable ICU 4.0 */ public static final int BURUSHASKI_YEH_BARREE = 54; /** * @stable ICU 4.0 */ public static final int COUNT = 55; } /** * Grapheme Cluster Break constants. 
* @see UProperty#GRAPHEME_CLUSTER_BREAK * @stable ICU 3.4 */ public static interface GraphemeClusterBreak { /** * @stable ICU 3.4 */ public static final int OTHER = 0; /** * @stable ICU 3.4 */ public static final int CONTROL = 1; /** * @stable ICU 3.4 */ public static final int CR = 2; /** * @stable ICU 3.4 */ public static final int EXTEND = 3; /** * @stable ICU 3.4 */ public static final int L = 4; /** * @stable ICU 3.4 */ public static final int LF = 5; /** * @stable ICU 3.4 */ public static final int LV = 6; /** * @stable ICU 3.4 */ public static final int LVT = 7; /** * @stable ICU 3.4 */ public static final int T = 8; /** * @stable ICU 3.4 */ public static final int V = 9; /** * @stable ICU 4.0 */ public static final int SPACING_MARK = 10; /** * @stable ICU 4.0 */ public static final int PREPEND = 11; /** * @stable ICU 3.4 */ public static final int COUNT = 12; } /** * Word Break constants. * @see UProperty#WORD_BREAK * @stable ICU 3.4 */ public static interface WordBreak { /** * @stable ICU 3.8 */ public static final int OTHER = 0; /** * @stable ICU 3.8 */ public static final int ALETTER = 1; /** * @stable ICU 3.8 */ public static final int FORMAT = 2; /** * @stable ICU 3.8 */ public static final int KATAKANA = 3; /** * @stable ICU 3.8 */ public static final int MIDLETTER = 4; /** * @stable ICU 3.8 */ public static final int MIDNUM = 5; /** * @stable ICU 3.8 */ public static final int NUMERIC = 6; /** * @stable ICU 3.8 */ public static final int EXTENDNUMLET = 7; /** * @stable ICU 4.0 */ public static final int CR = 8; /** * @stable ICU 4.0 */ public static final int EXTEND = 9; /** * @stable ICU 4.0 */ public static final int LF = 10; /** * @stable ICU 4.0 */ public static final int MIDNUMLET = 11; /** * @stable ICU 4.0 */ public static final int NEWLINE = 12; /** * @stable ICU 4.0 */ public static final int COUNT = 13; } /** * Sentence Break constants. 
* @see UProperty#SENTENCE_BREAK * @stable ICU 3.4 */ public static interface SentenceBreak { /** * @stable ICU 3.8 */ public static final int OTHER = 0; /** * @stable ICU 3.8 */ public static final int ATERM = 1; /** * @stable ICU 3.8 */ public static final int CLOSE = 2; /** * @stable ICU 3.8 */ public static final int FORMAT = 3; /** * @stable ICU 3.8 */ public static final int LOWER = 4; /** * @stable ICU 3.8 */ public static final int NUMERIC = 5; /** * @stable ICU 3.8 */ public static final int OLETTER = 6; /** * @stable ICU 3.8 */ public static final int SEP = 7; /** * @stable ICU 3.8 */ public static final int SP = 8; /** * @stable ICU 3.8 */ public static final int STERM = 9; /** * @stable ICU 3.8 */ public static final int UPPER = 10; /** * @stable ICU 4.0 */ public static final int CR = 11; /** * @stable ICU 4.0 */ public static final int EXTEND = 12; /** * @stable ICU 4.0 */ public static final int LF = 13; /** * @stable ICU 4.0 */ public static final int SCONTINUE = 14; /** * @stable ICU 4.0 */ public static final int COUNT = 15; } /** * Line Break constants. 
* @see UProperty#LINE_BREAK * @stable ICU 2.4 */ public static interface LineBreak { /** * @stable ICU 2.4 */ public static final int UNKNOWN = 0; /** * @stable ICU 2.4 */ public static final int AMBIGUOUS = 1; /** * @stable ICU 2.4 */ public static final int ALPHABETIC = 2; /** * @stable ICU 2.4 */ public static final int BREAK_BOTH = 3; /** * @stable ICU 2.4 */ public static final int BREAK_AFTER = 4; /** * @stable ICU 2.4 */ public static final int BREAK_BEFORE = 5; /** * @stable ICU 2.4 */ public static final int MANDATORY_BREAK = 6; /** * @stable ICU 2.4 */ public static final int CONTINGENT_BREAK = 7; /** * @stable ICU 2.4 */ public static final int CLOSE_PUNCTUATION = 8; /** * @stable ICU 2.4 */ public static final int COMBINING_MARK = 9; /** * @stable ICU 2.4 */ public static final int CARRIAGE_RETURN = 10; /** * @stable ICU 2.4 */ public static final int EXCLAMATION = 11; /** * @stable ICU 2.4 */ public static final int GLUE = 12; /** * @stable ICU 2.4 */ public static final int HYPHEN = 13; /** * @stable ICU 2.4 */ public static final int IDEOGRAPHIC = 14; /** * @see #INSEPARABLE * @stable ICU 2.4 */ public static final int INSEPERABLE = 15; /** * Renamed from the misspelled "inseperable" in Unicode 4.0.1. 
* @stable ICU 3.0 */ public static final int INSEPARABLE = 15; /** * @stable ICU 2.4 */ public static final int INFIX_NUMERIC = 16; /** * @stable ICU 2.4 */ public static final int LINE_FEED = 17; /** * @stable ICU 2.4 */ public static final int NONSTARTER = 18; /** * @stable ICU 2.4 */ public static final int NUMERIC = 19; /** * @stable ICU 2.4 */ public static final int OPEN_PUNCTUATION = 20; /** * @stable ICU 2.4 */ public static final int POSTFIX_NUMERIC = 21; /** * @stable ICU 2.4 */ public static final int PREFIX_NUMERIC = 22; /** * @stable ICU 2.4 */ public static final int QUOTATION = 23; /** * @stable ICU 2.4 */ public static final int COMPLEX_CONTEXT = 24; /** * @stable ICU 2.4 */ public static final int SURROGATE = 25; /** * @stable ICU 2.4 */ public static final int SPACE = 26; /** * @stable ICU 2.4 */ public static final int BREAK_SYMBOLS = 27; /** * @stable ICU 2.4 */ public static final int ZWSPACE = 28; /** * @stable ICU 2.6 */ public static final int NEXT_LINE = 29; /*[NL]*/ /* from here on: new in Unicode 4/ICU 2.6 */ /** * @stable ICU 2.6 */ public static final int WORD_JOINER = 30; /*[WJ]*/ /* from here on: new in Unicode 4.1/ICU 3.4 */ /** * @stable ICU 3.4 */ public static final int H2 = 31; /** * @stable ICU 3.4 */ public static final int H3 = 32; /** * @stable ICU 3.4 */ public static final int JL = 33; /** * @stable ICU 3.4 */ public static final int JT = 34; /** * @stable ICU 3.4 */ public static final int JV = 35; /** * @stable ICU 2.4 */ public static final int COUNT = 36; } /** * Numeric Type constants. * @see UProperty#NUMERIC_TYPE * @stable ICU 2.4 */ public static interface NumericType { /** * @stable ICU 2.4 */ public static final int NONE = 0; /** * @stable ICU 2.4 */ public static final int DECIMAL = 1; /** * @stable ICU 2.4 */ public static final int DIGIT = 2; /** * @stable ICU 2.4 */ public static final int NUMERIC = 3; /** * @stable ICU 2.4 */ public static final int COUNT = 4; } /** * Hangul Syllable Type constants. 
* * @see UProperty#HANGUL_SYLLABLE_TYPE * @stable ICU 2.6 */ public static interface HangulSyllableType { /** * @stable ICU 2.6 */ public static final int NOT_APPLICABLE = 0; /*[NA]*/ /*See note !!*/ /** * @stable ICU 2.6 */ public static final int LEADING_JAMO = 1; /*[L]*/ /** * @stable ICU 2.6 */ public static final int VOWEL_JAMO = 2; /*[V]*/ /** * @stable ICU 2.6 */ public static final int TRAILING_JAMO = 3; /*[T]*/ /** * @stable ICU 2.6 */ public static final int LV_SYLLABLE = 4; /*[LV]*/ /** * @stable ICU 2.6 */ public static final int LVT_SYLLABLE = 5; /*[LVT]*/ /** * @stable ICU 2.6 */ public static final int COUNT = 6; } // public data members ----------------------------------------------- /** * The lowest Unicode code point value. * @stable ICU 2.1 */ public static final int MIN_VALUE = UTF16.CODEPOINT_MIN_VALUE; /** * The highest Unicode code point value (scalar value) according to the * Unicode Standard. * This is a 21-bit value (21 bits, rounded up).
* Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE * @stable ICU 2.1 */ public static final int MAX_VALUE = UTF16.CODEPOINT_MAX_VALUE; /** * The minimum value for Supplementary code points * @stable ICU 2.1 */ public static final int SUPPLEMENTARY_MIN_VALUE = UTF16.SUPPLEMENTARY_MIN_VALUE; /** * Unicode value used when translating into Unicode encoding form and there * is no existing character. * @stable ICU 2.1 */ public static final int REPLACEMENT_CHAR = '\uFFFD'; /** * Special value that is returned by getUnicodeNumericValue(int) when no * numeric value is defined for a code point. * @stable ICU 2.4 * @see #getUnicodeNumericValue */ public static final double NO_NUMERIC_VALUE = -123456789; /** * Compatibility constant for Java Character's MIN_RADIX. * @stable ICU 3.4 */ public static final int MIN_RADIX = java.lang.Character.MIN_RADIX; /** * Compatibility constant for Java Character's MAX_RADIX. * @stable ICU 3.4 */ public static final int MAX_RADIX = java.lang.Character.MAX_RADIX; /** * Do not lowercase non-initial parts of words when titlecasing. * Option bit for titlecasing APIs that take an options bit set. * * By default, titlecasing will titlecase the first cased character * of a word and lowercase all other characters. * With this option, the other characters will not be modified. * * @see #toTitleCase * @stable ICU 3.8 */ public static final int TITLECASE_NO_LOWERCASE = 0x100; /** * Do not adjust the titlecasing indexes from BreakIterator::next() indexes; * titlecase exactly the characters at breaks from the iterator. * Option bit for titlecasing APIs that take an options bit set. * * By default, titlecasing will take each break iterator index, * adjust it by looking for the next cased character, and titlecase that one. * Other characters are lowercased. * * This follows Unicode 4 & 5 section 3.13 Default Case Operations: * * R3 toTitlecase(X): Find the word boundaries based on Unicode Standard Annex * #29, "Text Boundaries."
Between each pair of word boundaries, find the first * cased character F. If F exists, map F to default_title(F); then map each * subsequent character C to default_lower(C). * * @see #toTitleCase * @see #TITLECASE_NO_LOWERCASE * @stable ICU 3.8 */ public static final int TITLECASE_NO_BREAK_ADJUSTMENT = 0x200; // public methods ---------------------------------------------------- /** * Retrieves the numeric value of a decimal digit code point. *
This method observes the semantics of * java.lang.Character.digit(). Note that this * will return positive values for code points for which isDigit * returns false, just like java.lang.Character. *
Semantic Change: In release 1.3.1 and * prior, this did not treat the European letters as having a * digit value, and also treated numeric letters and other numbers as * digits. * This has been changed to conform to the Java semantics. *
A code point is a valid digit if and only if: *
    *
  • ch is a decimal digit or one of the European letters, and *
  • the value of ch is less than the specified radix. *
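The two conditions above can be illustrated with a minimal sketch. It assumes only the JDK is available, so it reads character properties from java.lang.Character instead of ICU4J's own data. The names DigitSketch and digitOf are hypothetical and not part of ICU4J, and the sketch treats only the ASCII letters as "European letters", which is narrower than what the library itself may accept.

```java
final class DigitSketch {
    // Hypothetical helper: returns the digit value of cp in the given radix,
    // or -1 when cp is not a valid digit for that radix.
    static int digitOf(int cp, int radix) {
        int value;
        if (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
            // Decimal digits (category Nd) carry a numeric value of 0..9.
            value = Character.getNumericValue(cp);
        } else if (cp >= 'a' && cp <= 'z') {
            // Assumption: only ASCII letters count as European letters here.
            value = cp - 'a' + 10;
        } else if (cp >= 'A' && cp <= 'Z') {
            value = cp - 'A' + 10;
        } else {
            value = -1;
        }
        // The value must also be smaller than the radix.
        return (0 <= value && value < radix) ? value : -1;
    }

    public static void main(String[] args) {
        System.out.println(digitOf('7', 10));   // 7
        System.out.println(digitOf('f', 16));   // 15
        System.out.println(digitOf('9', 8));    // -1, value too large for radix 8
    }
}
```

With ICU4J on the classpath, the equivalent call would be UCharacter.digit(ch, radix).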
* @param ch the code point to query * @param radix the radix * @return the numeric value represented by the code point in the * specified radix, or -1 if the code point is not a decimal digit * or if its value is too large for the radix * @stable ICU 2.1 */ public static int digit(int ch, int radix) { // when ch is out of bounds getProperty == 0 int props = getProperty(ch); int value; if (getNumericType(props) == NumericType.DECIMAL) { value = UCharacterProperty.getUnsignedValue(props); } else { value = getEuropeanDigit(ch); } return (0 <= value && value < radix) ? value : -1; } /** * Retrieves the numeric value of a decimal digit code point. *
This is a convenience overload of digit(int, int) * that provides a decimal radix. *
Semantic Change: In release 1.3.1 and prior, this * treated numeric letters and other numbers as digits. This has * been changed to conform to the java semantics. * @param ch the code point to query * @return the numeric value represented by the code point, * or -1 if the code point is not a decimal digit or if its * value is too large for a decimal radix * @stable ICU 2.1 */ public static int digit(int ch) { int props = getProperty(ch); if (getNumericType(props) == NumericType.DECIMAL) { return UCharacterProperty.getUnsignedValue(props); } else { return -1; } } /** * Returns the numeric value of the code point as a nonnegative * integer. *
If the code point does not have a numeric value, then -1 is returned. *
* If the code point has a numeric value that cannot be represented as a * nonnegative integer (for example, a fractional value), then -2 is * returned. * @param ch the code point to query * @return the numeric value of the code point, or -1 if it has no numeric * value, or -2 if it has a numeric value that cannot be represented as a * nonnegative integer * @stable ICU 2.1 */ public static int getNumericValue(int ch) { // slightly pruned version of getUnicodeNumericValue(), plus getEuropeanDigit() int props = PROPERTY_.getProperty(ch); int numericType = getNumericType(props); if(numericType==0) { return getEuropeanDigit(ch); } if(numericType==UCharacterProperty.NT_FRACTION || numericType>=UCharacterProperty.NT_COUNT) { return -2; } int numericValue = UCharacterProperty.getUnsignedValue(props); if(numericType>LARGE_MANT_SHIFT; exp=numericValue&LARGE_EXP_MASK; if(mant==0) { mant=1; exp+=LARGE_EXP_OFFSET_EXTRA; } else if(mant>9) { return -2; /* reserved mantissa value */ } else { exp+=LARGE_EXP_OFFSET; } if(exp>9) { return -2; } numValue=mant; /* multiply by 10^exp without math.h */ while(exp>=4) { numValue*=10000.; exp-=4; } switch(exp) { case 3: numValue*=1000.; break; case 2: numValue*=100.; break; case 1: numValue*=10.; break; case 0: default: break; } if(numValue<=Integer.MAX_VALUE) { return (int)numValue; } else { return -2; } } } /** *

Get the numeric value for a Unicode code point as defined in the * Unicode Character Database.

*

A "double" return type is necessary because some numeric values are * fractions, negative, or too large for int.

*

For characters without any numeric values in the Unicode Character * Database, this function will return NO_NUMERIC_VALUE.

*

API Change: In release 2.2 and prior, this API had a * return type of int and returned -1 when the argument ch did not have a * corresponding numeric value. This has been changed to sync with ICU4C. *

* This corresponds to the ICU4C function u_getNumericValue. * @param ch Code point to get the numeric value for. * @return numeric value of ch, or NO_NUMERIC_VALUE if none is defined. * @stable ICU 2.4 */ public static double getUnicodeNumericValue(int ch) { // equivalent to c version double u_getNumericValue(UChar32 c) int props = PROPERTY_.getProperty(ch); int numericType = getNumericType(props); if(numericType==0 || numericType>=UCharacterProperty.NT_COUNT) { return NO_NUMERIC_VALUE; } int numericValue = UCharacterProperty.getUnsignedValue(props); if(numericType>FRACTION_NUM_SHIFT; denominator=(numericValue&FRACTION_DEN_MASK)+FRACTION_DEN_OFFSET; if(numerator==0) { numerator=-1; } return (double)numerator/(double)denominator; } else /* numericType==NT_LARGE */ { /* large value with exponent */ double numValue; int mant, exp; mant=numericValue>>LARGE_MANT_SHIFT; exp=numericValue&LARGE_EXP_MASK; if(mant==0) { mant=1; exp+=LARGE_EXP_OFFSET_EXTRA; } else if(mant>9) { return NO_NUMERIC_VALUE; /* reserved mantissa value */ } else { exp+=LARGE_EXP_OFFSET; } numValue=mant; /* multiply by 10^exp without math.h */ while(exp>=4) { numValue*=10000.; exp-=4; } switch(exp) { case 3: numValue*=1000.; break; case 2: numValue*=100.; break; case 1: numValue*=10.; break; case 0: default: break; } return numValue; } } /** * Compatibility override of Java deprecated method. This * method will always remain deprecated. Delegates to * java.lang.Character.isSpace. * @param ch the code point * @return true if the code point is a space character as * defined by java.lang.Character.isSpace. * @deprecated ICU 3.4 (Java) */ public static boolean isSpace(int ch) { return ch <= 0x20 && (ch == 0x20 || ch == 0x09 || ch == 0x0a || ch == 0x0c || ch == 0x0d); } /** * Returns a value indicating a code point's Unicode category. * Up-to-date Unicode implementation of java.lang.Character.getType() * except for the above mentioned code points that had their category * changed.
* Return results are constants from the interface * UCharacterCategory
* NOTE: the UCharacterCategory values are not compatible with * those returned by java.lang.Character.getType. UCharacterCategory values * match the ones used in ICU4C, while java.lang.Character type * values, though similar, skip the value 17.

* @param ch code point whose type is to be determined * @return category which is a value of UCharacterCategory * @stable ICU 2.1 */ public static int getType(int ch) { return getProperty(ch) & UCharacterProperty.TYPE_MASK; } /** * Determines if a code point has a defined meaning in the up-to-date * Unicode standard. * E.g. supplementary code points though allocated space are not defined in * Unicode yet.
* Up-to-date Unicode implementation of java.lang.Character.isDefined() * @param ch code point to be determined if it is defined in the most * current version of Unicode * @return true if this code point is defined in unicode * @stable ICU 2.1 */ public static boolean isDefined(int ch) { return getType(ch) != 0; } /** * Determines if a code point is a Java digit. *
This method observes the semantics of * java.lang.Character.isDigit(). It returns true for decimal * digits only. *
Semantic Change: In release 1.3.1 and prior, this treated * numeric letters and other numbers as digits. * This has been changed to conform to the java semantics. * @param ch code point to query * @return true if this code point is a digit * @stable ICU 2.1 */ public static boolean isDigit(int ch) { return getType(ch) == UCharacterCategory.DECIMAL_DIGIT_NUMBER; } /** * Determines if the specified code point is an ISO control character. * A code point is considered to be an ISO control character if it is in * the range \u0000 through \u001F or in the range \u007F through * \u009F.
* Up-to-date Unicode implementation of java.lang.Character.isISOControl() * @param ch code point to determine if it is an ISO control character * @return true if code point is a ISO control character * @stable ICU 2.1 */ public static boolean isISOControl(int ch) { return ch >= 0 && ch <= APPLICATION_PROGRAM_COMMAND_ && ((ch <= UNIT_SEPARATOR_) || (ch >= DELETE_)); } /** * Determines if the specified code point is a letter. * Up-to-date Unicode implementation of java.lang.Character.isLetter() * @param ch code point to determine if it is a letter * @return true if code point is a letter * @stable ICU 2.1 */ public static boolean isLetter(int ch) { // if props == 0, it will just fall through and return false return ((1 << getType(ch)) & ((1 << UCharacterCategory.UPPERCASE_LETTER) | (1 << UCharacterCategory.LOWERCASE_LETTER) | (1 << UCharacterCategory.TITLECASE_LETTER) | (1 << UCharacterCategory.MODIFIER_LETTER) | (1 << UCharacterCategory.OTHER_LETTER))) != 0; } /** * Determines if the specified code point is a letter or digit. * Note this method, unlike java.lang.Character does not regard the ascii * characters 'A' - 'Z' and 'a' - 'z' as digits. * @param ch code point to determine if it is a letter or a digit * @return true if code point is a letter or a digit * @stable ICU 2.1 */ public static boolean isLetterOrDigit(int ch) { return ((1 << getType(ch)) & ((1 << UCharacterCategory.UPPERCASE_LETTER) | (1 << UCharacterCategory.LOWERCASE_LETTER) | (1 << UCharacterCategory.TITLECASE_LETTER) | (1 << UCharacterCategory.MODIFIER_LETTER) | (1 << UCharacterCategory.OTHER_LETTER) | (1 << UCharacterCategory.DECIMAL_DIGIT_NUMBER))) != 0; } /** * Compatibility override of Java deprecated method. This * method will always remain deprecated. Delegates to * java.lang.Character.isJavaIdentifierStart. * @param cp the code point * @return true if the code point can start a java identifier. 
* @deprecated ICU 3.4 (Java) */ public static boolean isJavaLetter(int cp) { return isJavaIdentifierStart(cp); } /** * Compatibility override of Java deprecated method. This * method will always remain deprecated. Delegates to * java.lang.Character.isJavaIdentifierPart. * @param cp the code point * @return true if the code point can continue a java identifier. * @deprecated ICU 3.4 (Java) */ public static boolean isJavaLetterOrDigit(int cp) { return isJavaIdentifierPart(cp); } /** * Compatibility override of Java method, delegates to * java.lang.Character.isJavaIdentifierStart. * @param cp the code point * @return true if the code point can start a java identifier. * @stable ICU 3.4 */ public static boolean isJavaIdentifierStart(int cp) { // note, downcast to char for jdk 1.4 compatibility return java.lang.Character.isJavaIdentifierStart((char)cp); } /** * Compatibility override of Java method, delegates to * java.lang.Character.isJavaIdentifierPart. * @param cp the code point * @return true if the code point can continue a java identifier. * @stable ICU 3.4 */ public static boolean isJavaIdentifierPart(int cp) { // note, downcast to char for jdk 1.4 compatibility return java.lang.Character.isJavaIdentifierPart((char)cp); } /** * Determines if the specified code point is a lowercase character. * UnicodeData only contains case mappings for code points where they are * one-to-one mappings; it also omits information about context-sensitive * case mappings.
For more information about Unicode case mapping * please refer to Unicode Technical Report #21.
* Up-to-date Unicode implementation of java.lang.Character.isLowerCase() * @param ch code point to determine if it is in lowercase * @return true if code point is a lowercase character * @stable ICU 2.1 */ public static boolean isLowerCase(int ch) { // if props == 0, it will just fall through and return false return getType(ch) == UCharacterCategory.LOWERCASE_LETTER; } /** * Determines if the specified code point is a white space character. * A code point is considered to be a whitespace character if and only * if it satisfies one of the following criteria: *
    *
  • It is a Unicode space character (categories "Zs" or "Zl" or "Zp"), but is not * also a no-break space (\u00A0 or \u2007 or \u202F). *
  • It is \u0009, HORIZONTAL TABULATION. *
  • It is \u000A, LINE FEED. *
  • It is \u000B, VERTICAL TABULATION. *
  • It is \u000C, FORM FEED. *
  • It is \u000D, CARRIAGE RETURN. *
  • It is \u001C, FILE SEPARATOR. *
  • It is \u001D, GROUP SEPARATOR. *
  • It is \u001E, RECORD SEPARATOR. *
  • It is \u001F, UNIT SEPARATOR. *
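The criteria listed above can be sketched as a single predicate. This is an illustration only: it assumes the JDK's java.lang.Character category data rather than ICU4J's, so results can differ where the two track different Unicode versions. WhitespaceSketch and isWhitespaceSketch are hypothetical names.

```java
final class WhitespaceSketch {
    // Hypothetical helper mirroring the listed criteria.
    static boolean isWhitespaceSketch(int cp) {
        int type = Character.getType(cp);
        // Categories Zs, Zl and Zp.
        boolean separator = type == Character.SPACE_SEPARATOR
                || type == Character.LINE_SEPARATOR
                || type == Character.PARAGRAPH_SEPARATOR;
        // The three no-break spaces are excluded.
        boolean noBreak = cp == 0x00A0 || cp == 0x2007 || cp == 0x202F;
        return (separator && !noBreak)
                || (cp >= 0x09 && cp <= 0x0D)   // TAB, LF, VT, FF, CR
                || (cp >= 0x1C && cp <= 0x1F);  // FS, GS, RS, US
    }

    public static void main(String[] args) {
        System.out.println(isWhitespaceSketch(' '));     // true
        System.out.println(isWhitespaceSketch(0x00A0));  // false, no-break space
        System.out.println(isWhitespaceSketch('\t'));    // true
    }
}
```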
* * This API tries to synch to the semantics of the Java API, * java.lang.Character.isWhitespace(), but it may not return * the exactly same results because of the Unicode version * difference. * @param ch code point to determine if it is a white space * @return true if the specified code point is a white space character * @stable ICU 2.1 */ public static boolean isWhitespace(int ch) { // exclude no-break spaces // if props == 0, it will just fall through and return false return ((1 << getType(ch)) & ((1 << UCharacterCategory.SPACE_SEPARATOR) | (1 << UCharacterCategory.LINE_SEPARATOR) | (1 << UCharacterCategory.PARAGRAPH_SEPARATOR))) != 0 && (ch != NO_BREAK_SPACE_) && (ch != FIGURE_SPACE_) && (ch != NARROW_NO_BREAK_SPACE_) // TAB VT LF FF CR FS GS RS US NL are all control characters // that are white spaces. || (ch >= 0x9 && ch <= 0xd) || (ch >= 0x1c && ch <= 0x1f); } /** * Determines if the specified code point is a Unicode specified space * character, i.e. if code point is in the category Zs, Zl and Zp. * Up-to-date Unicode implementation of java.lang.Character.isSpaceChar(). * @param ch code point to determine if it is a space * @return true if the specified code point is a space character * @stable ICU 2.1 */ public static boolean isSpaceChar(int ch) { // if props == 0, it will just fall through and return false return ((1 << getType(ch)) & ((1 << UCharacterCategory.SPACE_SEPARATOR) | (1 << UCharacterCategory.LINE_SEPARATOR) | (1 << UCharacterCategory.PARAGRAPH_SEPARATOR))) != 0; } /** * Determines if the specified code point is a titlecase character. * UnicodeData only contains case mappings for code points where they are * one-to-one mappings; it also omits information about context-sensitive * case mappings.
* For more information about Unicode case mapping please refer to * Unicode Technical Report #21.
* Up-to-date Unicode implementation of java.lang.Character.isTitleCase(). * @param ch code point to determine if it is in title case * @return true if the specified code point is a titlecase character * @stable ICU 2.1 */ public static boolean isTitleCase(int ch) { // if props == 0, it will just fall through and return false return getType(ch) == UCharacterCategory.TITLECASE_LETTER; } /** * Determines if the specified code point may be any part of a Unicode * identifier other than the starting character. * A code point may be part of a Unicode identifier if and only if it is * one of the following: *
    *
  • Lu Uppercase letter *
  • Ll Lowercase letter *
  • Lt Titlecase letter *
  • Lm Modifier letter *
  • Lo Other letter *
  • Nl Letter number *
  • Pc Connecting punctuation character *
  • Nd decimal number *
  • Mc Spacing combining mark *
  • Mn Non-spacing mark *
  • Cf formatting code *
* Up-to-date Unicode implementation of * java.lang.Character.isUnicodeIdentifierPart().
* See UTR #8. * @param ch code point to determine if it can be part of a Unicode * identifier * @return true if the code point may appear in a Unicode identifier * after the first character * @stable ICU 2.1 */ public static boolean isUnicodeIdentifierPart(int ch) { // if props == 0, it will just fall through and return false // cat == format return ((1 << getType(ch)) & ((1 << UCharacterCategory.UPPERCASE_LETTER) | (1 << UCharacterCategory.LOWERCASE_LETTER) | (1 << UCharacterCategory.TITLECASE_LETTER) | (1 << UCharacterCategory.MODIFIER_LETTER) | (1 << UCharacterCategory.OTHER_LETTER) | (1 << UCharacterCategory.LETTER_NUMBER) | (1 << UCharacterCategory.CONNECTOR_PUNCTUATION) | (1 << UCharacterCategory.DECIMAL_DIGIT_NUMBER) | (1 << UCharacterCategory.COMBINING_SPACING_MARK) | (1 << UCharacterCategory.NON_SPACING_MARK))) != 0 || isIdentifierIgnorable(ch); } /** * Determines if the specified code point is permissible as the first * character in a Unicode identifier. * A code point may start a Unicode identifier if it is of one of the * following types: *
    *
  • Lu Uppercase letter *
  • Ll Lowercase letter *
  • Lt Titlecase letter *
  • Lm Modifier letter *
  • Lo Other letter *
  • Nl Letter number *
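The category list above lends itself to a bitmask test: shift 1 by the category number and check it against a mask of the accepted categories. The sketch below shows that technique with java.lang.Character constants, which are an assumption made so the example runs without ICU4J (the JDK's category numbering differs from UCharacterCategory). IdentifierStartSketch and its members are hypothetical names.

```java
final class IdentifierStartSketch {
    // One bit per accepted general category: Lu, Ll, Lt, Lm, Lo, Nl.
    private static final int START_MASK =
              (1 << Character.UPPERCASE_LETTER)
            | (1 << Character.LOWERCASE_LETTER)
            | (1 << Character.TITLECASE_LETTER)
            | (1 << Character.MODIFIER_LETTER)
            | (1 << Character.OTHER_LETTER)
            | (1 << Character.LETTER_NUMBER);

    // Hypothetical helper: true if cp belongs to one of the accepted categories.
    // Unassigned code points have category 0, whose bit is not in the mask.
    static boolean canStartIdentifier(int cp) {
        return ((1 << Character.getType(cp)) & START_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(canStartIdentifier('A'));     // true, Lu
        System.out.println(canStartIdentifier('1'));     // false, Nd
        System.out.println(canStartIdentifier(0x2160));  // true, ROMAN NUMERAL ONE is Nl
    }
}
```

A single mask comparison replaces six equality checks, which is why adding or removing a category only means changing one constant.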
* Up-to-date Unicode implementation of * java.lang.Character.isUnicodeIdentifierStart().
* See UTR #8. * @param ch code point to determine if it can start a Unicode identifier * @return true if the code point can be the first character of a Unicode * identifier * @stable ICU 2.1 */ public static boolean isUnicodeIdentifierStart(int ch) { /*int cat = getType(ch);*/ // if props == 0, it will just fall through and return false return ((1 << getType(ch)) & ((1 << UCharacterCategory.UPPERCASE_LETTER) | (1 << UCharacterCategory.LOWERCASE_LETTER) | (1 << UCharacterCategory.TITLECASE_LETTER) | (1 << UCharacterCategory.MODIFIER_LETTER) | (1 << UCharacterCategory.OTHER_LETTER) | (1 << UCharacterCategory.LETTER_NUMBER))) != 0; } /** * Determines if the specified code point should be regarded as an * ignorable character in a Unicode identifier. * A character is ignorable in the Unicode standard if it is of the type * Cf, Formatting code.
* Up-to-date Unicode implementation of * java.lang.Character.isIdentifierIgnorable().
* See UTR #8. * @param ch code point to be determined if it can be ignored in a Unicode * identifier. * @return true if the code point is ignorable * @stable ICU 2.1 */ public static boolean isIdentifierIgnorable(int ch) { // see java.lang.Character.isIdentifierIgnorable() on range of // ignorable characters. if (ch <= 0x9f) { return isISOControl(ch) && !((ch >= 0x9 && ch <= 0xd) || (ch >= 0x1c && ch <= 0x1f)); } return getType(ch) == UCharacterCategory.FORMAT; } /** * Determines if the specified code point is an uppercase character. * UnicodeData only contains case mappings for code point where they are * one-to-one mappings; it also omits information about context-sensitive * case mappings.
* For language-specific case conversion behavior, use * toUpperCase(locale, str).
* For example, the case conversion for dotless i and dotted I in Turkish, * or for final sigma in Greek. * For more information about Unicode case mapping please refer to * Unicode Technical Report #21.
* Up-to-date Unicode implementation of java.lang.Character.isUpperCase(). * @param ch code point to determine if it is in uppercase * @return true if the code point is an uppercase character * @stable ICU 2.1 */ public static boolean isUpperCase(int ch) { // if props == 0, it will just fall through and return false return getType(ch) == UCharacterCategory.UPPERCASE_LETTER; } /** * The given code point is mapped to its lowercase equivalent; if the code * point has no lowercase equivalent, the code point itself is returned. * Up-to-date Unicode implementation of java.lang.Character.toLowerCase() * *

This function only returns the simple, single-code point case mapping. * Full case mappings should be used whenever possible because they produce * better results by working on whole strings. * They take into account the string context and the language and can map * to a result string with a different length as appropriate. * Full case mappings are applied by the case mapping functions * that take String parameters rather than code points (int). * See also the User Guide chapter on C/POSIX migration: * http://www.icu-project.org/userguide/posix.html#case_mappings * * @param ch code point whose lowercase equivalent is to be retrieved * @return the lowercase equivalent code point * @stable ICU 2.1 */ public static int toLowerCase(int ch) { return gCsp.tolower(ch); } /** * Converts argument code point and returns a String object representing * the code point's value in UTF16 format. * The result is a string whose length is 1 for non-supplementary code * points, 2 otherwise.
* com.ibm.icu.text.UTF16 can be used to parse Strings generated by this * function.
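As a rough illustration of the conversion described above, the following sketch builds the UTF-16 string by hand using the standard surrogate arithmetic. It assumes nothing beyond the JDK. CodePointStringSketch and toUtf16String are hypothetical names, and the library itself delegates the surrogate computation to its UTF16 helper class rather than doing the arithmetic inline.

```java
final class CodePointStringSketch {
    // Hypothetical helper: returns a 1-char string for BMP code points,
    // a 2-char surrogate pair for supplementary ones, and null when the
    // argument is outside the range 0..0x10FFFF.
    static String toUtf16String(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            return null;
        }
        if (cp < 0x10000) {
            return String.valueOf((char) cp);
        }
        int offset = cp - 0x10000;
        char lead = (char) (0xD800 + (offset >> 10));     // high 10 bits
        char trail = (char) (0xDC00 + (offset & 0x3FF));  // low 10 bits
        return new StringBuilder(2).append(lead).append(trail).toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtf16String(0x41).length());     // 1
        System.out.println(toUtf16String(0x1F600).length());  // 2
        System.out.println(toUtf16String(0x110000));          // null
    }
}
```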
* Up-to-date Unicode implementation of java.lang.Character.toString() * @param ch code point * @return string representation of the code point, null if code point is not * defined in unicode * @stable ICU 2.1 */ public static String toString(int ch) { if (ch < MIN_VALUE || ch > MAX_VALUE) { return null; } if (ch < SUPPLEMENTARY_MIN_VALUE) { return String.valueOf((char)ch); } StringBuffer result = new StringBuffer(); result.append(UTF16.getLeadSurrogate(ch)); result.append(UTF16.getTrailSurrogate(ch)); return result.toString(); } /** * Converts the code point argument to titlecase. * If no titlecase is available, the uppercase is returned. If no uppercase * is available, the code point itself is returned. * Up-to-date Unicode implementation of java.lang.Character.toTitleCase() * *

This function only returns the simple, single-code point case mapping. * Full case mappings should be used whenever possible because they produce * better results by working on whole strings. * They take into account the string context and the language and can map * to a result string with a different length as appropriate. * Full case mappings are applied by the case mapping functions * that take String parameters rather than code points (int). * See also the User Guide chapter on C/POSIX migration: * http://www.icu-project.org/userguide/posix.html#case_mappings * * @param ch code point whose title case is to be retrieved * @return titlecase code point * @stable ICU 2.1 */ public static int toTitleCase(int ch) { return gCsp.totitle(ch); } /** * Converts the character argument to uppercase. * If no uppercase is available, the character itself is returned. * Up-to-date Unicode implementation of java.lang.Character.toUpperCase() * *
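The difference between a simple (single code point) mapping and a full (string) mapping can be sketched with the JDK's own methods. The JDK calls are used here only as a stand-in for the corresponding UCharacter methods, and the class and helper names are hypothetical.

```java
import java.util.Locale;

final class SimpleVersusFullCaseMapping {
    // Simple mapping: one code point in, one code point out.
    static int simpleUpper(int codePoint) {
        return Character.toUpperCase(codePoint);
    }

    // Full mapping: operates on the whole string and may change its length.
    static String fullUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        int sharpS = 0x00DF; // LATIN SMALL LETTER SHARP S
        // The simple mapping has no single-code-point uppercase to return.
        System.out.println(Integer.toHexString(simpleUpper(sharpS)));
        // The full mapping expands sharp s to two letters.
        System.out.println(fullUpper("stra\u00DFe"));
    }
}
```

The simple mapping leaves U+00DF unchanged, while the full mapping lengthens the string by one character.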

This function only returns the simple, single-code point case mapping.
     * Full case mappings should be used whenever possible because they produce
     * better results by working on whole strings.
     * They take into account the string context and the language and can map
     * to a result string with a different length as appropriate.
     * Full case mappings are applied by the case mapping functions
     * that take String parameters rather than code points (int).
     * See also the User Guide chapter on C/POSIX migration:
     * http://www.icu-project.org/userguide/posix.html#case_mappings
     *
     * @param ch code point whose uppercase is to be retrieved
     * @return uppercase code point
     * @stable ICU 2.1
     */
    public static int toUpperCase(int ch) {
        return gCsp.toupper(ch);
    }

    // extra methods not in java.lang.Character --------------------------

    /**
     * Determines if the code point is a supplementary character.
     * A code point is a supplementary character if and only if it is greater
     * than or equal to SUPPLEMENTARY_MIN_VALUE and no greater than MAX_VALUE.
     * @param ch code point to be tested for being in a supplementary plane
     * @return true if code point is a supplementary character
     * @stable ICU 2.1
     */
    public static boolean isSupplementary(int ch) {
        return ch >= UCharacter.SUPPLEMENTARY_MIN_VALUE &&
            ch <= UCharacter.MAX_VALUE;
    }

    /**
     * Determines if the code point is in the BMP (Basic Multilingual Plane).
     * @param ch code point to be tested for not being a supplementary
     *        character
     * @return true if code point is not a supplementary character
     * @stable ICU 2.1
     */
    public static boolean isBMP(int ch) {
        return (ch >= 0 && ch <= LAST_CHAR_MASK_);
    }

    /**
     * Determines whether the specified code point is a printable character
     * according to the Unicode standard.
* @param ch code point to be determined if it is printable * @return true if the code point is a printable character * @stable ICU 2.1 */ public static boolean isPrintable(int ch) { int cat = getType(ch); // if props == 0, it will just fall through and return false return (cat != UCharacterCategory.UNASSIGNED && cat != UCharacterCategory.CONTROL && cat != UCharacterCategory.FORMAT && cat != UCharacterCategory.PRIVATE_USE && cat != UCharacterCategory.SURROGATE && cat != UCharacterCategory.GENERAL_OTHER_TYPES); } /** * Determines whether the specified code point is of base form. * A code point of base form does not graphically combine with preceding * characters, and is neither a control nor a format character. * @param ch code point to be determined if it is of base form * @return true if the code point is of base form * @stable ICU 2.1 */ public static boolean isBaseForm(int ch) { int cat = getType(ch); // if props == 0, it will just fall through and return false return cat == UCharacterCategory.DECIMAL_DIGIT_NUMBER || cat == UCharacterCategory.OTHER_NUMBER || cat == UCharacterCategory.LETTER_NUMBER || cat == UCharacterCategory.UPPERCASE_LETTER || cat == UCharacterCategory.LOWERCASE_LETTER || cat == UCharacterCategory.TITLECASE_LETTER || cat == UCharacterCategory.MODIFIER_LETTER || cat == UCharacterCategory.OTHER_LETTER || cat == UCharacterCategory.NON_SPACING_MARK || cat == UCharacterCategory.ENCLOSING_MARK || cat == UCharacterCategory.COMBINING_SPACING_MARK; } /** * Returns the Bidirection property of a code point. * For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional * property.
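The directional property mentioned in the example can be checked with the JDK as a rough analogue. This sketch uses `java.lang.Character.getDirectionality`, not `UCharacter.getDirection`, and the class and helper names are hypothetical; the constants differ from those in UCharacterDirection.

```java
final class DirectionSketch {
    // True for strong left-to-right characters such as Latin letters.
    static boolean isLeftToRight(int ch) {
        return Character.getDirectionality(ch) == Character.DIRECTIONALITY_LEFT_TO_RIGHT;
    }

    // True for strong right-to-left characters (Hebrew, Arabic, and similar).
    static boolean isRightToLeft(int ch) {
        byte d = Character.getDirectionality(ch);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        System.out.println(isLeftToRight(0x0041)); // LATIN CAPITAL LETTER A
        System.out.println(isRightToLeft(0x05D0)); // HEBREW LETTER ALEF
        System.out.println(isRightToLeft(0x0627)); // ARABIC LETTER ALEF
    }
}
```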
     * Result returned belongs to the interface
     * UCharacterDirection
     * @param ch the code point whose direction is to be determined
     * @return direction constant from UCharacterDirection.
     * @stable ICU 2.1
     */
    public static int getDirection(int ch) {
        return gBdp.getClass(ch);
    }

    /**
     * Determines whether the code point has the "mirrored" property.
     * This property is set for characters that are commonly used in
     * Right-To-Left contexts and need to be displayed with a "mirrored"
     * glyph.
     * @param ch code point whose mirrored property is to be determined
     * @return true if the code point has the "mirrored" property
     * @stable ICU 2.1
     */
    public static boolean isMirrored(int ch) {
        return gBdp.isMirrored(ch);
    }

    /**
     * Maps the specified code point to a "mirror-image" code point.
     * For code points with the "mirrored" property, implementations sometimes
     * need a "poor man's" mapping to another code point such that the default
     * glyph may serve as the mirror-image of the default glyph of the
     * specified code point.
     * This is useful for text conversion to and from codepages with visual
     * order, and for displays without glyph selection capabilities.
     * @param ch code point whose mirror is to be retrieved
     * @return another code point that may serve as a mirror-image substitute,
     *         or ch itself if there is no such mapping or ch does not have the
     *         "mirrored" property
     * @stable ICU 2.1
     */
    public static int getMirror(int ch) {
        return gBdp.getMirror(ch);
    }

    /**
     * Gets the combining class of the argument code point.
     * @param ch code point whose combining class is to be retrieved
     * @return the combining class of the code point
     * @stable ICU 2.1
     */
    public static int getCombiningClass(int ch) {
        if (ch < MIN_VALUE || ch > MAX_VALUE) {
            throw new IllegalArgumentException("Codepoint out of bounds");
        }
        return NormalizerImpl.getCombiningClass(ch);
    }

    /**
     * A code point is illegal if and only if
     *

    *
  • Out of bounds, less than 0 or greater than UCharacter.MAX_VALUE *
  • A surrogate value, 0xD800 to 0xDFFF *
  • Not-a-character, having the form 0x xxFFFF or 0x xxFFFE *
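The three rules in the list above can be sketched as a standalone check. This is an illustration under stated assumptions, not the library code: the class name is hypothetical, the numeric bounds are written out as literals, and only the three listed rules are applied (the real implementation delegates the noncharacter test to an internal utility that may cover more values).

```java
final class LegalCodePointSketch {
    static boolean isLegal(int ch) {
        // Rule 1: out of bounds
        if (ch < 0 || ch > 0x10FFFF) {
            return false;
        }
        // Rule 2: surrogate value, 0xD800 to 0xDFFF
        if (ch >= 0xD800 && ch <= 0xDFFF) {
            return false;
        }
        // Rule 3: low 16 bits are FFFE or FFFF (both match this mask)
        if ((ch & 0xFFFE) == 0xFFFE) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal(0x0041));   // ordinary letter
        System.out.println(isLegal(0xD800));   // lone surrogate
        System.out.println(isLegal(0x1FFFE));  // noncharacter in plane 1
        System.out.println(isLegal(0x110000)); // beyond the last code point
    }
}
```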
* Note: legal does not mean that it is assigned in this version of Unicode. * @param ch code point to determine if it is a legal code point by itself * @return true if and only if legal. * @stable ICU 2.1 */ public static boolean isLegal(int ch) { if (ch < MIN_VALUE) { return false; } if (ch < UTF16.SURROGATE_MIN_VALUE) { return true; } if (ch <= UTF16.SURROGATE_MAX_VALUE) { return false; } if (UCharacterUtility.isNonCharacter(ch)) { return false; } return (ch <= MAX_VALUE); } /** * A string is legal iff all its code points are legal. * A code point is illegal if and only if *
    *
  • Out of bounds, less than 0 or greater than UCharacter.MAX_VALUE *
  • A surrogate value, 0xD800 to 0xDFFF *
  • Not-a-character, having the form 0x xxFFFF or 0x xxFFFE *
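For the string form, the check has to advance by whole code points rather than by chars, so that a surrogate pair is tested once as a single supplementary code point. A minimal sketch using only the JDK follows; the class name is hypothetical, and the per-code-point rule is passed in as a predicate so the iteration can be shown on its own.

```java
import java.util.function.IntPredicate;

final class LegalStringSketch {
    // Returns true only if every code point in str satisfies the given rule.
    static boolean allCodePointsLegal(String str, IntPredicate legalCodePoint) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (!legalCodePoint.test(cp)) {
                return false;
            }
            // Step over 1 char for a BMP code point, 2 for a supplementary one
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // Example rule for the demonstration: reject unpaired surrogates only
        IntPredicate notSurrogate = cp -> cp < 0xD800 || cp > 0xDFFF;
        System.out.println(allCodePointsLegal("a\uD83D\uDE00", notSurrogate)); // well-formed pair
        System.out.println(allCodePointsLegal("a\uD83D", notSurrogate));       // dangling lead surrogate
    }
}
```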
     * Note: legal does not mean that it is assigned in this version of Unicode.
     * @param str string containing the code points to examine
     * @return true if and only if legal.
     * @stable ICU 2.1
     */
    public static boolean isLegal(String str) {
        int size = str.length();
        int codepoint;
        for (int i = 0; i < size; i ++) {
            codepoint = UTF16.charAt(str, i);
            if (!isLegal(codepoint)) {
                return false;
            }
            if (isSupplementary(codepoint)) {
                i ++;
            }
        }
        return true;
    }

    /**
     * Gets the version of Unicode data used.
     * @return the Unicode version number used
     * @stable ICU 2.1
     */
    public static VersionInfo getUnicodeVersion() {
        return PROPERTY_.m_unicodeVersion_;
    }

    /**
     * Retrieve the most current Unicode name of the argument code point, or
     * null if the character is unassigned or outside the range
     * UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
     *
* Note calling any methods related to code point names, e.g. get*Name*() * incurs a one-time initialisation cost to construct the name tables. * @param ch the code point for which to get the name * @return most current Unicode name * @stable ICU 2.1 */ public static String getName(int ch) { if(NAME_==null){ throw new MissingResourceException("Could not load unames.icu","",""); } return NAME_.getName(ch, UCharacterNameChoice.UNICODE_CHAR_NAME); } /** * Gets the names for each of the characters in a string * @param s string to format * @param separator string to go between names * @return string of names * @stable ICU 3.8 */ public static String getName(String s, String separator) { if (s.length() == 1) { // handle common case return getName(s.charAt(0)); } int cp; StringBuffer sb = new StringBuffer(); for (int i = 0; i < s.length(); i += UTF16.getCharCount(cp)) { cp = UTF16.charAt(s,i); if (i != 0) sb.append(separator); sb.append(UCharacter.getName(cp)); } return sb.toString(); } /** * Retrieve the earlier version 1.0 Unicode name of the argument code * point, or null if the character is unassigned or outside the range * UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name. *
* Note calling any methods related to code point names, e.g. get*Name*() * incurs a one-time initialisation cost to construct the name tables. * @param ch the code point for which to get the name * @return version 1.0 Unicode name * @stable ICU 2.1 */ public static String getName1_0(int ch) { if(NAME_==null){ throw new MissingResourceException("Could not load unames.icu","",""); } return NAME_.getName(ch, UCharacterNameChoice.UNICODE_10_CHAR_NAME); } /** *

Retrieves a name for a valid codepoint. Unlike getName(int) and
     * getName1_0(int), this method will return a name even for codepoints that
     * are not assigned a name in UnicodeData.txt.
     *

* The names are returned in the following order. *
    *
  • Most current Unicode name if there is any *
  • Unicode 1.0 name if there is any *
  • Extended name in the form of "<codepoint_type-codepoint_hex_digits>", e.g. <noncharacter-fffe> *
* Note calling any methods related to code point names, e.g. get*Name*() * incurs a one-time initialisation cost to construct the name tables. * @param ch the code point for which to get the name * @return a name for the argument codepoint * @stable ICU 2.6 */ public static String getExtendedName(int ch) { if(NAME_==null){ throw new MissingResourceException("Could not load unames.icu","",""); } return NAME_.getName(ch, UCharacterNameChoice.EXTENDED_CHAR_NAME); } /** * Get the ISO 10646 comment for a character. * The ISO 10646 comment is an informative field in the Unicode Character * Database (UnicodeData.txt field 11) and is from the ISO 10646 names list. * @param ch The code point for which to get the ISO comment. * It must be 0<=c<=0x10ffff. * @return The ISO comment, or null if there is no comment for this * character. * @stable ICU 2.4 */ public static String getISOComment(int ch) { if (ch < UCharacter.MIN_VALUE || ch > UCharacter.MAX_VALUE) { return null; } if(NAME_==null){ throw new MissingResourceException("Could not load unames.icu","",""); } String result = NAME_.getGroupName(ch, UCharacterNameChoice.ISO_COMMENT_); return result; } /** *

Find a Unicode code point by its most current Unicode name and * return its code point value. All Unicode names are in uppercase.

* Note calling any methods related to code point names, e.g. get*Name*() * incurs a one-time initialisation cost to construct the name tables. * @param name most current Unicode character name whose code point is to * be returned * @return code point or -1 if name is not found * @stable ICU 2.1 */ public static int getCharFromName(String name) { if(NAME_==null){ throw new MissingResourceException("Could not load unames.icu","",""); } return NAME_.getCharFromName( UCharacterNameChoice.UNICODE_CHAR_NAME, name); } /** *

Find a Unicode character by its version 1.0 Unicode name and return * its code point value. All Unicode names are in uppercase.

     * Note: calling any methods related to code point names, e.g. get*Name*()
     * incurs a one-time initialisation cost to construct the name tables.
     * @param name Unicode 1.0 code point name whose code point is to be
     *        returned
     * @return code point or -1 if name is not found
     * @stable ICU 2.1
     */
    public static int getCharFromName1_0(String name) {
        if(NAME_==null){
            throw new MissingResourceException("Could not load unames.icu","","");
        }
        return NAME_.getCharFromName(
                           UCharacterNameChoice.UNICODE_10_CHAR_NAME, name);
    }

    /**
     *

Find a Unicode character by any of its names and return its code
     * point value. All Unicode names are in uppercase.
     * Extended names are all lowercase except for numbers and are contained
     * within angle brackets.

* The names are searched in the following order *
    *
  • Most current Unicode name if there is any *
  • Unicode 1.0 name if there is any *
  • Extended name in the form of "<codepoint_type-codepoint_hex_digits>", e.g. <noncharacter-fffe> *
* Note calling any methods related to code point names, e.g. get*Name*() * incurs a one-time initialisation cost to construct the name tables. * @param name codepoint name * @return code point associated with the name or -1 if the name is not * found. * @stable ICU 2.6 */ public static int getCharFromExtendedName(String name) { if(NAME_==null){ throw new MissingResourceException("Could not load unames.icu","",""); } return NAME_.getCharFromName( UCharacterNameChoice.EXTENDED_CHAR_NAME, name); } /** * Return the Unicode name for a given property, as given in the * Unicode database file PropertyAliases.txt. Most properties * have more than one name. The nameChoice determines which one * is returned. * * In addition, this function maps the property * UProperty.GENERAL_CATEGORY_MASK to the synthetic names "gcm" / * "General_Category_Mask". These names are not in * PropertyAliases.txt. * * @param property UProperty selector. * * @param nameChoice UProperty.NameChoice selector for which name * to get. All properties have a long name. Most have a short * name, but some do not. Unicode allows for additional names; if * present these will be returned by UProperty.NameChoice.LONG + i, * where i=1, 2,... * * @return a name, or null if Unicode explicitly defines no name * ("n/a") for a given property/nameChoice. If a given nameChoice * throws an exception, then all larger values of nameChoice will * throw an exception. If null is returned for a given * nameChoice, then other nameChoice values may return non-null * results. * * @exception IllegalArgumentException thrown if property or * nameChoice are invalid. * * @see UProperty * @see UProperty.NameChoice * @stable ICU 2.4 */ public static String getPropertyName(int property, int nameChoice) { return PNAMES_.getPropertyName(property, nameChoice); } /** * Return the UProperty selector for a given property name, as * specified in the Unicode database file PropertyAliases.txt. * Short, long, and any other variants are recognized. 
* * In addition, this function maps the synthetic names "gcm" / * "General_Category_Mask" to the property * UProperty.GENERAL_CATEGORY_MASK. These names are not in * PropertyAliases.txt. * * @param propertyAlias the property name to be matched. The name * is compared using "loose matching" as described in * PropertyAliases.txt. * * @return a UProperty enum. * * @exception IllegalArgumentException thrown if propertyAlias * is not recognized. * * @see UProperty * @stable ICU 2.4 */ public static int getPropertyEnum(String propertyAlias) { return PNAMES_.getPropertyEnum(propertyAlias); } /** * Return the Unicode name for a given property value, as given in * the Unicode database file PropertyValueAliases.txt. Most * values have more than one name. The nameChoice determines * which one is returned. * * Note: Some of the names in PropertyValueAliases.txt can only be * retrieved using UProperty.GENERAL_CATEGORY_MASK, not * UProperty.GENERAL_CATEGORY. These include: "C" / "Other", "L" / * "Letter", "LC" / "Cased_Letter", "M" / "Mark", "N" / "Number", "P" * / "Punctuation", "S" / "Symbol", and "Z" / "Separator". * * @param property UProperty selector constant. * UProperty.INT_START <= property < UProperty.INT_LIMIT or * UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or * UProperty.MASK_START < = property < UProperty.MASK_LIMIT. * If out of range, null is returned. * * @param value selector for a value for the given property. In * general, valid values range from 0 up to some maximum. There * are a few exceptions: (1.) UProperty.BLOCK values begin at the * non-zero value BASIC_LATIN.getID(). (2.) * UProperty.CANONICAL_COMBINING_CLASS values are not contiguous * and range from 0..240. (3.) UProperty.GENERAL_CATEGORY_MASK values * are mask values produced by left-shifting 1 by * UCharacter.getType(). This allows grouped categories such as * [:L:] to be represented. Mask values are non-contiguous. 
* * @param nameChoice UProperty.NameChoice selector for which name * to get. All values have a long name. Most have a short name, * but some do not. Unicode allows for additional names; if * present these will be returned by UProperty.NameChoice.LONG + i, * where i=1, 2,... * * @return a name, or null if Unicode explicitly defines no name * ("n/a") for a given property/value/nameChoice. If a given * nameChoice throws an exception, then all larger values of * nameChoice will throw an exception. If null is returned for a * given nameChoice, then other nameChoice values may return * non-null results. * * @exception IllegalArgumentException thrown if property, value, * or nameChoice are invalid. * * @see UProperty * @see UProperty.NameChoice * @stable ICU 2.4 */ public static String getPropertyValueName(int property, int value, int nameChoice) { if ((property == UProperty.CANONICAL_COMBINING_CLASS || property == UProperty.LEAD_CANONICAL_COMBINING_CLASS || property == UProperty.TRAIL_CANONICAL_COMBINING_CLASS) && value >= UCharacter.getIntPropertyMinValue( UProperty.CANONICAL_COMBINING_CLASS) && value <= UCharacter.getIntPropertyMaxValue( UProperty.CANONICAL_COMBINING_CLASS) && nameChoice >= 0 && nameChoice < UProperty.NameChoice.COUNT) { // this is hard coded for the valid cc // because PropertyValueAliases.txt does not contain all of them try { return PNAMES_.getPropertyValueName(property, value, nameChoice); } catch (IllegalArgumentException e) { return null; } } return PNAMES_.getPropertyValueName(property, value, nameChoice); } /** * Return the property value integer for a given value name, as * specified in the Unicode database file PropertyValueAliases.txt. * Short, long, and any other variants are recognized. * * Note: Some of the names in PropertyValueAliases.txt will only be * recognized with UProperty.GENERAL_CATEGORY_MASK, not * UProperty.GENERAL_CATEGORY. 
These include: "C" / "Other", "L" / * "Letter", "LC" / "Cased_Letter", "M" / "Mark", "N" / "Number", "P" * / "Punctuation", "S" / "Symbol", and "Z" / "Separator". * * @param property UProperty selector constant. * UProperty.INT_START <= property < UProperty.INT_LIMIT or * UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or * UProperty.MASK_START < = property < UProperty.MASK_LIMIT. * Only these properties can be enumerated. * * @param valueAlias the value name to be matched. The name is * compared using "loose matching" as described in * PropertyValueAliases.txt. * * @return a value integer. Note: UProperty.GENERAL_CATEGORY * values are mask values produced by left-shifting 1 by * UCharacter.getType(). This allows grouped categories such as * [:L:] to be represented. * * @see UProperty * @throws IllegalArgumentException if property is not a valid UProperty * selector * @stable ICU 2.4 */ public static int getPropertyValueEnum(int property, String valueAlias) { return PNAMES_.getPropertyValueEnum(property, valueAlias); } /** * Returns a code point corresponding to the two UTF16 characters. * @param lead the lead char * @param trail the trail char * @return code point if surrogate characters are valid. * @exception IllegalArgumentException thrown when argument characters do * not form a valid codepoint * @stable ICU 2.1 */ public static int getCodePoint(char lead, char trail) { if (UTF16.isLeadSurrogate(lead) && UTF16.isTrailSurrogate(trail)) { return UCharacterProperty.getRawSupplementary(lead, trail); } throw new IllegalArgumentException("Illegal surrogate characters"); } /** * Returns the code point corresponding to the UTF16 character. * @param char16 the UTF16 character * @return code point if argument is a valid character. 
* @exception IllegalArgumentException thrown when char16 is not a valid * codepoint * @stable ICU 2.1 */ public static int getCodePoint(char char16) { if (UCharacter.isLegal(char16)) { return char16; } throw new IllegalArgumentException("Illegal codepoint"); } /** * Implementation of UCaseProps.ContextIterator, iterates over a String. * See ustrcase.c/utf16_caseContextIterator(). */ private static class StringContextIterator implements UCaseProps.ContextIterator { /** * Constructor. * @param s String to iterate over. */ StringContextIterator(String s) { this.s=s; limit=s.length(); cpStart=cpLimit=index=0; dir=0; } /** * Set the iteration limit for nextCaseMapCP() to an index within the string. * If the limit parameter is negative or past the string, then the * string length is restored as the iteration limit. * * This limit does not affect the next() function which always * iterates to the very end of the string. * * @param lim The iteration limit. */ public void setLimit(int lim) { if(0<=lim && lim<=s.length()) { limit=lim; } else { limit=s.length(); } } /** * Move to the iteration limit without fetching code points up to there. */ public void moveToLimit() { cpStart=cpLimit=limit; } /** * Iterate forward through the string to fetch the next code point * to be case-mapped, and set the context indexes for it. * Performance optimization, to save on function calls and redundant * tests. Combines UTF16.charAt(), UTF16.getCharCount(), and setIndex(). * * When the iteration limit is reached (and -1 is returned), * getCPStart() will be at the iteration limit. * * Iteration with next() does not affect the position for nextCaseMapCP(). * * @return The next code point to be case-mapped, or <0 when the iteration is done. 
*/ public int nextCaseMapCP() { cpStart=cpLimit; if(cpLimit0) { /* reset for forward iteration */ dir=1; index=cpLimit; } else if(direction<0) { /* reset for backward iteration */ dir=-1; index=cpStart; } else { // not a valid direction dir=0; index=0; } } public int next() { int c; if(dir>0 && index0) { c=UTF16.charAt(s, index-1); index-=UTF16.getCharCount(c); return c; } return -1; } // variables protected String s; protected int index, limit, cpStart, cpLimit; protected int dir; // 0=initial state >0=forward <0=backward } /** * Gets uppercase version of the argument string. * Casing is dependent on the default locale and context-sensitive. * @param str source string to be performed on * @return uppercase version of the argument string * @stable ICU 2.1 */ public static String toUpperCase(String str) { return toUpperCase(ULocale.getDefault(), str); } /** * Gets lowercase version of the argument string. * Casing is dependent on the default locale and context-sensitive * @param str source string to be performed on * @return lowercase version of the argument string * @stable ICU 2.1 */ public static String toLowerCase(String str) { return toLowerCase(ULocale.getDefault(), str); } /** *

Gets the titlecase version of the argument string.

*

Position for titlecasing is determined by the argument break * iterator, hence the user can customize his break iterator for * a specialized titlecasing. In this case only the forward iteration * needs to be implemented. * If the break iterator passed in is null, the default Unicode algorithm * will be used to determine the titlecase positions. *

*

Only positions returned by the break iterator will be titlecased;
     * characters in between the positions will all be in lowercase.
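The segment-by-segment behavior described above can be sketched with the JDK's `java.text.BreakIterator`. This is a simplified illustration, not the ICU4J algorithm: the class and helper names are hypothetical, it uses simple per-code-point mappings rather than full, locale-sensitive ones, and "cased" is approximated with the JDK's upper/lower/titlecase tests.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class TitleCaseSketch {
    // Approximation of "cased": the code point is upper, lower, or titlecase.
    static boolean isCased(int cp) {
        return Character.isUpperCase(cp) || Character.isLowerCase(cp)
            || Character.isTitleCase(cp);
    }

    static String toTitleCase(String str, BreakIterator iter) {
        if (iter == null) {
            // Assumed default for this sketch: word boundaries
            iter = BreakIterator.getWordInstance(Locale.ROOT);
        }
        iter.setText(str);
        StringBuilder out = new StringBuilder(str.length());
        int prev = iter.first();
        for (int next = iter.next(); next != BreakIterator.DONE;
                prev = next, next = iter.next()) {
            boolean titled = false;
            for (int i = prev; i < next; ) {
                int cp = str.codePointAt(i);
                i += Character.charCount(cp);
                if (titled) {
                    // After the first cased letter: lowercase the rest of the segment
                    out.appendCodePoint(Character.toLowerCase(cp));
                } else if (isCased(cp)) {
                    // First cased letter of the segment: titlecase it
                    out.appendCodePoint(Character.toTitleCase(cp));
                    titled = true;
                } else {
                    // Uncased characters before the first cased letter: copy as-is
                    out.appendCodePoint(cp);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTitleCase("hELLO wORLD", null));
    }
}
```

Only the forward iteration methods (`first()` and `next()`) of the break iterator are used, which matches the note that a custom iterator needs to implement forward iteration only.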

*

Casing is dependent on the default locale and context-sensitive

     * @param str source string to be converted
     * @param breakiter break iterator to determine the positions in which
     *        the character should be title cased.
     * @return titlecase version of the argument string
     * @stable ICU 2.6
     */
    public static String toTitleCase(String str, BreakIterator breakiter) {
        return toTitleCase(ULocale.getDefault(), str, breakiter);
    }

    /**
     * Gets uppercase version of the argument string.
     * Casing is dependent on the argument locale and context-sensitive.
     * @param locale locale in which the string is to be converted
     * @param str source string to be converted
     * @return uppercase version of the argument string
     * @stable ICU 2.1
     */
    public static String toUpperCase(Locale locale, String str) {
        return toUpperCase(ULocale.forLocale(locale), str);
    }

    /**
     * Gets uppercase version of the argument string.
     * Casing is dependent on the argument locale and context-sensitive.
     * @param locale locale in which the string is to be converted
     * @param str source string to be converted
     * @return uppercase version of the argument string
     * @stable ICU 3.2
     */
    public static String toUpperCase(ULocale locale, String str) {
        StringContextIterator iter = new StringContextIterator(str);
        StringBuffer result = new StringBuffer(str.length());
        int[] locCache = new int[1];
        int c;

        if (locale == null) {
            locale = ULocale.getDefault();
        }
        locCache[0]=0;

        while((c=iter.nextCaseMapCP())>=0) {
            c=gCsp.toFullUpper(c, iter, result, locale, locCache);

            /* decode the result */
            if(c<0) {
                /* (not) original code point */
                c=~c;
            } else if(c<=UCaseProps.MAX_STRING_LENGTH) {
                /* mapping already appended to result */
                continue;
            /* } else { append single-code point mapping */
            }
            if(c<=0xffff) {
                result.append((char)c);
            } else {
                UTF16.append(result, c);
            }
        }
        return result.toString();
    }

    /**
     * Gets lowercase version of the argument string.
* Casing is dependent on the argument locale and context-sensitive * @param locale which string is to be converted in * @param str source string to be performed on * @return lowercase version of the argument string * @stable ICU 2.1 */ public static String toLowerCase(Locale locale, String str) { return toLowerCase(ULocale.forLocale(locale), str); } /** * Gets lowercase version of the argument string. * Casing is dependent on the argument locale and context-sensitive * @param locale which string is to be converted in * @param str source string to be performed on * @return lowercase version of the argument string * @stable ICU 3.2 */ public static String toLowerCase(ULocale locale, String str) { StringContextIterator iter = new StringContextIterator(str); StringBuffer result = new StringBuffer(str.length()); int[] locCache = new int[1]; int c; if (locale == null) { locale = ULocale.getDefault(); } locCache[0]=0; while((c=iter.nextCaseMapCP())>=0) { c=gCsp.toFullLower(c, iter, result, locale, locCache); /* decode the result */ if(c<0) { /* (not) original code point */ c=~c; } else if(c<=UCaseProps.MAX_STRING_LENGTH) { /* mapping already appended to result */ continue; /* } else { append single-code point mapping */ } if(c<=0xffff) { result.append((char)c); } else { UTF16.append(result, c); } } return result.toString(); } /** *

Gets the titlecase version of the argument string.

*

Position for titlecasing is determined by the argument break * iterator, hence the user can customize his break iterator for * a specialized titlecasing. In this case only the forward iteration * needs to be implemented. * If the break iterator passed in is null, the default Unicode algorithm * will be used to determine the titlecase positions. *

*

Only positions returned by the break iterator will be titlecased;
     * characters in between the positions will all be in lowercase.

*

Casing is dependent on the argument locale and context-sensitive
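Both the locale dependence and the context sensitivity mentioned here can be observed with the JDK's `String` case methods, used below only as a stand-in for the UCharacter methods. The class and helper names are hypothetical.

```java
import java.util.Locale;

final class LocaleSensitiveCasing {
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale dependence: Turkish uppercases "i" to U+0130 (capital I with dot above)
        System.out.println(Integer.toHexString(upper("i", Locale.ROOT).codePointAt(0)));
        System.out.println(Integer.toHexString(upper("i", turkish).codePointAt(0)));

        // Context sensitivity: a capital sigma at the end of a word lowercases
        // to the final form U+03C2 rather than U+03C3
        String word = lower("\u039F\u0394\u039F\u03A3", Locale.ROOT);
        System.out.println(Integer.toHexString(word.codePointAt(word.length() - 1)));
    }
}
```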

     * @param locale locale in which the string is to be converted
     * @param str source string to be converted
     * @param breakiter break iterator to determine the positions in which
     *        the character should be title cased.
     * @return titlecase version of the argument string
     * @stable ICU 2.6
     */
    public static String toTitleCase(Locale locale, String str,
                                     BreakIterator breakiter) {
        return toTitleCase(ULocale.forLocale(locale), str, breakiter);
    }

    /**
     *

Gets the titlecase version of the argument string.

*

Position for titlecasing is determined by the argument break * iterator, hence the user can customize his break iterator for * a specialized titlecasing. In this case only the forward iteration * needs to be implemented. * If the break iterator passed in is null, the default Unicode algorithm * will be used to determine the titlecase positions. *

*

Only positions returned by the break iterator will be titlecased;
     * characters in between the positions will all be in lowercase.

*

Casing is dependent on the argument locale and context-sensitive

     * @param locale locale in which the string is to be converted
     * @param str source string to be converted
     * @param titleIter break iterator to determine the positions in which
     *        the character should be title cased.
     * @return titlecase version of the argument string
     * @stable ICU 3.2
     */
    public static String toTitleCase(ULocale locale, String str,
                                     BreakIterator titleIter) {
        return toTitleCase(locale, str, titleIter, 0);
    }

    /**
     *

Gets the titlecase version of the argument string.

*

Position for titlecasing is determined by the argument break * iterator, hence the user can customize his break iterator for * a specialized titlecasing. In this case only the forward iteration * needs to be implemented. * If the break iterator passed in is null, the default Unicode algorithm * will be used to determine the titlecase positions. *

*

Only positions returned by the break iterator will be titlecased;
     * characters in between the positions will all be in lowercase.

*

Casing is dependent on the argument locale and context-sensitive

* @param locale which string is to be converted in * @param str source string to be performed on * @param titleIter break iterator to determine the positions in which * the character should be title cased. * @param options bit set to modify the titlecasing operation * @return lowercase version of the argument string * @stable ICU 3.8 * @see #TITLECASE_NO_LOWERCASE * @see #TITLECASE_NO_BREAK_ADJUSTMENT */ public static String toTitleCase(ULocale locale, String str, BreakIterator titleIter, int options) { StringContextIterator iter = new StringContextIterator(str); StringBuffer result = new StringBuffer(str.length()); int[] locCache = new int[1]; int c, nc, srcLength = str.length(); if (locale == null) { locale = ULocale.getDefault(); } locCache[0]=0; if(titleIter == null) { titleIter = BreakIterator.getWordInstance(locale); } titleIter.setText(str); int prev, titleStart, index; boolean isFirstIndex; boolean isDutch = locale.getLanguage().equals("nl"); boolean FirstIJ = true; /* set up local variables */ prev=0; isFirstIndex=true; /* titlecasing loop */ while(prevsrcLength) { index=srcLength; } /* * Unicode 4 & 5 section 3.13 Default Case Operations: * * R3 toTitlecase(X): Find the word boundaries based on Unicode Standard Annex * #29, "Text Boundaries." Between each pair of word boundaries, find the first * cased character F. If F exists, map F to default_title(F); then map each * subsequent character C to default_lower(C). * * In this implementation, segment [prev..index[ into 3 parts: * a) uncased characters (copy as-is) [prev..titleStart[ * b) first case letter (titlecase) [titleStart..titleLimit[ * c) subsequent characters (lowercase) [titleLimit..index[ */ if(prev=0 && UCaseProps.NONE==gCsp.getType(c)) {} titleStart=iter.getCPStart(); if(prev=0) { if ( isDutch && ( nc == 0x004A || nc == 0x006A ) && ( c == 0x0049 ) && ( FirstIJ == true )) { c = 0x004A; /* J */ FirstIJ = false; } else { /* Normal operation: Lowercase the rest of the word. 
*/ c=gCsp.toFullLower(nc, iter, result, locale, locCache); } } else { break; } } } } prev=index; } return result.toString(); } /** * The given character is mapped to its case folding equivalent according * to UnicodeData.txt and CaseFolding.txt; if the character has no case * folding equivalent, the character itself is returned. * *

This function only returns the simple, single-code point case mapping. * Full case mappings should be used whenever possible because they produce * better results by working on whole strings. * They can map to a result string with a different length as appropriate. * Full case mappings are applied by the case mapping functions * that take String parameters rather than code points (int). * See also the User Guide chapter on C/POSIX migration: * http://www.icu-project.org/userguide/posix.html#case_mappings * * @param ch the character to be converted * @param defaultmapping Indicates whether all mappings defined in * CaseFolding.txt are to be used; otherwise the * mappings for dotted I and dotless i marked with * 'I' in CaseFolding.txt will be skipped. * @return the case folding equivalent of the character, if * any; otherwise the character itself. * @see #foldCase(String, boolean) * @stable ICU 2.1 */ public static int foldCase(int ch, boolean defaultmapping) { return foldCase(ch, defaultmapping ? FOLD_CASE_DEFAULT : FOLD_CASE_EXCLUDE_SPECIAL_I); } /** * The given string is mapped to its case folding equivalent according to * UnicodeData.txt and CaseFolding.txt; if any character has no case * folding equivalent, the character itself is returned. * "Full", multiple-code point case folding mappings are returned here. * For "simple" single-code point mappings use the API * foldCase(int ch, boolean defaultmapping). * @param str the String to be converted * @param defaultmapping Indicates whether all mappings defined in * CaseFolding.txt are to be used; otherwise the * mappings for dotted I and dotless i marked with * 'I' in CaseFolding.txt will be skipped. * @return the case folding equivalent of the string, if * any; otherwise the string itself. * @see #foldCase(int, boolean) * @stable ICU 2.1 */ public static String foldCase(String str, boolean defaultmapping) { return foldCase(str, defaultmapping ? 
FOLD_CASE_DEFAULT : FOLD_CASE_EXCLUDE_SPECIAL_I); } /** * Option value for case folding: use default mappings defined in CaseFolding.txt. * @stable ICU 2.6 */ public static final int FOLD_CASE_DEFAULT = 0x0000; /** * Option value for case folding: exclude the mappings for dotted I * and dotless i marked with 'I' in CaseFolding.txt. * @stable ICU 2.6 */ public static final int FOLD_CASE_EXCLUDE_SPECIAL_I = 0x0001; /** * The given character is mapped to its case folding equivalent according * to UnicodeData.txt and CaseFolding.txt; if the character has no case * folding equivalent, the character itself is returned. * *

This function only returns the simple, single-code point case mapping. * Full case mappings should be used whenever possible because they produce * better results by working on whole strings. * They can map to a result string with a different length as appropriate. * Full case mappings are applied by the case mapping functions * that take String parameters rather than code points (int). * See also the User Guide chapter on C/POSIX migration: * http://www.icu-project.org/userguide/posix.html#case_mappings * * @param ch the character to be converted * @param options A bit set for special processing. Currently the recognised options are * FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT * @return the case folding equivalent of the character, if * any; otherwise the character itself. * @see #foldCase(String, int) * @stable ICU 2.6 */ public static int foldCase(int ch, int options) { return gCsp.fold(ch, options); } /** * The given string is mapped to its case folding equivalent according to * UnicodeData.txt and CaseFolding.txt; if any character has no case * folding equivalent, the character itself is returned. * "Full", multiple-code point case folding mappings are returned here. * For "simple" single-code point mappings use the API * foldCase(int ch, int options). * @param str the String to be converted * @param options A bit set for special processing. Currently the recognised options are * FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT * @return the case folding equivalent of the string, if * any; otherwise the string itself. * @see #foldCase(int, int) * @stable ICU 2.6 */ public static final String foldCase(String str, int options) { StringBuffer result = new StringBuffer(str.length()); int c, i, length; length = str.length(); for(i=0; i This returns the value of Han 'numeric' code points, * including those for zero, ten, hundred, thousand, ten thousand, * and hundred million. 
* This includes both the standard and 'checkwriting' * characters, the 'big circle' zero character, and the standard * zero character. * @param ch code point to query * @return value if it is a Han 'numeric character,' otherwise return -1. * @stable ICU 2.4 */ public static int getHanNumericValue(int ch) { // TODO: Are these all covered by Unicode numeric value data? switch(ch) { case IDEOGRAPHIC_NUMBER_ZERO_ : case CJK_IDEOGRAPH_COMPLEX_ZERO_ : return 0; // Han Zero case CJK_IDEOGRAPH_FIRST_ : case CJK_IDEOGRAPH_COMPLEX_ONE_ : return 1; // Han One case CJK_IDEOGRAPH_SECOND_ : case CJK_IDEOGRAPH_COMPLEX_TWO_ : return 2; // Han Two case CJK_IDEOGRAPH_THIRD_ : case CJK_IDEOGRAPH_COMPLEX_THREE_ : return 3; // Han Three case CJK_IDEOGRAPH_FOURTH_ : case CJK_IDEOGRAPH_COMPLEX_FOUR_ : return 4; // Han Four case CJK_IDEOGRAPH_FIFTH_ : case CJK_IDEOGRAPH_COMPLEX_FIVE_ : return 5; // Han Five case CJK_IDEOGRAPH_SIXTH_ : case CJK_IDEOGRAPH_COMPLEX_SIX_ : return 6; // Han Six case CJK_IDEOGRAPH_SEVENTH_ : case CJK_IDEOGRAPH_COMPLEX_SEVEN_ : return 7; // Han Seven case CJK_IDEOGRAPH_EIGHTH_ : case CJK_IDEOGRAPH_COMPLEX_EIGHT_ : return 8; // Han Eight case CJK_IDEOGRAPH_NINETH_ : case CJK_IDEOGRAPH_COMPLEX_NINE_ : return 9; // Han Nine case CJK_IDEOGRAPH_TEN_ : case CJK_IDEOGRAPH_COMPLEX_TEN_ : return 10; case CJK_IDEOGRAPH_HUNDRED_ : case CJK_IDEOGRAPH_COMPLEX_HUNDRED_ : return 100; case CJK_IDEOGRAPH_THOUSAND_ : case CJK_IDEOGRAPH_COMPLEX_THOUSAND_ : return 1000; case CJK_IDEOGRAPH_TEN_THOUSAND_ : return 10000; case CJK_IDEOGRAPH_HUNDRED_MILLION_ : return 100000000; } return -1; // no value } /** *

Gets an iterator for character types, iterating over codepoints.

* Example of use:
*
     * RangeValueIterator iterator = UCharacter.getTypeIterator();
     * RangeValueIterator.Element element = new RangeValueIterator.Element();
     * while (iterator.next(element)) {
     *     System.out.println("Codepoint \\u" + 
     *                        Integer.toHexString(element.start) + 
     *                        " to codepoint \\u" +
     *                        Integer.toHexString(element.limit - 1) + 
     *                        " has the character type " + 
     *                        element.value);
     * }
     * 
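The range-based iteration above can also be sketched without ICU on the classpath. The following is only a stand-in, assuming java.lang.Character.getType as the source of type values (its numbering and Unicode version may differ from ICU's); TypeRangeSketch and ranges are hypothetical names, not part of this class.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch: groups code points into [start, limit) runs that share a
// type value, using java.lang.Character.getType rather than ICU property data.
final class TypeRangeSketch {
    // Each entry is {start, limit, type}; limit is exclusive, like element.limit above.
    // Assumes from < to.
    static List<int[]> ranges(int from, int to) {
        List<int[]> out = new ArrayList<int[]>();
        int start = from;
        int type = Character.getType(from);
        for (int cp = from + 1; cp < to; cp++) {
            int t = Character.getType(cp);
            if (t != type) {
                out.add(new int[] { start, cp, type });
                start = cp;
                type = t;
            }
        }
        out.add(new int[] { start, to, type });
        return out;
    }

    public static void main(String[] args) {
        for (int[] r : ranges(0x20, 0x7F)) {
            System.out.println("U+" + Integer.toHexString(r[0]) + " to U+"
                    + Integer.toHexString(r[1] - 1) + " has type " + r[2]);
        }
    }
}
```

Reporting whole runs instead of single code points keeps the output short, which is presumably why the iterator hands back a start, a limit and a value.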
* @return an iterator * @stable ICU 2.6 */ public static RangeValueIterator getTypeIterator() { return new UCharacterTypeIterator(PROPERTY_); } /** *

Gets an iterator for character names, iterating over codepoints.

*

This API only gets the iterator for the modern, most up-to-date * Unicode names. For older 1.0 Unicode names use getName1_0Iterator() or * for extended names use getExtendedNameIterator().

* Example of use:
*
     * ValueIterator iterator = UCharacter.getNameIterator();
     * ValueIterator.Element element = new ValueIterator.Element();
     * while (iterator.next(element)) {
     *     System.out.println("Codepoint \\u" + 
     *                        Integer.toHexString(element.codepoint) +
     *                        " has the name " + (String)element.value);
     * }
     * 
*

The maximal range which the name iterator iterates is from * UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
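For readers without ICU at hand, a rough JDK-only analogue of walking names by code point is sketched below. It assumes java.lang.Character.getName (Java 7 and later), whose name data may not match ICU's unames.icu; NameWalkSketch and describe are hypothetical names.

```java
// Stand-in sketch: looks up modern Unicode names through the JDK, not through ICU.
final class NameWalkSketch {
    // Returns "U+XXXX NAME" for an assigned code point, or null if the JDK has no name.
    static String describe(int cp) {
        String name = Character.getName(cp);
        return name == null ? null : String.format("U+%04X %s", cp, name);
    }

    public static void main(String[] args) {
        for (int cp = 0x41; cp <= 0x45; cp++) {
            System.out.println(describe(cp));
        }
    }
}
```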

* @return an iterator * @stable ICU 2.6 */ public static ValueIterator getNameIterator() { if(NAME_==null){ throw new RuntimeException("Could not load unames.icu"); } return new UCharacterNameIterator(NAME_, UCharacterNameChoice.UNICODE_CHAR_NAME); } /** *

Gets an iterator for character names, iterating over codepoints.

*

This API only gets the iterator for the older 1.0 Unicode names. * For modern, most up-to-date Unicode names use getNameIterator() or * for extended names use getExtendedNameIterator().

* Example of use:
*
     * ValueIterator iterator = UCharacter.getName1_0Iterator();
     * ValueIterator.Element element = new ValueIterator.Element();
     * while (iterator.next(element)) {
     *     System.out.println("Codepoint \\u" + 
     *                        Integer.toHexString(element.codepoint) +
     *                        " has the name " + (String)element.value);
     * }
     * 
*

The maximal range which the name iterator iterates is from * UCharacter.MIN_VALUE to UCharacter.MAX_VALUE. * @return an iterator * @stable ICU 2.6 */ public static ValueIterator getName1_0Iterator() { if(NAME_==null){ throw new RuntimeException("Could not load unames.icu"); } return new UCharacterNameIterator(NAME_, UCharacterNameChoice.UNICODE_10_CHAR_NAME); } /** *

Gets an iterator for character names, iterating over codepoints.

*

This API only gets the iterator for the extended names. * For modern, most up-to-date Unicode names use getNameIterator() or * for older 1.0 Unicode names use getName1_0Iterator().

* Example of use:
*
     * ValueIterator iterator = UCharacter.getExtendedNameIterator();
     * ValueIterator.Element element = new ValueIterator.Element();
     * while (iterator.next(element)) {
     *     System.out.println("Codepoint \\u" + 
     *                        Integer.toHexString(element.codepoint) +
     *                        " has the name " + (String)element.value);
     * }
     * 
*

The maximal range which the name iterator iterates is from * UCharacter.MIN_VALUE to UCharacter.MAX_VALUE. * @return an iterator * @stable ICU 2.6 */ public static ValueIterator getExtendedNameIterator() { if(NAME_==null){ throw new MissingResourceException("Could not load unames.icu","",""); } return new UCharacterNameIterator(NAME_, UCharacterNameChoice.EXTENDED_CHAR_NAME); } /** *

Get the "age" of the code point.

*

The "age" is the Unicode version when the code point was first * designated (as a non-character or for Private Use) or assigned a * character. *

This can be useful to avoid emitting code points to receiving * processes that do not accept newer characters.

*

The data is from the UCD file DerivedAge.txt.

* @param ch The code point. * @return the Unicode version number * @stable ICU 2.6 */ public static VersionInfo getAge(int ch) { if (ch < MIN_VALUE || ch > MAX_VALUE) { throw new IllegalArgumentException("Codepoint out of bounds"); } return PROPERTY_.getAge(ch); } /** *

Check a binary Unicode property for a code point.

*

Unicode, especially in version 3.2, defines many more properties * than the original set in UnicodeData.txt.

*

This API is intended to reflect Unicode properties as defined in * the Unicode Character Database (UCD) and Unicode Technical Reports * (UTR).

*

For details about the properties see * http://www.unicode.org/.

*

For names of Unicode properties see the UCD file * PropertyAliases.txt.

*

This API does not check the validity of the codepoint.

*

Important: If ICU is built with UCD files from Unicode versions * below 3.2, then properties marked with "new" are either not available or * not fully available.

* @param ch code point to test. * @param property selector constant from com.ibm.icu.lang.UProperty, * identifies which binary property to check. * @return true or false according to the binary Unicode property value * for ch. Also false if property is out of bounds or if the * Unicode version does not have data for the property at all, or * not for this code point. * @see com.ibm.icu.lang.UProperty * @stable ICU 2.6 */ public static boolean hasBinaryProperty(int ch, int property) { if (ch < MIN_VALUE || ch > MAX_VALUE) { throw new IllegalArgumentException("Codepoint out of bounds"); } return PROPERTY_.hasBinaryProperty(ch, property); } /** *

Check if a code point has the Alphabetic Unicode property.

*

Same as UCharacter.hasBinaryProperty(ch, UProperty.ALPHABETIC).

*

This is different from UCharacter.isLetter(ch)!
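The gap between the Alphabetic property and the letter categories can be illustrated with the JDK's own counterparts. This is a sketch under the assumption that java.lang.Character.isAlphabetic and Character.isLetter behave like the ICU methods for the sample shown; AlphabeticVsLetterSketch is a hypothetical name.

```java
// Sketch with JDK stand-ins: Character.isAlphabetic (Java 7+) tests the
// Alphabetic property, Character.isLetter tests the letter general categories.
final class AlphabeticVsLetterSketch {
    static boolean alphabeticButNotLetter(int cp) {
        return Character.isAlphabetic(cp) && !Character.isLetter(cp);
    }

    public static void main(String[] args) {
        int[] samples = { 'A', 0x2160, '1' };   // U+2160 is ROMAN NUMERAL ONE
        for (int cp : samples) {
            System.out.println("U+" + Integer.toHexString(cp)
                    + " alphabetic=" + Character.isAlphabetic(cp)
                    + " letter=" + Character.isLetter(cp));
        }
    }
}
```

U+2160 has general category Nl (letter number), so it counts as alphabetic without being a letter.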

* @stable ICU 2.6 * @param ch codepoint to be tested */ public static boolean isUAlphabetic(int ch) { return hasBinaryProperty(ch, UProperty.ALPHABETIC); } /** *

Check if a code point has the Lowercase Unicode property.

*

Same as UCharacter.hasBinaryProperty(ch, UProperty.LOWERCASE).

*

This is different from UCharacter.isLowerCase(ch)!

* @param ch codepoint to be tested * @stable ICU 2.6 */ public static boolean isULowercase(int ch) { return hasBinaryProperty(ch, UProperty.LOWERCASE); } /** *

Check if a code point has the Uppercase Unicode property.

*

Same as UCharacter.hasBinaryProperty(ch, UProperty.UPPERCASE).

*

This is different from UCharacter.isUpperCase(ch)!

* @param ch codepoint to be tested * @stable ICU 2.6 */ public static boolean isUUppercase(int ch) { return hasBinaryProperty(ch, UProperty.UPPERCASE); } /** *

Check if a code point has the White_Space Unicode property.

*

Same as UCharacter.hasBinaryProperty(ch, UProperty.WHITE_SPACE).

*

This is different from both UCharacter.isSpace(ch) and * UCharacter.isWhitespace(ch)!
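How far these notions of white space can diverge is easiest to see on concrete code points. The sketch below uses JDK stand-ins only: the regex property \p{IsWhite_Space} for the Unicode White_Space property, plus Character.isWhitespace and Character.isSpaceChar. It is an assumption that the ICU methods agree with these for the samples; WhiteSpaceSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

// Stand-in sketch comparing three JDK notions of "white space".
final class WhiteSpaceSketch {
    private static final Pattern WHITE_SPACE = Pattern.compile("\\p{IsWhite_Space}");

    // True if the JDK regex engine reports the Unicode White_Space property for cp.
    static boolean hasWhiteSpaceProperty(int cp) {
        return WHITE_SPACE.matcher(new String(Character.toChars(cp))).matches();
    }

    static String compare(int cp) {
        return String.format("U+%04X White_Space=%b isWhitespace=%b isSpaceChar=%b",
                cp, hasWhiteSpaceProperty(cp),
                Character.isWhitespace(cp), Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        System.out.println(compare(0x0020));   // SPACE
        System.out.println(compare(0x00A0));   // NO-BREAK SPACE
        System.out.println(compare(0x001C));   // FILE SEPARATOR control
    }
}
```

NO-BREAK SPACE carries the White_Space property but is rejected by Character.isWhitespace, while the FILE SEPARATOR control is the other way round.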

* @param ch codepoint to be tested * @stable ICU 2.6 */ public static boolean isUWhiteSpace(int ch) { return hasBinaryProperty(ch, UProperty.WHITE_SPACE); } /** *

Gets the property value for a Unicode property type of a code point. * Also returns binary and mask property values.

*

Unicode, especially in version 3.2, defines many more properties than * the original set in UnicodeData.txt.

*

The properties APIs are intended to reflect Unicode properties as * defined in the Unicode Character Database (UCD) and Unicode Technical * Reports (UTR). For details about the properties see * http://www.unicode.org/.

*

For names of Unicode properties see the UCD file PropertyAliases.txt. *

*
     * Sample usage:
     * int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
     * int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
     * boolean b = (ideo == 1);
     * 
* @param ch code point to test. * @param type UProperty selector constant, identifies which binary * property to check. Must be * UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or * UProperty.INT_START <= type < UProperty.INT_LIMIT or * UProperty.MASK_START <= type < UProperty.MASK_LIMIT. * @return numeric value that is directly the property value or, * for enumerated properties, corresponds to the numeric value of * the enumerated constant of the respective property value * enumeration type (cast to enum type if necessary). * Returns 0 or 1 (for false / true) for binary Unicode properties. * Returns a bit-mask for mask properties. * Returns 0 if 'type' is out of bounds or if the Unicode version * does not have data for the property at all, or not for this code * point. * @see UProperty * @see #hasBinaryProperty * @see #getIntPropertyMinValue * @see #getIntPropertyMaxValue * @see #getUnicodeVersion * @stable ICU 2.4 */ public static int getIntPropertyValue(int ch, int type) { if (type < UProperty.BINARY_START) { return 0; // undefined } else if (type < UProperty.BINARY_LIMIT) { return hasBinaryProperty(ch, type) ? 
1 : 0; } else if (type < UProperty.INT_START) { return 0; // undefined } else if (type < UProperty.INT_LIMIT) { //int result = 0; switch (type) { case UProperty.BIDI_CLASS: return getDirection(ch); case UProperty.BLOCK: return UnicodeBlock.idOf(ch); case UProperty.CANONICAL_COMBINING_CLASS: return getCombiningClass(ch); case UProperty.DECOMPOSITION_TYPE: return PROPERTY_.getAdditional(ch, 2) & DECOMPOSITION_TYPE_MASK_; case UProperty.EAST_ASIAN_WIDTH: return (PROPERTY_.getAdditional(ch, 0) & EAST_ASIAN_MASK_) >> EAST_ASIAN_SHIFT_; case UProperty.GENERAL_CATEGORY: return getType(ch); case UProperty.JOINING_GROUP: return gBdp.getJoiningGroup(ch); case UProperty.JOINING_TYPE: return gBdp.getJoiningType(ch); case UProperty.LINE_BREAK: return (int)(PROPERTY_.getAdditional(ch, LB_VWORD)& LB_MASK)>>LB_SHIFT; case UProperty.NUMERIC_TYPE: type=getNumericType(PROPERTY_.getProperty(ch)); if(type>NumericType.NUMERIC) { /* keep internal variants of NumericType.NUMERIC from becoming visible */ type=NumericType.NUMERIC; } return type; case UProperty.SCRIPT: return UScript.getScript(ch); case UProperty.HANGUL_SYLLABLE_TYPE: /* purely algorithmic; hardcode known characters, check for assigned new ones */ if(ch>8; case UProperty.TRAIL_CANONICAL_COMBINING_CLASS: return NormalizerImpl.getFCD16(ch)&0xff; case UProperty.GRAPHEME_CLUSTER_BREAK: return (int)(PROPERTY_.getAdditional(ch, 2)& GCB_MASK)>>GCB_SHIFT; case UProperty.SENTENCE_BREAK: return (int)(PROPERTY_.getAdditional(ch, 2)& SB_MASK)>>SB_SHIFT; case UProperty.WORD_BREAK: return (int)(PROPERTY_.getAdditional(ch, 2)& WB_MASK)>>WB_SHIFT; default: return 0; /* undefined */ } } else if (type == UProperty.GENERAL_CATEGORY_MASK) { return UCharacterProperty.getMask(getType(ch)); } return 0; // undefined } /** * Returns a string version of the property value. * @param propertyEnum * @param codepoint * @param nameChoice * @return value as string * @internal * @deprecated This API is ICU internal only. 
*/ public static String getStringPropertyValue(int propertyEnum, int codepoint, int nameChoice) { // TODO some of these are less efficient, since a string is forced! if ((propertyEnum >= UProperty.BINARY_START && propertyEnum < UProperty.BINARY_LIMIT) || (propertyEnum >= UProperty.INT_START && propertyEnum < UProperty.INT_LIMIT)) { return getPropertyValueName(propertyEnum, getIntPropertyValue(codepoint, propertyEnum), nameChoice); } if (propertyEnum == UProperty.NUMERIC_VALUE) { return String.valueOf(getUnicodeNumericValue(codepoint)); } // otherwise must be string property switch (propertyEnum) { case UProperty.AGE: return getAge(codepoint).toString(); case UProperty.ISO_COMMENT: return getISOComment(codepoint); case UProperty.BIDI_MIRRORING_GLYPH: return UTF16.valueOf(getMirror(codepoint)); case UProperty.CASE_FOLDING: return foldCase(UTF16.valueOf(codepoint), true); case UProperty.LOWERCASE_MAPPING: return toLowerCase(UTF16.valueOf(codepoint)); case UProperty.NAME: return getName(codepoint); case UProperty.SIMPLE_CASE_FOLDING: return UTF16.valueOf(foldCase(codepoint,true)); case UProperty.SIMPLE_LOWERCASE_MAPPING: return UTF16.valueOf(toLowerCase(codepoint)); case UProperty.SIMPLE_TITLECASE_MAPPING: return UTF16.valueOf(toTitleCase(codepoint)); case UProperty.SIMPLE_UPPERCASE_MAPPING: return UTF16.valueOf(toUpperCase(codepoint)); case UProperty.TITLECASE_MAPPING: return toTitleCase(UTF16.valueOf(codepoint),null); case UProperty.UNICODE_1_NAME: return getName1_0(codepoint); case UProperty.UPPERCASE_MAPPING: return toUpperCase(UTF16.valueOf(codepoint)); } throw new IllegalArgumentException("Illegal Property Enum"); } /** * Get the minimum value for an integer/binary Unicode property type. * Can be used together with UCharacter.getIntPropertyMaxValue(int) * to allocate arrays of com.ibm.icu.text.UnicodeSet or similar. * @param type UProperty selector constant, identifies which binary * property to check. 
Must be * UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or * UProperty.INT_START <= type < UProperty.INT_LIMIT. * @return Minimum value returned by UCharacter.getIntPropertyValue(int) * for a Unicode property. 0 if the property * selector 'type' is out of range. * @see UProperty * @see #hasBinaryProperty * @see #getUnicodeVersion * @see #getIntPropertyMaxValue * @see #getIntPropertyValue * @stable ICU 2.4 */ public static int getIntPropertyMinValue(int type) { return 0; // undefined; and: all other properties have a minimum value // of 0 } /** * Get the maximum value for an integer/binary Unicode property. * Can be used together with UCharacter.getIntPropertyMinValue(int) * to allocate arrays of com.ibm.icu.text.UnicodeSet or similar. * Examples for min/max values (for Unicode 3.2): *
    *
  • UProperty.BIDI_CLASS: 0/18 (UCharacterDirection.LEFT_TO_RIGHT/UCharacterDirection.BOUNDARY_NEUTRAL) *
  • UProperty.SCRIPT: 0/45 (UScript.COMMON/UScript.TAGBANWA) *
  • UProperty.IDEOGRAPHIC: 0/1 (false/true) *
* For undefined UProperty constant values, min/max values will be 0/-1. * @param type UProperty selector constant, identifies which binary * property to check. Must be * UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or * UProperty.INT_START <= type < UProperty.INT_LIMIT. * @return Maximum value returned by UCharacter.getIntPropertyValue(int, int) for a Unicode * property. <= 0 if the property selector 'type' is out of range. * @see UProperty * @see #hasBinaryProperty * @see #getUnicodeVersion * @see #getIntPropertyMinValue * @see #getIntPropertyValue * @stable ICU 2.4 */ public static int getIntPropertyMaxValue(int type) { if (type < UProperty.BINARY_START) { return -1; // undefined } else if (type < UProperty.BINARY_LIMIT) { return 1; // maximum TRUE for all binary properties } else if (type < UProperty.INT_START) { return -1; // undefined } else if (type < UProperty.INT_LIMIT) { switch (type) { case UProperty.BIDI_CLASS: case UProperty.JOINING_GROUP: case UProperty.JOINING_TYPE: return gBdp.getMaxValue(type); case UProperty.BLOCK: return (PROPERTY_.getMaxValues(0) & BLOCK_MASK_) >> BLOCK_SHIFT_; case UProperty.CANONICAL_COMBINING_CLASS: case UProperty.LEAD_CANONICAL_COMBINING_CLASS: case UProperty.TRAIL_CANONICAL_COMBINING_CLASS: return 0xff; // TODO do we need to be more precise, // getting the actual maximum? 
case UProperty.DECOMPOSITION_TYPE: return PROPERTY_.getMaxValues(2) & DECOMPOSITION_TYPE_MASK_; case UProperty.EAST_ASIAN_WIDTH: return (PROPERTY_.getMaxValues(0) & EAST_ASIAN_MASK_) >> EAST_ASIAN_SHIFT_; case UProperty.GENERAL_CATEGORY: return UCharacterCategory.CHAR_CATEGORY_COUNT - 1; case UProperty.LINE_BREAK: return (PROPERTY_.getMaxValues(LB_VWORD) & LB_MASK) >> LB_SHIFT; case UProperty.NUMERIC_TYPE: return NumericType.COUNT - 1; case UProperty.SCRIPT: return PROPERTY_.getMaxValues(0) & SCRIPT_MASK_; case UProperty.HANGUL_SYLLABLE_TYPE: return HangulSyllableType.COUNT-1; case UProperty.NFD_QUICK_CHECK: case UProperty.NFKD_QUICK_CHECK: return 1; // YES -- these are never "maybe", only "no" or "yes" case UProperty.NFC_QUICK_CHECK: case UProperty.NFKC_QUICK_CHECK: return 2; // MAYBE case UProperty.GRAPHEME_CLUSTER_BREAK: return (PROPERTY_.getMaxValues(2) & GCB_MASK) >> GCB_SHIFT; case UProperty.SENTENCE_BREAK: return (PROPERTY_.getMaxValues(2) & SB_MASK) >> SB_SHIFT; case UProperty.WORD_BREAK: return (PROPERTY_.getMaxValues(2) & WB_MASK) >> WB_SHIFT; default: return -1; // undefined } } return -1; // undefined } /** * Provide the java.lang.Character forDigit API, for convenience. * @stable ICU 3.0 */ public static char forDigit(int digit, int radix) { return java.lang.Character.forDigit(digit, radix); } // JDK 1.5 API coverage /** * Cover the JDK 1.5 API, for convenience. * @see UTF16#LEAD_SURROGATE_MIN_VALUE * @stable ICU 3.0 */ public static final char MIN_HIGH_SURROGATE = UTF16.LEAD_SURROGATE_MIN_VALUE; /** * Cover the JDK 1.5 API, for convenience. * @see UTF16#LEAD_SURROGATE_MAX_VALUE * @stable ICU 3.0 */ public static final char MAX_HIGH_SURROGATE = UTF16.LEAD_SURROGATE_MAX_VALUE; /** * Cover the JDK 1.5 API, for convenience. * @see UTF16#TRAIL_SURROGATE_MIN_VALUE * @stable ICU 3.0 */ public static final char MIN_LOW_SURROGATE = UTF16.TRAIL_SURROGATE_MIN_VALUE; /** * Cover the JDK 1.5 API, for convenience. 
* @see UTF16#TRAIL_SURROGATE_MAX_VALUE * @stable ICU 3.0 */ public static final char MAX_LOW_SURROGATE = UTF16.TRAIL_SURROGATE_MAX_VALUE; /** * Cover the JDK 1.5 API, for convenience. * @see UTF16#SURROGATE_MIN_VALUE * @stable ICU 3.0 */ public static final char MIN_SURROGATE = UTF16.SURROGATE_MIN_VALUE; /** * Cover the JDK 1.5 API, for convenience. * @see UTF16#SURROGATE_MAX_VALUE * @stable ICU 3.0 */ public static final char MAX_SURROGATE = UTF16.SURROGATE_MAX_VALUE; /** * Cover the JDK 1.5 API, for convenience. * @see UTF16#SUPPLEMENTARY_MIN_VALUE * @stable ICU 3.0 */ public static final int MIN_SUPPLEMENTARY_CODE_POINT = UTF16.SUPPLEMENTARY_MIN_VALUE; /** * Cover the JDK 1.5 API, for convenience. * @see UTF16#CODEPOINT_MAX_VALUE * @stable ICU 3.0 */ public static final int MAX_CODE_POINT = UTF16.CODEPOINT_MAX_VALUE; /** * Cover the JDK 1.5 API, for convenience. * @see UTF16#CODEPOINT_MIN_VALUE * @stable ICU 3.0 */ public static final int MIN_CODE_POINT = UTF16.CODEPOINT_MIN_VALUE; /** * Cover the JDK 1.5 API, for convenience. * @param cp the code point to check * @return true if cp is a valid code point * @stable ICU 3.0 */ public static final boolean isValidCodePoint(int cp) { return cp >= 0 && cp <= MAX_CODE_POINT; } /** * Cover the JDK 1.5 API, for convenience. * @param cp the code point to check * @return true if cp is a supplementary code point * @stable ICU 3.0 */ public static final boolean isSupplementaryCodePoint(int cp) { return cp >= UTF16.SUPPLEMENTARY_MIN_VALUE && cp <= UTF16.CODEPOINT_MAX_VALUE; } /** * Cover the JDK 1.5 API, for convenience. * @param ch the char to check * @return true if ch is a high (lead) surrogate * @stable ICU 3.0 */ public static boolean isHighSurrogate(char ch) { return ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE; } /** * Cover the JDK 1.5 API, for convenience. 
* @param ch the char to check * @return true if ch is a low (trail) surrogate * @stable ICU 3.0 */ public static boolean isLowSurrogate(char ch) { return ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE; } /** * Cover the JDK 1.5 API, for convenience. Return true if the chars * form a valid surrogate pair. * @param high the high (lead) char * @param low the low (trail) char * @return true if high, low form a surrogate pair * @stable ICU 3.0 */ public static final boolean isSurrogatePair(char high, char low) { return isHighSurrogate(high) && isLowSurrogate(low); } /** * Cover the JDK 1.5 API, for convenience. Return the number of chars needed * to represent the code point. This does not check the * code point for validity. * @param cp the code point to check * @return the number of chars needed to represent the code point * @see UTF16#getCharCount * @stable ICU 3.0 */ public static int charCount(int cp) { return UTF16.getCharCount(cp); } /** * Cover the JDK 1.5 API, for convenience. Return the code point represented by * the characters. This does not check the surrogate pair for validity. * @param high the high (lead) surrogate * @param low the low (trail) surrogate * @return the code point formed by the surrogate pair * @stable ICU 3.0 */ public static final int toCodePoint(char high, char low) { return UCharacterProperty.getRawSupplementary(high, low); } /** * Cover the JDK 1.5 API, for convenience. Return the code point at index. *
Note: the semantics of this API is different from the related UTF16 * API. This examines only the characters at index and index+1. * @param seq the characters to check * @param index the index of the first or only char forming the code point * @return the code point at the index * @stable ICU 3.0 */ //#if defined(FOUNDATION10) || defined(J2SE13) //## public static final int codePointAt(String seq, int index) { //## char c1 = seq.charAt(index++); //## if (isHighSurrogate(c1)) { //## if (index < seq.length()) { //## char c2 = seq.charAt(index); //## if (isLowSurrogate(c2)) { //## return toCodePoint(c1, c2); //## } //## } //## } //## return c1; //## } //## public static final int codePointAt(StringBuffer seq, int index) { //## return codePointAt(seq.toString(), index); //## } //#else //#if defined(ECLIPSE_FRAGMENT) //## public static final int codePointAt(String seq, int index) { //## return codePointAt((CharSequence)seq, index); //## } //## public static final int codePointAt(StringBuffer seq, int index) { //## return codePointAt((CharSequence)seq, index); //## } //#endif public static final int codePointAt(CharSequence seq, int index) { char c1 = seq.charAt(index++); if (isHighSurrogate(c1)) { if (index < seq.length()) { char c2 = seq.charAt(index); if (isLowSurrogate(c2)) { return toCodePoint(c1, c2); } } } return c1; } //#endif /** * Cover the JDK 1.5 API, for convenience. Return the code point at index. *
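Looking only at index and index+1 means that an index pointing into the middle of a surrogate pair yields the lone low surrogate. A minimal sketch follows, using java.lang.Character.codePointAt (the JDK 1.5 method being covered) and a hypothetical helper, codePointCovering, that also consults index-1. That helper only illustrates the alternative behaviour and is not claimed to be the UTF16 implementation.

```java
// Sketch using java.lang.Character.codePointAt, the JDK 1.5 method this API covers.
final class CodePointAtSketch {
    // Hypothetical helper (not an ICU or JDK method): also looks at index-1, so an
    // index on a low surrogate still yields the whole supplementary code point.
    static int codePointCovering(CharSequence seq, int index) {
        if (index > 0
                && Character.isLowSurrogate(seq.charAt(index))
                && Character.isHighSurrogate(seq.charAt(index - 1))) {
            return Character.toCodePoint(seq.charAt(index - 1), seq.charAt(index));
        }
        return Character.codePointAt(seq, index);
    }

    // Builds "a", U+1F600, "b": four chars, because U+1F600 needs a surrogate pair.
    static String sample() {
        return new StringBuilder("a").appendCodePoint(0x1F600).append('b').toString();
    }

    public static void main(String[] args) {
        String s = sample();
        for (int i = 0; i < s.length(); i++) {
            System.out.println(i + ": index/index+1 only = U+"
                    + Integer.toHexString(Character.codePointAt(s, i))
                    + ", also index-1 = U+"
                    + Integer.toHexString(codePointCovering(s, i)));
        }
    }
}
```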
Note: the semantics of this API is different from the related UTF16 * API. This examines only the characters at index and index+1. * @param text the characters to check * @param index the index of the first or only char forming the code point * @return the code point at the index * @stable ICU 3.0 */ public static final int codePointAt(char[] text, int index) { char c1 = text[index++]; if (isHighSurrogate(c1)) { if (index < text.length) { char c2 = text[index]; if (isLowSurrogate(c2)) { return toCodePoint(c1, c2); } } } return c1; } /** * Cover the JDK 1.5 API, for convenience. Return the code point at index. *
Note: the semantics of this API is different from the related UTF16 * API. This examines only the characters at index and index+1. * @param text the characters to check * @param index the index of the first or only char forming the code point * @param limit the limit of the valid text * @return the code point at the index * @stable ICU 3.0 */ public static final int codePointAt(char[] text, int index, int limit) { if (index >= limit || limit > text.length) { throw new IndexOutOfBoundsException(); } char c1 = text[index++]; if (isHighSurrogate(c1)) { if (index < limit) { char c2 = text[index]; if (isLowSurrogate(c2)) { return toCodePoint(c1, c2); } } } return c1; } /** * Cover the JDK 1.5 API, for convenience. Return the code point before index. *
Note: the semantics of this API is different from the related UTF16 * API. This examines only the characters at index-1 and index-2. * @param seq the characters to check * @param index the index after the last or only char forming the code point * @return the code point before the index * @stable ICU 3.0 */ //#if defined(FOUNDATION10) || defined(J2SE13) //## public static final int codePointBefore(String seq, int index) { //## char c2 = seq.charAt(--index); //## if (isLowSurrogate(c2)) { //## if (index > 0) { //## char c1 = seq.charAt(--index); //## if (isHighSurrogate(c1)) { //## return toCodePoint(c1, c2); //## } //## } //## } //## return c2; //## } //## public static final int codePointBefore(StringBuffer seq, int index) { //## return codePointBefore(seq.toString(), index); //## } //#else //#if defined(ECLIPSE_FRAGMENT) //## public static final int codePointBefore(String seq, int index) { //## return codePointBefore((CharSequence)seq, index); //## } //## public static final int codePointBefore(StringBuffer seq, int index) { //## return codePointBefore((CharSequence)seq, index); //## } //#endif public static final int codePointBefore(CharSequence seq, int index) { char c2 = seq.charAt(--index); if (isLowSurrogate(c2)) { if (index > 0) { char c1 = seq.charAt(--index); if (isHighSurrogate(c1)) { return toCodePoint(c1, c2); } } } return c2; } //#endif /** * Cover the JDK 1.5 API, for convenience. Return the code point before index. *
Note: the semantics of this API is different from the related UTF16 * API. This examines only the characters at index-1 and index-2. * @param text the characters to check * @param index the index after the last or only char forming the code point * @return the code point before the index * @stable ICU 3.0 */ public static final int codePointBefore(char[] text, int index) { char c2 = text[--index]; if (isLowSurrogate(c2)) { if (index > 0) { char c1 = text[--index]; if (isHighSurrogate(c1)) { return toCodePoint(c1, c2); } } } return c2; } /** * Cover the JDK 1.5 API, for convenience. Return the code point before index. *
Note: the semantics of this API is different from the related UTF16 * API. This examines only the characters at index-1 and index-2. * @param text the characters to check * @param index the index after the last or only char forming the code point * @param limit the start of the valid text * @return the code point before the index * @stable ICU 3.0 */ public static final int codePointBefore(char[] text, int index, int limit) { if (index <= limit || limit < 0) { throw new IndexOutOfBoundsException(); } char c2 = text[--index]; if (isLowSurrogate(c2)) { if (index > limit) { char c1 = text[--index]; if (isHighSurrogate(c1)) { return toCodePoint(c1, c2); } } } return c2; } /** * Cover the JDK 1.5 API, for convenience. Writes the chars representing the * code point into the destination at the given index. * @param cp the code point to convert * @param dst the destination array into which to put the char(s) representing the code point * @param dstIndex the index at which to put the first (or only) char * @return the count of the number of chars written (1 or 2) * @throws IllegalArgumentException if cp is not a valid code point * @stable ICU 3.0 */ public static final int toChars(int cp, char[] dst, int dstIndex) { if (cp >= 0) { if (cp < MIN_SUPPLEMENTARY_CODE_POINT) { dst[dstIndex] = (char)cp; return 1; } if (cp <= MAX_CODE_POINT) { dst[dstIndex] = UTF16.getLeadSurrogate(cp); dst[dstIndex+1] = UTF16.getTrailSurrogate(cp); return 2; } } throw new IllegalArgumentException(); } /** * Cover the JDK 1.5 API, for convenience. Returns a char array * representing the code point. 
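The relationship between toChars, charCount and toCodePoint can be sketched as a round trip. The example assumes the java.lang.Character methods of the same names, which these methods are documented to cover; ToCharsSketch and roundTrip are hypothetical names.

```java
// Sketch using java.lang.Character, whose toChars, charCount and toCodePoint
// are the JDK 1.5 methods covered here.
final class ToCharsSketch {
    // Encodes cp as one or two chars, then decodes it again.
    static int roundTrip(int cp) {
        char[] units = Character.toChars(cp);   // throws IllegalArgumentException if cp is invalid
        return units.length == 1
                ? units[0]
                : Character.toCodePoint(units[0], units[1]);
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xFFFF, 0x10000, 0x1F600, 0x10FFFF };
        for (int cp : samples) {
            System.out.println("U+" + Integer.toHexString(cp) + " needs "
                    + Character.charCount(cp) + " char(s), round trip ok: "
                    + (roundTrip(cp) == cp));
        }
    }
}
```

Code points below MIN_SUPPLEMENTARY_CODE_POINT come back as a single char; everything up to MAX_CODE_POINT becomes a lead and trail surrogate.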
* @param cp the code point to convert * @return an array containing the char(s) representing the code point * @throws IllegalArgumentException if cp is not a valid code point * @stable ICU 3.0 */ public static final char[] toChars(int cp) { if (cp >= 0) { if (cp < MIN_SUPPLEMENTARY_CODE_POINT) { return new char[] { (char)cp }; } if (cp <= MAX_CODE_POINT) { return new char[] { UTF16.getLeadSurrogate(cp), UTF16.getTrailSurrogate(cp) }; } } throw new IllegalArgumentException(); } /** * Cover the JDK API, for convenience. Return a byte representing the directionality of * the character. *
Note: Unlike the JDK, this returns DIRECTIONALITY_LEFT_TO_RIGHT for undefined or * out-of-bounds characters.
Note: The return value must be * tested using the constants defined in {@link UCharacterEnums.ECharacterDirection} * since the values are different from the ones defined by java.lang.Character. * @param cp the code point to check * @return the directionality of the code point * @see #getDirection * @stable ICU 3.0 */ public static byte getDirectionality(int cp) { return (byte)getDirection(cp); } /** * Cover the JDK API, for convenience. Count the number of code points in the range of text. * @param text the characters to check * @param start the start of the range * @param limit the limit of the range * @return the number of code points in the range * @stable ICU 3.0 */ //#if defined(FOUNDATION10) || defined(J2SE13) //## public static int codePointCount(String text, int start, int limit) { //## if (start < 0 || limit < start || limit > text.length()) { //## throw new IndexOutOfBoundsException("start (" + start + //## ") or limit (" + limit + //## ") invalid or out of range 0, " + text.length()); //## } //## //## int len = limit - start; //## while (limit > start) { //## char ch = text.charAt(--limit); //## while (ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE && limit > start) { //## ch = text.charAt(--limit); //## if (ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE) { //## --len; //## break; //## } //## } //## } //## return len; //## } //## public static int codePointCount(StringBuffer text, int start, int limit) { //## return codePointCount(text.toString(), start, limit); //## } //#else //#if defined(ECLIPSE_FRAGMENT) //## public static int codePointCount(String text, int start, int limit) { //## return codePointCount((CharSequence)text, start, limit); //## } //## public static int codePointCount(StringBuffer text, int start, int limit) { //## return codePointCount((CharSequence)text, start, limit); //## } //#endif public static int codePointCount(CharSequence text, int start, int limit) { if (start < 0 || limit < start || limit > text.length()) { throw 
new IndexOutOfBoundsException("start (" + start + ") or limit (" + limit + ") invalid or out of range 0, " + text.length()); } int len = limit - start; while (limit > start) { char ch = text.charAt(--limit); while (ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE && limit > start) { ch = text.charAt(--limit); if (ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE) { --len; break; } } } return len; } //#endif /** * Cover the JDK API, for convenience. Count the number of code points in the range of text. * @param text the characters to check * @param start the start of the range * @param limit the limit of the range * @return the number of code points in the range * @stable ICU 3.0 */ public static int codePointCount(char[] text, int start, int limit) { if (start < 0 || limit < start || limit > text.length) { throw new IndexOutOfBoundsException("start (" + start + ") or limit (" + limit + ") invalid or out of range 0, " + text.length); } int len = limit - start; while (limit > start) { char ch = text[--limit]; while (ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE && limit > start) { ch = text[--limit]; if (ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE) { --len; break; } } } return len; } /** * Cover the JDK API, for convenience. Adjust the char index by a code point offset. 
* @param text the characters to check * @param index the index to adjust * @param codePointOffset the number of code points by which to offset the index * @return the adjusted index * @stable ICU 3.0 */ //#if defined(FOUNDATION10) || defined(J2SE13) //## public static int offsetByCodePoints(String text, int index, int codePointOffset) { //## if (index < 0 || index > text.length()) { //## throw new IndexOutOfBoundsException("index ( " + index + //## ") out of range 0, " + text.length()); //## } //## //## if (codePointOffset < 0) { //## while (++codePointOffset <= 0) { //## char ch = text.charAt(--index); //## while (ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE && index > 0) { //## ch = text.charAt(--index); //## if (ch < MIN_HIGH_SURROGATE || ch > MAX_HIGH_SURROGATE) { //## if (++codePointOffset > 0) { //## return index+1; //## } //## } //## } //## } //## } else { //## int limit = text.length(); //## while (--codePointOffset >= 0) { //## char ch = text.charAt(index++); //## while (ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE && index < limit) { //## ch = text.charAt(index++); //## if (ch < MIN_LOW_SURROGATE || ch > MAX_LOW_SURROGATE) { //## if (--codePointOffset < 0) { //## return index-1; //## } //## } //## } //## } //## } //## //## return index; //## } //## public static int offsetByCodePoints(StringBuffer text, int index, int codePointOffset) { //## return offsetByCodePoints(text.toString(), index, codePointOffset); //## } //#else //#if defined(ECLIPSE_FRAGMENT) //## public static int offsetByCodePoints(String text, int index, int codePointOffset) { //## return offsetByCodePoints((CharSequence)text, index, codePointOffset); //## } //## public static int offsetByCodePoints(StringBuffer text, int index, int codePointOffset) { //## return offsetByCodePoints((CharSequence)text, index, codePointOffset); //## } //#endif public static int offsetByCodePoints(CharSequence text, int index, int codePointOffset) { if (index < 0 || index > text.length()) { throw 
new IndexOutOfBoundsException("index ( " + index + ") out of range 0, " + text.length()); } if (codePointOffset < 0) { while (++codePointOffset <= 0) { char ch = text.charAt(--index); while (ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE && index > 0) { ch = text.charAt(--index); if (ch < MIN_HIGH_SURROGATE || ch > MAX_HIGH_SURROGATE) { if (++codePointOffset > 0) { return index+1; } } } } } else { int limit = text.length(); while (--codePointOffset >= 0) { char ch = text.charAt(index++); while (ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE && index < limit) { ch = text.charAt(index++); if (ch < MIN_LOW_SURROGATE || ch > MAX_LOW_SURROGATE) { if (--codePointOffset < 0) { return index-1; } } } } } return index; } //#endif /** * Cover the JDK API, for convenience. Adjust the char index by a code point offset. * @param text the characters to check * @param start the start of the range to check * @param count the length of the range to check * @param index the index to adjust * @param codePointOffset the number of code points by which to offset the index * @return the adjusted index * @stable ICU 3.0 */ public static int offsetByCodePoints(char[] text, int start, int count, int index, int codePointOffset) { int limit = start + count; if (start < 0 || limit < start || limit > text.length || index < start || index > limit) { throw new IndexOutOfBoundsException("index ( " + index + ") out of range " + start + ", " + limit + " in array 0, " + text.length); } if (codePointOffset < 0) { while (++codePointOffset <= 0) { char ch = text[--index]; if (index < start) { throw new IndexOutOfBoundsException("index ( " + index + ") < start (" + start + ")"); } while (ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE && index > start) { ch = text[--index]; if (ch < MIN_HIGH_SURROGATE || ch > MAX_HIGH_SURROGATE) { if (++codePointOffset > 0) { return index+1; } } } } } else { while (--codePointOffset >= 0) { char ch = text[index++]; if (index > limit) { throw new 
IndexOutOfBoundsException("index ( " + index + ") > limit (" + limit + ")"); } while (ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE && index < limit) { ch = text[index++]; if (ch < MIN_LOW_SURROGATE || ch > MAX_LOW_SURROGATE) { if (--codePointOffset < 0) { return index-1; } } } } } return index; } // protected data members -------------------------------------------- /** * Database storing the sets of character name */ static UCharacterName NAME_ = null; /** * Singleton object encapsulating the imported pnames.icu property aliases */ static UPropertyAliases PNAMES_ = null; // block to initialise name database and unicode 1.0 data static { try { PNAMES_ = new UPropertyAliases(); NAME_ = UCharacterName.getInstance(); } catch (IOException e) { // e.printStackTrace(); throw new MissingResourceException(e.getMessage(),"",""); //throw new RuntimeException(e.getMessage()); // DO NOT throw an exception // we might be building ICU modularly without names.icu and // pnames.icu } } // private variables ------------------------------------------------- /** * Database storing the sets of character property */ private static final UCharacterProperty PROPERTY_; /** * For optimization */ private static final char[] PROPERTY_TRIE_INDEX_; private static final char[] PROPERTY_TRIE_DATA_; private static final int PROPERTY_INITIAL_VALUE_; private static final UCaseProps gCsp; private static final UBiDiProps gBdp; // block to initialise character property database static { try { PROPERTY_ = UCharacterProperty.getInstance(); PROPERTY_TRIE_INDEX_ = PROPERTY_.m_trieIndex_; PROPERTY_TRIE_DATA_ = PROPERTY_.m_trieData_; PROPERTY_INITIAL_VALUE_ = PROPERTY_.m_trieInitialValue_; } catch (Exception e) { throw new MissingResourceException(e.getMessage(),"",""); } /* * In ICU4J 3.2, most Unicode properties were loaded from uprops.icu. * ICU4J 3.4 adds ucase.icu for case mapping properties and * ubidi.icu for bidi/shaping properties and * removes case/bidi/shaping properties from uprops.icu. 
* * Loading of uprops.icu was always done during class loading of UCharacter.class. * In order to maintain performance for all such properties, * ucase.icu and ubidi.icu are also loaded during class loading of UCharacter.class. * It will not fail if they are missing. * These data items are loaded early to avoid having to synchronize access to them, * for thread safety and performance. * * We try to load these data items at most once. * If it works, we use the resulting singleton object. * If it fails, then we get a dummy object, which always works unless * we are seriously out of memory. * After UCharacter.class loading, we have a never-changing pointer to either the * real singleton or the dummy. * * This method is used in Unicode properties APIs that * do not have a service object and also do not have an error code parameter. * Other API implementations get the singleton themselves * (synchronized), store it in the service object, and report errors. */ UCaseProps csp; try { csp=UCaseProps.getSingleton(); } catch(IOException e) { csp=UCaseProps.getDummy(); } gCsp=csp; UBiDiProps bdp; try { bdp=UBiDiProps.getSingleton(); } catch(IOException e) { bdp=UBiDiProps.getDummy(); } gBdp=bdp; } /** * To get the last character out from a data type */ private static final int LAST_CHAR_MASK_ = 0xFFFF; // /** // * To get the last byte out from a data type // */ // private static final int LAST_BYTE_MASK_ = 0xFF; // // /** // * Shift 16 bits // */ // private static final int SHIFT_16_ = 16; // // /** // * Shift 24 bits // */ // private static final int SHIFT_24_ = 24; // // /** // * Decimal radix // */ // private static final int DECIMAL_RADIX_ = 10; /** * No break space code point */ private static final int NO_BREAK_SPACE_ = 0xA0; /** * Figure space code point */ private static final int FIGURE_SPACE_ = 0x2007; /** * Narrow no break space code point */ private static final int NARROW_NO_BREAK_SPACE_ = 0x202F; /** * Ideographic number zero code point */ private static final 
int IDEOGRAPHIC_NUMBER_ZERO_ = 0x3007; /** * CJK Ideograph, First code point */ private static final int CJK_IDEOGRAPH_FIRST_ = 0x4e00; /** * CJK Ideograph, Second code point */ private static final int CJK_IDEOGRAPH_SECOND_ = 0x4e8c; /** * CJK Ideograph, Third code point */ private static final int CJK_IDEOGRAPH_THIRD_ = 0x4e09; /** * CJK Ideograph, Fourth code point */ private static final int CJK_IDEOGRAPH_FOURTH_ = 0x56d8; /** * CJK Ideograph, FIFTH code point */ private static final int CJK_IDEOGRAPH_FIFTH_ = 0x4e94; /** * CJK Ideograph, Sixth code point */ private static final int CJK_IDEOGRAPH_SIXTH_ = 0x516d; /** * CJK Ideograph, Seventh code point */ private static final int CJK_IDEOGRAPH_SEVENTH_ = 0x4e03; /** * CJK Ideograph, Eighth code point */ private static final int CJK_IDEOGRAPH_EIGHTH_ = 0x516b; /** * CJK Ideograph, Nineth code point */ private static final int CJK_IDEOGRAPH_NINETH_ = 0x4e5d; /** * Application Program command code point */ private static final int APPLICATION_PROGRAM_COMMAND_ = 0x009F; /** * Unit separator code point */ private static final int UNIT_SEPARATOR_ = 0x001F; /** * Delete code point */ private static final int DELETE_ = 0x007F; /* * ISO control character first range upper limit 0x0 - 0x1F */ //private static final int ISO_CONTROL_FIRST_RANGE_MAX_ = 0x1F; /** * Shift to get numeric type */ private static final int NUMERIC_TYPE_SHIFT_ = 5; /** * Mask to get numeric type */ private static final int NUMERIC_TYPE_MASK_ = 0x7 << NUMERIC_TYPE_SHIFT_; /* encoding of fractional and large numbers */ //private static final int MAX_SMALL_NUMBER=0xff; private static final int FRACTION_NUM_SHIFT=3; /* numerator: bits 7..3 */ private static final int FRACTION_DEN_MASK=7; /* denominator: bits 2..0 */ //private static final int FRACTION_MAX_NUM=31; private static final int FRACTION_DEN_OFFSET=2; /* denominator values are 2..9 */ //private static final int FRACTION_MIN_DEN=FRACTION_DEN_OFFSET; //private static final int 
FRACTION_MAX_DEN=FRACTION_MIN_DEN+FRACTION_DEN_MASK; private static final int LARGE_MANT_SHIFT=4; /* mantissa: bits 7..4 */ private static final int LARGE_EXP_MASK=0xf; /* exponent: bits 3..0 */ private static final int LARGE_EXP_OFFSET=2; /* regular exponents 2..17 */ private static final int LARGE_EXP_OFFSET_EXTRA=18; /* extra large exponents 18..33 */ //private static final int LARGE_MIN_EXP=LARGE_EXP_OFFSET; //private static final int LARGE_MAX_EXP=LARGE_MIN_EXP+LARGE_EXP_MASK; //private static final int LARGE_MAX_EXP_EXTRA=LARGE_EXP_OFFSET_EXTRA+LARGE_EXP_MASK; /** * Han digit characters */ private static final int CJK_IDEOGRAPH_COMPLEX_ZERO_ = 0x96f6; private static final int CJK_IDEOGRAPH_COMPLEX_ONE_ = 0x58f9; private static final int CJK_IDEOGRAPH_COMPLEX_TWO_ = 0x8cb3; private static final int CJK_IDEOGRAPH_COMPLEX_THREE_ = 0x53c3; private static final int CJK_IDEOGRAPH_COMPLEX_FOUR_ = 0x8086; private static final int CJK_IDEOGRAPH_COMPLEX_FIVE_ = 0x4f0d; private static final int CJK_IDEOGRAPH_COMPLEX_SIX_ = 0x9678; private static final int CJK_IDEOGRAPH_COMPLEX_SEVEN_ = 0x67d2; private static final int CJK_IDEOGRAPH_COMPLEX_EIGHT_ = 0x634c; private static final int CJK_IDEOGRAPH_COMPLEX_NINE_ = 0x7396; private static final int CJK_IDEOGRAPH_TEN_ = 0x5341; private static final int CJK_IDEOGRAPH_COMPLEX_TEN_ = 0x62fe; private static final int CJK_IDEOGRAPH_HUNDRED_ = 0x767e; private static final int CJK_IDEOGRAPH_COMPLEX_HUNDRED_ = 0x4f70; private static final int CJK_IDEOGRAPH_THOUSAND_ = 0x5343; private static final int CJK_IDEOGRAPH_COMPLEX_THOUSAND_ = 0x4edf; private static final int CJK_IDEOGRAPH_TEN_THOUSAND_ = 0x824c; private static final int CJK_IDEOGRAPH_HUNDRED_MILLION_ = 0x5104; // /** // * Zero Width Non Joiner. // * Equivalent to icu4c ZWNJ. // */ // private static final int ZERO_WIDTH_NON_JOINER_ = 0x200c; // /** // * Zero Width Joiner // * Equivalent to icu4c ZWJ. 
// */ // private static final int ZERO_WIDTH_JOINER_ = 0x200d; /* * Properties in vector word 2 * Bits * 31..26 reserved * 25..20 Line Break * 19..15 Sentence Break * 14..10 Word Break * 9.. 5 Grapheme Cluster Break * 4.. 0 Decomposition Type */ private static final int LB_MASK = 0x03f00000; private static final int LB_SHIFT = 20; private static final int LB_VWORD = 2; private static final int SB_MASK = 0x000f8000; private static final int SB_SHIFT = 15; private static final int WB_MASK = 0x00007c00; private static final int WB_SHIFT = 10; private static final int GCB_MASK = 0x000003e0; private static final int GCB_SHIFT = 5; /** * Integer properties mask for decomposition type. * Equivalent to icu4c UPROPS_DT_MASK. */ private static final int DECOMPOSITION_TYPE_MASK_ = 0x0000001f; /* * Properties in vector word 0 * Bits * 31..24 DerivedAge version major/minor one nibble each * 23..20 reserved * 19..17 East Asian Width * 16.. 8 UBlockCode * 7.. 0 UScriptCode */ /** * Integer properties mask and shift values for East Asian cell width. * Equivalent to icu4c UPROPS_EA_MASK */ private static final int EAST_ASIAN_MASK_ = 0x000e0000; /** * Integer properties mask and shift values for East Asian cell width. * Equivalent to icu4c UPROPS_EA_SHIFT */ private static final int EAST_ASIAN_SHIFT_ = 17; /** * Integer properties mask and shift values for blocks. * Equivalent to icu4c UPROPS_BLOCK_MASK */ private static final int BLOCK_MASK_ = 0x0001ff00; /** * Integer properties mask and shift values for blocks. * Equivalent to icu4c UPROPS_BLOCK_SHIFT */ private static final int BLOCK_SHIFT_ = 8; /** * Integer properties mask and shift values for scripts. 
* Equivalent to icu4c UPROPS_SCRIPT_MASK */ private static final int SCRIPT_MASK_ = 0x000000ff; // private constructor ----------------------------------------------- ///CLOVER:OFF /** * Private constructor to prevent instantiation */ private UCharacter() { } ///CLOVER:ON // private methods --------------------------------------------------- /** * Gets the digit values of characters like 'A' - 'Z', normal, * half-width and full-width. This method assumes that the other digit * characters are checked by the calling method. * @param ch character to test * @return -1 if ch is not a character of the form 'A' - 'Z', otherwise * its corresponding digit will be returned. */ private static int getEuropeanDigit(int ch) { if ((ch > 0x7a && ch < 0xff21) || ch < 0x41 || (ch > 0x5a && ch < 0x61) || ch > 0xff5a || (ch > 0xff3a && ch < 0xff41)) { return -1; } if (ch <= 0x7a) { // ch >= 0x41 or ch < 0x61 return ch + 10 - ((ch <= 0x5a) ? 0x41 : 0x61); } // ch >= 0xff21 if (ch <= 0xff3a) { return ch + 10 - 0xff21; } // ch >= 0xff41 && ch <= 0xff5a return ch + 10 - 0xff41; } /** * Gets the numeric type of the property argument * @param props 32 bit property * @return the numeric type */ private static int getNumericType(int props) { return (props & NUMERIC_TYPE_MASK_) >> NUMERIC_TYPE_SHIFT_; } /** * Gets the property value at the index. * This is optimized. * Note this is a little different from CharTrie: the index m_trieData_ * is never negative. * This is a duplicate of UCharacterProperty.getProperty. For optimization * purposes, this method calls the trie data directly instead of through * UCharacterProperty.getProperty. 
* @param ch code point whose property value is to be retrieved * @return property value of code point * @stable ICU 2.6 */ private static final int getProperty(int ch) { if (ch < UTF16.LEAD_SURROGATE_MIN_VALUE || (ch > UTF16.LEAD_SURROGATE_MAX_VALUE && ch < UTF16.SUPPLEMENTARY_MIN_VALUE)) { // BMP codepoint 0000..D7FF or DC00..FFFF try { // using try for ch < 0 is faster than using an if statement return PROPERTY_TRIE_DATA_[ (PROPERTY_TRIE_INDEX_[ch >> 5] << 2) + (ch & 0x1f)]; } catch (ArrayIndexOutOfBoundsException e) { return PROPERTY_INITIAL_VALUE_; } } if (ch <= UTF16.LEAD_SURROGATE_MAX_VALUE) { // lead surrogate D800..DBFF return PROPERTY_TRIE_DATA_[ (PROPERTY_TRIE_INDEX_[(0x2800 >> 5) + (ch >> 5)] << 2) + (ch & 0x1f)]; } // for optimization if (ch <= UTF16.CODEPOINT_MAX_VALUE) { // supplementary code point 10000..10FFFF // look at the construction of supplementary characters // trail forms the ends of it. return PROPERTY_.m_trie_.getSurrogateValue( UTF16.getLeadSurrogate(ch), (char)(ch & 0x3ff)); } // return m_dataOffset_ if there is an error, in this case we return // the default value: m_initialValue_ // we cannot assume that m_initialValue_ is at offset 0 // this is for optimization. return PROPERTY_INITIAL_VALUE_; } } icu4j-4.2/src/com/ibm/icu/lang/UScript.java0000644000175000017500000006164211361046134020414 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2001-2009 International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.lang; import com.ibm.icu.impl.ICUResourceBundle; import com.ibm.icu.impl.UCharacterProperty; import com.ibm.icu.util.ULocale; import com.ibm.icu.util.UResourceBundle; import java.util.Locale; import java.util.MissingResourceException; /** * A class to reflect UTR #24: Script Names * (based on ISO 15924:2000, "Code for the representation of names of * scripts"). UTR #24 describes the basis for a new Unicode data file, * Scripts.txt. * @stable ICU 2.4 */ public final class UScript { /** * Invalid code * @stable ICU 2.4 */ public static final int INVALID_CODE = -1; /** * Common * @stable ICU 2.4 */ public static final int COMMON = 0; /* Zyyy */ /** * Inherited * @stable ICU 2.4 */ public static final int INHERITED = 1; /* Qaai */ /** * Arabic * @stable ICU 2.4 */ public static final int ARABIC = 2; /* Arab */ /** * Armenian * @stable ICU 2.4 */ public static final int ARMENIAN = 3; /* Armn */ /** * Bengali * @stable ICU 2.4 */ public static final int BENGALI = 4; /* Beng */ /** * Bopomofo * @stable ICU 2.4 */ public static final int BOPOMOFO = 5; /* Bopo */ /** * Cherokee * @stable ICU 2.4 */ public static final int CHEROKEE = 6; /* Cher */ /** * Coptic * @stable ICU 2.4 */ public static final int COPTIC = 7; /* Qaac */ /** * Cyrillic * @stable ICU 2.4 */ public static final int CYRILLIC = 8; /* Cyrl (Cyrs) */ /** * Deseret * @stable ICU 2.4 */ public static final int DESERET = 9; /* Dsrt */ /** * Devanagari * @stable ICU 2.4 */ public static final int DEVANAGARI = 10; /* Deva */ /** * Ethiopic * @stable ICU 2.4 */ public static final int ETHIOPIC = 11; /* Ethi */ /** * Georgian * @stable ICU 2.4 */ public static final int GEORGIAN = 12; /* Geor (Geon; Geoa) */ /** * Gothic * @stable ICU 2.4 */ public static final int GOTHIC = 13; /* Goth */ /** * Greek * @stable ICU 2.4 */ public static final int GREEK = 14; /* Grek */ /** * Gujarati * @stable ICU 2.4 */ public 
static final int GUJARATI = 15; /* Gujr */ /** * Gurmukhi * @stable ICU 2.4 */ public static final int GURMUKHI = 16; /* Guru */ /** * Han * @stable ICU 2.4 */ public static final int HAN = 17; /* Hani */ /** * Hangul * @stable ICU 2.4 */ public static final int HANGUL = 18; /* Hang */ /** * Hebrew * @stable ICU 2.4 */ public static final int HEBREW = 19; /* Hebr */ /** * Hiragana * @stable ICU 2.4 */ public static final int HIRAGANA = 20; /* Hira */ /** * Kannada * @stable ICU 2.4 */ public static final int KANNADA = 21; /* Knda */ /** * Katakana * @stable ICU 2.4 */ public static final int KATAKANA = 22; /* Kana */ /** * Khmer * @stable ICU 2.4 */ public static final int KHMER = 23; /* Khmr */ /** * Lao * @stable ICU 2.4 */ public static final int LAO = 24; /* Laoo */ /** * Latin * @stable ICU 2.4 */ public static final int LATIN = 25; /* Latn (Latf; Latg) */ /** * Malayalam * @stable ICU 2.4 */ public static final int MALAYALAM = 26; /* Mlym */ /** * Mongolian * @stable ICU 2.4 */ public static final int MONGOLIAN = 27; /* Mong */ /** * Myanmar * @stable ICU 2.4 */ public static final int MYANMAR = 28; /* Mymr */ /** * Ogham * @stable ICU 2.4 */ public static final int OGHAM = 29; /* Ogam */ /** * Old Italic * @stable ICU 2.4 */ public static final int OLD_ITALIC = 30; /* Ital */ /** * Oriya * @stable ICU 2.4 */ public static final int ORIYA = 31; /* Orya */ /** * Runic * @stable ICU 2.4 */ public static final int RUNIC = 32; /* Runr */ /** * Sinhala * @stable ICU 2.4 */ public static final int SINHALA = 33; /* Sinh */ /** * Syriac * @stable ICU 2.4 */ public static final int SYRIAC = 34; /* Syrc (Syrj; Syrn; Syre) */ /** * Tamil * @stable ICU 2.4 */ public static final int TAMIL = 35; /* Taml */ /** * Telugu * @stable ICU 2.4 */ public static final int TELUGU = 36; /* Telu */ /** * Thaana * @stable ICU 2.4 */ public static final int THAANA = 37; /* Thaa */ /** * Thai * @stable ICU 2.4 */ public static final int THAI = 38; /* Thai */ /** * Tibetan * @stable ICU 
2.4 */ public static final int TIBETAN = 39; /* Tibt */ /** * Unified Canadian Aboriginal Syllabics * @stable ICU 2.6 */ public static final int CANADIAN_ABORIGINAL = 40; /* Cans */ /** * Unified Canadian Aboriginal Syllabics (alias) * @stable ICU 2.4 */ public static final int UCAS = CANADIAN_ABORIGINAL; /* Cans */ /** * Yi syllables * @stable ICU 2.4 */ public static final int YI = 41; /* Yiii */ /** * Tagalog * @stable ICU 2.4 */ public static final int TAGALOG = 42; /* Tglg */ /** * Hanunoo * @stable ICU 2.4 */ public static final int HANUNOO = 43; /* Hano */ /** * Buhid * @stable ICU 2.4 */ public static final int BUHID = 44; /* Buhd */ /** * Tagbanwa * @stable ICU 2.4 */ public static final int TAGBANWA = 45; /* Tagb */ /** * Braille * Script in Unicode 4 * @stable ICU 2.6 * */ public static final int BRAILLE = 46; /* Brai */ /** * Cypriot * Script in Unicode 4 * @stable ICU 2.6 * */ public static final int CYPRIOT = 47; /* Cprt */ /** * Limbu * Script in Unicode 4 * @stable ICU 2.6 * */ public static final int LIMBU = 48; /* Limb */ /** * Linear B * Script in Unicode 4 * @stable ICU 2.6 * */ public static final int LINEAR_B = 49; /* Linb */ /** * Osmanya * Script in Unicode 4 * @stable ICU 2.6 * */ public static final int OSMANYA = 50; /* Osma */ /** * Shavian * Script in Unicode 4 * @stable ICU 2.6 * */ public static final int SHAVIAN = 51; /* Shaw */ /** * Tai Le * Script in Unicode 4 * @stable ICU 2.6 * */ public static final int TAI_LE = 52; /* Tale */ /** * Ugaritic * Script in Unicode 4 * @stable ICU 2.6 * */ public static final int UGARITIC = 53; /* Ugar */ /** * Script in Unicode 4.0.1 * @stable ICU 3.0 */ public static final int KATAKANA_OR_HIRAGANA = 54; /*Hrkt */ /** * Script in Unicode 4.1 * @stable ICU 3.4 */ public static final int BUGINESE = 55; /* Bugi */ /** * Script in Unicode 4.1 * @stable ICU 3.4 */ public static final int GLAGOLITIC = 56; /* Glag */ /** * Script in Unicode 4.1 * @stable ICU 3.4 */ public static final int KHAROSHTHI = 57; /* 
Khar */ /** * Script in Unicode 4.1 * @stable ICU 3.4 */ public static final int SYLOTI_NAGRI = 58; /* Sylo */ /** * Script in Unicode 4.1 * @stable ICU 3.4 */ public static final int NEW_TAI_LUE = 59; /* Talu */ /** * Script in Unicode 4.1 * @stable ICU 3.4 */ public static final int TIFINAGH = 60; /* Tfng */ /** * Script in Unicode 4.1 * @stable ICU 3.4 */ public static final int OLD_PERSIAN = 61; /* Xpeo */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int BALINESE = 62; /* Bali */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int BATAK = 63; /* Batk */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int BLISSYMBOLS = 64; /* Blis */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int BRAHMI = 65; /* Brah */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int CHAM = 66; /* Cham */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int CIRTH = 67; /* Cirt */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int OLD_CHURCH_SLAVONIC_CYRILLIC = 68; /* Cyrs */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int DEMOTIC_EGYPTIAN = 69; /* Egyd */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int HIERATIC_EGYPTIAN = 70; /* Egyh */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int EGYPTIAN_HIEROGLYPHS = 71; /* Egyp */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int KHUTSURI = 72; /* Geok */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int SIMPLIFIED_HAN = 73; /* Hans */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int TRADITIONAL_HAN = 74; /* Hant */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int PAHAWH_HMONG = 75; /* Hmng */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int OLD_HUNGARIAN = 76; /* Hung */ /** * ISO 15924 script code * @stable 
ICU 3.6 */ public static final int HARAPPAN_INDUS = 77; /* Inds */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int JAVANESE = 78; /* Java */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int KAYAH_LI = 79; /* Kali */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int LATIN_FRAKTUR = 80; /* Latf */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int LATIN_GAELIC = 81; /* Latg */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int LEPCHA = 82; /* Lepc */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int LINEAR_A = 83; /* Lina */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int MANDAEAN = 84; /* Mand */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int MAYAN_HIEROGLYPHS = 85; /* Maya */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int MEROITIC = 86; /* Mero */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int NKO = 87; /* Nkoo */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int ORKHON = 88; /* Orkh */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int OLD_PERMIC = 89; /* Perm */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int PHAGS_PA = 90; /* Phag */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int PHOENICIAN = 91; /* Phnx */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int PHONETIC_POLLARD = 92; /* Plrd */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int RONGORONGO = 93; /* Roro */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int SARATI = 94; /* Sara */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int ESTRANGELO_SYRIAC = 95; /* Syre */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int WESTERN_SYRIAC = 96; /* Syrj */ /** * ISO 15924 
script code * @stable ICU 3.6 */ public static final int EASTERN_SYRIAC = 97; /* Syrn */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int TENGWAR = 98; /* Teng */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int VAI = 99; /* Vaii */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int VISIBLE_SPEECH = 100;/* Visp */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int CUNEIFORM = 101;/* Xsux */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int UNWRITTEN_LANGUAGES = 102;/* Zxxx */ /** * ISO 15924 script code * @stable ICU 3.6 */ public static final int UNKNOWN = 103;/* Zzzz */ /* Unknown="Code for uncoded script", for unassigned code points */ /* Private use codes from Qaaa - Qabx are not supported*/ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int CARIAN = 104;/* Cari */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int JAPANESE = 105;/* Jpan */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int LANNA = 106;/* Lana */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int LYCIAN = 107;/* Lyci */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int LYDIAN = 108;/* Lydi */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int OL_CHIKI = 109;/* Olck */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int REJANG = 110;/* Rjng */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int SAURASHTRA = 111;/* Saur */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int SIGN_WRITING = 112;/* Sgnw */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int SUNDANESE = 113;/* Sund */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int MOON = 114;/* Moon */ /** * ISO 15924 script code * @stable ICU 3.8 */ public static final int MEITEI_MAYEK = 115;/* 
Mtei */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int IMPERIAL_ARAMAIC = 116;/* Armi */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int AVESTAN = 117;/* Avst */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int CHAKMA = 118;/* Cakm */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int KOREAN = 119;/* Kore */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int KAITHI = 120;/* Kthi */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int MANICHAEAN = 121;/* Mani */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int INSCRIPTIONAL_PAHLAVI = 122;/* Phli */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int PSALTER_PAHLAVI = 123;/* Phlp */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int BOOK_PAHLAVI = 124;/* Phlv */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int INSCRIPTIONAL_PARTHIAN = 125;/* Prti */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int SAMARITAN = 126;/* Samr */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int TAI_VIET = 127;/* Tavt */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int MATHEMATICAL_NOTATION = 128;/* Zmth */ /** * ISO 15924 script code * @stable ICU 4.0 */ public static final int SYMBOLS = 129;/* Zsym */ /** * Limit * @stable ICU 2.4 */ public static final int CODE_LIMIT = 130; private static final int SCRIPT_MASK = 0x0000007f; private static final UCharacterProperty prop= UCharacterProperty.getInstance(); private static final String kLocaleScript = "LocaleScript"; //private static final String INVALID_NAME = "Invalid"; /** * Helper function to find the code from locale. * @param locale The locale. 
*/ private static int[] findCodeFromLocale(ULocale locale) { ICUResourceBundle rb; try { rb = (ICUResourceBundle)UResourceBundle.getBundleInstance(ICUResourceBundle.ICU_BASE_NAME, locale); } catch (MissingResourceException e) { return null; } // if rb is not a strict fallback of the requested locale, return null //if(!LocaleUtility.isFallbackOf(rb.getULocale().toString(), locale.toString())){ // return null; //} //non existent locale check if(rb.getLoadingStatus()==ICUResourceBundle.FROM_DEFAULT && ! locale.equals(ULocale.getDefault())){ return null; } UResourceBundle sub = rb.get(kLocaleScript); int[] result = new int[sub.getSize()]; int w = 0; for (int i = 0; i < result.length; ++i) { int code = UCharacter.getPropertyValueEnum(UProperty.SCRIPT, sub.getString(i)); result[w++] = code; } if (w < result.length) { throw new IllegalStateException("bad locale data, listed " + result.length + " scripts but found only " + w); } return result; } /** * Gets the script codes associated with the given locale or ISO 15924 abbreviation or name. * Returns MALAYALAM given "Malayalam" OR "Mlym". * Returns LATIN given "en" OR "en_US" * @param locale Locale * @return The script codes array, or null if the code cannot be found. * @stable ICU 2.4 */ public static final int[] getCode(Locale locale){ return findCodeFromLocale(ULocale.forLocale(locale)); } /** * Gets the script codes associated with the given locale or ISO 15924 abbreviation or name. * Returns MALAYALAM given "Malayalam" OR "Mlym". * Returns LATIN given "en" OR "en_US" * @param locale ULocale * @return The script codes array, or null if the code cannot be found. * @stable ICU 3.0 */ public static final int[] getCode(ULocale locale){ return findCodeFromLocale(locale); } /** * Gets the script codes associated with the given locale or ISO 15924 abbreviation or name. * Returns MALAYALAM given "Malayalam" OR "Mlym". * Returns LATIN given "en" OR "en_US" * *

Note: To search by short or long script alias only, use * UCharacter.getPropertyValueEnum(UProperty.SCRIPT, alias) * instead. This does a fast lookup without accessing the locale * data. * @param nameOrAbbrOrLocale name of the script or ISO 15924 code or locale * @return The script codes array, or null if the code cannot be found. * @stable ICU 2.4 */ public static final int[] getCode(String nameOrAbbrOrLocale){ try { return new int[] { UCharacter.getPropertyValueEnum(UProperty.SCRIPT, nameOrAbbrOrLocale) }; } catch (IllegalArgumentException e) { return findCodeFromLocale(new ULocale(nameOrAbbrOrLocale)); } } /** * Gets the script code associated with the given ISO 15924 abbreviation or name. * Returns MALAYALAM given "Malayalam" OR "Mlym". * * @param nameOrAbbr name of the script or ISO 15924 code * @return The script code value or INVALID_CODE if the code cannot be found. * @internal * @deprecated This API is ICU internal only. */ public static final int getCodeFromName(String nameOrAbbr) { try { return UCharacter.getPropertyValueEnum(UProperty.SCRIPT, nameOrAbbr); } catch (IllegalArgumentException e) { return INVALID_CODE; } } /** * Gets the script code associated with the given codepoint. * Returns UScript.MALAYALAM given 0x0D02 * @param codepoint UChar32 codepoint * @return The script code * @stable ICU 2.4 */ public static final int getScript(int codepoint){ if (codepoint >= UCharacter.MIN_VALUE && codepoint <= UCharacter.MAX_VALUE) { return (prop.getAdditional(codepoint,0) & SCRIPT_MASK); }else{ throw new IllegalArgumentException(Integer.toString(codepoint)); } } /** * Gets a script name associated with the given script code. 
* Returns "Malayalam" given MALAYALAM * @param scriptCode int script code * @return script name as a string in full as given in TR#24 * @stable ICU 2.4 */ public static final String getName(int scriptCode){ return UCharacter.getPropertyValueName(UProperty.SCRIPT, scriptCode, UProperty.NameChoice.LONG); } /** * Gets the abbreviated script name associated with the given script code. * Returns "Mlym" given MALAYALAM * @param scriptCode int script code * @return script abbreviated name as a string as given in TR#24 * @stable ICU 2.4 */ public static final String getShortName(int scriptCode){ return UCharacter.getPropertyValueName(UProperty.SCRIPT, scriptCode, UProperty.NameChoice.SHORT); } ///CLOVER:OFF /** * Private Constructor. Never default construct */ private UScript(){} ///CLOVER:ON } icu4j-4.2/src/com/ibm/icu/lang/package.html0000644000175000017500000000131711361046134020432 0ustar twernertwerner C:ICU4J .lang Package Overview

Enhanced character property and surrogate support.

UCharacter supports all characters and properties defined in the latest version of Unicode, including properties of surrogate characters. It provides new API for querying surrogate characters (represented as int) and also supports the java.lang.Character API. UScript and UScriptRun provide information about scripts, which is not available through the Java APIs.
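A minimal sketch of what "surrogate characters (represented as int)" means in practice. It uses only `java.lang.Character` and `String` methods from the JDK so that it compiles without ICU4J on the classpath; the names `CodePointSketch` and `codePoints` are hypothetical and are not part of ICU4J.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointSketch {
    /** Splits a UTF-16 string into code points, joining each surrogate pair into one int. */
    public static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);      // reads one or two chars
            result.add(cp);
            i += Character.charCount(cp);   // 2 for supplementary code points, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        // 'A' followed by U+1D538, which UTF-16 encodes as the pair D835 DD38
        String text = "A\uD835\uDD38";
        for (int cp : codePoints(text)) {
            System.out.println(Integer.toHexString(cp)
                    + " supplementary=" + Character.isSupplementaryCodePoint(cp));
        }
    }
}
```

The string has three `char` values but only two code points, which is why property lookups are keyed by `int` rather than `char`.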

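The script-run idea behind UScriptRun can be sketched without ICU4J. The example below is a stand-in under stated assumptions, not the ICU implementation: it uses `Character.UnicodeScript` (available in JDK 7 and later, so newer than the overview above) in place of UScript, folds COMMON and INHERITED characters into the neighbouring run, and leaves out paired-punctuation matching. `ScriptRunSketch` and `runs` are hypothetical names.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

public class ScriptRunSketch {
    /** True for scripts that adopt the script of their neighbours. */
    private static boolean isWeak(UnicodeScript s) {
        return s == UnicodeScript.COMMON || s == UnicodeScript.INHERITED;
    }

    /** Returns one "SCRIPT:start-limit" string per run; limit is exclusive. */
    public static List<String> runs(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int start = i;
            UnicodeScript run = UnicodeScript.COMMON;
            while (i < text.length()) {
                int cp = text.codePointAt(i);
                UnicodeScript sc = UnicodeScript.of(cp);
                if (isWeak(run) && !isWeak(sc)) {
                    run = sc;   // the first strong script names the run
                } else if (!isWeak(run) && !isWeak(sc) && sc != run) {
                    break;      // a different strong script starts the next run
                }
                i += Character.charCount(cp);
            }
            out.add(run + ":" + start + "-" + i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Latin letters, punctuation, then Cyrillic letters and punctuation
        for (String r : runs("abc, \u0430\u0431\u0432!")) {
            System.out.println(r);
        }
    }
}
```

Punctuation and spaces attach to the preceding strong run, and leading weak characters attach to the run that follows them.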
icu4j-4.2/src/com/ibm/icu/lang/UCharacterNameIterator.java0000644000175000017500000003016311361046134023351 0ustar twernertwerner/* ****************************************************************************** * Copyright (C) 1996-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ****************************************************************************** */ package com.ibm.icu.lang; import com.ibm.icu.util.ValueIterator; import com.ibm.icu.impl.UCharacterName; import com.ibm.icu.impl.UCharacterNameChoice; /** *

Class enabling iteration of the codepoints and their names.

*

The result of each iteration contains a valid codepoint that has a valid * name.

*

See UCharacter.getNameIterator() for an example of use.

* @author synwee * @since release 2.1, March 5 2002 */ class UCharacterNameIterator implements ValueIterator { // public methods ---------------------------------------------------- /** *

Gets the next result for this iteration and returns * true if we are not at the end of the iteration, false otherwise.

*

If the returned boolean is false, the contents of element will not * be updated.

* @param element for storing the result codepoint and name * @return true if we are not at the end of the iteration, false otherwise. * @see Element */ public boolean next(ValueIterator.Element element) { if (m_current_ >= m_limit_) { return false; } if (m_choice_ != UCharacterNameChoice.UNICODE_10_CHAR_NAME) { int length = m_name_.getAlgorithmLength(); if (m_algorithmIndex_ < length) { while (m_algorithmIndex_ < length) { // find the algorithm range that could contain m_current_ if (m_algorithmIndex_ < 0 || m_name_.getAlgorithmEnd(m_algorithmIndex_) < m_current_) { m_algorithmIndex_ ++; } else { break; } } if (m_algorithmIndex_ < length) { // interleave the data-driven ones with the algorithmic ones // iterate over all algorithmic ranges; assume that they are // in ascending order int start = m_name_.getAlgorithmStart(m_algorithmIndex_); if (m_current_ < start) { // this should get rid of those codepoints that are not // in the algorithmic range int end = start; if (m_limit_ <= start) { end = m_limit_; } if (!iterateGroup(element, end)) { m_current_ ++; return true; } } if (m_current_ >= m_limit_) { // after iterateGroup fails, current codepoint may be // greater than limit return false; } element.integer = m_current_; element.value = m_name_.getAlgorithmName(m_algorithmIndex_, m_current_); // reset the group index if we are in the algorithmic names m_groupIndex_ = -1; m_current_ ++; return true; } } } // enumerate the character names after the last algorithmic range if (!iterateGroup(element, m_limit_)) { m_current_ ++; return true; } else if (m_choice_ == UCharacterNameChoice.EXTENDED_CHAR_NAME) { if (!iterateExtended(element, m_limit_)) { m_current_ ++; return true; } } return false; } /** *

Resets the iterator to start iterating from the integer index * UCharacter.MIN_VALUE or X if a setRange(X, Y) has been called previously. *

*/ public void reset() { m_current_ = m_start_; m_groupIndex_ = -1; m_algorithmIndex_ = -1; } /** *

Restricts the range of integers to iterate and resets the iteration * to begin at the index argument start.

*

If setRange(start, end) is not performed before next(element) is * called, the iteration will start from the integer index * UCharacter.MIN_VALUE and end at UCharacter.MAX_VALUE.

*

* If this range is set outside the range of UCharacter.MIN_VALUE and * UCharacter.MAX_VALUE, next(element) will always return false. *

* @param start first integer in range to iterate * @param limit 1 integer after the last integer in range * @exception IllegalArgumentException thrown when attempting to set an * illegal range. E.g limit <= start */ public void setRange(int start, int limit) { if (start >= limit) { throw new IllegalArgumentException( "start or limit has to be valid Unicode codepoints and start < limit"); } if (start < UCharacter.MIN_VALUE) { m_start_ = UCharacter.MIN_VALUE; } else { m_start_ = start; } if (limit > UCharacter.MAX_VALUE + 1) { m_limit_ = UCharacter.MAX_VALUE + 1; } else { m_limit_ = limit; } m_current_ = m_start_; } // protected constructor --------------------------------------------- /** * Constructor * @param name name data * @param choice name choice from the class * com.ibm.icu.lang.UCharacterNameChoice */ protected UCharacterNameIterator(UCharacterName name, int choice) { if(name==null){ throw new IllegalArgumentException("UCharacterName name argument cannot be null. Missing unames.icu?"); } m_name_ = name; // no explicit choice in UCharacter so no checks on choice m_choice_ = choice; m_start_ = UCharacter.MIN_VALUE; m_limit_ = UCharacter.MAX_VALUE + 1; m_current_ = m_start_; } // private data members --------------------------------------------- /** * Name data */ private UCharacterName m_name_; /** * Name choice */ private int m_choice_; /** * Start iteration range */ private int m_start_; /** * End + 1 iteration range */ private int m_limit_; /** * Current codepoint */ private int m_current_; /** * Group index */ private int m_groupIndex_ = -1; /** * Algorithm index */ private int m_algorithmIndex_ = -1; /** * Group use */ private static char GROUP_OFFSETS_[] = new char[UCharacterName.LINES_PER_GROUP_ + 1]; private static char GROUP_LENGTHS_[] = new char[UCharacterName.LINES_PER_GROUP_ + 1]; // private methods -------------------------------------------------- /** * Group name iteration, iterate all the names in the current 32-group and * returns the first 
codepoint that has a valid name. * @param result stores the result codepoint and name * @param limit last codepoint + 1 in range to search * @return false if a codepoint with a name is found in group and we can * bail from further iteration, true to continue on with the * iteration */ private boolean iterateSingleGroup(ValueIterator.Element result, int limit) { synchronized(GROUP_OFFSETS_) { synchronized(GROUP_LENGTHS_) { int index = m_name_.getGroupLengths(m_groupIndex_, GROUP_OFFSETS_, GROUP_LENGTHS_); while (m_current_ < limit) { int offset = UCharacterName.getGroupOffset(m_current_); String name = m_name_.getGroupName( index + GROUP_OFFSETS_[offset], GROUP_LENGTHS_[offset], m_choice_); if ((name == null || name.length() == 0) && m_choice_ == UCharacterNameChoice.EXTENDED_CHAR_NAME) { name = m_name_.getExtendedName(m_current_); } if (name != null && name.length() > 0) { result.integer = m_current_; result.value = name; return false; } ++ m_current_; } } } return true; } /** * Group name iteration, iterate all the names in the current 32-group and * returns the first codepoint that has a valid name. 
* @param result stores the result codepoint and name * @param limit last codepoint + 1 in range to search * @return false if a codepoint with a name is found in group and we can * bail from further iteration, true to continue on with the * iteration */ private boolean iterateGroup(ValueIterator.Element result, int limit) { if (m_groupIndex_ < 0) { m_groupIndex_ = m_name_.getGroup(m_current_); } while (m_groupIndex_ < m_name_.m_groupcount_ && m_current_ < limit) { // iterate till the last group or the last codepoint int startMSB = UCharacterName.getCodepointMSB(m_current_); int gMSB = m_name_.getGroupMSB(m_groupIndex_); // can be -1 if (startMSB == gMSB) { if (startMSB == UCharacterName.getCodepointMSB(limit - 1)) { // if start and limit - 1 are in the same group, then enumerate // only in that one return iterateSingleGroup(result, limit); } // enumerate characters in the partial start group // if (m_name_.getGroupOffset(m_current_) != 0) { if (!iterateSingleGroup(result, UCharacterName.getGroupLimit(gMSB))) { return false; } ++ m_groupIndex_; // continue with the next group } else if (startMSB > gMSB) { // make sure that we start enumerating with the first group // after start m_groupIndex_ ++; } else { int gMIN = UCharacterName.getGroupMin(gMSB); if (gMIN > limit) { gMIN = limit; } if (m_choice_ == UCharacterNameChoice.EXTENDED_CHAR_NAME) { if (!iterateExtended(result, gMIN)) { return false; } } m_current_ = gMIN; } } return true; } /** * Iterate extended names. 
* @param result stores the result codepoint and name * @param limit last codepoint + 1 in range to search * @return false if a codepoint with a name is found and we can * bail from further iteration, true to continue on with the * iteration (this will always be false for valid codepoints) */ private boolean iterateExtended(ValueIterator.Element result, int limit) { while (m_current_ < limit) { String name = m_name_.getExtendedOr10Name(m_current_); if (name != null && name.length() > 0) { result.integer = m_current_; result.value = name; return false; } ++ m_current_; } return true; } } icu4j-4.2/src/com/ibm/icu/lang/UCharacterEnums.java0000644000175000017500000003407611361046134022055 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2004-2007, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.lang; /** * A container for the different 'enumerated types' used by UCharacter. * @stable ICU 3.0 */ public class UCharacterEnums { /** This is just a namespace, it is not instantiatable. */ ///CLOVER:OFF private UCharacterEnums() {} /** * 'Enum' for the CharacterCategory constants. These constants are * compatible in name but not in value with those defined in * java.lang.Character. 
* @see UCharacterCategory * @stable ICU 3.0 */ public static interface ECharacterCategory { /** * Unassigned character type * @stable ICU 2.1 */ public static final byte UNASSIGNED = 0; /** * Character type Cn * Not Assigned (no characters in [UnicodeData.txt] have this property) * @stable ICU 2.6 */ public static final byte GENERAL_OTHER_TYPES = 0; /** * Character type Lu * @stable ICU 2.1 */ public static final byte UPPERCASE_LETTER = 1; /** * Character type Ll * @stable ICU 2.1 */ public static final byte LOWERCASE_LETTER = 2; /** * Character type Lt * @stable ICU 2.1 */ public static final byte TITLECASE_LETTER = 3; /** * Character type Lm * @stable ICU 2.1 */ public static final byte MODIFIER_LETTER = 4; /** * Character type Lo * @stable ICU 2.1 */ public static final byte OTHER_LETTER = 5; /** * Character type Mn * @stable ICU 2.1 */ public static final byte NON_SPACING_MARK = 6; /** * Character type Me * @stable ICU 2.1 */ public static final byte ENCLOSING_MARK = 7; /** * Character type Mc * @stable ICU 2.1 */ public static final byte COMBINING_SPACING_MARK = 8; /** * Character type Nd * @stable ICU 2.1 */ public static final byte DECIMAL_DIGIT_NUMBER = 9; /** * Character type Nl * @stable ICU 2.1 */ public static final byte LETTER_NUMBER = 10; /** * Character type No * @stable ICU 2.1 */ public static final byte OTHER_NUMBER = 11; /** * Character type Zs * @stable ICU 2.1 */ public static final byte SPACE_SEPARATOR = 12; /** * Character type Zl * @stable ICU 2.1 */ public static final byte LINE_SEPARATOR = 13; /** * Character type Zp * @stable ICU 2.1 */ public static final byte PARAGRAPH_SEPARATOR = 14; /** * Character type Cc * @stable ICU 2.1 */ public static final byte CONTROL = 15; /** * Character type Cf * @stable ICU 2.1 */ public static final byte FORMAT = 16; /** * Character type Co * @stable ICU 2.1 */ public static final byte PRIVATE_USE = 17; /** * Character type Cs * @stable ICU 2.1 */ public static final byte SURROGATE = 18; /** * Character 
type Pd * @stable ICU 2.1 */ public static final byte DASH_PUNCTUATION = 19; /** * Character type Ps * @stable ICU 2.1 */ public static final byte START_PUNCTUATION = 20; /** * Character type Pe * @stable ICU 2.1 */ public static final byte END_PUNCTUATION = 21; /** * Character type Pc * @stable ICU 2.1 */ public static final byte CONNECTOR_PUNCTUATION = 22; /** * Character type Po * @stable ICU 2.1 */ public static final byte OTHER_PUNCTUATION = 23; /** * Character type Sm * @stable ICU 2.1 */ public static final byte MATH_SYMBOL = 24; /** * Character type Sc * @stable ICU 2.1 */ public static final byte CURRENCY_SYMBOL = 25; /** * Character type Sk * @stable ICU 2.1 */ public static final byte MODIFIER_SYMBOL = 26; /** * Character type So * @stable ICU 2.1 */ public static final byte OTHER_SYMBOL = 27; /** * Character type Pi * @see #INITIAL_QUOTE_PUNCTUATION * @stable ICU 2.1 */ public static final byte INITIAL_PUNCTUATION = 28; /** * Character type Pi * This name is compatible with java.lang.Character's name for this type. * @see #INITIAL_PUNCTUATION * @stable ICU 2.8 */ public static final byte INITIAL_QUOTE_PUNCTUATION = 28; /** * Character type Pf * @see #FINAL_QUOTE_PUNCTUATION * @stable ICU 2.1 */ public static final byte FINAL_PUNCTUATION = 29; /** * Character type Pf * This name is compatible with java.lang.Character's name for this type. * @see #FINAL_PUNCTUATION * @stable ICU 2.8 */ public static final byte FINAL_QUOTE_PUNCTUATION = 29; /** * Character type count * @stable ICU 2.1 */ public static final byte CHAR_CATEGORY_COUNT = 30; } /** * 'Enum' for the CharacterDirection constants. There are two sets * of names, those used in ICU, and those used in the JDK. The * JDK constants are compatible in name but not in value * with those defined in java.lang.Character. 
* @see UCharacterDirection * @stable ICU 3.0 */ public static interface ECharacterDirection { /** * Directional type L * @stable ICU 2.1 */ public static final int LEFT_TO_RIGHT = 0; /** * JDK-compatible synonym for LEFT_TO_RIGHT. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_LEFT_TO_RIGHT = (byte)LEFT_TO_RIGHT; /** * Directional type R * @stable ICU 2.1 */ public static final int RIGHT_TO_LEFT = 1; /** * JDK-compatible synonym for RIGHT_TO_LEFT. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_RIGHT_TO_LEFT = (byte)RIGHT_TO_LEFT; /** * Directional type EN * @stable ICU 2.1 */ public static final int EUROPEAN_NUMBER = 2; /** * JDK-compatible synonym for EUROPEAN_NUMBER. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_EUROPEAN_NUMBER = (byte)EUROPEAN_NUMBER; /** * Directional type ES * @stable ICU 2.1 */ public static final int EUROPEAN_NUMBER_SEPARATOR = 3; /** * JDK-compatible synonym for EUROPEAN_NUMBER_SEPARATOR. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR = (byte)EUROPEAN_NUMBER_SEPARATOR; /** * Directional type ET * @stable ICU 2.1 */ public static final int EUROPEAN_NUMBER_TERMINATOR = 4; /** * JDK-compatible synonym for EUROPEAN_NUMBER_TERMINATOR. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR = (byte)EUROPEAN_NUMBER_TERMINATOR; /** * Directional type AN * @stable ICU 2.1 */ public static final int ARABIC_NUMBER = 5; /** * JDK-compatible synonym for ARABIC_NUMBER. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_ARABIC_NUMBER = (byte)ARABIC_NUMBER; /** * Directional type CS * @stable ICU 2.1 */ public static final int COMMON_NUMBER_SEPARATOR = 6; /** * JDK-compatible synonym for COMMON_NUMBER_SEPARATOR. 
* @stable ICU 3.0 */ public static final byte DIRECTIONALITY_COMMON_NUMBER_SEPARATOR = (byte)COMMON_NUMBER_SEPARATOR; /** * Directional type B * @stable ICU 2.1 */ public static final int BLOCK_SEPARATOR = 7; /** * JDK-compatible synonym for BLOCK_SEPARATOR. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_PARAGRAPH_SEPARATOR = (byte)BLOCK_SEPARATOR; /** * Directional type S * @stable ICU 2.1 */ public static final int SEGMENT_SEPARATOR = 8; /** * JDK-compatible synonym for SEGMENT_SEPARATOR. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_SEGMENT_SEPARATOR = (byte)SEGMENT_SEPARATOR; /** * Directional type WS * @stable ICU 2.1 */ public static final int WHITE_SPACE_NEUTRAL = 9; /** * JDK-compatible synonym for WHITE_SPACE_NEUTRAL. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_WHITESPACE = (byte)WHITE_SPACE_NEUTRAL; /** * Directional type ON * @stable ICU 2.1 */ public static final int OTHER_NEUTRAL = 10; /** * JDK-compatible synonym for OTHER_NEUTRAL. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_OTHER_NEUTRALS = (byte)OTHER_NEUTRAL; /** * Directional type LRE * @stable ICU 2.1 */ public static final int LEFT_TO_RIGHT_EMBEDDING = 11; /** * JDK-compatible synonym for LEFT_TO_RIGHT_EMBEDDING. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING = (byte)LEFT_TO_RIGHT_EMBEDDING; /** * Directional type LRO * @stable ICU 2.1 */ public static final int LEFT_TO_RIGHT_OVERRIDE = 12; /** * JDK-compatible synonym for LEFT_TO_RIGHT_OVERRIDE. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE = (byte)LEFT_TO_RIGHT_OVERRIDE; /** * Directional type AL * @stable ICU 2.1 */ public static final int RIGHT_TO_LEFT_ARABIC = 13; /** * JDK-compatible synonym for RIGHT_TO_LEFT_ARABIC. 
* @stable ICU 3.0 */ public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC = (byte)RIGHT_TO_LEFT_ARABIC; /** * Directional type RLE * @stable ICU 2.1 */ public static final int RIGHT_TO_LEFT_EMBEDDING = 14; /** * JDK-compatible synonym for RIGHT_TO_LEFT_EMBEDDING. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING = (byte)RIGHT_TO_LEFT_EMBEDDING; /** * Directional type RLO * @stable ICU 2.1 */ public static final int RIGHT_TO_LEFT_OVERRIDE = 15; /** * JDK-compatible synonym for RIGHT_TO_LEFT_OVERRIDE. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE = (byte)RIGHT_TO_LEFT_OVERRIDE; /** * Directional type PDF * @stable ICU 2.1 */ public static final int POP_DIRECTIONAL_FORMAT = 16; /** * JDK-compatible synonym for POP_DIRECTIONAL_FORMAT. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_POP_DIRECTIONAL_FORMAT = (byte)POP_DIRECTIONAL_FORMAT; /** * Directional type NSM * @stable ICU 2.1 */ public static final int DIR_NON_SPACING_MARK = 17; /** * JDK-compatible synonym for DIR_NON_SPACING_MARK. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_NONSPACING_MARK = (byte)DIR_NON_SPACING_MARK; /** * Directional type BN * @stable ICU 2.1 */ public static final int BOUNDARY_NEUTRAL = 18; /** * JDK-compatible synonym for BOUNDARY_NEUTRAL. * @stable ICU 3.0 */ public static final byte DIRECTIONALITY_BOUNDARY_NEUTRAL = (byte)BOUNDARY_NEUTRAL; /** * Number of directional types * @stable ICU 2.1 */ public static final int CHAR_DIRECTION_COUNT = 19; /** * Undefined bidirectional character type. Undefined char * values have undefined directionality in the Unicode specification. 
* @stable ICU 3.0 */ public static final byte DIRECTIONALITY_UNDEFINED = -1; } } icu4j-4.2/src/com/ibm/icu/lang/UScriptRun.java0000644000175000017500000004256111361046134021100 0ustar twernertwerner/* ******************************************************************************* * * Copyright (C) 1999-2008, International Business Machines * Corporation and others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.lang; import com.ibm.icu.text.UTF16; /** * UScriptRun is used to find runs of characters in * the same script, as defined in the UScript class. * It implements a simple iterator over an array of characters. * The iterator will assign COMMON and INHERITED * characters to the same script as the preceding characters. If the * COMMON and INHERITED characters are first, they will be assigned to * the same script as the following characters. * * The iterator will try to match paired punctuation. If it sees an * opening punctuation character, it will remember the script that * was assigned to that character, and assign the same script to the * matching closing punctuation. * * No attempt is made to combine related scripts into a single run. In * particular, Hiragana, Katakana, and Han characters will appear in separate * runs. * Here is an example of how to iterate over script runs: *
 * void printScriptRuns(char[] text)
 * {
 *     UScriptRun scriptRun = new UScriptRun(text);
 *
 *     while (scriptRun.next()) {
 *         int start  = scriptRun.getScriptStart();
 *         int limit  = scriptRun.getScriptLimit();
 *         int script = scriptRun.getScriptCode();
 *
 *         System.out.println("Script \"" + UScript.getName(script) + "\" from " +
 *                            start + " to " + limit + ".");
 *     }
 * }
 * 
* * @internal * @deprecated This API is ICU internal only. */ public final class UScriptRun { /** * Construct an empty UScriptRun object. The next() * method will return false the first time it is called. * * @internal * @deprecated This API is ICU internal only. */ public UScriptRun() { char[] nullChars = null; reset(nullChars, 0, 0); } /** * Construct a UScriptRun object which iterates over the * characters in the given string. * * @param text the string of characters over which to iterate. * * @internal * @deprecated This API is ICU internal only. */ public UScriptRun(String text) { reset (text); } /** * Construct a UScriptRun object which iterates over a subrange * of the characters in the given string. * * @param text the string of characters over which to iterate. * @param start the index of the first character over which to iterate * @param count the number of characters over which to iterate * * @internal * @deprecated This API is ICU internal only. */ public UScriptRun(String text, int start, int count) { reset(text, start, count); } /** * Construct a UScriptRun object which iterates over the given * characters. * * @param chars the array of characters over which to iterate. * * @internal * @deprecated This API is ICU internal only. */ public UScriptRun(char[] chars) { reset(chars); } /** * Construct a UScriptRun object which iterates over a subrange * of the given characters. * * @param chars the array of characters over which to iterate. * @param start the index of the first character over which to iterate * @param count the number of characters over which to iterate * * @internal * @deprecated This API is ICU internal only. */ public UScriptRun(char[] chars, int start, int count) { reset(chars, start, count); } /** * Reset the iterator to the start of the text. * * @internal * @deprecated This API is ICU internal only. */ public final void reset() { // empty any old parenStack contents. 
// NOTE: this is not the most efficient way // to do this, but it's the easiest to write... while (stackIsNotEmpty()) { pop(); } scriptStart = textStart; scriptLimit = textStart; scriptCode = UScript.INVALID_CODE; parenSP = -1; pushCount = 0; fixupCount = 0; textIndex = textStart; } /** * Reset the iterator to iterate over the given range of the text. Throws * IllegalArgumentException if the range is outside of the bounds of the * character array. * * @param start the index of the new first character over which to iterate * @param count the new number of characters over which to iterate. * @exception IllegalArgumentException * * @internal * @deprecated This API is ICU internal only. */ public final void reset(int start, int count) throws IllegalArgumentException { int len = 0; if (text != null) { len = text.length; } if (start < 0 || count < 0 || start > len - count) { throw new IllegalArgumentException(); } textStart = start; textLimit = start + count; reset(); } /** * Reset the iterator to iterate over count characters * in chars starting at start. This allows * clients to reuse an iterator. * * @param chars the new array of characters over which to iterate. * @param start the index of the first character over which to iterate. * @param count the number of characters over which to iterate. * * @internal * @deprecated This API is ICU internal only. */ public final void reset(char[] chars, int start, int count) { if (chars == null) { chars = emptyCharArray; } text = chars; reset(start, count); } /** * Reset the iterator to iterate over the characters * in chars. This allows clients to reuse an iterator. * * @param chars the new array of characters over which to iterate. * * @internal * @deprecated This API is ICU internal only. */ public final void reset(char[] chars) { int length = 0; if (chars != null) { length = chars.length; } reset(chars, 0, length); } /** * Reset the iterator to iterate over count characters * in text starting at start. 
This allows * clients to reuse an iterator. * * @param str the new string of characters over which to iterate. * @param start the index of the first character over which to iterate. * @param count the number of characters over which to iterate. * * @internal * @deprecated This API is ICU internal only. */ public final void reset(String str, int start, int count) { char[] chars = null; if (str != null) { chars = str.toCharArray(); } reset(chars, start, count); } /** * Reset the iterator to iterate over the characters * in str. This allows clients to reuse an iterator. * * @param str the new string of characters over which to iterate. * * @internal * @deprecated This API is ICU internal only. */ public final void reset(String str) { int length = 0; if (str != null) { length = str.length(); } reset(str, 0, length); } /** * Get the starting index of the current script run. * * @return the index of the first character in the current script run. * * @internal * @deprecated This API is ICU internal only. */ public final int getScriptStart() { return scriptStart; } /** * Get the index of the first character after the current script run. * * @return the index of the first character after the current script run. * * @internal * @deprecated This API is ICU internal only. */ public final int getScriptLimit() { return scriptLimit; } /** * Get the script code for the script of the current script run. * * @return the script code for the script of the current script run. * @see com.ibm.icu.lang.UScript * * @internal * @deprecated This API is ICU internal only. */ public final int getScriptCode() { return scriptCode; } /** * Find the next script run. Returns false if there * isn't another run, returns true if there is. * * @return false if there isn't another run, true if there is. * * @internal * @deprecated This API is ICU internal only. 
*/ public final boolean next() { // if we've fallen off the end of the text, we're done if (scriptLimit >= textLimit) { return false; } scriptCode = UScript.COMMON; scriptStart = scriptLimit; syncFixup(); while (textIndex < textLimit) { int ch = UTF16.charAt(text, textStart, textLimit, textIndex - textStart); int codePointCount = UTF16.getCharCount(ch); int sc = UScript.getScript(ch); int pairIndex = getPairIndex(ch); textIndex += codePointCount; // Paired character handling: // // if it's an open character, push it onto the stack. // if it's a close character, find the matching open on the // stack, and use that script code. Any non-matching open // characters above it on the stack will be popped. if (pairIndex >= 0) { if ((pairIndex & 1) == 0) { push(pairIndex, scriptCode); } else { int pi = pairIndex & ~1; while (stackIsNotEmpty() && top().pairIndex != pi) { pop(); } if (stackIsNotEmpty()) { sc = top().scriptCode; } } } if (sameScript(scriptCode, sc)) { if (scriptCode <= UScript.INHERITED && sc > UScript.INHERITED) { scriptCode = sc; fixup(scriptCode); } // if this character is a close paired character, // pop the matching open character from the stack if (pairIndex >= 0 && (pairIndex & 1) != 0) { pop(); } } else { // We've just seen the first character of // the next run. Back over it so we'll see // it again the next time. textIndex -= codePointCount; break; } } scriptLimit = textIndex; return true; } /** * Compare two script codes to see if they are in the same script. If one script is * a strong script, and the other is INHERITED or COMMON, it will compare equal. * * @param scriptOne one of the script codes. * @param scriptTwo the other script code. * @return true if the two scripts are the same. * @see com.ibm.icu.lang.UScript */ private static boolean sameScript(int scriptOne, int scriptTwo) { return scriptOne <= UScript.INHERITED || scriptTwo <= UScript.INHERITED || scriptOne == scriptTwo; } /* * An internal class which holds entries on the paren stack. 
*/ private static final class ParenStackEntry { int pairIndex; int scriptCode; public ParenStackEntry(int thePairIndex, int theScriptCode) { pairIndex = thePairIndex; scriptCode = theScriptCode; } } private static final int mod(int sp) { return sp % PAREN_STACK_DEPTH; } private static final int inc(int sp, int count) { return mod(sp + count); } private static final int inc(int sp) { return inc(sp, 1); } private static final int dec(int sp, int count) { return mod(sp + PAREN_STACK_DEPTH - count); } private static final int dec(int sp) { return dec(sp, 1); } private static final int limitInc(int count) { if (count < PAREN_STACK_DEPTH) { count += 1; } return count; } private final boolean stackIsEmpty() { return pushCount <= 0; } private final boolean stackIsNotEmpty() { return ! stackIsEmpty(); } private final void push(int pairIndex, int scrptCode) { pushCount = limitInc(pushCount); fixupCount = limitInc(fixupCount); parenSP = inc(parenSP); parenStack[parenSP] = new ParenStackEntry(pairIndex, scrptCode); } private final void pop() { if (stackIsEmpty()) { return; } parenStack[parenSP] = null; if (fixupCount > 0) { fixupCount -= 1; } pushCount -= 1; parenSP = dec(parenSP); // If the stack is now empty, reset the stack // pointers to their initial values. 
if (stackIsEmpty()) { parenSP = -1; } } private final ParenStackEntry top() { return parenStack[parenSP]; } private final void syncFixup() { fixupCount = 0; } private final void fixup(int scrptCode) { int fixupSP = dec(parenSP, fixupCount); while (fixupCount-- > 0) { fixupSP = inc(fixupSP); parenStack[fixupSP].scriptCode = scrptCode; } } private char[] emptyCharArray = {}; private char[] text; private int textIndex; private int textStart; private int textLimit; private int scriptStart; private int scriptLimit; private int scriptCode; private static int PAREN_STACK_DEPTH = 32; private static ParenStackEntry parenStack[] = new ParenStackEntry[PAREN_STACK_DEPTH]; private int parenSP = -1; private int pushCount = 0; private int fixupCount = 0; /** * Find the highest bit that's set in a word. Uses a binary search through * the bits. * * @param n the word in which to find the highest bit that's set. * @return the bit number (counting from the low order bit) of the highest bit. */ private static final byte highBit(int n) { if (n <= 0) { return -32; } byte bit = 0; if (n >= 1 << 16) { n >>= 16; bit += 16; } if (n >= 1 << 8) { n >>= 8; bit += 8; } if (n >= 1 << 4) { n >>= 4; bit += 4; } if (n >= 1 << 2) { n >>= 2; bit += 2; } if (n >= 1 << 1) { n >>= 1; bit += 1; } return bit; } /** * Search the pairedChars array for the given character. * * @param ch the character for which to search. * @return the index of the character in the table, or -1 if it's not there. 
*/ private static int getPairIndex(int ch) { int probe = pairedCharPower; int index = 0; if (ch >= pairedChars[pairedCharExtra]) { index = pairedCharExtra; } while (probe > (1 << 0)) { probe >>= 1; if (ch >= pairedChars[index + probe]) { index += probe; } } if (pairedChars[index] != ch) { index = -1; } return index; } private static int pairedChars[] = { 0x0028, 0x0029, // ascii paired punctuation 0x003c, 0x003e, 0x005b, 0x005d, 0x007b, 0x007d, 0x00ab, 0x00bb, // guillemets 0x2018, 0x2019, // general punctuation 0x201c, 0x201d, 0x2039, 0x203a, 0x3008, 0x3009, // chinese paired punctuation 0x300a, 0x300b, 0x300c, 0x300d, 0x300e, 0x300f, 0x3010, 0x3011, 0x3014, 0x3015, 0x3016, 0x3017, 0x3018, 0x3019, 0x301a, 0x301b }; private static int pairedCharPower = 1 << highBit(pairedChars.length); private static int pairedCharExtra = pairedChars.length - pairedCharPower; } icu4j-4.2/src/com/ibm/icu/lang/UCharacterCategory.java0000644000175000017500000000743611361046134022543 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 1996-2004, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.lang; import com.ibm.icu.lang.UCharacterEnums.ECharacterCategory; /** * Enumerated Unicode category types from the UnicodeData.txt file. * Used as return results from UCharacter * Equivalent to icu's UCharCategory. * Refer to * Unicode Consortium for more information about UnicodeData.txt. *

* NOTE: the UCharacterCategory values are not compatible with * those returned by java.lang.Character.getType. UCharacterCategory values * match the ones used in ICU4C, while java.lang.Character type * values, though similar, skip the value 17.

*

* This class is not subclassable *

* @author Syn Wee Quek * @stable ICU 2.1 */ public final class UCharacterCategory implements ECharacterCategory { /** * Gets the name of the argument category * @param category to retrieve name * @return category name * @stable ICU 2.1 */ public static String toString(int category) { switch (category) { case UPPERCASE_LETTER : return "Letter, Uppercase"; case LOWERCASE_LETTER : return "Letter, Lowercase"; case TITLECASE_LETTER : return "Letter, Titlecase"; case MODIFIER_LETTER : return "Letter, Modifier"; case OTHER_LETTER : return "Letter, Other"; case NON_SPACING_MARK : return "Mark, Non-Spacing"; case ENCLOSING_MARK : return "Mark, Enclosing"; case COMBINING_SPACING_MARK : return "Mark, Spacing Combining"; case DECIMAL_DIGIT_NUMBER : return "Number, Decimal Digit"; case LETTER_NUMBER : return "Number, Letter"; case OTHER_NUMBER : return "Number, Other"; case SPACE_SEPARATOR : return "Separator, Space"; case LINE_SEPARATOR : return "Separator, Line"; case PARAGRAPH_SEPARATOR : return "Separator, Paragraph"; case CONTROL : return "Other, Control"; case FORMAT : return "Other, Format"; case PRIVATE_USE : return "Other, Private Use"; case SURROGATE : return "Other, Surrogate"; case DASH_PUNCTUATION : return "Punctuation, Dash"; case START_PUNCTUATION : return "Punctuation, Open"; case END_PUNCTUATION : return "Punctuation, Close"; case CONNECTOR_PUNCTUATION : return "Punctuation, Connector"; case OTHER_PUNCTUATION : return "Punctuation, Other"; case MATH_SYMBOL : return "Symbol, Math"; case CURRENCY_SYMBOL : return "Symbol, Currency"; case MODIFIER_SYMBOL : return "Symbol, Modifier"; case OTHER_SYMBOL : return "Symbol, Other"; case INITIAL_PUNCTUATION : return "Punctuation, Initial quote"; case FINAL_PUNCTUATION : return "Punctuation, Final quote"; } return "Unassigned"; } // private constructor ----------------------------------------------- ///CLOVER:OFF /** * Private constructor to prevent initialisation */ private UCharacterCategory() { } ///CLOVER:ON } 
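The NOTE in the class comment above says that java.lang.Character type values skip 17 while the UCharacterCategory values do not. The following is a minimal, self-contained sketch of that gap using only java.lang.Character. The class name CategoryValueGap and the helper toContiguous are hypothetical and not part of ICU4J, and the helper assumes that ICU's numbering is simply the Java numbering with the unused value 17 closed up, which the note implies but this file does not spell out.

```java
public class CategoryValueGap {

    // Hypothetical helper (not an ICU4J API): shift java.lang.Character type
    // values that lie above the unused value 17 down by one, on the assumption
    // that the ICU category numbering is contiguous.
    static int toContiguous(int javaType) {
        return javaType > 17 ? javaType - 1 : javaType;
    }

    public static void main(String[] args) {
        // java.lang.Character jumps from FORMAT (16) straight to PRIVATE_USE (18).
        System.out.println("Character.FORMAT      = " + Character.FORMAT);
        System.out.println("Character.PRIVATE_USE = " + Character.PRIVATE_USE);

        // U+E000 is a private-use code point; '(' is an opening punctuation mark.
        int privateUse = Character.getType('\uE000');
        int openParen = Character.getType('(');
        System.out.println("U+E000: java type " + privateUse
                + ", contiguous value " + toContiguous(privateUse));
        System.out.println("'('   : java type " + openParen
                + ", contiguous value " + toContiguous(openParen));
    }
}
```

If the contiguity assumption holds, the shifted value is what one would pass to UCharacterCategory.toString(int); values obtained from UCharacter itself need no adjustment.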
icu4j-4.2/src/com/ibm/icu/charset/0000755000175000017500000000000011361046170016657 5ustar twernertwernericu4j-4.2/src/com/ibm/icu/charset/CharsetSelector.java0000644000175000017500000002131211361046170022613 0ustar twernertwerner/* ****************************************************************************** * Copyright (C) 1996-2009, International Business Machines Corporation and * * others. All Rights Reserved. * ****************************************************************************** */ /* * This is a port of the C++ class UConverterSelector. * * Methods related to serialization are not ported in this version. In addition, * the selectForUTF8 method is not going to be ported, as UTF8 is seldom used * in Java. * * @author Shaopeng Jia */ package com.ibm.icu.charset; import java.nio.charset.Charset; import java.nio.charset.IllegalCharsetNameException; import java.nio.charset.UnsupportedCharsetException; import java.util.List; import java.util.Vector; import com.ibm.icu.impl.IntTrie; import com.ibm.icu.impl.PropsVectors; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; /** * Charset Selector * * A charset selector is built with a list of charset names and, given an input * CharSequence, returns the names of the corresponding charsets which can * convert the CharSequence. * * @draft ICU 4.2 * @provisional This API might change or be removed in a future release. 
*/ public final class CharsetSelector { private IntTrie trie; private int[] pv; // table of bits private String[] encodings; // encodings users ask to use private void generateSelectorData(PropsVectors pvec, UnicodeSet excludedCodePoints, int mappingTypes) { int columns = (encodings.length + 31) / 32; // set errorValue to all-ones for (int col = 0; col < columns; ++col) { pvec.setValue(PropsVectors.ERROR_VALUE_CP, PropsVectors.ERROR_VALUE_CP, col, ~0, ~0); } for (int i = 0; i < encodings.length; ++i) { Charset testCharset = CharsetICU.forNameICU(encodings[i]); UnicodeSet unicodePointSet = new UnicodeSet(); // empty set ((CharsetICU) testCharset).getUnicodeSet(unicodePointSet, mappingTypes); int column = i / 32; int mask = 1 << (i % 32); // now iterate over intervals on set i int itemCount = unicodePointSet.getRangeCount(); for (int j = 0; j < itemCount; ++j) { int startChar = unicodePointSet.getRangeStart(j); int endChar = unicodePointSet.getRangeEnd(j); pvec.setValue(startChar, endChar, column, ~0, mask); } } // handle excluded encodings // Simply set their values to all 1's in the pvec if (!excludedCodePoints.isEmpty()) { int itemCount = excludedCodePoints.getRangeCount(); for (int j = 0; j < itemCount; ++j) { int startChar = excludedCodePoints.getRangeStart(j); int endChar = excludedCodePoints.getRangeEnd(j); for (int col = 0; col < columns; col++) { pvec.setValue(startChar, endChar, col, ~0, ~0); } } } trie = pvec.compactToTrieWithRowIndexes(); pv = pvec.getCompactedArray(); } // internal function to intersect two sets of masks // returns whether the mask has reduced to all zeros. The // second set of mask consists of len elements in pv starting from // pvIndex private boolean intersectMasks(int[] dest, int pvIndex, int len) { int oredDest = 0; for (int i = 0; i < len; ++i) { oredDest |= (dest[i] &= pv[pvIndex + i]); } return oredDest == 0; } // internal function private List selectForMask(int[] mask) { // this is the context we will use. 
Store a table of indices to which // encodings are legit Vector result = new Vector(); int columns = (encodings.length + 31) / 32; int numOnes = countOnes(mask, columns); // now we know the exact space we need to index if (numOnes > 0) { int k = 0; for (int j = 0; j < columns; j++) { int v = mask[j]; for (int i = 0; i < 32 && k < encodings.length; i++, k++) { if ((v & 1) != 0) { result.addElement(encodings[k]); } v >>= 1; } } } // otherwise, index will remain NULL return result; } // internal function to count how many 1's there are in a mask // algorithm taken from http://graphics.stanford.edu/~seander/bithacks.html private int countOnes(int[] mask, int len) { int totalOnes = 0; for (int i = 0; i < len; ++i) { int ent = mask[i]; for (; ent != 0; totalOnes++) { ent &= ent - 1; // clear the least significant bit set } } return totalOnes; } /** * Construct a CharsetSelector from a list of charset names. * * @param charsetList * a list of charset names in the form of strings. If charsetList * is empty, a selector for all available charsets is constructed. * @param excludedCodePoints * a set of code points to be excluded from consideration. * Excluded code points appearing in the input CharSequence do * not change the selection result. It could be empty when no * code point should be excluded. * @param mappingTypes * an int which determines whether to consider only roundtrip * mappings or also fallbacks, e.g. CharsetICU.ROUNDTRIP_SET. See * CharsetICU.java for the constants that are currently * supported. * @throws IllegalArgumentException * if the parameters are invalid. * @throws IllegalCharsetNameException * If the given charset name is illegal. * @throws UnsupportedCharsetException * If no support for the named charset is available in this * instance of the Java virtual machine. * @draft ICU 4.2 * @provisional This API might change or be removed in a future release. 
*/ public CharsetSelector(List charsetList, UnicodeSet excludedCodePoints, int mappingTypes) { if (mappingTypes != CharsetICU.ROUNDTRIP_AND_FALLBACK_SET && mappingTypes != CharsetICU.ROUNDTRIP_SET) { throw new IllegalArgumentException("Unsupported mappingTypes"); } int encodingCount = charsetList.size(); if (encodingCount > 0) { encodings = new String[encodingCount]; for (int i = 0; i < encodingCount; i++) { encodings[i] = (String) charsetList.get(i); } } else { Object[] availableNames = CharsetProviderICU.getAvailableNames(); encodingCount = availableNames.length; encodings = new String[encodingCount]; for (int i = 0; i < encodingCount; i++) { encodings[i] = (String) availableNames[i]; } } PropsVectors pvec = new PropsVectors((encodingCount + 31) / 32); generateSelectorData(pvec, excludedCodePoints, mappingTypes); } /** * Select charsets that can map all characters in a CharSequence, ignoring * the excluded code points. * * @param unicodeText * a CharSequence. It could be empty. * @return a list that contains charset names in the form of strings. The * returned encoding names and their order will be the same as * supplied when building the selector. * * @draft ICU 4.2 * @provisional This API might change or be removed in a future release. */ public List selectForString(CharSequence unicodeText) { int columns = (encodings.length + 31) / 32; int[] mask = new int[columns]; for (int i = 0; i < columns; i++) { mask[i] = - 1; // set each bit to 1 // Note: All integers are signed in Java, assigning // 2 ^ 32 -1 to mask is wrong! 
} int index = 0; while (index < unicodeText.length()) { int c = UTF16.charAt(unicodeText, index); int pvIndex = trie.getCodePointValue(c); index += UTF16.getCharCount(c); if (intersectMasks(mask, pvIndex, columns)) { break; } } return selectForMask(mask); } } icu4j-4.2/src/com/ibm/icu/charset/UConverterConstants.java0000644000175000017500000001617211361046170023522 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; interface UConverterConstants { static final short UNSIGNED_BYTE_MASK = 0xff; static final int UNSIGNED_SHORT_MASK = 0xffff; static final long UNSIGNED_INT_MASK = 0xffffffffL; static final int U_IS_BIG_ENDIAN = 0; /** * Useful constant for the maximum size of the whole locale ID * (including the terminating NULL). */ static final int ULOC_FULLNAME_CAPACITY = 56; /** * This value is intended for sentinel values for APIs that * (take or) return single code points (UChar32). * It is outside of the Unicode code point range 0..0x10ffff. * * For example, a "done" or "error" value in a new API * could be indicated with U_SENTINEL. * * ICU APIs designed before ICU 2.4 usually define service-specific "done" * values, mostly 0xffff. * Those may need to be distinguished from * actual U+ffff text contents by calling functions like * CharacterIterator::hasNext() or UnicodeString::length(). */ static final int U_SENTINEL = -1; //end utf.h //begin ucnv.h /** * Character that separates converter names from options and options from each other. 
* @see CharsetICU#forNameICU(String) */ static final byte OPTION_SEP_CHAR = ','; /** Maximum length of a converter name including the terminating NULL */ static final int MAX_CONVERTER_NAME_LENGTH = 60; /** Maximum length of a converter name including path and terminating NULL */ static final int MAX_FULL_FILE_NAME_LENGTH = (600+MAX_CONVERTER_NAME_LENGTH); /** Shift in for EBCDIC_STATEFUL and iso2022 states */ static final int SI = 0x0F; /** Shift out for EBCDIC_STATEFUL and iso2022 states */ static final int SO = 0x0E; //end ucnv.h // begin bld.h /* size of the overflow buffers in UConverter, enough for escaping callbacks */ //#define ERROR_BUFFER_LENGTH 32 static final int ERROR_BUFFER_LENGTH = 32; /* at most 4 bytes per substitution character (part of .cnv file format! see UConverterStaticData) */ static final int MAX_SUBCHAR_LEN = 4; /* at most 8 bytes per character in toUBytes[] (UTF-8 uses up to 6) */ static final int MAX_CHAR_LEN = 8; /* converter options bits */ static final int OPTION_VERSION = 0xf; static final int OPTION_SWAP_LFNL = 0x10; static final int OPTION_MAC = 0x20; //agljport:comment added for Mac ISCII encodings static final String OPTION_SWAP_LFNL_STRING = ",swaplfnl"; /** values for the unicodeMask */ static final int HAS_SUPPLEMENTARY = 1; static final int HAS_SURROGATES = 2; // end bld.h // begin cnv.h /* this is used in fromUnicode DBCS tables as an "unassigned" marker */ static final int missingCharMarker = 0xFFFF; /** * * @author ram */ static interface UConverterResetChoice { static final int RESET_BOTH = 0; static final int RESET_TO_UNICODE = RESET_BOTH + 1; static final int RESET_FROM_UNICODE = RESET_TO_UNICODE + 1; } // begin utf16.h /** * The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff). 
*/ static final int U16_MAX_LENGTH = 2; // end utf16.h // begin err.h /** * FROM_U, TO_U context options for sub callback */ static byte[] SUB_STOP_ON_ILLEGAL = {'i'}; /** * FROM_U, TO_U context options for skip callback */ static byte[] SKIP_STOP_ON_ILLEGAL = {'i'}; /** * The process condition code to be used with the callbacks. * Codes which are greater than IRREGULAR should be * passed on to any chained callbacks. */ static interface UConverterCallbackReason { static final int UNASSIGNED = 0; /**< The code point is unassigned. The error code U_INVALID_CHAR_FOUND will be set. */ static final int ILLEGAL = 1; /**< The code point is illegal. For example, \\x81\\x2E is illegal in SJIS because \\x2E is not a valid trail byte for the \\x81 lead byte. Also, starting with Unicode 3.0.1, non-shortest byte sequences in UTF-8 (like \\xC1\\xA1 instead of \\x61 for U+0061) are also illegal, not just irregular. The error code U_ILLEGAL_CHAR_FOUND will be set. */ static final int IRREGULAR = 2; /**< The codepoint is not a regular sequence in the encoding. For example, \\xED\\xA0\\x80..\\xED\\xBF\\xBF are irregular UTF-8 byte sequences for single surrogate code points. The error code U_INVALID_CHAR_FOUND will be set. */ static final int RESET = 3; /**< The callback is called with this reason when a 'reset' has occurred. Callback should reset all state. */ static final int CLOSE = 4; /**< Called when the converter is closed. The callback should release any allocated memory.*/ static final int CLONE = 5; /**< Called when safeClone() is called on the converter. The pointer available as the 'context' is an alias to the original converter's context pointer. If the context must be owned by the new converter, the callback must clone the data and call setFromUCallback (or setToUCallback) with the correct pointer. 
*/ } //end err.h static final String DATA_TYPE = "cnv"; static final int CNV_DATA_BUFFER_SIZE = 25000; static final int SIZE_OF_UCONVERTER_SHARED_DATA = 100; static final int MAXIMUM_UCS2 = 0x0000FFFF; static final int MAXIMUM_UTF = 0x0010FFFF; //static final int MAXIMUM_UCS4 = 0x7FFFFFFF; static final int HALF_SHIFT = 10; static final int HALF_BASE = 0x0010000; static final int HALF_MASK = 0x3FF; static final int SURROGATE_HIGH_START = 0xD800; static final int SURROGATE_HIGH_END = 0xDBFF; static final int SURROGATE_LOW_START = 0xDC00; static final int SURROGATE_LOW_END = 0xDFFF; /* -SURROGATE_LOW_START + HALF_BASE */ static final int SURROGATE_LOW_BASE = 9216; } icu4j-4.2/src/com/ibm/icu/charset/CharsetUTF16BE.java0000644000175000017500000000144711361046170022056 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; /** * The purpose of this class is to set isBigEndian to true and isEndianSpecified to true in the super class, and to * allow the Charset framework to open the variant UTF-16 converter without extra setup work. */ class CharsetUTF16BE extends CharsetUTF16 { public CharsetUTF16BE(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); } } icu4j-4.2/src/com/ibm/icu/charset/CharsetHZ.java0000644000175000017500000004453011361046170021363 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008-2009, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; class CharsetHZ extends CharsetICU { private static final int UCNV_TILDE = 0x7E; /* ~ */ private static final int UCNV_OPEN_BRACE = 0x7B; /* { */ private static final int UCNV_CLOSE_BRACE = 0x7D; /* } */ private static final byte[] SB_ESCAPE = new byte[] { 0x7E, 0x7D }; private static final byte[] DB_ESCAPE = new byte[] { 0x7E, 0x7B }; private static final byte[] TILDE_ESCAPE = new byte[] { 0x7E, 0x7E }; private static final byte[] fromUSubstitution = new byte[] { (byte) 0x1A }; private CharsetMBCS gbCharset; private boolean isEmptySegment; public CharsetHZ(String icuCanonicalName, String canonicalName, String[] aliases) { super(icuCanonicalName, canonicalName, aliases); gbCharset = (CharsetMBCS) new CharsetProviderICU().charsetForName("GBK"); maxBytesPerChar = 4; minBytesPerChar = 1; maxCharsPerByte = 1; isEmptySegment = false; } class CharsetDecoderHZ extends CharsetDecoderICU { CharsetMBCS.CharsetDecoderMBCS gbDecoder; boolean isStateDBCS = false; public CharsetDecoderHZ(CharsetICU cs) { super(cs); gbDecoder = (CharsetMBCS.CharsetDecoderMBCS) gbCharset.newDecoder(); } protected void implReset() { super.implReset(); gbDecoder.implReset(); isStateDBCS = false; isEmptySegment = false; } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; byte[] tempBuf = new byte[2]; int targetUniChar = 0; int mySourceChar = 0; if (!source.hasRemaining()) return CoderResult.UNDERFLOW; else if (!target.hasRemaining()) return CoderResult.OVERFLOW; while (source.hasRemaining()) { if (target.hasRemaining()) { // get 
the byte as unsigned mySourceChar = source.get() & 0xff; if (mode == UCNV_TILDE) { /* second byte after ~ */ mode = 0; switch (mySourceChar) { case 0x0A: /* no output for ~\n (line-continuation marker) */ continue; case UCNV_TILDE: if (offsets != null) { offsets.put(source.position() - 2); } target.put((char) mySourceChar); continue; case UCNV_OPEN_BRACE: case UCNV_CLOSE_BRACE: isStateDBCS = (mySourceChar == UCNV_OPEN_BRACE); if (isEmptySegment) { isEmptySegment = false; /* we are handling it, reset to avoid future spurious errors */ this.toUBytesArray[0] = UCNV_TILDE; this.toUBytesArray[1] = (byte)mySourceChar; this.toULength = 2; return CoderResult.malformedForLength(1); } isEmptySegment = true; continue; default: /* * if the first byte is equal to TILDE and the trail byte is not a valid byte then it is an * error condition */ /* * Ticket 5691: consistent illegal sequences: * - We include at least the first byte in the illegal sequence. * - If any of the non-initial bytes could be the start of a character, * we stop the illegal sequence before the first one of those. */ isEmptySegment = false; /* different error here, reset this to avoid spurious future error */ err = CoderResult.malformedForLength(1); toUBytesArray[0] = UCNV_TILDE; if (isStateDBCS ? (0x21 <= mySourceChar && mySourceChar <= 0x7e) : mySourceChar <= 0x7f) { /* The current byte could be the start of a character: Back it out. */ toULength = 1; source.position(source.position() - 1); } else { /* Include the current byte in the illegal sequence. 
*/ toUBytesArray[1] = (byte)mySourceChar; toULength = 2; } return err; } } else if (isStateDBCS) { if (toUnicodeStatus == 0) { /* lead byte */ if (mySourceChar == UCNV_TILDE) { mode = UCNV_TILDE; } else { /* * add another bit to distinguish a 0 byte from not having seen a lead byte */ toUnicodeStatus = mySourceChar | 0x100; isEmptySegment = false; /* the segment has something, either valid or will produce a different error, so reset this */ } continue; } else { /* trail byte */ boolean leadIsOk, trailIsOk; int leadByte = toUnicodeStatus & 0xff; targetUniChar = 0xffff; /* * Ticket 5691: consistent illegal sequence * - We include at least the first byte in the illegal sequence. * - If any of the non-initial bytes could be the start of a character, * we stop the illegal sequence before the first one of those * * In HZ DBCS, if the second byte is in the 21..7e range, * we report only the first byte as the illegal sequence. * Otherwise we convert or report the pair of bytes. */ leadIsOk = (short)(UConverterConstants.UNSIGNED_BYTE_MASK & (leadByte - 0x21)) <= (0x7d - 0x21); trailIsOk = (short)(UConverterConstants.UNSIGNED_BYTE_MASK & (mySourceChar - 0x21)) <= (0x7e - 0x21); if (leadIsOk && trailIsOk) { tempBuf[0] = (byte)(leadByte + 0x80); tempBuf[1] = (byte)(mySourceChar + 0x80); targetUniChar = gbDecoder.simpleGetNextUChar(ByteBuffer.wrap(tempBuf), super.isFallbackUsed()); mySourceChar = (leadByte << 8) | mySourceChar; } else if (trailIsOk) { /* report a single illegal byte and continue with the following DBCS starter byte */ source.position(source.position() - 1); mySourceChar = (int)leadByte; } else { /* report a pair of illegal bytes if the second byte is not a DBCS starter */ /* add another bit so that the code below writes 2 bytes in case of error */ mySourceChar = 0x10000 | (leadByte << 8) | mySourceChar; } toUnicodeStatus = 0x00; } } else { if (mySourceChar == UCNV_TILDE) { mode = UCNV_TILDE; continue; } else if (mySourceChar <= 0x7f) { targetUniChar = 
mySourceChar; /* ASCII */ isEmptySegment = false; /* the segment has something valid */ } else { targetUniChar = 0xffff; isEmptySegment = false; /* different error here, reset this to avoid spurious future error */ } } if (targetUniChar < 0xfffe) { if (offsets != null) { offsets.put(source.position() - 1 - (isStateDBCS ? 1 : 0)); } target.put((char) targetUniChar); } else /* targetUniChar >= 0xfffe */{ if (mySourceChar > 0xff) { toUBytesArray[toUBytesBegin + 0] = (byte) (mySourceChar >> 8); toUBytesArray[toUBytesBegin + 1] = (byte) mySourceChar; toULength = 2; } else { toUBytesArray[toUBytesBegin + 0] = (byte) mySourceChar; toULength = 1; } if (targetUniChar == 0xfffe) { return CoderResult.unmappableForLength(toULength); } else { return CoderResult.malformedForLength(toULength); } } } else { return CoderResult.OVERFLOW; } } return err; } } class CharsetEncoderHZ extends CharsetEncoderICU { CharsetMBCS.CharsetEncoderMBCS gbEncoder; boolean isEscapeAppended = false; boolean isTargetUCharDBCS = false; public CharsetEncoderHZ(CharsetICU cs) { super(cs, fromUSubstitution); gbEncoder = (CharsetMBCS.CharsetEncoderMBCS) gbCharset.newEncoder(); } protected void implReset() { super.implReset(); gbEncoder.implReset(); isEscapeAppended = false; isTargetUCharDBCS = false; } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { int length = 0; int[] targetUniChar = new int[] { 0 }; int mySourceChar = 0; boolean oldIsTargetUCharDBCS = isTargetUCharDBCS; if (!source.hasRemaining()) return CoderResult.UNDERFLOW; else if (!target.hasRemaining()) return CoderResult.OVERFLOW; if (fromUChar32 != 0 && target.hasRemaining()) { CoderResult cr = handleSurrogates(source, (char) fromUChar32); return (cr != null) ? 
cr : CoderResult.unmappableForLength(2); } /* writing the char to the output stream */ while (source.hasRemaining()) { targetUniChar[0] = MISSING_CHAR_MARKER; if (target.hasRemaining()) { mySourceChar = source.get(); oldIsTargetUCharDBCS = isTargetUCharDBCS; if (mySourceChar == UCNV_TILDE) { /* * concatEscape(args, &myTargetIndex, &targetLength,"\x7E\x7E",err,2,&mySourceIndex); */ concatEscape(source, target, offsets, TILDE_ESCAPE); continue; } else if (mySourceChar <= 0x7f) { length = 1; targetUniChar[0] = mySourceChar; } else { length = gbEncoder.fromUChar32(mySourceChar, targetUniChar, super.isFallbackUsed()); /* * we can only use lead bytes 21..7D and trail bytes 21..7E */ if (length == 2 && 0xa1a1 <= targetUniChar[0] && targetUniChar[0] <= 0xfdfe && 0xa1 <= (targetUniChar[0] & 0xff) && (targetUniChar[0] & 0xff) <= 0xfe) { targetUniChar[0] -= 0x8080; } else { targetUniChar[0] = MISSING_CHAR_MARKER; } } if (targetUniChar[0] != MISSING_CHAR_MARKER) { isTargetUCharDBCS = (targetUniChar[0] > 0x00FF); if (oldIsTargetUCharDBCS != isTargetUCharDBCS || !isEscapeAppended) { /* Shifting from a double byte to single byte mode */ if (!isTargetUCharDBCS) { concatEscape(source, target, offsets, SB_ESCAPE); isEscapeAppended = true; } else { /* * Shifting from a single byte to double byte mode */ concatEscape(source, target, offsets, DB_ESCAPE); isEscapeAppended = true; } } if (isTargetUCharDBCS) { if (target.hasRemaining()) { target.put((byte) (targetUniChar[0] >> 8)); if (offsets != null) { offsets.put(source.position() - 1); } if (target.hasRemaining()) { target.put((byte) targetUniChar[0]); if (offsets != null) { offsets.put(source.position() - 1); } } else { errorBuffer[errorBufferLength++] = (byte) targetUniChar[0]; // *err = U_BUFFER_OVERFLOW_ERROR; } } else { errorBuffer[errorBufferLength++] = (byte) (targetUniChar[0] >> 8); errorBuffer[errorBufferLength++] = (byte) targetUniChar[0]; // *err = U_BUFFER_OVERFLOW_ERROR; } } else { if (target.hasRemaining()) { 
target.put((byte) targetUniChar[0]); if (offsets != null) { offsets.put(source.position() - 1); } } else { errorBuffer[errorBufferLength++] = (byte) targetUniChar[0]; // *err = U_BUFFER_OVERFLOW_ERROR; } } } else { /* oops.. the code point is unassigned */ /* Handle surrogates */ /* check if the char is a First surrogate */ if (UTF16.isSurrogate((char) mySourceChar)) { // use that handy handleSurrogates method everyone's been talking about! CoderResult cr = handleSurrogates(source, (char) mySourceChar); return (cr != null) ? cr : CoderResult.unmappableForLength(2); } else { /* callback(unassigned) for a BMP code point */ // *err = U_INVALID_CHAR_FOUND; fromUChar32 = mySourceChar; return CoderResult.unmappableForLength(1); } } } else { // *err = U_BUFFER_OVERFLOW_ERROR; return CoderResult.OVERFLOW; } } return CoderResult.UNDERFLOW; } private CoderResult concatEscape(CharBuffer source, ByteBuffer target, IntBuffer offsets, byte[] strToAppend) { CoderResult cr = null; for (int i=0; i 0) { start = mid; } else { /* * Since the gencnval tool folds duplicates into one entry, this * alias in gAliasList is unique, but different standards may * map an alias to different converters. */ if ((gUntaggedConvArray[(int) mid] & AMBIGUOUS_ALIAS_MAP_BIT) != 0) { isAmbigous[0]=true; } /* State whether the canonical converter name contains an option. This information is contained in this list in order to maintain backward & forward compatibility. */ /*if (containsOption) { UBool containsCnvOptionInfo = (UBool)gMainTable.optionTable->containsCnvOptionInfo; *containsOption = (UBool)((containsCnvOptionInfo && ((gMainTable.untaggedConvArray[mid] & UCNV_CONTAINS_OPTION_BIT) != 0)) || !containsCnvOptionInfo); }*/ return gUntaggedConvArray[(int) mid] & CONVERTER_INDEX_MASK; } } return Integer.MAX_VALUE; } /** * stripForCompare Remove the underscores, dashes and spaces from * the name, and convert the name to lower case. * * @param dst The destination buffer, which is <= the buffer of name. 
* @param name The alias to strip * @return the destination buffer. */ public static final StringBuffer stripForCompare(StringBuffer dst, String name) { return io_stripASCIIForCompare(dst, name); } // enum { private static final byte IGNORE = 0; private static final byte ZERO = 1; private static final byte NONZERO = 2; static final byte MINLETTER = 3; /* any values from here on are lowercase letter mappings */ // } /* character types for ASCII 00..7F */ static final byte asciiTypes[] = new byte[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ZERO, NONZERO, NONZERO, NONZERO, NONZERO, NONZERO, NONZERO, NONZERO, NONZERO, NONZERO, 0, 0, 0, 0, 0, 0, 0, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0, 0, 0, 0, 0, 0, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0, 0, 0, 0, 0 }; private static final char GET_CHAR_TYPE(char c) { return (char)((c < asciiTypes.length) ? 
asciiTypes[c] : (char)IGNORE); } /** @see UConverterAlias#compareNames */ private static final StringBuffer io_stripASCIIForCompare(StringBuffer dst, String name) { int nameIndex = 0; char type, nextType; char c1; boolean afterDigit = false; while (nameIndex < name.length()) { c1 = name.charAt(nameIndex++); type = GET_CHAR_TYPE(c1); switch (type) { case IGNORE: afterDigit = false; continue; /* ignore all but letters and digits */ case ZERO: if (!afterDigit && nameIndex < name.length()) { nextType = GET_CHAR_TYPE(name.charAt(nameIndex)); if (nextType == ZERO || nextType == NONZERO) { continue; /* ignore leading zero before another digit */ } } break; case NONZERO: afterDigit = true; break; default: c1 = (char)type; /* lowercased letter */ afterDigit = false; break; } dst.append(c1); } return dst; } /** * Do a fuzzy compare of two converter/alias names. The comparison is * case-insensitive. It also ignores the characters '-', '_', and ' ' (dash, * underscore, and space). Thus the strings "UTF-8", "utf_8", and "Utf 8" * are exactly equivalent. * * This is a symmetrical (commutative) operation; order of arguments is * insignificant. This is an important property for sorting the list (when * the list is preprocessed into binary form) and for performing binary * searches on it at run time. * * @param name1 * a converter name or alias * @param name2 * a converter name or alias * @return 0 if the names match, or a negative value if name1 lexically * precedes name2, or a positive value if name1 lexically * follows name2. 
* * @see UConverterAlias#stripForCompare */ static int compareNames(String name1, String name2){ int rc, name1Index = 0, name2Index = 0; char type, nextType; char c1 = 0, c2 = 0; boolean afterDigit1 = false, afterDigit2 = false; for (;;) { while (name1Index < name1.length()) { c1 = name1.charAt(name1Index++); type = GET_CHAR_TYPE(c1); switch (type) { case IGNORE: afterDigit1 = false; continue; /* ignore all but letters and digits */ case ZERO: if (!afterDigit1 && name1Index < name1.length()) { nextType = GET_CHAR_TYPE(name1.charAt(name1Index)); if (nextType == ZERO || nextType == NONZERO) { continue; /* ignore leading zero before another digit */ } } break; case NONZERO: afterDigit1 = true; break; default: c1 = (char)type; /* lowercased letter */ afterDigit1 = false; break; } break; /* deliver c1 */ } while (name2Index < name2.length()) { c2 = name2.charAt(name2Index++); type = GET_CHAR_TYPE(c2); switch (type) { case IGNORE: afterDigit2 = false; continue; /* ignore all but letters and digits */ case ZERO: /* bounds check must use name2's own index, since name2 is the string being peeked */ if (!afterDigit2 && name2Index < name2.length()) { nextType = GET_CHAR_TYPE(name2.charAt(name2Index)); if (nextType == ZERO || nextType == NONZERO) { continue; /* ignore leading zero before another digit */ } } break; case NONZERO: afterDigit2 = true; break; default: c2 = (char)type; /* lowercased letter */ afterDigit2 = false; break; } break; /* deliver c2 */ } /* If we reach the ends of both strings then they match */ if (name1Index >= name1.length() && name2Index >= name2.length()) { return 0; } /* Case-insensitive comparison */ rc = (int)c1 - (int)c2; if (rc != 0) { return rc; } } } static int io_countAliases(String alias) throws IOException{ if (haveAliasData() && isAlias(alias)) { boolean[] isAmbigous = new boolean[1]; int convNum = findConverter(alias, isAmbigous); if (convNum < gConverterList.length) { /* tagListNum - 1 is the ALL tag */ int listOffset = gTaggedAliasArray[(int) ((gTagList.length - 1) * gConverterList.length + convNum)]; if (listOffset != 
0) { return gTaggedAliasLists[listOffset]; } /* else this shouldn't happen. internal program error */ } /* else converter not found */ } return 0; } /** * Return the number of all aliases (and converter names). * * @return the number of all aliases */ // U_CFUNC uint16_t io_countTotalAliases(UErrorCode *pErrorCode); // static int io_countTotalAliases() throws IOException{ // if (haveAliasData()) { // return (int) gAliasList.length; // } // return 0; // } // U_CFUNC const char * io_getAlias(const char *alias, uint16_t n, // UErrorCode *pErrorCode) static String io_getAlias(String alias, int n) throws IOException{ if (haveAliasData() && isAlias(alias)) { boolean[] isAmbigous = new boolean[1]; int convNum = findConverter(alias,isAmbigous); if (convNum < gConverterList.length) { /* tagListNum - 1 is the ALL tag */ int listOffset = gTaggedAliasArray[(int) ((gTagList.length - 1) * gConverterList.length + convNum)]; if (listOffset != 0) { //int listCount = gTaggedAliasListsArray[listOffset]; /* +1 to skip listCount */ int[] currListArray = gTaggedAliasLists; int currListArrayIndex = listOffset + 1; return GET_STRING(currListArray[currListArrayIndex + n]); } /* else this shouldn't happen. 
internal program error */ } /* else converter not found */ } return null; } // U_CFUNC uint16_t io_countStandards(UErrorCode *pErrorCode) { // static int io_countStandards() throws IOException{ // if (haveAliasData()) { // return (int) (gTagList.length - NUM_HIDDEN_TAGS); // } // return 0; // } // U_CAPI const char * U_EXPORT2getStandard(uint16_t n, UErrorCode // *pErrorCode) // static String getStandard(int n) throws IOException{ // if (haveAliasData()) { // return GET_STRING(gTagList[n]); // } // return null; // } // U_CAPI const char * U_EXPORT2 getStandardName(const char *alias, const // char *standard, UErrorCode *pErrorCode) static final String getStandardName(String alias, String standard)throws IOException { if (haveAliasData() && isAlias(alias)) { int listOffset = findTaggedAliasListsOffset(alias, standard); if (0 < listOffset && listOffset < gTaggedAliasLists.length) { int[] currListArray = gTaggedAliasLists; int currListArrayIndex = listOffset + 1; if (currListArray[0] != 0) { return GET_STRING(currListArray[(int) currListArrayIndex]); } } } return null; } // U_CAPI uint16_t U_EXPORT2 countAliases(const char *alias, UErrorCode // *pErrorCode) static int countAliases(String alias) throws IOException{ return io_countAliases(alias); } // U_CAPI const char* U_EXPORT2 getAlias(const char *alias, uint16_t n, // UErrorCode *pErrorCode) static String getAlias(String alias, int n) throws IOException{ return io_getAlias(alias, n); } // U_CFUNC uint16_t countStandards(void) // static int countStandards()throws IOException{ // return io_countStandards(); // } /*returns a single Name from the list, will return NULL if out of bounds */ static String getAvailableName (int n){ try{ if (0 <= n && n <= 0xffff) { String name = bld_getAvailableConverter(n); return name; } }catch(IOException ex){ //throw away exception } return null; } // U_CAPI const char * U_EXPORT2 getCanonicalName(const char *alias, const // char *standard, UErrorCode *pErrorCode) { static String 
getCanonicalName(String alias, String standard) throws IOException{ if (haveAliasData() && isAlias(alias)) { int convNum = findTaggedConverterNum(alias, standard); if (convNum < gConverterList.length) { return GET_STRING(gConverterList[(int) convNum]); } } return null; } static int countAvailable (){ try{ return bld_countAvailableConverters(); }catch(IOException ex){ //throw away exception } return -1; } // U_CAPI UEnumeration * U_EXPORT2 openStandardNames(const char *convName, // const char *standard, UErrorCode *pErrorCode) /* static final UConverterAliasesEnumeration openStandardNames(String convName, String standard)throws IOException { UConverterAliasesEnumeration aliasEnum = null; if (haveAliasData() && isAlias(convName)) { int listOffset = findTaggedAliasListsOffset(convName, standard); * When listOffset == 0, we want to acknowledge that the converter * name and standard are okay, but there is nothing to enumerate. if (listOffset < gTaggedAliasLists.length) { UConverterAliasesEnumeration.UAliasContext context = new UConverterAliasesEnumeration.UAliasContext(listOffset, 0); aliasEnum = new UConverterAliasesEnumeration(); aliasEnum.setContext(context); } else converter or tag not found } return aliasEnum; }*/ // static uint32_t getTagNumber(const char *tagname) private static int getTagNumber(String tagName) { if (gTagList != null) { int tagNum; for (tagNum = 0; tagNum < gTagList.length; tagNum++) { if (tagName.equals(GET_STRING(gTagList[(int) tagNum]))) { return tagNum; } } } return Integer.MAX_VALUE; } // static uint32_t findTaggedAliasListsOffset(const char *alias, const char // *standard, UErrorCode *pErrorCode) private static int findTaggedAliasListsOffset(String alias, String standard) { int idx; int listOffset; int convNum; int tagNum = getTagNumber(standard); boolean[] isAmbigous = new boolean[1]; /* Make a quick guess. Hopefully they used a TR22 canonical alias. 
*/ convNum = findConverter(alias, isAmbigous); if (tagNum < (gTagList.length - NUM_HIDDEN_TAGS) && convNum < gConverterList.length) { listOffset = gTaggedAliasArray[(int) (tagNum * gConverterList.length + convNum)]; if (listOffset != 0 && gTaggedAliasLists[(int) listOffset + 1] != 0) { return listOffset; } if (isAmbigous[0]==true) { /* * Uh Oh! They used an ambiguous alias. We have to search the * whole swiss cheese starting at the highest standard affinity. * This may take a while. */ for (idx = 0; idx < gTaggedAliasArray.length; idx++) { listOffset = gTaggedAliasArray[(int) idx]; if (listOffset != 0 && isAliasInList(alias, listOffset)) { int currTagNum = idx / gConverterList.length; int currConvNum = (idx - currTagNum * gConverterList.length); int tempListOffset = gTaggedAliasArray[(int) (tagNum * gConverterList.length + currConvNum)]; if (tempListOffset != 0 && gTaggedAliasLists[(int) tempListOffset + 1] != 0) { return tempListOffset; } /* * else keep on looking We could speed this up by * starting on the next row because an alias is unique * per row, right now. This would change if alias * versioning appears. */ } } /* The standard doesn't know about the alias */ } /* else no default name */ return 0; } /* else converter or tag not found */ return Integer.MAX_VALUE; } /* Return the canonical name */ // static uint32_t findTaggedConverterNum(const char *alias, const char // *standard, UErrorCode *pErrorCode) private static int findTaggedConverterNum(String alias, String standard) { int idx; int listOffset; int convNum; int tagNum = getTagNumber(standard); boolean[] isAmbigous = new boolean[1]; /* Make a quick guess. Hopefully they used a TR22 canonical alias. 
*/ convNum = findConverter(alias, isAmbigous); if (tagNum < (gTagList.length - NUM_HIDDEN_TAGS) && convNum < gConverterList.length) { listOffset = gTaggedAliasArray[(int) (tagNum * gConverterList.length + convNum)]; if (listOffset != 0 && isAliasInList(alias, listOffset)) { return convNum; } if (isAmbigous[0] == true) { /* * Uh Oh! They used an ambiguous alias. We have to search one * slice of the swiss cheese. We search only in the requested * tag, not the whole thing. This may take a while. */ int convStart = (tagNum) * gConverterList.length; int convLimit = (tagNum + 1) * gConverterList.length; for (idx = convStart; idx < convLimit; idx++) { listOffset = gTaggedAliasArray[(int) idx]; if (listOffset != 0 && isAliasInList(alias, listOffset)) { return idx - convStart; } } /* The standard doesn't know about the alias */ } /* else no canonical name */ } /* else converter or tag not found */ return Integer.MAX_VALUE; } // static U_INLINE UBool isAliasInList(const char *alias, uint32_t // listOffset) private static boolean isAliasInList(String alias, int listOffset) { if (listOffset != 0) { int currAlias; int listCount = gTaggedAliasLists[(int) listOffset]; /* +1 to skip listCount */ int[] currList = gTaggedAliasLists; int currListArrayIndex = listOffset + 1; for (currAlias = 0; currAlias < listCount; currAlias++) { if (currList[(int) (currAlias + currListArrayIndex)] != 0 && compareNames( alias, GET_STRING(currList[(int) (currAlias + currListArrayIndex)])) == 0) { return true; } } } return false; } // begin bld.c static String[] gAvailableConverters = null; static int gAvailableConverterCount = 0; static byte[] gDefaultConverterNameBuffer; // [MAX_CONVERTER_NAME_LENGTH + // 1]; /* +1 for NULL */ static String gDefaultConverterName = null; // static UBool haveAvailableConverterList(UErrorCode *pErrorCode) static boolean haveAvailableConverterList() throws IOException{ if (gAvailableConverters == null) { int idx; int localConverterCount; String converterName; String[] 
localConverterList; if (!haveAliasData()) { return false; } /* We can't have more than "*converterTable" converters to open */ localConverterList = new String[(int) gConverterList.length]; localConverterCount = 0; for (idx = 0; idx < gConverterList.length; idx++) { converterName = GET_STRING(gConverterList[idx]); //UConverter cnv = UConverter.open(converterName); //TODO: Fix me localConverterList[localConverterCount++] = converterName; } // agljport:todo umtx_lock(NULL); if (gAvailableConverters == null) { gAvailableConverters = localConverterList; gAvailableConverterCount = localConverterCount; /* haveData should have already registered the cleanup function */ } else { // agljport:todo free((char **)localConverterList); } // agljport:todo umtx_unlock(NULL); } return true; } // U_CFUNC uint16_t bld_countAvailableConverters(UErrorCode *pErrorCode) static int bld_countAvailableConverters() throws IOException{ if (haveAvailableConverterList()) { return gAvailableConverterCount; } return 0; } // U_CFUNC const char * bld_getAvailableConverter(uint16_t n, UErrorCode // *pErrorCode) static String bld_getAvailableConverter(int n) throws IOException{ if (haveAvailableConverterList()) { if (n < gAvailableConverterCount) { return gAvailableConverters[n]; } } return null; } /* default converter name --------------------------------------------------- */ /* * In order to be really thread-safe, the get function would have to take * a buffer parameter and copy the current string inside a mutex block. * This implementation only tries to be really thread-safe while * setting the name. * It assumes that setting a pointer is atomic. 
*/ // U_CFUNC const char * getDefaultName() // static final synchronized String getDefaultName() { // /* local variable to be thread-safe */ // String name; // // //agljport:todo umtx_lock(null); // name = gDefaultConverterName; // //agljport:todo umtx_unlock(null); // // if (name == null) { // //UConverter cnv = null; // int length = 0; // // name = CharsetICU.getDefaultCharsetName(); // // /* if the name is there, test it out and get the canonical name with options */ // if (name != null) { // // cnv = UConverter.open(name); // // name = cnv.getName(cnv); // // TODO: fix me // } // // if (name == null || name.length() == 0 ||/* cnv == null ||*/ // length >= gDefaultConverterNameBuffer.length) { // /* Panic time, let's use a fallback. */ // name = new String("US-ASCII"); // } // // //length=(int32_t)(strlen(name)); // // /* Copy the name before we close the converter. */ // name = gDefaultConverterName; // } // // return name; // } //end bld.c }icu4j-4.2/src/com/ibm/icu/charset/CharsetLMBCS.java0000644000175000017500000014576511361046170021716 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.charset.CharsetMBCS.CharsetDecoderMBCS; import com.ibm.icu.charset.CharsetMBCS.CharsetEncoderMBCS; import com.ibm.icu.util.ULocale; import com.ibm.icu.text.UnicodeSet; /** * @author Michael Ow * */ /* * LMBCS * * (Lotus Multi-Byte Character Set) * * LMBCS was invented in the late 1980s and is primarily used in Lotus Notes * databases and in Lotus 1-2-3 files. 
Programmers who work with the APIs * into these products will sometimes need to deal with strings in this format. * * The code in this file provides an implementation for an ICU converter of * LMBCS to and from Unicode. * * Since the LMBCS character set is only sparsely documented in existing * printed or online material, we have added extensive annotation to this * file to serve as a guide to understanding LMBCS. * * LMBCS was originally designed with these four sometimes-competing design goals: * -Provide encodings for characters in 12 existing national standards * (plus a few other characters) * -Minimal memory footprint * -Maximal speed of conversion into the existing national character sets * -No need to track a changing state as you interpret a string. * * All of the national character sets LMBCS was trying to encode are 'ANSI' * based, in that the bytes from 0x20 - 0x7F are almost exactly the * same common Latin unaccented characters and symbols in all character sets. * * So, in order to help meet the speed & memory design goals, the common ANSI * bytes from 0x20-0x7F are represented by the same single-byte values in LMBCS. */ class CharsetLMBCS extends CharsetICU { /* * The general LMBCS code unit is from 1-3 bytes. We can describe the 3 bytes as * follows: * [G] D1 [D2] * That is, a sometimes-optional 'group' byte, followed by 1 and sometimes 2 * data bytes. The maximum size of a LMBCS character is 3 bytes: */ private static final short ULMBCS_CHARSIZE_MAX = 3; /* * The single-byte values from 0x20 to 0x7F are examples of single D1 bytes. * We often have to figure out if byte values are below or above this, so we * use the ANSI nomenclature 'C0' and 'C1' to refer to the range of control * characters just above & below the common lower-ANSI range. 
*/ private static final short ULMBCS_C0END = 0x1F; private static final short ULMBCS_C1START = 0x80; /* * Most of the values less than 0x20 are reserved in LMBCS to announce * which national character standard is being used for the 'D' bytes. * In the comments we show the common name and the IBM character-set ID * for these character-set announcers: */ private static final short ULMBCS_GRP_L1 = 0x01; /* Latin-1 :ibm-850 */ private static final short ULMBCS_GRP_GR = 0x02; /* Greek :ibm-851 */ private static final short ULMBCS_GRP_HE = 0x03; /* Hebrew :ibm-1255 */ private static final short ULMBCS_GRP_AR = 0x04; /* Arabic :ibm-1256 */ private static final short ULMBCS_GRP_RU = 0x05; /* Cyrillic :ibm-1251 */ private static final short ULMBCS_GRP_L2 = 0x06; /* Latin-2 :ibm-852 */ private static final short ULMBCS_GRP_TR = 0x08; /* Turkish :ibm-1254 */ private static final short ULMBCS_GRP_TH = 0x0B; /* Thai :ibm-874 */ private static final short ULMBCS_GRP_JA = 0x10; /* Japanese :ibm-943 */ private static final short ULMBCS_GRP_KO = 0x11; /* Korean :ibm-1261 */ private static final short ULMBCS_GRP_TW = 0x12; /* Chinese TC :ibm-950 */ private static final short ULMBCS_GRP_CN = 0x13; /* Chinese SC :ibm-1386 */ /* * So, the beginning of understanding LMBCS is that IF the first byte of a LMBCS * character is one of those 12 values, you can interpret the remaining bytes of * that character as coming from one of those character sets. Since the lower * ANSI bytes already are represented in single bytes, one of the character * set announcers is used to announce a character that starts with a byte of * 0x80 or greater. * * The character sets are arranged so that the single byte sets all appear * before the multi-byte character sets. 
When we need to tell whether a * group byte is for a single byte char set or not we use this definition: */ private static final short ULMBCS_DOUBLEOPTGROUP_START = 0x10; /* * However, to fully understand LMBCS, you must also understand a series of * exceptions & optimizations made in service of the design goals. * * First, those of you who are character set mavens may have noticed that * the 'double-byte' character sets are actually multi-byte character sets * that can have one or two bytes, even in upper-ascii range. To force * each group byte to introduce a fixed-width encoding (to make it faster to * count characters), we use a convention of doubling up on the group byte * to introduce any single-byte character > 0x80 in an otherwise double-byte * character set. So, for example, the LMBCS sequence 0x10 0x10 0xAE is the * same as '0xAE' in the Japanese code page 943. * * Next, you will notice that the list of group bytes has some gaps. * These are used in various ways. * * We reserve a few special single byte values for common control * characters. These are in the same place as their ANSI equivalents for speed. */ private static final short ULMBCS_HT = 0x09; /* Fixed control-char - Horizontal Tab */ private static final short ULMBCS_LF = 0x0A; /* Fixed control-char - Line Feed */ private static final short ULMBCS_CR = 0x0D; /* Fixed control-char - Carriage Return */ /* * Then, 1-2-3 reserved a special single-byte character to put at the * beginning of internal 'system' range names: */ private static final short ULMBCS_123SYSTEMRANGE = 0x19; /* * Then we needed a place to put all the other ANSI control characters * that must be moved to different values because LMBCS reserves those * values for other purposes. To represent the control characters, we start * with a first byte of 0x0F & add the control character value as the * second byte. 
*/ private static final short ULMBCS_GRP_CTRL = 0x0F; /* * For the C0 controls (less than 0x20), we add 0x20 to preserve the * useful doctrine that any byte less than 0x20 in a LMBCS char must be * the first byte of a character: */ private static final short ULMBCS_CTRLOFFSET = 0x20; /* * Where to put the characters that aren't part of any of the 12 national * character sets? The first thing that was done, in the earlier years of * LMBCS, was to use up the spaces of the form * [G] D1, * where 'G' was one of the single-byte character groups, and * D1 was less than 0x80. These sequences are gathered together * into a Lotus-invented doublebyte character set to represent a * lot of stray values. Internally, in this implementation, we track this * as group '0', as a place to tuck this exceptions list. */ private static final short ULMBCS_GRP_EXCEPT = 0x00; /* * Finally, as the durability and usefulness of UNICODE became clear, * LOTUS added a new group 0x14 to hold Unicode values not otherwise * represented in LMBCS: */ private static final short ULMBCS_GRP_UNICODE = 0x14; /* * The two bytes appearing after a 0x14 are interpreted as UTF-16 BE * (Big Endian) characters. The exception comes when UTF16 * representation would have a zero as the second byte. In that case, * 'F6' is used in its place, and the bytes are swapped. (This prevents * LMBCS from encoding any Unicode values of the form U+F6xx, but that's OK: * 0xF6xx is in the middle of the Private Use Area.) */ private static char ULMBCS_UNICOMPATZERO = 0x00F6; /* * It is also useful in our code to have a constant for the size of * a LMBCS char that holds a literal Unicode value. */ private static final short ULMBCS_UNICODE_SIZE = 3; /* * To squish the LMBCS representation down even further, and to make * translations even faster, sometimes the optimization group byte can be dropped * from a LMBCS character. This is decided on a process-by-process basis. 
The * group byte that is dropped is called the 'optimization group.' * * For Notes, the optimization group is always 0x1. */ //private static final short ULMBCS_DEFAULTOPTGROUP = 0x01; /* For 1-2-3 files, the optimization group is stored in the header of the 1-2-3 * file. * In any case, when using ICU, you pass in the * optimization group as part of the name of the converter (LMBCS-1, LMBCS-2, * etc.). Using plain 'LMBCS' as the name of the converter will give you * LMBCS-1. */ /* Implementation strategy */ /* * Because of the extensive use of other character sets, the LMBCS converter * keeps a mapping between optimization groups and IBM character sets, so that * ICU converters can be created and used as needed. * * As you can see, even though any byte below 0x20 could be an optimization * byte, only those at 0x13 or below can map to an actual converter. To limit * some loops and searches, we define a value for that last group converter: */ private static final short ULMBCS_GRP_LAST = 0x13; /* last LMBCS group that has a converter */ private static final String[] OptGroupByteToCPName = { /* 0x0000 */ "lmb-excp", /* internal home for the LOTUS exceptions list */ /* 0x0001 */ "ibm-850", /* 0x0002 */ "ibm-851", /* 0x0003 */ "windows-1255", /* 0x0004 */ "windows-1256", /* 0x0005 */ "windows-1251", /* 0x0006 */ "ibm-852", /* 0x0007 */ null, /* Unused */ /* 0x0008 */ "windows-1254", /* 0x0009 */ null, /* Control char HT */ /* 0x000A */ null, /* Control char LF */ /* 0x000B */ "windows-874", /* 0x000C */ null, /* Unused */ /* 0x000D */ null, /* Control char CR */ /* 0x000E */ null, /* Unused */ /* 0x000F */ null, /* Control chars: 0x0F20 + C0/C1 character: algorithmic */ /* 0x0010 */ "windows-932", /* 0x0011 */ "windows-949", /* 0x0012 */ "windows-950", /* 0x0013 */ "windows-936", /* The rest are null, including the 0x0014 Unicode compatibility region * and 0x0019, the 1-2-3 system range control char */ /* 0x0014 */ null }; /* That's approximately all the data 
that's needed for translating * LMBCS to Unicode. * * However, to translate Unicode to LMBCS, we need some more support. * * That's because there is often more than one possible mapping from a Unicode * code point back into LMBCS. The first thing we do is look up into a table * to figure out if there is more than one possible mapping. This table, * arranged by Unicode values (including ranges) either lists which group * to use, or says that it could go into one or more of the SBCS sets, or * into one or more of the DBCS sets. (If the character exists in both DBCS & * SBCS, the table will place it in the SBCS sets, to make the LMBCS code point * length as small as possible.) Here are the two special markers we use to indicate * ambiguous mappings: */ private static final short ULMBCS_AMBIGUOUS_SBCS = 0x80; /* could fit in more than one LMBCS sbcs native encoding (example: most accented latin) */ private static final short ULMBCS_AMBIGUOUS_MBCS = 0x81; /* could fit in more than one LMBCS mbcs native encoding (example: Unihan) */ /* And here's a simple way to see if a group falls in an appropriate range */ private boolean ULMBCS_AMBIGUOUS_MATCH(short agroup, short xgroup) { return (((agroup == ULMBCS_AMBIGUOUS_SBCS) && (xgroup < ULMBCS_DOUBLEOPTGROUP_START)) || ((agroup == ULMBCS_AMBIGUOUS_MBCS) && (xgroup >= ULMBCS_DOUBLEOPTGROUP_START))); } /* The table & some code to use it: */ private static class _UniLMBCSGrpMap { int uniStartRange; int uniEndRange; short GrpType; _UniLMBCSGrpMap(int uniStartRange, int uniEndRange, short GrpType) { this.uniStartRange = uniStartRange; this.uniEndRange = uniEndRange; this.GrpType = GrpType; } } private static final _UniLMBCSGrpMap[] UniLMBCSGrpMap = { new _UniLMBCSGrpMap(0x0001, 0x001F, ULMBCS_GRP_CTRL), new _UniLMBCSGrpMap(0x0080, 0x009F, ULMBCS_GRP_CTRL), new _UniLMBCSGrpMap(0x00A0, 0x01CD, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x01CE, 0x01CE, ULMBCS_GRP_TW), new _UniLMBCSGrpMap(0x01CF, 0x02B9, ULMBCS_AMBIGUOUS_SBCS), new 
_UniLMBCSGrpMap(0x02BA, 0x02BA, ULMBCS_GRP_CN), new _UniLMBCSGrpMap(0x02BC, 0x02C8, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x02C9, 0x02D0, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x02D8, 0x02DD, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x0384, 0x03CE, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x0400, 0x044E, ULMBCS_GRP_RU), new _UniLMBCSGrpMap(0x044F, 0x044F, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x0450, 0x0491, ULMBCS_GRP_RU), new _UniLMBCSGrpMap(0x05B0, 0x05F2, ULMBCS_GRP_HE), new _UniLMBCSGrpMap(0x060C, 0x06AF, ULMBCS_GRP_AR), new _UniLMBCSGrpMap(0x0E01, 0x0E5B, ULMBCS_GRP_TH), new _UniLMBCSGrpMap(0x200C, 0x200F, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2010, 0x2010, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2013, 0x2015, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2016, 0x2016, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2017, 0x2024, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2025, 0x2025, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2026, 0x2026, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2027, 0x2027, ULMBCS_GRP_CN), new _UniLMBCSGrpMap(0x2030, 0x2033, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2035, 0x2035, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2039, 0x203A, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x203B, 0x203B, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2074, 0x2074, ULMBCS_GRP_KO), new _UniLMBCSGrpMap(0x207F, 0x207F, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2081, 0x2084, ULMBCS_GRP_KO), new _UniLMBCSGrpMap(0x20A4, 0x20AC, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2103, 0x2109, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2111, 0x2126, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x212B, 0x212B, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2135, 0x2135, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2153, 0x2154, ULMBCS_GRP_KO), new _UniLMBCSGrpMap(0x215B, 0x215E, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2160, 0x2179, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2190, 0x2195, 
ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2196, 0x2199, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x21A8, 0x21A8, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x21B8, 0x21B9, ULMBCS_GRP_CN), new _UniLMBCSGrpMap(0x21D0, 0x21D5, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x21E7, 0x21E7, ULMBCS_GRP_CN), new _UniLMBCSGrpMap(0x2200, 0x220B, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x220F, 0x2215, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2219, 0x2220, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2223, 0x2228, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2229, 0x222B, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x222C, 0x223D, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2245, 0x2248, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x224C, 0x224C, ULMBCS_GRP_TW), new _UniLMBCSGrpMap(0x2252, 0x2252, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2260, 0x2265, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2266, 0x226F, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2282, 0x2297, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2299, 0x22BF, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x22C0, 0x22C0, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2310, 0x2310, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2312, 0x2312, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2318, 0x2321, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2318, 0x2321, ULMBCS_GRP_CN), new _UniLMBCSGrpMap(0x2460, 0x24E9, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2500, 0x2500, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2501, 0x2501, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2502, 0x2502, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2503, 0x2503, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2504, 0x2505, ULMBCS_GRP_TW), new _UniLMBCSGrpMap(0x2506, 0x2665, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0x2666, 0x2666, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2666, 0x2666, ULMBCS_GRP_EXCEPT), new _UniLMBCSGrpMap(0x2667, 0x2E7F, ULMBCS_AMBIGUOUS_SBCS), new _UniLMBCSGrpMap(0x2E80, 0xF861, ULMBCS_AMBIGUOUS_MBCS), new _UniLMBCSGrpMap(0xF862, 
0xF8FF, ULMBCS_GRP_EXCEPT),
        new _UniLMBCSGrpMap(0xF900, 0xFA2D, ULMBCS_AMBIGUOUS_MBCS),
        new _UniLMBCSGrpMap(0xFB00, 0xFEFF, ULMBCS_AMBIGUOUS_SBCS),
        new _UniLMBCSGrpMap(0xFF01, 0xFFEE, ULMBCS_AMBIGUOUS_MBCS),
        new _UniLMBCSGrpMap(0xFFFF, 0xFFFF, ULMBCS_GRP_UNICODE)
    };

    static short FindLMBCSUniRange(char uniChar) {
        int index = 0;

        /* The table ends with a 0xFFFF sentinel entry, so this scan always terminates. */
        while (uniChar > UniLMBCSGrpMap[index].uniEndRange) {
            index++;
        }

        if (uniChar >= UniLMBCSGrpMap[index].uniStartRange) {
            return UniLMBCSGrpMap[index].GrpType;
        }
        return ULMBCS_GRP_UNICODE;
    }

    /*
     * We also ask the creator of a converter to send in a preferred locale
     * that we can use in resolving ambiguous mappings. They send the locale
     * in as a string, and we map it, if possible, to one of the
     * LMBCS groups. We use this table, and the associated code, to
     * do the lookup:
     *
     * This table maps locale IDs to LMBCS opt groups.
     * The default return is group 0x01. Note that for
     * performance reasons, the table is sorted in
     * increasing alphabetic order, with the notable
     * exception of zhTW. This is to force the check
     * for Traditional Chinese before dropping back to
     * Simplified.
     * Note too that the Latin-1 groups have been
     * commented out because Latin-1 is the default, and
     * this shortens the table, allowing a serial
     * search to go quickly.
*/ private static class _LocaleLMBCSGrpMap { String LocaleID; short OptGroup; _LocaleLMBCSGrpMap(String LocaleID, short OptGroup) { this.LocaleID = LocaleID; this.OptGroup = OptGroup; } } private static final _LocaleLMBCSGrpMap[] LocaleLMBCSGrpMap = { new _LocaleLMBCSGrpMap("ar", ULMBCS_GRP_AR), new _LocaleLMBCSGrpMap("be", ULMBCS_GRP_RU), new _LocaleLMBCSGrpMap("bg", ULMBCS_GRP_L2), // new _LocaleLMBCSGrpMap("ca", ULMBCS_GRP_L1), new _LocaleLMBCSGrpMap("cs", ULMBCS_GRP_L2), // new _LocaleLMBCSGrpMap("da", ULMBCS_GRP_L1), // new _LocaleLMBCSGrpMap("de", ULMBCS_GRP_L1), new _LocaleLMBCSGrpMap("el", ULMBCS_GRP_GR), // new _LocaleLMBCSGrpMap("en", ULMBCS_GRP_L1), // new _LocaleLMBCSGrpMap("es", ULMBCS_GRP_L1), // new _LocaleLMBCSGrpMap("et", ULMBCS_GRP_L1), // new _LocaleLMBCSGrpMap("fi", ULMBCS_GRP_L1), // new _LocaleLMBCSGrpMap("fr", ULMBCS_GRP_L1), new _LocaleLMBCSGrpMap("he", ULMBCS_GRP_HE), new _LocaleLMBCSGrpMap("hu", ULMBCS_GRP_L2), // new _LocaleLMBCSGrpMap("is", ULMBCS_GRP_L1), // new _LocaleLMBCSGrpMap("it", ULMBCS_GRP_L1), new _LocaleLMBCSGrpMap("iw", ULMBCS_GRP_HE), new _LocaleLMBCSGrpMap("ja", ULMBCS_GRP_JA), new _LocaleLMBCSGrpMap("ko", ULMBCS_GRP_KO), // new _LocaleLMBCSGrpMap("lt", ULMBCS_GRP_L1), // new _LocaleLMBCSGrpMap("lv", ULMBCS_GRP_L1), new _LocaleLMBCSGrpMap("mk", ULMBCS_GRP_RU), // new _LocaleLMBCSGrpMap("nl", ULMBCS_GRP_L1), // new _LocaleLMBCSGrpMap("no", ULMBCS_GRP_L1), new _LocaleLMBCSGrpMap("pl", ULMBCS_GRP_L2), // new _LocaleLMBCSGrpMap("pt", ULMBCS_GRP_L1), new _LocaleLMBCSGrpMap("ro", ULMBCS_GRP_L2), new _LocaleLMBCSGrpMap("ru", ULMBCS_GRP_RU), new _LocaleLMBCSGrpMap("sh", ULMBCS_GRP_L2), new _LocaleLMBCSGrpMap("sk", ULMBCS_GRP_L2), new _LocaleLMBCSGrpMap("sl", ULMBCS_GRP_L2), new _LocaleLMBCSGrpMap("sq", ULMBCS_GRP_L2), new _LocaleLMBCSGrpMap("sr", ULMBCS_GRP_RU), // new _LocaleLMBCSGrpMap("sv", ULMBCS_GRP_L1), new _LocaleLMBCSGrpMap("th", ULMBCS_GRP_TH), new _LocaleLMBCSGrpMap("tr", ULMBCS_GRP_TR), new _LocaleLMBCSGrpMap("uk", 
ULMBCS_GRP_RU),
     // new _LocaleLMBCSGrpMap("vi", ULMBCS_GRP_L1),
        new _LocaleLMBCSGrpMap("zhTW", ULMBCS_GRP_TW),
        new _LocaleLMBCSGrpMap("zh", ULMBCS_GRP_CN),
        new _LocaleLMBCSGrpMap(null, ULMBCS_GRP_L1)
    };

    static short FindLMBCSLocale(String LocaleID) {
        int index = 0;

        if (LocaleID == null) {
            return 0;
        }

        while (LocaleLMBCSGrpMap[index].LocaleID != null) {
            /* Compare string contents; == would only compare object references. */
            if (LocaleLMBCSGrpMap[index].LocaleID.equals(LocaleID)) {
                return LocaleLMBCSGrpMap[index].OptGroup;
            } else if (LocaleLMBCSGrpMap[index].LocaleID.compareTo(LocaleID) > 0) {
                /* The table is sorted, so no later entry can match. */
                break;
            }
            index++;
        }
        return ULMBCS_GRP_L1;
    }

    /*
     * Before we get to the main body of code, here's how we hook up to the rest
     * of ICU. ICU converters are required to define a structure that includes
     * some function pointers, and some common data, in the style of a C++
     * vtable. There is also room in there for converter-specific data. LMBCS
     * uses that converter-specific data to keep track of the 12 subconverters
     * we use, the optimization group, and the group (if any) that matches the
     * locale. We have one structure instantiated for each of the 12 possible
     * optimization groups.
     */
    private class UConverterDataLMBCS {
        UConverterSharedData[] OptGrpConverter; /* Converter per Opt. grp. */
        short OptGroup; /* default Opt. grp.
for this LMBCS session */ short localeConverterIndex; /* reasonable locale match for index */ CharsetDecoderMBCS decoder; CharsetEncoderMBCS encoder; CharsetMBCS charset; UConverterDataLMBCS() { OptGrpConverter = new UConverterSharedData[ULMBCS_GRP_LAST + 1]; charset = (CharsetMBCS)CharsetICU.forNameICU("ibm-850"); encoder = (CharsetEncoderMBCS)charset.newEncoder(); decoder = (CharsetDecoderMBCS)charset.newDecoder(); } } private UConverterDataLMBCS extraInfo; /* extraInfo in ICU4C implementation */ public CharsetLMBCS(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); maxBytesPerChar = ULMBCS_CHARSIZE_MAX; minBytesPerChar = 1; maxCharsPerByte = 1; extraInfo = new UConverterDataLMBCS(); for (int i = 0; i <= ULMBCS_GRP_LAST; i++) { if (OptGroupByteToCPName[i] != null) { extraInfo.OptGrpConverter[i] = ((CharsetMBCS)CharsetICU.forNameICU(OptGroupByteToCPName[i])).sharedData; } } //get the Opt Group number for the LMBCS converter int option = Integer.parseInt(icuCanonicalName.substring(6)); extraInfo.OptGroup = (short)option; extraInfo.localeConverterIndex = FindLMBCSLocale(ULocale.getDefault().getBaseName()); } class CharsetDecoderLMBCS extends CharsetDecoderICU { public CharsetDecoderLMBCS(CharsetICU cs) { super(cs); implReset(); } protected void implReset() { super.implReset(); } /* A function to call when we are looking at the Unicode group byte in LMBCS */ private char GetUniFromLMBCSUni(ByteBuffer ppLMBCSin) { short HighCh = (short)(ppLMBCSin.get() & UConverterConstants.UNSIGNED_BYTE_MASK); short LowCh = (short)(ppLMBCSin.get() & UConverterConstants.UNSIGNED_BYTE_MASK); if (HighCh == ULMBCS_UNICOMPATZERO) { HighCh = LowCh; LowCh = 0; /* zero-byte in LSB special character */ } return (char)((HighCh << 8) | LowCh); } private int LMBCS_SimpleGetNextUChar(UConverterSharedData cnv, ByteBuffer source, int positionOffset, int length) { int uniChar; int oldSourceLimit; int oldSourcePos; 
            extraInfo.charset.sharedData = cnv;

            /* Temporarily narrow the buffer to the bytes of this one character, then restore it. */
            oldSourceLimit = source.limit();
            oldSourcePos = source.position();

            source.position(oldSourcePos + positionOffset);
            source.limit(source.position() + length);

            uniChar = extraInfo.decoder.simpleGetNextUChar(source, false);

            source.limit(oldSourceLimit);
            source.position(oldSourcePos);

            return uniChar;
        }

        /* Return the Unicode representation for the current LMBCS character. */
        /*
         * Note: Because there is no U_TRUNCATED_CHAR_FOUND error code in ICU4J, we
         * use CoderResult.OVERFLOW to signal a truncated character. The error will
         * be handled correctly by the calling function.
         */
        private int LMBCSGetNextUCharWorker(ByteBuffer source, CoderResult[] err) {
            int uniChar = 0; /* an output Unicode char */
            short CurByte; /* A byte from the input stream */

            /* error check */
            if (!source.hasRemaining()) {
                err[0] = CoderResult.malformedForLength(0);
                return 0xffff;
            }
            /* Grab first byte & save address for error recovery */
            CurByte = (short)(source.get() & UConverterConstants.UNSIGNED_BYTE_MASK);

            /*
             * at entry of each if clause:
             * 1. 'CurByte' holds the first byte of a LMBCS character
             * 2. 'source' points to the next byte of the source stream after 'CurByte'
             *
             * the job of each if clause is:
             * 1. set 'source' to point at the beginning of the next char (not if LMBCS char is only 1 byte)
             * 2. set 'uniChar' up with the right Unicode value, or set 'err' appropriately
             */
            /* First let's check the simple fixed values.
*/ if ((CurByte > ULMBCS_C0END && CurByte < ULMBCS_C1START) /* ascii range */ || CurByte == 0 || CurByte == ULMBCS_HT || CurByte == ULMBCS_CR || CurByte == ULMBCS_LF || CurByte == ULMBCS_123SYSTEMRANGE) { uniChar = CurByte; } else { short group; UConverterSharedData cnv; if (CurByte == ULMBCS_GRP_CTRL) { /* Control character group - no opt group update */ short C0C1byte; /* CHECK_SOURCE_LIMIT(1) */ if (source.position() + 1 > source.limit()) { err[0] = CoderResult.OVERFLOW; source.position(source.limit()); return 0xFFFF; } C0C1byte = (short)(source.get() & UConverterConstants.UNSIGNED_BYTE_MASK); uniChar = (C0C1byte < ULMBCS_C1START) ? C0C1byte - ULMBCS_CTRLOFFSET : C0C1byte; } else if (CurByte == ULMBCS_GRP_UNICODE) { /* Unicode Compatibility group: Big Endian UTF16 */ /* CHECK_SOURCE_LIMIT(2) */ if (source.position() + 2 > source.limit()) { err[0] = CoderResult.OVERFLOW; source.position(source.limit()); return 0xFFFF; } /* don't check for error indicators fffe/ffff below */ return GetUniFromLMBCSUni(source); } else if (CurByte <= ULMBCS_CTRLOFFSET) { group = CurByte; if (group > ULMBCS_GRP_LAST || (cnv = extraInfo.OptGrpConverter[group]) == null) { /* this is not a valid group byte - no converter */ err[0] = CoderResult.unmappableForLength(1); } else if (group >= ULMBCS_DOUBLEOPTGROUP_START) { /* CHECK_SOURCE_LIMIT(2) */ if (source.position() + 2 > source.limit()) { err[0] = CoderResult.OVERFLOW; source.position(source.limit()); return 0xFFFF; } /* check for LMBCS doubled-group-byte case */ if (source.get(source.position()) == group) { /* single byte */ source.get(); uniChar = LMBCS_SimpleGetNextUChar(cnv, source, 0, 1); source.get(); } else { /* double byte */ uniChar = LMBCS_SimpleGetNextUChar(cnv, source, 0, 2); source.get(); source.get(); } } else { /* single byte conversion */ /* CHECK_SOURCE_LIMIT(1) */ if (source.position() + 1 > source.limit()) { err[0] = CoderResult.OVERFLOW; source.position(source.limit()); return 0xFFFF; } CurByte = 
(short)(source.get() & UConverterConstants.UNSIGNED_BYTE_MASK); if (CurByte >= ULMBCS_C1START) { uniChar = CharsetMBCS.MBCS_SINGLE_SIMPLE_GET_NEXT_BMP(cnv.mbcs, CurByte); } else { /* * The non-optimizable oddballs where there is an explicit byte * AND the second byte is not in the upper ascii range */ byte[] bytes = new byte[2]; cnv = extraInfo.OptGrpConverter[ULMBCS_GRP_EXCEPT]; /* Lookup value must include opt group */ bytes[0] = (byte)group; bytes[1] = (byte)CurByte; uniChar = LMBCS_SimpleGetNextUChar(cnv, ByteBuffer.wrap(bytes), 0, 2); } } } else if (CurByte >= ULMBCS_C1START) { /* group byte is implicit */ group = extraInfo.OptGroup; cnv = extraInfo.OptGrpConverter[group]; if (group >= ULMBCS_DOUBLEOPTGROUP_START) { /* double byte conversion */ if (CharsetMBCS.MBCS_ENTRY_IS_TRANSITION(cnv.mbcs.stateTable[0][CurByte]) /* isLeadByte */) { /* CHECK_SOURCE_LIMIT(0) */ if (source.position() + 0 > source.limit()) { err[0] = CoderResult.OVERFLOW; source.position(source.limit()); return 0xFFFF; } /* let the MBCS conversion consume CurByte again */ uniChar = LMBCS_SimpleGetNextUChar(cnv, source, -1, 1); } else { /* CHECK_SOURCE_LIMIT(1) */ if (source.position() + 1 > source.limit()) { err[0] = CoderResult.OVERFLOW; source.position(source.limit()); return 0xFFFF; } /* let the MBCS conversion consume CurByte again */ uniChar = LMBCS_SimpleGetNextUChar(cnv, source, -1, 2); source.get(); } } else { uniChar = CharsetMBCS.MBCS_SINGLE_SIMPLE_GET_NEXT_BMP(cnv.mbcs, CurByte); } } } return uniChar; } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult[] err = new CoderResult[1]; err[0] = CoderResult.UNDERFLOW; byte[] LMBCS = new byte[ULMBCS_CHARSIZE_MAX * 2]; /* Increase the size for proper handling in subsequent calls to MBCS functions */ char uniChar; /* one output Unicode char */ int saveSource; /* beginning of current code point */ int errSource = 0; /* index to actual input in case an error occurs */ byte 
savebytes = 0; /* Process from source to limit, or until error */ while (err[0].isUnderflow() && source.hasRemaining() && target.hasRemaining()) { saveSource = source.position(); /* beginning of current code point */ if (toULength > 0) { /* reassemble char from previous call */ int size_old = toULength; ByteBuffer tmpSourceBuffer; /* limit from source is either remainder of temp buffer, or user limit on source */ int size_new_maybe_1 = ULMBCS_CHARSIZE_MAX - size_old; int size_new_maybe_2 = source.remaining(); int size_new = (size_new_maybe_1 < size_new_maybe_2) ? size_new_maybe_1 : size_new_maybe_2; savebytes = (byte)(size_old + size_new); for (int i = 0; i < savebytes; i++) { if (i < size_old) { LMBCS[i] = toUBytesArray[i]; } else { LMBCS[i] = source.get(); } } tmpSourceBuffer = ByteBuffer.wrap(LMBCS); tmpSourceBuffer.limit(savebytes); uniChar = (char)LMBCSGetNextUCharWorker(tmpSourceBuffer, err); source.position(saveSource + tmpSourceBuffer.position() - size_old); errSource = saveSource - size_old; if (err[0].isOverflow()) { /* err == U_TRUNCATED_CHAR_FOUND */ /* evil special case: source buffers so small a char spans more than 2 buffers */ toULength = savebytes; for (int i = 0; i < savebytes; i++) { toUBytesArray[i] = LMBCS[i]; } source.position(source.limit()); err[0] = CoderResult.UNDERFLOW; return err[0]; } else { /* clear the partial-char marker */ toULength = 0; } } else { errSource = saveSource; uniChar = (char)LMBCSGetNextUCharWorker(source, err); savebytes = (byte)(source.position() - saveSource); } if (err[0].isUnderflow()) { if (uniChar < 0x0fffe) { target.put(uniChar); if (offsets != null) { offsets.put(saveSource); } } else if (uniChar == 0xfffe) { err[0] = CoderResult.unmappableForLength(source.position() - saveSource); } else /* if (uniChar == 0xffff) */ { err[0] = CoderResult.malformedForLength(source.position() - saveSource); } } } /* If target ran out before source, return over flow buffer error. 
*/ if (err[0].isUnderflow() && source.hasRemaining() && !target.hasRemaining()) { err[0] = CoderResult.OVERFLOW; } else if (!err[0].isUnderflow()) { /* If character incomplete or unmappable/illegal, store it in toUBytesArray[] */ toULength = savebytes; if (savebytes > 0) { for (int i = 0; i < savebytes; i++) { toUBytesArray[i] = source.get(errSource + i); } } if (err[0].isOverflow()) { /* err == U_TRUNCATED_CHAR_FOUND */ err[0] = CoderResult.UNDERFLOW; } } return err[0]; } } class CharsetEncoderLMBCS extends CharsetEncoderICU { public CharsetEncoderLMBCS(CharsetICU cs) { super(cs, fromUSubstitution); implReset(); } protected void implReset() { super.implReset(); } /* * Here's the basic helper function that we use when converting from * Unicode to LMBCS, and we suspect that a Unicode character will fit into * one of the 12 groups. The return value is the number of bytes written * starting at pStartLMBCS (if any). */ private int LMBCSConversionWorker(short group, byte[] LMBCS, char pUniChar, short[] lastConverterIndex, boolean[] groups_tried) { byte pLMBCS = 0; UConverterSharedData xcnv = extraInfo.OptGrpConverter[group]; int bytesConverted; int[] value = new int[1]; short firstByte; extraInfo.charset.sharedData = xcnv; bytesConverted = extraInfo.encoder.fromUChar32(pUniChar, value, false); /* get the first result byte */ if (bytesConverted > 0) { firstByte = (short)((value[0] >> ((bytesConverted - 1) * 8)) & UConverterConstants.UNSIGNED_BYTE_MASK); } else { /* most common failure mode is an unassigned character */ groups_tried[group] = true; return 0; } lastConverterIndex[0] = group; /* * All initial byte values in lower ascii range should have been caught by now, * except with the exception group. 
*/ /* use converted data: first write 0, 1 or two group bytes */ if (group != ULMBCS_GRP_EXCEPT && extraInfo.OptGroup != group) { LMBCS[pLMBCS++] = (byte)group; if (bytesConverted == 1 && group >= ULMBCS_DOUBLEOPTGROUP_START) { LMBCS[pLMBCS++] = (byte)group; } } /* don't emit control chars */ if (bytesConverted == 1 && firstByte < 0x20) { return 0; } /* then move over the converted data */ switch (bytesConverted) { case 4: LMBCS[pLMBCS++] = (byte)(value[0] >> 24); case 3: LMBCS[pLMBCS++] = (byte)(value[0] >> 16); case 2: LMBCS[pLMBCS++] = (byte)(value[0] >> 8); case 1: LMBCS[pLMBCS++] = (byte)value[0]; default: /* will never occur */ break; } return pLMBCS; } /* * This is a much simpler version of above, when we * know we are writing LMBCS using the Unicode group. */ private int LMBCSConvertUni(byte[] LMBCS, char uniChar) { int index = 0; short LowCh = (short)(uniChar & UConverterConstants.UNSIGNED_BYTE_MASK); short HighCh = (short)((uniChar >> 8) & UConverterConstants.UNSIGNED_BYTE_MASK); LMBCS[index++] = (byte)ULMBCS_GRP_UNICODE; if (LowCh == 0) { LMBCS[index++] = (byte)ULMBCS_UNICOMPATZERO; LMBCS[index++] = (byte)HighCh; } else { LMBCS[index++] = (byte)HighCh; LMBCS[index++] = (byte)LowCh; } return ULMBCS_UNICODE_SIZE; } /* The main Unicode to LMBCS conversion function */ protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; short[] lastConverterIndex = new short[1]; char uniChar; byte[] LMBCS = new byte[ULMBCS_CHARSIZE_MAX]; byte pLMBCS; int bytes_written; boolean[] groups_tried = new boolean[ULMBCS_GRP_LAST+1]; int sourceIndex = 0; /* * Basic strategy: attempt to fill in local LMBCS 1-char buffer.(LMBCS) * If that succeeds, see if it will all fit into the target & copy it over * if it does. * * We try conversions in the following order: * 1. Single-byte ascii & special fixed control chars (&null) * 2. 
Look up group in table & try that (could be
             *    A) Unicode group
             *    B) control group
             *    C) national encoding
             *       or ambiguous SBCS or MBCS group (on to step 3...)
             * 3. If it's ambiguous, try this order:
             *    A) The optimization group
             *    B) The locale group
             *    C) The last group that succeeded with this string.
             *    D) every other group that's relevant
             *    E) If it's single-byte ambiguous, try the exceptions group
             * 4. And as a grand fallback: Unicode
             */
            while (source.hasRemaining() && err.isUnderflow()) {
                if (!target.hasRemaining()) {
                    err = CoderResult.OVERFLOW;
                    break;
                }
                uniChar = source.get(source.position());
                bytes_written = 0;
                pLMBCS = 0;

                /* check cases in rough order of how common they are, for speed */

                /* single-byte matches: strategy 1 */
                if (((uniChar > ULMBCS_C0END) && (uniChar < ULMBCS_C1START)) || uniChar == 0 ||
                        uniChar == ULMBCS_HT || uniChar == ULMBCS_CR || uniChar == ULMBCS_LF ||
                        uniChar == ULMBCS_123SYSTEMRANGE) {
                    LMBCS[pLMBCS++] = (byte)uniChar;
                    bytes_written = 1;
                }

                if (bytes_written == 0) {
                    /* Check by Unicode range (Strategy 2) */
                    short group = FindLMBCSUniRange(uniChar);

                    if (group == ULMBCS_GRP_UNICODE) { /* (Strategy 2A) */
                        bytes_written = LMBCSConvertUni(LMBCS, uniChar);
                    } else if (group == ULMBCS_GRP_CTRL) { /* (Strategy 2B) */
                        /* Handle control characters here */
                        if (uniChar <= ULMBCS_C0END) {
                            LMBCS[pLMBCS++] = ULMBCS_GRP_CTRL;
                            LMBCS[pLMBCS++] = (byte)(ULMBCS_CTRLOFFSET + uniChar);
                        } else if (uniChar >= ULMBCS_C1START && uniChar <= (ULMBCS_C1START + ULMBCS_CTRLOFFSET)) {
                            LMBCS[pLMBCS++] = ULMBCS_GRP_CTRL;
                            LMBCS[pLMBCS++] = (byte)uniChar;
                        }
                        bytes_written = pLMBCS;
                    } else if (group < ULMBCS_GRP_UNICODE) { /* (Strategy 2C) */
                        /* a specific converter has been identified - use it */
                        bytes_written = LMBCSConversionWorker(group, LMBCS, uniChar, lastConverterIndex, groups_tried);
                    }
                    if (bytes_written == 0) { /* the ambiguous group cases (Strategy 3) */
                        groups_tried = new boolean[ULMBCS_GRP_LAST+1];

                        /* check for non-default optimization group (Strategy 3A) */
                        if
(extraInfo.OptGroup != 1 && ULMBCS_AMBIGUOUS_MATCH(group, extraInfo.OptGroup)) {
                            bytes_written = LMBCSConversionWorker(extraInfo.OptGroup, LMBCS, uniChar, lastConverterIndex, groups_tried);
                        }
                        /* check for locale optimization group (Strategy 3B) */
                        if (bytes_written == 0 && extraInfo.localeConverterIndex > 0 &&
                                ULMBCS_AMBIGUOUS_MATCH(group, extraInfo.localeConverterIndex)) {
                            bytes_written = LMBCSConversionWorker(extraInfo.localeConverterIndex, LMBCS, uniChar, lastConverterIndex, groups_tried);
                        }
                        /* check for last optimization group used for this string (Strategy 3C) */
                        if (bytes_written == 0 && lastConverterIndex[0] > 0 &&
                                ULMBCS_AMBIGUOUS_MATCH(group, lastConverterIndex[0])) {
                            bytes_written = LMBCSConversionWorker(lastConverterIndex[0], LMBCS, uniChar, lastConverterIndex, groups_tried);
                        }
                        if (bytes_written == 0) {
                            /* just check every possible matching converter (Strategy 3D) */
                            short grp_start;
                            short grp_end;
                            short grp_ix;

                            grp_start = (group == ULMBCS_AMBIGUOUS_MBCS) ? ULMBCS_DOUBLEOPTGROUP_START : ULMBCS_GRP_L1;
                            grp_end = (group == ULMBCS_AMBIGUOUS_MBCS) ? ULMBCS_GRP_LAST : ULMBCS_GRP_TH;

                            for (grp_ix = grp_start; grp_ix <= grp_end && bytes_written == 0; grp_ix++) {
                                if (extraInfo.OptGrpConverter[grp_ix] != null && !groups_tried[grp_ix]) {
                                    bytes_written = LMBCSConversionWorker(grp_ix, LMBCS, uniChar, lastConverterIndex, groups_tried);
                                }
                            }
                            /*
                             * a final conversion fallback to the exceptions group if it's likely
                             * to be single byte (Strategy 3E)
                             */
                            if (bytes_written == 0 && grp_start == ULMBCS_GRP_L1) {
                                bytes_written = LMBCSConversionWorker(ULMBCS_GRP_EXCEPT, LMBCS, uniChar, lastConverterIndex, groups_tried);
                            }
                        }
                        /* all of our other strategies failed. Fall back to Unicode. (Strategy 4) */
                        if (bytes_written == 0) {
                            bytes_written = LMBCSConvertUni(LMBCS, uniChar);
                        }
                    }
                }
                /* we have a translation.
increment source and write as much as possible to target */ source.get(); pLMBCS = 0; while (target.hasRemaining() && bytes_written > 0) { bytes_written--; target.put(LMBCS[pLMBCS++]); if (offsets != null) { offsets.put(sourceIndex); } } sourceIndex++; if (bytes_written > 0) { /* * write any bytes that didn't fit in target to the error buffer, * common code will move this to target if we get called back with * enough target room */ err = CoderResult.OVERFLOW; errorBufferLength = bytes_written; for (int i = 0; bytes_written > 0; i++, bytes_written--) { errorBuffer[i] = LMBCS[pLMBCS++]; } } } return err; } } public CharsetDecoder newDecoder() { return new CharsetDecoderLMBCS(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderLMBCS(this); } void getUnicodeSetImpl(UnicodeSet setFillIn, int which){ getCompleteUnicodeSet(setFillIn); } private byte[] fromUSubstitution = new byte[]{ 0x3F }; } icu4j-4.2/src/com/ibm/icu/charset/CharsetUTF32LE.java0000644000175000017500000000145011361046170022060 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; /** * The purpose of this class is to set isBigEndian to false and isEndianSpecified to true in the super class, and to * allow the Charset framework to open the variant UTF-32 converter without extra setup work. 
*/ class CharsetUTF32LE extends CharsetUTF32 { public CharsetUTF32LE(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); } } icu4j-4.2/src/com/ibm/icu/charset/CharsetUTF16.java0000644000175000017500000002564211361046170021652 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; /** * @author Niti Hantaweepant */ class CharsetUTF16 extends CharsetICU { private static final int SIGNATURE_LENGTH = 2; private static final byte[] fromUSubstitution_BE = { (byte) 0xff, (byte) 0xfd }; private static final byte[] fromUSubstitution_LE = { (byte) 0xfd, (byte) 0xff }; private static final byte[] BOM_BE = { (byte) 0xfe, (byte) 0xff }; private static final byte[] BOM_LE = { (byte) 0xff, (byte) 0xfe }; private static final int ENDIAN_XOR_BE = 0; private static final int ENDIAN_XOR_LE = 1; private static final int NEED_TO_WRITE_BOM = 1; private boolean isEndianSpecified; private boolean isBigEndian; private int endianXOR; private byte[] bom; private byte[] fromUSubstitution; public CharsetUTF16(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); this.isEndianSpecified = (this instanceof CharsetUTF16BE || this instanceof CharsetUTF16LE); this.isBigEndian = !(this instanceof CharsetUTF16LE); if (isBigEndian) { this.bom = BOM_BE; this.fromUSubstitution = fromUSubstitution_BE; this.endianXOR = ENDIAN_XOR_BE; } 
else { this.bom = BOM_LE; this.fromUSubstitution = fromUSubstitution_LE; this.endianXOR = ENDIAN_XOR_LE; } maxBytesPerChar = 4; minBytesPerChar = 2; maxCharsPerByte = 1; } class CharsetDecoderUTF16 extends CharsetDecoderICU { private boolean isBOMReadYet; private int actualEndianXOR; private byte[] actualBOM; public CharsetDecoderUTF16(CharsetICU cs) { super(cs); } protected void implReset() { super.implReset(); isBOMReadYet = false; actualBOM = null; } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { /* * If we detect a BOM in this buffer, then we must add the BOM size to the offsets because the actual * converter function will not see and count the BOM. offsetDelta will have the number of the BOM bytes that * are in the current buffer. */ if (!isBOMReadYet) { while (true) { if (!source.hasRemaining()) return CoderResult.UNDERFLOW; toUBytesArray[toULength++] = source.get(); if (toULength == 1) { // on the first byte, we haven't decided whether or not it's bigEndian yet if ((!isEndianSpecified || isBigEndian) && toUBytesArray[toULength - 1] == BOM_BE[toULength - 1]) { actualBOM = BOM_BE; actualEndianXOR = ENDIAN_XOR_BE; } else if ((!isEndianSpecified || !isBigEndian) && toUBytesArray[toULength - 1] == BOM_LE[toULength - 1]) { actualBOM = BOM_LE; actualEndianXOR = ENDIAN_XOR_LE; } else { // we do not have a BOM (and we have toULength==1 bytes) actualBOM = null; actualEndianXOR = endianXOR; break; } } else if (toUBytesArray[toULength - 1] != actualBOM[toULength - 1]) { // we do not have a BOM (and we have toULength bytes) actualBOM = null; actualEndianXOR = endianXOR; break; } else if (toULength == SIGNATURE_LENGTH) { // we found a BOM! at last! 
                        // too bad we have to ignore it now (like it was unwanted or something)
                        toULength = 0;
                        break;
                    }
                }

                isBOMReadYet = true;
            }

            // now that we no longer need to look for a BOM, let's do some work

            // if we have unfinished business
            if (toUnicodeStatus != 0) {
                CoderResult cr = decodeTrail(source, target, offsets, (char) toUnicodeStatus);
                if (cr != null)
                    return cr;
            }

            char char16;

            while (true) {
                while (toULength < 2) {
                    if (!source.hasRemaining())
                        return CoderResult.UNDERFLOW;
                    toUBytesArray[toULength++] = source.get();
                }

                if (!target.hasRemaining())
                    return CoderResult.OVERFLOW;

                // assemble one 16-bit code unit, swapping the byte order for little-endian input
                char16 = (char) (((toUBytesArray[0 ^ actualEndianXOR] & UConverterConstants.UNSIGNED_BYTE_MASK) << 8) | ((toUBytesArray[1 ^ actualEndianXOR] & UConverterConstants.UNSIGNED_BYTE_MASK)));
                if (!UTF16.isSurrogate(char16)) {
                    toULength = 0;
                    target.put(char16);
                } else {
                    CoderResult cr = decodeTrail(source, target, offsets, char16);
                    if (cr != null)
                        return cr;
                }
            }
        }

        private final CoderResult decodeTrail(ByteBuffer source, CharBuffer target, IntBuffer offsets, char lead) {
            if (!UTF16.isLeadSurrogate(lead)) {
                // 2 bytes, lead malformed
                toUnicodeStatus = 0;
                return CoderResult.malformedForLength(2);
            }

            while (toULength < 4) {
                if (!source.hasRemaining()) {
                    // let this be unfinished business
                    toUnicodeStatus = lead;
                    return CoderResult.UNDERFLOW;
                }
                toUBytesArray[toULength++] = source.get();
            }

            char trail = (char) (((toUBytesArray[2 ^ actualEndianXOR] & UConverterConstants.UNSIGNED_BYTE_MASK) << 8) | ((toUBytesArray[3 ^ actualEndianXOR] & UConverterConstants.UNSIGNED_BYTE_MASK)));
            if (!UTF16.isTrailSurrogate(trail)) {
                // pretend like we didn't read the last 2 bytes
                toULength = 2;
                source.position(source.position() - 2);

                // 2 bytes, lead malformed
                toUnicodeStatus = 0;
                return CoderResult.malformedForLength(2);
            }

            toUnicodeStatus = 0;
            toULength = 0;

            target.put(lead);

            if (target.hasRemaining()) {
                target.put(trail);
                return null;
            } else {
                /* Put in overflow buffer (not handled here) */
                charErrorBufferArray[0] = trail;
charErrorBufferLength = 1; return CoderResult.OVERFLOW; } } } class CharsetEncoderUTF16 extends CharsetEncoderICU { private final byte[] temp = new byte[4]; public CharsetEncoderUTF16(CharsetICU cs) { super(cs, fromUSubstitution); fromUnicodeStatus = isEndianSpecified ? 0 : NEED_TO_WRITE_BOM; } protected void implReset() { super.implReset(); fromUnicodeStatus = isEndianSpecified ? 0 : NEED_TO_WRITE_BOM; } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult cr; /* write the BOM if necessary */ if (fromUnicodeStatus == NEED_TO_WRITE_BOM) { if (!target.hasRemaining()) return CoderResult.OVERFLOW; fromUnicodeStatus = 0; cr = fromUWriteBytes(this, bom, 0, bom.length, target, offsets, -1); if (cr.isOverflow()) return cr; } if (fromUChar32 != 0) { if (!target.hasRemaining()) return CoderResult.OVERFLOW; // a note: fromUChar32 will either be 0 or a lead surrogate cr = encodeChar(source, target, offsets, (char) fromUChar32); if (cr != null) return cr; } while (true) { if (!source.hasRemaining()) return CoderResult.UNDERFLOW; if (!target.hasRemaining()) return CoderResult.OVERFLOW; cr = encodeChar(source, target, offsets, source.get()); if (cr != null) return cr; } } private final CoderResult encodeChar(CharBuffer source, ByteBuffer target, IntBuffer offsets, char ch) { int sourceIndex = source.position() - 1; CoderResult cr; if (UTF16.isSurrogate(ch)) { cr = handleSurrogates(source, ch); if (cr != null) return cr; char trail = UTF16.getTrailSurrogate(fromUChar32); fromUChar32 = 0; // 4 bytes temp[0 ^ endianXOR] = (byte) (ch >>> 8); temp[1 ^ endianXOR] = (byte) (ch); temp[2 ^ endianXOR] = (byte) (trail >>> 8); temp[3 ^ endianXOR] = (byte) (trail); cr = fromUWriteBytes(this, temp, 0, 4, target, offsets, sourceIndex); } else { // 2 bytes temp[0 ^ endianXOR] = (byte) (ch >>> 8); temp[1 ^ endianXOR] = (byte) (ch); cr = fromUWriteBytes(this, temp, 0, 2, target, offsets, sourceIndex); } return (cr.isUnderflow() ? 
null : cr); } } public CharsetDecoder newDecoder() { return new CharsetDecoderUTF16(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderUTF16(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ getNonSurrogateUnicodeSet(setFillIn); } } icu4j-4.2/src/com/ibm/icu/charset/CharsetMBCS.java0000644000175000017500000076671311361046170021604 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2009, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* * ******************************************************************************* */ package com.ibm.icu.charset; import java.io.BufferedInputStream; import java.io.IOException; import java.io.InputStream; import java.nio.Buffer; import java.nio.BufferOverflowException; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.charset.UConverterSharedData.UConverterType; import com.ibm.icu.impl.ICUData; import com.ibm.icu.impl.ICUResourceBundle; import com.ibm.icu.impl.InvalidFormatException; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; import com.ibm.icu.charset.UConverterConstants; class CharsetMBCS extends CharsetICU { private byte[] fromUSubstitution = null; UConverterSharedData sharedData = null; private static final int MAX_VERSION_LENGTH = 4; // these variables are used in getUnicodeSet() and may be changed in future // typedef enum UConverterSetFilter { static final int UCNV_SET_FILTER_NONE = 1; static final int UCNV_SET_FILTER_DBCS_ONLY = 2; static final int UCNV_SET_FILTER_2022_CN = 3; static final int UCNV_SET_FILTER_SJIS= 4 ; static final int UCNV_SET_FILTER_GR94DBCS = 5; static final 
int UCNV_SET_FILTER_HZ = 6; static final int UCNV_SET_FILTER_COUNT = 7; // } UConverterSetFilter; /** * Fallbacks to Unicode are stored outside the normal state table and code point structures in a vector of items of * this type. They are sorted by offset. */ final class MBCSToUFallback { int offset; int codePoint; } /** * This is the MBCS part of the UConverterTable union (a runtime data structure). It keeps all the per-converter * data and points into the loaded mapping tables. */ static final class UConverterMBCSTable { /* toUnicode */ short countStates; byte dbcsOnlyState; boolean stateTableOwned; int countToUFallbacks; int stateTable[/* countStates */][/* 256 */]; int swapLFNLStateTable[/* countStates */][/* 256 */]; /* for swaplfnl */ char unicodeCodeUnits[/* countUnicodeResults */]; MBCSToUFallback toUFallbacks[/* countToUFallbacks */]; /* fromUnicode */ char fromUnicodeTable[]; byte fromUnicodeBytes[]; byte swapLFNLFromUnicodeBytes[]; /* for swaplfnl */ int fromUBytesLength; short outputType, unicodeMask; /* converter name for swaplfnl */ String swapLFNLName; /* extension data */ UConverterSharedData baseSharedData; // int extIndexes[]; ByteBuffer extIndexes; // create int[] view etc. 
as needed CharBuffer mbcsIndex; /* for fast conversion from most of BMP to MBCS (utf8Friendly data) */ char sbcsIndex[/* SBCS_FAST_LIMIT>>6 */]; /* for fast conversion from low BMP to SBCS (utf8Friendly data) */ boolean utf8Friendly; /* for utf8Friendly data */ char maxFastUChar; /* for utf8Friendly data */ /* roundtrips */ long asciiRoundtrips; UConverterMBCSTable() { utf8Friendly = false; mbcsIndex = null; sbcsIndex = new char[SBCS_FAST_LIMIT>>6]; } /* * UConverterMBCSTable(UConverterMBCSTable t) { countStates = t.countStates; dbcsOnlyState = t.dbcsOnlyState; * stateTableOwned = t.stateTableOwned; countToUFallbacks = t.countToUFallbacks; stateTable = t.stateTable; * swapLFNLStateTable = t.swapLFNLStateTable; unicodeCodeUnits = t.unicodeCodeUnits; toUFallbacks = * t.toUFallbacks; fromUnicodeTable = t.fromUnicodeTable; fromUnicodeBytes = t.fromUnicodeBytes; * swapLFNLFromUnicodeBytes = t.swapLFNLFromUnicodeBytes; fromUBytesLength = t.fromUBytesLength; outputType = * t.outputType; unicodeMask = t.unicodeMask; swapLFNLName = t.swapLFNLName; baseSharedData = t.baseSharedData; * extIndexes = t.extIndexes; } */ } /* Constants used in MBCS data header */ // enum { static final int MBCS_OPT_LENGTH_MASK=0x3f; static final int MBCS_OPT_NO_FROM_U=0x40; /* * If any of the following options bits are set, * then the file must be rejected. */ static final int MBCS_OPT_INCOMPATIBLE_MASK=0xffc0; /* * Remove bits from this mask as more options are recognized * by all implementations that use this constant. */ static final int MBCS_OPT_UNKNOWN_INCOMPATIBLE_MASK=0xff80; // }; /* Constants for fast and UTF-8-friendly conversion. 
*/ // enum { static final int SBCS_FAST_MAX=0x0fff; /* maximum code point with UTF-8-friendly SBCS runtime code, see makeconv SBCS_UTF8_MAX */ static final int SBCS_FAST_LIMIT=SBCS_FAST_MAX+1; /* =0x1000 */ static final int MBCS_FAST_MAX=0xd7ff; /* maximum code point with UTF-8-friendly MBCS runtime code, see makeconv MBCS_UTF8_MAX */ static final int MBCS_FAST_LIMIT=MBCS_FAST_MAX+1; /* =0xd800 */ // }; /** * MBCS data header. See data format description above. */ final class MBCSHeader { byte version[/* U_MAX_VERSION_LENGTH */]; int countStates, countToUFallbacks, offsetToUCodeUnits, offsetFromUTable, offsetFromUBytes; int flags; int fromUBytesLength; /* new and required in version 5 */ int options; /* new and optional in version 5; used if options&MBCS_OPT_NO_FROM_U */ int fullStage2Length; /* number of 32-bit units */ MBCSHeader() { version = new byte[MAX_VERSION_LENGTH]; } } public CharsetMBCS(String icuCanonicalName, String javaCanonicalName, String[] aliases, String classPath, ClassLoader loader) throws InvalidFormatException { super(icuCanonicalName, javaCanonicalName, aliases); /* See if the icuCanonicalName contains certain option information. 
*/ if (icuCanonicalName.indexOf(UConverterConstants.OPTION_SWAP_LFNL_STRING) > -1) { options = UConverterConstants.OPTION_SWAP_LFNL; icuCanonicalName = icuCanonicalName.substring(0, icuCanonicalName.indexOf(UConverterConstants.OPTION_SWAP_LFNL_STRING)); super.icuCanonicalName = icuCanonicalName; } // now try to load the data sharedData = loadConverter(1, icuCanonicalName, classPath, loader); maxBytesPerChar = sharedData.staticData.maxBytesPerChar; minBytesPerChar = sharedData.staticData.minBytesPerChar; maxCharsPerByte = 1; fromUSubstitution = sharedData.staticData.subChar; subChar = sharedData.staticData.subChar; subCharLen = sharedData.staticData.subCharLen; subChar1 = sharedData.staticData.subChar1; fromUSubstitution = new byte[sharedData.staticData.subCharLen]; System.arraycopy(sharedData.staticData.subChar, 0, fromUSubstitution, 0, sharedData.staticData.subCharLen); initializeConverter(options); } public CharsetMBCS(String icuCanonicalName, String javaCanonicalName, String[] aliases) throws InvalidFormatException { this(icuCanonicalName, javaCanonicalName, aliases, ICUResourceBundle.ICU_BUNDLE, null); } private UConverterSharedData loadConverter(int nestedLoads, String myName, String classPath, ClassLoader loader) throws InvalidFormatException { boolean noFromU = false; // Read converter data from file UConverterStaticData staticData = new UConverterStaticData(); UConverterDataReader reader = null; try { String resourceName = classPath + "/" + myName + "." 
+ UConverterSharedData.DATA_TYPE; InputStream i; if (loader != null) { i = ICUData.getRequiredStream(loader, resourceName); } else { i = ICUData.getRequiredStream(resourceName); } BufferedInputStream b = new BufferedInputStream(i, UConverterConstants.CNV_DATA_BUFFER_SIZE); reader = new UConverterDataReader(b); reader.readStaticData(staticData); } catch (IOException e) { throw new InvalidFormatException(); } catch (Exception e) { throw new InvalidFormatException(); } UConverterSharedData data = null; int type = staticData.conversionType; if (type != UConverterSharedData.UConverterType.MBCS || staticData.structSize != UConverterStaticData.SIZE_OF_UCONVERTER_STATIC_DATA) { throw new InvalidFormatException(); } data = new UConverterSharedData(1, null, false, 0); data.dataReader = reader; data.staticData = staticData; data.sharedDataCached = false; // Load data UConverterMBCSTable mbcsTable = data.mbcs; MBCSHeader header = new MBCSHeader(); try { reader.readMBCSHeader(header); } catch (IOException e) { throw new InvalidFormatException(); } int offset; // int[] extIndexesArray = null; String baseNameString = null; int[][] stateTableArray = null; MBCSToUFallback[] toUFallbacksArray = null; char[] unicodeCodeUnitsArray = null; char[] fromUnicodeTableArray = null; byte[] fromUnicodeBytesArray = null; if (header.version[0] == 5 && header.version[1] >= 3 && (header.options & MBCS_OPT_UNKNOWN_INCOMPATIBLE_MASK) == 0) { noFromU = ((header.options & MBCS_OPT_NO_FROM_U) != 0); } else if (header.version[0] != 4) { throw new InvalidFormatException(); } mbcsTable.outputType = (byte) header.flags; /* extension data, header version 4.2 and higher */ offset = header.flags >>> 8; // if(offset!=0 && mbcsTable.outputType == MBCS_OUTPUT_EXT_ONLY) { if (mbcsTable.outputType == MBCS_OUTPUT_EXT_ONLY) { try { baseNameString = reader.readBaseTableName(); if (offset != 0) { // agljport:commment subtract 32 for sizeof(_MBCSHeader) and length of baseNameString and 1 null // terminator byte all 
already read; mbcsTable.extIndexes = reader.readExtIndexes(offset - (reader.bytesRead - reader.staticDataBytesRead)); } } catch (IOException e) { throw new InvalidFormatException(); } } // agljport:add this would be unnecessary if extIndexes were memory mapped /* * if(mbcsTable.extIndexes != null) { * * try { //int nbytes = mbcsTable.extIndexes[UConverterExt.UCNV_EXT_TO_U_LENGTH]*4 + * mbcsTable.extIndexes[UConverterExt.UCNV_EXT_TO_U_UCHARS_LENGTH]*2 + * mbcsTable.extIndexes[UConverterExt.UCNV_EXT_FROM_U_LENGTH]*6 + * mbcsTable.extIndexes[UConverterExt.UCNV_EXT_FROM_U_BYTES_LENGTH] + * mbcsTable.extIndexes[UConverterExt.UCNV_EXT_FROM_U_STAGE_12_LENGTH]*2 + * mbcsTable.extIndexes[UConverterExt.UCNV_EXT_FROM_U_STAGE_3_LENGTH]*2 + * mbcsTable.extIndexes[UConverterExt.UCNV_EXT_FROM_U_STAGE_3B_LENGTH]*4; //int nbytes = * mbcsTable.extIndexes[UConverterExt.UCNV_EXT_SIZE] //byte[] extTables = dataReader.readExtTables(nbytes); * //mbcsTable.extTables = ByteBuffer.wrap(extTables); } catch(IOException e) { System.err.println("Caught * IOException: " + e.getMessage()); pErrorCode[0] = UErrorCode.U_INVALID_FORMAT_ERROR; return; } } */ if (mbcsTable.outputType == MBCS_OUTPUT_EXT_ONLY) { UConverterSharedData baseSharedData = null; ByteBuffer extIndexes; String baseName; /* extension-only file, load the base table and set values appropriately */ extIndexes = mbcsTable.extIndexes; if (extIndexes == null) { /* extension-only file without extension */ throw new InvalidFormatException(); } if (nestedLoads != 1) { /* an extension table must not be loaded as a base table */ throw new InvalidFormatException(); } /* load the base table */ baseName = baseNameString; if (baseName.equals(staticData.name)) { /* forbid loading this same extension-only file */ throw new InvalidFormatException(); } // agljport:fix args.size=sizeof(UConverterLoadArgs); baseSharedData = loadConverter(2, baseName, classPath, loader); if (baseSharedData.staticData.conversionType != UConverterType.MBCS || 
baseSharedData.mbcs.baseSharedData != null) { // agljport:fix ucnv_unload(baseSharedData); throw new InvalidFormatException(); } /* copy the base table data */ // agljport:comment deep copy in C changes mbcs through local reference mbcsTable; in java we probably don't // need the deep copy so can just make sure mbcs and its local reference both refer to the same new object mbcsTable = data.mbcs = baseSharedData.mbcs; /* overwrite values with relevant ones for the extension converter */ mbcsTable.baseSharedData = baseSharedData; mbcsTable.extIndexes = extIndexes; /* * It would be possible to share the swapLFNL data with a base converter, but the generated name would have * to be different, and the memory would have to be free'd only once. It is easier to just create the data * for the extension converter separately when it is requested. */ mbcsTable.swapLFNLStateTable = null; mbcsTable.swapLFNLFromUnicodeBytes = null; mbcsTable.swapLFNLName = null; /* * Set a special, runtime-only outputType if the extension converter is a DBCS version of a base converter * that also maps single bytes. 
*/ if (staticData.conversionType == UConverterType.DBCS || (staticData.conversionType == UConverterType.MBCS && staticData.minBytesPerChar >= 2)) { if (baseSharedData.mbcs.outputType == MBCS_OUTPUT_2_SISO) { /* the base converter is SI/SO-stateful */ int entry; /* get the dbcs state from the state table entry for SO=0x0e */ entry = mbcsTable.stateTable[0][0xe]; if (MBCS_ENTRY_IS_FINAL(entry) && MBCS_ENTRY_FINAL_ACTION(entry) == MBCS_STATE_CHANGE_ONLY && MBCS_ENTRY_FINAL_STATE(entry) != 0) { mbcsTable.dbcsOnlyState = (byte) MBCS_ENTRY_FINAL_STATE(entry); mbcsTable.outputType = MBCS_OUTPUT_DBCS_ONLY; } } else if (baseSharedData.staticData.conversionType == UConverterType.MBCS && baseSharedData.staticData.minBytesPerChar == 1 && baseSharedData.staticData.maxBytesPerChar == 2 && mbcsTable.countStates <= 127) { /* non-stateful base converter, need to modify the state table */ int newStateTable[][/* 256 */]; int state[]; // this works because java 2-D array is array of references and we can have state = // newStateTable[i]; int i, count; /* allocate a new state table and copy the base state table contents */ count = mbcsTable.countStates; newStateTable = new int[(count + 1) * 1024][256]; for (i = 0; i < mbcsTable.stateTable.length; ++i) System.arraycopy(mbcsTable.stateTable[i], 0, newStateTable[i], 0, mbcsTable.stateTable[i].length); /* change all final single-byte entries to go to a new all-illegal state */ state = newStateTable[0]; for (i = 0; i < 256; ++i) { if (MBCS_ENTRY_IS_FINAL(state[i])) { state[i] = MBCS_ENTRY_TRANSITION(count, 0); } } /* build the new all-illegal state */ state = newStateTable[count]; for (i = 0; i < 256; ++i) { state[i] = MBCS_ENTRY_FINAL(0, MBCS_STATE_ILLEGAL, 0); } mbcsTable.stateTable = newStateTable; mbcsTable.countStates = (byte) (count + 1); mbcsTable.stateTableOwned = true; mbcsTable.outputType = MBCS_OUTPUT_DBCS_ONLY; } } /* * unlike below for files with base tables, do not get the unicodeMask from the sharedData; instead, use the * 
base table's unicodeMask, which we copied in the memcpy above; this is necessary because the static data * unicodeMask, especially the UCNV_HAS_SUPPLEMENTARY flag, is part of the base table data */ } else { /* conversion file with a base table; an additional extension table is optional */ /* make sure that the output type is known */ switch (mbcsTable.outputType) { case MBCS_OUTPUT_1: case MBCS_OUTPUT_2: case MBCS_OUTPUT_3: case MBCS_OUTPUT_4: case MBCS_OUTPUT_3_EUC: case MBCS_OUTPUT_4_EUC: case MBCS_OUTPUT_2_SISO: /* OK */ break; default: throw new InvalidFormatException(); } stateTableArray = new int[header.countStates][256]; toUFallbacksArray = new MBCSToUFallback[header.countToUFallbacks]; for (int i = 0; i < toUFallbacksArray.length; ++i) toUFallbacksArray[i] = new MBCSToUFallback(); unicodeCodeUnitsArray = new char[(header.offsetFromUTable - header.offsetToUCodeUnits) / 2]; fromUnicodeTableArray = new char[(header.offsetFromUBytes - header.offsetFromUTable) / 2]; fromUnicodeBytesArray = new byte[header.fromUBytesLength]; try { reader.readMBCSTable(stateTableArray, toUFallbacksArray, unicodeCodeUnitsArray, fromUnicodeTableArray, fromUnicodeBytesArray); } catch (IOException e) { throw new InvalidFormatException(); } mbcsTable.countStates = (byte) header.countStates; mbcsTable.countToUFallbacks = header.countToUFallbacks; mbcsTable.stateTable = stateTableArray; mbcsTable.toUFallbacks = toUFallbacksArray; mbcsTable.unicodeCodeUnits = unicodeCodeUnitsArray; mbcsTable.fromUnicodeTable = fromUnicodeTableArray; mbcsTable.fromUnicodeBytes = fromUnicodeBytesArray; mbcsTable.fromUBytesLength = header.fromUBytesLength; /* * converter versions 6.1 and up contain a unicodeMask that is used here to select the most efficient * function implementations */ // agljport:fix info.size=sizeof(UDataInfo); // agljport:fix udata_getInfo((UDataMemory *)sharedData->dataMemory, &info); // agljport:fix if(info.formatVersion[0]>6 || (info.formatVersion[0]==6 && info.formatVersion[1]>=1)) 
{ /* mask off possible future extensions to be safe */ mbcsTable.unicodeMask = (short) (staticData.unicodeMask & 3); // agljport:fix } else { /* for older versions, assume worst case: contains anything possible (prevent over-optimizations) */ // agljport:fix mbcsTable->unicodeMask=UCNV_HAS_SUPPLEMENTARY|UCNV_HAS_SURROGATES; // agljport:fix } if (offset != 0) { try { // agljport:comment subtract 32 for sizeof(_MBCSHeader) and length of baseNameString and 1 null // terminator byte all already read; // int namelen = baseNameString != null? baseNameString.length() + 1: 0; mbcsTable.extIndexes = reader.readExtIndexes(offset - (reader.bytesRead - reader.staticDataBytesRead)); } catch (IOException e) { throw new InvalidFormatException(); } } if (header.version[1] >= 3 && (mbcsTable.unicodeMask & UConverterConstants.HAS_SURROGATES) == 0 && (mbcsTable.countStates == 1 ? ((char)header.version[2] >= (SBCS_FAST_MAX>>8)) : ((char)header.version[2] >= (MBCS_FAST_MAX>>8)))) { mbcsTable.utf8Friendly = true; if (mbcsTable.countStates == 1) { /* * SBCS: Stage 3 is allocated in 64-entry blocks for U+0000..SBCS_FAST_MAX or higher. * Build a table with indexes to each block, to be used instead of * the regular stage 1/2 table. */ for (int i = 0; i < (SBCS_FAST_LIMIT>>6); ++i) { mbcsTable.sbcsIndex[i] = mbcsTable.fromUnicodeTable[mbcsTable.fromUnicodeTable[i>>4]+((i<<2)&0x3c)]; } /* set SBCS_FAST_MAX to reflect the reach of sbcsIndex[] even if header.version[2]>(SBCS_FAST_MAX>>8) */ mbcsTable.maxFastUChar = SBCS_FAST_MAX; } else { /* * MBCS: Stage 3 is allocated in 64-entry blocks for U+0000..MBCS_FAST_MAX or higher. * The .cnv file is prebuilt with an additional stage table with indexes to each block. 
*/ if (noFromU) { mbcsTable.mbcsIndex = ByteBuffer.wrap(mbcsTable.fromUnicodeBytes).asCharBuffer(); } mbcsTable.maxFastUChar = (char)((header.version[2]<<8) | 0xff); } } /* calculate a bit set of 4 ASCII characters per bit that round-trip to ASCII bytes */ { long asciiRoundtrips = 0xffffffff; for (int i = 0; i < 0x80; ++i) { if (mbcsTable.stateTable[0][i] != MBCS_ENTRY_FINAL(0, MBCS_STATE_VALID_DIRECT_16, i)) { asciiRoundtrips&=~((long)1<<(i>>2))&UConverterConstants.UNSIGNED_INT_MASK; } } mbcsTable.asciiRoundtrips = asciiRoundtrips&UConverterConstants.UNSIGNED_INT_MASK; } if (noFromU) { int stage1Length = (mbcsTable.unicodeMask&UConverterConstants.HAS_SUPPLEMENTARY) != 0 ? 0x440 : 0x40; int stage2Length = (header.offsetFromUBytes - header.offsetFromUTable)/4 - stage1Length/2; reconstituteData(mbcsTable, stage1Length, stage2Length, header.fullStage2Length); } if (mbcsTable.outputType == MBCS_OUTPUT_DBCS_ONLY || mbcsTable.outputType == MBCS_OUTPUT_2_SISO) { /* * MBCS_OUTPUT_DBCS_ONLY: No SBCS mappings, therefore ASCII does not roundtrip. * MBCS_OUTPUT_2_SISO: Bypass the ASCII fastpath to handle prevLength correctly. 
*/ mbcsTable.asciiRoundtrips = 0; } } return data; } private static boolean writeStage3Roundtrip(UConverterMBCSTable mbcsTable, long value, int codePoints[]) { char[] table; byte[] bytes; int stage2; int p; int c; int i, st3; long temp; table = mbcsTable.fromUnicodeTable; bytes = mbcsTable.fromUnicodeBytes; /* for EUC outputTypes, modify the value like genmbcs.c's transformEUC() */ switch(mbcsTable.outputType) { case MBCS_OUTPUT_3_EUC: if(value<=0xffff) { /* short sequences are stored directly */ /* code set 0 or 1 */ } else if(value<=0x8effff) { /* code set 2 */ value&=0x7fff; } else /* first byte is 0x8f */ { /* code set 3 */ value&=0xff7f; } break; case MBCS_OUTPUT_4_EUC: if(value<=0xffffff) { /* short sequences are stored directly */ /* code set 0 or 1 */ } else if(value<=0x8effffff) { /* code set 2 */ value&=0x7fffff; } else /* first byte is 0x8f */ { /* code set 3 */ value&=0xff7fff; } break; default: break; } for(i=0; i<=0x1f; ++value, ++i) { c=codePoints[i]; if(c<0) { continue; } /* locate the stage 2 & 3 data */ stage2 = table[c>>10] + ((c>>4)&0x3f); st3 = table[stage2*2]<<16|table[stage2*2 + 1]; st3 = (int)(char)(st3 * 16 + (c&0xf)); /* write the codepage bytes into stage 3 */ switch(mbcsTable.outputType) { case MBCS_OUTPUT_3: case MBCS_OUTPUT_4_EUC: p = st3*3; bytes[p] = (byte)(value>>16); bytes[p+1] = (byte)(value>>8); bytes[p+2] = (byte)value; break; case MBCS_OUTPUT_4: bytes[st3*4] = (byte)(value >> 24); bytes[st3*4 + 1] = (byte)(value >> 16); bytes[st3*4 + 2] = (byte)(value >> 8); bytes[st3*4 + 3] = (byte)value; break; default: /* 2 bytes per character */ bytes[st3*2] = (byte)(value >> 8); bytes[st3*2 + 1] = (byte)value; break; } /* set the roundtrip flag */ temp = (1L<<(16+(c&0xf))); table[stage2*2] |= (char)(temp>>16); table[stage2*2 + 1] |= (char)temp; } return true; } private static void reconstituteData(UConverterMBCSTable mbcsTable, int stage1Length, int stage2Length, int fullStage2Length) { int datalength = 
stage1Length*2+fullStage2Length*4+mbcsTable.fromUBytesLength; int offset = 0; byte[] stage = new byte[datalength]; for (int i = 0; i < stage1Length; ++i) { stage[i*2] = (byte)(mbcsTable.fromUnicodeTable[i]>>8); stage[i*2+1] = (byte)(mbcsTable.fromUnicodeTable[i]); } offset = ((fullStage2Length - stage2Length) * 4) + (stage1Length * 2); for (int i = 0; i < stage2Length; ++i) { stage[offset + i*4] = (byte)(mbcsTable.fromUnicodeTable[stage1Length + i*2]>>8); stage[offset + i*4+1] = (byte)(mbcsTable.fromUnicodeTable[stage1Length + i*2]); stage[offset + i*4+2] = (byte)(mbcsTable.fromUnicodeTable[stage1Length + i*2+1]>>8); stage[offset + i*4+3] = (byte)(mbcsTable.fromUnicodeTable[stage1Length + i*2+1]); } /* indexes into stage 2 count from the bottom of the fromUnicodeTable */ /* reconstitute the initial part of stage 2 from the mbcsIndex */ { int stageUTF8Length=((int)(mbcsTable.maxFastUChar+1))>>6; int stageUTF8Index=0; int st1, st2, st3, i; for (st1 = 0; stageUTF8Index < stageUTF8Length; ++st1) { /* read the stage 1 entry as an unsigned big-endian 16-bit value; mask both bytes to avoid sign extension */ st2 = ((stage[2*st1] & 0xff) << 8) | (stage[2*st1+1] & 0xff); if (st2 != stage1Length/2) { /* each stage 2 block has 64 entries corresponding to 16 entries in the mbcsIndex */ for (i = 0; i < 16; ++i) { st3 = mbcsTable.mbcsIndex.get(stageUTF8Index++); if (st3 != 0) { /* a stage 2 entry's index is per stage 3 16-block, not per stage 3 entry */ st3>>=4; /* * 4 stage 2 entries point to 4 consecutive stage 3 16-blocks which are * allocated together as a single 64-block for access from the mbcsIndex */ stage[4*st2] = (byte)(st3>>24); stage[4*st2+1] = (byte)(st3>>16); stage[4*st2+2] = (byte)(st3>>8); stage[4*st2+3] = (byte)(st3); st2++; st3++; stage[4*st2] = (byte)(st3>>24); stage[4*st2+1] = (byte)(st3>>16); stage[4*st2+2] = (byte)(st3>>8); stage[4*st2+3] = (byte)(st3); st2++; st3++; stage[4*st2] = (byte)(st3>>24); stage[4*st2+1] = (byte)(st3>>16); stage[4*st2+2] = (byte)(st3>>8); stage[4*st2+3] = (byte)(st3); st2++; st3++; stage[4*st2] = (byte)(st3>>24); stage[4*st2+1] = (byte)(st3>>16); 
stage[4*st2+2] = (byte)(st3>>8); stage[4*st2+3] = (byte)(st3); } else { /* no stage 3 block, skip */ st2+=4; } } } else { /* no stage 2 block, skip */ stageUTF8Index+=16; } } } char[] stage1 = new char[stage.length/2]; for (int i = 0; i < stage1.length; ++i) { stage1[i] = (char)(((stage[i*2])<<8)|(stage[i*2+1] & UConverterConstants.UNSIGNED_BYTE_MASK)); } byte[] stage2 = new byte[stage.length - ((stage1Length * 2) + (fullStage2Length * 4))]; System.arraycopy(stage, ((stage1Length * 2) + (fullStage2Length * 4)), stage2, 0, stage2.length); mbcsTable.fromUnicodeTable = stage1; mbcsTable.fromUnicodeBytes = stage2; /* reconstitute fromUnicodeBytes with roundtrips from toUnicode data */ MBCSEnumToUnicode(mbcsTable); } /* * Internal function enumerating the toUnicode data of an MBCS converter. * Currently only used for reconstituting data for an MBCS_OPT_NO_FROM_U * table, but could also be used for a future getUnicodeSet() option * that includes reverse fallbacks (after updating this function's implementation). * Currently only handles roundtrip mappings. * Does not currently handle extensions. */ private static void MBCSEnumToUnicode(UConverterMBCSTable mbcsTable) { /* * Properties for each state, to speed up the enumeration. * Ignorable actions are unassigned/illegal/state-change-only: * They do not lead to mappings. * * Bits 7..6: * 1 direct/initial state (stateful converters have multiple) * 0 non-initial state with transitions or with non-ignorable result actions * -1 final state with only ignorable actions * * Bits 5..3: * The lowest byte value with non-ignorable actions is * value<<5 (rounded down). * * Bits 2..0: * The highest byte value with non-ignorable actions is * (value<<5)|0x1f (rounded up). 
*/ byte stateProps[] = new byte[MBCS_MAX_STATE_COUNT]; int state; /* mark every state as not yet visited; getStateProp() is only meant to be called while stateProps[state]==-1 */ for (state = 0; state < stateProps.length; ++state) { stateProps[state] = -1; } /* recurse from state 0 and set all stateProps */ getStateProp(mbcsTable.stateTable, stateProps, 0); for (state = 0; state < mbcsTable.countStates; ++state) { if (stateProps[state] >= 0x40) { /* start from each direct state */ enumToU(mbcsTable, stateProps, state, 0, 0); } } } private static boolean enumToU(UConverterMBCSTable mbcsTable, byte stateProps[], int state, int offset, int value) { int[] codePoints = new int[32]; int[] row; char[] unicodeCodeUnits; int anyCodePoints; int b, limit; row = mbcsTable.stateTable[state]; unicodeCodeUnits = mbcsTable.unicodeCodeUnits; value<<=8; anyCodePoints = -1; /* becomes non-negative if there is a mapping */ b = (stateProps[state]&0x38)<<2; if (b == 0 && stateProps[state] >= 0x40) { /* skip byte sequences with leading zeros because they are not stored in the fromUnicode table */ codePoints[0] = UConverterConstants.U_SENTINEL; b = 1; } limit = ((stateProps[state]&7)+1)<<5; while (b < limit) { int entry = row[b]; if (MBCS_ENTRY_IS_TRANSITION(entry)) { int nextState = MBCS_ENTRY_TRANSITION_STATE(entry); if (stateProps[nextState] >= 0) { /* recurse to a state with non-ignorable actions */ if (!enumToU(mbcsTable, stateProps, nextState, offset+MBCS_ENTRY_TRANSITION_OFFSET(entry), value|b)) { return false; } } codePoints[b&0x1f] = UConverterConstants.U_SENTINEL; } else { int c; int action; /* * An if-else-if chain provides more reliable performance for * the most common cases compared to a switch. 
*/ action = MBCS_ENTRY_FINAL_ACTION(entry); if (action == MBCS_STATE_VALID_DIRECT_16) { /* output BMP code point */ c = (char)MBCS_ENTRY_FINAL_VALUE_16(entry); } else if (action == MBCS_STATE_VALID_16) { int finalOffset = offset+MBCS_ENTRY_FINAL_VALUE_16(entry); c = unicodeCodeUnits[finalOffset]; if (c < 0xfffe) { /* output BMP code point */ } else { c = UConverterConstants.U_SENTINEL; } } else if (action == MBCS_STATE_VALID_16_PAIR) { int finalOffset = offset+MBCS_ENTRY_FINAL_VALUE_16(entry); c = unicodeCodeUnits[finalOffset++]; if (c < 0xd800) { /* output BMP code point below 0xd800 */ } else if (c <= 0xdbff) { /* output roundtrip or fallback supplementary code point */ c = ((c&0x3ff)<<10)+unicodeCodeUnits[finalOffset]+(0x10000-0xdc00); } else if (c == 0xe000) { /* output roundtrip BMP code point above 0xd800 or fallback BMP code point */ c = unicodeCodeUnits[finalOffset]; } else { c = UConverterConstants.U_SENTINEL; } } else if (action == MBCS_STATE_VALID_DIRECT_20) { /* output supplementary code point */ c = (int)(MBCS_ENTRY_FINAL_VALUE(entry)+0x10000); } else { c = UConverterConstants.U_SENTINEL; } codePoints[b&0x1f] = c; anyCodePoints&=c; } if (((++b)&0x1f) == 0) { if(anyCodePoints>=0) { if(!writeStage3Roundtrip(mbcsTable, value|(b-0x20)&UConverterConstants.UNSIGNED_INT_MASK, codePoints)) { return false; } anyCodePoints=-1; } } } return true; } /* * Only called if stateProps[state]==-1. * A recursive call may do stateProps[state]|=0x40 if this state is the target of an * MBCS_STATE_CHANGE_ONLY. 
*/ private static byte getStateProp(int stateTable[][], byte stateProps[], int state) { int[] row; int min, max, entry, nextState; row = stateTable[state]; stateProps[state] = 0; /* find first non-ignorable state */ for (min = 0;;++min) { entry = row[min]; nextState = MBCS_ENTRY_STATE(entry); if (stateProps[nextState] == -1) { getStateProp(stateTable, stateProps, nextState); } if (MBCS_ENTRY_IS_TRANSITION(entry)) { if (stateProps[nextState] >= 0) { break; } } else if (MBCS_ENTRY_FINAL_ACTION(entry) < MBCS_STATE_UNASSIGNED) { break; } if (min == 0xff) { stateProps[state] = -0x40; /* (byte)0xc0 */ return stateProps[state]; } } stateProps[state]|=(byte)((min>>5)<<3); /* find last non-ignorable state */ for (max = 0xff; min < max; --max) { entry = row[max]; nextState = MBCS_ENTRY_STATE(entry); if (stateProps[nextState] == -1) { getStateProp(stateTable, stateProps, nextState); } if (MBCS_ENTRY_IS_TRANSITION(entry)) { if (stateProps[nextState] >= 0) { break; } } else if (MBCS_ENTRY_FINAL_ACTION(entry) < MBCS_STATE_UNASSIGNED) { break; } } stateProps[state]|=(byte)(max>>5); /* recurse further and collect direct-state information */ while (min <= max) { entry = row[min]; nextState = MBCS_ENTRY_STATE(entry); if (stateProps[nextState] == -1) { getStateProp(stateTable, stateProps, nextState); } /* only final entries carry an action; their target state is a direct state */ if (MBCS_ENTRY_IS_FINAL(entry)) { stateProps[nextState]|=0x40; if (MBCS_ENTRY_FINAL_ACTION(entry) <= MBCS_STATE_FALLBACK_DIRECT_20) { stateProps[state]|=0x40; } } ++min; } return stateProps[state]; } protected void initializeConverter(int myOptions) { UConverterMBCSTable mbcsTable; ByteBuffer extIndexes; short outputType; byte maxBytesPerUChar; mbcsTable = sharedData.mbcs; outputType = mbcsTable.outputType; if (outputType == MBCS_OUTPUT_DBCS_ONLY) { /* the swaplfnl option does not apply, remove it */ this.options = myOptions &= ~UConverterConstants.OPTION_SWAP_LFNL; } if ((myOptions & UConverterConstants.OPTION_SWAP_LFNL) != 0) { /* do this because double-checked locking is broken 
*/ boolean isCached; // agljport:todo umtx_lock(NULL); isCached = mbcsTable.swapLFNLStateTable != null; // agljport:todo umtx_unlock(NULL); if (!isCached) { try { if (!EBCDICSwapLFNL()) { /* this option does not apply, remove it */ this.options = myOptions &= ~UConverterConstants.OPTION_SWAP_LFNL; } } catch (Exception e) { /* something went wrong. */ return; } } } if (icuCanonicalName.toLowerCase().indexOf("gb18030") >= 0) { /* set a flag for GB 18030 mode, which changes the callback behavior */ this.options |= MBCS_OPTION_GB18030; } /* fix maxBytesPerUChar depending on outputType and options etc. */ if (outputType == MBCS_OUTPUT_2_SISO) { maxBytesPerChar = 3; /* SO+DBCS */ } extIndexes = mbcsTable.extIndexes; if (extIndexes != null) { maxBytesPerUChar = (byte) GET_MAX_BYTES_PER_UCHAR(extIndexes); if (outputType == MBCS_OUTPUT_2_SISO) { ++maxBytesPerUChar; /* SO + multiple DBCS */ } if (maxBytesPerUChar > maxBytesPerChar) { maxBytesPerChar = maxBytesPerUChar; } } } /* EBCDIC swap LF<->NL--------------------------------------------------------------------------------*/ /* * This code modifies a standard EBCDIC<->Unicode mapping table for * OS/390 (z/OS) Unix System Services (Open Edition). * The difference is in the mapping of Line Feed and New Line control codes: * Standard EBCDIC maps * * <U000A> \x25 |0 * <U0085> \x15 |0 * * but OS/390 USS EBCDIC swaps the control codes for LF and NL, * mapping * * <U000A> \x15 |0 * <U0085> \x25 |0 * * This code modifies a loaded standard EBCDIC<->Unicode mapping table * by copying it into allocated memory and swapping the LF and NL values. * This makes it possible to support the same EBCDIC charset in both versions without * duplicating the entire installed table. 
*/ /* standard EBCDIC codes */ private static final short EBCDIC_LF = 0x0025; private static final short EBCDIC_NL = 0x0015; /* standard EBCDIC codes with roundtrip flag as stored in Unicode-to-single-byte tables */ private static final short EBCDIC_RT_LF = 0x0f25; private static final short EBCDIC_RT_NL = 0x0f15; /* Unicode code points */ private static final short U_LF = 0x000A; private static final short U_NL = 0x0085; private boolean EBCDICSwapLFNL() throws Exception { UConverterMBCSTable mbcsTable; char[] table; byte[] results; byte[] bytes; int[][] newStateTable; byte[] newResults; String newName; int stage2Entry; // int size; int sizeofFromUBytes; mbcsTable = sharedData.mbcs; table = mbcsTable.fromUnicodeTable; bytes = mbcsTable.fromUnicodeBytes; results = bytes; /* * Check that this is an EBCDIC table with SBCS portion - * SBCS or EBCDIC with standard EBCDIC LF and NL mappings. * * If not, ignore the option. Options are always ignored if they do not apply. */ if (!((mbcsTable.outputType == MBCS_OUTPUT_1 || mbcsTable.outputType == MBCS_OUTPUT_2_SISO) && mbcsTable.stateTable[0][EBCDIC_LF] == MBCS_ENTRY_FINAL(0, MBCS_STATE_VALID_DIRECT_16, U_LF) && mbcsTable.stateTable[0][EBCDIC_NL] == MBCS_ENTRY_FINAL(0, MBCS_STATE_VALID_DIRECT_16, U_NL))) { return false; } if (mbcsTable.outputType == MBCS_OUTPUT_1) { if (!(EBCDIC_RT_LF == MBCS_SINGLE_RESULT_FROM_U(table, results, U_LF) && EBCDIC_RT_NL == MBCS_SINGLE_RESULT_FROM_U(table, results, U_NL))) { return false; } } else /* MBCS_OUTPUT_2_SISO */ { stage2Entry = MBCS_STAGE_2_FROM_U(table, U_LF); if (!(MBCS_FROM_U_IS_ROUNDTRIP(stage2Entry, U_LF) && EBCDIC_LF == MBCS_VALUE_2_FROM_STAGE_2(bytes, stage2Entry, U_LF))) { return false; } stage2Entry = MBCS_STAGE_2_FROM_U(table, U_NL); if (!(MBCS_FROM_U_IS_ROUNDTRIP(stage2Entry, U_NL) && EBCDIC_NL == MBCS_VALUE_2_FROM_STAGE_2(bytes, stage2Entry, U_NL))) { return false; } } if (mbcsTable.fromUBytesLength > 0) { /* * We _know_ the number of bytes in the fromUnicodeBytes array * 
starting with header.version 4.1. */ sizeofFromUBytes = mbcsTable.fromUBytesLength; } else { /* * Otherwise: * There used to be code to enumerate the fromUnicode * trie and find the highest entry, but it was removed in ICU 3.2 * because it was not tested and caused a low code coverage number. */ throw new Exception("U_INVALID_FORMAT_ERROR"); } /* * The table has an appropriate format. * Allocate and build * - a modified to-Unicode state table * - a modified from-Unicode output array * - a converter name string with the swap option appended */ // size = mbcsTable.countStates * 1024 + sizeofFromUBytes + UConverterConstants.MAX_CONVERTER_NAME_LENGTH + 20; /* copy and modify the to-Unicode state table */ newStateTable = new int[mbcsTable.stateTable.length][mbcsTable.stateTable[0].length]; for (int i = 0; i < newStateTable.length; i++) { System.arraycopy(mbcsTable.stateTable[i], 0, newStateTable[i], 0, newStateTable[i].length); } newStateTable[0][EBCDIC_LF] = MBCS_ENTRY_FINAL(0, MBCS_STATE_VALID_DIRECT_16, U_NL); newStateTable[0][EBCDIC_NL] = MBCS_ENTRY_FINAL(0, MBCS_STATE_VALID_DIRECT_16, U_LF); /* copy and modify the from-Unicode result table */ newResults = new byte[sizeofFromUBytes]; System.arraycopy(bytes, 0, newResults, 0, sizeofFromUBytes); /* conveniently, the table access macros work on the left side of expressions */ if (mbcsTable.outputType == MBCS_OUTPUT_1) { MBCS_SINGLE_RESULT_FROM_U_SET(table, newResults, U_LF, EBCDIC_RT_NL); MBCS_SINGLE_RESULT_FROM_U_SET(table, newResults, U_NL, EBCDIC_RT_LF); } else /* MBCS_OUTPUT_2_SISO */ { stage2Entry = MBCS_STAGE_2_FROM_U(table, U_LF); MBCS_VALUE_2_FROM_STAGE_2_SET(newResults, stage2Entry, U_LF, EBCDIC_NL); stage2Entry = MBCS_STAGE_2_FROM_U(table, U_NL); MBCS_VALUE_2_FROM_STAGE_2_SET(newResults, stage2Entry, U_NL, EBCDIC_LF); } /* set the canonical converter name; String.concat() returns a new string, so its result must be assigned */ newName = icuCanonicalName.concat(UConverterConstants.OPTION_SWAP_LFNL_STRING); if (mbcsTable.swapLFNLStateTable == null) { 
            mbcsTable.swapLFNLStateTable = newStateTable;
            mbcsTable.swapLFNLFromUnicodeBytes = newResults;
            mbcsTable.swapLFNLName = newName;
        }
        return true;
    }

    /**
     * MBCS output types for conversions from Unicode. These per-converter types determine the storage method in stage 3
     * of the lookup table, mostly how many bytes are stored per entry.
     */
    static final int MBCS_OUTPUT_1 = 0; /* 0 */
    static final int MBCS_OUTPUT_2 = MBCS_OUTPUT_1 + 1; /* 1 */
    static final int MBCS_OUTPUT_3 = MBCS_OUTPUT_2 + 1; /* 2 */
    static final int MBCS_OUTPUT_4 = MBCS_OUTPUT_3 + 1; /* 3 */
    static final int MBCS_OUTPUT_3_EUC = 8; /* 8 */
    static final int MBCS_OUTPUT_4_EUC = MBCS_OUTPUT_3_EUC + 1; /* 9 */
    static final int MBCS_OUTPUT_2_SISO = 12; /* c */
    static final int MBCS_OUTPUT_2_HZ = MBCS_OUTPUT_2_SISO + 1; /* d */
    static final int MBCS_OUTPUT_EXT_ONLY = MBCS_OUTPUT_2_HZ + 1; /* e */
    // static final int MBCS_OUTPUT_COUNT = MBCS_OUTPUT_EXT_ONLY + 1;
    static final int MBCS_OUTPUT_DBCS_ONLY = 0xdb; /* runtime-only type for DBCS-only handling of SISO tables */

    /* GB 18030 data ------------------------------------------------------------ */

    /* helper macros for linear values for GB 18030 four-byte sequences */
    private static long LINEAR_18030(long a, long b, long c, long d) {
        return ((((a & 0xff) * 10 + (b & 0xff)) * 126L + (c & 0xff)) * 10L + (d & 0xff));
    }

    private static long LINEAR_18030_BASE = LINEAR_18030(0x81, 0x30, 0x81, 0x30);

    private static long LINEAR(long x) {
        return LINEAR_18030(x >>> 24, (x >>> 16) & 0xff, (x >>> 8) & 0xff, x & 0xff);
    }

    /*
     * Some ranges of GB 18030 where both the Unicode code points and the GB four-byte sequences are contiguous and are
     * handled algorithmically by the special callback functions below. The values are start & end of Unicode & GB
     * codes.
     *
     * Note that single surrogates are not mapped by GB 18030 as of the re-released mapping tables from 2000-nov-30.
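     *
     * Illustrative worked example, computed from LINEAR_18030() above and the third range in the table below:
     * that range starts at U+0452 with the GB sequence 81 30 D3 30, whose linear value is
     *     ((0x81 * 10 + 0x30) * 126 + 0xD3) * 10 + 0x30 = (1338 * 126 + 211) * 10 + 48 = 1688038.
     * The next sequence, 81 30 D3 31, has the linear value 1688039, so a lookup of the form
     *     codePoint = rangeStartCodePoint + (linear - rangeStartLinear)
     * gives 0x0452 + 1 = U+0453.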
     */
    private static final long gb18030Ranges[][] = new long[/* 13 */][/* 4 */] {
        { 0x10000L, 0x10FFFFL, LINEAR(0x90308130L), LINEAR(0xE3329A35L) },
        { 0x9FA6L, 0xD7FFL, LINEAR(0x82358F33L), LINEAR(0x8336C738L) },
        { 0x0452L, 0x200FL, LINEAR(0x8130D330L), LINEAR(0x8136A531L) },
        { 0xE865L, 0xF92BL, LINEAR(0x8336D030L), LINEAR(0x84308534L) },
        { 0x2643L, 0x2E80L, LINEAR(0x8137A839L), LINEAR(0x8138FD38L) },
        { 0xFA2AL, 0xFE2FL, LINEAR(0x84309C38L), LINEAR(0x84318537L) },
        { 0x3CE1L, 0x4055L, LINEAR(0x8231D438L), LINEAR(0x8232AF32L) },
        { 0x361BL, 0x3917L, LINEAR(0x8230A633L), LINEAR(0x8230F237L) },
        { 0x49B8L, 0x4C76L, LINEAR(0x8234A131L), LINEAR(0x8234E733L) },
        { 0x4160L, 0x4336L, LINEAR(0x8232C937L), LINEAR(0x8232F837L) },
        { 0x478EL, 0x4946L, LINEAR(0x8233E838L), LINEAR(0x82349638L) },
        { 0x44D7L, 0x464BL, LINEAR(0x8233A339L), LINEAR(0x8233C931L) },
        { 0xFFE6L, 0xFFFFL, LINEAR(0x8431A234L), LINEAR(0x8431A439L) }
    };

    /* bit flag for UConverter.options indicating GB 18030 special handling */
    private static final int MBCS_OPTION_GB18030 = 0x8000;

    // enum {
    static final int MBCS_MAX_STATE_COUNT = 128;
    // };

    /**
     * MBCS action codes for conversions to Unicode. These values are in bits 23..20 of the state table entries.
*/ static final int MBCS_STATE_VALID_DIRECT_16 = 0; static final int MBCS_STATE_VALID_DIRECT_20 = MBCS_STATE_VALID_DIRECT_16 + 1; static final int MBCS_STATE_FALLBACK_DIRECT_16 = MBCS_STATE_VALID_DIRECT_20 + 1; static final int MBCS_STATE_FALLBACK_DIRECT_20 = MBCS_STATE_FALLBACK_DIRECT_16 + 1; static final int MBCS_STATE_VALID_16 = MBCS_STATE_FALLBACK_DIRECT_20 + 1; static final int MBCS_STATE_VALID_16_PAIR = MBCS_STATE_VALID_16 + 1; static final int MBCS_STATE_UNASSIGNED = MBCS_STATE_VALID_16_PAIR + 1; static final int MBCS_STATE_ILLEGAL = MBCS_STATE_UNASSIGNED + 1; static final int MBCS_STATE_CHANGE_ONLY = MBCS_STATE_ILLEGAL + 1; static int MBCS_ENTRY_SET_STATE(int entry, int state) { return (int)(((entry)&0x80ffffff)|((int)(state)<<24L)); } static int MBCS_ENTRY_STATE(int entry) { return (((entry)>>24)&0x7f); } /* Methods for state table entries */ static int MBCS_ENTRY_TRANSITION(int state, int offset) { return (state << 24L) | offset; } static int MBCS_ENTRY_FINAL(int state, int action, int value) { return (int) (0x80000000 | ((int) (state) << 24L) | ((action) << 20L) | (value)); } static boolean MBCS_ENTRY_IS_TRANSITION(int entry) { return (entry) >= 0; } static boolean MBCS_ENTRY_IS_FINAL(int entry) { return (entry) < 0; } static int MBCS_ENTRY_TRANSITION_STATE(int entry) { return ((entry) >>> 24); } static int MBCS_ENTRY_TRANSITION_OFFSET(int entry) { return ((entry) & 0xffffff); } static int MBCS_ENTRY_FINAL_STATE(int entry) { return ((entry) >>> 24) & 0x7f; } static boolean MBCS_ENTRY_FINAL_IS_VALID_DIRECT_16(int entry) { return ((entry) < 0x80100000); } static int MBCS_ENTRY_FINAL_ACTION(int entry) { return ((entry) >>> 20) & 0xf; } static int MBCS_ENTRY_FINAL_VALUE(int entry) { return ((entry) & 0xfffff); } static char MBCS_ENTRY_FINAL_VALUE_16(int entry) { return (char) (entry); } static boolean MBCS_IS_ASCII_ROUNDTRIP(int b, long asciiRoundtrips) { return (((asciiRoundtrips) & (1<<((b)>>2)))!=0); } /** * This macro version of 
_MBCSSingleSimpleGetNextUChar() gets a code point from a byte. It works for single-byte, * single-state codepages that only map to and from BMP code points, and it always returns fallback values. */ static char MBCS_SINGLE_SIMPLE_GET_NEXT_BMP(UConverterMBCSTable mbcs, final int b) { return MBCS_ENTRY_FINAL_VALUE_16(mbcs.stateTable[0][b & UConverterConstants.UNSIGNED_BYTE_MASK]); } /* single-byte fromUnicode: get the 16-bit result word */ static char MBCS_SINGLE_RESULT_FROM_U(char[] table, byte[] results, int c) { int i1 = table[c >>> 10] + ((c >>> 4) & 0x3f); int i = 2 * (table[i1] + (c & 0xf)); // used as index into byte[] array treated as char[] array return (char) (((results[i] & UConverterConstants.UNSIGNED_BYTE_MASK) << 8) | (results[i + 1] & UConverterConstants.UNSIGNED_BYTE_MASK)); } /* single-byte fromUnicode: set the 16-bit result word with newValue*/ static void MBCS_SINGLE_RESULT_FROM_U_SET(char[] table, byte[] results, int c, int newValue) { int i1 = table[c >>> 10] + ((c >>> 4) & 0x3f); int i = 2 * (table[i1] + (c & 0xf)); // used as index into byte[] array treated as char[] array results[i] = (byte)((newValue >> 8) & UConverterConstants.UNSIGNED_BYTE_MASK); results[i + 1] = (byte)(newValue & UConverterConstants.UNSIGNED_BYTE_MASK); } /* multi-byte fromUnicode: get the 32-bit stage 2 entry */ static int MBCS_STAGE_2_FROM_U(char[] table, int c) { int i = 2 * (table[(c) >>> 10] + ((c >>> 4) & 0x3f)); // 2x because used as index into char[] array treated as // int[] array return ((table[i] & UConverterConstants.UNSIGNED_SHORT_MASK) << 16) | (table[i + 1] & UConverterConstants.UNSIGNED_SHORT_MASK); } private static boolean MBCS_FROM_U_IS_ROUNDTRIP(int stage2Entry, int c) { return (((stage2Entry) & (1 << (16 + ((c) & 0xf)))) != 0); } static char MBCS_VALUE_2_FROM_STAGE_2(byte[] bytes, int stage2Entry, int c) { int i = 2 * (16 * ((char) stage2Entry & UConverterConstants.UNSIGNED_SHORT_MASK) + (c & 0xf)); return (char) (((bytes[i] & 
UConverterConstants.UNSIGNED_BYTE_MASK) << 8) | (bytes[i + 1] & UConverterConstants.UNSIGNED_BYTE_MASK)); } static void MBCS_VALUE_2_FROM_STAGE_2_SET(byte[] bytes, int stage2Entry, int c, int newValue) { int i = 2 * (16 * ((char) stage2Entry & UConverterConstants.UNSIGNED_SHORT_MASK) + (c & 0xf)); bytes[i] = (byte)((newValue >> 8) & UConverterConstants.UNSIGNED_BYTE_MASK); bytes[i + 1] = (byte)(newValue & UConverterConstants.UNSIGNED_BYTE_MASK); } private static int MBCS_VALUE_4_FROM_STAGE_2(byte[] bytes, int stage2Entry, int c) { int i = 4 * (16 * ((char) stage2Entry & UConverterConstants.UNSIGNED_SHORT_MASK) + (c & 0xf)); return ((bytes[i] & UConverterConstants.UNSIGNED_BYTE_MASK) << 24) | ((bytes[i + 1] & UConverterConstants.UNSIGNED_BYTE_MASK) << 16) | ((bytes[i + 2] & UConverterConstants.UNSIGNED_BYTE_MASK) << 8) | (bytes[i + 3] & UConverterConstants.UNSIGNED_BYTE_MASK); } static int MBCS_POINTER_3_FROM_STAGE_2(byte[] bytes, int stage2Entry, int c) { return ((16 * ((char) (stage2Entry) & UConverterConstants.UNSIGNED_SHORT_MASK) + ((c) & 0xf)) * 3); } // ------------UConverterExt------------------------------------------------------- static final int EXT_INDEXES_LENGTH = 0; /* 0 */ static final int EXT_TO_U_INDEX = EXT_INDEXES_LENGTH + 1; /* 1 */ static final int EXT_TO_U_LENGTH = EXT_TO_U_INDEX + 1; static final int EXT_TO_U_UCHARS_INDEX = EXT_TO_U_LENGTH + 1; static final int EXT_TO_U_UCHARS_LENGTH = EXT_TO_U_UCHARS_INDEX + 1; static final int EXT_FROM_U_UCHARS_INDEX = EXT_TO_U_UCHARS_LENGTH + 1; /* 5 */ static final int EXT_FROM_U_VALUES_INDEX = EXT_FROM_U_UCHARS_INDEX + 1; static final int EXT_FROM_U_LENGTH = EXT_FROM_U_VALUES_INDEX + 1; static final int EXT_FROM_U_BYTES_INDEX = EXT_FROM_U_LENGTH + 1; static final int EXT_FROM_U_BYTES_LENGTH = EXT_FROM_U_BYTES_INDEX + 1; static final int EXT_FROM_U_STAGE_12_INDEX = EXT_FROM_U_BYTES_LENGTH + 1; /* 10 */ static final int EXT_FROM_U_STAGE_1_LENGTH = EXT_FROM_U_STAGE_12_INDEX + 1; static final int 
EXT_FROM_U_STAGE_12_LENGTH = EXT_FROM_U_STAGE_1_LENGTH + 1; static final int EXT_FROM_U_STAGE_3_INDEX = EXT_FROM_U_STAGE_12_LENGTH + 1; static final int EXT_FROM_U_STAGE_3_LENGTH = EXT_FROM_U_STAGE_3_INDEX + 1; static final int EXT_FROM_U_STAGE_3B_INDEX = EXT_FROM_U_STAGE_3_LENGTH + 1; static final int EXT_FROM_U_STAGE_3B_LENGTH = EXT_FROM_U_STAGE_3B_INDEX + 1; private static final int EXT_COUNT_BYTES = EXT_FROM_U_STAGE_3B_LENGTH + 1; /* 17 */ // private static final int EXT_COUNT_UCHARS = EXT_COUNT_BYTES + 1; // private static final int EXT_FLAGS = EXT_COUNT_UCHARS + 1; // // private static final int EXT_RESERVED_INDEX = EXT_FLAGS + 1; /* 20, moves with additional indexes */ // // private static final int EXT_SIZE=31; // private static final int EXT_INDEXES_MIN_LENGTH=32; static final int EXT_FROM_U_MAX_DIRECT_LENGTH = 3; /* toUnicode helpers -------------------------------------------------------- */ private static final int TO_U_BYTE_SHIFT = 24; private static final int TO_U_VALUE_MASK = 0xffffff; private static final int TO_U_MIN_CODE_POINT = 0x1f0000; private static final int TO_U_MAX_CODE_POINT = 0x2fffff; private static final int TO_U_ROUNDTRIP_FLAG = (1 << 23); private static final int TO_U_INDEX_MASK = 0x3ffff; private static final int TO_U_LENGTH_SHIFT = 18; private static final int TO_U_LENGTH_OFFSET = 12; /* maximum number of indexed UChars */ static final int MAX_UCHARS = 19; static int TO_U_GET_BYTE(int word) { return word >>> TO_U_BYTE_SHIFT; } static int TO_U_GET_VALUE(int word) { return word & TO_U_VALUE_MASK; } static boolean TO_U_IS_ROUNDTRIP(int value) { return (value & TO_U_ROUNDTRIP_FLAG) != 0; } static boolean TO_U_IS_PARTIAL(int value) { return (value & UConverterConstants.UNSIGNED_INT_MASK) < TO_U_MIN_CODE_POINT; } static int TO_U_GET_PARTIAL_INDEX(int value) { return value; } static int TO_U_MASK_ROUNDTRIP(int value) { return value & ~TO_U_ROUNDTRIP_FLAG; } private static int TO_U_MAKE_WORD(byte b, int value) { return ((b & 
UConverterConstants.UNSIGNED_BYTE_MASK) << TO_U_BYTE_SHIFT) | value; } /* use after masking off the roundtrip flag */ static boolean TO_U_IS_CODE_POINT(int value) { return (value & UConverterConstants.UNSIGNED_INT_MASK) <= TO_U_MAX_CODE_POINT; } static int TO_U_GET_CODE_POINT(int value) { return (int) ((value & UConverterConstants.UNSIGNED_INT_MASK) - TO_U_MIN_CODE_POINT); } private static int TO_U_GET_INDEX(int value) { return value & TO_U_INDEX_MASK; } private static int TO_U_GET_LENGTH(int value) { return (value >>> TO_U_LENGTH_SHIFT) - TO_U_LENGTH_OFFSET; } /* fromUnicode helpers ------------------------------------------------------ */ /* most trie constants are shared with ucnvmbcs.h */ private static final int STAGE_2_LEFT_SHIFT = 2; // private static final int STAGE_3_GRANULARITY = 4; /* trie access, returns the stage 3 value=index to stage 3b; s1Index=c>>10 */ static int FROM_U(CharBuffer stage12, CharBuffer stage3, int s1Index, int c) { return stage3.get(((int) stage12.get((stage12.get(s1Index) + ((c >>> 4) & 0x3f))) << STAGE_2_LEFT_SHIFT) + (c & 0xf)); } private static final int FROM_U_LENGTH_SHIFT = 24; private static final int FROM_U_ROUNDTRIP_FLAG = 1 << 31; static final int FROM_U_RESERVED_MASK = 0x60000000; private static final int FROM_U_DATA_MASK = 0xffffff; /* special value for "no mapping" to (impossible roundtrip to 0 bytes, value 01) */ static final int FROM_U_SUBCHAR1 = 0x80000001; /* at most 3 bytes in the lower part of the value */ private static final int FROM_U_MAX_DIRECT_LENGTH = 3; /* maximum number of indexed bytes */ static final int MAX_BYTES = 0x1f; static boolean FROM_U_IS_PARTIAL(int value) { return (value >>> FROM_U_LENGTH_SHIFT) == 0; } static int FROM_U_GET_PARTIAL_INDEX(int value) { return value; } static boolean FROM_U_IS_ROUNDTRIP(int value) { return (value & FROM_U_ROUNDTRIP_FLAG) != 0; } private static int FROM_U_MASK_ROUNDTRIP(int value) { return value & ~FROM_U_ROUNDTRIP_FLAG; } /* use after masking off the roundtrip 
flag */ static int FROM_U_GET_LENGTH(int value) { return (value >>> FROM_U_LENGTH_SHIFT) & MAX_BYTES; } /* get bytes or bytes index */ static int FROM_U_GET_DATA(int value) { return value & FROM_U_DATA_MASK; } /* get the pointer to an extension array from indexes[index] */ static Buffer ARRAY(ByteBuffer indexes, int index, Class itemType) { int oldpos = indexes.position(); Buffer b; indexes.position(indexes.getInt(index << 2)); if (itemType == int.class) b = indexes.asIntBuffer(); else if (itemType == char.class) b = indexes.asCharBuffer(); else if (itemType == short.class) b = indexes.asShortBuffer(); else // default or (itemType == byte.class) b = indexes.slice(); indexes.position(oldpos); return b; } private static int GET_MAX_BYTES_PER_UCHAR(ByteBuffer indexes) { indexes.position(0); return indexes.getInt(EXT_COUNT_BYTES) & 0xff; } /* * @return index of the UChar, if found; else <0 */ static int findFromU(CharBuffer fromUSection, int length, char u) { int i, start, limit; /* binary search */ start = 0; limit = length; for (;;) { i = limit - start; if (i <= 1) { break; /* done */ } /* startmode==0 is equivalent to firstLength==1. */ private static int SISO_STATE(UConverterSharedData sharedData, int mode) { return sharedData.mbcs.outputType == MBCS_OUTPUT_2_SISO ? (byte) mode : sharedData.mbcs.outputType == MBCS_OUTPUT_DBCS_ONLY ? 1 : -1; } class CharsetDecoderMBCS extends CharsetDecoderICU { CharsetDecoderMBCS(CharsetICU cs) { super(cs); } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { /* Just call cnvMBCSToUnicodeWithOffsets() to remove duplicate code. 
*/ return cnvMBCSToUnicodeWithOffsets(source, target, offsets, flush); } /* * continue partial match with new input never called for simple, single-character conversion */ private CoderResult continueMatchToU(ByteBuffer source, CharBuffer target, IntBuffer offsets, int srcIndex, boolean flush) { CoderResult cr = CoderResult.UNDERFLOW; int[] value = new int[1]; int match, length; match = matchToU((byte) SISO_STATE(sharedData, mode), preToUArray, preToUBegin, preToULength, source, value, isToUUseFallback(), flush); if (match > 0) { if (match >= preToULength) { /* advance src pointer for the consumed input */ source.position(source.position() + match - preToULength); preToULength = 0; } else { /* the match did not use all of preToU[] - keep the rest for replay */ length = preToULength - match; System.arraycopy(preToUArray, preToUBegin + match, preToUArray, preToUBegin, length); preToULength = (byte) -length; } /* write result */ cr = writeToU(value[0], target, offsets, srcIndex); } else if (match < 0) { /* save state for partial match */ int j, sArrayIndex; /* just _append_ the newly consumed input to preToU[] */ sArrayIndex = source.position(); match = -match; for (j = preToULength; j < match; ++j) { preToUArray[j] = source.get(sArrayIndex++); } source.position(sArrayIndex); /* same as *src=srcLimit; because we reached the end of input */ preToULength = (byte) match; } else /* match==0 */{ /* * no match * * We need to split the previous input into two parts: * * 1. The first codepage character is unmappable - that's how we got into trying the extension data in * the first place. We need to move it from the preToU buffer to the error buffer, set an error code, * and prepare the rest of the previous input for 2. * * 2. The rest of the previous input must be converted once we come back from the callback for the first * character. At that time, we have to try again from scratch to convert these input characters. The * replay will be handled by the ucnv.c conversion code. 
                 */

                /* move the first codepage character to the error field */
                System.arraycopy(preToUArray, preToUBegin, toUBytesArray, toUBytesBegin, preToUFirstLength);
                toULength = preToUFirstLength;

                /* move the rest up inside the buffer */
                length = preToULength - preToUFirstLength;
                if (length > 0) {
                    System.arraycopy(preToUArray, preToUBegin + preToUFirstLength, preToUArray, preToUBegin, length);
                }

                /* mark preToU for replay */
                preToULength = (byte) -length;

                /* set the error code for unassigned */
                cr = CoderResult.unmappableForLength(preToUFirstLength);
            }
            return cr;
        }

        /*
         * this works like matchFromU() except
         * - the first character is in pre
         * - no trie is used
         * - the returned matchLength is not offset by 2
         */
        private int matchToU(byte sisoState, byte[] preArray, int preArrayBegin, int preLength, ByteBuffer source,
                int[] pMatchValue, boolean isUseFallback, boolean flush) {
            ByteBuffer cx = sharedData.mbcs.extIndexes;
            IntBuffer toUTable, toUSection;

            int value, matchValue, srcLength = 0;
            int i, j, index, length, matchLength;
            short b;

            if (cx == null || cx.asIntBuffer().get(EXT_TO_U_LENGTH) <= 0) {
                return 0; /* no extension data, no match */
            }

            /* initialize */
            toUTable = (IntBuffer) ARRAY(cx, EXT_TO_U_INDEX, int.class);
            index = 0;

            matchValue = 0;
            i = j = matchLength = 0;
            if (source != null) {
                srcLength = source.remaining();
            }

            if (sisoState == 0) {
                /* SBCS state of an SI/SO stateful converter, look at only exactly 1 byte */
                if (preLength > 1) {
                    return 0; /* no match of a DBCS sequence in SBCS mode */
                } else if (preLength == 1) {
                    srcLength = 0;
                } else /* preLength==0 */{
                    if (srcLength > 1) {
                        srcLength = 1;
                    }
                }
                flush = true;
            }

            /* we must not remember fallback matches when not using fallbacks */

            /* match input units until there is a full match or the input is consumed */
            for (;;) {
                /* go to the next section */
                int oldpos = toUTable.position();
                toUSection = ((IntBuffer) toUTable.position(index)).slice();
                toUTable.position(oldpos);

                /* read first pair of the section */
                value = toUSection.get();
length = TO_U_GET_BYTE(value); value = TO_U_GET_VALUE(value); if (value != 0 && (TO_U_IS_ROUNDTRIP(value) || isToUUseFallback(isUseFallback)) && TO_U_VERIFY_SISO_MATCH(sisoState, i + j)) { /* remember longest match so far */ matchValue = value; matchLength = i + j; } /* match pre[] then src[] */ if (i < preLength) { b = (short) (preArray[preArrayBegin + i++] & UConverterConstants.UNSIGNED_BYTE_MASK); } else if (j < srcLength) { b = (short) (source.get(source.position() + j++) & UConverterConstants.UNSIGNED_BYTE_MASK); } else { /* all input consumed, partial match */ if (flush || (length = (i + j)) > MAX_BYTES) { /* * end of the entire input stream, stop with the longest match so far or: partial match must not * be longer than UCNV_EXT_MAX_BYTES because it must fit into state buffers */ break; } else { /* continue with more input next time */ return -length; } } /* search for the current UChar */ value = findToU(toUSection, length, b); if (value == 0) { /* no match here, stop with the longest match so far */ break; } else { if (TO_U_IS_PARTIAL(value)) { /* partial match, continue */ index = TO_U_GET_PARTIAL_INDEX(value); } else { if ((TO_U_IS_ROUNDTRIP(value) || isToUUseFallback(isUseFallback)) && TO_U_VERIFY_SISO_MATCH(sisoState, i + j)) { /* full match, stop with result */ matchValue = value; matchLength = i + j; } else { /* full match on fallback not taken, stop with the longest match so far */ } break; } } } if (matchLength == 0) { /* no match at all */ return 0; } /* return result */ pMatchValue[0] = TO_U_MASK_ROUNDTRIP(matchValue); return matchLength; } private CoderResult writeToU(int value, CharBuffer target, IntBuffer offsets, int srcIndex) { ByteBuffer cx = sharedData.mbcs.extIndexes; /* output the result */ if (TO_U_IS_CODE_POINT(value)) { /* output a single code point */ return toUWriteCodePoint(TO_U_GET_CODE_POINT(value), target, offsets, srcIndex); } else { /* output a string - with correct data we have resultLength>0 */ char[] a = new 
char[TO_U_GET_LENGTH(value)]; CharBuffer cb = ((CharBuffer) ARRAY(cx, EXT_TO_U_UCHARS_INDEX, char.class)); cb.position(TO_U_GET_INDEX(value)); cb.get(a, 0, a.length); return toUWriteUChars(this, a, 0, a.length, target, offsets, srcIndex); } } private CoderResult toUWriteCodePoint(int c, CharBuffer target, IntBuffer offsets, int sourceIndex) { CoderResult cr = CoderResult.UNDERFLOW; int tBeginIndex = target.position(); if (target.hasRemaining()) { if (c <= 0xffff) { target.put((char) c); c = UConverterConstants.U_SENTINEL; } else /* c is a supplementary code point */{ target.put(UTF16.getLeadSurrogate(c)); c = UTF16.getTrailSurrogate(c); if (target.hasRemaining()) { target.put((char) c); c = UConverterConstants.U_SENTINEL; } } /* write offsets */ if (offsets != null) { offsets.put(sourceIndex); if ((tBeginIndex + 1) < target.position()) { offsets.put(sourceIndex); } } } /* write overflow from c */ if (c >= 0) { charErrorBufferLength = UTF16.append(charErrorBufferArray, 0, c); cr = CoderResult.OVERFLOW; } return cr; } /* * Input sequence: cnv->toUBytes[0..length[ @return if(U_FAILURE) return the length (toULength, byteIndex) for * the input else return 0 after output has been written to the target */ private int toU(int length, ByteBuffer source, CharBuffer target, IntBuffer offsets, int sourceIndex, boolean flush, CoderResult[] cr) { // ByteBuffer cx; if (sharedData.mbcs.extIndexes != null && initialMatchToU(length, source, target, offsets, sourceIndex, flush, cr)) { return 0; /* an extension mapping handled the input */ } /* GB 18030 */ if (length == 4 && (options & MBCS_OPTION_GB18030) != 0) { long[] range; long linear; int i; linear = LINEAR_18030(toUBytesArray[0], toUBytesArray[1], toUBytesArray[2], toUBytesArray[3]); for (i = 0; i < gb18030Ranges.length; ++i) { range = gb18030Ranges[i]; if (range[2] <= linear && linear <= range[3]) { /* found the sequence, output the Unicode code point for it */ cr[0] = CoderResult.UNDERFLOW; /* add the linear difference 
between the input and start sequences to the start code point */ linear = range[0] + (linear - range[2]); /* output this code point */ cr[0] = toUWriteCodePoint((int) linear, target, offsets, sourceIndex); return 0; } } } /* no mapping */ cr[0] = CoderResult.unmappableForLength(length); return length; } /* * target 0) { /* advance src pointer for the consumed input */ source.position(source.position() + match - firstLength); /* write result to target */ cr[0] = writeToU(value[0], target, offsets, srcIndex); return true; } else if (match < 0) { /* save state for partial match */ byte[] sArray; int sArrayIndex; int j; /* copy the first code point */ sArray = toUBytesArray; sArrayIndex = toUBytesBegin; preToUFirstLength = (byte) firstLength; for (j = 0; j < firstLength; ++j) { preToUArray[j] = sArray[sArrayIndex++]; } /* now copy the newly consumed input */ sArrayIndex = source.position(); match = -match; for (; j < match; ++j) { preToUArray[j] = source.get(sArrayIndex++); } source.position(sArrayIndex); preToULength = (byte) match; return true; } else /* match==0 no match */{ return false; } } private int simpleMatchToU(ByteBuffer source, boolean useFallback) { int[] value = new int[1]; int match; if (source.remaining() <= 0) { return 0xffff; } /* try to match */ match = matchToU((byte) -1, source.array(), source.position(), source.limit(), null, value, useFallback, true); if (match == (source.limit() - source.position())) { /* write result for simple, single-character conversion */ if (TO_U_IS_CODE_POINT(value[0])) { return TO_U_GET_CODE_POINT(value[0]); } } /* * return no match because - match>0 && value points to string: simple conversion cannot handle multiple * code points - match>0 && match!=length: not all input consumed, forbidden for this function - match==0: * no match found in the first place - match<0: partial match, not supported for simple conversion (and * flush==TRUE) */ return 0xfffe; } CoderResult cnvMBCSToUnicodeWithOffsets(ByteBuffer source, 
CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult[] cr = { CoderResult.UNDERFLOW }; int sourceArrayIndex, sourceArrayIndexStart; int stateTable[][/* 256 */]; char[] unicodeCodeUnits; int offset; byte state; int byteIndex; byte[] bytes; int sourceIndex, nextSourceIndex; int entry = 0; char c; byte action; if (preToULength > 0) { /* * pass sourceIndex=-1 because we continue from an earlier buffer in the future, this may change with * continuous offsets */ cr[0] = continueMatchToU(source, target, offsets, -1, flush); if (cr[0].isError() || preToULength < 0) { return cr[0]; } } if (sharedData.mbcs.countStates == 1) { if ((sharedData.mbcs.unicodeMask & UConverterConstants.HAS_SUPPLEMENTARY) == 0) { cr[0] = cnvMBCSSingleToBMPWithOffsets(source, target, offsets, flush); } else { cr[0] = cnvMBCSSingleToUnicodeWithOffsets(source, target, offsets, flush); } return cr[0]; } /* set up the local pointers */ sourceArrayIndex = sourceArrayIndexStart = source.position(); if ((options & UConverterConstants.OPTION_SWAP_LFNL) != 0) { stateTable = sharedData.mbcs.swapLFNLStateTable; } else { stateTable = sharedData.mbcs.stateTable; } unicodeCodeUnits = sharedData.mbcs.unicodeCodeUnits; /* get the converter state from UConverter */ offset = (int)toUnicodeStatus; byteIndex = toULength; bytes = toUBytesArray; /* * if we are in the SBCS state for a DBCS-only converter, then load the DBCS state from the MBCS data * (dbcsOnlyState==0 if it is not a DBCS-only converter) */ state = (byte)mode; if (state == 0) { state = sharedData.mbcs.dbcsOnlyState; } /* sourceIndex=-1 if the current character began in the previous buffer */ sourceIndex = byteIndex == 0 ? 0 : -1; nextSourceIndex = 0; /* conversion loop */ while (sourceArrayIndex < source.limit()) { /* * This following test is to see if available input would overflow the output. It does not catch output * of more than one code unit that overflows as a result of a surrogate pair or callback output from the * last source byte. 
Therefore, those situations also test for overflows and will then break the loop, * too. */ if (!target.hasRemaining()) { /* target is full */ cr[0] = CoderResult.OVERFLOW; break; } if (byteIndex == 0) { /* optimized loop for 1/2-byte input and BMP output */ // agljport:todo see ucnvmbcs.c for deleted block do { entry = stateTable[state][source.get(sourceArrayIndex)&UConverterConstants.UNSIGNED_BYTE_MASK]; if (MBCS_ENTRY_IS_TRANSITION(entry)) { state = (byte)MBCS_ENTRY_TRANSITION_STATE(entry); offset = MBCS_ENTRY_TRANSITION_OFFSET(entry); ++sourceArrayIndex; if (sourceArrayIndex < source.limit() && MBCS_ENTRY_IS_FINAL(entry = stateTable[state][source.get(sourceArrayIndex)&UConverterConstants.UNSIGNED_BYTE_MASK]) && MBCS_ENTRY_FINAL_ACTION(entry) == MBCS_STATE_VALID_16 && (c = unicodeCodeUnits[offset + MBCS_ENTRY_FINAL_VALUE_16(entry)]) < 0xfffe) { ++sourceArrayIndex; target.put(c); if (offsets != null) { offsets.put(sourceIndex); sourceIndex = (nextSourceIndex += 2); } state = (byte)MBCS_ENTRY_FINAL_STATE(entry); /* typically 0 */ offset = 0; } else { /* set the state and leave the optimized loop */ ++nextSourceIndex; bytes[0] = source.get(sourceArrayIndex - 1); byteIndex = 1; break; } } else { if (MBCS_ENTRY_FINAL_IS_VALID_DIRECT_16(entry)) { /* output BMP code point */ ++sourceArrayIndex; target.put((char)MBCS_ENTRY_FINAL_VALUE_16(entry)); if (offsets != null) { offsets.put(sourceIndex); sourceIndex = ++nextSourceIndex; } state = (byte)MBCS_ENTRY_FINAL_STATE(entry); /* typically 0 */ } else { /* leave the optimized loop */ break; } } } while (sourceArrayIndex < source.limit() && target.hasRemaining()); /* * these tests and break statements could be put inside the loop if C had "break outerLoop" like * Java */ if (sourceArrayIndex >= source.limit()) { break; } if (!target.hasRemaining()) { /* target is full */ cr[0] = CoderResult.OVERFLOW; break; } ++nextSourceIndex; bytes[byteIndex++] = source.get(sourceArrayIndex++); } else /* byteIndex>0 */{ ++nextSourceIndex; 
entry = stateTable[state][(bytes[byteIndex++] = source.get(sourceArrayIndex++)) & UConverterConstants.UNSIGNED_BYTE_MASK]; } if (MBCS_ENTRY_IS_TRANSITION(entry)) { state = (byte)MBCS_ENTRY_TRANSITION_STATE(entry); offset += MBCS_ENTRY_TRANSITION_OFFSET(entry); continue; } /* save the previous state for proper extension mapping with SI/SO-stateful converters */ mode = state; /* set the next state early so that we can reuse the entry variable */ state = (byte)MBCS_ENTRY_FINAL_STATE(entry); /* typically 0 */ /* * An if-else-if chain provides more reliable performance for the most common cases compared to a * switch. */ action = (byte)MBCS_ENTRY_FINAL_ACTION(entry); if (action == MBCS_STATE_VALID_16) { offset += MBCS_ENTRY_FINAL_VALUE_16(entry); c = unicodeCodeUnits[offset]; if (c < 0xfffe) { /* output BMP code point */ target.put(c); if (offsets != null) { offsets.put(sourceIndex); } byteIndex = 0; } else if (c == 0xfffe) { if (isFallbackUsed() && (entry = (int)getFallback(sharedData.mbcs, offset)) != 0xfffe) { /* output fallback BMP code point */ target.put((char)entry); if (offsets != null) { offsets.put(sourceIndex); } byteIndex = 0; } } else { /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(byteIndex); } } else if (action == MBCS_STATE_VALID_DIRECT_16) { /* output BMP code point */ target.put((char)MBCS_ENTRY_FINAL_VALUE_16(entry)); if (offsets != null) { offsets.put(sourceIndex); } byteIndex = 0; } else if (action == MBCS_STATE_VALID_16_PAIR) { offset += MBCS_ENTRY_FINAL_VALUE_16(entry); c = unicodeCodeUnits[offset++]; if (c < 0xd800) { /* output BMP code point below 0xd800 */ target.put(c); if (offsets != null) { offsets.put(sourceIndex); } byteIndex = 0; } else if (isFallbackUsed() ? 
c <= 0xdfff : c <= 0xdbff) { /* output roundtrip or fallback surrogate pair */ target.put((char)(c & 0xdbff)); if (offsets != null) { offsets.put(sourceIndex); } byteIndex = 0; if (target.hasRemaining()) { target.put(unicodeCodeUnits[offset]); if (offsets != null) { offsets.put(sourceIndex); } } else { /* target overflow */ charErrorBufferArray[0] = unicodeCodeUnits[offset]; charErrorBufferLength = 1; cr[0] = CoderResult.OVERFLOW; offset = 0; break; } } else if (isFallbackUsed() ? (c & 0xfffe) == 0xe000 : c == 0xe000) { /* output roundtrip BMP code point above 0xd800 or fallback BMP code point */ target.put(unicodeCodeUnits[offset]); if (offsets != null) { offsets.put(sourceIndex); } byteIndex = 0; } else if (c == 0xffff) { /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(byteIndex); } } else if (action == MBCS_STATE_VALID_DIRECT_20 || (action == MBCS_STATE_FALLBACK_DIRECT_20 && isFallbackUsed())) { entry = MBCS_ENTRY_FINAL_VALUE(entry); /* output surrogate pair */ target.put((char)(0xd800 | (char)(entry >> 10))); if (offsets != null) { offsets.put(sourceIndex); } byteIndex = 0; c = (char)(0xdc00 | (char)(entry & 0x3ff)); if (target.hasRemaining()) { target.put(c); if (offsets != null) { offsets.put(sourceIndex); } } else { /* target overflow */ charErrorBufferArray[0] = c; charErrorBufferLength = 1; cr[0] = CoderResult.OVERFLOW; offset = 0; break; } } else if (action == MBCS_STATE_CHANGE_ONLY) { /* * This serves as a state change without any output. It is useful for reading simple stateful * encodings, for example using just Shift-In/Shift-Out codes. The 21 unused bits may later be used * for more sophisticated state transitions. 
*/ if (sharedData.mbcs.dbcsOnlyState == 0) { byteIndex = 0; } else { /* SI/SO are illegal for DBCS-only conversion */ state = (byte)(mode); /* restore the previous state */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(byteIndex); } } else if (action == MBCS_STATE_FALLBACK_DIRECT_16) { if (isFallbackUsed()) { /* output BMP code point */ target.put((char)MBCS_ENTRY_FINAL_VALUE_16(entry)); if (offsets != null) { offsets.put(sourceIndex); } byteIndex = 0; } } else if (action == MBCS_STATE_UNASSIGNED) { /* just fall through */ } else if (action == MBCS_STATE_ILLEGAL) { /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(byteIndex); } else { /* reserved, must never occur */ byteIndex = 0; } /* end of action codes: prepare for a new character */ offset = 0; if (byteIndex == 0) { sourceIndex = nextSourceIndex; } else if (cr[0].isError()) { /* callback(illegal) */ if (byteIndex > 1) { /* * Ticket 5691: consistent illegal sequences: * - We include at least the first byte in the illegal sequence. * - If any of the non-initial bytes could be the start of a character, * we stop the illegal sequence before the first one of those. */ boolean isDBCSOnly = (sharedData.mbcs.dbcsOnlyState != 0); byte i; for (i = 1; i < byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, (short)(bytes[i] & UConverterConstants.UNSIGNED_BYTE_MASK)); i++) {} if (i < byteIndex) { byte backOutDistance = (byte)(byteIndex - i); int bytesFromThisBuffer = sourceArrayIndex - sourceArrayIndexStart; byteIndex = i; /* length of reported illegal byte sequence */ if (backOutDistance <= bytesFromThisBuffer) { sourceArrayIndex -= backOutDistance; } else { /* Back out bytes from the previous buffer: Need to replay them. */ this.preToULength = (byte)(bytesFromThisBuffer - backOutDistance); /* preToULength is negative! 
*/ for (int n = 0; n < -this.preToULength; n++) { this.preToUArray[n] = bytes[i+n]; } sourceArrayIndex = sourceArrayIndexStart; } } } break; } else /* unassigned sequences indicated with byteIndex>0 */{ /* try an extension mapping */ int sourceBeginIndex = sourceArrayIndex; source.position(sourceArrayIndex); byteIndex = toU(byteIndex, source, target, offsets, sourceIndex, flush, cr); sourceArrayIndex = source.position(); sourceIndex = nextSourceIndex += (int)(sourceArrayIndex - sourceBeginIndex); if (cr[0].isError() || cr[0].isOverflow()) { /* not mappable or buffer overflow */ break; } } } /* set the converter state back into UConverter */ toUnicodeStatus = offset; mode = state; toULength = byteIndex; /* write back the updated pointers */ source.position(sourceArrayIndex); return cr[0]; } /* * This version of cnvMBCSSingleToUnicodeWithOffsets() is optimized for single-byte, single-state codepages that * only map to and from the BMP. In addition to single-byte optimizations, the offset calculations become much * easier. 
*/ private CoderResult cnvMBCSSingleToBMPWithOffsets(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult[] cr = { CoderResult.UNDERFLOW }; int sourceArrayIndex, lastSource; int targetCapacity, length; int[][] stateTable; int sourceIndex; int entry; byte action; /* set up the local pointers */ sourceArrayIndex = source.position(); targetCapacity = target.remaining(); if ((options & UConverterConstants.OPTION_SWAP_LFNL) != 0) { stateTable = sharedData.mbcs.swapLFNLStateTable; } else { stateTable = sharedData.mbcs.stateTable; } /* sourceIndex=-1 if the current character began in the previous buffer */ sourceIndex = 0; lastSource = sourceArrayIndex; /* * since the conversion here is 1:1 UChar:uint8_t, we need only one counter for the minimum of the * sourceLength and targetCapacity */ length = source.remaining(); if (length < targetCapacity) { targetCapacity = length; } /* conversion loop */ while (targetCapacity > 0) { entry = stateTable[0][source.get(sourceArrayIndex++) & UConverterConstants.UNSIGNED_BYTE_MASK]; /* MBCS_ENTRY_IS_FINAL(entry) */ /* test the most common case first */ if (MBCS_ENTRY_FINAL_IS_VALID_DIRECT_16(entry)) { /* output BMP code point */ target.put((char) MBCS_ENTRY_FINAL_VALUE_16(entry)); --targetCapacity; continue; } /* * An if-else-if chain provides more reliable performance for the most common cases compared to a * switch. 
*/ action = (byte) (MBCS_ENTRY_FINAL_ACTION(entry)); if (action == MBCS_STATE_FALLBACK_DIRECT_16) { if (isFallbackUsed()) { /* output BMP code point */ target.put((char) MBCS_ENTRY_FINAL_VALUE_16(entry)); --targetCapacity; continue; } } else if (action == MBCS_STATE_UNASSIGNED) { /* just fall through */ } else if (action == MBCS_STATE_ILLEGAL) { /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(sourceArrayIndex - lastSource); } else { /* reserved, must never occur */ continue; } /* set offsets since the start or the last extension */ if (offsets != null) { int count = sourceArrayIndex - lastSource; /* predecrement: do not set the offset for the callback-causing character */ while (--count > 0) { offsets.put(sourceIndex++); } /* offset and sourceIndex are now set for the current character */ } if (cr[0].isError()) { /* callback(illegal) */ break; } else /* unassigned sequences indicated with byteIndex>0 */{ /* try an extension mapping */ lastSource = sourceArrayIndex; toUBytesArray[0] = source.get(sourceArrayIndex - 1); source.position(sourceArrayIndex); toULength = toU((byte) 1, source, target, offsets, sourceIndex, flush, cr); sourceArrayIndex = source.position(); sourceIndex += 1 + (int) (sourceArrayIndex - lastSource); if (cr[0].isError()) { /* not mappable or buffer overflow */ break; } /* recalculate the targetCapacity after an extension mapping */ targetCapacity = target.remaining(); length = source.remaining(); if (length < targetCapacity) { targetCapacity = length; } } } if (!cr[0].isError() && sourceArrayIndex < source.limit() && !target.hasRemaining()) { /* target is full */ cr[0] = CoderResult.OVERFLOW; } /* set offsets since the start or the last callback */ if (offsets != null) { int count = sourceArrayIndex - lastSource; while (count > 0) { offsets.put(sourceIndex++); --count; } } /* write back the updated pointers */ source.position(sourceArrayIndex); return cr[0]; } /* This version of cnvMBCSToUnicodeWithOffsets() is optimized for 
single-byte, single-state codepages. */ private CoderResult cnvMBCSSingleToUnicodeWithOffsets(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult[] cr = { CoderResult.UNDERFLOW }; int sourceArrayIndex; int[][] stateTable; int sourceIndex; int entry; char c; byte action; /* set up the local pointers */ sourceArrayIndex = source.position(); if ((options & UConverterConstants.OPTION_SWAP_LFNL) != 0) { stateTable = sharedData.mbcs.swapLFNLStateTable; } else { stateTable = sharedData.mbcs.stateTable; } /* sourceIndex=-1 if the current character began in the previous buffer */ sourceIndex = 0; /* conversion loop */ while (sourceArrayIndex < source.limit()) { /* * This following test is to see if available input would overflow the output. It does not catch output * of more than one code unit that overflows as a result of a surrogate pair or callback output from the * last source byte. Therefore, those situations also test for overflows and will then break the loop, * too. */ if (!target.hasRemaining()) { /* target is full */ cr[0] = CoderResult.OVERFLOW; break; } entry = stateTable[0][source.get(sourceArrayIndex++) & UConverterConstants.UNSIGNED_BYTE_MASK]; /* MBCS_ENTRY_IS_FINAL(entry) */ /* test the most common case first */ if (MBCS_ENTRY_FINAL_IS_VALID_DIRECT_16(entry)) { /* output BMP code point */ target.put((char) MBCS_ENTRY_FINAL_VALUE_16(entry)); if (offsets != null) { offsets.put(sourceIndex); } /* normal end of action codes: prepare for a new character */ ++sourceIndex; continue; } /* * An if-else-if chain provides more reliable performance for the most common cases compared to a * switch. 
*/ action = (byte) (MBCS_ENTRY_FINAL_ACTION(entry)); if (action == MBCS_STATE_VALID_DIRECT_20 || (action == MBCS_STATE_FALLBACK_DIRECT_20 && isFallbackUsed())) { entry = MBCS_ENTRY_FINAL_VALUE(entry); /* output surrogate pair */ target.put((char) (0xd800 | (char) (entry >>> 10))); if (offsets != null) { offsets.put(sourceIndex); } c = (char) (0xdc00 | (char) (entry & 0x3ff)); if (target.hasRemaining()) { target.put(c); if (offsets != null) { offsets.put(sourceIndex); } } else { /* target overflow */ charErrorBufferArray[0] = c; charErrorBufferLength = 1; cr[0] = CoderResult.OVERFLOW; break; } ++sourceIndex; continue; } else if (action == MBCS_STATE_FALLBACK_DIRECT_16) { if (isFallbackUsed()) { /* output BMP code point */ target.put((char) MBCS_ENTRY_FINAL_VALUE_16(entry)); if (offsets != null) { offsets.put(sourceIndex); } ++sourceIndex; continue; } } else if (action == MBCS_STATE_UNASSIGNED) { /* just fall through */ } else if (action == MBCS_STATE_ILLEGAL) { /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); } else { /* reserved, must never occur */ ++sourceIndex; continue; } if (cr[0].isError()) { /* callback(illegal) */ break; } else /* unassigned sequences indicated with byteIndex>0 */{ /* try an extension mapping */ int sourceBeginIndex = sourceArrayIndex; toUBytesArray[0] = source.get(sourceArrayIndex - 1); source.position(sourceArrayIndex); toULength = toU((byte) 1, source, target, offsets, sourceIndex, flush, cr); sourceArrayIndex = source.position(); sourceIndex += 1 + (int) (sourceArrayIndex - sourceBeginIndex); if (cr[0].isError()) { /* not mappable or buffer overflow */ break; } } } /* write back the updated pointers */ source.position(sourceArrayIndex); return cr[0]; } private int getFallback(UConverterMBCSTable mbcsTable, int offset) { MBCSToUFallback[] toUFallbacks; int i, start, limit; limit = mbcsTable.countToUFallbacks; if (limit > 0) { /* do a binary search for the fallback mapping */ toUFallbacks = mbcsTable.toUFallbacks; start 
= 0; while (start < limit - 1) { i = (start + limit) / 2; if (offset < toUFallbacks[i].offset) { limit = i; } else { start = i; } } /* did we really find it? */ if (offset == toUFallbacks[start].offset) { return toUFallbacks[start].codePoint; } } return 0xfffe; } /** * This is a simple version of _MBCSGetNextUChar() that is used by other converter implementations. It only * returns an "assigned" result if it consumes the entire input. It does not use state from the converter, nor * error codes. It does not handle the EBCDIC swaplfnl option (set in UConverter). It handles conversion * extensions but not GB 18030. * * @return U+fffe unassigned U+ffff illegal otherwise the Unicode code point */ int simpleGetNextUChar(ByteBuffer source, boolean useFallback) { // #if 0 // /* // * Code disabled 2002dec09 (ICU 2.4) because it is not currently used in ICU. markus // * TODO In future releases, verify that this function is never called for SBCS // * conversions, i.e., that sharedData->mbcs.countStates==1 is still true. // * Removal improves code coverage. 
// */ // /* use optimized function if possible */ // if(sharedData->mbcs.countStates==1) { // if(length==1) { // return ucnv_MBCSSingleSimpleGetNextUChar(sharedData, (uint8_t)*source, useFallback); // } else { // return 0xffff; /* illegal: more than a single byte for an SBCS converter */ // } // } // #endif /* set up the local pointers */ int[][] stateTable = sharedData.mbcs.stateTable; char[] unicodeCodeUnits = sharedData.mbcs.unicodeCodeUnits; /* converter state */ int offset = 0; int state = sharedData.mbcs.dbcsOnlyState; int action; int entry; int c; int i = source.position(); int length = source.limit() - i; /* conversion loop */ while (true) { // entry=stateTable[state][(uint8_t)source[i++]]; entry = stateTable[state][source.get(i++) & UConverterConstants.UNSIGNED_BYTE_MASK]; if (MBCS_ENTRY_IS_TRANSITION(entry)) { state = MBCS_ENTRY_TRANSITION_STATE(entry); offset += MBCS_ENTRY_TRANSITION_OFFSET(entry); if (i == source.limit()) { return 0xffff; /* truncated character */ } } else { /* * An if-else-if chain provides more reliable performance for the most common cases compared to a * switch. */ action = MBCS_ENTRY_FINAL_ACTION(entry); if (action == MBCS_STATE_VALID_16) { offset += MBCS_ENTRY_FINAL_VALUE_16(entry); c = unicodeCodeUnits[offset]; if (c != 0xfffe) { /* done */ } else if (isToUUseFallback()) { c = getFallback(sharedData.mbcs, offset); } /* else done with 0xfffe */ } else if (action == MBCS_STATE_VALID_DIRECT_16) { // /* output BMP code point */ c = MBCS_ENTRY_FINAL_VALUE_16(entry); } else if (action == MBCS_STATE_VALID_16_PAIR) { offset += MBCS_ENTRY_FINAL_VALUE_16(entry); c = unicodeCodeUnits[offset++]; if (c < 0xd800) { /* output BMP code point below 0xd800 */ } else if (isToUUseFallback() ? c <= 0xdfff : c <= 0xdbff) { /* output roundtrip or fallback supplementary code point */ c = (((c & 0x3ff) << 10) + unicodeCodeUnits[offset] + (0x10000 - 0xdc00)); } else if (isToUUseFallback() ? 
(c & 0xfffe) == 0xe000 : c == 0xe000) { /* output roundtrip BMP code point above 0xd800 or fallback BMP code point */ c = unicodeCodeUnits[offset]; } else if (c == 0xffff) { return 0xffff; } else { c = 0xfffe; } } else if (action == MBCS_STATE_VALID_DIRECT_20) { /* output supplementary code point */ c = 0x10000 + MBCS_ENTRY_FINAL_VALUE(entry); } else if (action == MBCS_STATE_FALLBACK_DIRECT_16) { if (!isToUUseFallback(useFallback)) { c = 0xfffe; } else { /* output BMP code point */ c = MBCS_ENTRY_FINAL_VALUE_16(entry); } } else if (action == MBCS_STATE_FALLBACK_DIRECT_20) { if (!isToUUseFallback(useFallback)) { c = 0xfffe; } else { /* output supplementary code point */ c = 0x10000 + MBCS_ENTRY_FINAL_VALUE(entry); } } else if (action == MBCS_STATE_UNASSIGNED) { c = 0xfffe; } else { /* * forbid MBCS_STATE_CHANGE_ONLY for this function, and MBCS_STATE_ILLEGAL and reserved action * codes */ return 0xffff; } break; } } if (i != source.limit()) { /* illegal for this function: not all input consumed */ return 0xffff; } if (c == 0xfffe) { /* try an extension mapping */ if (sharedData.mbcs.extIndexes != null) { /* Increase the limit for proper handling. Used in LMBCS. */ if (source.limit() > i + length) { source.limit(i + length); } return simpleMatchToU(source, useFallback); } } return c; } private boolean hasValidTrailBytes(int[][] stateTable, short state) { int[] row = stateTable[state]; int b, entry; /* First test for final entries in this state for some commonly valid byte values. */ entry = row[0xa1]; if (!MBCS_ENTRY_IS_TRANSITION(entry) && MBCS_ENTRY_FINAL_ACTION(entry) != MBCS_STATE_ILLEGAL) { return true; } entry = row[0x41]; if (!MBCS_ENTRY_IS_TRANSITION(entry) && MBCS_ENTRY_FINAL_ACTION(entry) != MBCS_STATE_ILLEGAL) { return true; } /* Then test for final entries in this state. 
*/ for (b = 0; b <= 0xff; b++) { entry = row[b]; if (!MBCS_ENTRY_IS_TRANSITION(entry) && MBCS_ENTRY_FINAL_ACTION(entry) != MBCS_STATE_ILLEGAL) { return true; } } /* Then recurse for transition entries. */ for (b = 0; b <= 0xff; b++) { entry = row[b]; if (MBCS_ENTRY_IS_TRANSITION(entry) && hasValidTrailBytes(stateTable, (short)(MBCS_ENTRY_TRANSITION_STATE(entry) & UConverterConstants.UNSIGNED_BYTE_MASK))) { return true; } } return false; } private boolean isSingleOrLead(int[][] stateTable, int state, boolean isDBCSOnly, int b) { int[] row = stateTable[state]; int entry = row[b]; if (MBCS_ENTRY_IS_TRANSITION(entry)) { /* lead byte */ return hasValidTrailBytes(stateTable, (short)(MBCS_ENTRY_TRANSITION_STATE(entry) & UConverterConstants.UNSIGNED_BYTE_MASK)); } else { short action = (short)(MBCS_ENTRY_FINAL_ACTION(entry) & UConverterConstants.UNSIGNED_BYTE_MASK); if (action == MBCS_STATE_CHANGE_ONLY && isDBCSOnly) { return false; /* SI/SO are illegal for DBCS-only conversion */ } else { return (action != MBCS_STATE_ILLEGAL); } } } } class CharsetEncoderMBCS extends CharsetEncoderICU { private boolean allowReplacementChanges = false; CharsetEncoderMBCS(CharsetICU cs) { super(cs, fromUSubstitution); allowReplacementChanges = true; // allow changes in implReplaceWith implReset(); } protected void implReset() { super.implReset(); preFromUFirstCP = UConverterConstants.U_SENTINEL; } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult[] cr = { CoderResult.UNDERFLOW }; // if (!source.hasRemaining() && fromUChar32 == 0) // return cr[0]; int sourceArrayIndex; char[] table; byte[] pArray, bytes; int pArrayIndex, outputType, c; int prevSourceIndex, sourceIndex, nextSourceIndex; int stage2Entry = 0, value = 0, length = 0, prevLength; short uniMask; // long asciiRoundtrips; boolean gotoUnassigned = false; try { if (!flush && preFromUFirstCP >= 0) { /* * pass sourceIndex=-1 because we continue from an earlier buffer in 
the future, this may change * with continuous offsets */ cr[0] = continueMatchFromU(source, target, offsets, flush, -1); if (cr[0].isError() || preFromULength < 0) { return cr[0]; } } /* use optimized function if possible */ outputType = sharedData.mbcs.outputType; uniMask = sharedData.mbcs.unicodeMask; if (outputType == MBCS_OUTPUT_1 && (uniMask & UConverterConstants.HAS_SURROGATES) == 0) { if ((uniMask & UConverterConstants.HAS_SUPPLEMENTARY) == 0) { cr[0] = cnvMBCSSingleFromBMPWithOffsets(source, target, offsets, flush); } else { cr[0] = cnvMBCSSingleFromUnicodeWithOffsets(source, target, offsets, flush); } return cr[0]; } else if (outputType == MBCS_OUTPUT_2) { cr[0] = cnvMBCSDoubleFromUnicodeWithOffsets(source, target, offsets, flush); return cr[0]; } table = sharedData.mbcs.fromUnicodeTable; sourceArrayIndex = source.position(); if ((options & UConverterConstants.OPTION_SWAP_LFNL) != 0) { bytes = sharedData.mbcs.swapLFNLFromUnicodeBytes; } else { bytes = sharedData.mbcs.fromUnicodeBytes; } // asciiRoundtrips = sharedData.mbcs.asciiRoundtrips; /* get the converter state from UConverter */ c = fromUChar32; if (outputType == MBCS_OUTPUT_2_SISO) { prevLength = (int) fromUnicodeStatus; if (prevLength == 0) { /* set the real value */ prevLength = 1; } } else { /* prevent fromUnicodeStatus from being set to something non-0 */ prevLength = 0; } /* sourceIndex=-1 if the current character began in the previous buffer */ prevSourceIndex = -1; sourceIndex = c == 0 ? 0 : -1; nextSourceIndex = 0; /* conversion loop */ /* * This is another piece of ugly code: A goto into the loop if the converter state contains a first * surrogate from the previous function call. It saves me from checking in each loop iteration * if(c==0) and from duplicating the trail-surrogate-handling code in the else branch of that check. I could * not find any other way to get around this other than using a function call for the conversion and * callback, which would be even more inefficient. 
* * Markus Scherer 2000-jul-19 */ boolean doloop = true; boolean doread = true; if (c != 0 && target.hasRemaining()) { if (UTF16.isLeadSurrogate((char) c) && (uniMask & UConverterConstants.HAS_SURROGATES) == 0) { // c is a lead surrogate, read another input SideEffects x = new SideEffects(c, sourceArrayIndex, sourceIndex, nextSourceIndex, prevSourceIndex, prevLength); doloop = getTrail(source, target, uniMask, x, flush, cr); doread = x.doread; c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; prevSourceIndex = x.prevSourceIndex; prevLength = x.prevLength; } else { // c is not a lead surrogate, do not read another input doread = false; } } if (doloop) { while (!doread || sourceArrayIndex < source.limit()) { /* * This following test is to see if available input would overflow the output. It does not catch * output of more than one byte that overflows as a result of a multi-byte character or callback * output from the last source character. Therefore, those situations also test for overflows * and will then break the loop, too. */ if (target.hasRemaining()) { /* * Get a correct Unicode code point: a single UChar for a BMP code point or a matched * surrogate pair for a "supplementary code point". */ if (doread) { // doread might be false only on the first looping c = source.get(sourceArrayIndex++); ++nextSourceIndex; /* * This also tests if the codepage maps single surrogates. If it does, then surrogates * are not paired but mapped separately. Note that in this case unmatched surrogates are * not detected. 
*/ if (UTF16.isSurrogate((char) c) && (uniMask & UConverterConstants.HAS_SURROGATES) == 0) { if (UTF16.isLeadSurrogate((char) c)) { // getTrail: SideEffects x = new SideEffects(c, sourceArrayIndex, sourceIndex, nextSourceIndex, prevSourceIndex, prevLength); doloop = getTrail(source, target, uniMask, x, flush, cr); c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; prevSourceIndex = x.prevSourceIndex; if (x.doread) { if (doloop) continue; else break; } } else { /* this is an unmatched trail code unit (2nd surrogate) */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); break; } } } else { doread = true; } /* convert the Unicode code point in c into codepage bytes */ /* * The basic lookup is a triple-stage compact array (trie) lookup. For details see the * beginning of this file. * * Single-byte codepages are handled with a different data structure by _MBCSSingle... * functions. * * The result consists of a 32-bit value from stage 2 and a pointer to as many bytes as are * stored per character. The pointer points to the character's bytes in stage 3. Bits 15..0 * of the stage 2 entry contain the stage 3 index for that pointer, while bits 31..16 are * flags for which of the 16 characters in the block are roundtrip-assigned. * * For 2-byte and 4-byte codepages, the bytes are stored as uint16_t or uint32_t, * respectively, in the platform encoding. For 3-byte codepages, the bytes are always stored in * big-endian order. * * For EUC encodings that use only either 0x8e or 0x8f as the first byte of their longest * byte sequences, the first two bytes in this third stage indicate with their 7th bits * whether these bytes are to be written directly or actually need to be preceded by one of * the two Single-Shift codes. With this, the third stage stores one byte fewer per * character than the actual maximum length of EUC byte sequences. 
* * Other than that, leading zero bytes are removed and the other bytes output. A single zero * byte may be output if the "assigned" bit in stage 2 was on. The data structure does not * support zero byte output as a fallback, and also does not allow output of leading zeros. */ stage2Entry = MBCS_STAGE_2_FROM_U(table, c); /* get the bytes and the length for the output */ switch (outputType) { /* This is handled above with the method cnvMBCSDoubleFromUnicodeWithOffsets() */ /* case MBCS_OUTPUT_2: value = MBCS_VALUE_2_FROM_STAGE_2(bytes, stage2Entry, c); if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xff) { length = 1; } else { length = 2; } break; */ case MBCS_OUTPUT_2_SISO: /* 1/2-byte stateful with Shift-In/Shift-Out */ /* * Save the old state in the converter object right here, then change the local * prevLength state variable if necessary. Then, if this character turns out to be * unassigned or a fallback that is not taken, the callback code must not save the new * state in the converter because the new state is for a character that is not output. * However, the callback must still restore the state from the converter in case the * callback function changed it for its output. 
*/ fromUnicodeStatus = prevLength; /* save the old state */ value = MBCS_VALUE_2_FROM_STAGE_2(bytes, stage2Entry, c); if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xff) { if (value == 0 && MBCS_FROM_U_IS_ROUNDTRIP(stage2Entry, c) == false) { /* no mapping, leave value==0 */ length = 0; } else if (prevLength <= 1) { length = 1; } else { /* change from double-byte mode to single-byte */ value |= UConverterConstants.SI << 8; length = 2; prevLength = 1; } } else { if (prevLength == 2) { length = 2; } else { /* change from single-byte mode to double-byte */ value |= UConverterConstants.SO << 16; length = 3; prevLength = 2; } } break; case MBCS_OUTPUT_DBCS_ONLY: /* table with single-byte results, but only DBCS mappings used */ value = MBCS_VALUE_2_FROM_STAGE_2(bytes, stage2Entry, c); if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xff) { /* no mapping or SBCS result, not taken for DBCS-only */ value = stage2Entry = 0; /* stage2Entry=0 to reset roundtrip flags */ length = 0; } else { length = 2; } break; case MBCS_OUTPUT_3: pArray = bytes; pArrayIndex = MBCS_POINTER_3_FROM_STAGE_2(bytes, stage2Entry, c); value = ((pArray[pArrayIndex] & UConverterConstants.UNSIGNED_BYTE_MASK) << 16) | ((pArray[pArrayIndex + 1] & UConverterConstants.UNSIGNED_BYTE_MASK) << 8) | (pArray[pArrayIndex + 2] & UConverterConstants.UNSIGNED_BYTE_MASK); if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xff) { length = 1; } else if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xffff) { length = 2; } else { length = 3; } break; case MBCS_OUTPUT_4: value = MBCS_VALUE_4_FROM_STAGE_2(bytes, stage2Entry, c); if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xff) { length = 1; } else if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xffff) { length = 2; } else if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xffffff) { length = 3; } else { length = 4; } break; case MBCS_OUTPUT_3_EUC: value = MBCS_VALUE_2_FROM_STAGE_2(bytes, stage2Entry, c); /* EUC 16-bit 
fixed-length representation */ if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xff) { length = 1; } else if ((value & 0x8000) == 0) { value |= 0x8e8000; length = 3; } else if ((value & 0x80) == 0) { value |= 0x8f0080; length = 3; } else { length = 2; } break; case MBCS_OUTPUT_4_EUC: pArray = bytes; pArrayIndex = MBCS_POINTER_3_FROM_STAGE_2(bytes, stage2Entry, c); value = ((pArray[pArrayIndex] & UConverterConstants.UNSIGNED_BYTE_MASK) << 16) | ((pArray[pArrayIndex + 1] & UConverterConstants.UNSIGNED_BYTE_MASK) << 8) | (pArray[pArrayIndex + 2] & UConverterConstants.UNSIGNED_BYTE_MASK); /* EUC 16-bit fixed-length representation applied to the first two bytes */ if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xff) { length = 1; } else if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xffff) { length = 2; } else if ((value & 0x800000) == 0) { value |= 0x8e800000; length = 4; } else if ((value & 0x8000) == 0) { value |= 0x8f008000; length = 4; } else { length = 3; } break; default: /* must not occur */ /* * To avoid compiler warnings that value & length may be used without having been * initialized, we set them here. In reality, this is unreachable code. Not having a * default branch also causes warnings with some compilers. */ value = stage2Entry = 0; /* stage2Entry=0 to reset roundtrip flags */ length = 0; break; } /* is this code point assigned, or do we use fallbacks? */ if (gotoUnassigned || (!(MBCS_FROM_U_IS_ROUNDTRIP(stage2Entry, c) || (isFromUUseFallback(c) && value != 0)))) { gotoUnassigned = false; /* * We allow a 0 byte output if the "assigned" bit is set for this entry. There is no way * with this data structure for fallback output to be a zero byte. 
*/ // unassigned: SideEffects x = new SideEffects(c, sourceArrayIndex, sourceIndex, nextSourceIndex, prevSourceIndex, prevLength); doloop = unassigned(source, target, offsets, x, flush, cr); c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; prevSourceIndex = x.prevSourceIndex; prevLength = x.prevLength; if (doloop) continue; else break; } /* write the output character bytes from value and length */ /* from the first if in the loop we know that targetCapacity>0 */ if (length <= target.remaining()) { switch (length) { /* each branch falls through to the next one */ case 4: target.put((byte) (value >>> 24)); if (offsets != null) { offsets.put(sourceIndex); } case 3: target.put((byte) (value >>> 16)); if (offsets != null) { offsets.put(sourceIndex); } case 2: target.put((byte) (value >>> 8)); if (offsets != null) { offsets.put(sourceIndex); } case 1: target.put((byte) value); if (offsets != null) { offsets.put(sourceIndex); } default: /* will never occur */ break; } } else { int errorBufferArrayIndex; /* * We actually do this backwards here: In order to save an intermediate variable, we * output first to the overflow buffer what does not fit into the regular target. 
*/ /* we know that 1<=targetCapacity<length<=4 */ length -= target.remaining(); errorBufferArrayIndex = 0; switch (length) { /* each branch falls through to the next one */ case 3: errorBuffer[errorBufferArrayIndex++] = (byte) (value >>> 16); case 2: errorBuffer[errorBufferArrayIndex++] = (byte) (value >>> 8); case 1: errorBuffer[errorBufferArrayIndex] = (byte) value; default: /* will never occur */ break; } errorBufferLength = (byte) length; /* now output what fits into the regular target */ value >>>= 8 * length; /* length was reduced by targetCapacity */ switch (target.remaining()) { /* each branch falls through to the next one */ case 3: target.put((byte) (value >>> 16)); if (offsets != null) { offsets.put(sourceIndex); } case 2: target.put((byte) (value >>> 8)); if (offsets != null) { offsets.put(sourceIndex); } case 1: target.put((byte) value); if (offsets != null) { offsets.put(sourceIndex); } default: /* will never occur */ break; } /* target overflow */ cr[0] = CoderResult.OVERFLOW; c = 0; break; } /* normal end of conversion: prepare for a new character */ c = 0; if (offsets != null) { prevSourceIndex = sourceIndex; sourceIndex = nextSourceIndex; } continue; } else { /* target is full */ cr[0] = CoderResult.OVERFLOW; break; } } } /* * the end of the input stream and detection of truncated input are handled by the framework, but for * EBCDIC_STATEFUL conversion we need to emit an SI at the very end * * conditions: successful EBCDIC_STATEFUL in DBCS mode end of input and no truncated input */ if (outputType == MBCS_OUTPUT_2_SISO && prevLength == 2 && flush && sourceArrayIndex >= source.limit() && c == 0) { /* EBCDIC_STATEFUL ending with DBCS: emit an SI to return the output stream to SBCS */ if (target.hasRemaining()) { target.put((byte) UConverterConstants.SI); if (offsets != null) { /* set the last source character's index (sourceIndex points at sourceLimit now) */ offsets.put(prevSourceIndex); } } else { /* target is full */ errorBuffer[0] = (byte) UConverterConstants.SI; errorBufferLength = 1; cr[0] = CoderResult.OVERFLOW; } prevLength = 1; /* we switched into SBCS */ } /* set the converter state back into UConverter */ fromUChar32 = c; 
fromUnicodeStatus = prevLength; source.position(sourceArrayIndex); } catch (BufferOverflowException ex) { cr[0] = CoderResult.OVERFLOW; } return cr[0]; } /* * This is another simple conversion function for internal use by other conversion implementations. It does not * use the converter state nor call callbacks. It does not handle the EBCDIC swaplfnl option (set in * UConverter). It handles conversion extensions but not GB 18030. * * It converts one single Unicode code point into codepage bytes, encoded as one 32-bit value. The function * returns the number of bytes in *pValue: 1..4 the number of bytes in *pValue 0 unassigned (*pValue undefined) * -1 illegal (currently not used, *pValue undefined) * * *pValue will contain the resulting bytes with the last byte in bits 7..0, the second to last byte in bits * 15..8, etc. Currently, the function assumes but does not check that 0<=c<=0x10ffff. */ int fromUChar32(int c, int[] pValue, boolean isUseFallback) { // #if 0 // /* #if 0 because this is not currently used in ICU - reduce code, increase code coverage */ // const uint8_t *p; // #endif char[] table; int stage2Entry; int value; int length; int p; /* BMP-only codepages are stored without stage 1 entries for supplementary code points */ if (c <= 0xffff || ((sharedData.mbcs.unicodeMask & UConverterConstants.HAS_SUPPLEMENTARY) != 0)) { table = sharedData.mbcs.fromUnicodeTable; /* convert the Unicode code point in c into codepage bytes (same as in _MBCSFromUnicodeWithOffsets) */ if (sharedData.mbcs.outputType == MBCS_OUTPUT_1) { value = MBCS_SINGLE_RESULT_FROM_U(table, sharedData.mbcs.fromUnicodeBytes, c); /* is this code point assigned, or do we use fallbacks? */ if (isUseFallback ? 
value >= 0x800 : value >= 0xc00) { pValue[0] = value & 0xff; return 1; } } else /* outputType!=MBCS_OUTPUT_1 */{ stage2Entry = MBCS_STAGE_2_FROM_U(table, c); /* get the bytes and the length for the output */ switch (sharedData.mbcs.outputType) { case MBCS_OUTPUT_2: value = MBCS_VALUE_2_FROM_STAGE_2(sharedData.mbcs.fromUnicodeBytes, stage2Entry, c); if (value <= 0xff) { length = 1; } else { length = 2; } break; // #if 0 // /* #if 0 because this is not currently used in ICU - reduce code, increase code coverage */ // case MBCS_OUTPUT_DBCS_ONLY: // /* table with single-byte results, but only DBCS mappings used */ // value=MBCS_VALUE_2_FROM_STAGE_2(sharedData->mbcs.fromUnicodeBytes, stage2Entry, c); // if(value<=0xff) { // /* no mapping or SBCS result, not taken for DBCS-only */ // value=stage2Entry=0; /* stage2Entry=0 to reset roundtrip flags */ // length=0; // } else { // length=2; // } // break; case MBCS_OUTPUT_3: byte[] bytes = sharedData.mbcs.fromUnicodeBytes; p = CharsetMBCS.MBCS_POINTER_3_FROM_STAGE_2(bytes, stage2Entry, c); value = ((bytes[p] & UConverterConstants.UNSIGNED_BYTE_MASK)<<16) | ((bytes[p+1] & UConverterConstants.UNSIGNED_BYTE_MASK)<<8) | (bytes[p+2] & UConverterConstants.UNSIGNED_BYTE_MASK); if (value <= 0xff) { length = 1; } else if (value <= 0xffff) { length = 2; } else { length = 3; } break; // case MBCS_OUTPUT_4: // value=MBCS_VALUE_4_FROM_STAGE_2(sharedData->mbcs.fromUnicodeBytes, stage2Entry, c); // if(value<=0xff) { // length=1; // } else if(value<=0xffff) { // length=2; // } else if(value<=0xffffff) { // length=3; // } else { // length=4; // } // break; // case MBCS_OUTPUT_3_EUC: // value=MBCS_VALUE_2_FROM_STAGE_2(sharedData->mbcs.fromUnicodeBytes, stage2Entry, c); // /* EUC 16-bit fixed-length representation */ // if(value<=0xff) { // length=1; // } else if((value&0x8000)==0) { // value|=0x8e8000; // length=3; // } else if((value&0x80)==0) { // value|=0x8f0080; // length=3; // } else { // length=2; // } // break; // case 
MBCS_OUTPUT_4_EUC: // p=MBCS_POINTER_3_FROM_STAGE_2(sharedData->mbcs.fromUnicodeBytes, stage2Entry, c); // value=((uint32_t)*p<<16)|((uint32_t)p[1]<<8)|p[2]; // /* EUC 16-bit fixed-length representation applied to the first two bytes */ // if(value<=0xff) { // length=1; // } else if(value<=0xffff) { // length=2; // } else if((value&0x800000)==0) { // value|=0x8e800000; // length=4; // } else if((value&0x8000)==0) { // value|=0x8f008000; // length=4; // } else { // length=3; // } // break; // #endif default: /* must not occur */ return -1; } /* is this code point assigned, or do we use fallbacks? */ if (MBCS_FROM_U_IS_ROUNDTRIP(stage2Entry, c) || (CharsetEncoderICU.isFromUUseFallback(isUseFallback, c) && value != 0)) { /* * We allow a 0 byte output if the "assigned" bit is set for this entry. There is no way with * this data structure for fallback output to be a zero byte. */ /* assigned */ pValue[0] = value; return length; } } } if (sharedData.mbcs.extIndexes != null) { length = simpleMatchFromU(c, pValue, isUseFallback); return length >= 0 ? 
length : -length; /* return abs(length); */ } /* unassigned */ return 0; } /* * continue partial match with new input, requires cnv->preFromUFirstCP>=0 never called for simple, * single-character conversion */ private CoderResult continueMatchFromU(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush, int srcIndex) { CoderResult cr = CoderResult.UNDERFLOW; int[] value = new int[1]; int match; match = matchFromU(preFromUFirstCP, preFromUArray, preFromUBegin, preFromULength, source, value, useFallback, flush); if (match >= 2) { match -= 2; /* remove 2 for the initial code point */ if (match >= preFromULength) { /* advance src pointer for the consumed input */ source.position(source.position() + match - preFromULength); preFromULength = 0; } else { /* the match did not use all of preFromU[] - keep the rest for replay */ int length = preFromULength - match; System.arraycopy(preFromUArray, preFromUBegin + match, preFromUArray, preFromUBegin, length); preFromULength = (byte) -length; } /* finish the partial match */ preFromUFirstCP = UConverterConstants.U_SENTINEL; /* write result */ writeFromU(value[0], target, offsets, srcIndex); } else if (match < 0) { /* save state for partial match */ int sArrayIndex; int j; /* just _append_ the newly consumed input to preFromU[] */ sArrayIndex = source.position(); match = -match - 2; /* remove 2 for the initial code point */ for (j = preFromULength; j < match; ++j) { preFromUArray[j] = source.get(sArrayIndex++); } source.position(sArrayIndex); /* same as *src=srcLimit; because we reached the end of input */ preFromULength = (byte) match; } else { /* match==0 or 1 */ /* * no match * * We need to split the previous input into two parts: * * 1. The first code point is unmappable - that's how we got into trying the extension data in the first * place. We need to move it from the preFromU buffer to the error buffer, set an error code, and * prepare the rest of the previous input for 2. * * 2. 
The rest of the previous input must be converted once we come back from the callback for the first * code point. At that time, we have to try again from scratch to convert these input characters. The * replay will be handled by the ucnv.c conversion code. */ if (match == 1) { /* matched, no mapping but request for <subchar1> */ useSubChar1 = true; } /* move the first code point to the error field */ fromUChar32 = preFromUFirstCP; preFromUFirstCP = UConverterConstants.U_SENTINEL; /* mark preFromU for replay */ preFromULength = (byte) -preFromULength; /* set the error code for unassigned */ /* TODO: figure out what the unmappable length really should be */ cr = CoderResult.unmappableForLength(1); } return cr; } /** * @param cx * pointer to extension data; if NULL, returns 0 * @param firstCP * the first code point before all the other UChars * @param pre * UChars that must match; !initialMatch: partial match with them * @param preLength * length of pre, >=0 * @param src * UChars that can be used to complete a match * @param srcLength * length of src, >=0 * @param pMatchValue * [out] output result value for the match from the data structure * @param useFallback * "use fallback" flag, usually from cnv->useFallback * @param flush * TRUE if the end of the input stream is reached * @return >1: matched, return value=total match length (number of input units matched); 1: matched, no mapping * but request for <subchar1> (only for the first code point); 0: no match; <0: partial match, return * value=negative total match length (partial matches are never returned for flush==TRUE) (partial * matches are never returned as being longer than UCNV_EXT_MAX_UCHARS); the matchLength is 2 if only * firstCP matched, and >2 if firstCP and further code units matched */ /* static int32_t ucnv_extMatchFromU(const int32_t *cx, UChar32 firstCP, const UChar *pre, int32_t preLength, const UChar *src, int32_t srcLength, uint32_t *pMatchValue, UBool useFallback, UBool flush) */ private int matchFromU(int firstCP, char[] 
preArray, int preArrayBegin, int preLength, CharBuffer source, int[] pMatchValue, boolean isUseFallback, boolean flush) { ByteBuffer cx = sharedData.mbcs.extIndexes; CharBuffer stage12, stage3; IntBuffer stage3b; CharBuffer fromUTableUChars, fromUSectionUChars; IntBuffer fromUTableValues, fromUSectionValues; int value, matchValue; int i, j, index, length, matchLength; char c; if (cx == null) { return 0; /* no extension data, no match */ } /* trie lookup of firstCP */ index = firstCP >>> 10; /* stage 1 index */ if (index >= cx.asIntBuffer().get(EXT_FROM_U_STAGE_1_LENGTH)) { return 0; /* the first code point is outside the trie */ } stage12 = (CharBuffer) ARRAY(cx, EXT_FROM_U_STAGE_12_INDEX, char.class); stage3 = (CharBuffer) ARRAY(cx, EXT_FROM_U_STAGE_3_INDEX, char.class); index = FROM_U(stage12, stage3, index, firstCP); stage3b = (IntBuffer) ARRAY(cx, EXT_FROM_U_STAGE_3B_INDEX, int.class); value = stage3b.get(stage3b.position() + index); if (value == 0) { return 0; } if (TO_U_IS_PARTIAL(value)) { /* partial match, enter the loop below */ index = FROM_U_GET_PARTIAL_INDEX(value); /* initialize */ fromUTableUChars = (CharBuffer) ARRAY(cx, EXT_FROM_U_UCHARS_INDEX, char.class); fromUTableValues = (IntBuffer) ARRAY(cx, EXT_FROM_U_VALUES_INDEX, int.class); matchValue = 0; i = j = matchLength = 0; /* we must not remember fallback matches when not using fallbacks */ /* match input units until there is a full match or the input is consumed */ for (;;) { /* go to the next section */ int oldpos = fromUTableUChars.position(); fromUSectionUChars = ((CharBuffer) fromUTableUChars.position(index)).slice(); fromUTableUChars.position(oldpos); oldpos = fromUTableValues.position(); fromUSectionValues = ((IntBuffer) fromUTableValues.position(index)).slice(); fromUTableValues.position(oldpos); /* read first pair of the section */ length = fromUSectionUChars.get(); value = fromUSectionValues.get(); if (value != 0 && (FROM_U_IS_ROUNDTRIP(value) || isFromUUseFallback(isUseFallback, 
firstCP))) { /* remember longest match so far */ matchValue = value; matchLength = 2 + i + j; } /* match pre[] then src[] */ if (i < preLength) { c = preArray[preArrayBegin + i++]; } else if (source != null && j < source.remaining()) { c = source.get(source.position() + j++); } else { /* all input consumed, partial match */ if (flush || (length = (i + j)) > MAX_UCHARS) { /* * end of the entire input stream, stop with the longest match so far or: partial match must * not be longer than UCNV_EXT_MAX_UCHARS because it must fit into state buffers */ break; } else { /* continue with more input next time */ return -(2 + length); } } /* search for the current UChar */ index = findFromU(fromUSectionUChars, length, c); if (index < 0) { /* no match here, stop with the longest match so far */ break; } else { value = fromUSectionValues.get(fromUSectionValues.position() + index); if (FROM_U_IS_PARTIAL(value)) { /* partial match, continue */ index = FROM_U_GET_PARTIAL_INDEX(value); } else { if (FROM_U_IS_ROUNDTRIP(value) || isFromUUseFallback(isUseFallback, firstCP)) { /* full match, stop with result */ matchValue = value; matchLength = 2 + i + j; } else { /* full match on fallback not taken, stop with the longest match so far */ } break; } } } if (matchLength == 0) { /* no match at all */ return 0; } } else /* result from firstCP trie lookup */{ if (FROM_U_IS_ROUNDTRIP(value) || isFromUUseFallback(isUseFallback, firstCP)) { /* full match, stop with result */ matchValue = value; matchLength = 2; } else { /* fallback not taken */ return 0; } } if ((matchValue & FROM_U_RESERVED_MASK) != 0) { /* do not interpret values with reserved bits used, for forward compatibility */ return 0; } /* return result */ if (matchValue == FROM_U_SUBCHAR1) { return 1; /* assert matchLength==2 */ } pMatchValue[0] = FROM_U_MASK_ROUNDTRIP(matchValue); return matchLength; } private int simpleMatchFromU(int cp, int[] pValue, boolean isUseFallback) { int[] value = new int[1]; int match; // signed /* try to 
match */ match = matchFromU(cp, null, 0, 0, null, value, isUseFallback, true); if (match >= 2) { /* write result for simple, single-character conversion */ int length; boolean isRoundtrip; isRoundtrip = FROM_U_IS_ROUNDTRIP(value[0]); length = FROM_U_GET_LENGTH(value[0]); value[0] = FROM_U_GET_DATA(value[0]); if (length <= EXT_FROM_U_MAX_DIRECT_LENGTH) { pValue[0] = value[0]; return isRoundtrip ? length : -length; // #if 0 /* not currently used */ // } else if(length==4) { // /* de-serialize a 4-byte result */ // const uint8_t *result=UCNV_EXT_ARRAY(cx, UCNV_EXT_FROM_U_BYTES_INDEX, uint8_t)+value; // *pValue= // ((uint32_t)result[0]<<24)| // ((uint32_t)result[1]<<16)| // ((uint32_t)result[2]<<8)| // result[3]; // return isRoundtrip ? 4 : -4; // #endif } } /* * return no match because - match>1 && resultLength>4: result too long for simple conversion - match==1: no * match found, <subchar1> preferred - match==0: no match found in the first place - match<0: partial * match, not supported for simple conversion (and flush==TRUE) */ return 0; } private CoderResult writeFromU(int value, ByteBuffer target, IntBuffer offsets, int srcIndex) { ByteBuffer cx = sharedData.mbcs.extIndexes; byte bufferArray[] = new byte[1 + MAX_BYTES]; int bufferArrayIndex = 0; byte[] resultArray; int resultArrayIndex; int length, prevLength; length = FROM_U_GET_LENGTH(value); value = FROM_U_GET_DATA(value); /* output the result */ if (length <= FROM_U_MAX_DIRECT_LENGTH) { /* * Generate a byte array and then write it below. This is not the fastest possible way, but it should be * ok for extension mappings, and it is much simpler. Offset and overflow handling are only done once * this way. 
*/ int p = bufferArrayIndex + 1; /* reserve buffer[0] for shiftByte below */ switch (length) { case 3: bufferArray[p++] = (byte) (value >>> 16); case 2: bufferArray[p++] = (byte) (value >>> 8); case 1: bufferArray[p++] = (byte) value; default: break; /* will never occur */ } resultArray = bufferArray; resultArrayIndex = bufferArrayIndex + 1; } else { byte[] slice = new byte[length]; ByteBuffer bb = ((ByteBuffer) ARRAY(cx, EXT_FROM_U_BYTES_INDEX, byte.class)); bb.position(value); bb.get(slice, 0, slice.length); resultArray = slice; resultArrayIndex = 0; } /* with correct data we have length>0 */ if ((prevLength = (int) fromUnicodeStatus) != 0) { /* handle SI/SO stateful output */ byte shiftByte; if (prevLength > 1 && length == 1) { /* change from double-byte mode to single-byte */ shiftByte = (byte) UConverterConstants.SI; fromUnicodeStatus = 1; } else if (prevLength == 1 && length > 1) { /* change from single-byte mode to double-byte */ shiftByte = (byte) UConverterConstants.SO; fromUnicodeStatus = 2; } else { shiftByte = 0; } if (shiftByte != 0) { /* prepend the shift byte to the result bytes */ bufferArray[0] = shiftByte; if (resultArray != bufferArray || resultArrayIndex != bufferArrayIndex + 1) { System.arraycopy(resultArray, resultArrayIndex, bufferArray, bufferArrayIndex + 1, length); } resultArray = bufferArray; resultArrayIndex = bufferArrayIndex; ++length; } } return fromUWriteBytes(this, resultArray, resultArrayIndex, length, target, offsets, srcIndex); } /* * @return if(U_FAILURE) return the code point for cnv->fromUChar32 else return 0 after output has been written * to the target */ private int fromU(int cp_, CharBuffer source, ByteBuffer target, IntBuffer offsets, int sourceIndex, int length, boolean flush, CoderResult[] cr) { // ByteBuffer cx; long cp = cp_ & UConverterConstants.UNSIGNED_INT_MASK; useSubChar1 = false; if (sharedData.mbcs.extIndexes != null && initialMatchFromU((int) cp, source, target, offsets, sourceIndex, flush, cr)) { return 0; /* 
an extension mapping handled the input */ } /* GB 18030 */ if ((options & MBCS_OPTION_GB18030) != 0) { long[] range; int i; for (i = 0; i < gb18030Ranges.length; ++i) { range = gb18030Ranges[i]; if (range[0] <= cp && cp <= range[1]) { /* found the Unicode code point, output the four-byte sequence for it */ long linear; byte bytes[] = new byte[4]; /* get the linear value of the first GB 18030 code in this range */ linear = range[2] - LINEAR_18030_BASE; /* add the offset from the beginning of the range */ linear += (cp - range[0]); bytes[3] = (byte) (0x30 + linear % 10); linear /= 10; bytes[2] = (byte) (0x81 + linear % 126); linear /= 126; bytes[1] = (byte) (0x30 + linear % 10); linear /= 10; bytes[0] = (byte) (0x81 + linear); /* output this sequence */ cr[0] = fromUWriteBytes(this, bytes, 0, 4, target, offsets, sourceIndex); return 0; } } } /* no mapping */ cr[0] = CoderResult.unmappableForLength(length); return (int) cp; } /* * target<targetLimit; set error code for overflow */ private boolean initialMatchFromU(int cp, CharBuffer source, ByteBuffer target, IntBuffer offsets, int srcIndex, boolean flush, CoderResult[] cr) { int[] value = new int[1]; int match; /* try to match */ match = matchFromU(cp, null, 0, 0, source, value, useFallback, flush); /* reject a match if the result is a single byte for DBCS-only */ if (match >= 2 && !(FROM_U_GET_LENGTH(value[0]) == 1 && sharedData.mbcs.outputType == MBCS_OUTPUT_DBCS_ONLY)) { /* advance src pointer for the consumed input */ source.position(source.position() + match - 2); /* remove 2 for the initial code point */ /* write result to target */ cr[0] = writeFromU(value[0], target, offsets, srcIndex); return true; } else if (match < 0) { /* save state for partial match */ int sArrayIndex; int j; /* copy the first code point */ preFromUFirstCP = cp; /* now copy the newly consumed input */ sArrayIndex = source.position(); match = -match - 2; /* remove 2 for the initial code point */ for (j = 0; j < match; ++j) { preFromUArray[j] = source.get(sArrayIndex++); } source.position(sArrayIndex); /* same as *src=srcLimit; because we reached the end of input */ preFromULength = (byte) match; return true; } else if (match == 1) { /* matched, no mapping but request for <subchar1> */ useSubChar1 = true; return false; } else /* match==0 no match */{ return false; } } CoderResult cnvMBCSFromUnicodeWithOffsets(CharBuffer source, ByteBuffer 
target, IntBuffer offsets, boolean flush) { CoderResult[] cr = { CoderResult.UNDERFLOW }; char[] table; int p; ByteBuffer bytes; short outputType; SideEffects x = new SideEffects(0, 0, 0, 0, 0, 0); int targetCapacity = target.limit() - target.position(); int stage2Entry = 0; /* int asciiRoundtrips; */ long value; int length = 0; int uniMask; boolean doLoop = true; boolean gotoGetTrail = false; if (preFromUFirstCP >= 0) { /* * pass sourceIndex=-1 because we continue from an earlier buffer; * in the future, this may change with continuous offsets. */ cr[0] = continueMatchFromU(source, target, offsets, flush, -1); if (cr[0].isError() || preFromULength < 0) { return cr[0]; } } /* use optimized function if possible */ outputType = sharedData.mbcs.outputType; uniMask = sharedData.mbcs.unicodeMask; if (outputType == MBCS_OUTPUT_1 && ((uniMask&UConverterConstants.HAS_SURROGATES) == 0)) { if ((uniMask&UConverterConstants.HAS_SUPPLEMENTARY) == 0) { cr[0] = cnvMBCSSingleFromBMPWithOffsets(source, target, offsets, flush); } else { cr[0] = cnvMBCSSingleFromUnicodeWithOffsets(source, target, offsets, flush); } return cr[0]; }/* else if (outputType == MBCS_OUTPUT_2 && mbcs.sharedData.mbcs.utf8Friendly) { cr[0] = cnvMBCSDoubleFromUnicodeWithOffsets(source, target, offsets, flush); return cr[0]; }*/ table = sharedData.mbcs.fromUnicodeTable; /* if (mbcs.sharedData.mbcs.utf8Friendly) { mbcsIndex = mbcs.sharedData.mbcs.mbcsIndex; } else { mbcsIndex = null; } */ if ((options&UConverterConstants.OPTION_SWAP_LFNL) != 0) { bytes = ByteBuffer.wrap(sharedData.mbcs.swapLFNLFromUnicodeBytes); } else { bytes = ByteBuffer.wrap(sharedData.mbcs.fromUnicodeBytes); } /* asciiRoundtrips = mbcs.sharedData.mbcs.asciiRoundtrips; */ /* get the converter state from UConverter */ x.c = fromUChar32; if (outputType == MBCS_OUTPUT_2_SISO) { x.prevLength = fromUnicodeStatus; if (x.prevLength == 0) { /* set the real value */ x.prevLength = 1; } } else { /* prevent fromUnicodeStatus from being set to something non-0 */ 
x.prevLength = 0; } /* sourceIndex = -1 if the current character began in the previous buffer */ x.prevSourceIndex = -1; x.sourceIndex = x.c==0 ? 0 : -1; x.nextSourceIndex = 0; /* conversion loop */ if (x.c != 0 && targetCapacity > 0) { gotoGetTrail = true; // set gotoGetTrail flag and go to gotoGetTrail label } while (gotoGetTrail || source.hasRemaining()) { /* * This following test is to see if available input would overflow the output. * It does not catch output of more than one byte that * overflows as a result of a multi-byte character or callback output * from the last source character. * Therefore, those situations also test for overflows and will * then break the loop, too. */ if (gotoGetTrail || targetCapacity > 0) { /* * Get a correct Unicode code point: * a single UChar for a BMP code point or * a matched surrogate pair for a "supplementary code point." */ if (!gotoGetTrail) { x.c = source.get(); ++x.nextSourceIndex; /* This is commented out because of the fact that IS_ASCII_ROUNDTRIP is not * being used in ICU4J. */ /*if (x.c <= 0x7f && IS_ASCII_ROUNDTRIP(c, asciiRoundtrips)) { target.put((byte)x.c); if (offsets != null) { offsets.put(x.sourceIndex); x.prevSourceIndex = x.sourceIndex; x.sourceIndex = x.nextSourceIndex; } targetCapacity--; x.c = 0; continue; }*/ } /* Code to use utf8friendly code was removed since it is not needed in Java. */ /* This also tests if the codepage maps single surrogates. * If it does, then surrogates are not paired but mapped separately. * Note that in this case unmatched surrogates are not detected. 
*/ if (gotoGetTrail || (UTF16.isSurrogate((char)x.c) && (uniMask&UConverterConstants.HAS_SURROGATES) == 0)) { if (gotoGetTrail || (UTF16.isLeadSurrogate((char)x.c))) { /* getTrail label */ gotoGetTrail = false; /* reset gotoGetTrail flag */ x.sourceArrayIndex = source.position(); doLoop = getTrail(source, target, uniMask, x, flush, cr); if (x.doread && doLoop) { continue; } else if (!x.doread && !doLoop) { break; } else if (!doLoop) { break; } } else { /* this is an unmatched trail code unit (2nd surrogate) */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); break; } } /* convert the Unicode code point in c into codepage bytes */ /* * The basic lookup is a triple-stage compact array (trie) lookup. * * Single-byte codepages are handled with a different data structure * by _MBCSSingle... functions. * * The result consists of a 32-bit value from stage 2 and * a pointer to as many bytes as are stored per character. * The pointer points to the character's bytes in stage 3. * Bits 15..0 of the stage 2 entry contain the stage 3 index * for that pointer, while bits 31..16 are flags for which of * the 16 characters in the block are roundtrip-assigned. * * For 2-byte and 4-byte codepages, the bytes are stored as uint16_t * and uint32_t respectively, in the platform encoding. * For 3-byte codepages, the bytes are always stored in big-endian order. * * For EUC encodings that use only either 0x8e or 0x8f as the first * byte of their longest byte sequences, the first two bytes in * this third stage indicate with their 7th bits whether these bytes * are to be written directly or actually need to be preceded by * one of the two Single-Shift codes. With this, the third stage * stores one byte fewer per character than the actual maximum length of * EUC byte sequences. * * Other than that, leading zero bytes are removed and the other * bytes output. A single zero byte may be output if the "assigned" * bit in stage 2 was on. 
* The data structure does not support zero byte output as a fallback, * and also does not allow output of leading zeros. */ stage2Entry = MBCS_STAGE_2_FROM_U(table, x.c); /* get the bytes and the length for the output */ switch (outputType) { case MBCS_OUTPUT_2: value = MBCS_VALUE_2_FROM_STAGE_2(bytes.array(), stage2Entry, x.c); if (value <= 0xff) { length = 1; } else { length = 2; } break; case MBCS_OUTPUT_2_SISO: /* 1/2-byte stateful with Shift-In/Shift-Out */ /* * Save the old state in the converter object * right here, then change the local prevLength state variable if necessary. * Then, if this character turns out to be unassigned or a fallback that * is not taken, the callback code must not save the new state in the converter * because the new state is for a character that is not output. * However, the callback must still restore the state from the converter * in case the callback function changed it for its output. */ fromUnicodeStatus = x.prevLength; /* save the old state */ value = MBCS_VALUE_2_FROM_STAGE_2(bytes.array(), stage2Entry, x.c); if (value <= 0xff) { if (value == 0 && !MBCS_FROM_U_IS_ROUNDTRIP(stage2Entry, x.c)) { /* no mapping, leave value == 0 */ length = 0; } else if (x.prevLength <= 1) { length = 1; } else { /* change from double-byte mode to single-byte */ value |= UConverterConstants.UNSIGNED_INT_MASK & (UConverterConstants.SI<<8); length = 2; x.prevLength = 1; } } else { if (x.prevLength == 2) { length = 2; } else { /* change from single-byte mode to double-byte */ value |= UConverterConstants.UNSIGNED_INT_MASK & (UConverterConstants.SO<<16); length = 3; x.prevLength = 2; } } break; case MBCS_OUTPUT_DBCS_ONLY: /* table with single-byte results, but only DBCS mappings used */ value = MBCS_VALUE_2_FROM_STAGE_2(bytes.array(), stage2Entry, x.c); if (value <= 0xff) { /* no mapping or SBCS result, not taken for DBCS-only */ value = stage2Entry = 0; /* stage2Entry=0 to reset roundtrip flags */ length = 0; } else { length = 2; } break; case 
MBCS_OUTPUT_3: p = MBCS_POINTER_3_FROM_STAGE_2(bytes.array(), stage2Entry, x.c); /* mask each byte before shifting so that bytes >= 0x80 are not sign-extended */ value = ((bytes.get(p) & UConverterConstants.UNSIGNED_BYTE_MASK)<<16) | ((bytes.get(p+1) & UConverterConstants.UNSIGNED_BYTE_MASK)<<8) | (bytes.get(p+2) & UConverterConstants.UNSIGNED_BYTE_MASK); if (value <= 0xff) { length = 1; } else if (value <= 0xffff) { length = 2; } else { length = 3; } break; case MBCS_OUTPUT_4: value = MBCS_VALUE_4_FROM_STAGE_2(bytes.array(), stage2Entry, x.c); if (value <= 0xff) { length = 1; } else if (value <= 0xffff) { length = 2; } else if (value <= 0xffffff) { length = 3; } else { length = 4; } break; case MBCS_OUTPUT_3_EUC: value = MBCS_VALUE_2_FROM_STAGE_2(bytes.array(), stage2Entry, x.c); /* EUC 16-bit fixed-length representation */ if (value <= 0xff) { length = 1; } else if ((value&0x8000) == 0) { value |= 0x8e8000; length = 3; } else if ((value&0x80) == 0) { value |= 0x8f0080; length = 3; } else { length = 2; } break; case MBCS_OUTPUT_4_EUC: p = MBCS_POINTER_3_FROM_STAGE_2(bytes.array(), stage2Entry, x.c); /* mask each byte before shifting so that bytes >= 0x80 are not sign-extended */ value = ((bytes.get(p) & UConverterConstants.UNSIGNED_BYTE_MASK)<<16) | ((bytes.get(p+1) & UConverterConstants.UNSIGNED_BYTE_MASK)<<8) | (bytes.get(p+2) & UConverterConstants.UNSIGNED_BYTE_MASK); /* EUC 16-bit fixed-length representation applied to the first two bytes */ if (value <= 0xff) { length = 1; } else if (value <= 0xffff) { length = 2; } else if ((value&0x800000) == 0) { value |= 0x8e800000L; length = 4; } else if ((value&0x8000) == 0) { value |= 0x8f008000L; length = 4; } else { length = 3; } break; default : /* must not occur */ value = stage2Entry = 0; length = 0; break; } /* is this code point assigned, or do we use fallbacks? */ if (!(MBCS_FROM_U_IS_ROUNDTRIP(stage2Entry, x.c) || (CharsetEncoderICU.isFromUUseFallback(useFallback, x.c) && value != 0))) { /* * We allow a 0 byte output if the "assigned" bit is set for this entry. * There is no way with this data structure for fallback output * to be a zero byte. 
*/ // unassigned label int currentSourcePos = source.position(); doLoop = unassigned(source, target, offsets, x, flush, cr); if (doLoop) { continue; } else { if (source.position() < currentSourcePos) { source.position(currentSourcePos); } break; } } /* write the output character bytes from value and length */ /* from the first if in the loop we know that targetCapacity>0 */ if (length <= targetCapacity) { switch (length) { /* each branch falls through to the next one */ case 4: target.put((byte)(value>>24)); if (offsets != null) { offsets.put(x.sourceIndex); } case 3: target.put((byte)(value>>16)); if (offsets != null) { offsets.put(x.sourceIndex); } case 2: target.put((byte)(value>>8)); if (offsets != null) { offsets.put(x.sourceIndex); } case 1: target.put((byte)value); if (offsets != null) { offsets.put(x.sourceIndex); } default : /* will never occur */ break; } targetCapacity -= length; } else { /* * We actually do this backwards here: * In order to save an intermediate variable, we output * first to the overflow buffer what does not fit into the * regular target. 
*/ /* we know that 1<=targetCapacity<length<=4 */ length -= targetCapacity; int i = 0; switch (length) { /* each branch falls through to the next one */ case 3: errorBuffer[i++] = (byte)(value>>16); case 2: errorBuffer[i++] = (byte)(value>>8); case 1: errorBuffer[i++] = (byte)value; default : /* will never occur */ break; } errorBufferLength = length; /* now output what fits into the regular target */ value>>=8*length; /* length was reduced by targetCapacity */ switch (targetCapacity) { /* each branch falls through to the next one */ case 3: target.put((byte)(value>>16)); if (offsets != null) { offsets.put(x.sourceIndex); } case 2: target.put((byte)(value>>8)); if (offsets != null) { offsets.put(x.sourceIndex); } case 1: target.put((byte)value); if (offsets != null) { offsets.put(x.sourceIndex); } default : /* will never occur */ break; } /* target overflow */ targetCapacity = 0; cr[0] = CoderResult.OVERFLOW; x.c = 0; break; } /* normal end of conversion: prepare for a new character */ x.c = 0; if (offsets != null) { x.prevSourceIndex = x.sourceIndex; x.sourceIndex = x.nextSourceIndex; } continue; } else { /* target is full */ cr[0] = CoderResult.OVERFLOW; break; } } /* * the end of the input stream and detection of truncated input * are handled by the framework, but for EBCDIC_STATEFUL conversion * we need to emit an SI at the very end * * conditions: * successful * EBCDIC_STATEFUL in DBCS mode * end of input and no truncated input */ if (!cr[0].isError() && outputType == MBCS_OUTPUT_2_SISO && x.prevLength == 2 && flush && !source.hasRemaining() && x.c == 0) { /* EBCDIC_STATEFUL ending with DBCS: emit an SI to return the output stream to SBCS */ if (targetCapacity > 0) { target.put((byte)UConverterConstants.SI); if (offsets != null) { /* set the last source character's index (sourceIndex points at sourceLimit now) */ offsets.put(x.prevSourceIndex); } } else { /* target is full */ errorBuffer[0] = UConverterConstants.SI; errorBufferLength = 1; cr[0] = CoderResult.OVERFLOW; } x.prevLength = 1; /* we switched into SBCS */ } /* set the converter state back into UConverter */ fromUChar32 = x.c; fromUnicodeStatus = 
x.prevLength; return cr[0]; } /* * This version of ucnv_MBCSFromUnicode() is optimized for single-byte codepages that map only to and from the * BMP. In addition to single-byte/state optimizations, the offset calculations become much easier. */ private CoderResult cnvMBCSSingleFromBMPWithOffsets(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult[] cr = { CoderResult.UNDERFLOW }; int sourceArrayIndex, lastSource; int targetCapacity, length; char[] table; byte[] results; int c, sourceIndex; char value, minValue; /* set up the local pointers */ sourceArrayIndex = source.position(); targetCapacity = target.remaining(); table = sharedData.mbcs.fromUnicodeTable; if ((options & UConverterConstants.OPTION_SWAP_LFNL) != 0) { results = sharedData.mbcs.swapLFNLFromUnicodeBytes; // agljport:comment should swapLFNLFromUnicodeBytes // be a ByteBuffer so results can be a 16-bit view // of it? } else { results = sharedData.mbcs.fromUnicodeBytes; // agljport:comment should swapLFNLFromUnicodeBytes be a // ByteBuffer so results can be a 16-bit view of it? } if (useFallback) { /* use all roundtrip and fallback results */ minValue = 0x800; } else { /* use only roundtrips and fallbacks from private-use characters */ minValue = 0xc00; } /* get the converter state from UConverter */ c = fromUChar32; /* sourceIndex=-1 if the current character began in the previous buffer */ sourceIndex = c == 0 ? 
0 : -1; lastSource = sourceArrayIndex; /* * since the conversion here is 1:1 UChar:uint8_t, we need only one counter for the minimum of the * sourceLength and targetCapacity */ length = source.limit() - sourceArrayIndex; if (length < targetCapacity) { targetCapacity = length; } boolean doloop = true; if (c != 0 && targetCapacity > 0) { SideEffectsSingleBMP x = new SideEffectsSingleBMP(c, sourceArrayIndex); doloop = getTrailSingleBMP(source, x, cr); c = x.c; sourceArrayIndex = x.sourceArrayIndex; } if (doloop) { while (targetCapacity > 0) { /* * Get a correct Unicode code point: a single UChar for a BMP code point or a matched surrogate pair * for a "supplementary code point". */ c = source.get(sourceArrayIndex++); /* * Do not immediately check for single surrogates: Assume that they are unassigned and check for * them in that case. This speeds up the conversion of assigned characters. */ /* convert the Unicode code point in c into codepage bytes */ value = MBCS_SINGLE_RESULT_FROM_U(table, results, c); /* is this code point assigned, or do we use fallbacks? 
*/ if (value >= minValue) { /* assigned, write the output character bytes from value and length */ /* length==1 */ /* this is easy because we know that there is enough space */ target.put((byte) value); --targetCapacity; /* normal end of conversion: prepare for a new character */ c = 0; continue; } else if (!UTF16.isSurrogate((char) c)) { /* normal, unassigned BMP character */ } else if (UTF16.isLeadSurrogate((char) c)) { // getTrail: SideEffectsSingleBMP x = new SideEffectsSingleBMP(c, sourceArrayIndex); doloop = getTrailSingleBMP(source, x, cr); c = x.c; sourceArrayIndex = x.sourceArrayIndex; if (!doloop) break; } else { /* this is an unmatched trail code unit (2nd surrogate) */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); break; } /* c does not have a mapping */ /* get the number of code units for c to correctly advance sourceIndex */ length = UTF16.getCharCount(c); /* set offsets since the start or the last extension */ if (offsets != null) { int count = sourceArrayIndex - lastSource; /* do not set the offset for this character */ count -= length; while (count > 0) { offsets.put(sourceIndex++); --count; } /* offsets and sourceIndex are now set for the current character */ } /* try an extension mapping */ lastSource = sourceArrayIndex; source.position(sourceArrayIndex); c = fromU(c, source, target, offsets, sourceIndex, length, flush, cr); sourceArrayIndex = source.position(); sourceIndex += length + (sourceArrayIndex - lastSource); lastSource = sourceArrayIndex; if (cr[0].isError()) { /* not mappable or buffer overflow */ break; } else { /* a mapping was written to the target, continue */ /* recalculate the targetCapacity after an extension mapping */ targetCapacity = target.remaining(); length = source.limit() - sourceArrayIndex; if (length < targetCapacity) { targetCapacity = length; } } } } if (sourceArrayIndex < source.limit() && !target.hasRemaining()) { /* target is full */ cr[0] = CoderResult.OVERFLOW; } /* set offsets since the 
start or the last callback */ if (offsets != null) { int count = sourceArrayIndex - lastSource; while (count > 0) { offsets.put(sourceIndex++); --count; } } /* set the converter state back into UConverter */ fromUChar32 = c; /* write back the updated pointers */ source.position(sourceArrayIndex); return cr[0]; } /* This version of ucnv_MBCSFromUnicodeWithOffsets() is optimized for single-byte codepages. */ private CoderResult cnvMBCSSingleFromUnicodeWithOffsets(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult[] cr = { CoderResult.UNDERFLOW }; int sourceArrayIndex; char[] table; byte[] results; // agljport:comment results is used to get 16-bit values out of byte[] array int c; int sourceIndex, nextSourceIndex; char value, minValue; /* set up the local pointers */ short uniMask; sourceArrayIndex = source.position(); table = sharedData.mbcs.fromUnicodeTable; if ((options & UConverterConstants.OPTION_SWAP_LFNL) != 0) { results = sharedData.mbcs.swapLFNLFromUnicodeBytes; // agljport:comment should swapLFNLFromUnicodeBytes // be a ByteBuffer so results can be a 16-bit view // of it? } else { results = sharedData.mbcs.fromUnicodeBytes; // agljport:comment should fromUnicodeBytes be a // ByteBuffer so results can be a 16-bit view of it? } if (useFallback) { /* use all roundtrip and fallback results */ minValue = 0x800; } else { /* use only roundtrips and fallbacks from private-use characters */ minValue = 0xc00; } // agljport:comment hasSupplementary only used in getTrail block which now simply repeats the mask operation uniMask = sharedData.mbcs.unicodeMask; /* get the converter state from UConverter */ c = fromUChar32; /* sourceIndex=-1 if the current character began in the previous buffer */ sourceIndex = c == 0 ? 
0 : -1; nextSourceIndex = 0; boolean doloop = true; boolean doread = true; if (c != 0 && target.hasRemaining()) { if (UTF16.isLeadSurrogate((char) c)) { SideEffectsDouble x = new SideEffectsDouble(c, sourceArrayIndex, sourceIndex, nextSourceIndex); doloop = getTrailDouble(source, target, uniMask, x, flush, cr); doread = x.doread; c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; } else { doread = false; } } if (doloop) { while (!doread || sourceArrayIndex < source.limit()) { /* * This following test is to see if available input would overflow the output. It does not catch * output of more than one byte that overflows as a result of a multi-byte character or callback * output from the last source character. Therefore, those situations also test for overflows and * will then break the loop, too. */ if (target.hasRemaining()) { /* * Get a correct Unicode code point: a single UChar for a BMP code point or a matched surrogate * pair for a "supplementary code point". */ if (doread) { c = source.get(sourceArrayIndex++); ++nextSourceIndex; if (UTF16.isSurrogate((char) c)) { if (UTF16.isLeadSurrogate((char) c)) { // getTrail: SideEffectsDouble x = new SideEffectsDouble(c, sourceArrayIndex, sourceIndex, nextSourceIndex); doloop = getTrailDouble(source, target, uniMask, x, flush, cr); c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; if (x.doread) { if (doloop) continue; else break; } } else { /* this is an unmatched trail code unit (2nd surrogate) */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); break; } } } else { doread = true; } /* convert the Unicode code point in c into codepage bytes */ value = MBCS_SINGLE_RESULT_FROM_U(table, results, c); /* is this code point assigned, or do we use fallbacks? 
*/ if (value >= minValue) { /* assigned, write the output character bytes from value and length */ /* length==1 */ /* this is easy because we know that there is enough space */ target.put((byte) value); if (offsets != null) { offsets.put(sourceIndex); } /* normal end of conversion: prepare for a new character */ c = 0; sourceIndex = nextSourceIndex; } else { /* unassigned */ /* try an extension mapping */ SideEffectsDouble x = new SideEffectsDouble(c, sourceArrayIndex, sourceIndex, nextSourceIndex); doloop = unassignedDouble(source, target, x, flush, cr); c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; if (!doloop) break; } } else { /* target is full */ cr[0] = CoderResult.OVERFLOW; break; } } } /* set the converter state back into UConverter */ fromUChar32 = c; /* write back the updated pointers */ source.position(sourceArrayIndex); return cr[0]; } /* This version of ucnv_MBCSFromUnicodeWithOffsets() is optimized for double-byte codepages. */ private CoderResult cnvMBCSDoubleFromUnicodeWithOffsets(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult[] cr = { CoderResult.UNDERFLOW }; int sourceArrayIndex; char[] table; byte[] bytes; int c, sourceIndex, nextSourceIndex; int stage2Entry; int value; int length; short uniMask; /* use optimized function if possible */ uniMask = sharedData.mbcs.unicodeMask; /* set up the local pointers */ sourceArrayIndex = source.position(); table = sharedData.mbcs.fromUnicodeTable; if ((options & UConverterConstants.OPTION_SWAP_LFNL) != 0) { bytes = sharedData.mbcs.swapLFNLFromUnicodeBytes; } else { bytes = sharedData.mbcs.fromUnicodeBytes; } /* get the converter state from UConverter */ c = fromUChar32; /* sourceIndex=-1 if the current character began in the previous buffer */ sourceIndex = c == 0 ? 
0 : -1; nextSourceIndex = 0; /* conversion loop */ boolean doloop = true; boolean doread = true; if (c != 0 && target.hasRemaining()) { if (UTF16.isLeadSurrogate((char) c)) { SideEffectsDouble x = new SideEffectsDouble(c, sourceArrayIndex, sourceIndex, nextSourceIndex); doloop = getTrailDouble(source, target, uniMask, x, flush, cr); doread = x.doread; c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; } else { doread = false; } } if (doloop) { while (!doread || sourceArrayIndex < source.limit()) { /* * This following test is to see if available input would overflow the output. It does not catch * output of more than one byte that overflows as a result of a multi-byte character or callback * output from the last source character. Therefore, those situations also test for overflows and * will then break the loop, too. */ if (target.hasRemaining()) { if (doread) { /* * Get a correct Unicode code point: a single UChar for a BMP code point or a matched * surrogate pair for a "supplementary code point". */ c = source.get(sourceArrayIndex++); ++nextSourceIndex; /* * This also tests if the codepage maps single surrogates. If it does, then surrogates are * not paired but mapped separately. Note that in this case unmatched surrogates are not * detected. 
*/ if (UTF16.isSurrogate((char) c) && (uniMask & UConverterConstants.HAS_SURROGATES) == 0) { if (UTF16.isLeadSurrogate((char) c)) { // getTrail: SideEffectsDouble x = new SideEffectsDouble(c, sourceArrayIndex, sourceIndex, nextSourceIndex); doloop = getTrailDouble(source, target, uniMask, x, flush, cr); c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; if (x.doread) { if (doloop) continue; else break; } } else { /* this is an unmatched trail code unit (2nd surrogate) */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); break; } } } else { doread = true; } /* convert the Unicode code point in c into codepage bytes */ stage2Entry = MBCS_STAGE_2_FROM_U(table, c); /* get the bytes and the length for the output */ /* MBCS_OUTPUT_2 */ value = MBCS_VALUE_2_FROM_STAGE_2(bytes, stage2Entry, c); if ((value & UConverterConstants.UNSIGNED_INT_MASK) <= 0xff) { length = 1; } else { length = 2; } /* is this code point assigned, or do we use fallbacks? */ if (!(MBCS_FROM_U_IS_ROUNDTRIP(stage2Entry, c) || (isFromUUseFallback(c) && value != 0))) { /* * We allow a 0 byte output if the "assigned" bit is set for this entry. There is no way * with this data structure for fallback output to be a zero byte. 
*/ // unassigned: SideEffectsDouble x = new SideEffectsDouble(c, sourceArrayIndex, sourceIndex, nextSourceIndex); doloop = unassignedDouble(source, target, x, flush, cr); c = x.c; sourceArrayIndex = x.sourceArrayIndex; sourceIndex = x.sourceIndex; nextSourceIndex = x.nextSourceIndex; if (doloop) continue; else break; } /* write the output character bytes from value and length */ /* from the first if in the loop we know that targetCapacity>0 */ if (length == 1) { /* this is easy because we know that there is enough space */ target.put((byte) value); if (offsets != null) { offsets.put(sourceIndex); } } else /* length==2 */{ target.put((byte) (value >>> 8)); if (2 <= target.remaining()) { target.put((byte) value); if (offsets != null) { offsets.put(sourceIndex); offsets.put(sourceIndex); } } else { if (offsets != null) { offsets.put(sourceIndex); } errorBuffer[0] = (byte) value; errorBufferLength = 1; /* target overflow */ cr[0] = CoderResult.OVERFLOW; c = 0; break; } } /* normal end of conversion: prepare for a new character */ c = 0; sourceIndex = nextSourceIndex; continue; } else { /* target is full */ cr[0] = CoderResult.OVERFLOW; break; } } } /* set the converter state back into UConverter */ fromUChar32 = c; /* write back the updated pointers */ source.position(sourceArrayIndex); return cr[0]; } private final class SideEffectsSingleBMP { int c, sourceArrayIndex; SideEffectsSingleBMP(int c_, int sourceArrayIndex_) { c = c_; sourceArrayIndex = sourceArrayIndex_; } } // function made out of block labeled getTrail in ucnv_MBCSSingleFromUnicodeWithOffsets // assumes input c is lead surrogate private final boolean getTrailSingleBMP(CharBuffer source, SideEffectsSingleBMP x, CoderResult[] cr) { if (x.sourceArrayIndex < source.limit()) { /* test the following code unit */ char trail = source.get(x.sourceArrayIndex); if (UTF16.isTrailSurrogate(trail)) { ++x.sourceArrayIndex; x.c = UCharacter.getCodePoint((char) x.c, trail); /* this codepage does not map supplementary 
code points */ /* callback(unassigned) */ cr[0] = CoderResult.unmappableForLength(2); return false; } else { /* this is an unmatched lead code unit (1st surrogate) */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); return false; } } else { /* no more input */ return false; } // return true; } private final class SideEffects { int c, sourceArrayIndex, sourceIndex, nextSourceIndex, prevSourceIndex, prevLength; boolean doread = true; SideEffects(int c_, int sourceArrayIndex_, int sourceIndex_, int nextSourceIndex_, int prevSourceIndex_, int prevLength_) { c = c_; sourceArrayIndex = sourceArrayIndex_; sourceIndex = sourceIndex_; nextSourceIndex = nextSourceIndex_; prevSourceIndex = prevSourceIndex_; prevLength = prevLength_; } } // function made out of block labeled getTrail in ucnv_MBCSFromUnicodeWithOffsets // assumes input c is lead surrogate private final boolean getTrail(CharBuffer source, ByteBuffer target, int uniMask, SideEffects x, boolean flush, CoderResult[] cr) { if (x.sourceArrayIndex < source.limit()) { /* test the following code unit */ char trail = source.get(x.sourceArrayIndex); if (UTF16.isTrailSurrogate(trail)) { ++x.sourceArrayIndex; ++x.nextSourceIndex; /* convert this supplementary code point */ x.c = UCharacter.getCodePoint((char) x.c, trail); if ((uniMask & UConverterConstants.HAS_SUPPLEMENTARY) == 0) { /* BMP-only codepages are stored without stage 1 entries for supplementary code points */ fromUnicodeStatus = x.prevLength; /* save the old state */ /* callback(unassigned) */ x.doread = true; return unassigned(source, target, null, x, flush, cr); } else { x.doread = false; return true; } } else { /* this is an unmatched lead code unit (1st surrogate) */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); return false; } } else { /* no more input */ return false; } } // function made out of block labeled unassigned in ucnv_MBCSFromUnicodeWithOffsets private final boolean unassigned(CharBuffer source, ByteBuffer 
target, IntBuffer offsets, SideEffects x, boolean flush, CoderResult[] cr) { /* try an extension mapping */ int sourceBegin = x.sourceArrayIndex; source.position(x.sourceArrayIndex); x.c = fromU(x.c, source, target, null, x.sourceIndex, x.nextSourceIndex, flush, cr); x.sourceArrayIndex = source.position(); x.nextSourceIndex += x.sourceArrayIndex - sourceBegin; x.prevLength = (int) fromUnicodeStatus; if (cr[0].isError()) { /* not mappable or buffer overflow */ return false; } else { /* a mapping was written to the target, continue */ /* recalculate the targetCapacity after an extension mapping */ // x.targetCapacity=pArgs.targetLimit-x.targetArrayIndex; /* normal end of conversion: prepare for a new character */ if (offsets != null) { x.prevSourceIndex = x.sourceIndex; x.sourceIndex = x.nextSourceIndex; } return true; } } private final class SideEffectsDouble { int c, sourceArrayIndex, sourceIndex, nextSourceIndex; boolean doread = true; SideEffectsDouble(int c_, int sourceArrayIndex_, int sourceIndex_, int nextSourceIndex_) { c = c_; sourceArrayIndex = sourceArrayIndex_; sourceIndex = sourceIndex_; nextSourceIndex = nextSourceIndex_; } } // function made out of block labeled getTrail in ucnv_MBCSDoubleFromUnicodeWithOffsets // assumes input c is lead surrogate private final boolean getTrailDouble(CharBuffer source, ByteBuffer target, int uniMask, SideEffectsDouble x, boolean flush, CoderResult[] cr) { if (x.sourceArrayIndex < source.limit()) { /* test the following code unit */ char trail = source.get(x.sourceArrayIndex); if (UTF16.isTrailSurrogate(trail)) { ++x.sourceArrayIndex; ++x.nextSourceIndex; /* convert this supplementary code point */ x.c = UCharacter.getCodePoint((char) x.c, trail); if ((uniMask & UConverterConstants.HAS_SUPPLEMENTARY) == 0) { /* BMP-only codepages are stored without stage 1 entries for supplementary code points */ /* callback(unassigned) */ x.doread = true; return unassignedDouble(source, target, x, flush, cr); } else { x.doread = false; 
return true; } } else { /* this is an unmatched lead code unit (1st surrogate) */ /* callback(illegal) */ cr[0] = CoderResult.malformedForLength(1); return false; } } else { /* no more input */ return false; } } // function made out of block labeled unassigned in ucnv_MBCSDoubleFromUnicodeWithOffsets private final boolean unassignedDouble(CharBuffer source, ByteBuffer target, SideEffectsDouble x, boolean flush, CoderResult[] cr) { /* try an extension mapping */ int sourceBegin = x.sourceArrayIndex; source.position(x.sourceArrayIndex); x.c = fromU(x.c, source, target, null, x.sourceIndex, x.nextSourceIndex, flush, cr); x.sourceArrayIndex = source.position(); x.nextSourceIndex += x.sourceArrayIndex - sourceBegin; if (cr[0].isError()) { /* not mappable or buffer overflow */ return false; } else { /* a mapping was written to the target, continue */ /* recalculate the targetCapacity after an extension mapping */ // x.targetCapacity=pArgs.targetLimit-x.targetArrayIndex; /* normal end of conversion: prepare for a new character */ x.sourceIndex = x.nextSourceIndex; return true; } } /** * Overrides super class method * * @param encoder * @param source * @param target * @param offsets * @return */ protected CoderResult cbFromUWriteSub(CharsetEncoderICU encoder, CharBuffer source, ByteBuffer target, IntBuffer offsets) { CharsetMBCS cs = (CharsetMBCS) encoder.charset(); byte[] subchar; int length; if (cs.subChar1 != 0 && (cs.sharedData.mbcs.extIndexes != null ? 
encoder.useSubChar1 : (encoder.invalidUCharBuffer[0] <= 0xff))) { /* * select subChar1 if it is set (not 0) and the unmappable Unicode code point is up to U+00ff (IBM MBCS * behavior) */ subchar = new byte[] { cs.subChar1 }; length = 1; } else { /* select subChar in all other cases */ subchar = cs.subChar; length = cs.subCharLen; } /* reset the selector for the next code point */ encoder.useSubChar1 = false; if (cs.sharedData.mbcs.outputType == MBCS_OUTPUT_2_SISO) { byte[] buffer = new byte[4]; int i = 0; /* fromUnicodeStatus contains prevLength */ switch (length) { case 1: if (encoder.fromUnicodeStatus == 2) { /* DBCS mode and SBCS sub char: change to SBCS */ encoder.fromUnicodeStatus = 1; buffer[i++] = UConverterConstants.SI; } buffer[i++] = subchar[0]; break; case 2: if (encoder.fromUnicodeStatus <= 1) { /* SBCS mode and DBCS sub char: change to DBCS */ encoder.fromUnicodeStatus = 2; buffer[i++] = UConverterConstants.SO; } buffer[i++] = subchar[0]; buffer[i++] = subchar[1]; break; default: throw new IllegalArgumentException(); } subchar = buffer; length = i; } return CharsetEncoderICU.fromUWriteBytes(encoder, subchar, 0, length, target, offsets, source.position()); } /** * Gets called whenever CharsetEncoder.replaceWith gets called. allowReplacementChanges only allows subChar and * subChar1 to be modified outside construction (since replaceWith is called once during construction). * * @param replacement * The replacement for subchar. 
*/ protected void implReplaceWith(byte[] replacement) { if (allowReplacementChanges) { CharsetMBCS cs = (CharsetMBCS) this.charset(); System.arraycopy(replacement, 0, cs.subChar, 0, replacement.length); cs.subCharLen = (byte) replacement.length; cs.subChar1 = 0; } } } public CharsetDecoder newDecoder() { return new CharsetDecoderMBCS(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderMBCS(this); } void MBCSGetFilteredUnicodeSetForUnicode(UConverterSharedData data, UnicodeSet setFillIn, int which, int filter){ UConverterMBCSTable mbcsTable; char[] table; char st1,maxStage1, st2; int st3; int c ; mbcsTable = data.mbcs; table = mbcsTable.fromUnicodeTable; if((mbcsTable.unicodeMask & UConverterConstants.HAS_SUPPLEMENTARY)!=0){ maxStage1 = 0x440; } else{ maxStage1 = 0x40; } c=0; /* keep track of current code point while enumerating */ if(mbcsTable.outputType==MBCS_OUTPUT_1){ char stage2, stage3; char minValue; CharBuffer results; results = ByteBuffer.wrap(mbcsTable.fromUnicodeBytes).asCharBuffer(); if(which==ROUNDTRIP_SET) { /* use only roundtrips */ minValue=0xf00; } else { /* use all roundtrip and fallback results */ minValue=0x800; } for(st1=0;st1<maxStage1;++st1){ st2 = table[st1]; if(st2>maxStage1){ stage2 = st2; for(st2=0; st2<64; ++st2){ st3 = table[stage2 + st2]; if(st3!=0){ /*read the stage 3 block */ stage3 = (char)st3; do { if(results.get(stage3++)>=minValue){ setFillIn.add(c); } }while((++c&0xf) !=0); } else { c+= 16; /*empty stage 3 block */ } } } else { c+=1024; /* empty stage 2 block */ } } } else { int stage2,stage3; byte[] bytes; int st3Multiplier; int value; boolean useFallBack; bytes = mbcsTable.fromUnicodeBytes; useFallBack = (which == ROUNDTRIP_AND_FALLBACK_SET); switch(mbcsTable.outputType) { case MBCS_OUTPUT_3: case MBCS_OUTPUT_4_EUC: st3Multiplier = 3; break; case MBCS_OUTPUT_4: st3Multiplier =4; break; default: st3Multiplier =2; break; } //ByteBuffer buffer = (ByteBuffer)charTobyte(table); for(st1=0;st1<maxStage1;++st1){ st2 = table[st1]; if(st2>(maxStage1>>1)){ stage2 = st2 ; for(st2=0;st2<128;++st2){ /*read 
the stage 3 block */ st3 = table[stage2*2 + st2]<<16; st3+=table[stage2*2 + ++st2]; if(st3!=0){ //if((st3=table[stage2+st2])!=0){ stage3 = st3Multiplier*16*(int)(st3&UConverterConstants.UNSIGNED_SHORT_MASK); /* get the roundtrip flags for the stage 3 block */ st3>>=16; st3 &= UConverterConstants.UNSIGNED_SHORT_MASK; switch(filter) { case UCNV_SET_FILTER_NONE: do { if((st3&1)!=0){ setFillIn.add(c); stage3+=st3Multiplier; }else if (useFallBack) { char b =0; switch(st3Multiplier) { case 4 : b|= ByteBuffer.wrap(bytes).getChar(stage3++); case 3 : b|= ByteBuffer.wrap(bytes).getChar(stage3++); case 2 : b|= ByteBuffer.wrap(bytes).getChar(stage3) | ByteBuffer.wrap(bytes).getChar(stage3+1); stage3+=2; default: break; } if(b!=0) { setFillIn.add(c); } } st3>>=1; }while((++c&0xf)!=0); break; case UCNV_SET_FILTER_DBCS_ONLY: /* Ignore single-byte results (<0x100). */ do { if(((st3&1) != 0 || useFallBack) && (UConverterConstants.UNSIGNED_SHORT_MASK & (ByteBuffer.wrap(bytes).getChar(stage3))) >= 0x100){ setFillIn.add(c); } st3>>=1; stage3+=2; }while((++c&0xf) != 0); break; case UCNV_SET_FILTER_2022_CN : /* only add code points that map to CNS 11643 planes 1&2 for non-EXT ISO-2022-CN. */ do { if(((st3&1) != 0 || useFallBack) && ((value= (UConverterConstants.UNSIGNED_BYTE_MASK & (ByteBuffer.wrap(bytes).get(stage3))))==0x81 || value==0x82) ){ setFillIn.add(c); } st3>>=1; stage3+=3; }while((++c&0xf)!=0); break; case UCNV_SET_FILTER_SJIS: /* only add code points that map to Shift-JIS codes corresponding to JIS X 0208. 
*/ do{ if(((st3&1) != 0 || useFallBack) && (value=(UConverterConstants.UNSIGNED_SHORT_MASK & (ByteBuffer.wrap(bytes).getChar(stage3))))>=0x8140 && value<=0xeffc){ setFillIn.add(c); } st3>>=1; stage3+=2; }while((++c&0xf)!=0); break; case UCNV_SET_FILTER_GR94DBCS: /* only add code points that maps to ISO 2022 GR 94 DBCS codes*/ do { if(((st3&1) != 0 || useFallBack) && (UConverterConstants.UNSIGNED_SHORT_MASK & ((value=(UConverterConstants.UNSIGNED_SHORT_MASK & (ByteBuffer.wrap(bytes).getChar(stage3))))- 0xa1a1))<=(0xfefe - 0xa1a1) && (UConverterConstants.UNSIGNED_BYTE_MASK & (value - 0xa1)) <= (0xfe - 0xa1)){ setFillIn.add(c); } st3>>=1; stage3+=2; }while((++c&0xf)!=0); break; case UCNV_SET_FILTER_HZ: /*Only add code points that are suitable for HZ DBCS*/ do { if( ((st3&1) != 0 || useFallBack) && (UConverterConstants.UNSIGNED_SHORT_MASK & ((value=(UConverterConstants.UNSIGNED_SHORT_MASK & (ByteBuffer.wrap(bytes).getChar(stage3))))-0xa1a1))<=(0xfdfe - 0xa1a1) && (UConverterConstants.UNSIGNED_BYTE_MASK & (value - 0xa1)) <= (0xfe - 0xa1)){ setFillIn.add(c); } st3>>=1; stage3+=2; }while((++c&0xf) != 0); break; default: return; } } else { c+=16; /* empty stage 3 block */ } } } else { c+=1024; /*empty stage2 block */ } } } extGetUnicodeSet(setFillIn, which, filter, data); } static void extGetUnicodeSetString(ByteBuffer cx,UnicodeSet setFillIn, boolean useFallback, int minLength, int c, char s[],int length,int sectionIndex){ CharBuffer fromUSectionUChar; IntBuffer fromUSectionValues; fromUSectionUChar = (CharBuffer)ARRAY(cx, EXT_FROM_U_UCHARS_INDEX,char.class ); fromUSectionValues = (IntBuffer)ARRAY(cx, EXT_FROM_U_VALUES_INDEX,int.class ); int fromUSectionUCharIndex = fromUSectionUChar.position()+sectionIndex; int fromUSectionValuesIndex = fromUSectionValues.position()+sectionIndex; int value, i, count; /* read first pair of the section */ count = fromUSectionUChar.get(fromUSectionUCharIndex++); value = fromUSectionValues.get(fromUSectionValuesIndex++); if(value!=0 && 
(FROM_U_IS_ROUNDTRIP(value) || useFallback) && FROM_U_GET_LENGTH(value)>=minLength) { if(c>=0){ setFillIn.add(c); } else { String normalizedString=""; // String for composite characters for(int j=0; j=minLength) { String normalizedString=""; // String for composite characters for(int j=0; j<(length+1);j++){ normalizedString+=s[j]; } setFillIn.add(normalizedString); } } } static void extGetUnicodeSet(UnicodeSet setFillIn, int which, int filter, UConverterSharedData Data){ int st1, stage1Length, st2, st3, minLength; int ps2, ps3; CharBuffer stage12, stage3; int value, length; IntBuffer stage3b; boolean useFallback; char s[] = new char[MAX_UCHARS]; int c; ByteBuffer cx = Data.mbcs.extIndexes; if(cx == null){ return; } stage12 = (CharBuffer)ARRAY(cx, EXT_FROM_U_STAGE_12_INDEX,char.class ); stage3 = (CharBuffer)ARRAY(cx, EXT_FROM_U_STAGE_3_INDEX,char.class ); stage3b = (IntBuffer)ARRAY(cx, EXT_FROM_U_STAGE_3B_INDEX,int.class ); stage1Length = cx.asIntBuffer().get(EXT_FROM_U_STAGE_1_LENGTH); useFallback =(boolean)(which==ROUNDTRIP_AND_FALLBACK_SET); c = 0; if(filter == UCNV_SET_FILTER_2022_CN) { minLength = 3; } else if (Data.mbcs.outputType == MBCS_OUTPUT_DBCS_ONLY || filter != UCNV_SET_FILTER_NONE) { /* DBCS-only, ignore single-byte results */ minLength = 2; } else { minLength = 1; } for(st1=0; st1< stage1Length; ++st1){ st2 = stage12.get(st1); if(st2>stage1Length) { ps2 = st2; for(st2=0;st2<64;++st2){ st3=((int) stage12.get(ps2+st2))<=minLength){ switch(filter) { case UCNV_SET_FILTER_2022_CN: if(!(FROM_U_GET_LENGTH(value)==3 && FROM_U_GET_DATA(value)<=0x82ffff)){ continue; } break; case UCNV_SET_FILTER_SJIS: if(!(FROM_U_GET_LENGTH(value)==2 && (value=FROM_U_GET_DATA(value))>=0x8140 && value<=0xeffc)){ continue; } break; case UCNV_SET_FILTER_GR94DBCS: if(!(FROM_U_GET_LENGTH(value)==2 && (UConverterConstants.UNSIGNED_SHORT_MASK & ((value=FROM_U_GET_DATA(value)) - 0xa1a1))<=(0xfefe - 0xa1a1) && (UConverterConstants.UNSIGNED_BYTE_MASK & (value - 0xa1))<= (0xfe - 0xa1))){ 
continue; } break; case UCNV_SET_FILTER_HZ: if(!(FROM_U_GET_LENGTH(value)==2 && (UConverterConstants.UNSIGNED_SHORT_MASK & ((value=FROM_U_GET_DATA(value)) - 0xa1a1))<=(0xfdfe - 0xa1a1) && (UConverterConstants.UNSIGNED_BYTE_MASK & (value - 0xa1))<= (0xfe - 0xa1))){ continue; } break; default: /* * UCNV_SET_FILTER_NONE, * or UCNV_SET_FILTER_DBCS_ONLY which is handled via minLength */ break; } setFillIn.add(c); } }while((++c&0xf) != 0); } else { c+=16; /* emplty stage3 block */ } } } else { c+=1024; /* empty stage 2 block*/ } } } void MBCSGetUnicodeSetForUnicode(UConverterSharedData data, UnicodeSet setFillIn, int which){ MBCSGetFilteredUnicodeSetForUnicode(data, setFillIn, which, this.sharedData.mbcs.outputType==MBCS_OUTPUT_DBCS_ONLY ? UCNV_SET_FILTER_DBCS_ONLY : UCNV_SET_FILTER_NONE ); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ if((options & MBCS_OPTION_GB18030)!=0){ setFillIn.add(0, 0xd7ff); setFillIn.add(0xe000, 0x10ffff); } else { this.MBCSGetUnicodeSetForUnicode(sharedData, setFillIn, which); } } } icu4j-4.2/src/com/ibm/icu/charset/CharsetProviderICU.java0000644000175000017500000003273111361046170023175 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* * ******************************************************************************* */ package com.ibm.icu.charset; import java.io.IOException; import java.nio.charset.Charset; import java.nio.charset.UnsupportedCharsetException; import java.nio.charset.spi.CharsetProvider; import java.util.HashMap; import java.util.Iterator; import java.util.Map; import com.ibm.icu.impl.InvalidFormatException; /** * A concrete subclass of CharsetProvider for loading and providing charset converters * in ICU. 
* @stable ICU 3.6 */ public final class CharsetProviderICU extends CharsetProvider{ private static String optionsString = null; private static boolean gettingJavaCanonicalName = false; /** * Default constructor * @stable ICU 3.6 */ public CharsetProviderICU() { } /** * Constructs a charset for the given charset name. * Implements the abstract method of the superclass. * @param charsetName charset name * @return charset object for the given charset name, null if unsupported * @stable ICU 3.6 */ public final Charset charsetForName(String charsetName){ try{ // extract the options from the charset name charsetName = processOptions(charsetName); // get the canonical name String icuCanonicalName = getICUCanonicalName(charsetName); // create the converter object and return it if(icuCanonicalName==null || icuCanonicalName.length()==0){ // Try the original name, may be something added and not in the alias table. // Will get an unsupported encoding exception if it doesn't work. return getCharset(charsetName); } return getCharset(icuCanonicalName); }catch(UnsupportedCharsetException ex){ }catch(IOException ex){ } return null; } /** * Constructs a charset for the given ICU conversion table from the specified class path. * Example use: cnv = CharsetProviderICU.charsetForName("myConverter", "com/myCompany/myDataPackage"); * In this example myConverter.cnv would exist in the com/myCompany/myDataPackage Java package. * Conversion tables can be made with ICU4C's makeconv tool. * This function allows you to load user-defined conversion * tables that are outside of ICU's core data. * @param charsetName The name of the charset conversion table. * @param classPath The class path that contains the conversion table. 
* @return charset object for the given charset name, null if unsupported * @stable ICU 3.8 */ public final Charset charsetForName(String charsetName, String classPath) { return charsetForName(charsetName, classPath, null); } /** * Constructs a charset for the given ICU conversion table from the specified class path. * This function is similar to {@link #charsetForName(String, String)}. * @param charsetName The name of the charset conversion table. * @param classPath The class path that contains the conversion table. * @param loader the class loader from which to load the charset conversion table * @return charset object for the given charset name, null if unsupported * @stable ICU 3.8 */ public Charset charsetForName(String charsetName, String classPath, ClassLoader loader) { CharsetMBCS cs = null; try { cs = new CharsetMBCS(charsetName, charsetName, new String[0], classPath, loader); } catch (InvalidFormatException e) { // return null; } return cs; } /** * Gets the canonical name of the converter as defined by ICU * @param enc converter name * @return canonical name of the converter * @internal ICU 3.6 * @deprecated This API is ICU internal only. */ public static final String getICUCanonicalName(String enc) throws UnsupportedCharsetException{ String canonicalName = null; String ret = null; try{ if(enc!=null){ if((canonicalName = UConverterAlias.getCanonicalName(enc, "MIME"))!=null){ ret = canonicalName; } else if((canonicalName = UConverterAlias.getCanonicalName(enc, "IANA"))!=null){ ret = canonicalName; } else if((canonicalName = UConverterAlias.getAlias(enc, 0))!=null){ /* we have some aliases in the form x-blah .. 
match those */ ret = canonicalName; }/*else if((canonicalName = UConverterAlias.getCanonicalName(enc, ""))!=null){ ret = canonicalName; }*/else if(enc.indexOf("x-")==0){ /* TODO: Match with getJavaCanonicalName method */ /* char temp[ UCNV_MAX_CONVERTER_NAME_LENGTH] = {0}; strcpy(temp, encName+2); */ // Remove the 'x-' and get the ICU canonical name if ((canonicalName = UConverterAlias.getAlias(enc.substring(2), 0))!=null) { ret = canonicalName; } else { ret = ""; } }else{ /* unsupported encoding */ ret = ""; } } return ret; }catch(IOException ex){ throw new UnsupportedCharsetException(enc); } } private static final Charset getCharset(String icuCanonicalName) throws IOException{ String[] aliases = (String[])getAliases(icuCanonicalName); String canonicalName = getJavaCanonicalName(icuCanonicalName); /* Concat the option string to the icuCanonicalName so that the options can be handled properly * by the actual charset. * Note: getJavaCanonicalName() may eventually call this method so skip the concatenation part * during getJavaCanonicalName() call. */ if (gettingJavaCanonicalName) { gettingJavaCanonicalName = false; } else if (optionsString != null) { icuCanonicalName = icuCanonicalName.concat(optionsString); optionsString = null; } return (CharsetICU.getCharset(icuCanonicalName,canonicalName, aliases)); } /** * Gets the canonical name of the converter as defined by Java * @param charsetName converter name * @return canonical name of the converter * @internal ICU 3.6 * @deprecated This API is ICU internal only. */ public static String getJavaCanonicalName(String charsetName){ /* If a charset listed in the IANA Charset Registry is supported by an implementation of the Java platform then its canonical name must be the name listed in the registry. Many charsets are given more than one name in the registry, in which case the registry identifies one of the names as MIME-preferred. 
If a charset has more than one registry name then its canonical name must be the MIME-preferred name and the other names in the registry must be valid aliases. If a supported charset is not listed in the IANA registry then its canonical name must begin with one of the strings "X-" or "x-". */ if(charsetName==null ){ return null; } try{ String cName = null; /* find out the alias with MIME tag */ if((cName=UConverterAlias.getStandardName(charsetName, "MIME"))!=null){ /* find out the alias with IANA tag */ }else if((cName=UConverterAlias.getStandardName(charsetName, "IANA"))!=null){ }else { /* check to see if an alias already exists with x- prefix, if yes then make that the canonical name */ int aliasNum = UConverterAlias.countAliases(charsetName); String name; for(int i=0;i=0;) { ret[j] = aliasArray[j]; } } return (ret); } private static final void putCharsets(Map map){ int num = UConverterAlias.countAvailable(); for(int i=0;i -1) { /* Remove and save the swap lfnl option string portion of the charset name. */ optionsString = UConverterConstants.OPTION_SWAP_LFNL_STRING; charsetName = charsetName.substring(0, charsetName.indexOf(UConverterConstants.OPTION_SWAP_LFNL_STRING)); } return charsetName; } } icu4j-4.2/src/com/ibm/icu/charset/CharsetICU.java0000644000175000017500000004303411361046170021460 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2009, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* * ******************************************************************************* */ package com.ibm.icu.charset; //import java.io.ByteArrayInputStream; //import java.io.InputStreamReader; import java.lang.reflect.Constructor; import java.lang.reflect.InvocationTargetException; import java.nio.charset.*; import java.util.HashMap; import com.ibm.icu.text.UnicodeSet; /** *

A subclass of java.nio.charset.Charset that provides the implementation of ICU's charset converters. * This API is used to convert codepage or character encoded data to and * from UTF-16. You can open a converter with {@link Charset#forName } and {@link #forNameICU }. With that * converter, you can get its properties, set options, and convert your data.
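The open-then-convert flow can be sketched with the standard java.nio.charset API alone. This is a minimal sketch, not ICU code: the class name ConverterRoundTrip and its method are hypothetical, and Charset.forName is used so that the example runs without the ICU charset provider on the classpath (with ICU4J available, forNameICU could presumably be substituted).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

// Hypothetical helper, not part of ICU4J.
final class ConverterRoundTrip {
    // Encodes the text with the named charset and decodes the bytes back to a String.
    static String roundTrip(String charsetName, String text) {
        // Throws IllegalCharsetNameException or UnsupportedCharsetException for bad names.
        Charset cs = Charset.forName(charsetName);
        ByteBuffer bytes = cs.encode(text);   // UTF-16 to charset bytes
        CharBuffer chars = cs.decode(bytes);  // charset bytes back to UTF-16
        return chars.toString();
    }
}
```

Charset.encode replaces unmappable characters with the charset's default replacement bytes, so text that does not survive the round trip signals that the charset cannot represent it.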

* *

Since many software programs recognize different converter names for * different types of converters, there are other functions in this API to * iterate over the converter aliases. * * @stable ICU 3.6 */ public abstract class CharsetICU extends Charset{ String icuCanonicalName; String javaCanonicalName; int options; float maxCharsPerByte; String name; /* +4: 60 internal name of the converter- invariant chars */ int codepage; /* +64: 4 codepage # (now IBM-$codepage) */ byte platform; /* +68: 1 platform of the converter (only IBM now) */ byte conversionType; /* +69: 1 conversion type */ int minBytesPerChar; /* +70: 1 Minimum # bytes per char in this codepage */ int maxBytesPerChar; /* +71: 1 Maximum # bytes output per UChar in this codepage */ byte subChar[/*UCNV_MAX_SUBCHAR_LEN*/]; /* +72: 4 [note: 4 and 8 byte boundary] */ byte subCharLen; /* +76: 1 */ byte hasToUnicodeFallback; /* +77: 1 UBool needs to be changed to UBool to be consistent across platform */ byte hasFromUnicodeFallback; /* +78: 1 */ short unicodeMask; /* +79: 1 bit 0: has supplementary bit 1: has single surrogates */ byte subChar1; /* +80: 1 single-byte substitution character for IBM MBCS (0 if none) */ //byte reserved[/*19*/]; /* +81: 19 to round out the structure */ // typedef enum UConverterUnicodeSet { /** * Parameter that selects the set of roundtrippable Unicode code points. * @stable ICU 4.0 */ public static final int ROUNDTRIP_SET=0; /** * Select the set of Unicode code points with roundtrip or fallback mappings. * Not supported at this point. * @internal * @deprecated This API is ICU internal only. 
*/ public static final int ROUNDTRIP_AND_FALLBACK_SET =1; //} UConverterUnicodeSet; /** * * @param icuCanonicalName * @param canonicalName * @param aliases * @stable ICU 3.6 */ protected CharsetICU(String icuCanonicalName, String canonicalName, String[] aliases) { super(canonicalName,aliases); if(canonicalName.length() == 0){ throw new IllegalCharsetNameException(canonicalName); } this.javaCanonicalName = canonicalName; this.icuCanonicalName = icuCanonicalName; } /** * Ascertains if a charset is a sub set of this charset * Implements the abstract method of super class. * @param cs charset to test * @return true if the given charset is a subset of this charset * @stable ICU 3.6 */ public boolean contains(Charset cs){ if (null == cs) { return false; } else if (this.equals(cs)) { return true; } return false; } private static final HashMap algorithmicCharsets = new HashMap(); static{ algorithmicCharsets.put("LMBCS-1", "com.ibm.icu.charset.CharsetLMBCS"); algorithmicCharsets.put("BOCU-1", "com.ibm.icu.charset.CharsetBOCU1" ); algorithmicCharsets.put("SCSU", "com.ibm.icu.charset.CharsetSCSU" ); algorithmicCharsets.put("US-ASCII", "com.ibm.icu.charset.CharsetASCII" ); algorithmicCharsets.put("ISO-8859-1", "com.ibm.icu.charset.Charset88591" ); algorithmicCharsets.put("UTF-16", "com.ibm.icu.charset.CharsetUTF16" ); algorithmicCharsets.put("UTF-16BE", "com.ibm.icu.charset.CharsetUTF16BE" ); algorithmicCharsets.put("UTF-16LE", "com.ibm.icu.charset.CharsetUTF16LE" ); algorithmicCharsets.put("UTF16_OppositeEndian", "com.ibm.icu.charset.CharsetUTF16LE" ); algorithmicCharsets.put("UTF16_PlatformEndian", "com.ibm.icu.charset.CharsetUTF16" ); algorithmicCharsets.put("UTF-32", "com.ibm.icu.charset.CharsetUTF32" ); algorithmicCharsets.put("UTF-32BE", "com.ibm.icu.charset.CharsetUTF32BE" ); algorithmicCharsets.put("UTF-32LE", "com.ibm.icu.charset.CharsetUTF32LE" ); algorithmicCharsets.put("UTF32_OppositeEndian", "com.ibm.icu.charset.CharsetUTF32LE" ); 
algorithmicCharsets.put("UTF32_PlatformEndian", "com.ibm.icu.charset.CharsetUTF32" ); algorithmicCharsets.put("UTF-8", "com.ibm.icu.charset.CharsetUTF8" ); algorithmicCharsets.put("CESU-8", "com.ibm.icu.charset.CharsetCESU8" ); algorithmicCharsets.put("UTF-7", "com.ibm.icu.charset.CharsetUTF7" ); algorithmicCharsets.put("ISCII,version=0", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("ISCII,version=1", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("ISCII,version=2", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("ISCII,version=3", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("ISCII,version=4", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("ISCII,version=5", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("ISCII,version=6", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("ISCII,version=7", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("ISCII,version=8", "com.ibm.icu.charset.CharsetISCII" ); algorithmicCharsets.put("IMAP-mailbox-name", "com.ibm.icu.charset.CharsetUTF7" ); algorithmicCharsets.put("HZ", "com.ibm.icu.charset.CharsetHZ" ); algorithmicCharsets.put("ISO_2022,locale=ja,version=0", "com.ibm.icu.charset.CharsetISO2022" ); algorithmicCharsets.put("ISO_2022,locale=ja,version=1", "com.ibm.icu.charset.CharsetISO2022" ); algorithmicCharsets.put("ISO_2022,locale=ja,version=2", "com.ibm.icu.charset.CharsetISO2022" ); algorithmicCharsets.put("ISO_2022,locale=ja,version=3", "com.ibm.icu.charset.CharsetISO2022" ); algorithmicCharsets.put("ISO_2022,locale=ja,version=4", "com.ibm.icu.charset.CharsetISO2022" ); algorithmicCharsets.put("ISO_2022,locale=zh,version=0", "com.ibm.icu.charset.CharsetISO2022" ); algorithmicCharsets.put("ISO_2022,locale=zh,version=1", "com.ibm.icu.charset.CharsetISO2022" ); algorithmicCharsets.put("ISO_2022,locale=ko,version=0", "com.ibm.icu.charset.CharsetISO2022" ); algorithmicCharsets.put("ISO_2022,locale=ko,version=1", 
"com.ibm.icu.charset.CharsetISO2022" ); } /*public*/ static final Charset getCharset(String icuCanonicalName, String javaCanonicalName, String[] aliases){ String className = (String) algorithmicCharsets.get(icuCanonicalName); if(className==null){ //all the cnv files are loaded as MBCS className = "com.ibm.icu.charset.CharsetMBCS"; } try{ CharsetICU conv = null; Class cs = Class.forName(className); Class[] paramTypes = new Class[]{ String.class, String.class, String[].class}; final Constructor c = cs.getConstructor(paramTypes); Object[] params = new Object[]{ icuCanonicalName, javaCanonicalName, aliases}; // Run constructor try { Object obj = c.newInstance(params); if(obj!=null && obj instanceof CharsetICU){ conv = (CharsetICU)obj; return conv; } }catch (InvocationTargetException e) { throw new UnsupportedCharsetException( icuCanonicalName+": "+"Could not load " + className+ ". Exception:" + e.getTargetException()); } }catch(ClassNotFoundException ex){ }catch(NoSuchMethodException ex){ }catch (IllegalAccessException ex){ }catch (InstantiationException ex){ } throw new UnsupportedCharsetException( icuCanonicalName+": "+"Could not load " + className); } static final boolean isSurrogate(int c){ return (((c)&0xfffff800)==0xd800); } /* * Returns the default charset name */ // static final String getDefaultCharsetName(){ // String defaultEncoding = new InputStreamReader(new ByteArrayInputStream(new byte[0])).getEncoding(); // return defaultEncoding; // } /** * Returns a charset object for the named charset. * This method gurantee that ICU charset is returned when * available. If the ICU charset provider does not support * the specified charset, then try other charset providers * including the standard Java charset provider. 
* * @param charsetName The name of the requested charset, * may be either a canonical name or an alias * @return A charset object for the named charset * @throws IllegalCharsetNameException If the given charset name * is illegal * @throws UnsupportedCharsetException If no support for the * named charset is available in this instance of the Java * virtual machine * @stable ICU 3.6 */ public static Charset forNameICU(String charsetName) throws IllegalCharsetNameException, UnsupportedCharsetException { CharsetProviderICU icuProvider = new CharsetProviderICU(); CharsetICU cs = (CharsetICU) icuProvider.charsetForName(charsetName); if (cs != null) { return cs; } return Charset.forName(charsetName); } // /** // * @see java.lang.Comparable#compareTo(java.lang.Object) // * @stable 3.8 // */ // public int compareTo(Object otherObj) { // if (!(otherObj instanceof CharsetICU)) { // return -1; // } // return icuCanonicalName.compareTo(((CharsetICU)otherObj).icuCanonicalName); // } /** * This follows ucnv.c method ucnv_detectUnicodeSignature() to detect the * start of the stream for example U+FEFF (the Unicode BOM/signature * character) that can be ignored. * * Detects Unicode signature byte sequences at the start of the byte stream * and returns number of bytes of the BOM of the indicated Unicode charset. * 0 is returned when no Unicode signature is recognized. * */ // TODO This should be proposed as CharsetDecoderICU API. // static String detectUnicodeSignature(ByteBuffer source) { // int signatureLength = 0; // number of bytes of the signature // final int SIG_MAX_LEN = 5; // String sigUniCharset = null; // states what unicode charset is the BOM // int i = 0; // // /* // * initial 0xa5 bytes: make sure that if we read Returns the set of Unicode code points that can be converted by an ICU Converter. *

* The current implementation returns only one kind of set (UCNV_ROUNDTRIP_SET): The set of all Unicode code points that can be * roundtrip-converted (converted without any data loss) with the converter. This set will not include code points that have fallback * mappings or are only the result of reverse fallback mappings. See UTR #22 "Character Mapping Markup Language" at http://www.unicode.org/reports/tr22/ *

* In the future, there may be more UConverterUnicodeSet choices to select sets with different properties. *

*

This is useful, for example, for *

  • checking that a string or document can be roundtrip-converted with a converter, * without/before actually performing the conversion
  • *
  • testing if a converter can be used for typical text for a certain locale, * by comparing its roundtrip set with the set of ExemplarCharacters from * ICU's locale data or other sources
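The first use case, checking convertibility before converting, has a rough analogue in the standard library. This is a minimal sketch using java.nio.charset.CharsetEncoder.canEncode rather than getUnicodeSet, so it runs without ICU4J. The class name EncodabilityCheck is hypothetical, and canEncode is only analogous to a roundtrip-set membership test, not identical to it.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of ICU4J.
final class EncodabilityCheck {
    // Returns true if every character of the text can be encoded by the named charset.
    static boolean canEncodeAll(String charsetName, String text) {
        // A fresh encoder is created for each call because encoders are stateful.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }
}
```

With ICU4J, the same question could presumably be answered by filling a UnicodeSet through getUnicodeSet and testing the text against it, which avoids running an encoder over the text.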
* * @param setFillIn A valid UnicodeSet. It will be cleared by this function before * the converter's specific set is filled in. * @param which A selector; currently ROUNDTRIP_SET is the only supported value. * @throws IllegalArgumentException if setFillIn is null or which is not ROUNDTRIP_SET. * @stable ICU 4.0 */ public void getUnicodeSet(UnicodeSet setFillIn, int which){ if( setFillIn == null || which != ROUNDTRIP_SET ){ throw new IllegalArgumentException(); } setFillIn.clear(); getUnicodeSetImpl(setFillIn, which); } static void getNonSurrogateUnicodeSet(UnicodeSet setFillIn){ setFillIn.add(0, 0xd7ff); setFillIn.add(0xe000, 0x10ffff); } static void getCompleteUnicodeSet(UnicodeSet setFillIn){ setFillIn.add(0, 0x10ffff); } } icu4j-4.2/src/com/ibm/icu/charset/CharsetCESU8.java0000644000175000017500000000201711361046170021663 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; import com.ibm.icu.text.UnicodeSet; /** * The purpose of this class is to set isCESU8 to true in the super class, and to allow the Charset framework to open * the variant UTF-8 converter without extra setup work. CESU-8 encodes/decodes supplementary characters as 6 bytes * instead of the proper 4 bytes. 
*/ class CharsetCESU8 extends CharsetUTF8 { public CharsetCESU8(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ getCompleteUnicodeSet(setFillIn); } } icu4j-4.2/src/com/ibm/icu/charset/CharsetISO2022.java0000644000175000017500000045030311361046170022001 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008-2009, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import java.util.Arrays; import com.ibm.icu.charset.CharsetMBCS.CharsetDecoderMBCS; import com.ibm.icu.charset.CharsetMBCS.CharsetEncoderMBCS; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; class CharsetISO2022 extends CharsetICU { private UConverterDataISO2022 myConverterData; private int variant; // one of enum {ISO_2022_JP, ISO_2022_KR, or ISO_2022_CN} private static final byte[] SHIFT_IN_STR = { 0x0f }; // private static final byte[] SHIFT_OUT_STR = { 0x0e }; private static final byte CR = 0x0D; private static final byte LF = 0x0A; /* private static final byte H_TAB = 0x09; private static final byte SPACE = 0x20; */ private static final char HWKANA_START = 0xff61; private static final char HWKANA_END = 0xff9f; /* * 94-character sets with native byte values A1..FE are encoded in ISO 2022 * as bytes 21..7E. (Subtract 0x80.) * 96-character sets with native bit values A0..FF are encoded in ISO 2022 * as bytes 20..7F. (Subtract 0x80.) 
* Do not encode C1 control codes with native bytes 80..9F * as bytes 00..1F (C0 control codes). */ /* private static final char GR94_START = 0xa1; private static final char GR94_END = 0xfe; */ private static final char GR96_START = 0xa0; private static final char GR96_END = 0xff; /* for ISO-2022-JP and -CN implementations */ // typedef enum { /* shared values */ private static final byte INVALID_STATE = -1; private static final byte ASCII = 0; private static final byte SS2_STATE = 0x10; private static final byte SS3_STATE = 0x11; /* JP */ private static final byte ISO8859_1 = 1; private static final byte ISO8859_7 = 2; private static final byte JISX201 = 3; private static final byte JISX208 = 4; private static final byte JISX212 = 5; private static final byte GB2312 = 6; private static final byte KSC5601 = 7; private static final byte HWKANA_7BIT = 8; /* Halfwidth Katakana 7 bit */ /* CN */ /* the first few enum constants must keep their values because they corresponds to myConverterArray[] */ private static final byte GB2312_1 = 1; private static final byte ISO_IR_165= 2; private static final byte CNS_11643 = 3; /* * these are used in StateEnum and ISO2022State variables, * but CNS_11643 must be used to index into myConverterArray[] */ private static final byte CNS_11643_0 = 0x20; private static final byte CNS_11643_1 = 0x21; private static final byte CNS_11643_2 = 0x22; private static final byte CNS_11643_3 = 0x23; private static final byte CNS_11643_4 = 0x24; private static final byte CNS_11643_5 = 0x25; private static final byte CNS_11643_6 = 0x26; private static final byte CNS_11643_7 = 0x27; // } StateEnum; public CharsetISO2022(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); myConverterData = new UConverterDataISO2022(); int versionIndex = icuCanonicalName.indexOf("version="); int version = Integer.decode(icuCanonicalName.substring(versionIndex+8, versionIndex+9)).intValue(); 
myConverterData.version = version; if (icuCanonicalName.indexOf("locale=ja") > 0) { ISO2022InitJP(version); } else if (icuCanonicalName.indexOf("locale=zh") > 0) { ISO2022InitCN(version); } else /* if (icuCanonicalName.indexOf("locale=ko") > 0) */ { ISO2022InitKR(version); } myConverterData.currentEncoder = (CharsetEncoderMBCS)myConverterData.currentConverter.newEncoder(); myConverterData.currentDecoder = (CharsetDecoderMBCS)myConverterData.currentConverter.newDecoder(); } private void ISO2022InitJP(int version) { variant = ISO_2022_JP; maxBytesPerChar = 6; minBytesPerChar = 1; maxCharsPerByte = 1; // open the required converters and cache them if((jpCharsetMasks[version]&CSM(ISO8859_7)) != 0) { myConverterData.myConverterArray[ISO8859_7] = ((CharsetMBCS)CharsetICU.forNameICU("ISO8859_7")).sharedData; } // myConverterData.myConverterArray[JISX201] = ((CharsetMBCS)CharsetICU.forNameICU("jisx-201")).sharedData; myConverterData.myConverterArray[JISX208] = ((CharsetMBCS)CharsetICU.forNameICU("Shift-JIS")).sharedData; if ((jpCharsetMasks[version]&CSM(JISX212)) != 0) { myConverterData.myConverterArray[JISX212] = ((CharsetMBCS)CharsetICU.forNameICU("jisx-212")).sharedData; } if ((jpCharsetMasks[version]&CSM(GB2312)) != 0) { myConverterData.myConverterArray[GB2312] = ((CharsetMBCS)CharsetICU.forNameICU("ibm-5478")).sharedData; } if ((jpCharsetMasks[version]&CSM(KSC5601)) != 0) { myConverterData.myConverterArray[KSC5601] = ((CharsetMBCS)CharsetICU.forNameICU("ksc_5601")).sharedData; } // create a generic CharsetMBCS object myConverterData.currentConverter = (CharsetMBCS)CharsetICU.forNameICU("icu-internal-25546"); } private void ISO2022InitCN(int version) { variant = ISO_2022_CN; maxBytesPerChar = 8; minBytesPerChar = 1; maxCharsPerByte = 1; // open the required coverters and cache them. 
myConverterData.myConverterArray[GB2312_1] = ((CharsetMBCS)CharsetICU.forNameICU("ibm-5478")).sharedData; if (version == 1) { myConverterData.myConverterArray[ISO_IR_165] = ((CharsetMBCS)CharsetICU.forNameICU("iso-ir-165")).sharedData; } myConverterData.myConverterArray[CNS_11643] = ((CharsetMBCS)CharsetICU.forNameICU("cns-11643-1992")).sharedData; // create a generic CharsetMBCS object myConverterData.currentConverter = (CharsetMBCS)CharsetICU.forNameICU("icu-internal-25546"); } private void ISO2022InitKR(int version) { variant = ISO_2022_KR; maxBytesPerChar = 3; minBytesPerChar = 1; maxCharsPerByte = 1; if (version == 1) { myConverterData.currentConverter = (CharsetMBCS)CharsetICU.forNameICU("icu-internal-25546"); myConverterData.currentConverter.subChar1 = fromUSubstitutionChar[0][0]; } else { myConverterData.currentConverter = (CharsetMBCS)CharsetICU.forNameICU("ibm-949"); } myConverterData.currentEncoder = (CharsetEncoderMBCS)myConverterData.currentConverter.newEncoder(); myConverterData.currentDecoder = (CharsetDecoderMBCS)myConverterData.currentConverter.newDecoder(); } /* * ISO 2022 control codes must not be converted from Unicode * because they would mess up the byte stream. * The bit mask 0x0800c000 has bits set at bit positions 0xe, 0xf, 0x1b * corresponding to SO, SI, and ESC. */ private static boolean IS_2022_CONTROL(int c) { return (((c)<0x20) && ((((int)1<= 0xa1a1) && ((short)(value&UConverterConstants.UNSIGNED_BYTE_MASK) <= 0xfe && ((short)(value&UConverterConstants.UNSIGNED_BYTE_MASK) >= 0xa1))) { return (value - 0x8080); /* shift down to 21..7e byte range */ } else { return 0; /* not valid for ISO 2022 */ } } /* * Commented out because Ticket 5691: Call sites now check for validity. They can just += 0x8080 after that. * * This method does the reverse of _2022FromGR94DBCS(). Given the 2022 code point, it returns the * 2 byte value that is in the range A1..FE for each byte. Otherwise it returns the 2022 code point * unchanged. 
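The range check and the 0x8080 shift used by the GR94 DBCS helpers can be sketched in isolation. This is a minimal standalone sketch, not the ICU implementation: the class name Gr94Shift and its method names are hypothetical, and the literal 0xff stands in for UConverterConstants.UNSIGNED_BYTE_MASK, which is assumed here to mask the low byte.

```java
// Hypothetical standalone helper, not part of ICU4J.
final class Gr94Shift {
    // True if both bytes of the two-byte value lie in the GR range A1..FE.
    private static boolean isGR94Pair(int value) {
        int low = value & 0xff; // assumed equivalent of UNSIGNED_BYTE_MASK
        return value >= 0xa1a1 && value <= 0xfefe && low >= 0xa1 && low <= 0xfe;
    }

    // Shifts a GR pair (A1..FE per byte) down to the ISO 2022 range 21..7E.
    // Returns 0 when the value is not valid for ISO 2022.
    static int fromGR94DBCS(int value) {
        return isGR94Pair(value) ? value - 0x8080 : 0;
    }

    // Reverse direction: shifts an ISO 2022 pair up by 0x8080.
    // Returns the input unchanged when the shifted value is out of range.
    static int toGR94DBCS(int value) {
        int shifted = value + 0x8080;
        return isGR94Pair(shifted) ? shifted : value;
    }
}
```

For instance, the GR pair 0xB0A1 maps to 0x3021, while 0xB080 is rejected because its low byte falls outside A1..FE.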
* private static int _2022ToGR94DBCS(int value) { int returnValue = value + 0x8080; if ((returnValue <= 0xfefe && returnValue >= 0xa1a1) && ((short)(returnValue&UConverterConstants.UNSIGNED_BYTE_MASK) <= 0xfe && ((short)(returnValue&UConverterConstants.UNSIGNED_BYTE_MASK) >= 0xa1))) { return returnValue; } else { return value; } }*/ /* is the StateEnum charset value for a DBCS charset? */ private static boolean IS_JP_DBCS(byte cs) { return ((JISX208 <= cs) && (cs <= KSC5601)); } private static short CSM(short cs) { return (short)(1<= 0x10000 && (sharedData.mbcs.unicodeMask&UConverterConstants.HAS_SUPPLEMENTARY) == 0) { return 0; } /* convert the Unicode code point in c into codepage bytes */ table = sharedData.mbcs.fromUnicodeTable; /* get the byte for the output */ value = CharsetMBCS.MBCS_SINGLE_RESULT_FROM_U(table, sharedData.mbcs.fromUnicodeBytes, c); /* get the byte for the output */ retval[0] = value & 0xff; if (value >= 0xf00) { return 1; /* roundtrip */ } else if (useFallback ? value>=0x800 : value>=0xc00) { return -1; /* fallback taken */ } else { return 0; /* no mapping */ } } /* * Each of these charset masks (with index x) contains a bit for a charset in exact correspondence * to whether that charset is used in the corresponding version x of ISO_2022, locale=ja,version=x * * Note: The converter uses some leniency: * - The escape sequence ESC ( I for half-width 7-bit Katakana is recognized in * all versions, not just JIS7 and JIS8. * - ICU does not distinguish between different version so of JIS X 0208. 
*/ private static final short jpCharsetMasks[] = { (short)(CSM(ASCII)|CSM(JISX201)|CSM(JISX208)|CSM(HWKANA_7BIT)), (short)(CSM(ASCII)|CSM(JISX201)|CSM(JISX208)|CSM(HWKANA_7BIT)|CSM(JISX212)), (short)(CSM(ASCII)|CSM(JISX201)|CSM(JISX208)|CSM(HWKANA_7BIT)|CSM(JISX212)|CSM(GB2312)|CSM(KSC5601)|CSM(ISO8859_1)|CSM(ISO8859_7)), (short)(CSM(ASCII)|CSM(JISX201)|CSM(JISX208)|CSM(HWKANA_7BIT)|CSM(JISX212)|CSM(GB2312)|CSM(KSC5601)|CSM(ISO8859_1)|CSM(ISO8859_7)), (short)(CSM(ASCII)|CSM(JISX201)|CSM(JISX208)|CSM(HWKANA_7BIT)|CSM(JISX212)|CSM(GB2312)|CSM(KSC5601)|CSM(ISO8859_1)|CSM(ISO8859_7)) }; /* // typedef enum { private static final byte ASCII1 = 0; private static final byte LATIN1 = 1; private static final byte SBCS = 2; private static final byte DBCS = 3; private static final byte MBCS = 4; private static final byte HWKANA = 5; // } Cnv2002Type; */ private class ISO2022State { private byte []cs; /* Charset number for SI (G0)/SO (G1)/SS2 (G2)/SS3 (G3) */ private byte g; /* 0..3 for G0..G3 (SI/SO/SS2/SS3) */ private byte prevG; /* g before single shift (SS2 or SS3) */ ISO2022State() { cs = new byte[4]; } void reset() { Arrays.fill(cs, (byte)0); g = 0; prevG = 0; } } // private static final byte UCNV_OPTIONS_VERSION_MASK = 0xf; private static final byte UCNV_2022_MAX_CONVERTERS = 10; private class UConverterDataISO2022 { UConverterSharedData []myConverterArray; CharsetEncoderMBCS currentEncoder; CharsetDecoderMBCS currentDecoder; CharsetMBCS currentConverter; int currentType; // Cnv2022Type; ISO2022State toU2022State; ISO2022State fromU2022State; int key; int version; boolean isEmptySegment; UConverterDataISO2022() { myConverterArray = new UConverterSharedData[UCNV_2022_MAX_CONVERTERS]; toU2022State = new ISO2022State(); fromU2022State = new ISO2022State(); currentType = 0; key = 0; version = 0; isEmptySegment = false; } void reset() { toU2022State.reset(); fromU2022State.reset(); isEmptySegment = false; } } private static final byte ESC_2022 = 0x1B; /* ESC */ // typedef 
enum { private static final byte INVALID_2022 = -1; /* Doesn't correspond to a valid iso 2022 escape sequence */ private static final byte VALID_NON_TERMINAL_2022 = 0; /* so far corresponds to a valid iso 2022 escape sequence */ private static final byte VALID_TERMINAL_2022 = 1; /* corresponds to a valid iso 2022 escape sequence */ private static final byte VALID_MAYBE_TERMINAL_2022 = 2; /* so far matches one iso 2022 escape sequence, but by adding more characters might match another escape sequence */ // } UCNV_TableStates_2022; /* * The way these state transition arrays work is: * ex : ESC$B is the sequence for JISX208 * a) First Iteration: char is ESC * i) Get the value of ESC from normalize_esq_chars_2022[] with int value of ESC as index * int x = normalize_esq_chars_2022[27] which is equal to 1 * ii) Search for this value in escSeqStateTable_Key_2022[] * value of x is stored at escSeqStateTable_Key_2022[0] * iii) Save this index as offset * iv) Get state of this sequence from escSeqStateTable_Value_2022[] * escSeqStateTable_value_2022[offset], which is VALID_NON_TERMINAL_2022 * b) Switch on this state and continue to next char * i) Get the value of $ from normalize_esq_chars_2022[] with int value of $ as index * which is normalize_esq_chars_2022[36] == 4 * ii) x is currently 1(from above) * x<<=5 -- x is now 32 * x+=normalize_esq_chars_2022[36] * now x is 36 * iii) Search for this value in escSeqStateTable_Key_2022[] * value of x is stored at escSeqStateTable_Key_2022[2], so offset is 2 * iv) Get state of this sequence from escSeqStateTable_Value_2022[] * escSeqStateTable_Value_2022[offset], which is VALID_NON_TERMINAL_2022 * c) Switch on this state and continue to next char * i) Get the value of B from normalize_esq_chars_2022[] with int value of B as index * ii) x is currently 36 (from above) * x<<=5 -- x is now 1152 * x+= normalize_esq_chars_2022[66] * now x is 1161 * iii) Search for this value in escSeqStateTable_Key_2022[] * value of x is stored at 
escSeqStateTable_Key_2022[21], so offset is 21 * iv) Get state of this sequence from escSeqStateTable_Value_2022[1] * escSeqStateTable_Value_2022[offset], which is VALID_TERMINAL_2022 * v) Get the converter name from escSeqStateTable_Result_2022[21] which is JISX208 */ /* Below are the 3 arrays depicting a state transition table */ private static final byte normalize_esq_chars_2022[] = { /* 0 1 2 3 4 5 6 7 8 9 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 4, 7, 29, 0, 2, 24, 26, 27, 0, 3, 23, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 28, 0, 0, 21, 0, 0, 0, 0, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; private static final short MAX_STATES_2022 = 74; private static final int escSeqStateTable_Key_2022[/* MAX_STATES_2022 */] = { /* 0 1 2 3 4 5 6 7 8 9 */ 1, 34, 36, 39, 55, 57, 60, 61, 1093, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1109, 1154, 1157, 1160, 1161, 1176, 1178, 1179, 1254, 1257, 1768, 1773, 1957, 35105, 36933, 36936, 36937, 36938, 36939, 36940, 36942, 36943, 36944, 36945, 36946, 36947, 36948, 37640, 37642, 37644, 37646, 37711, 37744, 37745, 37746, 37747, 37748, 40133, 40136, 40138, 40139, 40140, 40141, 1123363, 35947624, 35947625, 35947626, 35947627, 35947629, 35947630, 35947631, 35947635, 35947636, 35947638 }; private static final byte escSeqStateTable_Value_2022[/* MAX_STATES_2022 */] = { /* 0 1 2 3 4 */ 
VALID_NON_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_MAYBE_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_NON_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022, VALID_TERMINAL_2022 }; /* Type def for refactoring changeState_2022 code */ // typedef enum { private static final byte ISO_2022_JP = 1; private static final byte ISO_2022_KR = 2; private static final byte ISO_2022_CN = 3; // } Variant2022; /* const UConverterSharedData _ISO2022Data; */ //private UConverterSharedData _ISO2022JPData; //private UConverterSharedData _ISO2022KRData; //private 
UConverterSharedData _ISO2022CNData; /******************** to unicode ********************/ /**************************************************** * Recognized escape sequenes are * (B ASCII * .A ISO-8859-1 * .F ISO-8859-7 * (J JISX-201 * (I JISX-201 * $B JISX-208 * $@ JISX-208 * $(D JISX-212 * $A GB2312 * $(C KSC5601 */ private final static byte nextStateToUnicodeJP[/* MAX_STATES_2022 */] = { /* 0 1 2 3 4 5 6 7 8 9 */ INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, SS2_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, ASCII, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, JISX201, HWKANA_7BIT, JISX201, INVALID_STATE, INVALID_STATE, INVALID_STATE, JISX208, GB2312, JISX208, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, ISO8859_1, ISO8859_7, JISX208, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, KSC5601, JISX212, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE }; private final static byte nextStateToUnicodeCN[/* MAX_STATES_2022 */] = { /* 0 1 2 3 4 5 6 7 8 9 */ INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, SS2_STATE, SS3_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, 
INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, GB2312_1, INVALID_STATE, ISO_IR_165, CNS_11643_1, CNS_11643_2, CNS_11643_3, CNS_11643_4, CNS_11643_5, CNS_11643_6, CNS_11643_7, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE }; /* runs through a state machine to determine the escape sequence - codepage correspondence */ private CoderResult changeState_2022(CharsetDecoderICU decoder, ByteBuffer source, int var) { CoderResult err = CoderResult.UNDERFLOW; boolean DONE = false; byte value; int key[] = {myConverterData.key}; int offset[] = {0}; int initialToULength = decoder.toULength; byte c; int malformLength = 0; value = VALID_NON_TERMINAL_2022; while (source.hasRemaining()) { c = source.get(); malformLength++; decoder.toUBytesArray[decoder.toULength++] = c; value = getKey_2022(c, key, offset); switch(value) { case VALID_NON_TERMINAL_2022: /* continue with the loop */ break; case VALID_TERMINAL_2022: key[0] = 0; DONE = true; break; case INVALID_2022: DONE = true; break; case VALID_MAYBE_TERMINAL_2022: /* not ISO_2022 itself, finish here */ value = VALID_TERMINAL_2022; key[0] = 0; DONE = true; break; } if (DONE) { break; } } // DONE: myConverterData.key = key[0]; if (value == VALID_NON_TERMINAL_2022) { /* indicate that the escape sequence is incomplete: key !=0 */ return err; } else if (value == INVALID_2022) { err = CoderResult.malformedForLength(malformLength); } else /* value == VALID_TERMINAL_2022 */ { switch (var) { case ISO_2022_JP: { byte tempState = nextStateToUnicodeJP[offset[0]]; switch (tempState) { case INVALID_STATE: err = 
CoderResult.malformedForLength(malformLength); break; case SS2_STATE: if (myConverterData.toU2022State.cs[2] != 0) { if (myConverterData.toU2022State.g < 2) { myConverterData.toU2022State.prevG = myConverterData.toU2022State.g; } myConverterData.toU2022State.g = 2; } else { /* illegal to have SS2 before a matching designator */ err = CoderResult.malformedForLength(malformLength); } break; /* case SS3_STATE: not used in ISO-2022-JP-x */ case ISO8859_1: case ISO8859_7: if ((jpCharsetMasks[myConverterData.version] & CSM(tempState)) == 0) { err = CoderResult.unmappableForLength(malformLength); } else { /* G2 charset for SS2 */ myConverterData.toU2022State.cs[2] = tempState; } break; default: if ((jpCharsetMasks[myConverterData.version] & CSM(tempState)) == 0) { err = CoderResult.unmappableForLength(source.position() - 1); } else { /* G0 charset */ myConverterData.toU2022State.cs[0] = tempState; } break; } // end of switch break; } case ISO_2022_CN: { byte tempState = nextStateToUnicodeCN[offset[0]]; switch (tempState) { case INVALID_STATE: err = CoderResult.unmappableForLength(malformLength); break; case SS2_STATE: if (myConverterData.toU2022State.cs[2] != 0) { if (myConverterData.toU2022State.g < 2) { myConverterData.toU2022State.prevG = myConverterData.toU2022State.g; } myConverterData.toU2022State.g = 2; } else { /* illegal to have SS2 before a matching designator */ err = CoderResult.malformedForLength(malformLength); } break; case SS3_STATE: if (myConverterData.toU2022State.cs[3] != 0) { if (myConverterData.toU2022State.g < 2) { myConverterData.toU2022State.prevG = myConverterData.toU2022State.g; } myConverterData.toU2022State.g = 3; } else { /* illegal to have SS3 before a matching designator */ err = CoderResult.malformedForLength(malformLength); } break; case ISO_IR_165: if (myConverterData.version == 0) { err = CoderResult.unmappableForLength(malformLength); break; } /* fall through */ case GB2312_1: /* fall through */ case CNS_11643_1: 
myConverterData.toU2022State.cs[1] = tempState; break; case CNS_11643_2: myConverterData.toU2022State.cs[2] = tempState; break; default: /* other CNS 11643 planes */ if (myConverterData.version == 0) { err = CoderResult.unmappableForLength(source.position() - 1); } else { myConverterData.toU2022State.cs[3] = tempState; } break; } //end of switch } break; case ISO_2022_KR: if (offset[0] == 0x30) { /* nothing to be done, just accept this one escape sequence */ } else { err = CoderResult.unmappableForLength(malformLength); } break; default: err = CoderResult.malformedForLength(malformLength); break; } // end of switch } if (!err.isError()) { decoder.toULength = 0; } else if (err.isMalformed()) { if (decoder.toULength > 1) { /* * Ticket 5691: consistent illegal sequences: * - We include at least the first byte (ESC) in the illegal sequence. * - If any of the non-initial bytes could be the start of a character, * we stop the illegal sequence before the first one of those. * In escape sequences, all following bytes are "printable", that is, * unless they are completely illegal (>7f in SBCS, outside 21..7e in DBCS), * they are valid single/lead bytes. * For simplicity, we always only report the initial ESC byte as the * illegal sequence and back out all other bytes we looked at. */ /* Back out some bytes. */ int backOutDistance = decoder.toULength - 1; int bytesFromThisBuffer = decoder.toULength - initialToULength; if (backOutDistance <= bytesFromThisBuffer) { /* same as initialToULength<=1 */ source.position(source.position() - backOutDistance); } else { /* Back out bytes from the previous buffer: Need to replay them. */ decoder.preToULength = (byte)(bytesFromThisBuffer - backOutDistance); /* same as -(initialToULength-1) */ /* preToULength is negative! 
*/ for (int i = 0; i < -(decoder.preToULength); i++) { decoder.preToUArray[i] = decoder.toUBytesArray[i+1]; } source.position(source.position() - bytesFromThisBuffer); } decoder.toULength = 1; } } return err; } private static byte getKey_2022(byte c, int[]key, int[]offset) { int togo; int low = 0; int hi = MAX_STATES_2022; int oldmid = 0; togo = normalize_esq_chars_2022[(short)c&UConverterConstants.UNSIGNED_BYTE_MASK]; if (togo == 0) { /* not a valid character anywhere in an escape sequence */ key[0] = 0; offset[0] = 0; return INVALID_2022; } togo = (key[0] << 5) + togo; while (hi != low) { /* binary search */ int mid = (hi+low) >> 1; /* Finds median */ if (mid == oldmid) { break; } if (escSeqStateTable_Key_2022[mid] > togo) { hi = mid; } else if (escSeqStateTable_Key_2022[mid] < togo) { low = mid; } else /* we found it */ { key[0] = togo; offset[0] = mid; return escSeqStateTable_Value_2022[mid]; } oldmid = mid; } return INVALID_2022; } /* * To Unicode Callback helper function */ private static CoderResult toUnicodeCallback(CharsetDecoderICU cnv, int sourceChar, int targetUniChar) { CoderResult err = CoderResult.UNDERFLOW; if (sourceChar > 0xff) { cnv.toUBytesArray[0] = (byte)(sourceChar>>8); cnv.toUBytesArray[1] = (byte)sourceChar; cnv.toULength = 2; } else { cnv.toUBytesArray[0] = (byte)sourceChar; cnv.toULength = 1; } if (targetUniChar == (UConverterConstants.missingCharMarker-1/* 0xfffe */)) { err = CoderResult.unmappableForLength(1); } else { err = CoderResult.malformedForLength(1); } return err; } /****************************ISO-2022-JP************************************/ private class CharsetDecoderISO2022JP extends CharsetDecoderICU { public CharsetDecoderISO2022JP(CharsetICU cs) { super(cs); } protected void implReset() { super.implReset(); myConverterData.reset(); } /* * Map 00..7F to Unicode according to JIS X 0201. 
* */ private int jisx201ToU(int value) { if (value < 0x5c) { return value; } else if (value == 0x5c) { return 0xa5; } else if (value == 0x7e) { return 0x203e; } else { /* value <= 0x7f */ return value; } } /* * Convert a pair of JIS X 208 21..7E bytes to Shift-JIS. * If either byte is outside 21..7E make sure that the result is not valid * for Shift-JIS so that the converter catches it. * Some invalid byte values already turn into equally invalid Shift-JIS * byte values and need not be tested explicitly. */ private void _2022ToSJIS(char c1, char c2, byte []bytes) { if ((c1&1) > 0) { ++c1; if (c2 <= 0x5f) { c2 += 0x1f; } else if (c2 <= 0x7e) { c2 += 0x20; } else { c2 = 0; /* invalid */ } } else { if ((c2 >= 0x21) && (c2 <= 0x7e)) { c2 += 0x7e; } else { c2 = 0; /* invalid */ } } c1 >>=1; if (c1 <= 0x2f) { c1 += 0x70; } else if (c1 <= 0x3f) { c1 += 0xb0; } else { c1 = 0; /* invalid */ } bytes[0] = (byte)(UConverterConstants.UNSIGNED_BYTE_MASK & c1); bytes[1] = (byte)(UConverterConstants.UNSIGNED_BYTE_MASK & c2); } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { boolean gotoGetTrail = false; boolean gotoEscape = false; CoderResult err = CoderResult.UNDERFLOW; byte []tempBuf = new byte[2]; int targetUniChar = 0x0000; int mySourceChar = 0x0000; int mySourceCharTemp = 0x0000; // use for getTrail label call. byte cs; /* StateEnum */ byte csTemp= 0; // use for getTrail label call. 
if (myConverterData.key != 0) { /* continue with a partial escape sequence */ // goto escape; gotoEscape = true; } else if (toULength == 1 && source.hasRemaining() && target.hasRemaining()) { /* continue with a partial double-byte character */ mySourceChar = (toUBytesArray[0] & UConverterConstants.UNSIGNED_BYTE_MASK); toULength = 0; cs = myConverterData.toU2022State.cs[myConverterData.toU2022State.g]; // goto getTrailByte; mySourceCharTemp = 0x99; gotoGetTrail = true; } while (source.hasRemaining() || gotoEscape || gotoGetTrail) { // This code is here for the goto escape label call above. if (gotoEscape) { mySourceCharTemp = ESC_2022; } targetUniChar = UConverterConstants.missingCharMarker; if (gotoEscape || gotoGetTrail || target.hasRemaining()) { if (!gotoEscape && !gotoGetTrail) { mySourceChar = source.get() & UConverterConstants.UNSIGNED_BYTE_MASK; mySourceCharTemp = mySourceChar; } switch (mySourceCharTemp) { case UConverterConstants.SI: if (myConverterData.version == 3) { myConverterData.toU2022State.g = 0; continue; } else { /* only JIS7 uses SI/SO, not ISO-2022-JP-x */ myConverterData.isEmptySegment = false; break; } case UConverterConstants.SO: if (myConverterData.version == 3) { /* JIS7: switch to G1 half-width Katakana */ myConverterData.toU2022State.cs[1] = HWKANA_7BIT; myConverterData.toU2022State.g = 1; continue; } else { /* only JIS7 uses SI/SO, not ISO-2022-JP-x */ myConverterData.isEmptySegment = false; /* reset this, we have a different error */ break; } case ESC_2022: if (!gotoEscape) { source.position(source.position() - 1); } else { gotoEscape = false; } // escape: { int mySourceBefore = source.position(); int toULengthBefore = this.toULength; err = changeState_2022(this, source, variant); /* If in ISO-2022-JP only and we successfully completed an escape sequence, but previous segment was empty, create an error */ if(myConverterData.version == 0 && myConverterData.key == 0 && !err.isError() && myConverterData.isEmptySegment) { err = 
CoderResult.malformedForLength(source.position() - mySourceBefore); this.toULength = toULengthBefore + (source.position() - mySourceBefore); } } /* invalid or illegal escape sequence */ if(err.isError()){ myConverterData.isEmptySegment = false; /* Reset to avoid future spurious errors */ return err; } /* If we successfully completed an escape sequence, we begin a new segment, empty so far */ if(myConverterData.key == 0) { myConverterData.isEmptySegment = true; } continue; /* ISO-2022-JP does not use single-byte (C1) SS2 and SS3 */ case CR: /* falls through */ case LF: /* automatically reset to single-byte mode */ if (myConverterData.toU2022State.cs[0] != ASCII && myConverterData.toU2022State.cs[0] != JISX201) { myConverterData.toU2022State.cs[0] = ASCII; } myConverterData.toU2022State.cs[2] = 0; myConverterData.toU2022State.g = 0; /* falls through */ default : /* convert one or two bytes */ myConverterData.isEmptySegment = false; cs = myConverterData.toU2022State.cs[myConverterData.toU2022State.g]; csTemp = cs; if (gotoGetTrail) { csTemp = (byte)0x99; } if (!gotoGetTrail && ((mySourceChar >= 0xa1) && (mySourceChar <= 0xdf) && myConverterData.version == 4 && !IS_JP_DBCS(cs))) { /* 8-bit halfwidth katakana in any single-byte mode for JIS8 */ targetUniChar = mySourceChar + (HWKANA_START - 0xa1); /* return from a single-shift state to the previous one */ if (myConverterData.toU2022State.g >= 2) { myConverterData.toU2022State.g = myConverterData.toU2022State.prevG; } } else { switch(csTemp) { case ASCII: if (mySourceChar <= 0x7f) { targetUniChar = mySourceChar; } break; case ISO8859_1: if (mySourceChar <= 0x7f) { targetUniChar = mySourceChar + 0x80; } /* return from a single-shift state to the previous one */ myConverterData.toU2022State.g = myConverterData.toU2022State.prevG; break; case ISO8859_7: if (mySourceChar <= 0x7f) { /* convert mySourceChar+0x80 to use a normal 8-bit table */ targetUniChar = 
CharsetMBCS.MBCS_SINGLE_SIMPLE_GET_NEXT_BMP(myConverterData.myConverterArray[cs].mbcs, mySourceChar+0x80); } /* return from a single-shift state to the previous one */ myConverterData.toU2022State.g = myConverterData.toU2022State.prevG; break; case JISX201: if (mySourceChar <= 0x7f) { targetUniChar = jisx201ToU(mySourceChar); } break; case HWKANA_7BIT: if ((mySourceChar >= 0x21) && (mySourceChar <= 0x5f)) { /* 7-bit halfwidth Katakana */ targetUniChar = mySourceChar + (HWKANA_START - 0x21); break; } default : /* G0 DBCS */ if (gotoGetTrail || source.hasRemaining()) { // getTrailByte: int tmpSourceChar; gotoGetTrail = false; short trailByte; boolean leadIsOk, trailIsOk; trailByte = (short)(source.get(source.position()) & UConverterConstants.UNSIGNED_BYTE_MASK); /* * Ticket 5691: consistent illegal sequences: * - We include at least the first byte in the illegal sequence. * - If any of the non-initial bytes could be the start of a character, * we stop the illegal sequence before the first one of those. * * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is * an ESC/SO/SI, we report only the first byte as the illegal sequence. * Otherwise we convert or report the pair of bytes. */ leadIsOk = (short)(UConverterConstants.UNSIGNED_BYTE_MASK & (mySourceChar - 0x21)) <= (0x7e - 0x21); trailIsOk = (short)(UConverterConstants.UNSIGNED_BYTE_MASK & (trailByte - 0x21)) <= (0x7e - 0x21); if (leadIsOk && trailIsOk) { source.get(); tmpSourceChar = (mySourceChar << 8) | trailByte; if (cs == JISX208) { _2022ToSJIS((char)mySourceChar, (char)trailByte, tempBuf); mySourceChar = tmpSourceChar; } else { /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. 
*/ mySourceChar = tmpSourceChar; if (cs == KSC5601) { tmpSourceChar += 0x8080; /* = _2022ToGR94DBCS(tmpSourceChar) */ } tempBuf[0] = (byte)(UConverterConstants.UNSIGNED_BYTE_MASK & (tmpSourceChar >> 8)); tempBuf[1] = (byte)(UConverterConstants.UNSIGNED_BYTE_MASK & tmpSourceChar); } targetUniChar = MBCSSimpleGetNextUChar(myConverterData.myConverterArray[cs], ByteBuffer.wrap(tempBuf), false); } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { /* report a pair of illegal bytes if the second byte is not a DBCS starter */ source.get(); /* add another bit so that the code below writes 2 bytes in case of error */ mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte; } } else { toUBytesArray[0] = (byte)mySourceChar; toULength = 1; // goto endloop return err; } } /* end of inner switch */ } break; } /* end of outer switch */ if (targetUniChar < (UConverterConstants.missingCharMarker-1/*0xfffe*/)) { if (offsets != null) { offsets.put(target.remaining(), source.remaining() - (mySourceChar <= 0xff ? 1 : 2)); } target.put((char)targetUniChar); } else if (targetUniChar > UConverterConstants.missingCharMarker) { /* disassemble the surrogate pair and write to output */ targetUniChar -= 0x0010000; target.put((char)(0xd800 + (char)(targetUniChar>>10))); target.position(target.position()-1); if (offsets != null) { offsets.put(target.remaining(), source.remaining() - (mySourceChar <= 0xff ? 1 : 2)); } target.get(); if (target.hasRemaining()) { target.put((char)(0xdc00+(char)(targetUniChar&0x3ff))); target.position(target.position()-1); if (offsets != null) { offsets.put(target.remaining(), source.remaining() - (mySourceChar <= 0xff ? 
1 : 2)); } target.get(); } else { charErrorBufferArray[charErrorBufferLength++] = (char)(0xdc00+(char)(targetUniChar&0x3ff)); } } else { /* Call the callback function */ err = toUnicodeCallback(this, mySourceChar, targetUniChar); break; } } else { /* goes with "if (target.hasRemaining())" way up near the top of the function */ err = CoderResult.OVERFLOW; break; } } //endloop: return err; } } // end of class CharsetDecoderISO2022JP /****************************ISO-2022-CN************************************/ private class CharsetDecoderISO2022CN extends CharsetDecoderICU { public CharsetDecoderISO2022CN(CharsetICU cs) { super(cs); } protected void implReset() { super.implReset(); myConverterData.reset(); } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; byte[] tempBuf = new byte[3]; int targetUniChar = 0x0000; int mySourceChar = 0x0000; int mySourceCharTemp = 0x0000; boolean gotoEscape = false; boolean gotoGetTrailByte = false; if (myConverterData.key != 0) { /* continue with a partial escape sequence */ // goto escape; gotoEscape = true; } else if (toULength == 1 && source.hasRemaining() && target.hasRemaining()) { /* continue with a partial double-byte character */ mySourceChar = (toUBytesArray[0] & UConverterConstants.UNSIGNED_BYTE_MASK); toULength = 0; targetUniChar = UConverterConstants.missingCharMarker; // goto getTrailByte gotoGetTrailByte = true; } while (source.hasRemaining() || gotoGetTrailByte || gotoEscape) { targetUniChar = UConverterConstants.missingCharMarker; if (target.hasRemaining() || gotoEscape) { if (gotoEscape) { mySourceChar = ESC_2022; // goto escape label mySourceCharTemp = mySourceChar; } else if (gotoGetTrailByte) { mySourceCharTemp = 0xff; // goto getTrailByte; set mySourceCharTemp to go to default } else { mySourceChar = UConverterConstants.UNSIGNED_BYTE_MASK & source.get(); mySourceCharTemp = mySourceChar; } switch (mySourceCharTemp) { 
case UConverterConstants.SI: myConverterData.toU2022State.g = 0; if (myConverterData.isEmptySegment) { myConverterData.isEmptySegment = false; /* we are handling it, reset to avoid future spurious errors */ err = CoderResult.malformedForLength(1); this.toUBytesArray[0] = (byte)mySourceChar; this.toULength = 1; return err; } continue; case UConverterConstants.SO: if (myConverterData.toU2022State.cs[1] != 0) { myConverterData.toU2022State.g = 1; myConverterData.isEmptySegment = true; /* Begin a new segment, empty so far */ continue; } else { /* illegal to have SO before a matching designator */ myConverterData.isEmptySegment = false; /* Handling a different error, reset this to avoid future spurious errors */ break; } case ESC_2022: if (!gotoEscape) { source.position(source.position()-1); } // escape label gotoEscape = false; { int mySourceBefore = source.position(); int toULengthBefore = this.toULength; err = changeState_2022(this, source, ISO_2022_CN); /* After SO there must be at least one character before a designator (designator error handled separately) */ if(myConverterData.key == 0 && !err.isError() && myConverterData.isEmptySegment) { err = CoderResult.malformedForLength(source.position() - mySourceBefore); this.toULength = toULengthBefore + (source.position() - mySourceBefore); } } /* invalid or illegal escape sequence */ if(err.isError()){ myConverterData.isEmptySegment = false; /* Reset to avoid future spurious errors */ return err; } continue; /*ISO-2022-CN does not use single-byte (C1) SS2 and SS3 */ case CR: /* falls through */ case LF: myConverterData.toU2022State.reset(); /* falls through */ default: /* convert one or two bytes */ myConverterData.isEmptySegment = false; if (myConverterData.toU2022State.g != 0 || gotoGetTrailByte) { if (source.hasRemaining() || gotoGetTrailByte) { UConverterSharedData cnv; byte tempState; int tempBufLen; boolean leadIsOk, trailIsOk; short trailByte; // getTrailByte: label gotoGetTrailByte = false; // reset 
gotoGetTrailByte trailByte = (short)(source.get(source.position()) & UConverterConstants.UNSIGNED_BYTE_MASK); /* * Ticket 5691: consistent illegal sequences: * - We include at least the first byte in the illegal sequence. * - If any of the non-initial bytes could be the start of a character, * we stop the illegal sequence before the first one of those. * * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is * an ESC/SO/SI, we report only the first byte as the illegal sequence. * Otherwise we convert or report the pair of bytes. */ leadIsOk = (short)(UConverterConstants.UNSIGNED_BYTE_MASK & (mySourceChar - 0x21)) <= (0x7e - 0x21); trailIsOk = (short)(UConverterConstants.UNSIGNED_BYTE_MASK & (trailByte - 0x21)) <= (0x7e - 0x21); if (leadIsOk && trailIsOk) { source.get(); tempState = myConverterData.toU2022State.cs[myConverterData.toU2022State.g]; if (tempState > CNS_11643_0) { cnv = myConverterData.myConverterArray[CNS_11643]; tempBuf[0] = (byte)(0x80 + (tempState - CNS_11643_0)); tempBuf[1] = (byte)mySourceChar; tempBuf[2] = (byte)trailByte; tempBufLen = 3; } else { cnv = myConverterData.myConverterArray[tempState]; tempBuf[0] = (byte)mySourceChar; tempBuf[1] = (byte)trailByte; tempBufLen = 2; } ByteBuffer tempBuffer = ByteBuffer.wrap(tempBuf); tempBuffer.limit(tempBufLen); targetUniChar = MBCSSimpleGetNextUChar(cnv, tempBuffer, false); mySourceChar = (mySourceChar << 8) | trailByte; } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { /* report a pair of illegal bytes if the second byte is not a DBCS starter */ source.get(); /* add another bit so that the code below writes 2 bytes in case of error */ mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte; } if (myConverterData.toU2022State.g >= 2) { /* return from a single-shift state to the previous one */ myConverterData.toU2022State.g = myConverterData.toU2022State.prevG; } } else { toUBytesArray[0] = (byte)mySourceChar; toULength = 1; // goto endloop; return err; } } else { if 
(mySourceChar <= 0x7f) { targetUniChar = (char)mySourceChar; } } break; } if ((UConverterConstants.UNSIGNED_INT_MASK&targetUniChar) < (UConverterConstants.UNSIGNED_INT_MASK&(UConverterConstants.missingCharMarker-1))) { if (offsets != null) { offsets.array()[target.position()] = source.remaining() - (mySourceChar <= 0xff ? 1 : 2); } target.put((char)targetUniChar); } else if ((UConverterConstants.UNSIGNED_INT_MASK&targetUniChar) > (UConverterConstants.UNSIGNED_INT_MASK&(UConverterConstants.missingCharMarker))) { /* disassemble the surrogate pair and write to output */ targetUniChar -= 0x0010000; target.put((char)(0xd800+(char)(targetUniChar>>10))); if (offsets != null) { offsets.array()[target.position()-1] = (int)(source.position() - (mySourceChar <= 0xff ? 1 : 2)); } if (target.hasRemaining()) { target.put((char)(0xdc00+(char)(targetUniChar&0x3ff))); if (offsets != null) { offsets.array()[target.position()-1] = (int)(source.position() - (mySourceChar <= 0xff ? 1 : 2)); } } else { charErrorBufferArray[charErrorBufferLength++] = (char)(0xdc00+(char)(targetUniChar&0x3ff)); } } else { /* Call the callback function */ err = toUnicodeCallback(this, mySourceChar, targetUniChar); break; } } else { err = CoderResult.OVERFLOW; break; } } return err; } } /************************ ISO-2022-KR ********************/ private class CharsetDecoderISO2022KR extends CharsetDecoderICU { public CharsetDecoderISO2022KR(CharsetICU cs) { super(cs); } protected void implReset() { super.implReset(); setInitialStateToUnicodeKR(); myConverterData.reset(); } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; int mySourceChar = 0x0000; int targetUniChar = 0x0000; byte[] tempBuf = new byte[2]; boolean usingFallback; boolean gotoGetTrailByte = false; boolean gotoEscape = false; if (myConverterData.version == 1) { return decodeLoopIBM(myConverterData.currentDecoder, source, target, offsets, flush); } 
/* initialize state */ usingFallback = isFallbackUsed(); if (myConverterData.key != 0) { /* continue with a partial escape sequence */ gotoEscape = true; } else if (toULength == 1 && source.hasRemaining() && target.hasRemaining()) { /* continue with a partial double-byte character */ mySourceChar = (toUBytesArray[0] & UConverterConstants.UNSIGNED_BYTE_MASK); toULength = 0; gotoGetTrailByte = true; } while (source.hasRemaining() || gotoGetTrailByte || gotoEscape) { if (target.hasRemaining() || gotoGetTrailByte || gotoEscape) { if (!gotoGetTrailByte && !gotoEscape) { mySourceChar = (char)(source.get() & UConverterConstants.UNSIGNED_BYTE_MASK); } if (!gotoGetTrailByte && !gotoEscape && mySourceChar == UConverterConstants.SI) { myConverterData.toU2022State.g = 0; if (myConverterData.isEmptySegment) { myConverterData.isEmptySegment = false; /* we are handling it, reset to avoid future spurious errors */ err = CoderResult.malformedForLength(1); this.toUBytesArray[0] = (byte)mySourceChar; this.toULength = 1; return err; } /* consume the source */ continue; } else if (!gotoGetTrailByte && !gotoEscape && mySourceChar == UConverterConstants.SO) { myConverterData.toU2022State.g = 1; myConverterData.isEmptySegment = true; /* consume the source */ continue; } else if (!gotoGetTrailByte && (gotoEscape || mySourceChar == ESC_2022)) { if (!gotoEscape) { source.position(source.position()-1); } // escape label gotoEscape = false; // reset gotoEscape flag myConverterData.isEmptySegment = false; /* Any invalid ESC sequences will be detected separately, so just reset this */ err = changeState_2022(this, source, ISO_2022_KR); if (err.isError()) { return err; } continue; } myConverterData.isEmptySegment = false; /* Any invalid char errors will be detected separately, so just reset this */ if (myConverterData.toU2022State.g == 1 || gotoGetTrailByte) { if (source.hasRemaining() || gotoGetTrailByte) { boolean leadIsOk, trailIsOk; short trailByte; // getTrailByte label gotoGetTrailByte = 
false; // reset gotoGetTrailByte flag trailByte = (short)(source.get(source.position()) & UConverterConstants.UNSIGNED_BYTE_MASK); targetUniChar = UConverterConstants.missingCharMarker; /* * Ticket 5691: consistent illegal sequences: * - We include at least the first byte in the illegal sequence. * - If any of the non-initial bytes could be the start of a character, * we stop the illegal sequence before the first one of those. * * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is * an ESC/SO/SI, we report only the first byte as the illegal sequence. * Otherwise we convert or report the pair of bytes. */ leadIsOk = (short)(UConverterConstants.UNSIGNED_BYTE_MASK & (mySourceChar - 0x21)) <= (0x7e - 0x21); trailIsOk = (short)(UConverterConstants.UNSIGNED_BYTE_MASK & (trailByte - 0x21)) <= (0x7e - 0x21); if (leadIsOk && trailIsOk) { source.get(); tempBuf[0] = (byte)(mySourceChar + 0x80); tempBuf[1] = (byte)(trailByte + 0x80); targetUniChar = MBCSSimpleGetNextUChar(myConverterData.currentConverter.sharedData, ByteBuffer.wrap(tempBuf), usingFallback); mySourceChar = (char)((mySourceChar << 8) | trailByte); } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { /* report a pair of illegal bytes if the second byte is not a DBCS starter */ source.get(); /* add another bit so that the code below writes 2 bytes in case of error */ mySourceChar = (char)(0x10000 | (mySourceChar << 8) | trailByte); } } else { toUBytesArray[0] = (byte)mySourceChar; toULength = 1; break; } } else if (mySourceChar <= 0x7f) { int savedSourceLimit = source.limit(); int savedSourcePosition = source.position(); source.limit(source.position()); source.position(source.position()-1); targetUniChar = MBCSSimpleGetNextUChar(myConverterData.currentConverter.sharedData, source, usingFallback); source.limit(savedSourceLimit); source.position(savedSourcePosition); } else { targetUniChar = 0xffff; } if (targetUniChar < 0xfffe) { target.put((char)targetUniChar); if (offsets != null) { 
offsets.array()[target.position()] = source.position() - (mySourceChar <= 0xff ? 1 : 2); } } else { /* Call the callback function */ err = toUnicodeCallback(this, mySourceChar, targetUniChar); break; } } else { err = CoderResult.OVERFLOW; break; } } return err; } protected CoderResult decodeLoopIBM(CharsetDecoderMBCS cnv, ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; int sourceStart; int sourceLimit; int argSource; int argTarget; boolean gotoEscape = false; int oldSourceLimit; /* remember the original start of the input for offsets */ sourceStart = argSource = source.position(); if (myConverterData.key != 0) { /* continue with a partial escape sequence */ gotoEscape = true; } while (gotoEscape || (!err.isError() && source.hasRemaining())) { if (!gotoEscape) { /* Find the end of the buffer e.g : Next Escape Seq | end of Buffer */ int oldSourcePos = source.position(); sourceLimit = getEndOfBuffer_2022(source); source.position(oldSourcePos); if (source.position() != sourceLimit) { /* * get the current partial byte sequence * * it needs to be moved between the public and the subconverter * so that the conversion framework, which only sees the public * converter, can handle truncated and illegal input etc. */ if (toULength > 0) { cnv.toUBytesArray = (byte[])(toUBytesArray.clone()); } cnv.toULength = toULength; /* * Convert up to the end of the input, or to before the next escape character. * Does not handle conversion extensions because the preToU[] state etc. * is not copied. 
*/ argTarget = target.position(); oldSourceLimit = source.limit(); // save the old source limit change to new one source.limit(sourceLimit); err = myConverterData.currentDecoder.cnvMBCSToUnicodeWithOffsets(source, target, offsets, flush); source.limit(oldSourceLimit); // restore source limit; if (offsets != null && sourceStart != argSource) { /* update offsets to base them on the actual start of the input */ int delta = argSource - sourceStart; while (argTarget < target.position()) { int currentOffset = offsets.get(); offsets.position(offsets.position()-1); if (currentOffset >= 0) { offsets.put(currentOffset + delta); offsets.position(offsets.position()-1); } offsets.get(); target.get(); } } argSource = source.position(); /* copy input/error/overflow buffers */ if (cnv.toULength > 0) { toUBytesArray = (byte[])(cnv.toUBytesArray.clone()); } toULength = cnv.toULength; if (err.isOverflow()) { if (cnv.charErrorBufferLength > 0) { charErrorBufferArray = (char[])(cnv.charErrorBufferArray.clone()); } charErrorBufferLength = cnv.charErrorBufferLength; cnv.charErrorBufferLength = 0; } } if (err.isError() || err.isOverflow() || (source.position() == source.limit())) { return err; } } // escape label gotoEscape = false; err = changeState_2022(this, source, ISO_2022_KR); } return err; } } /******************** from unicode **********************/ /* preference order of JP charsets */ private final static byte []jpCharsetPref = { ASCII, JISX201, ISO8859_1, ISO8859_7, JISX208, JISX212, GB2312, KSC5601, HWKANA_7BIT }; /* * The escape sequences must be in order of the enum constants like JISX201 = 3, * not in order of jpCharsetPref[]! 
*/ private final static byte [][]escSeqChars = { { 0x1B, 0x28, 0x42}, /* (B ASCII */ { 0x1B, 0x2E, 0x41}, /* .A ISO-8859-1 */ { 0x1B, 0x2E, 0x46}, /* .F ISO-8859-7 */ { 0x1B, 0x28, 0x4A}, /* (J JISX-201 */ { 0x1B, 0x24, 0x42}, /* $B JISX-208 */ { 0x1B, 0x24, 0x28, 0x44}, /* $(D JISX-212 */ { 0x1B, 0x24, 0x41}, /* $A GB2312 */ { 0x1B, 0x24, 0x28, 0x43}, /* $(C KSC5601 */ { 0x1B, 0x28, 0x49} /* (I HWKANA_7BIT */ }; /* * JIS X 0208 has fallbacks from Unicode half-width Katakana to full-width (DBCS) * Katakana. * Now that we use a Shift-JIS table for JIS X 0208 we need to hardcode these fallbacks * because Shift-JIS roundtrips half-width Katakana to single bytes. * These were the only fallbacks in ICU's jisx-208.ucm file. */ private final static char []hwkana_fb = { 0x2123, /* U+FF61 */ 0x2156, 0x2157, 0x2122, 0x2126, 0x2572, 0x2521, 0x2523, 0x2525, 0x2527, 0x2529, 0x2563, 0x2565, 0x2567, 0x2543, 0x213C, /* U+FF70 */ 0x2522, 0x2524, 0x2526, 0x2528, 0x252A, 0x252B, 0x252D, 0x252F, 0x2531, 0x2533, 0x2535, 0x2537, 0x2539, 0x253B, 0x253D, 0x253F, /* U+FF80 */ 0x2541, 0x2544, 0x2546, 0x2548, 0x254A, 0x254B, 0x254C, 0x254D, 0x254E, 0x254F, 0x2552, 0x2555, 0x2558, 0x255B, 0x255E, 0x255F, /* U+FF90 */ 0x2560, 0x2561, 0x2562, 0x2564, 0x2566, 0x2568, 0x2569, 0x256A, 0x256B, 0x256C, 0x256D, 0x256F, 0x2573, 0x212B, 0x212C /* U+FF9F */ }; protected byte [][]fromUSubstitutionChar = new byte[][]{ { (byte)0x1A }, { (byte)0x2F, (byte)0x7E} }; /****************************ISO-2022-JP************************************/ private class CharsetEncoderISO2022JP extends CharsetEncoderICU { public CharsetEncoderISO2022JP(CharsetICU cs) { super(cs, fromUSubstitutionChar[0]); } protected void implReset() { super.implReset(); myConverterData.reset(); } /* Map Unicode to 00..7F according to JIS X 0201. Return U+FFFE if unmappable. 
*/ private int jisx201FromU(int value) { if (value <= 0x7f) { if (value != 0x5c && value != 0x7e) { return value; } } else if (value == 0xa5) { return 0x5c; } else if (value == 0x203e) { return 0x7e; } return (int)(UConverterConstants.UNSIGNED_INT_MASK & 0xfffe); } /* * Take a valid Shift-JIS byte pair, check that it is in the range corresponding * to JIS X 0208, and convert it to a pair of 21..7E bytes. * Return 0 if the byte pair is out of range. */ private int _2022FromSJIS(int value) { short trail; if (value > 0xEFFC) { return 0; /* beyond JIS X 0208 */ } trail = (short)(value & UConverterConstants.UNSIGNED_BYTE_MASK); value &= 0xff00; /* lead byte */ if (value <= 0x9f00) { value -= 0x7000; } else { /* 0xe000 <= value <= 0xef00 */ value -= 0xb000; } value <<= 1; if (trail <= 0x9e) { value -= 0x100; if (trail <= 0x7e) { value |= ((trail - 0x1f) & UConverterConstants.UNSIGNED_BYTE_MASK); } else { value |= ((trail - 0x20) & UConverterConstants.UNSIGNED_BYTE_MASK); } } else { /* trail <= 0xfc */ value |= ((trail - 0x7e) & UConverterConstants.UNSIGNED_BYTE_MASK); } return value; } /* This overrides the cbFromUWriteSub method in CharsetEncoderICU */ CoderResult cbFromUWriteSub (CharsetEncoderICU encoder, CharBuffer source, ByteBuffer target, IntBuffer offsets){ CoderResult err = CoderResult.UNDERFLOW; byte[] buffer = new byte[8]; int i = 0; byte[] subchar; subchar = encoder.replacement(); byte cs; if (myConverterData.fromU2022State.g == 1) { /* JIS7: switch from G1 to G0 */ myConverterData.fromU2022State.g = 0; buffer[i++] = UConverterConstants.SI; } cs = myConverterData.fromU2022State.cs[0]; if (cs != ASCII && cs != JISX201) { /* not in ASCII or JIS X 0201: switch to ASCII */ myConverterData.fromU2022State.cs[0] = ASCII; buffer[i++] = 0x1B; buffer[i++] = 0x28; buffer[i++] = 0x42; } buffer[i++] = subchar[0]; err = CharsetEncoderICU.fromUWriteBytes(this, buffer, 0, i, target, offsets, source.position() - 1); return err; } protected CoderResult encodeLoop(CharBuffer 
source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; int sourceChar; byte cs, g; int choiceCount; int len, outLen; byte[] choices = new byte[10]; int targetValue = 0; boolean usingFallback; byte[] buffer = new byte[8]; boolean getTrail = false; // use for getTrail label int oldSourcePos; // for proper error handling choiceCount = 0; /* check if the last codepoint of previous buffer was a lead surrogate */ if ((sourceChar = fromUChar32) != 0 && target.hasRemaining()) { getTrail = true; } while (getTrail || source.hasRemaining()) { if (getTrail || target.hasRemaining()) { oldSourcePos = source.position(); if (!getTrail) { /* skip if going to getTrail label */ sourceChar = source.get(); } /* check if the char is a First surrogate */ if (getTrail || UTF16.isSurrogate((char)sourceChar)) { if (getTrail || UTF16.isLeadSurrogate((char)sourceChar)) { // getTrail: if (getTrail) { getTrail = false; } /* look ahead to find the trail surrogate */ if (source.hasRemaining()) { /* test the following code unit */ char trail = source.get(); /* go back to the previous position */ source.position(source.position()-1); if (UTF16.isTrailSurrogate(trail)) { source.get(); sourceChar = UCharacter.getCodePoint((char)sourceChar, trail); fromUChar32 = 0x00; /* convert this supplementary code point */ /* exit this condition tree */ } else { /* this is an unmatched lead code unit (1st surrogate) */ /* callback(illegal) */ err = CoderResult.malformedForLength(1); fromUChar32 = sourceChar; break; } } else { /* no more input */ fromUChar32 = sourceChar; break; } } else { /* this is an unmatched trail code unit (2nd surrogate) */ /* callback(illegal) */ err = CoderResult.malformedForLength(1); fromUChar32 = sourceChar; break; } } /* do not convert SO/SI/ESC */ if (IS_2022_CONTROL(sourceChar)) { /* callback(illegal) */ err = CoderResult.malformedForLength(1); fromUChar32 = sourceChar; break; } /* do the conversion */ if (choiceCount == 0) { char 
csm;

                        /*
                         * The csm variable keeps track of which charsets are allowed
                         * and not used yet while building the choices[].
                         */
                        csm = (char)jpCharsetMasks[myConverterData.version];
                        choiceCount = 0;

                        /* JIS7/8: try single-byte half-width Katakana before JISX208 */
                        if (myConverterData.version == 3 || myConverterData.version == 4) {
                            choices[choiceCount++] = HWKANA_7BIT;
                        }
                        /* Do not try single-byte half-width Katakana for other versions. */
                        csm &= ~CSM(HWKANA_7BIT);

                        /* try the current G0 charset */
                        choices[choiceCount++] = cs = myConverterData.fromU2022State.cs[0];
                        csm &= ~CSM(cs);

                        /* try the current G2 charset */
                        if ((cs = myConverterData.fromU2022State.cs[2]) != 0) {
                            choices[choiceCount++] = cs;
                            csm &= ~CSM(cs);
                        }

                        /* try all the other charsets */
                        for (int i = 0; i < jpCharsetPref.length; i++) {
                            cs = jpCharsetPref[i];
                            if ((CSM(cs) & csm) != 0) {
                                choices[choiceCount++] = cs;
                                csm &= ~CSM(cs);
                            }
                        }
                    }

                    cs = g = 0;
                    /*
                     * len==0: no mapping found yet
                     * len<0: found a fallback result: continue looking for a roundtrip but no further fallbacks
                     * len>0: found a roundtrip result, done
                     */
                    len = 0;
                    /*
                     * We will turn off usingFallback after finding a fallback,
                     * but we still get fallbacks from PUA code points as usual.
                     * Therefore, we will also need to check that we don't overwrite
                     * an early fallback with a later one.
                     */
                    usingFallback = useFallback;

                    for (int i = 0; i < choiceCount && len <= 0; i++) {
                        int[] value = new int[1];
                        int len2;
                        byte cs0 = choices[i];
                        switch (cs0) {
                        case ASCII:
                            if (sourceChar <= 0x7f) {
                                targetValue = sourceChar;
                                len = 1;
                                cs = cs0;
                                g = 0;
                            }
                            break;
                        case ISO8859_1:
                            if (GR96_START <= sourceChar && sourceChar <= GR96_END) {
                                targetValue = sourceChar - 0x80;
                                len = 1;
                                cs = cs0;
                                g = 2;
                            }
                            break;
                        case HWKANA_7BIT:
                            if (sourceChar <= HWKANA_END && sourceChar >= HWKANA_START) {
                                if (myConverterData.version == 3) {
                                    /* JIS7: use G1 (SO) */
                                    /* Shift U+FF61..U+FF9F to bytes 21..5F.
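A standalone sketch of this offset arithmetic, for illustration only. The
class and method names are hypothetical and not part of ICU4J. The constant
values are taken from the range U+FF61..U+FF9F named in this comment, and the
8-bit variant follows the A1..DF range used for JIS8. Only line comments are
used so that the sketch can sit inside this block comment.

```java
// Hypothetical illustration; none of these names exist in ICU4J.
final class HalfwidthKatakanaShiftSketch {
    static final int HWKANA_START = 0xFF61;
    static final int HWKANA_END = 0xFF9F;

    // Returns true if c is a half-width Katakana code point.
    static boolean isHalfwidthKatakana(int c) {
        return HWKANA_START <= c && c <= HWKANA_END;
    }

    // JIS7 form: shift U+FF61..U+FF9F to 7-bit bytes 21..5F, or -1 if out of range.
    static int toJis7(int c) {
        return isHalfwidthKatakana(c) ? c - (HWKANA_START - 0x21) : -1;
    }

    // JIS8 form: shift U+FF61..U+FF9F to 8-bit bytes A1..DF, or -1 if out of range.
    static int toJis8(int c) {
        return isHalfwidthKatakana(c) ? c - (HWKANA_START - 0xA1) : -1;
    }

    public static void main(String[] args) {
        int[] samples = { 0xFF61, 0xFF71, 0xFF9F, 0x30A2 };
        for (int c : samples) {
            System.out.printf("U+%04X -> JIS7 %d, JIS8 %d%n", c, toJis7(c), toJis8(c));
        }
    }
}
```

The two forms differ only in the base byte, so a JIS8 byte is always the
corresponding JIS7 byte plus 0x80.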
*/ targetValue = (int)(UConverterConstants.UNSIGNED_INT_MASK & (sourceChar - (HWKANA_START - 0x21))); len = 1; myConverterData.fromU2022State.cs[1] = cs = cs0; /* do not output an escape sequence */ g = 1; } else if (myConverterData.version == 4) { /* JIS8: use 8-bit bytes with any single-byte charset, see escape sequence output below */ /* Shift U+FF61..U+FF9F to bytes A1..DF. */ targetValue = (int)(UConverterConstants.UNSIGNED_INT_MASK & (sourceChar - (HWKANA_START - 0xa1))); len = 1; cs = myConverterData.fromU2022State.cs[0]; if (IS_JP_DBCS(cs)) { /* switch from a DBCS charset to JISX201 */ cs = JISX201; } /* else stay in the current G0 charset */ g = 0; } /* else do not use HWKANA_7BIT with other versions */ } break; case JISX201: /* G0 SBCS */ value[0] = jisx201FromU(sourceChar); if (value[0] <= 0x7f) { targetValue = value[0]; len = 1; cs = cs0; g = 0; usingFallback = false; } break; case JISX208: /* G0 DBCS from JIS table */ myConverterData.currentConverter.sharedData = myConverterData.myConverterArray[cs0]; myConverterData.currentConverter.sharedData.mbcs.outputType = CharsetMBCS.MBCS_OUTPUT_2; len2 = myConverterData.currentEncoder.fromUChar32(sourceChar, value, usingFallback); //len2 = MBCSFromUChar32_ISO2022(myConverterData.myConverterArray[cs0], sourceChar, value, usingFallback, CharsetMBCS.MBCS_OUTPUT_2); if (len2 == 2 || (len2 == -2 && len == 0)) { /* only accept DBCS: abs(len) == 2 */ value[0] = _2022FromSJIS(value[0]); if (value[0] != 0) { targetValue = value[0]; len = len2; cs = cs0; g = 0; usingFallback = false; } } else if (len == 0 && usingFallback && sourceChar <= HWKANA_END && sourceChar >= HWKANA_START) { targetValue = hwkana_fb[sourceChar - HWKANA_START]; len = -2; cs = cs0; g = 0; usingFallback = false; } break; case ISO8859_7: /* G0 SBCS forced to 7-bit output */ len2 = MBCSSingleFromUChar32(myConverterData.myConverterArray[cs0], sourceChar, value, usingFallback); if (len2 != 0 && !(len2 < 0 && len != 0) && GR96_START <= value[0] && value[0] 
<= GR96_END) {
                                targetValue = value[0] - 0x80;
                                len = len2;
                                cs = cs0;
                                g = 2;
                                usingFallback = false;
                            }
                            break;
                        default:
                            /* G0 DBCS */
                            myConverterData.currentConverter.sharedData = myConverterData.myConverterArray[cs0];
                            myConverterData.currentConverter.sharedData.mbcs.outputType = CharsetMBCS.MBCS_OUTPUT_2;
                            len2 = myConverterData.currentEncoder.fromUChar32(sourceChar, value, usingFallback);
                            //len2 = MBCSFromUChar32_ISO2022(myConverterData.myConverterArray[cs0], sourceChar, value, usingFallback, CharsetMBCS.MBCS_OUTPUT_2);
                            if (len2 == 2 || (len2 == -2 && len == 0)) { /* only accept DBCS: abs(len)==2 */
                                if (cs0 == KSC5601) {
                                    /*
                                     * Check for valid bytes for the encoding scheme.
                                     * This is necessary because the sub-converter (windows-949)
                                     * has a broader encoding scheme than is valid for 2022.
                                     */
                                    value[0] = _2022FromGR94DBCS(value[0]);
                                    if (value[0] == 0) {
                                        break;
                                    }
                                }
                                targetValue = value[0];
                                len = len2;
                                cs = cs0;
                                g = 0;
                                usingFallback = false;
                            }
                            break;
                        }
                    }

                    if (len != 0) {
                        if (len < 0) {
                            len = -len; /* fallback */
                        }
                        outLen = 0;

                        /* write SI if necessary (only for JIS7) */
                        if (myConverterData.fromU2022State.g == 1 && g == 0) {
                            buffer[outLen++] = UConverterConstants.SI;
                            myConverterData.fromU2022State.g = 0;
                        }

                        /* write the designation sequence if necessary */
                        if (cs != myConverterData.fromU2022State.cs[g]) {
                            for (int i = 0; i < escSeqChars[cs].length; i++) {
                                buffer[outLen++] = escSeqChars[cs][i];
                            }
                            myConverterData.fromU2022State.cs[g] = cs;

                            /* invalidate the choices[] */
                            choiceCount = 0;
                        }

                        /* write the shift sequence if necessary */
                        if (g != myConverterData.fromU2022State.g) {
                            switch (g) {
                            /* case 0 handled before writing escapes */
                            case 1:
                                buffer[outLen++] = UConverterConstants.SO;
                                myConverterData.fromU2022State.g = 1;
                                break;
                            default: /* case 2 */
                                buffer[outLen++] = 0x1b;
                                buffer[outLen++] = 0x4e;
                                break;
                            /* case 3: no SS3 in ISO-2022-JP-x */
                            }
                        }

                        /* write the output bytes */
                        if (len == 1) {
                            buffer[outLen++] = (byte)targetValue;
                        } else { /* len == 2 */
                            buffer[outLen++] =
(byte)(targetValue >> 8);
                            buffer[outLen++] = (byte)targetValue;
                        }
                    } else {
                        /*
                         * if we cannot find the character after checking all codepages
                         * then this is an error.
                         */
                        err = CoderResult.unmappableForLength(source.position()-oldSourcePos);
                        fromUChar32 = sourceChar;
                        break;
                    }

                    if (sourceChar == CR || sourceChar == LF) {
                        /* reset the G2 state at the end of a line (conversion got us into ASCII or JISX201 already) */
                        myConverterData.fromU2022State.cs[2] = 0;
                        choiceCount = 0;
                    }

                    /* output outLen>0 bytes in buffer[] */
                    if (outLen == 1) {
                        target.put(buffer[0]);
                        if (offsets != null) {
                            /* index of the character just consumed, as in the branches below */
                            offsets.put(source.position() - 1); /* -1 known to be ASCII */
                        }
                    } else if (outLen == 2 && (target.position() + 2) <= target.limit()) {
                        target.put(buffer[0]);
                        target.put(buffer[1]);
                        if (offsets != null) {
                            int sourceIndex = source.position() - 1;
                            offsets.put(sourceIndex);
                            offsets.put(sourceIndex);
                        }
                    } else {
                        err = CharsetEncoderICU.fromUWriteBytes(this, buffer, 0, outLen, target, offsets, source.position()-1);
                    }
                } else {
                    err = CoderResult.OVERFLOW;
                    break;
                }
            }

            /*
             * the end of the input stream and detection of truncated input
             * are handled by the framework, but for ISO-2022-JP conversion
             * we need to be in ASCII mode at the very end
             *
             * conditions:
             * successful
             * in SO mode or not in ASCII mode
             * end of input and no truncated input
             */
            if (!err.isError() &&
                    (myConverterData.fromU2022State.g != 0 || myConverterData.fromU2022State.cs[0] != ASCII) &&
                    flush && !source.hasRemaining() && fromUChar32 == 0) {
                int sourceIndex;

                outLen = 0;

                if (myConverterData.fromU2022State.g != 0) {
                    buffer[outLen++] = UConverterConstants.SI;
                    myConverterData.fromU2022State.g = 0;
                }

                if (myConverterData.fromU2022State.cs[0] != ASCII) {
                    for (int i = 0; i < escSeqChars[ASCII].length; i++) {
                        buffer[outLen++] = escSeqChars[ASCII][i];
                    }
                    myConverterData.fromU2022State.cs[0] = ASCII;
                }

                /* get the source index of the last input character */
                sourceIndex = source.position();
                if (sourceIndex > 0) {
                    --sourceIndex;
                    if
(UTF16.isTrailSurrogate(source.get(sourceIndex)) &&
                            (sourceIndex == 0 || UTF16.isLeadSurrogate(source.get(sourceIndex-1)))) {
                        --sourceIndex;
                    }
                } else {
                    sourceIndex = -1;
                }

                err = CharsetEncoderICU.fromUWriteBytes(this, buffer, 0, outLen, target, offsets, sourceIndex);
            }
            return err;
        }
    }

    /****************************ISO-2022-CN************************************/
    /*
     * Rules for ISO-2022-CN Encoding:
     * i)   The designator sequence must appear once on a line before any instance
     *      of the character set it designates.
     * ii)  If two lines contain characters from the same character set, both lines
     *      must include the designator sequence.
     * iii) Once the designator sequence is known, a shifting sequence has to be found
     *      to invoke the shifting
     * iv)  All lines start in ASCII and end in ASCII.
     * v)   Four shifting sequences are employed for this purpose:
     *      Sequence   ASCII Eq    Charsets
     *      ---------  ---------   --------
     *      SI         0x0F        US-ASCII
     *      SO         0x0E        CNS-11643-1992 Plane 1, GB2312, ISO-IR-165
     *      SS2        ESC N       CNS-11643-1992 Plane 2
     *      SS3        ESC O       CNS-11643-1992 Planes 3-7
     * vi)
     *      SOdesignator  : ESC "$" ")" finalchar_for_SO
     *      SS2designator : ESC "$" "*" finalchar_for_SS2
     *      SS3designator : ESC "$" "+" finalchar_for_SS3
     *
     *      ESC $ ) A       Indicates the bytes following SO are Chinese
     *                      characters as defined in GB 2312-80, until
     *                      another SOdesignation appears
     *
     *      ESC $ ) E       Indicates the bytes following SO are as defined
     *                      in ISO-IR-165 (for details, see section 2.1),
     *                      until another SOdesignation appears
     *
     *      ESC $ ) G       Indicates the bytes following SO are as defined
     *                      in CNS 11643-plane-1, until another SOdesignation appears
     *
     *      ESC $ * H       Indicates the two bytes immediately following
     *                      SS2 is a Chinese character as defined in CNS
     *                      11643-plane-2, until another SS2designation
     *                      appears
     *                      (Meaning ESC N must precede every 2 byte sequence.)
     *
     *      ESC $ + I       Indicates the immediate two bytes following SS3
     *                      is a Chinese character as defined in CNS
     *                      11643-plane-3, until another SS3designation
     *                      appears
     *                      (Meaning ESC O must precede every 2 byte sequence.)
     *
     *      ESC $ + J       Indicates the immediate two bytes following SS3
     *                      is a Chinese character as defined in CNS
     *                      11643-plane-4, until another SS3designation
     *                      appears
     *                      (In English: ESC O must precede every 2 byte sequence.)
     *
     *      ESC $ + K       Indicates the immediate two bytes following SS3
     *                      is a Chinese character as defined in CNS
     *                      11643-plane-5, until another SS3designation
     *                      appears
     *
     *      ESC $ + L       Indicates the immediate two bytes following SS3
     *                      is a Chinese character as defined in CNS
     *                      11643-plane-6, until another SS3designation
     *                      appears
     *
     *      ESC $ + M       Indicates the immediate two bytes following SS3
     *                      is a Chinese character as defined in CNS
     *                      11643-plane-7, until another SS3designation
     *                      appears
     *
     *      As in ISO-2022-CN, each line starts in ASCII, and ends in ASCII, and
     *      has its own designation information before any Chinese characters
     *      appear
     */

    /* The following are defined this way to make strings truly read-only */
    private final static byte[] GB_2312_80_STR = { 0x1B, 0x24, 0x29, 0x41 };
    private final static byte[] ISO_IR_165_STR = { 0x1B, 0x24, 0x29, 0x45 };
    private final static byte[] CNS_11643_1992_Plane_1_STR = { 0x1B, 0x24, 0x29, 0x47 };
    private final static byte[] CNS_11643_1992_Plane_2_STR = { 0x1B, 0x24, 0x2A, 0x48 };
    private final static byte[] CNS_11643_1992_Plane_3_STR = { 0x1B, 0x24, 0x2B, 0x49 };
    private final static byte[] CNS_11643_1992_Plane_4_STR = { 0x1B, 0x24, 0x2B, 0x4A };
    private final static byte[] CNS_11643_1992_Plane_5_STR = { 0x1B, 0x24, 0x2B, 0x4B };
    private final static byte[] CNS_11643_1992_Plane_6_STR = { 0x1B, 0x24, 0x2B, 0x4C };
    private final static byte[] CNS_11643_1992_Plane_7_STR = { 0x1B, 0x24, 0x2B, 0x4D };

    /************************ ISO2022-CN Data *****************************/
    private final static byte[][] escSeqCharsCN = {
SHIFT_IN_STR, GB_2312_80_STR, ISO_IR_165_STR, CNS_11643_1992_Plane_1_STR, CNS_11643_1992_Plane_2_STR, CNS_11643_1992_Plane_3_STR, CNS_11643_1992_Plane_4_STR, CNS_11643_1992_Plane_5_STR, CNS_11643_1992_Plane_6_STR, CNS_11643_1992_Plane_7_STR, }; private class CharsetEncoderISO2022CN extends CharsetEncoderICU { public CharsetEncoderISO2022CN(CharsetICU cs) { super(cs, fromUSubstitutionChar[0]); } protected void implReset() { super.implReset(); myConverterData.reset(); } /* This overrides the cbFromUWriteSub method in CharsetEncoderICU */ CoderResult cbFromUWriteSub (CharsetEncoderICU encoder, CharBuffer source, ByteBuffer target, IntBuffer offsets){ CoderResult err = CoderResult.UNDERFLOW; byte[] buffer = new byte[8]; int i = 0; byte[] subchar; subchar = encoder.replacement(); if (myConverterData.fromU2022State.g != 0) { /* not in ASCII mode: switch to ASCII */ myConverterData.fromU2022State.g = 0; buffer[i++] = UConverterConstants.SI; } buffer[i++] = subchar[0]; err = CharsetEncoderICU.fromUWriteBytes(this, buffer, 0, i, target, offsets, source.position() - 1); return err; } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; int sourceChar; byte[] buffer = new byte[8]; int len; byte[] choices = new byte[3]; int choiceCount; int targetValue = 0; boolean usingFallback; boolean gotoGetTrail = false; int oldSourcePos; // For proper error handling choiceCount = 0; /* check if the last codepoint of previous buffer was a lead surrogate */ if ((sourceChar = fromUChar32) != 0 && target.hasRemaining()) { // goto getTrail label gotoGetTrail = true; } while (source.hasRemaining() || gotoGetTrail) { if (target.hasRemaining() || gotoGetTrail) { oldSourcePos = source.position(); if (!gotoGetTrail) { sourceChar = source.get(); } /* check if the char is a First surrogate */ if (UTF16.isSurrogate((char)sourceChar) || gotoGetTrail) { if (UTF16.isLeadSurrogate((char)sourceChar) || 
gotoGetTrail) {
                            // getTrail label
                            /* reset gotoGetTrail flag*/
                            gotoGetTrail = false;

                            /* look ahead to find the trail surrogate */
                            if (source.hasRemaining()) {
                                /* test the following code unit */
                                char trail = source.get();
                                source.position(source.position()-1);
                                if (UTF16.isTrailSurrogate(trail)) {
                                    source.get();
                                    sourceChar = UCharacter.getCodePoint((char)sourceChar, trail);
                                    fromUChar32 = 0x00;
                                    /* convert this supplementary code point */
                                    /* exit this condition tree */
                                } else {
                                    /* this is an unmatched lead code unit (1st surrogate) */
                                    /* callback(illegal) */
                                    err = CoderResult.malformedForLength(1);
                                    fromUChar32 = sourceChar;
                                    break;
                                }
                            } else {
                                /* no more input */
                                fromUChar32 = sourceChar;
                                break;
                            }
                        } else {
                            /* this is an unmatched trail code unit (2nd surrogate) */
                            /* callback(illegal) */
                            err = CoderResult.malformedForLength(1);
                            fromUChar32 = sourceChar;
                            break;
                        }
                    }

                    /* do the conversion */
                    if (sourceChar <= 0x007f) {
                        /* do not convert SO/SI/ESC */
                        if (IS_2022_CONTROL(sourceChar)) {
                            /* callback(illegal) */
                            err = CoderResult.malformedForLength(1);
                            fromUChar32 = sourceChar;
                            break;
                        }

                        /* US-ASCII */
                        if (myConverterData.fromU2022State.g == 0) {
                            buffer[0] = (byte)sourceChar;
                            len = 1;
                        } else {
                            buffer[0] = UConverterConstants.SI;
                            buffer[1] = (byte)sourceChar;
                            len = 2;
                            myConverterData.fromU2022State.g = 0;
                            choiceCount = 0;
                        }

                        if (sourceChar == CR || sourceChar == LF) {
                            /* reset the state at the end of a line */
                            myConverterData.fromU2022State.reset();
                            choiceCount = 0;
                        }
                    } else {
                        /* convert U+0080..U+10ffff */
                        int i;
                        byte cs, g;

                        if (choiceCount == 0) {
                            /* try the current SO/G1 converter first */
                            choices[0] = myConverterData.fromU2022State.cs[1];

                            /* default to GB2312_1 if none is designated yet */
                            if (choices[0] == 0) {
                                choices[0] = GB2312_1;
                            }
                            if (myConverterData.version == 0) {
                                /* ISO-2022-CN */
                                /* try other SO/G1 converter; a CNS_11643_1 lookup may result in any plane */
                                if (choices[0] == GB2312_1) {
                                    choices[1] = CNS_11643_1;
                                } else {
                                    choices[1] = GB2312_1;
                                }
                                choiceCount = 2;
                            } else
{ /* ISO-2022-CN-EXT */ /* try one of the other converters */ switch (choices[0]) { case GB2312_1: choices[1] = CNS_11643_1; choices[2] = ISO_IR_165; break; case ISO_IR_165: choices[1] = GB2312_1; choices[2] = CNS_11643_1; break; default : choices[1] = GB2312_1; choices[2] = ISO_IR_165; break; } choiceCount = 3; } } cs = g = 0; /* * len==0: no mapping found yet * len<0: found a fallback result: continue looking for a roundtrip but no further fallbacks * len>0: found a roundtrip result, done */ len = 0; /* * We will turn off usingFallback after finding a fallback, * but we still get fallbacks from PUA code points as usual. * Therefore, we will also need to check that we don't overwrite * an early fallback with a later one. */ usingFallback = useFallback; for (i = 0; i < choiceCount && len <= 0; ++i) { byte cs0 = choices[i]; if (cs0 > 0) { int[] value = new int[1]; int len2; if (cs0 > CNS_11643_0) { myConverterData.currentConverter.sharedData = myConverterData.myConverterArray[CNS_11643]; myConverterData.currentConverter.sharedData.mbcs.outputType = CharsetMBCS.MBCS_OUTPUT_3; len2 = myConverterData.currentEncoder.fromUChar32(sourceChar, value, usingFallback); //len2 = MBCSFromUChar32_ISO2022(myConverterData.myConverterArray[CNS_11643], // sourceChar, value, usingFallback, CharsetMBCS.MBCS_OUTPUT_3); if (len2 == 3 || (len2 == -3 && len == 0)) { targetValue = value[0]; cs = (byte)(CNS_11643_0 + (value[0] >> 16) - 0x80); if (len2 >= 0) { len = 2; } else { len = -2; usingFallback = false; } if (cs == CNS_11643_1) { g = 1; } else if (cs == CNS_11643_2) { g = 2; } else if (myConverterData.version == 1) { /* plane 3..7 */ g = 3; } else { /* ISO-2022-CN (without -EXT) does not support plane 3..7 */ len = 0; } } } else { /* GB2312_1 or ISO-IR-165 */ myConverterData.currentConverter.sharedData = myConverterData.myConverterArray[cs0]; myConverterData.currentConverter.sharedData.mbcs.outputType = CharsetMBCS.MBCS_OUTPUT_2; len2 = 
myConverterData.currentEncoder.fromUChar32(sourceChar, value, usingFallback);
                                    //len2 = MBCSFromUChar32_ISO2022(myConverterData.myConverterArray[cs0],
                                    //       sourceChar, value, usingFallback, CharsetMBCS.MBCS_OUTPUT_2);
                                    if (len2 == 2 || (len2 == -2 && len == 0)) {
                                        targetValue = value[0];
                                        len = len2;
                                        cs = cs0;
                                        g = 1;
                                        usingFallback = false;
                                    }
                                }
                            }
                        }

                        if (len != 0) {
                            len = 0; /* count output bytes; it must have been abs(len) == 2 */

                            /* write the designation sequence if necessary */
                            if (cs != myConverterData.fromU2022State.cs[g]) {
                                if (cs < CNS_11643) {
                                    for (int n = 0; n < escSeqCharsCN[cs].length; n++) {
                                        buffer[n] = escSeqCharsCN[cs][n];
                                    }
                                } else {
                                    for (int n = 0; n < escSeqCharsCN[CNS_11643 + (cs - CNS_11643_1)].length; n++) {
                                        buffer[n] = escSeqCharsCN[CNS_11643 + (cs - CNS_11643_1)][n];
                                    }
                                }
                                len = 4;
                                myConverterData.fromU2022State.cs[g] = cs;
                                if (g == 1) {
                                    /* changing the SO/G1 charset invalidates the choices[] */
                                    choiceCount = 0;
                                }
                            }

                            /* write the shift sequence if necessary */
                            if (g != myConverterData.fromU2022State.g) {
                                switch (g) {
                                case 1:
                                    buffer[len++] = UConverterConstants.SO;

                                    /* set the new state only if it is the locking shift SO/G1, not for SS2 or SS3 */
                                    myConverterData.fromU2022State.g = 1;
                                    break;
                                case 2:
                                    buffer[len++] = 0x1b;
                                    buffer[len++] = 0x4e;
                                    break;
                                default: /* case 3 */
                                    buffer[len++] = 0x1b;
                                    buffer[len++] = 0x4f;
                                    break;
                                }
                            }

                            /* write the two output bytes */
                            buffer[len++] = (byte)(targetValue >> 8);
                            buffer[len++] = (byte)targetValue;
                        } else {
                            /*
                             * if we cannot find the character after checking all codepages
                             * then this is an error
                             */
                            err = CoderResult.unmappableForLength(source.position()-oldSourcePos);
                            fromUChar32 = sourceChar;
                            break;
                        }
                    }

                    /* output len>0 bytes in buffer[] */
                    if (len == 1) {
                        target.put(buffer[0]);
                        if (offsets != null) {
                            offsets.put(source.position()-1);
                        }
                    } else if (len == 2 && (target.remaining() >= 2)) {
                        target.put(buffer[0]);
                        target.put(buffer[1]);
                        if (offsets != null) {
                            int sourceIndex = source.position();
                            offsets.put(sourceIndex);
                            offsets.put(sourceIndex);
                        }
                    } else {
                        err = CharsetEncoderICU.fromUWriteBytes(this, buffer, 0, len, target, offsets, source.position()-1);
                        if (err.isError()) {
                            break;
                        }
                    }
                } else {
                    err = CoderResult.OVERFLOW;
                    break;
                }
            } /* end while (source.hasRemaining()) */

            /*
             * the end of the input stream and detection of truncated input
             * are handled by the framework, but for ISO-2022-CN conversion
             * we need to be in ASCII mode at the very end
             *
             * conditions:
             * successful
             * not in ASCII mode
             * end of input and no truncated input
             */
            if (!err.isError() && myConverterData.fromU2022State.g != 0 && flush && !source.hasRemaining() && fromUChar32 == 0) {
                int sourceIndex;

                /* we are switching to ASCII */
                myConverterData.fromU2022State.g = 0;

                /* get the source index of the last input character */
                sourceIndex = source.position();
                if (sourceIndex > 0) {
                    --sourceIndex;
                    if (UTF16.isTrailSurrogate(source.get(sourceIndex)) &&
                            (sourceIndex == 0 || UTF16.isLeadSurrogate(source.get(sourceIndex-1)))) {
                        --sourceIndex;
                    }
                } else {
                    sourceIndex = -1;
                }

                err = CharsetEncoderICU.fromUWriteBytes(this, SHIFT_IN_STR, 0, 1, target, offsets, sourceIndex);
            }

            return err;
        }
    }

    /******************************** ISO-2022-KR *****************************/
    /*
     * Rules for ISO-2022-KR encoding
     * i) The KSC5601 designator sequence should appear only once in a file,
     *    at the beginning of a line before any KSC5601 characters.
     *    This usually means that it appears by itself on the first line of the file.
     * ii) There are only 2 shifting sequences: SO to shift into double byte mode
     *     and SI to shift into single byte mode
     */
    private class CharsetEncoderISO2022KR extends CharsetEncoderICU {
        public CharsetEncoderISO2022KR(CharsetICU cs) {
            super(cs, fromUSubstitutionChar[myConverterData.version]);
        }

        protected void implReset() {
            super.implReset();
            myConverterData.reset();
            setInitialStateFromUnicodeKR(this);
        }

        /* This overrides the cbFromUWriteSub method in CharsetEncoderICU */
        CoderResult cbFromUWriteSub (CharsetEncoderICU encoder,
                CharBuffer source, ByteBuffer target, IntBuffer offsets){
            CoderResult err = CoderResult.UNDERFLOW;
            byte[] buffer = new byte[8];
            int length, i = 0;
            byte[] subchar;

            subchar = encoder.replacement();
            length = subchar.length;

            if (myConverterData.version == 0) {
                if (length == 1) {
                    if (encoder.fromUnicodeStatus != 0) {
                        /* in DBCS mode: switch to SBCS */
                        encoder.fromUnicodeStatus = 0;
                        buffer[i++] = UConverterConstants.SI;
                    }
                    buffer[i++] = subchar[0];
                } else { /* length == 2 */
                    if (encoder.fromUnicodeStatus == 0) {
                        /* in SBCS mode: switch to DBCS */
                        encoder.fromUnicodeStatus = 1;
                        buffer[i++] = UConverterConstants.SO;
                    }
                    buffer[i++] = subchar[0];
                    buffer[i++] = subchar[1];
                }
                err = CharsetEncoderICU.fromUWriteBytes(this, buffer, 0, i, target, offsets, source.position() - 1);
            } else {
                /* save the subconverter's substitution string */
                byte[] currentSubChars = myConverterData.currentEncoder.replacement();

                /* set our substitution string into the subconverter */
                myConverterData.currentEncoder.replaceWith(subchar);
                myConverterData.currentConverter.subChar1 = fromUSubstitutionChar[0][0];

                /* let the subconverter write the subchar, set/retrieve fromUChar32 state */
                myConverterData.currentEncoder.fromUChar32 = encoder.fromUChar32;
                err = myConverterData.currentEncoder.cbFromUWriteSub(myConverterData.currentEncoder, source, target, offsets);
                encoder.fromUChar32 =
myConverterData.currentEncoder.fromUChar32; /* restore the subconverter's substitution string */ myConverterData.currentEncoder.replaceWith(currentSubChars); if (err.isOverflow()) { if (myConverterData.currentEncoder.errorBufferLength > 0) { encoder.errorBuffer = (byte[])(myConverterData.currentEncoder.errorBuffer.clone()); } encoder.errorBufferLength = myConverterData.currentEncoder.errorBufferLength; myConverterData.currentEncoder.errorBufferLength = 0; } } return err; } private CoderResult encodeLoopIBM(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; myConverterData.currentEncoder.fromUChar32 = fromUChar32; err = myConverterData.currentEncoder.cnvMBCSFromUnicodeWithOffsets(source, target, offsets, flush); fromUChar32 = myConverterData.currentEncoder.fromUChar32; if (err.isOverflow()) { if (myConverterData.currentEncoder.errorBufferLength > 0) { errorBuffer = (byte[])(myConverterData.currentEncoder.errorBuffer.clone()); } errorBufferLength = myConverterData.currentEncoder.errorBufferLength; myConverterData.currentEncoder.errorBufferLength = 0; } return err; } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult err = CoderResult.UNDERFLOW; int[] targetByteUnit = { 0x0000 }; int sourceChar = 0x0000; boolean isTargetByteDBCS; boolean oldIsTargetByteDBCS; boolean usingFallback; int length = 0; boolean gotoGetTrail = false; // for goto getTrail label call /* * if the version is 1 then the user is requesting * conversion with ibm-25546 pass the argument to * MBCS converter and return */ if (myConverterData.version == 1) { return encodeLoopIBM(source, target, offsets, flush); } usingFallback = useFallback; isTargetByteDBCS = fromUnicodeStatus == 0 ? 
false : true; if ((sourceChar = fromUChar32) != 0 && target.hasRemaining()) { gotoGetTrail = true; } while (source.hasRemaining() || gotoGetTrail) { targetByteUnit[0] = UConverterConstants.missingCharMarker; if (target.hasRemaining() || gotoGetTrail) { if (!gotoGetTrail) { sourceChar = source.get(); /* do not convert SO/SI/ESC */ if (IS_2022_CONTROL(sourceChar)) { /* callback(illegal) */ err = CoderResult.malformedForLength(1); fromUChar32 = sourceChar; break; } myConverterData.currentConverter.sharedData.mbcs.outputType = CharsetMBCS.MBCS_OUTPUT_2; length = myConverterData.currentEncoder.fromUChar32(sourceChar, targetByteUnit, usingFallback); //length = MBCSFromUChar32_ISO2022(myConverterData.currentConverter.sharedData, sourceChar, targetByteUnit, usingFallback, CharsetMBCS.MBCS_OUTPUT_2); if (length < 0) { length = -length; /* fallback */ } /* only DBCS or SBCS characters are expected */ /* DB characters with high bit set to 1 are expected */ if (length > 2 || length == 0 || (length == 1 && targetByteUnit[0] > 0x7f) || (length ==2 && ((char)(targetByteUnit[0] - 0xa1a1) > (0xfefe - 0xa1a1) || ((targetByteUnit[0] - 0xa1) & UConverterConstants.UNSIGNED_BYTE_MASK) > (0xfe - 0xa1)))) { targetByteUnit[0] = UConverterConstants.missingCharMarker; } } if (!gotoGetTrail && targetByteUnit[0] != UConverterConstants.missingCharMarker) { oldIsTargetByteDBCS = isTargetByteDBCS; isTargetByteDBCS = (targetByteUnit[0] > 0x00FF); /* append the shift sequence */ if (oldIsTargetByteDBCS != isTargetByteDBCS) { if (isTargetByteDBCS) { target.put((byte)UConverterConstants.SO); } else { target.put((byte)UConverterConstants.SI); } if (offsets != null) { offsets.put(source.position()-1); } } /* write the targetUniChar to target */ if (targetByteUnit[0] <= 0x00FF) { if (target.hasRemaining()) { target.put((byte)targetByteUnit[0]); if (offsets != null) { offsets.put(source.position()-1); } } else { errorBuffer[errorBufferLength++] = (byte)targetByteUnit[0]; err = CoderResult.OVERFLOW; } } 
else { if (target.hasRemaining()) { target.put((byte)(UConverterConstants.UNSIGNED_BYTE_MASK & ((targetByteUnit[0]>>8) - 0x80))); if (offsets != null) { offsets.put(source.position()-1); } if (target.hasRemaining()) { target.put((byte)(UConverterConstants.UNSIGNED_BYTE_MASK & (targetByteUnit[0]- 0x80))); if (offsets != null) { offsets.put(source.position()-1); } } else { errorBuffer[errorBufferLength++] = (byte)(UConverterConstants.UNSIGNED_BYTE_MASK & (targetByteUnit[0] - 0x80)); err = CoderResult.OVERFLOW; } } else { errorBuffer[errorBufferLength++] = (byte)(UConverterConstants.UNSIGNED_BYTE_MASK & ((targetByteUnit[0]>>8) - 0x80)); errorBuffer[errorBufferLength++] = (byte)(UConverterConstants.UNSIGNED_BYTE_MASK & (targetByteUnit[0]- 0x80)); err = CoderResult.OVERFLOW; } } } else { /* oops.. the code point is unassigned * set the error and reason */ /* check if the char is a First surrogate */ if (gotoGetTrail || UTF16.isSurrogate((char)sourceChar)) { if (gotoGetTrail || UTF16.isLeadSurrogate((char)sourceChar)) { // getTrail label // reset gotoGetTrail flag gotoGetTrail = false; /* look ahead to find the trail surrogate */ if (source.hasRemaining()) { /* test the following code unit */ char trail = source.get(); source.position(source.position()-1); if (UTF16.isTrailSurrogate(trail)) { source.get(); sourceChar = UCharacter.getCodePoint((char)sourceChar, trail); err = CoderResult.unmappableForLength(2); /* convert this surrogate code point */ /* exit this condition tree */ } else { /* this is an unmatched lead code unit (1st surrogate) */ /* callback(illegal) */ err = CoderResult.malformedForLength(1); } } else { /* no more input */ err = CoderResult.UNDERFLOW; } } else { /* this is an unmatched trail code unit (2nd surrogate ) */ /* callback(illegal) */ err = CoderResult.malformedForLength(1); } } else { /* callback(unassigned) for a BMP code point */ err = CoderResult.unmappableForLength(1); } fromUChar32 = sourceChar; break; } } else { err = 
CoderResult.OVERFLOW;
                    break;
                }
            }

            /*
             * the end of the input stream and detection of truncated input
             * are handled by the framework, but for ISO-2022-KR conversion
             * we need to be in ASCII mode at the very end
             *
             * conditions:
             * successful
             * not in ASCII mode
             * end of input and no truncated input
             */
            if (!err.isError() && isTargetByteDBCS && flush && !source.hasRemaining() && fromUChar32 == 0) {
                int sourceIndex;

                /* we are switching to ASCII */
                isTargetByteDBCS = false;

                /* get the source index of the last input character */
                sourceIndex = source.position();
                if (sourceIndex > 0) {
                    --sourceIndex;
                    /* guard sourceIndex == 0 as the JP and CN encoders do, to avoid reading index -1 */
                    if (UTF16.isTrailSurrogate(source.get(sourceIndex)) &&
                            (sourceIndex == 0 || UTF16.isLeadSurrogate(source.get(sourceIndex-1)))) {
                        --sourceIndex;
                    }
                } else {
                    sourceIndex = -1;
                }

                /* keep the result so that an overflow while writing SI is reported, as in the JP and CN encoders */
                err = CharsetEncoderICU.fromUWriteBytes(this, SHIFT_IN_STR, 0, 1, target, offsets, sourceIndex);
            }
            /* save the state and return */
            fromUnicodeStatus = isTargetByteDBCS ? 1 : 0;

            return err;
        }
    }

    public CharsetDecoder newDecoder() {
        switch (variant) {
        case ISO_2022_JP:
            return new CharsetDecoderISO2022JP(this);
        case ISO_2022_CN:
            return new CharsetDecoderISO2022CN(this);
        case ISO_2022_KR:
            setInitialStateToUnicodeKR();
            return new CharsetDecoderISO2022KR(this);
        default: /* should not happen */
            return null;
        }
    }

    public CharsetEncoder newEncoder() {
        CharsetEncoderICU cnv;

        switch (variant) {
        case ISO_2022_JP:
            return new CharsetEncoderISO2022JP(this);
        case ISO_2022_CN:
            return new CharsetEncoderISO2022CN(this);
        case ISO_2022_KR:
            cnv = new CharsetEncoderISO2022KR(this);
            setInitialStateFromUnicodeKR(cnv);
            return cnv;
        default: /* should not happen */
            return null;
        }
    }

    private void setInitialStateToUnicodeKR() {
        if (myConverterData.version == 1) {
            myConverterData.currentDecoder.toUnicodeStatus = 0; /* offset */
            myConverterData.currentDecoder.mode = 0; /* state */
            myConverterData.currentDecoder.toULength = 0; /* byteIndex */
        }
    }

    private void setInitialStateFromUnicodeKR(CharsetEncoderICU cnv) {
        /* ISO-2022-KR the designator sequence appears only once
         * in a
file so we append it only once */ if (cnv.errorBufferLength == 0) { cnv.errorBufferLength = 4; cnv.errorBuffer[0] = 0x1b; cnv.errorBuffer[1] = 0x24; cnv.errorBuffer[2] = 0x29; cnv.errorBuffer[3] = 0x43; } if (myConverterData.version == 1) { ((CharsetMBCS)myConverterData.currentEncoder.charset()).subChar1 = 0x1A; myConverterData.currentEncoder.fromUChar32 = 0; myConverterData.currentEncoder.fromUnicodeStatus = 1; /* prevLength */ } } void getUnicodeSetImpl(UnicodeSet setFillIn, int which) { int i; /*open a set and initialize it with code points that are algorithmically round-tripped */ switch(variant){ case ISO_2022_JP: /*include JIS X 0201 which is hardcoded */ setFillIn.add(0xa5); setFillIn.add(0x203e); if((jpCharsetMasks[myConverterData.version]&CSM(ISO8859_1))!=0){ /*include Latin-1 some variants of JP */ setFillIn.add(0, 0xff); } else { /* include ASCII for JP */ setFillIn.add(0, 0x7f); } if(myConverterData.version==3 || myConverterData.version==4 ||which == ROUNDTRIP_AND_FALLBACK_SET){ /* * Do not test(jpCharsetMasks[myConverterData.version]&CSM(HWKANA_7BIT))!=0 because the bit * is on for all JP versions although version 3 & 4 (JIS7 and JIS8) use half-width Katakana. * This is because all ISO_2022_JP variant are lenient in that they accept (in toUnicode) half-width * Katakana via ESC. * However, we only emit (fromUnicode) half-width Katakana according to the * definition of each variant. * * When including fallbacks, * we need to include half-width Katakana Unicode code points for all JP variants because * JIS X 0208 has hardcoded fallbacks for them (which map to full-width Katakana). 
*/ /* include half-width Katakana for JP */ setFillIn.add(HWKANA_START, HWKANA_END); } break; case ISO_2022_CN: /* Include ASCII for CN */ setFillIn.add(0, 0x7f); break; case ISO_2022_KR: /* there is only one converter for KR */ myConverterData.currentConverter.getUnicodeSetImpl(setFillIn, which); break; default: break; } //TODO Replaced by ucnv_MBCSGetFilteredUnicodeSetForUnicode() until for(i=0; i */ // (char)(c - 93) < 4 || /* ]^_` */ // (char)(c - 123) < 3 || /* {|} */ // (c==58) || (c==63) /* *@[ */ // ); //} private static boolean isCRLFTAB(char c) { return ( (c==13) || (c==10) || (c==9) ); } //private static boolean isCRLFSPTAB(char c) { // return ( // (c==32) || (c==13) || (c==10) || (c==9) // ); //} private static final byte PLUS=43; private static final byte MINUS=45; private static final byte BACKSLASH=92; //private static final byte TILDE=126; private static final byte AMPERSAND=0x26; private static final byte COMMA=0x2c; private static final byte SLASH=0x2f; // legal byte values: all US-ASCII graphic characters 0x20..0x7e private static boolean isLegal(char c, boolean useIMAP) { if (useIMAP) { return ( (0x20 <= c) && (c <= 0x7e) ); } else { return ( ((char)(c - 32) < 94 && (c != BACKSLASH)) || isCRLFTAB(c) ); } } // directly encode all of printable ASCII 0x20..0x7e except '&' 0x26 private static boolean inSetDIMAP(char c) { return ( (isLegal(c, true) && c != AMPERSAND) ); } private static byte TO_BASE64_IMAP(int n) { return (n < 63 ? TO_BASE_64[n] : COMMA); } private static byte FROM_BASE64_IMAP(char c) { return (c==COMMA ? 63 : c==SLASH ? 
-1 : FROM_BASE_64[c]); } /* encode directly sets D and O and CR LF SP TAB */ private static final byte ENCODE_DIRECTLY_MAXIMUM[] = { /*0 1 2 3 4 5 6 7 8 9 a b c d e f*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 }; /* encode directly set D and CR LF SP TAB but not set O */ private static final byte ENCODE_DIRECTLY_RESTRICTED[] = { /*0 1 2 3 4 5 6 7 8 9 a b c d e f*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0 }; private static final byte TO_BASE_64[] = { /* A-Z */ 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, /* a-z */ 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, /* 0-9 */ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, /* +/ */ 43, 47 }; private static final byte FROM_BASE_64[] = { /* C0 controls, -1 for legal ones (CR LF TAB), -3 for illegal ones */ -3, -3, -3, -3, -3, -3, -3, -3, -3, -1, -1, -3, -3, -1, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, /* general punctuation with + and / and a special value (-2) for - */ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -2, -1, 63, /* digits */ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1, /* A-Z */ -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
        21, 22, 23, 24, 25, -1, -3, -1, -1, -1,
        /* a-z*/
        -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -3, -3
    };

    class CharsetDecoderUTF7 extends CharsetDecoderICU {
        public CharsetDecoderUTF7(CharsetICU cs) {
            super(cs);
            implReset();
        }

        protected void implReset() {
            super.implReset();
            toUnicodeStatus=(toUnicodeStatus & 0xf0000000) | 0x1000000;
        }

        protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) {
            CoderResult cr=CoderResult.UNDERFLOW;
            byte base64Value;
            byte base64Counter;
            byte inDirectMode;
            char bits;
            int byteIndex;
            int sourceIndex, nextSourceIndex;

            int length;

            char b;
            char c;

            int sourceArrayIndex=source.position();

            //get the state of the machine state
            {
                int status=toUnicodeStatus;
                inDirectMode=(byte)((status >> 24) & 1);
                base64Counter=(byte)(status >> 16);
                bits=(char)status;
            }
            byteIndex=toULength;
            /* sourceIndex=-1 if the current character began in the previous buffer */
            sourceIndex=byteIndex==0 ? 0 : -1;
            nextSourceIndex=0;

            directMode:
            while (true) {
                if (inDirectMode==1) {
                    /*
                     * In Direct Mode, most US-ASCII characters are encoded directly, i.e.,
                     * with their US-ASCII byte values.
                     * Backslash and Tilde and most control characters are not allowed in UTF-7.
                     * A plus sign starts Unicode (or "escape") Mode.
                     * An ampersand starts Unicode Mode for IMAP.
                     *
                     * In Direct Mode, only the sourceIndex is used.
*/ byteIndex=0; length=source.remaining(); //targetCapacity=target.remaining(); //Commented out because length of source may be larger than target when it comes to bytes /*if (useIMAP && length > targetCapacity) { length=targetCapacity; }*/ while (length > 0) { b=(char)(source.get()); sourceArrayIndex++; if (!isLegal(b, useIMAP)) { toUBytesArray[0]=(byte)b; byteIndex=1; cr=CoderResult.malformedForLength(sourceArrayIndex); break; } else if ((!useIMAP && b!=PLUS) || (useIMAP && b!=AMPERSAND)) { // write directly encoded character if (target.hasRemaining()) { // Check to make sure that there is room in target. target.put(b); if (offsets!= null) { offsets.put(sourceIndex++); } } else { // Get out and set the CoderResult. break; } } else { /* PLUS or (AMPERSAND in IMAP)*/ /* switch to Unicode mode */ nextSourceIndex=++sourceIndex; inDirectMode=0; byteIndex=0; bits=0; base64Counter=-1; continue directMode; } --length; }//end of while if (source.hasRemaining() && target.position() >= target.limit()) { /* target is full */ cr=CoderResult.OVERFLOW; } break directMode; } else { /* Unicode Mode*/ /* * In Unicode Mode, UTF-16BE is base64-encoded. * The base64 sequence ends with any character that is not in the base64 alphabet. * A terminating minus sign is consumed. * * In Unicode Mode, the sourceIndex has the index to the start of the current * base64 bytes, while nextSourceIndex is precisely parallel to source, * keeping the index to the following byte. 
*/ while(source.hasRemaining()) { if (target.hasRemaining()) { b=(char)source.get(); sourceArrayIndex++; toUBytesArray[byteIndex++]=(byte)b; if ((!useIMAP && b>=126) || (useIMAP && b>0x7e)) { /* illegal - test other illegal US-ASCII values by base64Value==-3 */ inDirectMode=1; cr=CoderResult.malformedForLength(sourceArrayIndex); break directMode; } else if (((base64Value=FROM_BASE_64[b])>=0 && !useIMAP) || ((base64Value=FROM_BASE64_IMAP(b))>=0) && useIMAP) { /* collect base64 bytes */ switch (base64Counter) { case -1: /* -1 is immediately after the + */ case 0: bits=(char)base64Value; base64Counter=1; break; case 1: case 3: case 4: case 6: bits=(char)((bits<<6) | base64Value); ++base64Counter; break; case 2: c=(char)((bits<<4) | (base64Value>>2)); if (useIMAP && isLegal(c, useIMAP)) { // illegal inDirectMode=1; cr=CoderResult.malformedForLength(sourceArrayIndex); // goto endloop; break directMode; } target.put(c); if (offsets != null) { offsets.put(sourceIndex); sourceIndex=nextSourceIndex - 1; } toUBytesArray[0]=(byte)b; /* keep this byte in case an error occurs */ byteIndex=1; bits=(char)(base64Value&3); base64Counter=3; break; case 5: c=(char)((bits<<2) | (base64Value>>4)); if(useIMAP && isLegal(c, useIMAP)) { // illegal inDirectMode=1; cr=CoderResult.malformedForLength(sourceArrayIndex); // goto endloop; break directMode; } target.put(c); if (offsets != null) { offsets.put(sourceIndex); sourceIndex=nextSourceIndex - 1; } toUBytesArray[0]=(byte)b; /* keep this byte in case an error occurs */ byteIndex=1; bits=(char)(base64Value&15); base64Counter=6; break; case 7: c=(char)((bits<<6) | base64Value); if (useIMAP && isLegal(c, useIMAP)) { // illegal inDirectMode=1; cr=CoderResult.malformedForLength(sourceArrayIndex); // goto endloop; break directMode; } target.put(c); if (offsets != null) { offsets.put(sourceIndex); sourceIndex=nextSourceIndex; } byteIndex=0; bits=0; base64Counter=0; break; //default: /* will never occur */ //break; }//end of switch } else if 
(base64Value==-2) { /* minus sign terminates the base64 sequence */ inDirectMode=1; if (base64Counter==-1) { /* +- i.e. a minus immediately following a plus */ target.put(useIMAP ? (char)AMPERSAND : (char)PLUS); if (offsets != null) { offsets.put(sourceIndex - 1); } } else { /* absorb the minus and leave the Unicode Mode */ if (bits!=0 || (useIMAP && base64Counter!=0 && base64Counter!=3 && base64Counter!=6)) { /*bits are illegally left over, a unicode character is incomplete */ cr=CoderResult.malformedForLength(sourceArrayIndex); break; } } sourceIndex=nextSourceIndex; continue directMode; } else if (!useIMAP && base64Value==-1) { /* for any legal character except base64 and minus sign */ /* leave the Unicode Mode */ inDirectMode=1; if (base64Counter==-1) { /* illegal: + immediately followed by something other than base64 minus sign */ /* include the plus sign in the reported sequence */ --sourceIndex; toUBytesArray[0]=(byte)PLUS; toUBytesArray[1]=(byte)b; byteIndex=2; cr=CoderResult.malformedForLength(sourceArrayIndex); break; } else if (bits==0) { /* un-read the character in case it is a plus sign */ source.position(--sourceArrayIndex); sourceIndex=nextSourceIndex - 1; continue directMode; } else { /* bits are illegally left over, a unicode character is incomplete */ cr=CoderResult.malformedForLength(sourceArrayIndex); break; } } else { if (useIMAP && base64Counter==-1) { // illegal: & immediately followed by something other than base64 or minus sign // include the ampersand in the reported sequence --sourceIndex; toUBytesArray[0]=(byte)AMPERSAND; toUBytesArray[1]=(byte)b; byteIndex=2; } /* base64Value==-3 for illegal characters */ /* illegal */ inDirectMode=1; cr=CoderResult.malformedForLength(sourceArrayIndex); break; } } else { /* target is full */ cr=CoderResult.OVERFLOW; break; } } //end of while break directMode; } }//end of direct mode label if (useIMAP) { if (!cr.isError() && inDirectMode==0 && flush && byteIndex==0 && !source.hasRemaining()) { if 
(base64Counter==-1) {
                        /* & at the very end of the input */
                        /* make the ampersand the reported sequence */
                        toUBytesArray[0]=(byte)AMPERSAND;
                        byteIndex=1;
                    }
                    /* else if (base64Counter!=-1) byteIndex remains 0 because there is no particular byte sequence */
                    inDirectMode=1;
                    cr=CoderResult.malformedForLength(sourceIndex);
                }
            } else {
                if (!cr.isError() && flush && !source.hasRemaining() && bits ==0) {
                    /*
                     * if we are in Unicode Mode, then the byteIndex might not be 0,
                     * but that is ok if bits==0
                     * -> we set byteIndex=0 at the end of the stream to avoid a truncated error
                     * (not true for IMAP-mailbox-name where we must end in direct mode)
                     */
                    if (!cr.isOverflow()) {
                        byteIndex=0;
                    }
                }
            }
            /* set the converter state */
            toUnicodeStatus=((int)inDirectMode<<24 | (int)(((short)base64Counter & UConverterConstants.UNSIGNED_BYTE_MASK)<<16) | (int)bits);
            toULength=byteIndex;

            return cr;
        }
    }

    class CharsetEncoderUTF7 extends CharsetEncoderICU {
        public CharsetEncoderUTF7(CharsetICU cs) {
            super(cs, fromUSubstitution);
            implReset();
        }

        protected void implReset() {
            super.implReset();
            fromUnicodeStatus=(fromUnicodeStatus & 0xf0000000) | 0x1000000;
        }

        protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) {
            CoderResult cr=CoderResult.UNDERFLOW;
            byte inDirectMode;
            byte encodeDirectly[];
            int status;

            int length, targetCapacity, sourceIndex;

            byte base64Counter;
            char bits;
            char c;
            char b;
            /* get the state machine state */
            {
                status=fromUnicodeStatus;
                encodeDirectly=(((long)status) < 0x10000000) ?
ENCODE_DIRECTLY_MAXIMUM : ENCODE_DIRECTLY_RESTRICTED; inDirectMode=(byte)((status >> 24) & 1); base64Counter=(byte)(status >> 16); bits=(char)((byte)status); } /* UTF-7 always encodes UTF-16 code units, therefore we need only a simple sourceIndex */ sourceIndex=0; directMode: while(true) { if(inDirectMode==1) { length=source.remaining(); targetCapacity=target.remaining(); if(length > targetCapacity) { length=targetCapacity; } while (length > 0) { c=source.get(); /* UTF7: currently always encode CR LF SP TAB directly */ /* IMAP: encode 0x20..0x7e except '&' directly */ if ((!useIMAP && c<=127 && encodeDirectly[c]==1) || (useIMAP && inSetDIMAP(c))) { /* encode directly */ target.put((byte)c); if (offsets != null) { offsets.put(sourceIndex++); } } else if ((!useIMAP && c==PLUS) || (useIMAP && c==AMPERSAND)) { /* IMAP: output &- for & */ /* UTF-7: output +- for + */ target.put(useIMAP ? (byte)AMPERSAND : (byte)PLUS); if (target.hasRemaining()) { target.put((byte)MINUS); if (offsets != null) { offsets.put(sourceIndex); offsets.put(sourceIndex++); } /* realign length and targetCapacity */ continue directMode; } else { if (offsets != null) { offsets.put(sourceIndex++); } errorBuffer[0]=MINUS; errorBufferLength=1; cr=CoderResult.OVERFLOW; break; } } else { /* un-read this character and switch to unicode mode */ source.position(source.position() - 1); target.put(useIMAP ? 
(byte)AMPERSAND : (byte)PLUS);
                            if (offsets != null) {
                                offsets.put(sourceIndex);
                            }
                            inDirectMode=0;
                            base64Counter=0;
                            continue directMode;
                        }
                        --length;
                    } //end of while
                    if (source.hasRemaining() && !target.hasRemaining()) {
                        /* target is full */
                        cr=CoderResult.OVERFLOW;
                    }
                    break directMode;
                } else {
                    /* Unicode Mode */
                    while (source.hasRemaining()) {
                        if (target.hasRemaining()) {
                            c=source.get();
                            if ((!useIMAP && c<=127 && encodeDirectly[c]==1) || (useIMAP && isLegal(c, useIMAP))) {
                                /* encode directly */
                                inDirectMode=1;

                                /* trick: back out this character to make this easier */
                                source.position(source.position() - 1);

                                /* terminate the base64 sequence */
                                if (base64Counter!=0) {
                                    /* write remaining bits for the previous character */
                                    target.put(useIMAP ? TO_BASE64_IMAP(bits) : TO_BASE_64[bits]);
                                    if (offsets!=null) {
                                        offsets.put(sourceIndex-1);
                                    }
                                }
                                if (FROM_BASE_64[c]!=-1 || useIMAP) {
                                    /* need to terminate with a minus */
                                    if (target.hasRemaining()) {
                                        target.put((byte)MINUS);
                                        if (offsets!=null) {
                                            offsets.put(sourceIndex-1);
                                        }
                                    } else {
                                        errorBuffer[0]=MINUS;
                                        errorBufferLength=1;
                                        cr=CoderResult.OVERFLOW;
                                        break;
                                    }
                                }
                                continue directMode;
                            } else {
                                /*
                                 * base64 this character:
                                 * Output 2 or 3 base64 bytes for the remaining bits of the previous character
                                 * and the bits of this character, each implicitly in UTF-16BE.
                                 *
                                 * Here, bits is an 8-bit variable because only 6 bits need to be kept from one
                                 * character to the next. The actual 2 or 4 bits are shifted to the left edge
                                 * of the 6-bit field 5..0 to make the termination of the base64 sequence easier.
                                 */
                                switch (base64Counter) {
                                case 0:
                                    b=(char)(c>>10);
                                    target.put(useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]);
                                    if (target.hasRemaining()) {
                                        b=(char)((c>>4)&0x3f);
                                        target.put(useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]);
                                        if (offsets!=null) {
                                            offsets.put(sourceIndex);
                                            offsets.put(sourceIndex++);
                                        }
                                    } else {
                                        if (offsets!=null) {
                                            offsets.put(sourceIndex++);
                                        }
                                        b=(char)((c>>4)&0x3f);
                                        errorBuffer[0]=useIMAP ?
TO_BASE64_IMAP(b) : TO_BASE_64[b]; errorBufferLength=1; cr=CoderResult.OVERFLOW; } bits=(char)((c&15)<<2); base64Counter=1; break; case 1: b=(char)(bits|(c>>14)); target.put(useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]); if (target.hasRemaining()) { b=(char)((c>>8)&0x3f); target.put(useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]); if (target.hasRemaining()) { b=(char)((c>>2)&0x3f); target.put(useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]); if (offsets!=null) { offsets.put(sourceIndex); offsets.put(sourceIndex); offsets.put(sourceIndex++); } } else { if (offsets!=null) { offsets.put(sourceIndex); offsets.put(sourceIndex++); } b=(char)((c>>2)&0x3f); errorBuffer[0]=useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]; errorBufferLength=1; cr=CoderResult.OVERFLOW; } } else { if (offsets!=null) { offsets.put(sourceIndex++); } b=(char)((c>>8)&0x3f); errorBuffer[0]=useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]; b=(char)((c>>2)&0x3f); errorBuffer[1]=useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]; errorBufferLength=2; cr=CoderResult.OVERFLOW; } bits=(char)((c&3)<<4); base64Counter=2; break; case 2: b=(char)(bits|(c>>12)); target.put(useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]); if (target.hasRemaining()) { b=(char)((c>>6)&0x3f); target.put(useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]); if (target.hasRemaining()) { b=(char)(c&0x3f); target.put(useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]); if (offsets!=null) { offsets.put(sourceIndex); offsets.put(sourceIndex); offsets.put(sourceIndex++); } } else { if (offsets!=null) { offsets.put(sourceIndex); offsets.put(sourceIndex++); } b=(char)(c&0x3f); errorBuffer[0]=useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]; errorBufferLength=1; cr=CoderResult.OVERFLOW; } } else { if (offsets!=null) { offsets.put(sourceIndex++); } b=(char)((c>>6)&0x3f); errorBuffer[0]=useIMAP ? TO_BASE64_IMAP(b) : TO_BASE_64[b]; b=(char)(c&0x3f); errorBuffer[1]=useIMAP ? 
TO_BASE64_IMAP(b) : TO_BASE_64[b]; errorBufferLength=2; cr=CoderResult.OVERFLOW; } bits=0; base64Counter=0; break; //default: /* will never occur */ //break; } //end of switch } } else { /* target is full */ cr=CoderResult.OVERFLOW; break; } } //end of while break directMode; } } //end of directMode label if (flush && !source.hasRemaining()) { /* flush remaining bits to the target */ if (inDirectMode==0) { if (base64Counter!=0) { if (target.hasRemaining()) { target.put(useIMAP ? TO_BASE64_IMAP(bits) : TO_BASE_64[bits]); if (offsets!=null) { offsets.put(sourceIndex - 1); } } else { errorBuffer[errorBufferLength++]=useIMAP ? TO_BASE64_IMAP(bits) : TO_BASE_64[bits]; cr=CoderResult.OVERFLOW; } } if (useIMAP) { /* IMAP: need to terminate with a minus */ if (target.hasRemaining()) { target.put((byte)MINUS); if (offsets!=null) { offsets.put(sourceIndex - 1); } } else { errorBuffer[errorBufferLength++]=MINUS; cr=CoderResult.OVERFLOW; } } } /*reset the state for the next conversion */ fromUnicodeStatus=((status&0xf0000000) | 0x1000000); /* keep version, inDirectMode=TRUE */ } else { /* set the converter state back */ fromUnicodeStatus=((status&0xf0000000) | ((int)inDirectMode<<24) | (int)(((short)base64Counter & UConverterConstants.UNSIGNED_BYTE_MASK)<<16) | ((int)bits)); } return cr; } } public CharsetDecoder newDecoder() { return new CharsetDecoderUTF7(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderUTF7(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ getCompleteUnicodeSet(setFillIn); } } icu4j-4.2/src/com/ibm/icu/charset/package.html0000644000175000017500000000071011361046170021136 0ustar twernertwerner C:ICU4J .charset Package Overview

Enhanced charset conversion support.
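
The conversion classes in this package plug into the standard java.nio.charset API, so a converter is driven through an ordinary CharsetEncoder and CharsetDecoder. The following is a minimal sketch of that usage, not code from ICU4J: the class name CharsetRoundTrip and its roundTrip helper are hypothetical, and a JDK built-in charset stands in for an ICU4J one so that the example runs without the ICU4J jar.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, for illustration only.
final class CharsetRoundTrip {

    // Encodes text to bytes and decodes the bytes back to a String.
    // Malformed or unmappable input is reported instead of being replaced.
    static String roundTrip(Charset cs, String text) {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return decoder.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("text is not convertible with " + cs.name(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-in charset from the JDK. Assumption: with the ICU4J charset
        // classes on the classpath, the Charset could instead be obtained from
        // new com.ibm.icu.charset.CharsetProviderICU().charsetForName(name).
        Charset cs = Charset.forName("UTF-8");
        System.out.println(roundTrip(cs, "ICU4J \uD55C\uAE00"));
    }
}
```

Setting both error actions to REPORT makes conversion failures surface as exceptions, which is usually easier to debug than silent substitution.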

CharsetICU, CharsetProviderICU, CharsetEncoderICU and CharsetDecoderICU provide conversion services for many charsets. icu4j-4.2/src/com/ibm/icu/charset/UConverterStaticData.java0000644000175000017500000000606011361046170023562 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2007, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* * ******************************************************************************* */ package com.ibm.icu.charset; final class UConverterStaticData { /* +offset: size */ int structSize; /* +0: 4 Size of this structure */ String name; /* +4: 60 internal name of the converter- invariant chars */ int codepage; /* +64: 4 codepage # (now IBM-$codepage) */ byte platform; /* +68: 1 platform of the converter (only IBM now) */ byte conversionType; /* +69: 1 conversion type */ byte minBytesPerChar; /* +70: 1 Minimum # bytes per char in this codepage */ byte maxBytesPerChar; /* +71: 1 Maximum # bytes output per UChar in this codepage */ byte subChar[/*UCNV_MAX_SUBCHAR_LEN*/]; /* +72: 4 [note: 4 and 8 byte boundary] */ byte subCharLen; /* +76: 1 */ byte hasToUnicodeFallback; /* +77: 1 UBool needs to be changed to UBool to be consistent across platform */ byte hasFromUnicodeFallback; /* +78: 1 */ short unicodeMask; /* +79: 1 bit 0: has supplementary bit 1: has single surrogates */ byte subChar1; /* +80: 1 single-byte substitution character for IBM MBCS (0 if none) */ byte reserved[/*19*/]; /* +81: 19 to round out the structure */ /* total size: 100 */ public UConverterStaticData() { subChar = new byte[UConverterConstants.MAX_SUBCHAR_LEN]; reserved = new byte[19]; } /* public UConverterStaticData(int structSize_, String name_, int codepage_, byte platform_, byte conversionType_, byte minBytesPerChar_, byte maxBytesPerChar_, byte[] subChar_, byte subCharLen_, byte 
hasToUnicodeFallback_, byte hasFromUnicodeFallback_, short unicodeMask_, byte subChar1_, byte[] reserved_) { structSize = structSize_; name = name_; codepage = codepage_; platform = platform_; conversionType = conversionType_; minBytesPerChar = minBytesPerChar_; maxBytesPerChar = maxBytesPerChar_; subChar = new byte[UConverterConstants.MAX_SUBCHAR_LEN]; System.arraycopy(subChar_, 0, subChar, 0, (subChar.length < subChar_.length? subChar.length : subChar_.length)); subCharLen = subCharLen_; hasToUnicodeFallback = hasToUnicodeFallback_; hasFromUnicodeFallback = hasFromUnicodeFallback_; unicodeMask = unicodeMask_; subChar1 = subChar1_; reserved = new byte[19]; System.arraycopy(reserved_, 0, reserved, 0, (reserved.length < reserved_.length? reserved.length : reserved_.length)); }*/ public static final int SIZE_OF_UCONVERTER_STATIC_DATA = 100; } icu4j-4.2/src/com/ibm/icu/charset/UConverterDataReader.java0000644000175000017500000005770411361046170023550 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; import com.ibm.icu.impl.ICUBinary; import java.io.IOException; import java.io.InputStream; import java.io.DataInputStream; import java.nio.ByteBuffer; /** * ucnvmbcs.h * * ICU conversion (.cnv) data file structure, following the usual UDataInfo * header. * * Format version: 6.2 * * struct UConverterStaticData -- struct containing the converter name, IBM CCSID, * min/max bytes per character, etc. * see ucnv_bld.h * * -------------------- * * The static data is followed by conversionType-specific data structures. * At the moment, there are only variations of MBCS converters. 
They all have * the same toUnicode structures, while the fromUnicode structures for SBCS * differ from those for other MBCS-style converters. * * _MBCSHeader.version 4.2 adds an optional conversion extension data structure. * If it is present, then an ICU version reading header versions 4.0 or 4.1 * will be able to use the base table and ignore the extension. * * The unicodeMask in the static data is part of the base table data structure. * Especially, the UCNV_HAS_SUPPLEMENTARY flag determines the length of the * fromUnicode stage 1 array. * The static data unicodeMask refers only to the base table's properties if * a base table is included. * In an extension-only file, the static data unicodeMask is 0. * The extension data indexes have a separate field with the unicodeMask flags. * * MBCS-style data structure following the static data. * Offsets are counted in bytes from the beginning of the MBCS header structure. * Details about usage in comments in ucnvmbcs.c. * * struct _MBCSHeader (see the definition in this header file below) * contains 32-bit fields as follows: * 8 values: * 0 uint8_t[4] MBCS version in UVersionInfo format (currently 4.2.0.0) * 1 uint32_t countStates * 2 uint32_t countToUFallbacks * 3 uint32_t offsetToUCodeUnits * 4 uint32_t offsetFromUTable * 5 uint32_t offsetFromUBytes * 6 uint32_t flags, bits: * 31.. 8 offsetExtension -- _MBCSHeader.version 4.2 (ICU 2.8) and higher * 0 for older versions and if * there is not extension structure * 7.. 
0 outputType * 7 uint32_t fromUBytesLength -- _MBCSHeader.version 4.1 (ICU 2.4) and higher * counts bytes in fromUBytes[] * * if(outputType==MBCS_OUTPUT_EXT_ONLY) { * -- base table name for extension-only table * char baseTableName[variable]; -- with NUL plus padding for 4-alignment * * -- all _MBCSHeader fields except for version and flags are 0 * } else { * -- normal base table with optional extension * * int32_t stateTable[countStates][256]; * * struct _MBCSToUFallback { (fallbacks are sorted by offset) * uint32_t offset; * UChar32 codePoint; * } toUFallbacks[countToUFallbacks]; * * uint16_t unicodeCodeUnits[(offsetFromUTable-offsetToUCodeUnits)/2]; * (padded to an even number of units) * * -- stage 1 tables * if(staticData.unicodeMask&UCNV_HAS_SUPPLEMENTARY) { * -- stage 1 table for all of Unicode * uint16_t fromUTable[0x440]; (32-bit-aligned) * } else { * -- BMP-only tables have a smaller stage 1 table * uint16_t fromUTable[0x40]; (32-bit-aligned) * } * * -- stage 2 tables * length determined by top of stage 1 and bottom of stage 3 tables * if(outputType==MBCS_OUTPUT_1) { * -- SBCS: pure indexes * uint16_t stage 2 indexes[?]; * } else { * -- DBCS, MBCS, EBCDIC_STATEFUL, ...: roundtrip flags and indexes * uint32_t stage 2 flags and indexes[?]; * } * * -- stage 3 tables with byte results * if(outputType==MBCS_OUTPUT_1) { * -- SBCS: each 16-bit result contains flags and the result byte, see ucnvmbcs.c * uint16_t fromUBytes[fromUBytesLength/2]; * } else { * -- DBCS, MBCS, EBCDIC_STATEFUL, ... 2/3/4 bytes result, see ucnvmbcs.c * uint8_t fromUBytes[fromUBytesLength]; or * uint16_t fromUBytes[fromUBytesLength/2]; or * uint32_t fromUBytes[fromUBytesLength/4]; * } * } * * -- extension table, details see ucnv_ext.h * int32_t indexes[>=32]; ... */ /* * ucnv_ext.h * * See icuhtml/design/conversion/conversion_extensions.html * * Conversion extensions serve two purposes: * 1. They support m:n mappings. * 2. 
They support extension-only conversion files that are used together * with the regular conversion data in base files. * * A base file may contain an extension table (explicitly requested or * implicitly generated for m:n mappings), but its extension table is not * used when an extension-only file is used. * * It is an error if a base file contains any regular (not extension) mapping * from the same sequence as a mapping in the extension file * because the base mapping would hide the extension mapping. * * * Data for conversion extensions: * * One set of data structures per conversion direction (to/from Unicode). * The data structures are sorted by input units to allow for binary search. * Input sequences of more than one unit are handled like contraction tables * in collation: * The lookup value of a unit points to another table that is to be searched * for the next unit, recursively. * * For conversion from Unicode, the initial code point is looked up in * a 3-stage trie for speed, * with an additional table of unique results to save space. * * Long output strings are stored in separate arrays, with length and index * in the lookup tables. * Output results also include a flag distinguishing roundtrip from * (reverse) fallback mappings. * * Input Unicode strings must not begin or end with unpaired surrogates * to avoid problems with matches on parts of surrogate pairs. * * Mappings from multiple characters (code points or codepage state * table sequences) must be searched preferring the longest match. * For this to work and be efficient, the variable-width table must contain * all mappings that contain prefixes of the multiple characters. * If an extension table is built on top of a base table in another file * and a base table entry is a prefix of a multi-character mapping, then * this is an error. 
* * * Implementation note: * * Currently, the parser and several checks in the code limit the number * of UChars or bytes in a mapping to * UCNV_EXT_MAX_UCHARS and UCNV_EXT_MAX_BYTES, respectively, * which are output value limits in the data structure. * * For input, this is not strictly necessary - it is a hard limit only for the * buffers in UConverter that are used to store partial matches. * * Input sequences could otherwise be arbitrarily long if partial matches * need not be stored (i.e., if a sequence does not span several buffers with too * many units before the last buffer), although then results would differ * depending on whether partial matches exceed the limits or not, * which depends on the pattern of buffer sizes. * * * Data structure: * * int32_t indexes[>=32]; * * Array of indexes and lengths etc. The length of the array is at least 32. * The actual length is stored in indexes[0] to be forward compatible. * * Each index to another array is the number of bytes from indexes[]. * Each length of an array is the number of array base units in that array. * * Some of the structures may not be present, in which case their indexes * and lengths are 0. 
 *
 * Usage of indexes[i]:
 * [0]  length of indexes[]
 *
 * // to Unicode table
 * [1]  index of toUTable[] (array of uint32_t)
 * [2]  length of toUTable[]
 * [3]  index of toUUChars[] (array of UChar)
 * [4]  length of toUUChars[]
 *
 * // from Unicode table, not for the initial code point
 * [5]  index of fromUTableUChars[] (array of UChar)
 * [6]  index of fromUTableValues[] (array of uint32_t)
 * [7]  length of fromUTableUChars[] and fromUTableValues[]
 * [8]  index of fromUBytes[] (array of char)
 * [9]  length of fromUBytes[]
 *
 * // from Unicode trie for initial-code point lookup
 * [10] index of fromUStage12[] (combined array of uint16_t for stages 1 & 2)
 * [11] length of stage 1 portion of fromUStage12[]
 * [12] length of fromUStage12[]
 * [13] index of fromUStage3[] (array of uint16_t indexes into fromUStage3b[])
 * [14] length of fromUStage3[]
 * [15] index of fromUStage3b[] (array of uint32_t like fromUTableValues[])
 * [16] length of fromUStage3b[]
 *
 * [17] Bit field containing numbers of bytes:
 *        31..24 reserved, 0
 *        23..16 maximum input bytes
 *        15.. 8 maximum output bytes
 *         7.. 0 maximum bytes per UChar
 *
 * [18] Bit field containing numbers of UChars:
 *        31..24 reserved, 0
 *        23..16 maximum input UChars
 *        15.. 8 maximum output UChars
 *         7.. 0 maximum UChars per byte
 *
 * [19] Bit field containing flags:
 *               (extension table unicodeMask)
 *         1     UCNV_HAS_SURROGATES flag for the extension table
 *         0     UCNV_HAS_SUPPLEMENTARY flag for the extension table
 *
 * [20]..[30] reserved, 0
 * [31] number of bytes for the entire extension structure
 * [>31] reserved; there are indexes[0] indexes
 *
 *
 * uint32_t toUTable[];
 *
 * Array of byte/value pairs for lookups for toUnicode conversion.
 * The array is partitioned into sections like collation contraction tables.
 * Each section contains one word with the number of following words and
 * a default value for when the lookup in this section yields no match.
 *
 * A section is sorted in ascending order of input bytes,
 * allowing for fast linear or binary searches.
* The builder may store entries for a contiguous range of byte values * (compare difference between the first and last one with count), * which then allows for direct array access. * The builder should always do this for the initial table section. * * Entries may have 0 values, see below. * No two entries in a section have the same byte values. * * Each uint32_t contains an input byte value in bits 31..24 and the * corresponding lookup value in bits 23..0. * Interpret the value as follows: * if(value==0) { * no match, see below * } else if(value<0x1f0000) { * partial match - use value as index to the next toUTable section * and match the next unit; (value indexes toUTable[value]) * } else { * if(bit 23 set) { * roundtrip; * } else { * fallback; * } * unset value bit 23; * if(value<=0x2fffff) { * (value-0x1f0000) is a code point; (BMP: value<=0x1fffff) * } else { * bits 17..0 (value&0x3ffff) is an index to * the result UChars in toUUChars[]; (0 indexes toUUChars[0]) * length of the result=((value>>18)-12); (length=0..19) * } * } * * The first word in a section contains the number of following words in the * input byte position (bits 31..24, number=1..0xff). * The value of the initial word is used when the current byte is not found * in this section. * If the value is not 0, then it represents a result as above. * If the value is 0, then the search has to return a shorter match with an * earlier default value as the result, or result in "unmappable" even for the * initial bytes. * If the value is 0 for the initial toUTable entry, then the initial byte * does not start any mapping input. * * * UChar toUUChars[]; * * Contains toUnicode mapping results, stored as sequences of UChars. * Indexes and lengths stored in the toUTable[]. * * * UChar fromUTableUChars[]; * uint32_t fromUTableValues[]; * * The fromUTable is split into two arrays, but works otherwise much like * the toUTable. The array is partitioned into sections like collation * contraction tables and toUTable. 
 * A row in the table consists of same-index entries in fromUTableUChars[]
 * and fromUTableValues[].
 *
 * Interpret a value as follows:
 *  if(value==0) {
 *      no match, see below
 *  } else if(value<=0xffffff) { (bits 31..24 are 0)
 *      partial match - use value as index to the next fromUTable section
 *          and match the next unit; (value indexes fromUTable[value])
 *  } else {
 *      if(value==0x80000001) {
 *          return no mapping, but request for <subchar1>;
 *      }
 *      if(bit 31 set) {
 *          roundtrip;
 *      } else {
 *          fallback;
 *      }
 *      // bits 30..29 reserved, 0
 *      length=(value>>24)&0x1f; (bits 28..24)
 *      if(length==1..3) {
 *          bits 23..0 contain 1..3 bytes, padded with 00s on the left;
 *      } else {
 *          bits 23..0 (value&0xffffff) is an index to
 *              the result bytes in fromUBytes[]; (0 indexes fromUBytes[0])
 *      }
 *  }
 *
 * The first pair in a section contains the number of following pairs in the
 * UChar position (16 bits, number=1..0xffff).
 * The value of the initial pair is used when the current UChar is not found
 * in this section.
 * If the value is not 0, then it represents a result as above.
 * If the value is 0, then the search has to return a shorter match with an
 * earlier default value as the result, or result in "unmappable" even for the
 * initial UChars.
 *
 * If the from Unicode trie is present, then the from Unicode search tables
 * are not used for initial code points.
 * In this case, the first entries (index 0) in the tables are not used
 * (reserved, set to 0) because a value of 0 is used in trie results
 * to indicate no mapping.
 *
 *
 * uint16_t fromUStage12[];
 *
 * Stages 1 & 2 of a trie that maps an initial code point.
 * Indexes in stage 1 are all offset by the length of stage 1 so that the
 * same array pointer can be used for both stages.
 * If (c>>10)>=(length of stage 1) then c does not start any mapping.
 * Same bit distribution as for regular conversion tries.
 *
 *
 * uint16_t fromUStage3[];
 * uint32_t fromUStage3b[];
 *
 * Stage 3 of the trie.
 * The first array simply contains indexes to the second,
 * which contains words in the same format as fromUTableValues[].
 * Use a stage 3 granularity of 4, which allows for 256k stage 3 entries,
 * and 16-bit entries in stage 3 allow for 64k stage 3b entries.
 * The stage 3 granularity means that the stage 2 entry needs to be left-shifted.
 *
 * Two arrays are used because it is expected that more than half of the stage 3
 * entries will be zero. The 16-bit index stage 3 array saves space even
 * considering storing a total of 6 bytes per non-zero entry in both arrays
 * together.
 * Using a stage 3 granularity of >1 diminishes the compactability in that stage
 * but provides a larger effective addressing space in stage 2.
 * All but the final result stage use 16-bit entries to save space.
 *
 * fromUStage3b[] contains a zero for "no mapping" at its index 0,
 * and may contain UCNV_EXT_FROM_U_SUBCHAR1 at index 1 for "<subchar1> SUB mapping"
 * (i.e., "no mapping" with preference for <subchar1> rather than <subchar>),
 * and all other items are unique non-zero results.
 *
 * The default value of a fromUTableValues[] section that is referenced
 * _directly_ from a fromUStage3b[] item may also be UCNV_EXT_FROM_U_SUBCHAR1,
 * but this value must not occur anywhere else in fromUTableValues[]
 * because "no mapping" is always a property of a single code point,
 * never of multiple.
 *
 *
 * char fromUBytes[];
 *
 * Contains fromUnicode mapping results, stored as sequences of chars.
 * Indexes and lengths stored in the fromUTableValues[].
 */
final class UConverterDataReader implements ICUBinary.Authenticate {
    //private final static boolean debug = ICUDebug.enabled("UConverterDataReader");

    /*
     * UConverterDataReader(UConverterDataReader r)
     * {
     *     dataInputStream = new DataInputStream(r.dataInputStream);
     *     unicodeVersion = r.unicodeVersion;
     * }
     */

    /* the number of bytes read from the stream */
    int bytesRead = 0;
    /* the number of bytes read for static data */
    int staticDataBytesRead = 0;

    /**
     *

Protected constructor.

* @param inputStream ICU uprop.dat file input stream * @exception IOException throw if data file fails authentication */ protected UConverterDataReader(InputStream inputStream) throws IOException{ //if(debug) System.out.println("Bytes in inputStream " + inputStream.available()); /*unicodeVersion = */ICUBinary.readHeader(inputStream, DATA_FORMAT_ID, this); //if(debug) System.out.println("Bytes left in inputStream " +inputStream.available()); dataInputStream = new DataInputStream(inputStream); //if(debug) System.out.println("Bytes left in dataInputStream " +dataInputStream.available()); } // protected methods ------------------------------------------------- protected void readStaticData(UConverterStaticData sd) throws IOException { int bRead = 0; sd.structSize = dataInputStream.readInt(); bRead +=4; byte[] name = new byte[UConverterConstants.MAX_CONVERTER_NAME_LENGTH]; dataInputStream.readFully(name); bRead +=name.length; sd.name = new String(name, 0, name.length); sd.codepage = dataInputStream.readInt(); bRead +=4; sd.platform = dataInputStream.readByte(); bRead++; sd.conversionType = dataInputStream.readByte(); bRead++; sd.minBytesPerChar = dataInputStream.readByte(); bRead++; sd.maxBytesPerChar = dataInputStream.readByte(); bRead++; dataInputStream.readFully(sd.subChar); bRead += sd.subChar.length; sd.subCharLen = dataInputStream.readByte(); bRead++; sd.hasToUnicodeFallback = dataInputStream.readByte(); bRead++; sd.hasFromUnicodeFallback = dataInputStream.readByte(); bRead++; sd.unicodeMask = (short)dataInputStream.readUnsignedByte(); bRead++; sd.subChar1 = dataInputStream.readByte(); bRead++; dataInputStream.readFully(sd.reserved); bRead += sd.reserved.length; staticDataBytesRead = bRead; bytesRead += bRead; } protected void readMBCSHeader(CharsetMBCS.MBCSHeader h) throws IOException { dataInputStream.readFully(h.version); bytesRead += h.version.length; h.countStates = dataInputStream.readInt(); bytesRead+=4; h.countToUFallbacks = dataInputStream.readInt(); 
bytesRead+=4; h.offsetToUCodeUnits = dataInputStream.readInt(); bytesRead+=4; h.offsetFromUTable = dataInputStream.readInt(); bytesRead+=4; h.offsetFromUBytes = dataInputStream.readInt(); bytesRead+=4; h.flags = dataInputStream.readInt(); bytesRead+=4; h.fromUBytesLength = dataInputStream.readInt(); bytesRead+=4; if (h.version[0] == 5 && h.version[1] >= 3) { h.options = dataInputStream.readInt(); bytesRead+=4; if ((h.options & CharsetMBCS.MBCS_OPT_NO_FROM_U) != 0) { h.fullStage2Length = dataInputStream.readInt(); bytesRead+=4; } } } protected void readMBCSTable(int[][] stateTableArray, CharsetMBCS.MBCSToUFallback[] toUFallbacksArray, char[] unicodeCodeUnitsArray, char[] fromUnicodeTableArray, byte[] fromUnicodeBytesArray) throws IOException { int i, j; for(i = 0; i < stateTableArray.length; ++i){ for(j = 0; j < stateTableArray[i].length; ++j){ stateTableArray[i][j] = dataInputStream.readInt(); bytesRead+=4; } } for(i = 0; i < toUFallbacksArray.length; ++i) { toUFallbacksArray[i].offset = dataInputStream.readInt(); bytesRead+=4; toUFallbacksArray[i].codePoint = dataInputStream.readInt(); bytesRead+=4; } for(i = 0; i < unicodeCodeUnitsArray.length; ++i){ unicodeCodeUnitsArray[i] = dataInputStream.readChar(); bytesRead+=2; } for(i = 0; i < fromUnicodeTableArray.length; ++i){ fromUnicodeTableArray[i] = dataInputStream.readChar(); bytesRead+=2; } for(i = 0; i < fromUnicodeBytesArray.length; ++i){ fromUnicodeBytesArray[i] = dataInputStream.readByte(); bytesRead++; } } protected String readBaseTableName() throws IOException { char c; StringBuffer name = new StringBuffer(); while((c = (char)dataInputStream.readByte()) != 0){ name.append(c); bytesRead++; } bytesRead++/*for null terminator*/; return name.toString(); } //protected int[] readExtIndexes(int skip) throws IOException protected ByteBuffer readExtIndexes(int skip) throws IOException { int skipped = dataInputStream.skipBytes(skip); if(skipped != skip){ throw new IOException("could not skip "+ skip +" bytes"); } int 
n = dataInputStream.readInt(); bytesRead+=4; int[] indexes = new int[n]; indexes[0] = n; for(int i = 1; i < n; ++i) { indexes[i] = dataInputStream.readInt(); bytesRead+=4; } //return indexes; ByteBuffer b = ByteBuffer.allocate(indexes[31]); for(int i = 0; i < n; ++i) { b.putInt(indexes[i]); } int len = dataInputStream.read(b.array(), b.position(), b.remaining()); if(len==-1){ throw new IOException("Read failed"); } bytesRead += len; return b; } /*protected byte[] readExtTables(int n) throws IOException { byte[] tables = new byte[n]; int len =dataInputStream.read(tables); if(len==-1){ throw new IOException("Read failed"); } bytesRead += len; return tables; }*/ byte[] getDataFormatVersion(){ return DATA_FORMAT_VERSION; } /** * Inherited method */ public boolean isDataVersionAcceptable(byte version[]){ return version[0] == DATA_FORMAT_VERSION[0]; } /* byte[] getUnicodeVersion(){ return unicodeVersion; }*/ // private data members ------------------------------------------------- /** * ICU data file input stream */ DataInputStream dataInputStream; // private byte[] unicodeVersion; /** * File format version that this class understands. * No guarantees are made if a older version is used * see store.c of gennorm for more information and values */ // DATA_FORMAT_ID_ values taken from icu4c isCnvAcceptable (ucnv_bld.c) private static final byte DATA_FORMAT_ID[] = {(byte)0x63, (byte)0x6e, (byte)0x76, (byte)0x74}; // dataFormat="cnvt" private static final byte DATA_FORMAT_VERSION[] = {(byte)0x6}; } icu4j-4.2/src/com/ibm/icu/charset/CharsetBOCU1.java0000644000175000017500000012520311361046170021650 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UnicodeSet; import com.ibm.icu.text.UTF16; import com.ibm.icu.lang.UCharacter; /** * @author krajwade * */ class CharsetBOCU1 extends CharsetICU { /* BOCU constants and macros */ /* initial value for "prev": middle of the ASCII range */ private static final byte BOCU1_ASCII_PREV = 0x40; /* bounding byte values for differences */ private static final int BOCU1_MIN = 0x21; private static final int BOCU1_MIDDLE = 0x90; //private static final int BOCU1_MAX_LEAD = 0xfe; private static final int BOCU1_MAX_TRAIL = 0xff; private static final int BOCU1_RESET = 0xff; /* number of lead bytes */ //private static final int BOCU1_COUNT = (BOCU1_MAX_LEAD-BOCU1_MIN+1); /* adjust trail byte counts for the use of some C0 control byte values */ private static final int BOCU1_TRAIL_CONTROLS_COUNT = 20; private static final int BOCU1_TRAIL_BYTE_OFFSET = (BOCU1_MIN-BOCU1_TRAIL_CONTROLS_COUNT); /* number of trail bytes */ private static final int BOCU1_TRAIL_COUNT =((BOCU1_MAX_TRAIL-BOCU1_MIN+1)+BOCU1_TRAIL_CONTROLS_COUNT); /* * number of positive and negative single-byte codes * (counting 0==BOCU1_MIDDLE among the positive ones) */ private static final int BOCU1_SINGLE = 64; /* number of lead bytes for positive and negative 2/3/4-byte sequences */ private static final int BOCU1_LEAD_2 = 43; private static final int BOCU1_LEAD_3 = 3; //private static final int BOCU1_LEAD_4 = 1; /* The difference value range for single-byters. */ private static final int BOCU1_REACH_POS_1 = (BOCU1_SINGLE-1); private static final int BOCU1_REACH_NEG_1 = (-BOCU1_SINGLE); /* The difference value range for double-byters. 
*/
    private static final int BOCU1_REACH_POS_2 = (BOCU1_REACH_POS_1+BOCU1_LEAD_2*BOCU1_TRAIL_COUNT);
    private static final int BOCU1_REACH_NEG_2 = (BOCU1_REACH_NEG_1-BOCU1_LEAD_2*BOCU1_TRAIL_COUNT);

    /* The difference value range for 3-byters. */
    private static final int BOCU1_REACH_POS_3 =
        (BOCU1_REACH_POS_2+BOCU1_LEAD_3*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT);
    private static final int BOCU1_REACH_NEG_3 =
        (BOCU1_REACH_NEG_2-BOCU1_LEAD_3*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT);

    /* The lead byte start values. */
    private static final int BOCU1_START_POS_2 = (BOCU1_MIDDLE+BOCU1_REACH_POS_1+1);
    private static final int BOCU1_START_POS_3 = (BOCU1_START_POS_2+BOCU1_LEAD_2);
    private static final int BOCU1_START_POS_4 = (BOCU1_START_POS_3+BOCU1_LEAD_3); /* ==BOCU1_MAX_LEAD */
    private static final int BOCU1_START_NEG_2 = (BOCU1_MIDDLE+BOCU1_REACH_NEG_1);
    private static final int BOCU1_START_NEG_3 = (BOCU1_START_NEG_2-BOCU1_LEAD_2);
    //private static final int BOCU1_START_NEG_4 = (BOCU1_START_NEG_3-BOCU1_LEAD_3); /* ==BOCU1_MIN+1 */

    /* The length of a byte sequence, according to the lead byte (!=BOCU1_RESET). */
    /* private static int BOCU1_LENGTH_FROM_LEAD(int lead) {
        return ((BOCU1_START_NEG_2<=(lead) && (lead)<BOCU1_START_POS_2) ? 1 :
                (BOCU1_START_NEG_3<=(lead) && (lead)<BOCU1_START_POS_3) ? 2 :
                (BOCU1_START_NEG_4<=(lead) && (lead)<BOCU1_START_POS_4) ? 3 : 4);
    } */

    /* The length of a byte sequence, according to its packed form. */
    private static int BOCU1_LENGTH_FROM_PACKED(int packed) {
        return (((packed)&UConverterConstants.UNSIGNED_INT_MASK)<0x04000000 ? (packed)>>24 : 4);
    }

    /*
     * Byte value map for control codes,
     * from external byte values 0x00..0x20
     * to trail byte values 0..19 (0..0x13) as used in the difference calculation.
     * External byte values that are illegal as trail bytes are mapped to -1.
     */
    private static final int[] bocu1ByteToTrail={
    /*  0     1     2     3     4     5     6     7    */
        -1,   0x00, 0x01, 0x02, 0x03, 0x04, 0x05, -1,

    /*  8     9     a     b     c     d     e     f    */
        -1,   -1,   -1,   -1,   -1,   -1,   -1,   -1,

    /*  10    11    12    13    14    15    16    17   */
        0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d,

    /*  18    19    1a    1b    1c    1d    1e    1f   */
        0x0e, 0x0f, -1,   -1,   0x10, 0x11, 0x12, 0x13,

    /*  20   */
        -1
    };

    /*
     * Byte value map for control codes,
     * from trail byte values 0..19 (0..0x13) as used in the difference calculation
     * to external byte values 0x00..0x20.
*/ private static final int[] bocu1TrailToByte = { /* 0 1 2 3 4 5 6 7 */ 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x11, /* 8 9 a b c d e f */ 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, /* 10 11 12 13 */ 0x1c, 0x1d, 0x1e, 0x1f }; /* * 12 commonly used C0 control codes (and space) are only used to encode * themselves directly, * which makes BOCU-1 MIME-usable and reasonably safe for * ASCII-oriented software. * * These controls are * 0 NUL * * 7 BEL * 8 BS * * 9 TAB * a LF * b VT * c FF * d CR * * e SO * f SI * * 1a SUB * 1b ESC * * The other 20 C0 controls are also encoded directly (to preserve order) * but are also used as trail bytes in difference encoding * (for better compression). */ private static int BOCU1_TRAIL_TO_BYTE(int trail) { return ((trail)>=BOCU1_TRAIL_CONTROLS_COUNT ? (trail)+BOCU1_TRAIL_BYTE_OFFSET : bocu1TrailToByte[trail]); } /* BOCU-1 implementation functions ------------------------------------------ */ private static int BOCU1_SIMPLE_PREV(int c){ return (((c)&~0x7f)+BOCU1_ASCII_PREV); } /** * Compute the next "previous" value for differencing * from the current code point. * * @param c current code point, 0x3040..0xd7a3 (rest handled by macro below) * @return "previous code point" state value */ private static int bocu1Prev(int c) { /* compute new prev */ if(/* 0x3040<=c && */ c<=0x309f) { /* Hiragana is not 128-aligned */ return 0x3070; } else if(0x4e00<=c && c<=0x9fa5) { /* CJK Unihan */ return 0x4e00-BOCU1_REACH_NEG_2; } else if(0xac00<=c /* && c<=0xd7a3 */) { /* Korean Hangul */ return (0xd7a3+0xac00)/2; } else { /* mostly small scripts */ return BOCU1_SIMPLE_PREV(c); } } /** Fast version of bocu1Prev() for most scripts. */ private static int BOCU1_PREV(int c) { return ((c)<0x3040 || (c)>0xd7a3 ? BOCU1_SIMPLE_PREV(c) : bocu1Prev(c)); } protected byte[] fromUSubstitution = new byte[]{(byte)0x1A}; /* Faster versions of packDiff() for single-byte-encoded diff values. */ /** Is a diff value encodable in a single byte? 
*/ private static boolean DIFF_IS_SINGLE(int diff){ return (BOCU1_REACH_NEG_1<=(diff) && (diff)<=BOCU1_REACH_POS_1); } /** Encode a diff value in a single byte. */ private static int PACK_SINGLE_DIFF(int diff){ return (BOCU1_MIDDLE+(diff)); } /** Is a diff value encodable in two bytes? */ private static boolean DIFF_IS_DOUBLE(int diff){ return (BOCU1_REACH_NEG_2<=(diff) && (diff)<=BOCU1_REACH_POS_2); } public CharsetBOCU1(String icuCanonicalName, String javaCanonicalName, String[] aliases){ super(icuCanonicalName, javaCanonicalName, aliases); maxBytesPerChar = 4; minBytesPerChar = 1; maxCharsPerByte = 1; } class CharsetEncoderBOCU extends CharsetEncoderICU { public CharsetEncoderBOCU(CharsetICU cs) { super(cs,fromUSubstitution); } int sourceIndex, nextSourceIndex; int prev, c , diff; boolean checkNegative; boolean LoopAfterTrail; int targetCapacity; CoderResult cr; /* label values for supporting behavior similar to goto in C */ private static final int fastSingle=0; private static final int getTrail=1; private static final int regularLoop=2; private boolean LabelLoop; //used to break the while loop private int labelType = fastSingle; //labeType is set to fastSingle to start the code from fastSingle: /** * Integer division and modulo with negative numerators * yields negative modulo results and quotients that are one more than * what we need here. * This macro adjust the results so that the modulo-value m is always >=0. * * For positive n, the if() condition is always FALSE. * * @param n Number to be split into quotient and rest. * Will be modified to contain the quotient. * @param d Divisor. * @param m Output variable for the rest (modulo result). */ private int NEGDIVMOD(int n, int d, int m) { diff = n; (m)=(diff)%(d); (diff)/=(d); if((m)<0) { --(diff); (m)+=(d); } return m; } /** * Encode a difference -0x10ffff..0x10ffff in 1..4 bytes * and return a packed integer with them. 
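 *
 * As a hedged illustration of the two-byte case only (this is not the code
 * used by this class): the sketch below packs a difference that falls outside
 * the single-byte range but inside the two-byte range. Bocu1TwoByteSketch and
 * packTwoByteDiff are hypothetical names, the constants repeat the values
 * defined at the top of this class, and Math.floorDiv/Math.floorMod (Java 8
 * and later) are assumed as a stand-in for the NEGDIVMOD adjustment.
 *
```java
final class Bocu1TwoByteSketch {
    static final int MIN = 0x21, MIDDLE = 0x90, MAX_TRAIL = 0xff;
    static final int TRAIL_CONTROLS_COUNT = 20;
    static final int TRAIL_BYTE_OFFSET = MIN - TRAIL_CONTROLS_COUNT;             // 13
    static final int TRAIL_COUNT = (MAX_TRAIL - MIN + 1) + TRAIL_CONTROLS_COUNT; // 243
    static final int SINGLE = 64, LEAD_2 = 43;
    static final int REACH_POS_1 = SINGLE - 1, REACH_NEG_1 = -SINGLE;
    static final int REACH_POS_2 = REACH_POS_1 + LEAD_2 * TRAIL_COUNT;
    static final int REACH_NEG_2 = REACH_NEG_1 - LEAD_2 * TRAIL_COUNT;
    static final int START_POS_2 = MIDDLE + REACH_POS_1 + 1;                     // 0xd0
    static final int START_NEG_2 = MIDDLE + REACH_NEG_1;                         // 0x50

    // Same 20 values as bocu1TrailToByte in this class.
    static final int[] TRAIL_TO_BYTE = {
        0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x11,
        0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19,
        0x1c, 0x1d, 0x1e, 0x1f
    };

    static int trailToByte(int trail) {
        return trail >= TRAIL_CONTROLS_COUNT ? trail + TRAIL_BYTE_OFFSET : TRAIL_TO_BYTE[trail];
    }

    // Returns 0x0200yyzz for a difference that needs exactly two bytes.
    static int packTwoByteDiff(int diff) {
        boolean single = REACH_NEG_1 <= diff && diff <= REACH_POS_1;
        if (single || diff < REACH_NEG_2 || diff > REACH_POS_2) {
            throw new IllegalArgumentException("not a two-byte difference: " + diff);
        }
        int lead;
        if (diff >= 0) {
            diff -= REACH_POS_1 + 1;   // rebase so the first two-byte value is 0
            lead = START_POS_2;
        } else {
            diff -= REACH_NEG_1;       // rebase so the first negative value is -1
            lead = START_NEG_2;
        }
        // Floor semantics keep the rest in 0..TRAIL_COUNT-1 for negative input.
        int quotient = Math.floorDiv(diff, TRAIL_COUNT);
        int rest = Math.floorMod(diff, TRAIL_COUNT);
        return 0x02000000 | ((lead + quotient) << 8) | trailToByte(rest);
    }

    public static void main(String[] args) {
        int[] samples = { 64, -65, REACH_POS_2, REACH_NEG_2 };
        for (int diff : samples) {
            System.out.println(diff + " -> 0x" + Integer.toHexString(packTwoByteDiff(diff)));
        }
    }
}
```
 *
 * With truncating division the rest for a negative difference would itself be
 * negative and could not be used as a trail byte index, which is the reason
 * for the adjustment.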
* * The encoding favors small absolute differences with short encodings * to compress runs of same-script characters. * * Optimized version with unrolled loops and fewer floating-point operations * than the standard packDiff(). * * @param diff difference value -0x10ffff..0x10ffff * @return * 0x010000zz for 1-byte sequence zz * 0x0200yyzz for 2-byte sequence yy zz * 0x03xxyyzz for 3-byte sequence xx yy zz * 0xwwxxyyzz for 4-byte sequence ww xx yy zz (ww>0x03) */ private int packDiff(int n) { int result, m = 0; diff = n; if(diff>=BOCU1_REACH_NEG_1) { /* mostly positive differences, and single-byte negative ones */ if(diff<=BOCU1_REACH_POS_2) { /* two bytes */ diff-=BOCU1_REACH_POS_1+1; result=0x02000000; m=diff%BOCU1_TRAIL_COUNT; diff/=BOCU1_TRAIL_COUNT; result|=BOCU1_TRAIL_TO_BYTE(m); result|=(BOCU1_START_POS_2+diff)<<8; } else if(diff<=BOCU1_REACH_POS_3) { /* three bytes */ diff-=BOCU1_REACH_POS_2+1; result=0x03000000; m=diff%BOCU1_TRAIL_COUNT; diff/=BOCU1_TRAIL_COUNT; result|=BOCU1_TRAIL_TO_BYTE(m); m=diff%BOCU1_TRAIL_COUNT; diff/=BOCU1_TRAIL_COUNT; result|=BOCU1_TRAIL_TO_BYTE(m)<<8; result|=(BOCU1_START_POS_3+diff)<<16; } else { /* four bytes */ diff-=BOCU1_REACH_POS_3+1; m=diff%BOCU1_TRAIL_COUNT; diff/=BOCU1_TRAIL_COUNT; result=BOCU1_TRAIL_TO_BYTE(m); m=diff%BOCU1_TRAIL_COUNT; diff/=BOCU1_TRAIL_COUNT; result|=BOCU1_TRAIL_TO_BYTE(m)<<8; /* * We know that / and % would deliver quotient 0 and rest=diff. * Avoid division and modulo for performance. 
*/ result|=BOCU1_TRAIL_TO_BYTE(diff)<<16; result|=((BOCU1_START_POS_4&UConverterConstants.UNSIGNED_INT_MASK))<<24; } } else { /* two- to four-byte negative differences */ if(diff>=BOCU1_REACH_NEG_2) { /* two bytes */ diff-=BOCU1_REACH_NEG_1; result=0x02000000; m = NEGDIVMOD(diff, BOCU1_TRAIL_COUNT, m); result|=BOCU1_TRAIL_TO_BYTE(m); result|=(BOCU1_START_NEG_2+diff)<<8; } else if(diff>=BOCU1_REACH_NEG_3) { /* three bytes */ diff-=BOCU1_REACH_NEG_2; result=0x03000000; m = NEGDIVMOD(diff, BOCU1_TRAIL_COUNT, m); result|=BOCU1_TRAIL_TO_BYTE(m); m = NEGDIVMOD(diff, BOCU1_TRAIL_COUNT, m); result|=BOCU1_TRAIL_TO_BYTE(m)<<8; result|=(BOCU1_START_NEG_3+diff)<<16; } else { /* four bytes */ diff-=BOCU1_REACH_NEG_3; m = NEGDIVMOD(diff, BOCU1_TRAIL_COUNT, m); result=BOCU1_TRAIL_TO_BYTE(m); m = NEGDIVMOD(diff, BOCU1_TRAIL_COUNT, m); result|=BOCU1_TRAIL_TO_BYTE(m)<<8; /* * We know that NEGDIVMOD would deliver * quotient -1 and rest=diff+BOCU1_TRAIL_COUNT. * Avoid division and modulo for performance. */ m=diff+BOCU1_TRAIL_COUNT; result|=BOCU1_TRAIL_TO_BYTE(m)<<16; result|=BOCU1_MIN<<24; } } return result; } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush){ cr = CoderResult.UNDERFLOW; LabelLoop = true; //used to break the while loop checkNegative = false; // its value is set to true to get out of while loop when c = -c LoopAfterTrail = false; // its value is set to true to ignore code before getTrail: /*set up the local pointers*/ targetCapacity = target.limit() - target.position(); c = fromUChar32; prev = fromUnicodeStatus; if(prev==0){ prev = BOCU1_ASCII_PREV; } /*sourceIndex ==-1 if the current characte began in the previous buffer*/ sourceIndex = c == 0 ? 
0: -1; nextSourceIndex = 0; /*conversion loop*/ if(c!=0 && targetCapacity>0){ labelType = getTrail; } while(LabelLoop){ switch(labelType){ case fastSingle: labelType = fastSingle(source, target, offsets); break; case getTrail: labelType = getTrail(source, target, offsets); break; case regularLoop: labelType = regularLoop(source, target, offsets); break; } } return cr; } private int fastSingle(CharBuffer source, ByteBuffer target, IntBuffer offsets){ //fastSingle: /*fast loop for single-byte differences*/ /*use only one loop counter variable , targetCapacity, not also source*/ diff = source.limit() - source.position(); if(targetCapacity>diff){ targetCapacity = diff; } while(targetCapacity>0 && (c=source.get(source.position()))<0x3000){ if(c<=0x20){ if(c!=0x20){ prev = BOCU1_ASCII_PREV; } target.put((byte)c); if(offsets!=null){ offsets.put(nextSourceIndex++); } source.position(source.position()+1); --targetCapacity; }else { diff = c-prev; if(DIFF_IS_SINGLE(diff)){ prev = BOCU1_SIMPLE_PREV(c); target.put((byte)PACK_SINGLE_DIFF(diff)); if(offsets!=null){ offsets.put(nextSourceIndex++); } source.position(source.position()+1); --targetCapacity; }else { break; } } } return regularLoop; } private int getTrail(CharBuffer source, ByteBuffer target, IntBuffer offsets){ if(source.hasRemaining()){ /*test the following code unit*/ char trail = source.get(source.position()); if(UTF16.isTrailSurrogate(trail)){ source.position(source.position()+1); ++nextSourceIndex; c=UCharacter.getCodePoint((char)c, trail); } } else { /*no more input*/ c = -c; /*negative lead surrogate as "incomplete" indicator to avoid c=0 everywhere else*/ checkNegative = true; } LoopAfterTrail = true; return regularLoop; } private int regularLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets){ if(!LoopAfterTrail){ /*restore real values*/ targetCapacity = target.limit()-target.position(); sourceIndex = nextSourceIndex; /*wrong if offsets==null but does not matter*/ } /*regular loop for all classes*/ 
while(LoopAfterTrail || source.hasRemaining()){ if(LoopAfterTrail || targetCapacity>0){ if(!LoopAfterTrail){ c = source.get(); ++nextSourceIndex; if(c<=0x20){ /* * ISO C0 control & space: * Encode directly for MIME compatibility, * and reset state except for space, to not disrupt compression. */ if(c!=0x20) { prev=BOCU1_ASCII_PREV; } target.put((byte)c); if(offsets != null){ offsets.put(sourceIndex++); } --targetCapacity; sourceIndex=nextSourceIndex; continue; } if(UTF16.isLeadSurrogate((char)c)){ getTrail(source, target, offsets); if(checkNegative){ break; } } } if(LoopAfterTrail){ LoopAfterTrail = false; } /* * all other Unicode code points c==U+0021..U+10ffff * are encoded with the difference c-prev * * a new prev is computed from c, * placed in the middle of a 0x80-block (for most small scripts) or * in the middle of the Unihan and Hangul blocks * to statistically minimize the following difference */ diff = c- prev; prev = BOCU1_PREV(c); if(DIFF_IS_SINGLE(diff)){ target.put((byte)PACK_SINGLE_DIFF(diff)); if(offsets!=null){ offsets.put(sourceIndex++); } --targetCapacity; sourceIndex=nextSourceIndex; if(c<0x3000){ labelType = fastSingle; return labelType; } } else if(DIFF_IS_DOUBLE(diff) && 2<=targetCapacity){ /*optimize 2 byte case*/ int m = 0; if(diff>=0){ diff -= BOCU1_REACH_POS_1 +1; m = diff%BOCU1_TRAIL_COUNT; diff/=BOCU1_TRAIL_COUNT; diff+=BOCU1_START_POS_2; } else { diff -= BOCU1_REACH_NEG_1; m = NEGDIVMOD(diff, BOCU1_TRAIL_COUNT, m); diff+=BOCU1_START_NEG_2; } target.put((byte)diff); target.put((byte)BOCU1_TRAIL_TO_BYTE(m)); if(offsets!=null){ offsets.put(sourceIndex); offsets.put(sourceIndex); } targetCapacity -= 2; sourceIndex = nextSourceIndex; } else { int length; /*will be 2..4*/ diff = packDiff(diff); length = BOCU1_LENGTH_FROM_PACKED(diff); /*write the output character bytes from diff and length*/ /*from the first if in the loop we know that targetCapacity>0*/ if(length<=targetCapacity){ switch(length){ /*each branch falls through the next one*/ 
                            case 4:
                                target.put((byte)(diff>>24));
                                if(offsets!= null){
                                    offsets.put(sourceIndex);
                                }
                            case 3:
                                target.put((byte)(diff>>16));
                                if(offsets!= null){
                                    offsets.put(sourceIndex);
                                }
                            case 2:
                                target.put((byte)(diff>>8));
                                if(offsets!= null){
                                    offsets.put(sourceIndex);
                                }
                                /*case 1 handled above*/
                                target.put((byte)diff);
                                if(offsets!= null){
                                    offsets.put(sourceIndex);
                                }
                            default:
                                /*will never occur*/
                                break;
                            }
                            targetCapacity -= length;
                            sourceIndex = nextSourceIndex;
                        } else {
                            ByteBuffer error = ByteBuffer.wrap(errorBuffer);
                            /*
                             * We actually do this backwards here:
                             * In order to save an intermediate variable, we output
                             * first to the overflow buffer what does not fit into the
                             * regular target.
                             */
                            /* we know that 1<=targetCapacity<length<=4 */
                            length-=targetCapacity;
                            switch(length) {
                                /* each branch falls through to the next one */
                            case 3:
                                error.put((byte)(diff>>16));
                            case 2:
                                error.put((byte)(diff>>8));
                            case 1:
                                error.put((byte)diff);
                            default:
                                /* will never occur */
                                break;
                            }
                            errorBufferLength = length;

                            /* now output what fits into the regular target */
                            diff>>=8*length; /* length was reduced by targetCapacity */
                            switch(targetCapacity) {
                                /* each branch falls through to the next one */
                            case 3:
                                target.put((byte)(diff>>16));
                                if(offsets!= null){
                                    offsets.put(sourceIndex);
                                }
                            case 2:
                                target.put((byte)(diff>>8));
                                if(offsets!= null){
                                    offsets.put(sourceIndex);
                                }
                            case 1:
                                target.put((byte)diff);
                                if(offsets!= null){
                                    offsets.put(sourceIndex);
                                }
                            default:
                                /* will never occur */
                                break;
                            }

                            /* target overflow */
                            targetCapacity=0;
                            cr = CoderResult.OVERFLOW;
                            break;
                        }
                    }
                } else{
                    /*target is full*/
                    cr = CoderResult.OVERFLOW;
                    break;
                }
            }
            /*set the converter state back into UConverter*/
            fromUChar32 = c<0 ?
-c :0; fromUnicodeStatus = prev; LabelLoop = false; labelType = fastSingle; return labelType; } } class CharsetDecoderBOCU extends CharsetDecoderICU{ public CharsetDecoderBOCU(CharsetICU cs) { super(cs); } int byteIndex; int sourceIndex, nextSourceIndex; int prev, c , diff, count; byte[] bytes; int targetCapacity; CoderResult cr; /* label values for supporting behavior similar to goto in C */ private static final int fastSingle=0; private static final int getTrail=1; private static final int regularLoop=2; private static final int endLoop=3; private boolean LabelLoop;//used to break the while loop private boolean afterTrail; // its value is set to true to ignore code after getTrail: private int labelType; /* * The BOCU-1 converter uses the standard setup code in ucnv.c/ucnv_bld.c. * The UConverter fields are used as follows: * * fromUnicodeStatus encoder's prev (0 will be interpreted as BOCU1_ASCII_PREV) * * toUnicodeStatus decoder's prev (0 will be interpreted as BOCU1_ASCII_PREV) * mode decoder's incomplete (diff<<2)|count (ignored when toULength==0) */ /* BOCU-1-from-Unicode conversion functions --------------------------------- */ /** * Function for BOCU-1 decoder; handles multi-byte lead bytes. 
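 *
 * A rough sketch, not part of this class, of how a lead byte determines the
 * total sequence length. Bocu1LeadByteSketch and lengthFromLead are
 * hypothetical names, and the bounds are recomputed from the BOCU1_START_*
 * definitions in this file. The result is one more than the trail byte count
 * that the decoder keeps in the low two bits of its state.
 *
```java
final class Bocu1LeadByteSketch {
    static final int MIN = 0x21, MIDDLE = 0x90;
    static final int SINGLE = 64, LEAD_2 = 43, LEAD_3 = 3;
    static final int START_POS_2 = MIDDLE + SINGLE;       // 0xd0
    static final int START_POS_3 = START_POS_2 + LEAD_2;  // 0xfb
    static final int START_POS_4 = START_POS_3 + LEAD_3;  // 0xfe, the largest lead byte
    static final int START_NEG_2 = MIDDLE - SINGLE;       // 0x50
    static final int START_NEG_3 = START_NEG_2 - LEAD_2;  // 0x25
    static final int START_NEG_4 = START_NEG_3 - LEAD_3;  // 0x22, one above MIN

    // Total number of bytes (lead plus trail bytes) for a difference lead byte.
    static int lengthFromLead(int lead) {
        if (lead < MIN || lead > START_POS_4) {
            // 0x00..0x20 are direct-encoded and 0xff is the reset byte.
            throw new IllegalArgumentException("not a difference lead byte: " + lead);
        }
        if (START_NEG_2 <= lead && lead < START_POS_2) {
            return 1;
        } else if (START_NEG_3 <= lead && lead < START_POS_3) {
            return 2;
        } else if (START_NEG_4 <= lead && lead < START_POS_4) {
            return 3;
        } else {
            return 4;   // only MIN and START_POS_4 remain
        }
    }

    public static void main(String[] args) {
        int[] samples = { 0x21, 0x22, 0x25, 0x50, 0x90, 0xd0, 0xfb, 0xfe };
        for (int lead : samples) {
            System.out.println("0x" + Integer.toHexString(lead) + " -> " + lengthFromLead(lead));
        }
    }
}
```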
         *
         * @param b lead byte;
         *          BOCU1_MIN<=b<BOCU1_START_NEG_2 or BOCU1_START_POS_2<=b<BOCU1_MAX_LEAD
         * @return (diff<<2)|count
         */
        private int decodeBocu1LeadByte(int b) {
            int diffValue, countValue;

            if(b >= BOCU1_START_NEG_2) {
                /* positive difference */
                if(b < BOCU1_START_POS_3) {
                    /* two bytes */
                    diffValue = (b - BOCU1_START_POS_2)*BOCU1_TRAIL_COUNT + BOCU1_REACH_POS_1+1;
                    countValue = 1;
                } else if(b < BOCU1_START_POS_4) {
                    /* three bytes */
                    diffValue = (b-BOCU1_START_POS_3)*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT+BOCU1_REACH_POS_2+1;
                    countValue = 2;
                } else {
                    /* four bytes */
                    diffValue = BOCU1_REACH_POS_3+1;
                    countValue = 3;
                }
            } else {
                /* negative difference */
                if(b >= BOCU1_START_NEG_3) {
                    /* two bytes */
                    diffValue=(b -BOCU1_START_NEG_2)*BOCU1_TRAIL_COUNT + BOCU1_REACH_NEG_1;
                    countValue=1;
                } else if(b>BOCU1_MIN) {
                    /* three bytes */
                    diffValue=(b - BOCU1_START_NEG_3)*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT + BOCU1_REACH_NEG_2;
                    countValue = 2;
                } else {
                    /* four bytes */
                    diffValue=-BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT+BOCU1_REACH_NEG_3;
                    countValue=3;
                }
            }

            /* return the state for decoding the trail byte(s) */
            return (diffValue<<2)|countValue;
        }

        /**
         * Function for BOCU-1 decoder; handles multi-byte trail bytes.
* * @param count number of remaining trail bytes including this one * @param b trail byte * @return new delta for diff including b - <0 indicates an error * * @see decodeBocu1 */ private int decodeBocu1TrailByte(int countValue, int b) { b = b&UConverterConstants.UNSIGNED_BYTE_MASK; if((b)<=0x20) { /* skip some C0 controls and make the trail byte range contiguous */ b = bocu1ByteToTrail[b]; /* b<0 for an illegal trail byte value will result in return<0 below */ } else { //b-= BOCU1_TRAIL_BYTE_OFFSET; b = b - BOCU1_TRAIL_BYTE_OFFSET; } /* add trail byte into difference and decrement count */ if(countValue==1) { return b; } else if(countValue==2) { return b*BOCU1_TRAIL_COUNT; } else /* count==3 */ { return b*(BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT); } } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush){ cr = CoderResult.UNDERFLOW; LabelLoop = true; afterTrail = false; labelType = fastSingle; // labelType is set to fastSingle so t /*get the converter state*/ prev = toUnicodeStatus; if(prev==0){ prev = BOCU1_ASCII_PREV; } diff = mode; count = diff&3; diff>>=2; byteIndex = toULength; bytes = toUBytesArray; /* sourceIndex=-1 if the current character began in the previous buffer */ sourceIndex=byteIndex==0 ? 
            0 : -1;
            nextSourceIndex=0;

            /* conversion "loop" similar to _SCSUToUnicodeWithOffsets() */
            if(count>0 && byteIndex>0 && target.position()<target.limit()) {
                labelType = getTrail;
            }

            while(LabelLoop) {
                switch(labelType) {
                case fastSingle:
                    labelType = fastSingle(source, target, offsets);
                    break;
                case getTrail:
                    labelType = getTrail(source, target, offsets);
                    break;
                case regularLoop:
                    labelType = afterGetTrail(source, target, offsets);
                    break;
                case endLoop:
                    endLoop(source, target, offsets);
                    break;
                }
            }

            return cr;
        }

        private int fastSingle(ByteBuffer source, CharBuffer target, IntBuffer offsets){
            labelType = regularLoop;
            /* fast loop for single-byte differences */
            /* use count as the only loop counter variable */
            diff = source.limit() - source.position();
            count = target.limit() - target.position();
            if(count>diff) {
                count = diff;
            }
            while(count>0) {
                if(BOCU1_START_NEG_2 <=(c=source.get(source.position())) && c< BOCU1_START_POS_2) {
                    c = prev + (c-BOCU1_MIDDLE);
                    if(c<0x3000) {
                        target.put((char)c);
                        if(offsets!=null){
                            offsets.put(nextSourceIndex++);
                        }
                        prev = BOCU1_SIMPLE_PREV(c);
                    } else {
                        break;
                    }
                } else if((c&UConverterConstants.UNSIGNED_BYTE_MASK) <= 0x20) {
                    if((c&UConverterConstants.UNSIGNED_BYTE_MASK) != 0x20) {
                        prev = BOCU1_ASCII_PREV;
                    }
                    target.put((char)c);
                    if(offsets!=null){
                        offsets.put(nextSourceIndex++);
                    }
                } else {
                    break;
                }
                source.position(source.position()+1);
                --count;
            }
            sourceIndex=nextSourceIndex; /* wrong if offsets==NULL but does not matter */
            return labelType;
        }

        private int getTrail(ByteBuffer source, CharBuffer target, IntBuffer offsets){
            labelType = regularLoop;
            for(;;) {
                if(source.position() >= source.limit()) {
                    labelType = endLoop;
                    return labelType;
                }
                ++nextSourceIndex;
                c = bytes[byteIndex++] = source.get();

                /* trail byte in any position */
                c = decodeBocu1TrailByte(count, c);
                if(c<0) {
                    cr = CoderResult.malformedForLength(1);
                    labelType = endLoop;
                    return labelType;
                }

                diff+=c;
                if(--count==0) {
                    /* final trail byte, deliver a code point */
                    byteIndex=0;
                    c = prev + diff;
                    if(c > 0x10ffff) {
                        cr = CoderResult.malformedForLength(1);
                        labelType = endLoop;
                        return labelType;
                    }
                    break;
                }
            }
            afterTrail = true;
            return labelType;
        }

        private int afterGetTrail(ByteBuffer source, CharBuffer target, IntBuffer offsets){
            /* decode a sequence of single and lead bytes */
            while(afterTrail || source.hasRemaining()) {
                if(!afterTrail){
                    if(target.position() >= target.limit()) {
                        /* target is full */
                        cr = CoderResult.OVERFLOW;
                        break;
                    }

                    ++nextSourceIndex;
                    c = source.get()&UConverterConstants.UNSIGNED_BYTE_MASK;
                    if(BOCU1_START_NEG_2 <= c && c < BOCU1_START_POS_2) {
                        /* Write a code point directly from a single-byte difference.
*/ c = prev + (c-BOCU1_MIDDLE); if(c<0x3000) { target.put((char)c); if(offsets!=null){ offsets.put(sourceIndex); } prev = BOCU1_SIMPLE_PREV(c); sourceIndex = nextSourceIndex; labelType = fastSingle; return labelType; } } else if(c <= 0x20) { /* * Direct-encoded C0 control code or space. * Reset prev for C0 control codes but not for space. */ if(c != 0x20) { prev=BOCU1_ASCII_PREV; } target.put((char)c); if(offsets!=null){ offsets.put(sourceIndex); } sourceIndex=nextSourceIndex; continue; } else if(BOCU1_START_NEG_3 <= c && c < BOCU1_START_POS_3 && source.hasRemaining()) { /* Optimize two-byte case. */ if(c >= BOCU1_MIDDLE) { diff=(c - BOCU1_START_POS_2)*BOCU1_TRAIL_COUNT + BOCU1_REACH_POS_1 + 1; } else { diff=(c-BOCU1_START_NEG_2)*BOCU1_TRAIL_COUNT + BOCU1_REACH_NEG_1; } /* trail byte */ ++nextSourceIndex; c = decodeBocu1TrailByte(1, source.get()); if(c<0 || ((c = prev + diff + c)&UConverterConstants.UNSIGNED_INT_MASK)>0x10ffff) { bytes[0]= source.get(-2); bytes[1]= source.get(-1); byteIndex = 2; cr = CoderResult.malformedForLength(1); break; } } else if(c == BOCU1_RESET) { /* only reset the state, no code point */ prev=BOCU1_ASCII_PREV; sourceIndex=nextSourceIndex; continue; } else { /* * For multi-byte difference lead bytes, set the decoder state * with the partial difference value from the lead byte and * with the number of trail bytes. 
*/ bytes[0]= (byte)c; byteIndex = 1; diff = decodeBocu1LeadByte(c); count = diff&3; diff>>=2; getTrail(source, target, offsets); if(labelType != regularLoop){ return labelType; } } } if(afterTrail){ afterTrail = false; } /* calculate the next prev and output c */ prev = BOCU1_PREV(c); if(c<=0xffff) { target.put((char)c); if(offsets!=null){ offsets.put(sourceIndex); } } else { /* output surrogate pair */ target.put((char)UTF16.getLeadSurrogate(c)); if(target.hasRemaining()) { target.put((char)UTF16.getTrailSurrogate(c)); if(offsets!=null){ offsets.put(sourceIndex); offsets.put(sourceIndex); } } else { /* target overflow */ if(offsets!=null){ offsets.put(sourceIndex); } charErrorBufferArray[0] = UTF16.getTrailSurrogate(c); charErrorBufferLength = 1; cr = CoderResult.OVERFLOW; break; } } sourceIndex=nextSourceIndex; } labelType = endLoop; return labelType; } private void endLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets){ if(cr.isMalformed()) { /* set the converter state in UConverter to deal with the next character */ toUnicodeStatus = BOCU1_ASCII_PREV; mode = 0; } else { /* set the converter state back into UConverter */ toUnicodeStatus=prev; mode=(diff<<2)|count; } toULength=byteIndex; LabelLoop = false; } } public CharsetDecoder newDecoder() { return new CharsetDecoderBOCU(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderBOCU(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ CharsetICU.getCompleteUnicodeSet(setFillIn); } } icu4j-4.2/src/com/ibm/icu/charset/CharsetISCII.java0000644000175000017500000024213511361046170021703 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; /** * @author Michael Ow * */ class CharsetISCII extends CharsetICU { private static final short UCNV_OPTIONS_VERSION_MASK = 0X0f; //private static final short NUKTA = 0x093c; //private static final short HALANT = 0x094d; private static final short ZWNJ = 0x200c; /* Zero Width Non Joiner */ private static final short ZWJ = 0x200d; /* Zero Width Joiner */ //private static final int INVALID_CHAR = 0xffff; private static final short ATR = 0xef; /* Attribute code */ private static final short EXT = 0xf0; /* Extension code */ private static final short DANDA = 0x0964; private static final short DOUBLE_DANDA = 0x0965; private static final short ISCII_NUKTA = 0xe9; private static final short ISCII_HALANT = 0xe8; private static final short ISCII_DANDA = 0xea; private static final short ISCII_VOWEL_SIGN_E = 0xe0; private static final short ISCII_INV = 0xd9; private static final short INDIC_BLOCK_BEGIN = 0x0900; private static final short INDIC_BLOCK_END = 0x0d7f; private static final short INDIC_RANGE = (INDIC_BLOCK_END - INDIC_BLOCK_BEGIN); private static final short VOCALLIC_RR = 0x0931; private static final short LF = 0x0a; private static final short ASCII_END = 0xa0; private static final short TELUGU_DELTA = (UniLang.DELTA * UniLang.TELUGU); private static final short DEV_ABBR_SIGN = 0x0970; private static final short DEV_ANUDATTA = 0x0952; private static final short EXT_RANGE_BEGIN = 0xa1; private static final short EXT_RANGE_END = 0xee; private static final short PNJ_DELTA = 0x100; private static final int NO_CHAR_MARKER = 0xfffe; /* Used for proper conversion to and from Gurmukhi */ private 
static UnicodeSet PNJ_BINDI_TIPPI_SET; private static UnicodeSet PNJ_CONSONANT_SET; private static final short PNJ_BINDI = 0x0a02; private static final short PNJ_TIPPI = 0x0a70; private static final short PNJ_SIGN_VIRAMA = 0x0a4d; private static final short PNJ_ADHAK = 0x0a71; private static final short PNJ_HA = 0x0a39; private static final short PNJ_RRA = 0x0a5c; private static final class UniLang { static final short DEVALANGARI = 0; static final short BENGALI = DEVALANGARI + 1; static final short GURMUKHI = BENGALI + 1; static final short GUJARATI = GURMUKHI + 1; static final short ORIYA = GUJARATI + 1; static final short TAMIL = ORIYA + 1; static final short TELUGU = TAMIL + 1; static final short KANNADA = TELUGU + 1; static final short MALAYALAM = KANNADA + 1; static final short DELTA = 0x80; } private static final class ISCIILang { static final short DEF = 0x40; static final short RMN = 0x41; static final short DEV = 0x42; static final short BNG = 0x43; static final short TML = 0x44; static final short TLG = 0x45; static final short ASM = 0x46; static final short ORI = 0x47; static final short KND = 0x48; static final short MLM = 0x49; static final short GJR = 0x4a; static final short PNJ = 0x4b; static final short ARB = 0x71; static final short PES = 0x72; static final short URD = 0x73; static final short SND = 0x74; static final short KSM = 0x75; static final short PST = 0x76; } private static final class MaskEnum { static final short DEV_MASK = 0x80; static final short PNJ_MASK = 0x40; static final short GJR_MASK = 0x20; static final short ORI_MASK = 0x10; static final short BNG_MASK = 0x08; static final short KND_MASK = 0x04; static final short MLM_MASK = 0x02; static final short TML_MASK = 0x01; static final short ZERO = 0x00; } private final String ISCII_CNV_PREFIX = "ISCII,version="; private final class UConverterDataISCII { int option; int contextCharToUnicode; /* previous Unicode codepoint for contextual analysis */ int contextCharFromUnicode; /* 
previous Unicode codepoint for contextual analysis */
        short defDeltaToUnicode; /* delta for switching to default state when DEF is encountered */
        short currentDeltaFromUnicode; /* current delta in Indic block */
        short currentDeltaToUnicode; /* current delta in Indic block */
        short currentMaskFromUnicode; /* mask for current state in fromUnicode */
        short currentMaskToUnicode; /* mask for current state in toUnicode */
        short defMaskToUnicode; /* mask for default state in toUnicode */
        boolean isFirstBuffer; /* boolean for fromUnicode to see if we need to announce the first script */
        boolean resetToDefaultToUnicode; /* boolean for resetting to default delta and mask when a newline is encountered */
        String name;
        int prevToUnicodeStatus; /* Hold the previous toUnicodeStatus. This is necessary because we may need to know the last two code points. */

        UConverterDataISCII(int option, String name) {
            this.option = option;
            this.name = name;
            initialize();
        }

        void initialize() {
            this.contextCharToUnicode = NO_CHAR_MARKER; /* contextCharToUnicode */
            /* Fixed: this previously assigned currentDeltaFromUnicode, which is set again below. */
            this.contextCharFromUnicode = 0x0000; /* contextCharFromUnicode */
            this.defDeltaToUnicode = (short)(lookupInitialData[option & UCNV_OPTIONS_VERSION_MASK].uniLang * UniLang.DELTA); /* defDeltaToUnicode */
            this.currentDeltaFromUnicode = (short)(lookupInitialData[option & UCNV_OPTIONS_VERSION_MASK].uniLang * UniLang.DELTA); /* currentDeltaFromUnicode */
            this.currentDeltaToUnicode = (short)(lookupInitialData[option & UCNV_OPTIONS_VERSION_MASK].uniLang * UniLang.DELTA); /* currentDeltaToUnicode */
            this.currentMaskToUnicode = (short)lookupInitialData[option & UCNV_OPTIONS_VERSION_MASK].maskEnum; /* currentMaskToUnicode */
            this.currentMaskFromUnicode = (short)lookupInitialData[option & UCNV_OPTIONS_VERSION_MASK].maskEnum; /* currentMaskFromUnicode */
            this.defMaskToUnicode = (short)lookupInitialData[option & UCNV_OPTIONS_VERSION_MASK].maskEnum; /* defMaskToUnicode */
            this.isFirstBuffer = true; /* isFirstBuffer */
            this.resetToDefaultToUnicode =
false; /* resetToDefaultToUnicode */
            this.prevToUnicodeStatus = 0x0000;
        }
    }

    private static final class LookupDataStruct {
        short uniLang;
        short maskEnum;
        short isciiLang;

        LookupDataStruct(short uniLang, short maskEnum, short isciiLang) {
            this.uniLang = uniLang;
            this.maskEnum = maskEnum;
            this.isciiLang = isciiLang;
        }
    }

    private static final LookupDataStruct [] lookupInitialData = {
        new LookupDataStruct(UniLang.DEVALANGARI, MaskEnum.DEV_MASK, ISCIILang.DEV),
        new LookupDataStruct(UniLang.BENGALI, MaskEnum.BNG_MASK, ISCIILang.BNG),
        new LookupDataStruct(UniLang.GURMUKHI, MaskEnum.PNJ_MASK, ISCIILang.PNJ),
        new LookupDataStruct(UniLang.GUJARATI, MaskEnum.GJR_MASK, ISCIILang.GJR),
        new LookupDataStruct(UniLang.ORIYA, MaskEnum.ORI_MASK, ISCIILang.ORI),
        new LookupDataStruct(UniLang.TAMIL, MaskEnum.TML_MASK, ISCIILang.TML),
        new LookupDataStruct(UniLang.TELUGU, MaskEnum.KND_MASK, ISCIILang.TLG),
        new LookupDataStruct(UniLang.KANNADA, MaskEnum.KND_MASK, ISCIILang.KND),
        new LookupDataStruct(UniLang.MALAYALAM, MaskEnum.MLM_MASK, ISCIILang.MLM)
    };

    /*
     * The values in the validity table are indexed by the lower bits of the Unicode
     * range 0x0900 - 0x097f (the table has 128 entries).
     * The values have a structure like:
     * -----------------------------------------------------------------
     * |DEV | PNJ | GJR | ORI | BNG | TLG | MLM | TML |
     * |    |     |     |     | ASM | KND |     |     |
     * -----------------------------------------------------------------
     * If a code point is valid in a particular script,
     * then that bit is turned on.
     *
     * Unicode does not distinguish between Bengali and Assamese, so we use 1 bit
     * to represent these languages.
     *
     * Telugu and Kannada have the same code points except for Vocallic_RR, which we special-case,
     * so we combine them and use 1 bit to represent these languages.
     */
    private static final short validityTable[] = {
        /* This state table is tool generated so please do not edit unless you know exactly what you are doing */
        /* Note: This table was edited to mirror the Windows XP implementation */
        /* ISCII: Valid: Unicode */
        /* 0xa0: 0x00: 0x900 */ MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO,
        /* 0xa1: 0xb8: 0x901 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO,
        /* 0xa2: 0xfe: 0x902 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK,
        /* 0xa3: 0xbf: 0x903 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK,
        /* 0x00: 0x00: 0x904 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO,
        /* 0xa4: 0xff: 0x905 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK,
        /* 0xa5: 0xff: 0x906 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK +
MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xa6: 0xff: 0x907 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xa7: 0xff: 0x908 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xa8: 0xff: 0x909 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xa9: 0xff: 0x90a */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xaa: 0xfe: 0x90b */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0x00: 0x00: 0x90c */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xae: 0x80: 0x90d */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xab: 0x87: 0x90e */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xac: 0xff: 0x90f */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xad: 0xff: 0x910 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xb2: 0x80: 0x911 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xaf: 
0x87: 0x912 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xb0: 0xff: 0x913 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xb1: 0xff: 0x914 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xb3: 0xff: 0x915 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xb4: 0xfe: 0x916 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xb5: 0xfe: 0x917 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xb6: 0xfe: 0x918 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xb7: 0xff: 0x919 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xb8: 0xff: 0x91a */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xb9: 0xfe: 0x91b */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xba: 0xff: 0x91c */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xbb: 0xfe: 0x91d */ 
MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xbc: 0xff: 0x91e */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xbd: 0xff: 0x91f */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xbe: 0xfe: 0x920 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xbf: 0xfe: 0x921 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xc0: 0xfe: 0x922 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xc1: 0xff: 0x923 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xc2: 0xff: 0x924 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xc3: 0xfe: 0x925 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xc4: 0xfe: 0x926 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xc5: 0xfe: 0x927 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xc6: 0xff: 0x928 */ 
MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xc7: 0x81: 0x929 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.TML_MASK, /* 0xc8: 0xff: 0x92a */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xc9: 0xfe: 0x92b */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xca: 0xfe: 0x92c */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xcb: 0xfe: 0x92d */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xcc: 0xfe: 0x92e */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xcd: 0xff: 0x92f */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xcf: 0xff: 0x930 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xd0: 0x87: 0x931 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xd1: 0xff: 0x932 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xd2: 0xb7: 0x933 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + 
MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.ZERO + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xd3: 0x83: 0x934 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xd4: 0xff: 0x935 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.ZERO + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xd5: 0xfe: 0x936 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0xd6: 0xbf: 0x937 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xd7: 0xff: 0x938 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xd8: 0xff: 0x939 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0x00: 0x00: 0x93a */ MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x93b */ MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xe9: 0xda: 0x93c */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.ZERO + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x93d */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xda: 0xff: 0x93e */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xdb: 
0xff: 0x93f */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xdc: 0xff: 0x940 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xdd: 0xff: 0x941 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xde: 0xff: 0x942 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xdf: 0xbe: 0x943 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0x00: 0x00: 0x944 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ZERO + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xe3: 0x80: 0x945 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xe0: 0x87: 0x946 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xe1: 0xff: 0x947 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xe2: 0xff: 0x948 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xe7: 0x80: 0x949 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xe4: 0x87: 0x94a */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + 
MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xe5: 0xff: 0x94b */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xe6: 0xff: 0x94c */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xe8: 0xff: 0x94d */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xec: 0x00: 0x94e */ MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xed: 0x00: 0x94f */ MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x950 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x951 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x952 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x953 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x954 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x955 */ MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.KND_MASK + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x956 */ MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ORI_MASK + MaskEnum.ZERO + 
MaskEnum.KND_MASK + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x957 */ MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.ZERO + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0x00: 0x00: 0x958 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x959 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x95a */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x95b */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x95c */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.BNG_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x95d */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x95e */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xce: 0x98: 0x95f */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0x00: 0x00: 0x960 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0x00: 0x00: 0x961 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.ZERO, /* 0x00: 0x00: 0x962 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.BNG_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, 
/* 0x00: 0x00: 0x963 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.BNG_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xea: 0xf8: 0x964 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xeaea: 0x00: 0x965 */ MaskEnum.DEV_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* 0xf1: 0xff: 0x966 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xf2: 0xff: 0x967 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xf3: 0xff: 0x968 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xf4: 0xff: 0x969 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xf5: 0xff: 0x96a */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xf6: 0xff: 0x96b */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xf7: 0xff: 0x96c */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xf8: 0xff: 0x96d */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xf9: 0xff: 0x96e */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + 
MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0xfa: 0xff: 0x96f */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.GJR_MASK + MaskEnum.ORI_MASK + MaskEnum.BNG_MASK + MaskEnum.KND_MASK + MaskEnum.MLM_MASK + MaskEnum.TML_MASK, /* 0x00: 0x80: 0x970 */ MaskEnum.DEV_MASK + MaskEnum.PNJ_MASK + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO + MaskEnum.ZERO, /* * The length of the array is 128 to provide values for 0x900..0x97f. * The last 15 entries for 0x971..0x97f of the table are all zero * because no Indic script uses such Unicode code points. */ /* 0x00: 0x00: 0x971 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x972 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x973 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x974 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x975 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x976 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x977 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x978 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x979 */ MaskEnum.ZERO, /* 0x00: 0x00: 0x97A */ MaskEnum.ZERO, /* 0x00: 0x00: 0x97B */ MaskEnum.ZERO, /* 0x00: 0x00: 0x97C */ MaskEnum.ZERO, /* 0x00: 0x00: 0x97D */ MaskEnum.ZERO, /* 0x00: 0x00: 0x97E */ MaskEnum.ZERO, /* 0x00: 0x00: 0x97F */ MaskEnum.ZERO, }; private static final char fromUnicodeTable[] = { 0x00a0, /* 0x0900 */ 0x00a1, /* 0x0901 */ 0x00a2, /* 0x0902 */ 0x00a3, /* 0x0903 */ 0xa4e0, /* 0x0904 */ 0x00a4, /* 0x0905 */ 0x00a5, /* 0x0906 */ 0x00a6, /* 0x0907 */ 0x00a7, /* 0x0908 */ 0x00a8, /* 0x0909 */ 0x00a9, /* 0x090a */ 0x00aa, /* 0x090b */ 0xA6E9, /* 0x090c */ 0x00ae, /* 0x090d */ 0x00ab, /* 0x090e */ 0x00ac, /* 0x090f */ 0x00ad, /* 0x0910 */ 0x00b2, /* 0x0911 */ 0x00af, /* 0x0912 */ 0x00b0, /* 0x0913 */ 0x00b1, /* 0x0914 */ 0x00b3, /* 0x0915 */ 0x00b4, /* 0x0916 */ 0x00b5, /* 0x0917 */ 0x00b6, /* 0x0918 */ 0x00b7, /* 0x0919 */ 0x00b8, /* 0x091a */ 0x00b9, /* 0x091b */ 0x00ba, /* 0x091c */ 0x00bb, /* 0x091d */ 0x00bc, /* 0x091e */ 0x00bd, /* 0x091f */ 0x00be, /* 0x0920 */ 0x00bf, /* 
0x0921 */ 0x00c0, /* 0x0922 */ 0x00c1, /* 0x0923 */ 0x00c2, /* 0x0924 */ 0x00c3, /* 0x0925 */ 0x00c4, /* 0x0926 */ 0x00c5, /* 0x0927 */ 0x00c6, /* 0x0928 */ 0x00c7, /* 0x0929 */ 0x00c8, /* 0x092a */ 0x00c9, /* 0x092b */ 0x00ca, /* 0x092c */ 0x00cb, /* 0x092d */ 0x00cc, /* 0x092e */ 0x00cd, /* 0x092f */ 0x00cf, /* 0x0930 */ 0x00d0, /* 0x0931 */ 0x00d1, /* 0x0932 */ 0x00d2, /* 0x0933 */ 0x00d3, /* 0x0934 */ 0x00d4, /* 0x0935 */ 0x00d5, /* 0x0936 */ 0x00d6, /* 0x0937 */ 0x00d7, /* 0x0938 */ 0x00d8, /* 0x0939 */ 0xFFFF, /* 0x093a */ 0xFFFF, /* 0x093b */ 0x00e9, /* 0x093c */ 0xEAE9, /* 0x093d */ 0x00da, /* 0x093e */ 0x00db, /* 0x093f */ 0x00dc, /* 0x0940 */ 0x00dd, /* 0x0941 */ 0x00de, /* 0x0942 */ 0x00df, /* 0x0943 */ 0xDFE9, /* 0x0944 */ 0x00e3, /* 0x0945 */ 0x00e0, /* 0x0946 */ 0x00e1, /* 0x0947 */ 0x00e2, /* 0x0948 */ 0x00e7, /* 0x0949 */ 0x00e4, /* 0x094a */ 0x00e5, /* 0x094b */ 0x00e6, /* 0x094c */ 0x00e8, /* 0x094d */ 0x00ec, /* 0x094e */ 0x00ed, /* 0x094f */ 0xA1E9, /* 0x0950 */ /* OM Symbol */ 0xFFFF, /* 0x0951 */ 0xF0B8, /* 0x0952 */ 0xFFFF, /* 0x0953 */ 0xFFFF, /* 0x0954 */ 0xFFFF, /* 0x0955 */ 0xFFFF, /* 0x0956 */ 0xFFFF, /* 0x0957 */ 0xb3e9, /* 0x0958 */ 0xb4e9, /* 0x0959 */ 0xb5e9, /* 0x095a */ 0xbae9, /* 0x095b */ 0xbfe9, /* 0x095c */ 0xC0E9, /* 0x095d */ 0xc9e9, /* 0x095e */ 0x00ce, /* 0x095f */ 0xAAe9, /* 0x0960 */ 0xA7E9, /* 0x0961 */ 0xDBE9, /* 0x0962 */ 0xDCE9, /* 0x0963 */ 0x00ea, /* 0x0964 */ 0xeaea, /* 0x0965 */ 0x00f1, /* 0x0966 */ 0x00f2, /* 0x0967 */ 0x00f3, /* 0x0968 */ 0x00f4, /* 0x0969 */ 0x00f5, /* 0x096a */ 0x00f6, /* 0x096b */ 0x00f7, /* 0x096c */ 0x00f8, /* 0x096d */ 0x00f9, /* 0x096e */ 0x00fa, /* 0x096f */ 0xF0BF, /* 0x0970 */ 0xFFFF, /* 0x0971 */ 0xFFFF, /* 0x0972 */ 0xFFFF, /* 0x0973 */ 0xFFFF, /* 0x0974 */ 0xFFFF, /* 0x0975 */ 0xFFFF, /* 0x0976 */ 0xFFFF, /* 0x0977 */ 0xFFFF, /* 0x0978 */ 0xFFFF, /* 0x0979 */ 0xFFFF, /* 0x097a */ 0xFFFF, /* 0x097b */ 0xFFFF, /* 0x097c */ 0xFFFF, /* 0x097d */ 0xFFFF, /* 0x097e */ 0xFFFF, /* 0x097f */ 
}; private static final char toUnicodeTable[] = { 0x0000, /* 0x00 */ 0x0001, /* 0x01 */ 0x0002, /* 0x02 */ 0x0003, /* 0x03 */ 0x0004, /* 0x04 */ 0x0005, /* 0x05 */ 0x0006, /* 0x06 */ 0x0007, /* 0x07 */ 0x0008, /* 0x08 */ 0x0009, /* 0x09 */ 0x000a, /* 0x0a */ 0x000b, /* 0x0b */ 0x000c, /* 0x0c */ 0x000d, /* 0x0d */ 0x000e, /* 0x0e */ 0x000f, /* 0x0f */ 0x0010, /* 0x10 */ 0x0011, /* 0x11 */ 0x0012, /* 0x12 */ 0x0013, /* 0x13 */ 0x0014, /* 0x14 */ 0x0015, /* 0x15 */ 0x0016, /* 0x16 */ 0x0017, /* 0x17 */ 0x0018, /* 0x18 */ 0x0019, /* 0x19 */ 0x001a, /* 0x1a */ 0x001b, /* 0x1b */ 0x001c, /* 0x1c */ 0x001d, /* 0x1d */ 0x001e, /* 0x1e */ 0x001f, /* 0x1f */ 0x0020, /* 0x20 */ 0x0021, /* 0x21 */ 0x0022, /* 0x22 */ 0x0023, /* 0x23 */ 0x0024, /* 0x24 */ 0x0025, /* 0x25 */ 0x0026, /* 0x26 */ 0x0027, /* 0x27 */ 0x0028, /* 0x28 */ 0x0029, /* 0x29 */ 0x002a, /* 0x2a */ 0x002b, /* 0x2b */ 0x002c, /* 0x2c */ 0x002d, /* 0x2d */ 0x002e, /* 0x2e */ 0x002f, /* 0x2f */ 0x0030, /* 0x30 */ 0x0031, /* 0x31 */ 0x0032, /* 0x32 */ 0x0033, /* 0x33 */ 0x0034, /* 0x34 */ 0x0035, /* 0x35 */ 0x0036, /* 0x36 */ 0x0037, /* 0x37 */ 0x0038, /* 0x38 */ 0x0039, /* 0x39 */ 0x003A, /* 0x3A */ 0x003B, /* 0x3B */ 0x003c, /* 0x3c */ 0x003d, /* 0x3d */ 0x003e, /* 0x3e */ 0x003f, /* 0x3f */ 0x0040, /* 0x40 */ 0x0041, /* 0x41 */ 0x0042, /* 0x42 */ 0x0043, /* 0x43 */ 0x0044, /* 0x44 */ 0x0045, /* 0x45 */ 0x0046, /* 0x46 */ 0x0047, /* 0x47 */ 0x0048, /* 0x48 */ 0x0049, /* 0x49 */ 0x004a, /* 0x4a */ 0x004b, /* 0x4b */ 0x004c, /* 0x4c */ 0x004d, /* 0x4d */ 0x004e, /* 0x4e */ 0x004f, /* 0x4f */ 0x0050, /* 0x50 */ 0x0051, /* 0x51 */ 0x0052, /* 0x52 */ 0x0053, /* 0x53 */ 0x0054, /* 0x54 */ 0x0055, /* 0x55 */ 0x0056, /* 0x56 */ 0x0057, /* 0x57 */ 0x0058, /* 0x58 */ 0x0059, /* 0x59 */ 0x005a, /* 0x5a */ 0x005b, /* 0x5b */ 0x005c, /* 0x5c */ 0x005d, /* 0x5d */ 0x005e, /* 0x5e */ 0x005f, /* 0x5f */ 0x0060, /* 0x60 */ 0x0061, /* 0x61 */ 0x0062, /* 0x62 */ 0x0063, /* 0x63 */ 0x0064, /* 0x64 */ 0x0065, /* 0x65 */ 0x0066, /* 
0x66 */ 0x0067, /* 0x67 */ 0x0068, /* 0x68 */ 0x0069, /* 0x69 */ 0x006a, /* 0x6a */ 0x006b, /* 0x6b */ 0x006c, /* 0x6c */ 0x006d, /* 0x6d */ 0x006e, /* 0x6e */ 0x006f, /* 0x6f */ 0x0070, /* 0x70 */ 0x0071, /* 0x71 */ 0x0072, /* 0x72 */ 0x0073, /* 0x73 */ 0x0074, /* 0x74 */ 0x0075, /* 0x75 */ 0x0076, /* 0x76 */ 0x0077, /* 0x77 */ 0x0078, /* 0x78 */ 0x0079, /* 0x79 */ 0x007a, /* 0x7a */ 0x007b, /* 0x7b */ 0x007c, /* 0x7c */ 0x007d, /* 0x7d */ 0x007e, /* 0x7e */ 0x007f, /* 0x7f */ 0x0080, /* 0x80 */ 0x0081, /* 0x81 */ 0x0082, /* 0x82 */ 0x0083, /* 0x83 */ 0x0084, /* 0x84 */ 0x0085, /* 0x85 */ 0x0086, /* 0x86 */ 0x0087, /* 0x87 */ 0x0088, /* 0x88 */ 0x0089, /* 0x89 */ 0x008a, /* 0x8a */ 0x008b, /* 0x8b */ 0x008c, /* 0x8c */ 0x008d, /* 0x8d */ 0x008e, /* 0x8e */ 0x008f, /* 0x8f */ 0x0090, /* 0x90 */ 0x0091, /* 0x91 */ 0x0092, /* 0x92 */ 0x0093, /* 0x93 */ 0x0094, /* 0x94 */ 0x0095, /* 0x95 */ 0x0096, /* 0x96 */ 0x0097, /* 0x97 */ 0x0098, /* 0x98 */ 0x0099, /* 0x99 */ 0x009a, /* 0x9a */ 0x009b, /* 0x9b */ 0x009c, /* 0x9c */ 0x009d, /* 0x9d */ 0x009e, /* 0x9e */ 0x009f, /* 0x9f */ 0x00A0, /* 0xa0 */ 0x0901, /* 0xa1 */ 0x0902, /* 0xa2 */ 0x0903, /* 0xa3 */ 0x0905, /* 0xa4 */ 0x0906, /* 0xa5 */ 0x0907, /* 0xa6 */ 0x0908, /* 0xa7 */ 0x0909, /* 0xa8 */ 0x090a, /* 0xa9 */ 0x090b, /* 0xaa */ 0x090e, /* 0xab */ 0x090f, /* 0xac */ 0x0910, /* 0xad */ 0x090d, /* 0xae */ 0x0912, /* 0xaf */ 0x0913, /* 0xb0 */ 0x0914, /* 0xb1 */ 0x0911, /* 0xb2 */ 0x0915, /* 0xb3 */ 0x0916, /* 0xb4 */ 0x0917, /* 0xb5 */ 0x0918, /* 0xb6 */ 0x0919, /* 0xb7 */ 0x091a, /* 0xb8 */ 0x091b, /* 0xb9 */ 0x091c, /* 0xba */ 0x091d, /* 0xbb */ 0x091e, /* 0xbc */ 0x091f, /* 0xbd */ 0x0920, /* 0xbe */ 0x0921, /* 0xbf */ 0x0922, /* 0xc0 */ 0x0923, /* 0xc1 */ 0x0924, /* 0xc2 */ 0x0925, /* 0xc3 */ 0x0926, /* 0xc4 */ 0x0927, /* 0xc5 */ 0x0928, /* 0xc6 */ 0x0929, /* 0xc7 */ 0x092a, /* 0xc8 */ 0x092b, /* 0xc9 */ 0x092c, /* 0xca */ 0x092d, /* 0xcb */ 0x092e, /* 0xcc */ 0x092f, /* 0xcd */ 0x095f, /* 0xce */ 0x0930, /* 0xcf 
*/ 0x0931, /* 0xd0 */ 0x0932, /* 0xd1 */ 0x0933, /* 0xd2 */ 0x0934, /* 0xd3 */ 0x0935, /* 0xd4 */ 0x0936, /* 0xd5 */ 0x0937, /* 0xd6 */ 0x0938, /* 0xd7 */ 0x0939, /* 0xd8 */ 0x200D, /* 0xd9 */ 0x093e, /* 0xda */ 0x093f, /* 0xdb */ 0x0940, /* 0xdc */ 0x0941, /* 0xdd */ 0x0942, /* 0xde */ 0x0943, /* 0xdf */ 0x0946, /* 0xe0 */ 0x0947, /* 0xe1 */ 0x0948, /* 0xe2 */ 0x0945, /* 0xe3 */ 0x094a, /* 0xe4 */ 0x094b, /* 0xe5 */ 0x094c, /* 0xe6 */ 0x0949, /* 0xe7 */ 0x094d, /* 0xe8 */ 0x093c, /* 0xe9 */ 0x0964, /* 0xea */ 0xFFFF, /* 0xeb */ 0xFFFF, /* 0xec */ 0xFFFF, /* 0xed */ 0xFFFF, /* 0xee */ 0xFFFF, /* 0xef */ 0xFFFF, /* 0xf0 */ 0x0966, /* 0xf1 */ 0x0967, /* 0xf2 */ 0x0968, /* 0xf3 */ 0x0969, /* 0xf4 */ 0x096a, /* 0xf5 */ 0x096b, /* 0xf6 */ 0x096c, /* 0xf7 */ 0x096d, /* 0xf8 */ 0x096e, /* 0xf9 */ 0x096f, /* 0xfa */ 0xFFFF, /* 0xfb */ 0xFFFF, /* 0xfc */ 0xFFFF, /* 0xfd */ 0xFFFF, /* 0xfe */ 0xFFFF, /* 0xff */ }; private static final char nuktaSpecialCases[][] = { { 16 /* length of array */ , 0 }, { 0xa6, 0x090c }, { 0xea, 0x093d }, { 0xdf, 0x0944 }, { 0xa1, 0x0950 }, { 0xb3, 0x0958 }, { 0xb4, 0x0959 }, { 0xb5, 0x095a }, { 0xba, 0x095b }, { 0xbf, 0x095c }, { 0xc0, 0x095d }, { 0xc9, 0x095e }, { 0xaa, 0x0960 }, { 0xa7, 0x0961 }, { 0xdb, 0x0962 }, { 0xdc, 0x0963 } }; private static final char vowelSignESpecialCases[][] = { { 2 /* length of array */ , 0 }, { 0xA4, 0x0904 } }; private static final short lookupTable[][] = { { MaskEnum.ZERO, MaskEnum.ZERO }, /* DEFAULT */ { MaskEnum.ZERO, MaskEnum.ZERO }, /* ROMAN */ { UniLang.DEVALANGARI, MaskEnum.DEV_MASK }, { UniLang.BENGALI, MaskEnum.BNG_MASK }, { UniLang.TAMIL, MaskEnum.TML_MASK }, { UniLang.TELUGU, MaskEnum.KND_MASK }, { UniLang.BENGALI, MaskEnum.BNG_MASK }, { UniLang.ORIYA, MaskEnum.ORI_MASK }, { UniLang.KANNADA, MaskEnum.KND_MASK }, { UniLang.MALAYALAM, MaskEnum.MLM_MASK }, { UniLang.GUJARATI, MaskEnum.GJR_MASK }, { UniLang.GURMUKHI, MaskEnum.PNJ_MASK } }; private UConverterDataISCII extraInfo = null; protected byte[] 
fromUSubstitution = new byte[]{(byte)0x1A}; public CharsetISCII(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); maxBytesPerChar = 4; minBytesPerChar = 1; maxCharsPerByte = 1; /* get the version number of the ISCII converter */ int option = Integer.parseInt(icuCanonicalName.substring(14)); extraInfo = new UConverterDataISCII( option, new String(ISCII_CNV_PREFIX + (option & UCNV_OPTIONS_VERSION_MASK)) /* name */ ); initializePNJSets(); } /* Initialize the two UnicodeSets used for proper Gurmukhi conversion if they have not already been created. */ private void initializePNJSets() { if (PNJ_BINDI_TIPPI_SET != null && PNJ_CONSONANT_SET != null) { return; } PNJ_BINDI_TIPPI_SET = new UnicodeSet(); PNJ_CONSONANT_SET = new UnicodeSet(); PNJ_CONSONANT_SET.add(0x0a15, 0x0a28); PNJ_CONSONANT_SET.add(0x0a2a, 0x0a30); PNJ_CONSONANT_SET.add(0x0a35, 0x0a36); PNJ_CONSONANT_SET.add(0x0a38, 0x0a39); PNJ_BINDI_TIPPI_SET.addAll(PNJ_CONSONANT_SET); PNJ_BINDI_TIPPI_SET.add(0x0a05); PNJ_BINDI_TIPPI_SET.add(0x0a07); PNJ_BINDI_TIPPI_SET.add(0x0a41, 0x0a42); PNJ_BINDI_TIPPI_SET.add(0x0a3f); PNJ_CONSONANT_SET.compact(); PNJ_BINDI_TIPPI_SET.compact(); } /* * Rules for ISCII to Unicode converter * ISCII is a stateful encoding. To convert ISCII bytes to Unicode, * which has both precomposed and decomposed forms of characters, * the pre-context and post-context need to be considered. * * Post context * i) ATR : Attribute code is used to declare font and script switching. 
* Currently we only switch scripts; font codes are consumed without generating an error * ii) EXT : Extension code is used to declare switching to Sanskrit and for obscure, * obsolete characters * Pre context * i) Halant: if preceded by a halant then it is an explicit halant * ii) Nukta: * a) if preceded by a halant then it is a soft halant * b) if preceded by specific consonants and the ligatures have pre-composed * characters in Unicode then convert to pre-composed characters * iii) Danda: If Danda is preceded by a Danda then convert to Double Danda */ class CharsetDecoderISCII extends CharsetDecoderICU { public CharsetDecoderISCII(CharsetICU cs) { super(cs); implReset(); } protected void implReset() { super.implReset(); this.toUnicodeStatus = 0xFFFF; extraInfo.initialize(); } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { CoderResult cr = CoderResult.UNDERFLOW; int targetUniChar = 0x0000; short sourceChar = 0x0000; UConverterDataISCII data; boolean gotoCallBack = false; int offset = 0; data = extraInfo; /* data.contextCharToUnicode contains the previous ISCII codepoint visited */ /* this.toUnicodeStatus contains the mapping to Unicode of the above codepoint */ while (source.hasRemaining()) { targetUniChar = UConverterConstants.missingCharMarker; if (target.hasRemaining()) { sourceChar = (short)((short)source.get() & UConverterConstants.UNSIGNED_BYTE_MASK); /* look at the post-context and perform special processing */ if (data.contextCharToUnicode == ATR) { /* If we have ATR in data.contextCharToUnicode then we need to change our * state to the Indic script specified by sourceChar */ /* check if the sourceChar is in the supported script range */ if (((short)(ISCIILang.PNJ - sourceChar) & UConverterConstants.UNSIGNED_BYTE_MASK) <= (ISCIILang.PNJ - ISCIILang.DEV)) { data.currentDeltaToUnicode = (short)(lookupTable[sourceChar & 0x0F][0] * UniLang.DELTA); data.currentMaskToUnicode = lookupTable[sourceChar & 0x0F][1]; } 
else if (sourceChar == ISCIILang.DEF) { /* switch back to default */ data.currentDeltaToUnicode = data.defDeltaToUnicode; data.currentMaskToUnicode = data.defMaskToUnicode; } else { if ((sourceChar >= 0x21 && sourceChar <= 0x3F)) { /* these are display codes; consume and continue */ } else { cr = CoderResult.malformedForLength(1); /* reset */ data.contextCharToUnicode = NO_CHAR_MARKER; gotoCallBack = true; } } /* reset */ if (!gotoCallBack) { data.contextCharToUnicode = NO_CHAR_MARKER; continue; } } else if (data.contextCharToUnicode == EXT) { /* check if sourceChar is in 0xA1 - 0xEE range */ if (((short)(EXT_RANGE_END - sourceChar) & UConverterConstants.UNSIGNED_BYTE_MASK) <= (EXT_RANGE_END - EXT_RANGE_BEGIN)) { /* We currently support only Anudatta and Devanagari abbreviation sign */ if (sourceChar == 0xBF || sourceChar == 0xB8) { targetUniChar = (sourceChar == 0xBF) ? DEV_ABBR_SIGN : DEV_ANUDATTA; /* find out if the mapping is valid in this state */ if ((validityTable[((short)targetUniChar) & UConverterConstants.UNSIGNED_BYTE_MASK] & data.currentMaskToUnicode) > 0) { data.contextCharToUnicode = NO_CHAR_MARKER; /* Write the previous toUnicodeStatus, this was delayed to handle consonant clustering for Gurmukhi script. 
*/ if (data.prevToUnicodeStatus != 0) { cr = WriteToTargetToU(offsets, (source.position() - 1), source, target, data.prevToUnicodeStatus, (short)0); data.prevToUnicodeStatus = 0x0000; } /* write to target */ cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, targetUniChar, data.currentDeltaToUnicode); continue; } } /* byte unit is unassigned */ targetUniChar = UConverterConstants.missingCharMarker; cr = CoderResult.unmappableForLength(1); } else { /* only 0xA1 - 0xEE are legal after EXT char */ data.contextCharToUnicode = NO_CHAR_MARKER; cr = CoderResult.malformedForLength(1); } gotoCallBack = true; } else if (data.contextCharToUnicode == ISCII_INV) { if (sourceChar == ISCII_HALANT) { targetUniChar = 0x0020; /* replace with space according to Indic FAQ */ } else { targetUniChar = ZWJ; } /* Write the previous toUnicodeStatus, this was delayed to handle consonant clustering for Gurmukhi script. */ if (data.prevToUnicodeStatus != 0) { cr = WriteToTargetToU(offsets, (source.position() - 1), source, target, data.prevToUnicodeStatus, (short)0); data.prevToUnicodeStatus = 0x0000; } /* write to target */ cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, targetUniChar, data.currentDeltaToUnicode); /* reset */ data.contextCharToUnicode = NO_CHAR_MARKER; } /* look at the pre-context and perform special processing */ if (!gotoCallBack) { switch (sourceChar) { case ISCII_INV: case EXT: /* falls through */ case ATR: data.contextCharToUnicode = (char)sourceChar; if (this.toUnicodeStatus != UConverterConstants.missingCharMarker) { /* Write the previous toUnicodeStatus, this was delayed to handle consonant clustering for Gurmukhi script. 
*/ if (data.prevToUnicodeStatus != 0) { cr = WriteToTargetToU(offsets, (source.position() - 1), source, target, data.prevToUnicodeStatus, (short)0); data.prevToUnicodeStatus = 0x0000; } cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, this.toUnicodeStatus, data.currentDeltaToUnicode); this.toUnicodeStatus = UConverterConstants.missingCharMarker; } continue; case ISCII_DANDA: /* handle double danda */ if (data.contextCharToUnicode == ISCII_DANDA) { targetUniChar = DOUBLE_DANDA; /* clear the context */ data.contextCharToUnicode = NO_CHAR_MARKER; this.toUnicodeStatus = UConverterConstants.missingCharMarker; } else { targetUniChar = GetMapping(sourceChar, targetUniChar, data); data.contextCharToUnicode = (char)sourceChar; } break; case ISCII_HALANT: /* handle explicit halant */ if (data.contextCharToUnicode == ISCII_HALANT) { targetUniChar = ZWNJ; /* clear context */ data.contextCharToUnicode = NO_CHAR_MARKER; } else { targetUniChar = GetMapping(sourceChar, targetUniChar, data); data.contextCharToUnicode = (char)sourceChar; } break; case 0x0A: /* fall through */ case 0x0D: data.resetToDefaultToUnicode = true; targetUniChar = GetMapping(sourceChar, targetUniChar, data); data.contextCharToUnicode = (char)sourceChar; break; case ISCII_VOWEL_SIGN_E: /* find + SIGN_VOWEL_E special mapping */ int n = 1; boolean find = false; for (; n < vowelSignESpecialCases[0][0]; n++) { if (vowelSignESpecialCases[n][0] == ((short)data.contextCharToUnicode & UConverterConstants.UNSIGNED_BYTE_MASK)) { targetUniChar = vowelSignESpecialCases[n][1]; find = true; break; } } if (find) { /* find out if the mapping is valid in this state */ if ((validityTable[(byte)targetUniChar] & data.currentMaskFromUnicode) > 0) { data.contextCharToUnicode = NO_CHAR_MARKER; this.toUnicodeStatus = UConverterConstants.missingCharMarker; break; } } targetUniChar = GetMapping(sourceChar, targetUniChar, data); data.contextCharToUnicode = (char)sourceChar; break; case ISCII_NUKTA: /* handle soft 
halant */ if (data.contextCharToUnicode == ISCII_HALANT) { targetUniChar = ZWJ; /* clear the context */ data.contextCharToUnicode = NO_CHAR_MARKER; break; } else if (data.currentDeltaToUnicode == PNJ_DELTA && data.contextCharToUnicode == 0xc0) { /* We got here because ISCII_NUKTA was preceded by 0xc0 and we are converting Gurmukhi. * In that case we must convert (0xc0 0xe9) to (\u0a5c\u0a4d\u0a39). * WriteToTargetToU is given 0x095c instead of 0xa5c because that method will automatically * convert the code point given based on the delta provided. */ cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, PNJ_RRA, (short)0); if (!cr.isOverflow()) { cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, PNJ_SIGN_VIRAMA, (short)0); if (!cr.isOverflow()) { cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, PNJ_HA, (short)0); } else { this.charErrorBufferArray[this.charErrorBufferLength++] = PNJ_HA; } } else { this.charErrorBufferArray[this.charErrorBufferLength++] = PNJ_SIGN_VIRAMA; this.charErrorBufferArray[this.charErrorBufferLength++] = PNJ_HA; } this.toUnicodeStatus = UConverterConstants.missingCharMarker; data.contextCharToUnicode = NO_CHAR_MARKER; if (!cr.isError()) { continue; } break; } else { /* try to handle + ISCII_NUKTA special mappings */ int i = 1; boolean found = false; for (; i < nuktaSpecialCases[0][0]; i++) { if (nuktaSpecialCases[i][0] == ((short)data.contextCharToUnicode & UConverterConstants.UNSIGNED_BYTE_MASK)) { targetUniChar = nuktaSpecialCases[i][1]; found = true; break; } } if (found) { /* find out if the mapping is valid in this state */ if ((validityTable[(byte)targetUniChar] & data.currentMaskToUnicode) > 0) { data.contextCharToUnicode = NO_CHAR_MARKER; this.toUnicodeStatus = UConverterConstants.missingCharMarker; if (data.currentDeltaToUnicode == PNJ_DELTA) { /* Write the previous toUnicodeStatus, this was delayed to handle consonant clustering for Gurmukhi script. 
*/ if (data.prevToUnicodeStatus != 0) { cr = WriteToTargetToU(offsets, (source.position() - 1), source, target, data.prevToUnicodeStatus, (short)0); data.prevToUnicodeStatus = 0x0000; } cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, targetUniChar, data.currentDeltaToUnicode); continue; } break; } /* else fall through to default */ } /* else fall through to default */ } default: targetUniChar = GetMapping(sourceChar, targetUniChar, data); data.contextCharToUnicode = (char)sourceChar; break; } /* end of switch */ } /* end of CallBack if statement */ if (!gotoCallBack && this.toUnicodeStatus != UConverterConstants.missingCharMarker) { /* Check to make sure that consonant clusters are handled correctly for Gurmukhi script. */ if (data.currentDeltaToUnicode == PNJ_DELTA && data.prevToUnicodeStatus != 0 && PNJ_CONSONANT_SET.contains(data.prevToUnicodeStatus) && (this.toUnicodeStatus + PNJ_DELTA) == PNJ_SIGN_VIRAMA && (targetUniChar + PNJ_DELTA) == data.prevToUnicodeStatus) { if (offsets != null) { offset = source.position() - 3; } cr = WriteToTargetToU(offsets, offset, source, target, PNJ_ADHAK, (short)0); cr = WriteToTargetToU(offsets, offset, source, target, data.prevToUnicodeStatus, (short)0); data.prevToUnicodeStatus = 0x0000; /* reset the previous unicode code point */ toUnicodeStatus = UConverterConstants.missingCharMarker; continue; } else { /* Write the previous toUnicodeStatus, this was delayed to handle consonant clustering for Gurmukhi script. */ if (data.prevToUnicodeStatus != 0) { cr = WriteToTargetToU(offsets, (source.position() - 1), source, target, data.prevToUnicodeStatus, (short)0); data.prevToUnicodeStatus = 0x0000; } /* Check to make sure that Bindi and Tippi are handled correctly for Gurmukhi script. * If 0xA2 is preceded by a codepoint in the PNJ_BINDI_TIPPI_SET then the target codepoint should be Tippi instead of Bindi. 
*/ if (data.currentDeltaToUnicode == PNJ_DELTA && (targetUniChar + PNJ_DELTA) == PNJ_BINDI && PNJ_BINDI_TIPPI_SET.contains(this.toUnicodeStatus + PNJ_DELTA)) { targetUniChar = PNJ_TIPPI - PNJ_DELTA; cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, this.toUnicodeStatus, PNJ_DELTA); } else if (data.currentDeltaToUnicode == PNJ_DELTA && (targetUniChar + PNJ_DELTA) == PNJ_SIGN_VIRAMA && PNJ_CONSONANT_SET.contains(this.toUnicodeStatus + PNJ_DELTA)) { /* Store the current toUnicodeStatus code point for later handling of consonant cluster in Gurmukhi. */ data.prevToUnicodeStatus = this.toUnicodeStatus + PNJ_DELTA; } else { /* write the previously mapped codepoint */ cr = WriteToTargetToU(offsets, (source.position() - 2), source, target, this.toUnicodeStatus, data.currentDeltaToUnicode); } } this.toUnicodeStatus = UConverterConstants.missingCharMarker; } if (!gotoCallBack && targetUniChar != UConverterConstants.missingCharMarker) { /* now save the targetUniChar for delayed write */ this.toUnicodeStatus = (char)targetUniChar; if (data.resetToDefaultToUnicode) { data.currentDeltaToUnicode = data.defDeltaToUnicode; data.currentMaskToUnicode = data.defMaskToUnicode; data.resetToDefaultToUnicode = false; } } else { /* we reach here only if targetUniChar == missingCharMarker * so assign codes to reason and err */ if (!gotoCallBack) { cr = CoderResult.unmappableForLength(1); } /* CallBack: */ toUBytesArray[0] = (byte)sourceChar; toULength = 1; gotoCallBack = false; break; } } else { cr = CoderResult.OVERFLOW; break; } } /* end of while */ if (cr.isUnderflow() && flush && !source.hasRemaining()) { /* end of the input stream */ if (data.contextCharToUnicode == ATR || data.contextCharToUnicode == EXT || data.contextCharToUnicode == ISCII_INV) { /* set toUBytes[] */ toUBytesArray[0] = (byte)data.contextCharToUnicode; toULength = 1; /* avoid looping on truncated sequences */ data.contextCharToUnicode = NO_CHAR_MARKER; } else { toULength = 0; } if (this.toUnicodeStatus 
!= UConverterConstants.missingCharMarker) { /* output a remaining target character */ WriteToTargetToU(offsets, (source.position() - 2), source, target, this.toUnicodeStatus, data.currentDeltaToUnicode); this.toUnicodeStatus = UConverterConstants.missingCharMarker; } } return cr; } private CoderResult WriteToTargetToU(IntBuffer offsets, int offset, ByteBuffer source, CharBuffer target, int targetUniChar, short delta) { CoderResult cr = CoderResult.UNDERFLOW; /* add offset to current Indic Block */ if (targetUniChar > ASCII_END && targetUniChar != ZWJ && targetUniChar != ZWNJ && targetUniChar != DANDA && targetUniChar != DOUBLE_DANDA) { targetUniChar += delta; } /* now write the targetUniChar */ if (target.hasRemaining()) { target.put((char)targetUniChar); if (offsets != null) { offsets.put(offset); } } else { charErrorBufferArray[charErrorBufferLength++] = (char)targetUniChar; cr = CoderResult.OVERFLOW; } return cr; } private int GetMapping(short sourceChar, int targetUniChar, UConverterDataISCII data) { targetUniChar = toUnicodeTable[sourceChar]; /* is the code point valid in current script? 
*/ if (sourceChar > ASCII_END && (validityTable[(short)targetUniChar & UConverterConstants.UNSIGNED_BYTE_MASK] & data.currentMaskToUnicode) == 0) { /* Vocalic RR is assigned in ISCII Telugu and Unicode */ if (data.currentDeltaToUnicode != (TELUGU_DELTA) || targetUniChar != VOCALLIC_RR) { targetUniChar = UConverterConstants.missingCharMarker; } } return targetUniChar; } } /* * Rules: * Explicit Halant : * HALANT + ZWNJ * Soft Halant : * HALANT + ZWJ */ class CharsetEncoderISCII extends CharsetEncoderICU { public CharsetEncoderISCII(CharsetICU cs) { super(cs, fromUSubstitution); implReset(); } protected void implReset() { super.implReset(); extraInfo.initialize(); } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { int targetByteUnit = 0x0000; int sourceChar = 0x0000; UConverterDataISCII converterData; short newDelta = 0; short range = 0; boolean deltaChanged = false; int tempContextFromUnicode = 0x0000; /* For special handling of the Gurmukhi script. */ CoderResult cr = CoderResult.UNDERFLOW; /* initialize data */ converterData = extraInfo; newDelta = converterData.currentDeltaFromUnicode; range = (short)(newDelta / UniLang.DELTA); if ((sourceChar = fromUChar32) != 0) { cr = handleSurrogates(source, (char) sourceChar); return (cr != null) ? cr : CoderResult.unmappableForLength(2); } /* writing the char to the output stream */ while (source.hasRemaining()) { if (!target.hasRemaining()) { return CoderResult.OVERFLOW; } /* Write the language code following LF only if LF is not the last character. 
*/ if (fromUnicodeStatus == LF) { targetByteUnit = ATR << 8; targetByteUnit += (byte)lookupInitialData[range].isciiLang; fromUnicodeStatus = 0x0000; /* now append ATR and language code */ cr = WriteToTargetFromU(offsets, source, target, targetByteUnit); if (cr.isOverflow()) { break; } } sourceChar = source.get(); tempContextFromUnicode = converterData.contextCharFromUnicode; targetByteUnit = UConverterConstants.missingCharMarker; /* check if input is in ASCII and C0 control codes range */ if (sourceChar <= ASCII_END) { fromUnicodeStatus = sourceChar; cr = WriteToTargetFromU(offsets, source, target, sourceChar); if (cr.isOverflow()) { break; } continue; } switch (sourceChar) { case ZWNJ: /* contextChar has HALANT */ if (converterData.contextCharFromUnicode != 0) { converterData.contextCharFromUnicode = 0x00; targetByteUnit = ISCII_HALANT; } else { /* consume ZWNJ and continue */ converterData.contextCharFromUnicode = 0x00; continue; } break; case ZWJ: /* contextChar has HALANT */ if (converterData.contextCharFromUnicode != 0) { targetByteUnit = ISCII_NUKTA; } else { targetByteUnit = ISCII_INV; } converterData.contextCharFromUnicode = 0x00; break; default: /* is the sourceChar in the INDIC_RANGE? */ if ((char)(INDIC_BLOCK_END - sourceChar) <= INDIC_RANGE) { /* Danda and Double Danda are valid in Northern scripts. Since Unicode * does not include these codepoints in all Northern scripts we need to * filter them out */ if (sourceChar != DANDA && sourceChar != DOUBLE_DANDA) { /* find out to which block the sourceChar belongs */ range = (short)((sourceChar - INDIC_BLOCK_BEGIN) / UniLang.DELTA); newDelta = (short)(range * UniLang.DELTA); /* Now are we in the same block as previous? 
*/ if (newDelta != converterData.currentDeltaFromUnicode || converterData.isFirstBuffer) { converterData.currentDeltaFromUnicode = newDelta; converterData.currentMaskFromUnicode = lookupInitialData[range].maskEnum; deltaChanged = true; converterData.isFirstBuffer = false; } if (converterData.currentDeltaFromUnicode == PNJ_DELTA) { if (sourceChar == PNJ_TIPPI) { /* Make sure Tippi is converted to Bindi. */ sourceChar = PNJ_BINDI; } else if (sourceChar == PNJ_ADHAK) { /* This is for consonant cluster handling. */ converterData.contextCharFromUnicode = PNJ_ADHAK; } } /* Normalize all Indic codepoints to Devanagari and map them to ISCII */ /* now subtract the new delta from sourceChar */ sourceChar -= converterData.currentDeltaFromUnicode; } /* get the target byte unit */ targetByteUnit = fromUnicodeTable[(short)sourceChar & UConverterConstants.UNSIGNED_BYTE_MASK]; /* is the code point valid in current script? */ if ((validityTable[(short)sourceChar & UConverterConstants.UNSIGNED_BYTE_MASK] & converterData.currentMaskFromUnicode) == 0) { /* Vocalic RR is assigned in ISCII Telugu and Unicode */ if (converterData.currentDeltaFromUnicode != (TELUGU_DELTA) || sourceChar != VOCALLIC_RR) { targetByteUnit = UConverterConstants.missingCharMarker; } } if (deltaChanged) { /* we are in a script block which is different from the * previous sourceChar's script block, so write ATR and language codes */ char temp = 0; temp = (char)(ATR << 8); temp += (char)((short)lookupInitialData[range].isciiLang & UConverterConstants.UNSIGNED_BYTE_MASK); /* reset */ deltaChanged = false; /* now append ATR and language code */ cr = WriteToTargetFromU(offsets, source, target, temp); if (cr.isOverflow()) { break; } } if (converterData.currentDeltaFromUnicode == PNJ_DELTA && (sourceChar + PNJ_DELTA) == PNJ_ADHAK) { continue; } } /* reset context char */ converterData.contextCharFromUnicode = 0x00; break; } /* end of switch */ if (converterData.currentDeltaFromUnicode == PNJ_DELTA && tempContextFromUnicode == 
PNJ_ADHAK && PNJ_CONSONANT_SET.contains(sourceChar + PNJ_DELTA)) { /* If the previous codepoint is Adhak and the current codepoint is a consonant, the targetByteUnit should be C + Halant + C. */ /* reset context char */ converterData.contextCharFromUnicode = 0x0000; targetByteUnit = targetByteUnit << 16 | ISCII_HALANT << 8 | targetByteUnit; /*write targetByteUnit to target */ cr = WriteToTargetFromU(offsets, source, target, targetByteUnit); if (cr.isOverflow()) { break; } } else if (targetByteUnit != UConverterConstants.missingCharMarker) { if (targetByteUnit == ISCII_HALANT) { converterData.contextCharFromUnicode = (char)targetByteUnit; } /*write targetByteUnit to target */ cr = WriteToTargetFromU(offsets, source, target, targetByteUnit); if (cr.isOverflow()) { break; } } else if (UTF16.isSurrogate((char)sourceChar)) { cr = handleSurrogates(source, (char) sourceChar); return (cr != null) ? cr : CoderResult.unmappableForLength(2); } else { return CoderResult.unmappableForLength(1); } } /* end of while */ /* save the state and return */ return cr; } private CoderResult WriteToTargetFromU(IntBuffer offsets, CharBuffer source, ByteBuffer target, int targetByteUnit) { CoderResult cr = CoderResult.UNDERFLOW; int offset = source.position() - 1; /* write the targetUniChar to target */ if (target.hasRemaining()) { if (targetByteUnit <= 0xFF) { target.put((byte)targetByteUnit); if (offsets != null) { offsets.put(offset); } } else { if (targetByteUnit > 0xFFFF) { target.put((byte)(targetByteUnit >> 16)); if (offsets != null) { --offset; offsets.put(offset); } } if (!target.hasRemaining()) { errorBuffer[errorBufferLength++] = (byte)(targetByteUnit >> 8); errorBuffer[errorBufferLength++] = (byte)targetByteUnit; cr = CoderResult.OVERFLOW; return cr; } target.put((byte)(targetByteUnit >> 8)); if (offsets != null) { offsets.put(offset); } if (target.hasRemaining()) { target.put((byte)targetByteUnit); if (offsets != null) { offsets.put(offset); } } else { 
errorBuffer[errorBufferLength++] = (byte)targetByteUnit; cr = CoderResult.OVERFLOW; } } } else { if ((targetByteUnit > 0xFFFF)) { errorBuffer[errorBufferLength++] = (byte)(targetByteUnit >> 16); } else if ((targetByteUnit & 0xFF00) > 0) { errorBuffer[errorBufferLength++] = (byte)(targetByteUnit >> 8); } errorBuffer[errorBufferLength++] = (byte)(targetByteUnit); cr = CoderResult.OVERFLOW; } return cr; } } public CharsetDecoder newDecoder() { return new CharsetDecoderISCII(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderISCII(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ int idx,script; char mask; setFillIn.add(0,ASCII_END ); for(script = UniLang.DEVALANGARI ; script<= UniLang.MALAYALAM ;script++){ mask = (char)lookupInitialData[script].maskEnum; for(idx=0; idx < UniLang.DELTA ; idx++){ /* Special check for Telugu character */ if((validityTable[idx] & mask)!=0 || (script == UniLang.TELUGU && idx==0x31)){ setFillIn.add(idx+(script*UniLang.DELTA)+INDIC_BLOCK_BEGIN ); } } } setFillIn.add(DANDA); setFillIn.add(DOUBLE_DANDA); setFillIn.add(ZWNJ); setFillIn.add(ZWJ); } }
icu4j-4.2/src/com/ibm/icu/charset/CharsetSCSU.java
/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UnicodeSet; import com.ibm.icu.text.UTF16; import com.ibm.icu.lang.UCharacter; /** * @author krajwade * */ class CharsetSCSU extends CharsetICU{ /* SCSU definitions --------------------------------------------------------- */ /* SCSU command byte values */ //enum { private static final short SQ0=0x01; /* Quote from window pair 0 */ private static final short SQ7=0x08; /* Quote from window pair 7 */ private static final short SDX=0x0B; /* Define a window as extended */ //private static final short Srs=0x0C; /* reserved */ private static final short SQU=0x0E; /* Quote a single Unicode character */ private static final short SCU=0x0F; /* Change to Unicode mode */ private static final short SC0=0x10; /* Select window 0 */ private static final short SC7=0x17; /* Select window 7 */ private static final short SD0=0x18; /* Define and select window 0 */ //private static final short SD7=0x1F; /* Define and select window 7 */ private static final short UC0=0xE0; /* Select window 0 */ private static final short UC7=0xE7; /* Select window 7 */ private static final short UD0=0xE8; /* Define and select window 0 */ private static final short UD7=0xEF; /* Define and select window 7 */ private static final short UQU=0xF0; /* Quote a single Unicode character */ private static final short UDX=0xF1; /* Define a Window as extended */ private static final short Urs=0xF2; /* reserved */ // }; // enum { /* * Unicode code points from 3400 to E000 are not addressable by * dynamic window, since in these areas no short run alphabets are * found. Therefore add gapOffset to all values from gapThreshold. 
*/ private static final int gapThreshold=0x68; private static final int gapOffset = 0xAC00 ; /* values between reservedStart and fixedThreshold are reserved */ private static final int reservedStart=0xA8; /* use table of predefined fixed offsets for values from fixedThreshold */ private static final int fixedThreshold=0xF9; //}; protected byte[] fromUSubstitution = new byte[]{(byte)0x0E,(byte)0xFF, (byte)0xFD}; /* constant offsets for the 8 static windows */ private static final int staticOffsets[]={ 0x0000, /* ASCII for quoted tags */ 0x0080, /* Latin - 1 Supplement (for access to punctuation) */ 0x0100, /* Latin Extended-A */ 0x0300, /* Combining Diacritical Marks */ 0x2000, /* General Punctuation */ 0x2080, /* Currency Symbols */ 0x2100, /* Letterlike Symbols and Number Forms */ 0x3000 /* CJK Symbols and punctuation */ }; /* initial offsets for the 8 dynamic (sliding) windows */ private static final int initialDynamicOffsets[]={ 0x0080, /* Latin-1 */ 0x00C0, /* Latin Extended A */ 0x0400, /* Cyrillic */ 0x0600, /* Arabic */ 0x0900, /* Devanagari */ 0x3040, /* Hiragana */ 0x30A0, /* Katakana */ 0xFF00 /* Fullwidth ASCII */ }; /* Table of fixed predefined Offsets */ private static final int fixedOffsets[]={ /* 0xF9 */ 0x00C0, /* Latin-1 Letters + half of Latin Extended A */ /* 0xFA */ 0x0250, /* IPA extensions */ /* 0xFB */ 0x0370, /* Greek */ /* 0xFC */ 0x0530, /* Armenian */ /* 0xFD */ 0x3040, /* Hiragana */ /* 0xFE */ 0x30A0, /* Katakana */ /* 0xFF */ 0xFF60 /* Halfwidth Katakana */ }; /* state values */ //enum { private static final int readCommand=0; private static final int quotePairOne=1; private static final int quotePairTwo=2; private static final int quoteOne=3; private static final int definePairOne=4; private static final int definePairTwo=5; private static final int defineOne=6; // }; private final class SCSUData{ /* dynamic window offsets, initialize to default values from initialDynamicOffsets */ int toUDynamicOffsets[] = new int[8] ; int 
fromUDynamicOffsets[] = new int[8] ; /* state machine state - toUnicode */ boolean toUIsSingleByteMode; short toUState; byte toUQuoteWindow, toUDynamicWindow; short toUByteOne; short toUPadding[]; /* state machine state - fromUnicode */ boolean fromUIsSingleByteMode; byte fromUDynamicWindow; /* * windowUse[] keeps track of the use of the dynamic windows: * At nextWindowUseIndex there is the least recently used window, * and the following windows (in a wrapping manner) are more and more * recently used. * At nextWindowUseIndex-1 there is the most recently used window. */ byte locale; byte nextWindowUseIndex; byte windowUse[] = new byte[8]; SCSUData(){ initialize(); } void initialize(){ for(int i=0;i<8;i++){ this.toUDynamicOffsets[i] = initialDynamicOffsets[i]; } this.toUIsSingleByteMode = true; this.toUState = readCommand; this.toUQuoteWindow = 0; this.toUDynamicWindow = 0; this.toUByteOne = 0; this.fromUIsSingleByteMode = true; this.fromUDynamicWindow = 0; for(int i=0;i<8;i++){ this.fromUDynamicOffsets[i] = initialDynamicOffsets[i]; } this.nextWindowUseIndex = 0; switch(this.locale){ case l_ja: for(int i=0;i<8;i++){ this.windowUse[i] = initialWindowUse_ja[i]; } break; default: for(int i=0;i<8;i++){ this.windowUse[i] = initialWindowUse[i]; } } } } static final byte initialWindowUse[]={ 7, 0, 3, 2, 4, 5, 6, 1 }; static final byte initialWindowUse_ja[]={ 3, 2, 4, 1, 0, 7, 5, 6 }; //enum { //private static final int lGeneric = 0; private static final int l_ja = 1; //}; private SCSUData extraInfo = null; public CharsetSCSU(String icuCanonicalName, String javaCanonicalName, String[] aliases){ super(icuCanonicalName, javaCanonicalName, aliases); maxBytesPerChar = 3; minBytesPerChar = 1; maxCharsPerByte = 1; extraInfo = new SCSUData(); } class CharsetDecoderSCSU extends CharsetDecoderICU { /* label values for supporting behavior similar to goto in C */ private static final int FastSingle=0; private static final int SingleByteMode=1; private static final int EndLoop=2; /* 
Mode Type */ private static final int ByteMode = 0; private static final int UnicodeMode =1; public CharsetDecoderSCSU(CharsetICU cs) { super(cs); implReset(); } //private SCSUData data ; protected void implReset(){ super.implReset(); toULength = 0; extraInfo.initialize(); } short b; //Get the state machine state private boolean isSingleByteMode ; private short state ; private byte quoteWindow ; private byte dynamicWindow ; private short byteOne; //sourceIndex=-1 if the current character began in the previous buffer private int sourceIndex ; private int nextSourceIndex ; CoderResult cr; SCSUData data ; private boolean LabelLoop;// used to break the while loop protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush){ data = extraInfo; //Get the state machine state isSingleByteMode = data.toUIsSingleByteMode; state = data.toUState; quoteWindow = data.toUQuoteWindow; dynamicWindow = data.toUDynamicWindow; byteOne = data.toUByteOne; LabelLoop = true; //sourceIndex=-1 if the current character began in the previous buffer sourceIndex = data.toUState == readCommand ? 
0: -1 ; nextSourceIndex = 0; cr = CoderResult.UNDERFLOW; int labelType = 0; while(LabelLoop){ if(isSingleByteMode){ switch(labelType){ case FastSingle: /*fast path for single-byte mode*/ labelType = fastSingle(source, target, offsets, ByteMode); break; case SingleByteMode: /* normal state machine for single-byte mode, minus handling for what fastSingleCovers */ labelType = singleByteMode(source, target, offsets, ByteMode); break; case EndLoop: endLoop(source, target, offsets); break; } }else{ switch(labelType){ case FastSingle: /*fast path for single-byte mode*/ labelType = fastSingle(source, target, offsets, UnicodeMode); break; case SingleByteMode: /* normal state machine for single-byte mode, minus handling for what fastSingleCovers */ labelType = singleByteMode(source, target, offsets, UnicodeMode); break; case EndLoop: endLoop(source, target, offsets); break; } //LabelLoop = false; } } return cr; } private int fastSingle(ByteBuffer source, CharBuffer target, IntBuffer offsets, int modeType){ int label = 0; if(modeType==ByteMode){ if(state==readCommand){ while(source.hasRemaining() && target.hasRemaining() && (b=(short)(source.get(source.position()) & UConverterConstants.UNSIGNED_BYTE_MASK)) >= 0x20){ source.position(source.position()+1); ++nextSourceIndex; if(b <= 0x7f){ /*Write US graphic character or DEL*/ target.put((char)b); if(offsets != null){ offsets.put(sourceIndex); } }else{ /*Write from dynamic window*/ int c = data.toUDynamicOffsets[dynamicWindow] + (b&0x7f); if(c <= 0xffff){ target.put((char)c); if(offsets != null){ offsets.put(sourceIndex); } }else{ /*Output surrogate pair */ target.put((char)(0xd7c0 + (c>>10))); if(target.hasRemaining()){ target.put((char)(0xdc00 | (c&0x3ff))); if(offsets != null){ offsets.put(sourceIndex); offsets.put(sourceIndex); } }else{ /* target overflow */ if(offsets != null){ offsets.put(sourceIndex); } charErrorBufferArray[0] = (char)(0xdc00 | (c&0x3ff)); charErrorBufferLength = 1; label = EndLoop; cr = 
CoderResult.OVERFLOW; LabelLoop = false; return label; } } } sourceIndex = nextSourceIndex; } // label = SingleByteMode; } }else if(modeType==UnicodeMode){ /* fast path for unicode mode */ if(state == readCommand){ while((source.position()+1)(Urs-UC0)){ target.put((char)((b<<8)|(source.get(source.position()+1)&UConverterConstants.UNSIGNED_BYTE_MASK))); if(offsets != null){ offsets.put(sourceIndex); } sourceIndex = nextSourceIndex; nextSourceIndex+=2; source.position(source.position()+2); } } } label = SingleByteMode; return label; } private int singleByteMode(ByteBuffer source, CharBuffer target, IntBuffer offsets, int modeType){ int label = SingleByteMode; if(modeType == ByteMode){ while(source.hasRemaining()){ if(!target.hasRemaining()){ cr = CoderResult.OVERFLOW; LabelLoop = false; return label; } b = (short)(source.get() & UConverterConstants.UNSIGNED_BYTE_MASK); ++nextSourceIndex; switch(state){ case readCommand: /*redundant conditions are commented out */ if(((1L<>10))); if(target.hasRemaining()){ target.put((char)(0xdc00 | (c&0x3ff))); if(offsets != null){ offsets.put(sourceIndex); offsets.put(sourceIndex); } }else { /* target overflow */ if(offsets != null){ offsets.put(sourceIndex); } charErrorBufferArray[0] = (char)(0xdc00 | (c&0x3ff)); charErrorBufferLength = 1; label = EndLoop; cr = CoderResult.OVERFLOW; LabelLoop = false; return label; } } } sourceIndex = nextSourceIndex; state = readCommand; label = FastSingle; return label; case definePairOne: dynamicWindow = (byte)((b>>5)&7); byteOne = (byte)(b&0x1f); toUBytesArray[1] = (byte)b; toULength = 2; state = definePairTwo; break; case definePairTwo: data.toUDynamicOffsets[dynamicWindow] = 0x10000 + (byteOne<<15L | b<<7L); sourceIndex = nextSourceIndex; state = readCommand; label = FastSingle; return label; case defineOne: if(b==0){ /*callback (illegal)*/ toUBytesArray[1] = (byte)b; toULength =2; label = EndLoop; return label; }else if(b=fixedThreshold){ data.toUDynamicOffsets[dynamicWindow] = 
fixedOffsets[b-fixedThreshold]; }else{ /*callback (illegal)*/ toUBytesArray[1] = (byte)b; toULength =2; label = EndLoop; return label; } sourceIndex = nextSourceIndex; state = readCommand; label = FastSingle; return label; } } }else if(modeType==UnicodeMode){ while(source.hasRemaining()){ if(!target.hasRemaining()){ cr = CoderResult.OVERFLOW; LabelLoop = false; return label; } b = source.get(); ++nextSourceIndex; switch(state){ case readCommand: if((byte)(b -UC0)>(Urs - UC0)){ byteOne = b; toUBytesArray[0] = (byte)b; toULength = 1; state = quotePairOne; }else if((b&UConverterConstants.UNSIGNED_BYTE_MASK) <= UC7){ dynamicWindow = (byte)(b - UC0); sourceIndex = nextSourceIndex; isSingleByteMode = true; label = FastSingle; return label; }else if((b&UConverterConstants.UNSIGNED_BYTE_MASK) <= UD7){ dynamicWindow = (byte)(b - UD0); isSingleByteMode = true; toUBytesArray[0] = (byte)b; toULength = 1; state = defineOne; label = SingleByteMode; return label; }else if((b&UConverterConstants.UNSIGNED_BYTE_MASK) == UDX){ isSingleByteMode = true; toUBytesArray[0] = (byte)b; toULength = 1; state = definePairOne; label = SingleByteMode; return label; }else if((b&UConverterConstants.UNSIGNED_BYTE_MASK) == UQU){ toUBytesArray[0] = (byte)b; toULength = 1; state = quotePairOne; }else { /* callback (illegal)*/ cr = CoderResult.malformedForLength(1); toUBytesArray[0] = (byte)b; toULength = 1; label = EndLoop; return label; } break; case quotePairOne: byteOne = b; toUBytesArray[1] = (byte)b; toULength = 2; state = quotePairTwo; break; case quotePairTwo: target.put((char)((byteOne<<8) | b)); if(offsets != null){ offsets.put(sourceIndex); } sourceIndex = nextSourceIndex; state = readCommand; label = FastSingle; return label; } } } label = EndLoop; return label; } private void endLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets){ if(cr==CoderResult.OVERFLOW){ state = readCommand; }else if(state == readCommand){ toULength = 0; } data.toUIsSingleByteMode = isSingleByteMode; 
data.toUState = state; data.toUQuoteWindow = quoteWindow; data.toUDynamicWindow = dynamicWindow; data.toUByteOne = byteOne; LabelLoop = false; } } class CharsetEncoderSCSU extends CharsetEncoderICU{ public CharsetEncoderSCSU(CharsetICU cs) { super(cs, fromUSubstitution); implReset(); } //private SCSUData data; protected void implReset() { super.implReset(); extraInfo.initialize(); } /* label values for supporting behavior similar to goto in C */ private static final int Loop=0; private static final int GetTrailUnicode=1; private static final int OutputBytes=2; private static final int EndLoop =3; private int delta; private int length; ///variables of compression heuristics private int offset; private char lead, trail; private int code; private byte window; //Get the state machine state private boolean isSingleByteMode; private byte dynamicWindow ; private int currentOffset; int c; SCSUData data ; //sourceIndex=-1 if the current character began in the previous buffer private int sourceIndex ; private int nextSourceIndex; private int targetCapacity; private boolean LabelLoop;//used to break the while loop private boolean AfterGetTrail;// its value is set to true in order to ignore the code before getTrailSingle: private boolean AfterGetTrailUnicode;// is value is set to true in order to ignore the code before getTrailUnicode: CoderResult cr; protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { data = extraInfo; cr = CoderResult.UNDERFLOW; //Get the state machine state isSingleByteMode = data.fromUIsSingleByteMode; dynamicWindow = data.fromUDynamicWindow; currentOffset = data.fromUDynamicOffsets[dynamicWindow]; c = fromUChar32; sourceIndex = c== 0 ? 0: -1 ; nextSourceIndex = 0; targetCapacity = target.limit()-target.position(); //sourceIndex=-1 if the current character began in the previous buffer sourceIndex = c== 0 ? 
0: -1 ; nextSourceIndex = 0; int labelType = Loop; // set to Loop so that the code starts from loop: LabelLoop = true; AfterGetTrail = false; AfterGetTrailUnicode = false; while(LabelLoop){ switch(labelType){ case Loop: labelType = loop(source, target, offsets); break; case GetTrailUnicode: labelType = getTrailUnicode(source, target, offsets); break; case OutputBytes: labelType = outputBytes(source, target, offsets); break; case EndLoop: endLoop(source, target, offsets); break; } } return cr; } private byte getWindow(int[] offsets){ int i; for (i=0;i<8;i++){ if(((c-offsets[i]) & UConverterConstants.UNSIGNED_INT_MASK) <= 0x7f){ return (byte)i; } } return -1; } private boolean isInOffsetWindowOrDirect(int offsetValue, int a){ return (boolean)((a & UConverterConstants.UNSIGNED_INT_MASK)<=(offsetValue & UConverterConstants.UNSIGNED_INT_MASK)+0x7f & ((a & UConverterConstants.UNSIGNED_INT_MASK)>=(offsetValue & UConverterConstants.UNSIGNED_INT_MASK) || ((a & UConverterConstants.UNSIGNED_INT_MASK)<=0x7f && ((a & UConverterConstants.UNSIGNED_INT_MASK)>=0x20 || ((1L<<(a & UConverterConstants.UNSIGNED_INT_MASK))&0x2601)!=0)))); } private byte getNextDynamicWindow(){ byte windowValue = data.windowUse[data.nextWindowUseIndex]; if(++data.nextWindowUseIndex==8){ data.nextWindowUseIndex=0; } return windowValue; } private void useDynamicWindow(byte windowValue){ /*first find the index of the window*/ int i,j; i = data.nextWindowUseIndex; do{ if(--i<0){ i=7; } }while(data.windowUse[i]!=windowValue); /*now copy each window[i+1] to [i]*/ j= i+1; if(j==8){ j=0; } while(j!=data.nextWindowUseIndex){ data.windowUse[i] = data.windowUse[j]; i=j; if(++j==8){ j=0; } } /*finally, set the window into the most recently used index*/ data.windowUse[i]= windowValue; } private int getDynamicOffset(){ int i; for(i=0;i<7;++i){ if(((c-fixedOffsets[i])&UConverterConstants.UNSIGNED_INT_MASK)<=0x7f){ offset = fixedOffsets[i]; return 0xf9+i; } } if((c&UConverterConstants.UNSIGNED_INT_MASK)<0x80){ /*No 
dynamic window for US-ASCII*/ return -1; }else if((c&UConverterConstants.UNSIGNED_INT_MASK)<0x3400 || ((c-0x10000)&UConverterConstants.UNSIGNED_INT_MASK)<(0x14000-0x10000) || ((c-0x1d000)&UConverterConstants.UNSIGNED_INT_MASK)<=(0x1ffff-0x1d000)){ /*This character is in the code range for a "small", i.e, reasonably windowable, script*/ offset = c&0x7fffff80; return (int)(c>>7); }else if(0xe000<=(c&UConverterConstants.UNSIGNED_INT_MASK) && (c&UConverterConstants.UNSIGNED_INT_MASK)!=0xfeff && (c&UConverterConstants.UNSIGNED_INT_MASK) < 0xfff0){ /*for these characters we need to take the gapOffset into account*/ offset=(c)&0x7fffff80; return (int)((c-gapOffset)>>7); }else{ return -1; } } private int loop(CharBuffer source, ByteBuffer target, IntBuffer offsets){ int label = 0; if(isSingleByteMode){ if(c!=0 && targetCapacity>0 && !AfterGetTrail){ label = getTrail(source, target, offsets); return label; } /*state machine for single byte mode*/ while(AfterGetTrail || source.hasRemaining()){ if(targetCapacity<=0 && !AfterGetTrail){ /*target is full*/ cr = CoderResult.OVERFLOW; label = EndLoop; return label; } if(!AfterGetTrail){ c = source.get(); ++nextSourceIndex; } if(((c -0x20)&UConverterConstants.UNSIGNED_INT_MASK)<=0x5f && !AfterGetTrail){ /*pass US-ASCII graphic character through*/ target.put((byte)c); if(offsets!=null){ offsets.put(sourceIndex); } --targetCapacity; }else if((c & UConverterConstants.UNSIGNED_INT_MASK)<0x20 && !AfterGetTrail){ if(((1L<<(c & UConverterConstants.UNSIGNED_INT_MASK))&0x2601)!=0){ /*CR/LF/TAB/NUL*/ target.put((byte)c); if(offsets!=null){ offsets.put(sourceIndex); } --targetCapacity; } else { /*quote c0 control character*/ c|=SQ0<<8; length = 2; label = OutputBytes; return label; } } else if(((delta=(c-currentOffset))&UConverterConstants.UNSIGNED_INT_MASK)<=0x7f && !AfterGetTrail){ /*use the current dynamic window*/ target.put((byte)(delta|0x80)); if(offsets!=null){ offsets.put(sourceIndex); } --targetCapacity; } else if(AfterGetTrail || 
UTF16.isSurrogate((char)c)){ if(!AfterGetTrail){ if(UTF16.isLeadSurrogate((char)c)){ label = getTrail(source, target, offsets); if(label==EndLoop){ return label; } } else { /*this is unmatched lead code unit (2nd Surrogate)*/ /*callback(illegal)*/ cr = CoderResult.malformedForLength(1); label = EndLoop; return label; } } if(AfterGetTrail){ AfterGetTrail = false; } /*Compress supplementary character U+10000...U+10ffff */ if(((delta=(c-currentOffset))&UConverterConstants.UNSIGNED_INT_MASK)<=0x7f){ /*use the current dynamic window*/ target.put((byte)(delta|0x80)); if(offsets!=null){ offsets.put(sourceIndex); } --targetCapacity; } else if((window=getWindow(data.fromUDynamicOffsets))>=0){ /*there is a dynamic window that contains this character, change to it*/ dynamicWindow = window; currentOffset = data.fromUDynamicOffsets[dynamicWindow]; useDynamicWindow(dynamicWindow); c = (((int)(SC0+dynamicWindow))<<8 | (c-currentOffset)|0x80); length = 2; label = OutputBytes; return label; } else if((code=getDynamicOffset())>=0){ /*might check if there are come character in this window to come */ /*define an extended window with this character*/ code-=0x200; dynamicWindow=getNextDynamicWindow(); currentOffset = data.fromUDynamicOffsets[dynamicWindow]=offset; useDynamicWindow(dynamicWindow); c = ((int)(SDX<<24) | (int)(dynamicWindow<<21)| (int)(code<<8)| (c- currentOffset) |0x80 ); // c = (((SDX)<<25) | (dynamicWindow<<21)| // (code<<8)| (c- currentOffset) |0x80 ); length = 4; label = OutputBytes; return label; } else { /*change to unicode mode and output this (lead, trail) pair*/ isSingleByteMode = false; target.put((byte)SCU); if(offsets!=null){ offsets.put(sourceIndex); } --targetCapacity; c = ((int)(lead<<16))|trail; length = 4; label = OutputBytes; return label; } } else if((c&UConverterConstants.UNSIGNED_INT_MASK)<0xa0){ /*quote C1 control character*/ c = (c&0x7f) | (SQ0+1)<<8; /*SQ0+1 == SQ1*/ length = 2; label = OutputBytes; return label; } else 
if((c&UConverterConstants.UNSIGNED_INT_MASK)==0xfeff || (c&UConverterConstants.UNSIGNED_INT_MASK)>= 0xfff0){ /*quote signature character = byte order mark and specials*/ c |= SQU<<16; length = 3; label = OutputBytes; return label; } else { /*compress all other BMP characters*/ if((window=getWindow(data.fromUDynamicOffsets))>=0){ /*there is a window defined that contains this character - switch to it or quote from it*/ if(source.position()>=source.limit() || isInOffsetWindowOrDirect(data.fromUDynamicOffsets[window], source.get(source.position()))){ /*change to dynamic window*/ dynamicWindow = window; currentOffset = data.fromUDynamicOffsets[dynamicWindow]; useDynamicWindow(dynamicWindow); c = ((int)((SC0+window)<<8)) | (c- currentOffset) | 0x80; length = 2; label = OutputBytes; return label; } else { /*quote from dynamic window*/ c = ((int)((SQ0+window)<<8)) | (c - data.fromUDynamicOffsets[window]) | 0x80; length = 2; label = OutputBytes; return label; } } else if((window = getWindow(staticOffsets))>=0){ /*quote from static window*/ c = ((int)((SQ0+window)<<8)) | (c - staticOffsets[window]); length = 2; label = OutputBytes; return label; }else if((code=getDynamicOffset())>=0){ /*define a dynamic window with this character*/ dynamicWindow = getNextDynamicWindow(); currentOffset = data.fromUDynamicOffsets[dynamicWindow]=offset; useDynamicWindow(dynamicWindow); c = ((int)((SD0+dynamicWindow)<<16)) | (int)(code<<8)| (c- currentOffset) | 0x80; length = 3; label = OutputBytes; return label; } else if(((int)((c-0x3400)&UConverterConstants.UNSIGNED_INT_MASK))<(0xd800-0x3400) && (source.position()>=source.limit() || ((int)((source.get(source.position())-0x3400)&UConverterConstants.UNSIGNED_INT_MASK))< (0xd800 - 0x3400))){ /* * this character is not compressible (a BMP ideograph of similar) * switch to Unicode mode if this is the last character in the block * or there is at least one more ideograph following immediately */ isSingleByteMode = false; c|=SCU<<16; length =3; 
label = OutputBytes; return label; } else { /*quote Unicode*/ c|=SQU<<16; length = 3; label = OutputBytes; return label; } } /*normal end of conversion : prepare for new character */ c = 0; sourceIndex = nextSourceIndex; } } else { if(c!=0 && targetCapacity>0 && !AfterGetTrailUnicode){ label = GetTrailUnicode; return label; } /*state machine for Unicode*/ /*unicodeByteMode*/ while(AfterGetTrailUnicode || source.hasRemaining()){ if(targetCapacity<=0 && !AfterGetTrailUnicode){ /*target is full*/ cr = CoderResult.OVERFLOW; LabelLoop = false; break; } if(!AfterGetTrailUnicode){ c = source.get(); ++nextSourceIndex; } if((((c-0x3400)& UConverterConstants.UNSIGNED_INT_MASK))<(0xd800-0x3400) && !AfterGetTrailUnicode){ /*not compressible, write character directly */ if(targetCapacity>=2){ target.put((byte)(c>>8)); target.put((byte)c); if(offsets!=null){ offsets.put(sourceIndex); offsets.put(sourceIndex); } targetCapacity-=2; } else { length =2; label = OutputBytes; return label; } } else if((((c-0x3400)& UConverterConstants.UNSIGNED_INT_MASK))>=(0xf300-0x3400) /* c<0x3400 || c>=0xf300*/&& !AfterGetTrailUnicode){ /*compress BMP character if the following one is not an uncompressible ideograph*/ if(!(source.hasRemaining() && (((source.get(source.position())-0x3400)& UConverterConstants.UNSIGNED_INT_MASK))<(0xd800-0x3400))){ if(((((c-0x30)&UConverterConstants.UNSIGNED_INT_MASK))<10 || (((c-0x61)&UConverterConstants.UNSIGNED_INT_MASK))<26 || (((c-0x41)&UConverterConstants.UNSIGNED_INT_MASK))<26)){ /*ASCII digit or letter*/ isSingleByteMode = true; c |=((int)((UC0+dynamicWindow)<<8))|c; length = 2; label = OutputBytes; return label; } else if((window=getWindow(data.fromUDynamicOffsets))>=0){ /*there is a dynamic window that contains this character, change to it*/ isSingleByteMode = true; dynamicWindow = window; currentOffset = data.fromUDynamicOffsets[dynamicWindow]; useDynamicWindow(dynamicWindow); c = ((int)((UC0+dynamicWindow)<<8)) | (c- currentOffset) | 0x80; length = 2; 
label = OutputBytes; return label; } else if((code=getDynamicOffset())>=0){ /*define a dynamic window with this character*/ isSingleByteMode = true; dynamicWindow = getNextDynamicWindow(); currentOffset = data.fromUDynamicOffsets[dynamicWindow]=offset; useDynamicWindow(dynamicWindow); c = ((int)((UD0+dynamicWindow)<<16)) | (int)(code<<8) |(c- currentOffset) | 0x80; length = 3; label = OutputBytes; return label; } } /*don't know how to compress these character, just write it directly*/ length = 2; label = OutputBytes; return label; } else if(c<0xe000 && !AfterGetTrailUnicode){ label = GetTrailUnicode; return label; } else { /*quote to avoid SCSU tags*/ c|=UQU<<16; length = 3; label = OutputBytes; return label; } if(AfterGetTrailUnicode){ AfterGetTrailUnicode = false; } /*normal end of conversion, prepare for a new character*/ c = 0; sourceIndex = nextSourceIndex; } } label = EndLoop; return label; } private int getTrail(CharBuffer source, ByteBuffer target, IntBuffer offsets){ lead = (char)c; int label = Loop; if(source.hasRemaining()){ /*test the following code unit*/ trail = source.get(source.position()); if(UTF16.isTrailSurrogate((char)trail)){ source.position(source.position()+1); ++nextSourceIndex; c = UCharacter.getCodePoint((char)c, trail); label = Loop; } else { /*this is unmatched lead code unit (1st Surrogate)*/ /*callback(illegal)*/ cr = CoderResult.malformedForLength(1); label = EndLoop; } }else { /*no more input*/ label = EndLoop; } AfterGetTrail = true; return label; } private int getTrailUnicode(CharBuffer source, ByteBuffer target, IntBuffer offsets){ int label = EndLoop; AfterGetTrailUnicode = true; /*c is surrogate*/ if(UTF16.isLeadSurrogate((char)c)){ // getTrailUnicode: lead = (char)c; if(source.hasRemaining()){ /*test the following code unit*/ trail = source.get(source.position()); if(UTF16.isTrailSurrogate(trail)){ source.get(); ++nextSourceIndex; c = UCharacter.getCodePoint((char)c, trail); /*convert this surrogate code point*/ /*exit this 
condition tree*/ } else { /*this is unmatched lead code unit(1st surrogate)*/ /*callback(illegal)*/ cr = CoderResult.malformedForLength(1); label = EndLoop; return label; } } else { /*no more input*/ label = EndLoop; return label; } } else { /*this is an unmatched trail code point (2nd surrogate)*/ /*callback (illegal)*/ cr = CoderResult.malformedForLength(1); label = EndLoop; return label; } /*compress supplementary character*/ if((window=getWindow(data.fromUDynamicOffsets))>=0 && !(source.hasRemaining() && ((source.get(source.position())-0x3400)&UConverterConstants.UNSIGNED_INT_MASK) < (0xd800 - 0x3400))){ /* * this is the dynamic window that contains this character and the following * character is not uncompressible, * change to the window */ isSingleByteMode = true; dynamicWindow = window; currentOffset = data.fromUDynamicOffsets[dynamicWindow]; useDynamicWindow(dynamicWindow); c = ((UC0+dynamicWindow)<<8 | (c-currentOffset) | 0x80); length = 2; label = OutputBytes; return label; } else if(source.hasRemaining() && lead == source.get(source.position()) && (code=getDynamicOffset())>=0){ /*two supplementary characters in (probably) the same window - define an extended one*/ isSingleByteMode = true; dynamicWindow = getNextDynamicWindow(); currentOffset = data.fromUDynamicOffsets[dynamicWindow] = offset; useDynamicWindow(dynamicWindow); c = (UDX<<24) | (dynamicWindow<<21) |(code<<8) |(c - currentOffset) | 0x80; length = 4; label = OutputBytes; return label; } else { /*don't know how to compress this character, just write it directly*/ c = (lead<<16)|trail; length = 4; label = OutputBytes; return label; } } private void endLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets){ /*set the converter state back to UConverter*/ data.fromUIsSingleByteMode = isSingleByteMode; data.fromUDynamicWindow = dynamicWindow; fromUChar32 = c; LabelLoop = false; } private int outputBytes(CharBuffer source, ByteBuffer target, IntBuffer offsets){ int label; //int targetCapacity 
= target.limit()-target.position(); /*write the output character byte from c and length*/ /*from the first if in the loop we know that targetCapacity>0*/ if(length<=targetCapacity){ if(offsets==null){ switch(length){ /*each branch falls through the next one*/ case 4: target.put((byte)(c>>24)); case 3: target.put((byte)(c>>16)); case 2: target.put((byte)(c>>8)); case 1: target.put((byte)c); default: /*will never occur*/ break; } }else { switch(length){ /*each branch falls through to the next one*/ case 4: target.put((byte)(c>>24)); if(offsets!=null){ offsets.put(sourceIndex); } case 3: target.put((byte)(c>>16)); if(offsets!=null){ offsets.put(sourceIndex); } case 2: target.put((byte)(c>>8)); if(offsets!=null){ offsets.put(sourceIndex); } case 1: target.put((byte)c); if(offsets!=null){ offsets.put(sourceIndex); } default: /*will never occur*/ break; } } targetCapacity-=length; /*normal end of conversion: prepare for a new character*/ c = 0; sourceIndex = nextSourceIndex; label = Loop; return label; } else { ByteBuffer p = ByteBuffer.wrap(errorBuffer); /* * We actually do this backwards here: * In order to save an intermediate variable, we output * first to the overflow buffer what does not fit into the * regular target */ /* we know that 0<=targetCapacity<length<=4 */ length -= targetCapacity; switch(length){ /*each branch falls through the next one*/ case 4: p.put((byte)(c>>24)); case 3: p.put((byte)(c>>16)); case 2: p.put((byte)(c>>8)); case 1: p.put((byte)c); default: /*will never occur*/ break; } errorBufferLength = length; /*now output what fits into the regular target*/ c>>=8*length; //length was reduced by targetCapacity switch(targetCapacity){ /*each branch falls through the next one*/ case 3: target.put((byte)(c>>16)); if(offsets!=null){ offsets.put(sourceIndex); } case 2: target.put((byte)(c>>8)); if(offsets!=null){ offsets.put(sourceIndex); } case 1: target.put((byte)c); if(offsets!=null){ offsets.put(sourceIndex); } default: break; } /*target overflow*/ targetCapacity = 0; cr = CoderResult.OVERFLOW; c = 0; label = EndLoop; return label; } } } public CharsetDecoder 
newDecoder() { return new CharsetDecoderSCSU(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderSCSU(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ CharsetICU.getCompleteUnicodeSet(setFillIn); } } icu4j-4.2/src/com/ibm/icu/charset/CharsetEncoderICU.java0000644000175000017500000010620011361046170022753 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2009, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.BufferOverflowException; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import java.nio.charset.CodingErrorAction; import com.ibm.icu.impl.Assert; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.text.UTF16; /** * An abstract class that provides framework methods of encoding operations for concrete * subclasses. * In the future this class will contain API that will implement converter semantics of ICU4C. 
* @stable ICU 3.6 */ public abstract class CharsetEncoderICU extends CharsetEncoder { /* this is used in fromUnicode DBCS tables as an "unassigned" marker */ static final char MISSING_CHAR_MARKER = '\uFFFF'; byte[] errorBuffer = new byte[30]; int errorBufferLength = 0; /** these are for encodeLoopICU */ int fromUnicodeStatus; int fromUChar32; boolean useSubChar1; boolean useFallback; /* maximum number of indexed UChars */ static final int EXT_MAX_UCHARS = 19; /* store previous UChars/chars to continue partial matches */ int preFromUFirstCP; /* >=0: partial match */ char[] preFromUArray = new char[EXT_MAX_UCHARS]; int preFromUBegin; int preFromULength; /* negative: replay */ char[] invalidUCharBuffer = new char[2]; int invalidUCharLength; Object fromUContext; private CharsetCallback.Encoder onUnmappableInput = CharsetCallback.FROM_U_CALLBACK_STOP; private CharsetCallback.Encoder onMalformedInput = CharsetCallback.FROM_U_CALLBACK_STOP; CharsetCallback.Encoder fromCharErrorBehaviour = new CharsetCallback.Encoder() { public CoderResult call(CharsetEncoderICU encoder, Object context, CharBuffer source, ByteBuffer target, IntBuffer offsets, char[] buffer, int length, int cp, CoderResult cr) { if (cr.isUnmappable()) { return onUnmappableInput.call(encoder, context, source, target, offsets, buffer, length, cp, cr); } else /* if (cr.isMalformed()) */ { return onMalformedInput.call(encoder, context, source, target, offsets, buffer, length, cp, cr); } // return CharsetCallback.FROM_U_CALLBACK_STOP.call(encoder, context, source, target, offsets, buffer, length, cp, cr); } }; /* * Construcs a new encoder for the given charset * * @param cs * for which the decoder is created * @param replacement * the substitution bytes */ CharsetEncoderICU(CharsetICU cs, byte[] replacement) { super(cs, (cs.minBytesPerChar + cs.maxBytesPerChar) / 2, cs.maxBytesPerChar, replacement); } /** * Is this Encoder allowed to use fallbacks? 
     * A fallback mapping is a mapping
     * that will convert a Unicode codepoint sequence to a byte sequence, but
     * the encoded byte sequence will round-trip to a different
     * Unicode codepoint sequence.
     * @return true if the converter uses fallback, false otherwise.
     * @stable ICU 3.8
     */
    public boolean isFallbackUsed() {
        return useFallback;
    }

    /**
     * Sets whether this Encoder can use fallbacks.
     * @param usesFallback true if the user wants the converter to take
     *  advantage of the fallback mapping, false otherwise.
     * @stable ICU 3.8
     */
    public void setFallbackUsed(boolean usesFallback) {
        useFallback = usesFallback;
    }

    /*
     * Use fallbacks from Unicode to codepage when useFallback or for private-use code points
     * @param c A codepoint
     */
    final boolean isFromUUseFallback(int c) {
        return (useFallback) || (UCharacter.getType(c) == UCharacter.PRIVATE_USE);
    }

    /**
     * Use fallbacks from Unicode to codepage when useFallback or for private-use code points
     */
    static final boolean isFromUUseFallback(boolean iUseFallback, int c) {
        return (iUseFallback) || (UCharacter.getType(c) == UCharacter.PRIVATE_USE);
    }

    /**
     * Sets the action to be taken if an illegal (malformed) sequence is encountered
     *
     * @param newAction
     *            action to be taken
     * @exception IllegalArgumentException
     * @stable ICU 3.6
     */
    protected void implOnMalformedInput(CodingErrorAction newAction) {
        onMalformedInput = getCallback(newAction);
    }

    /**
     * Sets the action to be taken if an unmappable character is encountered
     *
     * @param newAction
     *            action to be taken
     * @exception IllegalArgumentException
     * @stable ICU 3.6
     */
    protected void implOnUnmappableCharacter(CodingErrorAction newAction) {
        onUnmappableInput = getCallback(newAction);
    }

    /**
     * Sets the callback encoder method and context to be used if a malformed or unmappable
     * sequence is encountered.
     * You would normally call this twice to set both the malformed and the unmappable error. In this case,
     * newContext should remain the same since using a different newContext each time will negate the last
     * one used.
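     *
     * The CodingErrorAction values handled by implOnMalformedInput and implOnUnmappableCharacter
     * come from the standard java.nio.charset API. A minimal sketch of their observable effect,
     * assuming the JDK's built-in US-ASCII encoder as a stand-in for an ICU4J converter
     * (ErrorActionSketch and encodedLength are hypothetical names, not part of ICU4J):
     *

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ErrorActionSketch {
    // Encodes text to US-ASCII with the given action for both error kinds.
    // Returns the number of bytes produced, or -1 if the encoder reported an error.
    // Assumption: the JDK encoder is used here only to keep the sketch self-contained.
    static int encodedLength(String text, CodingErrorAction action) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
            return out.remaining();
        } catch (CharacterCodingException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String text = "a\u00e9b"; // U+00E9 has no US-ASCII mapping
        System.out.println(encodedLength(text, CodingErrorAction.REPLACE));
        System.out.println(encodedLength(text, CodingErrorAction.IGNORE));
        System.out.println(encodedLength(text, CodingErrorAction.REPORT));
    }
}
```

     * REPLACE writes the replacement bytes, IGNORE drops the offending character and REPORT
     * stops with an error; getCallback maps these three actions to the SUBSTITUTE, SKIP and
     * STOP callbacks respectively.
     *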
* @param err CoderResult * @param newCallback CharsetCallback.Encoder * @param newContext Object * @stable ICU 4.0 */ public final void setFromUCallback(CoderResult err, CharsetCallback.Encoder newCallback, Object newContext) { if (err.isMalformed()) { onMalformedInput = newCallback; } else if (err.isUnmappable()) { onUnmappableInput = newCallback; } else { /* Error: Only malformed and unmappable are handled. */ } if (fromUContext == null || !fromUContext.equals(newContext)) { setFromUContext(newContext); } } /** * Sets fromUContext used in callbacks. * * @param newContext Object * @exception IllegalArgumentException * @stable ICU 4.0 */ public final void setFromUContext(Object newContext) { fromUContext = newContext; } private static CharsetCallback.Encoder getCallback(CodingErrorAction action) { if (action == CodingErrorAction.REPLACE) { return CharsetCallback.FROM_U_CALLBACK_SUBSTITUTE; } else if (action == CodingErrorAction.IGNORE) { return CharsetCallback.FROM_U_CALLBACK_SKIP; } else /* if (action == CodingErrorAction.REPORT) */ { return CharsetCallback.FROM_U_CALLBACK_STOP; } } private static final CharBuffer EMPTY = CharBuffer.allocate(0); /** * Flushes any characters saved in the converter's internal buffer and * resets the converter. * @param out action to be taken * @return result of flushing action and completes the decoding all input. * Returns CoderResult.UNDERFLOW if the action succeeds. * @stable ICU 3.6 */ protected CoderResult implFlush(ByteBuffer out) { return encode(EMPTY, out, null, true); } /** * Resets the from Unicode mode of converter * @stable ICU 3.6 */ protected void implReset() { errorBufferLength = 0; fromUnicodeStatus = 0; fromUChar32 = 0; fromUnicodeReset(); } private void fromUnicodeReset() { preFromUBegin = 0; preFromUFirstCP = UConverterConstants.U_SENTINEL; preFromULength = 0; } /** * Encodes one or more chars. The default behaviour of the * converter is stop and report if an error in input stream is encountered. 
* To set different behaviour use @see CharsetEncoder.onMalformedInput() * @param in buffer to decode * @param out buffer to populate with decoded result * @return result of decoding action. Returns CoderResult.UNDERFLOW if the decoding * action succeeds or more input is needed for completing the decoding action. * @stable ICU 3.6 */ protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) { if (!in.hasRemaining() && this.errorBufferLength == 0) { // make sure the errorBuffer is empty // The Java framework should have already substituted what was left. fromUChar32 = 0; //fromUnicodeReset(); return CoderResult.UNDERFLOW; } in.position(in.position() + fromUCountPending()); /* do the conversion */ CoderResult ret = encode(in, out, null, false); setSourcePosition(in); /* No need to reset to keep the proper state of the encoder. if (ret.isUnderflow() && in.hasRemaining()) { // The Java framework is going to substitute what is left. //fromUnicodeReset(); } */ return ret; } /* * Implements ICU semantics of buffer management * @param source * @param target * @param offsets * @return A CoderResult object that contains the error result when an error occurs. */ abstract CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush); /* * Implements ICU semantics for encoding the buffer * @param source The input character buffer * @param target The output byte buffer * @param offsets * @param flush true if, and only if, the invoker can provide no * additional input bytes beyond those in the given buffer. * @return A CoderResult object that contains the error result when an error occurs. 
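     *
     * Before converting new input, encode() first drains any bytes that an earlier call could
     * not fit into the target. A minimal sketch of that step in isolation, assuming a 30-byte
     * overflow array like errorBuffer (OverflowFlushSketch, overflow, overflowLength and
     * flushOverflow are hypothetical names used only for illustration):
     *

```java
import java.nio.ByteBuffer;

final class OverflowFlushSketch {
    // Stand-ins for errorBuffer and errorBufferLength (assumed roles, not the ICU4J fields).
    final byte[] overflow = new byte[30];
    int overflowLength = 0;

    // Copies pending overflow bytes to the target.
    // Returns true if the overflow buffer was emptied, false if the target filled up first.
    boolean flushOverflow(ByteBuffer target) {
        int i = 0;
        while (i < overflowLength) {
            if (!target.hasRemaining()) {
                // Target is full: move the unwritten tail to the front and keep it.
                int j = 0;
                while (i < overflowLength) {
                    overflow[j++] = overflow[i++];
                }
                overflowLength = j;
                return false;
            }
            target.put(overflow[i++]);
        }
        overflowLength = 0;
        return true;
    }

    public static void main(String[] args) {
        OverflowFlushSketch s = new OverflowFlushSketch();
        s.overflow[0] = 1;
        s.overflow[1] = 2;
        s.overflow[2] = 3;
        s.overflowLength = 3;
        System.out.println(s.flushOverflow(ByteBuffer.allocate(2)));
        System.out.println(s.overflowLength);
    }
}
```

     * A false result corresponds to returning CoderResult.OVERFLOW: the caller has to supply
     * more target space before any new source units are converted.
     *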
*/ final CoderResult encode(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { /* check parameters */ if (target == null || source == null) { throw new IllegalArgumentException(); } /* * Make sure that the buffer sizes do not exceed the number range for * int32_t because some functions use the size (in units or bytes) * rather than comparing pointers, and because offsets are int32_t values. * * size_t is guaranteed to be unsigned and large enough for the job. * * Return with an error instead of adjusting the limits because we would * not be able to maintain the semantics that either the source must be * consumed or the target filled (unless an error occurs). * An adjustment would be targetLimit=t+0x7fffffff; for example. */ /* flush the target overflow buffer */ if (errorBufferLength > 0) { byte[] overflowArray; int i, length; overflowArray = errorBuffer; length = errorBufferLength; i = 0; do { if (target.remaining() == 0) { /* the overflow buffer contains too much, keep the rest */ int j = 0; do { overflowArray[j++] = overflowArray[i++]; } while (i < length); errorBufferLength = (byte) j; return CoderResult.OVERFLOW; } /* copy the overflow contents to the target */ target.put(overflowArray[i++]); if (offsets != null) { offsets.put(-1); /* no source index available for old output */ } } while (i < length); /* the overflow buffer is completely copied to the target */ errorBufferLength = 0; } if (!flush && source.remaining() == 0 && preFromULength >= 0) { /* the overflow buffer is emptied and there is no new input: we are done */ return CoderResult.UNDERFLOW; } /* * Do not simply return with a buffer overflow error if * !flush && t==targetLimit * because it is possible that the source will not generate any output. * For example, the skip callback may be called; * it does not output anything. 
*/ return fromUnicodeWithCallback(source, target, offsets, flush); } /* * Implementation note for m:n conversions * * While collecting source units to find the longest match for m:n conversion, * some source units may need to be stored for a partial match. * When a second buffer does not yield a match on all of the previously stored * source units, then they must be "replayed", i.e., fed back into the converter. * * The code relies on the fact that replaying will not nest - * converting a replay buffer will not result in a replay. * This is because a replay is necessary only after the _continuation_ of a * partial match failed, but a replay buffer is converted as a whole. * It may result in some of its units being stored again for a partial match, * but there will not be a continuation _during_ the replay which could fail. * * It is conceivable that a callback function could call the converter * recursively in a way that causes another replay to be stored, but that * would be an error in the callback function. * Such violations will cause assertion failures in a debug build, * and wrong output, but they will not cause a crash. */ final CoderResult fromUnicodeWithCallback(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { int sBufferIndex; int sourceIndex; int errorInputLength; boolean converterSawEndOfInput, calledCallback; /* variables for m:n conversion */ CharBuffer replayArray = CharBuffer.allocate(EXT_MAX_UCHARS); int replayArrayIndex = 0; CharBuffer realSource; boolean realFlush; CoderResult cr = CoderResult.UNDERFLOW; /* get the converter implementation function */ sourceIndex = 0; if (preFromULength >= 0) { /* normal mode */ realSource = null; realFlush = false; } else { /* * Previous m:n conversion stored source units from a partial match * and failed to consume all of them. * We need to "replay" them from a temporary buffer and convert them first. 
*/ realSource = source; realFlush = flush; //UConverterUtility.uprv_memcpy(replayArray, replayArrayIndex, preFromUArray, 0, -preFromULength*UMachine.U_SIZEOF_UCHAR); replayArray.put(preFromUArray, 0, -preFromULength); source = replayArray; source.position(replayArrayIndex); source.limit(replayArrayIndex - preFromULength); //preFromULength is negative, see declaration flush = false; preFromULength = 0; } /* * loop for conversion and error handling * * loop { * convert * loop { * update offsets * handle end of input * handle errors/call callback * } * } */ for (;;) { /* convert */ cr = encodeLoop(source, target, offsets, flush); /* * set a flag for whether the converter * successfully processed the end of the input * * need not check cnv.preFromULength==0 because a replay (<0) will cause * s 0) { /* * if a converter handles offsets and updates the offsets * pointer at the end, then offset should not change * here; * however, some converters do not handle offsets at all * (sourceIndex<0) or may not update the offsets pointer */ /* offsets.position(offsets.position() + length); } if (sourceIndex >= 0) { sourceIndex += (int) (source.position()); } } */ if (preFromULength < 0) { /* * switch the source to new replay units (cannot occur while replaying) * after offset handling and before end-of-input and callback handling */ if (realSource == null) { realSource = source; realFlush = flush; //UConverterUtility.uprv_memcpy(replayArray, replayArrayIndex, preFromUArray, 0, -preFromULength*UMachine.U_SIZEOF_UCHAR); replayArray.put(preFromUArray, 0, -preFromULength); source = replayArray; source.position(replayArrayIndex); source.limit(replayArrayIndex - preFromULength); flush = false; if ((sourceIndex += preFromULength) < 0) { sourceIndex = -1; } preFromULength = 0; } else { /* see implementation note before _fromUnicodeWithCallback() */ //agljport:todo U_ASSERT(realSource==NULL); Assert.assrt(realSource == null); } } /* update pointers */ sBufferIndex = source.position(); if 
(cr.isUnderflow()) { if (sBufferIndex < source.limit()) { /* * continue with the conversion loop while there is still input left * (continue converting by breaking out of only the inner loop) */ break; } else if (realSource != null) { /* switch back from replaying to the real source and continue */ source = realSource; flush = realFlush; sourceIndex = source.position(); realSource = null; break; } else if (flush && fromUChar32 != 0) { /* * the entire input stream is consumed * and there is a partial, truncated input sequence left */ /* inject an error and continue with callback handling */ //err[0]=ErrorCode.U_TRUNCATED_CHAR_FOUND; cr = CoderResult.malformedForLength(1); calledCallback = false; /* new error condition */ } else { /* input consumed */ if (flush) { /* * return to the conversion loop once more if the flush * flag is set and the conversion function has not * successfully processed the end of the input yet * * (continue converting by breaking out of only the inner loop) */ if (!converterSawEndOfInput) { break; } /* reset the converter without calling the callback function */ implReset(); } /* done successfully */ return cr; } } /*U_FAILURE(*err) */ { if (calledCallback || cr.isOverflow() || (!cr.isMalformed() && !cr.isUnmappable())) { /* * the callback did not or cannot resolve the error: * set output pointers and return * * the check for buffer overflow is redundant but it is * a high-runner case and hopefully documents the intent * well * * if we were replaying, then the replay buffer must be * copied back into the UConverter * and the real arguments must be restored */ if (realSource != null) { int length; //agljport:todo U_ASSERT(cnv.preFromULength==0); length = source.remaining(); if (length > 0) { //UConverterUtility.uprv_memcpy(preFromUArray, 0, sourceArray, pArgs.sourceBegin, length*UMachine.U_SIZEOF_UCHAR); source.get(preFromUArray, 0, length); preFromULength = (byte) -length; } source = realSource; flush = realFlush; } return cr; } } /* 
callback handling */ { int codePoint; /* get and write the code point */ codePoint = fromUChar32; errorInputLength = UTF16.append(invalidUCharBuffer, 0, fromUChar32); invalidUCharLength = errorInputLength; /* set the converter state to deal with the next character */ fromUChar32 = 0; /* call the callback function */ cr = fromCharErrorBehaviour.call(this, fromUContext, source, target, offsets, invalidUCharBuffer, invalidUCharLength, codePoint, cr); } /* * loop back to the offset handling * * this flag will indicate after offset handling * that a callback was called; * if the callback did not resolve the error, then we return */ calledCallback = true; } } } /* * Ascertains if a given Unicode code point (32bit value for handling surrogates) * can be converted to the target encoding. If the caller wants to test if a * surrogate pair can be converted to target encoding then the * responsibility of assembling the int value lies with the caller. * For assembling a code point the caller can use UTF16 class of ICU4J and do something like: *
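     *  int i = 0;
     *  boolean passed = true;
     *  while (i < mySource.length) {
     *      int temp = mySource[i];
     *      if (UTF16.isLeadSurrogate(mySource[i]) && i + 1 < mySource.length
     *              && UTF16.isTrailSurrogate(mySource[i + 1])) {
     *          temp = UCharacter.getCodePoint(mySource[i], mySource[i + 1]);
     *          i++;
     *      }
     *      i++;
     *      if (!((CharsetEncoderICU) myConv).canEncode(temp)) {
     *          passed = false;
     *      }
     *  }
     * 
     * or
     * 
     *  String src = new String(mySource);
     *  int i = 0, codepoint;
     *  boolean passed = true;
     *  while (i < src.length()) {
     *      codepoint = UTF16.charAt(src, i);
     *      i += (codepoint > 0xffff) ? 2 : 1;
     *      if (!((CharsetEncoderICU) myConv).canEncode(codepoint)) {
     *          passed = false;
     *      }
     *  }
     * 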
* * @param codepoint Unicode code point as int value * @return true if a character can be converted */ /* TODO This is different from Java's canEncode(char) API. * ICU's API should implement getUnicodeSet, * and override canEncode(char) which queries getUnicodeSet. * The getUnicodeSet should return a frozen UnicodeSet or use a fillin parameter, like ICU4C. */ /*public boolean canEncode(int codepoint) { return true; }*/ /** * Overrides super class method * @stable ICU 3.6 */ public boolean isLegalReplacement(byte[] repl) { return true; } /* * Writes out the specified output bytes to the target byte buffer or to converter internal buffers. * @param cnv * @param bytesArray * @param bytesBegin * @param bytesLength * @param out * @param offsets * @param sourceIndex * @return A CoderResult object that contains the error result when an error occurs. */ static final CoderResult fromUWriteBytes(CharsetEncoderICU cnv, byte[] bytesArray, int bytesBegin, int bytesLength, ByteBuffer out, IntBuffer offsets, int sourceIndex) { //write bytes int obl = bytesLength; CoderResult cr = CoderResult.UNDERFLOW; int bytesLimit = bytesBegin + bytesLength; try { for (; bytesBegin < bytesLimit;) { out.put(bytesArray[bytesBegin]); bytesBegin++; } // success bytesLength = 0; } catch (BufferOverflowException ex) { cr = CoderResult.OVERFLOW; } if (offsets != null) { while (obl > bytesLength) { offsets.put(sourceIndex); --obl; } } //write overflow cnv.errorBufferLength = bytesLimit - bytesBegin; if (cnv.errorBufferLength > 0) { int index = 0; while (bytesBegin < bytesLimit) { cnv.errorBuffer[index++] = bytesArray[bytesBegin++]; } cr = CoderResult.OVERFLOW; } return cr; } /* * Returns the number of chars held in the converter's internal state * because more input is needed for completing the conversion. This function is * useful for mapping semantics of ICU's converter interface to those of iconv, * and this information is not needed for normal conversion. 
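     *
     * The counting rule can be sketched on its own, assuming the state fields carry the
     * meanings given at their declarations (a positive preFromULength is a partial match,
     * a negative one is a replay). PendingCountSketch and countPending are hypothetical
     * names, and Character.charCount stands in for UTF16.getCharCount:
     *

```java
final class PendingCountSketch {
    // Mirrors the branch order of fromUCountPending(), taking the state as parameters
    // instead of reading encoder fields (an assumption made for self-containment).
    static int countPending(int preFromUFirstCP, int preFromULength, int fromUChar32) {
        if (preFromULength > 0) {
            // Partial match: the first code point plus the stored continuation units.
            return Character.charCount(preFromUFirstCP) + preFromULength;
        } else if (preFromULength < 0) {
            // Replay: units waiting to be fed back into the converter.
            return -preFromULength;
        } else if (fromUChar32 > 0) {
            // A single pending char, typically a lead surrogate.
            return 1;
        } else if (preFromUFirstCP > 0) {
            return Character.charCount(preFromUFirstCP);
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(countPending(0x1F600, 2, 0));
        System.out.println(countPending(-1, -3, 0));
        System.out.println(countPending(-1, 0, 0xD83D));
    }
}
```

     * encodeLoop(CharBuffer, ByteBuffer) advances the source position by this count before
     * converting, and setSourcePosition moves it back by the new count afterwards.
     *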
* @return The number of chars in the state. -1 if an error is encountered. */ /*public*/int fromUCountPending() { if (preFromULength > 0) { return UTF16.getCharCount(preFromUFirstCP) + preFromULength; } else if (preFromULength < 0) { return -preFromULength; } else if (fromUChar32 > 0) { return 1; } else if (preFromUFirstCP > 0) { return UTF16.getCharCount(preFromUFirstCP); } return 0; } /** * * @param source */ private final void setSourcePosition(CharBuffer source) { // ok was there input held in the previous invocation of encodeLoop // that resulted in output in this invocation? source.position(source.position() - fromUCountPending()); } /* * Write the codepage substitution character. * Subclasses to override this method. * For stateful converters, it is typically necessary to handle this * specificially for the converter in order to properly maintain the state. * @param source The input character buffer * @param target The output byte buffer * @param offsets * @return A CoderResult object that contains the error result when an error occurs. */ CoderResult cbFromUWriteSub(CharsetEncoderICU encoder, CharBuffer source, ByteBuffer target, IntBuffer offsets) { CharsetICU cs = (CharsetICU) encoder.charset(); byte[] sub = encoder.replacement(); if (cs.subChar1 != 0 && encoder.invalidUCharBuffer[0] <= 0xff) { return CharsetEncoderICU.fromUWriteBytes(encoder, new byte[] { cs.subChar1 }, 0, 1, target, offsets, source .position()); } else { return CharsetEncoderICU.fromUWriteBytes(encoder, sub, 0, sub.length, target, offsets, source.position()); } } /* * Write the characters to target. * @param source The input character buffer * @param target The output byte buffer * @param offsets * @return A CoderResult object that contains the error result when an error occurs. */ CoderResult cbFromUWriteUChars(CharsetEncoderICU encoder, CharBuffer source, ByteBuffer target, IntBuffer offsets) { CoderResult cr = CoderResult.UNDERFLOW; /* This is a fun one. 
Recursion can occur - we're basically going to * just retry shoving data through the same converter. Note, if you got * here through some kind of invalid sequence, you maybe should emit a * reset sequence of some kind. Since this IS an actual conversion, * take care that you've changed the callback or the data, or you'll * get an infinite loop. */ int oldTargetPosition = target.position(); int offsetIndex = source.position(); cr = encoder.encode(source, target, null, false); /* no offsets and no flush */ if (offsets != null) { while (target.position() != oldTargetPosition) { offsets.put(offsetIndex); oldTargetPosition++; } } /* Note, if you did something like used a stop subcallback, things would get interesting. * In fact, here's where we want to return the partially consumed in-source! */ if (cr.isOverflow()) { /* Overflowed target. Now, we'll write into the charErrorBuffer. * It's a fixed size. If we overflow it...Hm */ /* start the new target at the first free slot in the error buffer */ int errBuffLen = encoder.errorBufferLength; ByteBuffer newTarget = ByteBuffer.wrap(encoder.errorBuffer); newTarget.position(errBuffLen); /* set the position at the end of the error buffer */ encoder.errorBufferLength = 0; encoder.encode(source, newTarget, null, false); encoder.errorBuffer = newTarget.array(); encoder.errorBufferLength = newTarget.position(); } return cr; } /** *

* Handles a common situation where a character has been read and it may be * a lead surrogate followed by a trail surrogate. This method can change * the source position and will modify fromUChar32. *

* *

* If null is returned, then there was success in reading a * surrogate pair, the codepoint is stored in fromUChar32 and * fromUChar32 should be reset (to 0) after being read. *
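     *
     * The three outcomes can be sketched with JDK classes only. This is an illustration
     * under assumptions, not the ICU4J implementation: SurrogateSketch, handle and the
     * int result codes are hypothetical, and codePoint plays the role of fromUChar32.
     *

```java
import java.nio.CharBuffer;

final class SurrogateSketch {
    static final int PAIR_READ = 0;  // corresponds to returning null
    static final int MALFORMED = 1;  // corresponds to CoderResult.malformedForLength(1)
    static final int NEED_MORE = 2;  // corresponds to CoderResult.UNDERFLOW

    int codePoint; // stand-in for fromUChar32

    int handle(CharBuffer source, char lead) {
        if (!Character.isHighSurrogate(lead)) {
            codePoint = lead;
            return MALFORMED;
        }
        if (!source.hasRemaining()) {
            // Keep the lead so the next call can try to complete the pair.
            codePoint = lead;
            return NEED_MORE;
        }
        char trail = source.get();
        if (!Character.isLowSurrogate(trail)) {
            codePoint = lead;
            source.position(source.position() - 1); // un-read the non-trail char
            return MALFORMED;
        }
        codePoint = Character.toCodePoint(lead, trail);
        return PAIR_READ;
    }

    public static void main(String[] args) {
        SurrogateSketch s = new SurrogateSketch();
        int result = s.handle(CharBuffer.wrap("\uDE00"), '\uD83D');
        System.out.println(result + " " + Integer.toHexString(s.codePoint));
    }
}
```

     * Only the successful case consumes the trail char; in the other two the source
     * position is left where the caller can retry or report the error.
     *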

* * @param source * The encoding source. * @param lead * A character that may be the first in a surrogate pair. * @return CoderResult.malformedForLength(1) or * CoderResult.UNDERFLOW if there is a problem, or * null if there isn't. * @see handleSurrogates(CharBuffer, char) * @see handleSurrogates(CharBuffer, int, char) * @see handleSurrogates(char[], int, int, char) */ final CoderResult handleSurrogates(CharBuffer source, char lead) { if (!UTF16.isLeadSurrogate(lead)) { fromUChar32 = lead; return CoderResult.malformedForLength(1); } if (!source.hasRemaining()) { fromUChar32 = lead; return CoderResult.UNDERFLOW; } char trail = source.get(); if (!UTF16.isTrailSurrogate(trail)) { fromUChar32 = lead; source.position(source.position() - 1); return CoderResult.malformedForLength(1); } fromUChar32 = UCharacter.getCodePoint(lead, trail); return null; } /** *

* Same as handleSurrogates(CharBuffer, char), but with arrays. As an added * requirement, the calling method must also increment the index if this method returns * null. *

* * * @param source * The encoding source. * @param lead * A character that may be the first in a surrogate pair. * @return CoderResult.malformedForLength(1) or * CoderResult.UNDERFLOW if there is a problem, or null if * there isn't. * @see handleSurrogates(CharBuffer, char) * @see handleSurrogates(CharBuffer, int, char) * @see handleSurrogates(char[], int, int, char) */ final CoderResult handleSurrogates(char[] sourceArray, int sourceIndex, int sourceLimit, char lead) { if (!UTF16.isLeadSurrogate(lead)) { fromUChar32 = lead; return CoderResult.malformedForLength(1); } if (sourceIndex >= sourceLimit) { fromUChar32 = lead; return CoderResult.UNDERFLOW; } char trail = sourceArray[sourceIndex]; if (!UTF16.isTrailSurrogate(trail)) { fromUChar32 = lead; return CoderResult.malformedForLength(1); } fromUChar32 = UCharacter.getCodePoint(lead, trail); return null; } } icu4j-4.2/src/com/ibm/icu/charset/CharsetASCII.java0000644000175000017500000003475511361046170021702 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.BufferOverflowException; import java.nio.BufferUnderflowException; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; class CharsetASCII extends CharsetICU { protected byte[] fromUSubstitution = new byte[] { (byte) 0x1a }; public CharsetASCII(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); maxBytesPerChar = 1; minBytesPerChar = 1; maxCharsPerByte = 1; } class CharsetDecoderASCII extends CharsetDecoderICU { public CharsetDecoderASCII(CharsetICU cs) { super(cs); } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { if (!source.hasRemaining()) { /* no input, nothing to do */ return CoderResult.UNDERFLOW; } if (!target.hasRemaining()) { /* no output available, can't do anything */ return CoderResult.OVERFLOW; } CoderResult cr; int oldSource = source.position(); int oldTarget = target.position(); if (source.hasArray() && target.hasArray()) { /* optimized loop */ /* * extract arrays from the buffers and obtain various constant values that will be * necessary in the core loop */ byte[] sourceArray = source.array(); int sourceOffset = source.arrayOffset(); int sourceIndex = oldSource + sourceOffset; int sourceLength = source.limit() - oldSource; char[] targetArray = target.array(); int targetOffset = target.arrayOffset(); int targetIndex = oldTarget + targetOffset; int targetLength = target.limit() - oldTarget; int limit = ((sourceLength < targetLength) ? 
sourceLength : targetLength) + sourceIndex; int offset = targetIndex - sourceIndex; /* * perform the core loop... if it returns null, it must be due to an overflow or * underflow */ cr = decodeLoopCoreOptimized(source, target, sourceArray, targetArray, sourceIndex, offset, limit); if (cr == null) { if (sourceLength <= targetLength) { source.position(oldSource + sourceLength); target.position(oldTarget + sourceLength); cr = CoderResult.UNDERFLOW; } else { source.position(oldSource + targetLength); target.position(oldTarget + targetLength); cr = CoderResult.OVERFLOW; } } } else { /* unoptimized loop */ try { /* * perform the core loop... if it throws an exception, it must be due to an * overflow or underflow */ cr = decodeLoopCoreUnoptimized(source, target); } catch (BufferUnderflowException ex) { /* all of the source has been read */ cr = CoderResult.UNDERFLOW; } catch (BufferOverflowException ex) { /* the target is full */ source.position(source.position() - 1); /* rewind by 1 */ cr = CoderResult.OVERFLOW; } } /* set offsets since the start */ if (offsets != null) { int count = target.position() - oldTarget; int sourceIndex = -1; while (--count >= 0) offsets.put(++sourceIndex); } return cr; } protected CoderResult decodeLoopCoreOptimized(ByteBuffer source, CharBuffer target, byte[] sourceArray, char[] targetArray, int oldSource, int offset, int limit) { int i, ch = 0; /* * perform ascii conversion from the source array to the target array, making sure each * byte in the source is within the correct range */ for (i = oldSource; i < limit && (((ch = (sourceArray[i] & 0xff)) & 0x80) == 0); i++) targetArray[i + offset] = (char) ch; /* * if some byte was not in the correct range, we need to deal with this byte by calling * decodeMalformedOrUnmappable and move the source and target positions to reflect the * early termination of the loop */ if ((ch & 0x80) != 0) { source.position(i + 1); target.position(i + offset); return decodeMalformedOrUnmappable(ch); } else return 
null; } protected CoderResult decodeLoopCoreUnoptimized(ByteBuffer source, CharBuffer target) throws BufferUnderflowException, BufferOverflowException { int ch = 0; /* * perform ascii conversion from the source buffer to the target buffer, making sure * each byte in the source is within the correct range */ while (((ch = (source.get() & 0xff)) & 0x80) == 0) target.put((char) ch); /* * if we reach here, it's because a character was not in the correct range, and we need * to deak with this by calling decodeMalformedOrUnmappable */ return decodeMalformedOrUnmappable(ch); } protected CoderResult decodeMalformedOrUnmappable(int ch) { /* * put the guilty character into toUBytesArray and return a message saying that the * character was malformed and of length 1. */ toUBytesArray[0] = (byte) ch; toULength = 1; return CoderResult.malformedForLength(1); } } class CharsetEncoderASCII extends CharsetEncoderICU { public CharsetEncoderASCII(CharsetICU cs) { super(cs, fromUSubstitution); implReset(); } private final static int NEED_TO_WRITE_BOM = 1; protected void implReset() { super.implReset(); fromUnicodeStatus = NEED_TO_WRITE_BOM; } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { if (!source.hasRemaining()) { /* no input, nothing to do */ return CoderResult.UNDERFLOW; } if (!target.hasRemaining()) { /* no output available, can't do anything */ return CoderResult.OVERFLOW; } CoderResult cr; int oldSource = source.position(); int oldTarget = target.position(); if (fromUChar32 != 0) { /* * if we have a leading character in fromUChar32 that needs to be dealt with, we * need to check for a matching trail character and taking the appropriate action as * dictated by encodeTrail. 
*/ cr = encodeTrail(source, (char) fromUChar32, flush); } else { if (source.hasArray() && target.hasArray()) { /* optimized loop */ /* * extract arrays from the buffers and obtain various constant values that will * be necessary in the core loop */ char[] sourceArray = source.array(); int sourceOffset = source.arrayOffset(); int sourceIndex = oldSource + sourceOffset; int sourceLength = source.limit() - oldSource; byte[] targetArray = target.array(); int targetOffset = target.arrayOffset(); int targetIndex = oldTarget + targetOffset; int targetLength = target.limit() - oldTarget; int limit = ((sourceLength < targetLength) ? sourceLength : targetLength) + sourceIndex; int offset = targetIndex - sourceIndex; /* * perform the core loop... if it returns null, it must be due to an overflow or * underflow */ cr = encodeLoopCoreOptimized(source, target, sourceArray, targetArray, sourceIndex, offset, limit, flush); if (cr == null) { if (sourceLength <= targetLength) { source.position(oldSource + sourceLength); target.position(oldTarget + sourceLength); cr = CoderResult.UNDERFLOW; } else { source.position(oldSource + targetLength); target.position(oldTarget + targetLength); cr = CoderResult.OVERFLOW; } } } else { /* unoptimized loop */ try { /* * perform the core loop... 
if it throws an exception, it must be due to an * overflow or underflow */ cr = encodeLoopCoreUnoptimized(source, target, flush); } catch (BufferUnderflowException ex) { cr = CoderResult.UNDERFLOW; } catch (BufferOverflowException ex) { source.position(source.position() - 1); /* rewind by 1 */ cr = CoderResult.OVERFLOW; } } } /* set offsets since the start */ if (offsets != null) { int count = target.position() - oldTarget; int sourceIndex = -1; while (--count >= 0) offsets.put(++sourceIndex); } return cr; } protected CoderResult encodeLoopCoreOptimized(CharBuffer source, ByteBuffer target, char[] sourceArray, byte[] targetArray, int oldSource, int offset, int limit, boolean flush) { int i, ch = 0; /* * perform ascii conversion from the source array to the target array, making sure each * char in the source is within the correct range */ for (i = oldSource; i < limit && (((ch = (int) sourceArray[i]) & 0xff80) == 0); i++) targetArray[i + offset] = (byte) ch; /* * if some byte was not in the correct range, we need to deal with this byte by calling * encodeMalformedOrUnmappable and move the source and target positions to reflect the * early termination of the loop */ if ((ch & 0xff80) != 0) { source.position(i + 1); target.position(i + offset); return encodeMalformedOrUnmappable(source, ch, flush); } else return null; } protected CoderResult encodeLoopCoreUnoptimized(CharBuffer source, ByteBuffer target, boolean flush) throws BufferUnderflowException, BufferOverflowException { int ch; /* * perform ascii conversion from the source buffer to the target buffer, making sure * each char in the source is within the correct range */ while (((ch = (int) source.get()) & 0xff80) == 0) target.put((byte) ch); /* * if we reach here, it's because a character was not in the correct range, and we need * to deak with this by calling encodeMalformedOrUnmappable. 
*/ return encodeMalformedOrUnmappable(source, ch, flush); } protected final CoderResult encodeMalformedOrUnmappable(CharBuffer source, int ch, boolean flush) { /* * if the character is a surrogate, we need to call encodeTrail to attempt to match * it up with a trail surrogate. if not, the character is unmappable. */ return (UTF16.isSurrogate((char) ch)) ? encodeTrail(source, (char) ch, flush) : CoderResult.unmappableForLength(1); } private final CoderResult encodeTrail(CharBuffer source, char lead, boolean flush) { /* * ASCII doesn't support characters outside the BMP, so if handleSurrogates returns null, * we leave fromUChar32 alone (it should store a new codepoint) and call it unmappable. */ CoderResult cr = handleSurrogates(source, lead); if (cr != null) { return cr; } else { //source.position(source.position() - 2); return CoderResult.unmappableForLength(2); } } } public CharsetDecoder newDecoder() { return new CharsetDecoderASCII(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderASCII(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ setFillIn.add(0,0x7f); } } icu4j-4.2/src/com/ibm/icu/charset/UConverterAliasDataReader.java0000644000175000017500000002323211361046170024507 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; import java.io.*; import com.ibm.icu.impl.ICUBinary; /* Format of cnvalias.icu ----------------------------------------------------- * * cnvalias.icu is a binary, memory-mappable form of convrtrs.txt. * This binary form contains several tables. All indexes are to uint16_t * units, and not to the bytes (uint8_t units). 
Addressing everything on * 16-bit boundaries allows us to store more information with small index * numbers, which are also 16-bit in size. The majority of the table (except * the string table) are 16-bit numbers. * * First there is the size of the Table of Contents (TOC). The TOC * entries contain the size of each section. In order to find the offset * you just need to sum up the previous offsets. * The TOC length and entries are an array of uint32_t values. * The first section after the TOC starts immediately after the TOC. * * 1) This section contains a list of converters. This list contains indexes * into the string table for the converter name. The index of this list is * also used by other sections, which are mentioned later on. * This list is not sorted. * * 2) This section contains a list of tags. This list contains indexes * into the string table for the tag name. The index of this list is * also used by other sections, which are mentioned later on. * This list is in priority order of standards. * * 3) This section contains a list of sorted unique aliases. This * list contains indexes into the string table for the alias name. The * index of this list is also used by other sections, like the 4th section. * The index for the 3rd and 4th section is used to get the * alias -> converter name mapping. Section 3 and 4 form a two column table. * * 4) This section contains a list of mapped converter names. Consider this * as a table that maps the 3rd section to the 1st section. This list contains * indexes into the 1st section. The index of this list is the same index in * the 3rd section. There is also some extra information in the high bits of * each converter index in this table. Currently it's only used to say that * an alias mapped to this converter is ambiguous. See UCNV_CONVERTER_INDEX_MASK * and UCNV_AMBIGUOUS_ALIAS_MAP_BIT for more information. This section is * the predigested form of the 5th section so that an alias lookup can be fast. 
* * 5) This section contains a 2D array with indexes to the 6th section. This * section is the full form of all alias mappings. The column index is the * index into the converter list (column header). The row index is the index * into the tag list (row header). This 2D array is the top part of a 3D array. The * third dimension is in the 6th section. * * 6) This is a blob of variable length arrays. Each array starts with a size, * and is followed by indexes to alias names in the string table. This is * the third dimension to section 5. No other section should be referencing * this section. * * 7) Reserved at this time (There is no information). This _usually_ has a * size of 0. Future versions may add more information here. * * 8) This is the string table. All strings are indexed on an even address. * There are two reasons for this. First, many chip architectures locate strings * faster on even address boundaries. Second, since all indexes are 16-bit * numbers, this string table can be 128KB in size instead of 64KB when we * only have strings starting on an even address. * * * Here is the concept of sections 5 and 6. It's a 3D cube. Each tag * has a unique alias among all converters. That same alias can * be mentioned in other standards on different converters, * but only one alias per tag can be unique. * * * Converter Names (Usually in TR22 form) * -------------------------------------------. * T / /| * a / / | * g / / | * s / / | * / / | * ------------------------------------------/ | * A | | | * l | | | * i | | / * a | | / * s | | / * e | | / * s | |/ * ------------------------------------------- * * * * Here is what it really looks like. It's like swiss cheese. * There are holes. Some converters aren't recognized by * a standard, or they are really old converters that the * standard doesn't recognize anymore. * * Converter Names (Usually in TR22 form) * -------------------------------------------. 
* T /##########################################/| * a / # # /# * g / # ## ## ### # ### ### ### #/ * s / # ##### #### ## ## #/# * / ### # # ## # # # ### # # #/## * ------------------------------------------/# # * A |### # # ## # # # ### # # #|# # * l |# # # # # ## # #|# # * i |# # # # # # #|# * a |# #|# * s | #|# * e * s * */ final class UConverterAliasDataReader implements ICUBinary.Authenticate { // private final static boolean debug = ICUDebug.enabled("UConverterAliasDataReader"); /** *

Protected constructor.

* @param inputStream ICU cnvalias.icu file input stream * @exception IOException thrown if data file fails authentication */ protected UConverterAliasDataReader(InputStream inputStream) throws IOException{ //if(debug) System.out.println("Bytes in inputStream " + inputStream.available()); /*unicodeVersion = */ICUBinary.readHeader(inputStream, DATA_FORMAT_ID, this); //if(debug) System.out.println("Bytes left in inputStream " +inputStream.available()); dataInputStream = new DataInputStream(inputStream); //if(debug) System.out.println("Bytes left in dataInputStream " +dataInputStream.available()); } // protected methods ------------------------------------------------- protected int[] readToc(int n)throws IOException { int[] toc = new int[n]; //Read the toc for (int i = 0; i < n ; ++i) { toc[i] = dataInputStream.readInt() & UNSIGNED_INT_MASK; } return toc; } protected void read(int[] convList, int[] tagList, int[] aliasList, int[]untaggedConvArray, int[] taggedAliasArray, int[] taggedAliasLists, int[] optionTable, byte[] stringTable, byte[] normalizedStringTable) throws IOException{ int i; //int listnum = 1; //long listsize; for(i = 0; i < convList.length; ++i) convList[i] = dataInputStream.readUnsignedShort(); for(i = 0; i < tagList.length; ++i) tagList[i] = dataInputStream.readUnsignedShort(); for(i = 0; i < aliasList.length; ++i) aliasList[i] = dataInputStream.readUnsignedShort(); for(i = 0; i < untaggedConvArray.length; ++i) untaggedConvArray[i] = dataInputStream.readUnsignedShort(); for(i = 0; i < taggedAliasArray.length; ++i) taggedAliasArray[i] = dataInputStream.readUnsignedShort(); for(i = 0; i < taggedAliasLists.length; ++i) taggedAliasLists[i] = dataInputStream.readUnsignedShort(); for(i = 0; i < optionTable.length; ++i) optionTable[i] = dataInputStream.readUnsignedShort(); dataInputStream.readFully(stringTable); dataInputStream.readFully(normalizedStringTable); } public boolean isDataVersionAcceptable(byte version[]) { return version.length >= 
DATA_FORMAT_VERSION.length && version[0] == DATA_FORMAT_VERSION[0] && version[1] == DATA_FORMAT_VERSION[1] && version[2] == DATA_FORMAT_VERSION[2]; } /*byte[] getUnicodeVersion(){ return unicodeVersion; }*/ // private data members ------------------------------------------------- /** * ICU data file input stream */ private DataInputStream dataInputStream; // private byte[] unicodeVersion; /** * File format version that this class understands. * No guarantees are made if an older version is used. * See store.c of gennorm for more information and values */ // DATA_FORMAT_ID_ values taken from icu4c isAcceptable (ucnv_io.c) private static final byte DATA_FORMAT_ID[] = {(byte)0x43, (byte)0x76, (byte)0x41, (byte)0x6c}; // dataFormat="CvAl" private static final byte DATA_FORMAT_VERSION[] = {3, 0, 1}; //private static final int UNSIGNED_SHORT_MASK = 0xffff; private static final int UNSIGNED_INT_MASK = 0xffffffff; } icu4j-4.2/src/com/ibm/icu/charset/UConverterSharedData.java0000644000175000017500000003740611361046170023551 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; /** * Defines the UConverterSharedData struct, the immutable, shared part of * UConverter. */ final class UConverterSharedData { // uint32_t structSize; /* Size of this structure */ // int structSize; /* Size of this structure */ /** * used to count number of clients, 0xffffffff for static SharedData */ int referenceCounter; // agljport:todo const void *dataMemory; /* from udata_openChoice() - for cleanup */ // agljport:todo void *table; /* Unused. This used to be a UConverterTable - Pointer to conversion data - see mbcs below */ // const UConverterStaticData *staticData; /* pointer to the static (non changing) data. 
*/ /** * pointer to the static (non changing) * data. */ UConverterStaticData staticData; // UBool sharedDataCached; /* TRUE: shared data is in cache, don't destroy // on close() if 0 ref. FALSE: shared data isn't in the cache, do attempt to // clean it up if the ref is 0 */ /** * TRUE: shared data is in cache, don't destroy * on close() if 0 ref. FALSE: shared data isn't * in the cache, do attempt to clean it up if * the ref is 0 */ boolean sharedDataCached; /* * UBool staticDataOwned; TRUE if static data owned by shared data & should * be freed with it, NEVER true for udata() loaded statics. This ignored * variable was removed to make space for sharedDataCached. */ // const UConverterImpl *impl; /* vtable-style struct of mostly function pointers */ // UConverterImpl impl; /* vtable-style struct of mostly function pointers */ /** initial values of some members of the mutable part of object */ long toUnicodeStatus; /** * Shared data structures currently come in two flavors: * - readonly for built-in algorithmic converters * - allocated for MBCS, with a pointer to an allocated UConverterTable * which always has a UConverterMBCSTable * * To eliminate one allocation, I am making the UConverterMBCSTable a member * of the shared data. It is the last member so that static definitions of * UConverterSharedData work as before. The table field above also remains * to avoid updating all static definitions, but is now unused. * */ CharsetMBCS.UConverterMBCSTable mbcs; UConverterSharedData() { mbcs = new CharsetMBCS.UConverterMBCSTable(); } UConverterSharedData(int referenceCounter_, UConverterStaticData staticData_, boolean sharedDataCached_, long toUnicodeStatus_) { this(); referenceCounter = referenceCounter_; staticData = staticData_; sharedDataCached = sharedDataCached_; // impl = impl_; toUnicodeStatus = toUnicodeStatus_; } /** * UConverterImpl contains all the data and functions for a converter type. * Its function pointers work much like a C++ vtable. 
Many converter types * need to define only a subset of the functions; when a function pointer is * NULL, then a default action will be performed. * * Every converter type must implement toUnicode, fromUnicode, and * getNextUChar, otherwise the converter may crash. Every converter type * that has variable-length codepage sequences should also implement * toUnicodeWithOffsets and fromUnicodeWithOffsets for correct offset * handling. All other functions may or may not be implemented - it depends * only on whether the converter type needs them. * * When open() fails, then close() will be called, if present. */ /* class UConverterImpl { UConverterType type; UConverterToUnicode toUnicode; protected void doToUnicode(UConverterToUnicodeArgs args, int[] pErrorCode) { } final void toUnicode(UConverterToUnicodeArgs args, int[] pErrorCode) { doToUnicode(args, pErrorCode); } //UConverterFromUnicode fromUnicode; protected void doFromUnicode(UConverterFromUnicodeArgs args, int[] pErrorCode) { } final void fromUnicode(UConverterFromUnicodeArgs args, int[] pErrorCode) { doFromUnicode(args, pErrorCode); } protected int doGetNextUChar(UConverterToUnicodeArgs args, int[] pErrorCode) { return 0; } //UConverterGetNextUChar getNextUChar; final int getNextUChar(UConverterToUnicodeArgs args, int[] pErrorCode) { return doGetNextUChar(args, pErrorCode); } // interface UConverterImplLoadable extends UConverterImpl protected void doLoad(UConverterLoadArgs pArgs, short[] raw, int[] pErrorCode) { } protected void doUnload() { } // interface UConverterImplOpenable extends UConverterImpl protected void doOpen(UConverter cnv, String name, String locale, long options, int[] pErrorCode) { } //UConverterOpen open; final void open(UConverter cnv, String name, String locale, long options, int[] pErrorCode) { doOpen(cnv, name, locale, options, pErrorCode); } protected void doClose(UConverter cnv) { } //UConverterClose close; final void close(UConverter cnv) { doClose(cnv); } protected void 
doReset(UConverter cnv, int choice) { } //typedef void (*UConverterReset) (UConverter *cnv, UConverterResetChoice choice); //UConverterReset reset; final void reset(UConverter cnv, int choice) { doReset(cnv, choice); } // interface UConverterImplVariableLength extends UConverterImpl protected void doToUnicodeWithOffsets(UConverterToUnicodeArgs args, int[] pErrorCode) { } //UConverterToUnicode toUnicodeWithOffsets; final void toUnicodeWithOffsets(UConverterToUnicodeArgs args, int[] pErrorCode) { doToUnicodeWithOffsets(args, pErrorCode); } protected void doFromUnicodeWithOffsets(UConverterFromUnicodeArgs args, int[] pErrorCode) { } //UConverterFromUnicode fromUnicodeWithOffsets; final void fromUnicodeWithOffsets(UConverterFromUnicodeArgs args, int[] pErrorCode) { doFromUnicodeWithOffsets(args, pErrorCode); } // interface UConverterImplMisc extends UConverterImpl protected void doGetStarters(UConverter converter, boolean starters[], int[] pErrorCode) { } //UConverterGetStarters getStarters; final void getStarters(UConverter converter, boolean starters[], int[] pErrorCode) { doGetStarters(converter, starters, pErrorCode); } protected String doGetName(UConverter cnv) { return ""; } //UConverterGetName getName; final String getName(UConverter cnv) { return doGetName(cnv); } protected void doWriteSub(UConverterFromUnicodeArgs pArgs, long offsetIndex, int[] pErrorCode) { } //UConverterWriteSub writeSub; final void writeSub(UConverterFromUnicodeArgs pArgs, long offsetIndex, int[] pErrorCode) { doWriteSub(pArgs, offsetIndex, pErrorCode); } protected UConverter doSafeClone(UConverter cnv, byte[] stackBuffer, int[] pBufferSize, int[] status) { return new UConverter(); } //UConverterSafeClone safeClone; final UConverter safeClone(UConverter cnv, byte[] stackBuffer, int[] pBufferSize, int[] status) { return doSafeClone(cnv, stackBuffer, pBufferSize, status); } protected void doGetUnicodeSet(UConverter cnv, UnicodeSet /*USetAdder* / sa, int /*UConverterUnicodeSet* / which, int[] 
pErrorCode) { } //UConverterGetUnicodeSet getUnicodeSet; // final void getUnicodeSet(UConverter cnv, UnicodeSet /*USetAdder* / sa, int /*UConverterUnicodeSet* / which, int[] pErrorCode) //{ // doGetUnicodeSet(cnv, sa, which, pErrorCode); //} //} static final String DATA_TYPE = "cnv"; private static final int CNV_DATA_BUFFER_SIZE = 25000; static final int sizeofUConverterSharedData = 100; //static UDataMemoryIsAcceptable isCnvAcceptable; /** * Load a non-algorithmic converter. * If pkg==NULL, then this function must be called inside umtx_lock(&cnvCacheMutex). // UConverterSharedData * load(UConverterLoadArgs *pArgs, UErrorCode *err) static final UConverterSharedData load(UConverterLoadArgs pArgs, int[] err) { UConverterSharedData mySharedConverterData = null; if(err == null || ErrorCode.isFailure(err[0])) { return null; } if(pArgs.pkg != null && pArgs.pkg.length() != 0) { application-provided converters are not currently cached return UConverterSharedData.createConverterFromFile(pArgs, err); } //agljport:fix mySharedConverterData = getSharedConverterData(pArgs.name); if (mySharedConverterData == null) { Not cached, we need to stream it in from file mySharedConverterData = UConverterSharedData.createConverterFromFile(pArgs, err); if (ErrorCode.isFailure(err[0]) || (mySharedConverterData == null)) { return null; } else { share it with other library clients //agljport:fix shareConverterData(mySharedConverterData); } } else { The data for this converter was already in the cache. Update the reference counter on the shared data: one more client mySharedConverterData.referenceCounter++; } return mySharedConverterData; } Takes an alias name gets an actual converter file name *goes to disk and opens it. 
*allocates the memory and returns a new UConverter object //static UConverterSharedData *createConverterFromFile(UConverterLoadArgs *pArgs, UErrorCode * err) static final UConverterSharedData createConverterFromFile(UConverterLoadArgs pArgs, int[] err) { UDataMemory data = null; UConverterSharedData sharedData = null; //agljport:todo UTRACE_ENTRY_OC(UTRACE_LOAD); if (err == null || ErrorCode.isFailure(err[0])) { //agljport:todo UTRACE_EXIT_STATUS(*err); return null; } //agljport:todo UTRACE_DATA2(UTRACE_OPEN_CLOSE, "load converter %s from package %s", pArgs->name, pArgs->pkg); //agljport:fix data = udata_openChoice(pArgs.pkgArray, DATA_TYPE.getBytes(), pArgs.name, isCnvAcceptable, null, err); if(ErrorCode.isFailure(err[0])) { //agljport:todo UTRACE_EXIT_STATUS(*err); return null; } sharedData = data_unFlattenClone(pArgs, data, err); if(ErrorCode.isFailure(err[0])) { //agljport:fix udata_close(data); //agljport:todo UTRACE_EXIT_STATUS(*err); return null; } * TODO Store pkg in a field in the shared data so that delta-only converters * can load base converters from the same package. * If the pkg name is longer than the field, then either do not load the converter * in the first place, or just set the pkg field to "". return sharedData; } */ UConverterDataReader dataReader = null; /* * returns a converter type from a string */ /* static final UConverterSharedData getAlgorithmicTypeFromName(String realName) { long mid, start, limit; long lastMid; int result; StringBuffer strippedName = new StringBuffer(UConverterConstants.MAX_CONVERTER_NAME_LENGTH); // Lower case and remove ignoreable characters. UConverterAlias.stripForCompare(strippedName, realName); // do a binary search for the alias start = 0; limit = cnvNameType.length; mid = limit; lastMid = -1; for (;;) { mid = (long)((start + limit) / 2); if (lastMid == mid) { // Have we moved? break; // We haven't moved, and it wasn't found. 
} lastMid = mid; result = strippedName.substring(0).compareTo(cnvNameType[(int)mid].name); if (result < 0) { limit = mid; } else if (result > 0) { start = mid; } else { return converterData[cnvNameType[(int)mid].type]; } } return null; }*/ /* * Enum for specifying basic types of converters */ static final class UConverterType { static final int UNSUPPORTED_CONVERTER = -1; static final int SBCS = 0; static final int DBCS = 1; static final int MBCS = 2; static final int LATIN_1 = 3; static final int UTF8 = 4; static final int UTF16_BigEndian = 5; static final int UTF16_LittleEndian = 6; static final int UTF32_BigEndian = 7; static final int UTF32_LittleEndian = 8; static final int EBCDIC_STATEFUL = 9; static final int ISO_2022 = 10; static final int LMBCS_1 = 11; static final int LMBCS_2 = LMBCS_1 + 1; // 12 static final int LMBCS_3 = LMBCS_2 + 1; // 13 static final int LMBCS_4 = LMBCS_3 + 1; // 14 static final int LMBCS_5 = LMBCS_4 + 1; // 15 static final int LMBCS_6 = LMBCS_5 + 1; // 16 static final int LMBCS_8 = LMBCS_6 + 1; // 17 static final int LMBCS_11 = LMBCS_8 + 1; // 18 static final int LMBCS_16 = LMBCS_11 + 1; // 19 static final int LMBCS_17 = LMBCS_16 + 1; // 20 static final int LMBCS_18 = LMBCS_17 + 1; // 21 static final int LMBCS_19 = LMBCS_18 + 1; // 22 static final int LMBCS_LAST = LMBCS_19; // 22 static final int HZ = LMBCS_LAST + 1; // 23 static final int SCSU = HZ + 1; // 24 static final int ISCII = SCSU + 1; // 25 static final int US_ASCII = ISCII + 1; // 26 static final int UTF7 = US_ASCII + 1; // 27 static final int BOCU1 = UTF7 + 1; // 28 static final int UTF16 = BOCU1 + 1; // 29 static final int UTF32 = UTF16 + 1; // 30 static final int CESU8 = UTF32 + 1; // 31 static final int IMAP_MAILBOX = CESU8 + 1; // 32 // Number of converter types for which we have conversion routines. static final int NUMBER_OF_SUPPORTED_CONVERTER_TYPES = IMAP_MAILBOX + 1; } /** * Enum for specifying which platform a converter ID refers to. 
The use of * platform/CCSID is not recommended. See openCCSID(). */ static final class UConverterPlatform { static final int UNKNOWN = -1; static final int IBM = 0; } // static UConverterSharedData[] converterData; /* static class cnvNameTypeClass { String name; int type; cnvNameTypeClass(String name_, int type_) { name = name_; type = type_; } } static cnvNameTypeClass cnvNameType[];*/ static final String DATA_TYPE = "cnv"; //static final int CNV_DATA_BUFFER_SIZE = 25000; //static final int SIZE_OF_UCONVERTER_SHARED_DATA = 228; } icu4j-4.2/src/com/ibm/icu/charset/CharsetCallback.java0000644000175000017500000004761311361046170022543 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2009, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CoderResult; /** *

Callback API for CharsetICU API

* * CharsetCallback class defines some error behaviour functions called * by CharsetDecoderICU and CharsetEncoderICU. The class also provides * the facility by which clients can write their own callbacks. * * These functions, although public, should NEVER be called directly. * They should be used as parameters to the onUnmappableCharacter() and * onMalformedInput() methods, to set the behaviour of a converter * when it encounters UNMAPPED/INVALID sequences. * Currently the only way to set callbacks is by using CodingErrorAction. * In the future we will provide set methods on CharsetEncoder and CharsetDecoder * that will accept CharsetCallback fields. * * @stable ICU 3.6 */ public class CharsetCallback { /* * FROM_U, TO_U context options for sub callback */ private static final String SUB_STOP_ON_ILLEGAL = "i"; // /* // * FROM_U, TO_U context options for skip callback // */ // private static final String SKIP_STOP_ON_ILLEGAL = "i"; // /* // * FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to ICU (%UXXXX) // */ // private static final String ESCAPE_ICU = null; /* * FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to JAVA (\\uXXXX) */ private static final String ESCAPE_JAVA = "J"; /* * FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to C (\\uXXXX \\UXXXXXXXX) * TO_U_CALLBACK_ESCAPE option to escape the character value according to C (\\xXXXX) */ private static final String ESCAPE_C = "C"; /* * FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Decimal escape \htmlonly(&#DDDD;)\endhtmlonly * TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Decimal escape \htmlonly(&#DDDD;)\endhtmlonly */ private static final String ESCAPE_XML_DEC = "D"; /* * FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Hex escape \htmlonly(&#xXXXX;)\endhtmlonly * TO_U_CALLBACK_ESCAPE context option to escape the character value according to 
XML Hex escape \htmlonly(&#xXXXX;)\endhtmlonly */ private static final String ESCAPE_XML_HEX = "X"; /* * FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to Unicode ({U+XXXX}) */ private static final String ESCAPE_UNICODE = "U"; /* * FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to CSS2 conventions * (a backslash, the hex digits of the code point, and a trailing space) */ private static final String ESCAPE_CSS2 = "S"; /** * Decoder Callback interface * @stable ICU 3.6 */ public interface Decoder { /** * This function is called when the bytes in the source cannot be handled, * and this function is meant to handle or fix the error if possible. * * @return Result of decoding action. This returned object is set to an error * if this function could not handle the conversion. * @stable ICU 3.6 */ public CoderResult call(CharsetDecoderICU decoder, Object context, ByteBuffer source, CharBuffer target, IntBuffer offsets, char[] buffer, int length, CoderResult cr); } /** * Encoder Callback interface * @stable ICU 3.6 */ public interface Encoder { /** * This function is called when the Unicode characters in the source cannot be handled, * and this function is meant to handle or fix the error if possible. * @return Result of encoding action. This returned object is set to an error * if this function could not handle the conversion. 
* @stable ICU 3.6 */ public CoderResult call(CharsetEncoderICU encoder, Object context, CharBuffer source, ByteBuffer target, IntBuffer offsets, char[] buffer, int length, int cp, CoderResult cr); } /** * Skip callback * @stable ICU 3.6 */ public static final Encoder FROM_U_CALLBACK_SKIP = new Encoder() { public CoderResult call(CharsetEncoderICU encoder, Object context, CharBuffer source, ByteBuffer target, IntBuffer offsets, char[] buffer, int length, int cp, CoderResult cr){ if(context==null){ return CoderResult.UNDERFLOW; }else if(((String)context).equals(SUB_STOP_ON_ILLEGAL)){ if(!cr.isUnmappable()){ return cr; }else{ return CoderResult.UNDERFLOW; } } return cr; } }; /** * Skip callback * @stable ICU 3.6 */ public static final Decoder TO_U_CALLBACK_SKIP = new Decoder() { public CoderResult call(CharsetDecoderICU decoder, Object context, ByteBuffer source, CharBuffer target, IntBuffer offsets, char[] buffer, int length, CoderResult cr){ if(context==null){ return CoderResult.UNDERFLOW; }else if(((String)context).equals(SUB_STOP_ON_ILLEGAL)){ if(!cr.isUnmappable()){ return cr; }else{ return CoderResult.UNDERFLOW; } } return cr; } }; /** * Write substitute callback * @stable ICU 3.6 */ public static final Encoder FROM_U_CALLBACK_SUBSTITUTE = new Encoder(){ public CoderResult call(CharsetEncoderICU encoder, Object context, CharBuffer source, ByteBuffer target, IntBuffer offsets, char[] buffer, int length, int cp, CoderResult cr){ if(context==null){ return encoder.cbFromUWriteSub(encoder, source, target, offsets); }else if(((String)context).equals(SUB_STOP_ON_ILLEGAL)){ if(!cr.isUnmappable()){ return cr; }else{ return encoder.cbFromUWriteSub(encoder, source, target, offsets); } } return cr; } }; private static final char[] kSubstituteChar1 = new char[]{0x1A}; private static final char[] kSubstituteChar = new char[] {0xFFFD}; /** * Write substitute callback * @stable ICU 3.6 */ public static final Decoder TO_U_CALLBACK_SUBSTITUTE = new Decoder() { public CoderResult 
call(CharsetDecoderICU decoder, Object context, ByteBuffer source, CharBuffer target, IntBuffer offsets, char[] buffer, int length, CoderResult cr){ CharsetICU cs = (CharsetICU) decoder.charset(); /* could optimize this case, just one uchar */ if(decoder.invalidCharLength == 1 && cs.subChar1 != 0) { return CharsetDecoderICU.toUWriteUChars(decoder, kSubstituteChar1, 0, 1, target, offsets, source.position()); } else { return CharsetDecoderICU.toUWriteUChars(decoder, kSubstituteChar, 0, 1, target, offsets, source.position()); } } }; /** * Stop callback * @stable ICU 3.6 */ public static final Encoder FROM_U_CALLBACK_STOP = new Encoder() { public CoderResult call(CharsetEncoderICU encoder, Object context, CharBuffer source, ByteBuffer target, IntBuffer offsets, char[] buffer, int length, int cp, CoderResult cr){ return cr; } }; /** * Stop callback * @stable ICU 3.6 */ public static final Decoder TO_U_CALLBACK_STOP = new Decoder() { public CoderResult call(CharsetDecoderICU decoder, Object context, ByteBuffer source, CharBuffer target, IntBuffer offsets, char[] buffer, int length, CoderResult cr){ return cr; } }; private static final int VALUE_STRING_LENGTH = 32; private static final char UNICODE_PERCENT_SIGN_CODEPOINT = 0x0025; private static final char UNICODE_U_CODEPOINT = 0x0055; private static final char UNICODE_X_CODEPOINT = 0x0058; private static final char UNICODE_RS_CODEPOINT = 0x005C; private static final char UNICODE_U_LOW_CODEPOINT = 0x0075; private static final char UNICODE_X_LOW_CODEPOINT = 0x0078; private static final char UNICODE_AMP_CODEPOINT = 0x0026; private static final char UNICODE_HASH_CODEPOINT = 0x0023; private static final char UNICODE_SEMICOLON_CODEPOINT = 0x003B; private static final char UNICODE_PLUS_CODEPOINT = 0x002B; private static final char UNICODE_LEFT_CURLY_CODEPOINT = 0x007B; private static final char UNICODE_RIGHT_CURLY_CODEPOINT = 0x007D; private static final char UNICODE_SPACE_CODEPOINT = 0x0020; /** * Write escape callback * 
@stable ICU 4.0 */ public static final Encoder FROM_U_CALLBACK_ESCAPE = new Encoder() { public CoderResult call(CharsetEncoderICU encoder, Object context, CharBuffer source, ByteBuffer target, IntBuffer offsets, char[] buffer, int length, int cp, CoderResult cr){ char[] valueString = new char[VALUE_STRING_LENGTH]; int valueStringLength = 0; int i = 0; cr = CoderResult.UNDERFLOW; if (context == null || !(context instanceof String)) { while (i < length) { valueString[valueStringLength++] = UNICODE_PERCENT_SIGN_CODEPOINT; /* adding % */ valueString[valueStringLength++] = UNICODE_U_CODEPOINT; /* adding U */ valueStringLength += itou(valueString, valueStringLength, (int)buffer[i++] & UConverterConstants.UNSIGNED_SHORT_MASK, 16, 4); } } else { if (((String)context).equals(ESCAPE_JAVA)) { while (i < length) { valueString[valueStringLength++] = UNICODE_RS_CODEPOINT; /* adding \ */ valueString[valueStringLength++] = UNICODE_U_LOW_CODEPOINT; /* adding u */ valueStringLength += itou(valueString, valueStringLength, (int)buffer[i++] & UConverterConstants.UNSIGNED_SHORT_MASK, 16, 4); } } else if (((String)context).equals(ESCAPE_C)) { valueString[valueStringLength++] = UNICODE_RS_CODEPOINT; /* adding \ */ if (length == 2) { valueString[valueStringLength++] = UNICODE_U_CODEPOINT; /* adding U */ valueStringLength += itou(valueString, valueStringLength, cp, 16, 8); /* accumulate, as at every other itou call site, so the \U prefix is kept */ } else { valueString[valueStringLength++] = UNICODE_U_LOW_CODEPOINT; /* adding u */ valueStringLength += itou(valueString, valueStringLength, (int)buffer[0] & UConverterConstants.UNSIGNED_SHORT_MASK, 16, 4); } } else if (((String)context).equals(ESCAPE_XML_DEC)) { valueString[valueStringLength++] = UNICODE_AMP_CODEPOINT; /* adding & */ valueString[valueStringLength++] = UNICODE_HASH_CODEPOINT; /* adding # */ if (length == 2) { valueStringLength += itou(valueString, valueStringLength, cp, 10, 0); } else { valueStringLength += itou(valueString, valueStringLength, (int)buffer[0] & UConverterConstants.UNSIGNED_SHORT_MASK, 10, 
0); } valueString[valueStringLength++] = UNICODE_SEMICOLON_CODEPOINT; /* adding ; */ } else if (((String)context).equals(ESCAPE_XML_HEX)) { valueString[valueStringLength++] = UNICODE_AMP_CODEPOINT; /* adding & */ valueString[valueStringLength++] = UNICODE_HASH_CODEPOINT; /* adding # */ valueString[valueStringLength++] = UNICODE_X_LOW_CODEPOINT; /* adding x */ if (length == 2) { valueStringLength += itou(valueString, valueStringLength, cp, 16, 0); } else { valueStringLength += itou(valueString, valueStringLength, (int)buffer[0] & UConverterConstants.UNSIGNED_SHORT_MASK, 16, 0); } valueString[valueStringLength++] = UNICODE_SEMICOLON_CODEPOINT; /* adding ; */ } else if (((String)context).equals(ESCAPE_UNICODE)) { valueString[valueStringLength++] = UNICODE_LEFT_CURLY_CODEPOINT; /* adding { */ valueString[valueStringLength++] = UNICODE_U_CODEPOINT; /* adding U */ valueString[valueStringLength++] = UNICODE_PLUS_CODEPOINT; /* adding + */ if (length == 2) { valueStringLength += itou(valueString, valueStringLength,cp, 16, 4); } else { valueStringLength += itou(valueString, valueStringLength, (int)buffer[0] & UConverterConstants.UNSIGNED_SHORT_MASK, 16, 4); } valueString[valueStringLength++] = UNICODE_RIGHT_CURLY_CODEPOINT; /* adding } */ } else if (((String)context).equals(ESCAPE_CSS2)) { valueString[valueStringLength++] = UNICODE_RS_CODEPOINT; /* adding \ */ valueStringLength += itou(valueString, valueStringLength, cp, 16, 0); /* Always add space character, because the next character might be whitespace, which would erroneously be considered the termination of the escape sequence. 
*/
                    valueString[valueStringLength++] = UNICODE_SPACE_CODEPOINT;
                } else {
                    while (i < length) {
                        valueString[valueStringLength++] = UNICODE_PERCENT_SIGN_CODEPOINT; /* adding % */
                        valueString[valueStringLength++] = UNICODE_U_CODEPOINT; /* adding U */
                        valueStringLength += itou(valueString, valueStringLength, (int)buffer[i++] & UConverterConstants.UNSIGNED_SHORT_MASK, 16, 4);
                    }
                }
            }
            cr = encoder.cbFromUWriteUChars(encoder, CharBuffer.wrap(valueString, 0, valueStringLength), target, offsets);
            return cr;
        }
    };
    /**
     * Write escape callback
     * @stable ICU 4.0
     */
    public static final Decoder TO_U_CALLBACK_ESCAPE = new Decoder() {
        public CoderResult call(CharsetDecoderICU decoder, Object context, ByteBuffer source, CharBuffer target,
                IntBuffer offsets, char[] buffer, int length, CoderResult cr){
            char[] uniValueString = new char[VALUE_STRING_LENGTH];
            int valueStringLength = 0;
            int i = 0;

            if (context == null || !(context instanceof String)) {
                while (i < length) {
                    uniValueString[valueStringLength++] = UNICODE_PERCENT_SIGN_CODEPOINT; /* adding % */
                    uniValueString[valueStringLength++] = UNICODE_X_CODEPOINT; /* adding X */
                    valueStringLength += itou(uniValueString, valueStringLength, buffer[i++] & UConverterConstants.UNSIGNED_BYTE_MASK, 16, 2);
                }
            } else {
                if (((String)context).equals(ESCAPE_XML_DEC)) {
                    while (i < length) {
                        uniValueString[valueStringLength++] = UNICODE_AMP_CODEPOINT; /* adding & */
                        uniValueString[valueStringLength++] = UNICODE_HASH_CODEPOINT; /* adding # */
                        valueStringLength += itou(uniValueString, valueStringLength, buffer[i++] & UConverterConstants.UNSIGNED_BYTE_MASK, 10, 0);
                        uniValueString[valueStringLength++] = UNICODE_SEMICOLON_CODEPOINT; /* adding ; */
                    }
                } else if (((String)context).equals(ESCAPE_XML_HEX)) {
                    while (i < length) {
                        uniValueString[valueStringLength++] = UNICODE_AMP_CODEPOINT; /* adding & */
                        uniValueString[valueStringLength++] = UNICODE_HASH_CODEPOINT; /* adding # */
                        uniValueString[valueStringLength++] = UNICODE_X_LOW_CODEPOINT; /* adding x */
valueStringLength += itou(uniValueString, valueStringLength, buffer[i++] & UConverterConstants.UNSIGNED_BYTE_MASK, 16, 0); uniValueString[valueStringLength++] = UNICODE_SEMICOLON_CODEPOINT; /* adding ; */ } } else if (((String)context).equals(ESCAPE_C)) { while (i < length) { uniValueString[valueStringLength++] = UNICODE_RS_CODEPOINT; /* adding \ */ uniValueString[valueStringLength++] = UNICODE_X_LOW_CODEPOINT; /* adding x */ valueStringLength += itou(uniValueString, valueStringLength, buffer[i++] & UConverterConstants.UNSIGNED_BYTE_MASK, 16, 2); } } else { while (i < length) { uniValueString[valueStringLength++] = UNICODE_PERCENT_SIGN_CODEPOINT; /* adding % */ uniValueString[valueStringLength++] = UNICODE_X_CODEPOINT; /* adding X */ itou(uniValueString, valueStringLength, buffer[i++] & UConverterConstants.UNSIGNED_BYTE_MASK, 16, 2); valueStringLength += 2; } } } cr = CharsetDecoderICU.toUWriteUChars(decoder, uniValueString, 0, valueStringLength, target, offsets, 0); return cr; } }; /*** * Java port of uprv_itou() in ICU4C used by TO_U_CALLBACK_ESCAPE and FROM_U_CALLBACK_ESCAPE. * Fills in a char string with the radix-based representation of a number padded with zeroes * to minwidth. */ private static final int itou(char[] buffer, int sourceIndex, int i, int radix, int minwidth) { int length = 0; int digit; int j; char temp; do { digit = (int)(i % radix); buffer[sourceIndex + length++] = (char)(digit <= 9 ? 
(0x0030+digit) : (0x0030+digit+7));
            i = i/radix;
        } while (i != 0 && (sourceIndex + length) < buffer.length);

        while (length < minwidth) {
            buffer[sourceIndex + length++] = (char)0x0030; /* zero padding */
        }
        /* reverses the string */
        for (j = 0; j < (length / 2); j++) {
            temp = buffer[(sourceIndex + length - 1) - j];
            buffer[(sourceIndex + length-1) -j] = buffer[sourceIndex + j];
            buffer[sourceIndex + j] = temp;
        }

        return length;
    }
}

icu4j-4.2/src/com/ibm/icu/charset/CharsetUTF16LE.java

/**
 *******************************************************************************
 * Copyright (C) 2006-2008, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.charset;

/**
 * The purpose of this class is to set isBigEndian to false and isEndianSpecified to true in the super class, and to
 * allow the Charset framework to open the variant UTF-16 converter without extra setup work.
 */
class CharsetUTF16LE extends CharsetUTF16 {
    public CharsetUTF16LE(String icuCanonicalName, String javaCanonicalName, String[] aliases) {
        super(icuCanonicalName, javaCanonicalName, aliases);
    }
}

icu4j-4.2/src/com/ibm/icu/charset/CharsetDecoderICU.java

/**
*******************************************************************************
* Copyright (C) 2006-2009, International Business Machines Corporation and    *
* others. All Rights Reserved.
*
*******************************************************************************
*
*******************************************************************************
*/

package com.ibm.icu.charset;

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.IntBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

import com.ibm.icu.impl.Assert;

/**
 * An abstract class that provides framework methods for decoding operations to concrete
 * subclasses.
 * In the future this class will contain API that will implement converter semantics of ICU4C.
 * @stable ICU 3.6
 */
public abstract class CharsetDecoderICU extends CharsetDecoder{

    int toUnicodeStatus;
    byte[] toUBytesArray = new byte[128];
    int toUBytesBegin = 0;
    int toULength;
    char[] charErrorBufferArray = new char[128];
    int charErrorBufferLength;
    int charErrorBufferBegin;
    char[] invalidCharBuffer = new char[128];
    int invalidCharLength;

    /* maximum number of indexed bytes */
    private static final int EXT_MAX_BYTES = 0x1f;

    /* store previous UChars/chars to continue partial matches */
    byte[] preToUArray = new byte[EXT_MAX_BYTES];
    int preToUBegin;
    int preToULength; /* negative: replay */
    int preToUFirstLength; /* length of first character */
    int mode;

    Object toUContext = null;
    private CharsetCallback.Decoder onUnmappableCharacter = CharsetCallback.TO_U_CALLBACK_STOP;
    private CharsetCallback.Decoder onMalformedInput = CharsetCallback.TO_U_CALLBACK_STOP;
    CharsetCallback.Decoder toCharErrorBehaviour = new CharsetCallback.Decoder() {
        public CoderResult call(CharsetDecoderICU decoder, Object context, ByteBuffer source,
                CharBuffer target, IntBuffer offsets, char[] buffer, int length, CoderResult cr) {
            if (cr.isUnmappable()) {
                return onUnmappableCharacter.call(decoder, context, source, target, offsets, buffer,
                        length, cr);
            } else /* if (cr.isMalformed()) */ {
                return onMalformedInput.call(decoder, context, source, target, offsets, buffer,
                        length, cr);
            }
            // return CharsetCallback.TO_U_CALLBACK_STOP.call(decoder, context, source, target, offsets, buffer, length, cr);
        }
    };

    // these flags exist to keep implOnMalformedInput and implOnUnmappableCharacter from recursing indefinitely
    private boolean malformedInputCalled = false;
    private boolean unmappableCharacterCalled = false;

    /*
     * Construct a CharsetDecoderICU based on the information provided from a CharsetICU object.
     *
     * @param cs The CharsetICU object containing information about the charset to decode.
     */
    CharsetDecoderICU(CharsetICU cs) {
        super(cs, (float) (1/(float)cs.maxCharsPerByte), cs.maxCharsPerByte);
    }

    /*
     * Is this Decoder allowed to use fallbacks? A fallback mapping is a mapping
     * that will convert a byte sequence to a Unicode codepoint sequence, but
     * the encoded Unicode codepoint sequence will round trip convert to a different
     * byte sequence. In ICU, this can be called a reverse fallback.
     * @return A boolean
     */
    final boolean isFallbackUsed() {
        return true;
    }

    /**
     * Fallback is currently always used by icu4j decoders.
     */
    static final boolean isToUUseFallback() {
        return isToUUseFallback(true);
    }

    /**
     * Fallback is currently always used by icu4j decoders.
     */
    static final boolean isToUUseFallback(boolean iUseFallback) {
        return true;
    }

    /**
     * Sets the action to be taken if an illegal sequence is encountered
     *
     * @param newAction action to be taken
     * @exception IllegalArgumentException
     * @stable ICU 3.6
     */
    protected final void implOnMalformedInput(CodingErrorAction newAction) {
        // don't run infinitely
        if (malformedInputCalled)
            return;

        // if we get a replace, do not let the nio replace
        if (newAction == CodingErrorAction.REPLACE) {
            malformedInputCalled = true;
            super.onMalformedInput(CodingErrorAction.IGNORE);
            malformedInputCalled = false;
        }

        onMalformedInput = getCallback(newAction);
    }

    /**
     * Sets the action to be taken if an unmappable character is encountered
     *
     * @param newAction action to be taken
     * @exception IllegalArgumentException
     * @stable ICU 3.6
     */
    protected final void implOnUnmappableCharacter(CodingErrorAction newAction) {
        // don't run infinitely
        if (unmappableCharacterCalled)
            return;

        // if we get a replace, do not let the nio replace
        if (newAction == CodingErrorAction.REPLACE) {
            unmappableCharacterCalled = true;
            super.onUnmappableCharacter(CodingErrorAction.IGNORE);
            unmappableCharacterCalled = false;
        }

        onUnmappableCharacter = getCallback(newAction);
    }

    /**
     * Sets the callback decoder method and context to be used if an illegal sequence is encountered.
     * You would normally call this twice to set both the malformed and unmappable error. In this case,
     * newContext should remain the same since using a different newContext each time will negate the last
     * one used.
     * @param err CoderResult
     * @param newCallback CharsetCallback.Decoder
     * @param newContext Object
     * @stable ICU 4.0
     */
    public final void setToUCallback(CoderResult err, CharsetCallback.Decoder newCallback, Object newContext) {
        if (err.isMalformed()) {
            onMalformedInput = newCallback;
        } else if (err.isUnmappable()) {
            onUnmappableCharacter = newCallback;
        } else {
            /* Error: Only malformed and unmappable are handled.
            */
        }

        if (toUContext == null || !toUContext.equals(newContext)) {
            toUContext = newContext;
        }
    }

    private static CharsetCallback.Decoder getCallback(CodingErrorAction action){
        if(action==CodingErrorAction.REPLACE){
            return CharsetCallback.TO_U_CALLBACK_SUBSTITUTE;
        }else if(action==CodingErrorAction.IGNORE){
            return CharsetCallback.TO_U_CALLBACK_SKIP;
        }else /* if(action==CodingErrorAction.REPORT) */ {
            return CharsetCallback.TO_U_CALLBACK_STOP;
        }
    }

    private final ByteBuffer EMPTY = ByteBuffer.allocate(0);

    /**
     * Flushes any characters saved in the converter's internal buffer and
     * resets the converter.
     * @param out buffer that receives any remaining decoded characters
     * @return result of the flushing action, which completes the decoding of all input.
     *         Returns CoderResult.UNDERFLOW if the action succeeds.
     * @stable ICU 3.6
     */
    protected final CoderResult implFlush(CharBuffer out) {
        return decode(EMPTY, out, null, true);
    }

    /**
     * Resets the to-Unicode state of the converter
     * @stable ICU 3.6
     */
    protected void implReset() {
        toUnicodeStatus = 0 ;
        toULength = 0;
        charErrorBufferLength = 0;
        charErrorBufferBegin = 0;

        /* store previous UChars/chars to continue partial matches */
        preToUBegin = 0;
        preToULength = 0; /* negative: replay */
        preToUFirstLength = 0;

        mode = 0;
    }

    /**
     * Decodes one or more bytes. The default behaviour of the converter
     * is to stop and report if an error in the input stream is encountered.
     * To set different behaviour use @see CharsetDecoder.onMalformedInput()
     * This method allows a buffer by buffer conversion of a data stream.
     * The state of the conversion is saved between calls to convert.
     * Among other things, this means multibyte input sequences can be
     * split between calls. If a call to convert results in an Error, the
     * conversion may be continued by calling convert again with suitably
     * modified parameters. All conversions should be finished with a call to
     * the flush method.
     * @param in buffer to decode
     * @param out buffer to populate with decoded result
     * @return Result of decoding action.
Returns CoderResult.UNDERFLOW if the decoding * action succeeds or more input is needed for completing the decoding action. * @stable ICU 3.6 */ protected CoderResult decodeLoop(ByteBuffer in,CharBuffer out){ if(in.remaining() < toUCountPending()){ return CoderResult.UNDERFLOW; } // if (!in.hasRemaining()) { // toULength = 0; // return CoderResult.UNDERFLOW; // } in.position(in.position() + toUCountPending()); /* do the conversion */ CoderResult ret = decode(in, out, null, false); // ok was there input held in the previous invocation of decodeLoop // that resulted in output in this invocation? in.position(in.position() - toUCountPending()); return ret; } /* * Implements the ICU semantic for decode operation * @param in The input byte buffer * @param out The output character buffer * @return Result of decoding action. Returns CoderResult.UNDERFLOW if the decoding * action succeeds or more input is needed for completing the decoding action. */ abstract CoderResult decodeLoop(ByteBuffer in, CharBuffer out, IntBuffer offsets, boolean flush); /* * Implements the ICU semantic for decode operation * @param source The input byte buffer * @param target The output character buffer * @param offsets * @param flush true if, and only if, the invoker can provide no * additional input bytes beyond those in the given buffer. * @return Result of decoding action. Returns CoderResult.UNDERFLOW if the decoding * action succeeds or more input is needed for completing the decoding action. */ final CoderResult decode(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { /* check parameters */ if (target == null || source == null) { throw new IllegalArgumentException(); } /* * Make sure that the buffer sizes do not exceed the number range for * int32_t because some functions use the size (in units or bytes) * rather than comparing pointers, and because offsets are int32_t values. * * size_t is guaranteed to be unsigned and large enough for the job. 
* * Return with an error instead of adjusting the limits because we would * not be able to maintain the semantics that either the source must be * consumed or the target filled (unless an error occurs). * An adjustment would be sourceLimit=t+0x7fffffff; for example. */ /*agljport:fix if( ((size_t)(sourceLimit-s)>(size_t)0x7fffffff && sourceLimit>s) || ((size_t)(targetLimit-t)>(size_t)0x3fffffff && targetLimit>t) ) { *err=U_ILLEGAL_ARGUMENT_ERROR; return; } */ /* flush the target overflow buffer */ if (charErrorBufferLength > 0) { int i = 0; do { if (!target.hasRemaining()) { /* the overflow buffer contains too much, keep the rest */ int j = 0; do { charErrorBufferArray[j++] = charErrorBufferArray[i++]; } while (i < charErrorBufferLength); charErrorBufferLength = (byte) j; return CoderResult.OVERFLOW; } /* copy the overflow contents to the target */ target.put(charErrorBufferArray[i++]); if (offsets != null) { offsets.put(-1); /* no source index available for old output */ } } while (i < charErrorBufferLength); /* the overflow buffer is completely copied to the target */ charErrorBufferLength = 0; } if (!flush && !source.hasRemaining() && preToULength >= 0) { /* the overflow buffer is emptied and there is no new input: we are done */ return CoderResult.UNDERFLOW; } /* * Do not simply return with a buffer overflow error if * !flush && t==targetLimit * because it is possible that the source will not generate any output. * For example, the skip callback may be called; * it does not output anything. */ return toUnicodeWithCallback(source, target, offsets, flush); } /* Currently, we are not using offsets in ICU4J. 
*/ /* private void updateOffsets(IntBuffer offsets,int length, int sourceIndex, int errorInputLength) { int limit; int delta, offset; if(sourceIndex>=0) { /* * adjust each offset by adding the previous sourceIndex * minus the length of the input sequence that caused an * error, if any */ /* delta=sourceIndex-errorInputLength; } else { /* * set each offset to -1 because this conversion function * does not handle offsets */ /* delta=-1; } limit=offsets.position()+length; if(delta==0) { /* most common case, nothing to do */ /* } else if(delta>0) { /* add the delta to each offset (but not if the offset is <0) */ /* while(offsets.position()=0) { offsets.put(offset+delta); } //FIXME: ++offsets; } } else /* delta<0 */ /* { /* * set each offset to -1 because this conversion function * does not handle offsets * or the error input sequence started in a previous buffer */ /* while(offsets.position()=0) { /* normal mode */ } else { /* * Previous m:n conversion stored source units from a partial match * and failed to consume all of them. * We need to "replay" them from a temporary buffer and convert them first. 
*/ realSource=source; realFlush=flush; realSourceIndex=sourceIndex; //UConverterUtility.uprv_memcpy(replayArray, replayBegin, preToUArray, preToUBegin, -preToULength); replayArray.put(preToUArray,0, -preToULength); source=replayArray; source.position(0); source.limit(replayArrayIndex-preToULength); flush=false; sourceIndex=-1; preToULength=0; } /* * loop for conversion and error handling * * loop { * convert * loop { * update offsets * handle end of input * handle errors/call callback * } * } */ for(;;) { /* convert */ cr = decodeLoop(source, target, offsets, flush); /* * set a flag for whether the converter * successfully processed the end of the input * * need not check cnv->preToULength==0 because a replay (<0) will cause * s0) { updateOffsets(offsets, length, sourceIndex, errorInputLength); /* * if a converter handles offsets and updates the offsets * pointer at the end, then pArgs->offset should not change * here; * however, some converters do not handle offsets at all * (sourceIndex<0) or may not update the offsets pointer */ //TODO: pArgs->offsets=offsets+=length; /* } if(sourceIndex>=0) { sourceIndex+=(source.position()-s); } } */ if(preToULength<0) { /* * switch the source to new replay units (cannot occur while replaying) * after offset handling and before end-of-input and callback handling */ if(realSource==null) { realSource=source; realFlush=flush; realSourceIndex=sourceIndex; //UConverterUtility.uprv_memcpy(replayArray, replayBegin, preToUArray, preToUBegin, -preToULength); replayArray.put(preToUArray,0, -preToULength); // reset position replayArray.position(0); source=replayArray; source.limit(replayArrayIndex-preToULength); flush=false; if((sourceIndex+=preToULength)<0) { sourceIndex=-1; } preToULength=0; } else { /* see implementation note before _fromUnicodeWithCallback() */ //agljport:todo U_ASSERT(realSource==NULL); Assert.assrt(realSource==null); } } /* update pointers */ s=source.position(); //t=target.position(); if(cr.isUnderflow()) { if(s0) 
{ /* * the entire input stream is consumed * and there is a partial, truncated input sequence left */ /* inject an error and continue with callback handling */ cr = CoderResult.malformedForLength(toULength); calledCallback=false; /* new error condition */ } else { /* input consumed */ if(flush) { /* * return to the conversion loop once more if the flush * flag is set and the conversion function has not * successfully processed the end of the input yet * * (continue converting by breaking out of only the inner loop) */ if(!converterSawEndOfInput) { break; } /* reset the converter without calling the callback function */ implReset(); } /* done successfully */ return cr; } } /* U_FAILURE(*err) */ { if( calledCallback || cr.isOverflow() || (cr.isMalformed() && cr.isUnmappable()) ) { /* * the callback did not or cannot resolve the error: * set output pointers and return * * the check for buffer overflow is redundant but it is * a high-runner case and hopefully documents the intent * well * * if we were replaying, then the replay buffer must be * copied back into the UConverter * and the real arguments must be restored */ if(realSource!=null) { int length; Assert.assrt(preToULength==0); length=(int)(source.limit()-source.position()); if(length>0) { //UConverterUtility.uprv_memcpy(preToUArray, preToUBegin, pArgs.sourceArray, pArgs.sourceBegin, length); source.get(preToUArray, preToUBegin, length); preToULength=(byte)-length; } source=realSource; flush=realFlush; } return cr; } } /* copy toUBytes[] to invalidCharBuffer[] */ errorInputLength=invalidCharLength=toULength; if(errorInputLength>0) { copy(toUBytesArray, 0, invalidCharBuffer, 0, errorInputLength); } /* set the converter state to deal with the next character */ toULength=0; /* call the callback function */ cr = toCharErrorBehaviour.call(this, toUContext, source, target, offsets, invalidCharBuffer, errorInputLength, cr); /* * loop back to the offset handling * * this flag will indicate after offset handling * that a 
callback was called; * if the callback did not resolve the error, then we return */ calledCallback=true; } } } /* * Returns the number of chars held in the converter's internal state * because more input is needed for completing the conversion. This function is * useful for mapping semantics of ICU's converter interface to those of iconv, * and this information is not needed for normal conversion. * @return The number of chars in the state. -1 if an error is encountered. */ /*public*/ int toUCountPending() { if(preToULength > 0){ return preToULength ; } else if(preToULength < 0){ return -preToULength; } else if(toULength > 0){ return toULength; } else { return 0; } } private void copy(byte[] src, int srcOffset, char[] dst, int dstOffset, int length) { for(int i=srcOffset; i0 && target.hasRemaining()) { target.put(ucharsArray[ucharsBegin++]); --length; } } else { /* output with offsets */ while(length>0 && target.hasRemaining()) { target.put(ucharsArray[ucharsBegin++]); offsets.put(sourceIndex); --length; } } /* write overflow */ if(length>0) { cnv.charErrorBufferLength= 0; cr = CoderResult.OVERFLOW; do { cnv.charErrorBufferArray[cnv.charErrorBufferLength++]=ucharsArray[ucharsBegin++]; } while(--length>0); } return cr; } /* * This function will write out the Unicode substitution character to the * target character buffer. * Sub classes to override this method if required * @param decoder * @param source * @param target * @param offsets * @return A CoderResult object that contains the error result when an error occurs. */ /* Note: Currently, this method is not being used because the callback method calls toUWriteUChars with * the substitution characters. Will leave in here for the time being. To be removed later. 
(4.0) */ /*CoderResult cbToUWriteSub(CharsetDecoderICU decoder, ByteBuffer source, CharBuffer target, IntBuffer offsets){ String sub = decoder.replacement(); CharsetICU cs = (CharsetICU) decoder.charset(); if (decoder.invalidCharLength==1 && cs.subChar1 != 0x00) { char[] subArr = new char[] { 0x1a }; return CharsetDecoderICU.toUWriteUChars(decoder, subArr, 0, sub .length(), target, offsets, source.position()); } else { return CharsetDecoderICU.toUWriteUChars(decoder, sub.toCharArray(), 0, sub.length(), target, offsets, source.position()); } }*/ } icu4j-4.2/src/com/ibm/icu/charset/CharsetUTF8.java0000644000175000017500000007404611361046170021575 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; /** * @author Niti Hantaweepant */ class CharsetUTF8 extends CharsetICU { private static final byte[] fromUSubstitution = new byte[] { (byte) 0xef, (byte) 0xbf, (byte) 0xbd }; public CharsetUTF8(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); /* max 3 bytes per code unit from UTF-8 (4 bytes from surrogate _pair_) */ maxBytesPerChar = 3; minBytesPerChar = 1; maxCharsPerByte = 1; } private static final int BITMASK_FROM_UTF8[] = { -1, 0x7f, 0x1f, 0xf, 0x7, 0x3, 0x1 }; private static final byte BYTES_FROM_UTF8[] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 0, 0 }; /* * Starting with Unicode 3.0.1: UTF-8 byte sequences of length N _must_ encode code points of or * above utf8_minChar32[N]; byte sequences with more than 4 bytes are illegal in UTF-8, which is * tested with impossible values for them */ private static final int UTF8_MIN_CHAR32[] = { 0, 0, 0x80, 0x800, 0x10000, Integer.MAX_VALUE, Integer.MAX_VALUE }; private final boolean isCESU8 = this instanceof CharsetCESU8; class CharsetDecoderUTF8 extends CharsetDecoderICU { public CharsetDecoderUTF8(CharsetICU cs) { super(cs); } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { if (!source.hasRemaining()) { /* no input, nothing to do */ return CoderResult.UNDERFLOW; } if (!target.hasRemaining()) { /* no output available, can't do anything */ return CoderResult.OVERFLOW; } if (source.hasArray() && target.hasArray()) { /* source and target are backed by arrays, so use the arrays for optimal performance */ byte[] sourceArray = source.array(); int sourceIndex = source.arrayOffset() + source.position(); int sourceLimit = source.arrayOffset() + source.limit(); char[] targetArray = target.array(); int targetIndex = target.arrayOffset() + target.position(); int targetLimit = target.arrayOffset() + target.limit(); byte ch; int char32, 
bytesExpected, bytesSoFar; CoderResult cr; if (mode == 0) { /* nothing is stored in toUnicodeStatus, read a byte as input */ char32 = (toUBytesArray[0] = sourceArray[sourceIndex++]) & 0xff; bytesExpected = BYTES_FROM_UTF8[char32]; char32 &= BITMASK_FROM_UTF8[bytesExpected]; bytesSoFar = 1; } else { /* a partially or fully built code point is stored in toUnicodeStatus */ char32 = toUnicodeStatus; bytesExpected = mode; bytesSoFar = toULength; toUnicodeStatus = 0; mode = 0; toULength = 0; } outer: while (true) { if (bytesSoFar < bytesExpected) { /* read a trail byte and insert its relevant bits into char32 */ if (sourceIndex >= sourceLimit) { /* no source left, save the state for later and break out of the loop */ toUnicodeStatus = char32; mode = bytesExpected; toULength = bytesSoFar; cr = CoderResult.UNDERFLOW; break; } if (((ch = toUBytesArray[bytesSoFar] = sourceArray[sourceIndex++]) & 0xc0) != 0x80) { /* not a trail byte (is not of the form 10xxxxxx) */ sourceIndex--; toULength = bytesSoFar; cr = CoderResult.malformedForLength(bytesSoFar); break; } char32 = (char32 << 6) | (ch & 0x3f); bytesSoFar++; } else if (bytesSoFar == bytesExpected && UTF8_MIN_CHAR32[bytesExpected] <= char32 && char32 <= 0x10ffff && (isCESU8 ? bytesExpected <= 3 : !UTF16.isSurrogate((char) char32))) { /* * char32 is a valid code point and is composed of the correct number of * bytes ... 
we now need to output it in UTF-16 */ if (char32 <= UConverterConstants.MAXIMUM_UCS2) { /* fits in 16 bits */ targetArray[targetIndex++] = (char) char32; } else { /* fit char32 into 20 bits */ char32 -= UConverterConstants.HALF_BASE; /* write out the surrogates */ targetArray[targetIndex++] = (char) ((char32 >>> UConverterConstants.HALF_SHIFT) + UConverterConstants.SURROGATE_HIGH_START); if (targetIndex >= targetLimit) { /* put in overflow buffer (not handled here) */ charErrorBufferArray[charErrorBufferBegin++] = (char) char32; cr = CoderResult.OVERFLOW; break; } targetArray[targetIndex++] = (char) ((char32 & UConverterConstants.HALF_MASK) + UConverterConstants.SURROGATE_LOW_START); } /* * we're finished outputing, so now we need to read in the first byte of the * next byte sequence that could form a code point */ if (sourceIndex >= sourceLimit) { cr = CoderResult.UNDERFLOW; break; } if (targetIndex >= targetLimit) { cr = CoderResult.OVERFLOW; break; } /* keep reading the next input (and writing it) while bytes == 1 */ while ((bytesExpected = BYTES_FROM_UTF8[char32 = (toUBytesArray[0] = sourceArray[sourceIndex++]) & 0xff]) == 1) { targetArray[targetIndex++] = (char) char32; if (sourceIndex >= sourceLimit) { cr = CoderResult.UNDERFLOW; break outer; } if (targetIndex >= targetLimit) { cr = CoderResult.OVERFLOW; break outer; } } /* remove the bits that indicate the number of bytes */ char32 &= BITMASK_FROM_UTF8[bytesExpected]; bytesSoFar = 1; } else { /* * either the lead byte in the code sequence is invalid (bytes == 0) or the * lead byte combined with all the trail chars does not form a valid code * point */ toULength = bytesSoFar; cr = CoderResult.malformedForLength(bytesSoFar); break; } } source.position(sourceIndex - source.arrayOffset()); target.position(targetIndex - target.arrayOffset()); return cr; } else { int sourceIndex = source.position(); int sourceLimit = source.limit(); int targetIndex = target.position(); int targetLimit = target.limit(); byte ch; 
int char32, bytesExpected, bytesSoFar; CoderResult cr; if (mode == 0) { /* nothing is stored in toUnicodeStatus, read a byte as input */ char32 = (toUBytesArray[0] = source.get(sourceIndex++)) & 0xff; bytesExpected = BYTES_FROM_UTF8[char32]; char32 &= BITMASK_FROM_UTF8[bytesExpected]; bytesSoFar = 1; } else { /* a partially or fully built code point is stored in toUnicodeStatus */ char32 = toUnicodeStatus; bytesExpected = mode; bytesSoFar = toULength; toUnicodeStatus = 0; mode = 0; toULength = 0; } outer: while (true) { if (bytesSoFar < bytesExpected) { /* read a trail byte and insert its relevant bits into char32 */ if (sourceIndex >= sourceLimit) { /* no source left, save the state for later and break out of the loop */ toUnicodeStatus = char32; mode = bytesExpected; toULength = bytesSoFar; cr = CoderResult.UNDERFLOW; break; } if (((ch = toUBytesArray[bytesSoFar] = source.get(sourceIndex++)) & 0xc0) != 0x80) { /* not a trail byte (is not of the form 10xxxxxx) */ sourceIndex--; toULength = bytesSoFar; cr = CoderResult.malformedForLength(bytesSoFar); break; } char32 = (char32 << 6) | (ch & 0x3f); bytesSoFar++; } /* * Legal UTF-8 byte sequences in Unicode 3.0.1 and up: * - use only trail bytes after a lead byte (checked above) * - use the right number of trail bytes for a given lead byte * - encode a code point <= U+10ffff * - use the fewest possible number of bytes for their code points * - use at most 4 bytes (for i>=5 it is 0x10ffff>> UConverterConstants.HALF_SHIFT) + UConverterConstants.SURROGATE_HIGH_START)); if (targetIndex >= targetLimit) { /* put in overflow buffer (not handled here) */ charErrorBufferArray[charErrorBufferBegin++] = (char) char32; cr = CoderResult.OVERFLOW; break; } target.put( targetIndex++, (char) ((char32 & UConverterConstants.HALF_MASK) + UConverterConstants.SURROGATE_LOW_START)); } /* * we're finished outputing, so now we need to read in the first byte of the * next byte sequence that could form a code point */ if (sourceIndex >= 
sourceLimit) { cr = CoderResult.UNDERFLOW; break; } if (targetIndex >= targetLimit) { cr = CoderResult.OVERFLOW; break; } /* keep reading the next input (and writing it) while bytes == 1 */ while ((bytesExpected = BYTES_FROM_UTF8[char32 = (toUBytesArray[0] = source.get(sourceIndex++)) & 0xff]) == 1) { target.put(targetIndex++, (char) char32); if (sourceIndex >= sourceLimit) { cr = CoderResult.UNDERFLOW; break outer; } if (targetIndex >= targetLimit) { cr = CoderResult.OVERFLOW; break outer; } } /* remove the bits that indicate the number of bytes */ char32 &= BITMASK_FROM_UTF8[bytesExpected]; bytesSoFar = 1; } else { /* * either the lead byte in the code sequence is invalid (bytes == 0) or the * lead byte combined with all the trail chars does not form a valid code * point */ toULength = bytesSoFar; cr = CoderResult.malformedForLength(bytesSoFar); break; } } source.position(sourceIndex); target.position(targetIndex); return cr; } } } class CharsetEncoderUTF8 extends CharsetEncoderICU { public CharsetEncoderUTF8(CharsetICU cs) { super(cs, fromUSubstitution); implReset(); } protected void implReset() { super.implReset(); } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { if (!source.hasRemaining()) { /* no input, nothing to do */ return CoderResult.UNDERFLOW; } if (!target.hasRemaining()) { /* no output available, can't do anything */ return CoderResult.OVERFLOW; } if (source.hasArray() && target.hasArray()) { /* source and target are backed by arrays, so use the arrays for optimal performance */ char[] sourceArray = source.array(); int srcIdx = source.arrayOffset() + source.position(); int sourceLimit = source.arrayOffset() + source.limit(); byte[] targetArray = target.array(); int tgtIdx = target.arrayOffset() + target.position(); int targetLimit = target.arrayOffset() + target.limit(); int char32; CoderResult cr; /* take care of the special condition of fromUChar32 not being 0 (it is a surrogate) */ if 
(fromUChar32 != 0) { /* 4 bytes to encode from char32 and a following char in source */ sourceIndex = srcIdx; targetIndex = tgtIdx; cr = encodeFourBytes(sourceArray, targetArray, sourceLimit, targetLimit, fromUChar32); srcIdx = sourceIndex; tgtIdx = targetIndex; if (cr != null) { source.position(srcIdx - source.arrayOffset()); target.position(tgtIdx - target.arrayOffset()); return cr; } } while (true) { if (srcIdx >= sourceLimit) { /* nothing left to read */ cr = CoderResult.UNDERFLOW; break; } if (tgtIdx >= targetLimit) { /* no space left to write */ cr = CoderResult.OVERFLOW; break; } /* read the next char into char32 */ char32 = sourceArray[srcIdx++]; if (char32 <= 0x7f) { /* 1 byte to encode from char32 */ targetArray[tgtIdx++] = encodeHeadOf1(char32); } else if (char32 <= 0x7ff) { /* 2 bytes to encode from char32 */ targetArray[tgtIdx++] = encodeHeadOf2(char32); if (tgtIdx >= targetLimit) { errorBuffer[errorBufferLength++] = encodeLastTail(char32); cr = CoderResult.OVERFLOW; break; } targetArray[tgtIdx++] = encodeLastTail(char32); } else if (!UTF16.isSurrogate((char) char32) || isCESU8) { /* 3 bytes to encode from char32 */ targetArray[tgtIdx++] = encodeHeadOf3(char32); if (tgtIdx >= targetLimit) { errorBuffer[errorBufferLength++] = encodeSecondToLastTail(char32); errorBuffer[errorBufferLength++] = encodeLastTail(char32); cr = CoderResult.OVERFLOW; break; } targetArray[tgtIdx++] = encodeSecondToLastTail(char32); if (tgtIdx >= targetLimit) { errorBuffer[errorBufferLength++] = encodeLastTail(char32); cr = CoderResult.OVERFLOW; break; } targetArray[tgtIdx++] = encodeLastTail(char32); } else { /* 4 bytes to encode from char32 and a following char in source */ sourceIndex = srcIdx; targetIndex = tgtIdx; cr = encodeFourBytes(sourceArray, targetArray, sourceLimit, targetLimit, char32); srcIdx = sourceIndex; tgtIdx = targetIndex; if (cr != null) break; } } /* set the new source and target positions and return the CoderResult stored in cr */ source.position(srcIdx - 
source.arrayOffset()); target.position(tgtIdx - target.arrayOffset()); return cr; } else { int char32; CoderResult cr; /* take care of the special condition of fromUChar32 not being 0 (it is a surrogate) */ if (fromUChar32 != 0) { /* 4 bytes to encode from char32 and a following char in source */ cr = encodeFourBytes(source, target, fromUChar32); if (cr != null) return cr; } while (true) { if (!source.hasRemaining()) { /* nothing left to read */ cr = CoderResult.UNDERFLOW; break; } if (!target.hasRemaining()) { /* no space left to write */ cr = CoderResult.OVERFLOW; break; } /* read the next char into char32 */ char32 = source.get(); if (char32 <= 0x7f) { /* 1 byte to encode from char32 */ target.put(encodeHeadOf1(char32)); } else if (char32 <= 0x7ff) { /* 2 bytes to encode from char32 */ target.put(encodeHeadOf2(char32)); if (!target.hasRemaining()) { errorBuffer[errorBufferLength++] = encodeLastTail(char32); cr = CoderResult.OVERFLOW; break; } target.put(encodeLastTail(char32)); } else if (!UTF16.isSurrogate((char) char32) || isCESU8) { /* 3 bytes to encode from char32 */ target.put(encodeHeadOf3(char32)); if (!target.hasRemaining()) { errorBuffer[errorBufferLength++] = encodeSecondToLastTail(char32); errorBuffer[errorBufferLength++] = encodeLastTail(char32); cr = CoderResult.OVERFLOW; break; } target.put(encodeSecondToLastTail(char32)); if (!target.hasRemaining()) { errorBuffer[errorBufferLength++] = encodeLastTail(char32); cr = CoderResult.OVERFLOW; break; } target.put(encodeLastTail(char32)); } else { /* 4 bytes to encode from char32 and a following char in source */ cr = encodeFourBytes(source, target, char32); if (cr != null) break; } } /* the relative get()/put() calls have already advanced the buffer positions, so just return the CoderResult stored in cr */ return cr; } } private final CoderResult encodeFourBytes(char[] sourceArray, byte[] targetArray, int sourceLimit, int targetLimit, int char32) { /* we need to read another char to match up the surrogate stored in char32 */ /* handle the 
surrogate stuff, returning on a non-null CoderResult */ CoderResult cr = handleSurrogates(sourceArray, sourceIndex, sourceLimit, (char)char32); if (cr != null) return cr; sourceIndex++; char32 = fromUChar32; fromUChar32 = 0; /* the rest is routine -- encode four bytes, stopping on overflow */ targetArray[targetIndex++] = encodeHeadOf4(char32); if (targetIndex >= targetLimit) { errorBuffer[errorBufferLength++] = encodeThirdToLastTail(char32); errorBuffer[errorBufferLength++] = encodeSecondToLastTail(char32); errorBuffer[errorBufferLength++] = encodeLastTail(char32); return CoderResult.OVERFLOW; } targetArray[targetIndex++] = encodeThirdToLastTail(char32); if (targetIndex >= targetLimit) { errorBuffer[errorBufferLength++] = encodeSecondToLastTail(char32); errorBuffer[errorBufferLength++] = encodeLastTail(char32); return CoderResult.OVERFLOW; } targetArray[targetIndex++] = encodeSecondToLastTail(char32); if (targetIndex >= targetLimit) { errorBuffer[errorBufferLength++] = encodeLastTail(char32); return CoderResult.OVERFLOW; } targetArray[targetIndex++] = encodeLastTail(char32); /* return null for success */ return null; } private final CoderResult encodeFourBytes(CharBuffer source, ByteBuffer target, int char32) { /* handle the surrogate stuff, returning on a non-null CoderResult */ CoderResult cr = handleSurrogates(source, (char)char32); if (cr != null) return cr; char32 = fromUChar32; fromUChar32 = 0; /* the rest is routine -- encode four bytes, stopping on overflow */ target.put(encodeHeadOf4(char32)); if (!target.hasRemaining()) { errorBuffer[errorBufferLength++] = encodeThirdToLastTail(char32); errorBuffer[errorBufferLength++] = encodeSecondToLastTail(char32); errorBuffer[errorBufferLength++] = encodeLastTail(char32); return CoderResult.OVERFLOW; } target.put(encodeThirdToLastTail(char32)); if (!target.hasRemaining()) { errorBuffer[errorBufferLength++] = encodeSecondToLastTail(char32); errorBuffer[errorBufferLength++] = encodeLastTail(char32); return 
CoderResult.OVERFLOW; } target.put(encodeSecondToLastTail(char32)); if (!target.hasRemaining()) { errorBuffer[errorBufferLength++] = encodeLastTail(char32); return CoderResult.OVERFLOW; } target.put(encodeLastTail(char32)); /* return null for success */ return null; } private int sourceIndex; private int targetIndex; } private static final byte encodeHeadOf1(int char32) { return (byte) char32; } private static final byte encodeHeadOf2(int char32) { return (byte) (0xc0 | (char32 >>> 6)); } private static final byte encodeHeadOf3(int char32) { return (byte) (0xe0 | ((char32 >>> 12))); } private static final byte encodeHeadOf4(int char32) { return (byte) (0xf0 | ((char32 >>> 18))); } private static final byte encodeThirdToLastTail(int char32) { return (byte) (0x80 | ((char32 >>> 12) & 0x3f)); } private static final byte encodeSecondToLastTail(int char32) { return (byte) (0x80 | ((char32 >>> 6) & 0x3f)); } private static final byte encodeLastTail(int char32) { return (byte) (0x80 | (char32 & 0x3f)); } /* single-code point definitions -------------------------------------------- */ /* * Does this code unit (byte) encode a code point by itself (US-ASCII 0..0x7f)? * @param c 8-bit code unit (byte) * @return TRUE or FALSE */ // static final boolean isSingle(byte c) {return (((c)&0x80)==0);} /* * Is this code unit (byte) a UTF-8 lead byte? * @param c 8-bit code unit (byte) * @return TRUE or FALSE */ // static final boolean isLead(byte c) {return ((((c)-0xc0) & // UConverterConstants.UNSIGNED_BYTE_MASK)<0x3e);} /* * Is this code unit (byte) a UTF-8 trail byte? 
* * @param c * 8-bit code unit (byte) * @return TRUE or FALSE */ /*private static final boolean isTrail(byte c) { return (((c) & 0xc0) == 0x80); }*/ public CharsetDecoder newDecoder() { return new CharsetDecoderUTF8(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderUTF8(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ getNonSurrogateUnicodeSet(setFillIn); } } icu4j-4.2/src/com/ibm/icu/charset/CharsetUTF32.java0000644000175000017500000002341111361046170021640 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.IntBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; /** * @author Niti Hantaweepant */ class CharsetUTF32 extends CharsetICU { private static final int SIGNATURE_LENGTH = 4; private static final byte[] fromUSubstitution_BE = { (byte) 0, (byte) 0, (byte) 0xff, (byte) 0xfd }; private static final byte[] fromUSubstitution_LE = { (byte) 0xfd, (byte) 0xff, (byte) 0, (byte) 0 }; private static final byte[] BOM_BE = { 0, 0, (byte) 0xfe, (byte) 0xff }; private static final byte[] BOM_LE = { (byte) 0xff, (byte) 0xfe, 0, 0 }; private static final int ENDIAN_XOR_BE = 0; private static final int ENDIAN_XOR_LE = 3; private static final int NEED_TO_WRITE_BOM = 1; private boolean isEndianSpecified; private boolean isBigEndian; private int endianXOR; private byte[] bom; private byte[] fromUSubstitution; public CharsetUTF32(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); 
this.isEndianSpecified = (this instanceof CharsetUTF32BE || this instanceof CharsetUTF32LE); this.isBigEndian = !(this instanceof CharsetUTF32LE); if (isBigEndian) { this.bom = BOM_BE; this.fromUSubstitution = fromUSubstitution_BE; this.endianXOR = ENDIAN_XOR_BE; } else { this.bom = BOM_LE; this.fromUSubstitution = fromUSubstitution_LE; this.endianXOR = ENDIAN_XOR_LE; } maxBytesPerChar = 4; minBytesPerChar = 4; maxCharsPerByte = 1; } class CharsetDecoderUTF32 extends CharsetDecoderICU { private boolean isBOMReadYet; private int actualEndianXOR; private byte[] actualBOM; public CharsetDecoderUTF32(CharsetICU cs) { super(cs); } protected void implReset() { super.implReset(); isBOMReadYet = false; actualBOM = null; } protected CoderResult decodeLoop(ByteBuffer source, CharBuffer target, IntBuffer offsets, boolean flush) { /* * If we detect a BOM in this buffer, then we must add the BOM size to the offsets because the actual * converter function will not see and count the BOM. offsetDelta will have the number of the BOM bytes that * are in the current buffer. */ if (!isBOMReadYet) { while (true) { if (!source.hasRemaining()) return CoderResult.UNDERFLOW; toUBytesArray[toULength++] = source.get(); if (toULength == 1) { // on the first byte, we haven't decided whether or not it's bigEndian yet if ((!isEndianSpecified || isBigEndian) && toUBytesArray[toULength - 1] == BOM_BE[toULength - 1]) { actualBOM = BOM_BE; actualEndianXOR = ENDIAN_XOR_BE; } else if ((!isEndianSpecified || !isBigEndian) && toUBytesArray[toULength - 1] == BOM_LE[toULength - 1]) { actualBOM = BOM_LE; actualEndianXOR = ENDIAN_XOR_LE; } else { // we do not have a BOM (and we have toULength==1 bytes) actualBOM = null; actualEndianXOR = endianXOR; break; } } else if (toUBytesArray[toULength - 1] != actualBOM[toULength - 1]) { // we do not have a BOM (and we have toULength bytes) actualBOM = null; actualEndianXOR = endianXOR; break; } else if (toULength == SIGNATURE_LENGTH) { // we found a BOM! at last! 
// too bad we have to ignore it now (as if it were unwanted) toULength = 0; break; } } isBOMReadYet = true; } // now that we no longer need to look for a BOM, let's do some work int char32; while (true) { while (toULength < 4) { if (!source.hasRemaining()) return CoderResult.UNDERFLOW; toUBytesArray[toULength++] = source.get(); } if (!target.hasRemaining()) return CoderResult.OVERFLOW; char32 = 0; for (int i = 0; i < 4; i++) char32 = (char32 << 8) | (toUBytesArray[i ^ actualEndianXOR] & UConverterConstants.UNSIGNED_BYTE_MASK); if (0 <= char32 && char32 <= UConverterConstants.MAXIMUM_UTF && !isSurrogate(char32)) { toULength = 0; if (char32 <= UConverterConstants.MAXIMUM_UCS2) { /* fits in 16 bits */ target.put((char) char32); } else { /* write out the surrogates */ target.put(UTF16.getLeadSurrogate(char32)); char32 = UTF16.getTrailSurrogate(char32); if (target.hasRemaining()) { target.put((char) char32); } else { /* Put in overflow buffer (not handled here) */ charErrorBufferArray[0] = (char) char32; charErrorBufferLength = 1; return CoderResult.OVERFLOW; } } } else { return CoderResult.malformedForLength(toULength); } } } } class CharsetEncoderUTF32 extends CharsetEncoderICU { private final byte[] temp = new byte[4]; public CharsetEncoderUTF32(CharsetICU cs) { super(cs, fromUSubstitution); fromUnicodeStatus = isEndianSpecified ? 0 : NEED_TO_WRITE_BOM; } protected void implReset() { super.implReset(); fromUnicodeStatus = isEndianSpecified ? 
0 : NEED_TO_WRITE_BOM; } protected CoderResult encodeLoop(CharBuffer source, ByteBuffer target, IntBuffer offsets, boolean flush) { CoderResult cr; /* write the BOM if necessary */ if (fromUnicodeStatus == NEED_TO_WRITE_BOM) { if (!target.hasRemaining()) return CoderResult.OVERFLOW; fromUnicodeStatus = 0; cr = fromUWriteBytes(this, bom, 0, bom.length, target, offsets, -1); if (cr.isOverflow()) return cr; } if (fromUChar32 != 0) { if (!target.hasRemaining()) return CoderResult.OVERFLOW; // a note: fromUChar32 will either be 0 or a lead surrogate cr = encodeChar(source, target, offsets, (char) fromUChar32); if (cr != null) return cr; } while (true) { if (!source.hasRemaining()) return CoderResult.UNDERFLOW; if (!target.hasRemaining()) return CoderResult.OVERFLOW; cr = encodeChar(source, target, offsets, source.get()); if (cr != null) return cr; } } private final CoderResult encodeChar(CharBuffer source, ByteBuffer target, IntBuffer offsets, char ch) { int sourceIndex = source.position() - 1; CoderResult cr; int char32; if (UTF16.isSurrogate(ch)) { cr = handleSurrogates(source, ch); if (cr != null) return cr; char32 = fromUChar32; fromUChar32 = 0; } else { char32 = ch; } /* We cannot get any larger than 10FFFF because we are coming from UTF-16 */ // temp[0 ^ endianXOR] = (byte) (char32 >>> 24); // (always 0) temp[1 ^ endianXOR] = (byte) (char32 >>> 16); // same as (byte)((char32 >>> 16) & 0x1f) temp[2 ^ endianXOR] = (byte) (char32 >>> 8); temp[3 ^ endianXOR] = (byte) (char32); cr = fromUWriteBytes(this, temp, 0, 4, target, offsets, sourceIndex); return (cr.isUnderflow() ? 
null : cr); } } public CharsetDecoder newDecoder() { return new CharsetDecoderUTF32(this); } public CharsetEncoder newEncoder() { return new CharsetEncoderUTF32(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ getNonSurrogateUnicodeSet(setFillIn); } } icu4j-4.2/src/com/ibm/icu/charset/Charset88591.java0000644000175000017500000001123111361046170021530 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2006-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.charset; import java.nio.BufferOverflowException; import java.nio.BufferUnderflowException; import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CoderResult; import com.ibm.icu.text.UnicodeSet; class Charset88591 extends CharsetASCII { public Charset88591(String icuCanonicalName, String javaCanonicalName, String[] aliases) { super(icuCanonicalName, javaCanonicalName, aliases); } class CharsetDecoder88591 extends CharsetDecoderASCII { public CharsetDecoder88591(CharsetICU cs) { super(cs); } protected CoderResult decodeLoopCoreOptimized(ByteBuffer source, CharBuffer target, byte[] sourceArray, char[] targetArray, int oldSource, int offset, int limit) { /* * perform 88591 conversion from the source array to the target array. no range check is * necessary. */ for (int i = oldSource; i < limit; i++) targetArray[i + offset] = (char) (sourceArray[i] & 0xff); return null; } protected CoderResult decodeLoopCoreUnoptimized(ByteBuffer source, CharBuffer target) throws BufferUnderflowException, BufferOverflowException { /* * perform 88591 conversion from the source buffer to the target buffer. no range check * is necessary (an exception will be generated to end the loop). 
*/ while (true) target.put((char) (source.get() & 0xff)); } } class CharsetEncoder88591 extends CharsetEncoderASCII { public CharsetEncoder88591(CharsetICU cs) { super(cs); } protected final CoderResult encodeLoopCoreOptimized(CharBuffer source, ByteBuffer target, char[] sourceArray, byte[] targetArray, int oldSource, int offset, int limit, boolean flush) { int i, ch = 0; /* * perform 88591 conversion from the source array to the target array, making sure each * char in the source is within the correct range */ for (i = oldSource; i < limit; i++) { ch = (int) sourceArray[i]; if ((ch & 0xff00) == 0) { targetArray[i + offset] = (byte) ch; } else { break; } } /* * if some char was not in the correct range, we need to deal with this char by calling * encodeMalformedOrUnmappable and move the source and target positions to reflect the * early termination of the loop */ if ((ch & 0xff00) != 0) { source.position(i + 1); target.position(i + offset); return encodeMalformedOrUnmappable(source, ch, flush); } else return null; } protected final CoderResult encodeLoopCoreUnoptimized(CharBuffer source, ByteBuffer target, boolean flush) throws BufferUnderflowException, BufferOverflowException { int ch; /* * perform 88591 conversion from the source buffer to the target buffer, making sure * each char in the source is within the correct range */ while (true) { ch = (int) source.get(); if ((ch & 0xff00) == 0) { target.put((byte) ch); } else { break; } } /* * if we reach here, it's because a character was not in the correct range, and we need * to deal with this by calling encodeMalformedOrUnmappable. 
*/ return encodeMalformedOrUnmappable(source, ch, flush); } } public CharsetDecoder newDecoder() { return new CharsetDecoder88591(this); } public CharsetEncoder newEncoder() { return new CharsetEncoder88591(this); } void getUnicodeSetImpl( UnicodeSet setFillIn, int which){ setFillIn.add(0,0xff); } } icu4j-4.2/src/com/ibm/icu/dev/0000755000175000017500000000000011361046406016006 5ustar twernertwernericu4j-4.2/src/com/ibm/icu/dev/test/0000755000175000017500000000000011361050732016762 5ustar twernertwernericu4j-4.2/src/com/ibm/icu/dev/test/lang/0000755000175000017500000000000011361046222017702 5ustar twernertwernericu4j-4.2/src/com/ibm/icu/dev/test/lang/UCharacterTest.java0000644000175000017500000032034311361046222023433 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 1996-2009, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.lang; import com.ibm.icu.impl.UBiDiProps; import com.ibm.icu.impl.UCaseProps; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.dev.test.TestUtil; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.lang.UCharacterCategory; import com.ibm.icu.lang.UCharacterDirection; import com.ibm.icu.lang.UProperty; import com.ibm.icu.lang.UScript; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; import com.ibm.icu.text.UnicodeSetIterator; import com.ibm.icu.util.RangeValueIterator; import com.ibm.icu.util.ValueIterator; import com.ibm.icu.util.VersionInfo; import com.ibm.icu.impl.UCharacterName; import com.ibm.icu.impl.Utility; import com.ibm.icu.impl.USerializedSet; import com.ibm.icu.impl.NormalizerImpl; import com.ibm.icu.impl.UCharacterProperty; import java.io.BufferedReader; import java.util.Arrays; /** * Testing class for UCharacter * Mostly following the test cases for ICU * @author Syn Wee Quek * @since nov 
04 2000 */ public final class UCharacterTest extends TestFmwk { // private variables ============================================= /** * ICU4J data version number */ private final VersionInfo VERSION_ = VersionInfo.getInstance("5.1.0.0"); // constructor =================================================== /** * Constructor */ public UCharacterTest() { } // public methods ================================================ public static void main(String[] arg) { try { UCharacterTest test = new UCharacterTest(); test.run(arg); } catch (Exception e) { e.printStackTrace(); } } /** * Testing the letter and number determination in UCharacter */ public void TestLetterNumber() { for (int i = 0x0041; i < 0x005B; i ++) if (!UCharacter.isLetter(i)) errln("FAIL \\u" + hex(i) + " expected to be a letter"); for (int i = 0x0660; i < 0x066A; i ++) if (UCharacter.isLetter(i)) errln("FAIL \\u" + hex(i) + " expected not to be a letter"); for (int i = 0x0660; i < 0x066A; i ++) if (!UCharacter.isDigit(i)) errln("FAIL \\u" + hex(i) + " expected to be a digit"); for (int i = 0x0041; i < 0x005B; i ++) if (!UCharacter.isLetterOrDigit(i)) errln("FAIL \\u" + hex(i) + " expected to be either a letter or a digit"); for (int i = 0x0660; i < 0x066A; i ++) if (!UCharacter.isLetterOrDigit(i)) errln("FAIL \\u" + hex(i) + " expected to be either a letter or a digit"); /* * The following checks work only starting from Unicode 4.0. * Check the version number here. */ VersionInfo version = UCharacter.getUnicodeVersion(); if(version.getMajor()<4 || version.equals(VersionInfo.getInstance(4, 0, 1))) { return; } /* * Sanity check: * Verify that exactly the digit characters have decimal digit values. * This assumption is used in the implementation of u_digit() * (which checks nt=de) * compared with the parallel java.lang.Character.digit() * (which checks Nd). * * This was not true in Unicode 3.2 and earlier. * Unicode 4.0 fixed discrepancies. 
* Unicode 4.0.1 re-introduced problems in this area due to an * unintentionally incomplete last-minute change. */ String digitsPattern = "[:Nd:]"; String decimalValuesPattern = "[:Numeric_Type=Decimal:]"; UnicodeSet digits, decimalValues; digits= new UnicodeSet(digitsPattern); decimalValues=new UnicodeSet(decimalValuesPattern); compareUSets(digits, decimalValues, "[:Nd:]", "[:Numeric_Type=Decimal:]", true); } /** * Tests for space determination in UCharacter */ public void TestSpaces() { int spaces[] = {0x0020, 0x00a0, 0x2000, 0x2001, 0x2005}; int nonspaces[] = {0x0061, 0x0062, 0x0063, 0x0064, 0x0074}; int whitespaces[] = {0x2008, 0x2009, 0x200a, 0x001c, 0x000c /* ,0x200b */}; // 0x200b was "Zs" in Unicode 4.0, but it is "Cf" in Unicode 4.1 int nonwhitespaces[] = {0x0061, 0x0062, 0x003c, 0x0028, 0x003f, 0x00a0, 0x2007, 0x202f, 0xfefe, 0x200b}; int size = spaces.length; for (int i = 0; i < size; i ++) { if (!UCharacter.isSpaceChar(spaces[i])) { errln("FAIL \\u" + hex(spaces[i]) + " expected to be a space character"); break; } if (UCharacter.isSpaceChar(nonspaces[i])) { errln("FAIL \\u" + hex(nonspaces[i]) + " expected not to be space character"); break; } if (!UCharacter.isWhitespace(whitespaces[i])) { errln("FAIL \\u" + hex(whitespaces[i]) + " expected to be a white space character"); break; } if (UCharacter.isWhitespace(nonwhitespaces[i])) { errln("FAIL \\u" + hex(nonwhitespaces[i]) + " expected not to be a space character"); break; } logln("Ok \\u" + hex(spaces[i]) + " and \\u" + hex(nonspaces[i]) + " and \\u" + hex(whitespaces[i]) + " and \\u" + hex(nonwhitespaces[i])); } int rulewhitespace[] = {0x9, 0xd, 0x20, 0x85, 0x200e, 0x200f, 0x2028, 0x2029}; int nonrulewhitespace[] = {0x8, 0xe, 0x21, 0x86, 0xa0, 0xa1, 0x1680, 0x1681, 0x180e, 0x180f, 0x1FFF, 0x2000, 0x200a, 0x200b, 0x2010, 0x202f, 0x2030, 0x205f, 0x2060, 0x3000, 0x3001}; for (int i = 0; i < rulewhitespace.length; i ++) { if (!UCharacterProperty.isRuleWhiteSpace(rulewhitespace[i])) { errln("\\u" + 
Utility.hex(rulewhitespace[i], 4) + " expected to be a rule white space"); } } for (int i = 0; i < nonrulewhitespace.length; i ++) { if (UCharacterProperty.isRuleWhiteSpace(nonrulewhitespace[i])) { errln("\\u" + Utility.hex(nonrulewhitespace[i], 4) + " expected to be a non rule white space"); } } } /** * Tests for defined and undefined characters */ public void TestDefined() { int undefined[] = {0xfff1, 0xfff7, 0xfa6b}; int defined[] = {0x523E, 0x004f88, 0x00fffd}; int size = undefined.length; for (int i = 0; i < size; i ++) { if (UCharacter.isDefined(undefined[i])) { errln("FAIL \\u" + hex(undefined[i]) + " expected not to be defined"); break; } if (!UCharacter.isDefined(defined[i])) { errln("FAIL \\u" + hex(defined[i]) + " expected defined"); break; } } } /** * Tests for base characters and their cellwidth */ public void TestBase() { int base[] = {0x0061, 0x000031, 0x0003d2}; int nonbase[] = {0x002B, 0x000020, 0x00203B}; int size = base.length; for (int i = 0; i < size; i ++) { if (UCharacter.isBaseForm(nonbase[i])) { errln("FAIL \\u" + hex(nonbase[i]) + " expected not to be a base character"); break; } if (!UCharacter.isBaseForm(base[i])) { errln("FAIL \\u" + hex(base[i]) + " expected to be a base character"); break; } } } /** * Tests for digit characters */ public void TestDigits() { int digits[] = {0x0030, 0x000662, 0x000F23, 0x000ED5, 0x002160}; //special characters not in the properties table int digits2[] = {0x3007, 0x004e00, 0x004e8c, 0x004e09, 0x0056d8, 0x004e94, 0x00516d, 0x4e03, 0x00516b, 0x004e5d}; int nondigits[] = {0x0010, 0x000041, 0x000122, 0x0068FE}; int digitvalues[] = {0, 2, 3, 5, 1}; int digitvalues2[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; int size = digits.length; for (int i = 0; i < size; i ++) { if (UCharacter.isDigit(digits[i]) && UCharacter.digit(digits[i]) != digitvalues[i]) { errln("FAIL \\u" + hex(digits[i]) + " expected digit with value " + digitvalues[i]); break; } } size = nondigits.length; for (int i = 0; i < size; i ++) if 
(UCharacter.isDigit(nondigits[i])) { errln("FAIL \\u" + hex(nondigits[i]) + " expected nondigit"); break; } size = digits2.length; for (int i = 0; i < 10; i ++) { if (UCharacter.isDigit(digits2[i]) && UCharacter.digit(digits2[i]) != digitvalues2[i]) { errln("FAIL \\u" + hex(digits2[i]) + " expected digit with value " + digitvalues2[i]); break; } } } /** * Tests for numeric characters */ public void TestNumeric() { if (UCharacter.getNumericValue(0x00BC) != -2) { errln("Numeric value of 0x00BC expected to be -2"); } for (int i = '0'; i < '9'; i ++) { int n1 = UCharacter.getNumericValue(i); double n2 = UCharacter.getUnicodeNumericValue(i); if (n1 != n2 || n1 != (i - '0')) { errln("Numeric value of " + (char)i + " expected to be " + (i - '0')); } } for (int i = 'A'; i < 'F'; i ++) { int n1 = UCharacter.getNumericValue(i); double n2 = UCharacter.getUnicodeNumericValue(i); if (n2 != UCharacter.NO_NUMERIC_VALUE || n1 != (i - 'A' + 10)) { errln("Numeric value of " + (char)i + " expected to be " + (i - 'A' + 10)); } } for (int i = 0xFF21; i < 0xFF26; i ++) { // testing fullwidth Latin characters A-F int n1 = UCharacter.getNumericValue(i); double n2 = UCharacter.getUnicodeNumericValue(i); if (n2 != UCharacter.NO_NUMERIC_VALUE || n1 != (i - 0xFF21 + 10)) { errln("Numeric value of " + (char)i + " expected to be " + (i - 0xFF21 + 10)); } } // testing han numbers int han[] = {0x96f6, 0, 0x58f9, 1, 0x8cb3, 2, 0x53c3, 3, 0x8086, 4, 0x4f0d, 5, 0x9678, 6, 0x67d2, 7, 0x634c, 8, 0x7396, 9, 0x5341, 10, 0x62fe, 10, 0x767e, 100, 0x4f70, 100, 0x5343, 1000, 0x4edf, 1000, 0x824c, 10000, 0x5104, 100000000}; for (int i = 0; i < han.length; i += 2) { if (UCharacter.getHanNumericValue(han[i]) != han[i + 1]) { errln("Numeric value of \\u" + Integer.toHexString(han[i]) + " expected to be " + han[i + 1]); } } } /** * Tests for version */ public void TestVersion() { if (!UCharacter.getUnicodeVersion().equals(VERSION_)) errln("FAIL expected: " + VERSION_ + " got: " + 
UCharacter.getUnicodeVersion()); } /** * Tests for control characters */ public void TestISOControl() { int control[] = {0x001b, 0x000097, 0x000082}; int noncontrol[] = {0x61, 0x000031, 0x0000e2}; int size = control.length; for (int i = 0; i < size; i ++) { if (!UCharacter.isISOControl(control[i])) { errln("FAIL 0x" + Integer.toHexString(control[i]) + " expected to be a control character"); break; } if (UCharacter.isISOControl(noncontrol[i])) { errln("FAIL 0x" + Integer.toHexString(noncontrol[i]) + " expected not to be a control character"); break; } logln("Ok 0x" + Integer.toHexString(control[i]) + " and 0x" + Integer.toHexString(noncontrol[i])); } } /** * Test Supplementary */ public void TestSupplementary() { for (int i = 0; i < 0x10000; i ++) { if (UCharacter.isSupplementary(i)) { errln("Codepoint \\u" + Integer.toHexString(i) + " expected not to be supplementary"); } } for (int i = 0x10000; i <= 0x10FFFF; i ++) { if (!UCharacter.isSupplementary(i)) { errln("Codepoint \\u" + Integer.toHexString(i) + " expected to be supplementary"); } } } /** * Test mirroring */ public void TestMirror() { if (!(UCharacter.isMirrored(0x28) && UCharacter.isMirrored(0xbb) && UCharacter.isMirrored(0x2045) && UCharacter.isMirrored(0x232a) && !UCharacter.isMirrored(0x27) && !UCharacter.isMirrored(0x61) && !UCharacter.isMirrored(0x284) && !UCharacter.isMirrored(0x3400))) { errln("isMirrored() does not work correctly"); } if (!(UCharacter.getMirror(0x3c) == 0x3e && UCharacter.getMirror(0x5d) == 0x5b && UCharacter.getMirror(0x208d) == 0x208e && UCharacter.getMirror(0x3017) == 0x3016 && UCharacter.getMirror(0xbb) == 0xab && UCharacter.getMirror(0x2215) == 0x29F5 && UCharacter.getMirror(0x29F5) == 0x2215 && /* large delta between the code points */ UCharacter.getMirror(0x2e) == 0x2e && UCharacter.getMirror(0x6f3) == 0x6f3 && UCharacter.getMirror(0x301c) == 0x301c && UCharacter.getMirror(0xa4ab) == 0xa4ab && /* see Unicode Corrigendum #6 at http://www.unicode.org/versions/corrigendum6.html */ 
              UCharacter.getMirror(0x2018) == 0x2018 &&
              UCharacter.getMirror(0x201b) == 0x201b &&
              UCharacter.getMirror(0x301d) == 0x301d)) {
            errln("getMirror() does not work correctly");
        }

        /* verify that Bidi_Mirroring_Glyph roundtrips */
        UnicodeSet set=new UnicodeSet("[:Bidi_Mirrored:]");
        UnicodeSetIterator iter=new UnicodeSetIterator(set);
        int start, end, c2, c3;
        while(iter.nextRange() && (start=iter.codepoint)>=0) {
            end=iter.codepointEnd;
            do {
                c2=UCharacter.getMirror(start);
                c3=UCharacter.getMirror(c2);
                if(c3!=start) {
                    errln("getMirror() does not roundtrip: U+"+hex(start)+"->U+"+hex(c2)+"->U+"+hex(c3));
                }
            } while(++start<=end);
        }

        // verify that Unicode Corrigendum #6 reverts the mirrored status of
        // the following code points
        if (UCharacter.isMirrored(0x2018) ||
            UCharacter.isMirrored(0x201d) ||
            UCharacter.isMirrored(0x201f) ||
            UCharacter.isMirrored(0x301e)) {
            errln("Unicode Corrigendum #6 conflict, one or more of 2018/201d/201f/301e has mirrored property");
        }
    }

    /**
     * Tests for printable characters
     */
    public void TestPrint()
    {
        int printable[] = {0x0042, 0x00005f, 0x002014};
        int nonprintable[] = {0x200c, 0x00009f, 0x00001b};

        int size = printable.length;
        for (int i = 0; i < size; i ++) {
            if (!UCharacter.isPrintable(printable[i])) {
                errln("FAIL \\u" + hex(printable[i]) +
                      " expected to be a printable character");
                break;
            }
            if (UCharacter.isPrintable(nonprintable[i])) {
                errln("FAIL \\u" + hex(nonprintable[i]) +
                      " expected not to be a printable character");
                break;
            }
            logln("Ok \\u" + hex(printable[i]) + " and \\u" +
                  hex(nonprintable[i]));
        }

        // test all ISO 8 controls
        for (int ch = 0; ch <= 0x9f; ++ ch) {
            if (ch == 0x20) {
                // skip ASCII graphic characters and continue with DEL
                ch = 0x7f;
            }
            if (UCharacter.isPrintable(ch)) {
                errln("Fail \\u" + hex(ch) +
                      " is an ISO 8 control character hence not printable\n");
            }
        }

        /* test all Latin-1 graphic characters */
        for (int ch = 0x20; ch <= 0xff; ++ ch) {
            if (ch == 0x7f) {
                ch = 0xa0;
            }
            if (!UCharacter.isPrintable(ch)
                && ch != 0x00AD/* Unicode 4.0 changed the definition
of soft hyphen to be a Cf*/) {
                errln("Fail \\u" + hex(ch) +
                      " is a Latin-1 graphic character\n");
            }
        }
    }

    /**
     * Testing for identifier characters
     */
    public void TestIdentifier()
    {
        int unicodeidstart[] = {0x0250, 0x0000e2, 0x000061};
        int nonunicodeidstart[] = {0x2000, 0x00000a, 0x002019};
        int unicodeidpart[] = {0x005f, 0x000032, 0x000045};
        int nonunicodeidpart[] = {0x2030, 0x0000a3, 0x000020};
        int idignore[] = {0x0006, 0x0010, 0x206b};
        int nonidignore[] = {0x0075, 0x0000a3, 0x000061};

        int size = unicodeidstart.length;
        for (int i = 0; i < size; i ++) {
            if (!UCharacter.isUnicodeIdentifierStart(unicodeidstart[i])) {
                errln("FAIL \\u" + hex(unicodeidstart[i]) +
                      " expected to be a unicode identifier start character");
                break;
            }
            if (UCharacter.isUnicodeIdentifierStart(nonunicodeidstart[i])) {
                errln("FAIL \\u" + hex(nonunicodeidstart[i]) +
                      " expected not to be a unicode identifier start " +
                      "character");
                break;
            }
            if (!UCharacter.isUnicodeIdentifierPart(unicodeidpart[i])) {
                errln("FAIL \\u" + hex(unicodeidpart[i]) +
                      " expected to be a unicode identifier part character");
                break;
            }
            if (UCharacter.isUnicodeIdentifierPart(nonunicodeidpart[i])) {
                errln("FAIL \\u" + hex(nonunicodeidpart[i]) +
                      " expected not to be a unicode identifier part " +
                      "character");
                break;
            }
            if (!UCharacter.isIdentifierIgnorable(idignore[i])) {
                errln("FAIL \\u" + hex(idignore[i]) +
                      " expected to be an ignorable unicode character");
                break;
            }
            if (UCharacter.isIdentifierIgnorable(nonidignore[i])) {
                errln("FAIL \\u" + hex(nonidignore[i]) +
                      " expected not to be an ignorable unicode character");
                break;
            }
            logln("Ok \\u" + hex(unicodeidstart[i]) + " and \\u" +
                  hex(nonunicodeidstart[i]) + " and \\u" +
                  hex(unicodeidpart[i]) + " and \\u" +
                  hex(nonunicodeidpart[i]) + " and \\u" +
                  hex(idignore[i]) + " and \\u" + hex(nonidignore[i]));
        }
    }

    /**
     * Tests for the character types, direction.
     * This method reads in the UnicodeData.txt file for testing purposes. A
     * default path is provided relative to the src path; however, the user
     * could set a system property to change the directory path.
     * e.g. java -DUnicodeData="data_directory_path"
     * com.ibm.icu.dev.test.lang.UCharacterTest
     */
    public void TestUnicodeData()
    {
        // these are the 2-char category types used in the UnicodeData file
        final String TYPE =
            "LuLlLtLmLoMnMeMcNdNlNoZsZlZpCcCfCoCsPdPsPePcPoSmScSkSoPiPf";

        // direction types used in the UnicodeData file
        // padded by spaces to make each type size 4
        final String DIR =
            "L   R   EN  ES  ET  AN  CS  B   S   WS  ON  LRE LRO AL  RLE RLO PDF NSM BN  ";

        final int LASTUNICODECHAR = 0xFFFD;
        int ch = 0,
            index = 0,
            type = 0,
            dir = 0;

        try {
            BufferedReader input = TestUtil.getDataReader(
                                                "unicode/UnicodeData.txt");
            int numErrors = 0;

            while (ch != LASTUNICODECHAR) {
                String s = input.readLine();
                if(s.length()<4 || s.startsWith("#")) {
                    continue;
                }
                // getting the unicode character, its type and its direction
                ch = Integer.parseInt(s.substring(0, 4), 16);
                index = s.indexOf(';', 5);
                String t = s.substring(index + 1, index + 3);
                index += 4;
                int oldindex = index;
                index = s.indexOf(';', index);
                int cc = Integer.parseInt(s.substring(oldindex, index));
                oldindex = index + 1;
                index = s.indexOf(';', oldindex);
                String d = s.substring(oldindex, index);

                for (int i = 0; i < 6; i ++) {
                    index = s.indexOf(';', index + 1);
                    // skipping to the 11th field
                }
                // iso comment
                oldindex = index + 1;
                index = s.indexOf(';', oldindex);
                String isocomment = s.substring(oldindex, index);
                // uppercase
                oldindex = index + 1;
                index = s.indexOf(';', oldindex);
                String upper = s.substring(oldindex, index);
                // lowercase
                oldindex = index + 1;
                index = s.indexOf(';', oldindex);
                String lower = s.substring(oldindex, index);
                // titlecase last element
                oldindex = index + 1;
                String title = s.substring(oldindex);

                // testing the category
                // we override the general category of some control
                // characters
                type = TYPE.indexOf(t);
                if (type < 0)
                    type = 0;
                else
                    type = (type >> 1) + 1;
                if (UCharacter.getType(ch) != type) {
                    errln("FAIL \\u" + hex(ch) + " expected type " + type);
                    break;
                }

                if (UCharacter.getIntPropertyValue(ch,
                           UProperty.GENERAL_CATEGORY_MASK) != (1 << type)) {
                    errln("error: getIntPropertyValue(\\u" +
                          Integer.toHexString(ch) +
                          ", UProperty.GENERAL_CATEGORY_MASK) != " +
                          "getMask(getType(ch))");
                }

                // testing combining class
                if (UCharacter.getCombiningClass(ch) != cc) {
                    errln("FAIL \\u" + hex(ch) + " expected combining " +
                          "class " + cc);
                    break;
                }

                // testing the direction
                // pad single-letter types to the 4-character field width of
                // DIR so that e.g. "S" does not match inside "ES"
                if (d.length() == 1)
                    d = d + "   ";

                dir = DIR.indexOf(d) >> 2;
                if (UCharacter.getDirection(ch) != dir) {
                    errln("FAIL \\u" + hex(ch) +
                          " expected direction " + dir + " but got " +
                          UCharacter.getDirection(ch));
                    break;
                }

                byte bdir = (byte)dir;
                if (UCharacter.getDirectionality(ch) != bdir) {
                    errln("FAIL \\u" + hex(ch) +
                          " expected directionality " + bdir + " but got " +
                          UCharacter.getDirectionality(ch));
                    break;
                }

                // testing iso comment
                try{
                    String comment = UCharacter.getISOComment(ch);
                    if (comment == null) {
                        comment = "";
                    }
                    if (!comment.equals(isocomment)) {
                        errln("FAIL \\u" + hex(ch) +
                              " expected iso comment " + isocomment);
                        break;
                    }
                }catch(Exception e){
                    if(e.getMessage().indexOf("unames.icu") >= 0){
                        numErrors++;
                    }else{
                        throw e;
                    }
                }

                int tempchar = ch;
                if (upper.length() > 0) {
                    tempchar = Integer.parseInt(upper, 16);
                }
                if (UCharacter.toUpperCase(ch) != tempchar) {
                    errln("FAIL \\u" + Utility.hex(ch, 4)
                          + " expected uppercase \\u"
                          + Utility.hex(tempchar, 4));
                    break;
                }
                tempchar = ch;
                if (lower.length() > 0) {
                    tempchar = Integer.parseInt(lower, 16);
                }
                if (UCharacter.toLowerCase(ch) != tempchar) {
                    errln("FAIL \\u" + Utility.hex(ch, 4)
                          + " expected lowercase \\u"
                          + Utility.hex(tempchar, 4));
                    break;
                }
                tempchar = ch;
                if (title.length() > 0) {
                    tempchar = Integer.parseInt(title, 16);
                }
                if (UCharacter.toTitleCase(ch) != tempchar) {
                    errln("FAIL \\u" + Utility.hex(ch, 4)
                          + " expected titlecase \\u"
                          + Utility.hex(tempchar, 4));
                    break;
                }
            }
            input.close();
            if(numErrors > 0){
                warnln("Could not find unames.icu");
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }

        if (UCharacter.UnicodeBlock.of(0x0041) !=
UCharacter.UnicodeBlock.BASIC_LATIN || UCharacter.getIntPropertyValue(0x41, UProperty.BLOCK) != UCharacter.UnicodeBlock.BASIC_LATIN.getID()) { errln("UCharacter.UnicodeBlock.of(\\u0041) property failed! " + "Expected : " + UCharacter.UnicodeBlock.BASIC_LATIN.getID() + " got " + UCharacter.UnicodeBlock.of(0x0041)); } // sanity check on repeated properties for (ch = 0xfffe; ch <= 0x10ffff;) { type = UCharacter.getType(ch); if (UCharacter.getIntPropertyValue(ch, UProperty.GENERAL_CATEGORY_MASK) != (1 << type)) { errln("error: UCharacter.getIntPropertyValue(\\u" + Integer.toHexString(ch) + ", UProperty.GENERAL_CATEGORY_MASK) != " + "getMask(getType())"); } if (type != UCharacterCategory.UNASSIGNED) { errln("error: UCharacter.getType(\\u" + Utility.hex(ch, 4) + " != UCharacterCategory.UNASSIGNED (returns " + UCharacterCategory.toString(UCharacter.getType(ch)) + ")"); } if ((ch & 0xffff) == 0xfffe) { ++ ch; } else { ch += 0xffff; } } // test that PUA is not "unassigned" for(ch = 0xe000; ch <= 0x10fffd;) { type = UCharacter.getType(ch); if (UCharacter.getIntPropertyValue(ch, UProperty.GENERAL_CATEGORY_MASK) != (1 << type)) { errln("error: UCharacter.getIntPropertyValue(\\u" + Integer.toHexString(ch) + ", UProperty.GENERAL_CATEGORY_MASK) != " + "getMask(getType())"); } if (type == UCharacterCategory.UNASSIGNED) { errln("error: UCharacter.getType(\\u" + Utility.hex(ch, 4) + ") == UCharacterCategory.UNASSIGNED"); } else if (type != UCharacterCategory.PRIVATE_USE) { logln("PUA override: UCharacter.getType(\\u" + Utility.hex(ch, 4) + ")=" + type); } if (ch == 0xf8ff) { ch = 0xf0000; } else if (ch == 0xffffd) { ch = 0x100000; } else { ++ ch; } } } /** * Test for the character names */ public void TestNames() { try{ int length = UCharacterName.getInstance().getMaxCharNameLength(); if (length < 83) { // Unicode 3.2 max char name length errln("getMaxCharNameLength()=" + length + " is too short"); } // ### TODO same tests for max ISO comment length as for max name length int c[] = 
{0x0061, //LATIN SMALL LETTER A 0x000284, //LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK 0x003401, //CJK UNIFIED IDEOGRAPH-3401 0x007fed, //CJK UNIFIED IDEOGRAPH-7FED 0x00ac00, //HANGUL SYLLABLE GA 0x00d7a3, //HANGUL SYLLABLE HIH 0x00d800, 0x00dc00, //LINEAR B SYLLABLE B008 A 0xff08, //FULLWIDTH LEFT PARENTHESIS 0x00ffe5, //FULLWIDTH YEN SIGN 0x00ffff, //null 0x0023456 //CJK UNIFIED IDEOGRAPH-23456 }; String name[] = { "LATIN SMALL LETTER A", "LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK", "CJK UNIFIED IDEOGRAPH-3401", "CJK UNIFIED IDEOGRAPH-7FED", "HANGUL SYLLABLE GA", "HANGUL SYLLABLE HIH", "", "", "FULLWIDTH LEFT PARENTHESIS", "FULLWIDTH YEN SIGN", "", "CJK UNIFIED IDEOGRAPH-23456" }; String oldname[] = {"", "LATIN SMALL LETTER DOTLESS J BAR HOOK", "", "", "", "", "", "", "FULLWIDTH OPENING PARENTHESIS", "", "", ""}; String extendedname[] = {"LATIN SMALL LETTER A", "LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK", "CJK UNIFIED IDEOGRAPH-3401", "CJK UNIFIED IDEOGRAPH-7FED", "HANGUL SYLLABLE GA", "HANGUL SYLLABLE HIH", "", "", "FULLWIDTH LEFT PARENTHESIS", "FULLWIDTH YEN SIGN", "", "CJK UNIFIED IDEOGRAPH-23456"}; int size = c.length; String str; int uc; for (int i = 0; i < size; i ++) { // modern Unicode character name str = UCharacter.getName(c[i]); if ((str == null && name[i].length() > 0) || (str != null && !str.equals(name[i]))) { errln("FAIL \\u" + hex(c[i]) + " expected name " + name[i]); break; } // 1.0 Unicode character name str = UCharacter.getName1_0(c[i]); if ((str == null && oldname[i].length() > 0) || (str != null && !str.equals(oldname[i]))) { errln("FAIL \\u" + hex(c[i]) + " expected 1.0 name " + oldname[i]); break; } // extended character name str = UCharacter.getExtendedName(c[i]); if (str == null || !str.equals(extendedname[i])) { errln("FAIL \\u" + hex(c[i]) + " expected extended name " + extendedname[i]); break; } // retrieving unicode character from modern name uc = UCharacter.getCharFromName(name[i]); if (uc != c[i] && 
                    name[i].length() != 0) {
                    errln("FAIL " + name[i] + " expected character \\u" +
                          hex(c[i]));
                    break;
                }

                // retrieving unicode character from 1.0 name
                uc = UCharacter.getCharFromName1_0(oldname[i]);
                if (uc != c[i] && oldname[i].length() != 0) {
                    errln("FAIL " + oldname[i] + " expected 1.0 character \\u" +
                          hex(c[i]));
                    break;
                }

                // retrieving unicode character from extended name
                uc = UCharacter.getCharFromExtendedName(extendedname[i]);
                if (uc != c[i] && i != 0 && (i == 1 || i == 6)) {
                    errln("FAIL " + extendedname[i] +
                          " expected extended character \\u" + hex(c[i]));
                    break;
                }
            }

            // test that getCharFromName works with mixed-case names
            // (new in 2.0)
            if (0x61 != UCharacter.getCharFromName("LATin smALl letTER A")) {
                errln("FAIL: 'LATin smALl letTER A' should result in character "
                      + "U+0061");
            }

            if (getInclusion() >= 5) {
                // extra testing different from icu
                for (int i = UCharacter.MIN_VALUE; i < UCharacter.MAX_VALUE; i ++) {
                    str = UCharacter.getName(i);
                    if (str != null && UCharacter.getCharFromName(str) != i) {
                        errln("FAIL \\u" + hex(i) + " " + str +
                              " retrieval of name and vice versa" );
                        break;
                    }
                }
            }

            // Test getCharNameCharacters
            if (getInclusion() >= 10) {
                boolean map[] = new boolean[256];

                UnicodeSet set = new UnicodeSet(1, 0); // empty set
                UnicodeSet dumb = new UnicodeSet(1, 0); // empty set

                // uprv_getCharNameCharacters() will likely return more lowercase
                // letters than actual character names contain because
                // it includes all the characters in lowercased names of
                // general categories, for the full possible set of extended names.
UCharacterName.getInstance().getCharNameCharacters(set); // build set the dumb (but sure-fire) way Arrays.fill(map, false); int maxLength = 0; for (int cp = 0; cp < 0x110000; ++ cp) { String n = UCharacter.getExtendedName(cp); int len = n.length(); if (len > maxLength) { maxLength = len; } for (int i = 0; i < len; ++ i) { char ch = n.charAt(i); if (!map[ch & 0xff]) { dumb.add(ch); map[ch & 0xff] = true; } } } length = UCharacterName.getInstance().getMaxCharNameLength(); if (length != maxLength) { errln("getMaxCharNameLength()=" + length + " differs from the maximum length " + maxLength + " of all extended names"); } // compare the sets. Where is my uset_equals?!! boolean ok = true; for (int i = 0; i < 256; ++ i) { if (set.contains(i) != dumb.contains(i)) { if (0x61 <= i && i <= 0x7a // a-z && set.contains(i) && !dumb.contains(i)) { // ignore lowercase a-z that are in set but not in dumb ok = true; } else { ok = false; break; } } } String pattern1 = set.toPattern(true); String pattern2 = dumb.toPattern(true); if (!ok) { errln("FAIL: getCharNameCharacters() returned " + pattern1 + " expected " + pattern2 + " (too many lowercase a-z are ok)"); } else { logln("Ok: getCharNameCharacters() returned " + pattern1); } } // improve code coverage String expected = "LATIN SMALL LETTER A|LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK|"+ "CJK UNIFIED IDEOGRAPH-3401|CJK UNIFIED IDEOGRAPH-7FED|HANGUL SYLLABLE GA|"+ "HANGUL SYLLABLE HIH|LINEAR B SYLLABLE B008 A|FULLWIDTH LEFT PARENTHESIS|"+ "FULLWIDTH YEN SIGN|"+ "null|"+ // getName returns null because 0xFFFF does not have a name, but has an extended name! 
"CJK UNIFIED IDEOGRAPH-23456"; String separator= "|"; String source = Utility.valueOf(c); String result = UCharacter.getName(source, separator); if(!result.equals(expected)){ errln("UCharacter.getName did not return the expected result.\n\t Expected: "+ expected+"\n\t Got: "+ result); } }catch(IllegalArgumentException e){ if(e.getMessage().indexOf("unames.icu") >= 0){ warnln("Could not find unames.icu"); }else{ throw e; } } } /** * Testing name iteration */ public void TestNameIteration()throws Exception { try { ValueIterator iterator = UCharacter.getExtendedNameIterator(); ValueIterator.Element element = new ValueIterator.Element(); ValueIterator.Element old = new ValueIterator.Element(); // testing subrange iterator.setRange(-10, -5); if (iterator.next(element)) { errln("Fail, expected iterator to return false when range is set outside the meaningful range"); } iterator.setRange(0x110000, 0x111111); if (iterator.next(element)) { errln("Fail, expected iterator to return false when range is set outside the meaningful range"); } try { iterator.setRange(50, 10); errln("Fail, expected exception when encountered invalid range"); } catch (Exception e) { } iterator.setRange(-10, 10); if (!iterator.next(element) || element.integer != 0) { errln("Fail, expected iterator to return 0 when range start limit is set outside the meaningful range"); } iterator.setRange(0x10FFFE, 0x200000); int last = 0; while (iterator.next(element)) { last = element.integer; } if (last != 0x10FFFF) { errln("Fail, expected iterator to return 0x10FFFF when range end limit is set outside the meaningful range"); } iterator = UCharacter.getNameIterator(); iterator.setRange(0xF, 0x45); while (iterator.next(element)) { if (element.integer <= old.integer) { errln("FAIL next returned a less codepoint \\u" + Integer.toHexString(element.integer) + " than \\u" + Integer.toHexString(old.integer)); break; } if (!UCharacter.getName(element.integer).equals(element.value)) { errln("FAIL next codepoint \\u" + 
Integer.toHexString(element.integer) + " does not have the expected name " + UCharacter.getName(element.integer) + " instead have the name " + (String)element.value); break; } old.integer = element.integer; } iterator.reset(); iterator.next(element); if (element.integer != 0x20) { errln("FAIL reset in iterator"); } iterator.setRange(0, 0x110000); old.integer = 0; while (iterator.next(element)) { if (element.integer != 0 && element.integer <= old.integer) { errln("FAIL next returned a less codepoint \\u" + Integer.toHexString(element.integer) + " than \\u" + Integer.toHexString(old.integer)); break; } if (!UCharacter.getName(element.integer).equals(element.value)) { errln("FAIL next codepoint \\u" + Integer.toHexString(element.integer) + " does not have the expected name " + UCharacter.getName(element.integer) + " instead have the name " + (String)element.value); break; } for (int i = old.integer + 1; i < element.integer; i ++) { if (UCharacter.getName(i) != null) { errln("FAIL between codepoints are not null \\u" + Integer.toHexString(old.integer) + " and " + Integer.toHexString(element.integer) + " has " + Integer.toHexString(i) + " with a name " + UCharacter.getName(i)); break; } } old.integer = element.integer; } iterator = UCharacter.getExtendedNameIterator(); old.integer = 0; while (iterator.next(element)) { if (element.integer != 0 && element.integer != old.integer) { errln("FAIL next returned a codepoint \\u" + Integer.toHexString(element.integer) + " different from \\u" + Integer.toHexString(old.integer)); break; } if (!UCharacter.getExtendedName(element.integer).equals( element.value)) { errln("FAIL next codepoint \\u" + Integer.toHexString(element.integer) + " name should be " + UCharacter.getExtendedName(element.integer) + " instead of " + (String)element.value); break; } old.integer++; } iterator = UCharacter.getName1_0Iterator(); old.integer = 0; while (iterator.next(element)) { logln(Integer.toHexString(element.integer) + " " + (String)element.value); 
                if (element.integer != 0 && element.integer <= old.integer) {
                    errln("FAIL next returned a smaller codepoint \\u" +
                          Integer.toHexString(element.integer) + " than \\u" +
                          Integer.toHexString(old.integer));
                    break;
                }
                if (!element.value.equals(UCharacter.getName1_0(
                                                          element.integer))) {
                    errln("FAIL next codepoint \\u" +
                          Integer.toHexString(element.integer) +
                          " name cannot be null");
                    break;
                }
                for (int i = old.integer + 1; i < element.integer; i ++) {
                    if (UCharacter.getName1_0(i) != null) {
                        errln("FAIL between codepoints are not null \\u" +
                              Integer.toHexString(old.integer) + " and " +
                              Integer.toHexString(element.integer) + " has " +
                              Integer.toHexString(i) + " with a name " +
                              UCharacter.getName1_0(i));
                        break;
                    }
                }
                old.integer = element.integer;
            }
        } catch(Exception e){
            // !!! wouldn't preflighting be simpler? This looks like
            // it is effectively doing that. It seems that for every
            // true error the code will call errln, which will throw the error,
            // which this will catch and then rethrow. It just seems
            // cumbersome.
            if(e.getMessage().indexOf("unames.icu") >= 0){
                warnln("Could not find unames.icu");
            } else {
                errln(e.getMessage());
            }
        }
    }

    /**
     * Testing for illegal characters
     */
    public void TestIsLegal()
    {
        int illegal[] = {0xFFFE, 0x00FFFF, 0x005FFFE, 0x005FFFF, 0x0010FFFE,
                         0x0010FFFF, 0x110000, 0x00FDD0, 0x00FDDF, 0x00FDE0,
                         0x00FDEF, 0xD800, 0xDC00, -1};
        int legal[] = {0x61, 0x00FFFD, 0x0010000, 0x005FFFD, 0x0060000,
                       0x0010FFFD, 0xFDCF, 0x00FDF0};
        for (int count = 0; count < illegal.length; count ++) {
            if (UCharacter.isLegal(illegal[count])) {
                errln("FAIL \\u" + hex(illegal[count]) +
                      " is not a legal character");
            }
        }

        for (int count = 0; count < legal.length; count ++) {
            if (!UCharacter.isLegal(legal[count])) {
                errln("FAIL \\u" + hex(legal[count]) +
                      " is a legal character");
            }
        }

        String illegalStr = "This is an illegal string ";
        String legalStr = "This is a legal string ";

        for (int count = 0; count < illegal.length; count ++) {
            StringBuffer str = new StringBuffer(illegalStr);
            if (illegal[count] < 0x10000) {
                str.append((char)illegal[count]);
            }
            else {
                char lead = UTF16.getLeadSurrogate(illegal[count]);
                char trail = UTF16.getTrailSurrogate(illegal[count]);
                str.append(lead);
                str.append(trail);
            }
            if (UCharacter.isLegal(str.toString())) {
                errln("FAIL " + hex(str.toString()) +
                      " is not a legal string");
            }
        }

        for (int count = 0; count < legal.length; count ++) {
            StringBuffer str = new StringBuffer(legalStr);
            if (legal[count] < 0x10000) {
                str.append((char)legal[count]);
            }
            else {
                char lead = UTF16.getLeadSurrogate(legal[count]);
                char trail = UTF16.getTrailSurrogate(legal[count]);
                str.append(lead);
                str.append(trail);
            }
            if (!UCharacter.isLegal(str.toString())) {
                errln("FAIL " + hex(str.toString()) + " is a legal string");
            }
        }
    }

    /**
     * Test getCodePoint
     */
    public void TestCodePoint()
    {
        int ch = 0x10000;
        for (char i = 0xD800; i < 0xDC00; i ++) {
            for (char j = 0xDC00; j <= 0xDFFF; j ++) {
                if (UCharacter.getCodePoint(i, j) != ch) {
                    errln("Error getting codepoint for surrogate " +
                          "characters 
\\u" + Integer.toHexString(i) + " \\u" +
                          Integer.toHexString(j));
                }
                ch ++;
            }
        }
        try
        {
            UCharacter.getCodePoint((char)0xD7ff, (char)0xDC00);
            errln("Invalid surrogate characters should not form a " +
                  "supplementary");
        } catch(Exception e) {
        }
        for (char i = 0; i < 0xFFFF; i++) {
            if (i == 0xFFFE ||
                (i >= 0xD800 && i <= 0xDFFF) ||
                (i >= 0xFDD0 && i <= 0xFDEF)) {
                // not a character
                try {
                    UCharacter.getCodePoint(i);
                    errln("Not a character is not a valid codepoint");
                } catch (Exception e) {
                }
            }
            else {
                if (UCharacter.getCodePoint(i) != i) {
                    errln("A valid codepoint should return itself");
                }
            }
        }
    }

    /**
     * This method is a little different from the type test in icu4c.
     * But combined with TestUnicodeData, they basically do the same thing.
     */
    public void TestIteration()
    {
        int limit = 0;
        int prevtype = -1;
        int shouldBeDir;
        int test[][]={{0x41, UCharacterCategory.UPPERCASE_LETTER},
                      {0x308, UCharacterCategory.NON_SPACING_MARK},
                      {0xfffe, UCharacterCategory.GENERAL_OTHER_TYPES},
                      {0xe0041, UCharacterCategory.FORMAT},
                      {0xeffff, UCharacterCategory.UNASSIGNED}};

        // default Bidi classes for unassigned code points
        int defaultBidi[][]={{ 0x0590, UCharacterDirection.LEFT_TO_RIGHT },
                             { 0x0600, UCharacterDirection.RIGHT_TO_LEFT },
                             { 0x07C0, UCharacterDirection.RIGHT_TO_LEFT_ARABIC },
                             { 0x0900, UCharacterDirection.RIGHT_TO_LEFT },
                             { 0xFB1D, UCharacterDirection.LEFT_TO_RIGHT },
                             { 0xFB50, UCharacterDirection.RIGHT_TO_LEFT },
                             { 0xFE00, UCharacterDirection.RIGHT_TO_LEFT_ARABIC },
                             { 0xFE70, UCharacterDirection.LEFT_TO_RIGHT },
                             { 0xFF00, UCharacterDirection.RIGHT_TO_LEFT_ARABIC },
                             { 0x10800, UCharacterDirection.LEFT_TO_RIGHT },
                             { 0x11000, UCharacterDirection.RIGHT_TO_LEFT },
                             { 0x110000, UCharacterDirection.LEFT_TO_RIGHT }};

        RangeValueIterator iterator = UCharacter.getTypeIterator();
        RangeValueIterator.Element result = new RangeValueIterator.Element();
        while (iterator.next(result)) {
            if (result.start != limit) {
                errln("UCharacterIteration failed: Ranges not continuous " +
                      "0x" +
Integer.toHexString(result.start)); } limit = result.limit; if (result.value == prevtype) { errln("Type of the next set of enumeration should be different"); } prevtype = result.value; for (int i = result.start; i < limit; i ++) { int temptype = UCharacter.getType(i); if (temptype != result.value) { errln("UCharacterIteration failed: Codepoint \\u" + Integer.toHexString(i) + " should be of type " + temptype + " not " + result.value); } } for (int i = 0; i < test.length; ++ i) { if (result.start <= test[i][0] && test[i][0] < result.limit) { if (result.value != test[i][1]) { errln("error: getTypes() has range [" + Integer.toHexString(result.start) + ", " + Integer.toHexString(result.limit) + "] with type " + result.value + " instead of [" + Integer.toHexString(test[i][0]) + ", " + Integer.toHexString(test[i][1])); } } } // LineBreak.txt specifies: // # - Assigned characters that are not listed explicitly are given the value // # "AL". // # - Unassigned characters are given the value "XX". // // PUA characters are listed explicitly with "XX". // Verify that no assigned character has "XX". if (result.value != UCharacterCategory.UNASSIGNED && result.value != UCharacterCategory.PRIVATE_USE) { int c = result.start; while (c < result.limit) { if (0 == UCharacter.getIntPropertyValue(c, UProperty.LINE_BREAK)) { logln("error UProperty.LINE_BREAK(assigned \\u" + Utility.hex(c, 4) + ")=XX"); } ++ c; } } /* * Verify default Bidi classes. * For recent Unicode versions, see UCD.html. * * For older Unicode versions: * See table 3-7 "Bidirectional Character Types" in UAX #9. * http://www.unicode.org/reports/tr9/ * * See also DerivedBidiClass.txt for Cn code points! * * Unicode 4.0.1/Public Review Issue #28 (http://www.unicode.org/review/resolved-pri.html) * changed some default values. * In particular, non-characters and unassigned Default Ignorable Code Points * change from L to BN. * * UCD.html version 4.0.1 does not yet reflect these changes. 
*/ if (result.value == UCharacterCategory.UNASSIGNED || result.value == UCharacterCategory.PRIVATE_USE) { int c = result.start; for (int i = 0; i < defaultBidi.length && c < result.limit; ++ i) { if (c < defaultBidi[i][0]) { while (c < result.limit && c < defaultBidi[i][0]) { // TODO change to public UCharacter.isNonCharacter(c) once it's available if(com.ibm.icu.impl.UCharacterUtility.isNonCharacter(c) || UCharacter.hasBinaryProperty(c, UProperty.DEFAULT_IGNORABLE_CODE_POINT)) { shouldBeDir=UCharacter.BOUNDARY_NEUTRAL; } else { shouldBeDir=defaultBidi[i][1]; } if (UCharacter.getDirection(c) != shouldBeDir || UCharacter.getIntPropertyValue(c, UProperty.BIDI_CLASS) != shouldBeDir) { errln("error: getDirection(unassigned/PUA " + Integer.toHexString(c) + ") should be " + shouldBeDir); } ++ c; } } } } } iterator.reset(); if (iterator.next(result) == false || result.start != 0) { System.out.println("result " + result.start); errln("UCharacterIteration reset() failed"); } } /** * Testing getAge */ public void TestGetAge() { int ages[] = {0x41, 1, 1, 0, 0, 0xffff, 1, 1, 0, 0, 0x20ab, 2, 0, 0, 0, 0x2fffe, 2, 0, 0, 0, 0x20ac, 2, 1, 0, 0, 0xfb1d, 3, 0, 0, 0, 0x3f4, 3, 1, 0, 0, 0x10300, 3, 1, 0, 0, 0x220, 3, 2, 0, 0, 0xff60, 3, 2, 0, 0}; for (int i = 0; i < ages.length; i += 5) { VersionInfo age = UCharacter.getAge(ages[i]); if (age != VersionInfo.getInstance(ages[i + 1], ages[i + 2], ages[i + 3], ages[i + 4])) { errln("error: getAge(\\u" + Integer.toHexString(ages[i]) + ") == " + age.toString() + " instead of " + ages[i + 1] + "." + ages[i + 2] + "." + ages[i + 3] + "." 
+ ages[i + 4]); } } } /** * Test binary non core properties */ public void TestAdditionalProperties() { // test data for hasBinaryProperty() int props[][] = { // code point, property { 0x0627, UProperty.ALPHABETIC, 1 }, { 0x1034a, UProperty.ALPHABETIC, 1 }, { 0x2028, UProperty.ALPHABETIC, 0 }, { 0x0066, UProperty.ASCII_HEX_DIGIT, 1 }, { 0x0067, UProperty.ASCII_HEX_DIGIT, 0 }, { 0x202c, UProperty.BIDI_CONTROL, 1 }, { 0x202f, UProperty.BIDI_CONTROL, 0 }, { 0x003c, UProperty.BIDI_MIRRORED, 1 }, { 0x003d, UProperty.BIDI_MIRRORED, 0 }, /* see Unicode Corrigendum #6 at http://www.unicode.org/versions/corrigendum6.html */ { 0x2018, UProperty.BIDI_MIRRORED, 0 }, { 0x201d, UProperty.BIDI_MIRRORED, 0 }, { 0x201f, UProperty.BIDI_MIRRORED, 0 }, { 0x301e, UProperty.BIDI_MIRRORED, 0 }, { 0x058a, UProperty.DASH, 1 }, { 0x007e, UProperty.DASH, 0 }, { 0x0c4d, UProperty.DIACRITIC, 1 }, { 0x3000, UProperty.DIACRITIC, 0 }, { 0x0e46, UProperty.EXTENDER, 1 }, { 0x0020, UProperty.EXTENDER, 0 }, { 0xfb1d, UProperty.FULL_COMPOSITION_EXCLUSION, 1 }, { 0x1d15f, UProperty.FULL_COMPOSITION_EXCLUSION, 1 }, { 0xfb1e, UProperty.FULL_COMPOSITION_EXCLUSION, 0 }, { 0x110a, UProperty.NFD_INERT, 1 }, /* Jamo L */ { 0x0308, UProperty.NFD_INERT, 0 }, { 0x1164, UProperty.NFKD_INERT, 1 }, /* Jamo V */ { 0x1d79d, UProperty.NFKD_INERT, 0 }, /* math compat version of xi */ { 0x0021, UProperty.NFC_INERT, 1 }, /* ! */ { 0x0061, UProperty.NFC_INERT, 0 }, /* a */ { 0x00e4, UProperty.NFC_INERT, 0 }, /* a-umlaut */ { 0x0102, UProperty.NFC_INERT, 0 }, /* a-breve */ { 0xac1c, UProperty.NFC_INERT, 0 }, /* Hangul LV */ { 0xac1d, UProperty.NFC_INERT, 1 }, /* Hangul LVT */ { 0x1d79d, UProperty.NFKC_INERT, 0 }, /* math compat version of xi */ { 0x2a6d6, UProperty.NFKC_INERT, 1 }, /* Han, last of CJK ext. 
B */ { 0x00e4, UProperty.SEGMENT_STARTER, 1 }, { 0x0308, UProperty.SEGMENT_STARTER, 0 }, { 0x110a, UProperty.SEGMENT_STARTER, 1 }, /* Jamo L */ { 0x1164, UProperty.SEGMENT_STARTER, 0 },/* Jamo V */ { 0xac1c, UProperty.SEGMENT_STARTER, 1 }, /* Hangul LV */ { 0xac1d, UProperty.SEGMENT_STARTER, 1 }, /* Hangul LVT */ { 0x0044, UProperty.HEX_DIGIT, 1 }, { 0xff46, UProperty.HEX_DIGIT, 1 }, { 0x0047, UProperty.HEX_DIGIT, 0 }, { 0x30fb, UProperty.HYPHEN, 1 }, { 0xfe58, UProperty.HYPHEN, 0 }, { 0x2172, UProperty.ID_CONTINUE, 1 }, { 0x0307, UProperty.ID_CONTINUE, 1 }, { 0x005c, UProperty.ID_CONTINUE, 0 }, { 0x2172, UProperty.ID_START, 1 }, { 0x007a, UProperty.ID_START, 1 }, { 0x0039, UProperty.ID_START, 0 }, { 0x4db5, UProperty.IDEOGRAPHIC, 1 }, { 0x2f999, UProperty.IDEOGRAPHIC, 1 }, { 0x2f99, UProperty.IDEOGRAPHIC, 0 }, { 0x200c, UProperty.JOIN_CONTROL, 1 }, { 0x2029, UProperty.JOIN_CONTROL, 0 }, { 0x1d7bc, UProperty.LOWERCASE, 1 }, { 0x0345, UProperty.LOWERCASE, 1 }, { 0x0030, UProperty.LOWERCASE, 0 }, { 0x1d7a9, UProperty.MATH, 1 }, { 0x2135, UProperty.MATH, 1 }, { 0x0062, UProperty.MATH, 0 }, { 0xfde1, UProperty.NONCHARACTER_CODE_POINT, 1 }, { 0x10ffff, UProperty.NONCHARACTER_CODE_POINT, 1 }, { 0x10fffd, UProperty.NONCHARACTER_CODE_POINT, 0 }, { 0x0022, UProperty.QUOTATION_MARK, 1 }, { 0xff62, UProperty.QUOTATION_MARK, 1 }, { 0xd840, UProperty.QUOTATION_MARK, 0 }, { 0x061f, UProperty.TERMINAL_PUNCTUATION, 1 }, { 0xe003f, UProperty.TERMINAL_PUNCTUATION, 0 }, { 0x1d44a, UProperty.UPPERCASE, 1 }, { 0x2162, UProperty.UPPERCASE, 1 }, { 0x0345, UProperty.UPPERCASE, 0 }, { 0x0020, UProperty.WHITE_SPACE, 1 }, { 0x202f, UProperty.WHITE_SPACE, 1 }, { 0x3001, UProperty.WHITE_SPACE, 0 }, { 0x0711, UProperty.XID_CONTINUE, 1 }, { 0x1d1aa, UProperty.XID_CONTINUE, 1 }, { 0x007c, UProperty.XID_CONTINUE, 0 }, { 0x16ee, UProperty.XID_START, 1 }, { 0x23456, UProperty.XID_START, 1 }, { 0x1d1aa, UProperty.XID_START, 0 }, /* * Version break: * The following properties are only supported 
starting with the * Unicode version indicated in the second field. */ { -1, 0x320, 0 }, { 0x180c, UProperty.DEFAULT_IGNORABLE_CODE_POINT, 1 }, { 0xfe02, UProperty.DEFAULT_IGNORABLE_CODE_POINT, 1 }, { 0x1801, UProperty.DEFAULT_IGNORABLE_CODE_POINT, 0 }, { 0x0341, UProperty.DEPRECATED, 1 }, { 0xe0041, UProperty.DEPRECATED, 1 }, /* Changed from Unicode 5 to 5.1 */ { 0x00a0, UProperty.GRAPHEME_BASE, 1 }, { 0x0a4d, UProperty.GRAPHEME_BASE, 0 }, { 0xff9d, UProperty.GRAPHEME_BASE, 1 }, { 0xff9f, UProperty.GRAPHEME_BASE, 0 }, /* changed from Unicode 3.2 to 4 and again 5 to 5.1 */ { 0x0300, UProperty.GRAPHEME_EXTEND, 1 }, { 0xff9d, UProperty.GRAPHEME_EXTEND, 0 }, { 0xff9f, UProperty.GRAPHEME_EXTEND, 1 }, /* changed from Unicode 3.2 to 4 and again 5 to 5.1 */ { 0x0603, UProperty.GRAPHEME_EXTEND, 0 }, { 0x0a4d, UProperty.GRAPHEME_LINK, 1 }, { 0xff9f, UProperty.GRAPHEME_LINK, 0 }, { 0x2ff7, UProperty.IDS_BINARY_OPERATOR, 1 }, { 0x2ff3, UProperty.IDS_BINARY_OPERATOR, 0 }, { 0x2ff3, UProperty.IDS_TRINARY_OPERATOR, 1 }, { 0x2f03, UProperty.IDS_TRINARY_OPERATOR, 0 }, { 0x0ec1, UProperty.LOGICAL_ORDER_EXCEPTION, 1 }, { 0xdcba, UProperty.LOGICAL_ORDER_EXCEPTION, 0 }, { 0x2e9b, UProperty.RADICAL, 1 }, { 0x4e00, UProperty.RADICAL, 0 }, { 0x012f, UProperty.SOFT_DOTTED, 1 }, { 0x0049, UProperty.SOFT_DOTTED, 0 }, { 0xfa11, UProperty.UNIFIED_IDEOGRAPH, 1 }, { 0xfa12, UProperty.UNIFIED_IDEOGRAPH, 0 }, { -1, 0x401, 0 }, /* version break for Unicode 4.0.1 */ { 0x002e, UProperty.S_TERM, 1 }, { 0x0061, UProperty.S_TERM, 0 }, { 0x180c, UProperty.VARIATION_SELECTOR, 1 }, { 0xfe03, UProperty.VARIATION_SELECTOR, 1 }, { 0xe01ef, UProperty.VARIATION_SELECTOR, 1 }, { 0xe0200, UProperty.VARIATION_SELECTOR, 0 }, /* enum/integer type properties */ /* test default Bidi classes for unassigned code points */ { 0x0590, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, { 0x05cf, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, { 0x05ed, UProperty.BIDI_CLASS, 
UCharacterDirection.RIGHT_TO_LEFT }, { 0x07f2, UProperty.BIDI_CLASS, UCharacterDirection.DIR_NON_SPACING_MARK }, /* Nko, new in Unicode 5.0 */ { 0x07fe, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, /* unassigned R */ { 0x08ba, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, { 0xfb37, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, { 0xfb42, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, { 0x10806, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, { 0x10909, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, { 0x10fe4, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT }, { 0x0605, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT_ARABIC }, { 0x061c, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT_ARABIC }, { 0x063f, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT_ARABIC }, { 0x070e, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT_ARABIC }, { 0x0775, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT_ARABIC }, { 0xfbc2, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT_ARABIC }, { 0xfd90, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT_ARABIC }, { 0xfefe, UProperty.BIDI_CLASS, UCharacterDirection.RIGHT_TO_LEFT_ARABIC }, { 0x02AF, UProperty.BLOCK, UCharacter.UnicodeBlock.IPA_EXTENSIONS.getID() }, { 0x0C4E, UProperty.BLOCK, UCharacter.UnicodeBlock.TELUGU.getID()}, { 0x155A, UProperty.BLOCK, UCharacter.UnicodeBlock.UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS.getID() }, { 0x1717, UProperty.BLOCK, UCharacter.UnicodeBlock.TAGALOG.getID() }, { 0x1900, UProperty.BLOCK, UCharacter.UnicodeBlock.LIMBU.getID() }, { 0x1AFF, UProperty.BLOCK, UCharacter.UnicodeBlock.NO_BLOCK.getID()}, { 0x3040, UProperty.BLOCK, UCharacter.UnicodeBlock.HIRAGANA.getID()}, { 0x1D0FF, UProperty.BLOCK, UCharacter.UnicodeBlock.BYZANTINE_MUSICAL_SYMBOLS.getID()}, { 0x50000, UProperty.BLOCK, UCharacter.UnicodeBlock.NO_BLOCK.getID() }, { 0xEFFFF, UProperty.BLOCK, 
UCharacter.UnicodeBlock.NO_BLOCK.getID() }, { 0x10D0FF, UProperty.BLOCK, UCharacter.UnicodeBlock.SUPPLEMENTARY_PRIVATE_USE_AREA_B.getID() }, /* UProperty.CANONICAL_COMBINING_CLASS tested for assigned characters in TestUnicodeData() */ { 0xd7d7, UProperty.CANONICAL_COMBINING_CLASS, 0 }, { 0x00A0, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.NOBREAK }, { 0x00A8, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.COMPAT }, { 0x00bf, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.NONE }, { 0x00c0, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.CANONICAL }, { 0x1E9B, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.CANONICAL }, { 0xBCDE, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.CANONICAL }, { 0xFB5D, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.MEDIAL }, { 0x1D736, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.FONT }, { 0xe0033, UProperty.DECOMPOSITION_TYPE, UCharacter.DecompositionType.NONE }, { 0x0009, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.NEUTRAL }, { 0x0020, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.NARROW }, { 0x00B1, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.AMBIGUOUS }, { 0x20A9, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.HALFWIDTH }, { 0x2FFB, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.WIDE }, { 0x3000, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.FULLWIDTH }, { 0x35bb, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.WIDE }, { 0x58bd, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.WIDE }, { 0xD7A3, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.WIDE }, { 0xEEEE, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.AMBIGUOUS }, { 0x1D198, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.NEUTRAL }, { 0x20000, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.WIDE }, { 0x2F8C7, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.WIDE }, { 0x3a5bd, 
UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.WIDE }, { 0x5a5bd, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.NEUTRAL }, { 0xFEEEE, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.AMBIGUOUS }, { 0x10EEEE, UProperty.EAST_ASIAN_WIDTH, UCharacter.EastAsianWidth.AMBIGUOUS }, /* UProperty.GENERAL_CATEGORY tested for assigned characters in TestUnicodeData() */ { 0xd7d7, UProperty.GENERAL_CATEGORY, 0 }, { 0x0444, UProperty.JOINING_GROUP, UCharacter.JoiningGroup.NO_JOINING_GROUP }, { 0x0639, UProperty.JOINING_GROUP, UCharacter.JoiningGroup.AIN }, { 0x072A, UProperty.JOINING_GROUP, UCharacter.JoiningGroup.DALATH_RISH }, { 0x0647, UProperty.JOINING_GROUP, UCharacter.JoiningGroup.HEH }, { 0x06C1, UProperty.JOINING_GROUP, UCharacter.JoiningGroup.HEH_GOAL }, { 0x06C3, UProperty.JOINING_GROUP, UCharacter.JoiningGroup.HAMZA_ON_HEH_GOAL }, { 0x200C, UProperty.JOINING_TYPE, UCharacter.JoiningType.NON_JOINING }, { 0x200D, UProperty.JOINING_TYPE, UCharacter.JoiningType.JOIN_CAUSING }, { 0x0639, UProperty.JOINING_TYPE, UCharacter.JoiningType.DUAL_JOINING }, { 0x0640, UProperty.JOINING_TYPE, UCharacter.JoiningType.JOIN_CAUSING }, { 0x06C3, UProperty.JOINING_TYPE, UCharacter.JoiningType.RIGHT_JOINING }, { 0x0300, UProperty.JOINING_TYPE, UCharacter.JoiningType.TRANSPARENT }, { 0x070F, UProperty.JOINING_TYPE, UCharacter.JoiningType.TRANSPARENT }, { 0xe0033, UProperty.JOINING_TYPE, UCharacter.JoiningType.TRANSPARENT }, /* TestUnicodeData() verifies that no assigned character has "XX" (unknown) */ { 0xe7e7, UProperty.LINE_BREAK, UCharacter.LineBreak.UNKNOWN }, { 0x10fffd, UProperty.LINE_BREAK, UCharacter.LineBreak.UNKNOWN }, { 0x0028, UProperty.LINE_BREAK, UCharacter.LineBreak.OPEN_PUNCTUATION }, { 0x232A, UProperty.LINE_BREAK, UCharacter.LineBreak.CLOSE_PUNCTUATION }, { 0x3401, UProperty.LINE_BREAK, UCharacter.LineBreak.IDEOGRAPHIC }, { 0x4e02, UProperty.LINE_BREAK, UCharacter.LineBreak.IDEOGRAPHIC }, { 0x20004, UProperty.LINE_BREAK, UCharacter.LineBreak.IDEOGRAPHIC 
}, { 0xf905, UProperty.LINE_BREAK, UCharacter.LineBreak.IDEOGRAPHIC }, { 0xdb7e, UProperty.LINE_BREAK, UCharacter.LineBreak.SURROGATE }, { 0xdbfd, UProperty.LINE_BREAK, UCharacter.LineBreak.SURROGATE }, { 0xdffc, UProperty.LINE_BREAK, UCharacter.LineBreak.SURROGATE }, { 0x2762, UProperty.LINE_BREAK, UCharacter.LineBreak.EXCLAMATION }, { 0x002F, UProperty.LINE_BREAK, UCharacter.LineBreak.BREAK_SYMBOLS }, { 0x1D49C, UProperty.LINE_BREAK, UCharacter.LineBreak.ALPHABETIC }, { 0x1731, UProperty.LINE_BREAK, UCharacter.LineBreak.ALPHABETIC }, /* UProperty.NUMERIC_TYPE tested in TestNumericProperties() */ /* UProperty.SCRIPT tested in TestUScriptCodeAPI() */ { 0x1100, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LEADING_JAMO }, { 0x1111, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LEADING_JAMO }, { 0x1159, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LEADING_JAMO }, { 0x115f, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LEADING_JAMO }, { 0x1160, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.VOWEL_JAMO }, { 0x1161, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.VOWEL_JAMO }, { 0x1172, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.VOWEL_JAMO }, { 0x11a2, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.VOWEL_JAMO }, { 0x11a8, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.TRAILING_JAMO }, { 0x11b8, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.TRAILING_JAMO }, { 0x11c8, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.TRAILING_JAMO }, { 0x11f9, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.TRAILING_JAMO }, { 0x115a, UProperty.HANGUL_SYLLABLE_TYPE, 0 }, { 0x115e, UProperty.HANGUL_SYLLABLE_TYPE, 0 }, { 0x11a3, UProperty.HANGUL_SYLLABLE_TYPE, 0 }, { 0x11a7, UProperty.HANGUL_SYLLABLE_TYPE, 0 }, { 0x11fa, UProperty.HANGUL_SYLLABLE_TYPE, 0 }, { 0x11ff, UProperty.HANGUL_SYLLABLE_TYPE, 0 }, { 
0xac00, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LV_SYLLABLE }, { 0xac1c, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LV_SYLLABLE }, { 0xc5ec, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LV_SYLLABLE }, { 0xd788, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LV_SYLLABLE }, { 0xac01, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LVT_SYLLABLE }, { 0xac1b, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LVT_SYLLABLE }, { 0xac1d, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LVT_SYLLABLE }, { 0xc5ee, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LVT_SYLLABLE }, { 0xd7a3, UProperty.HANGUL_SYLLABLE_TYPE, UCharacter.HangulSyllableType.LVT_SYLLABLE }, { 0xd7a4, UProperty.HANGUL_SYLLABLE_TYPE, 0 }, { -1, 0x410, 0 }, /* version break for Unicode 4.1 */ { 0x00d7, UProperty.PATTERN_SYNTAX, 1 }, { 0xfe45, UProperty.PATTERN_SYNTAX, 1 }, { 0x0061, UProperty.PATTERN_SYNTAX, 0 }, { 0x0020, UProperty.PATTERN_WHITE_SPACE, 1 }, { 0x0085, UProperty.PATTERN_WHITE_SPACE, 1 }, { 0x200f, UProperty.PATTERN_WHITE_SPACE, 1 }, { 0x00a0, UProperty.PATTERN_WHITE_SPACE, 0 }, { 0x3000, UProperty.PATTERN_WHITE_SPACE, 0 }, { 0x1d200, UProperty.BLOCK, UCharacter.UnicodeBlock.ANCIENT_GREEK_MUSICAL_NOTATION_ID }, { 0x2c8e, UProperty.BLOCK, UCharacter.UnicodeBlock.COPTIC_ID }, { 0xfe17, UProperty.BLOCK, UCharacter.UnicodeBlock.VERTICAL_FORMS_ID }, { 0x1a00, UProperty.SCRIPT, UScript.BUGINESE }, { 0x2cea, UProperty.SCRIPT, UScript.COPTIC }, { 0xa82b, UProperty.SCRIPT, UScript.SYLOTI_NAGRI }, { 0x103d0, UProperty.SCRIPT, UScript.OLD_PERSIAN }, { 0xcc28, UProperty.LINE_BREAK, UCharacter.LineBreak.H2 }, { 0xcc29, UProperty.LINE_BREAK, UCharacter.LineBreak.H3 }, { 0xac03, UProperty.LINE_BREAK, UCharacter.LineBreak.H3 }, { 0x115f, UProperty.LINE_BREAK, UCharacter.LineBreak.JL }, { 0x11aa, UProperty.LINE_BREAK, UCharacter.LineBreak.JT }, { 0x11a1, UProperty.LINE_BREAK, 
UCharacter.LineBreak.JV }, { 0xb2c9, UProperty.GRAPHEME_CLUSTER_BREAK, UCharacter.GraphemeClusterBreak.LVT }, { 0x036f, UProperty.GRAPHEME_CLUSTER_BREAK, UCharacter.GraphemeClusterBreak.EXTEND }, { 0x0000, UProperty.GRAPHEME_CLUSTER_BREAK, UCharacter.GraphemeClusterBreak.CONTROL }, { 0x1160, UProperty.GRAPHEME_CLUSTER_BREAK, UCharacter.GraphemeClusterBreak.V }, { 0x05f4, UProperty.WORD_BREAK, UCharacter.WordBreak.MIDLETTER }, { 0x4ef0, UProperty.WORD_BREAK, UCharacter.WordBreak.OTHER }, { 0x19d9, UProperty.WORD_BREAK, UCharacter.WordBreak.NUMERIC }, { 0x2044, UProperty.WORD_BREAK, UCharacter.WordBreak.MIDNUM }, { 0xfffd, UProperty.SENTENCE_BREAK, UCharacter.SentenceBreak.OTHER }, { 0x1ffc, UProperty.SENTENCE_BREAK, UCharacter.SentenceBreak.UPPER }, { 0xff63, UProperty.SENTENCE_BREAK, UCharacter.SentenceBreak.CLOSE }, { 0x2028, UProperty.SENTENCE_BREAK, UCharacter.SentenceBreak.SEP }, /* undefined UProperty values */ { 0x61, 0x4a7, 0 }, { 0x234bc, 0x15ed, 0 } }; if (UCharacter.getIntPropertyMinValue(UProperty.DASH) != 0 || UCharacter.getIntPropertyMinValue(UProperty.BIDI_CLASS) != 0 || UCharacter.getIntPropertyMinValue(UProperty.BLOCK)!= 0 /* j2478 */ || UCharacter.getIntPropertyMinValue(UProperty.SCRIPT)!= 0 /* JB#2410 */ || UCharacter.getIntPropertyMinValue(0x2345) != 0) { errln("error: UCharacter.getIntPropertyMinValue() wrong"); } if( UCharacter.getIntPropertyMaxValue(UProperty.DASH)!=1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.DASH) wrong\n"); } if( UCharacter.getIntPropertyMaxValue(UProperty.ID_CONTINUE)!=1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.ID_CONTINUE) wrong\n"); } if( UCharacter.getIntPropertyMaxValue(UProperty.BINARY_LIMIT-1)!=1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.BINARY_LIMIT-1) wrong\n"); } if( UCharacter.getIntPropertyMaxValue(UProperty.BIDI_CLASS)!=UCharacterDirection.CHAR_DIRECTION_COUNT-1 ) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.BIDI_CLASS) wrong\n"); } if( 
UCharacter.getIntPropertyMaxValue(UProperty.BLOCK)!=UCharacter.UnicodeBlock.COUNT-1 ) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.BLOCK) wrong\n"); } if(UCharacter.getIntPropertyMaxValue(UProperty.LINE_BREAK)!=UCharacter.LineBreak.COUNT-1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.LINE_BREAK) wrong\n"); } if(UCharacter.getIntPropertyMaxValue(UProperty.SCRIPT)!=UScript.CODE_LIMIT-1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.SCRIPT) wrong\n"); } if(UCharacter.getIntPropertyMaxValue(UProperty.NUMERIC_TYPE)!=UCharacter.NumericType.COUNT-1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.NUMERIC_TYPE) wrong\n"); } if(UCharacter.getIntPropertyMaxValue(UProperty.GENERAL_CATEGORY)!=UCharacterCategory.CHAR_CATEGORY_COUNT-1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.GENERAL_CATEGORY) wrong\n"); } if(UCharacter.getIntPropertyMaxValue(UProperty.HANGUL_SYLLABLE_TYPE)!=UCharacter.HangulSyllableType.COUNT-1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.HANGUL_SYLLABLE_TYPE) wrong\n"); } if(UCharacter.getIntPropertyMaxValue(UProperty.GRAPHEME_CLUSTER_BREAK)!=UCharacter.GraphemeClusterBreak.COUNT-1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.GRAPHEME_CLUSTER_BREAK) wrong\n"); } if(UCharacter.getIntPropertyMaxValue(UProperty.SENTENCE_BREAK)!=UCharacter.SentenceBreak.COUNT-1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.SENTENCE_BREAK) wrong\n"); } if(UCharacter.getIntPropertyMaxValue(UProperty.WORD_BREAK)!=UCharacter.WordBreak.COUNT-1) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.WORD_BREAK) wrong\n"); } /*JB#2410*/ if( UCharacter.getIntPropertyMaxValue(0x2345)!=-1) { errln("error: UCharacter.getIntPropertyMaxValue(0x2345) wrong\n"); } if( UCharacter.getIntPropertyMaxValue(UProperty.DECOMPOSITION_TYPE) != (UCharacter.DecompositionType.COUNT - 1)) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.DECOMPOSITION_TYPE) wrong\n"); } if( 
UCharacter.getIntPropertyMaxValue(UProperty.JOINING_GROUP) != (UCharacter.JoiningGroup.COUNT -1)) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.JOINING_GROUP) wrong\n"); } if( UCharacter.getIntPropertyMaxValue(UProperty.JOINING_TYPE) != (UCharacter.JoiningType.COUNT -1)) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.JOINING_TYPE) wrong\n"); } if( UCharacter.getIntPropertyMaxValue(UProperty.EAST_ASIAN_WIDTH) != (UCharacter.EastAsianWidth.COUNT -1)) { errln("error: UCharacter.getIntPropertyMaxValue(UProperty.EAST_ASIAN_WIDTH) wrong\n"); } VersionInfo version = UCharacter.getUnicodeVersion(); // test hasBinaryProperty() for (int i = 0; i < props.length; ++ i) { if (props[i][0] < 0) { if (version.compareTo(VersionInfo.getInstance(props[i][1] >> 8, (props[i][1] >> 4) & 0xF, props[i][1] & 0xF, 0)) < 0) { break; } continue; } boolean expect = true; if (props[i][2] == 0) { expect = false; } if (props[i][1] < UProperty.INT_START) { if (UCharacter.hasBinaryProperty(props[i][0], props[i][1]) != expect) { errln("error: UCharacter.hasBinaryProperty(\\u" + Integer.toHexString(props[i][0]) + ", " + Integer.toHexString(props[i][1]) + ") has an error expected " + props[i][2]); } } int retVal = UCharacter.getIntPropertyValue(props[i][0], props[i][1]); if (retVal != props[i][2]) { errln("error: UCharacter.getIntPropertyValue(\\u" + Utility.hex(props[i][0], 4) + ", " + props[i][1] + " is wrong, should be " + props[i][2] + " not " + retVal); } // test separate functions, too switch (props[i][1]) { case UProperty.ALPHABETIC: if (UCharacter.isUAlphabetic(props[i][0]) != expect) { errln("error: UCharacter.isUAlphabetic(\\u" + Integer.toHexString(props[i][0]) + ") is wrong expected " + props[i][2]); } break; case UProperty.LOWERCASE: if (UCharacter.isULowercase(props[i][0]) != expect) { errln("error: UCharacter.isULowercase(\\u" + Integer.toHexString(props[i][0]) + ") is wrong expected " +props[i][2]); } break; case UProperty.UPPERCASE: if 
(UCharacter.isUUppercase(props[i][0]) != expect) { errln("error: UCharacter.isUUppercase(\\u" + Integer.toHexString(props[i][0]) + ") is wrong expected " + props[i][2]); } break; case UProperty.WHITE_SPACE: if (UCharacter.isUWhiteSpace(props[i][0]) != expect) { errln("error: UCharacter.isUWhiteSpace(\\u" + Integer.toHexString(props[i][0]) + ") is wrong expected " + props[i][2]); } break; default: break; } } } public void TestNumericProperties() { // see UnicodeData.txt, DerivedNumericValues.txt int testvar[][] = { { 0x0F33, UCharacter.NumericType.NUMERIC }, { 0x0C66, UCharacter.NumericType.DECIMAL }, { 0x2159, UCharacter.NumericType.NUMERIC }, { 0x00BD, UCharacter.NumericType.NUMERIC }, { 0x0031, UCharacter.NumericType.DECIMAL }, { 0x10320, UCharacter.NumericType.NUMERIC }, { 0x0F2B, UCharacter.NumericType.NUMERIC }, { 0x00B2, UCharacter.NumericType.DIGIT }, /* Unicode 4.0 change */ { 0x1813, UCharacter.NumericType.DECIMAL }, { 0x2173, UCharacter.NumericType.NUMERIC }, { 0x278E, UCharacter.NumericType.DIGIT }, { 0x1D7F2, UCharacter.NumericType.DECIMAL }, { 0x247A, UCharacter.NumericType.DIGIT }, { 0x1372, UCharacter.NumericType.NUMERIC }, { 0x216B, UCharacter.NumericType.NUMERIC }, { 0x16EE, UCharacter.NumericType.NUMERIC }, { 0x249A, UCharacter.NumericType.NUMERIC }, { 0x303A, UCharacter.NumericType.NUMERIC }, { 0x32B2, UCharacter.NumericType.NUMERIC }, { 0x1375, UCharacter.NumericType.NUMERIC }, { 0x10323, UCharacter.NumericType.NUMERIC }, { 0x0BF1, UCharacter.NumericType.NUMERIC }, { 0x217E, UCharacter.NumericType.NUMERIC }, { 0x2180, UCharacter.NumericType.NUMERIC }, { 0x2181, UCharacter.NumericType.NUMERIC }, { 0x137C, UCharacter.NumericType.NUMERIC }, { 0x61, UCharacter.NumericType.NONE }, { 0x3000, UCharacter.NumericType.NONE }, { 0xfffe, UCharacter.NumericType.NONE }, { 0x10301, UCharacter.NumericType.NONE }, { 0xe0033, UCharacter.NumericType.NONE }, { 0x10ffff, UCharacter.NumericType.NONE }, /* Unicode 4.0 Changes */ { 0x96f6, 
UCharacter.NumericType.NUMERIC }, { 0x4e00, UCharacter.NumericType.NUMERIC }, { 0x58f1, UCharacter.NumericType.NUMERIC }, { 0x5f10, UCharacter.NumericType.NUMERIC }, { 0x5f0e, UCharacter.NumericType.NUMERIC }, { 0x8086, UCharacter.NumericType.NUMERIC }, { 0x7396, UCharacter.NumericType.NUMERIC }, { 0x5345, UCharacter.NumericType.NUMERIC }, { 0x964c, UCharacter.NumericType.NUMERIC }, { 0x4edf, UCharacter.NumericType.NUMERIC }, { 0x4e07, UCharacter.NumericType.NUMERIC }, { 0x4ebf, UCharacter.NumericType.NUMERIC }, { 0x5146, UCharacter.NumericType.NUMERIC } }; double expected[] = {-1/(double)2, 0, 1/(double)6, 1/(double)2, 1, 1, 3/(double)2, 2, 3, 4, 5, 6, 7, 10, 12, 17, 19, 30, 37, 40, 50, 100, 500, 1000, 5000, 10000, UCharacter.NO_NUMERIC_VALUE, UCharacter.NO_NUMERIC_VALUE, UCharacter.NO_NUMERIC_VALUE, UCharacter.NO_NUMERIC_VALUE, UCharacter.NO_NUMERIC_VALUE, UCharacter.NO_NUMERIC_VALUE, 0 , 1 , 1 , 2 , 3 , 4 , 9 , 30 , 100 , 1000 , 10000 , 100000000 , 1000000000000.00 }; for (int i = 0; i < testvar.length; ++ i) { int c = testvar[i][0]; int type = UCharacter.getIntPropertyValue(c, UProperty.NUMERIC_TYPE); double nv = UCharacter.getUnicodeNumericValue(c); if (type != testvar[i][1]) { errln("UProperty.NUMERIC_TYPE(\\u" + Utility.hex(c, 4) + ") = " + type + " should be " + testvar[i][1]); } if (0.000001 <= Math.abs(nv - expected[i])) { errln("UCharacter.getUnicodeNumericValue(\\u" + Utility.hex(c, 4) + ") = " + nv + " should be " + expected[i]); } } } /** * Test the property values API. See JB#2410. */ public void TestPropertyValues() { int i, p, min, max; /* Min should be 0 for everything. */ /* Until JB#2478 is fixed, the one exception is UProperty.BLOCK. 
*/ for (p=UProperty.INT_START; p 1 && buffer[0]==0x0049) { set2.add(start); } } compareUSets(set1, set2, "[canon start set of 0049]", "[all c with canon decomp with 0049]", false); } public void TestCoverage() { //cover forDigit char ch1 = UCharacter.forDigit(7, 11); assertEquals("UCharacter.forDigit ", "7", String.valueOf(ch1)); char ch2 = UCharacter.forDigit(17, 20); assertEquals("UCharacter.forDigit ", "h", String.valueOf(ch2)); //Jitterbug 4451, for coverage for (int i = 0x0041; i < 0x005B; i++) { if (!UCharacter.isJavaLetter(i)) errln("FAIL \\u" + hex(i) + " expected to be a letter"); if (!UCharacter.isJavaIdentifierStart(i)) errln("FAIL \\u" + hex(i) + " expected to be a Java identifier start character"); if (!UCharacter.isJavaLetterOrDigit(i)) errln("FAIL \\u" + hex(i) + " expected to be a Java letter or digit"); if (!UCharacter.isJavaIdentifierPart(i)) errln("FAIL \\u" + hex(i) + " expected to be a Java identifier part character"); } char[] spaces = {'\t','\n','\f','\r',' '}; for (int i = 0; i < spaces.length; i++){ if (!UCharacter.isSpace(spaces[i])) errln("FAIL \\u" + hex(spaces[i]) + " expected to be a Java space"); } if (!UCharacter.getStringPropertyValue(UProperty.AGE,'\u3400',0).equals("3.0.0.0")){ errln("FAIL \\u3400 expected to be 3.0.0.0"); } } public void TestCasePropsDummy() { // code coverage for UCaseProps.getDummy() if(UCaseProps.getDummy().tolower(0x41)!=0x41) { errln("UCaseProps.getDummy().tolower(0x41)!=0x41"); } } public void TestBiDiPropsDummy() { // code coverage for UBiDiProps.getDummy() if(UBiDiProps.getDummy().getClass(0x20)!=0) { errln("UBiDiProps.getDummy().getClass(0x20)!=0"); } } public void TestBlockData() { Class ubc = UCharacter.UnicodeBlock.class; for (int b = 1; b < UCharacter.UnicodeBlock.COUNT; b += 1) { UCharacter.UnicodeBlock blk = UCharacter.UnicodeBlock.getInstance(b); int id = blk.getID(); String name = blk.toString(); if (id != b) { errln("UCharacter.UnicodeBlock.getInstance(" + b + ") returned a block with id = " + id); } 
try { if (ubc.getField(name + "_ID").getInt(blk) != b) { errln("UCharacter.UnicodeBlock.getInstance(" + b + ") returned a block with a name of " + name + " which does not match the block id."); } } catch (Exception e) { errln("Couldn't get the id name for id " + b); } } } } icu4j-4.2/src/com/ibm/icu/dev/test/lang/UCharacterCategoryTest.java0000644000175000017500000000607411361046222025133 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 1996-2006, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.lang; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.lang.UCharacterCategory; /** * Testing UCharacterCategory * @author Syn Wee Quek * @since April 02 2002 */ public class UCharacterCategoryTest extends TestFmwk { // constructor ----------------------------------------------------------- /** * Default constructor */ public UCharacterCategoryTest() { } // public methods -------------------------------------------------------- public static void main(String[] arg) { try { UCharacterCategoryTest test = new UCharacterCategoryTest(); test.run(arg); } catch (Exception e) { e.printStackTrace(); } } /** * Tests that UCharacterCategory.toString() returns the expected name * for each category */ public void TestToString() { String name[] = {"Unassigned", "Letter, Uppercase", "Letter, Lowercase", "Letter, Titlecase", "Letter, Modifier", "Letter, Other", "Mark, Non-Spacing", "Mark, Enclosing", "Mark, Spacing Combining", "Number, Decimal Digit", "Number, Letter", "Number, Other", "Separator, Space", "Separator, Line", "Separator, Paragraph", "Other, Control", "Other, Format", "Other, Private Use", "Other, Surrogate", "Punctuation, Dash", "Punctuation, Open", "Punctuation, Close", "Punctuation, Connector", "Punctuation, Other", "Symbol, Math", "Symbol, 
Currency", "Symbol, Modifier", "Symbol, Other", "Punctuation, Initial quote", "Punctuation, Final quote"}; for (int i = UCharacterCategory.UNASSIGNED; i < UCharacterCategory.CHAR_CATEGORY_COUNT; i ++) { if (!UCharacterCategory.toString(i).equals(name[i])) { errln("Error toString for category " + i + " expected " + name[i]); } } } } icu4j-4.2/src/com/ibm/icu/dev/test/lang/UTF16Test.java0000644000175000017500000020245211361046222022217 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 1996-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.lang; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.dev.test.UTF16Util; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.ReplaceableString; import com.ibm.icu.impl.Utility; /** * Testing class for UTF16 * @author Syn Wee Quek * @since feb 09 2001 */ public final class UTF16Test extends TestFmwk { // constructor =================================================== /** * Constructor */ public UTF16Test() { } // public methods ================================================ /** * Testing UTF16 class methods append */ public void TestAppend() { StringBuffer strbuff = new StringBuffer("this is a string "); char array[] = new char[UCharacter.MAX_VALUE >> 2]; int strsize = strbuff.length(); int arraysize = strsize; Utility.getChars(strbuff, 0, strsize, array, 0); for (int i = 1; i < UCharacter.MAX_VALUE; i += 100) { UTF16.append(strbuff, i); arraysize = UTF16.append(array, arraysize, i); String arraystr = new String(array, 0, arraysize); if (!arraystr.equals(strbuff.toString())) { errln("FAIL Comparing char array append and string append " + "with 0x" + Integer.toHexString(i)); } // this is to cater for the combination of 0xDBXX 0xDC50 which // forms a supplementary 
character if (i == 0xDC51) { strsize --; } if (UTF16.countCodePoint(strbuff) != strsize + (i / 100) + 1) { errln("FAIL Counting code points in string appended with " + " 0x" + Integer.toHexString(i)); break; } } // coverage for new 1.5 - cover only so no real test strbuff = new StringBuffer(); UTF16.appendCodePoint(strbuff, 0x10000); if (strbuff.length() != 2) { errln("fail appendCodePoint"); } } /** * Testing UTF16 class methods bounds */ public void TestBounds() { StringBuffer strbuff = //0 12345 6 7 8 9 new StringBuffer("\udc000123\ud800\udc00\ud801\udc01\ud802"); String str = strbuff.toString(); char array[] = str.toCharArray(); int boundtype[] = {UTF16.SINGLE_CHAR_BOUNDARY, UTF16.SINGLE_CHAR_BOUNDARY, UTF16.SINGLE_CHAR_BOUNDARY, UTF16.SINGLE_CHAR_BOUNDARY, UTF16.SINGLE_CHAR_BOUNDARY, UTF16.LEAD_SURROGATE_BOUNDARY, UTF16.TRAIL_SURROGATE_BOUNDARY, UTF16.LEAD_SURROGATE_BOUNDARY, UTF16.TRAIL_SURROGATE_BOUNDARY, UTF16.SINGLE_CHAR_BOUNDARY}; int length = str.length(); for (int i = 0; i < length; i ++) { if (UTF16.bounds(str, i) != boundtype[i]) { errln("FAIL checking bound type at index " + i); } if (UTF16.bounds(strbuff, i) != boundtype[i]) { errln("FAIL checking bound type at index " + i); } if (UTF16.bounds(array, 0, length, i) != boundtype[i]) { errln("FAIL checking bound type at index " + i); } } // does not straddle between supplementary character int start = 4; int limit = 9; int subboundtype1[] = {UTF16.SINGLE_CHAR_BOUNDARY, UTF16.LEAD_SURROGATE_BOUNDARY, UTF16.TRAIL_SURROGATE_BOUNDARY, UTF16.LEAD_SURROGATE_BOUNDARY, UTF16.TRAIL_SURROGATE_BOUNDARY}; try { UTF16.bounds(array, start, limit, -1); errln("FAIL Out of bounds index in bounds should fail"); } catch (Exception e) { // getting rid of warnings System.out.print(""); } for (int i = 0; i < limit - start; i ++) { if (UTF16.bounds(array, start, limit, i) != subboundtype1[i]) { errln("FAILED Subarray bounds in [" + start + ", " + limit + "] expected " + subboundtype1[i] + " at offset " + i); } } // starts 
from the mid of a supplementary character int subboundtype2[] = {UTF16.SINGLE_CHAR_BOUNDARY, UTF16.LEAD_SURROGATE_BOUNDARY, UTF16.TRAIL_SURROGATE_BOUNDARY}; start = 6; limit = 9; for (int i = 0; i < limit - start; i ++) { if (UTF16.bounds(array, start, limit, i) != subboundtype2[i]) { errln("FAILED Subarray bounds in [" + start + ", " + limit + "] expected " + subboundtype2[i] + " at offset " + i); } } // ends in the mid of a supplementary character int subboundtype3[] = {UTF16.LEAD_SURROGATE_BOUNDARY, UTF16.TRAIL_SURROGATE_BOUNDARY, UTF16.SINGLE_CHAR_BOUNDARY}; start = 5; limit = 8; for (int i = 0; i < limit - start; i ++) { if (UTF16.bounds(array, start, limit, i) != subboundtype3[i]) { errln("FAILED Subarray bounds in [" + start + ", " + limit + "] expected " + subboundtype3[i] + " at offset " + i); } } } /** * Testing UTF16 class methods charAt and charAtCodePoint */ public void TestCharAt() { StringBuffer strbuff = new StringBuffer("12345\ud800\udc0167890\ud800\udc02"); if (UTF16.charAt(strbuff, 0) != '1' || UTF16.charAt(strbuff, 2) != '3' || UTF16.charAt(strbuff, 5) != 0x10001 || UTF16.charAt(strbuff, 6) != 0x10001 || UTF16.charAt(strbuff, 12) != 0x10002 || UTF16.charAt(strbuff, 13) != 0x10002) { errln("FAIL Getting character from string buffer error" ); } String str = strbuff.toString(); if (UTF16.charAt(str, 0) != '1' || UTF16.charAt(str, 2) != '3' || UTF16.charAt(str, 5) != 0x10001 || UTF16.charAt(str, 6) != 0x10001 || UTF16.charAt(str, 12) != 0x10002 || UTF16.charAt(str, 13) != 0x10002) { errln("FAIL Getting character from string error" ); } char array[] = str.toCharArray(); int start = 0; int limit = str.length(); if (UTF16.charAt(array, start, limit, 0) != '1' || UTF16.charAt(array, start, limit, 2) != '3' || UTF16.charAt(array, start, limit, 5) != 0x10001 || UTF16.charAt(array, start, limit, 6) != 0x10001 || UTF16.charAt(array, start, limit, 12) != 0x10002 || UTF16.charAt(array, start, limit, 13) != 0x10002) { errln("FAIL Getting character from array 
error" ); } // check the sub array here. start = 6; limit = 13; try { UTF16.charAt(array, start, limit, -1); errln("FAIL out of bounds error expected"); } catch (Exception e) { System.out.print(""); } try { UTF16.charAt(array, start, limit, 8); errln("FAIL out of bounds error expected"); } catch (Exception e) { System.out.print(""); } if (UTF16.charAt(array, start, limit, 0) != 0xdc01) { errln("FAIL Expected result in subarray 0xdc01"); } if (UTF16.charAt(array, start, limit, 6) != 0xd800) { errln("FAIL Expected result in subarray 0xd800"); } ReplaceableString replaceable = new ReplaceableString(str); if (UTF16.charAt(replaceable, 0) != '1' || UTF16.charAt(replaceable, 2) != '3' || UTF16.charAt(replaceable, 5) != 0x10001 || UTF16.charAt(replaceable, 6) != 0x10001 || UTF16.charAt(replaceable, 12) != 0x10002 || UTF16.charAt(replaceable, 13) != 0x10002) { errln("FAIL Getting character from replaceable error" ); } } /** * Testing UTF16 class methods countCodePoint */ public void TestCountCodePoint() { StringBuffer strbuff = new StringBuffer(""); char array[] = null; if (UTF16.countCodePoint(strbuff) != 0 || UTF16.countCodePoint("") != 0 || UTF16.countCodePoint(array,0 ,0) != 0) { errln("FAIL Counting code points for empty strings"); } strbuff = new StringBuffer("this is a string "); String str = strbuff.toString(); array = str.toCharArray(); int size = str.length(); if (UTF16.countCodePoint(array, 0, 0) != 0) { errln("FAIL Counting code points for 0 offset array"); } if (UTF16.countCodePoint(str) != size || UTF16.countCodePoint(strbuff) != size || UTF16.countCodePoint(array, 0, size) != size) { errln("FAIL Counting code points"); } UTF16.append(strbuff, 0x10000); str = strbuff.toString(); array = str.toCharArray(); if (UTF16.countCodePoint(str) != size + 1 || UTF16.countCodePoint(strbuff) != size + 1 || UTF16.countCodePoint(array, 0, size + 1) != size + 1 || UTF16.countCodePoint(array, 0, size + 2) != size + 1) { errln("FAIL Counting code points"); } 
UTF16.append(strbuff, 0x61); str = strbuff.toString(); array = str.toCharArray(); if (UTF16.countCodePoint(str) != size + 2 || UTF16.countCodePoint(strbuff) != size + 2 || UTF16.countCodePoint(array, 0, size + 1) != size + 1 || UTF16.countCodePoint(array, 0, size + 2) != size + 1 || UTF16.countCodePoint(array, 0, size + 3) != size + 2) { errln("FAIL Counting code points"); } } /** * Testing UTF16 class methods delete */ public void TestDelete() { //01234567890123456 StringBuffer strbuff = new StringBuffer("these are strings"); int size = strbuff.length(); char array[] = strbuff.toString().toCharArray(); UTF16.delete(strbuff, 3); UTF16.delete(strbuff, 3); UTF16.delete(strbuff, 3); UTF16.delete(strbuff, 3); UTF16.delete(strbuff, 3); UTF16.delete(strbuff, 3); try { UTF16.delete(strbuff, strbuff.length()); errln("FAIL deleting out of bounds character should fail"); } catch (Exception e) { System.out.print(""); } UTF16.delete(strbuff, strbuff.length() - 1); if (!strbuff.toString().equals("the string")) { errln("FAIL expected result after deleting characters is " + "\"the string\""); } size = UTF16.delete(array, size, 3); size = UTF16.delete(array, size, 3); size = UTF16.delete(array, size, 3); size = UTF16.delete(array, size, 3); size = UTF16.delete(array, size, 3); size = UTF16.delete(array, size, 3); try { UTF16.delete(array, size, size); errln("FAIL deleting out of bounds character should fail"); } catch (Exception e) { System.out.print(""); } size = UTF16.delete(array, size, size - 1); String str = new String(array, 0, size); if (!str.equals("the string")) { errln("FAIL expected result after deleting characters is " + "\"the string\""); } //012345678 9 01 2 3 4 strbuff = new StringBuffer("string: \ud800\udc00 \ud801\udc01 \ud801\udc01"); size = strbuff.length(); array = strbuff.toString().toCharArray(); UTF16.delete(strbuff, 8); UTF16.delete(strbuff, 8); UTF16.delete(strbuff, 9); UTF16.delete(strbuff, 8); UTF16.delete(strbuff, 9); UTF16.delete(strbuff, 6); 
UTF16.delete(strbuff, 6); if (!strbuff.toString().equals("string")) { errln("FAIL expected result after deleting characters is \"string\""); } size = UTF16.delete(array, size, 8); size = UTF16.delete(array, size, 8); size = UTF16.delete(array, size, 9); size = UTF16.delete(array, size, 8); size = UTF16.delete(array, size, 9); size = UTF16.delete(array, size, 6); size = UTF16.delete(array, size, 6); str = new String(array, 0, size); if (!str.equals("string")) { errln("FAIL expected result after deleting characters is \"string\""); } } /** * Testing findOffsetFromCodePoint and findCodePointOffset */ public void TestfindOffset() { // jitterbug 47 String str = "a\uD800\uDC00b"; StringBuffer strbuff = new StringBuffer(str); char array[] = str.toCharArray(); int limit = str.length(); if (UTF16.findCodePointOffset(str, 0) != 0 || UTF16.findOffsetFromCodePoint(str, 0) != 0 || UTF16.findCodePointOffset(strbuff, 0) != 0 || UTF16.findOffsetFromCodePoint(strbuff, 0) != 0 || UTF16.findCodePointOffset(array, 0, limit, 0) != 0 || UTF16.findOffsetFromCodePoint(array, 0, limit, 0) != 0) { errln("FAIL Getting the first codepoint offset to a string with " + "supplementary characters"); } if (UTF16.findCodePointOffset(str, 1) != 1 || UTF16.findOffsetFromCodePoint(str, 1) != 1 || UTF16.findCodePointOffset(strbuff, 1) != 1 || UTF16.findOffsetFromCodePoint(strbuff, 1) != 1 || UTF16.findCodePointOffset(array, 0, limit, 1) != 1 || UTF16.findOffsetFromCodePoint(array, 0, limit, 1) != 1) { errln("FAIL Getting the second codepoint offset to a string with " + "supplementary characters"); } if (UTF16.findCodePointOffset(str, 2) != 1 || UTF16.findOffsetFromCodePoint(str, 2) != 3 || UTF16.findCodePointOffset(strbuff, 2) != 1 || UTF16.findOffsetFromCodePoint(strbuff, 2) != 3 || UTF16.findCodePointOffset(array, 0, limit, 2) != 1 || UTF16.findOffsetFromCodePoint(array, 0, limit, 2) != 3) { errln("FAIL Getting the third codepoint offset to a string with " + "supplementary characters"); } if 
            (UTF16.findCodePointOffset(str, 3) != 2
            || UTF16.findOffsetFromCodePoint(str, 3) != 4
            || UTF16.findCodePointOffset(strbuff, 3) != 2
            || UTF16.findOffsetFromCodePoint(strbuff, 3) != 4
            || UTF16.findCodePointOffset(array, 0, limit, 3) != 2
            || UTF16.findOffsetFromCodePoint(array, 0, limit, 3) != 4) {
            errln("FAIL Getting the last codepoint offset to a string with " +
                  "supplementary characters");
        }
        if (UTF16.findCodePointOffset(str, 4) != 3
            || UTF16.findCodePointOffset(strbuff, 4) != 3
            || UTF16.findCodePointOffset(array, 0, limit, 4) != 3) {
            errln("FAIL Getting the length offset to a string with " +
                  "supplementary characters");
        }
        try {
            UTF16.findCodePointOffset(str, 5);
            errln("FAIL Getting a nonexistent codepoint from a string " +
                  "with supplementary characters");
        } catch (Exception e) {
            // this is a success
            logln("Passed out of bounds codepoint offset");
        }
        try {
            UTF16.findOffsetFromCodePoint(str, 4);
            errln("FAIL Getting a nonexistent codepoint from a string " +
                  "with supplementary characters");
        } catch (Exception e) {
            // this is a success
            logln("Passed out of bounds codepoint offset");
        }
        try {
            UTF16.findCodePointOffset(strbuff, 5);
            errln("FAIL Getting a nonexistent codepoint from a string " +
                  "with supplementary characters");
        } catch (Exception e) {
            // this is a success
            logln("Passed out of bounds codepoint offset");
        }
        try {
            UTF16.findOffsetFromCodePoint(strbuff, 4);
            errln("FAIL Getting a nonexistent codepoint from a string " +
                  "with supplementary characters");
        } catch (Exception e) {
            // this is a success
            logln("Passed out of bounds codepoint offset");
        }
        try {
            UTF16.findCodePointOffset(array, 0, limit, 5);
            errln("FAIL Getting a nonexistent codepoint from a string " +
                  "with supplementary characters");
        } catch (Exception e) {
            // this is a success
            logln("Passed out of bounds codepoint offset");
        }
        try {
            UTF16.findOffsetFromCodePoint(array, 0, limit, 4);
            errln("FAIL Getting a nonexistent codepoint from a string " +
                  "with supplementary characters");
        }
catch (Exception e) { // this is a success logln("Passed out of bounds codepoint offset"); } if (UTF16.findCodePointOffset(array, 1, 3, 0) != 0 || UTF16.findOffsetFromCodePoint(array, 1, 3, 0) != 0 || UTF16.findCodePointOffset(array, 1, 3, 1) != 0 || UTF16.findCodePointOffset(array, 1, 3, 2) != 1 || UTF16.findOffsetFromCodePoint(array, 1, 3, 1) != 2) { errln("FAIL Getting valid codepoint offset in sub array"); } } /** * Testing UTF16 class methods getCharCount, *Surrogate */ public void TestGetCharCountSurrogate() { if (UTF16.getCharCount(0x61) != 1 || UTF16.getCharCount(0x10000) != 2) { errln("FAIL getCharCount result failure"); } if (UTF16.getLeadSurrogate(0x61) != 0 || UTF16.getTrailSurrogate(0x61) != 0x61 || UTF16.isLeadSurrogate((char)0x61) || UTF16.isTrailSurrogate((char)0x61) || UTF16.getLeadSurrogate(0x10000) != 0xd800 || UTF16.getTrailSurrogate(0x10000) != 0xdc00 || UTF16.isLeadSurrogate((char)0xd800) != true || UTF16.isTrailSurrogate((char)0xd800) || UTF16.isLeadSurrogate((char)0xdc00) || UTF16.isTrailSurrogate((char)0xdc00) != true) { errln("FAIL *Surrogate result failure"); } if (UTF16.isSurrogate((char)0x61) || !UTF16.isSurrogate((char)0xd800) || !UTF16.isSurrogate((char)0xdc00)) { errln("FAIL isSurrogate result failure"); } } /** * Testing UTF16 class method insert */ public void TestInsert() { StringBuffer strbuff = new StringBuffer("0123456789"); char array[] = new char[128]; Utility.getChars(strbuff, 0, strbuff.length(), array, 0); int length = 10; UTF16.insert(strbuff, 5, 't'); UTF16.insert(strbuff, 5, 's'); UTF16.insert(strbuff, 5, 'e'); UTF16.insert(strbuff, 5, 't'); if (!(strbuff.toString().equals("01234test56789"))) { errln("FAIL inserting \"test\""); } length = UTF16.insert(array, length, 5, 't'); length = UTF16.insert(array, length, 5, 's'); length = UTF16.insert(array, length, 5, 'e'); length = UTF16.insert(array, length, 5, 't'); String str = new String(array, 0, length); if (!(str.equals("01234test56789"))) { errln("FAIL inserting 
\"test\""); } UTF16.insert(strbuff, 0, 0x10000); UTF16.insert(strbuff, 11, 0x10000); UTF16.insert(strbuff, strbuff.length(), 0x10000); if (!(strbuff.toString().equals( "\ud800\udc0001234test\ud800\udc0056789\ud800\udc00"))) { errln("FAIL inserting supplementary characters"); } length = UTF16.insert(array, length, 0, 0x10000); length = UTF16.insert(array, length, 11, 0x10000); length = UTF16.insert(array, length, length, 0x10000); str = new String(array, 0, length); if (!(str.equals( "\ud800\udc0001234test\ud800\udc0056789\ud800\udc00"))) { errln("FAIL inserting supplementary characters"); } try { UTF16.insert(strbuff, -1, 0); errln("FAIL invalid insertion offset"); } catch (Exception e) { System.out.print(""); } try { UTF16.insert(strbuff, 64, 0); errln("FAIL invalid insertion offset"); } catch (Exception e) { System.out.print(""); } try { UTF16.insert(array, length, -1, 0); errln("FAIL invalid insertion offset"); } catch (Exception e) { System.out.print(""); } try { UTF16.insert(array, length, 64, 0); errln("FAIL invalid insertion offset"); } catch (Exception e) { System.out.print(""); } try { // exceeded array size UTF16.insert(array, array.length, 64, 0); errln("FAIL invalid insertion offset"); } catch (Exception e) { System.out.print(""); } } /* * Testing moveCodePointOffset APIs */ // // checkMoveCodePointOffset // Run a single test case through each of the moveCodePointOffset() functions. // Parameters - // s The string to work in. // startIdx The starting position within the string. // amount The number of code points to move. // expectedResult The string index after the move, or -1 if the // function should throw an exception. 
private void checkMoveCodePointOffset(String s, int startIdx, int amount, int expectedResult) { // Test with the String flavor of moveCodePointOffset try { int result = UTF16.moveCodePointOffset(s, startIdx, amount); if (result != expectedResult) { errln("FAIL: UTF16.moveCodePointOffset(String \"" + s + "\", " + startIdx + ", " + amount + ")" + " returned " + result + ", expected result was " + (expectedResult==-1 ? "exception" : Integer.toString(expectedResult))); } } catch (IndexOutOfBoundsException e) { if (expectedResult != -1) { errln("FAIL: UTF16.moveCodePointOffset(String \"" + s + "\", " + startIdx + ", " + amount + ")" + " returned exception" + ", expected result was " + expectedResult); } } // Test with the StringBuffer flavor of moveCodePointOffset StringBuffer sb = new StringBuffer(s); try { int result = UTF16.moveCodePointOffset(sb, startIdx, amount); if (result != expectedResult) { errln("FAIL: UTF16.moveCodePointOffset(StringBuffer \"" + s + "\", " + startIdx + ", " + amount + ")" + " returned " + result + ", expected result was " + (expectedResult==-1 ? "exception" : Integer.toString(expectedResult))); } } catch (IndexOutOfBoundsException e) { if (expectedResult != -1) { errln("FAIL: UTF16.moveCodePointOffset(StringBuffer \"" + s + "\", " + startIdx + ", " + amount + ")" + " returned exception" + ", expected result was " + expectedResult); } } // Test with the char[] flavor of moveCodePointOffset char ca[] = s.toCharArray(); try { int result = UTF16.moveCodePointOffset(ca, 0, s.length(), startIdx, amount); if (result != expectedResult) { errln("FAIL: UTF16.moveCodePointOffset(char[] \"" + s + "\", 0, " + s.length() + ", " + startIdx + ", " + amount + ")" + " returned " + result + ", expected result was " + (expectedResult==-1 ? 
"exception" : Integer.toString(expectedResult))); } } catch (IndexOutOfBoundsException e) { if (expectedResult != -1) { errln("FAIL: UTF16.moveCodePointOffset(char[] \"" + s + "\", 0, " + s.length() + ", " + startIdx + ", " + amount + ")" + " returned exception" + ", expected result was " + expectedResult); } } // Put the test string into the interior of a char array, // run test on the subsection of the array. char ca2[] = new char[s.length()+2]; ca2[0] = (char)0xd800; ca2[s.length()+1] = (char)0xd8ff; s.getChars(0, s.length(), ca2, 1); try { int result = UTF16.moveCodePointOffset(ca2, 1, s.length()+1, startIdx, amount); if (result != expectedResult) { errln("UTF16.moveCodePointOffset(char[] \"" + "." + s + ".\", 1, " + (s.length()+1) + ", " + startIdx + ", " + amount + ")" + " returned " + result + ", expected result was " + (expectedResult==-1 ? "exception" : Integer.toString(expectedResult))); } } catch (IndexOutOfBoundsException e) { if (expectedResult != -1) { errln("UTF16.moveCodePointOffset(char[] \"" + "." + s + ".\", 1, " + (s.length()+1) + ", " + startIdx + ", " + amount + ")" + " returned exception" + ", expected result was " + expectedResult); } } } public void TestMoveCodePointOffset() { // checkMoveCodePointOffset(String, startIndex, amount, expected ); expected=-1 for exception. 
// No Supplementary chars checkMoveCodePointOffset("abc", 1, 1, 2); checkMoveCodePointOffset("abc", 1, -1, 0); checkMoveCodePointOffset("abc", 1, -2, -1); checkMoveCodePointOffset("abc", 1, 2, 3); checkMoveCodePointOffset("abc", 1, 3, -1); checkMoveCodePointOffset("abc", 1, 0, 1); checkMoveCodePointOffset("abc", 3, 0, 3); checkMoveCodePointOffset("abc", 4, 0, -1); checkMoveCodePointOffset("abc", 0, 0, 0); checkMoveCodePointOffset("abc", -1, 0, -1); checkMoveCodePointOffset("", 0, 0, 0); checkMoveCodePointOffset("", 0, -1, -1); checkMoveCodePointOffset("", 0, 1, -1); checkMoveCodePointOffset("a", 0, 0, 0); checkMoveCodePointOffset("a", 1, 0, 1); checkMoveCodePointOffset("a", 0, 1, 1); checkMoveCodePointOffset("a", 1, -1, 0); // Supplementary in middle of string checkMoveCodePointOffset("a\ud800\udc00b", 0, 1, 1); checkMoveCodePointOffset("a\ud800\udc00b", 0, 2, 3); checkMoveCodePointOffset("a\ud800\udc00b", 0, 3, 4); checkMoveCodePointOffset("a\ud800\udc00b", 0, 4, -1); checkMoveCodePointOffset("a\ud800\udc00b", 4, -1, 3); checkMoveCodePointOffset("a\ud800\udc00b", 4, -2, 1); checkMoveCodePointOffset("a\ud800\udc00b", 4, -3, 0); checkMoveCodePointOffset("a\ud800\udc00b", 4, -4, -1); // Supplementary at start of string checkMoveCodePointOffset("\ud800\udc00ab", 0, 1, 2); checkMoveCodePointOffset("\ud800\udc00ab", 1, 1, 2); checkMoveCodePointOffset("\ud800\udc00ab", 2, 1, 3); checkMoveCodePointOffset("\ud800\udc00ab", 2, -1, 0); checkMoveCodePointOffset("\ud800\udc00ab", 1, -1, 0); checkMoveCodePointOffset("\ud800\udc00ab", 0, -1, -1); // Supplementary at end of string checkMoveCodePointOffset("ab\ud800\udc00", 1, 1, 2); checkMoveCodePointOffset("ab\ud800\udc00", 2, 1, 4); checkMoveCodePointOffset("ab\ud800\udc00", 3, 1, 4); checkMoveCodePointOffset("ab\ud800\udc00", 4, 1, -1); checkMoveCodePointOffset("ab\ud800\udc00", 5, -2, -1); checkMoveCodePointOffset("ab\ud800\udc00", 4, -1, 2); checkMoveCodePointOffset("ab\ud800\udc00", 3, -1, 2); 
checkMoveCodePointOffset("ab\ud800\udc00", 2, -1, 1); checkMoveCodePointOffset("ab\ud800\udc00", 1, -1, 0); // Unpaired surrogate in middle checkMoveCodePointOffset("a\ud800b", 0, 1, 1); checkMoveCodePointOffset("a\ud800b", 1, 1, 2); checkMoveCodePointOffset("a\ud800b", 2, 1, 3); checkMoveCodePointOffset("a\udc00b", 0, 1, 1); checkMoveCodePointOffset("a\udc00b", 1, 1, 2); checkMoveCodePointOffset("a\udc00b", 2, 1, 3); checkMoveCodePointOffset("a\udc00\ud800b", 0, 1, 1); checkMoveCodePointOffset("a\udc00\ud800b", 1, 1, 2); checkMoveCodePointOffset("a\udc00\ud800b", 2, 1, 3); checkMoveCodePointOffset("a\udc00\ud800b", 3, 1, 4); checkMoveCodePointOffset("a\ud800b", 1, -1, 0); checkMoveCodePointOffset("a\ud800b", 2, -1, 1); checkMoveCodePointOffset("a\ud800b", 3, -1, 2); checkMoveCodePointOffset("a\udc00b", 1, -1, 0); checkMoveCodePointOffset("a\udc00b", 2, -1, 1); checkMoveCodePointOffset("a\udc00b", 3, -1, 2); checkMoveCodePointOffset("a\udc00\ud800b", 1, -1, 0); checkMoveCodePointOffset("a\udc00\ud800b", 2, -1, 1); checkMoveCodePointOffset("a\udc00\ud800b", 3, -1, 2); checkMoveCodePointOffset("a\udc00\ud800b", 4, -1, 3); // Unpaired surrogate at start checkMoveCodePointOffset("\udc00ab", 0, 1, 1); checkMoveCodePointOffset("\ud800ab", 0, 2, 2); checkMoveCodePointOffset("\ud800\ud800ab", 0, 3, 3); checkMoveCodePointOffset("\udc00\udc00ab", 0, 4, 4); checkMoveCodePointOffset("\udc00ab", 2, -1, 1); checkMoveCodePointOffset("\ud800ab", 1, -1, 0); checkMoveCodePointOffset("\ud800ab", 1, -2, -1); checkMoveCodePointOffset("\ud800\ud800ab", 2, -1, 1); checkMoveCodePointOffset("\udc00\udc00ab", 2, -2, 0); checkMoveCodePointOffset("\udc00\udc00ab", 2, -3, -1); // Unpaired surrogate at end checkMoveCodePointOffset("ab\udc00\udc00ab", 3, 1, 4); checkMoveCodePointOffset("ab\udc00\udc00ab", 2, 1, 3); checkMoveCodePointOffset("ab\udc00\udc00ab", 1, 1, 2); checkMoveCodePointOffset("ab\udc00\udc00ab", 4, -1, 3); checkMoveCodePointOffset("ab\udc00\udc00ab", 3, -1, 2); 
checkMoveCodePointOffset("ab\udc00\udc00ab", 2, -1, 1); //01234567890 1 2 3 45678901234 String str = new String("0123456789\ud800\udc00\ud801\udc010123456789"); int move1[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 12, 14, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24}; int move2[] = { 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, -1}; int move3[] = { 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 15, 16, 16, 17, 18, 19, 20, 21, 22, 23, 24, -1, -1}; int size = str.length(); for (int i = 0; i < size; i ++) { checkMoveCodePointOffset(str, i, 1, move1[i]); checkMoveCodePointOffset(str, i, 2, move2[i]); checkMoveCodePointOffset(str, i, 3, move3[i]); } char strarray[] = str.toCharArray(); if (UTF16.moveCodePointOffset(strarray, 9, 13, 0, 2) != 3) { errln("FAIL: Moving offset 0 by 2 codepoint in subarray [9, 13] " + "expected result 3"); } if (UTF16.moveCodePointOffset(strarray, 9, 13, 1, 2) != 4) { errln("FAIL: Moving offset 1 by 2 codepoint in subarray [9, 13] " + "expected result 4"); } if (UTF16.moveCodePointOffset(strarray, 11, 14, 0, 2) != 3) { errln("FAIL: Moving offset 0 by 2 codepoint in subarray [11, 14] " + "expected result 3"); } } /** * Testing UTF16 class methods setCharAt */ public void TestSetCharAt() { StringBuffer strbuff = new StringBuffer("012345"); char array[] = new char[128]; Utility.getChars(strbuff, 0, strbuff.length(), array, 0); int length = 6; for (int i = 0; i < length; i ++) { UTF16.setCharAt(strbuff, i, '0'); UTF16.setCharAt(array, length, i, '0'); } String str = new String(array, 0, length); if (!(strbuff.toString().equals("000000")) || !(str.equals("000000"))) { errln("FAIL: setChar to '0' failed"); } UTF16.setCharAt(strbuff, 0, 0x10000); UTF16.setCharAt(strbuff, 4, 0x10000); UTF16.setCharAt(strbuff, 7, 0x10000); if (!(strbuff.toString().equals( "\ud800\udc0000\ud800\udc000\ud800\udc00"))) { errln("FAIL: setChar to 0x10000 failed"); } length = UTF16.setCharAt(array, length, 0, 0x10000); length = 
UTF16.setCharAt(array, length, 4, 0x10000); length = UTF16.setCharAt(array, length, 7, 0x10000); str = new String(array, 0, length); if (!(str.equals("\ud800\udc0000\ud800\udc000\ud800\udc00"))) { errln("FAIL: setChar to 0x10000 failed"); } UTF16.setCharAt(strbuff, 0, '0'); UTF16.setCharAt(strbuff, 1, '1'); UTF16.setCharAt(strbuff, 2, '2'); UTF16.setCharAt(strbuff, 4, '3'); UTF16.setCharAt(strbuff, 4, '4'); UTF16.setCharAt(strbuff, 5, '5'); if (!strbuff.toString().equals("012345")) { errln("Fail converting supplementaries in StringBuffer to BMP " + "characters"); } length = UTF16.setCharAt(array, length, 0, '0'); length = UTF16.setCharAt(array, length, 1, '1'); length = UTF16.setCharAt(array, length, 2, '2'); length = UTF16.setCharAt(array, length, 4, '3'); length = UTF16.setCharAt(array, length, 4, '4'); length = UTF16.setCharAt(array, length, 5, '5'); str = new String(array, 0, length); if (!str.equals("012345")) { errln("Fail converting supplementaries in array to BMP " + "characters"); } try { UTF16.setCharAt(strbuff, -1, 0); errln("FAIL: setting character at invalid offset"); } catch (Exception e) { System.out.print(""); } try { UTF16.setCharAt(array, length, -1, 0); errln("FAIL: setting character at invalid offset"); } catch (Exception e) { System.out.print(""); } try { UTF16.setCharAt(strbuff, length, 0); errln("FAIL: setting character at invalid offset"); } catch (Exception e) { System.out.print(""); } try { UTF16.setCharAt(array, length, length, 0); errln("FAIL: setting character at invalid offset"); } catch (Exception e) { System.out.print(""); } } /** * Testing UTF16 valueof APIs */ public void TestValueOf() { if(UCharacter.getCodePoint('\ud800','\udc00')!=0x10000){ errln("FAIL: getCodePoint('\ud800','\udc00')"); } if (!UTF16.valueOf(0x61).equals("a") || !UTF16.valueOf(0x10000).equals("\ud800\udc00")) { errln("FAIL: valueof(char32)"); } String str = new String("01234\ud800\udc0056789"); StringBuffer strbuff = new StringBuffer(str); char array[] = 
str.toCharArray(); int length = str.length(); String expected[] = {"0", "1", "2", "3", "4", "\ud800\udc00", "\ud800\udc00", "5", "6", "7", "8", "9"}; for (int i = 0; i < length; i ++) { if (!UTF16.valueOf(str, i).equals(expected[i]) || !UTF16.valueOf(strbuff, i).equals(expected[i]) || !UTF16.valueOf(array, 0, length, i).equals(expected[i])) { errln("FAIL: valueOf() expected " + expected[i]); } } try { UTF16.valueOf(str, -1); errln("FAIL: out of bounds error expected"); } catch (Exception e) { System.out.print(""); } try { UTF16.valueOf(strbuff, -1); errln("FAIL: out of bounds error expected"); } catch (Exception e) { System.out.print(""); } try { UTF16.valueOf(array, 0, length, -1); errln("FAIL: out of bounds error expected"); } catch (Exception e) { System.out.print(""); } try { UTF16.valueOf(str, length); errln("FAIL: out of bounds error expected"); } catch (Exception e) { System.out.print(""); } try { UTF16.valueOf(strbuff, length); errln("FAIL: out of bounds error expected"); } catch (Exception e) { System.out.print(""); } try { UTF16.valueOf(array, 0, length, length); errln("FAIL: out of bounds error expected"); } catch (Exception e) { System.out.print(""); } if (!UTF16.valueOf(array, 6, length, 0).equals("\udc00") || !UTF16.valueOf(array, 0, 6, 5).equals("\ud800")) { errln("FAIL: error getting partial supplementary character"); } try { UTF16.valueOf(array, 3, 5, -1); errln("FAIL: out of bounds error expected"); } catch (Exception e) { System.out.print(""); } try { UTF16.valueOf(array, 3, 5, 3); errln("FAIL: out of bounds error expected"); } catch (Exception e) { System.out.print(""); } } public void TestIndexOf() { //012345678901234567890123456789012345 String test1 = "test test ttest tetest testesteststt"; String test2 = "test"; int testChar1 = 0x74; int testChar2 = 0x20402; // int testChar3 = 0xdc02; // int testChar4 = 0xd841; String test3 = "\ud841\udc02\u0071\udc02\ud841\u0071\ud841\udc02\u0071\u0072\ud841\udc02\u0071\ud841\udc02\u0071\udc02\ud841\u0073"; 
        String test4 = UCharacter.toString(testChar2);

        if (UTF16.indexOf(test1, test2) != 0 ||
            UTF16.indexOf(test1, test2, 0) != 0) {
            errln("indexOf failed: expected to find '" + test2 +
                  "' at position 0 in text '" + test1 + "'");
        }
        if (UTF16.indexOf(test1, testChar1) != 0 ||
            UTF16.indexOf(test1, testChar1, 0) != 0) {
            errln("indexOf failed: expected to find 0x" +
                  Integer.toHexString(testChar1) +
                  " at position 0 in text '" + test1 + "'");
        }
        if (UTF16.indexOf(test3, testChar2) != 0 ||
            UTF16.indexOf(test3, testChar2, 0) != 0) {
            errln("indexOf failed: expected to find 0x" +
                  Integer.toHexString(testChar2) +
                  " at position 0 in text '" + Utility.hex(test3) + "'");
        }
        // An unpaired lead surrogate precedes the supplementary character,
        // so the match is expected at index 1, not 0.
        String test5 = "\ud841\ud841\udc02";
        if (UTF16.indexOf(test5, testChar2) != 1 ||
            UTF16.indexOf(test5, testChar2, 0) != 1) {
            errln("indexOf failed: expected to find 0x" +
                  Integer.toHexString(testChar2) +
                  " at position 1 in text '" + Utility.hex(test5) + "'");
        }
        if (UTF16.lastIndexOf(test1, test2) != 29 ||
            UTF16.lastIndexOf(test1, test2, test1.length()) != 29) {
            errln("lastIndexOf failed: expected to find '" + test2 +
                  "' at position 29 in text '" + test1 + "'");
        }
        if (UTF16.lastIndexOf(test1, testChar1) != 35 ||
            UTF16.lastIndexOf(test1, testChar1, test1.length()) != 35) {
            errln("lastIndexOf failed: expected to find 0x" +
                  Integer.toHexString(testChar1) +
                  " at position 35 in text '" + test1 + "'");
        }
        if (UTF16.lastIndexOf(test3, testChar2) != 13 ||
            UTF16.lastIndexOf(test3, testChar2, test3.length()) != 13) {
            errln("lastIndexOf failed: expected to find 0x" +
                  Integer.toHexString(testChar2) +
                  " at position 13 in text '" + Utility.hex(test3) + "'");
        }
        int occurrences = 0;
        for (int startPos = 0; startPos != -1 && startPos < test1.length();) {
            startPos = UTF16.indexOf(test1, test2, startPos);
            if (startPos >= 0) {
                ++ occurrences;
                startPos += 4;
            }
        }
        if (occurrences != 6) {
            errln("indexOf failed: expected to find 6 occurrences, found "
                  + occurrences);
        }

        occurrences = 0;
        for (int startPos = 10; startPos != -1 && startPos <
test1.length();) { startPos = UTF16.indexOf(test1, test2, startPos); if (startPos >= 0) { ++ occurrences; startPos += 4; } } if (occurrences != 4) { errln("indexOf with starting offset failed: expected to find 4 occurrences, found " + occurrences); } occurrences = 0; for (int startPos = 0; startPos != -1 && startPos < test3.length();) { startPos = UTF16.indexOf(test3, test4, startPos); if (startPos != -1) { ++ occurrences; startPos += 2; } } if (occurrences != 4) { errln("indexOf failed: expected to find 4 occurrences, found " + occurrences); } occurrences = 0; for (int startPos = 10; startPos != -1 && startPos < test3.length();) { startPos = UTF16.indexOf(test3, test4, startPos); if (startPos != -1) { ++ occurrences; startPos += 2; } } if (occurrences != 2) { errln("indexOf failed: expected to find 2 occurrences, found " + occurrences); } occurrences = 0; for (int startPos = 0; startPos != -1 && startPos < test1.length();) { startPos = UTF16.indexOf(test1, testChar1, startPos); if (startPos != -1) { ++ occurrences; startPos += 1; } } if (occurrences != 16) { errln("indexOf with character failed: expected to find 16 occurrences, found " + occurrences); } occurrences = 0; for (int startPos = 10; startPos != -1 && startPos < test1.length();) { startPos = UTF16.indexOf(test1, testChar1, startPos); if (startPos != -1) { ++ occurrences; startPos += 1; } } if (occurrences != 12) { errln("indexOf with character & start offset failed: expected to find 12 occurrences, found " + occurrences); } occurrences = 0; for (int startPos = 0; startPos != -1 && startPos < test3.length();) { startPos = UTF16.indexOf(test3, testChar2, startPos); if (startPos != -1) { ++ occurrences; startPos += 1; } } if (occurrences != 4) { errln("indexOf failed: expected to find 4 occurrences, found " + occurrences); } occurrences = 0; for (int startPos = 5; startPos != -1 && startPos < test3.length();) { startPos = UTF16.indexOf(test3, testChar2, startPos); if (startPos != -1) { ++ occurrences; 
                startPos += 1;
            }
        }
        if (occurrences != 3) {
            errln("indexOf with character & start & end offsets failed: expected to find 3 occurrences, found "
                  + occurrences);
        }

        occurrences = 0;
        for (int startPos = 32; startPos != -1;) {
            startPos = UTF16.lastIndexOf(test1, test2, startPos);
            if (startPos != -1) {
                ++ occurrences;
                startPos -= 5;
            }
        }
        if (occurrences != 6) {
            errln("lastIndexOf with starting and ending offsets failed: expected to find 6 occurrences, found "
                  + occurrences);
        }

        occurrences = 0;
        for (int startPos = 32; startPos != -1;) {
            startPos = UTF16.lastIndexOf(test1, testChar1, startPos);
            if (startPos != -1) {
                ++ occurrences;
                startPos -= 5;
            }
        }
        if (occurrences != 7) {
            errln("lastIndexOf with character & start & end offsets failed: expected to find 7 occurrences, found "
                  + occurrences);
        }

        // testing UChar32
        occurrences = 0;
        for (int startPos = test3.length(); startPos != -1;) {
            startPos = UTF16.lastIndexOf(test3, testChar2, startPos - 5);
            if (startPos != -1) {
                ++ occurrences;
            }
        }
        if (occurrences != 3) {
            errln("lastIndexOf with character & start & end offsets failed: expected to find 3 occurrences, found "
                  + occurrences);
        }

        // testing supplementary
        for (int i = 0; i < INDEXOF_SUPPLEMENTARY_CHAR_.length; i ++) {
            int ch = INDEXOF_SUPPLEMENTARY_CHAR_[i];
            for (int j = 0; j < INDEXOF_SUPPLEMENTARY_CHAR_INDEX_[i].length;
                 j ++) {
                int index = 0;
                int expected = INDEXOF_SUPPLEMENTARY_CHAR_INDEX_[i][j];
                if (j > 0) {
                    index = INDEXOF_SUPPLEMENTARY_CHAR_INDEX_[i][j - 1] + 1;
                }
                if (UTF16.indexOf(INDEXOF_SUPPLEMENTARY_STRING_, ch, index) !=
                    expected ||
                    UTF16.indexOf(INDEXOF_SUPPLEMENTARY_STRING_,
                                  UCharacter.toString(ch), index) !=
                    expected) {
                    errln("Failed finding index for supplementary 0x" +
                          Integer.toHexString(ch));
                }
                index = INDEXOF_SUPPLEMENTARY_STRING_.length();
                if (j < INDEXOF_SUPPLEMENTARY_CHAR_INDEX_[i].length - 1) {
                    index = INDEXOF_SUPPLEMENTARY_CHAR_INDEX_[i][j + 1] - 1;
                }
                if (UTF16.lastIndexOf(INDEXOF_SUPPLEMENTARY_STRING_, ch,
                                      index) != expected ||
                    UTF16.lastIndexOf(INDEXOF_SUPPLEMENTARY_STRING_,
                                      UCharacter.toString(ch), index)
                    != expected) {
                    errln("Failed finding last index for supplementary 0x" +
                          Integer.toHexString(ch));
                }
            }
        }

        for (int i = 0; i < INDEXOF_SUPPLEMENTARY_STR_INDEX_.length; i ++) {
            int index = 0;
            int expected = INDEXOF_SUPPLEMENTARY_STR_INDEX_[i];
            if (i > 0) {
                index = INDEXOF_SUPPLEMENTARY_STR_INDEX_[i - 1] + 1;
            }
            if (UTF16.indexOf(INDEXOF_SUPPLEMENTARY_STRING_,
                              INDEXOF_SUPPLEMENTARY_STR_, index) != expected) {
                errln("Failed finding index for supplementary string " +
                      hex(INDEXOF_SUPPLEMENTARY_STRING_));
            }
            index = INDEXOF_SUPPLEMENTARY_STRING_.length();
            if (i < INDEXOF_SUPPLEMENTARY_STR_INDEX_.length - 1) {
                index = INDEXOF_SUPPLEMENTARY_STR_INDEX_[i + 1] - 1;
            }
            if (UTF16.lastIndexOf(INDEXOF_SUPPLEMENTARY_STRING_,
                                  INDEXOF_SUPPLEMENTARY_STR_, index) != expected) {
                errln("Failed finding last index for supplementary string " +
                      hex(INDEXOF_SUPPLEMENTARY_STRING_));
            }
        }
    }

    public void TestReplace()
    {
        String test1 = "One potato, two potato, three potato, four\n";
        String test2 = "potato";
        String test3 = "MISSISSIPPI";

        // On failure, report the actual result rather than the input string.
        String result = UTF16.replace(test1, test2, test3);
        String expectedValue =
            "One MISSISSIPPI, two MISSISSIPPI, three MISSISSIPPI, four\n";
        if (!result.equals(expectedValue)) {
            errln("findAndReplace failed: expected \"" + expectedValue +
                  "\", got \"" + result + "\".");
        }
        result = UTF16.replace(test1, test3, test2);
        expectedValue = test1;
        if (!result.equals(expectedValue)) {
            errln("findAndReplace failed: expected \"" + expectedValue +
                  "\", got \"" + result + "\".");
        }

        result = UTF16.replace(test1, ',', 'e');
        expectedValue = "One potatoe two potatoe three potatoe four\n";
        if (!result.equals(expectedValue)) {
            errln("findAndReplace failed: expected \"" + expectedValue +
                  "\", got \"" + result + "\".");
        }

        result = UTF16.replace(test1, ',', 0x10000);
        expectedValue = "One potato\ud800\udc00 two potato\ud800\udc00 three potato\ud800\udc00 four\n";
        if (!result.equals(expectedValue)) {
            errln("findAndReplace failed: expected \"" + expectedValue +
                  "\", got \"" + result + "\".");
        }

        result = UTF16.replace(test1, "potato", "\ud800\udc00\ud801\udc01");
        expectedValue = "One \ud800\udc00\ud801\udc01, two \ud800\udc00\ud801\udc01, three \ud800\udc00\ud801\udc01, four\n";
        if (!result.equals(expectedValue)) {
            errln("findAndReplace failed: expected \"" + expectedValue +
                  "\", got \"" + result + "\".");
        }

        String test4 = "\ud800\ud800\udc00\ud800\udc00\udc00\ud800\ud800\udc00\ud800\udc00\udc00";
        result = UTF16.replace(test4, 0xd800, 'A');
        expectedValue = "A\ud800\udc00\ud800\udc00\udc00A\ud800\udc00\ud800\udc00\udc00";
        if (!result.equals(expectedValue)) {
            errln("findAndReplace failed: expected \"" + expectedValue +
                  "\", got \"" + result + "\".");
        }

        result = UTF16.replace(test4, 0xdC00, 'A');
        expectedValue = "\ud800\ud800\udc00\ud800\udc00A\ud800\ud800\udc00\ud800\udc00A";
        if (!result.equals(expectedValue)) {
            errln("findAndReplace failed: expected \"" + expectedValue +
                  "\", got \"" + result + "\".");
        }

        result = UTF16.replace(test4, 0x10000, 'A');
        expectedValue = "\ud800AA\udc00\ud800AA\udc00";
        if (!result.equals(expectedValue)) {
            errln("findAndReplace failed: expected \"" + expectedValue +
                  "\", got \"" + result + "\".");
        }
    }

    public void TestReverse()
    {
        StringBuffer test = new StringBuffer(
            "backwards words say to used I");

        StringBuffer result = UTF16.reverse(test);
        if (!result.toString().equals("I desu ot yas sdrow sdrawkcab")) {
            errln("reverse() failed: Expected \"I desu ot yas sdrow sdrawkcab\",\n got \""
                  + result + "\"");
        }
        StringBuffer testbuffer = new StringBuffer();
        UTF16.append(testbuffer, 0x2f999);
        UTF16.append(testbuffer, 0x1d15f);
        UTF16.append(testbuffer, 0x00c4);
        UTF16.append(testbuffer, 0x1ed0);
        result = UTF16.reverse(testbuffer);
        if (result.charAt(0) != 0x1ed0 ||
            result.charAt(1) != 0xc4 ||
            UTF16.charAt(result, 2) != 0x1d15f ||
            UTF16.charAt(result, 4) != 0x2f999) {
            errln("reverse() failed with supplementary characters");
        }
    }

    /**
     * Testing the setter and
getter apis for StringComparator */ public void TestStringComparator() { UTF16.StringComparator compare = new UTF16.StringComparator(); if (compare.getCodePointCompare() != false) { errln("Default string comparator should be code unit compare"); } if (compare.getIgnoreCase() != false) { errln("Default string comparator should be case sensitive compare"); } if (compare.getIgnoreCaseOption() != UTF16.StringComparator.FOLD_CASE_DEFAULT) { errln("Default string comparator should have fold case default compare"); } compare.setCodePointCompare(true); if (compare.getCodePointCompare() != true) { errln("Error setting code point compare"); } compare.setCodePointCompare(false); if (compare.getCodePointCompare() != false) { errln("Error setting code point compare"); } compare.setIgnoreCase(true, UTF16.StringComparator.FOLD_CASE_DEFAULT); if (compare.getIgnoreCase() != true || compare.getIgnoreCaseOption() != UTF16.StringComparator.FOLD_CASE_DEFAULT) { errln("Error setting ignore case and options"); } compare.setIgnoreCase(false, UTF16.StringComparator.FOLD_CASE_EXCLUDE_SPECIAL_I); if (compare.getIgnoreCase() != false || compare.getIgnoreCaseOption() != UTF16.StringComparator.FOLD_CASE_EXCLUDE_SPECIAL_I) { errln("Error setting ignore case and options"); } compare.setIgnoreCase(true, UTF16.StringComparator.FOLD_CASE_EXCLUDE_SPECIAL_I); if (compare.getIgnoreCase() != true || compare.getIgnoreCaseOption() != UTF16.StringComparator.FOLD_CASE_EXCLUDE_SPECIAL_I) { errln("Error setting ignore case and options"); } compare.setIgnoreCase(false, UTF16.StringComparator.FOLD_CASE_DEFAULT); if (compare.getIgnoreCase() != false || compare.getIgnoreCaseOption() != UTF16.StringComparator.FOLD_CASE_DEFAULT) { errln("Error setting ignore case and options"); } } public void TestCodePointCompare() { // these strings are in ascending order String str[] = {"\u0061", "\u20ac\ud801", "\u20ac\ud800\udc00", "\ud800", "\ud800\uff61", "\udfff", "\uff61\udfff", "\uff61\ud800\udc02", "\ud800\udc02", 
"\ud84d\udc56"}; UTF16.StringComparator cpcompare = new UTF16.StringComparator(true, false, UTF16.StringComparator.FOLD_CASE_DEFAULT); UTF16.StringComparator cucompare = new UTF16.StringComparator(); for (int i = 0; i < str.length - 1; ++ i) { if (cpcompare.compare(str[i], str[i + 1]) >= 0) { errln("error: compare() in code point order fails for string " + Utility.hex(str[i]) + " and " + Utility.hex(str[i + 1])); } // test code unit compare if (cucompare.compare(str[i], str[i + 1]) != str[i].compareTo(str[i + 1])) { errln("error: compare() in code unit order fails for string " + Utility.hex(str[i]) + " and " + Utility.hex(str[i + 1])); } } } public void TestCaseCompare() { String mixed = "\u0061\u0042\u0131\u03a3\u00df\ufb03\ud93f\udfff"; String otherDefault = "\u0041\u0062\u0131\u03c3\u0073\u0053\u0046\u0066\u0049\ud93f\udfff"; String otherExcludeSpecialI = "\u0041\u0062\u0131\u03c3\u0053\u0073\u0066\u0046\u0069\ud93f\udfff"; String different = "\u0041\u0062\u0131\u03c3\u0073\u0053\u0046\u0066\u0049\ud93f\udffd"; UTF16.StringComparator compare = new UTF16.StringComparator(); compare.setIgnoreCase(true, UTF16.StringComparator.FOLD_CASE_DEFAULT); // test u_strcasecmp() int result = compare.compare(mixed, otherDefault); if (result != 0) { errln("error: default compare(mixed, other) = " + result + " instead of 0"); } // test u_strcasecmp() - exclude special i compare.setIgnoreCase(true, UTF16.StringComparator.FOLD_CASE_EXCLUDE_SPECIAL_I); result = compare.compare(mixed, otherExcludeSpecialI); if (result != 0) { errln("error: exclude_i compare(mixed, other) = " + result + " instead of 0"); } // test u_strcasecmp() compare.setIgnoreCase(true, UTF16.StringComparator.FOLD_CASE_DEFAULT); result = compare.compare(mixed, different); if (result <= 0) { errln("error: default compare(mixed, different) = " + result + " instead of positive"); } // test substrings - stop before the sharp s (U+00df) compare.setIgnoreCase(true, UTF16.StringComparator.FOLD_CASE_DEFAULT); result = 
compare.compare(mixed.substring(0, 4), different.substring(0, 4)); if (result != 0) { errln("error: default compare(mixed substring, different substring) = " + result + " instead of 0"); } // test substrings - stop in the middle of the sharp s (U+00df) compare.setIgnoreCase(true, UTF16.StringComparator.FOLD_CASE_DEFAULT); result = compare.compare(mixed.substring(0, 5), different.substring(0, 5)); if (result <= 0) { errln("error: default compare(mixed substring, different substring) = " + result + " instead of positive"); } } public void TestHasMoreCodePointsThan() { String str = "\u0061\u0062\ud800\udc00\ud801\udc01\u0063\ud802\u0064" + "\udc03\u0065\u0066\ud804\udc04\ud805\udc05\u0067"; int length = str.length(); while (length >= 0) { for (int i = 0; i <= length; ++ i) { String s = str.substring(0, i); for (int number = -1; number <= ((length - i) + 2); ++ number) { boolean flag = UTF16.hasMoreCodePointsThan(s, number); if (flag != (UTF16.countCodePoint(s) > number)) { errln("hasMoreCodePointsThan(" + Utility.hex(s) + ", " + number + ") = " + flag + " is wrong"); } } } -- length; } // testing for null bad input for(length = -1; length <= 1; ++ length) { for (int i = 0; i <= length; ++ i) { for (int number = -2; number <= 2; ++ number) { boolean flag = UTF16.hasMoreCodePointsThan((String)null, number); if (flag != (UTF16.countCodePoint((String)null) > number)) { errln("hasMoreCodePointsThan(null, " + number + ") = " + flag + " is wrong"); } } } } length = str.length(); while (length >= 0) { for (int i = 0; i <= length; ++ i) { StringBuffer s = new StringBuffer(str.substring(0, i)); for (int number = -1; number <= ((length - i) + 2); ++ number) { boolean flag = UTF16.hasMoreCodePointsThan(s, number); if (flag != (UTF16.countCodePoint(s) > number)) { errln("hasMoreCodePointsThan(" + Utility.hex(s) + ", " + number + ") = " + flag + " is wrong"); } } } -- length; } // testing for null bad input for (length = -1; length <= 1; ++ length) { for (int i = 0; i <= length; ++ 
i) {
                for (int number = -2; number <= 2; ++ number) {
                    boolean flag = UTF16.hasMoreCodePointsThan(
                                                (StringBuffer)null, number);
                    if (flag != (UTF16.countCodePoint((StringBuffer)null) > number)) {
                        errln("hasMoreCodePointsThan(null, " + number + ") = " + flag + " is wrong");
                    }
                }
            }
        }

        char strarray[] = str.toCharArray();
        // restart from the full length; the null-input loop above leaves
        // length at 2, which would restrict this loop to the first two chars
        length = str.length();
        while (length >= 0) {
            for (int limit = 0; limit <= length; ++ limit) {
                for (int start = 0; start <= limit; ++ start) {
                    for (int number = -1; number <= ((limit - start) + 2); ++ number) {
                        boolean flag = UTF16.hasMoreCodePointsThan(strarray, start, limit, number);
                        if (flag != (UTF16.countCodePoint(strarray, start, limit) > number)) {
                            errln("hasMoreCodePointsThan(" + Utility.hex(str.substring(start, limit)) + ", " + start + ", " + limit + ", " + number + ") = " + flag + " is wrong");
                        }
                    }
                }
            }
            -- length;
        }

        // testing for null bad input
        for (length = -1; length <= 1; ++ length) {
            for (int i = 0; i <= length; ++ i) {
                for (int number = -2; number <= 2; ++ number) {
                    boolean flag = UTF16.hasMoreCodePointsThan(
                                                (StringBuffer)null, number);
                    if (flag != (UTF16.countCodePoint((StringBuffer)null) > number)) {
                        errln("hasMoreCodePointsThan(null, " + number + ") = " + flag + " is wrong");
                    }
                }
            }
        }

        // bad input
        try {
            UTF16.hasMoreCodePointsThan(strarray, -2, -1, 5);
            errln("hasMoreCodePointsThan(chararray) with negative indexes has to throw an exception");
        } catch (Exception e) {
            logln("PASS: UTF16.hasMoreCodePointsThan failed as expected");
        }
        try {
            UTF16.hasMoreCodePointsThan(strarray, 5, 2, 5);
            errln("hasMoreCodePointsThan(chararray) with limit less than start index has to throw an exception");
        } catch (Exception e) {
            logln("PASS: UTF16.hasMoreCodePointsThan failed as expected");
        }
        try {
            if (UTF16.hasMoreCodePointsThan(strarray, -2, 2, 5)) {
                errln("hasMoreCodePointsThan(chararray) with negative start indexes can't return true");
            }
        } catch (Exception e) {
            // an exception is an acceptable outcome here; only a true result is an error
        }
    }

    public void TestNewString() {
        final int[] codePoints = {
            UCharacter.toCodePoint(UCharacter.MIN_HIGH_SURROGATE,
UCharacter.MAX_LOW_SURROGATE), UCharacter.toCodePoint(UCharacter.MAX_HIGH_SURROGATE, UCharacter.MIN_LOW_SURROGATE), UCharacter.MAX_HIGH_SURROGATE, 'A', -1, }; final String cpString = "" + UCharacter.MIN_HIGH_SURROGATE + UCharacter.MAX_LOW_SURROGATE + UCharacter.MAX_HIGH_SURROGATE + UCharacter.MIN_LOW_SURROGATE + UCharacter.MAX_HIGH_SURROGATE + 'A'; final int[][] tests = { { 0, 1, 0, 2 }, { 0, 2, 0, 4 }, { 1, 1, 2, 2 }, { 1, 2, 2, 3 }, { 1, 3, 2, 4 }, { 2, 2, 4, 2 }, { 2, 3, 0, -1 }, { 4, 5, 0, -1 }, { 3, -1, 0, -1 } }; for (int i = 0; i < tests.length; ++i) { int[] t = tests[i]; int s = t[0]; int c = t[1]; int rs = t[2]; int rc = t[3]; Exception e = null; try { String str = UTF16.newString(codePoints, s, c); if (rc == -1 || !str.equals(cpString.substring(rs, rs+rc))) { errln("failed codePoints iter: " + i + " start: " + s + " len: " + c); } continue; } catch (IndexOutOfBoundsException e1) { e = e1; } catch (IllegalArgumentException e2) { e = e2; } if (rc != -1) { errln(e.getMessage()); } } } public static void main(String[] arg) { try { UTF16Test test = new UTF16Test(); test.run(arg); // test.TestCaseCompare(); } catch (Exception e) { e.printStackTrace(); } } // private data members ---------------------------------------------- private final static String INDEXOF_SUPPLEMENTARY_STRING_ = "\ud841\udc02\u0071\udc02\ud841\u0071\ud841\udc02\u0071\u0072" + "\ud841\udc02\u0071\ud841\udc02\u0071\udc02\ud841\u0073"; private final static int INDEXOF_SUPPLEMENTARY_CHAR_[] = {0x71, 0xd841, 0xdc02, UTF16Util.getRawSupplementary((char)0xd841, (char)0xdc02)}; private final static int INDEXOF_SUPPLEMENTARY_CHAR_INDEX_[][] = {{2, 5, 8, 12, 15}, {4, 17}, {3, 16}, {0, 6, 10, 13} }; private final static String INDEXOF_SUPPLEMENTARY_STR_ = "\udc02\ud841"; private final static int INDEXOF_SUPPLEMENTARY_STR_INDEX_[] = {3, 16}; // private methods --------------------------------------------------- } 
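
// ---------------------------------------------------------------------------
// Illustrative sketch only; not part of ICU4J or of the test class above.
// TestCodePointCompare relies on the fact that code unit order and code point
// order disagree once supplementary characters are involved: a surrogate code
// unit (0xD800..0xDFFF) sorts below U+E000..U+FFFF when compared as a code
// unit, while the supplementary code point it encodes sorts above them.
// The class and method names below are hypothetical, and only java.lang APIs
// are used, so this is an approximation under those assumptions and does not
// show how UTF16.StringComparator is actually implemented.
// ---------------------------------------------------------------------------
final class CodePointOrderSketch {

    private CodePointOrderSketch() {
    }

    /**
     * Compares two strings code point by code point.
     * Returns a negative value, zero, or a positive value, like compareTo().
     */
    static int compareCodePointOrder(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            // codePointAt() joins a well-formed surrogate pair into one code
            // point and returns an unpaired surrogate as its own value
            int cpA = a.codePointAt(i);
            int cpB = b.codePointAt(j);
            if (cpA != cpB) {
                return cpA < cpB ? -1 : 1;
            }
            i += Character.charCount(cpA);
            j += Character.charCount(cpB);
        }
        // at least one string is exhausted: the one with code units left is greater
        return (a.length() - i) - (b.length() - j);
    }
}
// For the single-character strings U+FF61 and U+10002, compareCodePointOrder()
// returns a negative value, whereas String.compareTo() returns a positive one
// because it compares 0xFF61 against the lead surrogate 0xD800.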
icu4j-4.2/src/com/ibm/icu/dev/test/lang/TestCharacter.java
/*
 *******************************************************************************
 * Copyright (C) 1996-2008, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.dev.test.lang;

import com.ibm.icu.dev.test.TestFmwk.TestGroup;

public class TestCharacter extends TestGroup {
    public static void main(String[] args) {
        new TestCharacter().run(args);
    }

    public TestCharacter() {
        super(
              new String[] {
                  "UCharacterTest",
                  "UCharacterCaseTest",
                  "UCharacterCategoryTest",
                  "UCharacterDirectionTest",
                  "UPropertyAliasesTest",
                  "UTF16Test",
                  "UCharacterSurrogateTest",
                  "UCharacterThreadTest"
              },
              "Character Property and UTF16 Tests");
    }
}
icu4j-4.2/src/com/ibm/icu/dev/test/lang/UCharacterCaseTest.java
/**
 *******************************************************************************
 * Copyright (C) 1996-2008, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.dev.test.lang;

import com.ibm.icu.dev.test.TestFmwk;
import com.ibm.icu.dev.test.TestUtil;
import com.ibm.icu.lang.UCharacter;
import com.ibm.icu.text.UTF16;
import com.ibm.icu.text.BreakIterator;
import com.ibm.icu.text.RuleBasedBreakIterator;
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.util.ULocale;
import com.ibm.icu.impl.UCaseProps;
import com.ibm.icu.impl.Utility;

import java.util.Locale;
import java.io.BufferedReader;
import java.util.Vector;

/**
 *
 * Testing character casing
 *
 * Mostly following the test cases in strcase.cpp for ICU
* @author Syn Wee Quek * @since march 14 2002 */ public final class UCharacterCaseTest extends TestFmwk { // constructor ----------------------------------------------------------- /** * Constructor */ public UCharacterCaseTest() { } // public methods -------------------------------------------------------- public static void main(String[] arg) { try { UCharacterCaseTest test = new UCharacterCaseTest(); test.run(arg); } catch (Exception e) { e.printStackTrace(); } } /** * Testing the uppercase and lowercase function of UCharacter */ public void TestCharacter() { for (int i = 0; i < CHARACTER_LOWER_.length; i ++) { if (UCharacter.isLetter(CHARACTER_LOWER_[i]) && !UCharacter.isLowerCase(CHARACTER_LOWER_[i])) { errln("FAIL isLowerCase test for \\u" + hex(CHARACTER_LOWER_[i])); break; } if (UCharacter.isLetter(CHARACTER_UPPER_[i]) && !(UCharacter.isUpperCase(CHARACTER_UPPER_[i]) || UCharacter.isTitleCase(CHARACTER_UPPER_[i]))) { errln("FAIL isUpperCase test for \\u" + hex(CHARACTER_UPPER_[i])); break; } if (CHARACTER_LOWER_[i] != UCharacter.toLowerCase(CHARACTER_UPPER_[i]) || (CHARACTER_UPPER_[i] != UCharacter.toUpperCase(CHARACTER_LOWER_[i]) && CHARACTER_UPPER_[i] != UCharacter.toTitleCase(CHARACTER_LOWER_[i]))) { errln("FAIL case conversion test for \\u" + hex(CHARACTER_UPPER_[i]) + " to \\u" + hex(CHARACTER_LOWER_[i])); break; } if (CHARACTER_LOWER_[i] != UCharacter.toLowerCase(CHARACTER_LOWER_[i])) { errln("FAIL lower case conversion test for \\u" + hex(CHARACTER_LOWER_[i])); break; } if (CHARACTER_UPPER_[i] != UCharacter.toUpperCase(CHARACTER_UPPER_[i]) && CHARACTER_UPPER_[i] != UCharacter.toTitleCase(CHARACTER_UPPER_[i])) { errln("FAIL upper case conversion test for \\u" + hex(CHARACTER_UPPER_[i])); break; } logln("Ok \\u" + hex(CHARACTER_UPPER_[i]) + " and \\u" + hex(CHARACTER_LOWER_[i])); } } public void TestFolding() { // test simple case folding for (int i = 0; i < FOLDING_SIMPLE_.length; i += 3) { if (UCharacter.foldCase(FOLDING_SIMPLE_[i], true) != 
FOLDING_SIMPLE_[i + 1]) { errln("FAIL: foldCase(\\u" + hex(FOLDING_SIMPLE_[i]) + ", true) should be \\u" + hex(FOLDING_SIMPLE_[i + 1])); } if (UCharacter.foldCase(FOLDING_SIMPLE_[i], UCharacter.FOLD_CASE_DEFAULT) != FOLDING_SIMPLE_[i + 1]) { errln("FAIL: foldCase(\\u" + hex(FOLDING_SIMPLE_[i]) + ", UCharacter.FOLD_CASE_DEFAULT) should be \\u" + hex(FOLDING_SIMPLE_[i + 1])); } if (UCharacter.foldCase(FOLDING_SIMPLE_[i], false) != FOLDING_SIMPLE_[i + 2]) { errln("FAIL: foldCase(\\u" + hex(FOLDING_SIMPLE_[i]) + ", false) should be \\u" + hex(FOLDING_SIMPLE_[i + 2])); } if (UCharacter.foldCase(FOLDING_SIMPLE_[i], UCharacter.FOLD_CASE_EXCLUDE_SPECIAL_I) != FOLDING_SIMPLE_[i + 2]) { errln("FAIL: foldCase(\\u" + hex(FOLDING_SIMPLE_[i]) + ", UCharacter.FOLD_CASE_EXCLUDE_SPECIAL_I) should be \\u" + hex(FOLDING_SIMPLE_[i + 2])); } } // Test full string case folding with default option and separate // buffers if (!FOLDING_DEFAULT_[0].equals(UCharacter.foldCase(FOLDING_MIXED_[0], true))) { errln("FAIL: foldCase(" + prettify(FOLDING_MIXED_[0]) + ", true)=" + prettify(UCharacter.foldCase(FOLDING_MIXED_[0], true)) + " should be " + prettify(FOLDING_DEFAULT_[0])); } if (!FOLDING_DEFAULT_[0].equals(UCharacter.foldCase(FOLDING_MIXED_[0], UCharacter.FOLD_CASE_DEFAULT))) { errln("FAIL: foldCase(" + prettify(FOLDING_MIXED_[0]) + ", UCharacter.FOLD_CASE_DEFAULT)=" + prettify(UCharacter.foldCase(FOLDING_MIXED_[0], UCharacter.FOLD_CASE_DEFAULT)) + " should be " + prettify(FOLDING_DEFAULT_[0])); } if (!FOLDING_EXCLUDE_SPECIAL_I_[0].equals( UCharacter.foldCase(FOLDING_MIXED_[0], false))) { errln("FAIL: foldCase(" + prettify(FOLDING_MIXED_[0]) + ", false)=" + prettify(UCharacter.foldCase(FOLDING_MIXED_[0], false)) + " should be " + prettify(FOLDING_EXCLUDE_SPECIAL_I_[0])); } if (!FOLDING_EXCLUDE_SPECIAL_I_[0].equals( UCharacter.foldCase(FOLDING_MIXED_[0], UCharacter.FOLD_CASE_EXCLUDE_SPECIAL_I))) { errln("FAIL: foldCase(" + prettify(FOLDING_MIXED_[0]) + ", 
UCharacter.FOLD_CASE_EXCLUDE_SPECIAL_I)=" + prettify(UCharacter.foldCase(FOLDING_MIXED_[0], UCharacter.FOLD_CASE_EXCLUDE_SPECIAL_I)) + " should be " + prettify(FOLDING_EXCLUDE_SPECIAL_I_[0])); } if (!FOLDING_DEFAULT_[1].equals(UCharacter.foldCase(FOLDING_MIXED_[1], true))) { errln("FAIL: foldCase(" + prettify(FOLDING_MIXED_[1]) + ", true)=" + prettify(UCharacter.foldCase(FOLDING_MIXED_[1], true)) + " should be " + prettify(FOLDING_DEFAULT_[1])); } if (!FOLDING_DEFAULT_[1].equals(UCharacter.foldCase(FOLDING_MIXED_[1], UCharacter.FOLD_CASE_DEFAULT))) { errln("FAIL: foldCase(" + prettify(FOLDING_MIXED_[1]) + ", UCharacter.FOLD_CASE_DEFAULT)=" + prettify(UCharacter.foldCase(FOLDING_MIXED_[1], UCharacter.FOLD_CASE_DEFAULT)) + " should be " + prettify(FOLDING_DEFAULT_[1])); } // alternate handling for dotted I/dotless i (U+0130, U+0131) if (!FOLDING_EXCLUDE_SPECIAL_I_[1].equals( UCharacter.foldCase(FOLDING_MIXED_[1], false))) { errln("FAIL: foldCase(" + prettify(FOLDING_MIXED_[1]) + ", false)=" + prettify(UCharacter.foldCase(FOLDING_MIXED_[1], false)) + " should be " + prettify(FOLDING_EXCLUDE_SPECIAL_I_[1])); } if (!FOLDING_EXCLUDE_SPECIAL_I_[1].equals( UCharacter.foldCase(FOLDING_MIXED_[1], UCharacter.FOLD_CASE_EXCLUDE_SPECIAL_I))) { errln("FAIL: foldCase(" + prettify(FOLDING_MIXED_[1]) + ", UCharacter.FOLD_CASE_EXCLUDE_SPECIAL_I)=" + prettify(UCharacter.foldCase(FOLDING_MIXED_[1], UCharacter.FOLD_CASE_EXCLUDE_SPECIAL_I)) + " should be " + prettify(FOLDING_EXCLUDE_SPECIAL_I_[1])); } } /** * Testing the strings case mapping methods */ public void TestUpper() { // uppercase with root locale and in the same buffer if (!UPPER_ROOT_.equals(UCharacter.toUpperCase(UPPER_BEFORE_))) { errln("Fail " + UPPER_BEFORE_ + " after uppercase should be " + UPPER_ROOT_ + " instead got " + UCharacter.toUpperCase(UPPER_BEFORE_)); } // uppercase with turkish locale and separate buffers if (!UPPER_TURKISH_.equals(UCharacter.toUpperCase(TURKISH_LOCALE_, UPPER_BEFORE_))) { errln("Fail " + 
                  UPPER_BEFORE_ + " after turkish-sensitive uppercase should be " + UPPER_TURKISH_ + " instead of " + UCharacter.toUpperCase(TURKISH_LOCALE_, UPPER_BEFORE_));
        }

        // uppercase a short string with root locale
        if (!UPPER_MINI_UPPER_.equals(UCharacter.toUpperCase(UPPER_MINI_))) {
            errln("error in toUpper(root locale)=\"" + UPPER_MINI_ + "\" expected \"" + UPPER_MINI_UPPER_ + "\"");
        }

        if (!SHARED_UPPERCASE_TOPKAP_.equals(
                       UCharacter.toUpperCase(SHARED_LOWERCASE_TOPKAP_))) {
            errln("toUpper failed: expected \"" + SHARED_UPPERCASE_TOPKAP_ + "\", got \"" + UCharacter.toUpperCase(SHARED_LOWERCASE_TOPKAP_) + "\".");
        }

        if (!SHARED_UPPERCASE_TURKISH_.equals(
                  UCharacter.toUpperCase(TURKISH_LOCALE_,
                                         SHARED_LOWERCASE_TOPKAP_))) {
            errln("toUpper failed: expected \"" + SHARED_UPPERCASE_TURKISH_ + "\", got \"" + UCharacter.toUpperCase(TURKISH_LOCALE_, SHARED_LOWERCASE_TOPKAP_) + "\".");
        }

        if (!SHARED_UPPERCASE_GERMAN_.equals(
                UCharacter.toUpperCase(GERMAN_LOCALE_,
                                       SHARED_LOWERCASE_GERMAN_))) {
            errln("toUpper failed: expected \"" + SHARED_UPPERCASE_GERMAN_ + "\", got \"" + UCharacter.toUpperCase(GERMAN_LOCALE_, SHARED_LOWERCASE_GERMAN_) + "\".");
        }

        // this check uppercases the Greek string, so report it as toUpper
        if (!SHARED_UPPERCASE_GREEK_.equals(
                UCharacter.toUpperCase(SHARED_LOWERCASE_GREEK_))) {
            errln("toUpper failed: expected \"" + SHARED_UPPERCASE_GREEK_ + "\", got \"" + UCharacter.toUpperCase(
                                        SHARED_LOWERCASE_GREEK_) + "\".");
        }
    }

    public void TestLower() {
        if (!LOWER_ROOT_.equals(UCharacter.toLowerCase(LOWER_BEFORE_))) {
            errln("Fail " + LOWER_BEFORE_ + " after lowercase should be " + LOWER_ROOT_ + " instead of " + UCharacter.toLowerCase(LOWER_BEFORE_));
        }

        // lowercase with turkish locale
        if (!LOWER_TURKISH_.equals(UCharacter.toLowerCase(TURKISH_LOCALE_,
                                                          LOWER_BEFORE_))) {
            errln("Fail " + LOWER_BEFORE_ + " after turkish-sensitive lowercase should be " + LOWER_TURKISH_ + " instead of " + UCharacter.toLowerCase(TURKISH_LOCALE_, LOWER_BEFORE_));
        }

        if (!SHARED_LOWERCASE_ISTANBUL_.equals(
                     UCharacter.toLowerCase(SHARED_UPPERCASE_ISTANBUL_))) {
errln("1. toLower failed: expected \"" + SHARED_LOWERCASE_ISTANBUL_ + "\", got \"" + UCharacter.toLowerCase(SHARED_UPPERCASE_ISTANBUL_) + "\"."); } if (!SHARED_LOWERCASE_TURKISH_.equals( UCharacter.toLowerCase(TURKISH_LOCALE_, SHARED_UPPERCASE_ISTANBUL_))) { errln("2. toLower failed: expected \"" + SHARED_LOWERCASE_TURKISH_ + "\", got \"" + UCharacter.toLowerCase(TURKISH_LOCALE_, SHARED_UPPERCASE_ISTANBUL_) + "\"."); } if (!SHARED_LOWERCASE_GREEK_.equals( UCharacter.toLowerCase(GREEK_LOCALE_, SHARED_UPPERCASE_GREEK_))) { errln("toLower failed: expected \"" + SHARED_LOWERCASE_GREEK_ + "\", got \"" + UCharacter.toLowerCase(GREEK_LOCALE_, SHARED_UPPERCASE_GREEK_) + "\"."); } } public void TestTitleRegression() throws java.io.IOException { UCaseProps props = new UCaseProps(); int type = props.getTypeOrIgnorable('\''); assertEquals("Case Ignorable check", -1, type); // should be case-ignorable (-1) UnicodeSet allCaseIgnorables = new UnicodeSet(); for (int cp = 0; cp <= 0x10FFFF; ++cp) { if (props.getTypeOrIgnorable(cp) < 0) { allCaseIgnorables.add(cp); } } logln(allCaseIgnorables.toString()); assertEquals("Titlecase check", "The Quick Brown Fox Can't Jump Over The Lazy Dogs.", UCharacter.toTitleCase(ULocale.ENGLISH, "THE QUICK BROWN FOX CAN'T JUMP OVER THE LAZY DOGS.", null)); } public void TestTitle() { try{ for (int i = 0; i < TITLE_DATA_.length;) { String test = TITLE_DATA_[i++]; String expected = TITLE_DATA_[i++]; ULocale locale = new ULocale(TITLE_DATA_[i++]); int breakType = Integer.parseInt(TITLE_DATA_[i++]); String optionsString = TITLE_DATA_[i++]; BreakIterator iter = breakType >= 0 ? BreakIterator.getBreakInstance(locale, breakType) : breakType == -2 ? // Open a trivial break iterator that only delivers { 0, length } // or even just { 0 } as boundaries. 
new RuleBasedBreakIterator(".*;") : null; int options = 0; if (optionsString.indexOf('L') >= 0) { options |= UCharacter.TITLECASE_NO_LOWERCASE; } if (optionsString.indexOf('A') >= 0) { options |= UCharacter.TITLECASE_NO_BREAK_ADJUSTMENT; } String result = UCharacter.toTitleCase(locale, test, iter, options); if (!expected.equals(result)) { errln("titlecasing for " + prettify(test) + " (options " + options + ") should be " + prettify(expected) + " but got " + prettify(result)); } if (options == 0) { result = UCharacter.toTitleCase(locale, test, iter); if (!expected.equals(result)) { errln("titlecasing for " + prettify(test) + " should be " + prettify(expected) + " but got " + prettify(result)); } } } }catch(Exception ex){ warnln("Could not find data for BreakIterators"); } } public void TestDutchTitle() { ULocale LOC_DUTCH = new ULocale("nl"); int options = 0; options |= UCharacter.TITLECASE_NO_LOWERCASE; BreakIterator iter = BreakIterator.getWordInstance(LOC_DUTCH); assertEquals("Dutch titlecase check in English", "Ijssel Igloo Ijmuiden", UCharacter.toTitleCase(ULocale.ENGLISH, "ijssel igloo IJMUIDEN", null)); assertEquals("Dutch titlecase check in Dutch", "IJssel Igloo IJmuiden", UCharacter.toTitleCase(LOC_DUTCH, "ijssel igloo IJMUIDEN", null)); iter.setText("ijssel igloo IjMUIdEN iPoD ijenough"); assertEquals("Dutch titlecase check in Dutch with nolowercase option", "IJssel Igloo IJMUIdEN IPoD IJenough", UCharacter.toTitleCase(LOC_DUTCH, "ijssel igloo IjMUIdEN iPoD ijenough", iter, options)); } public void TestSpecial() { for (int i = 0; i < SPECIAL_LOCALES_.length; i ++) { int j = i * 3; Locale locale = SPECIAL_LOCALES_[i]; String str = SPECIAL_DATA_[j]; if (locale != null) { if (!SPECIAL_DATA_[j + 1].equals( UCharacter.toLowerCase(locale, str))) { errln("error lowercasing special characters " + hex(str) + " expected " + hex(SPECIAL_DATA_[j + 1]) + " for locale " + locale.toString() + " but got " + hex(UCharacter.toLowerCase(locale, str))); } if (!SPECIAL_DATA_[j 
+ 2].equals( UCharacter.toUpperCase(locale, str))) { errln("error uppercasing special characters " + hex(str) + " expected " + SPECIAL_DATA_[j + 2] + " for locale " + locale.toString() + " but got " + hex(UCharacter.toUpperCase(locale, str))); } } else { if (!SPECIAL_DATA_[j + 1].equals( UCharacter.toLowerCase(str))) { errln("error lowercasing special characters " + hex(str) + " expected " + SPECIAL_DATA_[j + 1] + " but got " + hex(UCharacter.toLowerCase(locale, str))); } if (!SPECIAL_DATA_[j + 2].equals( UCharacter.toUpperCase(locale, str))) { errln("error uppercasing special characters " + hex(str) + " expected " + SPECIAL_DATA_[j + 2] + " but got " + hex(UCharacter.toUpperCase(locale, str))); } } } // turkish & azerbaijani dotless i & dotted I // remove dot above if there was a capital I before and there are no // more accents above if (!SPECIAL_DOTTED_LOWER_TURKISH_.equals(UCharacter.toLowerCase( TURKISH_LOCALE_, SPECIAL_DOTTED_))) { errln("error in dots.toLower(tr)=\"" + SPECIAL_DOTTED_ + "\" expected \"" + SPECIAL_DOTTED_LOWER_TURKISH_ + "\" but got " + UCharacter.toLowerCase(TURKISH_LOCALE_, SPECIAL_DOTTED_)); } if (!SPECIAL_DOTTED_LOWER_GERMAN_.equals(UCharacter.toLowerCase( GERMAN_LOCALE_, SPECIAL_DOTTED_))) { errln("error in dots.toLower(de)=\"" + SPECIAL_DOTTED_ + "\" expected \"" + SPECIAL_DOTTED_LOWER_GERMAN_ + "\" but got " + UCharacter.toLowerCase(GERMAN_LOCALE_, SPECIAL_DOTTED_)); } // lithuanian dot above in uppercasing if (!SPECIAL_DOT_ABOVE_UPPER_LITHUANIAN_.equals( UCharacter.toUpperCase(LITHUANIAN_LOCALE_, SPECIAL_DOT_ABOVE_))) { errln("error in dots.toUpper(lt)=\"" + SPECIAL_DOT_ABOVE_ + "\" expected \"" + SPECIAL_DOT_ABOVE_UPPER_LITHUANIAN_ + "\" but got " + UCharacter.toUpperCase(LITHUANIAN_LOCALE_, SPECIAL_DOT_ABOVE_)); } if (!SPECIAL_DOT_ABOVE_UPPER_GERMAN_.equals(UCharacter.toUpperCase( GERMAN_LOCALE_, SPECIAL_DOT_ABOVE_))) { errln("error in dots.toUpper(de)=\"" + SPECIAL_DOT_ABOVE_ + "\" expected \"" + SPECIAL_DOT_ABOVE_UPPER_GERMAN_ + 
"\" but got " + UCharacter.toUpperCase(GERMAN_LOCALE_, SPECIAL_DOT_ABOVE_)); } // lithuanian adds dot above to i in lowercasing if there are more // above accents if (!SPECIAL_DOT_ABOVE_LOWER_LITHUANIAN_.equals( UCharacter.toLowerCase(LITHUANIAN_LOCALE_, SPECIAL_DOT_ABOVE_UPPER_))) { errln("error in dots.toLower(lt)=\"" + SPECIAL_DOT_ABOVE_UPPER_ + "\" expected \"" + SPECIAL_DOT_ABOVE_LOWER_LITHUANIAN_ + "\" but got " + UCharacter.toLowerCase(LITHUANIAN_LOCALE_, SPECIAL_DOT_ABOVE_UPPER_)); } if (!SPECIAL_DOT_ABOVE_LOWER_GERMAN_.equals( UCharacter.toLowerCase(GERMAN_LOCALE_, SPECIAL_DOT_ABOVE_UPPER_))) { errln("error in dots.toLower(de)=\"" + SPECIAL_DOT_ABOVE_UPPER_ + "\" expected \"" + SPECIAL_DOT_ABOVE_LOWER_GERMAN_ + "\" but got " + UCharacter.toLowerCase(GERMAN_LOCALE_, SPECIAL_DOT_ABOVE_UPPER_)); } } /** * Tests for case mapping in the file SpecialCasing.txt * This method reads in SpecialCasing.txt file for testing purposes. * A default path is provided relative to the src path, however the user * could set a system property to change the directory path.
* e.g. java -DUnicodeData="data_dir_path" com.ibm.dev.test.lang.UCharacterTest */ public void TestSpecialCasingTxt() { try { // reading in the SpecialCasing file BufferedReader input = TestUtil.getDataReader( "unicode/SpecialCasing.txt"); while (true) { String s = input.readLine(); if (s == null) { break; } if (s.length() == 0 || s.charAt(0) == '#') { continue; } String chstr[] = getUnicodeStrings(s); StringBuffer strbuffer = new StringBuffer(chstr[0]); StringBuffer lowerbuffer = new StringBuffer(chstr[1]); StringBuffer upperbuffer = new StringBuffer(chstr[3]); Locale locale = null; for (int i = 4; i < chstr.length; i ++) { String condition = chstr[i]; if (Character.isLowerCase(chstr[i].charAt(0))) { // specified locale locale = new Locale(chstr[i], ""); } else if (condition.compareToIgnoreCase("Not_Before_Dot") == 0) { // turns I into dotless i } else if (condition.compareToIgnoreCase( "More_Above") == 0) { strbuffer.append((char)0x300); lowerbuffer.append((char)0x300); upperbuffer.append((char)0x300); } else if (condition.compareToIgnoreCase( "After_Soft_Dotted") == 0) { strbuffer.insert(0, 'i'); lowerbuffer.insert(0, 'i'); String lang = ""; if (locale != null) { lang = locale.getLanguage(); } if (lang.equals("tr") || lang.equals("az")) { // this is to be removed when 4.0 data comes out // and upperbuffer.insert uncommented // see jitterbug 2344 chstr[i] = "After_I"; strbuffer.deleteCharAt(0); lowerbuffer.deleteCharAt(0); i --; continue; // upperbuffer.insert(0, '\u0130'); } else { upperbuffer.insert(0, 'I'); } } else if (condition.compareToIgnoreCase( "Final_Sigma") == 0) { strbuffer.insert(0, 'c'); lowerbuffer.insert(0, 'c'); upperbuffer.insert(0, 'C'); } else if (condition.compareToIgnoreCase("After_I") == 0) { strbuffer.insert(0, 'I'); lowerbuffer.insert(0, 'i'); String lang = ""; if (locale != null) { lang = locale.getLanguage(); } if (lang.equals("tr") || lang.equals("az")) { upperbuffer.insert(0, 'I'); } } } chstr[0] = strbuffer.toString(); chstr[1] = 
lowerbuffer.toString(); chstr[3] = upperbuffer.toString(); if (locale == null) { if (!UCharacter.toLowerCase(chstr[0]).equals(chstr[1])) { errln(s); errln("Fail: toLowerCase for character " + Utility.escape(chstr[0]) + ", expected " + Utility.escape(chstr[1]) + " but resulted in " + Utility.escape(UCharacter.toLowerCase(chstr[0]))); } if (!UCharacter.toUpperCase(chstr[0]).equals(chstr[3])) { errln(s); errln("Fail: toUpperCase for character " + Utility.escape(chstr[0]) + ", expected " + Utility.escape(chstr[3]) + " but resulted in " + Utility.escape(UCharacter.toUpperCase(chstr[0]))); } } else { if (!UCharacter.toLowerCase(locale, chstr[0]).equals( chstr[1])) { errln(s); errln("Fail: toLowerCase for character " + Utility.escape(chstr[0]) + ", expected " + Utility.escape(chstr[1]) + " but resulted in " + Utility.escape(UCharacter.toLowerCase(locale, chstr[0]))); } if (!UCharacter.toUpperCase(locale, chstr[0]).equals( chstr[3])) { errln(s); errln("Fail: toUpperCase for character " + Utility.escape(chstr[0]) + ", expected " + Utility.escape(chstr[3]) + " but resulted in " + Utility.escape(UCharacter.toUpperCase(locale, chstr[0]))); } } } input.close(); } catch (Exception e) { e.printStackTrace(); } } public void TestUpperLower() { int upper[] = {0x0041, 0x0042, 0x00b2, 0x01c4, 0x01c6, 0x01c9, 0x01c8, 0x01c9, 0x000c}; int lower[] = {0x0061, 0x0062, 0x00b2, 0x01c6, 0x01c6, 0x01c9, 0x01c9, 0x01c9, 0x000c}; String upperTest = "abcdefg123hij.?:klmno"; String lowerTest = "ABCDEFG123HIJ.?:KLMNO"; // Checks LetterLike Symbols which were previously a source of // confusion [Bertrand A. D. 
02/04/98] for (int i = 0x2100; i < 0x2138; i ++) { /* Unicode 5.0 adds lowercase U+214E (TURNED SMALL F) to U+2132 (TURNED CAPITAL F) */ if (i != 0x2126 && i != 0x212a && i != 0x212b && i!=0x2132) { if (i != UCharacter.toLowerCase(i)) { // itself errln("Failed case conversion with itself: \\u" + Utility.hex(i, 4)); } if (i != UCharacter.toUpperCase(i)) { errln("Failed case conversion with itself: \\u" + Utility.hex(i, 4)); } } } for (int i = 0; i < upper.length; i ++) { if (UCharacter.toLowerCase(upper[i]) != lower[i]) { errln("FAILED UCharacter.tolower() for \\u" + Utility.hex(upper[i], 4) + " Expected \\u" + Utility.hex(lower[i], 4) + " Got \\u" + Utility.hex(UCharacter.toLowerCase(upper[i]), 4)); } } logln("testing upper lower"); for (int i = 0; i < upperTest.length(); i ++) { logln("testing to upper to lower"); if (UCharacter.isLetter(upperTest.charAt(i)) && !UCharacter.isLowerCase(upperTest.charAt(i))) { errln("Failed isLowerCase test at \\u" + Utility.hex(upperTest.charAt(i), 4)); } else if (UCharacter.isLetter(lowerTest.charAt(i)) && !UCharacter.isUpperCase(lowerTest.charAt(i))) { errln("Failed isUpperCase test at \\u" + Utility.hex(lowerTest.charAt(i), 4)); } else if (upperTest.charAt(i) != UCharacter.toLowerCase(lowerTest.charAt(i))) { errln("Failed case conversion from \\u" + Utility.hex(lowerTest.charAt(i), 4) + " To \\u" + Utility.hex(upperTest.charAt(i), 4)); } else if (lowerTest.charAt(i) != UCharacter.toUpperCase(upperTest.charAt(i))) { errln("Failed case conversion : \\u" + Utility.hex(upperTest.charAt(i), 4) + " To \\u" + Utility.hex(lowerTest.charAt(i), 4)); } else if (upperTest.charAt(i) != UCharacter.toLowerCase(upperTest.charAt(i))) { errln("Failed case conversion with itself: \\u" + Utility.hex(upperTest.charAt(i))); } else if (lowerTest.charAt(i) != UCharacter.toUpperCase(lowerTest.charAt(i))) { errln("Failed case conversion with itself: \\u" + Utility.hex(lowerTest.charAt(i))); } } logln("done testing upper Lower"); } // private data members 
- test data -------------------------------------- private static final Locale TURKISH_LOCALE_ = new Locale("tr", "TR"); private static final Locale GERMAN_LOCALE_ = new Locale("de", "DE"); private static final Locale GREEK_LOCALE_ = new Locale("el", "GR"); private static final Locale ENGLISH_LOCALE_ = new Locale("en", "US"); private static final Locale LITHUANIAN_LOCALE_ = new Locale("lt", "LT"); private static final int CHARACTER_UPPER_[] = {0x41, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047, 0x00b1, 0x00b2, 0xb3, 0x0048, 0x0049, 0x004a, 0x002e, 0x003f, 0x003a, 0x004b, 0x004c, 0x4d, 0x004e, 0x004f, 0x01c4, 0x01c8, 0x000c, 0x0000}; private static final int CHARACTER_LOWER_[] = {0x61, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067, 0x00b1, 0x00b2, 0xb3, 0x0068, 0x0069, 0x006a, 0x002e, 0x003f, 0x003a, 0x006b, 0x006c, 0x6d, 0x006e, 0x006f, 0x01c6, 0x01c9, 0x000c, 0x0000}; /* * CaseFolding.txt says about i and its cousins: * 0049; C; 0069; # LATIN CAPITAL LETTER I * 0049; T; 0131; # LATIN CAPITAL LETTER I * * 0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE * 0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE * That's all. * See CaseFolding.txt and the Unicode Standard for how to apply the case foldings. 
*/ private static final int FOLDING_SIMPLE_[] = { // input, default, exclude special i 0x61, 0x61, 0x61, 0x49, 0x69, 0x131, 0x130, 0x130, 0x69, 0x131, 0x131, 0x131, 0xdf, 0xdf, 0xdf, 0xfb03, 0xfb03, 0xfb03, 0x1040e,0x10436,0x10436, 0x5ffff,0x5ffff,0x5ffff }; private static final String FOLDING_MIXED_[] = {"\u0061\u0042\u0130\u0049\u0131\u03d0\u00df\ufb03\ud93f\udfff", "A\u00df\u00b5\ufb03\uD801\uDC0C\u0130\u0131"}; private static final String FOLDING_DEFAULT_[] = {"\u0061\u0062\u0069\u0307\u0069\u0131\u03b2\u0073\u0073\u0066\u0066\u0069\ud93f\udfff", "ass\u03bcffi\uD801\uDC34i\u0307\u0131"}; private static final String FOLDING_EXCLUDE_SPECIAL_I_[] = {"\u0061\u0062\u0069\u0131\u0131\u03b2\u0073\u0073\u0066\u0066\u0069\ud93f\udfff", "ass\u03bcffi\uD801\uDC34i\u0131"}; /** * "IESUS CHRISTOS" */ private static final String SHARED_UPPERCASE_GREEK_ = "\u0399\u0395\u03a3\u03a5\u03a3\u0020\u03a7\u03a1\u0399\u03a3\u03a4\u039f\u03a3"; /** * "iesus christos" */ private static final String SHARED_LOWERCASE_GREEK_ = "\u03b9\u03b5\u03c3\u03c5\u03c2\u0020\u03c7\u03c1\u03b9\u03c3\u03c4\u03bf\u03c2"; private static final String SHARED_LOWERCASE_TURKISH_ = "\u0069\u0073\u0074\u0061\u006e\u0062\u0075\u006c\u002c\u0020\u006e\u006f\u0074\u0020\u0063\u006f\u006e\u0073\u0074\u0061\u006e\u0074\u0131\u006e\u006f\u0070\u006c\u0065\u0021"; private static final String SHARED_UPPERCASE_TURKISH_ = "\u0054\u004f\u0050\u004b\u0041\u0050\u0049\u0020\u0050\u0041\u004c\u0041\u0043\u0045\u002c\u0020\u0130\u0053\u0054\u0041\u004e\u0042\u0055\u004c"; private static final String SHARED_UPPERCASE_ISTANBUL_ = "\u0130STANBUL, NOT CONSTANTINOPLE!"; private static final String SHARED_LOWERCASE_ISTANBUL_ = "i\u0307stanbul, not constantinople!"; private static final String SHARED_LOWERCASE_TOPKAP_ = "topkap\u0131 palace, istanbul"; private static final String SHARED_UPPERCASE_TOPKAP_ = "TOPKAPI PALACE, ISTANBUL"; private static final String SHARED_LOWERCASE_GERMAN_ = "S\u00FC\u00DFmayrstra\u00DFe"; private 
static final String SHARED_UPPERCASE_GERMAN_ = "S\u00DCSSMAYRSTRASSE"; private static final String UPPER_BEFORE_ = "\u0061\u0042\u0069\u03c2\u00df\u03c3\u002f\ufb03\ufb03\ufb03\ud93f\udfff"; private static final String UPPER_ROOT_ = "\u0041\u0042\u0049\u03a3\u0053\u0053\u03a3\u002f\u0046\u0046\u0049\u0046\u0046\u0049\u0046\u0046\u0049\ud93f\udfff"; private static final String UPPER_TURKISH_ = "\u0041\u0042\u0130\u03a3\u0053\u0053\u03a3\u002f\u0046\u0046\u0049\u0046\u0046\u0049\u0046\u0046\u0049\ud93f\udfff"; private static final String UPPER_MINI_ = "\u00df\u0061"; private static final String UPPER_MINI_UPPER_ = "\u0053\u0053\u0041"; private static final String LOWER_BEFORE_ = "\u0061\u0042\u0049\u03a3\u00df\u03a3\u002f\ud93f\udfff"; private static final String LOWER_ROOT_ = "\u0061\u0062\u0069\u03c3\u00df\u03c2\u002f\ud93f\udfff"; private static final String LOWER_TURKISH_ = "\u0061\u0062\u0131\u03c3\u00df\u03c2\u002f\ud93f\udfff"; /** * each item is an array with input string, result string, locale ID, break iterator, options * the break iterator is specified as an int, same as in BreakIterator.KIND_*: * 0=KIND_CHARACTER 1=KIND_WORD 2=KIND_LINE 3=KIND_SENTENCE 4=KIND_TITLE -1=default (NULL=words) -2=no breaks (.*) * options: T=U_FOLD_CASE_EXCLUDE_SPECIAL_I L=U_TITLECASE_NO_LOWERCASE A=U_TITLECASE_NO_BREAK_ADJUSTMENT * see ICU4C source/test/testdata/casing.txt */ private static final String TITLE_DATA_[] = { "\u0061\u0042\u0020\u0069\u03c2\u0020\u00df\u03c3\u002f\ufb03\ud93f\udfff", "\u0041\u0042\u0020\u0049\u03a3\u0020\u0053\u0073\u03a3\u002f\u0046\u0066\u0069\ud93f\udfff", "", "0", "", "\u0061\u0042\u0020\u0069\u03c2\u0020\u00df\u03c3\u002f\ufb03\ud93f\udfff", "\u0041\u0062\u0020\u0049\u03c2\u0020\u0053\u0073\u03c3\u002f\u0046\u0066\u0069\ud93f\udfff", "", "1", "", "\u02bbaMeLikA huI P\u016b \u02bb\u02bb\u02bbiA", "\u02bbAmelika Hui P\u016b \u02bb\u02bb\u02bbIa", // titlecase first _cased_ letter, j4933 "", "-1", "", " tHe QUIcK bRoWn", " The Quick Brown", "", 
"4", "", "\u01c4\u01c5\u01c6\u01c7\u01c8\u01c9\u01ca\u01cb\u01cc", "\u01c5\u01c5\u01c5\u01c8\u01c8\u01c8\u01cb\u01cb\u01cb", // UBRK_CHARACTER "", "0", "", "\u01c9ubav ljubav", "\u01c8ubav Ljubav", // Lj vs. L+j "", "-1", "", "'oH dOn'T tItLeCaSe AfTeR lEtTeR+'", "'Oh Don't Titlecase After Letter+'", "", "-1", "", "a \u02bbCaT. A \u02bbdOg! \u02bbeTc.", "A \u02bbCat. A \u02bbDog! \u02bbEtc.", "", "-1", "", // default "a \u02bbCaT. A \u02bbdOg! \u02bbeTc.", "A \u02bbcat. A \u02bbdog! \u02bbetc.", "", "-1", "A", // U_TITLECASE_NO_BREAK_ADJUSTMENT "a \u02bbCaT. A \u02bbdOg! \u02bbeTc.", "A \u02bbCaT. A \u02bbdOg! \u02bbETc.", "", "3", "L", // UBRK_SENTENCE and U_TITLECASE_NO_LOWERCASE "\u02bbcAt! \u02bbeTc.", "\u02bbCat! \u02bbetc.", "", "-2", "", // -2=Trivial break iterator "\u02bbcAt! \u02bbeTc.", "\u02bbcat! \u02bbetc.", "", "-2", "A", // U_TITLECASE_NO_BREAK_ADJUSTMENT "\u02bbcAt! \u02bbeTc.", "\u02bbCAt! \u02bbeTc.", "", "-2", "L", // U_TITLECASE_NO_LOWERCASE "\u02bbcAt! \u02bbeTc.", "\u02bbcAt! \u02bbeTc.", "", "-2", "AL" // Both options }; /** *

basic string, lower string, upper string

*/ private static final String SPECIAL_DATA_[] = { UTF16.valueOf(0x1043C) + UTF16.valueOf(0x10414), UTF16.valueOf(0x1043C) + UTF16.valueOf(0x1043C), UTF16.valueOf(0x10414) + UTF16.valueOf(0x10414), "ab'cD \uFB00i\u0131I\u0130 \u01C7\u01C8\u01C9 " + UTF16.valueOf(0x1043C) + UTF16.valueOf(0x10414), "ab'cd \uFB00i\u0131ii\u0307 \u01C9\u01C9\u01C9 " + UTF16.valueOf(0x1043C) + UTF16.valueOf(0x1043C), "AB'CD FFIII\u0130 \u01C7\u01C7\u01C7 " + UTF16.valueOf(0x10414) + UTF16.valueOf(0x10414), // sigmas followed/preceded by cased letters "i\u0307\u03a3\u0308j \u0307\u03a3\u0308j i\u00ad\u03a3\u0308 \u0307\u03a3\u0308 ", "i\u0307\u03c3\u0308j \u0307\u03c3\u0308j i\u00ad\u03c2\u0308 \u0307\u03c3\u0308 ", "I\u0307\u03a3\u0308J \u0307\u03a3\u0308J I\u00ad\u03a3\u0308 \u0307\u03a3\u0308 " }; private static final Locale SPECIAL_LOCALES_[] = { null, ENGLISH_LOCALE_, null, }; private static final String SPECIAL_DOTTED_ = "I \u0130 I\u0307 I\u0327\u0307 I\u0301\u0307 I\u0327\u0307\u0301"; private static final String SPECIAL_DOTTED_LOWER_TURKISH_ = "\u0131 i i i\u0327 \u0131\u0301\u0307 i\u0327\u0301"; private static final String SPECIAL_DOTTED_LOWER_GERMAN_ = "i i\u0307 i\u0307 i\u0327\u0307 i\u0301\u0307 i\u0327\u0307\u0301"; private static final String SPECIAL_DOT_ABOVE_ = "a\u0307 \u0307 i\u0307 j\u0327\u0307 j\u0301\u0307"; private static final String SPECIAL_DOT_ABOVE_UPPER_LITHUANIAN_ = "A\u0307 \u0307 I J\u0327 J\u0301\u0307"; private static final String SPECIAL_DOT_ABOVE_UPPER_GERMAN_ = "A\u0307 \u0307 I\u0307 J\u0327\u0307 J\u0301\u0307"; private static final String SPECIAL_DOT_ABOVE_UPPER_ = "I I\u0301 J J\u0301 \u012e \u012e\u0301 \u00cc\u00cd\u0128"; private static final String SPECIAL_DOT_ABOVE_LOWER_LITHUANIAN_ = "i i\u0307\u0301 j j\u0307\u0301 \u012f \u012f\u0307\u0301 i\u0307\u0300i\u0307\u0301i\u0307\u0303"; private static final String SPECIAL_DOT_ABOVE_LOWER_GERMAN_ = "i i\u0301 j j\u0301 \u012f \u012f\u0301 \u00ec\u00ed\u0129"; // private methods 
------------------------------------------------------- /** * Converting the hex numbers represented between ';' to Unicode strings * @param str string to break up into Unicode strings * @return array of the four case-mapping strings followed by any condition tokens */ private String[] getUnicodeStrings(String str) { Vector v = new Vector(10); int start = 0; for (int casecount = 4; casecount > 0; casecount --) { int end = str.indexOf("; ", start); String casestr = str.substring(start, end); StringBuffer buffer = new StringBuffer(); int spaceoffset = 0; while (spaceoffset < casestr.length()) { int nextspace = casestr.indexOf(' ', spaceoffset); if (nextspace == -1) { nextspace = casestr.length(); } buffer.append((char)Integer.parseInt( casestr.substring(spaceoffset, nextspace), 16)); spaceoffset = nextspace + 1; } start = end + 2; v.add(buffer.toString()); } int comments = str.indexOf(" #", start); if (comments != -1 && comments != start) { if (str.charAt(comments - 1) == ';') { comments --; } String conditions = str.substring(start, comments); int offset = 0; while (offset < conditions.length()) { int spaceoffset = conditions.indexOf(' ', offset); if (spaceoffset == -1) { spaceoffset = conditions.length(); } v.add(conditions.substring(offset, spaceoffset)); offset = spaceoffset + 1; } } int size = v.size(); String result[] = new String[size]; for (int i = 0; i < size; i ++) { result[i] = (String)v.elementAt(i); } return result; } } icu4j-4.2/src/com/ibm/icu/dev/test/lang/TestUScriptRun.java0000644000175000017500000004005711361046222023471 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 1999-2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.lang; import com.ibm.icu.lang.UScript; import com.ibm.icu.lang.UScriptRun; import com.ibm.icu.dev.test.TestFmwk; public class TestUScriptRun extends TestFmwk { public TestUScriptRun() { // nothing } public static void main(String[] args) throws Exception { new TestUScriptRun().run(args); } private static final class RunTestData { String runText; int runScript; public RunTestData(String theText, int theScriptCode) { runText = theText; runScript = theScriptCode; } } private static final RunTestData[][] m_testData = { { new RunTestData("\u0020\u0946\u0939\u093F\u0928\u094D\u0926\u0940\u0020", UScript.DEVANAGARI), new RunTestData("\u0627\u0644\u0639\u0631\u0628\u064A\u0629\u0020", UScript.ARABIC), new RunTestData("\u0420\u0443\u0441\u0441\u043A\u0438\u0439\u0020", UScript.CYRILLIC), new RunTestData("English (", UScript.LATIN), new RunTestData("\u0E44\u0E17\u0E22", UScript.THAI), new RunTestData(") ", UScript.LATIN), new RunTestData("\u6F22\u5B75", UScript.HAN), new RunTestData("\u3068\u3072\u3089\u304C\u306A\u3068", UScript.HIRAGANA), new RunTestData("\u30AB\u30BF\u30AB\u30CA", UScript.KATAKANA), new RunTestData("\uD801\uDC00\uD801\uDC01\uD801\uDC02\uD801\uDC03", UScript.DESERET), }, { new RunTestData("((((((((((abc))))))))))", UScript.LATIN) } }; private static final String padding = "This string is used for padding..."; private void CheckScriptRuns(UScriptRun scriptRun, int[] runStarts, RunTestData[] testData) { int run, runStart, runLimit; int runScript; /* iterate over all the runs */ run = 0; while (scriptRun.next()) { runStart = scriptRun.getScriptStart(); runLimit = scriptRun.getScriptLimit(); runScript = scriptRun.getScriptCode(); if (runStart != runStarts[run]) { errln("Incorrect start offset for run " + run + ": expected " + runStarts[run] + ", got " + runStart); } if (runLimit != runStarts[run + 1]) { errln("Incorrect limit offset for run " + run 
+ ": expected " + runStarts[run + 1] + ", got " + runLimit); } if (runScript != testData[run].runScript) { errln("Incorrect script for run " + run + ": expected \"" + UScript.getName(testData[run].runScript) + "\", got \"" + UScript.getName(runScript) + "\""); } run += 1; /* stop when we've seen all the runs we expect to see */ if (run >= testData.length) { break; } } /* Complain if we didn't see the number of runs we expected */ if (run != testData.length) { errln("Incorrect number of runs: expected " + testData.length + ", got " + run); } } public void TestContstruction() { UScriptRun scriptRun = null; char[] nullChars = null, dummyChars = {'d', 'u', 'm', 'm', 'y'}; String nullString = null, dummyString = new String(dummyChars); try { scriptRun = new UScriptRun(nullString, 0, 100); errln("new UScriptRun(nullString, 0, 100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(nullString, 100, 0); errln("new UScriptRun(nullString, 100, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(nullString, 0, -100); errln("new UScriptRun(nullString, 0, -100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(nullString, -100, 0); errln("new UScriptRun(nullString, -100, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(nullChars, 0, 100); errln("new UScriptRun(nullChars, 0, 100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(nullChars, 100, 0); errln("new UScriptRun(nullChars, 100, 
0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(nullChars, 0, -100); errln("new UScriptRun(nullChars, 0, -100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(nullChars, -100, 0); errln("new UScriptRun(nullChars, -100, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(dummyString, 0, 6); errln("new UScriptRun(dummyString, 0, 6) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(dummyString, 6, 0); errln("new UScriptRun(dummy, 6, 0) did not produce an IllegalArgumentException!"); }catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(dummyString, 0, -100); errln("new UScriptRun(dummyString, 0, -100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(dummyString, -100, 0); errln("new UScriptRun(dummy, -100, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(dummyChars, 0, 6); errln("new UScriptRun(dummyChars, 0, 6) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(dummyChars, 6, 0); errln("new UScriptRun(dummyChars, 6, 0) did not produce an IllegalArgumentException!"); }catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { 
scriptRun = new UScriptRun(dummyChars, 0, -100); errln("new UScriptRun(dummyChars, 0, -100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } try { scriptRun = new UScriptRun(dummyChars, -100, 0); errln("new UScriptRun(dummy, -100, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: UScriptRun failed as expected"); } if(scriptRun!=null){ errln("Did not get the expected Exception"); } } public void TestReset() { UScriptRun scriptRun = null; char[] dummy = {'d', 'u', 'm', 'm', 'y'}; try { scriptRun = new UScriptRun(); } catch (IllegalArgumentException iae) { errln("new UScriptRun() produced an IllegalArgumentException!"); } try { scriptRun.reset(0, 100); errln("scriptRun.reset(0, 100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } try { scriptRun.reset(100, 0); errln("scriptRun.reset(100, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } try { scriptRun.reset(0, -100); errln("scriptRun.reset(0, -100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } try { scriptRun.reset(-100, 0); errln("scriptRun.reset(-100, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } try { scriptRun.reset(dummy, 0, 6); errln("scriptRun.reset(dummy, 0, 6) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } try { scriptRun.reset(dummy, 6, 0); errln("scriptRun.reset(dummy, 6, 0) did not produce an IllegalArgumentException!"); }catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed 
as expected"); } try { scriptRun.reset(dummy, 0, -100); errln("scriptRun.reset(dummy, 0, -100) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } try { scriptRun.reset(dummy, -100, 0); errln("scriptRun.reset(dummy, -100, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } try { scriptRun.reset(dummy, 0, dummy.length); } catch (IllegalArgumentException iae) { errln("scriptRun.reset(dummy, 0, dummy.length) produced an IllegalArgumentException!"); } try { scriptRun.reset(0, 6); errln("scriptRun.reset(0, 6) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } try { scriptRun.reset(6, 0); errln("scriptRun.reset(6, 0) did not produce an IllegalArgumentException!"); } catch (IllegalArgumentException iae) { logln("PASS: scriptRun.reset failed as expected"); } } public void TestRuns() { for (int i = 0; i < m_testData.length; i += 1) { RunTestData[] test = m_testData[i]; int stringLimit = 0; int[] runStarts = new int[test.length + 1]; String testString = ""; UScriptRun scriptRun = null; /* * Fill in the test string and the runStarts array. 
*/ for (int run = 0; run < test.length; run += 1) { runStarts[run] = stringLimit; stringLimit += test[run].runText.length(); testString += test[run].runText; } /* The limit of the last run */ runStarts[test.length] = stringLimit; try { scriptRun = new UScriptRun(testString); CheckScriptRuns(scriptRun, runStarts, test); } catch (IllegalArgumentException iae) { errln("new UScriptRun(testString) produced an IllegalArgumentException!"); } try { scriptRun.reset(); CheckScriptRuns(scriptRun, runStarts, test); } catch (IllegalArgumentException iae) { errln("scriptRun.reset() on a valid UScriptRun produced an IllegalArgumentException!"); } try { scriptRun = new UScriptRun(testString.toCharArray()); CheckScriptRuns(scriptRun, runStarts, test); } catch (IllegalArgumentException iae) { errln("new UScriptRun(testString.toCharArray()) produced an IllegalArgumentException!"); } try { scriptRun.reset(); CheckScriptRuns(scriptRun, runStarts, test); } catch (IllegalArgumentException iae) { errln("scriptRun.reset() on a valid UScriptRun produced an IllegalArgumentException!"); } try { scriptRun = new UScriptRun(); if (scriptRun.next()) { errln("scriptRun.next() on an empty UScriptRun returned true!"); } } catch (IllegalArgumentException iae) { errln("new UScriptRun() produced an IllegalArgumentException!"); } try { scriptRun.reset(testString, 0, testString.length()); CheckScriptRuns(scriptRun, runStarts, test); } catch (IllegalArgumentException iae) { errln("scriptRun.reset(testString, 0, testString.length) produced an IllegalArgumentException!"); } try { scriptRun.reset(testString.toCharArray(), 0, testString.length()); CheckScriptRuns(scriptRun, runStarts, test); } catch (IllegalArgumentException iae) { errln("scriptRun.reset(testString.toCharArray(), 0, testString.length) produced an IllegalArgumentException!"); } String paddedTestString = padding + testString + padding; int startOffset = padding.length(); int count = testString.length(); for (int run = 0; run < runStarts.length; 
run += 1) { runStarts[run] += startOffset; } try { scriptRun.reset(paddedTestString, startOffset, count); CheckScriptRuns(scriptRun, runStarts, test); } catch (IllegalArgumentException iae) { errln("scriptRun.reset(paddedTestString, startOffset, count) produced an IllegalArgumentException!"); } try { scriptRun.reset(paddedTestString.toCharArray(), startOffset, count); CheckScriptRuns(scriptRun, runStarts, test); } catch (IllegalArgumentException iae) { errln("scriptRun.reset(paddedTestString.toCharArray(), startOffset, count) produced an IllegalArgumentException!"); } } } } icu4j-4.2/src/com/ibm/icu/dev/test/lang/UCharacterSurrogateTest.java0000644000175000017500000004356511361046222025337 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 2004-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.lang; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.impl.Utility; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.text.UTF16; /** * Test JDK 1.5 cover APIs. 
*/ public final class UCharacterSurrogateTest extends TestFmwk { public static void main(String[] args) { new UCharacterSurrogateTest().run(args); } public void TestUnicodeBlockForName() { String[] names = {"Latin-1 Supplement", "Optical Character Recognition", "CJK Unified Ideographs Extension A", "Supplemental Arrows-B", "Supplemental arrows b", "supp-lement-al arrowsb", "Supplementary Private Use Area-B", "supplementary_Private_Use_Area-b", "supplementary_PRIVATE_Use_Area_b"}; for (int i = 0; i < names.length; ++i) { try { UCharacter.UnicodeBlock b = UCharacter.UnicodeBlock .forName(names[i]); logln("found: " + b + " for name: " + names[i]); } catch (Exception e) { errln("could not find block for name: " + names[i]); break; } } } public void TestIsValidCodePoint() { if (UCharacter.isValidCodePoint(-1)) errln("-1"); if (!UCharacter.isValidCodePoint(0)) errln("0"); if (!UCharacter.isValidCodePoint(UCharacter.MAX_CODE_POINT)) errln("0x10ffff"); if (UCharacter.isValidCodePoint(UCharacter.MAX_CODE_POINT + 1)) errln("0x110000"); } public void TestIsSupplementaryCodePoint() { if (UCharacter.isSupplementaryCodePoint(-1)) errln("-1"); if (UCharacter.isSupplementaryCodePoint(0)) errln("0"); if (UCharacter .isSupplementaryCodePoint(UCharacter.MIN_SUPPLEMENTARY_CODE_POINT - 1)) errln("0xffff"); if (!UCharacter .isSupplementaryCodePoint(UCharacter.MIN_SUPPLEMENTARY_CODE_POINT)) errln("0x10000"); if (!UCharacter.isSupplementaryCodePoint(UCharacter.MAX_CODE_POINT)) errln("0x10ffff"); if (UCharacter.isSupplementaryCodePoint(UCharacter.MAX_CODE_POINT + 1)) errln("0x110000"); } public void TestIsHighSurrogate() { if (UCharacter .isHighSurrogate((char) (UCharacter.MIN_HIGH_SURROGATE - 1))) errln("0xd7ff"); if (!UCharacter.isHighSurrogate(UCharacter.MIN_HIGH_SURROGATE)) errln("0xd800"); if (!UCharacter.isHighSurrogate(UCharacter.MAX_HIGH_SURROGATE)) errln("0xdbff"); if (UCharacter .isHighSurrogate((char) (UCharacter.MAX_HIGH_SURROGATE + 1))) errln("0xdc00"); } public void 
TestIsLowSurrogate() { if (UCharacter .isLowSurrogate((char) (UCharacter.MIN_LOW_SURROGATE - 1))) errln("0xdbff"); if (!UCharacter.isLowSurrogate(UCharacter.MIN_LOW_SURROGATE)) errln("0xdc00"); if (!UCharacter.isLowSurrogate(UCharacter.MAX_LOW_SURROGATE)) errln("0xdfff"); if (UCharacter .isLowSurrogate((char) (UCharacter.MAX_LOW_SURROGATE + 1))) errln("0xe000"); } public void TestIsSurrogatePair() { if (UCharacter.isSurrogatePair( (char) (UCharacter.MIN_HIGH_SURROGATE - 1), UCharacter.MIN_LOW_SURROGATE)) errln("0xd7ff,0xdc00"); if (UCharacter.isSurrogatePair( (char) (UCharacter.MAX_HIGH_SURROGATE + 1), UCharacter.MIN_LOW_SURROGATE)) errln("0xd800,0xdc00"); if (UCharacter.isSurrogatePair(UCharacter.MIN_HIGH_SURROGATE, (char) (UCharacter.MIN_LOW_SURROGATE - 1))) errln("0xd800,0xdbff"); if (UCharacter.isSurrogatePair(UCharacter.MIN_HIGH_SURROGATE, (char) (UCharacter.MAX_LOW_SURROGATE + 1))) errln("0xd800,0xe000"); if (!UCharacter.isSurrogatePair(UCharacter.MIN_HIGH_SURROGATE, UCharacter.MIN_LOW_SURROGATE)) errln("0xd800,0xdc00"); } public void TestCharCount() { UCharacter.charCount(-1); UCharacter.charCount(UCharacter.MAX_CODE_POINT + 1); if (UCharacter.charCount(UCharacter.MIN_SUPPLEMENTARY_CODE_POINT - 1) != 1) errln("0xffff"); if (UCharacter.charCount(UCharacter.MIN_SUPPLEMENTARY_CODE_POINT) != 2) errln("0x010000"); } public void TestToCodePoint() { final char[] pairs = {(char) (UCharacter.MIN_HIGH_SURROGATE + 0), (char) (UCharacter.MIN_LOW_SURROGATE + 0), (char) (UCharacter.MIN_HIGH_SURROGATE + 1), (char) (UCharacter.MIN_LOW_SURROGATE + 1), (char) (UCharacter.MIN_HIGH_SURROGATE + 2), (char) (UCharacter.MIN_LOW_SURROGATE + 2), (char) (UCharacter.MAX_HIGH_SURROGATE - 2), (char) (UCharacter.MAX_LOW_SURROGATE - 2), (char) (UCharacter.MAX_HIGH_SURROGATE - 1), (char) (UCharacter.MAX_LOW_SURROGATE - 1), (char) (UCharacter.MAX_HIGH_SURROGATE - 0), (char) (UCharacter.MAX_LOW_SURROGATE - 0),}; for (int i = 0; i < pairs.length; i += 2) { int cp = 
UCharacter.toCodePoint(pairs[i], pairs[i + 1]); if (pairs[i] != UTF16.getLeadSurrogate(cp) || pairs[i + 1] != UTF16.getTrailSurrogate(cp)) { errln(Integer.toHexString(pairs[i]) + ", " + pairs[i + 1]); break; } } } public void TestCodePointAtBefore() { String s = "" + UCharacter.MIN_HIGH_SURROGATE + // isolated high UCharacter.MIN_HIGH_SURROGATE + // pair UCharacter.MIN_LOW_SURROGATE + UCharacter.MIN_LOW_SURROGATE; // isolated // low char[] c = s.toCharArray(); int[] avalues = { UCharacter.MIN_HIGH_SURROGATE, UCharacter.toCodePoint(UCharacter.MIN_HIGH_SURROGATE, UCharacter.MIN_LOW_SURROGATE), UCharacter.MIN_LOW_SURROGATE, UCharacter.MIN_LOW_SURROGATE}; int[] bvalues = { UCharacter.MIN_HIGH_SURROGATE, UCharacter.MIN_HIGH_SURROGATE, UCharacter.toCodePoint(UCharacter.MIN_HIGH_SURROGATE, UCharacter.MIN_LOW_SURROGATE), UCharacter.MIN_LOW_SURROGATE,}; StringBuffer b = new StringBuffer(s); for (int i = 0; i < avalues.length; ++i) { if (UCharacter.codePointAt(s, i) != avalues[i]) errln("string at: " + i); if (UCharacter.codePointAt(c, i) != avalues[i]) errln("chars at: " + i); if (UCharacter.codePointAt(b, i) != avalues[i]) errln("stringbuffer at: " + i); if (UCharacter.codePointBefore(s, i + 1) != bvalues[i]) errln("string before: " + i); if (UCharacter.codePointBefore(c, i + 1) != bvalues[i]) errln("chars before: " + i); if (UCharacter.codePointBefore(b, i + 1) != bvalues[i]) errln("stringbuffer before: " + i); } //cover codePointAtBefore with limit logln("Testing codePointAtBefore with limit ..."); for (int i = 0; i < avalues.length; ++i) { if (UCharacter.codePointAt(c, i, 4) != avalues[i]) errln("chars at: " + i); if (UCharacter.codePointBefore(c, i + 1, 0) != bvalues[i]) errln("chars before: " + i); } } public void TestToChars() { char[] chars = new char[3]; int cp = UCharacter.toCodePoint(UCharacter.MIN_HIGH_SURROGATE, UCharacter.MIN_LOW_SURROGATE); UCharacter.toChars(cp, chars, 1); if (chars[1] != UCharacter.MIN_HIGH_SURROGATE || chars[2] != 
UCharacter.MIN_LOW_SURROGATE) { errln("fail"); } chars = UCharacter.toChars(cp); if (chars[0] != UCharacter.MIN_HIGH_SURROGATE || chars[1] != UCharacter.MIN_LOW_SURROGATE) { errln("fail"); } } public void TestCodePointCount() { class Test { String str(String s, int start, int limit) { if(s==null){ s=""; } return "codePointCount('" + Utility.escape(s) + "' " + start + ", " + limit + ")"; } void test(String s, int start, int limit, int expected) { int val1 = UCharacter.codePointCount(s.toCharArray(), start, limit); int val2 = UCharacter.codePointCount(s, start, limit); if (val1 != expected) { errln("char[] " + str(s, start, limit) + "(" + val1 + ") != " + expected); } else if (val2 != expected) { errln("String " + str(s, start, limit) + "(" + val2 + ") != " + expected); } else if (isVerbose()) { logln(str(s, start, limit) + " == " + expected); } } void fail(String s, int start, int limit, Class exc) { try { UCharacter.codePointCount(s, start, limit); errln("unexpected success " + str(s, start, limit)); } catch (Throwable e) { if (!exc.isInstance(e)) { warnln("bad exception " + str(s, start, limit) + e.getClass().getName()); } } } } Test test = new Test(); test.fail(null, 0, 1, NullPointerException.class); test.fail("a", -1, 0, IndexOutOfBoundsException.class); test.fail("a", 1, 2, IndexOutOfBoundsException.class); test.fail("a", 1, 0, IndexOutOfBoundsException.class); test.test("", 0, 0, 0); test.test("\ud800", 0, 1, 1); test.test("\udc00", 0, 1, 1); test.test("\ud800\udc00", 0, 1, 1); test.test("\ud800\udc00", 1, 2, 1); test.test("\ud800\udc00", 0, 2, 1); test.test("\udc00\ud800", 0, 1, 1); test.test("\udc00\ud800", 1, 2, 1); test.test("\udc00\ud800", 0, 2, 2); test.test("\ud800\ud800\udc00", 0, 2, 2); test.test("\ud800\ud800\udc00", 1, 3, 1); test.test("\ud800\ud800\udc00", 0, 3, 2); test.test("\ud800\udc00\udc00", 0, 2, 1); test.test("\ud800\udc00\udc00", 1, 3, 2); test.test("\ud800\udc00\udc00", 0, 3, 2); } public void TestOffsetByCodePoints() { class Test { 
String str(String s, int start, int count, int index, int offset) { return "offsetByCodePoints('" + Utility.escape(s) + "' " + start + ", " + count + ", " + index + ", " + offset + ")"; } void test(String s, int start, int count, int index, int offset, int expected, boolean flip) { char[] chars = s.toCharArray(); String string = s.substring(start, start + count); int val1 = UCharacter.offsetByCodePoints(chars, start, count, index, offset); int val2 = UCharacter.offsetByCodePoints(string, index - start, offset) + start; if (val1 != expected) { errln("char[] " + str(s, start, count, index, offset) + "(" + val1 + ") != " + expected); } else if (val2 != expected) { errln("String " + str(s, start, count, index, offset) + "(" + val2 + ") != " + expected); } else if (isVerbose()) { logln(str(s, start, count, index, offset) + " == " + expected); } if (flip) { val1 = UCharacter.offsetByCodePoints(chars, start, count, expected, -offset); val2 = UCharacter.offsetByCodePoints(string, expected - start, -offset) + start; if (val1 != index) { errln("char[] " + str(s, start, count, expected, -offset) + "(" + val1 + ") != " + index); } else if (val2 != index) { errln("String " + str(s, start, count, expected, -offset) + "(" + val2 + ") != " + index); } else if (isVerbose()) { logln(str(s, start, count, expected, -offset) + " == " + index); } } } void fail(char[] text, int start, int count, int index, int offset, Class exc) { try { UCharacter.offsetByCodePoints(text, start, count, index, offset); errln("unexpected success " + str(new String(text), start, count, index, offset)); } catch (Throwable e) { if (!exc.isInstance(e)) { errln("bad exception " + str(new String(text), start, count, index, offset) + e.getClass().getName()); } } } void fail(String text, int index, int offset, Class exc) { try { UCharacter.offsetByCodePoints(text, index, offset); errln("unexpected success " + str(text, index, offset, 0, text.length())); } catch (Throwable e) { if (!exc.isInstance(e)) { errln("bad 
exception " + str(text, 0, text.length(), index, offset) + e.getClass().getName()); } } } } Test test = new Test(); test.test("\ud800\ud800\udc00", 0, 2, 0, 1, 1, true); test.fail((char[]) null, 0, 1, 0, 1, NullPointerException.class); test.fail((String) null, 0, 1, NullPointerException.class); test.fail("abc", -1, 0, IndexOutOfBoundsException.class); test.fail("abc", 4, 0, IndexOutOfBoundsException.class); test.fail("abc", 1, -2, IndexOutOfBoundsException.class); test.fail("abc", 2, 2, IndexOutOfBoundsException.class); char[] abc = "abc".toCharArray(); test.fail(abc, -1, 2, 0, 0, IndexOutOfBoundsException.class); test.fail(abc, 2, 2, 3, 0, IndexOutOfBoundsException.class); test.fail(abc, 1, -1, 0, 0, IndexOutOfBoundsException.class); test.fail(abc, 1, 1, 2, -2, IndexOutOfBoundsException.class); test.fail(abc, 1, 1, 1, 2, IndexOutOfBoundsException.class); test.fail(abc, 1, 2, 1, 3, IndexOutOfBoundsException.class); test.fail(abc, 0, 2, 2, -3, IndexOutOfBoundsException.class); test.test("", 0, 0, 0, 0, 0, false); test.test("\ud800", 0, 1, 0, 1, 1, true); test.test("\udc00", 0, 1, 0, 1, 1, true); String s = "\ud800\udc00"; test.test(s, 0, 1, 0, 1, 1, true); test.test(s, 0, 2, 0, 1, 2, true); test.test(s, 0, 2, 1, 1, 2, false); test.test(s, 1, 1, 1, 1, 2, true); s = "\udc00\ud800"; test.test(s, 0, 1, 0, 1, 1, true); test.test(s, 0, 2, 0, 1, 1, true); test.test(s, 0, 2, 0, 2, 2, true); test.test(s, 0, 2, 1, 1, 2, true); test.test(s, 1, 1, 1, 1, 2, true); s = "\ud800\ud800\udc00"; test.test(s, 0, 1, 0, 1, 1, true); test.test(s, 0, 2, 0, 1, 1, true); test.test(s, 0, 2, 0, 2, 2, true); test.test(s, 0, 2, 1, 1, 2, true); test.test(s, 0, 3, 0, 1, 1, true); test.test(s, 0, 3, 0, 2, 3, true); test.test(s, 0, 3, 1, 1, 3, true); test.test(s, 0, 3, 2, 1, 3, false); test.test(s, 1, 1, 1, 1, 2, true); test.test(s, 1, 2, 1, 1, 3, true); test.test(s, 1, 2, 2, 1, 3, false); test.test(s, 2, 1, 2, 1, 3, true); s = "\ud800\udc00\udc00"; test.test(s, 0, 1, 0, 1, 1, true); test.test(s, 0, 
2, 0, 1, 2, true); test.test(s, 0, 2, 1, 1, 2, false); test.test(s, 0, 3, 0, 1, 2, true); test.test(s, 0, 3, 0, 2, 3, true); test.test(s, 0, 3, 1, 1, 2, false); test.test(s, 0, 3, 1, 2, 3, false); test.test(s, 0, 3, 2, 1, 3, true); test.test(s, 1, 1, 1, 1, 2, true); test.test(s, 1, 2, 1, 1, 2, true); test.test(s, 1, 2, 1, 2, 3, true); test.test(s, 1, 2, 2, 1, 3, true); test.test(s, 2, 1, 2, 1, 3, true); } } icu4j-4.2/src/com/ibm/icu/dev/test/lang/UPropertyAliasesTest.java0000644000175000017500000001234011361046222024660 0ustar twernertwerner/* ********************************************************************** * Copyright (c) 2002-2007, International Business Machines * Corporation and others. All Rights Reserved. ********************************************************************** * Author: Alan Liu * Created: November 5 2002 * Since: ICU 2.4 ********************************************************************** */ package com.ibm.icu.dev.test.lang; import com.ibm.icu.lang.*; import com.ibm.icu.dev.test.TestFmwk; public class UPropertyAliasesTest extends TestFmwk { public UPropertyAliasesTest() {} public static void main(String[] args) throws Exception { new UPropertyAliasesTest().run(args); } /** * Test the property names and property value names API. */ public void TestPropertyNames() { int p, v, choice, rev; for (p=0; ; ++p) { boolean sawProp = false; for (choice=0; ; ++choice) { String name = null; try { name = UCharacter.getPropertyName(p, choice); if (!sawProp) log("prop " + p + ":"); String n = (name != null) ? 
("\"" + name + '"') : "null"; log(" " + choice + "=" + n); sawProp = true; } catch (IllegalArgumentException e) { if (choice > 0) break; } if (name != null) { /* test reverse mapping */ rev = UCharacter.getPropertyEnum(name); if (rev != p) { errln("Property round-trip failure: " + p + " -> " + name + " -> " + rev); } } } if (sawProp) { /* looks like a valid property; check the values */ String pname = UCharacter.getPropertyName(p, UProperty.NameChoice.LONG); int max = 0; if (p == UProperty.CANONICAL_COMBINING_CLASS) { max = 255; } else if (p == UProperty.GENERAL_CATEGORY_MASK) { /* it's far too slow to iterate all the way up to the real max, U_GC_P_MASK */ max = 0x1000; // U_GC_NL_MASK; } else if (p == UProperty.BLOCK) { /* UBlockCodes, unlike other values, start at 1 */ max = 1; } logln(""); for (v=-1; ; ++v) { boolean sawValue = false; for (choice=0; ; ++choice) { String vname = null; try { vname = UCharacter.getPropertyValueName(p, v, choice); String n = (vname != null) ? ("\"" + vname + '"') : "null"; if (!sawValue) log(" " + pname + ", value " + v + ":"); log(" " + choice + "=" + n); sawValue = true; } catch (IllegalArgumentException e) { if (choice>0) break; } if (vname != null) { /* test reverse mapping */ rev = UCharacter.getPropertyValueEnum(p, vname); if (rev != v) { errln("Value round-trip failure (" + pname + "): " + v + " -> " + vname + " -> " + rev); } } } if (sawValue) { logln(""); } if (!sawValue && v>=max) break; } } if (!sawProp) { if (p>=UProperty.STRING_LIMIT) { break; } else if (p>=UProperty.DOUBLE_LIMIT) { p = UProperty.STRING_START - 1; } else if (p>=UProperty.MASK_LIMIT) { p = UProperty.DOUBLE_START - 1; } else if (p>=UProperty.INT_LIMIT) { p = UProperty.MASK_START - 1; } else if (p>=UProperty.BINARY_LIMIT) { p = UProperty.INT_START - 1; } } } int i = UCharacter.getIntPropertyMinValue( UProperty.CANONICAL_COMBINING_CLASS); try { for (; i <= UCharacter.getIntPropertyMaxValue( UProperty.CANONICAL_COMBINING_CLASS); i ++) { 
                UCharacter.getPropertyValueName(
                                           UProperty.CANONICAL_COMBINING_CLASS,
                                           i, UProperty.NameChoice.LONG);
            }
        }
        catch (IllegalArgumentException e) {
            errln("0x" + Integer.toHexString(i)
                  + " should have a null property value name");
        }
    }
}
icu4j-4.2/src/com/ibm/icu/dev/test/lang/UCharacterThreadTest.java0000644000175000017500000000523411361046222024562 0ustar twernertwerner
/*
 *******************************************************************************
 * Copyright (C) 2008, International Business Machines Corporation and         *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.dev.test.lang;

import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

import com.ibm.icu.dev.test.TestFmwk;
import com.ibm.icu.lang.UCharacter;

/**
 * @author aheninger
 *
 */
public class UCharacterThreadTest extends TestFmwk {
    // constructor -----------------------------------------------------------

    /**
     * Public no-argument constructor
     */
    public UCharacterThreadTest()
    {
    }

    // public methods --------------------------------------------------------

    public static void main(String[] arg)
    {
        try
        {
            UCharacterThreadTest test = new UCharacterThreadTest();
            test.run(arg);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    //
    //  Test multi-threaded parallel calls to UCharacter.getName(codePoint)
    //  Regression test for ticket 6264.
    //
    public void TestUCharactersGetName() throws InterruptedException {
        List threads = new LinkedList();
        for(int t=0; t<20; t++) {
            int codePoint = 47 + t;
            String correctName = UCharacter.getName(codePoint);
            GetNameThread thread = new GetNameThread(codePoint, correctName);
            thread.start();
            threads.add(thread);
        }
        ListIterator i = threads.listIterator();
        while (i.hasNext()) {
            GetNameThread thread = (GetNameThread)i.next();
            thread.join();
            if (!thread.correctName.equals(thread.actualName)) {
                errln("FAIL, expected \"" + thread.correctName + "\", got \"" + thread.actualName + "\"");
            }
        }
    }

    private static class GetNameThread extends Thread {
        private final int codePoint;
        private final String correctName;
        private String actualName;

        GetNameThread(int codePoint, String correctName) {
            this.codePoint = codePoint;
            this.correctName = correctName;
        }

        public void run()
        {
            for(int i=0; i<10000; i++) {
                actualName = UCharacter.getName(codePoint);
                if (!correctName.equals(actualName)) {
                    break;
                }
            }
        }
    }
}
icu4j-4.2/src/com/ibm/icu/dev/test/lang/TestAll.java0000644000175000017500000000170111361046222022114 0ustar twernertwerner
/*
 *******************************************************************************
 * Copyright (C) 1996-2004, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.dev.test.lang;

import com.ibm.icu.dev.test.TestFmwk.TestGroup;

/**
 * Top level test used to run character property tests.
 */
public class TestAll extends TestGroup {
    public static void main(String[] args) throws Exception {
        new TestAll().run(args);
    }

    public TestAll() {
        super(
              new String[] {
                  "TestCharacter",
                  "TestUScript",
                  "TestUScriptRun"
              },
              "Character and Script Tests");
    }

    public static final String CLASS_TARGET_NAME = "Property";
}
icu4j-4.2/src/com/ibm/icu/dev/test/lang/UCharacterDirectionTest.java0000644000175000017500000000512411361046222025271 0ustar twernertwerner
/**
 *******************************************************************************
 * Copyright (C) 2001-2006, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */

package com.ibm.icu.dev.test.lang;

import com.ibm.icu.dev.test.TestFmwk;
import com.ibm.icu.lang.UCharacterDirection;

/**
 * Testing UCharacterDirection
 * @author Syn Wee Quek
 * @since July 22 2002
 */
public class UCharacterDirectionTest extends TestFmwk
{
    // constructor -----------------------------------------------------------

    /**
     * Public no-argument constructor
     */
    public UCharacterDirectionTest()
    {
    }

    // public methods --------------------------------------------------------

    public static void main(String[] arg)
    {
        try
        {
            UCharacterDirectionTest test = new UCharacterDirectionTest();
            test.run(arg);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    /**
     * Tests that UCharacterDirection.toString(int) returns the expected
     * name for every direction value
     */
    public void TestToString()
    {
        String name[] = {"Left-to-Right",
                         "Right-to-Left",
                         "European Number",
                         "European Number Separator",
                         "European Number Terminator",
                         "Arabic Number",
                         "Common Number Separator",
                         "Paragraph Separator",
                         "Segment Separator",
                         "Whitespace",
                         "Other Neutrals",
                         "Left-to-Right Embedding",
                         "Left-to-Right Override",
                         "Right-to-Left Arabic",
                         "Right-to-Left Embedding",
                         "Right-to-Left Override",
                         "Pop Directional Format",
                         "Non-Spacing Mark",
                         "Boundary Neutral",
                         "Unassigned"};
        for (int i = 
UCharacterDirection.LEFT_TO_RIGHT; i < UCharacterDirection.CHAR_DIRECTION_COUNT; i ++) { if (!UCharacterDirection.toString(i).equals(name[i])) { errln("Error toString for direction " + i + " expected " + name[i]); } } } } icu4j-4.2/src/com/ibm/icu/dev/test/lang/UCharacterCompare.java0000644000175000017500000003211311361046222024075 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 1996-2006, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.lang; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.lang.UCharacterCategory; import java.io.FileWriter; import java.io.PrintWriter; import java.util.Hashtable; import java.util.Enumeration; /** * A class to compare the difference in methods between java.lang.Character and * UCharacter * @author Syn Wee Quek * @since oct 06 2000 * @see com.ibm.icu.lang.UCharacter */ public final class UCharacterCompare { // private variables ================================================ private static Hashtable m_hashtable_ = new Hashtable(); // public methods ====================================================== /** * Main testing method */ public static void main(String arg[]) { System.out.println("Starting character compare"); try { FileWriter f; if (arg.length == 0) f = new FileWriter("compare.txt"); else f = new FileWriter(arg[0]); PrintWriter p = new PrintWriter(f); p.print("char character name "); p.println("method name ucharacter character"); for (char i = Character.MIN_VALUE; i < Character.MAX_VALUE; i ++) { System.out.println("character \\u" + Integer.toHexString(i)); if (UCharacter.isDefined(i) != Character.isDefined(i)) trackDifference(p, i, "isDefined()", "" + UCharacter.isDefined(i), "" + Character.isDefined(i)); else { if (UCharacter.digit(i, 10) != Character.digit(i, 10)) trackDifference(p, i, "digit()", 
"" + UCharacter.digit(i, 10), "" + Character.digit(i, 10)); if (UCharacter.getNumericValue(i) != Character.getNumericValue(i)) trackDifference(p, i, "getNumericValue()", "" + UCharacter.getNumericValue(i), "" + Character.getNumericValue(i)); if (!compareType(UCharacter.getType(i), Character.getType(i))) trackDifference(p, i, "getType()", "" + UCharacter.getType(i), "" + Character.getType(i)); if (UCharacter.isDigit(i) != Character.isDigit(i)) trackDifference(p, i, "isDigit()", "" + UCharacter.isDigit(i), "" + Character.isDigit(i)); if (UCharacter.isISOControl(i) != Character.isISOControl(i)) trackDifference(p, i, "isISOControl()", "" + UCharacter.isISOControl(i), "" + Character.isISOControl(i)); if (UCharacter.isLetter(i) != Character.isLetter(i)) trackDifference(p, i, "isLetter()", "" + UCharacter.isLetter(i), "" + Character.isLetter(i)); if (UCharacter.isLetterOrDigit(i) != Character.isLetterOrDigit(i)) trackDifference(p, i, "isLetterOrDigit()", "" + UCharacter.isLetterOrDigit(i), "" + Character.isLetterOrDigit(i)); if (UCharacter.isLowerCase(i) != Character.isLowerCase(i)) trackDifference(p, i, "isLowerCase()", "" + UCharacter.isLowerCase(i), "" + Character.isLowerCase(i)); if (UCharacter.isWhitespace(i) != Character.isWhitespace(i)) trackDifference(p, i, "isWhitespace()", "" + UCharacter.isWhitespace(i), "" + Character.isWhitespace(i)); if (UCharacter.isSpaceChar(i) != Character.isSpaceChar(i)) trackDifference(p, i, "isSpaceChar()", "" + UCharacter.isSpaceChar(i), "" + Character.isSpaceChar(i)); if (UCharacter.isTitleCase(i) != Character.isTitleCase(i)) trackDifference(p, i, "isTitleChar()", "" + UCharacter.isTitleCase(i), "" + Character.isTitleCase(i)); if (UCharacter.isUnicodeIdentifierPart(i) != Character.isUnicodeIdentifierPart(i)) trackDifference(p, i, "isUnicodeIdentifierPart()", "" + UCharacter.isUnicodeIdentifierPart(i), "" + Character.isUnicodeIdentifierPart(i)); if (UCharacter.isUnicodeIdentifierStart(i) != Character.isUnicodeIdentifierStart(i)) 
trackDifference(p, i, "isUnicodeIdentifierStart()", "" + UCharacter.isUnicodeIdentifierStart(i), "" + Character.isUnicodeIdentifierStart(i)); if (UCharacter.isIdentifierIgnorable(i) != Character.isIdentifierIgnorable(i)) trackDifference(p, i, "isIdentifierIgnorable()", "" + UCharacter.isIdentifierIgnorable(i), "" + Character.isIdentifierIgnorable(i)); if (UCharacter.isUpperCase(i) != Character.isUpperCase(i)) trackDifference(p, i, "isUpperCase()", "" + UCharacter.isUpperCase(i), "" + Character.isUpperCase(i)); if (UCharacter.toLowerCase(i) != Character.toLowerCase(i)) trackDifference(p, i, "toLowerCase()", Integer.toHexString(UCharacter.toLowerCase(i)), Integer.toHexString(Character.toLowerCase(i))); if (!UCharacter.toString(i).equals(new Character(i).toString())) trackDifference(p, i, "toString()", UCharacter.toString(i), new Character(i).toString()); if (UCharacter.toTitleCase(i) != Character.toTitleCase(i)) trackDifference(p, i, "toTitleCase()", Integer.toHexString(UCharacter.toTitleCase(i)), Integer.toHexString(Character.toTitleCase(i))); if (UCharacter.toUpperCase(i) != Character.toUpperCase(i)) trackDifference(p, i, "toUpperCase()", Integer.toHexString(UCharacter.toUpperCase(i)), Integer.toHexString(Character.toUpperCase(i))); } } summary(p); p.close(); } catch (Exception e) { e.printStackTrace(); } } // private methods =================================================== /** * Comparing types * @param uchartype UCharacter type * @param jchartype java.lang.Character type */ private static boolean compareType(int uchartype, int jchartype) { if (uchartype == UCharacterCategory.UNASSIGNED && jchartype == Character.UNASSIGNED) return true; if (uchartype == UCharacterCategory.UPPERCASE_LETTER && jchartype == Character.UPPERCASE_LETTER) return true; if (uchartype == UCharacterCategory.LOWERCASE_LETTER && jchartype == Character.LOWERCASE_LETTER) return true; if (uchartype == UCharacterCategory.TITLECASE_LETTER && jchartype == Character.TITLECASE_LETTER) return true; 
if (uchartype == UCharacterCategory.MODIFIER_LETTER && jchartype == Character.MODIFIER_LETTER) return true; if (uchartype == UCharacterCategory.OTHER_LETTER && jchartype == Character.OTHER_LETTER) return true; if (uchartype == UCharacterCategory.NON_SPACING_MARK && jchartype == Character.NON_SPACING_MARK) return true; if (uchartype == UCharacterCategory.ENCLOSING_MARK && jchartype == Character.ENCLOSING_MARK) return true; if (uchartype == UCharacterCategory.COMBINING_SPACING_MARK && jchartype == Character.COMBINING_SPACING_MARK) return true; if (uchartype == UCharacterCategory.DECIMAL_DIGIT_NUMBER && jchartype == Character.DECIMAL_DIGIT_NUMBER) return true; if (uchartype == UCharacterCategory.LETTER_NUMBER && jchartype == Character.LETTER_NUMBER) return true; if (uchartype == UCharacterCategory.OTHER_NUMBER && jchartype == Character.OTHER_NUMBER) return true; if (uchartype == UCharacterCategory.SPACE_SEPARATOR && jchartype == Character.SPACE_SEPARATOR) return true; if (uchartype == UCharacterCategory.LINE_SEPARATOR && jchartype == Character.LINE_SEPARATOR) return true; if (uchartype == UCharacterCategory.PARAGRAPH_SEPARATOR && jchartype == Character.PARAGRAPH_SEPARATOR) return true; if (uchartype == UCharacterCategory.CONTROL && jchartype == Character.CONTROL) return true; if (uchartype == UCharacterCategory.FORMAT && jchartype == Character.FORMAT) return true; if (uchartype == UCharacterCategory.PRIVATE_USE && jchartype == Character.PRIVATE_USE) return true; if (uchartype == UCharacterCategory.SURROGATE && jchartype == Character.SURROGATE) return true; if (uchartype == UCharacterCategory.DASH_PUNCTUATION && jchartype == Character.DASH_PUNCTUATION) return true; if (uchartype == UCharacterCategory.START_PUNCTUATION && jchartype == Character.START_PUNCTUATION) return true; if (uchartype == UCharacterCategory.END_PUNCTUATION && jchartype == Character.END_PUNCTUATION) return true; if (uchartype == UCharacterCategory.CONNECTOR_PUNCTUATION && jchartype == 
Character.CONNECTOR_PUNCTUATION) return true; if (uchartype == UCharacterCategory.OTHER_PUNCTUATION && jchartype == Character.OTHER_PUNCTUATION) return true; if (uchartype == UCharacterCategory.MATH_SYMBOL && jchartype == Character.MATH_SYMBOL) return true; if (uchartype == UCharacterCategory.CURRENCY_SYMBOL && jchartype == Character.CURRENCY_SYMBOL) return true; if (uchartype == UCharacterCategory.MODIFIER_SYMBOL && jchartype == Character.MODIFIER_SYMBOL) return true; if (uchartype == UCharacterCategory.OTHER_SYMBOL && jchartype == Character.OTHER_SYMBOL) return true; if (uchartype == UCharacterCategory.INITIAL_PUNCTUATION && jchartype == Character.START_PUNCTUATION) return true; if (uchartype == UCharacterCategory.FINAL_PUNCTUATION && jchartype == Character.END_PUNCTUATION) return true; /*if (uchartype == UCharacterCategory.GENERAL_OTHER_TYPES && jchartype == Character.GENERAL_OTHER_TYPES) return true;*/ return false; } /** * Difference writing to file * @param f file outputstream * @param ch code point * @param method for testing * @param ucharval UCharacter value after running method * @param charval Character value after running method */ private static void trackDifference(PrintWriter f, int ch, String method, String ucharval, String charval) throws Exception { if (m_hashtable_.containsKey(method)) { Integer value = (Integer)m_hashtable_.get(method); m_hashtable_.put(method, new Integer(value.intValue() + 1)); } else m_hashtable_.put(method, new Integer(1)); String temp = Integer.toHexString(ch); StringBuffer s = new StringBuffer(temp); for (int i = 0; i < 6 - temp.length(); i ++) s.append(' '); temp = UCharacter.getExtendedName(ch); if (temp == null) temp = " "; s.append(temp); for (int i = 0; i < 73 - temp.length(); i ++) s.append(' '); s.append(method); for (int i = 0; i < 27 - method.length(); i ++) s.append(' '); s.append(ucharval); for (int i = 0; i < 11 - ucharval.length(); i ++) s.append(' '); s.append(charval); f.println(s.toString()); } /** * Does 
up a summary of the differences * @param f file outputstream */ private static void summary(PrintWriter f) { f.println("=================================================="); f.println("Summary of differences"); for (Enumeration e = m_hashtable_.keys() ; e.hasMoreElements() ;) { StringBuffer method = new StringBuffer((String)e.nextElement()); int count = ((Integer)m_hashtable_.get(method.toString())).intValue(); for (int i = 30 - method.length(); i > 0; i --) method.append(' '); f.println(method + " " + count); } } } icu4j-4.2/src/com/ibm/icu/dev/test/lang/TestUScript.java0000644000175000017500000004041711361046222023004 0ustar twernertwerner/** ******************************************************************************* * Copyright (C) 1996-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.lang; import com.ibm.icu.lang.UScript; import com.ibm.icu.util.ULocale; import com.ibm.icu.dev.test.TestFmwk; import java.util.Locale; public class TestUScript extends TestFmwk { /** * Constructor */ public TestUScript() { } public static void main(String[] args) throws Exception { new TestUScript().run(args); } public void TestLocaleGetCode(){ final ULocale[] testNames={ /* test locale */ new ULocale("en"), new ULocale("en_US"), new ULocale("sr"), new ULocale("ta") , new ULocale("te_IN"), new ULocale("hi"), new ULocale("he"), new ULocale("ar"), new ULocale("abcde"), new ULocale("abcde_cdef"), new ULocale("iw") }; final int[] expected ={ /* locales should return */ UScript.LATIN, UScript.LATIN, UScript.CYRILLIC, UScript.TAMIL, UScript.TELUGU,UScript.DEVANAGARI, UScript.HEBREW, UScript.ARABIC, UScript.INVALID_CODE,UScript.INVALID_CODE, UScript.HEBREW }; int i =0; int numErrors =0; for( ; i0) { // assume missing locale data, so not an error, just a warning if (isModularBuild() || noData()) { // if nodata is set don't even warn 
warnln("Could not find locale data"); } else { errln("encountered " + numErrors + " errors."); } } } public void TestMultipleCode(){ final String[] testNames = { "ja" ,"ko_KR","zh","zh_TW"}; final int[][] expected = { {UScript.KATAKANA,UScript.HIRAGANA,UScript.HAN}, {UScript.HANGUL, UScript.HAN}, {UScript.HAN}, {UScript.HAN,UScript.BOPOMOFO} }; int numErrors = 0; for(int i=0; i0 ){ warnln("encountered " + numErrors + " errors in UScript.getName()"); } } public void TestGetShortName(){ final int[] testCodes={ /* abbr should return */ UScript.HAN, UScript.HANGUL, UScript.HEBREW, UScript.HIRAGANA, UScript.KANNADA, UScript.KATAKANA, UScript.KHMER, UScript.LAO, UScript.LATIN, UScript.MALAYALAM, UScript.MONGOLIAN, }; final String[] expectedAbbr={ /* test abbr */ "Hani", "Hang","Hebr","Hira", "Knda","Kana","Khmr","Laoo", "Latn", "Mlym", "Mong", }; int i=0; int numErrors=0; while(i0 ){ warnln("encountered " + numErrors + " errors in UScript.getShortName()"); } } public void TestGetScript(){ int codepoints[][] = new int[][] { {0x0000FF9D, UScript.KATAKANA }, {0x0000FFBE, UScript.HANGUL }, {0x0000FFC7, UScript.HANGUL }, {0x0000FFCF, UScript.HANGUL }, {0x0000FFD7, UScript.HANGUL}, {0x0000FFDC, UScript.HANGUL}, {0x00010300, UScript.OLD_ITALIC}, {0x00010330, UScript.GOTHIC}, {0x0001034A, UScript.GOTHIC}, {0x00010400, UScript.DESERET}, {0x00010428, UScript.DESERET}, {0x0001D167, UScript.INHERITED}, {0x0001D17B, UScript.INHERITED}, {0x0001D185, UScript.INHERITED}, {0x0001D1AA, UScript.INHERITED}, {0x00020000, UScript.HAN}, {0x00000D02, UScript.MALAYALAM}, {0x00000D00, UScript.UNKNOWN}, {0x00000000, UScript.COMMON}, {0x0001D169, UScript.INHERITED }, {0x0001D182, UScript.INHERITED }, {0x0001D18B, UScript.INHERITED }, {0x0001D1AD, UScript.INHERITED }, }; int i =0; int code = UScript.INVALID_CODE; boolean passed = true; while(i< codepoints.length){ code = UScript.getScript(codepoints[i][0]); if(code != codepoints[i][1]){ logln("UScript.getScript for codepoint 0x"+ 
hex(codepoints[i][0])+" failed"); passed = false; } i++; } if(!passed){ errln("UScript.getScript failed."); } } public void TestScriptNames(){ for(int i=0; i=0){ errln("UScript.getScript for codepoint 0x"+ hex(i)+" failed"); } String abbr = UScript.getShortName(code); if(abbr.indexOf("INV")>=0){ errln("UScript.getScript for codepoint 0x"+ hex(i)+" failed"); } } } public void TestNewCode(){ /* * These script codes were originally added to ICU pre-3.6, so that ICU would * have all ISO 15924 script codes. ICU was then based on Unicode 4.1. * These script codes were added with only short names because we don't * want to invent long names ourselves. * Unicode 5 and later encode some of these scripts and give them long names. * Whenever this happens, the long script names here need to be updated. */ String[] expectedLong = new String[]{ "Balinese", "Batk", "Blis", "Brah", "Cham", "Cirt", "Cyrs", "Egyd", "Egyh", "Egyp", "Geok", "Hans", "Hant", "Hmng", "Hung", "Inds", "Java", "Kayah_Li", "Latf", "Latg", "Lepcha", "Lina", "Mand", "Maya", "Mero", "Nko", "Orkh", "Perm", "Phags_Pa", "Phoenician", "Plrd", "Roro", "Sara", "Syre", "Syrj", "Syrn", "Teng", "Vai", "Visp", "Cuneiform", "Zxxx", "Unknown", "Carian", "Jpan", "Lana", "Lycian", "Lydian", "Ol_Chiki", "Rejang", "Saurashtra", "Sgnw", "Sundanese", "Moon", "Mtei", // ICU 4.0 "Armi", "Avst", "Cakm", "Kore", "Kthi", "Mani", "Phli", "Phlp", "Phlv", "Prti", "Samr", "Tavt", "Zmth", "Zsym", }; String[] expectedShort = new String[]{ "Bali", "Batk", "Blis", "Brah", "Cham", "Cirt", "Cyrs", "Egyd", "Egyh", "Egyp", "Geok", "Hans", "Hant", "Hmng", "Hung", "Inds", "Java", "Kali", "Latf", "Latg", "Lepc", "Lina", "Mand", "Maya", "Mero", "Nkoo", "Orkh", "Perm", "Phag", "Phnx", "Plrd", "Roro", "Sara", "Syre", "Syrj", "Syrn", "Teng", "Vaii", "Visp", "Xsux", "Zxxx", "Zzzz", "Cari", "Jpan", "Lana", "Lyci", "Lydi", "Olck", "Rjng", "Saur", "Sgnw", "Sund", "Moon", "Mtei", // ICU 4.0 "Armi", "Avst", "Cakm", "Kore", "Kthi", "Mani", "Phli", "Phlp", 
"Phlv", "Prti", "Samr", "Tavt", "Zmth", "Zsym", }; int j = 0; int i = 0; for(i=UScript.BALINESE; i 5) locCount = 5; logln("Quick mode: only _testing first 5 Locales"); } for(int i = 0; i < locCount; ++i) { logln(loc[i].getDisplayName()); fmt = NumberFormat.getInstance(loc[i]); _test(fmt); fmt = NumberFormat.getCurrencyInstance(loc[i]); _test(fmt); fmt = NumberFormat.getPercentInstance(loc[i]); _test(fmt); } logln("Numeric error " + min_numeric_error + " to " + max_numeric_error); } /** * Return a random value from -range..+range. */ private Random random; public double randomDouble(double range) { if (random == null) { random = createRandom(); // use test framework's random seed } return random.nextDouble() * range; } public void _test(NumberFormat fmt) { _test(fmt, Double.NaN); _test(fmt, Double.POSITIVE_INFINITY); _test(fmt, Double.NEGATIVE_INFINITY); _test(fmt, 500); _test(fmt, 0); _test(fmt, -0); _test(fmt, 0.0); double negZero = 0.0; negZero /= -1.0; _test(fmt, negZero); _test(fmt, 9223372036854775808.0d); _test(fmt, -9223372036854775809.0d); //_test(fmt, 6.936065876100493E74d); // _test(fmt, 6.212122845281909E48d); for (int i = 0; i < 10; ++i) { _test(fmt, randomDouble(1)); _test(fmt, randomDouble(10000)); _test(fmt, Math.floor((randomDouble(10000)))); _test(fmt, randomDouble(1e50)); _test(fmt, randomDouble(1e-50)); _test(fmt, randomDouble(1e100)); _test(fmt, randomDouble(1e75)); _test(fmt, randomDouble(1e308) / ((DecimalFormat) fmt).getMultiplier()); _test(fmt, randomDouble(1e75) / ((DecimalFormat) fmt).getMultiplier()); _test(fmt, randomDouble(1e65) / ((DecimalFormat) fmt).getMultiplier()); _test(fmt, randomDouble(1e-292)); _test(fmt, randomDouble(1e-78)); _test(fmt, randomDouble(1e-323)); _test(fmt, randomDouble(1e-100)); _test(fmt, randomDouble(1e-78)); } } public void _test(NumberFormat fmt, double value) { _test(fmt, new Double(value)); } public void _test(NumberFormat fmt, long value) { _test(fmt, new Long(value)); } public void _test(NumberFormat fmt, 
                      Number value) {
        logln("test data = " + value);
        fmt.setMaximumFractionDigits(999);
        String s, s2;
        if (value.getClass().getName().equalsIgnoreCase("java.lang.Double"))
            s = fmt.format(value.doubleValue());
        else
            s = fmt.format(value.longValue());

        Number n = new Double(0);
        boolean show = verbose;
        if (DEBUG)
            logln( /*value.getString(temp) +*/ " F> " + s);
        try {
            n = fmt.parse(s);
        } catch (java.text.ParseException e) {
            System.out.println(e);
        }

        if (DEBUG)
            logln(s + " P> " /*+ n.getString(temp)*/);

        if (value.getClass().getName().equalsIgnoreCase("java.lang.Double"))
            s2 = fmt.format(n.doubleValue());
        else
            s2 = fmt.format(n.longValue());

        if (DEBUG)
            logln(/*n.getString(temp) +*/ " F> " + s2);

        if (STRING_COMPARE) {
            if (!s.equals(s2)) {
                errln("*** STRING ERROR \"" + s + "\" != \"" + s2 + "\"");
                show = true;
            }
        }

        if (EXACT_NUMERIC_COMPARE) {
            if (value != n) {
                errln("*** NUMERIC ERROR");
                show = true;
            }
        } else {
            // Compute proportional error
            double error = proportionalError(value, n);

            if (error > MAX_ERROR) {
                errln("*** NUMERIC ERROR " + error);
                show = true;
            }

            if (error > max_numeric_error)
                max_numeric_error = error;
            if (error < min_numeric_error)
                min_numeric_error = error;
        }

        if (show)
            logln(
                /*value.getString(temp) +*/ value.getClass().getName() + " F> " + s + " P> " +
                /*n.getString(temp) +*/ n.getClass().getName() + " F> " + s2);
    }

    public double proportionalError(Number a, Number b) {
        double aa,bb;

        if(a.getClass().getName().equalsIgnoreCase("java.lang.Double"))
            aa = a.doubleValue();
        else
            aa = a.longValue();

        // Read b according to its own runtime type, not a's
        if(b.getClass().getName().equalsIgnoreCase("java.lang.Double"))
            bb = b.doubleValue();
        else
            bb = b.longValue();

        double error = aa - bb;
        if(aa != 0 && bb != 0)
            error /= aa;

        return Math.abs(error);
    }
}
icu4j-4.2/src/com/ibm/icu/dev/test/format/PluralFormatTest.java0000644000175000017500000001734111361046232024373 0ustar twernertwerner
/*
 *******************************************************************************
 * Copyright (C) 2007-2008, International Business Machines 
Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.format; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.impl.Utility; import com.ibm.icu.text.PluralFormat; import com.ibm.icu.util.ULocale; import java.util.HashMap; import java.util.Map; /** * @author tschumann (Tim Schumann) * */ public class PluralFormatTest extends TestFmwk { public static void main(String[] args) throws Exception { new PluralFormatTest().run(args); } private void helperTestRules(String localeIDs, String testPattern, Map changes) { String[] locales = Utility.split(localeIDs, ','); // Create example outputs for all supported locales. /* System.out.println("\n" + localeIDs); String lastValue = (String) changes.get(new Integer(0)); int lastNumber = 0; for (int i = 1; i < 199; ++i) { if (changes.get(new Integer(i)) != null) { if (lastNumber == i-1) { System.out.println(lastNumber + ": " + lastValue); } else { System.out.println(lastNumber + "... " + (i-1) + ": " + lastValue); } lastNumber = i; lastValue = (String) changes.get(new Integer(i)); } } System.out.println(lastNumber + "..." 
+ 199 + ": " + lastValue); */ log("test pattern: '" + testPattern + "'"); for (int i = 0; i < locales.length; ++i) { try { PluralFormat plf = new PluralFormat(new ULocale(locales[i]), testPattern); log("plf: " + plf); String expected = (String) changes.get(new Integer(0)); for (int n = 0; n < 200; ++n) { if (changes.get(new Integer(n)) != null) { expected = (String) changes.get(new Integer(n)); } assertEquals("Locale: " + locales[i] + ", number: " + n, expected, plf.format(n)); } } catch (IllegalArgumentException e) { errln(e.getMessage() + " locale: " + locales[i] + " pattern: '" + testPattern + "' " + System.currentTimeMillis()); } } } public void TestOneFormLocales() { String localeIDs = "ja,ko,tr,vi"; String testPattern = "other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "other"); helperTestRules(localeIDs, testPattern, changes); } public void TestSingular1Locales() { String localeIDs = "da,de,el,en,eo,es,et,fi,fo,he,it,nb,nl,nn,no,pt_PT,sv"; String testPattern = "one{one} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "other"); changes.put(new Integer(1), "one"); changes.put(new Integer(2), "other"); helperTestRules(localeIDs, testPattern, changes); } public void TestSingular01Locales() { String localeIDs = "fr,pt_BR"; String testPattern = "one{one} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "one"); changes.put(new Integer(2), "other"); helperTestRules(localeIDs, testPattern, changes); } public void TestZeroSingularLocales() { String localeIDs = "lv"; String testPattern = "zero{zero} one{one} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "zero"); changes.put(new Integer(1), "one"); changes.put(new Integer(2), "other"); for (int i = 2; i < 20; ++i) { if (i == 11) { continue; } changes.put(new Integer(i*10 + 1), "one"); changes.put(new Integer(i*10 + 2), "other"); } helperTestRules(localeIDs, testPattern, changes); } public void TestSingularDual() { String 
localeIDs = "ga"; String testPattern = "one{one} two{two} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "other"); changes.put(new Integer(1), "one"); changes.put(new Integer(2), "two"); changes.put(new Integer(3), "other"); helperTestRules(localeIDs, testPattern, changes); } public void TestSingularZeroSome() { String localeIDs = "ro"; String testPattern = "few{few} one{one} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "few"); changes.put(new Integer(1), "one"); changes.put(new Integer(2), "few"); changes.put(new Integer(20), "other"); changes.put(new Integer(101), "few"); changes.put(new Integer(120), "other"); helperTestRules(localeIDs, testPattern, changes); } public void TestSpecial12_19() { String localeIDs = "lt"; String testPattern = "one{one} few{few} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "other"); changes.put(new Integer(1), "one"); changes.put(new Integer(2), "few"); changes.put(new Integer(10), "other"); for (int i = 2; i < 20; ++i) { if (i == 11) { continue; } changes.put(new Integer(i*10 + 1), "one"); changes.put(new Integer(i*10 + 2), "few"); changes.put(new Integer((i+1)*10), "other"); } helperTestRules(localeIDs, testPattern, changes); } public void TestPaucalExcept11_14() { String localeIDs = "hr,ru,sr,uk"; String testPattern = "one{one} few{few} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "other"); changes.put(new Integer(1), "one"); changes.put(new Integer(2), "few"); changes.put(new Integer(5), "other"); for (int i = 2; i < 20; ++i) { if (i == 11) { continue; } changes.put(new Integer(i*10 + 1), "one"); changes.put(new Integer(i*10 + 2), "few"); changes.put(new Integer(i*10 + 5), "other"); } helperTestRules(localeIDs, testPattern, changes); } public void TestSingularPaucal() { String localeIDs = "cs,sk"; String testPattern = "one{one} few{few} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "other"); 
changes.put(new Integer(1), "one"); changes.put(new Integer(2), "few"); changes.put(new Integer(5), "other"); helperTestRules(localeIDs, testPattern, changes); } public void TestPaucal1_234() { String localeIDs = "pl"; String testPattern = "one{one} few{few} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "other"); changes.put(new Integer(1), "one"); changes.put(new Integer(2), "few"); changes.put(new Integer(5), "other"); for (int i = 2; i < 20; ++i) { if (i == 2 || i == 11 || i == 12) { continue; } changes.put(new Integer(i*10 + 2), "few"); changes.put(new Integer(i*10 + 5), "other"); } helperTestRules(localeIDs, testPattern, changes); } public void TestPaucal1_2_34() { String localeIDs = "sl"; String testPattern = "one{one} two{two} few{few} other{other}"; Map changes = new HashMap(); changes.put(new Integer(0), "other"); changes.put(new Integer(1), "one"); changes.put(new Integer(2), "two"); changes.put(new Integer(3), "few"); changes.put(new Integer(5), "other"); changes.put(new Integer(101), "one"); changes.put(new Integer(102), "two"); changes.put(new Integer(103), "few"); changes.put(new Integer(105), "other"); helperTestRules(localeIDs, testPattern, changes); } } icu4j-4.2/src/com/ibm/icu/dev/test/format/IntlTestDecimalFormatSymbolsC.java0000644000175000017500000001175311361046232026776 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2001-2004, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ /** * Port From: ICU4C v1.8.1 : format : IntlTestDecimalFormatSymbols * Source File: $ICU4CRoot/source/test/intltest/tsdcfmsy.cpp **/ package com.ibm.icu.dev.test.format; import java.text.FieldPosition; import java.util.Locale; import com.ibm.icu.text.*; /** * Tests for DecimalFormatSymbols **/ public class IntlTestDecimalFormatSymbolsC extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new IntlTestDecimalFormatSymbolsC().run(args); } /** * Test the API of DecimalFormatSymbols; primarily a simple get/set set. */ public void TestSymbols() { DecimalFormatSymbols fr = new DecimalFormatSymbols(Locale.FRENCH); DecimalFormatSymbols en = new DecimalFormatSymbols(Locale.ENGLISH); if (en.equals(fr)) { errln("ERROR: English DecimalFormatSymbols equal to French"); } // just do some VERY basic tests to make sure that get/set work char zero = en.getZeroDigit(); fr.setZeroDigit(zero); if (fr.getZeroDigit() != en.getZeroDigit()) { errln("ERROR: get/set ZeroDigit failed"); } char group = en.getGroupingSeparator(); fr.setGroupingSeparator(group); if (fr.getGroupingSeparator() != en.getGroupingSeparator()) { errln("ERROR: get/set GroupingSeparator failed"); } char decimal = en.getDecimalSeparator(); fr.setDecimalSeparator(decimal); if (fr.getDecimalSeparator() != en.getDecimalSeparator()) { errln("ERROR: get/set DecimalSeparator failed"); } char perMill = en.getPerMill(); fr.setPerMill(perMill); if (fr.getPerMill() != en.getPerMill()) { errln("ERROR: get/set PerMill failed"); } char percent = en.getPercent(); fr.setPercent(percent); if (fr.getPercent() != en.getPercent()) { errln("ERROR: get/set Percent failed"); } char digit = en.getDigit(); fr.setDigit(digit); if (fr.getDigit() != en.getDigit()) { errln("ERROR: get/set Digit failed"); } char patternSeparator = en.getPatternSeparator(); fr.setPatternSeparator(patternSeparator); if 
(fr.getPatternSeparator() != en.getPatternSeparator()) { errln("ERROR: get/set PatternSeparator failed"); } String infinity = en.getInfinity(); fr.setInfinity(infinity); String infinity2 = fr.getInfinity(); if (!infinity.equals(infinity2)) { errln("ERROR: get/set Infinity failed"); } String nan = en.getNaN(); fr.setNaN(nan); String nan2 = fr.getNaN(); if (!nan.equals(nan2)) { errln("ERROR: get/set NaN failed"); } char minusSign = en.getMinusSign(); fr.setMinusSign(minusSign); if (fr.getMinusSign() != en.getMinusSign()) { errln("ERROR: get/set MinusSign failed"); } // char exponential = en.getExponentialSymbol(); // fr.setExponentialSymbol(exponential); // if(fr.getExponentialSymbol() != en.getExponentialSymbol()) { // errln("ERROR: get/set Exponential failed"); // } //DecimalFormatSymbols foo = new DecimalFormatSymbols(); //The variable is never used en = (DecimalFormatSymbols) fr.clone(); if (!en.equals(fr)) { errln("ERROR: Clone failed"); } DecimalFormatSymbols sym = new DecimalFormatSymbols(Locale.US); verify(34.5, "00.00", sym, "34.50"); sym.setDecimalSeparator('S'); verify(34.5, "00.00", sym, "34S50"); sym.setPercent('P'); verify(34.5, "00 %", sym, "3450 P"); sym.setCurrencySymbol("D"); verify(34.5, "\u00a4##.##", sym, "D34.5"); sym.setGroupingSeparator('|'); verify(3456.5, "0,000.##", sym, "3|456S5"); } /** helper functions**/ public void verify(double value, String pattern, DecimalFormatSymbols sym, String expected) { DecimalFormat df = new DecimalFormat(pattern, sym); StringBuffer buffer = new StringBuffer(""); FieldPosition pos = new FieldPosition(-1); buffer = df.format(value, buffer, pos); if(!buffer.toString().equals(expected)){ errln("ERROR: format failed after setSymbols()\n Expected " + expected + ", Got " + buffer); } } }icu4j-4.2/src/com/ibm/icu/dev/test/format/MessageRegression.java0000644000175000017500000010702211361046232024544 0ustar twernertwerner/* ********************************************************************** * Copyright (c) 
2005-2008, International Business Machines * Corporation and others. All Rights Reserved. ********************************************************************** * Author: Alan Liu * Created: April 12, 2004 * Since: ICU 3.0 ********************************************************************** */ /** * MessageRegression.java * * @test 1.29 01/03/12 * @bug 4031438 4058973 4074764 4094906 4104976 4105380 4106659 4106660 4106661 * 4111739 4112104 4113018 4114739 4114743 4116444 4118592 4118594 4120552 * 4142938 4169959 4232154 4293229 * @summary Regression tests for MessageFormat and associated classes */ /* (C) Copyright Taligent, Inc. 1996 - All Rights Reserved (C) Copyright IBM Corp. 1996 - All Rights Reserved The original version of this source code and documentation is copyrighted and owned by Taligent, Inc., a wholly-owned subsidiary of IBM. These materials are provided under terms of a License Agreement between Taligent and Sun. This technology is protected by multiple US and International patents. This notice and attribution to Taligent may not be removed. Taligent is a registered trademark of Taligent, Inc. */ package com.ibm.icu.dev.test.format; import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; import java.io.IOException; import java.io.ObjectInputStream; import java.io.ObjectOutputStream; import java.text.ChoiceFormat; import java.text.ParsePosition; import java.util.Date; import java.util.Iterator; import java.util.Locale; import java.util.Map; import java.util.HashMap; import com.ibm.icu.text.MessageFormat; public class MessageRegression extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new MessageRegression().run(args); } /* @bug 4074764 * Null exception when formatting pattern with MessageFormat * with no parameters. 
*/ public void Test4074764() { String[] pattern = {"Message without param", "Message with param:{0}", "Longer Message with param {0}"}; //the difference between the two param strings is that //in the first one, the param position is within the //length of the string without param while it is not so //in the other case. MessageFormat messageFormatter = new MessageFormat(""); try { //Apply pattern with param and print the result messageFormatter.applyPattern(pattern[1]); Object[] paramArray = {new String("BUG"), new Date()}; String tempBuffer = messageFormatter.format(paramArray); if (!tempBuffer.equals("Message with param:BUG")) errln("MessageFormat with one param test failed."); logln("Formatted with one extra param : " + tempBuffer); //Apply pattern without param and print the result messageFormatter.applyPattern(pattern[0]); tempBuffer = messageFormatter.format(null); if (!tempBuffer.equals("Message without param")) errln("MessageFormat with no param test failed."); logln("Formatted with no params : " + tempBuffer); tempBuffer = messageFormatter.format(paramArray); if (!tempBuffer.equals("Message without param")) errln("Formatted with arguments > substitution failed. result = " + tempBuffer.toString()); logln("Formatted with extra params : " + tempBuffer); //This statement gives an exception while formatting... //If we use pattern[1] for the message with param, //we get a NullPointerException in MessageFormat.java(617) //If we use pattern[2] for the message with param, //we get a StringArrayIndexOutOfBoundsException in MessageFormat.java(614) //Both are due to maxOffset not being reset to -1 //in applyPattern() when the pattern does not //contain any param. } catch (Exception foo) { errln("Exception when formatting with no params."); } } /* @bug 4058973 * MessageFormat.toPattern has weird rounding behavior. 
*/ public void Test4058973() { MessageFormat fmt = new MessageFormat("{0,choice,0#no files|1#one file|1< {0,number,integer} files}"); String pat = fmt.toPattern(); if (!pat.equals("{0,choice,0.0#no files|1.0#one file|1.0< {0,number,integer} files}")) { errln("MessageFormat.toPattern failed"); } } /* @bug 4031438 * More robust message formats. */ public void Test4031438() { String pattern1 = "Impossible {1} has occurred -- status code is {0} and message is {2}."; String pattern2 = "Double '' Quotes {0} test and quoted '{1}' test plus 'other {2} stuff'."; MessageFormat messageFormatter = new MessageFormat(""); try { logln("Apply with pattern : " + pattern1); messageFormatter.applyPattern(pattern1); Object[] paramArray = {new Integer(7)}; String tempBuffer = messageFormatter.format(paramArray); if (!tempBuffer.equals("Impossible {1} has occurred -- status code is 7 and message is {2}.")) errln("Tests arguments < substitution failed"); logln("Formatted with 7 : " + tempBuffer); ParsePosition status = new ParsePosition(0); Object[] objs = messageFormatter.parse(tempBuffer, status); if (objs[paramArray.length] != null) errln("Parse failed with more than expected arguments"); for (int i = 0; i < objs.length; i++) { if (objs[i] != null && !objs[i].toString().equals(paramArray[i].toString())) { errln("Parse failed on object " + objs[i] + " at index : " + i); } } tempBuffer = messageFormatter.format(null); if (!tempBuffer.equals("Impossible {1} has occurred -- status code is {0} and message is {2}.")) errln("Tests with no arguments failed"); logln("Formatted with null : " + tempBuffer); logln("Apply with pattern : " + pattern2); messageFormatter.applyPattern(pattern2); tempBuffer = messageFormatter.format(paramArray); if (!tempBuffer.equals("Double ' Quotes 7 test and quoted {1} test plus other {2} stuff.")) errln("quote format test (w/ params) failed."); logln("Formatted with params : " + tempBuffer); tempBuffer = messageFormatter.format(null); if 
(!tempBuffer.equals("Double ' Quotes {0} test and quoted {1} test plus other {2} stuff.")) errln("quote format test (w/ null) failed."); logln("Formatted with null : " + tempBuffer); logln("toPattern : " + messageFormatter.toPattern()); } catch (Exception foo) { warnln("Exception when formatting in bug 4031438. "+foo.getMessage()); } } public void Test4052223() { ParsePosition pos = new ParsePosition(0); if (pos.getErrorIndex() != -1) { errln("ParsePosition.getErrorIndex initialization failed."); } MessageFormat fmt = new MessageFormat("There are {0} apples growing on the {1} tree."); String str = new String("There is one apple growing on the peach tree."); Object[] objs = fmt.parse(str, pos); logln("unparsable string , should fail at " + pos.getErrorIndex()); if (pos.getErrorIndex() == -1) errln("Bug 4052223 failed : parsing string " + str); pos.setErrorIndex(4); if (pos.getErrorIndex() != 4) errln("setErrorIndex failed, got " + pos.getErrorIndex() + " instead of 4"); if (objs != null) { errln("objs should be null"); } ChoiceFormat f = new ChoiceFormat( "-1#are negative|0#are no or fraction|1#is one|1.0"); Object[] objs1 = null; Object[] objs2 = {}; Object[] objs3 = {null}; try { logln("pattern: \"" + mf.toPattern() + "\""); log("format(null) : "); logln("\"" + mf.format(objs1) + "\""); log("format({}) : "); logln("\"" + mf.format(objs2) + "\""); log("format({null}) :"); logln("\"" + mf.format(objs3) + "\""); } catch (Exception e) { errln("Exception thrown for null argument tests."); } } /* @bug 4113018 * MessageFormat.applyPattern works wrong with illegal patterns. 
*/ public void Test4113018() { String originalPattern = "initial pattern"; MessageFormat mf = new MessageFormat(originalPattern); String illegalPattern = "format: {0, xxxYYY}"; logln("pattern before: \"" + mf.toPattern() + "\""); logln("illegal pattern: \"" + illegalPattern + "\""); try { mf.applyPattern(illegalPattern); errln("Should have thrown IllegalArgumentException for pattern : " + illegalPattern); } catch (IllegalArgumentException e) { if (!originalPattern.equals(mf.toPattern())) errln("pattern after: \"" + mf.toPattern() + "\""); } } /* @bug 4106661 * ChoiceFormat is silent about the pattern usage in javadoc. */ public void Test4106661() { ChoiceFormat fmt = new ChoiceFormat( "-1#are negative| 0#are no or fraction | 1#is one |1.0 " + out + "; want \"" + DATA[i+1+j] + '"'); } String pat = cf.toPattern(); String pat2 = new ChoiceFormat(pat).toPattern(); if (!pat.equals(pat2)) errln("Fail: Pattern \"" + DATA[i] + "\" x toPattern -> \"" + pat + '"'); else logln("Ok: Pattern \"" + DATA[i] + "\" x toPattern -> \"" + pat + '"'); } catch (IllegalArgumentException e) { errln("Fail: Pattern \"" + DATA[i] + "\" -> " + e); } } } /** * @bug 4112104 * MessageFormat.equals(null) throws a NullPointerException. The JLS states * that it should return false. */ public void Test4112104() { MessageFormat format = new MessageFormat(""); try { // This should NOT throw an exception if (format.equals(null)) { // It also should return false errln("MessageFormat.equals(null) returns true"); } } catch (NullPointerException e) { errln("MessageFormat.equals(null) throws " + e); } } /** * @bug 4169959 * MessageFormat does not format null objects. CANNOT REPRODUCE THIS BUG. 
*/ public void Test4169959() { // This works logln(MessageFormat.format( "This will {0}", new String[]{"work"} ) ); // This fails logln(MessageFormat.format( "This will {0}", new Object[]{ null } ) ); } public void test4232154() { boolean gotException = false; try { new MessageFormat("The date is {0:date}"); } catch (Exception e) { gotException = true; if (!(e instanceof IllegalArgumentException)) { throw new RuntimeException("got wrong exception type"); } if ("argument number too large at ".equals(e.getMessage())) { throw new RuntimeException("got wrong exception message"); } } if (!gotException) { throw new RuntimeException("didn't get exception for invalid input"); } } public void test4293229() { MessageFormat format = new MessageFormat("'''{'0}'' '''{0}'''"); Object[] args = { null }; String expected = "'{0}' '{0}'"; String result = format.format(args); if (!result.equals(expected)) { throw new RuntimeException("wrong format result - expected \"" + expected + "\", got \"" + result + "\""); } } // This test basically ensures that the tests defined above also work with // valid named arguments. public void testBugTestsWithNamesArguments() { { // Taken from Test4031438(). 
String pattern1 = "Impossible {arg1} has occurred -- status code is {arg0} and message is {arg2}."; String pattern2 = "Double '' Quotes {ARG_ZERO} test and quoted '{ARG_ONE}' test plus 'other {ARG_TWO} stuff'."; MessageFormat messageFormatter = new MessageFormat(""); try { logln("Apply with pattern : " + pattern1); messageFormatter.applyPattern(pattern1); HashMap paramsMap = new HashMap(); paramsMap.put("arg0", new Integer(7)); String tempBuffer = messageFormatter.format(paramsMap); if (!tempBuffer.equals("Impossible {arg1} has occurred -- status code is 7 and message is {arg2}.")) errln("Tests arguments < substitution failed"); logln("Formatted with 7 : " + tempBuffer); ParsePosition status = new ParsePosition(0); Map objs = messageFormatter.parseToMap(tempBuffer, status); if (objs.get("arg1") != null || objs.get("arg2") != null) errln("Parse failed with more than expected arguments"); for (Iterator keyIter = objs.keySet().iterator(); keyIter.hasNext();) { String key = (String) keyIter.next(); if (objs.get(key) != null && !objs.get(key).toString().equals(paramsMap.get(key).toString())) { errln("Parse failed on object " + objs.get(key) + " with argument name : " + key ); } } tempBuffer = messageFormatter.format(null); if (!tempBuffer.equals("Impossible {arg1} has occurred -- status code is {arg0} and message is {arg2}.")) errln("Tests with no arguments failed"); logln("Formatted with null : " + tempBuffer); logln("Apply with pattern : " + pattern2); messageFormatter.applyPattern(pattern2); paramsMap.clear(); paramsMap.put("ARG_ZERO", new Integer(7)); tempBuffer = messageFormatter.format(paramsMap); if (!tempBuffer.equals("Double ' Quotes 7 test and quoted {ARG_ONE} test plus other {ARG_TWO} stuff.")) errln("quote format test (w/ params) failed."); logln("Formatted with params : " + tempBuffer); tempBuffer = messageFormatter.format(null); if (!tempBuffer.equals("Double ' Quotes {ARG_ZERO} test and quoted {ARG_ONE} test plus other {ARG_TWO} stuff.")) errln("quote 
format test (w/ null) failed."); logln("Formatted with null : " + tempBuffer); logln("toPattern : " + messageFormatter.toPattern()); } catch (Exception foo) { warnln("Exception when formatting in bug 4031438. "+foo.getMessage()); } }{ // Taken from Test4052223(). ParsePosition pos = new ParsePosition(0); if (pos.getErrorIndex() != -1) { errln("ParsePosition.getErrorIndex initialization failed."); } MessageFormat fmt = new MessageFormat("There are {numberOfApples} apples growing on the {whatKindOfTree} tree."); String str = new String("There is one apple growing on the peach tree."); Map objs = fmt.parseToMap(str, pos); logln("unparsable string , should fail at " + pos.getErrorIndex()); if (pos.getErrorIndex() == -1) errln("Bug 4052223 failed : parsing string " + str); pos.setErrorIndex(4); if (pos.getErrorIndex() != 4) errln("setErrorIndex failed, got " + pos.getErrorIndex() + " instead of 4"); if (objs != null) errln("unparsable string, should return null"); }{ // Taken from Test4111739(). MessageFormat format1 = null; MessageFormat format2 = null; ObjectOutputStream ostream = null; ByteArrayOutputStream baos = null; ObjectInputStream istream = null; try { baos = new ByteArrayOutputStream(); ostream = new ObjectOutputStream(baos); } catch(IOException e) { errln("Unexpected exception : " + e.getMessage()); return; } try { format1 = new MessageFormat("pattern{argument}"); ostream.writeObject(format1); ostream.flush(); byte bytes[] = baos.toByteArray(); istream = new ObjectInputStream(new ByteArrayInputStream(bytes)); format2 = (MessageFormat)istream.readObject(); } catch(Exception e) { errln("Unexpected exception : " + e.getMessage()); } if (!format1.equals(format2)) { errln("MessageFormats before and after serialization are not" + " equal\nformat1 = " + format1 + "(" + format1.toPattern() + ")\nformat2 = " + format2 + "(" + format2.toPattern() + ")"); } else { logln("Serialization for MessageFormat is OK."); } }{ // Taken from Test4116444(). 
String[] patterns = {"", "one", "{namedArgument,date,short}"}; MessageFormat mf = new MessageFormat(""); for (int i = 0; i < patterns.length; i++) { String pattern = patterns[i]; mf.applyPattern(pattern); try { Map objs = mf.parseToMap(null, new ParsePosition(0)); logln("pattern: \"" + pattern + "\""); log(" parsedObjects: "); if (objs != null) { log("{"); for (Iterator keyIter = objs.keySet().iterator(); keyIter.hasNext();) { String key = (String)keyIter.next(); if (objs.get(key) != null) { err("\"" + objs.get(key).toString() + "\""); } else { log("null"); } if (keyIter.hasNext()) { log(","); } } log("}") ; } else { log("null"); } logln(""); } catch (Exception e) { errln("pattern: \"" + pattern + "\""); errln(" Exception: " + e.getMessage()); } } }{ // Taken from Test4114739(). MessageFormat mf = new MessageFormat("<{arg}>"); Map objs1 = null; Map objs2 = new HashMap(); Map objs3 = new HashMap(); objs3.put("arg", null); try { logln("pattern: \"" + mf.toPattern() + "\""); log("format(null) : "); logln("\"" + mf.format(objs1) + "\""); log("format({}) : "); logln("\"" + mf.format(objs2) + "\""); log("format({null}) :"); logln("\"" + mf.format(objs3) + "\""); } catch (Exception e) { errln("Exception thrown for null argument tests."); } }{ // Taken from Test4118594(). 
String argName = "something_stupid"; MessageFormat mf = new MessageFormat("{"+ argName + "}, {" + argName + "}, {" + argName + "}"); String forParsing = "x, y, z"; Map objs = mf.parseToMap(forParsing, new ParsePosition(0)); logln("pattern: \"" + mf.toPattern() + "\""); logln("text for parsing: \"" + forParsing + "\""); if (!objs.get(argName).toString().equals("z")) errln("argument0: \"" + objs.get(argName) + "\""); mf.setLocale(Locale.US); mf.applyPattern("{" + argName + ",number,#.##}, {" + argName + ",number,#.#}"); Map oldobjs = new HashMap(); oldobjs.put(argName, new Double(3.1415)); String result = mf.format( oldobjs ); logln("pattern: \"" + mf.toPattern() + "\""); logln("text for parsing: \"" + result + "\""); // result now equals "3.14, 3.1" if (!result.equals("3.14, 3.1")) errln("result = " + result); Map newobjs = mf.parseToMap(result, new ParsePosition(0)); // newobjs now equals {new Double(3.1)} if (((Number)newobjs.get(argName)).doubleValue() != 3.1) // was (Double) [alan] errln( "newobjs.get(argName) = " + newobjs.get(argName)); }{ // Taken from Test4105380(). String patternText1 = "The disk \"{diskName}\" contains {numberOfFiles}."; String patternText2 = "There are {numberOfFiles} on the disk \"{diskName}\""; MessageFormat form1 = new MessageFormat(patternText1); MessageFormat form2 = new MessageFormat(patternText2); double[] filelimits = {0,1,2}; String[] filepart = {"no files","one file","{numberOfFiles,number} files"}; ChoiceFormat fileform = new ChoiceFormat(filelimits, filepart); form1.setFormat(1, fileform); form2.setFormat(0, fileform); Map testArgs = new HashMap(); testArgs.put("diskName", "MyDisk"); testArgs.put("numberOfFiles", new Long(12373)); logln(form1.format(testArgs)); logln(form2.format(testArgs)); }{ // Taken from test4293229(). 
MessageFormat format = new MessageFormat("'''{'myNamedArgument}'' '''{myNamedArgument}'''"); Map args = new HashMap(); String expected = "'{myNamedArgument}' '{myNamedArgument}'"; String result = format.format(args); if (!result.equals(expected)) { throw new RuntimeException("wrong format result - expected \"" + expected + "\", got \"" + result + "\""); } } } } icu4j-4.2/src/com/ibm/icu/dev/test/format/IntlTestNumberFormatAPI.java0000644000175000017500000002025611361050730025542 0ustar twernertwerner//##header /***************************************************************************************** * * Copyright (C) 1996-2009, International Business Machines * Corporation and others. All Rights Reserved. **/ /** * Port From: JDK 1.4b1 : java.text.Format.IntlTestNumberFormatAPI * Source File: java/text/format/IntlTestNumberFormatAPI.java **/ /* @test 1.4 98/03/06 @summary test International Number Format API */ package com.ibm.icu.dev.test.format; import com.ibm.icu.text.*; import com.ibm.icu.util.ULocale; import java.util.Locale; import java.math.BigInteger; import java.text.FieldPosition; import java.text.ParsePosition; import java.text.ParseException; public class IntlTestNumberFormatAPI extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new IntlTestNumberFormatAPI().run(args); } // This test checks various generic API methods in DecimalFormat to achieve 100% API coverage. 
public void TestAPI() { logln("NumberFormat API test---"); logln(""); Locale.setDefault(Locale.ENGLISH); // ======= Test constructors logln("Testing NumberFormat constructors"); NumberFormat def = NumberFormat.getInstance(); NumberFormat fr = NumberFormat.getInstance(Locale.FRENCH); NumberFormat cur = NumberFormat.getCurrencyInstance(); NumberFormat cur_fr = NumberFormat.getCurrencyInstance(Locale.FRENCH); NumberFormat per = NumberFormat.getPercentInstance(); NumberFormat per_fr = NumberFormat.getPercentInstance(Locale.FRENCH); NumberFormat integer = NumberFormat.getIntegerInstance(); NumberFormat int_fr = NumberFormat.getIntegerInstance(Locale.FRENCH); //Fix "The variable is never used" compilation warnings logln("Currency : " + cur.format(1234.5)); logln("Percent : " + per.format(1234.5)); logln("Integer : " + integer.format(1234.5)); logln("Int_fr : " + int_fr.format(1234.5)); // ======= Test equality logln("Testing equality operator"); if( per_fr.equals(cur_fr) ) { errln("ERROR: == failed"); } // ======= Test various format() methods logln("Testing various format() methods"); // final double d = -10456.0037; // this appears as -10456.003700000001 on NT // final double d = -1.04560037e-4; // this appears as -1.0456003700000002E-4 on NT final double d = -10456.00370000000000; // this works! 
final long l = 100000000; String res1 = new String(); String res2 = new String(); StringBuffer res3 = new StringBuffer(); StringBuffer res4 = new StringBuffer(); StringBuffer res5 = new StringBuffer(); StringBuffer res6 = new StringBuffer(); FieldPosition pos1 = new FieldPosition(0); FieldPosition pos2 = new FieldPosition(0); FieldPosition pos3 = new FieldPosition(0); FieldPosition pos4 = new FieldPosition(0); res1 = cur_fr.format(d); logln( "" + d + " formatted to " + res1); res2 = cur_fr.format(l); logln("" + l + " formatted to " + res2); res3 = cur_fr.format(d, res3, pos1); logln( "" + d + " formatted to " + res3); res4 = cur_fr.format(l, res4, pos2); logln("" + l + " formatted to " + res4); res5 = cur_fr.format(d, res5, pos3); logln("" + d + " formatted to " + res5); res6 = cur_fr.format(l, res6, pos4); logln("" + l + " formatted to " + res6); // ======= Test parse() logln("Testing parse()"); // String text = new String("-10,456.0037"); String text = new String("-10456,0037"); ParsePosition pos = new ParsePosition(0); ParsePosition pos01 = new ParsePosition(0); double d1 = ((Number)fr.parseObject(text, pos)).doubleValue(); if(d1 != d) { errln("ERROR: Roundtrip failed (via parse()) for " + text); } logln(text + " parsed into " + d1); double d2 = fr.parse(text, pos01).doubleValue(); if(d2 != d) { errln("ERROR: Roundtrip failed (via parse()) for " + text); } logln(text + " parsed into " + d2); double d3 = 0; try { d3 = fr.parse(text).doubleValue(); } catch (ParseException e) { errln("ERROR: parse() failed"); } if(d3 != d) { errln("ERROR: Roundtrip failed (via parse()) for " + text); } logln(text + " parsed into " + d3); // ======= Test getters and setters logln("Testing getters and setters"); final Locale[] locales = NumberFormat.getAvailableLocales(); long count = locales.length; logln("Got " + count + " locales" ); for(int i = 0; i < count; i++) { String name; name = locales[i].getDisplayName(); logln(name); } fr.setParseIntegerOnly( def.isParseIntegerOnly() ); 
if(fr.isParseIntegerOnly() != def.isParseIntegerOnly() ) { errln("ERROR: setParseIntegerOnly() failed"); } fr.setGroupingUsed( def.isGroupingUsed() ); if(fr.isGroupingUsed() != def.isGroupingUsed() ) { errln("ERROR: setGroupingUsed() failed"); } fr.setMaximumIntegerDigits( def.getMaximumIntegerDigits() ); if(fr.getMaximumIntegerDigits() != def.getMaximumIntegerDigits() ) { errln("ERROR: setMaximumIntegerDigits() failed"); } fr.setMinimumIntegerDigits( def.getMinimumIntegerDigits() ); if(fr.getMinimumIntegerDigits() != def.getMinimumIntegerDigits() ) { errln("ERROR: setMinimumIntegerDigits() failed"); } fr.setMaximumFractionDigits( def.getMaximumFractionDigits() ); if(fr.getMaximumFractionDigits() != def.getMaximumFractionDigits() ) { errln("ERROR: setMaximumFractionDigits() failed"); } fr.setMinimumFractionDigits( def.getMinimumFractionDigits() ); if(fr.getMinimumFractionDigits() != def.getMinimumFractionDigits() ) { errln("ERROR: setMinimumFractionDigits() failed"); } // ======= Test getStaticClassID() // logln("Testing instanceof()"); // try { // NumberFormat test = new DecimalFormat(); // if (! 
(test instanceof DecimalFormat)) { // errln("ERROR: instanceof failed"); // } // } // catch (Exception e) { // errln("ERROR: Couldn't create a DecimalFormat"); // } } // Jitterbug 4451, for coverage public void TestCoverage(){ class StubNumberFormat extends NumberFormat{ /** * For serialization */ private static final long serialVersionUID = 3768385020503005993L; public void run(){ String p = NumberFormat.getPattern(ULocale.getDefault().toLocale(),0); if (!p.equals(NumberFormat.getPattern(ULocale.getDefault(),0))){ errln("NumberFormat.getPattern(Locale, int) should delegate to (ULocale,)"); } } public StringBuffer format(double number, StringBuffer toAppendTo, FieldPosition pos) {return null;} public StringBuffer format(long number, StringBuffer toAppendTo, FieldPosition pos) {return null;} public StringBuffer format(BigInteger number, StringBuffer toAppendTo, FieldPosition pos) {return null;} //#if defined(FOUNDATION10) //#else public StringBuffer format(java.math.BigDecimal number, StringBuffer toAppendTo, FieldPosition pos) {return null;} //#endif public StringBuffer format(com.ibm.icu.math.BigDecimal number, StringBuffer toAppendTo, FieldPosition pos) {return null;} public Number parse(String text, ParsePosition parsePosition) {return null;} } new StubNumberFormat().run(); } } icu4j-4.2/src/com/ibm/icu/dev/test/format/NumberFormatSerialTestData.java0000644000175000017500000006627211361046232026325 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2001-2004, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.format; public class NumberFormatSerialTestData { //get Content public static byte[][] getContent() { return content; } //NumberFormat.getInstance(Locale.US) static byte[] generalInstance = new byte[]{ -84, -19, 0, 5, 115, 114, 0, 30, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 11, -1, 3, 98, -40, 114, 48, 58, 2, 0, 22, 90, 0, 27, 100, 101, 99, 105, 109, 97, 108, 83, 101, 112, 97, 114, 97, 116, 111, 114, 65, 108, 119, 97, 121, 115, 83, 104, 111, 119, 110, 90, 0, 23, 101, 120, 112, 111, 110, 101, 110, 116, 83, 105, 103, 110, 65, 108, 119, 97, 121, 115, 83, 104, 111, 119, 110, 73, 0, 11, 102, 111, 114, 109, 97, 116, 87, 105, 100, 116, 104, 66, 0, 12, 103, 114, 111, 117, 112, 105, 110, 103, 83, 105, 122, 101, 66, 0, 13, 103, 114, 111, 117, 112, 105, 110, 103, 83, 105, 122, 101, 50, 66, 0, 17, 109, 105, 110, 69, 120, 112, 111, 110, 101, 110, 116, 68, 105, 103, 105, 116, 115, 73, 0, 10, 109, 117, 108, 116, 105, 112, 108, 105, 101, 114, 67, 0, 3, 112, 97, 100, 73, 0, 11, 112, 97, 100, 80, 111, 115, 105, 116, 105, 111, 110, 73, 0, 12, 114, 111, 117, 110, 100, 105, 110, 103, 77, 111, 100, 101, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 90, 0, 22, 117, 115, 101, 69, 120, 112, 111, 110, 101, 110, 116, 105, 97, 108, 78, 111, 116, 97, 116, 105, 111, 110, 76, 0, 16, 110, 101, 103, 80, 114, 101, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 116, 0, 18, 76, 106, 97, 118, 97, 47, 108, 97, 110, 103, 47, 83, 116, 114, 105, 110, 103, 59, 76, 0, 16, 110, 101, 103, 83, 117, 102, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 14, 110, 101, 103, 97, 116, 105, 118, 101, 80, 114, 101, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 14, 110, 101, 103, 97, 116, 105, 118, 101, 83, 117, 102, 102, 105, 
120, 113, 0, 126, 0, 1, 76, 0, 16, 112, 111, 115, 80, 114, 101, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 16, 112, 111, 115, 83, 117, 102, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 14, 112, 111, 115, 105, 116, 105, 118, 101, 80, 114, 101, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 14, 112, 111, 115, 105, 116, 105, 118, 101, 83, 117, 102, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 17, 114, 111, 117, 110, 100, 105, 110, 103, 73, 110, 99, 114, 101, 109, 101, 110, 116, 116, 0, 22, 76, 106, 97, 118, 97, 47, 109, 97, 116, 104, 47, 66, 105, 103, 68, 101, 99, 105, 109, 97, 108, 59, 76, 0, 7, 115, 121, 109, 98, 111, 108, 115, 116, 0, 39, 76, 99, 111, 109, 47, 105, 98, 109, 47, 105, 99, 117, 47, 116, 101, 120, 116, 47, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 83, 121, 109, 98, 111, 108, 115, 59, 120, 114, 0, 29, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 78, 117, 109, 98, 101, 114, 70, 111, 114, 109, 97, 116, -33, -10, -77, -65, 19, 125, 7, -24, 3, 0, 11, 90, 0, 12, 103, 114, 111, 117, 112, 105, 110, 103, 85, 115, 101, 100, 66, 0, 17, 109, 97, 120, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 66, 0, 16, 109, 97, 120, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 73, 0, 21, 109, 97, 120, 105, 109, 117, 109, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 73, 0, 20, 109, 97, 120, 105, 109, 117, 109, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 66, 0, 17, 109, 105, 110, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 66, 0, 16, 109, 105, 110, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 73, 0, 21, 109, 105, 110, 105, 109, 117, 109, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 73, 0, 20, 109, 105, 110, 105, 109, 117, 109, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 90, 0, 16, 112, 97, 114, 115, 101, 73, 110, 116, 101, 
103, 101, 114, 79, 110, 108, 121, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 120, 114, 0, 16, 106, 97, 118, 97, 46, 116, 101, 120, 116, 46, 70, 111, 114, 109, 97, 116, -5, -40, -68, 18, -23, 15, 24, 67, 2, 0, 0, 120, 112, 1, 3, 127, 0, 0, 0, 3, 0, 0, 1, 53, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 120, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 32, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 2, 0, 116, 0, 1, 45, 116, 0, 0, 116, 0, 1, 45, 116, 0, 0, 116, 0, 0, 116, 0, 0, 116, 0, 0, 116, 0, 0, 112, 115, 114, 0, 37, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 83, 121, 109, 98, 111, 108, 115, 80, 29, 23, -103, 8, 104, -109, -100, 2, 0, 18, 67, 0, 16, 100, 101, 99, 105, 109, 97, 108, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 5, 100, 105, 103, 105, 116, 67, 0, 11, 101, 120, 112, 111, 110, 101, 110, 116, 105, 97, 108, 67, 0, 17, 103, 114, 111, 117, 112, 105, 110, 103, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 9, 109, 105, 110, 117, 115, 83, 105, 103, 110, 67, 0, 17, 109, 111, 110, 101, 116, 97, 114, 121, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 9, 112, 97, 100, 69, 115, 99, 97, 112, 101, 67, 0, 16, 112, 97, 116, 116, 101, 114, 110, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 7, 112, 101, 114, 77, 105, 108, 108, 67, 0, 7, 112, 101, 114, 99, 101, 110, 116, 67, 0, 8, 112, 108, 117, 115, 83, 105, 103, 110, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 67, 0, 9, 122, 101, 114, 111, 68, 105, 103, 105, 116, 76, 0, 3, 78, 97, 78, 113, 0, 126, 0, 1, 76, 0, 14, 99, 117, 114, 114, 101, 110, 99, 121, 83, 121, 109, 98, 111, 108, 113, 0, 126, 0, 1, 76, 0, 17, 101, 120, 112, 111, 110, 101, 110, 116, 83, 101, 112, 97, 114, 97, 116, 111, 114, 113, 0, 126, 0, 1, 76, 0, 8, 105, 110, 102, 105, 110, 105, 116, 121, 113, 0, 126, 0, 1, 76, 0, 18, 105, 
110, 116, 108, 67, 117, 114, 114, 101, 110, 99, 121, 83, 121, 109, 98, 111, 108, 113, 0, 126, 0, 1, 120, 112, 0, 46, 0, 35, 0, 0, 0, 44, 0, 45, 0, 46, 0, 42, 0, 59, 32, 48, 0, 37, 0, 43, 0, 0, 0, 2, 0, 48, 116, 0, 3, -17, -65, -67, 116, 0, 1, 36, 116, 0, 1, 69, 116, 0, 3, -30, -120, -98, 116, 0, 3, 85, 83, 68, }; //NumberFormat.getCurrencyInstance(Locale.US) static byte[] currencyInstance = new byte[]{ -84, -19, 0, 5, 115, 114, 0, 30, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 11, -1, 3, 98, -40, 114, 48, 58, 2, 0, 22, 90, 0, 27, 100, 101, 99, 105, 109, 97, 108, 83, 101, 112, 97, 114, 97, 116, 111, 114, 65, 108, 119, 97, 121, 115, 83, 104, 111, 119, 110, 90, 0, 23, 101, 120, 112, 111, 110, 101, 110, 116, 83, 105, 103, 110, 65, 108, 119, 97, 121, 115, 83, 104, 111, 119, 110, 73, 0, 11, 102, 111, 114, 109, 97, 116, 87, 105, 100, 116, 104, 66, 0, 12, 103, 114, 111, 117, 112, 105, 110, 103, 83, 105, 122, 101, 66, 0, 13, 103, 114, 111, 117, 112, 105, 110, 103, 83, 105, 122, 101, 50, 66, 0, 17, 109, 105, 110, 69, 120, 112, 111, 110, 101, 110, 116, 68, 105, 103, 105, 116, 115, 73, 0, 10, 109, 117, 108, 116, 105, 112, 108, 105, 101, 114, 67, 0, 3, 112, 97, 100, 73, 0, 11, 112, 97, 100, 80, 111, 115, 105, 116, 105, 111, 110, 73, 0, 12, 114, 111, 117, 110, 100, 105, 110, 103, 77, 111, 100, 101, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 90, 0, 22, 117, 115, 101, 69, 120, 112, 111, 110, 101, 110, 116, 105, 97, 108, 78, 111, 116, 97, 116, 105, 111, 110, 76, 0, 16, 110, 101, 103, 80, 114, 101, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 116, 0, 18, 76, 106, 97, 118, 97, 47, 108, 97, 110, 103, 47, 83, 116, 114, 105, 110, 103, 59, 76, 0, 16, 110, 101, 103, 83, 117, 102, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 14, 110, 101, 103, 97, 116, 105, 118, 101, 80, 114, 101, 102, 105, 120, 
113, 0, 126, 0, 1, 76, 0, 14, 110, 101, 103, 97, 116, 105, 118, 101, 83, 117, 102, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 16, 112, 111, 115, 80, 114, 101, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 16, 112, 111, 115, 83, 117, 102, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 14, 112, 111, 115, 105, 116, 105, 118, 101, 80, 114, 101, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 14, 112, 111, 115, 105, 116, 105, 118, 101, 83, 117, 102, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 17, 114, 111, 117, 110, 100, 105, 110, 103, 73, 110, 99, 114, 101, 109, 101, 110, 116, 116, 0, 22, 76, 106, 97, 118, 97, 47, 109, 97, 116, 104, 47, 66, 105, 103, 68, 101, 99, 105, 109, 97, 108, 59, 76, 0, 7, 115, 121, 109, 98, 111, 108, 115, 116, 0, 39, 76, 99, 111, 109, 47, 105, 98, 109, 47, 105, 99, 117, 47, 116, 101, 120, 116, 47, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 83, 121, 109, 98, 111, 108, 115, 59, 120, 114, 0, 29, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 78, 117, 109, 98, 101, 114, 70, 111, 114, 109, 97, 116, -33, -10, -77, -65, 19, 125, 7, -24, 3, 0, 11, 90, 0, 12, 103, 114, 111, 117, 112, 105, 110, 103, 85, 115, 101, 100, 66, 0, 17, 109, 97, 120, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 66, 0, 16, 109, 97, 120, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 73, 0, 21, 109, 97, 120, 105, 109, 117, 109, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 73, 0, 20, 109, 97, 120, 105, 109, 117, 109, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 66, 0, 17, 109, 105, 110, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 66, 0, 16, 109, 105, 110, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 73, 0, 21, 109, 105, 110, 105, 109, 117, 109, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 73, 0, 20, 109, 105, 110, 105, 109, 117, 109, 73, 110, 116, 101, 103, 
101, 114, 68, 105, 103, 105, 116, 115, 90, 0, 16, 112, 97, 114, 115, 101, 73, 110, 116, 101, 103, 101, 114, 79, 110, 108, 121, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 120, 114, 0, 16, 106, 97, 118, 97, 46, 116, 101, 120, 116, 46, 70, 111, 114, 109, 97, 116, -5, -40, -68, 18, -23, 15, 24, 67, 2, 0, 0, 120, 112, 1, 2, 127, 0, 0, 0, 2, 0, 0, 1, 53, 2, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 120, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 32, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 2, 0, 116, 0, 3, 40, -62, -92, 116, 0, 1, 41, 116, 0, 2, 40, 36, 116, 0, 1, 41, 116, 0, 2, -62, -92, 116, 0, 0, 116, 0, 1, 36, 116, 0, 0, 112, 115, 114, 0, 37, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 83, 121, 109, 98, 111, 108, 115, 80, 29, 23, -103, 8, 104, -109, -100, 2, 0, 18, 67, 0, 16, 100, 101, 99, 105, 109, 97, 108, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 5, 100, 105, 103, 105, 116, 67, 0, 11, 101, 120, 112, 111, 110, 101, 110, 116, 105, 97, 108, 67, 0, 17, 103, 114, 111, 117, 112, 105, 110, 103, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 9, 109, 105, 110, 117, 115, 83, 105, 103, 110, 67, 0, 17, 109, 111, 110, 101, 116, 97, 114, 121, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 9, 112, 97, 100, 69, 115, 99, 97, 112, 101, 67, 0, 16, 112, 97, 116, 116, 101, 114, 110, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 7, 112, 101, 114, 77, 105, 108, 108, 67, 0, 7, 112, 101, 114, 99, 101, 110, 116, 67, 0, 8, 112, 108, 117, 115, 83, 105, 103, 110, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 67, 0, 9, 122, 101, 114, 111, 68, 105, 103, 105, 116, 76, 0, 3, 78, 97, 78, 113, 0, 126, 0, 1, 76, 0, 14, 99, 117, 114, 114, 101, 110, 99, 121, 83, 121, 109, 98, 111, 108, 113, 0, 126, 0, 1, 76, 0, 17, 101, 120, 112, 111, 110, 101, 110, 116, 83, 101, 112, 97, 
114, 97, 116, 111, 114, 113, 0, 126, 0, 1, 76, 0, 8, 105, 110, 102, 105, 110, 105, 116, 121, 113, 0, 126, 0, 1, 76, 0, 18, 105, 110, 116, 108, 67, 117, 114, 114, 101, 110, 99, 121, 83, 121, 109, 98, 111, 108, 113, 0, 126, 0, 1, 120, 112, 0, 46, 0, 35, 0, 0, 0, 44, 0, 45, 0, 46, 0, 42, 0, 59, 32, 48, 0, 37, 0, 43, 0, 0, 0, 2, 0, 48, 116, 0, 3, -17, -65, -67, 116, 0, 1, 36, 116, 0, 1, 69, 116, 0, 3, -30, -120, -98, 116, 0, 3, 85, 83, 68, }; //NumberFormat.getPercentInstance(Locale.US) static byte[] percentInstance = new byte[]{ -84, -19, 0, 5, 115, 114, 0, 30, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 11, -1, 3, 98, -40, 114, 48, 58, 2, 0, 22, 90, 0, 27, 100, 101, 99, 105, 109, 97, 108, 83, 101, 112, 97, 114, 97, 116, 111, 114, 65, 108, 119, 97, 121, 115, 83, 104, 111, 119, 110, 90, 0, 23, 101, 120, 112, 111, 110, 101, 110, 116, 83, 105, 103, 110, 65, 108, 119, 97, 121, 115, 83, 104, 111, 119, 110, 73, 0, 11, 102, 111, 114, 109, 97, 116, 87, 105, 100, 116, 104, 66, 0, 12, 103, 114, 111, 117, 112, 105, 110, 103, 83, 105, 122, 101, 66, 0, 13, 103, 114, 111, 117, 112, 105, 110, 103, 83, 105, 122, 101, 50, 66, 0, 17, 109, 105, 110, 69, 120, 112, 111, 110, 101, 110, 116, 68, 105, 103, 105, 116, 115, 73, 0, 10, 109, 117, 108, 116, 105, 112, 108, 105, 101, 114, 67, 0, 3, 112, 97, 100, 73, 0, 11, 112, 97, 100, 80, 111, 115, 105, 116, 105, 111, 110, 73, 0, 12, 114, 111, 117, 110, 100, 105, 110, 103, 77, 111, 100, 101, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 90, 0, 22, 117, 115, 101, 69, 120, 112, 111, 110, 101, 110, 116, 105, 97, 108, 78, 111, 116, 97, 116, 105, 111, 110, 76, 0, 16, 110, 101, 103, 80, 114, 101, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 116, 0, 18, 76, 106, 97, 118, 97, 47, 108, 97, 110, 103, 47, 83, 116, 114, 105, 110, 103, 59, 76, 0, 16, 110, 101, 103, 83, 117, 102, 102, 105, 120, 80, 97, 
116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 14, 110, 101, 103, 97, 116, 105, 118, 101, 80, 114, 101, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 14, 110, 101, 103, 97, 116, 105, 118, 101, 83, 117, 102, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 16, 112, 111, 115, 80, 114, 101, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 16, 112, 111, 115, 83, 117, 102, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 14, 112, 111, 115, 105, 116, 105, 118, 101, 80, 114, 101, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 14, 112, 111, 115, 105, 116, 105, 118, 101, 83, 117, 102, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 17, 114, 111, 117, 110, 100, 105, 110, 103, 73, 110, 99, 114, 101, 109, 101, 110, 116, 116, 0, 22, 76, 106, 97, 118, 97, 47, 109, 97, 116, 104, 47, 66, 105, 103, 68, 101, 99, 105, 109, 97, 108, 59, 76, 0, 7, 115, 121, 109, 98, 111, 108, 115, 116, 0, 39, 76, 99, 111, 109, 47, 105, 98, 109, 47, 105, 99, 117, 47, 116, 101, 120, 116, 47, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 83, 121, 109, 98, 111, 108, 115, 59, 120, 114, 0, 29, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 78, 117, 109, 98, 101, 114, 70, 111, 114, 109, 97, 116, -33, -10, -77, -65, 19, 125, 7, -24, 3, 0, 11, 90, 0, 12, 103, 114, 111, 117, 112, 105, 110, 103, 85, 115, 101, 100, 66, 0, 17, 109, 97, 120, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 66, 0, 16, 109, 97, 120, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 73, 0, 21, 109, 97, 120, 105, 109, 117, 109, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 73, 0, 20, 109, 97, 120, 105, 109, 117, 109, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 66, 0, 17, 109, 105, 110, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 66, 0, 16, 109, 105, 110, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 73, 0, 21, 109, 105, 110, 105, 109, 117, 109, 70, 114, 97, 
99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 73, 0, 20, 109, 105, 110, 105, 109, 117, 109, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 90, 0, 16, 112, 97, 114, 115, 101, 73, 110, 116, 101, 103, 101, 114, 79, 110, 108, 121, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 120, 114, 0, 16, 106, 97, 118, 97, 46, 116, 101, 120, 116, 46, 70, 111, 114, 109, 97, 116, -5, -40, -68, 18, -23, 15, 24, 67, 2, 0, 0, 120, 112, 1, 0, 127, 0, 0, 0, 0, 0, 0, 1, 53, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 120, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 100, 0, 32, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 2, 0, 116, 0, 1, 45, 116, 0, 1, 37, 116, 0, 1, 45, 116, 0, 1, 37, 116, 0, 0, 113, 0, 126, 0, 8, 116, 0, 0, 116, 0, 1, 37, 112, 115, 114, 0, 37, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 83, 121, 109, 98, 111, 108, 115, 80, 29, 23, -103, 8, 104, -109, -100, 2, 0, 18, 67, 0, 16, 100, 101, 99, 105, 109, 97, 108, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 5, 100, 105, 103, 105, 116, 67, 0, 11, 101, 120, 112, 111, 110, 101, 110, 116, 105, 97, 108, 67, 0, 17, 103, 114, 111, 117, 112, 105, 110, 103, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 9, 109, 105, 110, 117, 115, 83, 105, 103, 110, 67, 0, 17, 109, 111, 110, 101, 116, 97, 114, 121, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 9, 112, 97, 100, 69, 115, 99, 97, 112, 101, 67, 0, 16, 112, 97, 116, 116, 101, 114, 110, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 7, 112, 101, 114, 77, 105, 108, 108, 67, 0, 7, 112, 101, 114, 99, 101, 110, 116, 67, 0, 8, 112, 108, 117, 115, 83, 105, 103, 110, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 67, 0, 9, 122, 101, 114, 111, 68, 105, 103, 105, 116, 76, 0, 3, 78, 97, 78, 113, 0, 126, 0, 1, 76, 0, 14, 99, 117, 114, 114, 101, 110, 99, 121, 83, 121, 
109, 98, 111, 108, 113, 0, 126, 0, 1, 76, 0, 17, 101, 120, 112, 111, 110, 101, 110, 116, 83, 101, 112, 97, 114, 97, 116, 111, 114, 113, 0, 126, 0, 1, 76, 0, 8, 105, 110, 102, 105, 110, 105, 116, 121, 113, 0, 126, 0, 1, 76, 0, 18, 105, 110, 116, 108, 67, 117, 114, 114, 101, 110, 99, 121, 83, 121, 109, 98, 111, 108, 113, 0, 126, 0, 1, 120, 112, 0, 46, 0, 35, 0, 0, 0, 44, 0, 45, 0, 46, 0, 42, 0, 59, 32, 48, 0, 37, 0, 43, 0, 0, 0, 2, 0, 48, 116, 0, 3, -17, -65, -67, 116, 0, 1, 36, 116, 0, 1, 69, 116, 0, 3, -30, -120, -98, 116, 0, 3, 85, 83, 68, }; //NumberFormat.getScientificInstance(Locale.US) static byte[] scientificInstance = new byte[]{ -84, -19, 0, 5, 115, 114, 0, 30, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 11, -1, 3, 98, -40, 114, 48, 58, 2, 0, 22, 90, 0, 27, 100, 101, 99, 105, 109, 97, 108, 83, 101, 112, 97, 114, 97, 116, 111, 114, 65, 108, 119, 97, 121, 115, 83, 104, 111, 119, 110, 90, 0, 23, 101, 120, 112, 111, 110, 101, 110, 116, 83, 105, 103, 110, 65, 108, 119, 97, 121, 115, 83, 104, 111, 119, 110, 73, 0, 11, 102, 111, 114, 109, 97, 116, 87, 105, 100, 116, 104, 66, 0, 12, 103, 114, 111, 117, 112, 105, 110, 103, 83, 105, 122, 101, 66, 0, 13, 103, 114, 111, 117, 112, 105, 110, 103, 83, 105, 122, 101, 50, 66, 0, 17, 109, 105, 110, 69, 120, 112, 111, 110, 101, 110, 116, 68, 105, 103, 105, 116, 115, 73, 0, 10, 109, 117, 108, 116, 105, 112, 108, 105, 101, 114, 67, 0, 3, 112, 97, 100, 73, 0, 11, 112, 97, 100, 80, 111, 115, 105, 116, 105, 111, 110, 73, 0, 12, 114, 111, 117, 110, 100, 105, 110, 103, 77, 111, 100, 101, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 90, 0, 22, 117, 115, 101, 69, 120, 112, 111, 110, 101, 110, 116, 105, 97, 108, 78, 111, 116, 97, 116, 105, 111, 110, 76, 0, 16, 110, 101, 103, 80, 114, 101, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 116, 0, 18, 76, 106, 97, 118, 97, 47, 108, 
97, 110, 103, 47, 83, 116, 114, 105, 110, 103, 59, 76, 0, 16, 110, 101, 103, 83, 117, 102, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 14, 110, 101, 103, 97, 116, 105, 118, 101, 80, 114, 101, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 14, 110, 101, 103, 97, 116, 105, 118, 101, 83, 117, 102, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 16, 112, 111, 115, 80, 114, 101, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 16, 112, 111, 115, 83, 117, 102, 102, 105, 120, 80, 97, 116, 116, 101, 114, 110, 113, 0, 126, 0, 1, 76, 0, 14, 112, 111, 115, 105, 116, 105, 118, 101, 80, 114, 101, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 14, 112, 111, 115, 105, 116, 105, 118, 101, 83, 117, 102, 102, 105, 120, 113, 0, 126, 0, 1, 76, 0, 17, 114, 111, 117, 110, 100, 105, 110, 103, 73, 110, 99, 114, 101, 109, 101, 110, 116, 116, 0, 22, 76, 106, 97, 118, 97, 47, 109, 97, 116, 104, 47, 66, 105, 103, 68, 101, 99, 105, 109, 97, 108, 59, 76, 0, 7, 115, 121, 109, 98, 111, 108, 115, 116, 0, 39, 76, 99, 111, 109, 47, 105, 98, 109, 47, 105, 99, 117, 47, 116, 101, 120, 116, 47, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 83, 121, 109, 98, 111, 108, 115, 59, 120, 114, 0, 29, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 78, 117, 109, 98, 101, 114, 70, 111, 114, 109, 97, 116, -33, -10, -77, -65, 19, 125, 7, -24, 3, 0, 11, 90, 0, 12, 103, 114, 111, 117, 112, 105, 110, 103, 85, 115, 101, 100, 66, 0, 17, 109, 97, 120, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 66, 0, 16, 109, 97, 120, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 73, 0, 21, 109, 97, 120, 105, 109, 117, 109, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 73, 0, 20, 109, 97, 120, 105, 109, 117, 109, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 66, 0, 17, 109, 105, 110, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 66, 0, 16, 109, 105, 110, 73, 110, 
116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 73, 0, 21, 109, 105, 110, 105, 109, 117, 109, 70, 114, 97, 99, 116, 105, 111, 110, 68, 105, 103, 105, 116, 115, 73, 0, 20, 109, 105, 110, 105, 109, 117, 109, 73, 110, 116, 101, 103, 101, 114, 68, 105, 103, 105, 116, 115, 90, 0, 16, 112, 97, 114, 115, 101, 73, 110, 116, 101, 103, 101, 114, 79, 110, 108, 121, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 120, 114, 0, 16, 106, 97, 118, 97, 46, 116, 101, 120, 116, 46, 70, 111, 114, 109, 97, 116, -5, -40, -68, 18, -23, 15, 24, 67, 2, 0, 0, 120, 112, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 120, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 32, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 2, 1, 116, 0, 1, 45, 116, 0, 0, 116, 0, 1, 45, 116, 0, 0, 116, 0, 0, 113, 0, 126, 0, 8, 116, 0, 0, 116, 0, 0, 112, 115, 114, 0, 37, 99, 111, 109, 46, 105, 98, 109, 46, 105, 99, 117, 46, 116, 101, 120, 116, 46, 68, 101, 99, 105, 109, 97, 108, 70, 111, 114, 109, 97, 116, 83, 121, 109, 98, 111, 108, 115, 80, 29, 23, -103, 8, 104, -109, -100, 2, 0, 18, 67, 0, 16, 100, 101, 99, 105, 109, 97, 108, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 5, 100, 105, 103, 105, 116, 67, 0, 11, 101, 120, 112, 111, 110, 101, 110, 116, 105, 97, 108, 67, 0, 17, 103, 114, 111, 117, 112, 105, 110, 103, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 9, 109, 105, 110, 117, 115, 83, 105, 103, 110, 67, 0, 17, 109, 111, 110, 101, 116, 97, 114, 121, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 9, 112, 97, 100, 69, 115, 99, 97, 112, 101, 67, 0, 16, 112, 97, 116, 116, 101, 114, 110, 83, 101, 112, 97, 114, 97, 116, 111, 114, 67, 0, 7, 112, 101, 114, 77, 105, 108, 108, 67, 0, 7, 112, 101, 114, 99, 101, 110, 116, 67, 0, 8, 112, 108, 117, 115, 83, 105, 103, 110, 73, 0, 21, 115, 101, 114, 105, 97, 108, 86, 101, 114, 115, 105, 111, 110, 79, 110, 83, 116, 114, 101, 97, 109, 67, 0, 9, 122, 101, 114, 111, 68, 105, 103, 105, 116, 
76, 0, 3, 78, 97, 78, 113, 0, 126, 0, 1, 76, 0, 14, 99, 117, 114, 114, 101, 110, 99, 121, 83, 121, 109, 98, 111, 108, 113, 0, 126, 0, 1, 76, 0, 17, 101, 120, 112, 111, 110, 101, 110, 116, 83, 101, 112, 97, 114, 97, 116, 111, 114, 113, 0, 126, 0, 1, 76, 0, 8, 105, 110, 102, 105, 110, 105, 116, 121, 113, 0, 126, 0, 1, 76, 0, 18, 105, 110, 116, 108, 67, 117, 114, 114, 101, 110, 99, 121, 83, 121, 109, 98, 111, 108, 113, 0, 126, 0, 1, 120, 112, 0, 46, 0, 35, 0, 0, 0, 44, 0, 45, 0, 46, 0, 42, 0, 59, 32, 48, 0, 37, 0, 43, 0, 0, 0, 2, 0, 48, 116, 0, 3, -17, -65, -67, 116, 0, 1, 36, 116, 0, 1, 69, 116, 0, 3, -30, -120, -98, 116, 0, 3, 85, 83, 68, }; final static byte[][] content = {generalInstance, currencyInstance, percentInstance, scientificInstance}; } icu4j-4.2/src/com/ibm/icu/dev/test/format/DataDrivenFormatTest.java0000644000175000017500000001607711361046232025162 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2007-2008, International Business Machines Corporation and * * others. All Rights Reserved. 
 * *******************************************************************************
 */
package com.ibm.icu.dev.test.format;

import java.text.FieldPosition;
import java.text.ParsePosition;
import java.util.Date;
import java.util.Iterator;

import com.ibm.icu.dev.test.ModuleTest;
import com.ibm.icu.dev.test.TestDataModule;
import com.ibm.icu.dev.test.TestDataModule.DataMap;
import com.ibm.icu.dev.test.util.CalendarFieldsSet;
import com.ibm.icu.dev.test.util.DateTimeStyleSet;
import com.ibm.icu.text.DateFormat;
import com.ibm.icu.text.SimpleDateFormat;
import com.ibm.icu.util.Calendar;
import com.ibm.icu.util.ULocale;

/**
 * @author srl
 *
 */
public class DataDrivenFormatTest extends ModuleTest {

    /**
     * Passes the base name "com/ibm/icu/dev/data/testdata/" and the
     * name "format" to the ModuleTest constructor.
     */
    public DataDrivenFormatTest() {
        super("com/ibm/icu/dev/data/testdata/", "format");
    }

    /* (non-Javadoc)
     * @see com.ibm.icu.dev.test.ModuleTest#processModules()
     */
    public void processModules() {
        //String testName = t.getName().toString();

        for (Iterator siter = t.getSettingsIterator(); siter.hasNext();) {
            // Iterate through the settings and process each test case
            DataMap settings = (DataMap) siter.next();

            String type = settings.getString("Type");

            if(type.equals("date_format")) {
                testConvertDate(t, settings, true);
            } else if(type.equals("date_parse")) {
                testConvertDate(t, settings, false);
            } else {
                errln("Unknown type: " + type);
            }
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        new DataDrivenFormatTest().run(args);
    }

    private static final String kPATTERN = "PATTERN=";
    private static final String kMILLIS = "MILLIS=";
    private static final String kRELATIVE_MILLIS = "RELATIVE_MILLIS=";
    private static final String kRELATIVE_ADD = "RELATIVE_ADD:";

    private void testConvertDate(TestDataModule.TestData testData, DataMap settings, boolean fmt) {
        DateFormat basicFmt = new SimpleDateFormat("EEE MMM dd yyyy / YYYY'-W'ww-ee");

        int n = 0;
        for (Iterator iter = testData.getDataIterator(); iter.hasNext();) {
            ++n;
            long now =
System.currentTimeMillis(); DataMap currentCase = (DataMap) iter.next(); String caseString = "["+testData.getName()+"#"+n+(fmt?"format":"parse")+"]"; String locale = currentCase.getString("locale"); String spec = currentCase.getString("spec"); String date = currentCase.getString("date"); String str = currentCase.getString("str"); Date fromDate = null; boolean useDate = false; ULocale loc = new ULocale(locale); String pattern = null; // boolean usePattern = false; DateFormat format = null; DateTimeStyleSet styleSet; CalendarFieldsSet fromSet = null; // parse 'spec' - either 'PATTERN=yy mm dd' or 'DATE=x,TIME=y' if(spec.startsWith(kPATTERN)) { pattern = spec.substring(kPATTERN.length()); // usePattern = true; format = new SimpleDateFormat(pattern, loc); } else { styleSet = new DateTimeStyleSet(); styleSet.parseFrom(spec); format = DateFormat.getDateTimeInstance(styleSet.getDateStyle(), styleSet.getTimeStyle(), loc); } Calendar cal = Calendar.getInstance(loc); // parse 'date' - either 'MILLIS=12345' or a CalendarFieldsSet if(date.startsWith(kMILLIS)) { useDate = true; fromDate = new Date(Long.parseLong(date.substring(kMILLIS.length()))); } else if(date.startsWith(kRELATIVE_MILLIS)) { useDate = true; fromDate = new Date(now+Long.parseLong(date.substring(kRELATIVE_MILLIS.length()))); } else if(date.startsWith(kRELATIVE_ADD)) { String add = date.substring(kRELATIVE_ADD.length()); // "add" is a string indicating which fields to add CalendarFieldsSet addSet = new CalendarFieldsSet(); addSet.parseFrom(add); useDate = true; cal.clear(); cal.setTimeInMillis(now); /// perform op on 'to calendar' for (int q=0; q " + text + " -> " + rt); } ++count; } if (lowLimit < 0) { double d = 1.234; while (d < 1000) { String text = formatter.format(d); double rt = formatter.parse(text).doubleValue(); if (rt != d) { errln("Round-trip failed: " + d + " -> " + text + " -> " + rt); } d *= 10; } } } catch (Throwable e) { errln("Test failed with exception: " + e.toString()); e.printStackTrace(); 
} } } icu4j-4.2/src/com/ibm/icu/dev/test/format/DateFormatRegressionTestJ.java0000644000175000017500000002442211361046232026162 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2001-2005, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ /* * New added, 2001-10-17 [Jing/GCL] */ package com.ibm.icu.dev.test.format; import com.ibm.icu.text.*; import com.ibm.icu.util.*; import java.util.Date; import java.text.ParseException; import java.text.ParsePosition; import java.util.Locale; public class DateFormatRegressionTestJ extends com.ibm.icu.dev.test.TestFmwk { private static final String TIME_STRING = "2000/11/17 08:01:00"; private static final long UTC_LONG = 974476860000L; private static SimpleDateFormat sdf_; protected void init()throws Exception{ sdf_ = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss"); } public static void main(String[] args) throws Exception { new DateFormatRegressionTestJ().run(args); } //Return value of getAmPmStrings public void Test4103926() { String act_Ampms[]; String exp_Ampms[]={"AM","PM"}; Locale.setDefault(Locale.US); DateFormatSymbols dfs = new DateFormatSymbols(); act_Ampms = dfs.getAmPmStrings(); if(act_Ampms.length != exp_Ampms.length) { errln("The result is not expected!"); } else { for(int i =0; i" + str); d = sdf.parse(str, new ParsePosition(0)); logln(" after parse----->" + d.toString()); str = sdf.format(d); logln(" after format----->" + str); d = sdf.parse(str, new ParsePosition(0)); logln(" after parse----->" + d.toString()); str = sdf.format(d); logln(" after format----->" + str); } } //Class used by Test4407042 class DateParseThread extends Thread { public void run() { SimpleDateFormat sdf = (SimpleDateFormat) sdf_.clone(); TimeZone defaultTZ = TimeZone.getDefault(); TimeZone PST = TimeZone.getTimeZone("PST"); int defaultOffset = 
                    defaultTZ.getRawOffset();
            int PSTOffset = PST.getRawOffset();
            int offset = defaultOffset - PSTOffset;
            long ms = UTC_LONG - offset;
            try {
                int i = 0;
                while (i < 10000) {
                    Date date = sdf.parse(TIME_STRING);
                    long t = date.getTime();
                    i++;
                    if (t != ms) {
                        throw new ParseException("Parse Error: " + i + " (" + sdf.format(date)
                                + ") " + t + " != " + ms, 0);
                    }
                }
            } catch (Exception e) {
                errln("parse error: " + e.getMessage());
            }
        }
    }

    //Class used by Test4407042
    class DateFormatThread extends Thread {
        public void run() {
            SimpleDateFormat sdf = (SimpleDateFormat) sdf_.clone();
            TimeZone tz = TimeZone.getTimeZone("PST");
            sdf.setTimeZone(tz);
            int i = 0;
            while (i < 10000) {
                i++;
                String s = sdf.format(new Date(UTC_LONG));
                if (!s.equals(TIME_STRING)) {
                    errln("Format Error: " + i + " " + s + " != " + TIME_STRING);
                }
            }
        }
    }
}
icu4j-4.2/src/com/ibm/icu/dev/test/format/TimeZoneFormatTest.java0000644000175000017500000004360411361046232024667 0ustar twernertwerner
/*
 ********************************************************************************
 * Copyright (C) 2007-2009, Google, International Business Machines Corporation *
 * and others. All Rights Reserved.
* ******************************************************************************** */ package com.ibm.icu.dev.test.format; import java.text.ParseException; import java.text.ParsePosition; import java.util.Date; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.text.SimpleDateFormat; import com.ibm.icu.util.BasicTimeZone; import com.ibm.icu.util.Calendar; import com.ibm.icu.util.SimpleTimeZone; import com.ibm.icu.util.TimeZone; import com.ibm.icu.util.TimeZoneTransition; import com.ibm.icu.util.ULocale; public class TimeZoneFormatTest extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new TimeZoneFormatTest().run(args); } private static final String[] PATTERNS = {"z", "zzzz", "Z", "ZZZZ", "v", "vvvv", "V", "VVVV"}; /* * Test case for checking if a TimeZone is properly set in the result calendar * and if the result TimeZone has the expected behavior. */ public void TestTimeZoneRoundTrip() { TimeZone unknownZone = new SimpleTimeZone(-31415, "Etc/Unknown"); int badDstOffset = -1234; int badZoneOffset = -2345; int[][] testDateData = { {2007, 1, 15}, {2007, 6, 15}, {1990, 1, 15}, {1990, 6, 15}, {1960, 1, 15}, {1960, 6, 15}, }; Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC")); cal.clear(); // Set up rule equivalency test range long low, high; cal.set(1900, 0, 1); low = cal.getTimeInMillis(); cal.set(2040, 0, 1); high = cal.getTimeInMillis(); // Set up test dates Date[] DATES = new Date[testDateData.length]; cal.clear(); for (int i = 0; i < DATES.length; i++) { cal.set(testDateData[i][0], testDateData[i][1], testDateData[i][2]); DATES[i] = cal.getTime(); } // Set up test locales ULocale[] LOCALES = null; if (getInclusion() > 5) { LOCALES = ULocale.getAvailableLocales(); } else { LOCALES = new ULocale[] {new ULocale("en"), new ULocale("en_CA"), new ULocale("fr"), new ULocale("zh_Hant")}; } String[] tzids = TimeZone.getAvailableIDs(); int[] inOffsets = new int[2]; int[] outOffsets = new int[2]; // Run the 
roundtrip test
        for (int locidx = 0; locidx < LOCALES.length; locidx++) {
            for (int patidx = 0; patidx < PATTERNS.length; patidx++) {
                SimpleDateFormat sdf = new SimpleDateFormat(PATTERNS[patidx], LOCALES[locidx]);
                for (int tzidx = 0; tzidx < tzids.length; tzidx++) {
                    TimeZone tz = TimeZone.getTimeZone(tzids[tzidx]);
                    for (int datidx = 0; datidx < DATES.length; datidx++) {
                        // Format
                        sdf.setTimeZone(tz);
                        String tzstr = sdf.format(DATES[datidx]);

                        // Before parsing, set an unknown zone on the SimpleDateFormat instance
                        // to make sure the result does not depend on the time zone
                        // originally set.
                        sdf.setTimeZone(unknownZone);

                        // Parse
                        ParsePosition pos = new ParsePosition(0);
                        Calendar outcal = Calendar.getInstance(unknownZone);
                        outcal.set(Calendar.DST_OFFSET, badDstOffset);
                        outcal.set(Calendar.ZONE_OFFSET, badZoneOffset);
                        sdf.parse(tzstr, outcal, pos);

                        // Check the result
                        TimeZone outtz = outcal.getTimeZone();
                        tz.getOffset(DATES[datidx].getTime(), false, inOffsets);
                        outtz.getOffset(DATES[datidx].getTime(), false, outOffsets);

                        if (PATTERNS[patidx].equals("VVVV")) {
                            // Location: the time zone rule must be preserved, except for
                            // zones not actually associated with a specific location.
                            // Time zones in this category do not have "/" in their IDs.
String canonicalID = TimeZone.getCanonicalID(tzids[tzidx]); if (canonicalID != null && !outtz.getID().equals(canonicalID)) { // Canonical ID did not match - check the rules boolean bFailure = false; if ((tz instanceof BasicTimeZone) && (outtz instanceof BasicTimeZone)) { bFailure = !(canonicalID.indexOf('/') == -1) && !((BasicTimeZone)outtz).hasEquivalentTransitions(tz, low, high); } if (bFailure) { errln("Canonical round trip failed; tz=" + tzids[tzidx] + ", locale=" + LOCALES[locidx] + ", pattern=" + PATTERNS[patidx] + ", time=" + DATES[datidx].getTime() + ", str=" + tzstr + ", outtz=" + outtz.getID()); } else { logln("Canonical round trip failed (as expected); tz=" + tzids[tzidx] + ", locale=" + LOCALES[locidx] + ", pattern=" + PATTERNS[patidx] + ", time=" + DATES[datidx].getTime() + ", str=" + tzstr + ", outtz=" + outtz.getID()); } } } else { // Check if localized GMT format or RFC format is used. int numDigits = 0; for (int n = 0; n < tzstr.length(); n++) { if (UCharacter.isDigit(tzstr.charAt(n))) { numDigits++; } } if (numDigits >= 3) { // Localized GMT or RFC: total offset (raw + dst) must be preserved. int inOffset = inOffsets[0] + inOffsets[1]; int outOffset = outOffsets[0] + outOffsets[1]; if (inOffset != outOffset) { errln("Offset round trip failed; tz=" + tzids[tzidx] + ", locale=" + LOCALES[locidx] + ", pattern=" + PATTERNS[patidx] + ", time=" + DATES[datidx].getTime() + ", str=" + tzstr + ", inOffset=" + inOffset + ", outOffset=" + outOffset); } } else { // Specific or generic: raw offset must be preserved. 
                                if (inOffsets[0] != outOffsets[0]) {
                                    if (TimeZone.getDefaultTimeZoneType() == TimeZone.TIMEZONE_JDK
                                            && tzids[tzidx].startsWith("SystemV/")) {
                                        // JDK uses rule SystemV for these zones while
                                        // ICU handles these zones as aliases of existing time zones
                                        logln("Raw offset round trip failed; tz=" + tzids[tzidx]
                                                + ", locale=" + LOCALES[locidx]
                                                + ", pattern=" + PATTERNS[patidx]
                                                + ", time=" + DATES[datidx].getTime()
                                                + ", str=" + tzstr
                                                + ", inRawOffset=" + inOffsets[0]
                                                + ", outRawOffset=" + outOffsets[0]);
                                    } else {
                                        errln("Raw offset round trip failed; tz=" + tzids[tzidx]
                                                + ", locale=" + LOCALES[locidx]
                                                + ", pattern=" + PATTERNS[patidx]
                                                + ", time=" + DATES[datidx].getTime()
                                                + ", str=" + tzstr
                                                + ", inRawOffset=" + inOffsets[0]
                                                + ", outRawOffset=" + outOffsets[0]);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    /*
     * Test case for time/text round trips. This test case detects every canonical TimeZone's
     * rule transitions from the start year (1900 for exhaustive runs, otherwise 1990) up to
     * three years after the current year, then checks whether times around each transition
     * round-trip as expected.
     */
    public void TestTimeRoundTrip() {

        boolean TEST_ALL = "true".equalsIgnoreCase(getProperty("TimeZoneRoundTripAll"));

        int startYear, endYear;

        if (TEST_ALL || getInclusion() > 5) {
            startYear = 1900;
        } else {
            startYear = 1990;
        }

        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        endYear = cal.get(Calendar.YEAR) + 3;

        cal.set(startYear, Calendar.JANUARY, 1);
        final long START_TIME = cal.getTimeInMillis();

        cal.set(endYear, Calendar.JANUARY, 1);
        final long END_TIME = cal.getTimeInMillis();

        // Whether each pattern is ambiguous at DST->STD local time overlap
        final boolean[] AMBIGUOUS_DST_DECESSION = {false, false, false, false, true, true, false, true};
        // Whether each pattern is ambiguous at STD->STD/DST->DST local time overlap
        final boolean[] AMBIGUOUS_NEGATIVE_SHIFT = {true, true, false, false, true, true, true, true};

        final String BASEPATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS";

        ULocale[] LOCALES = null;
        boolean REALLY_VERBOSE = false;

        // timer for performance analysis
        long[] times = new 
long[PATTERNS.length]; long timer; if (TEST_ALL) { // It may take about an hour for testing all locales LOCALES = ULocale.getAvailableLocales(); } else if (getInclusion() > 5) { LOCALES = new ULocale[] { new ULocale("ar_EG"), new ULocale("bg_BG"), new ULocale("ca_ES"), new ULocale("da_DK"), new ULocale("de"), new ULocale("de_DE"), new ULocale("el_GR"), new ULocale("en"), new ULocale("en_AU"), new ULocale("en_CA"), new ULocale("en_US"), new ULocale("es"), new ULocale("es_ES"), new ULocale("es_MX"), new ULocale("fi_FI"), new ULocale("fr"), new ULocale("fr_CA"), new ULocale("fr_FR"), new ULocale("he_IL"), new ULocale("hu_HU"), new ULocale("it"), new ULocale("it_IT"), new ULocale("ja"), new ULocale("ja_JP"), new ULocale("ko"), new ULocale("ko_KR"), new ULocale("nb_NO"), new ULocale("nl_NL"), new ULocale("nn_NO"), new ULocale("pl_PL"), new ULocale("pt"), new ULocale("pt_BR"), new ULocale("pt_PT"), new ULocale("ru_RU"), new ULocale("sv_SE"), new ULocale("th_TH"), new ULocale("tr_TR"), new ULocale("zh"), new ULocale("zh_Hans"), new ULocale("zh_Hans_CN"), new ULocale("zh_Hant"), new ULocale("zh_Hant_HK"), new ULocale("zh_Hant_TW") }; } else { LOCALES = new ULocale[] { new ULocale("en"), }; } SimpleDateFormat sdfGMT = new SimpleDateFormat(BASEPATTERN); sdfGMT.setTimeZone(TimeZone.getTimeZone("Etc/GMT")); long testCounts = 0; long[] testTimes = new long[4]; boolean[] expectedRoundTrip = new boolean[4]; int testLen = 0; for (int locidx = 0; locidx < LOCALES.length; locidx++) { logln("Locale: " + LOCALES[locidx].toString()); for (int patidx = 0; patidx < PATTERNS.length; patidx++) { logln(" pattern: " + PATTERNS[patidx]); String pattern = BASEPATTERN + " " + PATTERNS[patidx]; SimpleDateFormat sdf = new SimpleDateFormat(pattern, LOCALES[locidx]); String[] ids = TimeZone.getAvailableIDs(); for (int zidx = 0; zidx < ids.length; zidx++) { String id = TimeZone.getCanonicalID(ids[zidx]); if (id == null || !id.equals(ids[zidx])) { // Skip aliases continue; } BasicTimeZone btz = 
(BasicTimeZone)TimeZone.getTimeZone(ids[zidx], TimeZone.TIMEZONE_ICU); TimeZone tz = TimeZone.getTimeZone(ids[zidx]); sdf.setTimeZone(tz); long t = START_TIME; TimeZoneTransition tzt = null; boolean middle = true; while (t < END_TIME) { if (tzt == null) { testTimes[0] = t; expectedRoundTrip[0] = true; testLen = 1; } else { int fromOffset = tzt.getFrom().getRawOffset() + tzt.getFrom().getDSTSavings(); int toOffset = tzt.getTo().getRawOffset() + tzt.getTo().getDSTSavings(); int delta = toOffset - fromOffset; if (delta < 0) { boolean isDstDecession = tzt.getFrom().getDSTSavings() > 0 && tzt.getTo().getDSTSavings() == 0; testTimes[0] = t + delta - 1; expectedRoundTrip[0] = true; testTimes[1] = t + delta; expectedRoundTrip[1] = isDstDecession ? !AMBIGUOUS_DST_DECESSION[patidx] : !AMBIGUOUS_NEGATIVE_SHIFT[patidx]; testTimes[2] = t - 1; expectedRoundTrip[2] = isDstDecession ? !AMBIGUOUS_DST_DECESSION[patidx] : !AMBIGUOUS_NEGATIVE_SHIFT[patidx]; testTimes[3] = t; expectedRoundTrip[3] = true; testLen = 4; } else { testTimes[0] = t - 1; expectedRoundTrip[0] = true; testTimes[1] = t; expectedRoundTrip[1] = true; testLen = 2; } } for (int testidx = 0; testidx < testLen; testidx++) { testCounts++; timer = System.currentTimeMillis(); String text = sdf.format(new Date(testTimes[testidx])); try { Date parsedDate = sdf.parse(text); long restime = parsedDate.getTime(); if (restime != testTimes[testidx]) { StringBuffer msg = new StringBuffer(); msg.append("Time round trip failed for ") .append("tzid=").append(ids[zidx]) .append(", locale=").append(LOCALES[locidx]) .append(", pattern=").append(PATTERNS[patidx]) .append(", text=").append(text) .append(", gmt=").append(sdfGMT.format(new Date(testTimes[testidx]))) .append(", time=").append(testTimes[testidx]) .append(", restime=").append(restime) .append(", diff=").append(restime - testTimes[testidx]); if (expectedRoundTrip[testidx]) { errln("FAIL: " + msg.toString()); } else if (REALLY_VERBOSE) { logln(msg.toString()); } } } catch 
(ParseException pe) { errln("FAIL: " + pe.getMessage()); } times[patidx] += System.currentTimeMillis() - timer; } tzt = btz.getNextTransition(t, false); if (tzt == null) { break; } if (middle) { // Test the date in the middle of two transitions. t += (tzt.getTime() - t)/2; middle = false; tzt = null; } else { t = tzt.getTime(); } } } } } long total = 0; logln("### Elapsed time by patterns ###"); for (int i = 0; i < PATTERNS.length; i++) { logln(times[i] + "ms (" + PATTERNS[i] + ")"); total += times[i]; } logln("Total: " + total + "ms"); logln("Iteration: " + testCounts); } }icu4j-4.2/src/com/ibm/icu/dev/test/format/TimeZoneAliases.txt0000644000175000017500000001171111361046232024050 0ustar twernertwerner#-------------------------------------------------------------------- # Copyright (c) 2004, International Business Machines # Corporation and others. All Rights Reserved. #-------------------------------------------------------------------- America/Atka ; America/Adak America/Ensenada ; America/Tijuana America/Fort_Wayne ; America/Indianapolis America/Indiana/Indianapolis ; America/Indianapolis America/Kentucky/Louisville ; America/Louisville America/Knox_IN ; America/Indiana/Knox America/Porto_Acre ; America/Rio_Branco America/Rosario ; America/Cordoba America/Shiprock ; America/Denver America/Virgin ; America/St_Thomas Antarctica/South_Pole ; Antarctica/McMurdo Arctic/Longyearbyen ; Europe/Oslo Asia/Ashkhabad ; Asia/Ashgabat Asia/Chungking ; Asia/Chongqing Asia/Dacca ; Asia/Dhaka Asia/Istanbul ; Europe/Istanbul Asia/Macao ; Asia/Macau Asia/Tel_Aviv ; Asia/Jerusalem Asia/Thimbu ; Asia/Thimphu Asia/Ujung_Pandang ; Asia/Makassar Asia/Ulan_Bator ; Asia/Ulaanbaatar #Atlantic/Jan_Mayen ; Europe/Oslo Australia/ACT ; Australia/Sydney Australia/Canberra ; Australia/Sydney Australia/LHI ; Australia/Lord_Howe Australia/NSW ; Australia/Sydney Australia/North ; Australia/Darwin Australia/Queensland ; Australia/Brisbane Australia/South ; Australia/Adelaide Australia/Tasmania ; 
Australia/Hobart Australia/Victoria ; Australia/Melbourne Australia/West ; Australia/Perth Australia/Yancowinna ; Australia/Broken_Hill Brazil/Acre ; America/Porto_Acre Brazil/DeNoronha ; America/Noronha Brazil/East ; America/Sao_Paulo Brazil/West ; America/Manaus CST6CDT ; America/Chicago Canada/Atlantic ; America/Halifax Canada/Central ; America/Winnipeg Canada/East-Saskatchewan ; America/Regina Canada/Eastern ; America/Toronto Canada/Mountain ; America/Edmonton Canada/Newfoundland ; America/St_Johns Canada/Pacific ; America/Vancouver Canada/Saskatchewan ; America/Regina Canada/Yukon ; America/Whitehorse Chile/Continental ; America/Santiago Chile/EasterIsland ; Pacific/Easter Cuba ; America/Havana EST ; America/Indianapolis EST5EDT ; America/New_York Egypt ; Africa/Cairo Eire ; Europe/Dublin Etc/GMT+0 ; Etc/GMT Etc/GMT-0 ; Etc/GMT Etc/GMT0 ; Etc/GMT Etc/Greenwich ; Etc/GMT Etc/Universal ; Etc/UTC Etc/Zulu ; Etc/UTC #Europe/Bratislava ; Europe/Prague #Europe/Ljubljana ; Europe/Belgrade Europe/Nicosia ; Asia/Nicosia #Europe/San_Marino ; Europe/Rome #Europe/Sarajevo ; Europe/Belgrade #Europe/Skopje ; Europe/Belgrade Europe/Tiraspol ; Europe/Chisinau #Europe/Vatican ; Europe/Rome #Europe/Zagreb ; Europe/Belgrade GB ; Europe/London GB-Eire ; Europe/London GMT ; Etc/GMT GMT+0 ; Etc/GMT+0 GMT-0 ; Etc/GMT-0 GMT0 ; Etc/GMT0 Greenwich ; Etc/Greenwich HST ; Pacific/Honolulu Hongkong ; Asia/Hong_Kong Iceland ; Atlantic/Reykjavik Iran ; Asia/Tehran Israel ; Asia/Jerusalem Jamaica ; America/Jamaica Japan ; Asia/Tokyo Kwajalein ; Pacific/Kwajalein Libya ; Africa/Tripoli MST ; America/Phoenix MST7MDT ; America/Denver Mexico/BajaNorte ; America/Tijuana Mexico/BajaSur ; America/Mazatlan Mexico/General ; America/Mexico_City Mideast/Riyadh87 ; Asia/Riyadh87 Mideast/Riyadh88 ; Asia/Riyadh88 Mideast/Riyadh89 ; Asia/Riyadh89 NZ ; Pacific/Auckland NZ-CHAT ; Pacific/Chatham Navajo ; America/Denver PRC ; Asia/Shanghai PST8PDT ; America/Los_Angeles Pacific/Samoa ; Pacific/Pago_Pago Poland 
; Europe/Warsaw Portugal ; Europe/Lisbon ROC ; Asia/Taipei ROK ; Asia/Seoul Singapore ; Asia/Singapore SystemV/AST4 ; America/Puerto_Rico SystemV/AST4ADT ; America/Halifax SystemV/CST6 ; America/Regina SystemV/CST6CDT ; America/Chicago SystemV/EST5 ; America/Indianapolis SystemV/EST5EDT ; America/New_York SystemV/HST10 ; Pacific/Honolulu SystemV/MST7 ; America/Phoenix SystemV/MST7MDT ; America/Denver SystemV/PST8 ; Pacific/Pitcairn SystemV/PST8PDT ; America/Los_Angeles SystemV/YST9 ; Pacific/Gambier SystemV/YST9YDT ; America/Anchorage Turkey ; Europe/Istanbul UCT ; Etc/UCT US/Alaska ; America/Anchorage US/Aleutian ; America/Adak US/Arizona ; America/Phoenix US/Central ; America/Chicago US/East-Indiana ; America/Indianapolis US/Eastern ; America/New_York US/Hawaii ; Pacific/Honolulu US/Indiana-Starke ; America/Indiana/Knox US/Michigan ; America/Detroit US/Mountain ; America/Denver US/Pacific ; America/Los_Angeles US/Pacific-New ; America/Los_Angeles US/Samoa ; Pacific/Pago_Pago UTC ; Etc/UTC Universal ; Etc/Universal W-SU ; Europe/Moscow Zulu ; Etc/Zulu ACT ; Australia/Darwin AET ; Australia/Sydney AGT ; America/Buenos_Aires ART ; Africa/Cairo AST ; America/Anchorage BET ; America/Sao_Paulo BST ; Asia/Dhaka CAT ; Africa/Harare CNT ; America/St_Johns CST ; America/Chicago CTT ; Asia/Shanghai EAT ; Africa/Addis_Ababa ECT ; Europe/Paris IET ; America/Indianapolis IST ; Asia/Calcutta JST ; Asia/Tokyo MIT ; Pacific/Apia NET ; Asia/Yerevan NST ; Pacific/Auckland PLT ; Asia/Karachi PNT ; America/Phoenix PRT ; America/Puerto_Rico PST ; America/Los_Angeles SST ; Pacific/Guadalcanal VST ; Asia/Saigonicu4j-4.2/src/com/ibm/icu/dev/test/format/RbnfTest.java0000644000175000017500000013471211361050730022652 0ustar twernertwerner//##header /* ******************************************************************************* * Copyright (C) 1996-2009, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.format; import java.math.BigInteger; import java.text.DecimalFormat; import java.text.NumberFormat; import java.text.ParseException; import java.util.Locale; import java.util.Random; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.text.RuleBasedNumberFormat; import com.ibm.icu.util.ULocale; public class RbnfTest extends TestFmwk { public static void main(String[] args) { RbnfTest test = new RbnfTest(); try { test.run(args); } catch (Throwable e) { System.out.println("Entire test failed because of exception: " + e.toString()); e.printStackTrace(); } } static String fracRules = "%main:\n" + // this rule formats the number if it's 1 or more. It formats // the integral part using a DecimalFormat ("#,##0" puts // thousands separators in the right places) and the fractional // part using %%frac. If there is no fractional part, it // just shows the integral part. " x.0: <#,##0<[ >%%frac>];\n" + // this rule formats the number if it's between 0 and 1. It // shows only the fractional part (0.5 shows up as "1/2," not // "0 1/2") " 0.x: >%%frac>;\n" + // the fraction rule set. This works the same way as the one in the // preceding example: We multiply the fractional part of the number // being formatted by each rule's base value and use the rule that // produces the result closest to 0 (or the first rule that produces 0). // Since we only provide rules for the numbers from 2 to 10, we know // we'll get a fraction with a denominator between 2 and 10. 
        // "<0<" causes the numerator of the fraction to be formatted
        // using numerals
        "%%frac:\n" +
        " 2: 1/2;\n" +
        " 3: <0>];\n"
        // use %%hr to format values greater than 3,600 seconds
        // (the ">>>" below causes us to see the number of minutes
        // when there are zero minutes)
        + " 3600/60: <%%hr<[, >>>];\n"
        // this rule set takes care of the singular and plural forms
        // of "minute"
        + "%%min:\n"
        + " 0 minutes; 1 minute; =0= minutes;\n"
        // this rule set takes care of the singular and plural forms
        // of "hour"
        + "%%hr:\n"
        + " 0 hours; 1 hour; =0= hours;\n"

        // main rule set for formatting in numerals
        + "%in-numerals:\n"
        // values below 60 seconds are shown with "sec."
        + " =0= sec.;\n"
        // higher values are shown with colons: %%min-sec is used for
        // values below 3,600 seconds...
        + " 60: =%%min-sec=;\n"
        // ...and %%hr-min-sec is used for values of 3,600 seconds
        // and above
        + " 3600: =%%hr-min-sec=;\n"
        // this rule causes values of less than 10 minutes to show without
        // a leading zero
        + "%%min-sec:\n"
        + " 0: :=00=;\n"
        + " 60/60: <0<>>;\n"
        // this rule set is used for values of 3,600 or more. Minutes are always
        // shown, and always shown with two digits
        + "%%hr-min-sec:\n"
        + " 0: :=00=;\n"
        + " 60/60: <00<>>;\n"
        + " 3600/60: <#,##0<:>>>;\n"
        // the lenient-parse rules allow several different characters to be used
        // as delimiters between hours, minutes, and seconds
        + "%%lenient-parse:\n"
        + " & : = . 
= ' ' = -;\n";

    public void TestCoverage() {
        // extra calls to boost coverage numbers
        RuleBasedNumberFormat fmt0 = new RuleBasedNumberFormat(RuleBasedNumberFormat.SPELLOUT);
        RuleBasedNumberFormat fmt1 = (RuleBasedNumberFormat)fmt0.clone();
        RuleBasedNumberFormat fmt2 = new RuleBasedNumberFormat(RuleBasedNumberFormat.SPELLOUT);
        if (!fmt0.equals(fmt0)) {
            errln("self equality fails");
        }
        if (!fmt0.equals(fmt1)) {
            errln("clone equality fails");
        }
        if (!fmt0.equals(fmt2)) {
            errln("duplicate equality fails");
        }
        String str = fmt0.toString();
        logln(str);

        RuleBasedNumberFormat fmt3 = new RuleBasedNumberFormat(durationInSecondsRules);

        if (fmt0.equals(fmt3)) {
            errln("nonequal fails");
        }
        if (!fmt3.equals(fmt3)) {
            errln("self equal 2 fails");
        }
        str = fmt3.toString();
        logln(str);

        String[] names = fmt3.getRuleSetNames();

        // Setting a private ("%%") rule set as the default must be rejected
        try {
            fmt3.setDefaultRuleSet(null);
            fmt3.setDefaultRuleSet("%%foo");
            errln("sdrf %%foo didn't fail");
        } catch (Exception e) {
            logln("Got the expected exception");
        }

        // Setting a nonexistent rule set as the default must be rejected
        try {
            fmt3.setDefaultRuleSet("%bogus");
            errln("sdrf %bogus didn't fail");
        } catch (Exception e) {
            logln("Got the expected exception");
        }

        // Formatting a double with a private rule set must be rejected
        try {
            str = fmt3.format(2.3, names[0]);
            logln(str);
            str = fmt3.format(2.3, "%%foo");
            errln("format double %%foo didn't fail");
        } catch (Exception e) {
            logln("Got the expected exception");
        }

        // Formatting a long with a private rule set must be rejected
        try {
            str = fmt3.format(123L, names[0]);
            logln(str);
            str = fmt3.format(123L, "%%foo");
            errln("format long %%foo didn't fail");
        } catch (Exception e) {
            logln("Got the expected exception");
        }

        RuleBasedNumberFormat fmt4 = new RuleBasedNumberFormat(fracRules, Locale.ENGLISH);
        RuleBasedNumberFormat fmt5 = new RuleBasedNumberFormat(fracRules, Locale.ENGLISH);
        str = fmt4.toString();
        logln(str);
        if (!fmt4.equals(fmt5)) {
            errln("duplicate 2 equality failed");
        }
        str = fmt4.format(123L);
        logln(str);
        try {
            Number num = fmt4.parse(str);
            logln(num.toString());
        } catch (Exception e) {
            errln("parse caught exception");
        }

        str = fmt4.format(.000123);
        logln(str);
        try {
            Number num = fmt4.parse(str);
logln(num.toString()); } catch (Exception e) { errln("parse caught exception"); } str = fmt4.format(456.000123); logln(str); try { Number num = fmt4.parse(str); logln(num.toString()); } catch (Exception e) { errln("parse caught exception"); } } public void TestUndefinedSpellout() { Locale greek = new Locale("el", "", ""); RuleBasedNumberFormat[] formatters = { new RuleBasedNumberFormat(greek, RuleBasedNumberFormat.SPELLOUT), new RuleBasedNumberFormat(greek, RuleBasedNumberFormat.ORDINAL), new RuleBasedNumberFormat(greek, RuleBasedNumberFormat.DURATION), }; String[] data = { "0", "1", "15", "20", "23", "73", "88", "100", "106", "127", "200", "579", "1,000", "2,000", "3,004", "4,567", "15,943", "105,000", "2,345,678", "-36", "-36.91215", "234.56789" }; NumberFormat decFormat = NumberFormat.getInstance(Locale.US); for (int j = 0; j < formatters.length; ++j) { com.ibm.icu.text.NumberFormat formatter = formatters[j]; logln("formatter[" + j + "]"); for (int i = 0; i < data.length; ++i) { try { String result = formatter.format(decFormat.parse(data[i])); logln("[" + i + "] " + data[i] + " ==> " + result); } catch (Exception e) { errln("formatter[" + j + "], data[" + i + "] " + data[i] + " threw exception " + e.getMessage()); } } } } /** * Perform a simple spot check on the English spellout rules */ public void TestEnglishSpellout() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(Locale.US, RuleBasedNumberFormat.SPELLOUT); String[][] testData = { { "1", "one" }, { "15", "fifteen" }, { "20", "twenty" }, { "23", "twenty-three" }, { "73", "seventy-three" }, { "88", "eighty-eight" }, { "100", "one hundred" }, { "106", "one hundred six" }, { "127", "one hundred twenty-seven" }, { "200", "two hundred" }, { "579", "five hundred seventy-nine" }, { "1,000", "one thousand" }, { "2,000", "two thousand" }, { "3,004", "three thousand four" }, { "4,567", "four thousand five hundred sixty-seven" }, { "15,943", "fifteen thousand nine hundred forty-three" }, { "2,345,678", 
"two million three hundred forty-five " + "thousand six hundred seventy-eight" }, { "-36", "minus thirty-six" }, { "234.567", "two hundred thirty-four point five six seven" } }; doTest(formatter, testData, true); formatter.setLenientParseMode(true); String[][] lpTestData = { { "FOurhundred thiRTY six", "436" }, // test spaces before fifty-7 causing lenient parse match of "fifty-" to " fifty" // leaving "-7" for remaining parse, resulting in 2643 as the parse result. { "fifty-7", "57" }, { " fifty-7", "57" }, { " fifty-7", "57" }, { "2 thousand six HUNDRED fifty-7", "2,657" }, { "fifteen hundred and zero", "1,500" } }; doLenientParseTest(formatter, lpTestData); } /** * Perform a simple spot check on the English ordinal-abbreviation rules */ public void TestOrdinalAbbreviations() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(Locale.US, RuleBasedNumberFormat.ORDINAL); String[][] testData = { { "1", "1\u02e2\u1d57" }, { "2", "2\u207f\u1d48" }, { "3", "3\u02b3\u1d48" }, { "4", "4\u1d57\u02b0" }, { "7", "7\u1d57\u02b0" }, { "10", "10\u1d57\u02b0" }, { "11", "11\u1d57\u02b0" }, { "13", "13\u1d57\u02b0" }, { "20", "20\u1d57\u02b0" }, { "21", "21\u02e2\u1d57" }, { "22", "22\u207f\u1d48" }, { "23", "23\u02b3\u1d48" }, { "24", "24\u1d57\u02b0" }, { "33", "33\u02b3\u1d48" }, { "102", "102\u207f\u1d48" }, { "312", "312\u1d57\u02b0" }, { "12,345", "12,345\u1d57\u02b0" } }; doTest(formatter, testData, false); } /** * Perform a simple spot check on the duration-formatting rules */ public void TestDurations() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(Locale.US, RuleBasedNumberFormat.DURATION); String[][] testData = { { "3,600", "1:00:00" }, //move me and I fail { "0", "0 sec." }, { "1", "1 sec." }, { "24", "24 sec." 
}, { "60", "1:00" }, { "73", "1:13" }, { "145", "2:25" }, { "666", "11:06" }, // { "3,600", "1:00:00" }, { "3,740", "1:02:20" }, { "10,293", "2:51:33" } }; doTest(formatter, testData, true); formatter.setLenientParseMode(true); String[][] lpTestData = { { "2-51-33", "10,293" } }; doLenientParseTest(formatter, lpTestData); } /** * Perform a simple spot check on the Spanish spellout rules */ public void TestSpanishSpellout() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(new Locale("es", "es", ""), RuleBasedNumberFormat.SPELLOUT); String[][] testData = { { "1", "uno" }, { "6", "seis" }, { "16", "diecis\u00e9is" }, { "20", "veinte" }, { "24", "veinticuatro" }, { "26", "veintis\u00e9is" }, { "73", "setenta y tres" }, { "88", "ochenta y ocho" }, { "100", "cien" }, { "106", "ciento seis" }, { "127", "ciento veintisiete" }, { "200", "doscientos" }, { "579", "quinientos setenta y nueve" }, { "1,000", "mil" }, { "2,000", "dos mil" }, { "3,004", "tres mil cuatro" }, { "4,567", "cuatro mil quinientos sesenta y siete" }, { "15,943", "quince mil novecientos cuarenta y tres" }, { "2,345,678", "dos millones trescientos cuarenta y cinco mil " + "seiscientos setenta y ocho"}, { "-36", "menos treinta y seis" }, { "234.567", "doscientos treinta y cuatro coma cinco seis siete" } }; doTest(formatter, testData, true); } /** * Perform a simple spot check on the French spellout rules */ public void TestFrenchSpellout() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(Locale.FRANCE, RuleBasedNumberFormat.SPELLOUT); String[][] testData = { { "1", "un" }, { "15", "quinze" }, { "20", "vingt" }, { "21", "vingt-et-un" }, { "23", "vingt-trois" }, { "62", "soixante-deux" }, { "70", "soixante-dix" }, { "71", "soixante-et-onze" }, { "73", "soixante-treize" }, { "80", "quatre-vingts" }, { "88", "quatre-vingt-huit" }, { "100", "cent" }, { "106", "cent-six" }, { "127", "cent-vingt-sept" }, { "200", "deux-cents" }, { "579", "cinq-cent-soixante-dix-neuf" }, { "1,000", "mille" 
}, { "1,123", "mille-cent-vingt-trois" }, { "1,594", "mille-cinq-cent-quatre-vingt-quatorze" }, { "2,000", "deux-mille" }, { "3,004", "trois-mille-quatre" }, { "4,567", "quatre-mille-cinq-cent-soixante-sept" }, { "15,943", "quinze-mille-neuf-cent-quarante-trois" }, { "2,345,678", "deux millions trois-cent-quarante-cinq-mille-" + "six-cent-soixante-dix-huit" }, { "-36", "moins trente-six" }, { "234.567", "deux-cent-trente-quatre virgule cinq six sept" } }; doTest(formatter, testData, true); formatter.setLenientParseMode(true); String[][] lpTestData = { { "trente-et-un", "31" }, { "un cent quatre vingt dix huit", "198" } }; doLenientParseTest(formatter, lpTestData); } /** * Perform a simple spot check on the Swiss French spellout rules */ public void TestSwissFrenchSpellout() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(new Locale("fr", "CH", ""), RuleBasedNumberFormat.SPELLOUT); String[][] testData = { { "1", "un" }, { "15", "quinze" }, { "20", "vingt" }, { "21", "vingt-et-un" }, { "23", "vingt-trois" }, { "62", "soixante-deux" }, { "70", "septante" }, { "71", "septante-et-un" }, { "73", "septante-trois" }, { "80", "huitante" }, { "88", "huitante-huit" }, { "100", "cent" }, { "106", "cent-six" }, { "127", "cent-vingt-sept" }, { "200", "deux-cents" }, { "579", "cinq-cent-septante-neuf" }, { "1,000", "mille" }, { "1,123", "mille-cent-vingt-trois" }, { "1,594", "mille-cinq-cent-nonante-quatre" }, { "2,000", "deux-mille" }, { "3,004", "trois-mille-quatre" }, { "4,567", "quatre-mille-cinq-cent-soixante-sept" }, { "15,943", "quinze-mille-neuf-cent-quarante-trois" }, { "2,345,678", "deux millions trois-cent-quarante-cinq-mille-" + "six-cent-septante-huit" }, { "-36", "moins trente-six" }, { "234.567", "deux-cent-trente-quatre virgule cinq six sept" } }; doTest(formatter, testData, true); } /** * Perform a simple spot check on the Italian spellout rules */ public void TestItalianSpellout() { RuleBasedNumberFormat formatter = new 
RuleBasedNumberFormat(Locale.ITALIAN, RuleBasedNumberFormat.SPELLOUT); String[][] testData = { { "1", "uno" }, { "15", "quindici" }, { "20", "venti" }, { "23", "venti\u00ADtr\u00E9" }, { "73", "settanta\u00ADtr\u00E9" }, { "88", "ottant\u00ADotto" }, { "100", "cento" }, { "106", "cento\u00ADsei" }, { "108", "cent\u00ADotto" }, { "127", "cento\u00ADventi\u00ADsette" }, { "181", "cent\u00ADottant\u00ADuno" }, { "200", "due\u00ADcento" }, { "579", "cinque\u00ADcento\u00ADsettanta\u00ADnove" }, { "1,000", "mille" }, { "2,000", "due\u00ADmila" }, { "3,004", "tre\u00ADmila\u00ADquattro" }, { "4,567", "quattro\u00ADmila\u00ADcinque\u00ADcento\u00ADsessanta\u00ADsette" }, { "15,943", "quindici\u00ADmila\u00ADnove\u00ADcento\u00ADquaranta\u00ADtr\u00E9" }, { "-36", "meno trenta\u00ADsei" }, { "234.567", "due\u00ADcento\u00ADtrenta\u00ADquattro virgola cinque sei sette" } }; doTest(formatter, testData, true); } /** * Perform a simple spot check on the German spellout rules */ public void TestGermanSpellout() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(Locale.GERMANY, RuleBasedNumberFormat.SPELLOUT); String[][] testData = { { "1", "eins" }, { "15", "f\u00fcnfzehn" }, { "20", "zwanzig" }, { "23", "drei\u00ADund\u00ADzwanzig" }, { "73", "drei\u00ADund\u00ADsiebzig" }, { "88", "acht\u00ADund\u00ADachtzig" }, { "100", "ein\u00ADhundert" }, { "106", "ein\u00ADhundert\u00ADsechs" }, { "127", "ein\u00ADhundert\u00ADsieben\u00ADund\u00ADzwanzig" }, { "200", "zwei\u00ADhundert" }, { "579", "f\u00fcnf\u00ADhundert\u00ADneun\u00ADund\u00ADsiebzig" }, { "1,000", "ein\u00ADtausend" }, { "2,000", "zwei\u00ADtausend" }, { "3,004", "drei\u00ADtausend\u00ADvier" }, { "4,567", "vier\u00ADtausend\u00ADf\u00fcnf\u00ADhundert\u00ADsieben\u00ADund\u00ADsechzig" }, { "15,943", "f\u00fcnfzehn\u00ADtausend\u00ADneun\u00ADhundert\u00ADdrei\u00ADund\u00ADvierzig" }, { "2,345,678", "zwei Millionen drei\u00ADhundert\u00ADf\u00fcnf\u00ADund\u00ADvierzig\u00ADtausend\u00AD" + 
"sechs\u00ADhundert\u00ADacht\u00ADund\u00ADsiebzig" } }; doTest(formatter, testData, true); formatter.setLenientParseMode(true); String[][] lpTestData = { { "ein Tausend sechs Hundert fuenfunddreissig", "1,635" } }; doLenientParseTest(formatter, lpTestData); } /** * Perform a simple spot check on the Thai spellout rules */ public void TestThaiSpellout() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(new Locale("th", "TH", ""), RuleBasedNumberFormat.SPELLOUT); String[][] testData = { { "0", "\u0e28\u0e39\u0e19\u0e22\u0e4c" }, { "1", "\u0e2b\u0e19\u0e36\u0e48\u0e07" }, { "10", "\u0e2a\u0e34\u0e1a" }, { "11", "\u0e2a\u0e34\u0e1a\u200b\u0e40\u0e2d\u0e47\u0e14" }, { "21", "\u0e22\u0e35\u0e48\u200b\u0e2a\u0e34\u0e1a\u200b\u0e40\u0e2d\u0e47\u0e14" }, { "101", "\u0e2b\u0e19\u0e36\u0e48\u0e07\u200b\u0e23\u0e49\u0e2d\u0e22\u200b\u0e2b\u0e19\u0e36\u0e48\u0e07" }, { "1.234", "\u0e2b\u0e19\u0e36\u0e48\u0e07\u200b\u0e08\u0e38\u0e14\u200b\u0e2a\u0e2d\u0e07\u0e2a\u0e32\u0e21\u0e2a\u0e35\u0e48" }, { "21.45", "\u0e22\u0e35\u0e48\u200b\u0e2a\u0e34\u0e1a\u200b\u0e40\u0e2d\u0e47\u0e14\u200b\u0e08\u0e38\u0e14\u200b\u0e2a\u0e35\u0e48\u0e2b\u0e49\u0e32" }, { "22.45", "\u0e22\u0e35\u0e48\u200b\u0e2a\u0e34\u0e1a\u200b\u0e2a\u0e2d\u0e07\u200b\u0e08\u0e38\u0e14\u200b\u0e2a\u0e35\u0e48\u0e2b\u0e49\u0e32" }, { "23.45", "\u0e22\u0e35\u0e48\u200b\u0e2a\u0e34\u0e1a\u200b\u0e2a\u0e32\u0e21\u200b\u0e08\u0e38\u0e14\u200b\u0e2a\u0e35\u0e48\u0e2b\u0e49\u0e32" }, { "123.45", "\u0e2b\u0e19\u0e36\u0e48\u0e07\u200b\u0e23\u0e49\u0e2d\u0e22\u200b\u0e22\u0e35\u0e48\u200b\u0e2a\u0e34\u0e1a\u200b\u0e2a\u0e32\u0e21\u200b\u0e08\u0e38\u0e14\u200b\u0e2a\u0e35\u0e48\u0e2b\u0e49\u0e32" }, { "12,345.678", 
"\u0E2B\u0E19\u0E36\u0E48\u0E07\u200b\u0E2B\u0E21\u0E37\u0E48\u0E19\u200b\u0E2A\u0E2D\u0E07\u200b\u0E1E\u0E31\u0E19\u200b\u0E2A\u0E32\u0E21\u200b\u0E23\u0E49\u0E2D\u0E22\u200b\u0E2A\u0E35\u0E48\u200b\u0E2A\u0E34\u0E1A\u200b\u0E2B\u0E49\u0E32\u200b\u0E08\u0E38\u0E14\u200b\u0E2B\u0E01\u0E40\u0E08\u0E47\u0E14\u0E41\u0E1B\u0E14" }, }; doTest(formatter, testData, true); /* formatter.setLenientParseMode(true); String[][] lpTestData = { { "ein Tausend sechs Hundert fuenfunddreissig", "1,635" } }; doLenientParseTest(formatter, lpTestData); */ } public void TestFractionalRuleSet() { RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(fracRules, Locale.ENGLISH); String[][] testData = { { "0", "0" }, { "1", "1" }, { "10", "10" }, { ".1", "1/10" }, { ".11", "1/9" }, { ".125", "1/8" }, { ".1428", "1/7" }, { ".1667", "1/6" }, { ".2", "1/5" }, { ".25", "1/4" }, { ".333", "1/3" }, { ".5", "1/2" }, { "1.1", "1 1/10" }, { "2.11", "2 1/9" }, { "3.125", "3 1/8" }, { "4.1428", "4 1/7" }, { "5.1667", "5 1/6" }, { "6.2", "6 1/5" }, { "7.25", "7 1/4" }, { "8.333", "8 1/3" }, { "9.5", "9 1/2" }, { ".2222", "2/9" }, { ".4444", "4/9" }, { ".5555", "5/9" }, { "1.2856", "1 2/7" } }; doTest(formatter, testData, false); // exact values aren't parsable from fractions } public void TestSwedishSpellout() { Locale locale = new Locale("sv", "", ""); RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(locale, RuleBasedNumberFormat.SPELLOUT); String[][] testDataDefault = { { "101", "ett\u00ADhundra\u00ADett" }, { "123", "ett\u00ADhundra\u00ADtjugo\u00ADtre" }, { "1,001", "ettusen ett" }, { "1,100", "ettusen ett\u00ADhundra" }, { "1,101", "ettusen ett\u00ADhundra\u00ADett" }, { "1,234", "ettusen tv\u00e5\u00ADhundra\u00ADtrettio\u00ADfyra" }, { "10,001", "tio\u00ADtusen ett" }, { "11,000", "elva\u00ADtusen" }, { "12,000", "tolv\u00ADtusen" }, { "20,000", "tjugo-tusen" }, { "21,000", "tjugo\u00ADett-tusen" }, { "21,001", "tjugo\u00ADett-tusen ett" }, { "200,000", 
"tv\u00e5\u00ADhundra-tusen" }, { "201,000", "tv\u00e5\u00ADhundra\u00ADett-tusen" }, { "200,200", "tv\u00e5\u00ADhundra-tusen tv\u00e5\u00ADhundra" }, { "2,002,000", "tv\u00e5 miljoner tv\u00e5\u00ADtusen" }, { "12,345,678", "tolv miljoner tre\u00ADhundra\u00ADfyrtio\u00ADfem-tusen sex\u00ADhundra\u00ADsjuttio\u00AD\u00e5tta" }, { "123,456.789", "ett\u00ADhundra\u00ADtjugo\u00ADtre-tusen fyra\u00ADhundra\u00ADfemtio\u00ADsex komma sju \u00e5tta nio" }, { "-12,345.678", "minus tolv\u00ADtusen tre\u00ADhundra\u00ADfyrtio\u00ADfem komma sex sju \u00e5tta" } }; logln("testing default rules"); doTest(formatter, testDataDefault, true); String[][] testDataNeutrum = { { "101", "ett\u00adhundra\u00aden" }, { "1,001", "ettusen en" }, { "1,101", "ettusen ett\u00adhundra\u00aden" }, { "10,001", "tio\u00adtusen en" }, { "21,001", "tjugo\u00aden\u00adtusen en" } }; formatter.setDefaultRuleSet("%spellout-cardinal-neutre"); logln("testing neutrum rules"); doTest(formatter, testDataNeutrum, true); String[][] testDataYear = { { "101", "ett\u00adhundra\u00adett" }, { "900", "nio\u00adhundra" }, { "1,001", "ettusen ett" }, { "1,100", "elva\u00adhundra" }, { "1,101", "elva\u00adhundra\u00adett" }, { "1,234", "tolv\u00adhundra\u00adtrettio\u00adfyra" }, { "2,001", "tjugo\u00adhundra\u00adett" }, { "10,001", "tio\u00adtusen ett" } }; formatter.setDefaultRuleSet("%spellout-numbering-year"); logln("testing year rules"); doTest(formatter, testDataYear, true); } public void TestBigNumbers() { BigInteger bigI = new BigInteger("1234567890", 10); StringBuffer buf = new StringBuffer(); RuleBasedNumberFormat fmt = new RuleBasedNumberFormat(RuleBasedNumberFormat.SPELLOUT); fmt.format(bigI, buf, null); logln("big int: " + buf.toString()); //#if defined(FOUNDATION10) //#else buf.setLength(0); java.math.BigDecimal bigD = new java.math.BigDecimal(bigI); fmt.format(bigD, buf, null); logln("big dec: " + buf.toString()); //#endif } public void TestTrailingSemicolon() { String thaiRules = "%default:\n" + 
" -x: \u0e25\u0e1a>>;\n" + " x.x: <<\u0e08\u0e38\u0e14>>>;\n" + " \u0e28\u0e39\u0e19\u0e22\u0e4c; \u0e2b\u0e19\u0e36\u0e48\u0e07; \u0e2a\u0e2d\u0e07; \u0e2a\u0e32\u0e21;\n" + " \u0e2a\u0e35\u0e48; \u0e2b\u0e49\u0e32; \u0e2b\u0e01; \u0e40\u0e08\u0e47\u0e14; \u0e41\u0e1b\u0e14;\n" + " \u0e40\u0e01\u0e49\u0e32; \u0e2a\u0e34\u0e1a; \u0e2a\u0e34\u0e1a\u0e40\u0e2d\u0e47\u0e14;\n" + " \u0e2a\u0e34\u0e1a\u0e2a\u0e2d\u0e07; \u0e2a\u0e34\u0e1a\u0e2a\u0e32\u0e21;\n" + " \u0e2a\u0e34\u0e1a\u0e2a\u0e35\u0e48; \u0e2a\u0e34\u0e1a\u0e2b\u0e49\u0e32;\n" + " \u0e2a\u0e34\u0e1a\u0e2b\u0e01; \u0e2a\u0e34\u0e1a\u0e40\u0e08\u0e47\u0e14;\n" + " \u0e2a\u0e34\u0e1a\u0e41\u0e1b\u0e14; \u0e2a\u0e34\u0e1a\u0e40\u0e01\u0e49\u0e32;\n" + " 20: \u0e22\u0e35\u0e48\u0e2a\u0e34\u0e1a[>%%alt-ones>];\n" + " 30: \u0e2a\u0e32\u0e21\u0e2a\u0e34\u0e1a[>%%alt-ones>];\n" + " 40: \u0e2a\u0e35\u0e48\u0e2a\u0e34\u0e1a[>%%alt-ones>];\n" + " 50: \u0e2b\u0e49\u0e32\u0e2a\u0e34\u0e1a[>%%alt-ones>];\n" + " 60: \u0e2b\u0e01\u0e2a\u0e34\u0e1a[>%%alt-ones>];\n" + " 70: \u0e40\u0e08\u0e47\u0e14\u0e2a\u0e34\u0e1a[>%%alt-ones>];\n" + " 80: \u0e41\u0e1b\u0e14\u0e2a\u0e34\u0e1a[>%%alt-ones>];\n" + " 90: \u0e40\u0e01\u0e49\u0e32\u0e2a\u0e34\u0e1a[>%%alt-ones>];\n" + " 100: <<\u0e23\u0e49\u0e2d\u0e22[>>];\n" + " 1000: <<\u0e1e\u0e31\u0e19[>>];\n" + " 10000: <<\u0e2b\u0e21\u0e37\u0e48\u0e19[>>];\n" + " 100000: <<\u0e41\u0e2a\u0e19[>>];\n" + " 1,000,000: <<\u0e25\u0e49\u0e32\u0e19[>>];\n" + " 1,000,000,000: <<\u0e1e\u0e31\u0e19\u0e25\u0e49\u0e32\u0e19[>>];\n" + " 1,000,000,000,000: <<\u0e25\u0e49\u0e32\u0e19\u0e25\u0e49\u0e32\u0e19[>>];\n" + " 1,000,000,000,000,000: =#,##0=;\n" + "%%alt-ones:\n" + " \u0e28\u0e39\u0e19\u0e22\u0e4c;\n" + " \u0e40\u0e2d\u0e47\u0e14;\n" + " =%default=;\n ; ;; "; RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(thaiRules, new Locale("th", "TH", "")); String[][] testData = { { "0", "\u0e28\u0e39\u0e19\u0e22\u0e4c" }, { "1", "\u0e2b\u0e19\u0e36\u0e48\u0e07" }, { "123.45", 
"\u0e2b\u0e19\u0e36\u0e48\u0e07\u0e23\u0e49\u0e2d\u0e22\u0e22\u0e35\u0e48\u0e2a\u0e34\u0e1a\u0e2a\u0e32\u0e21\u0e08\u0e38\u0e14\u0e2a\u0e35\u0e48\u0e2b\u0e49\u0e32" } }; doTest(formatter, testData, true); } public void TestSmallValues() { String[][] testData = { { "0.001", "zero point zero zero one" }, { "0.0001", "zero point zero zero zero one" }, { "0.00001", "zero point zero zero zero zero one" }, { "0.000001", "zero point zero zero zero zero zero one" }, { "0.0000001", "zero point zero zero zero zero zero zero one" }, { "0.00000001", "zero point zero zero zero zero zero zero zero one" }, { "0.000000001", "zero point zero zero zero zero zero zero zero zero one" }, { "0.0000000001", "zero point zero zero zero zero zero zero zero zero zero one" }, { "0.00000000001", "zero point zero zero zero zero zero zero zero zero zero zero one" }, { "0.000000000001", "zero point zero zero zero zero zero zero zero zero zero zero zero one" }, { "0.0000000000001", "zero point zero zero zero zero zero zero zero zero zero zero zero zero one" }, { "0.00000000000001", "zero point zero zero zero zero zero zero zero zero zero zero zero zero zero one" }, { "0.000000000000001", "zero point zero zero zero zero zero zero zero zero zero zero zero zero zero zero one" }, { "10,000,000.001", "ten million point zero zero one" }, { "10,000,000.0001", "ten million point zero zero zero one" }, { "10,000,000.00001", "ten million point zero zero zero zero one" }, { "10,000,000.000001", "ten million point zero zero zero zero zero one" }, { "10,000,000.0000001", "ten million point zero zero zero zero zero zero one" }, { "10,000,000.00000001", "ten million point zero zero zero zero zero zero zero one" }, { "10,000,000.000000002", "ten million point zero zero zero zero zero zero zero zero two" }, { "10,000,000", "ten million" }, { "1,234,567,890.0987654", "one billion two hundred thirty-four million five hundred sixty-seven thousand eight hundred ninety point zero nine eight seven six five four" }, { 
"123,456,789.9876543", "one hundred twenty-three million four hundred fifty-six thousand seven hundred eighty-nine point nine eight seven six five four three" }, { "12,345,678.87654321", "twelve million three hundred forty-five thousand six hundred seventy-eight point eight seven six five four three two one" }, { "1,234,567.7654321", "one million two hundred thirty-four thousand five hundred sixty-seven point seven six five four three two one" }, { "123,456.654321", "one hundred twenty-three thousand four hundred fifty-six point six five four three two one" }, { "12,345.54321", "twelve thousand three hundred forty-five point five four three two one" }, { "1,234.4321", "one thousand two hundred thirty-four point four three two one" }, { "123.321", "one hundred twenty-three point three two one" }, { "0.0000000011754944", "zero point zero zero zero zero zero zero zero zero one one seven five four nine four four" }, { "0.000001175494351", "zero point zero zero zero zero zero one one seven five four nine four three five one" }, }; RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(Locale.US, RuleBasedNumberFormat.SPELLOUT); doTest(formatter, testData, true); } public void TestRuleSetDisplayName() { ULocale.setDefault(ULocale.US); String[][] localizations = new String[][] { /* public rule sets*/ {"%simplified", "%default", "%ordinal"}, /* display names in "en_US" locale*/ {"en_US", "Simplified", "Default", "Ordinal"}, /* display names in "zh_Hans" locale*/ {"zh_Hans", "\u7B80\u5316", "\u7F3A\u7701", "\u5E8F\u5217"}, /* display names in a fake locale*/ {"foo_Bar_BAZ", "Simplified", "Default", "Ordinal"} }; //Construct RuleBasedNumberFormat by rule sets and localizations list RuleBasedNumberFormat formatter = new RuleBasedNumberFormat(ukEnglish, localizations, ULocale.US); RuleBasedNumberFormat f2= new RuleBasedNumberFormat(ukEnglish, localizations); assertTrue("Check the two formatters' equality", formatter.equals(f2)); //get displayName by name String[] 
ruleSetNames = formatter.getRuleSetNames(); for (int i=0; i " + s); if (testParse) { // We do not validate the result in this test case, // because there are cases which do not round trip by design. try { // non-lenient parse fmt.setLenientParseMode(false); Number num = fmt.parse(s); logln(loc.getName() + names[j] + "success parse: " + s + " -> " + num); // lenient parse fmt.setLenientParseMode(true); num = fmt.parse(s); logln(loc.getName() + names[j] + "success parse (lenient): " + s + " -> " + num); } catch (ParseException pe) { String msg = loc.getName() + names[j] + "ERROR:" + pe.getMessage(); logln(msg); if (errors == null) { errors = new StringBuffer(); } errors.append("\n" + msg); } } } } } if (errors != null) { //TODO: We need to fix parse problems - see #6895 / #6896 //errln(errors.toString()); logln(errors.toString()); } } void doTest(RuleBasedNumberFormat formatter, String[][] testData, boolean testParsing) { // NumberFormat decFmt = NumberFormat.getInstance(Locale.US); NumberFormat decFmt = new DecimalFormat("#,###.################"); try { for (int i = 0; i < testData.length; i++) { String number = testData[i][0]; String expectedWords = testData[i][1]; logln("test[" + i + "] number: " + number + " target: " + expectedWords); Number num = decFmt.parse(number); String actualWords = formatter.format(num); if (!actualWords.equals(expectedWords)) { errln("Spot check format failed: for " + number + ", expected\n " + expectedWords + ", but got\n " + actualWords); } else if (testParsing) { String actualNumber = decFmt.format(formatter .parse(actualWords)); if (!actualNumber.equals(number)) { errln("Spot check parse failed: for " + actualWords + ", expected " + number + ", but got " + actualNumber); } } } } catch (Throwable e) { e.printStackTrace(); errln("Test failed with exception: " + e.toString()); } } void doLenientParseTest(RuleBasedNumberFormat formatter, String[][] testData) { NumberFormat decFmt = NumberFormat.getInstance(Locale.US); try { for (int i = 
0; i < testData.length; i++) { String words = testData[i][0]; String expectedNumber = testData[i][1]; String actualNumber = decFmt.format(formatter.parse(words)); if (!actualNumber.equals(expectedNumber)) { errln("Lenient-parse spot check failed: for " + words + ", expected " + expectedNumber + ", but got " + actualNumber); } } } catch (Throwable e) { errln("Test failed with exception: " + e.toString()); e.printStackTrace(); } } /** * Spellout rules for U.K. English. * I borrow the rule sets for TestRuleSetDisplayName() */ public static final String ukEnglish = "%simplified:\n" + " -x: minus >>;\n" + " x.x: << point >>;\n" + " zero; one; two; three; four; five; six; seven; eight; nine;\n" + " ten; eleven; twelve; thirteen; fourteen; fifteen; sixteen;\n" + " seventeen; eighteen; nineteen;\n" + " 20: twenty[->>];\n" + " 30: thirty[->>];\n" + " 40: forty[->>];\n" + " 50: fifty[->>];\n" + " 60: sixty[->>];\n" + " 70: seventy[->>];\n" + " 80: eighty[->>];\n" + " 90: ninety[->>];\n" + " 100: << hundred[ >>];\n" + " 1000: << thousand[ >>];\n" + " 1,000,000: << million[ >>];\n" + " 1,000,000,000,000: << billion[ >>];\n" + " 1,000,000,000,000,000: =#,##0=;\n" + "%alt-teens:\n" + " =%simplified=;\n" + " 1000>: <%%alt-hundreds<[ >>];\n" + " 10,000: =%simplified=;\n" + " 1,000,000: << million[ >%simplified>];\n" + " 1,000,000,000,000: << billion[ >%simplified>];\n" + " 1,000,000,000,000,000: =#,##0=;\n" + "%%alt-hundreds:\n" + " 0: SHOULD NEVER GET HERE!;\n" + " 10: <%simplified< thousand;\n" + " 11: =%simplified= hundred>%%empty>;\n" + "%%empty:\n" + " 0:;" + "%ordinal:\n" + " zeroth; first; second; third; fourth; fifth; sixth; seventh;\n" + " eighth; ninth;\n" + " tenth; eleventh; twelfth; thirteenth; fourteenth;\n" + " fifteenth; sixteenth; seventeenth; eighteenth;\n" + " nineteenth;\n" + " twentieth; twenty->>;\n" + " 30: thirtieth; thirty->>;\n" + " 40: fortieth; forty->>;\n" + " 50: fiftieth; fifty->>;\n" + " 60: sixtieth; sixty->>;\n" + " 70: seventieth; seventy->>;\n" 
+ " 80: eightieth; eighty->>;\n" + " 90: ninetieth; ninety->>;\n" + " 100: <%simplified< hundredth; <%simplified< hundred >>;\n" + " 1000: <%simplified< thousandth; <%simplified< thousand >>;\n" + " 1,000,000: <%simplified< millionth; <%simplified< million >>;\n" + " 1,000,000,000,000: <%simplified< billionth;\n" + " <%simplified< billion >>;\n" + " 1,000,000,000,000,000: =#,##0=;" + "%default:\n" + " -x: minus >>;\n" + " x.x: << point >>;\n" + " =%simplified=;\n" + " 100: << hundred[ >%%and>];\n" + " 1000: << thousand[ >%%and>];\n" + " 100,000>>: << thousand[>%%commas>];\n" + " 1,000,000: << million[>%%commas>];\n" + " 1,000,000,000,000: << billion[>%%commas>];\n" + " 1,000,000,000,000,000: =#,##0=;\n" + "%%and:\n" + " and =%default=;\n" + " 100: =%default=;\n" + "%%commas:\n" + " ' and =%default=;\n" + " 100: , =%default=;\n" + " 1000: , <%default< thousand, >%default>;\n" + " 1,000,000: , =%default=;" + "%%lenient-parse:\n" + " & ' ' , ',' ;\n"; } icu4j-4.2/src/com/ibm/icu/dev/test/format/GlobalizationPreferencesTest.java0000644000175000017500000017043211361046232026744 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2004-2009, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.format; import java.util.ArrayList; import java.util.List; import java.util.MissingResourceException; import java.util.ResourceBundle; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.text.BreakIterator; import com.ibm.icu.text.Collator; import com.ibm.icu.text.DateFormat; import com.ibm.icu.text.NumberFormat; import com.ibm.icu.text.SimpleDateFormat; import com.ibm.icu.util.BuddhistCalendar; import com.ibm.icu.util.Calendar; import com.ibm.icu.util.Currency; import com.ibm.icu.util.GlobalizationPreferences; import com.ibm.icu.util.GregorianCalendar; import com.ibm.icu.util.IslamicCalendar; import com.ibm.icu.util.JapaneseCalendar; import com.ibm.icu.util.TimeZone; import com.ibm.icu.util.ULocale; public class GlobalizationPreferencesTest extends TestFmwk { public static void main(String[] args) throws Exception { new GlobalizationPreferencesTest().run(args); } public void TestDefault() { GlobalizationPreferences gp = new GlobalizationPreferences(); ULocale defLocale = new ULocale("en_US"); ULocale defFallbackLocale = new ULocale("en"); if (!defLocale.equals(ULocale.getDefault())) { // Locale.US is always used as the default locale in the test environment // If not, some test cases will fail... 
errln("FAIL: The default locale of the test environment must be en_US"); } logln("Default locale: " + defLocale.toString()); // First locale is en_US ULocale gpLocale0 = gp.getLocale(0); logln("Primary locale: " + gpLocale0.toString()); if (!gpLocale0.equals(defLocale)) { errln("FAIL: The primary locale is not en_US"); } // Second locale is en ULocale gpLocale1 = gp.getLocale(1); logln("Secondary locale: " + gpLocale1.toString()); if (!gpLocale1.equals(defFallbackLocale)) { errln("FAIL: The secondary locale is not en"); } // Third locale is null ULocale gpLocale2 = gp.getLocale(2); if (gpLocale2 != null) { errln("FAIL: Number of locales must be 2"); } // Calendar locale Calendar cal = gp.getCalendar(); ULocale calLocale = cal.getLocale(ULocale.VALID_LOCALE); logln("Calendar locale: " + calLocale.toString()); if (!calLocale.equals(defLocale)) { errln("FAIL: The calendar locale must match with the default JVM locale"); } // Collator locale Collator coll = gp.getCollator(); ULocale collLocale = coll.getLocale(ULocale.VALID_LOCALE); logln("Collator locale: " + collLocale.toString()); if (!collLocale.equals(defLocale)) { errln("FAIL: The collator locale must match with the default JVM locale"); } // BreakIterator locale BreakIterator brk = gp.getBreakIterator(GlobalizationPreferences.BI_CHARACTER); ULocale brkLocale = brk.getLocale(ULocale.VALID_LOCALE); logln("BreakIterator locale: " + brkLocale.toString()); if (!brkLocale.equals(defLocale)) { errln("FAIL: The break iterator locale must match with the default JVM locale"); } /* Skip - Bug#5209 // DateFormat locale DateFormat df = gp.getDateFormat(GlobalizationPreferences.DF_FULL, GlobalizationPreferences.DF_NONE); ULocale dfLocale = df.getLocale(ULocale.VALID_LOCALE); logln("DateFormat locale: " + dfLocale.toString()); if (!dfLocale.equals(defLocale)) { errln("FAIL: The date format locale must match with the default JVM locale"); } */ // NumberFormat locale NumberFormat nf = 
gp.getNumberFormat(GlobalizationPreferences.NF_NUMBER);
        ULocale nfLocale = nf.getLocale(ULocale.VALID_LOCALE);
        logln("NumberFormat locale: " + nfLocale.toString());
        if (!nfLocale.equals(defLocale)) {
            errln("FAIL: The number format locale must match with the default JVM locale");
        }
    }

    public void TestFreezable() {
        logln("Create a new GlobalizationPreferences object");
        GlobalizationPreferences gp = new GlobalizationPreferences();
        if (gp.isFrozen()) {
            errln("FAIL: A newly created object must not be frozen");
        }

        logln("Call reset()");
        boolean bSet = true;
        try {
            gp.reset();
        } catch (UnsupportedOperationException uoe) {
            bSet = false;
        }
        if (!bSet) {
            errln("FAIL: reset() must not throw an exception before frozen");
        }

        // Freeze the object
        logln("Freeze the object");
        gp.freeze();
        if (!gp.isFrozen()) {
            errln("FAIL: This object must be frozen after freeze()");
        }

        // reset()
        logln("Call reset() after frozen");
        bSet = true;
        try {
            gp.reset();
        } catch (UnsupportedOperationException uoe) {
            bSet = false;
        }
        if (bSet) {
            errln("FAIL: reset() must be blocked after frozen");
        }

        // setLocales(ULocale[])
        logln("Call setLocales(ULocale[]) after frozen");
        bSet = true;
        try {
            gp.setLocales(new ULocale[] {new ULocale("fr_FR")});
        } catch (UnsupportedOperationException uoe) {
            bSet = false;
        }
        if (bSet) {
            errln("FAIL: setLocales(ULocale[]) must be blocked after frozen");
        }

        // setLocales(List)
        logln("Call setLocales(List) after frozen");
        bSet = true;
        ArrayList list = new ArrayList(1);
        list.add(new ULocale("fr_FR"));
        try {
            gp.setLocales(list);
        } catch (UnsupportedOperationException uoe) {
            bSet = false;
        }
        if (bSet) {
            errln("FAIL: setLocales(List) must be blocked after frozen");
        }

        // setLocales(String)
        logln("Call setLocales(String) after frozen");
        bSet = true;
        try {
            gp.setLocales("pt-BR,es;q=0.7");
        } catch (UnsupportedOperationException uoe) {
            bSet = false;
        }
        if (bSet) {
            errln("FAIL: setLocales(String) must be blocked after frozen");
        }

        // setLocale(ULocale)
        logln("Call setLocale(ULocale) after frozen");
        bSet = true;
        try {
gp.setLocale(new ULocale("fi_FI")); } catch (UnsupportedOperationException uoe) { bSet = false; } if (bSet) { errln("FAIL: setLocale(ULocale) must be blocked after frozen"); } // setTerritory(String) logln("Call setTerritory(String) after frozen"); bSet = true; try { gp.setTerritory("AU"); } catch (UnsupportedOperationException uoe) { bSet = false; } if (bSet) { errln("FAIL: setTerritory(String) must be blocked after frozen"); } // Modifiable clone logln("Create a modifiable clone"); GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed(); if (gp1.isFrozen()) { errln("FAIL: The object returned by cloneAsThawed() must not be frozen yet"); } // setLocale(ULocale) logln("Call setLocale(ULocale) of the modifiable clone"); bSet = true; try { gp1.setLocale(new ULocale("fr_FR")); } catch (UnsupportedOperationException uoe) { bSet = false; } if (!bSet) { errln("FAIL: setLocales(ULocale) must not throw an exception before frozen"); } } static String[][] INPUT_LOCALEIDS = { {"en_US"}, {"fr_CA", "fr"}, {"fr", "fr_CA"}, {"es", "fr", "en_US"}, {"zh_CN", "zh_Hans", "zh_Hans_CN"}, {"en_US_123"}, {"es_US", "es"}, {"de_DE", "es", "fr_FR"}, }; static String[] ACCEPT_LANGUAGES = { "en-US", "fr-CA,fr;q=0.5", "fr_CA;q=0.5,fr", "es,fr;q=0.76,en_US;q=0.75", "zh-CN,zh-Hans;q=0.5,zh-Hans-CN;q=0.1", "en-US-123", " es\t; q =0.5 \t, es-US ;q =1", "fr-FR; q=0.5, de-DE, es", }; static String[][] RESULTS_LOCALEIDS = { {"en_US", "en"}, {"fr_CA", "fr"}, {"fr_CA", "fr"}, {"es", "fr", "en_US", "en"}, {"zh_Hans_CN", "zh_CN", "zh_Hans", "zh"}, {"en_US_123", "en_US", "en"}, {"es_US", "es"}, {"de_DE", "de", "es", "fr_FR", "fr"}, }; public void TestSetLocales() { GlobalizationPreferences gp = new GlobalizationPreferences(); // setLocales(List) for (int i = 0; i < INPUT_LOCALEIDS.length; i++) { String[] localeStrings = INPUT_LOCALEIDS[i]; ArrayList locales = new ArrayList(); StringBuffer sb = new StringBuffer(); for (int j = 0; j < localeStrings.length; j++) { locales.add(new 
ULocale(localeStrings[j])); if (j != 0) { sb.append(", "); } sb.append(localeStrings[j]); } logln("Input locales: " + sb.toString()); gp.reset(); gp.setLocales(locales); List resultLocales = gp.getLocales(); if (resultLocales.size() != RESULTS_LOCALEIDS[i].length) { errln("FAIL: Number of locales mismatch - GP:" + resultLocales.size() + " Expected:" + RESULTS_LOCALEIDS[i].length); } else { for (int j = 0; j < RESULTS_LOCALEIDS[i].length; j++) { ULocale loc = gp.getLocale(j); logln("Locale[" + j + "]: " + loc.toString()); if (!gp.getLocale(j).toString().equals(RESULTS_LOCALEIDS[i][j])) { errln("FAIL: Locale index(" + j + ") does not match - GP:" + loc.toString() + " Expected:" + RESULTS_LOCALEIDS[i][j]); } } } } // setLocales(ULocale[]) for (int i = 0; i < INPUT_LOCALEIDS.length; i++) { String[] localeStrings = INPUT_LOCALEIDS[i]; ULocale[] localeArray = new ULocale[INPUT_LOCALEIDS[i].length]; StringBuffer sb = new StringBuffer(); for (int j = 0; j < localeStrings.length; j++) { localeArray[j] = new ULocale(localeStrings[j]); if (j != 0) { sb.append(", "); } sb.append(localeStrings[j]); } logln("Input locales: " + sb.toString()); gp.reset(); gp.setLocales(localeArray); List resultLocales = gp.getLocales(); if (resultLocales.size() != RESULTS_LOCALEIDS[i].length) { errln("FAIL: Number of locales mismatch - GP:" + resultLocales.size() + " Expected:" + RESULTS_LOCALEIDS[i].length); } else { for (int j = 0; j < RESULTS_LOCALEIDS[i].length; j++) { ULocale loc = gp.getLocale(j); logln("Locale[" + j + "]: " + loc.toString()); if (!gp.getLocale(j).toString().equals(RESULTS_LOCALEIDS[i][j])) { errln("FAIL: Locale index(" + j + ") does not match - GP:" + loc.toString() + " Expected:" + RESULTS_LOCALEIDS[i][j]); } } } } // setLocales(String) for (int i = 0; i < ACCEPT_LANGUAGES.length; i++) { String acceptLanguage = ACCEPT_LANGUAGES[i]; logln("Accept language: " + acceptLanguage); gp.reset(); gp.setLocales(acceptLanguage); List resultLocales = gp.getLocales(); if 
(resultLocales.size() != RESULTS_LOCALEIDS[i].length) {
                errln("FAIL: Number of locales mismatch - GP:" + resultLocales.size()
                        + " Expected:" + RESULTS_LOCALEIDS[i].length);
            } else {
                for (int j = 0; j < RESULTS_LOCALEIDS[i].length; j++) {
                    ULocale loc = gp.getLocale(j);
                    logln("Locale[" + j + "]: " + loc.toString());
                    if (!gp.getLocale(j).toString().equals(RESULTS_LOCALEIDS[i][j])) {
                        errln("FAIL: Locale index(" + j + ") does not match - GP:" + loc.toString()
                                + " Expected:" + RESULTS_LOCALEIDS[i][j]);
                    }
                }
            }
        }

        // accept-language without q-value
        logln("Set accept-language - de,de-AT");
        gp.setLocales("de,de-AT");
        if (!gp.getLocale(0).toString().equals("de_AT")) {
            errln("FAIL: getLocale(0) returns " + gp.getLocale(0).toString() + " Expected: de_AT");
        }

        // Invalid accept-language
        logln("Set locale - ko_KR");
        gp.setLocale(new ULocale("ko_KR"));
        boolean bException = false;
        try {
            logln("Set invalid accept-language - ko=100");
            gp.setLocales("ko=100");
        } catch (IllegalArgumentException iae) {
            logln("IllegalArgumentException was thrown");
            bException = true;
        }
        if (!bException) {
            errln("FAIL: IllegalArgumentException was not thrown for illegal accept-language - ko=100");
        }
        // A rejected accept-language string must leave the previous locale list intact
        if (!gp.getLocale(0).toString().equals("ko_KR")) {
            errln("FAIL: Previous valid locale list was lost");
        }
    }

    public void TestResourceBundle() {
        String baseName = "com.ibm.icu.dev.data.resources.TestDataElements";
        ResourceBundle rb;

        logln("Get a resource bundle " + baseName
                + " using GlobalizationPreferences initialized by locales - en_GB, en_US");
        GlobalizationPreferences gp = new GlobalizationPreferences();
        ULocale[] locales = new ULocale[2];
        locales[0] = new ULocale("en_GB");
        locales[1] = new ULocale("en_US");
        gp.setLocales(locales);
        try {
            rb = gp.getResourceBundle(baseName);
            String str = rb.getString("from_en_US");
            if (!str.equals("This data comes from en_US")) {
                errln("FAIL: from_en_US is not from en_US bundle");
            }
        } catch (MissingResourceException mre) {
            errln("FAIL: Missing resources");
        }

        gp.reset();

        logln("Get 
a resource bundle " + baseName
                + " using GlobalizationPreferences initialized by locales - ja, en_US_California");
        locales = new ULocale[2];
        locales[0] = new ULocale("ja");
        locales[1] = new ULocale("en_US_California");
        gp.setLocales(locales);
        try {
            rb = gp.getResourceBundle(baseName, Thread.currentThread().getContextClassLoader());
            String str = rb.getString("from_en_US");
            if (!str.equals("This data comes from en_US")) {
                errln("FAIL: from_en_US is not from en_US bundle");
            }
        } catch (MissingResourceException mre) {
            errln("FAIL: Missing resources");
        }

        logln("Get a resource bundle which does not exist");
        boolean bException = false;
        try {
            rb = gp.getResourceBundle("foo.bar.XXX");
        } catch (MissingResourceException mre) {
            logln("Missing resource exception for getting resource bundle - foo.bar.XXX");
            bException = true;
        }
        if (!bException) {
            errln("FAIL: MissingResourceException must be thrown for RB - foo.bar.XXX");
        }
    }

    public void TestTerritory() {
        GlobalizationPreferences gp = new GlobalizationPreferences();

        // Territory for unsupported language locale
        logln("Set locale - ang");
        gp.setLocale(new ULocale("ang"));
        String territory = gp.getTerritory();
        if (!territory.equals("US")) {
            errln("FAIL: Territory is " + territory + " - Expected: US");
        }

        // Territory for language only locale "fr"
        logln("Set locale - fr");
        gp.setLocale(new ULocale("fr"));
        territory = gp.getTerritory();
        if (!territory.equals("FR")) {
            errln("FAIL: Territory is " + territory + " - Expected: FR");
        }

        // Set explicit territory
        logln("Set explicit territory - CA");
        gp.setTerritory("CA");
        territory = gp.getTerritory();
        if (!territory.equals("CA")) {
            errln("FAIL: Territory is " + territory + " - Expected: CA");
        }

        // Freeze
        logln("Freeze this object");
        gp.freeze();

        boolean bFrozen = false;
        try {
            gp.setTerritory("FR");
        } catch (UnsupportedOperationException uoe) {
            logln("setTerritory is blocked");
            bFrozen = true;
        }
        if (!bFrozen) {
            errln("FAIL: setTerritory must be blocked after frozen");
        }
        territory = 
gp.getTerritory(); if (!territory.equals("CA")) { errln("FAIL: Territory is not CA"); } // Safe clone GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed(); territory = gp1.getTerritory(); if (!territory.equals("CA")) { errln("FAIL: Territory is " + territory + " - Expected: CA"); } gp1.reset(); ULocale[] locales = new ULocale[2]; locales[0] = new ULocale("ja"); locales[1] = new ULocale("zh_Hant_TW"); logln("Set locales - ja, zh_Hant_TW"); gp1.setLocales(locales); territory = gp1.getTerritory(); if (!territory.equals("TW")) { errln("FAIL: Territory is " + territory + " - Expected: TW"); } } public void TestCurrency() { GlobalizationPreferences gp = new GlobalizationPreferences(); // Set language only locale - ja logln("Set locale - ja"); gp.setLocale(new ULocale("ja")); Currency cur = gp.getCurrency(); String code = cur.getCurrencyCode(); if (!code.equals("JPY")) { errln("FAIL: Currency is " + code + " - Expected: JPY"); } gp.reset(); // Set locales with territory logln("Set locale - ja_US"); gp.setLocale(new ULocale("ja_US")); cur = gp.getCurrency(); code = cur.getCurrencyCode(); if (!code.equals("USD")) { errln("FAIL: Currency is " + code + " - Expected: USD"); } // Set locales with territory in the second locale logln("Set locales - it, en_US"); ULocale[] locales = new ULocale[2]; locales[0] = new ULocale("it"); locales[1] = new ULocale("en_US"); gp.setLocales(locales); cur = gp.getCurrency(); code = cur.getCurrencyCode(); if (!code.equals("USD")) { errln("FAIL: Currency is " + code + " - Expected: USD"); } // Set explicit territory logln("Set territory - DE"); gp.setTerritory("DE"); cur = gp.getCurrency(); code = cur.getCurrencyCode(); if (!code.equals("EUR")) { errln("FAIL: Currency is " + code + " - Expected: EUR"); } // Set explicit currency Currency ecur = Currency.getInstance("BRL"); gp.setCurrency(ecur); logln("Set explicit currency - BRL"); cur = gp.getCurrency(); code = cur.getCurrencyCode(); if (!code.equals("BRL")) { errln("FAIL: 
Currency is " + code + " - Expected: BRL");
        }

        // Set explicit territory again; the explicit currency must still win
        logln("Set territory - JP");
        gp.setTerritory("JP");
        cur = gp.getCurrency();
        code = cur.getCurrencyCode();
        if (!code.equals("BRL")) {
            errln("FAIL: Currency is " + code + " - Expected: BRL");
        }

        // Freeze
        logln("Freeze this object");
        Currency ecur2 = Currency.getInstance("CHF");
        boolean bFrozen = false;
        gp.freeze();
        try {
            gp.setCurrency(ecur2);
        } catch (UnsupportedOperationException uoe) {
            logln("setCurrency is blocked");
            bFrozen = true;
        }
        if (!bFrozen) {
            errln("FAIL: setCurrency must be blocked");
        }

        // Safe clone: the thawed clone must carry over the explicit currency
        logln("cloneAsThawed");
        GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed();
        cur = gp1.getCurrency();
        code = cur.getCurrencyCode();
        if (!code.equals("BRL")) {
            errln("FAIL: Currency is " + code + " - Expected: BRL");
        }

        // Set explicit currency
        gp1.setCurrency(ecur2);
        cur = gp1.getCurrency();
        code = cur.getCurrencyCode();
        if (!code.equals("CHF")) {
            errln("FAIL: Currency is " + code + " - Expected: CHF");
        }
    }

    public void TestCalendar() {
        GlobalizationPreferences gp = new GlobalizationPreferences();

        // Set locale - pt
        logln("Set locale - pt");
        gp.setLocale(new ULocale("pt"));
        Calendar cal = gp.getCalendar();
        String calType = cal.getType();
        if (!calType.equals("gregorian")) {
            errln("FAIL: Calendar type is " + calType + " Expected: gregorian");
        }

        // Set a list of locales
        logln("Set locales - en, en_JP, en_GB");
        ULocale[] locales = new ULocale[3];
        locales[0] = new ULocale("en");
        locales[1] = new ULocale("en_JP");
        locales[2] = new ULocale("en_GB");
        gp.setLocales(locales);

        cal = gp.getCalendar();
        ULocale calLocale = cal.getLocale(ULocale.VALID_LOCALE);
        if (!calLocale.equals(locales[2])) {
            errln("FAIL: Calendar locale is " + calLocale.toString() + " - Expected: en_GB");
        }

        // Set explicit calendar
        logln("Set Japanese calendar to this object");
        JapaneseCalendar jcal = new JapaneseCalendar();
        gp.setCalendar(jcal);
        cal = gp.getCalendar();
        calType = cal.getType();
        if 
(!calType.equals("japanese")) {
            errln("FAIL: Calendar type is " + calType + " Expected: japanese");
        }

        // Mutating the calendar passed to setCalendar must not affect the stored copy
        jcal.setFirstDayOfWeek(3);
        if (cal.getFirstDayOfWeek() == jcal.getFirstDayOfWeek()) {
            errln("FAIL: Calendar returned by getCalendar must be a safe copy");
        }
        // Mutating a calendar returned by getCalendar must not affect later results
        cal.setFirstDayOfWeek(3);
        Calendar cal1 = gp.getCalendar();
        if (cal1.getFirstDayOfWeek() == cal.getFirstDayOfWeek()) {
            errln("FAIL: Calendar returned by getCalendar must be a safe copy");
        }

        // Freeze
        logln("Freeze this object");
        IslamicCalendar ical = new IslamicCalendar();
        boolean bFrozen = false;
        gp.freeze();
        try {
            gp.setCalendar(ical);
        } catch (UnsupportedOperationException uoe) {
            logln("setCalendar is blocked");
            bFrozen = true;
        }
        if (!bFrozen) {
            errln("FAIL: setCalendar must be blocked");
        }

        // Safe clone: the thawed clone must carry over the explicit calendar
        logln("cloneAsThawed");
        GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed();
        cal = gp1.getCalendar();
        calType = cal.getType();
        if (!calType.equals("japanese")) {
            errln("FAIL: Calendar type after clone is " + calType + " Expected: japanese");
        }

        logln("Set islamic calendar");
        gp1.setCalendar(ical);
        cal = gp1.getCalendar();
        calType = cal.getType();
        if (!calType.equals("islamic")) {
            errln("FAIL: Calendar type after clone is " + calType + " Expected: islamic");
        }
    }

    public void TestTimeZone() {
        GlobalizationPreferences gp = new GlobalizationPreferences();

        // Set locale - zh_CN
        logln("Set locale - zh_CN");
        gp.setLocale(new ULocale("zh_CN"));
        TimeZone tz = gp.getTimeZone();
        String tzid = tz.getID();
        if (!tzid.equals("Asia/Shanghai")) {
            errln("FAIL: Time zone ID is " + tzid + " Expected: Asia/Shanghai");
        }

        // Set locale - en
        logln("Set locale - en");
        gp.setLocale(new ULocale("en"));
        tz = gp.getTimeZone();
        tzid = tz.getID();
        if (!tzid.equals("America/New_York")) {
            errln("FAIL: Time zone ID is " + tzid + " Expected: America/New_York");
        }

        // Set territory - GB
        logln("Set territory - GB");
        gp.setTerritory("GB");
        tz = gp.getTimeZone();
        tzid = tz.getID();
        if (!tzid.equals("Europe/London")) {
errln("FAIL: Time zone ID is " + tzid + " Expected: Europe/London"); } // Check if getTimeZone returns a safe clone tz.setID("Bad_ID"); tz = gp.getTimeZone(); tzid = tz.getID(); if (!tzid.equals("Europe/London")) { errln("FAIL: Time zone ID is " + tzid + " Expected: Europe/London"); } // Set explicit time zone TimeZone jst = TimeZone.getTimeZone("Asia/Tokyo"); String customJstId = "Japan_Standard_Time"; jst.setID(customJstId); gp.setTimeZone(jst); tz = gp.getTimeZone(); tzid = tz.getID(); if (!tzid.equals(customJstId)) { errln("FAIL: Time zone ID is " + tzid + " Expected: " + customJstId); } // Freeze logln("Freeze this object"); TimeZone cst = TimeZone.getTimeZone("Europe/Paris"); boolean bFrozen = false; gp.freeze(); try { gp.setTimeZone(cst); } catch (UnsupportedOperationException uoe) { logln("setTimeZone is blocked"); bFrozen = true; } if (!bFrozen) { errln("FAIL: setTimeZone must be blocked"); } // Modifiable clone logln("cloneAsThawed"); GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed(); tz = gp1.getTimeZone(); tzid = tz.getID(); if (!tzid.equals(customJstId)) { errln("FAIL: Time zone ID is " + tzid + " Expected: " + customJstId); } // Set explicit time zone gp1.setTimeZone(cst); tz = gp1.getTimeZone(); tzid = tz.getID(); if (!tzid.equals(cst.getID())) { errln("FAIL: Time zone ID is " + tzid + " Expected: " + cst.getID()); } } public void TestCollator() { GlobalizationPreferences gp = new GlobalizationPreferences(); // Set locale - tr logln("Set locale - tr"); gp.setLocale(new ULocale("tr")); Collator coll = gp.getCollator(); String locStr = coll.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("tr")) { errln("FAIL: Collator locale is " + locStr + " Expected: tr"); } // Unsupported collator locale - zun logln("Set locale - zun"); gp.setLocale(new ULocale("zun")); coll = gp.getCollator(); locStr = coll.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("root")) { errln("FAIL: Collator locale is " + locStr + 
" Expected: root"); } // Set locales - en_JP, fr, en_US, fr_FR logln("Set locale - en_JP, fr, en_US, fr_FR"); ULocale[] locales = new ULocale[4]; locales[0] = new ULocale("en_JP"); locales[1] = new ULocale("fr"); locales[2] = new ULocale("en_US"); locales[3] = new ULocale("fr_FR"); gp.setLocales(locales); coll = gp.getCollator(); locStr = coll.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("fr_FR")) { errln("FAIL: Collator locale is " + locStr + " Expected: fr_FR"); } // Set explicit Collator Collator coll1 = Collator.getInstance(new ULocale("it")); coll1.setDecomposition(Collator.CANONICAL_DECOMPOSITION); logln("Set collator for it in canonical deconposition mode"); gp.setCollator(coll1); coll1.setStrength(Collator.IDENTICAL); coll = gp.getCollator(); locStr = coll.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("it")) { errln("FAIL: Collator locale is " + locStr + " Expected: it"); } if (coll1.equals(coll)) { errln("FAIL: setCollator must use a safe copy of a Collator"); } // Freeze logln("Freeze this object"); boolean isFrozen = false; gp.freeze(); try { gp.setCollator(coll1); } catch (UnsupportedOperationException uoe) { logln("setCollator is blocked"); isFrozen = true; } if (!isFrozen) { errln("FAIL: setCollator must be blocked after freeze"); } // Modifiable clone logln("cloneAsThawed"); GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed(); coll = gp1.getCollator(); locStr = coll.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("it")) { errln("FAIL: Collator locale is " + locStr + " Expected: it"); } if (coll.getDecomposition() != Collator.CANONICAL_DECOMPOSITION) { errln("FAIL: Decomposition mode is not CANONICAL_DECOMPOSITION"); } // Set custom collator again gp1.setCollator(coll1); coll = gp1.getCollator(); if (coll.getStrength() != Collator.IDENTICAL) { errln("FAIL: Strength is not IDENTICAL"); } } public void TestBreakIterator() { GlobalizationPreferences gp = new 
GlobalizationPreferences(); // Unsupported break iterator locale - aar logln("Set locale - aar"); gp.setLocale(new ULocale("aar")); BreakIterator brk = gp.getBreakIterator(GlobalizationPreferences.BI_LINE); String locStr = brk.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("root")) { errln("FAIL: Line break iterator locale is " + locStr + " Expected: root"); } // Set locale - es logln("Set locale - es"); gp.setLocale(new ULocale("es")); brk = gp.getBreakIterator(GlobalizationPreferences.BI_CHARACTER); /* TODO: JB#5383 locStr = brk.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("es")) { errln("FAIL: Character break iterator locale is " + locStr + " Expected: es"); } */ // Set explicit break sentence iterator logln("Set break iterator for sentence using locale hu_HU"); BreakIterator brk1 = BreakIterator.getSentenceInstance(new ULocale("hu_HU")); gp.setBreakIterator(GlobalizationPreferences.BI_SENTENCE, brk1); brk = gp.getBreakIterator(GlobalizationPreferences.BI_SENTENCE); /* TODO: JB#5210 locStr = brk.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("hu_HU")) { errln("FAIL: Sentence break locale is " + locStr + " Expected: hu_HU"); } */ brk.setText("This is a test case. 
Is this a new instance?"); brk.next(); if (brk1.current() == brk.current()) { errln("FAIL: getBreakIterator must return a new instance"); } // Illegal argument logln("Get break iterator type 100"); boolean illegalArg = false; try { brk = gp.getBreakIterator(100); } catch (IllegalArgumentException iae) { logln("Break iterator type 100 is illegal"); illegalArg = true; } if (!illegalArg) { errln("FAIL: getBreakIterator must throw IllegalArgumentException for type 100"); } logln("Set break iterator type -1"); illegalArg = false; try { gp.setBreakIterator(-1, brk1); } catch (IllegalArgumentException iae) { logln("Break iterator type -1 is illegal"); illegalArg = true; } if (!illegalArg) { errln("FAIL: setBreakIterator must throw IllegalArgumentException for type -1"); } // Freeze logln("Freeze this object"); BreakIterator brk2 = BreakIterator.getTitleInstance(new ULocale("es_MX")); boolean isFrozen = false; gp.freeze(); try { gp.setBreakIterator(GlobalizationPreferences.BI_TITLE, brk2); } catch (UnsupportedOperationException uoe) { logln("setBreakIterator is blocked"); isFrozen = true; } if (!isFrozen) { errln("FAIL: setBreakIterator must be blocked after frozen"); } // Modifiable clone logln("cloneAsThawed"); GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed(); brk = gp1.getBreakIterator(GlobalizationPreferences.BI_WORD); /* TODO: JB#5383 locStr = brk.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("es")) { errln("FAIL: Word break iterator locale is " + locStr + " Expected: es"); } */ ULocale frFR = new ULocale("fr_FR"); BreakIterator brkC = BreakIterator.getCharacterInstance(frFR); BreakIterator brkW = BreakIterator.getWordInstance(frFR); BreakIterator brkL = BreakIterator.getLineInstance(frFR); BreakIterator brkS = BreakIterator.getSentenceInstance(frFR); BreakIterator brkT = BreakIterator.getTitleInstance(frFR); gp1.setBreakIterator(GlobalizationPreferences.BI_CHARACTER, brkC); 
gp1.setBreakIterator(GlobalizationPreferences.BI_WORD, brkW); gp1.setBreakIterator(GlobalizationPreferences.BI_LINE, brkL); gp1.setBreakIterator(GlobalizationPreferences.BI_SENTENCE, brkS); gp1.setBreakIterator(GlobalizationPreferences.BI_TITLE, brkT); /* TODO: JB#5210 locStr = brkC.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("ja_JP")) { errln("FAIL: Character break iterator locale is " + locStr + " Expected: fr_FR"); } locStr = brkW.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("ja_JP")) { errln("FAIL: Word break iterator locale is " + locStr + " Expected: fr_FR"); } locStr = brkL.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("ja_JP")) { errln("FAIL: Line break iterator locale is " + locStr + " Expected: fr_FR"); } locStr = brkS.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("ja_JP")) { errln("FAIL: Sentence break iterator locale is " + locStr + " Expected: fr_FR"); } locStr = brkT.getLocale(ULocale.VALID_LOCALE).toString(); if (!locStr.equals("ja_JP")) { errln("FAIL: Title break iterator locale is " + locStr + " Expected: fr_FR"); } */ } public void TestDisplayName() { GlobalizationPreferences gp = new GlobalizationPreferences(); ULocale loc_fr_FR_Paris = new ULocale("fr_FR_Paris"); ULocale loc_peo = new ULocale("peo"); // Locale list - fr_FR_Paris ArrayList locales1 = new ArrayList(1); locales1.add(loc_fr_FR_Paris); // Locale list - ain, fr_FR_Paris ArrayList locales2 = new ArrayList(2); locales2.add(loc_peo); locales2.add(loc_fr_FR_Paris); logln("Locales: | | "); // ID_LOCALE String id = "zh_Hant_HK"; String name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_LOCALE); gp.setLocales(locales1); String name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_LOCALE); gp.setLocales(locales2); String name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_LOCALE); logln("Locale[zh_Hant_HK]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { 
errln("FAIL: Locale ID"); } // ID_LANGUAGE gp.reset(); id = "fr"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_LANGUAGE); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_LANGUAGE); gp.setLocales(locales2); name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_LANGUAGE); logln("Language[fr]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { errln("FAIL: Language ID"); } // ID_SCRIPT gp.reset(); id = "cyrl"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_SCRIPT); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_SCRIPT); gp.setLocales(locales2); name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_SCRIPT); logln("Script[cyrl]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { errln("FAIL: Script ID"); } // ID_TERRITORY gp.reset(); id = "JP"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_TERRITORY); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_TERRITORY); gp.setLocales(locales2); name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_TERRITORY); logln("Territory[JP]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { errln("FAIL: Territory ID"); } // ID_VARIANT gp.reset(); id = "NEDIS"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_VARIANT); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_VARIANT); gp.setLocales(locales2); name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_VARIANT); logln("Variant[NEDIS]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { errln("FAIL: Variant ID"); } // ID_KEYWORD gp.reset(); id = "collation"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_KEYWORD); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_KEYWORD); 
gp.setLocales(locales2); name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_KEYWORD); logln("Keyword[collation]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { errln("FAIL: Keyword ID"); } // ID_KEYWORD_VALUE gp.reset(); id = "collation=traditional"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_KEYWORD_VALUE); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_KEYWORD_VALUE); gp.setLocales(locales2); name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_KEYWORD_VALUE); logln("Keyword value[traditional]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { errln("FAIL: Keyword value ID"); } // ID_CURRENCY_SYMBOL gp.reset(); id = "USD"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_CURRENCY_SYMBOL); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_CURRENCY_SYMBOL); gp.setLocales(locales2); name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_CURRENCY_SYMBOL); logln("Currency symbol[USD]: " + name1 + " | " + name2 + " | " + name3); String dollar = "$"; String us_dollar = "$US"; if (!name1.equals(dollar) || !name2.equals(us_dollar) || !name3.equals(us_dollar)) { errln("FAIL: Currency symbol ID"); } // ID_CURRENCY gp.reset(); id = "USD"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_CURRENCY); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_CURRENCY); gp.setLocales(locales2); name3 = gp.getDisplayName(id, GlobalizationPreferences.ID_CURRENCY); logln("Currency[USD]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { errln("FAIL: Currency ID"); } // ID_TIMEZONE gp.reset(); id = "Europe/Paris"; name1 = gp.getDisplayName(id, GlobalizationPreferences.ID_TIMEZONE); gp.setLocales(locales1); name2 = gp.getDisplayName(id, GlobalizationPreferences.ID_TIMEZONE); gp.setLocales(locales2); name3 = 
gp.getDisplayName(id, GlobalizationPreferences.ID_TIMEZONE); logln("Timezone[Europe/Paris]: " + name1 + " | " + name2 + " | " + name3); if (name1.equals(name2) || !name2.equals(name3)) { errln("FAIL: Timezone ID"); } // Illegal ID gp.reset(); boolean illegalArg = false; try { name1 = gp.getDisplayName(id, -1); } catch (IllegalArgumentException iae) { logln("Illegal type -1"); illegalArg = true; } if (!illegalArg) { errln("FAIL: getDisplayName must throw IllegalArgumentException for type -1"); } illegalArg = false; try { name1 = gp.getDisplayName(id, 100); } catch (IllegalArgumentException iae) { logln("Illegal type 100"); illegalArg = true; } if (!illegalArg) { errln("FAIL: getDisplayName must throw IllegalArgumentException for type 100"); } } public void TestDateFormat() { GlobalizationPreferences gp = new GlobalizationPreferences(); String pattern; DateFormat df; // Set unsupported locale - ach logln("Set locale - ach"); gp.setLocale(new ULocale("ach")); // Date - short df = gp.getDateFormat(GlobalizationPreferences.DF_SHORT, GlobalizationPreferences.DF_NONE); pattern = ((SimpleDateFormat)df).toPattern(); // root pattern must be used if (!pattern.equals("yyyy-MM-dd")) { errln("FAIL: SHORT date pattern is " + pattern + " Expected: yyyy-MM-dd"); } // Set locale - fr, fr_CA, fr_FR ArrayList lcls = new ArrayList(3); lcls.add(new ULocale("fr")); lcls.add(new ULocale("fr_CA")); lcls.add(new ULocale("fr_FR")); logln("Set locales - fr, fr_CA, fr_FR"); gp.setLocales(lcls); // Date - short df = gp.getDateFormat(GlobalizationPreferences.DF_SHORT, GlobalizationPreferences.DF_NONE); pattern = ((SimpleDateFormat)df).toPattern(); // fr_CA pattern must be used if (!pattern.equals("yy-MM-dd")) { errln("FAIL: SHORT date pattern is " + pattern + " Expected: yy-MM-dd"); } // Set locale - en_GB logln("Set locale - en_GB"); gp.setLocale(new ULocale("en_GB")); // Date - full df = gp.getDateFormat(GlobalizationPreferences.DF_FULL, GlobalizationPreferences.DF_NONE); pattern = 
((SimpleDateFormat)df).toPattern(); if (!pattern.equals("EEEE, d MMMM y")) { errln("FAIL: FULL date pattern is " + pattern + " Expected: EEEE, d MMMM y"); } // Date - long df = gp.getDateFormat(GlobalizationPreferences.DF_LONG, GlobalizationPreferences.DF_NONE); pattern = ((SimpleDateFormat)df).toPattern(); if (!pattern.equals("d MMMM y")) { errln("FAIL: LONG date pattern is " + pattern + " Expected: d MMMM y"); } // Date - medium df = gp.getDateFormat(GlobalizationPreferences.DF_MEDIUM, GlobalizationPreferences.DF_NONE); pattern = ((SimpleDateFormat)df).toPattern(); if (!pattern.equals("d MMM y")) { errln("FAIL: MEDIUM date pattern is " + pattern + " Expected: d MMM y"); } // Date - short df = gp.getDateFormat(GlobalizationPreferences.DF_SHORT, GlobalizationPreferences.DF_NONE); pattern = ((SimpleDateFormat)df).toPattern(); if (!pattern.equals("dd/MM/yyyy")) { errln("FAIL: SHORT date pattern is " + pattern + " Expected: dd/MM/yyyy"); } // Time - full df = gp.getDateFormat(GlobalizationPreferences.DF_NONE, GlobalizationPreferences.DF_FULL); pattern = ((SimpleDateFormat)df).toPattern(); if (!pattern.equals("HH:mm:ss zzzz")) { errln("FAIL: FULL time pattern is " + pattern + " Expected: HH:mm:ss zzzz"); } // Time - long df = gp.getDateFormat(GlobalizationPreferences.DF_NONE, GlobalizationPreferences.DF_LONG); pattern = ((SimpleDateFormat)df).toPattern(); if (!pattern.equals("HH:mm:ss z")) { errln("FAIL: LONG time pattern is " + pattern + " Expected: HH:mm:ss z"); } // Time - medium df = gp.getDateFormat(GlobalizationPreferences.DF_NONE, GlobalizationPreferences.DF_MEDIUM); pattern = ((SimpleDateFormat)df).toPattern(); if (!pattern.equals("HH:mm:ss")) { errln("FAIL: MEDIUM time pattern is " + pattern + " Expected: HH:mm:ss"); } // Time - short df = gp.getDateFormat(GlobalizationPreferences.DF_NONE, GlobalizationPreferences.DF_SHORT); pattern = ((SimpleDateFormat)df).toPattern(); if (!pattern.equals("HH:mm")) { errln("FAIL: SHORT time pattern is " + pattern + " 
Expected: HH:mm"); } // Date/Time - full df = gp.getDateFormat(GlobalizationPreferences.DF_FULL, GlobalizationPreferences.DF_FULL); pattern = ((SimpleDateFormat)df).toPattern(); if (!pattern.equals("EEEE, d MMMM y HH:mm:ss zzzz")) { errln("FAIL: FULL date/time pattern is " + pattern + " Expected: EEEE, d MMMM y HH:mm:ss zzzz"); } // Invalid style boolean illegalArg = false; try { df = gp.getDateFormat(-1, GlobalizationPreferences.DF_NONE); } catch (IllegalArgumentException iae) { logln("Illegal date style -1"); illegalArg = true; } if (!illegalArg) { errln("FAIL: getDateFormat() must throw IllegalArgumentException for dateStyle -1"); } illegalArg = false; try { df = gp.getDateFormat(GlobalizationPreferences.DF_NONE, GlobalizationPreferences.DF_NONE); } catch (IllegalArgumentException iae) { logln("Illegal style - dateStyle:DF_NONE / timeStyle:DF_NONE"); illegalArg = true; } if (!illegalArg) { errln("FAIL: getDateFormat() must throw IllegalArgumentException for dateStyle:DF_NONE/timeStyle:DF_NONE"); } // Set explicit time zone logln("Set timezone - America/Sao_Paulo"); TimeZone tz = TimeZone.getTimeZone("America/Sao_Paulo"); gp.setTimeZone(tz); df = gp.getDateFormat(GlobalizationPreferences.DF_LONG, GlobalizationPreferences.DF_MEDIUM); String tzid = df.getTimeZone().getID(); if (!tzid.equals("America/Sao_Paulo")) { errln("FAIL: The DateFormat instance must use timezone America/Sao_Paulo"); } // Set explicit calendar logln("Set calendar - japanese"); Calendar jcal = new JapaneseCalendar(); jcal.setTimeZone(TimeZone.getTimeZone("Asia/Tokyo")); gp.setCalendar(jcal); df = gp.getDateFormat(GlobalizationPreferences.DF_SHORT, GlobalizationPreferences.DF_SHORT); Calendar dfCal = df.getCalendar(); if (!(dfCal instanceof JapaneseCalendar)) { errln("FAIL: The DateFormat instance must use Japanese calendar"); } // TimeZone must be still America/Sao_Paulo tzid = df.getTimeZone().getID(); if (!tzid.equals("America/Sao_Paulo")) { errln("FAIL: The DateFormat instance must use 
timezone America/Sao_Paulo"); } // Set explicit DateFormat logln("Set explicit date format - full date"); DateFormat customFD = DateFormat.getDateInstance(new IslamicCalendar(), DateFormat.FULL, new ULocale("ar_SA")); customFD.setTimeZone(TimeZone.getTimeZone("Asia/Riyadh")); gp.setDateFormat(GlobalizationPreferences.DF_FULL, GlobalizationPreferences.DF_NONE, customFD); df = gp.getDateFormat(GlobalizationPreferences.DF_FULL, GlobalizationPreferences.DF_NONE); dfCal = df.getCalendar(); if (!(dfCal instanceof IslamicCalendar)) { errln("FAIL: The DateFormat instance must use Islamic calendar"); } // TimeZone in the custom DateFormat is overridden by GP's timezone setting tzid = df.getTimeZone().getID(); if (!tzid.equals("America/Sao_Paulo")) { errln("FAIL: The DateFormat instance must use timezone America/Sao_Paulo"); } // Freeze logln("Freeze this object"); gp.freeze(); DateFormat customLD = DateFormat.getDateInstance(new BuddhistCalendar(), DateFormat.LONG, new ULocale("th")); customLD.setTimeZone(TimeZone.getTimeZone("Asia/Bangkok")); boolean isFrozen = false; try { gp.setDateFormat(GlobalizationPreferences.DF_LONG, GlobalizationPreferences.DF_NONE, customLD); } catch (UnsupportedOperationException uoe) { logln("setDateFormat is blocked"); isFrozen = true; } if (!isFrozen) { errln("FAIL: setDateFormat must be blocked after frozen"); } // Modifiable clone logln("cloneAsThawed"); GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed(); gp1.setDateFormat(GlobalizationPreferences.DF_LONG, GlobalizationPreferences.DF_NONE, customLD); df = gp1.getDateFormat(GlobalizationPreferences.DF_SHORT, GlobalizationPreferences.DF_SHORT); dfCal = df.getCalendar(); if (!(dfCal instanceof JapaneseCalendar)) { errln("FAIL: The DateFormat instance must use Japanese calendar"); } // TimeZone must be still America/Sao_Paulo tzid = df.getTimeZone().getID(); if (!tzid.equals("America/Sao_Paulo")) { errln("FAIL: The DateFormat instance must use timezone 
America/Sao_Paulo"); } df = gp1.getDateFormat(GlobalizationPreferences.DF_LONG, GlobalizationPreferences.DF_NONE); dfCal = df.getCalendar(); if (!(dfCal instanceof BuddhistCalendar)) { errln("FAIL: The DateFormat instance must use Buddhist calendar"); } // TimeZone must be still America/Sao_Paulo tzid = df.getTimeZone().getID(); if (!tzid.equals("America/Sao_Paulo")) { errln("FAIL: The DateFormat instance must use timezone America/Sao_Paulo"); } } public void TestNumberFormat() { GlobalizationPreferences gp = new GlobalizationPreferences(); NumberFormat nf; String numStr; double num = 123456.789; // Set unsupported locale with supported territory ang_KR logln("Set locale - ang_KR"); gp.setLocale(new ULocale("ang_KR")); nf = gp.getNumberFormat(GlobalizationPreferences.NF_CURRENCY); numStr = nf.format(num); if (!numStr.equals("\u20a9\u00a0123,457")) { errln("FAIL: Number string is " + numStr + " Expected: \u20a9\u00a0123,457"); } // Set locale - de_DE logln("Set locale - de_DE"); gp.setLocale(new ULocale("de_DE")); // NF_NUMBER logln("NUMBER type"); nf = gp.getNumberFormat(GlobalizationPreferences.NF_NUMBER); numStr = nf.format(num); if (!numStr.equals("123.456,789")) { errln("FAIL: Number string is " + numStr + " Expected: 123.456,789"); } // NF_CURRENCY logln("CURRENCY type"); nf = gp.getNumberFormat(GlobalizationPreferences.NF_CURRENCY); numStr = nf.format(num); if (!numStr.equals("123.456,79\u00a0\u20AC")) { errln("FAIL: Number string is " + numStr + " Expected: 123.456,79\u00a0\u20AC"); } // NF_PERCENT logln("PERCENT type"); nf = gp.getNumberFormat(GlobalizationPreferences.NF_PERCENT); numStr = nf.format(num); if (!numStr.equals("12.345.679\u00a0%")) { errln("FAIL: Number string is " + numStr + " Expected: 12.345.679\u00a0%"); } // NF_SCIENTIFIC logln("SCIENTIFIC type"); nf = gp.getNumberFormat(GlobalizationPreferences.NF_SCIENTIFIC); numStr = nf.format(num); if (!numStr.equals("1,23456789E5")) { errln("FAIL: Number string is " + numStr + " Expected: 
1,23456789E5"); } // NF_INTEGER logln("INTEGER type"); nf = gp.getNumberFormat(GlobalizationPreferences.NF_INTEGER); numStr = nf.format(num); if (!numStr.equals("123.457")) { errln("FAIL: Number string is " + numStr + " Expected: 123.457"); } // Invalid number type logln("INVALID type"); boolean illegalArg = false; try { nf = gp.getNumberFormat(100); } catch (IllegalArgumentException iae) { logln("Illegal number format type 100"); illegalArg = true; } if (!illegalArg) { errln("FAIL: getNumberFormat must throw IllegalArgumentException for type 100"); } illegalArg = false; try { nf = gp.getNumberFormat(-1); } catch (IllegalArgumentException iae) { logln("Illegal number format type -1"); illegalArg = true; } if (!illegalArg) { errln("FAIL: getNumberFormat must throw IllegalArgumentException for type -1"); } // Set explicit territory logln("Set territory - US"); gp.setTerritory("US"); nf = gp.getNumberFormat(GlobalizationPreferences.NF_CURRENCY); numStr = nf.format(num); if (!numStr.equals("123.456,79\u00a0$")) { errln("FAIL: Number string is " + numStr + " Expected: 123.456,79\u00a0$"); } // Set explicit currency logln("Set currency - GBP"); gp.setCurrency(Currency.getInstance("GBP")); nf = gp.getNumberFormat(GlobalizationPreferences.NF_CURRENCY); numStr = nf.format(num); if (!numStr.equals("123.456,79\u00a0\u00A3")) { errln("FAIL: Number string is " + numStr + " Expected: 123.456,79\u00a0\u00A3"); } // Set explicit NumberFormat logln("Set explicit NumberFormat objects"); NumberFormat customNum = NumberFormat.getNumberInstance(new ULocale("he_IL")); gp.setNumberFormat(GlobalizationPreferences.NF_NUMBER, customNum); NumberFormat customCur = NumberFormat.getCurrencyInstance(new ULocale("zh_CN")); gp.setNumberFormat(GlobalizationPreferences.NF_CURRENCY, customCur); NumberFormat customPct = NumberFormat.getPercentInstance(new ULocale("el_GR")); gp.setNumberFormat(GlobalizationPreferences.NF_PERCENT, customPct); NumberFormat customSci = 
NumberFormat.getScientificInstance(new ULocale("ru_RU")); gp.setNumberFormat(GlobalizationPreferences.NF_SCIENTIFIC, customSci); NumberFormat customInt = NumberFormat.getIntegerInstance(new ULocale("pt_PT")); gp.setNumberFormat(GlobalizationPreferences.NF_INTEGER, customInt); nf = gp.getNumberFormat(GlobalizationPreferences.NF_NUMBER); if (!nf.getLocale(ULocale.VALID_LOCALE).toString().equals("he_IL")) { errln("FAIL: The NumberFormat instance must use locale he_IL"); } nf = gp.getNumberFormat(GlobalizationPreferences.NF_CURRENCY); if (!nf.getLocale(ULocale.VALID_LOCALE).toString().equals("zh_CN")) { errln("FAIL: The NumberFormat instance must use locale zh_CN"); } nf = gp.getNumberFormat(GlobalizationPreferences.NF_PERCENT); if (!nf.getLocale(ULocale.VALID_LOCALE).toString().equals("el_GR")) { errln("FAIL: The NumberFormat instance must use locale el_GR"); } nf = gp.getNumberFormat(GlobalizationPreferences.NF_SCIENTIFIC); if (!nf.getLocale(ULocale.VALID_LOCALE).toString().equals("ru_RU")) { errln("FAIL: The NumberFormat instance must use locale ru_RU"); } nf = gp.getNumberFormat(GlobalizationPreferences.NF_INTEGER); if (!nf.getLocale(ULocale.VALID_LOCALE).toString().equals("pt_PT")) { errln("FAIL: The NumberFormat instance must use locale pt_PT"); } NumberFormat customNum1 = NumberFormat.getNumberInstance(new ULocale("hi_IN")); // Freeze logln("Freeze this object"); boolean isFrozen = false; gp.freeze(); try { gp.setNumberFormat(GlobalizationPreferences.NF_NUMBER, customNum1); } catch (UnsupportedOperationException uoe) { logln("setNumberFormat is blocked"); isFrozen = true; } if (!isFrozen) { errln("FAIL: setNumberFormat must be blocked after frozen"); } // Create a modifiable clone GlobalizationPreferences gp1 = (GlobalizationPreferences)gp.cloneAsThawed(); // Number type format's locale is still he_IL nf = gp1.getNumberFormat(GlobalizationPreferences.NF_NUMBER); if (!nf.getLocale(ULocale.VALID_LOCALE).toString().equals("he_IL")) { errln("FAIL: The NumberFormat 
instance must use locale he_IL"); } logln("Set custom number format using locale hi_IN"); gp1.setNumberFormat(GlobalizationPreferences.NF_NUMBER, customNum1); nf = gp1.getNumberFormat(GlobalizationPreferences.NF_NUMBER); if (!nf.getLocale(ULocale.VALID_LOCALE).toString().equals("hi_IN")) { errln("FAIL: The NumberFormat instance must use locale hi_IN"); } } /* * JB#5380 GlobalizationPreferences#getCalendar() should return a Calendar object * initialized with the current time */ public void TestJB5380() { GlobalizationPreferences gp = new GlobalizationPreferences(); GregorianCalendar gcal = new GregorianCalendar(); // set way old date gcal.set(Calendar.YEAR, 1950); // set calendar to GP gp.setCalendar(gcal); Calendar cal = gp.getCalendar(); // Calendar instance returned from GP should be initialized // by the current time long timeDiff = System.currentTimeMillis() - cal.getTimeInMillis(); if (Math.abs(timeDiff) > 1000) { // if difference is more than 1 second.. errln("FAIL: The Calendar was not initialized by current time - difference:" + timeDiff); } } }
icu4j-4.2/src/com/ibm/icu/dev/test/format/IntlTestDecimalFormatAPIC.java
//##header /* ******************************************************************************* * Copyright (C) 2001-2009, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ /** * Port From: ICU4C v1.8.1 : format : IntlTestDecimalFormatAPI * Source File: $ICU4CRoot/source/test/intltest/dcfmapts.cpp **/ package com.ibm.icu.dev.test.format; import java.text.AttributedCharacterIterator; import java.text.FieldPosition; import java.text.Format; import java.text.ParsePosition; import java.util.Iterator; import java.util.Locale; import java.util.Vector; import com.ibm.icu.text.DecimalFormat; import com.ibm.icu.text.DecimalFormatSymbols; import com.ibm.icu.text.NumberFormat; // This is an API test, not a unit test. It doesn't test very many cases, and doesn't // try to test the full functionality. It just calls each function in the class and // verifies that it works on a basic level. public class IntlTestDecimalFormatAPIC extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new IntlTestDecimalFormatAPIC().run(args); } // This test checks various generic API methods in DecimalFormat to achieve 100% API coverage. 
public void TestAPI() { logln("DecimalFormat API test---"); logln(""); Locale.setDefault(Locale.ENGLISH); // ======= Test constructors logln("Testing DecimalFormat constructors"); DecimalFormat def = new DecimalFormat(); final String pattern = new String("#,##0.# FF"); DecimalFormat pat = null; try { pat = new DecimalFormat(pattern); } catch (IllegalArgumentException e) { errln("ERROR: Could not create DecimalFormat (pattern)"); } DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.FRENCH); DecimalFormat cust1 = new DecimalFormat(pattern, symbols); // ======= Test clone(), assignment, and equality logln("Testing clone() and equality operators"); Format clone = (Format) def.clone(); if (!def.equals(clone)) { errln("ERROR: Clone() failed"); } // ======= Test various format() methods logln("Testing various format() methods"); // final double d = -10456.0037; // this appears as -10456.003700000001 on NT // final double d = -1.04560037e-4; // this appears as -1.0456003700000002E-4 on NT final double d = -10456.00370000000000; // this works! 
final long l = 100000000; logln("" + Double.toString(d) + " is the double value"); StringBuffer res1 = new StringBuffer(); StringBuffer res2 = new StringBuffer(); StringBuffer res3 = new StringBuffer(); StringBuffer res4 = new StringBuffer(); FieldPosition pos1 = new FieldPosition(0); FieldPosition pos2 = new FieldPosition(0); FieldPosition pos3 = new FieldPosition(0); FieldPosition pos4 = new FieldPosition(0); res1 = def.format(d, res1, pos1); logln("" + Double.toString(d) + " formatted to " + res1); res2 = pat.format(l, res2, pos2); logln("" + l + " formatted to " + res2); res3 = cust1.format(d, res3, pos3); logln("" + Double.toString(d) + " formatted to " + res3); res4 = cust1.format(l, res4, pos4); logln("" + l + " formatted to " + res4); // ======= Test parse() logln("Testing parse()"); String text = new String("-10,456.0037"); ParsePosition pos = new ParsePosition(0); String patt = new String("#,##0.#"); pat.applyPattern(patt); double d2 = pat.parse(text, pos).doubleValue(); if (d2 != d) { errln( "ERROR: Roundtrip failed (via parse(" + Double.toString(d2) + " != " + Double.toString(d) + ")) for " + text); } logln(text + " parsed into " + (long) d2); // ======= Test getters and setters logln("Testing getters and setters"); final DecimalFormatSymbols syms = pat.getDecimalFormatSymbols(); def.setDecimalFormatSymbols(syms); if (!pat.getDecimalFormatSymbols().equals(def.getDecimalFormatSymbols())) { errln("ERROR: set DecimalFormatSymbols() failed"); } String posPrefix; pat.setPositivePrefix("+"); posPrefix = pat.getPositivePrefix(); logln("Positive prefix (should be +): " + posPrefix); if (!"+".equals(posPrefix)) { errln("ERROR: setPositivePrefix() failed"); } String negPrefix; pat.setNegativePrefix("-"); negPrefix = pat.getNegativePrefix(); logln("Negative prefix (should be -): " + negPrefix); if (!"-".equals(negPrefix)) { errln("ERROR: setNegativePrefix() failed"); } String posSuffix; pat.setPositiveSuffix("_"); posSuffix = pat.getPositiveSuffix(); logln("Positive suffix 
(should be _): " + posSuffix); /* compare string contents, not references */ if (!"_".equals(posSuffix)) { errln("ERROR: setPositiveSuffix() failed"); } String negSuffix; pat.setNegativeSuffix("~"); negSuffix = pat.getNegativeSuffix(); logln("Negative suffix (should be ~): " + negSuffix); if (!"~".equals(negSuffix)) { errln("ERROR: setNegativeSuffix() failed"); } long multiplier = 0; pat.setMultiplier(8); multiplier = pat.getMultiplier(); logln("Multiplier (should be 8): " + multiplier); if (multiplier != 8) { errln("ERROR: setMultiplier() failed"); } int groupingSize = 0; pat.setGroupingSize(2); groupingSize = pat.getGroupingSize(); logln("Grouping size (should be 2): " + (long) groupingSize); if (groupingSize != 2) { errln("ERROR: setGroupingSize() failed"); } pat.setDecimalSeparatorAlwaysShown(true); boolean tf = pat.isDecimalSeparatorAlwaysShown(); logln( "DecimalSeparatorIsAlwaysShown (should be true) is " + (tf ? "true" : "false")); if (!tf) { errln("ERROR: setDecimalSeparatorAlwaysShown() failed"); } String funkyPat; funkyPat = pat.toPattern(); logln("Pattern is " + funkyPat); String locPat; locPat = pat.toLocalizedPattern(); logln("Localized pattern is " + locPat);
// ======= Test applyPattern()
logln("Testing applyPattern()"); String p1 = new String("#,##0.0#;(#,##0.0#)"); logln("Applying pattern " + p1); pat.applyPattern(p1); String s2; s2 = pat.toPattern(); logln("Extracted pattern is " + s2); if (!s2.equals(p1)) { errln("ERROR: toPattern() result did not match pattern applied"); } String p2 = new String("#,##0.0# FF;(#,##0.0# FF)"); logln("Applying pattern " + p2); pat.applyLocalizedPattern(p2); String s3; s3 = pat.toLocalizedPattern(); logln("Extracted pattern is " + s3); if (!s3.equals(p2)) { errln("ERROR: toLocalizedPattern() result did not match pattern applied"); }
// ======= Test getStaticClassID() // logln("Testing instanceof()"); // try { // NumberFormat test = new DecimalFormat(); // if (! 
(test instanceof DecimalFormat)) { // errln("ERROR: instanceof failed"); // } // } // catch (Exception e) { // errln("ERROR: Couldn't create a DecimalFormat"); // } } public void TestRounding() { double Roundingnumber = 2.55; double Roundingnumber1 = -2.55; //+2.55 results -2.55 results double result[] = { 3, -3, 2, -2, 3, -2, 2, -3, 3, -3, 3, -3, 3, -3 }; DecimalFormat pat = new DecimalFormat(); String s = ""; s = pat.toPattern(); logln("pattern = " + s); int mode; int i = 0; String message; String resultStr; for (mode = 0; mode < 7; mode++) { pat.setRoundingMode(mode); if (pat.getRoundingMode() != mode) { errln( "SetRoundingMode or GetRoundingMode failed for mode=" + mode); } //for +2.55 with RoundingIncrement=1.0 pat.setRoundingIncrement(1.0); resultStr = pat.format(Roundingnumber); message = "round(" + (double) Roundingnumber + "," + mode + ",FALSE) with RoundingIncrement=1.0==>"; verify(message, resultStr, result[i++]); message = ""; resultStr = ""; //for -2.55 with RoundingIncrement=1.0 resultStr = pat.format(Roundingnumber1); message = "round(" + (double) Roundingnumber1 + "," + mode + ",FALSE) with RoundingIncrement=1.0==>"; verify(message, resultStr, result[i++]); message = ""; resultStr = ""; } } //#if defined(FOUNDATION10) || defined(J2SE13) //#else public void testFormatToCharacterIterator() { Number number = new Double(350.76); Number negativeNumber = new Double(-350.76); Locale us = Locale.US; // test number instance t_Format(1, number, NumberFormat.getNumberInstance(us), getNumberVectorUS()); // test percent instance t_Format(3, number, NumberFormat.getPercentInstance(us), getPercentVectorUS()); // test permille pattern DecimalFormat format = new DecimalFormat("###0.##\u2030"); t_Format(4, number, format, getPermilleVector()); // test exponential pattern with positive exponent format = new DecimalFormat("00.0#E0"); t_Format(5, number, format, getPositiveExponentVector()); // test exponential pattern with negative exponent format = new 
DecimalFormat("0000.0#E0"); t_Format(6, number, format, getNegativeExponentVector());
// test currency instance with US Locale
t_Format(7, number, NumberFormat.getCurrencyInstance(us), getPositiveCurrencyVectorUS());
// test negative currency instance with US Locale
t_Format(8, negativeNumber, NumberFormat.getCurrencyInstance(us), getNegativeCurrencyVectorUS());
// test multiple grouping separators
number = new Long(100300400); t_Format(11, number, NumberFormat.getNumberInstance(us), getNumberVector2US());
// test 0
number = new Long(0); t_Format(12, number, NumberFormat.getNumberInstance(us), getZeroVector()); } private static Vector getNumberVectorUS() { Vector v = new Vector(); v.add(new FieldContainer(0, 3, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(3, 4, NumberFormat.Field.DECIMAL_SEPARATOR)); v.add(new FieldContainer(4, 6, NumberFormat.Field.FRACTION)); return v; }
// private static Vector getPositiveCurrencyVectorTR() {
// Vector v = new Vector();
// v.add(new FieldContainer(0, 3, NumberFormat.Field.INTEGER));
// v.add(new FieldContainer(4, 6, NumberFormat.Field.CURRENCY));
// return v;
// }
//
// private static Vector getNegativeCurrencyVectorTR() {
// Vector v = new Vector();
// v.add(new FieldContainer(0, 1, NumberFormat.Field.SIGN));
// v.add(new FieldContainer(1, 4, NumberFormat.Field.INTEGER));
// v.add(new FieldContainer(5, 7, NumberFormat.Field.CURRENCY));
// return v;
// }
private static Vector getPositiveCurrencyVectorUS() { Vector v = new Vector(); v.add(new FieldContainer(0, 1, NumberFormat.Field.CURRENCY)); v.add(new FieldContainer(1, 4, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(4, 5, NumberFormat.Field.DECIMAL_SEPARATOR)); v.add(new FieldContainer(5, 7, NumberFormat.Field.FRACTION)); return v; } private static Vector getNegativeCurrencyVectorUS() { Vector v = new Vector(); v.add(new FieldContainer(1, 2, NumberFormat.Field.CURRENCY)); v.add(new FieldContainer(2, 5, NumberFormat.Field.INTEGER)); v.add(new 
FieldContainer(5, 6, NumberFormat.Field.DECIMAL_SEPARATOR)); v.add(new FieldContainer(6, 8, NumberFormat.Field.FRACTION)); return v; } private static Vector getPercentVectorUS() { Vector v = new Vector(); v.add(new FieldContainer(0, 2, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(2, 3, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(2, 3, NumberFormat.Field.GROUPING_SEPARATOR)); v.add(new FieldContainer(3, 6, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(6, 7, NumberFormat.Field.PERCENT)); return v; } private static Vector getPermilleVector() { Vector v = new Vector(); v.add(new FieldContainer(0, 6, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(6, 7, NumberFormat.Field.PERMILLE)); return v; } private static Vector getNegativeExponentVector() { Vector v = new Vector(); v.add(new FieldContainer(0, 4, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(4, 5, NumberFormat.Field.DECIMAL_SEPARATOR)); v.add(new FieldContainer(5, 6, NumberFormat.Field.FRACTION)); v.add(new FieldContainer(6, 7, NumberFormat.Field.EXPONENT_SYMBOL)); v.add(new FieldContainer(7, 8, NumberFormat.Field.EXPONENT_SIGN)); v.add(new FieldContainer(8, 9, NumberFormat.Field.EXPONENT)); return v; } private static Vector getPositiveExponentVector() { Vector v = new Vector(); v.add(new FieldContainer(0, 2, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(2, 3, NumberFormat.Field.DECIMAL_SEPARATOR)); v.add(new FieldContainer(3, 5, NumberFormat.Field.FRACTION)); v.add(new FieldContainer(5, 6, NumberFormat.Field.EXPONENT_SYMBOL)); v.add(new FieldContainer(6, 7, NumberFormat.Field.EXPONENT)); return v; } private static Vector getNumberVector2US() { Vector v = new Vector(); v.add(new FieldContainer(0, 3, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(3, 4, NumberFormat.Field.GROUPING_SEPARATOR)); v.add(new FieldContainer(3, 4, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(4, 7, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(7, 8, 
NumberFormat.Field.GROUPING_SEPARATOR)); v.add(new FieldContainer(7, 8, NumberFormat.Field.INTEGER)); v.add(new FieldContainer(8, 11, NumberFormat.Field.INTEGER)); return v; } private static Vector getZeroVector() { Vector v = new Vector(); v.add(new FieldContainer(0, 1, NumberFormat.Field.INTEGER)); return v; } private void t_Format(int count, Object object, Format format, Vector expectedResults) { Vector results = findFields(format.formatToCharacterIterator(object)); assertTrue("Test " + count + ": Format returned incorrect CharacterIterator for " + format.format(object), compare(results, expectedResults)); } /** * compares two vectors regardless of the order of their elements */ private static boolean compare(Vector vector1, Vector vector2) { return vector1.size() == vector2.size() && vector1.containsAll(vector2); } /** * finds attributes with regards to char index in this * AttributedCharacterIterator, and puts them in a vector * * @param iterator * @return a vector, each entry in this vector are of type FieldContainer , * which stores start and end indexes and an attribute this range * has */ private static Vector findFields(AttributedCharacterIterator iterator) { Vector result = new Vector(); while (iterator.getIndex() != iterator.getEndIndex()) { int start = iterator.getRunStart(); int end = iterator.getRunLimit(); Iterator it = iterator.getAttributes().keySet().iterator(); while (it.hasNext()) { AttributedCharacterIterator.Attribute attribute = (AttributedCharacterIterator.Attribute) it .next(); Object value = iterator.getAttribute(attribute); result.add(new FieldContainer(start, end, attribute, value)); // System.out.println(start + " " + end + ": " + attribute + ", // " + value ); // System.out.println("v.add(new FieldContainer(" + start +"," + // end +"," + attribute+ "," + value+ "));"); } iterator.setIndex(end); } return result; } protected static class FieldContainer { int start, end; AttributedCharacterIterator.Attribute attribute; Object value; // 
called from support_decimalformat and support_simpledateformat tests public FieldContainer(int start, int end, AttributedCharacterIterator.Attribute attribute) { this(start, end, attribute, attribute); } // called from support_messageformat tests public FieldContainer(int start, int end, AttributedCharacterIterator.Attribute attribute, int value) { this(start, end, attribute, new Integer(value)); } // called from support_messageformat tests public FieldContainer(int start, int end, AttributedCharacterIterator.Attribute attribute, Object value) { this.start = start; this.end = end; this.attribute = attribute; this.value = value; } public boolean equals(Object obj) { if (!(obj instanceof FieldContainer)) return false; FieldContainer fc = (FieldContainer) obj; return (start == fc.start && end == fc.end && attribute == fc.attribute && value.equals(fc.value)); } } //#endif /*Helper functions */ public void verify(String message, String got, double expected) { logln(message + got + " Expected : " + (long)expected); String expectedStr = ""; expectedStr=expectedStr + (long)expected; if(!got.equals(expectedStr) ) { errln("ERROR: Round() failed: " + message + got + " Expected : " + expectedStr); } } } //eof icu4j-4.2/src/com/ibm/icu/dev/test/format/IntlTestNumberFormat.java0000644000175000017500000002166711361046232025221 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2001-2007, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ /** * Port From: ICU4C v1.8.1 : format : IntlTestNumberFormat * Source File: $ICU4CRoot/source/test/intltest/tsnmfmt.cpp **/ package com.ibm.icu.dev.test.format; import java.util.Locale; import java.util.Random; import com.ibm.icu.text.*; /** * This test does round-trip testing (format -> parse -> format -> parse -> etc.) of * NumberFormat. 
*/ public class IntlTestNumberFormat extends com.ibm.icu.dev.test.TestFmwk { public NumberFormat fNumberFormat; public static void main(String[] args) throws Exception { new IntlTestNumberFormat().run(args); } /** * Internal use */ public void _testLocale(Locale locale) { String localeName = locale + " (" + locale.getDisplayName() + ")"; logln("Number test " + localeName); fNumberFormat = NumberFormat.getInstance(locale); _testFormat(); logln("Currency test " + localeName); fNumberFormat = NumberFormat.getCurrencyInstance(locale); _testFormat(); logln("Percent test " + localeName); fNumberFormat = NumberFormat.getPercentInstance(locale); _testFormat(); } /** * call _testFormat for currency, percent and plain number instances */ public void TestLocale() { Locale locale = Locale.getDefault(); String localeName = locale + " (" + locale.getDisplayName() + ")"; logln("Number test " + localeName); fNumberFormat = NumberFormat.getInstance(locale); _testFormat(); logln("Currency test " + localeName); fNumberFormat = NumberFormat.getCurrencyInstance(locale); _testFormat(); logln("Percent test " + localeName); fNumberFormat = NumberFormat.getPercentInstance(locale); _testFormat(); } /** * call tryIt with many variations, called by testLocale */ public void _testFormat() { if (fNumberFormat == null){ errln("**** FAIL: Null format returned by createXxxInstance."); return; } DecimalFormat s = (DecimalFormat)fNumberFormat; logln("pattern :" + s.toPattern()); tryIt(-2.02147304840132e-68); tryIt(3.88057859588817e-68); tryIt(-2.64651110485945e+65); tryIt(9.29526819488338e+64); tryIt(-2.02147304840132e-100); tryIt(3.88057859588817e-096); tryIt(-2.64651110485945e+306); tryIt(9.29526819488338e+250); tryIt(-9.18228054496402e+64); tryIt(-9.69413034454191e+64); tryIt(-9.18228054496402e+255); tryIt(-9.69413034454191e+273); tryIt(1.234e-200); tryIt(-2.3e-168); tryIt(Double.NaN); tryIt(Double.POSITIVE_INFINITY); tryIt(Double.NEGATIVE_INFINITY); tryIt(251887531); tryIt(5e-20 / 9); tryIt(5e20 
/ 9); tryIt(1.234e-50); tryIt(9.99999999999996); tryIt(9.999999999999996); tryIt(Integer.MIN_VALUE); tryIt(Integer.MAX_VALUE); tryIt((double)Integer.MIN_VALUE); tryIt((double)Integer.MAX_VALUE); tryIt((double)Integer.MIN_VALUE - 1.0); tryIt((double)Integer.MAX_VALUE + 1.0); tryIt(5.0 / 9.0 * 1e-20); tryIt(4.0 / 9.0 * 1e-20); tryIt(5.0 / 9.0 * 1e+20); tryIt(4.0 / 9.0 * 1e+20); tryIt(2147483647.); tryIt(0); tryIt(0.0); tryIt(1); tryIt(10); tryIt(100); tryIt(-1); tryIt(-10); tryIt(-100); tryIt(-1913860352); Random random = createRandom(); // use test framework's random seed
for (int j = 0; j < 10; j++) { double d = random.nextDouble()*2e10 - 1e10; tryIt(d); } } /** * Perform tests using aNumber and fNumberFormat, called in many variations */ public void tryIt(double aNumber) { final int DEPTH = 10; double[] number = new double[DEPTH]; String[] string = new String[DEPTH]; int numberMatch = 0; int stringMatch = 0; boolean dump = false; int i; for (i = 0; i < DEPTH; i++) { if (i == 0) { number[i] = aNumber; } else { try { /* round trip: parse the previous string into the current slot */ number[i] = fNumberFormat.parse(string[i - 1]).doubleValue(); } catch(java.text.ParseException pe) { errln("**** FAIL: Parse of " + string[i-1] + " failed."); dump = true; break; } } string[i] = fNumberFormat.format(number[i]); if (i > 0) { if (numberMatch == 0 && number[i] == number[i-1]) numberMatch = i; else if (numberMatch > 0 && number[i] != number[i-1]) { errln("**** FAIL: Numeric mismatch after match."); dump = true; break; } /* compare string contents, not references */ if (stringMatch == 0 && string[i].equals(string[i-1])) stringMatch = i; else if (stringMatch > 0 && !string[i].equals(string[i-1])) { errln("**** FAIL: String mismatch after match."); dump = true; break; } } if (numberMatch > 0 && stringMatch > 0) break; if (i == DEPTH) --i; if (stringMatch > 2 || numberMatch > 2) { errln("**** FAIL: No string and/or number match within 2 iterations."); dump = true; } if (dump) { for (int k=0; k<=i; ++k) { logln(k + ": " + number[k] + " F> " + string[k] + " P> "); } } } } /** * perform tests using 
aNumber and fNumberFormat, called in many variations **/ public void tryIt(int aNumber) { long number; String stringNum = fNumberFormat.format(aNumber); try { number = fNumberFormat.parse(stringNum).longValue(); } catch (java.text.ParseException pe) { errln("**** FAIL: Parse of " + stringNum + " failed."); return; } if (number != aNumber) { errln("**** FAIL: Parse of " + stringNum + " failed. Got:" + number + " Expected:" + aNumber); } } /** * test NumberFormat::getAvailableLocales **/ public void TestAvailableLocales() { final Locale[] locales = NumberFormat.getAvailableLocales(); int count = locales.length; logln(count + " available locales"); if (count != 0) { String all = ""; for (int i = 0; i< count; ++i) { if (i!=0) all += ", "; all += locales[i].getDisplayName(); } logln(all); } else errln("**** FAIL: Zero available locales or null array pointer"); } /** * call testLocale for all locales **/ public void TestMonster() { final String SEP = "============================================================\n"; int count; final Locale[] allLocales = NumberFormat.getAvailableLocales(); Locale[] locales = allLocales; count = locales.length; if (count != 0) { if (getInclusion() < 10 && count > 6) { count = 6; locales = new Locale[6]; locales[0] = allLocales[0]; locales[1] = allLocales[1]; locales[2] = allLocales[2]; // In a quick test, make sure we test locales that use // currency prefix, currency suffix, and choice currency // logic. Otherwise bugs in these areas can slip through. locales[3] = new Locale("ar", "AE", ""); locales[4] = new Locale("cs", "CZ", ""); locales[5] = new Locale("en", "IN", ""); } for (int i=0; i Message : " + foo.getMessage()); } } /** * DecimalFormat does not round up correctly. 
*/ public void Test4071492 (){ double x = 0.00159999; NumberFormat nf = NumberFormat.getInstance(); nf.setMaximumFractionDigits(4); String out = nf.format(x); logln("0.00159999 formats with 4 fractional digits to " + out); String expected = "0.0016"; if (!out.equals(expected)) errln("FAIL: Expected " + expected); } /** * A space as a group separator for localized pattern causes * wrong format. WorkAround : use non-breaking space. */ public void Test4086575() { NumberFormat nf = NumberFormat.getInstance(Locale.FRANCE); logln("nf toPattern1: " + ((DecimalFormat)nf).toPattern()); logln("nf toLocPattern1: " + ((DecimalFormat)nf).toLocalizedPattern()); // No group separator logln("...applyLocalizedPattern ###,00;(###,00) "); ((DecimalFormat)nf).applyLocalizedPattern("###,00;(###,00)"); logln("nf toPattern2: " + ((DecimalFormat)nf).toPattern()); logln("nf toLocPattern2: " + ((DecimalFormat)nf).toLocalizedPattern()); logln("nf: " + nf.format(1234)); // 1234,00 logln("nf: " + nf.format(-1234)); // (1234,00) // Space as group separator logln("...applyLocalizedPattern # ###,00;(# ###,00) "); ((DecimalFormat)nf).applyLocalizedPattern("#\u00a0###,00;(#\u00a0###,00)"); logln("nf toPattern2: " + ((DecimalFormat)nf).toPattern()); logln("nf toLocPattern2: " + ((DecimalFormat)nf).toLocalizedPattern()); String buffer = nf.format(1234); if (!buffer.equals("1\u00a0234,00")) errln("nf : " + buffer); // Expect 1 234,00 buffer = nf.format(-1234); if (!buffer.equals("(1\u00a0234,00)")) errln("nf : " + buffer); // Expect (1 234,00) // Erroneously prints: // 1234,00 , // (1234,00 ,) } /** * DecimalFormat.parse returns wrong value */ public void Test4068693() { logln("----- Test Application -----"); //ParsePosition pos; DecimalFormat df = new DecimalFormat(); Number d = df.parse("123.55456", new ParsePosition(0)); if (!d.toString().equals("123.55456")) { errln("Result -> " + d.doubleValue()); } } /* bugs 4069754, 4067878 * null pointer thrown when accessing a deserialized DecimalFormat * 
object. */ public void Test4069754() throws Exception { //try { ByteArrayOutputStream baos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(baos); myformat it = new myformat(); logln(it.Now()); oos.writeObject(it); oos.flush(); baos.close(); logln("Save OK!"); byte [] bytes = baos.toByteArray(); ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes)); myformat o = (myformat)ois.readObject(); ois.close(); it.Now(); logln("Load OK!"); if (!o._dateFormat.equals(it._dateFormat)) { throw new Exception("The saved and loaded object are not equals!"); } logln("Compare OK!"); //} catch (Exception foo) { //errln("Test for bug 4069754 or 4057878 failed => Exception: " + foo.getMessage()); //} } /** * DecimalFormat.applyPattern(String) allows illegal patterns */ public void Test4087251 (){ DecimalFormat df = new DecimalFormat(); try { df.applyPattern("#.#.#"); logln("toPattern() returns \"" + df.toPattern() + "\""); errln("applyPattern(\"#.#.#\") doesn't throw IllegalArgumentException"); } catch (IllegalArgumentException e) { logln("Caught Illegal Argument Error !"); } // Second test; added 5/11/98 when reported to fail on 1.2b3 try { df.applyPattern("#0.0#0#0"); logln("toPattern() returns \"" + df.toPattern() + "\""); errln("applyPattern(\"#0.0#0#0\") doesn't throw IllegalArgumentException"); } catch (IllegalArgumentException e) { logln("Ok - IllegalArgumentException for #0.0#0#0"); } } /** * DecimalFormat.format() loses precision */ public void Test4090489 (){ DecimalFormat df = new DecimalFormat(); df.setMinimumFractionDigits(10); df.setGroupingUsed(false); double d = 1.000000000000001E7; java.math.BigDecimal bd = new java.math.BigDecimal(d); StringBuffer sb = new StringBuffer(""); FieldPosition fp = new FieldPosition(0); logln("d = " + d); logln("BigDecimal.toString(): " + bd.toString()); df.format(d, sb, fp); if (!sb.toString().equals("10000000.0000000100")) { errln("DecimalFormat.format(): " + sb.toString()); } } /** * 
DecimalFormat.format() loses precision */ public void Test4090504 () { double d = 1; logln("d = " + d); DecimalFormat df = new DecimalFormat(); StringBuffer sb; FieldPosition fp; try { for (int i = 17; i <= 20; i++) { df.setMaximumFractionDigits(i); sb = new StringBuffer(""); fp = new FieldPosition(0); logln(" getMaximumFractionDigits() = " + i); logln(" formatted: " + df.format(d, sb, fp)); } } catch (Exception foo) { errln("Bug 4090504 regression test failed. Message : " + foo.getMessage()); } } /** * DecimalFormat.parse(String str, ParsePosition pp) loses precision */ public void Test4095713 () { DecimalFormat df = new DecimalFormat(); String str = "0.1234"; Double d1 = new Double(str); Number d2 = df.parse(str, new ParsePosition(0)); logln(d1.toString()); if (d2.doubleValue() != d1.doubleValue()) errln("Bug 4095713 test failed, new double value : " + d2.doubleValue()); } /** * DecimalFormat.parse() fails when multiplier is not set to 1 */ public void Test4092561 () { Locale savedLocale = Locale.getDefault(); Locale.setDefault(Locale.US); DecimalFormat df = new DecimalFormat(); String str = Long.toString(Long.MIN_VALUE); logln("Long.MIN_VALUE : " + df.parse(str, new ParsePosition(0)).toString()); df.setMultiplier(100); Number num = df.parse(str, new ParsePosition(0)); if (num.doubleValue() != -9.223372036854776E16) { errln("Bug 4092561 test failed when multiplier is not set to 1."); } Locale.setDefault(savedLocale); } /** * DecimalFormat: Negative format ignored. 
*/ public void Test4092480 () { DecimalFormat dfFoo = new DecimalFormat("000"); try { dfFoo.applyPattern("0000;-000"); if (!dfFoo.toPattern().equals("#0000")) errln("dfFoo.toPattern : " + dfFoo.toPattern()); logln(dfFoo.format(42)); logln(dfFoo.format(-42)); dfFoo.applyPattern("000;-000"); if (!dfFoo.toPattern().equals("#000")) errln("dfFoo.toPattern : " + dfFoo.toPattern()); logln(dfFoo.format(42)); logln(dfFoo.format(-42)); dfFoo.applyPattern("000;-0000"); if (!dfFoo.toPattern().equals("#000")) errln("dfFoo.toPattern : " + dfFoo.toPattern()); logln(dfFoo.format(42)); logln(dfFoo.format(-42)); dfFoo.applyPattern("0000;-000"); if (!dfFoo.toPattern().equals("#0000")) errln("dfFoo.toPattern : " + dfFoo.toPattern()); logln(dfFoo.format(42)); logln(dfFoo.format(-42)); } catch (Exception foo) { errln("Message " + foo.getMessage()); } } /** * NumberFormat.getCurrencyInstance() produces format that uses * decimal separator instead of monetary decimal separator. * * Rewrote this test not to depend on the actual pattern. Pattern should * never contain the monetary separator! Decimal separator in pattern is * interpreted as monetary separator if currency symbol is seen! 
*/ public void Test4087244 () { Locale de = new Locale("pt", "PT"); DecimalFormat df = (DecimalFormat) NumberFormat.getCurrencyInstance(de); DecimalFormatSymbols sym = df.getDecimalFormatSymbols(); sym.setMonetaryDecimalSeparator('$'); df.setDecimalFormatSymbols(sym); char decSep = sym.getDecimalSeparator(); char monSep = sym.getMonetaryDecimalSeparator();
//char zero = sym.getZeroDigit(); //The variable is never used
if (decSep == monSep) { errln("ERROR in test: want decimal sep != monetary sep"); } else { df.setMinimumIntegerDigits(1); df.setMinimumFractionDigits(2); String str = df.format(1.23); String monStr = "1" + monSep + "23"; String decStr = "1" + decSep + "23"; if (str.indexOf(monStr) >= 0 && str.indexOf(decStr) < 0) { logln("OK: 1.23 -> \"" + str + "\" contains \"" + monStr + "\" and not \"" + decStr + '"'); } else { errln("FAIL: 1.23 -> \"" + str + "\", should contain \"" + monStr + "\" and not \"" + decStr + '"'); } } } /** * Number format data rounding errors for locale FR */ public void Test4070798 () { NumberFormat formatter; String tempString; /* User error : String expectedDefault = "-5\u00a0789,987"; String expectedCurrency = "5\u00a0789,98\u00a0F"; String expectedPercent = "-578\u00a0998%"; */ String expectedDefault = "-5\u00a0789,988"; String expectedCurrency = "5\u00a0789,99\u00a0" + EURO; // euro
String expectedPercent = "-578\u00a0999\u00a0%"; formatter = NumberFormat.getNumberInstance(Locale.FRANCE); tempString = formatter.format (-5789.9876); if (tempString.equals(expectedDefault)) { logln ("Bug 4070798 default test passed."); } else { errln("Failed:" + " Expected " + expectedDefault + " Received " + tempString ); } formatter = NumberFormat.getCurrencyInstance(Locale.FRANCE); tempString = formatter.format( 5789.9876 ); if (tempString.equals(expectedCurrency) ) { logln ("Bug 4070798 currency test passed."); } else { errln("Failed:" + " Expected " + expectedCurrency + " Received " + tempString ); } formatter = 
NumberFormat.getPercentInstance(Locale.FRANCE); tempString = formatter.format (-5789.9876); if (tempString.equals(expectedPercent) ) { logln ("Bug 4070798 percentage test passed."); } else { errln("Failed:" + " Expected " + expectedPercent + " Received " + tempString ); } } /** * Data rounding errors for French (Canada) locale */ public void Test4071005 () { NumberFormat formatter; String tempString; /* user error : String expectedDefault = "-5 789,987"; String expectedCurrency = "5 789,98\u00a0$"; String expectedPercent = "-578 998%"; */ String expectedDefault = "-5\u00a0789,988"; String expectedCurrency = "5\u00a0789,99\u00a0$"; String expectedPercent = "-578\u00a0999\u00A0%"; formatter = NumberFormat.getNumberInstance(Locale.CANADA_FRENCH); tempString = formatter.format (-5789.9876); if (tempString.equals(expectedDefault)) { logln ("Bug 4071005 default test passed."); } else { errln("Failed:" + " Expected " + expectedDefault + " Received " + tempString ); } formatter = NumberFormat.getCurrencyInstance(Locale.CANADA_FRENCH); tempString = formatter.format( 5789.9876 ) ; if (tempString.equals(expectedCurrency) ) { logln ("Bug 4071005 currency test passed."); } else { errln("Failed:" + " Expected " + expectedCurrency + " Received " + tempString ); } formatter = NumberFormat.getPercentInstance(Locale.CANADA_FRENCH); tempString = formatter.format (-5789.9876); if (tempString.equals(expectedPercent) ) { logln ("Bug 4071005 percentage test passed."); } else { errln("Failed:" + " Expected " + expectedPercent + " Received " + tempString ); } } /** * Data rounding errors for German (Germany) locale */ public void Test4071014 () { NumberFormat formatter; String tempString; /* user error : String expectedDefault = "-5.789,987"; String expectedCurrency = "5.789,98\u00a0DM"; String expectedPercent = "-578.998%"; */ String expectedDefault = "-5.789,988"; String expectedCurrency = "5.789,99\u00a0" + EURO; String expectedPercent = "-578.999\u00a0%"; formatter = 
NumberFormat.getNumberInstance(Locale.GERMANY); tempString = formatter.format (-5789.9876); if (tempString.equals(expectedDefault)) { logln ("Bug 4071014 default test passed."); } else { errln("Failed:" + " Expected " + expectedDefault + " Received " + tempString ); } formatter = NumberFormat.getCurrencyInstance(Locale.GERMANY); tempString = formatter.format( 5789.9876 ) ; if (tempString.equals(expectedCurrency) ) { logln ("Bug 4071014 currency test passed."); } else { errln("Failed:" + " Expected " + expectedCurrency + " Received " + tempString ); } formatter = NumberFormat.getPercentInstance(Locale.GERMANY); tempString = formatter.format (-5789.9876); if (tempString.equals(expectedPercent) ) { logln ("Bug 4071014 percentage test passed."); } else { errln("Failed:" + " Expected " + expectedPercent + " Received " + tempString ); } } /** * Data rounding errors for Italian locale number formats * Note- with the Euro, there is no need for currency rounding anymore */ public void Test4071859 () { NumberFormat formatter; String tempString; /* user error : String expectedDefault = "-5.789,987"; String expectedCurrency = "-L.\u00a05.789,98"; String expectedPercent = "-578.998%"; */ String expectedDefault = "-5.789,988"; String expectedCurrency = "-" + EURO + "\u00a05.789,99"; String expectedPercent = "-578.999%"; formatter = NumberFormat.getNumberInstance(Locale.ITALY); tempString = formatter.format (-5789.9876); if (tempString.equals(expectedDefault)) { logln ("Bug 4071859 default test passed."); } else { errln("a) Failed:" + " Expected " + expectedDefault + " Received " + tempString ); } formatter = NumberFormat.getCurrencyInstance(Locale.ITALY); tempString = formatter.format( -5789.9876 ) ; if (tempString.equals(expectedCurrency) ) { logln ("Bug 4071859 currency test passed."); } else { errln("b) Failed:" + " Expected " + expectedCurrency + " Received " + tempString ); } formatter = NumberFormat.getPercentInstance(Locale.ITALY); tempString = formatter.format 
(-5789.9876); if (tempString.equals(expectedPercent) ) { logln ("Bug 4071859 percentage test passed."); } else { errln("c) Failed:" + " Expected " + expectedPercent + " Received " + tempString ); } } /* bug 4093610 * Test rounding for nearest even. */ public void Test4093610() { DecimalFormat df = new DecimalFormat("#0.#"); roundingTest(df, 12.35, "12.4"); roundingTest(df, 12.45, "12.4"); roundingTest(df, 12.452,"12.5"); roundingTest(df, 12.55, "12.6"); roundingTest(df, 12.65, "12.6"); roundingTest(df, 12.652,"12.7"); roundingTest(df, 12.75, "12.8"); roundingTest(df, 12.752,"12.8"); roundingTest(df, 12.85, "12.8"); roundingTest(df, 12.852,"12.9"); roundingTest(df, 12.95, "13"); roundingTest(df, 12.952,"13"); } void roundingTest(DecimalFormat df, double x, String expected) { String out = df.format(x); logln("" + x + " formats with 1 fractional digit to " + out); if (!out.equals(expected)) errln("FAIL: Expected " + expected); } /** * Tests the setMaximumFractionDigits limit. */ public void Test4098741() { try { NumberFormat fmt = NumberFormat.getPercentInstance(); fmt.setMaximumFractionDigits(20); logln(fmt.format(.001)); } catch (Exception foo) { warnln("Bug 4098741 failed with exception thrown : " + foo.getMessage()); } } /** * Tests illegal pattern exception. * Fix comment : HShih A31 Part1 will not be fixed and javadoc needs to be updated. * Part2 has been fixed. */ public void Test4074454() { try { DecimalFormat fmt = new DecimalFormat("#,#00.00;-#.#"); logln("format 3456.78: " + fmt.format(3456.78)); //fix "The variable 'fmt' is never used"
logln("Inconsistent negative pattern is fine."); DecimalFormat newFmt = new DecimalFormat("#,#00.00 p''ieces;-#,#00.00 p''ieces"); String tempString = newFmt.format(3456.78); if (!tempString.equals("3,456.78 p'ieces")) errln("Failed! 3,456.78 p'ieces expected, but got : " + tempString); } catch (Exception foo) { warnln("An exception was thrown for any inconsistent negative pattern."); } } /** * Tests all different comments. 
     * Response to some comments :
     * [1] DecimalFormat.parse API documentation is more than just one line.
     *     This is not a reproducible doc error in 116 source code.
     * [2] See updated javadoc.
     * [3] Fixed.
     * [4] NumberFormat.parse(String, ParsePosition) : If parsing fails,
     *     a null object will be returned. The unchanged parse position also
     *     reflects an error.
     *     NumberFormat.parse(String) : If parsing fails, a ParseException
     *     will be thrown.
     *     See updated javadoc for more details.
     * [5] See updated javadoc.
     * [6] See updated javadoc.
     * [7] This is a correct behavior if the DateFormat object is lenient.
     *     Otherwise, an IllegalArgumentException will be thrown when formatting
     *     "January 35". See GregorianCalendar class javadoc for more details.
     */
    public void Test4099404() {
        try {
            DecimalFormat fmt = new DecimalFormat("000.0#0");
            logln("format 3456.78: " + fmt.format(3456.78)); //fix "The variable 'fmt' is never used"
            errln("Bug 4099404 failed applying illegal pattern \"000.0#0\"");
        } catch (Exception foo) {
            logln("Bug 4099404 pattern \"000.0#0\" passed");
        }
        try {
            DecimalFormat fmt = new DecimalFormat("0#0.000");
            logln("format 3456.78: " + fmt.format(3456.78)); //fix "The variable 'fmt' is never used"
            errln("Bug 4099404 failed applying illegal pattern \"0#0.000\"");
        } catch (Exception foo) {
            logln("Bug 4099404 pattern \"0#0.000\" passed");
        }
    }

    /**
     * DecimalFormat.applyPattern doesn't set minimum integer digits
     */
    public void Test4101481() {
        DecimalFormat sdf = new DecimalFormat("#,##0");
        if (sdf.getMinimumIntegerDigits() != 1)
            errln("Minimum integer digits : " + sdf.getMinimumIntegerDigits());
    }

    /**
     * Tests ParsePosition.setErrorIndex() and ParsePosition.getErrorIndex().
     */
    public void Test4052223() {
        try {
            DecimalFormat fmt = new DecimalFormat("#,#00.00");
            Number num = fmt.parse("abc3");
            errln("Bug 4052223 failed : can't parse string \"abc3\". Got " + num);
        } catch (ParseException foo) {
            logln("Caught expected ParseException : " + foo.getMessage() + " at index : " + foo.getErrorOffset());
        }
    }

    /**
     * API tests for API addition request A9.
     */
    public void Test4061302() {
        DecimalFormatSymbols fmt = new DecimalFormatSymbols();
        String currency = fmt.getCurrencySymbol();
        String intlCurrency = fmt.getInternationalCurrencySymbol();
        char monDecSeparator = fmt.getMonetaryDecimalSeparator();
        if (currency.equals("") || intlCurrency.equals("") || monDecSeparator == 0) {
            errln("getCurrencySymbols failed, got empty string.");
        }
        logln("Before set ==> Currency : " + currency + " Intl Currency : " + intlCurrency + " Monetary Decimal Separator : " + monDecSeparator);
        fmt.setCurrencySymbol("XYZ");
        fmt.setInternationalCurrencySymbol("ABC");
        fmt.setMonetaryDecimalSeparator('*');
        currency = fmt.getCurrencySymbol();
        intlCurrency = fmt.getInternationalCurrencySymbol();
        monDecSeparator = fmt.getMonetaryDecimalSeparator();
        if (!currency.equals("XYZ") || !intlCurrency.equals("ABC") || monDecSeparator != '*') {
            errln("setCurrencySymbols failed.");
        }
        logln("After set ==> Currency : " + currency + " Intl Currency : " + intlCurrency + " Monetary Decimal Separator : " + monDecSeparator);
    }

    /**
     * API tests for API addition request A23. FieldPosition.getBeginIndex and
     * FieldPosition.getEndIndex.
     */
    public void Test4062486() {
        DecimalFormat fmt = new DecimalFormat("#,##0.00");
        StringBuffer formatted = new StringBuffer();
        FieldPosition field = new FieldPosition(0);
        Double num = new Double(1234.5);
        fmt.format(num, formatted, field);
        // Fail if either index is wrong (the original && only failed when both were wrong).
        if (field.getBeginIndex() != 0 || field.getEndIndex() != 5)
            errln("Format 1234.5 failed. Begin index: " + field.getBeginIndex() + " End index: " + field.getEndIndex());
        field.setBeginIndex(7);
        field.setEndIndex(4);
        if (field.getBeginIndex() != 7 || field.getEndIndex() != 4)
            errln("Set begin/end field indexes failed. Begin index: " + field.getBeginIndex() + " End index: " + field.getEndIndex());
    }

    /**
     * DecimalFormat.parse incorrectly works with a group separator.
     */
    public void Test4108738() {
        DecimalFormat df = new DecimalFormat("#,##0.###", new DecimalFormatSymbols(java.util.Locale.US));
        String text = "1.222,111";
        Number num = df.parse(text,new ParsePosition(0));
        if (!num.toString().equals("1.222"))
            errln("\"" + text + "\" is parsed as " + num);
        text = "1.222x111";
        num = df.parse(text,new ParsePosition(0));
        if (!num.toString().equals("1.222"))
            errln("\"" + text + "\" is parsed as " + num);
    }

    /**
     * DecimalFormat.format() incorrectly formats negative doubles.
     */
    public void Test4106658() {
        Locale savedLocale = Locale.getDefault();
        Locale.setDefault(Locale.US);
        DecimalFormat df = new DecimalFormat(); // Corrected; see 4147706
        double d1 = -0.0;
        double d2 = -0.0001;
        StringBuffer buffer = new StringBuffer();
        logln("pattern: \"" + df.toPattern() + "\"");
        df.format(d1, buffer, new FieldPosition(0));
        if (!buffer.toString().equals("-0")) { // Corrected; see 4147706
            errln(d1 + " is formatted as " + buffer);
        }
        buffer.setLength(0);
        df.format(d2, buffer, new FieldPosition(0));
        if (!buffer.toString().equals("-0")) { // Corrected; see 4147706
            errln(d2 + " is formatted as " + buffer);
        }
        Locale.setDefault(savedLocale);
    }

    /**
     * DecimalFormat.parse returns 0 if string parameter is incorrect.
     */
    public void Test4106662() {
        DecimalFormat df = new DecimalFormat();
        String text = "x";
        ParsePosition pos1 = new ParsePosition(0), pos2 = new ParsePosition(0);
        logln("pattern: \"" + df.toPattern() + "\"");
        Number num = df.parse(text, pos1);
        if (num != null) {
            errln("Test Failed: \"" + text + "\" is parsed as " + num);
        }
        df = null;
        df = new DecimalFormat("$###.00");
        num = df.parse("$", pos2);
        if (num != null){
            errln("Test Failed: \"$\" is parsed as " + num);
        }
    }

    /**
     * NumberFormat.parse doesn't return null
     */
    public void Test4114639() {
        NumberFormat format = NumberFormat.getInstance();
        String text = "time 10:x";
        ParsePosition pos = new ParsePosition(8);
        Number result = format.parse(text, pos);
        if (result != null)
            errln("Should return null but got : " + result); // Should be null; it isn't
    }

    /**
     * DecimalFormat.format(long n) fails if n * multiplier > Long.MAX_VALUE.
     */
    public void Test4106664() {
        DecimalFormat df = new DecimalFormat();
        long n = 1234567890123456L;
        int m = 12345678;
        BigInteger bigN = BigInteger.valueOf(n);
        bigN = bigN.multiply(BigInteger.valueOf(m));
        df.setMultiplier(m);
        df.setGroupingUsed(false);
        logln("formatted: " + df.format(n, new StringBuffer(), new FieldPosition(0)));
        logln("expected: " + bigN.toString());
    }

    /**
     * DecimalFormat.format incorrectly formats -0.0.
     */
    public void Test4106667() {
        Locale savedLocale = Locale.getDefault();
        Locale.setDefault(Locale.US);
        DecimalFormat df = new DecimalFormat();
        df.setPositivePrefix("+");
        double d = -0.0;
        logln("pattern: \"" + df.toPattern() + "\"");
        StringBuffer buffer = new StringBuffer();
        df.format(d, buffer, new FieldPosition(0));
        if (!buffer.toString().equals("-0")) { // Corrected; see 4147706
            errln(d + " is formatted as " + buffer);
        }
        Locale.setDefault(savedLocale);
    }

    /**
     * DecimalFormat.setMaximumIntegerDigits() works incorrectly.
*/ public void Test4110936() { NumberFormat nf = NumberFormat.getInstance(); nf.setMaximumIntegerDigits(128); logln("setMaximumIntegerDigits(128)"); if (nf.getMaximumIntegerDigits() != 128) errln("getMaximumIntegerDigits() returns " + nf.getMaximumIntegerDigits()); } /** * Locale data should use generic currency symbol * * 1) Make sure that all currency formats use the generic currency symbol. * 2) Make sure we get the same results using the generic symbol or a * hard-coded one. */ public void Test4122840() { Locale[] locales = NumberFormat.getAvailableLocales(); for (int i = 0; i < locales.length; i++) { UResourceBundle rb = UResourceBundle.getBundleInstance(ICUResourceBundle.ICU_BASE_NAME,locales[i]); // // Get the currency pattern for this locale. We have to fish it // out of the ResourceBundle directly, since DecimalFormat.toPattern // will return the localized symbol, not \00a4 // UResourceBundle numPatterns = rb.get("NumberPatterns"); String pattern = numPatterns.getString(1); if (pattern.indexOf('\u00A4') == -1 ) { // 'x' not "x" -- workaround bug in IBM JDK 1.4.1 errln("Currency format for " + locales[i] + " does not contain generic currency symbol:" + pattern ); } // Create a DecimalFormat using the pattern we got and format a number DecimalFormatSymbols symbols = new DecimalFormatSymbols(locales[i]); DecimalFormat fmt1 = new DecimalFormat(pattern, symbols); String result1 = fmt1.format(1.111); // // Now substitute in the locale's currency symbol and create another // pattern. Replace the decimal separator with the monetary separator. 
// //char decSep = symbols.getDecimalSeparator(); //The variable is never used char monSep = symbols.getMonetaryDecimalSeparator(); StringBuffer buf = new StringBuffer(pattern); for (int j = 0; j < buf.length(); j++) { if (buf.charAt(j) == '\u00a4') { String cur = "'" + symbols.getCurrencySymbol() + "'"; buf.replace(j, j+1, cur); j += cur.length() - 1; } } symbols.setDecimalSeparator(monSep); DecimalFormat fmt2 = new DecimalFormat(buf.toString(), symbols); String result2 = fmt2.format(1.111); // NOTE: en_IN is a special case (ChoiceFormat currency display name) if (!result1.equals(result2) && !locales[i].toString().equals("en_IN")) { errln("Results for " + locales[i] + " differ: " + result1 + " vs " + result2); } } } /** * DecimalFormat.format() delivers wrong string. */ public void Test4125885() { double rate = 12.34; DecimalFormat formatDec = new DecimalFormat ("000.00"); logln("toPattern: " + formatDec.toPattern()); String rateString= formatDec.format(rate); if (!rateString.equals("012.34")) errln("result : " + rateString + " expected : 012.34"); rate = 0.1234; formatDec = null; formatDec = new DecimalFormat ("+000.00%;-000.00%"); logln("toPattern: " + formatDec.toPattern()); rateString= formatDec.format(rate); if (!rateString.equals("+012.34%")) errln("result : " + rateString + " expected : +012.34%"); } /** ** * DecimalFormat produces extra zeros when formatting numbers. */ public void Test4134034() { DecimalFormat nf = new DecimalFormat("##,###,###.00"); String f = nf.format(9.02); if (f.equals("9.02")) logln(f + " ok"); else errln("9.02 -> " + f + "; want 9.02"); f = nf.format(0); if (f.equals(".00")) logln(f + " ok"); else errln("0 -> " + f + "; want .00"); } /** * CANNOT REPRODUCE - This bug could not be reproduced. It may be * a duplicate of 4134034. * * JDK 1.1.6 Bug, did NOT occur in 1.1.5 * Possibly related to bug 4125885. * * This class demonstrates a regression in version 1.1.6 * of DecimalFormat class. 
* * 1.1.6 Results * Value 1.2 Format #.00 Result '01.20' !!!wrong * Value 1.2 Format 0.00 Result '001.20' !!!wrong * Value 1.2 Format 00.00 Result '0001.20' !!!wrong * Value 1.2 Format #0.0# Result '1.2' * Value 1.2 Format #0.00 Result '001.20' !!!wrong * * 1.1.5 Results * Value 1.2 Format #.00 Result '1.20' * Value 1.2 Format 0.00 Result '1.20' * Value 1.2 Format 00.00 Result '01.20' * Value 1.2 Format #0.0# Result '1.2' * Value 1.2 Format #0.00 Result '1.20' */ public void Test4134300() { String[] DATA = { // Pattern Expected string "#.00", "1.20", "0.00", "1.20", "00.00", "01.20", "#0.0#", "1.2", "#0.00", "1.20", }; for (int i=0; i.format(" + IN[j] + ")", OUT[j], f.format(IN[j])); } } } //#if defined(FOUNDATION10) //#else /** * BigDecimal numbers get their fractions truncated by NumberFormat. */ public void Test4141750() { try { String str = "12345.67"; java.math.BigDecimal bd = new java.math.BigDecimal(str); String sd = NumberFormat.getInstance(Locale.US).format(bd); if (!sd.endsWith("67")) errln("Fail: " + str + " x format -> " + sd); } catch (Exception e) { warnln(e.toString()); //e.printStackTrace(); } } //#endif /** * DecimalFormat toPattern() doesn't quote special characters or handle * single quotes. */ public void Test4145457() { try { DecimalFormat nf = (DecimalFormat)NumberFormat.getInstance(); DecimalFormatSymbols sym = nf.getDecimalFormatSymbols(); sym.setDecimalSeparator('\''); nf.setDecimalFormatSymbols(sym); double pi = 3.14159; String[] PATS = { "#.00 'num''ber'", "''#.00''" }; for (int i=0; i \"" + pat + '"'); if (val == val2 && out.equals(out2)) { logln("Ok " + pi + " x \"" + PATS[i] + "\" -> \"" + out + "\" -> " + val + " -> \"" + out2 + "\" -> " + val2); } else { errln("Fail " + pi + " x \"" + PATS[i] + "\" -> \"" + out + "\" -> " + val + " -> \"" + out2 + "\" -> " + val2); } } } catch (ParseException e) { errln("Fail: " + e); e.printStackTrace(); } } /** * DecimalFormat.applyPattern() sets minimum integer digits incorrectly. 
* CANNOT REPRODUCE * This bug is a duplicate of 4139344, which is a duplicate of 4134300 */ public void Test4147295() { DecimalFormat sdf = new DecimalFormat(); String pattern = "#,###"; logln("Applying pattern \"" + pattern + "\""); sdf.applyPattern(pattern); int minIntDig = sdf.getMinimumIntegerDigits(); if (minIntDig != 0) { errln("Test failed"); errln(" Minimum integer digits : " + minIntDig); errln(" new pattern: " + sdf.toPattern()); } else { logln("Test passed"); logln(" Minimum integer digits : " + minIntDig); } } /** * DecimalFormat formats -0.0 as +0.0 * See also older related bug 4106658, 4106667 */ public void Test4147706() { DecimalFormat df = new DecimalFormat("#,##0.0##"); df.setDecimalFormatSymbols(new DecimalFormatSymbols(Locale.ENGLISH)); double d1 = -0.0; double d2 = -0.0001; StringBuffer f1 = df.format(d1, new StringBuffer(), new FieldPosition(0)); StringBuffer f2 = df.format(d2, new StringBuffer(), new FieldPosition(0)); if (!f1.toString().equals("-0.0")) { errln(d1 + " x \"" + df.toPattern() + "\" is formatted as \"" + f1 + '"'); } if (!f2.toString().equals("-0.0")) { errln(d2 + " x \"" + df.toPattern() + "\" is formatted as \"" + f2 + '"'); } } /** * NumberFormat cannot format Double.MAX_VALUE */ public void Test4162198() { double dbl = Double.MAX_VALUE; NumberFormat f = NumberFormat.getInstance(); f.setMaximumFractionDigits(Integer.MAX_VALUE); f.setMaximumIntegerDigits(Integer.MAX_VALUE); String s = f.format(dbl); logln("The number " + dbl + " formatted to " + s); Number n = null; try { n = f.parse(s); } catch (java.text.ParseException e) { errln("Caught a ParseException:"); e.printStackTrace(); } logln("The string " + s + " parsed as " + n); if (n.doubleValue() != dbl) { errln("Round trip failure"); } } /** * NumberFormat does not parse negative zero. */ public void Test4162852() throws ParseException { for (int i=0; i<2; ++i) { NumberFormat f = (i == 0) ? 
NumberFormat.getInstance() : NumberFormat.getPercentInstance(); double d = -0.0; String s = f.format(d); double e = f.parse(s).doubleValue(); logln("" + d + " -> " + '"' + s + '"' + " -> " + e); if (e != 0.0 || 1.0/e > 0.0) { logln("Failed to parse negative zero"); } } } /** * NumberFormat truncates data */ public void Test4167494() throws Exception { NumberFormat fmt = NumberFormat.getInstance(Locale.US); double a = Double.MAX_VALUE; String s = fmt.format(a); double b = fmt.parse(s).doubleValue(); boolean match = a == b; if (match) { logln("" + a + " -> \"" + s + "\" -> " + b + " ok"); } else { errln("" + a + " -> \"" + s + "\" -> " + b + " FAIL"); } // We don't test Double.MIN_VALUE because the locale data for the US // currently doesn't specify enough digits to display Double.MIN_VALUE. // This is correct for now; however, we leave this here as a reminder // in case we want to address this later. if (false) { a = Double.MIN_VALUE; s = fmt.format(a); b = fmt.parse(s).doubleValue(); match = a == b; if (match) { logln("" + a + " -> \"" + s + "\" -> " + b + " ok"); } else { errln("" + a + " -> \"" + s + "\" -> " + b + " FAIL"); } } } /** * DecimalFormat.parse() fails when ParseIntegerOnly set to true */ public void Test4170798() { Locale savedLocale = Locale.getDefault(); Locale.setDefault(Locale.US); DecimalFormat df = new DecimalFormat(); df.setParseIntegerOnly(true); Number n = df.parse("-0.0", new ParsePosition(0)); if (!(n instanceof Double) || n.intValue() != 0) { errln("FAIL: parse(\"-0.0\") returns " + n + " (" + n.getClass().getName() + ')'); } Locale.setDefault(savedLocale); } /** * toPattern only puts the first grouping separator in. 
*/ public void Test4176114() { String[] DATA = { "00", "#00", "000", "#000", // No grouping "#000", "#000", // No grouping "#,##0", "#,##0", "#,000", "#,000", "0,000", "#0,000", "00,000", "#00,000", "000,000", "#,000,000", "0,000,000,000,000.0000", "#0,000,000,000,000.0000", // Reported }; for (int i=0; i " + s + ", want " + DATA[i+1]); } } } /** * DecimalFormat is incorrectly rounding numbers like 1.2501 to 1.2 */ public void Test4179818() { String DATA[] = { // Input Pattern Expected output "1.2511", "#.#", "1.3", "1.2501", "#.#", "1.3", "0.9999", "#", "1", }; DecimalFormat fmt = new DecimalFormat("#", new DecimalFormatSymbols(Locale.US)); for (int i=0; i max. // Numberformat should catch this and throw an exception. for (int i = 0; i < offsets.length; ++i) { bytes[offsets[i]] = (byte)(4 - i); } { ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes)); try { NumberFormat format = (NumberFormat) ois.readObject(); logln("format: " + format.format(1234.56)); //fix "The variable is never used" errln("FAIL: Deserialized bogus NumberFormat with minXDigits > maxXDigits"); } catch (InvalidObjectException e) { logln("Ok: " + e.getMessage()); } } // Set values so they are too high, but min <= max // Format should pass the min <= max test, and DecimalFormat should reset to current maximum // (for compatibility with versions streamed out before the maximums were imposed). 
for (int i = 0; i < offsets.length; ++i) { bytes[offsets[i]] = 4; } { ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes)); NumberFormat format = (NumberFormat) ois.readObject(); //For compatibility with previous version if ((format.getMaximumIntegerDigits() != 309) || format.getMaximumFractionDigits() != 340) { errln("FAIL: Deserialized bogus NumberFormat with values out of range," + " intMin: " + format.getMinimumIntegerDigits() + " intMax: " + format.getMaximumIntegerDigits() + " fracMin: " + format.getMinimumFractionDigits() + " fracMax: " + format.getMaximumFractionDigits()); } else { logln("Ok: Digit count out of range"); } } } /** * Some DecimalFormatSymbols changes are not picked up by DecimalFormat. * This includes the minus sign, currency symbol, international currency * symbol, percent, and permille. This is filed as bugs 4212072 and * 4212073. */ public void Test4212072() throws IOException, ClassNotFoundException { DecimalFormatSymbols sym = new DecimalFormatSymbols(Locale.US); DecimalFormat fmt = new DecimalFormat("#", sym); sym.setMinusSign('^'); fmt.setDecimalFormatSymbols(sym); if (!fmt.format(-1).equals("^1")) { errln("FAIL: -1 x (minus=^) -> " + fmt.format(-1) + ", exp ^1"); } if (!fmt.getNegativePrefix().equals("^")) { errln("FAIL: (minus=^).getNegativePrefix -> " + fmt.getNegativePrefix() + ", exp ^"); } sym.setMinusSign('-'); fmt.applyPattern("#%"); sym.setPercent('^'); fmt.setDecimalFormatSymbols(sym); if (!fmt.format(0.25).equals("25^")) { errln("FAIL: 0.25 x (percent=^) -> " + fmt.format(0.25) + ", exp 25^"); } if (!fmt.getPositiveSuffix().equals("^")) { errln("FAIL: (percent=^).getPositiveSuffix -> " + fmt.getPositiveSuffix() + ", exp ^"); } sym.setPercent('%'); fmt.applyPattern("#\u2030"); sym.setPerMill('^'); fmt.setDecimalFormatSymbols(sym); if (!fmt.format(0.25).equals("250^")) { errln("FAIL: 0.25 x (permill=^) -> " + fmt.format(0.25) + ", exp 250^"); } if (!fmt.getPositiveSuffix().equals("^")) { errln("FAIL: 
(permill=^).getPositiveSuffix -> " + fmt.getPositiveSuffix() + ", exp ^"); } sym.setPerMill('\u2030'); fmt.applyPattern("\u00A4#.00"); sym.setCurrencySymbol("usd"); fmt.setDecimalFormatSymbols(sym); if (!fmt.format(12.5).equals("usd12.50")) { errln("FAIL: 12.5 x (currency=usd) -> " + fmt.format(12.5) + ", exp usd12.50"); } if (!fmt.getPositivePrefix().equals("usd")) { errln("FAIL: (currency=usd).getPositivePrefix -> " + fmt.getPositivePrefix() + ", exp usd"); } sym.setCurrencySymbol("$"); fmt.applyPattern("\u00A4\u00A4#.00"); sym.setInternationalCurrencySymbol("DOL"); fmt.setDecimalFormatSymbols(sym); if (!fmt.format(12.5).equals("DOL12.50")) { errln("FAIL: 12.5 x (intlcurrency=DOL) -> " + fmt.format(12.5) + ", exp DOL12.50"); } if (!fmt.getPositivePrefix().equals("DOL")) { errln("FAIL: (intlcurrency=DOL).getPositivePrefix -> " + fmt.getPositivePrefix() + ", exp DOL"); } sym.setInternationalCurrencySymbol("USD"); if (VersionInfo.ICU_VERSION == VersionInfo.getInstance(2,2)) { // bug in 2.2 that fails this test // to be fixed in the later versions System.out.println("\n Test skipped for release 2.2"); return; } // Since the pattern logic has changed, make sure that patterns round // trip properly. Test stream in/out integrity too. 
Locale[] avail = NumberFormat.getAvailableLocales(); for (int i=0; i \"" + pat + "\" -> \"" + f2.toPattern() + '"'); } // Test toLocalizedPattern/applyLocalizedPattern round trip pat = df.toLocalizedPattern(); try{ f2.applyLocalizedPattern(pat); String s1 = f2.format(123456); String s2 = df.format(123456); if(!s1.equals(s2)){ errln("FAIL: " + avail[i] + " #" + j + " -> localized \"" + s2 + "\" -> \"" + s2 + '"'+ " in locale "+df.getLocale(ULocale.ACTUAL_LOCALE)); } if (!df.equals(f2)) { errln("FAIL: " + avail[i] + " #" + j + " -> localized \"" + pat + "\" -> \"" + f2.toLocalizedPattern() + '"'+ " in locale "+df.getLocale(ULocale.ACTUAL_LOCALE)); errln("s1: "+s1+" s2: "+s2); } }catch(IllegalArgumentException ex){ errln(ex.getMessage()+" for locale "+ df.getLocale(ULocale.ACTUAL_LOCALE)); } // Test writeObject/readObject round trip ByteArrayOutputStream baos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(baos); oos.writeObject(df); oos.flush(); baos.close(); byte[] bytes = baos.toByteArray(); ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes)); f2 = (DecimalFormat) ois.readObject(); if (!df.equals(f2)) { errln("FAIL: Stream in/out " + avail[i] + " -> \"" + pat + "\" -> " + (f2 != null ? ("\""+f2.toPattern()+'"') : "null")); } } } // @since ICU 2.4 // Make sure that all special characters, when quoted in a suffix or // prefix, lose their special meaning. char[] SPECIALS = { '0', ',', '.', '\u2030', '%', '#', ';', 'E', '*', '+', '-' }; sym = new DecimalFormatSymbols(Locale.US); for (int j=0; j toPattern() => \"" + pat2 + "\""); } String s = fmt.format(123); String exp = "" + special + "123" + special; if (!s.equals(exp)) { errln("FAIL: 123 x \"" + pat + "\" => \"" + s + "\", exp \"" + exp + "\""); } } catch (IllegalArgumentException e) { errln("FAIL: Pattern \"" + pat + "\" => " + e.getMessage()); } } } /** * DecimalFormat.parse() fails for mulipliers 2^n. 
*/ public void Test4216742() throws ParseException { DecimalFormat fmt = (DecimalFormat) NumberFormat.getInstance(Locale.US); long[] DATA = { Long.MIN_VALUE, Long.MAX_VALUE, -100000000L, 100000000L}; for (int i=0; i 0 != DATA[i] > 0) { errln("\"" + str + "\" parse(x " + fmt.getMultiplier() + ") => " + n); } } } } /** * DecimalFormat formats 1.001 to "1.00" instead of "1" with 2 fraction * digits. */ public void Test4217661() { Object[] DATA = { new Double(0.001), "0", new Double(1.001), "1", new Double(0.006), "0.01", new Double(1.006), "1.01", }; NumberFormat fmt = NumberFormat.getInstance(Locale.US); fmt.setMaximumFractionDigits(2); for (int i=0; i " + n); } catch (ParseException pe) { errln("ERROR: Failed round trip with strict parsing."); } } df.applyPattern(patterns[1]); numstr = "005"; try { Number n = df.parse(numstr); errln("ERROR: Expected round trip failure not encountered: numstr -> " + n); } catch (ParseException pe) { logln("INFO: Expected ParseExpection for " + numstr + " with strick parse enabled"); } } } icu4j-4.2/src/com/ibm/icu/dev/test/format/PluralFormatUnitTest.java0000644000175000017500000002535411361046232025236 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2007-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.format; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.text.*; import com.ibm.icu.util.ULocale; import java.text.ParsePosition; /** * @author tschumann (Tim Schumann) * */ public class PluralFormatUnitTest extends TestFmwk { public static void main(String[] args) throws Exception { new PluralFormatUnitTest().run(args); } public void TestConstructor() { // Test correct formatting of numbers. 
        PluralFormat plFmts[] = new PluralFormat[8];
        plFmts[0] = new PluralFormat();
        plFmts[0].applyPattern("other{#}");
        plFmts[1] = new PluralFormat(PluralRules.DEFAULT);
        plFmts[1].applyPattern("other{#}");
        plFmts[2] = new PluralFormat(PluralRules.DEFAULT, "other{#}");
        plFmts[3] = new PluralFormat("other{#}");
        plFmts[4] = new PluralFormat(ULocale.getDefault());
        plFmts[4].applyPattern("other{#}");
        plFmts[5] = new PluralFormat(ULocale.getDefault(), PluralRules.DEFAULT);
        plFmts[5].applyPattern("other{#}");
        plFmts[6] = new PluralFormat(ULocale.getDefault(), PluralRules.DEFAULT, "other{#}");
        plFmts[7] = new PluralFormat(ULocale.getDefault(), "other{#}");

        // These plural formats should produce the same output as a
        // NumberFormat for the default locale.
        NumberFormat numberFmt = NumberFormat.getInstance(ULocale.getDefault());
        for (int n = 1; n < 13; n++) {
            String result = numberFmt.format(n);
            for (int k = 0; k < plFmts.length; ++k) {
                this.assertEquals("PluralFormat's output is not as expected", result, plFmts[k].format(n));
            }
        }
        // Test some bigger numbers.
        for (int n = 100; n < 113; n++) {
            String result = numberFmt.format(n*n);
            for (int k = 0; k < plFmts.length; ++k) {
                this.assertEquals("PluralFormat's output is not as expected", result, plFmts[k].format(n*n));
            }
        }
    }

    public void TestApplyPatternAndFormat() {
        // Create rules for testing.
        PluralRules oddAndEven = PluralRules.createRules("odd: n mod 2 is 1");
        {
            // Test full specified case for testing RuleSet
            PluralFormat plfOddAndEven = new PluralFormat(oddAndEven);
            plfOddAndEven.applyPattern("odd{# is odd.} other{# is even.}");

            // Test fall back to other.
            PluralFormat plfOddOrEven = new PluralFormat(oddAndEven);
            plfOddOrEven.applyPattern("other{# is odd or even.}");

            NumberFormat numberFormat = NumberFormat.getInstance(ULocale.getDefault());
            for (int i = 0; i < 22; ++i) {
                assertEquals("Fallback to other gave wrong results",
                             numberFormat.format(i) + " is odd or even.",
                             plfOddOrEven.format(i));
                assertEquals("Fully specified PluralFormat gave wrong results",
                             numberFormat.format(i) + ((i%2 == 1) ? " is odd." : " is even."),
                             plfOddAndEven.format(i));
            }

            // Check that double definition results in an exception.
            try {
                PluralFormat plFmt = new PluralFormat(oddAndEven);
                plFmt.applyPattern("odd{foo} odd{bar} other{foobar}");
                errln("Double definition of a plural case message should " +
                      "provoke an exception but did not.");
            } catch (IllegalArgumentException e) {}
            try {
                PluralFormat plFmt = new PluralFormat(oddAndEven);
                plFmt.applyPattern("odd{foo} other{bar} other{foobar}");
                errln("Double definition of a plural case message should " +
                      "provoke an exception but did not.");
            } catch (IllegalArgumentException e) {}
        }
        // Omit the "other" keyword.
        try {
            PluralFormat plFmt = new PluralFormat(oddAndEven);
            plFmt.applyPattern("odd{foo}");
            errln("Not defining plural case other should result in an " +
                  "exception but did not.");
        } catch (IllegalArgumentException e) {}

        // Test unknown keyword.
        try {
            PluralFormat plFmt = new PluralFormat(oddAndEven);
            plFmt.applyPattern("otto{foo} other{bar}");
            errln("Defining a message for an unknown keyword should result in " +
                  "an exception but did not.");
        } catch (IllegalArgumentException e) {}

        // Test invalid keyword.
        try {
            PluralFormat plFmt = new PluralFormat(oddAndEven);
            plFmt.applyPattern("1odd{foo} other{bar}");
            errln("Defining a message for an invalid keyword should result in " +
                  "an exception but did not.");
        } catch (IllegalArgumentException e) {}

        // Test invalid syntax
        //   -- comma between keyword{message} clauses
        //   -- space in keywords
        //   -- keyword{message1}{message2}
        try {
            PluralFormat plFmt = new PluralFormat(oddAndEven);
            plFmt.applyPattern("odd{foo},other{bar}");
            errln("Separating keyword{message} items with other characters " +
                  "than space should provoke an exception but did not.");
        } catch (IllegalArgumentException e) {}
        try {
            PluralFormat plFmt = new PluralFormat(oddAndEven);
            plFmt.applyPattern("od d{foo} other{bar}");
            errln("Spaces inside keywords should provoke an exception but " +
                  "did not.");
        } catch (IllegalArgumentException e) {}
        try {
            PluralFormat plFmt = new PluralFormat(oddAndEven);
            plFmt.applyPattern("odd{foo}{foobar}other{foo}");
            errln("Defining multiple messages after a keyword should provoke " +
                  "an exception but did not.");
        } catch (IllegalArgumentException e) {}

        // Check that nested format is preserved.
        {
            PluralFormat plFmt = new PluralFormat(oddAndEven);
            plFmt.applyPattern("odd{The number {0, number, #.#0} is odd.}" +
                               "other{The number {0, number, #.#0} is even.}");
            for (int i = 1; i < 3; ++i) {
                assertEquals("format did not preserve a nested format string.",
                             ((i % 2 == 1) ?
                                     "The number {0, number, #.#0} is odd."
                                     : "The number {0, number, #.#0} is even."),
                             plFmt.format(i));
            }
        }
        // Check that a pound sign in curly braces is preserved.
        {
            PluralFormat plFmt = new PluralFormat(oddAndEven);
            plFmt.applyPattern("odd{The number {#} is odd.}" +
                               "other{The number {#} is even.}");
            for (int i = 1; i < 3; ++i) {
                assertEquals("format did not preserve # inside curly braces.",
                             ((i % 2 == 1) ? "The number {#} is odd."
                                     : "The number {#} is even."),
                             plFmt.format(i));
            }
        }
    }

    public void TestSetLocale() {
        // Create rules for testing.
        PluralRules oddAndEven = PluralRules.createRules("odd__: n mod 2 is 1");

        PluralFormat plFmt = new PluralFormat(oddAndEven);
        plFmt.applyPattern("odd__{odd} other{even}");
        plFmt.setLocale(ULocale.ENGLISH);

        // Check that pattern gets deleted.
        NumberFormat nrFmt = NumberFormat.getInstance(ULocale.ENGLISH);
        assertEquals("pattern was not reset by setLocale() call.",
                     nrFmt.format(5),
                     plFmt.format(5));

        // Check that rules got updated.
        try {
            plFmt.applyPattern("odd__{odd} other{even}");
            errln("SetLocale should reset rules but did not.");
        } catch (IllegalArgumentException e) {
            if (e.getMessage().indexOf("Unknown keyword") < 0){
                errln("Wrong exception thrown");
            }
        }
        plFmt.applyPattern("one{one} other{not one}");
        for (int i = 0; i < 20; ++i) {
            assertEquals("Wrong ruleset loaded by setLocale()",
                         ((i==1) ? "one" : "not one"),
                         plFmt.format(i));
        }
    }

    public void TestParse() {
        PluralFormat plFmt = new PluralFormat("other{test}");
        try {
            plFmt.parse("test", new ParsePosition(0));
            errln("parse() should throw an UnsupportedOperationException but " +
                  "did not");
        } catch (UnsupportedOperationException e) {
        }

        plFmt = new PluralFormat("other{test}");
        try {
            plFmt.parseObject("test", new ParsePosition(0));
            errln("parseObject() should throw an UnsupportedOperationException but " +
                  "did not");
        } catch (UnsupportedOperationException e) {
        }
    }

    public void TestPattern() {
        Object[] args = { "acme", null };

        {
            PluralFormat pf = new PluralFormat(" one {one ''widget} other {# widgets} ");
            String pat = pf.toPattern();
            logln("pf pattern: '" + pat + "'");

            assertEquals("no leading spaces", "o", pat.substring(0, 1));
            assertEquals("no trailing spaces", "}", pat.substring(pat.length() - 1));
        }

        MessageFormat pfmt = new MessageFormat("The disk ''{0}'' contains {1, plural, one {one ''''{1, number, #.0}'''' widget} other {# widgets}}.");
        System.out.println();
        for (int i = 0; i < 3; ++i) {
            args[1] = new Integer(i);
            logln(pfmt.format(args));
        }
        PluralFormat pf = (PluralFormat)pfmt.getFormatsByArgumentIndex()[1];
logln(pf.toPattern()); logln(pfmt.toPattern()); MessageFormat pfmt2 = new MessageFormat(pfmt.toPattern()); assertEquals("message formats are equal", pfmt, pfmt2); } } icu4j-4.2/src/com/ibm/icu/dev/test/format/TestAll.java0000644000175000017500000000733711361046232022477 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 1996-2008, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.format; import com.ibm.icu.dev.test.TestFmwk.TestGroup; /** * Top level test used to run all other tests as a batch. */ public class TestAll extends TestGroup { public static void main(String[] args) { new TestAll().run(args); } public TestAll() { super(new String[] { "TestAll$RBNF", "TestAll$NumberFormat", "TestAll$DateFormat", "TestAll$DateIntervalFormat", "TestAll$TimeUnitFormat", "TestAll$MessageFormat", "TestAll$PluralFormat", "com.ibm.icu.dev.test.format.BigNumberFormatTest", "com.ibm.icu.dev.test.format.GlobalizationPreferencesTest", "DataDrivenFormatTest" }, "Formatting Tests"); } public static class RBNF extends TestGroup { public RBNF() { super(new String[] { "RbnfTest", "RbnfRoundTripTest", "RBNFParseTest", }); } } public static class NumberFormat extends TestGroup { public NumberFormat() { super(new String[] { "IntlTestNumberFormat", "IntlTestNumberFormatAPI", "NumberFormatTest", "NumberFormatRegistrationTest", "NumberFormatRoundTripTest", "NumberRegression", "NumberFormatRegressionTest", "IntlTestDecimalFormatAPI", "IntlTestDecimalFormatAPIC", "IntlTestDecimalFormatSymbols", "IntlTestDecimalFormatSymbolsC", }); } } public static class DateFormat extends TestGroup { public DateFormat() { super(new String[] { "DateFormatMiscTests", "DateFormatRegressionTest", "DateFormatRoundTripTest", "DateFormatTest", "IntlTestDateFormat", "IntlTestDateFormatAPI", 
"IntlTestDateFormatAPIC", "IntlTestDateFormatSymbols", "DateTimeGeneratorTest", "IntlTestSimpleDateFormatAPI", "DateFormatRegressionTestJ", "TimeZoneFormatTest" }); } } public static class DateIntervalFormat extends TestGroup { public DateIntervalFormat() { super(new String[] { "DateIntervalFormatTest" }); } } public static class TimeUnitFormat extends TestGroup { public TimeUnitFormat() { super(new String[] { "TimeUnitTest" }); } } public static class PluralFormat extends TestGroup { public PluralFormat() { super(new String[] { "PluralFormatUnitTest", "PluralFormatTest", "PluralRulesTest", }); } } public static class MessageFormat extends TestGroup { public MessageFormat() { super(new String[] { "TestMessageFormat", "MessageRegression", }); } } public static final String CLASS_TARGET_NAME = "Format"; } icu4j-4.2/src/com/ibm/icu/dev/test/format/RBNFParseTest.java0000644000175000017500000001240611361046232023502 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2004-2009, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */
package com.ibm.icu.dev.test.format;

import com.ibm.icu.text.RuleBasedNumberFormat;
import com.ibm.icu.util.ULocale;
import com.ibm.icu.dev.test.TestFmwk;

import java.util.Locale;

public class RBNFParseTest extends TestFmwk {
    public static void main(String[] args) {
        new RBNFParseTest().run(args);
    }

    public void TestParse() {

        // these rules make no sense but behave rationally
        String[] okrules = {
            "random text",
            "%foo:bar",
            "%foo: bar",
            "0:",
            "0::",
            "%%foo:;",
            "-",
            "-1",
            "-:",
            ".",
            ".1",
            "[",
            "]",
            "[]",
            "[foo]",
            "[[]",
            "[]]",
            "[[]]",
            "[][]",
            "<",
            ">",
            "=",
            "==",
            "===",
            "=foo=",
        };

        String[] exceptrules = {
            "",
            ";",
            ";;",
            ":",
            "::",
            ":1",
            ":;",
            ":;:;",
            "<<",
            "<<<",
            "10:;9:;",
            ">>",
            ">>>",
            "10:", // formatting any value with a one's digit will fail
            "11: << x", // formatting a multiple of 10 causes rollback rule to fail
            "%%foo: 0 foo; 10: =%%bar=; %%bar: 0: bar; 10: =%%foo=;",
        };

        String[][] allrules = {
            okrules,
            exceptrules,
        };

        for (int j = 0; j < allrules.length; ++j) {
            String[] tests = allrules[j];
            boolean except = tests == exceptrules;
            for (int i = 0; i < tests.length; ++i) {
                logln("----------");
                logln("rules: '" + tests[i] + "'");
                boolean caughtException = false;
                try {
                    RuleBasedNumberFormat fmt = new RuleBasedNumberFormat(tests[i], Locale.US);
                    // each label names the value being formatted
                    logln("  20: " + fmt.format(20));
                    logln("-123: " + fmt.format(-123));
                    logln(".123: " + fmt.format(.123));
                    logln(" 123: " + fmt.format(123));
                } catch (Exception e) {
                    if (!except) {
                        errln("Unexpected exception: " + e.getMessage());
                    } else {
                        caughtException = true;
                    }
                }
                if (except && !caughtException) {
                    errln("expected exception but didn't get one!");
                }
            }
        }
    }

    private void parseFormat(RuleBasedNumberFormat rbnf, String s, String target) {
        try {
            Number n = rbnf.parse(s);
            String t = rbnf.format(n);
            assertEquals(rbnf.getLocale(ULocale.ACTUAL_LOCALE) + ": " + s + " : " + n, target, t);
        } catch (java.text.ParseException e){
            fail("exception:" + e);
        }
    }

    private void parseList(RuleBasedNumberFormat rbnf_en, RuleBasedNumberFormat rbnf_fr, String[][] lists) {
        for (int i = 0; i < lists.length; ++i) {
            String[] list = lists[i];
            String s = list[0];
            String target_en = list[1];
            String target_fr = list[2];

            parseFormat(rbnf_en, s, target_en);
            parseFormat(rbnf_fr, s, target_fr);
        }
    }

    public void TestLenientParse() throws Exception {
        RuleBasedNumberFormat rbnf_en, rbnf_fr;

        rbnf_en = new RuleBasedNumberFormat(Locale.ENGLISH, RuleBasedNumberFormat.SPELLOUT);
        rbnf_en.setLenientParseMode(true);
        rbnf_fr = new RuleBasedNumberFormat(Locale.FRENCH, RuleBasedNumberFormat.SPELLOUT);
        rbnf_fr.setLenientParseMode(true);

        Number n = rbnf_en.parse("1,2 million");
        logln(n.toString());

        String[][] lists = {
            { "1,2", "twelve", "un virgule deux" },
            { "1,2 million", "twelve million", "un virgule deux" },
            { "1,2 millions", "twelve million", "un million deux-cents-mille" },
            { "1.2", "one point two", "douze" },
            { "1.2 million", "one million two hundred thousand", "douze" },
            { "1.2 millions", "one million two hundred thousand", "douze millions" },
        };

        Locale.setDefault(Locale.FRANCE);
        logln("Default locale:" + Locale.getDefault());
        logln("rbnf_en:" + rbnf_en.getDefaultRuleSetName());
        logln("rbnf_fr:" + rbnf_fr.getDefaultRuleSetName());
        parseList(rbnf_en, rbnf_fr, lists);

        Locale.setDefault(Locale.US);
        logln("Default locale:" + Locale.getDefault());
        logln("rbnf_en:" + rbnf_en.getDefaultRuleSetName());
        logln("rbnf_fr:" + rbnf_fr.getDefaultRuleSetName());
        parseList(rbnf_en, rbnf_fr, lists);
    }
}
icu4j-4.2/src/com/ibm/icu/dev/test/format/IntlTestDecimalFormatSymbols.java0000644000175000017500000001520211361046232026664 0ustar twernertwerner/*****************************************************************************************
 *
 * Copyright (C) 1996-2009, International Business Machines
 * Corporation and others. All Rights Reserved.
**/ /** * Port From: JDK 1.4b1 : java.text.Format.IntlTestDecimalFormatSymbols * Source File: java/text/format/IntlTestDecimalFormatSymbols.java **/ /* @test 1.4 98/03/06 @summary test International Decimal Format Symbols */ package com.ibm.icu.dev.test.format; import com.ibm.icu.text.*; import com.ibm.icu.util.Currency; import java.util.Locale; public class IntlTestDecimalFormatSymbols extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new IntlTestDecimalFormatSymbols().run(args); } // Test the API of DecimalFormatSymbols; primarily a simple get/set set. public void TestSymbols() { DecimalFormatSymbols fr = new DecimalFormatSymbols(Locale.FRENCH); DecimalFormatSymbols en = new DecimalFormatSymbols(Locale.ENGLISH); if(en.equals(fr)) { errln("ERROR: English DecimalFormatSymbols equal to French"); } // just do some VERY basic tests to make sure that get/set work char zero = en.getZeroDigit(); fr.setZeroDigit(zero); if(fr.getZeroDigit() != en.getZeroDigit()) { errln("ERROR: get/set ZeroDigit failed"); } char sigDigit = en.getSignificantDigit(); fr.setSignificantDigit(sigDigit); if(fr.getSignificantDigit() != en.getSignificantDigit()) { errln("ERROR: get/set SignificantDigit failed"); } Currency currency = Currency.getInstance("USD"); fr.setCurrency(currency); if (!fr.getCurrency().equals(currency)){ errln("ERROR: get/set Currency failed"); } char group = en.getGroupingSeparator(); fr.setGroupingSeparator(group); if(fr.getGroupingSeparator() != en.getGroupingSeparator()) { errln("ERROR: get/set GroupingSeparator failed"); } char decimal = en.getDecimalSeparator(); fr.setDecimalSeparator(decimal); if(fr.getDecimalSeparator() != en.getDecimalSeparator()) { errln("ERROR: get/set DecimalSeparator failed"); } char monetaryGroup = en.getMonetaryGroupingSeparator(); fr.setMonetaryGroupingSeparator(monetaryGroup); if(fr.getMonetaryGroupingSeparator() != en.getMonetaryGroupingSeparator()) { errln("ERROR: get/set 
MonetaryGroupingSeparator failed");
        }

        char monetaryDecimal = en.getMonetaryDecimalSeparator();
        fr.setMonetaryDecimalSeparator(monetaryDecimal);
        if(fr.getMonetaryDecimalSeparator() != en.getMonetaryDecimalSeparator()) {
            errln("ERROR: get/set MonetaryDecimalSeparator failed");
        }

        char perMill = en.getPerMill();
        fr.setPerMill(perMill);
        if(fr.getPerMill() != en.getPerMill()) {
            errln("ERROR: get/set PerMill failed");
        }

        char percent = en.getPercent();
        fr.setPercent(percent);
        if(fr.getPercent() != en.getPercent()) {
            errln("ERROR: get/set Percent failed");
        }

        // Compare the digit symbol itself, not the percent sign again.
        char digit = en.getDigit();
        fr.setDigit(digit);
        if(fr.getDigit() != en.getDigit()) {
            errln("ERROR: get/set Digit failed");
        }

        char patternSeparator = en.getPatternSeparator();
        fr.setPatternSeparator(patternSeparator);
        if(fr.getPatternSeparator() != en.getPatternSeparator()) {
            errln("ERROR: get/set PatternSeparator failed");
        }

        String infinity = en.getInfinity();
        fr.setInfinity(infinity);
        String infinity2 = fr.getInfinity();
        if(! infinity.equals(infinity2)) {
            errln("ERROR: get/set Infinity failed");
        }

        String nan = en.getNaN();
        fr.setNaN(nan);
        String nan2 = fr.getNaN();
        if(! nan.equals(nan2)) {
            errln("ERROR: get/set NaN failed");
        }

        char minusSign = en.getMinusSign();
        fr.setMinusSign(minusSign);
        if(fr.getMinusSign() != en.getMinusSign()) {
            errln("ERROR: get/set MinusSign failed");
        }

        char plusSign = en.getPlusSign();
        fr.setPlusSign(plusSign);
        if(fr.getPlusSign() != en.getPlusSign()) {
            errln("ERROR: get/set PlusSign failed");
        }

        char padEscape = en.getPadEscape();
        fr.setPadEscape(padEscape);
        if(fr.getPadEscape() != en.getPadEscape()) {
            errln("ERROR: get/set PadEscape failed");
        }

        // Strings are compared by value with equals(), not by reference.
        String exponential = en.getExponentSeparator();
        fr.setExponentSeparator(exponential);
        if(! fr.getExponentSeparator().equals(en.getExponentSeparator())) {
            errln("ERROR: get/set Exponential failed");
        }

        // Test CurrencySpacing.
        // In CLDR 1.7, only root.txt has CurrencySpacing data. This data might
        // be different between en and fr in the future.
        for (int i = DecimalFormatSymbols.CURRENCY_SPC_CURRENCY_MATCH; i <= DecimalFormatSymbols.CURRENCY_SPC_INSERT; i++) {
            // Check the "before" and "after" patterns independently, comparing by value.
            if (!en.getPatternForCurrencySpacing(i, true).equals(fr.getPatternForCurrencySpacing(i, true))) {
                errln("ERROR: get currency spacing item:" + i + " before the currency");
            }
            if (!en.getPatternForCurrencySpacing(i, false).equals(fr.getPatternForCurrencySpacing(i, false))) {
                errln("ERROR: get currency spacing item:" + i + " after currency");
            }
        }

        String dash = "-";
        en.setPatternForCurrencySpacing(DecimalFormatSymbols.CURRENCY_SPC_INSERT, true, dash);
        if (!dash.equals(en.getPatternForCurrencySpacing(DecimalFormatSymbols.CURRENCY_SPC_INSERT, true))) {
            errln("ERROR: set currency spacing pattern for before currency.");
        }

        //DecimalFormatSymbols foo = new DecimalFormatSymbols(); //The variable is never used

        en = (DecimalFormatSymbols) fr.clone();

        if(! en.equals(fr)) {
            errln("ERROR: Clone failed");
        }
    }

    public void testCoverage() {
        DecimalFormatSymbols df = new DecimalFormatSymbols();
        DecimalFormatSymbols df2 = (DecimalFormatSymbols)df.clone();
        if (!df.equals(df2) || df.hashCode() != df2.hashCode()) {
            errln("decimal format symbols clone, equals, or hashCode failed");
        }
    }
}
icu4j-4.2/src/com/ibm/icu/dev/test/format/DateIntervalFormatTest.java0000644000175000017500000014611511361046232025520 0ustar twernertwerner/*
 *******************************************************************************
 * Copyright (C) 2001-2009, International Business Machines Corporation and *
 * others. All Rights Reserved.
* ******************************************************************************* */ /** * Port From: ICU4C v1.8.1 : format : DateIntervalFormatTest * Source File: $ICU4CRoot/source/test/intltest/dtifmtts.cpp **/ package com.ibm.icu.dev.test.format; import java.text.FieldPosition; import java.text.ParseException; import java.util.Date; import java.util.Locale; import com.ibm.icu.impl.Utility; import com.ibm.icu.text.DateFormat; import com.ibm.icu.text.DateIntervalFormat; import com.ibm.icu.text.DateIntervalInfo; import com.ibm.icu.text.SimpleDateFormat; import com.ibm.icu.util.Calendar; import com.ibm.icu.util.DateInterval; import com.ibm.icu.util.ULocale; public class DateIntervalFormatTest extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new DateIntervalFormatTest().run(args); } /** * Test format */ public void testFormat() { // first item is date pattern // followed by a group of locale/from_data/to_data/skeleton/interval_data String[] DATA = { "yyyy MM dd HH:mm:ss", // test root "root", "2007 11 10 10:10:10", "2007 12 10 10:10:10", "yM", "2007-11 \\u2013 12", // test 'H' and 'h', using availableFormat in fallback "en", "2007 11 10 10:10:10", "2007 11 10 15:10:10", "Hms", "10:10:10 \\u2013 15:10:10", "en", "2007 11 10 10:10:10", "2007 11 10 15:10:10", "hms", "10:10:10 AM \\u2013 3:10:10 PM", // test skeleton with both date and time "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMMyhm", "Nov 10, 2007 10:10 AM \\u2013 Nov 20, 2007 10:10 AM", "en", "2007 11 10 10:10:10", "2007 11 10 11:10:10", "dMMMyhm", "Nov 10, 2007 10:10\\u201311:10 AM", "en", "2007 11 10 10:10:10", "2007 11 10 11:10:10", "hms", "10:10:10 AM \\u2013 11:10:10 AM", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EEEEdMMMMy", "Wednesday, October 10, 2007 \\u2013 Friday, October 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dMMMMy", "October 10, 2007 \\u2013 October 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 
10:10:10", "dMMMM", "October 10, 2007 \\u2013 October 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "MMMMy", "October 2007 \\u2013 October 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EEEEdMMMM", "Wednesday, October 10, 2007 \\u2013 Friday, October 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EdMMMy", "Wed, Oct 10, 2007 \\u2013 Fri, Oct 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dMMMy", "Oct 10, 2007 \\u2013 Oct 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dMMM", "Oct 10, 2007 \\u2013 Oct 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "MMMy", "Oct 2007 \\u2013 Oct 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EdMMM", "Wed, Oct 10, 2007 \\u2013 Fri, Oct 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EdMy", "Wed, 10/10/07 \\u2013 Fri, 10/10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dMy", "10/10/07 \\u2013 10/10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dM", "10/10/07 \\u2013 10/10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "My", "10/07 \\u2013 10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EdM", "Wed, 10/10/07 \\u2013 Fri, 10/10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "d", "10/10/07 \\u2013 10/10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "Ed", "10 Wed \\u2013 10 Fri", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "y", "2007\\u20132008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "M", "10/07 \\u2013 10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "MMM", "Oct 2007 \\u2013 Oct 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "MMMM", "October 2007 \\u2013 October 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hm", "10/10/2007 10:10 AM \\u2013 10/10/2008 10:10 AM", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hmv", "10/10/2007 10:10 AM PT \\u2013 10/10/2008 10:10 AM PT", "en", "2007 10 10 
10:10:10", "2008 10 10 10:10:10", "hmz", "10/10/2007 10:10 AM PDT \\u2013 10/10/2008 10:10 AM PDT", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "h", "10/10/2007 10 \\u2013 10/10/2008 10", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hv", "10/10/2007 PT (Hour: 10) \\u2013 10/10/2008 PT (Hour: 10)", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hz", "10/10/2007 PDT (Hour: 10) \\u2013 10/10/2008 PDT (Hour: 10)", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EEddMMyyyy", "Wed, 10/10/07 \\u2013 Fri, 10/10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EddMMy", "Wed, 10/10/07 \\u2013 Fri, 10/10/08", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hhmm", "10/10/2007 10:10 AM \\u2013 10/10/2008 10:10 AM", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hhmmzz", "10/10/2007 10:10 AM PDT \\u2013 10/10/2008 10:10 AM PDT", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hms", "10/10/2007 10:10:10 AM \\u2013 10/10/2008 10:10:10 AM", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dMMMMMy", "O 10, 2007 \\u2013 O 10, 2008", "en", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EEEEEdM", "W, 10/10/07 \\u2013 F, 10/10/08", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EEEEdMMMMy", "Wednesday, October 10 \\u2013 Saturday, November 10, 2007", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMMMy", "October 10 \\u2013 November 10, 2007", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMMM", "October 10 \\u2013 November 10", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "MMMMy", "October\\u2013November 2007", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EEEEdMMMM", "Wednesday, October 10 \\u2013 Saturday, November 10", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EdMMMy", "Wed, Oct 10 \\u2013 Sat, Nov 10, 2007", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMMy", "Oct 10 \\u2013 Nov 10, 2007", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMM", "Oct 10 \\u2013 Nov 10", 
"en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "MMMy", "Oct\\u2013Nov 2007", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EdMMM", "Wed, Oct 10 \\u2013 Sat, Nov 10", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EdMy", "Wed, 10/10/07 \\u2013 Sat, 11/10/07", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMy", "10/10/07 \\u2013 11/10/07", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dM", "10/10 \\u2013 11/10", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "My", "10/07 \\u2013 11/07", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EdM", "Wed, 10/10 \\u2013 Sat, 11/10", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "d", "10/10 \\u2013 11/10", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "Ed", "10 Wed \\u2013 10 Sat", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "y", "2007", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "M", "10\\u201311", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "MMM", "Oct\\u2013Nov", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "MMMM", "October-November", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hm", "10/10/2007 10:10 AM \\u2013 11/10/2007 10:10 AM", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hmv", "10/10/2007 10:10 AM PT \\u2013 11/10/2007 10:10 AM PT", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hmz", "10/10/2007 10:10 AM PDT \\u2013 11/10/2007 10:10 AM PST", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "h", "10/10/2007 10 \\u2013 11/10/2007 10", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hv", "10/10/2007 PT (Hour: 10) \\u2013 11/10/2007 PT (Hour: 10)", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hz", "10/10/2007 PDT (Hour: 10) \\u2013 11/10/2007 PST (Hour: 10)", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EEddMMyyyy", "Wed, 10/10/07 \\u2013 Sat, 11/10/07", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EddMMy", "Wed, 10/10/07 \\u2013 Sat, 11/10/07", "en", "2007 10 10 10:10:10", "2007 11 
10 10:10:10", "hhmm", "10/10/2007 10:10 AM \\u2013 11/10/2007 10:10 AM", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hhmmzz", "10/10/2007 10:10 AM PDT \\u2013 11/10/2007 10:10 AM PST", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hms", "10/10/2007 10:10:10 AM \\u2013 11/10/2007 10:10:10 AM", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMMMMy", "O 10 \\u2013 N 10, 2007", "en", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EEEEEdM", "W, 10/10 \\u2013 S, 11/10", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EEEEdMMMMy", "Saturday, November 10 \\u2013 Tuesday, November 20, 2007", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMMMy", "November 10\\u201320, 2007", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMMM", "November 10\\u201320", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "MMMMy", "November 2007", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EEEEdMMMM", "Saturday, November 10 \\u2013 Tuesday, November 20", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdMMMy", "Sat, Nov 10 \\u2013 Tue, Nov 20, 2007", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMM", "Nov 10\\u201320", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "MMMy", "Nov 2007", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdMMM", "Sat, Nov 10 \\u2013 Tue, Nov 20", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdMy", "Sat, 11/10/07 \\u2013 Tue, 11/20/07", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMy", "11/10/07 \\u2013 11/20/07", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dM", "11/10 \\u2013 11/20", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "My", "11/2007", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdM", "Sat, 11/10 \\u2013 Tue, 11/20", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "d", "10\\u201320", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "Ed", "10 Sat \\u2013 20 Tue", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "M", "11", "en", 
"2007 11 10 10:10:10", "2007 11 20 10:10:10", "MMM", "Nov", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "hm", "11/10/2007 10:10 AM \\u2013 11/20/2007 10:10 AM", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "hmv", "11/10/2007 10:10 AM PT \\u2013 11/20/2007 10:10 AM PT", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "hmz", "11/10/2007 10:10 AM PST \\u2013 11/20/2007 10:10 AM PST", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "hz", "11/10/2007 PST (Hour: 10) \\u2013 11/20/2007 PST (Hour: 10)", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EEddMMyyyy", "Sat, 11/10/07 \\u2013 Tue, 11/20/07", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EddMMy", "Sat, 11/10/07 \\u2013 Tue, 11/20/07", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "hhmm", "11/10/2007 10:10 AM \\u2013 11/20/2007 10:10 AM", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "hms", "11/10/2007 10:10:10 AM \\u2013 11/20/2007 10:10:10 AM", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMMMMy", "N 10\\u201320, 2007", "en", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EEEEEdM", "S, 11/10 \\u2013 T, 11/20", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EEEEdMMMMy", "Wednesday, January 10, 2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dMMMMy", "January 10, 2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dMMMM", "January 10", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "MMMMy", "January 2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EEEEdMMMM", "Wednesday, January 10", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dMMMy", "Jan 10, 2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dMMM", "Jan 10", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "MMMy", "Jan 2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EdMMM", "Wed, Jan 10", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EdMy", "Wed, 1/10/2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dMy", 
"1/10/2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dM", "1/10", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EdM", "Wed, 1/10", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "d", "10", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "Ed", "10 Wed", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "y", "2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "MMM", "Jan", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "MMMM", "January", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hm", "10:00 AM \\u2013 2:10 PM", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hmz", "10:00 AM \\u2013 2:10 PM PST", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "h", "10 AM \\u2013 2 PM", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hv", "10 AM \\u2013 2 PM PT", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hz", "10 AM \\u2013 2 PM PST", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EEddMMyyyy", "Wed, 01/10/2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hhmm", "10:00 AM \\u2013 2:10 PM", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hhmmzz", "10:00 AM \\u2013 2:10 PM PST", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dMMMMMy", "J 10, 2007", "en", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EEEEEdM", "W, 1/10", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "EEEEdMMMMy", "Wednesday, January 10, 2007", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "dMMMMy", "January 10, 2007", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "MMMMy", "January 2007", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "EdMMMy", "Wed, Jan 10, 2007", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "dMMMy", "Jan 10, 2007", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "dMMM", "Jan 10", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "EdMMM", "Wed, Jan 10", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "EdMy", "Wed, 1/10/2007", "en", "2007 01 10 10:00:10", "2007 
01 10 10:20:10", "dM", "1/10", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "EdM", "Wed, 1/10", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "y", "2007", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "MMM", "Jan", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hm", "10:00\\u201310:20 AM", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hmv", "10:00\\u201310:20 AM PT", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "h", "10", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hz", "PST (Hour: 10)", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "EddMMy", "Wed, 01/10/2007", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hhmm", "10:00\\u201310:20 AM", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hhmmzz", "10:00\\u201310:20 AM PST", "en", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hms", "10:0:10 AM \\u2013 10:20:10 AM", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "dMMMM", "January 10", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "EEEEdMMMM", "Wednesday, January 10", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "EdMMMy", "Wed, Jan 10, 2007", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "dMMM", "Jan 10", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "EdMMM", "Wed, Jan 10", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "dM", "1/10", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "My", "1/2007", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "EdM", "Wed, 1/10", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "d", "10", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "Ed", "10 Wed", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "y", "2007", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "M", "1", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "MMM", "Jan", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "MMMM", "January", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hm", "10:10 AM", "en", "2007 01 10 10:10:10", "2007 01 10 
10:10:20", "hmv", "10:10 AM PT", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hmz", "10:10 AM PST", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "h", "10", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hz", "PST (Hour: 10)", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hhmmzz", "10:10 AM PST", "en", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hms", "10:10:10 AM", "zh", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EEEEdMMMMy", "2007\\u5e7410\\u670810\\u65e5\\u661f\\u671f\\u4e09\\u81f32008\\u5e7410\\u670810\\u65e5\\u661f\\u671f\\u4e94", "zh", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hm", "2007\\u5e7410\\u670810\\u65e5 \\u4e0a\\u534810:10\\u20132008\\u5e7410\\u670810\\u65e5 \\u4e0a\\u534810:10", "zh", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMMMy", "2007\\u5e7410\\u670810\\u65e5\\u81f311\\u670810\\u65e5", "zh", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMMM", "10\\u670810\\u65e5\\u81f311\\u670810\\u65e5", "zh", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "MMMMy", "2007\\u5e7410\\u6708\\u81f311\\u6708", "zh", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EEEEdMMMM", "10\\u670810\\u65e5\\u661f\\u671f\\u4e09\\u81f311\\u670810\\u65e5\\u661f\\u671f\\u516d", "zh", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hmv", "2007\\u5e7410\\u670810\\u65e5 \\u4e0a\\u534810:10 \\u7f8e\\u56fd (\\u6d1b\\u6749\\u77f6)\\u20132007\\u5e7411\\u670810\\u65e5 \\u4e0a\\u534810:10 \\u7f8e\\u56fd (\\u6d1b\\u6749\\u77f6)", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EEEEdMMMMy", "2007\\u5e7411\\u670810\\u65e5\\u661f\\u671f\\u516d\\u81f320\\u65e5\\u661f\\u671f\\u4e8c", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMMMy", "2007\\u5e7411\\u670810\\u65e5\\u81f320\\u65e5", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMMM", "11\\u670810\\u65e5\\u81f320\\u65e5", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "MMMMy", "2007-11", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EEEEdMMMM", 
"11\\u670810\\u65e5\\u661f\\u671f\\u516d\\u81f320\\u65e5\\u661f\\u671f\\u4e8c", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdMMM", "11\\u670810\\u65e5\\u5468\\u516d\\u81f320\\u65e5\\u5468\\u4e8c", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdMy", "07-11-10\\u5468\\u516d\\u81f307-11-20\\u5468\\u4e8c", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMy", "07-11-10\\u81f307-11-20", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dM", "11-10\\u81f311-20", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "My", "2007-11", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdM", "11-10\\u5468\\u516d\\u81f311-20\\u5468\\u4e8c", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "d", "10\\u65e5\\u81f320\\u65e5", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "y", "2007", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "M", "11", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "MMM", "11", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "MMMM", "11", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "hmz", "2007\\u5e7411\\u670810\\u65e5 \\u4e0a\\u534810:10 \\u683c\\u6797\\u5c3c\\u6cbb\\u6807\\u51c6\\u65f6\\u95f4-0800\\u20132007\\u5e7411\\u670820\\u65e5 \\u4e0a\\u534810:10 \\u683c\\u6797\\u5c3c\\u6cbb\\u6807\\u51c6\\u65f6\\u95f4-0800", "zh", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "h", "2007\\u5e7411\\u670810\\u65e5 10\\u20132007\\u5e7411\\u670820\\u65e5 10", "zh", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EEEEdMMMMy", "2007\\u5e7401\\u670810\\u65e5\\u661f\\u671f\\u4e09", "zh", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hm", "\\u4e0a\\u534810:00\\u81f3\\u4e0b\\u53482:10", "zh", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hmv", "\\u7f8e\\u56fd (\\u6d1b\\u6749\\u77f6)\\u4e0a\\u534810:00\\u81f3\\u4e0b\\u53482:10", "zh", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hmz", "\\u683c\\u6797\\u5c3c\\u6cbb\\u6807\\u51c6\\u65f6\\u95f4-0800\\u4e0a\\u534810:00\\u81f3\\u4e0b\\u53482:10", "zh", "2007 01 10 
10:00:10", "2007 01 10 14:10:10", "h", "\\u4e0a\\u534810\\u81f3\\u4e0b\\u53482\\u65f6", "zh", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hv", "\\u7f8e\\u56fd (\\u6d1b\\u6749\\u77f6)\\u4e0a\\u534810\\u81f3\\u4e0b\\u53482\\u65f6", "zh", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hz", "\\u683c\\u6797\\u5c3c\\u6cbb\\u6807\\u51c6\\u65f6\\u95f4-0800\\u4e0a\\u534810\\u81f3\\u4e0b\\u53482\\u65f6", "zh", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "dMMMM", "01-10", "zh", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hm", "\\u4e0a\\u534810:00\\u81f310:20", "zh", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hmv", "\\u7f8e\\u56fd (\\u6d1b\\u6749\\u77f6)\\u4e0a\\u534810:00\\u81f310:20", "zh", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "h", "10", "zh", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hz", "\\u683c\\u6797\\u5c3c\\u6cbb\\u6807\\u51c6\\u65f6\\u95f4-0800 (\\u5c0f\\u65f6: 10)", "zh", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "EEEEdMMMMy", "2007\\u5e7401\\u670810\\u65e5\\u661f\\u671f\\u4e09", "zh", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hm", "\\u4e0a\\u534810:10", "zh", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "h", "10", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EEEEdMMMy", "Mittwoch, 10. Okt 2007 - Freitag, 10. Okt 2008", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dMMMy", "10. Okt 2007 - 10. Okt 2008", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dMMM", "10. Okt 2007 - 10. Okt 2008", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "MMMy", "Okt 2007 - Okt 2008", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EEEdMMM", "Mi., 10. Okt 2007 - Fr., 10. 
Okt 2008", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EdMy", "Mi., 10.10.07 - Fr., 10.10.08", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dMy", "10.10.07 - 10.10.08", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "dM", "10.10.07 - 10.10.08", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "My", "10.07 - 10.08", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "EdM", "Mi., 10.10.07 - Fr., 10.10.08", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "d", "10.10.07 - 10.10.08", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "y", "2007-2008", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "M", "10.07 - 10.08", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "MMM", "Okt 2007 - Okt 2008", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "hm", "10.10.2007 10:10 vorm. - 10.10.2008 10:10 vorm.", "de", "2007 10 10 10:10:10", "2008 10 10 10:10:10", "jm", "10.10.2007 10:10 - 10.10.2008 10:10", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EEEEdMMMy", "Mittwoch, 10. Okt - Samstag, 10. Nov 2007", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMMy", "10. Okt - 10. Nov 2007", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dMMM", "10. Okt - 10. Nov", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "MMMy", "Okt-Nov 2007", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EEEEdMMM", "Mittwoch, 10. Okt - Samstag, 10. Nov", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EdMy", "Mi., 10.10.07 - Sa., 10.11.07", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "dM", "10.10. - 10.11.", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "My", "10.07 - 11.07", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "EdM", "Mi., 10.10. - Sa., 10.11.", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "d", "10.10. 
- 10.11.", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "M", "10.-11.", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "MMM", "Okt-Nov", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hmv", "10.10.2007 10:10 vorm. Vereinigte Staaten (Los Angeles) - 10.11.2007 10:10 vorm. Vereinigte Staaten (Los Angeles)", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "jmv", "10.10.2007 10:10 Vereinigte Staaten (Los Angeles) - 10.11.2007 10:10 Vereinigte Staaten (Los Angeles)", "de", "2007 10 10 10:10:10", "2007 11 10 10:10:10", "hms", "10.10.2007 10:10:10 vorm. - 10.11.2007 10:10:10 vorm.", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EEEEdMMMy", "Samstag, 10. - Dienstag, 20. Nov 2007", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMMy", "10.-20. Nov 2007", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMMM", "10.-20. Nov", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "MMMy", "Nov 2007", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EEEEdMMM", "Samstag, 10. - Dienstag, 20. Nov", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdMy", "Sa., 10.11.07 - Di., 20.11.07", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dMy", "10.11.07 - 20.11.07", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "dM", "10.11. - 20.11.", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "My", "2007-11", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "EdM", "Sa., 10.11. - Di., 20.11.", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "d", "10.-20.", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "y", "2007", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "M", "11", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "hmv", "10.11.2007 10:10 vorm. Vereinigte Staaten (Los Angeles) - 20.11.2007 10:10 vorm. 
Vereinigte Staaten (Los Angeles)", "de", "2007 11 10 10:10:10", "2007 11 20 10:10:10", "jmv", "10.11.2007 10:10 Vereinigte Staaten (Los Angeles) - 20.11.2007 10:10 Vereinigte Staaten (Los Angeles)", "de", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EEEEdMMMy", "Mittwoch, 10. Jan 2007", "de", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dMMMy", "10. Jan 2007", "de", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "dMMM", "10. Jan", "de", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "MMMy", "Jan 2007", "de", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "EEEEdMMM", "Mittwoch 10. Jan", "de", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "hmz", "10:00-14:10 GMT-08:00", "de", "2007 01 10 10:00:10", "2007 01 10 14:10:10", "h", "10-14", "de", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "EEEEdMMM", "Mittwoch 10. Jan", "de", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hm", "10:00-10:20", "de", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hmv", "10:00-10:20 Vereinigte Staaten (Los Angeles)", "de", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hmz", "10:00-10:20 GMT-08:00", "de", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "h", "10", "de", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hv", "Vereinigte Staaten (Los Angeles) (Stunde: 10)", "de", "2007 01 10 10:00:10", "2007 01 10 10:20:10", "hz", "GMT-08:00 (Stunde: 10)", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "EEEEdMMMy", "Mittwoch, 10. Jan 2007", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hm", "10:10 vorm.", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "jm", "10:10", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hmv", "10:10 vorm. Vereinigte Staaten (Los Angeles)", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "jmv", "10:10 Vereinigte Staaten (Los Angeles)", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hmz", "10:10 vorm. 
GMT-08:00", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "jmz", "10:10 GMT-08:00", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "h", "10", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hv", "Vereinigte Staaten (Los Angeles) (Stunde: 10)", "de", "2007 01 10 10:10:10", "2007 01 10 10:10:20", "hz", "GMT-08:00 (Stunde: 10)", }; expect(DATA, DATA.length); } private void expect(String[] data, int data_length) { int i = 1; while (i " + exception); } logln(""); } } /* @Bug 4099975 * SimpleDateFormat constructor SimpleDateFormat(String, DateFormatSymbols) * should clone the DateFormatSymbols parameter */ public void Test4099975new() { Date d = new Date(); //test SimpleDateFormat Constructor { DateFormatSymbols symbols = new DateFormatSymbols(Locale.US); SimpleDateFormat df = new SimpleDateFormat("E hh:mm", symbols); SimpleDateFormat dfClone = (SimpleDateFormat) df.clone(); logln(df.toLocalizedPattern()); String s0 = df.format(d); String s_dfClone = dfClone.format(d); symbols.setLocalPatternChars("abcdefghijklmonpqr"); // change value of field logln(df.toLocalizedPattern()); String s1 = df.format(d); if (!s1.equals(s0) || !s1.equals(s_dfClone)) { errln("Constructor: the formats are not equal"); } if (!df.equals(dfClone)) { errln("The Clone Object does not equal with the original source"); } } //test SimpleDateFormat.setDateFormatSymbols() { DateFormatSymbols symbols = new DateFormatSymbols(Locale.US); SimpleDateFormat df = new SimpleDateFormat("E hh:mm"); df.setDateFormatSymbols(symbols); SimpleDateFormat dfClone = (SimpleDateFormat) df.clone(); logln(df.toLocalizedPattern()); String s0 = df.format(d); String s_dfClone = dfClone.format(d); symbols.setLocalPatternChars("abcdefghijklmonpqr"); // change value of field logln(df.toLocalizedPattern()); String s1 = df.format(d); if (!s1.equals(s0) || !s1.equals(s_dfClone)) { errln("setDateFormatSymbols: the formats are not equal"); } if (!df.equals(dfClone)) { errln("The Clone Object does not equal with the 
original source"); } } } /* * @bug 4117335 */ public void Test4117335() { final String bc = "\u7D00\u5143\u524D"; final String ad = "\u897f\u66a6"; final String jstLong = "\u65e5\u672c\u6a19\u6e96\u6642"; final String jdtLong = "\u65e5\u672c\u590f\u6642\u9593"; final String jstShort = "JST"; final String jdtShort = "JDT"; final String tzID = "Asia/Tokyo"; DateFormatSymbols symbols = new DateFormatSymbols(Locale.JAPAN); final String[] eras = symbols.getEras(); assertEquals("BC =", bc, eras[0]); assertEquals("AD =", ad, eras[1]); // don't use hard-coded index! final String zones[][] = symbols.getZoneStrings(); int index = -1; for (int i = 0; i < zones.length; ++i) { if (tzID.equals(zones[i][0])) { index = i; break; } } if (index == -1) { errln("could not find " + tzID); } else { assertEquals("Long zone name = ", jstLong, zones[index][1]); assertEquals("Short zone name = ", jstShort, zones[index][2]); assertEquals("Long zone name (3) = ", jdtLong, zones[index][3]); assertEquals("Short zone name (4) = ", jdtShort, zones[index][4]); } } } icu4j-4.2/src/com/ibm/icu/dev/test/format/IntlTestDateFormatAPIC.java0000644000175000017500000001302611361046232025271 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2001-2004, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ /** * Port From: ICU4C v1.8.1 : format : IntlTestDateFormatAPI * Source File: $ICU4CRoot/source/test/intltest/dtfmapts.cpp **/ package com.ibm.icu.dev.test.format; import com.ibm.icu.text.*; import java.util.Date; import java.text.FieldPosition; import java.text.ParsePosition; /* * This is an API test, not a unit test. It doesn't test very many cases, and doesn't * try to test the full functionality. It just calls each function in the class and * verifies that it works on a basic level. 
*/ public class IntlTestDateFormatAPIC extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new IntlTestDateFormatAPIC().run(args); } /** * Test hiding of parse() and format() APIs in the Format hierarchy. * We test the entire hierarchy, even though this test is located in * the DateFormat API test. */ public void TestNameHiding() { // N.B.: This test passes if it COMPILES, since it's a test of // compile-time name hiding. Date dateObj = new Date(0); Number numObj = new Double(3.1415926535897932384626433832795); StringBuffer strBuffer = new StringBuffer(""); String str; FieldPosition fpos = new FieldPosition(0); ParsePosition ppos = new ParsePosition(0); // DateFormat calling Format API { logln("DateFormat"); DateFormat dateFmt = DateFormat.getInstance(); if (dateFmt != null) { str = dateFmt.format(dateObj); strBuffer = dateFmt.format(dateObj, strBuffer, fpos); } else { errln("FAIL: Can't create DateFormat"); } } // SimpleDateFormat calling Format & DateFormat API { logln("SimpleDateFormat"); SimpleDateFormat sdf = new SimpleDateFormat(); // Format API str = sdf.format(dateObj); strBuffer = sdf.format(dateObj, strBuffer, fpos); // DateFormat API strBuffer = sdf.format(new Date(0), strBuffer, fpos); str = sdf.format(new Date(0)); try { sdf.parse(str); sdf.parse(str, ppos); } catch (java.text.ParseException pe) { System.out.println(pe); } } // NumberFormat calling Format API { logln("NumberFormat"); NumberFormat fmt = NumberFormat.getInstance(); if (fmt != null) { str = fmt.format(numObj); strBuffer = fmt.format(numObj, strBuffer, fpos); } else { errln("FAIL: Can't create NumberFormat"); } } // DecimalFormat calling Format & NumberFormat API { logln("DecimalFormat"); DecimalFormat fmt = new DecimalFormat(); // Format API str = fmt.format(numObj); strBuffer = fmt.format(numObj, strBuffer, fpos); // NumberFormat API str = fmt.format(2.71828); str = fmt.format(1234567); strBuffer = fmt.format(1.41421, strBuffer, fpos); 
strBuffer = fmt.format(9876543, strBuffer, fpos); Number obj = fmt.parse(str, ppos); try { obj = fmt.parse(str); if(obj==null){ errln("FAIL: The format object could not parse the string : "+str); } } catch (java.text.ParseException pe) { System.out.println(pe); } } //ICU4J does not have the classes ChoiceFormat and MessageFormat /* // ChoiceFormat calling Format & NumberFormat API { logln("ChoiceFormat"); ChoiceFormat fmt = new ChoiceFormat("0#foo|1#foos|2#foos"); // Format API str = fmt.format(numObj); strBuffer = fmt.format(numObj, strBuffer, fpos); // NumberFormat API str = fmt.format(2.71828); str = fmt.format(1234567); strBuffer = fmt.format(1.41421, strBuffer, fpos); strBuffer = fmt.format(9876543, strBuffer, fpos); Number obj = fmt.parse(str, ppos); try { obj = fmt.parse(str); } catch (java.text.ParseException pe) { System.out.println(pe); } } // MessageFormat calling Format API { logln("MessageFormat"); MessageFormat fmt = new MessageFormat(""); // Format API // We use dateObj, which MessageFormat should reject. // We're testing name hiding, not the format method. try { str = fmt.format(dateObj); } catch (Exception e) { //e.printStackTrace(); } try { strBuffer = fmt.format(dateObj, strBuffer, fpos); } catch (Exception e) { //e.printStackTrace(); } } */ } }icu4j-4.2/src/com/ibm/icu/dev/test/format/TimeUnitTest.java0000644000175000017500000001123611361046232023516 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2008, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ package com.ibm.icu.dev.test.format; import java.text.ParseException; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.text.NumberFormat; import com.ibm.icu.text.TimeUnitFormat; import com.ibm.icu.util.TimeUnit; import com.ibm.icu.util.TimeUnitAmount; import com.ibm.icu.util.ULocale; /** * @author markdavis * */ public class TimeUnitTest extends TestFmwk { public static void main(String[] args) throws Exception{ new TimeUnitTest().run(args); } public void TestBasic() { String[] locales = {"en", "sl", "fr", "zh", "ar", "ru", "zh_Hant"}; for ( int locIndex = 0; locIndex < locales.length; ++locIndex ) { //System.out.println("locale: " + locales[locIndex]); Object[] formats = new Object[] { new TimeUnitFormat(new ULocale(locales[locIndex]), TimeUnitFormat.FULL_NAME), new TimeUnitFormat(new ULocale(locales[locIndex]), TimeUnitFormat.ABBREVIATED_NAME) }; for (int style = TimeUnitFormat.FULL_NAME; style <= TimeUnitFormat.ABBREVIATED_NAME; ++style) { final TimeUnit[] values = TimeUnit.values(); for (int j = 0; j < values.length; ++j) { final TimeUnit timeUnit = values[j]; double[] tests = {0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 5, 10, 100, 101.35}; for (int i = 0; i < tests.length; ++i) { TimeUnitAmount source = new TimeUnitAmount(tests[i], timeUnit); String formatted = ((TimeUnitFormat)formats[style]).format(source); //System.out.println(formatted); logln(tests[i] + " => " + formatted); try { TimeUnitAmount result = (TimeUnitAmount) ((TimeUnitFormat)formats[style]).parseObject(formatted); if (result == null || !source.equals(result)) { errln("No round trip: " + source + " => " + formatted + " => " + result); } // mix style parsing result = (TimeUnitAmount) ((TimeUnitFormat)formats[1 - style]).parseObject(formatted); if (result == null || !source.equals(result)) { errln("No round trip: " + source + " => " + formatted + " => " + result); } } catch (ParseException e) { 
errln(e.getMessage()); } } } } } } public void TestAPI() { TimeUnitFormat format = new TimeUnitFormat(); format.setLocale(new ULocale("pt_BR")); formatParsing(format); format = new TimeUnitFormat(new ULocale("de")); formatParsing(format); format = new TimeUnitFormat(new ULocale("ja")); format.setNumberFormat(NumberFormat.getNumberInstance(new ULocale("en"))); formatParsing(format); format = new TimeUnitFormat(); ULocale es = new ULocale("es"); format.setNumberFormat(NumberFormat.getNumberInstance(es)); format.setLocale(es); formatParsing(format); } private void formatParsing(TimeUnitFormat format) { final TimeUnit[] values = TimeUnit.values(); for (int j = 0; j < values.length; ++j) { final TimeUnit timeUnit = values[j]; double[] tests = {0, 0.5, 1, 2, 3, 5}; for (int i = 0; i < tests.length; ++i) { TimeUnitAmount source = new TimeUnitAmount(tests[i], timeUnit); String formatted = format.format(source); //System.out.println(formatted); logln(tests[i] + " => " + formatted); try { TimeUnitAmount result = (TimeUnitAmount) format.parseObject(formatted); if (result == null || !source.equals(result)) { errln("No round trip: " + source + " => " + formatted + " => " + result); } } catch (ParseException e) { errln(e.getMessage()); } } } } } icu4j-4.2/src/com/ibm/icu/dev/test/format/DateFormatRegressionTest.java0000644000175000017500000013043311361046232026050 0ustar twernertwerner/* ******************************************************************************* * Copyright (C) 2001-2009, International Business Machines Corporation and * * others. All Rights Reserved. 
* ******************************************************************************* */ /** * Port From: ICU4C v1.8.1 : format : DateFormatRegressionTest * Source File: $ICU4CRoot/source/test/intltest/dtfmrgts.cpp **/ package com.ibm.icu.dev.test.format; import com.ibm.icu.text.*; import com.ibm.icu.util.*; import java.io.*; import java.text.FieldPosition; import java.text.Format; import java.text.ParseException; import java.text.ParsePosition; import java.util.Date; import java.util.Locale; /** * Performs regression test for DateFormat **/ public class DateFormatRegressionTest extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception{ new DateFormatRegressionTest().run(args); } /** * @bug 4029195 */ public void Test4029195() { Calendar cal = Calendar.getInstance(); Date today = cal.getTime(); logln("today: " + today); SimpleDateFormat sdf = (SimpleDateFormat) DateFormat.getDateInstance(); String pat = sdf.toPattern(); logln("pattern: " + pat); StringBuffer fmtd = new StringBuffer(""); FieldPosition pos = new FieldPosition(0); fmtd = sdf.format(today, fmtd, pos); logln("today: " + fmtd); sdf.applyPattern("G yyyy DDD"); StringBuffer todayS = new StringBuffer(""); todayS = sdf.format(today, todayS, pos); logln("today: " + todayS); try { today = sdf.parse(todayS.toString()); logln("today date: " + today); } catch (Exception e) { errln("Error reparsing date: " + e.getMessage()); } try { StringBuffer rt = new StringBuffer(""); rt = sdf.format(sdf.parse(todayS.toString()), rt, pos); logln("round trip: " + rt); if (!rt.toString().equals(todayS.toString())) errln("Fail: Want " + todayS + " Got " + rt); } catch (ParseException e) { errln("Fail: " + e); e.printStackTrace(); } } /** * @bug 4052408 */ public void Test4052408() { DateFormat fmt = DateFormat.getDateTimeInstance(DateFormat.SHORT, DateFormat.SHORT, Locale.US); Calendar cal = Calendar.getInstance(); cal.clear(); cal.set(97 + 1900, Calendar.MAY, 3, 8, 55); Date dt = cal.getTime(); 
String str = fmt.format(dt); logln(str); if (!str.equals("5/3/97 8:55 AM")) errln("Fail: Test broken; Want 5/3/97 8:55 AM Got " + str); String expected[] = { "", //"ERA_FIELD", "97", //"YEAR_FIELD", "5", //"MONTH_FIELD", "3", //"DATE_FIELD", "", //"HOUR_OF_DAY1_FIELD", "", //"HOUR_OF_DAY0_FIELD", "55", //"MINUTE_FIELD", "", //"SECOND_FIELD", "", //"MILLISECOND_FIELD", "", //"DAY_OF_WEEK_FIELD", "", //"DAY_OF_YEAR_FIELD", "", //"DAY_OF_WEEK_IN_MONTH_FIELD", "", //"WEEK_OF_YEAR_FIELD", "", //"WEEK_OF_MONTH_FIELD", "AM", //"AM_PM_FIELD", "8", //"HOUR1_FIELD", "", //"HOUR0_FIELD", "" //"TIMEZONE_FIELD" }; String fieldNames[] = { "ERA_FIELD", "YEAR_FIELD", "MONTH_FIELD", "DATE_FIELD", "HOUR_OF_DAY1_FIELD", "HOUR_OF_DAY0_FIELD", "MINUTE_FIELD", "SECOND_FIELD", "MILLISECOND_FIELD", "DAY_OF_WEEK_FIELD", "DAY_OF_YEAR_FIELD", "DAY_OF_WEEK_IN_MONTH_FIELD", "WEEK_OF_YEAR_FIELD", "WEEK_OF_MONTH_FIELD", "AM_PM_FIELD", "HOUR1_FIELD", "HOUR0_FIELD", "TIMEZONE_FIELD"}; boolean pass = true; for (int i = 0; i <= 17; ++i) { FieldPosition pos = new FieldPosition(i); StringBuffer buf = new StringBuffer(""); fmt.format(dt, buf, pos); //char[] dst = new char[pos.getEndIndex() - pos.getBeginIndex()]; String dst = buf.substring(pos.getBeginIndex(), pos.getEndIndex()); str = dst; log(i + ": " + fieldNames[i] + ", \"" + str + "\", " + pos.getBeginIndex() + ", " + pos.getEndIndex()); String exp = expected[i]; if ((exp.length() == 0 && str.length() == 0) || str.equals(exp)) logln(" ok"); else { logln(" expected " + exp); pass = false; } } if (!pass) errln("Fail: FieldPosition not set right by DateFormat"); } /** * @bug 4056591 * Verify the function of the [s|g]et2DigitYearStart() API. 
*/ public void Test4056591() { try { SimpleDateFormat fmt = new SimpleDateFormat("yyMMdd", Locale.US); Calendar cal = Calendar.getInstance(); cal.clear(); cal.set(1809, Calendar.DECEMBER, 25); Date start = cal.getTime(); fmt.set2DigitYearStart(start); if ((fmt.get2DigitYearStart() != start)) errln("get2DigitYearStart broken"); cal.clear(); cal.set(1809, Calendar.DECEMBER, 25); Date d1 = cal.getTime(); cal.clear(); cal.set(1909, Calendar.DECEMBER, 24); Date d2 = cal.getTime(); cal.clear(); cal.set(1809, Calendar.DECEMBER, 26); Date d3 = cal.getTime(); cal.clear(); cal.set(1861, Calendar.DECEMBER, 25); Date d4 = cal.getTime(); Date dates[] = {d1, d2, d3, d4}; String strings[] = {"091225", "091224", "091226", "611225"}; for (int i = 0; i < 4; i++) { String s = strings[i]; Date exp = dates[i]; Date got = fmt.parse(s); logln(s + " . " + got + "; exp " + exp); if (got.getTime() != exp.getTime()) errln("set2DigitYearStart broken"); } } catch (ParseException e) { errln("Fail: " + e); e.printStackTrace(); } } /** * @bug 4059917 */ public void Test4059917() { SimpleDateFormat fmt; String myDate; fmt = new SimpleDateFormat("yyyy/MM/dd"); myDate = "1997/01/01"; aux917( fmt, myDate ); fmt = new SimpleDateFormat("yyyyMMdd"); myDate = "19970101"; aux917( fmt, myDate ); } public void aux917(SimpleDateFormat fmt, String str) { String pat = fmt.toPattern(); logln("=================="); logln("testIt: pattern=" + pat + " string=" + str); ParsePosition pos = new ParsePosition(0); Object o = fmt.parseObject(str, pos); //logln( UnicodeString("Parsed object: ") + o ); StringBuffer formatted = new StringBuffer(""); FieldPosition poss = new FieldPosition(0); formatted = fmt.format(o, formatted, poss); logln("Formatted string: " + formatted); if (!formatted.toString().equals(str)) errln("Fail: Want " + str + " Got " + formatted); } /** * @bug 4060212 */ public void Test4060212() { String dateString = "1995-040.05:01:29"; logln("dateString= " + dateString); logln("Using yyyy-DDD.hh:mm:ss"); 
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-DDD.hh:mm:ss"); ParsePosition pos = new ParsePosition(0); Date myDate = formatter.parse(dateString, pos); DateFormat fmt = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.LONG); String myString = fmt.format(myDate); logln(myString); Calendar cal = new GregorianCalendar(); cal.setTime(myDate); if ((cal.get(Calendar.DAY_OF_YEAR) != 40)) errln("Fail: Got " + cal.get(Calendar.DAY_OF_YEAR) + " Want 40"); logln("Using yyyy-ddd.hh:mm:ss"); formatter = new SimpleDateFormat("yyyy-ddd.hh:mm:ss"); pos.setIndex(0); myDate = formatter.parse(dateString, pos); myString = fmt.format(myDate); logln(myString); cal.setTime(myDate); if ((cal.get(Calendar.DAY_OF_YEAR) != 40)) errln("Fail: Got " + cal.get(Calendar.DAY_OF_YEAR) + " Want 40"); } /** * @bug 4061287 */ public void Test4061287() { SimpleDateFormat df = new SimpleDateFormat("dd/MM/yyyy"); try { logln(df.parse("35/01/1971").toString()); } catch (ParseException e) { errln("Fail: " + e); e.printStackTrace(); } df.setLenient(false); boolean ok = false; try { logln(df.parse("35/01/1971").toString()); } catch (ParseException e) { ok = true; } if (!ok) errln("Fail: Lenient not working"); } /** * @bug 4065240 */ public void Test4065240() { Date curDate; DateFormat shortdate, fulldate; String strShortDate, strFullDate; Locale saveLocale = Locale.getDefault(); TimeZone saveZone = TimeZone.getDefault(); try { Locale curLocale = new Locale("de", "DE"); Locale.setDefault(curLocale); // {sfb} adoptDefault instead of setDefault //TimeZone.setDefault(TimeZone.createTimeZone("EST")); TimeZone.setDefault(TimeZone.getTimeZone("EST")); Calendar cal = Calendar.getInstance(); cal.clear(); cal.set(98 + 1900, 0, 1); curDate = cal.getTime(); shortdate = DateFormat.getDateInstance(DateFormat.SHORT); fulldate = DateFormat.getDateTimeInstance(DateFormat.LONG, DateFormat.LONG); strShortDate = "The current date (short form) is "; String temp; temp = shortdate.format(curDate); strShortDate 
+= temp; strFullDate = "The current date (long form) is "; String temp2 = fulldate.format(curDate); strFullDate += temp2; logln(strShortDate); logln(strFullDate); // {sfb} What to do with resource bundle stuff????? // Check to see if the resource is present; if not, we can't test //ResourceBundle bundle = //The variable is never used // ICULocaleData.getBundle("DateFormatZoneData", curLocale); // {sfb} API change to ResourceBundle -- add getLocale() /*if (bundle.getLocale().getLanguage().equals("de")) { // UPDATE THIS AS ZONE NAME RESOURCE FOR in de_DE is updated if (!strFullDate.endsWith("GMT-05:00")) errln("Fail: Want GMT-05:00"); } else { logln("*** TEST COULD NOT BE COMPLETED BECAUSE DateFormatZoneData ***"); logln("*** FOR LOCALE de OR de_DE IS MISSING ***"); }*/ } catch (Exception e) { logln(e.getMessage()); } finally { Locale.setDefault(saveLocale); TimeZone.setDefault(saveZone); } } /* DateFormat.equals is too narrowly defined. As a result, MessageFormat does not work correctly. DateFormat.equals needs to be written so that the Calendar sub-object is not compared using Calendar.equals, but rather compared for equivalency. This may necessitate adding a (package private) method to Calendar to test for equivalency. Currently this bug breaks MessageFormat.toPattern */ /** * @bug 4071441 */ public void Test4071441() { DateFormat fmtA = DateFormat.getInstance(); DateFormat fmtB = DateFormat.getInstance(); // {sfb} Is it OK to cast away const here? 
Calendar calA = fmtA.getCalendar(); Calendar calB = fmtB.getCalendar(); calA.clear(); calA.set(1900, 0 ,0); calB.clear(); calB.set(1900, 0, 0); if (!calA.equals(calB)) errln("Fail: Can't complete test; Calendar instances unequal"); if (!fmtA.equals(fmtB)) errln("Fail: DateFormat unequal when Calendars equal"); calB.clear(); calB.set(1961, Calendar.DECEMBER, 25); if (calA.equals(calB)) errln("Fail: Can't complete test; Calendar instances equal"); if (!fmtA.equals(fmtB)) errln("Fail: DateFormat unequal when Calendars equivalent"); logln("DateFormat.equals ok"); } /* The java.text.DateFormat.parse(String) method expects for the US locale a string formatted according to mm/dd/yy and parses it correctly. When given a string mm/dd/yyyy it only parses up to the first two y's, typically resulting in a date in the year 1919. Please extend the parsing method(s) to handle strings with four-digit year values (probably also applicable to various other locales. */ /** * @bug 4073003 */ public void Test4073003() { try { DateFormat fmt = DateFormat.getDateInstance(DateFormat.SHORT, Locale.US); String tests[] = {"12/25/61", "12/25/1961", "4/3/2010", "4/3/10"}; for (int i = 0; i < 4; i += 2) { Date d = fmt.parse(tests[i]); Date dd = fmt.parse(tests[i + 1]); String s; s = fmt.format(d); String ss; ss = fmt.format(dd); if (d.getTime() != dd.getTime()) errln("Fail: " + d + " != " + dd); if (!s.equals(ss)) errln("Fail: " + s + " != " + ss); logln("Ok: " + s + " " + d); } } catch (ParseException e) { errln("Fail: " + e); e.printStackTrace(); } } /** * @bug 4089106 */ public void Test4089106() { TimeZone def = TimeZone.getDefault(); try { TimeZone z = new SimpleTimeZone((int) (1.25 * 3600000), "FAKEZONE"); TimeZone.setDefault(z); SimpleDateFormat f = new SimpleDateFormat(); if (!f.getTimeZone().equals(z)) errln("Fail: SimpleTimeZone should use TimeZone.getDefault()"); } finally { TimeZone.setDefault(def); } } /** * @bug 4100302 */ public void Test4100302() { Locale locales[] = { 
Locale.CANADA, Locale.CANADA_FRENCH, Locale.CHINA, Locale.CHINESE, Locale.ENGLISH, Locale.FRANCE, Locale.FRENCH, Locale.GERMAN, Locale.GERMANY, Locale.ITALIAN, Locale.ITALY, Locale.JAPAN, Locale.JAPANESE, Locale.KOREA, Locale.KOREAN, Locale.PRC, Locale.SIMPLIFIED_CHINESE, Locale.TAIWAN, Locale.TRADITIONAL_CHINESE, Locale.UK, Locale.US}; try { boolean pass = true; for (int i = 0; i < 21; i++) { Format format = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, locales[i]); byte[] bytes; ByteArrayOutputStream baos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(baos); oos.writeObject(format); oos.flush(); baos.close(); bytes = baos.toByteArray(); ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes)); Object o = ois.readObject(); if (!format.equals(o)) { pass = false; logln("DateFormat instance for locale " + locales[i] + " is incorrectly serialized/deserialized."); } else { logln("DateFormat instance for locale " + locales[i] + " is OKAY."); } } if (!pass) errln("Fail: DateFormat serialization/equality bug"); } catch (OptionalDataException e) { errln("Fail: " + e); } catch (IOException e) { errln("Fail: " + e); } catch (ClassNotFoundException e) { errln("Fail: " + e); } } /** * @bug 4101483 */ public void Test4101483() { SimpleDateFormat sdf = new SimpleDateFormat("z", Locale.US); FieldPosition fp = new FieldPosition(DateFormat.TIMEZONE_FIELD); Date d = new Date(9234567890L); StringBuffer buf = new StringBuffer(""); sdf.format(d, buf, fp); logln(sdf.format(d, buf, fp).toString()); logln("beginIndex = " + fp.getBeginIndex()); logln("endIndex = " + fp.getEndIndex()); if (fp.getBeginIndex() == fp.getEndIndex()) errln("Fail: Empty field"); } /** * @bug 4103340 * @bug 4138203 * This bug really only works in Locale.US, since that's what the locale * used for Date.toString() is. Bug 4138203 reports that it fails on Korean * NT; it would actually have failed on any non-US locale. 
Now it should * work on all locales. */ public void Test4103340() { // choose a date that is the FIRST of some month // and some arbitrary time Calendar cal = Calendar.getInstance(); cal.clear(); cal.set(1997, 3, 1, 1, 1, 1); Date d = cal.getTime(); SimpleDateFormat df = new SimpleDateFormat("MMMM", Locale.US); String s = d.toString(); StringBuffer s2 = new StringBuffer(""); FieldPosition pos = new FieldPosition(0); s2 = df.format(d, s2, pos); logln("Date=" + s); logln("DF=" + s2); String substr = s2.substring(0,2); if (s.indexOf(substr) == -1) errln("Months should match"); } /** * @bug 4103341 */ public void Test4103341() { TimeZone saveZone = TimeZone.getDefault(); try { // {sfb} changed from adoptDefault to setDefault TimeZone.setDefault(TimeZone.getTimeZone("CST")); SimpleDateFormat simple = new SimpleDateFormat("MM/dd/yyyy HH:mm"); TimeZone temp = TimeZone.getDefault(); if (!simple.getTimeZone().equals(temp)) errln("Fail: SimpleDateFormat not using default zone"); } finally { TimeZone.setDefault(saveZone); } } /** * @bug 4104136 */ public void Test4104136() { SimpleDateFormat sdf = new SimpleDateFormat(); String pattern = "'time' hh:mm"; sdf.applyPattern(pattern); logln("pattern: \"" + pattern + "\""); String strings[] = {"time 10:30", "time 10:x", "time 10x"}; ParsePosition ppos[] = {new ParsePosition(10), new ParsePosition(0), new ParsePosition(0)}; Calendar cal = Calendar.getInstance(); cal.clear(); cal.set(1970, Calendar.JANUARY, 1, 10, 30); Date dates[] = {cal.getTime(), new Date(-1), new Date(-1)}; for (int i = 0; i < 3; i++) { String text = strings[i]; ParsePosition finish = ppos[i]; Date exp = dates[i]; ParsePosition pos = new ParsePosition(0); Date d = sdf.parse(text, pos); logln(" text: \"" + text + "\""); logln(" index: " + pos.getIndex()); logln(" result: " + d); if (pos.getIndex() != finish.getIndex()) errln("Fail: Expected pos " + finish.getIndex()); if (!((d == null && exp.equals(new Date(-1))) || (d != null && d.equals(exp)))) errln( "Fail: Expected 
result " + exp); } } /** * @bug 4104522 * CANNOT REPRODUCE * According to the bug report, this test should throw a * StringIndexOutOfBoundsException during the second parse. However, * this is not seen. */ public void Test4104522() { SimpleDateFormat sdf = new SimpleDateFormat(); String pattern = "'time' hh:mm"; sdf.applyPattern(pattern); logln("pattern: \"" + pattern + "\""); // works correctly ParsePosition pp = new ParsePosition(0); String text = "time "; Date dt = sdf.parse(text, pp); logln(" text: \"" + text + "\"" + " date: " + dt); // works wrong pp.setIndex(0); text = "time"; dt = sdf.parse(text, pp); logln(" text: \"" + text + "\"" + " date: " + dt); } /** * @bug 4106807 */ public void Test4106807() { Date dt; DateFormat df = DateFormat.getDateTimeInstance(); SimpleDateFormat sdfs[] = { new SimpleDateFormat("yyyyMMddHHmmss"), new SimpleDateFormat("yyyyMMddHHmmss'Z'"), new SimpleDateFormat("yyyyMMddHHmmss''"), new SimpleDateFormat("yyyyMMddHHmmss'a''a'"), new SimpleDateFormat("yyyyMMddHHmmss %")}; String strings[] = { "19980211140000", "19980211140000", "19980211140000", "19980211140000a", "19980211140000 "}; GregorianCalendar gc = new GregorianCalendar(); TimeZone timeZone = TimeZone.getDefault(); TimeZone gmt = (TimeZone) timeZone.clone(); gmt.setRawOffset(0); for (int i = 0; i < 5; i++) { SimpleDateFormat format = sdfs[i]; String dateString = strings[i]; try { format.setTimeZone(gmt); dt = format.parse(dateString); // {sfb} some of these parses will fail purposely StringBuffer fmtd = new StringBuffer(""); FieldPosition pos = new FieldPosition(0); fmtd = df.format(dt, fmtd, pos); logln(fmtd.toString()); //logln(df.format(dt)); gc.setTime(dt); logln("" + gc.get(Calendar.ZONE_OFFSET)); StringBuffer s = new StringBuffer(""); s = format.format(dt, s, pos); logln(s.toString()); } catch (ParseException e) { logln("No way Jose"); } } } /* Synopsis: Chinese time zone CTT is not recognized correctly. 
Description: Platform Chinese Windows 95 - ** Time zone set to CST ** */ /** * @bug 4108407 */ // {sfb} what to do with this one ?? public void Test4108407() { /* // TODO user.timezone is a protected system property, catch securityexception and warn // if this is reenabled long l = System.currentTimeMillis(); logln("user.timezone = " + System.getProperty("user.timezone", "?")); logln("Time Zone :" + DateFormat.getDateInstance().getTimeZone().getID()); logln("Default format :" + DateFormat.getDateInstance().format(new Date(l))); logln("Full format :" + DateFormat.getDateInstance(DateFormat.FULL).format(new Date(l))); logln("*** Set host TZ to CST ***"); logln("*** THE RESULTS OF THIS TEST MUST BE VERIFIED MANUALLY ***"); */ } /** * @bug 4134203 * SimpleDateFormat won't parse "GMT" */ public void Test4134203() { String dateFormat = "MM/dd/yy HH:mm:ss zzz"; SimpleDateFormat fmt = new SimpleDateFormat(dateFormat); ParsePosition p0 = new ParsePosition(0); Date d = fmt.parse("01/22/92 04:52:00 GMT", p0); logln(d.toString()); if(p0.equals(new ParsePosition(0))) errln("Fail: failed to parse 'GMT'"); // In the failure case an exception is thrown by parse(); // if no exception is thrown, the test passes. 
} /** * @bug 4151631 * SimpleDateFormat incorrect handling of 2 single quotes in format() */ public void Test4151631() { String pattern = "'TO_DATE('''dd'-'MM'-'yyyy HH:mm:ss''' , ''DD-MM-YYYY HH:MI:SS'')'"; logln("pattern=" + pattern); SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US); StringBuffer result = new StringBuffer(""); FieldPosition pos = new FieldPosition(0); Calendar cal = Calendar.getInstance(); cal.clear(); cal.set(1998, Calendar.JUNE, 30, 13, 30, 0); Date d = cal.getTime(); result = format.format(d, result, pos); if (!result.toString().equals("TO_DATE('30-06-1998 13:30:00' , 'DD-MM-YYYY HH:MI:SS')")) { errln("Fail: result=" + result); } else { logln("Pass: result=" + result); } } /** * @bug 4151706 * 'z' at end of date format throws index exception in SimpleDateFormat * CANNOT REPRODUCE THIS BUG ON 1.2FCS */ public void Test4151706() { String dateString = "Thursday, 31-Dec-98 23:00:00 GMT"; SimpleDateFormat fmt = new SimpleDateFormat("EEEE, dd-MMM-yy HH:mm:ss z", Locale.US); Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US); cal.clear(); cal.set(1998, Calendar.DECEMBER, 31, 23, 0, 0); Date d = new Date(); try { d = fmt.parse(dateString); // {sfb} what about next two lines? if (d.getTime() != cal.getTime().getTime()) errln("Incorrect value: " + d); } catch (Exception e) { errln("Fail: " + e); } StringBuffer temp = new StringBuffer(""); FieldPosition pos = new FieldPosition(0); logln(dateString + " . " + fmt.format(d, temp, pos)); } /** * @bug 4162071 * Cannot reproduce this bug under 1.2 FCS -- it may be a convoluted duplicate * of some other bug that has been fixed. 
*/ public void Test4162071() { String dateString = "Thu, 30-Jul-1999 11:51:14 GMT"; String format = "EEE', 'dd-MMM-yyyy HH:mm:ss z"; // RFC 822/1123 SimpleDateFormat df = new SimpleDateFormat(format, Locale.US); try { Date x = df.parse(dateString); StringBuffer temp = new StringBuffer(""); FieldPosition pos = new FieldPosition(0); logln(dateString + " -> " + df.format(x, temp, pos)); } catch (Exception e) { errln("Parse format \"" + format + "\" failed."); } } /** * DateFormat shouldn't parse year "-1" as a two-digit year (e.g., "-1" . 1999). */ public void Test4182066() { SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yy", Locale.US); SimpleDateFormat dispFmt = new SimpleDateFormat("MMM dd yyyy HH:mm:ss GG", Locale.US); /* We expect 2-digit year formats to put 2-digit years in the right * window. Out of range years, that is, anything less than "00" or * greater than "99", are treated as literal years. So "1/2/3456" * becomes 3456 AD. Likewise, "1/2/-3" becomes -3 AD == 2 BC. */ final String STRINGS[] = {"02/29/00", "01/23/01", "04/05/-1", "01/23/-9", "11/12/1314", "10/31/1", "09/12/+1", "09/12/001",}; int STRINGS_COUNT = STRINGS.length; Calendar cal = Calendar.getInstance(); Date FAIL_DATE = cal.getTime(); cal.clear(); cal.set(2000, Calendar.FEBRUARY, 29); Date d0 = cal.getTime(); cal.clear(); cal.set(2001, Calendar.JANUARY, 23); Date d1 = cal.getTime(); cal.clear(); cal.set(-1, Calendar.APRIL, 5); Date d2 = cal.getTime(); cal.clear(); cal.set(-9, Calendar.JANUARY, 23); Date d3 = cal.getTime(); cal.clear(); cal.set(1314, Calendar.NOVEMBER, 12); Date d4 = cal.getTime(); cal.clear(); cal.set(1, Calendar.OCTOBER, 31); Date d5 = cal.getTime(); cal.clear(); cal.set(1, Calendar.SEPTEMBER, 12); Date d7 = cal.getTime(); Date DATES[] = {d0, d1, d2, d3, d4, d5, FAIL_DATE, d7}; String out = ""; boolean pass = true; for (int i = 0; i < STRINGS_COUNT; ++i) { String str = STRINGS[i]; Date expected = DATES[i]; Date actual = null; try { actual = fmt.parse(str); } catch 
(ParseException e) { actual = FAIL_DATE; } String actStr = ""; if ((actual.getTime()) == FAIL_DATE.getTime()) { actStr += "null"; } else { // Yuck: See j25 actStr = ((DateFormat) dispFmt).format(actual); } if (expected.getTime() == (actual.getTime())) { out += str + " => " + actStr + "\n"; } else { String expStr = ""; if (expected.getTime() == FAIL_DATE.getTime()) { expStr += "null"; } else { // Yuck: See j25 expStr = ((DateFormat) dispFmt).format(expected); } out += "FAIL: " + str + " => " + actStr + ", expected " + expStr + "\n"; pass = false; } } if (pass) { log(out); } else { err(out); } } /** * j32 {JDK Bug 4210209 4209272} * DateFormat cannot parse Feb 29 2000 when setLenient(false) */ public void Test4210209() { String pattern = "MMM d, yyyy"; DateFormat fmt = new SimpleDateFormat(pattern, Locale.US); DateFormat disp = new SimpleDateFormat("MMM dd yyyy GG", Locale.US); Calendar calx = fmt.getCalendar(); calx.setLenient(false); Calendar calendar = Calendar.getInstance(); calendar.clear(); calendar.set(2000, Calendar.FEBRUARY, 29); Date d = calendar.getTime(); String s = fmt.format(d); logln(disp.format(d) + " f> " + pattern + " => \"" + s + "\""); ParsePosition pos = new ParsePosition(0); d = fmt.parse(s, pos); logln("\"" + s + "\" p> " + pattern + " => " + (d!=null?disp.format(d):"null")); logln("Parse pos = " + pos.getIndex() + ", error pos = " + pos.getErrorIndex()); if (pos.getErrorIndex() != -1) { errln("FAIL: Error index should be -1"); } // The underlying bug is in GregorianCalendar. If the following lines // succeed, the bug is fixed. If the bug isn't fixed, they will throw // an exception. GregorianCalendar cal = new GregorianCalendar(); cal.clear(); cal.setLenient(false); cal.set(2000, Calendar.FEBRUARY, 29); // This should work! 
d = cal.getTime(); logln("Attempt to set Calendar to Feb 29 2000: " + disp.format(d)); } public void Test714() { //TimeZone Offset TimeZone defaultTZ = TimeZone.getDefault(); TimeZone PST = TimeZone.getTimeZone("PST"); int defaultOffset = defaultTZ.getRawOffset(); int PSTOffset = PST.getRawOffset(); Date d = new Date(978103543000l - (defaultOffset - PSTOffset)); d = new Date(d.getTime() - (defaultTZ.inDaylightTime(d) ? 3600000 : 0)); DateFormat fmt = DateFormat.getDateTimeInstance(-1, DateFormat.MEDIUM, Locale.US); String tests = "7:25:43 AM"; String s = fmt.format(d); if (!s.equals(tests)) { errln("Fail: " + s + " != " + tests); } else { logln("OK: " + s + " == " + tests); } } public void Test_GEec() { class PatternAndResult { private String pattern; private String result; PatternAndResult(String pat, String res) { pattern = pat; result = res; } public String getPattern() { return pattern; } public String getResult() { return result; } } final PatternAndResult[] tests = { new PatternAndResult( "dd MMM yyyy GGG", "02 Jul 2008 AD" ), new PatternAndResult( "dd MMM yyyy GGGGG", "02 Jul 2008 A" ), new PatternAndResult( "e dd MMM yyyy", "3 02 Jul 2008" ), new PatternAndResult( "ee dd MMM yyyy", "03 02 Jul 2008" ), new PatternAndResult( "c dd MMM yyyy", "3 02 Jul 2008" ), new PatternAndResult( "cc dd MMM yyyy", "3 02 Jul 2008" ), new PatternAndResult( "eee dd MMM yyyy", "Wed 02 Jul 2008" ), new PatternAndResult( "EEE dd MMM yyyy", "Wed 02 Jul 2008" ), new PatternAndResult( "EE dd MMM yyyy", "Wed 02 Jul 2008" ), new PatternAndResult( "eeee dd MMM yyyy", "Wednesday 02 Jul 2008" ), new PatternAndResult( "eeeee dd MMM yyyy", "W 02 Jul 2008" ), new PatternAndResult( "e ww YYYY", "3 27 2008" ), new PatternAndResult( "c ww YYYY", "3 27 2008" ), }; ULocale loc = ULocale.ENGLISH; TimeZone tz = TimeZone.getTimeZone("America/Los_Angeles"); Calendar cal = new GregorianCalendar(tz, loc); SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MMM-dd", loc); for ( int i = 0; i < 
tests.length; i++ ) {
            PatternAndResult item = tests[i];
            dateFormat.applyPattern( item.getPattern() );
            cal.set(2008, 6, 2, 5, 0); // 2008 July 02 5 AM PDT
            StringBuffer buf = new StringBuffer(32);
            FieldPosition fp = new FieldPosition(DateFormat.YEAR_FIELD);
            dateFormat.format(cal, buf, fp);
            if ( buf.toString().compareTo(item.getResult()) != 0 ) {
                errln("for pattern " + item.getPattern() + ", expected " + item.getResult() + ", got " + buf );
            }
            ParsePosition pos = new ParsePosition(0);
            dateFormat.parse( item.getResult(), cal, pos);
            int year = cal.get(Calendar.YEAR);
            int month = cal.get(Calendar.MONTH);
            int day = cal.get(Calendar.DATE);
            if ( year != 2008 || month != 6 || day != 2 ) {
                errln("use pattern " + item.getPattern() + " to parse " + item.getResult() + ", expected y2008 m6 d2, got " + year + " " + month + " " + day );
            }
        }
    }

    static final char kArabicZero = 0x0660;
    static final char kHindiZero = 0x0966;
    static final char kLatinZero = 0x0030;

    public void TestHindiArabicDigits() {
        String s;
        char first;
        String what;

        {
            DateFormat df = DateFormat.getInstance(new GregorianCalendar(), new Locale("hi","IN"));
            what = "Gregorian Calendar, hindi";
            s = df.format(new Date(0)); /* 31/12/1969 */
            logln(what + "=" + s);
            first = s.charAt(0);
            if(first < kHindiZero || first > (kHindiZero+9)) {
                errln(what + "- wrong digit, got " + s + " (integer digit value " + new Integer((int)first).toString());
            }
        }

        {
            DateFormat df = DateFormat.getInstance(new IslamicCalendar(), new Locale("ar","IQ"));
            s = df.format(new Date(0)); /* 21/10/1989 */
            what = "Islamic Calendar, Arabic";
            logln(what + ": " + s);
            first = s.charAt(0);
            if(first < kArabicZero || first > (kArabicZero+9)) {
                errln(what + " wrong digit, got " + s + " (integer digit value " + new Integer((int)first).toString());
            }
        }

        {
            DateFormat df = DateFormat.getInstance(new GregorianCalendar(), new Locale("ar","IQ"));
            s = df.format(new Date(0)); /* 31/12/1969 */
            what = "Gregorian, ar_IQ, df.getInstance";
            logln(what + ": " + s);
            first = s.charAt(0);
            if(first < kArabicZero || first > (kArabicZero+9)) {
                errln(what + " wrong digit but got " + s + " (integer digit value " + new Integer((int)first).toString());
            }
        }

        {
            DateFormat df = DateFormat.getInstance(new GregorianCalendar(), new Locale("mt","MT"));
            s = df.format(new Date(0)); /* 31/12/1969 */
            what = "Gregorian, mt_MT, df.getInstance";
            logln(what + ": " + s);
            first = s.charAt(0);
            if(first < kLatinZero || first > (kLatinZero+9)) {
                errln(what + " wrong digit but got " + s + " (integer digit value " + new Integer((int)first).toString());
            }
        }

        {
            DateFormat df = DateFormat.getInstance(new IslamicCalendar(), new Locale("ar","IQ"));
            s = df.format(new Date(0)); /* 31/12/1969 */
            what = "Islamic calendar, ar_IQ, df.getInstance";
            logln(what+ ": " + s);
            first = s.charAt(0);
            if(first < kArabicZero || first > (kArabicZero+9)) {
                errln(what + " wrong digit but got " + s + " (integer digit value " + new Integer((int)first).toString());
            }
        }

        {
            DateFormat df = DateFormat.getDateTimeInstance(DateFormat.SHORT, DateFormat.SHORT, new Locale("ar","IQ"));
            s = df.format(new Date(0)); /* 31/12/1969 */
            what = "ar_IQ, getDateTimeInstance";
            logln(what+ ": " + s);
            first = s.charAt(0);
            if(first < kArabicZero || first > (kArabicZero+9)) {
                errln(what + " wrong digit but got " + s + " (integer digit value " + new Integer((int)first).toString());
            }
        }

        {
            DateFormat df = DateFormat.getInstance(new JapaneseCalendar(), new Locale("ar","IQ"));
            s = df.format(new Date(0)); /* 31/12/1969 */
            what = "ar_IQ, Japanese Calendar, getInstance";
            logln(what+ ": " + s);
            // Note: The default date pattern for Japanese calendar starts with era in CLDR 1.7
            char last = s.charAt(s.length() - 1);
            if(last < kArabicZero || last > (kArabicZero+9)) {
                errln(what + " wrong digit but got " + s + " (integer digit value " + new Integer((int)last).toString());
            }
        }
    }

    // Ticket#5683
    // Some ICU4J 3.6 data files contain garbage data which prevent the code to resolve another
    // bundle as an alias.
zh_TW should be equivalent to zh_Hant_TW public void TestT5683() { Locale[] aliasLocales = { new Locale("zh", "CN"), new Locale("zh", "TW"), new Locale("zh", "HK"), new Locale("zh", "SG"), new Locale("zh", "MO") }; ULocale[] canonicalLocales = { new ULocale("zh_Hans_CN"), new ULocale("zh_Hant_TW"), new ULocale("zh_Hant_HK"), new ULocale("zh_Hans_SG"), new ULocale("zh_Hant_MO") }; Date d = new Date(0); for (int i = 0; i < aliasLocales.length; i++) { DateFormat dfAlias = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, aliasLocales[i]); DateFormat dfCanonical = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, canonicalLocales[i]); String sAlias = dfAlias.format(d); String sCanonical = dfCanonical.format(d); if (!sAlias.equals(sCanonical)) { errln("Fail: The format result for locale " + aliasLocales[i] + " is different from the result for locale " + canonicalLocales[i] + ": " + sAlias + "[" + aliasLocales[i] + "] / " + sCanonical + "[" + canonicalLocales[i] + "]"); } } } public void Test5006GetShortMonths() throws Exception { // Currently supported NLV locales Locale ENGLISH = new Locale("en", "US"); // We don't support 'en' alone Locale ARABIC = new Locale("ar", ""); Locale CZECH = new Locale("cs", ""); Locale GERMAN = new Locale("de", ""); Locale GREEK = new Locale("el", ""); Locale SPANISH = new Locale("es", ""); Locale FRENCH = new Locale("fr", ""); Locale HUNGARIAN = new Locale("hu", ""); Locale ITALIAN = new Locale("it", ""); Locale HEBREW = new Locale("iw", ""); Locale JAPANESE = new Locale("ja", ""); Locale KOREAN = new Locale("ko", ""); Locale POLISH = new Locale("pl", ""); Locale PORTUGUESE = new Locale("pt", "BR"); Locale RUSSIAN = new Locale("ru", ""); Locale TURKISH = new Locale("tr", ""); Locale CHINESE_SIMPLIFIED = new Locale("zh", "CN"); Locale CHINESE_TRADITIONAL = new Locale("zh", "TW"); Locale[] locales = new Locale[] { ENGLISH, ARABIC, CZECH, GERMAN, GREEK, SPANISH, FRENCH, HUNGARIAN, ITALIAN, HEBREW, JAPANESE, 
KOREAN, POLISH, PORTUGUESE, RUSSIAN, TURKISH, CHINESE_SIMPLIFIED, CHINESE_TRADITIONAL };

        String[] islamicTwelfthMonthLocalized = new String[locales.length];
        String[] gregorianTwelfthMonthLocalized = new String[locales.length];

        for (int i = 0; i < locales.length; i++) {

            Locale locale = locales[i];

            // Islamic
            com.ibm.icu.util.Calendar islamicCalendar = new com.ibm.icu.util.IslamicCalendar(locale);
            com.ibm.icu.text.SimpleDateFormat islamicDateFormat = (com.ibm.icu.text.SimpleDateFormat) islamicCalendar
                    .getDateTimeFormat(com.ibm.icu.text.DateFormat.FULL, -1, locale);
            com.ibm.icu.text.DateFormatSymbols islamicDateFormatSymbols = islamicDateFormat
                    .getDateFormatSymbols();

            String[] shortMonths = islamicDateFormatSymbols.getShortMonths();
            String twelfthMonthLocalized = shortMonths[11];

            islamicTwelfthMonthLocalized[i] = twelfthMonthLocalized;

            // Gregorian
            com.ibm.icu.util.Calendar gregorianCalendar = new com.ibm.icu.util.GregorianCalendar(
                    locale);
            com.ibm.icu.text.SimpleDateFormat gregorianDateFormat = (com.ibm.icu.text.SimpleDateFormat) gregorianCalendar
                    .getDateTimeFormat(com.ibm.icu.text.DateFormat.FULL, -1, locale);

            com.ibm.icu.text.DateFormatSymbols gregorianDateFormatSymbols = gregorianDateFormat
                    .getDateFormatSymbols();
            shortMonths = gregorianDateFormatSymbols.getShortMonths();
            twelfthMonthLocalized = shortMonths[11];

            gregorianTwelfthMonthLocalized[i] = twelfthMonthLocalized;

        }

        // Compare
        for (int i = 0; i < locales.length; i++) {

            String gregorianTwelfthMonth = gregorianTwelfthMonthLocalized[i];
            String islamicTwelfthMonth = islamicTwelfthMonthLocalized[i];

            logln(locales[i] + ": " + gregorianTwelfthMonth + ", " + islamicTwelfthMonth);
            if (gregorianTwelfthMonth.equalsIgnoreCase(islamicTwelfthMonth)) {
                errln(locales[i] + ": gregorian and islamic are same: " + gregorianTwelfthMonth
                        + ", " + islamicTwelfthMonth);
            }
        }
    }
}

icu4j-4.2/src/com/ibm/icu/dev/test/format/IntlTestDateFormat.java

/***************************************************************************************
 *
 *   Copyright (C) 1996-2007, International Business Machines
 *   Corporation and others.  All Rights Reserved.
 */

/**
 * Port From:   JDK 1.4b1 : java.text.Format.IntlTestDateFormat
 * Source File: java/text/format/IntlTestDateFormat.java
 **/

/*
    @test 1.4 98/03/06
    @summary test International Date Format
*/

package com.ibm.icu.dev.test.format;

import com.ibm.icu.text.DateFormat;
import com.ibm.icu.text.SimpleDateFormat;
import com.ibm.icu.util.ULocale;

import java.text.FieldPosition;
import java.text.ParseException;
import java.util.Random;
import java.util.Date;

public class IntlTestDateFormat extends com.ibm.icu.dev.test.TestFmwk {
    // Values in milliseconds (== Date)
    private static final long ONESECOND = 1000;
    private static final long ONEMINUTE = 60 * ONESECOND;
    private static final long ONEHOUR = 60 * ONEMINUTE;
    private static final long ONEDAY = 24 * ONEHOUR;
    //private static final double ONEYEAR = 365.25 * ONEDAY; // Approximate //The variable is never used

    // EModes
    //private static final byte GENERIC = 0;
    //private static final byte TIME = GENERIC + 1; //The variable is never used
    //private static final byte DATE = TIME + 1; //The variable is never used
    //private static final byte DATE_TIME = DATE + 1; //The variable is never used

    private DateFormat fFormat = null;
    private String fTestName = new String("getInstance");
    private int fLimit = 3; // How many iterations it should take to reach convergence
    private Random random; // initialized in randDouble

    public IntlTestDateFormat() {
        // Constructor
    }

    protected void init() throws Exception{
        fFormat = DateFormat.getInstance();
    }

    public static void main(String[] args) throws Exception {
        new IntlTestDateFormat().run(args);
    }

    public void TestULocale() {
        localeTest(ULocale.getDefault(), "Default Locale");
    }

    // This test does round-trip testing (format -> parse -> format -> parse -> etc.) of DateFormat.
public void localeTest(final ULocale locale, final String localeName) { int timeStyle, dateStyle; // For patterns including only time information and a timezone, it may take // up to three iterations, since the timezone may shift as the year number // is determined. For other patterns, 2 iterations should suffice. fLimit = 3; for(timeStyle = 0; timeStyle < 4; timeStyle++) { fTestName = new String("Time test " + timeStyle + " (" + localeName + ")"); try { fFormat = DateFormat.getTimeInstance(timeStyle, locale); } catch(StringIndexOutOfBoundsException e) { errln("FAIL: localeTest time getTimeInstance exception"); throw e; } TestFormat(); } fLimit = 2; for(dateStyle = 0; dateStyle < 4; dateStyle++) { fTestName = new String("Date test " + dateStyle + " (" + localeName + ")"); try { fFormat = DateFormat.getDateInstance(dateStyle, locale); } catch(StringIndexOutOfBoundsException e) { errln("FAIL: localeTest date getTimeInstance exception"); throw e; } TestFormat(); } for(dateStyle = 0; dateStyle < 4; dateStyle++) { for(timeStyle = 0; timeStyle < 4; timeStyle++) { fTestName = new String("DateTime test " + dateStyle + "/" + timeStyle + " (" + localeName + ")"); try { fFormat = DateFormat.getDateTimeInstance(dateStyle, timeStyle, locale); } catch(StringIndexOutOfBoundsException e) { errln("FAIL: localeTest date/time getDateTimeInstance exception"); throw e; } TestFormat(); } } } public void TestFormat() { if (fFormat == null) { errln("FAIL: DateFormat creation failed"); return; } // logln("TestFormat: " + fTestName); Date now = new Date(); tryDate(new Date(0)); tryDate(new Date((long) 1278161801778.0)); tryDate(now); // Shift 6 months into the future, AT THE SAME TIME OF DAY. // This will test the DST handling. 
tryDate(new Date(now.getTime() + 6*30*ONEDAY)); Date limit = new Date(now.getTime() * 10); // Arbitrary limit for (int i=0; i<2; ++i) // tryDate(new Date(floor(randDouble() * limit))); tryDate(new Date((long) (randDouble() * limit.getTime()))); } private void describeTest() { if (fFormat == null) { errln("FAIL: no DateFormat"); return; } // Assume it's a SimpleDateFormat and get some info SimpleDateFormat s = (SimpleDateFormat) fFormat; logln(fTestName + " Pattern " + s.toPattern()); } private void tryDate(Date theDate) { final int DEPTH = 10; Date[] date = new Date[DEPTH]; StringBuffer[] string = new StringBuffer[DEPTH]; int dateMatch = 0; int stringMatch = 0; boolean dump = false; int i; for (i=0; i 0) { if (dateMatch == 0 && date[i] == date[i-1]) dateMatch = i; else if (dateMatch > 0 && date[i] != date[i-1]) { describeTest(); errln("********** FAIL: Date mismatch after match."); dump = true; break; } if (stringMatch == 0 && string[i] == string[i-1]) stringMatch = i; else if (stringMatch > 0 && string[i] != string[i-1]) { describeTest(); errln("********** FAIL: String mismatch after match."); dump = true; break; } } if (dateMatch > 0 && stringMatch > 0) break; } if (i == DEPTH) --i; if (stringMatch > fLimit || dateMatch > fLimit) { describeTest(); errln("********** FAIL: No string and/or date match within " + fLimit + " iterations."); dump = true; } if (dump) { for (int k=0; k<=i; ++k) { logln("" + k + ": " + date[k] + " F> " + string[k] + " P> "); } } } // Return a random double from 0.01 to 1, inclusive private double randDouble() { if (random == null) { random = createRandom(); } // Assume 8-bit (or larger) rand values. Also assume // that the system rand() function is very poor, which it always is. 
// double d; // int i; // do { // for (i=0; i < sizeof(double); ++i) // { // char poke = (char*)&d; // poke[i] = (rand() & 0xFF); // } // } while (TPlatformUtilities.isNaN(d) || TPlatformUtilities.isInfinite(d)); // if (d < 0.0) d = -d; // if (d > 0.0) // { // double e = floor(log10(d)); // if (e < -2.0) d *= pow(10.0, -e-2); // else if (e > -1.0) d /= pow(10.0, e+1); // } // return d; return random.nextDouble(); } public void TestAvailableLocales() { final ULocale[] locales = DateFormat.getAvailableULocales(); long count = locales.length; logln("" + count + " available locales"); if (locales != null && count != 0) { StringBuffer all = new StringBuffer(); for (int i=0; i " + outString); } } /** * Test getIntegerInstance(); */ public void Test4408066() { NumberFormat nf1 = NumberFormat.getIntegerInstance(); NumberFormat nf2 = NumberFormat.getIntegerInstance(Locale.CHINA); //test isParseIntegerOnly if (!nf1.isParseIntegerOnly() || !nf2.isParseIntegerOnly()) { errln("Failed : Integer Number Format Instance should set setParseIntegerOnly(true)"); } //Test format { double[] data = { -3.75, -2.5, -1.5, -1.25, 0, 1.0, 1.25, 1.5, 2.5, 3.75, 10.0, 255.5 }; String[] expected = { "-4", "-2", "-2", "-1", "0", "1", "1", "2", "2", "4", "10", "256" }; for (int i = 0; i < data.length; ++i) { String result = nf1.format(data[i]); if (!result.equals(expected[i])) { errln("Failed => Source: " + Double.toString(data[i]) + ";Formatted : " + result + ";but expectted: " + expected[i]); } } } //Test parse, Parsing should stop at "." 
{ String data[] = { "-3.75", "-2.5", "-1.5", "-1.25", "0", "1.0", "1.25", "1.5", "2.5", "3.75", "10.0", "255.5" }; long[] expected = { -3, -2, -1, -1, 0, 1, 1, 1, 2, 3, 10, 255 }; for (int i = 0; i < data.length; ++i) { Number n = null; try { n = nf1.parse(data[i]); } catch (ParseException e) { errln("Failed: " + e.getMessage()); } if (!(n instanceof Long) || (n instanceof Integer)) { errln("Failed: Integer Number Format should parse string to Long/Integer"); } if (n.longValue() != expected[i]) { errln("Failed=> Source: " + data[i] + ";result : " + n.toString() + ";expected :" + Long.toString(expected[i])); } } } } //Test New serialized DecimalFormat(2.0) read old serialized forms of DecimalFormat(1.3.1.1) public void TestSerialization() throws IOException{ //#if defined(FOUNDATION10) || defined(J2SE13) //#else byte[][] contents = NumberFormatSerialTestData.getContent(); double data = 1234.56; String[] expected = { "1,234.56", "$1,234.56", "123,456%", "1.23456E3"}; for (int i = 0; i < 4; ++i) { ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(contents[i])); try { NumberFormat format = (NumberFormat) ois.readObject(); String result = format.format(data); if (result.equals(expected[i])) { logln("OK: Deserialized bogus NumberFormat(new version read old version)"); } else { errln("FAIL: the test data formats are not euqal"); } } catch (Exception e) { warnln("FAIL: " + e.getMessage()); } } //#endif } /* * Test case for JB#5509, strict parsing issue */ public void TestJB5509() { String[] data = { "1,2", "1.2", "1,2.5", "1,23.5", "1,234.5", "1,234", "1,234,567", "1,234,567.8", "1,234,5", "1,234,5.6", "1,234,56.7" }; boolean[] expected = { // false for expected parse failure false, true, false, false, true, true, true, true, false, false, false, false }; DecimalFormat df = new DecimalFormat("#,##0.###", new DecimalFormatSymbols(new ULocale("en_US"))); df.setParseStrict(true); for (int i = 0; i < data.length; i++) { try { df.parse(data[i]); if 
(!expected[i]) {
                    errln("Failed: ParseException must be thrown for string " + data[i]);
                }
            } catch (ParseException pe) {
                if (expected[i]) {
                    errln("Failed: ParseException must not be thrown for string " + data[i]);
                }
            }
        }
    }

    /*
     * Test case for ticket#5698 - parsing extremely large/small values
     */
    public void TestT5698() {
        final String[] data = {
                "12345679E66666666666666666",
                "-12345679E66666666666666666",
                ".1E2147483648", // exponent > max int
                ".1E2147483647", // exponent == max int
                ".1E-2147483648", // exponent == min int
                ".1E-2147483649", // exponent < min int
                "1.23E350", // value > max double
                "1.23E300", // value < max double
                "-1.23E350", // value < min double
                "-1.23E300", // value > min double
                "4.9E-324", // value = smallest non-zero double
                "1.0E-325", // 0 < value < smallest non-zero positive double
                "-1.0E-325", // 0 > value > largest non-zero negative double
        };
        final double[] expected = {
                Double.POSITIVE_INFINITY,
                Double.NEGATIVE_INFINITY,
                Double.POSITIVE_INFINITY,
                Double.POSITIVE_INFINITY,
                0.0,
                0.0,
                Double.POSITIVE_INFINITY,
                1.23e300d,
                Double.NEGATIVE_INFINITY,
                -1.23e300d,
                4.9e-324d,
                0.0,
                -0.0,
        };

        NumberFormat nfmt = NumberFormat.getInstance();

        for (int i = 0; i < data.length; i++) {
            try {
                Number n = nfmt.parse(data[i]);
                if (expected[i] != n.doubleValue()) {
                    errln("Failed: Parsed result for " + data[i] + ": "
                            + n.doubleValue() + " / expected: " + expected[i]);
                }
            } catch (ParseException pe) {
                errln("Failed: ParseException is thrown for " + data[i]);
            }
        }
    }

    void checkNBSPPatternRtNum(String testcase, NumberFormat nf, double myNumber) {
        String myString = nf.format(myNumber);

        double aNumber;
        try {
            aNumber = nf.parse(myString).doubleValue();
        } catch (ParseException e) {
            // TODO Auto-generated catch block
            errln("FAIL: " + testcase +" - failed to parse. " + e.toString());
            return;
        }
        if(Math.abs(aNumber-myNumber)>.001) {
            errln("FAIL: "+testcase+": formatted "+myNumber+", parsed into "+aNumber+"\n");
        } else {
            logln("PASS: "+testcase+": formatted "+myNumber+", parsed into "+aNumber+"\n");
        }
    }

    void checkNBSPPatternRT(String testcase, NumberFormat nf) {
        checkNBSPPatternRtNum(testcase, nf, 12345.);
        checkNBSPPatternRtNum(testcase, nf, -12345.);
    }

    public void TestNBSPInPattern() {
        NumberFormat nf = null;
        String testcase;

        testcase="ar_AE UNUM_CURRENCY";
        nf = NumberFormat.getCurrencyInstance(new ULocale("ar_AE"));
        checkNBSPPatternRT(testcase, nf);
        // if we don't have CLDR 1.6 data, bring out the problem anyways
        String SPECIAL_PATTERN = "\u00A4\u00A4'\u062f.\u0625.\u200f\u00a0'###0.00";
        testcase = "ar_AE special pattern: " + SPECIAL_PATTERN;
        nf = new DecimalFormat();
        ((DecimalFormat)nf).applyPattern(SPECIAL_PATTERN);
        checkNBSPPatternRT(testcase, nf);
    }
}

icu4j-4.2/src/com/ibm/icu/dev/test/format/DateTimeGeneratorTest.java

/*
 *******************************************************************************
 * Copyright (C) 2006-2010, Google, International Business Machines Corporation *
 * and others. All Rights Reserved.
* ******************************************************************************* */ package com.ibm.icu.dev.test.format; import java.text.ParsePosition; import java.util.Date; import java.util.Iterator; import java.util.List; import java.util.Random; import com.ibm.icu.dev.test.TestFmwk; import com.ibm.icu.impl.PatternTokenizer; import com.ibm.icu.impl.Utility; import com.ibm.icu.text.DateFormat; import com.ibm.icu.text.DateTimePatternGenerator; import com.ibm.icu.text.SimpleDateFormat; import com.ibm.icu.text.UTF16; import com.ibm.icu.text.UnicodeSet; import com.ibm.icu.text.DateTimePatternGenerator.VariableField; import com.ibm.icu.util.Calendar; import com.ibm.icu.util.GregorianCalendar; import com.ibm.icu.util.SimpleTimeZone; import com.ibm.icu.util.TimeZone; import com.ibm.icu.util.ULocale; public class DateTimeGeneratorTest extends TestFmwk { public static boolean GENERATE_TEST_DATA = System.getProperty("GENERATE_TEST_DATA") != null; public static int RANDOM_COUNT = 1000; public static boolean DEBUG = false; public static void main(String[] args) throws Exception { new DateTimeGeneratorTest().run(args); } public void TestSimple() { // some simple use cases ULocale locale = ULocale.GERMANY; TimeZone zone = TimeZone.getTimeZone("Europe/Paris"); // make from locale DateTimePatternGenerator gen = DateTimePatternGenerator.getInstance(locale); SimpleDateFormat format = new SimpleDateFormat(gen.getBestPattern("MMMddHmm"), locale); format.setTimeZone(zone); assertEquals("simple format: MMMddHmm", "14. Okt 8:58", format.format(sampleDate)); // (a generator can be built from scratch, but that is not a typical use case) // modify the generator by adding patterns DateTimePatternGenerator.PatternInfo returnInfo = new DateTimePatternGenerator.PatternInfo(); gen.addPattern("d'. von' MMMM", true, returnInfo); // the returnInfo is mostly useful for debugging problem cases format.applyPattern(gen.getBestPattern("MMMMddHmm")); assertEquals("modified format: MMMddHmm", "14. 
von Oktober 8:58", format.format(sampleDate)); // get a pattern and modify it format = (SimpleDateFormat)DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, locale); format.setTimeZone(zone); String pattern = format.toPattern(); assertEquals("full-date", "Donnerstag, 14. Oktober 1999 08:58:59 Mitteleurop\u00E4ische Sommerzeit", format.format(sampleDate)); // modify it to change the zone. String newPattern = gen.replaceFieldTypes(pattern, "vvvv"); format.applyPattern(newPattern); assertEquals("full-date: modified zone", "Donnerstag, 14. Oktober 1999 08:58:59 Frankreich", format.format(sampleDate)); // add test of basic cases //lang YYYYMMM MMMd MMMdhmm hmm hhmm Full Date-Time // en Mar 2007 Mar 4 6:05 PM Mar 4 6:05 PM 06:05 PM Sunday, March 4, 2007 6:05:05 PM PT DateTimePatternGenerator enGen = DateTimePatternGenerator.getInstance(ULocale.ENGLISH); TimeZone enZone = TimeZone.getTimeZone("Etc/GMT"); SimpleDateFormat enFormat = (SimpleDateFormat)DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, ULocale.ENGLISH); enFormat.setTimeZone(enZone); String[][] tests = { {"yyyyMMMdd", "Oct 14, 1999"}, {"yyyyqqqq", "4th quarter 1999"}, {"yMMMdd", "Oct 14, 1999"}, {"EyyyyMMMdd", "Thu, Oct 14, 1999"}, {"yyyyMMdd", "10/14/1999"}, {"yyyyMMM", "Oct 1999"}, {"yyyyMM", "10/1999"}, {"yyMM", "10/99"}, {"yMMMMMd", "O 14, 1999"}, // narrow format {"EEEEEMMMMMd", "T, O 14"}, // narrow format {"MMMd", "Oct 14"}, {"MMMdhmm", "Oct 14 6:58 AM"}, {"EMMMdhmms", "Thu, Oct 14 6:58:59 AM"}, {"MMdhmm", "10/14 6:58 AM"}, {"EEEEMMMdhmms", "Thursday, Oct 14 6:58:59 AM"}, {"yyyyMMMddhhmmss", "Oct 14, 1999 06:58:59 AM"}, {"EyyyyMMMddhhmmss", "Thu, Oct 14, 1999 06:58:59 AM"}, {"hmm", "6:58 AM"}, {"hhmm", "06:58 AM"}, {"hhmmVVVV", "06:58 AM GMT+00:00"}, }; for (int i = 0; i < tests.length; ++i) { final String testSkeleton = tests[i][0]; String pat = enGen.getBestPattern(testSkeleton); enFormat.applyPattern(pat); String formattedDate = enFormat.format(sampleDate); 
assertEquals("Testing skeleton '" + testSkeleton + "' with " + sampleDate, tests[i][1], formattedDate); } } public void TestRoot() { DateTimePatternGenerator rootGen = DateTimePatternGenerator.getInstance(ULocale.ROOT); SimpleDateFormat rootFormat = new SimpleDateFormat(rootGen.getBestPattern("yMdHms"), ULocale.ROOT); rootFormat.setTimeZone(gmt); assertEquals("root format: yMdHms", "1999-10-14 6:58:59", rootFormat.format(sampleDate)); } public void TestEmpty() { // now nothing DateTimePatternGenerator nullGen = DateTimePatternGenerator.getEmptyInstance(); SimpleDateFormat format = new SimpleDateFormat(nullGen.getBestPattern("yMdHms"), ULocale.ROOT); TimeZone rootZone = TimeZone.getTimeZone("Etc/GMT"); format.setTimeZone(rootZone); } public void TestPatternParser() { StringBuffer buffer = new StringBuffer(); PatternTokenizer pp = new PatternTokenizer() .setIgnorableCharacters(new UnicodeSet("[-]")) .setSyntaxCharacters(new UnicodeSet("[a-zA-Z]")) .setEscapeCharacters(new UnicodeSet("[b#]")) .setUsingQuote(true); logln("Using Quote"); for (int i = 0; i < patternTestData.length; ++i) { String patternTest = (String) patternTestData[i]; CheckPattern(buffer, pp, patternTest); } String[] randomSet = {"abcdef", "$12!@#-", "'\\"}; for (int i = 0; i < RANDOM_COUNT; ++i) { String patternTest = getRandomString(randomSet, 0, 10); CheckPattern(buffer, pp, patternTest); } logln("Using Backslash"); pp.setUsingQuote(false).setUsingSlash(true); for (int i = 0; i < patternTestData.length; ++i) { String patternTest = (String) patternTestData[i]; CheckPattern(buffer, pp, patternTest); } for (int i = 0; i < RANDOM_COUNT; ++i) { String patternTest = getRandomString(randomSet, 0, 10); CheckPattern(buffer, pp, patternTest); } } Random random = new java.util.Random(-1); private String getRandomString(String[] randomList, int minLen, int maxLen) { StringBuffer result = new StringBuffer(); int len = random.nextInt(maxLen + 1 - minLen) + minLen; for (int i = minLen; i < len; ++ i) { String 
source = randomList[random.nextInt(randomList.length)]; // don't bother with surrogates char ch = source.charAt(random.nextInt(source.length())); UTF16.append(result, ch); } return result.toString(); } private void CheckPattern(StringBuffer buffer, PatternTokenizer pp, String patternTest) { pp.setPattern(patternTest); if (DEBUG && isVerbose()) { showItems(buffer, pp, patternTest); } String normalized = pp.setStart(0).normalize(); logln("input:\t<" + patternTest + ">" + "\tnormalized:\t<" + normalized + ">"); String doubleNormalized = pp.setPattern(normalized).normalize(); if (!normalized.equals(doubleNormalized)) { errln("Normalization not idempotent:\t" + patternTest + "\tnormalized: " + normalized + "\tnormalized2: " + doubleNormalized); // allow for debugging at the point of failure if (DEBUG) { pp.setPattern(patternTest); normalized = pp.setStart(0).normalize(); pp.setPattern(normalized); showItems(buffer, pp, normalized); doubleNormalized = pp.normalize(); } } } private void showItems(StringBuffer buffer, PatternTokenizer pp, String patternTest) { logln("input:\t<" + patternTest + ">"); while (true) { buffer.setLength(0); int status = pp.next(buffer); if (status == PatternTokenizer.DONE) break; String lit = ""; if (status != PatternTokenizer.SYNTAX ) { lit = "\t<" + pp.quoteLiteral(buffer) + ">"; } logln("\t" + statusName[status] + "\t<" + buffer + ">" + lit); } } static final String[] statusName = {"DONE", "SYNTAX", "LITERAL", "BROKEN_QUOTE", "BROKEN_ESCAPE", "UNKNOWN"}; public void TestBasic() { ULocale uLocale = null; DateTimePatternGenerator dtfg = null; Date date = null; for (int i = 0; i < dateTestData.length; ++i) { if (dateTestData[i] instanceof ULocale) { uLocale = (ULocale) dateTestData[i]; dtfg = DateTimePatternGenerator.getInstance(uLocale); if (GENERATE_TEST_DATA) logln("new ULocale(\"" + uLocale.toString() + "\"),"); } else if (dateTestData[i] instanceof Date) { date = (Date) dateTestData[i]; if (GENERATE_TEST_DATA) logln("new Date(" + 
date.getTime()+ "L),"); } else if (dateTestData[i] instanceof String) { String testSkeleton = (String) dateTestData[i]; String pattern = dtfg.getBestPattern(testSkeleton); SimpleDateFormat sdf = new SimpleDateFormat(pattern, uLocale); String formatted = sdf.format(date); if (GENERATE_TEST_DATA) logln("new String[] {\"" + testSkeleton + "\", \"" + Utility.escape(formatted) + "\"},"); //logln(uLocale + "\t" + testSkeleton + "\t" + pattern + "\t" + sdf.format(date)); } else { String[] testPair = (String[]) dateTestData[i]; String testSkeleton = testPair[0]; String testFormatted = testPair[1]; String pattern = dtfg.getBestPattern(testSkeleton); SimpleDateFormat sdf = new SimpleDateFormat(pattern, uLocale); String formatted = sdf.format(date); if (GENERATE_TEST_DATA) { logln("new String[] {\"" + testSkeleton + "\", \"" + Utility.escape(formatted) + "\"},"); } else if (!formatted.equals(testFormatted)) { errln(uLocale + "\tformatted string doesn't match test case: " + testSkeleton + "\t generated: " + pattern + "\t expected: " + testFormatted + "\t got: " + formatted); if (true) { // debug pattern = dtfg.getBestPattern(testSkeleton); sdf = new SimpleDateFormat(pattern, uLocale); formatted = sdf.format(date); } } //logln(uLocale + "\t" + testSkeleton + "\t" + pattern + "\t" + sdf.format(date)); } } } static final Object[] patternTestData = { "'$f''#c", "'' 'a", "'.''.'", "\\u0061\\\\", "mm.dd 'dd ' x", "'' ''", }; // can be generated by using GENERATE_TEST_DATA. 
Must be reviewed before adding static final Object[] dateTestData = { new Date(916300739000L), // 1999-01-13T23:58:59,0-0800 new ULocale("en_US"), new String[] {"yM", "1/1999"}, new String[] {"yMMM", "Jan 1999"}, new String[] {"yMd", "1/13/1999"}, new String[] {"yMMMd", "Jan 13, 1999"}, new String[] {"Md", "1/13"}, new String[] {"MMMd", "Jan 13"}, new String[] {"yQQQ", "Q1 1999"}, new String[] {"jjmm", "11:58 PM"}, new String[] {"hhmm", "11:58 PM"}, new String[] {"HHmm", "23:58"}, new String[] {"mmss", "58:59"}, new ULocale("zh_Hans_CN"), new String[] {"yM", "1999-1"}, new String[] {"yMMM", "1999-01"}, new String[] {"yMd", "1999\u5E741\u670813\u65E5"}, new String[] {"yMMMd", "1999\u5E7401\u670813\u65E5"}, new String[] {"Md", "1-13"}, new String[] {"MMMd", "01-13"}, new String[] {"yQQQ", "1999\u5E741\u5B63"}, new String[] {"hhmm", "\u4E0B\u534811:58"}, new String[] {"HHmm", "23:58"}, new String[] {"mmss", "58:59"}, new ULocale("de_DE"), new String[] {"yM", "1999-1"}, new String[] {"yMMM", "Jan 1999"}, new String[] {"yMd", "13.1.1999"}, new String[] {"yMMMd", "13. Jan 1999"}, new String[] {"Md", "13.1."}, // 13.1 new String[] {"MMMd", "13. Jan"}, new String[] {"yQQQ", "Q1 1999"}, new String[] {"jjmm", "23:58"}, new String[] {"hhmm", "11:58 nachm."}, new String[] {"HHmm", "23:58"}, new String[] {"mmss", "58:59"}, new ULocale("fi"), new String[] {"yM", "1/1999"}, // 1.1999 new String[] {"yMMM", "tammikuuta 1999"}, // tammi 1999 new String[] {"yMd", "13.1.1999"}, new String[] {"yMMMd", "13. tammikuuta 1999"}, new String[] {"Md", "13.1."}, new String[] {"MMMd", "13. tammikuuta"}, new String[] {"yQQQ", "1. nelj./1999"}, // 1. nelj. 
1999 new String[] {"jjmm", "23.58"}, new String[] {"hhmm", "11.58 ip."}, new String[] {"HHmm", "23.58"}, new String[] {"mmss", "58.59"}, }; public void DayMonthTest() { final ULocale locale = ULocale.FRANCE; // set up the generator DateTimePatternGenerator dtpgen = DateTimePatternGenerator.getInstance(locale); // get a pattern for an abbreviated month and day final String pattern = dtpgen.getBestPattern("MMMd"); SimpleDateFormat formatter = new SimpleDateFormat(pattern, locale); // use it to format (or parse) String formatted = formatter.format(new Date()); logln("formatted=" + formatted); // for French, the result is "13 sept." } public void TestOrdering() { ULocale[] locales = ULocale.getAvailableLocales(); for (int i = 0; i < locales.length; ++i) { for (int style1 = DateFormat.FULL; style1 <= DateFormat.SHORT; ++style1) { for (int style2 = DateFormat.FULL; style2 < style1; ++style2) { checkCompatible(style1, style2, locales[i]); } } } } public void TestReplacingZoneString() { Date testDate = new Date(); TimeZone testTimeZone = TimeZone.getTimeZone("America/New_York"); TimeZone bogusTimeZone = new SimpleTimeZone(1234, "Etc/Unknown"); Calendar calendar = Calendar.getInstance(); ParsePosition parsePosition = new ParsePosition(0); ULocale[] locales = ULocale.getAvailableLocales(); int count = 0; for (int i = 0; i < locales.length; ++i) { // skip the country locales unless we are doing exhaustive tests if (getInclusion() < 6) { if (locales[i].getCountry().length() > 0) { continue; } } count++; // Skipping some test case in the non-exhaustive mode to reduce the test time //ticket#6503 if(params.inclusion<=5 && count%3!=0){ continue; } logln(locales[i].toString()); DateTimePatternGenerator dtpgen = DateTimePatternGenerator.getInstance(locales[i]); for (int style1 = DateFormat.FULL; style1 <= DateFormat.SHORT; ++style1) { final SimpleDateFormat oldFormat = (SimpleDateFormat) DateFormat.getTimeInstance(style1, locales[i]); String pattern = oldFormat.toPattern(); String 
newPattern = dtpgen.replaceFieldTypes(pattern, "VVVV"); // replaceZoneString(pattern, "VVVV"); if (newPattern.equals(pattern)) { continue; } // verify that it roundtrips parsing SimpleDateFormat newFormat = new SimpleDateFormat(newPattern, locales[i]); newFormat.setTimeZone(testTimeZone); String formatted = newFormat.format(testDate); calendar.setTimeZone(bogusTimeZone); parsePosition.setIndex(0); newFormat.parse(formatted, calendar, parsePosition); if (parsePosition.getErrorIndex() >= 0) { errln("Failed parse with VVVV:\t" + locales[i] + ",\t\"" + pattern + "\",\t\"" + newPattern + "\",\t\"" + formatted.substring(0,parsePosition.getErrorIndex()) + "{}" + formatted.substring(parsePosition.getErrorIndex()) + "\""); } else if (!calendar.getTimeZone().getID().equals(testTimeZone.getID())) { errln("Failed timezone roundtrip with VVVV:\t" + locales[i] + ",\t\"" + pattern + "\",\t\"" + newPattern + "\",\t\"" + formatted + "\",\t" + calendar.getTimeZone().getID() + " != " + testTimeZone.getID()); } else { logln(locales[i] + ":\t\"" + pattern + "\" => \t\"" + newPattern + "\"\t" + formatted); } } } } public void TestVariableCharacters() { UnicodeSet valid = new UnicodeSet("[G y Y u Q q M L w W d D F g E e c a h H K k m s S A z Z v V]"); for (char c = 0; c < 0xFF; ++c) { boolean works = false; try { VariableField vf = new VariableField(String.valueOf(c), true); logln("VariableField " + vf.toString()); works = true; } catch (Exception e) {} if (works != valid.contains(c)) { if (works) { errln("VariableField can be created with illegal character: " + c); } else { errln("VariableField can't be created with legal character: " + c); } } } } static String[] DATE_STYLE_NAMES = { "FULL", "LONG", "MEDIUM", "SHORT" }; /** * @param fullOrder * @param longOrder */ private void checkCompatible(int style1, int style2, ULocale uLocale) { DateOrder order1 = getOrdering(style1, uLocale); DateOrder order2 = getOrdering(style2, uLocale); if (!order1.hasSameOrderAs(order2)) { if 
(order1.monthLength == order2.monthLength) { // error if have same month length, different ordering
                if (skipIfBeforeICU(4,3,0)) {
                    logln(showOrderComparison(uLocale, style1, style2, order1, order2));
                } else {
                    errln(showOrderComparison(uLocale, style1, style2, order1, order2));
                }
            } else if (isVerbose() && order1.monthLength > 2 && order2.monthLength > 2) { // warn if both are not numeric
                logln(showOrderComparison(uLocale, style1, style2, order1, order2));
            }
        }
    }

    private String showOrderComparison(ULocale uLocale, int style1, int style2, DateOrder order1, DateOrder order2) {
        String pattern1 = ((SimpleDateFormat) DateFormat.getDateInstance(style1, uLocale)).toPattern();
        String pattern2 = ((SimpleDateFormat) DateFormat.getDateInstance(style2, uLocale)).toPattern();
        return "Mismatch in ordering for " + uLocale + ": " + DATE_STYLE_NAMES[style1] + ": " + order1 + ", <" + pattern1 + ">; "
                + DATE_STYLE_NAMES[style2] + ": " + order2 + ", <" + pattern2 + ">; " ;
    }

    /**
     * Main date fields -- Poor-man's enum -- change to real enum when we get JDK 1.5
     */
    public static class DateFieldType {
        private String name;
        private DateFieldType(String string) {
            name = string;
        }
        public static DateFieldType
        YEAR = new DateFieldType("YEAR"),
        MONTH = new DateFieldType("MONTH"),
        DAY = new DateFieldType("DAY");
        public String toString() {
            return name;
        }
    }

    /**
     * Simple struct for output from getOrdering
     */
    static class DateOrder {
        int monthLength;
        DateFieldType[] fields = new DateFieldType[3];

        public boolean isCompatible(DateOrder other) {
            return monthLength == other.monthLength;
        }
        /**
         * @param other the ordering to compare against
         * @return true if both orderings list their date fields in the same sequence
         */
        public boolean hasSameOrderAs(DateOrder other) {
            return fields[0] == other.fields[0] && fields[1] == other.fields[1] && fields[2] == other.fields[2];
        }
        public String toString() {
            return "{" + monthLength + ", " + fields[0] + ", " + fields[1] + ", " + fields[2] + "}";
        }
        public boolean equals(Object that) {
            DateOrder other = (DateOrder) that;
            return
monthLength == other.monthLength && fields[0] == other.fields[0] && fields[1] == other.fields[1] && fields[2] == other.fields[2];
        }
    }

    DateTimePatternGenerator.FormatParser formatParser = new DateTimePatternGenerator.FormatParser ();
    DateTimePatternGenerator generator = DateTimePatternGenerator.getEmptyInstance();

    private Calendar sampleCalendar = new GregorianCalendar(1999, Calendar.OCTOBER, 13, 23, 58, 59);
    private Date sampleDate = sampleCalendar.getTime();
    private TimeZone gmt = TimeZone.getTimeZone("Etc/GMT");

    /**
     * Replace the zone string with a different type, e.g. v's for z's, etc.

     * Called with a pattern, such as one obtained from:
     *     String pattern = ((SimpleDateFormat) DateFormat.getTimeInstance(style, locale)).toPattern();
     *
     * @param pattern original pattern to change, such as "HH:mm zzzz"
     * @param newZone Must be: z, zzzz, Z, ZZZZ, v, vvvv, V, or VVVV
     * @return the pattern with each zone field replaced by newZone, or the original pattern if no field needed changing
     */
    public String replaceZoneString(String pattern, String newZone) {
        final List itemList = formatParser.set(pattern).getItems();
        boolean changed = false;
        for (int i = 0; i < itemList.size(); ++i) {
            Object item = itemList.get(i);
            if (item instanceof VariableField) {
                VariableField variableField = (VariableField) item;
                if (variableField.getType() == DateTimePatternGenerator.ZONE) {
                    if (!variableField.toString().equals(newZone)) {
                        changed = true;
                        itemList.set(i, new VariableField(newZone, true));
                    }
                }
            }
        }
        return changed ? formatParser.toString() : pattern;
    }

    public boolean containsZone(String pattern) {
        for (Iterator it = formatParser.set(pattern).getItems().iterator(); it.hasNext();) {
            Object item = it.next();
            if (item instanceof VariableField) {
                VariableField variableField = (VariableField) item;
                if (variableField.getType() == DateTimePatternGenerator.ZONE) {
                    return true;
                }
            }
        }
        return false;
    }

    /**
     * Get the ordering from a particular date format. Best is to use
     * DateFormat.FULL to get the format with String form month (like "January")
     * and DateFormat.SHORT for the numeric format order. They may be different.
     * (Theoretically all 4 formats could be different but that never happens in
     * practice.)
     *
     * @param style
     *            DateFormat.FULL..DateFormat.SHORT
     * @param locale
     *            desired locale.
     * @return list of ordered items DateFieldType (I
     *         didn't know what form you really wanted so this is just a
     *         stand-in.)
     */
    private DateOrder getOrdering(int style, ULocale locale) {
        // and the date pattern
        String pattern = ((SimpleDateFormat) DateFormat.getDateInstance(style, locale)).toPattern();
        int count = 0;
        DateOrder result = new DateOrder();

        for (Iterator it = formatParser.set(pattern).getItems().iterator(); it.hasNext();) {
            Object item = it.next();
            if (!(item instanceof String)) {
                // the first character of the variable field determines the type,
                // according to CLDR.
                String variableField = item.toString();
                switch (variableField.charAt(0)) {
                    case 'y': case 'Y': case 'u':
                        result.fields[count++] = DateFieldType.YEAR;
                        break;
                    case 'M': case 'L':
                        result.monthLength = variableField.length();
                        if (result.monthLength < 2) {
                            result.monthLength = 2;
                        }
                        result.fields[count++] = DateFieldType.MONTH;
                        break;
                    case 'd': case 'D': case 'F': case 'g':
                        result.fields[count++] = DateFieldType.DAY;
                        break;
                }
            }
        }
        return result;
    }

    /*
     * Test case for DateTimePatternGenerator threading problem #7169
     */
    public void TestT7169() {
        Thread[] workers = new Thread[10];
        for (int i = 0 ; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int j = 0; j < 50; j++) {
                            DateTimePatternGenerator patternGenerator =
                                DateTimePatternGenerator.getFrozenInstance(ULocale.US);
                            patternGenerator.getBestPattern("MMMMd");
                        }
                    } catch (Exception e) {
                        errln("FAIL: Caught an exception (frozen)" + e);
                    }
                    try {
                        for (int j = 0; j < 50; j++) {
                            DateTimePatternGenerator patternGenerator =
                                DateTimePatternGenerator.getInstance(ULocale.US);
                            patternGenerator.getBestPattern("MMMMd");
                        }
                    } catch (Exception e) {
                        errln("FAIL: Caught an exception " + e);
                    }
                }
            });
        }
        for (int i = 0; i < workers.length; i++) {
            workers[i].start();
        }
        for (int i = 0; i < workers.length; i++) {
            try {
                workers[i].join();
            } catch (InterruptedException ie) {
            }
        }
    }
}
//eof

icu4j-4.2/src/com/ibm/icu/dev/test/format/WriteNumberFormatSerialTestData.java
/*
******************************************************************************* * Copyright (C) 2001-2004, International Business Machines Corporation and * * others. All Rights Reserved. * ******************************************************************************* */ package com.ibm.icu.dev.test.format; import com.ibm.icu.text.*; import java.util.Locale; import java.io.*; /** * @version 1.0 * @author Ram Viswanadha */ public class WriteNumberFormatSerialTestData { static final String header="/*\n" + " *******************************************************************************\n"+ " * Copyright (C) 2001, International Business Machines Corporation and *\n"+ " * others. All Rights Reserved. *\n"+ " *******************************************************************************\n"+ " */\n\n"+ "package com.ibm.icu.dev.test.format;\n\n"+ "public class NumberFormatSerialTestData {\n"+ " //get Content\n"+ " public static byte[][] getContent() {\n"+ " return content;\n"+ " }\n"; static final String footer ="\n final static byte[][] content = {generalInstance, currencyInstance, percentInstance, scientificInstance};\n"+ "}\n"; public static void main(String[] args){ NumberFormat nf = NumberFormat.getInstance(Locale.US); NumberFormat nfc = NumberFormat.getCurrencyInstance(Locale.US); NumberFormat nfp = NumberFormat.getPercentInstance(Locale.US); NumberFormat nfsp = NumberFormat.getScientificInstance(Locale.US); try{ FileOutputStream file = new FileOutputStream("NumberFormatSerialTestData.java"); file.write(header.getBytes()); write(file,(Object)nf,"generalInstance", "//NumberFormat.getInstance(Locale.US)"); write(file,(Object)nfc,"currencyInstance","//NumberFormat.getCurrencyInstance(Locale.US)"); write(file,(Object)nfp,"percentInstance","//NumberFormat.getPercentInstance(Locale.US)"); write(file,(Object)nfsp,"scientificInstance","//NumberFormat.getScientificInstance(Locale.US)"); file.write(footer.getBytes()); file.close(); }catch( Exception e){ 
System.out.println(e.getMessage()); e.printStackTrace(); } } private static void write(FileOutputStream file,Object o ,String name,String comment){ try{ ByteArrayOutputStream bts = new ByteArrayOutputStream(); ObjectOutputStream os = new ObjectOutputStream(bts); os.writeObject((Object)o); os.flush(); os.close(); byte[] myArr = bts.toByteArray(); //String temp = new String(myArr); System.out.println(" "+comment+ " :"); /*System.out.println("minimumIntegerDigits : " + (temp.indexOf("minimumIntegerDigits")+"minimumIntegerDigits".length())); System.out.println("maximumIntegerDigits : " + (temp.indexOf("maximumIntegerDigits")+"maximumIntegerDigits".length())); System.out.println("minimumFractionDigits : " + (temp.indexOf("minimumFractionDigits")+"minimumFractionDigits".length())); System.out.println("maximumFractionDigits : " + (temp.indexOf("maximumFractionDigits")+"maximumFractionDigits".length())); */ //file.write(myArr); file.write(("\n "+comment).getBytes()); file.write(new String("\n static byte[] "+name+" = new byte[]{ \n").getBytes("UTF-8")); file.write( " ".getBytes()); for(int i=0; i 5) locCount = 5; logln("Quick mode: only testing first 5 Locales"); } TimeZone tz = TimeZone.getDefault(); logln("Default TimeZone: " + tz.getID()); if (INFINITE) { // Special infinite loop test mode for finding hard to reproduce errors Locale loc = Locale.getDefault(); logln("ENTERING INFINITE TEST LOOP FOR Locale: " + loc.getDisplayName()); for (;;) { _test(loc); } } else { _test(Locale.getDefault()); for (int i = 0; i < locCount; ++i) { _test(avail[i]); } } } public String styleName(int s) { switch (s) { case DateFormat.SHORT : return "SHORT"; case DateFormat.MEDIUM : return "MEDIUM"; case DateFormat.LONG : return "LONG"; case DateFormat.FULL : return "FULL"; default : return "Unknown"; } } public void _test(Locale loc) { if (!INFINITE) { logln("Locale: " + loc.getDisplayName()); } // Total possibilities = 24 // 4 date // 4 time // 16 date-time boolean[] TEST_TABLE = new 
boolean[24]; int i = 0; for (i = 0; i < 24; ++i) TEST_TABLE[i] = true; // If we have some sparseness, implement it here. Sparseness decreases // test time by eliminating some tests, up to 23. for (i = 0; i < SPARSENESS; i++) { int random = (int) (ran.nextDouble() * 24); if (random >= 0 && random < 24 && TEST_TABLE[i]) { TEST_TABLE[random] = false; } } int itable = 0; int style = 0; for (style = DateFormat.FULL; style <= DateFormat.SHORT; ++style) { if (TEST_TABLE[itable++]) { logln("Testing style " + styleName(style)); DateFormat df = DateFormat.getDateInstance(style, loc); _test(df, false); } } for (style = DateFormat.FULL; style <= DateFormat.SHORT; ++style) { if (TEST_TABLE[itable++]) { logln("Testing style " + styleName(style)); DateFormat df = DateFormat.getTimeInstance(style, loc); _test(df, true); } } for (int dstyle = DateFormat.FULL; dstyle <= DateFormat.SHORT; ++dstyle) { for (int tstyle = DateFormat.FULL; tstyle <= DateFormat.SHORT; ++tstyle) { if (TEST_TABLE[itable++]) { logln("Testing dstyle " + styleName(dstyle) + ", tstyle " + styleName(tstyle)); DateFormat df = DateFormat.getDateTimeInstance(dstyle, tstyle, loc); _test(df, false); } } } } public void _test(DateFormat fmt, boolean timeOnly) { if (!(fmt instanceof SimpleDateFormat)) { errln("DateFormat wasn't a SimpleDateFormat"); return; } String pat = ((SimpleDateFormat) fmt).toPattern(); logln(pat); // NOTE TO MAINTAINER // This indexOf check into the pattern needs to be refined to ignore // quoted characters. Currently, this isn't a problem with the locale // patterns we have, but it may be a problem later. boolean hasEra = (pat.indexOf("G") != -1); boolean hasZoneDisplayName = (pat.indexOf("z") != -1) || (pat.indexOf("v") != -1) || (pat.indexOf("V") != -1); // Because patterns contain incomplete data representing the Date, // we must be careful of how we do the roundtrip. We start with // a randomly generated Date because they're easier to generate. // From this we get a string. 
The string is our real starting point, // because this string should parse the same way all the time. Note // that it will not necessarily parse back to the original date because // of incompleteness in patterns. For example, a time-only pattern won't // parse back to the same date. try { for (int i = 0; i < TRIALS; ++i) { Date[] d = new Date[DEPTH]; String[] s = new String[DEPTH]; d[0] = generateDate(); // We go through this loop until we achieve a match or until // the maximum loop count is reached. We record the points at // which the date and the string starts to match. Once matching // starts, it should continue. int loop; int dmatch = 0; // d[dmatch].getTime() == d[dmatch-1].getTime() int smatch = 0; // s[smatch].equals(s[smatch-1]) for (loop = 0; loop < DEPTH; ++loop) { if (loop > 0) { d[loop] = fmt.parse(s[loop - 1]); } s[loop] = fmt.format(d[loop]); if (loop > 0) { if (smatch == 0) { boolean match = s[loop].equals(s[loop - 1]); if (smatch == 0) { if (match) smatch = loop; } else if (!match) errln("FAIL: String mismatch after match"); } if (dmatch == 0) { // {sfb} watch out here, this might not work boolean match = d[loop].getTime() == d[loop - 1].getTime(); if (dmatch == 0) { if (match) dmatch = loop; } else if (!match) errln("FAIL: Date mismatch after match"); } if (smatch != 0 && dmatch != 0) break; } } // At this point loop == DEPTH if we've failed, otherwise loop is the // max(smatch, dmatch), that is, the index at which we have string and // date matching. // Date usually matches in 2. Exceptions handled below. 
                int maxDmatch = 2;
                int maxSmatch = 1;
                if (dmatch > maxDmatch || smatch > maxSmatch) {
                    //If the Date is BC
                    if (!timeOnly && !hasEra && getField(d[0], Calendar.ERA) == GregorianCalendar.BC) {
                        maxDmatch = 3;
                        maxSmatch = 2;
                    }
                    if (hasZoneDisplayName && (fmt.getTimeZone().inDaylightTime(d[0]) || fmt.getTimeZone().inDaylightTime(d[1]) || d[0].getTime() < 0L /* before 1970 */)) {
                        maxSmatch = 2;
                        if (timeOnly) {
                            maxDmatch = 3;
                        }
                    }
                }

                if (dmatch > maxDmatch || smatch > maxSmatch) {
                    SimpleDateFormat sdf = new SimpleDateFormat("EEEE, MMMM d, yyyy HH:mm:ss, z G", Locale.US);
                    logln("Date = " + sdf.format(d[0]) + "; ms = " + d[0].getTime());
                    logln("Dmatch: " + dmatch + " maxD: " + maxDmatch + " Smatch:" + smatch + " maxS:" + maxSmatch);
                    for (int j = 0; j <= loop && j < DEPTH; ++j) {
                        StringBuffer temp = new StringBuffer("");
                        FieldPosition pos = new FieldPosition(0);
                        logln((j > 0 ? " P> " : " ") + dateFormat.format(d[j], temp, pos) + " F> " + s[j]
                                + (j > 0 && d[j].getTime() == d[j - 1].getTime() ? " d==" : "")
                                + (j > 0 && s[j].equals(s[j - 1]) ? " s==" : ""));
                    }
                    errln("Pattern: " + pat + " failed to match" + "; ms = " + d[0].getTime());
                }
            }
        } catch (ParseException e) {
            errln("Exception: " + e.getMessage());
            logln(e.toString());
        }
    }

    public int getField(Date d, int f) {
        getFieldCal.setTime(d);
        int ret = getFieldCal.get(f);
        return ret;
    }

    public Date generateDate() {
        double a = ran.nextDouble();
        // Now 'a' ranges from 0..1; scale it to range from 0 to 8000 years
        a *= 8000;
        // Shift to range from (4000-1970) BC to (4000+1970) AD
        a -= 4000;
        // Now scale up to ms
        a *= 365.25 * 24 * 60 * 60 * 1000;
        return new Date((long)a);
    }
}

icu4j-4.2/src/com/ibm/icu/dev/test/format/TestMessageFormat.java
//##header
/*
 **********************************************************************
 * Copyright (c) 2004-2009, International Business Machines
 * Corporation and others. All Rights Reserved.
********************************************************************** * Author: Alan Liu * Created: April 6, 2004 * Since: ICU 3.0 ********************************************************************** */ package com.ibm.icu.dev.test.format; import java.text.AttributedCharacterIterator; import java.text.AttributedString; import java.text.ChoiceFormat; import java.text.FieldPosition; import java.text.Format; import java.text.ParseException; import java.text.ParsePosition; import java.util.Date; import java.util.Iterator; import java.util.Locale; import java.util.HashMap; import java.util.Map; import java.util.Set; import com.ibm.icu.text.DateFormat; import com.ibm.icu.text.DecimalFormat; import com.ibm.icu.text.DecimalFormatSymbols; import com.ibm.icu.text.MessageFormat; import com.ibm.icu.text.NumberFormat; import com.ibm.icu.text.SimpleDateFormat; import com.ibm.icu.text.UFormat; import com.ibm.icu.util.TimeZone; import com.ibm.icu.util.ULocale; public class TestMessageFormat extends com.ibm.icu.dev.test.TestFmwk { public static void main(String[] args) throws Exception { new TestMessageFormat().run(args); } public void TestBug3() { double myNumber = -123456; DecimalFormat form = null; Locale locale[] = { new Locale("ar", "", ""), new Locale("be", "", ""), new Locale("bg", "", ""), new Locale("ca", "", ""), new Locale("cs", "", ""), new Locale("da", "", ""), new Locale("de", "", ""), new Locale("de", "AT", ""), new Locale("de", "CH", ""), new Locale("el", "", ""), // 10 new Locale("en", "CA", ""), new Locale("en", "GB", ""), new Locale("en", "IE", ""), new Locale("en", "US", ""), new Locale("es", "", ""), new Locale("et", "", ""), new Locale("fi", "", ""), new Locale("fr", "", ""), new Locale("fr", "BE", ""), new Locale("fr", "CA", ""), // 20 new Locale("fr", "CH", ""), new Locale("he", "", ""), new Locale("hr", "", ""), new Locale("hu", "", ""), new Locale("is", "", ""), new Locale("it", "", ""), new Locale("it", "CH", ""), new Locale("ja", "", ""), new 
Locale("ko", "", ""), new Locale("lt", "", ""), // 30 new Locale("lv", "", ""), new Locale("mk", "", ""), new Locale("nl", "", ""), new Locale("nl", "BE", ""), new Locale("no", "", ""), new Locale("pl", "", ""), new Locale("pt", "", ""), new Locale("ro", "", ""), new Locale("ru", "", ""), new Locale("sh", "", ""), // 40 new Locale("sk", "", ""), new Locale("sl", "", ""), new Locale("sq", "", ""), new Locale("sr", "", ""), new Locale("sv", "", ""), new Locale("tr", "", ""), new Locale("uk", "", ""), new Locale("zh", "", ""), new Locale("zh", "TW", "") // 49 }; StringBuffer buffer = new StringBuffer(); ParsePosition parsePos = new ParsePosition(0); int i; for (i= 0; i < 49; i++) { // form = (DecimalFormat)NumberFormat.getCurrencyInstance(locale[i]); form = (DecimalFormat)NumberFormat.getInstance(locale[i]); if (form == null) { errln("Number format creation failed for " + locale[i].getDisplayName()); continue; } FieldPosition pos = new FieldPosition(0); buffer.setLength(0); form.format(myNumber, buffer, pos); parsePos.setIndex(0); Object result = form.parse(buffer.toString(), parsePos); logln(locale[i].getDisplayName() + " -> " + result); if (parsePos.getIndex() != buffer.length()) { errln("Number format parse failed."); } } } public void TestBug1() { final double limit[] = {0.0, 1.0, 2.0}; final String formats[] = {"0.0<=Arg<1.0", "1.0<=Arg<2.0", "2.0<-Arg"}; ChoiceFormat cf = new ChoiceFormat(limit, formats); assertEquals("ChoiceFormat.format", formats[1], cf.format(1)); } public void TestBug2() { // {sfb} use double format in pattern, so result will match (not strictly necessary) final String pattern = "There {0,choice,0.0#are no files|1.0#is one file|1.0 " + result); if (i != 3) { // TODO: fix this, for now skip ordinal parsing (format string at index 3) try { Object[] parsedArgs = fmt.parse(result); if (parsedArgs.length != 1) { errln("parse returned " + parsedArgs.length + " args"); } else if (!parsedArgs[0].equals(num)) { errln("parsed argument " + 
parsedArgs[0] + " != " + num); } } catch (Exception e) { errln("parse of '" + result + " returned exception: " + e.getMessage()); } } } } } public void TestSetGetFormats() { Object arguments[] = { new Double(456.83), new Date(871068000000L), "deposit" }; StringBuffer result = new StringBuffer(); String formatStr = "At