opencc-0.4.3/LICENSE 000640 567316 013202 00000021715 12145345503 015317 0 ustar 00carbokuo nonconf 000000 000000 Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
1. You must give any other recipients of the Work or Derivative Works a copy of this License; and
2. You must cause any modified files to carry prominent notices stating that You changed the files; and
3. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
4. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
opencc-0.4.3/test/testcases/zht2zhs.in 000640 567316 013202 00000000502 12145345503 021222 0 ustar 00carbokuo nonconf 000000 000000 曾經有一份真誠的愛情放在我面前,我沒有珍惜,等我失去的時候我才後悔莫及。人事間最痛苦的事莫過於此。如果上天能夠給我一個再來一次得機會,我會對那個女孩子說三個字,我愛你。如果非要在這份愛上加個期限,我希望是,一萬年。
opencc-0.4.3/test/testcases/zht2zhs.ans 000640 567316 013202 00000000502 12145345503 021375 0 ustar 00carbokuo nonconf 000000 000000 曾经有一份真诚的爱情放在我面前,我没有珍惜,等我失去的时候我才后悔莫及。人事间最痛苦的事莫过于此。如果上天能够给我一个再来一次得机会,我会对那个女孩子说三个字,我爱你。如果非要在这份爱上加个期限,我希望是,一万年。
opencc-0.4.3/test/testcases/zhs2zhtw_p.ans 000640 567316 013202 00000000313 12145345503 022103 0 ustar 00carbokuo nonconf 000000 000000 滑鼠裏面的矽二極體壞了,導致游標解析度降低。
我們在寮國的伺服器的硬碟需要使用網際網路演算法軟體解決非同步的問題。
爲什麼你在牀裏面睡着?
opencc-0.4.3/test/testcases/zhtw2zhcn_s.ans 000640 567316 013202 00000000302 12145345503 022242 0 ustar 00carbokuo nonconf 000000 000000 鼠标里面的硅二极管坏了,导致光标分辨率降低。
我们在老挝的服务器的硬盘需要使用互联网算法软件解决异步的问题。
为什么你在床里面睡着?
opencc-0.4.3/test/testcases/zhtw2zhcn_t.in 000640 567316 013202 00000000313 12145345503 022072 0 ustar 00carbokuo nonconf 000000 000000 滑鼠裡面的矽二極體壞了,導致游標解析度降低。
我們在寮國的伺服器的硬碟需要使用網際網路演算法軟體解決非同步的問題。
為什麼你在床裡面睡著?
opencc-0.4.3/test/testcases/mix2zhs.ans 000640 567316 013202 00000000123 12145345503 021364 0 ustar 00carbokuo nonconf 000000 000000 为什么简繁混杂是一个难题?
马拉松是一种有益身心的活动。
opencc-0.4.3/test/testcases/zhs2zhtw_vp.ans 000640 567316 013202 00000000313 12145345503 022271 0 ustar 00carbokuo nonconf 000000 000000 滑鼠裡面的矽二極體壞了,導致游標解析度降低。
我們在寮國的伺服器的硬碟需要使用網際網路演算法軟體解決非同步的問題。
為什麼你在床裡面睡著?
opencc-0.4.3/test/testcases/mix2zht.in 000640 567316 013202 00000000123 12145345503 021212 0 ustar 00carbokuo nonconf 000000 000000 爲什么簡繁混杂是一個難題?
馬拉松是一种有益身心的活动。
opencc-0.4.3/test/testcases/zhs2zhtw_p.in 000640 567316 013202 00000000302 12145345503 021726 0 ustar 00carbokuo nonconf 000000 000000 鼠标里面的硅二极管坏了,导致光标分辨率降低。
我们在老挝的服务器的硬盘需要使用互联网算法软件解决异步的问题。
为什么你在床里面睡着?
opencc-0.4.3/test/testcases/zhs2zhtw_vp.in 000640 567316 013202 00000000302 12145345503 022114 0 ustar 00carbokuo nonconf 000000 000000 鼠标里面的硅二极管坏了,导致光标分辨率降低。
我们在老挝的服务器的硬盘需要使用互联网算法软件解决异步的问题。
为什么你在床里面睡着?
opencc-0.4.3/test/testcases/zhs2zht.in 000640 567316 013202 00000001200 12145345503 021216 0 ustar 00carbokuo nonconf 000000 000000 夸夸其谈 夸父逐日
我干什么不干你事。
太后的头发很干燥。
燕燕于飞,差池其羽。之子于归,远送于野。
请成相,世之殃,愚暗愚暗堕贤良。人主无贤,如瞽无相何伥伥!请布基,慎圣人,愚而自专事不治。主忌苟胜,群臣莫谏必逢灾。
曾经有一份真诚的爱情放在我面前,我没有珍惜,等我失去的时候我才后悔莫及。人事间最痛苦的事莫过于此。如果上天能够给我一个再来一次得机会,我会对那个女孩子说三个字,我爱你。如果非要在这份爱上加个期限,我希望是,一万年。
opencc-0.4.3/test/testcases/zhtw2zhcn_s.in 000640 567316 013202 00000000313 12145345503 022071 0 ustar 00carbokuo nonconf 000000 000000 滑鼠裡面的矽二極體壞了,導致游標解析度降低。
我們在寮國的伺服器的硬碟需要使用網際網路演算法軟體解決非同步的問題。
為什麼你在床裡面睡著?
opencc-0.4.3/test/testcases/mix2zhs.in 000640 567316 013202 00000000123 12145345503 021211 0 ustar 00carbokuo nonconf 000000 000000 爲什么簡繁混杂是一個難題?
馬拉松是一种有益身心的活动。
opencc-0.4.3/test/testcases/zhs2zht.ans 000640 567316 013202 00000001200 12145345503 021371 0 ustar 00carbokuo nonconf 000000 000000 誇誇其談 夸父逐日
我幹什麼不干你事。
太后的頭髮很乾燥。
燕燕于飛,差池其羽。之子于歸,遠送於野。
請成相,世之殃,愚闇愚闇墮賢良。人主無賢,如瞽無相何倀倀!請布基,慎聖人,愚而自專事不治。主忌苟勝,羣臣莫諫必逢災。
曾經有一份真誠的愛情放在我面前,我沒有珍惜,等我失去的時候我才後悔莫及。人事間最痛苦的事莫過於此。如果上天能夠給我一個再來一次得機會,我會對那個女孩子說三個字,我愛你。如果非要在這份愛上加個期限,我希望是,一萬年。
opencc-0.4.3/test/testcases/mix2zht.ans 000640 567316 013202 00000000123 12145345503 021365 0 ustar 00carbokuo nonconf 000000 000000 爲什麼簡繁混雜是一個難題?
馬拉松是一種有益身心的活動。
opencc-0.4.3/test/testcases/zhtw2zhcn_t.ans 000640 567316 013202 00000000302 12145345503 022243 0 ustar 00carbokuo nonconf 000000 000000 鼠標裏面的硅二極管壞了,導致光標分辨率降低。
我們在老撾的服務器的硬盤需要使用互聯網算法軟件解決異步的問題。
爲什麼你在牀裏面睡着?
opencc-0.4.3/test/CMakeLists.txt 000640 567316 013202 00000001236 12145345503 020025 0 ustar 00carbokuo nonconf 000000 000000 set(CONFIGURATIONS
zhs2zht
zht2zhs
mix2zht
mix2zhs
zhs2zhtw_p
zhs2zhtw_vp
zhtw2zhcn_t
zhtw2zhcn_s
)
foreach(CONFIG ${CONFIGURATIONS})
add_test(
${CONFIG}_convert
${CMAKE_COMMAND} -E chdir ${PROJECT_BINARY_DIR}/data
${PROJECT_BINARY_DIR}/src/tools/opencc
-i ${CMAKE_SOURCE_DIR}/test/testcases/${CONFIG}.in
-o ${PROJECT_BINARY_DIR}/test/${CONFIG}.out
-c ${CMAKE_SOURCE_DIR}/data/config/${CONFIG}.ini
)
add_test(
${CONFIG}_compare
diff
${PROJECT_BINARY_DIR}/test/${CONFIG}.out
${CMAKE_SOURCE_DIR}/test/testcases/${CONFIG}.ans
)
set_property(
TEST ${CONFIG}_compare
APPEND PROPERTY
DEPENDS ${CONFIG}_convert)
endforeach(CONFIG)
opencc-0.4.3/release.sh 000750 567316 013202 00000000340 12145345503 016257 0 ustar 00carbokuo nonconf 000000 000000 mkdir -p release \
&& cd release \
&& cmake \
-D ENABLE_GETTEXT:BOOL=ON \
-D BUILD_DOCUMENTATION:BOOL=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr \
.. \
&& make \
&& make test \
&& make package_source
opencc-0.4.3/.cproject 000640 567316 013202 00000012454 12145345503 016124 0 ustar 00carbokuo nonconf 000000 000000
opencc-0.4.3/opencc.gyp 000640 567316 013202 00000001322 12145345503 016272 0 ustar 00carbokuo nonconf 000000 000000 {
"includes": [
"gypi/global.gypi",
"gypi/configs.gypi",
"gypi/dicts.gypi",
],
"targets": [{
"target_name": "libopencc",
"type": "<(library)",
"sources": [
"src/config_reader.c",
"src/converter.c",
"src/dict_group.c",
"src/dict_chain.c",
"src/encoding.c",
"src/utils.c",
"src/opencc.c",
"src/dict.c",
"src/dictionary/datrie.c",
"src/dictionary/text.c"
],
"conditions": [
["OS=='linux'", {
"cflags": [
"-fPIC"
]
}]
]
}, {
"target_name": "opencc",
"type": "executable",
"sources": [
"src/tools/opencc.c"
],
"dependencies": [
"libopencc"
]
}]
}
opencc-0.4.3/NEWS.md 000640 567316 013202 00000007457 12145345503 015417 0 ustar 00carbokuo nonconf 000000 000000 # Change History of OpenCC
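## Ver 0.4.3
* Added the `opencc_convert_utf8_free` interface
* Fixed a memory leak in the Node.js addon
* Fixed retrieval of the current directory on Windows
## Ver 0.4.2
* Fixed Traditional-Simplified conversion of 「阪」 and 「薰」
* Added four missing Simplified-Traditional conversion pairs
* Added API documentation, generated by Doxygen
* Refactored a large amount of code
## Ver 0.4.1
* Fixed Node.js 0.10 compatibility issues.
* Added a number of missing Simplified-Traditional single-character conversions from the Unihan database.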
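## Ver 0.4.0
March 2, 2013
* Fixed conversion of 「雕」, 「谥」 and 「峯」; added several hundred Taiwanese technical terms.
* Fixed an error with the command-line -h option.
* Fixed an error when reading long lines.
* Fixed a spelling mistake in an error type.
* Fixed a UTF-8 encoding conversion error.
* The UTF-8 BOM is now skipped automatically.
* Fixed relative path problems for configuration and data files.
* Added the gyp build system.
* Added a Node.js interface.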
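## Ver 0.3.0
December 2, 2011
* Added conversion of variant characters and idiomatic phrases between Mainland China and Taiwan.
* Fixed a buffer-copy bug that occurred when the dictionary conversion chain has an odd length.
* Fixed a UTF-8 conversion error on big-endian platforms.
* Fixed problems with phrases containing 「齣」 and 「薑」.
* Fixed 「钁」, 「卷」, 「干」, 「薰」, 「糉」, 「蝨」 and 「麺」.
* Added Traditional-to-Simplified conversion from 「綑」 to 「捆」.
* Added the 「跡」/「蹟」 distinction.
* Added the 「夫」/「伕」 distinction.
* Added the 「毀」/「譭」/「燬」 distinction.
* Added the 「背」/「揹」 distinction.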
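## Ver 0.2.0
December 23, 2010
* Removed libopencc's dependency on iconv.
* Added an error message for malformed UTF-8 input.
* Refactored the Python wrapper.
* Fixed a UTF-8 truncation error when a line being read is longer than the buffer.
* Replaced Autotools with CMake as the build system.
* Fixed many Simplified-Traditional conversion problems, including 「拿不準」.
## Ver 0.1.2
September 16, 2010
* Added conversion interfaces for "segmentation only" and "show multiple candidate characters and words".
* Improved the structure of dictionary files.
* Fixed a bug where the conversion buffer was always reported as insufficient.
* Fixed a bug where a dictionary was skipped during multi-dictionary conversion.
* Fixed a conversion bug on empty input.
* Improved the argument hints and help text of the opencc command-line tool.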
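## Ver 0.1.1
August 10, 2010
* Added conversion from mixed Simplified/Traditional text to Simplified or Traditional.
* Added conversion support for multiple dictionaries and dictionary groups.
* Fixed compatibility problems on big-endian platforms.
* Fixed the iconv dependency problem when compiling on Apple platforms.
* Fixed a conversion bug that occurred when dictionary entries have unequal lengths.
* Refactored the dictionary code abstraction.
* Added build-time tests.
* Split the dictionary into a character dictionary and a phrase dictionary.
## Ver 0.1.0
July 28, 2010
* Fixed a bug where the filename buffer was too small.
* Updated the libopencc version to 1.0.0.
* Separated the Taiwan-specific Traditional-Simplified conversions 「著」 and 「么」.
* Changed the default variants corresponding to 「众」, 「教」, 「查」 and 「污」.
* Added the Traditional-Simplified conversions 「齧啮」 and 「灩滟」.
* Added the one-Simplified-to-many-Traditional conversion 「岳嶽」.
* Hid unnecessary types and updated the interface comments.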
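## Ver 0.0.5
July 21, 2010
* Fixed `wchar_t` compatibility problems by using `ucs4`.
* Added a Windows port branch.
* Fixed a filename buffer allocation problem.
* Added Traditional-Simplified conversions for 「囉」, 「溼」, 「廕」, 「彷」 and 「徵」.
## Ver 0.0.4
July 16, 2010
* Added conversions for 「卹」, 「牴」, 「皁」, 「羶」, 「薹」 and others.
* Trimmed a large number of unnecessary numerals (containing 「千」 or 「萬」) from the dictionary.
* Fixed the implementation of preferring backward matching in shortest-path segmentation.
* Fixed a dictionary loading compatibility problem: memory is now allocated directly when mmap is unavailable.
* Fixed compilation of the C++ interface on 64-bit platforms.
## Ver 0.0.3
June 22, 2010
* Added Traditional-to-Simplified conversion.
* Added Chinese translations of messages, using `GNU Gettext`.
* Added support for dictionary configuration files.
* Fixed some compatibility bugs.
## Ver 0.0.2
June 19, 2010
* Separated the lexicon.
* Added support for reading flat-file lexicons.
* Added `opencc_dict`, a tool that converts flat-file lexicons to `Datrie` lexicons.
* Provided an interface for converting UTF-8 text directly.
## Ver 0.0.1
June 11, 2010
* Initial release of OpenCC.
* Supports Simplified-Traditional conversion.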
opencc-0.4.3/gypi/global.gypi 000640 567316 013202 00000000251 12145345503 017404 0 ustar 00carbokuo nonconf 000000 000000 {
"variables": {
"opencc_version": "0.4.3"
},
"target_defaults": {
"defines": [
"VERSION=\"<(opencc_version)\"",
"PKGDATADIR=\"\""
]
}
}
opencc-0.4.3/gypi/opencc_dict.gypi 000640 567316 013202 00000000571 12145345503 020423 0 ustar 00carbokuo nonconf 000000 000000 {
"targets": [{
"target_name": "opencc_dict",
"type": "executable",
"sources": [
"../src/tools/opencc_dict.c",
"../src/encoding.c",
"../src/utils.c",
"../src/dict_group.c",
"../src/dict_chain.c",
"../src/config_reader.c",
"../src/dict.c",
"../src/dictionary/datrie.c",
"../src/dictionary/text.c"
]
}]
}
opencc-0.4.3/gypi/configs.gypi 000640 567316 013202 00000001324 12145345503 017576 0 ustar 00carbokuo nonconf 000000 000000 {
"targets": [{
"target_name": "configs",
"type": "none",
"copies": [{
"destination": "<(PRODUCT_DIR)",
"files": [
"../data/config/mix2zhs.ini",
"../data/config/mix2zht.ini",
"../data/config/zhs2zht.ini",
"../data/config/zhs2zhtw_p.ini",
"../data/config/zhs2zhtw_v.ini",
"../data/config/zhs2zhtw_vp.ini",
"../data/config/zht2zhs.ini",
"../data/config/zht2zhtw_p.ini",
"../data/config/zht2zhtw_v.ini",
"../data/config/zht2zhtw_vp.ini",
"../data/config/zhtw2zhcn_s.ini",
"../data/config/zhtw2zhcn_t.ini",
"../data/config/zhtw2zhs.ini",
"../data/config/zhtw2zht.ini"
]
}]
}]
}
opencc-0.4.3/gypi/dicts.gypi 000640 567316 013202 00000003544 12145345503 017262 0 ustar 00carbokuo nonconf 000000 000000 {
"includes": [
"opencc_dict.gypi",
],
"targets": [{
"target_name": "dicts",
"type": "none",
"variables": {
"cmd": "<(PRODUCT_DIR)/opencc_dict",
"input_prefix": "data/",
"output_prefix": "<(PRODUCT_DIR)/"
},
"copies": [{
"destination": "<(PRODUCT_DIR)",
"files": [
"../data/tw/to_tw_variants.txt",
"../data/tw/to_tw_phrases.txt",
"../data/tw/from_tw_variants.txt",
"../data/tw/from_tw_phrases.txt",
"../data/cn/to_cn_phrases.txt"
]
}],
"actions": [{
"action_name": "simp_to_trad_characters",
"variables": {
"input": "<(input_prefix)simp_to_trad/characters.txt",
},
"inputs": ["<(cmd)", "<(input)"],
"outputs": ["<(output_prefix)simp_to_trad_characters.ocd"],
"action": ["<(cmd)", "-i", "<(input)", "-o", "<@(_outputs)"]
}, {
"action_name": "simp_to_trad_phrases",
"variables": {
"input": "<(input_prefix)simp_to_trad/phrases.txt",
},
"inputs": ["<(cmd)", "<(input)"],
"outputs": ["<(output_prefix)simp_to_trad_phrases.ocd"],
"action": ["<(cmd)", "-i", "<(input)", "-o", "<@(_outputs)"]
}, {
"action_name": "trad_to_simp_characters",
"variables": {
"input": "<(input_prefix)trad_to_simp/characters.txt",
},
"inputs": ["<(cmd)", "<(input)"],
"outputs": ["<(output_prefix)trad_to_simp_characters.ocd"],
"action": ["<(cmd)", "-i", "<(input)", "-o", "<@(_outputs)"]
}, {
"action_name": "trad_to_simp_phrases",
"variables": {
"input": "<(input_prefix)trad_to_simp/phrases.txt",
},
"inputs": ["<(cmd)", "<(input)"],
"outputs": ["<(output_prefix)trad_to_simp_phrases.ocd"],
"action": ["<(cmd)", "-i", "<(input)", "-o", "<@(_outputs)"]
}],
"dependencies": [
"opencc_dict"
]
}]
}
opencc-0.4.3/src/encoding.h 000640 567316 013202 00000003062 12145345503 017033 0 ustar 00carbokuo nonconf 000000 000000 /**
* @file
* UCS4-UTF8 Encoding module.
*
* @license
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCC_ENCODING_H_
#define __OPENCC_ENCODING_H_
#include "common.h"
/**
* Converts a UTF-8 string into UCS-4.
*
* @param utf8 UTF-8 string
* @param length Length of the UTF-8 string, or 0 to treat it as a NUL-terminated string
* @return The converted UCS-4 string. Must be freed when no longer in use.
*/
ucs4_t* utf8_to_ucs4(const char* utf8, size_t length);
/**
* Converts a UCS-4 string into UTF-8.
*
* @param ucs4 UCS-4 string
* @param length Length of the UCS-4 string, or 0 to treat it as a NUL-terminated string
* @return The converted UTF-8 string. Must be freed when no longer in use.
*/
char* ucs4_to_utf8(const ucs4_t* ucs4, size_t length);
size_t ucs4len(const ucs4_t* str);
int ucs4cmp(const ucs4_t* str1, const ucs4_t* str2);
void ucs4cpy(ucs4_t* dest, const ucs4_t* src);
void ucs4ncpy(ucs4_t* dest, const ucs4_t* src, size_t len);
#endif /* __OPENCC_ENCODING_H_ */
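The contract documented for `utf8_to_ucs4` (explicit length, or 0 for a NUL-terminated string, with a heap-allocated result) can be illustrated with a minimal sketch. This is not the code in `encoding.c`: the name `sketch_utf8_to_ucs4` and the use of `uint32_t` for code points are assumptions made for illustration, and the sketch reports malformed input by returning NULL rather than the `(ucs4_t*)-1` sentinel that callers in `text.c` check for. It also does not reject overlong encodings or surrogates.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for utf8_to_ucs4(): decodes `length` bytes of UTF-8
 * (or up to the terminating NUL when length is 0) into a NUL-terminated,
 * malloc'ed array of code points. Returns NULL on malformed input. */
static uint32_t* sketch_utf8_to_ucs4(const char* utf8, size_t length) {
  if (length == 0) {
    length = strlen(utf8);
  }
  /* Every code point occupies at least one byte, so length + 1 slots
   * are always enough (the extra slot holds the terminator). */
  uint32_t* out = (uint32_t*)malloc((length + 1) * sizeof(uint32_t));
  if (out == NULL) {
    return NULL;
  }
  const unsigned char* p = (const unsigned char*)utf8;
  size_t i = 0, n = 0, k;
  while (i < length) {
    uint32_t cp;
    size_t extra; /* number of continuation bytes that must follow */
    if (p[i] < 0x80) {
      cp = p[i];
      extra = 0;
    } else if ((p[i] & 0xE0) == 0xC0) {
      cp = p[i] & 0x1F;
      extra = 1;
    } else if ((p[i] & 0xF0) == 0xE0) {
      cp = p[i] & 0x0F;
      extra = 2;
    } else if ((p[i] & 0xF8) == 0xF0) {
      cp = p[i] & 0x07;
      extra = 3;
    } else {
      free(out);
      return NULL; /* stray continuation byte or invalid lead byte */
    }
    if (extra > length - i - 1) {
      free(out);
      return NULL; /* sequence is truncated */
    }
    for (k = 1; k <= extra; k++) {
      if ((p[i + k] & 0xC0) != 0x80) {
        free(out);
        return NULL; /* expected a continuation byte (10xxxxxx) */
      }
      cp = (cp << 6) | (p[i + k] & 0x3F);
    }
    out[n++] = cp;
    i += extra + 1;
  }
  out[n] = 0;
  return out;
}
```

As with the documented functions, the caller owns the returned buffer and releases it with `free`.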
opencc-0.4.3/src/symbols.cmake 000640 567316 013202 00000001734 12145345503 017572 0 ustar 00carbokuo nonconf 000000 000000 set(
OPENCC_SYMBOLS
opencc_open
opencc_close
opencc_convert
opencc_convert_utf8
opencc_convert_utf8_free
opencc_dict_load
opencc_set_conversion_mode
opencc_errno
opencc_perror
)
set (LINK_FLAGS "")
if (APPLE)
# Create a symbols_list file for the darwin linker
string(REPLACE ";" "\n_" _symbols "${OPENCC_SYMBOLS}")
set(_symbols_list "${CMAKE_CURRENT_BINARY_DIR}/symbols.list")
file(WRITE ${_symbols_list} "_${_symbols}\n")
set(LINK_FLAGS
"${LINK_FLAGS} -Wl,-exported_symbols_list,'${_symbols_list}'")
elseif (CMAKE_C_COMPILER_ID STREQUAL GNU)
# Create a version script for GNU ld.
set(_symbols "{ global: ${OPENCC_SYMBOLS}; local: *; };")
set(_version_script "${CMAKE_CURRENT_BINARY_DIR}/version.script")
file(WRITE ${_version_script} "${_symbols}\n")
set(LINK_FLAGS "${LINK_FLAGS} -Wl,--version-script,'${_version_script}'")
endif (APPLE)
set_target_properties(
${LIBOPENCC_TARGET}
${LIBOPENCC_STATIC_TARGET}
PROPERTIES
LINK_FLAGS
"${LINK_FLAGS}"
)
opencc-0.4.3/src/dictionary/datrie.c 000640 567316 013202 00000017770 12145345503 020670 0 ustar 00carbokuo nonconf 000000 000000 /*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "datrie.h"
#include <fcntl.h>
#include <unistd.h>
#ifdef __WIN32
/* TODO: Win32 mmap */
#else /* ifdef __WIN32 */
# include <sys/mman.h>
# define MMAP_ENABLED
#endif /* ifdef __WIN32 */
typedef enum {
MEMORY_TYPE_MMAP,
MEMORY_TYPE_ALLOCATE
} memory_type;
typedef struct {
const DatrieItem* dat;
uint32_t dat_item_count;
ucs4_t* lexicon;
uint32_t lexicon_count;
ucs4_t*** lexicon_set;
void* dic_memory;
size_t dic_size;
memory_type dic_memory_type;
} DatrieDict;
static int load_allocate(DatrieDict* datrie_dictionary, int fd) {
datrie_dictionary->dic_memory_type = MEMORY_TYPE_ALLOCATE;
datrie_dictionary->dic_memory = malloc(datrie_dictionary->dic_size);
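if (datrie_dictionary->dic_memory == NULL) {
/* Memory allocation failed */
return -1;
}
lseek(fd, 0, SEEK_SET);
if (read(fd, datrie_dictionary->dic_memory,
datrie_dictionary->dic_size) == -1) {
/* Read failed: release the buffer so that it is not leaked */
free(datrie_dictionary->dic_memory);
datrie_dictionary->dic_memory = NULL;
return -1;
}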
return 0;
}
static int load_mmap(DatrieDict* datrie_dictionary, int fd) {
#ifdef MMAP_ENABLED
datrie_dictionary->dic_memory_type = MEMORY_TYPE_MMAP;
datrie_dictionary->dic_memory = mmap(NULL,
datrie_dictionary->dic_size,
PROT_READ,
MAP_PRIVATE,
fd,
0);
if (datrie_dictionary->dic_memory == MAP_FAILED) {
/* Failed to create the memory mapping */
datrie_dictionary->dic_memory = NULL;
return -1;
}
return 0;
#else /* ifdef MMAP_ENABLED */
return -1;
#endif /* ifdef MMAP_ENABLED */
}
static int load_dict(DatrieDict* datrie_dictionary, FILE* fp) {
int fd = fileno(fp);
fseek(fp, 0, SEEK_END);
datrie_dictionary->dic_size = ftell(fp);
/* Try mmap first; fall back to allocating memory if that fails */
if (load_mmap(datrie_dictionary, fd) == -1) {
if (load_allocate(datrie_dictionary, fd) == -1) {
return -1;
}
}
size_t header_len = strlen("OPENCCDATRIE");
if (strncmp((const char*)datrie_dictionary->dic_memory, "OPENCCDATRIE",
header_len) != 0) {
return -1;
}
size_t offset = 0;
offset += header_len * sizeof(char);
/* Lexicon */
uint32_t lexicon_length =
*((uint32_t*)(datrie_dictionary->dic_memory + offset));
offset += sizeof(uint32_t);
datrie_dictionary->lexicon = (ucs4_t*)(datrie_dictionary->dic_memory + offset);
offset += lexicon_length * sizeof(ucs4_t);
/* Lexicon index table */
uint32_t lexicon_index_length =
*((uint32_t*)(datrie_dictionary->dic_memory + offset));
offset += sizeof(uint32_t);
uint32_t* lexicon_index = (uint32_t*)(datrie_dictionary->dic_memory + offset);
offset += lexicon_index_length * sizeof(uint32_t);
datrie_dictionary->lexicon_count =
*((uint32_t*)(datrie_dictionary->dic_memory + offset));
offset += sizeof(uint32_t);
datrie_dictionary->dat_item_count =
*((uint32_t*)(datrie_dictionary->dic_memory + offset));
offset += sizeof(uint32_t);
datrie_dictionary->dat =
(DatrieItem*)(datrie_dictionary->dic_memory + offset);
/* Build the index table */
datrie_dictionary->lexicon_set = (ucs4_t***)malloc(
datrie_dictionary->lexicon_count * sizeof(ucs4_t * *));
size_t i, last = 0;
for (i = 0; i < datrie_dictionary->lexicon_count; i++) {
size_t count, j;
for (j = last; j < lexicon_index_length; j++) {
if (lexicon_index[j] == (uint32_t)-1) {
break;
}
}
count = j - last;
datrie_dictionary->lexicon_set[i] =
(ucs4_t**)malloc((count + 1) * sizeof(ucs4_t*));
for (j = 0; j < count; j++) {
datrie_dictionary->lexicon_set[i][j] =
datrie_dictionary->lexicon + lexicon_index[last + j];
}
datrie_dictionary->lexicon_set[i][count] = NULL;
last += j + 1;
}
return 0;
}
static int unload_dict(DatrieDict* datrie_dictionary) {
if (datrie_dictionary->dic_memory != NULL) {
size_t i;
for (i = 0; i < datrie_dictionary->lexicon_count; i++) {
free(datrie_dictionary->lexicon_set[i]);
}
free(datrie_dictionary->lexicon_set);
if (MEMORY_TYPE_MMAP == datrie_dictionary->dic_memory_type) {
#ifdef MMAP_ENABLED
return munmap(datrie_dictionary->dic_memory, datrie_dictionary->dic_size);
#else /* ifdef MMAP_ENABLED */
debug_should_not_be_here();
#endif /* ifdef MMAP_ENABLED */
} else if (MEMORY_TYPE_ALLOCATE == datrie_dictionary->dic_memory_type) {
free(datrie_dictionary->dic_memory);
} else {
return -1;
}
}
return 0;
}
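Dict* dict_datrie_new(const char* filename) {
DatrieDict* datrie_dictionary = (DatrieDict*)malloc(
sizeof(DatrieDict));
/* Initialise every field unload_dict() inspects so a failed load is safe to clean up */
datrie_dictionary->dat = NULL;
datrie_dictionary->lexicon = NULL;
datrie_dictionary->lexicon_set = NULL;
datrie_dictionary->lexicon_count = 0;
datrie_dictionary->dic_memory = NULL;
FILE* fp = fopen(filename, "rb");
if (fp == NULL) {
free(datrie_dictionary);
return (Dict*)-1;
}
if (load_dict(datrie_dictionary, fp) == -1) {
fclose(fp);
dict_datrie_delete((Dict*)datrie_dictionary);
return (Dict*)-1;
}
fclose(fp);
return (Dict*)datrie_dictionary;
}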
int dict_datrie_delete(Dict* dict) {
DatrieDict* datrie_dictionary =
(DatrieDict*)dict;
if (unload_dict(datrie_dictionary) == -1) {
free(datrie_dictionary);
return -1;
}
free(datrie_dictionary);
return 0;
}
int encode_char(ucs4_t ch) {
return (int)ch;
}
void datrie_match(const DatrieDict* datrie_dictionary,
const ucs4_t* word,
size_t* match_pos,
size_t* id,
size_t limit) {
int i, p;
for (i = 0, p = 0; word[p] && (limit == 0 || (size_t)p < limit) &&
datrie_dictionary->dat[i].base != DATRIE_UNUSED; p++) {
int k = encode_char(word[p]);
int j = datrie_dictionary->dat[i].base + k;
if ((j < 0) || ((size_t)j >= datrie_dictionary->dat_item_count) ||
(datrie_dictionary->dat[j].parent != i)) {
break;
}
i = j;
}
if (match_pos) {
*match_pos = p;
}
if (id) {
*id = i;
}
}
const ucs4_t* const* dict_datrie_match_longest(Dict* dict,
const ucs4_t* word,
size_t maxlen,
size_t* match_length) {
DatrieDict* datrie_dictionary =
(DatrieDict*)dict;
size_t pos, item;
datrie_match(datrie_dictionary, word, &pos, &item, maxlen);
while (datrie_dictionary->dat[item].word == -1 && pos > 1) {
datrie_match(datrie_dictionary, word, &pos, &item, pos - 1);
}
if ((pos == 0) || (datrie_dictionary->dat[item].word == -1)) {
if (match_length != NULL) {
*match_length = 0;
}
return NULL;
}
if (match_length != NULL) {
*match_length = pos;
}
return (const ucs4_t* const*)
datrie_dictionary->lexicon_set[datrie_dictionary->dat[item].word];
}
size_t dict_datrie_get_all_match_lengths(Dict* dict,
const ucs4_t* word,
size_t* match_length) {
DatrieDict* datrie_dictionary =
(DatrieDict*)dict;
size_t rscnt = 0;
int i, p;
for (i = 0, p = 0; word[p] && datrie_dictionary->dat[i].base != DATRIE_UNUSED;
p++) {
int k = encode_char(word[p]);
int j = datrie_dictionary->dat[i].base + k;
if ((j < 0) || ((size_t)j >= datrie_dictionary->dat_item_count) ||
(datrie_dictionary->dat[j].parent != i)) {
break;
}
i = j;
if (datrie_dictionary->dat[i].word != -1) {
match_length[rscnt++] = p + 1;
}
}
return rscnt;
}
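The state transition used by `datrie_match` and `dict_datrie_get_all_match_lengths` (`j = base[i] + k`, accepted only when `parent[j] == i`) can be tried on a tiny hand-built table. The sketch below is illustrative only: `SketchItem`, `sketch_dat` and `sketch_longest_match` are hypothetical names, character codes are small integers instead of UCS-4 values, and the longest match is found in a single pass by remembering the last accepting state, which is a simplification of the retry loop in `dict_datrie_match_longest`.

```c
#include <stddef.h>

#define SKETCH_UNUSED (-1)

/* Same three fields as DatrieItem; the type name is hypothetical. */
typedef struct {
  int base;   /* offset added to a character code to find the child slot */
  int parent; /* index of the state that owns this slot */
  int word;   /* word id if a key ends in this state, otherwise -1 */
} SketchItem;

/* Hand-built table holding the keys {1,2} (word 0) and {1,2,3} (word 1). */
static const SketchItem sketch_dat[5] = {
  { 0, -1, -1 },             /* 0: root */
  { 1, 0, -1 },              /* 1: reached from the root by code 1 */
  { SKETCH_UNUSED, -1, -1 }, /* 2: empty slot */
  { 1, 1, 0 },               /* 3: reached from state 1 by code 2, word 0 */
  { SKETCH_UNUSED, 3, 1 },   /* 4: reached from state 3 by code 3, word 1 */
};

/* Walks the 0-terminated code sequence and returns the word id of the
 * longest key that is a prefix of it, or -1 if there is none. The number
 * of codes consumed by that key is stored in *match_length. */
static int sketch_longest_match(const SketchItem* dat, size_t count,
                                const int* codes, size_t* match_length) {
  int state = 0;
  int best_word = -1;
  size_t best_len = 0;
  size_t p;
  for (p = 0; codes[p] != 0 && dat[state].base != SKETCH_UNUSED; p++) {
    int next = dat[state].base + codes[p];
    /* The slot belongs to this transition only if its parent is `state`. */
    if (next < 0 || (size_t)next >= count || dat[next].parent != state) {
      break;
    }
    state = next;
    if (dat[state].word != -1) {
      best_word = dat[state].word;
      best_len = p + 1;
    }
  }
  *match_length = best_len;
  return best_word;
}
```

The `parent` check is what lets several states share one array without collisions: slot 3 is only reachable from state 1, so an input that starts with code 3 fails at the root even though `base + 3` lands on an occupied slot.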
opencc-0.4.3/src/dictionary/text.h 000640 567316 013202 00000002767 12145345503 020411 0 ustar 00carbokuo nonconf 000000 000000 /*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCC_DICTIONARY_TEXT_H_
#define __OPENCC_DICTIONARY_TEXT_H_
#include "../dict.h"
typedef struct {
ucs4_t* key;
ucs4_t** value;
} TextEntry;
typedef struct {
size_t entry_count;
size_t max_length;
TextEntry* lexicon;
ucs4_t* word_buff;
} TextDict;
Dict* dict_text_new(const char* filename);
void dict_text_delete(Dict* dict);
const ucs4_t* const* dict_text_match_longest(Dict* dict,
const ucs4_t* word,
size_t maxlen,
size_t* match_length);
size_t dict_text_get_all_match_lengths(Dict* dict,
const ucs4_t* word,
size_t* match_length);
size_t dict_text_get_lexicon(Dict* dict, TextEntry* lexicon);
#endif /* __OPENCC_DICTIONARY_TEXT_H_ */
opencc-0.4.3/src/dictionary/datrie.h 000640 567316 013202 00000002630 12145345503 020662 0 ustar 00carbokuo nonconf 000000 000000 /*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCC_DICTIONARY_DATRIE_H_
#define __OPENCC_DICTIONARY_DATRIE_H_
#include "../dict.h"
#define DATRIE_UNUSED -1
typedef struct {
int base;
int parent;
int word;
} DatrieItem;
Dict* dict_datrie_new(const char* filename);
int dict_datrie_delete(Dict* dict);
const ucs4_t* const* dict_datrie_match_longest(Dict* dict,
const ucs4_t* word,
size_t maxlen,
size_t* match_length);
size_t dict_datrie_get_all_match_lengths(Dict* dict,
const ucs4_t* word,
size_t* match_length);
int encode_char(ucs4_t ch);
#endif /* __OPENCC_DICTIONARY_DATRIE_H_ */
opencc-0.4.3/src/dictionary/text.c 000640 567316 013202 00000016135 12145345503 020376 0 ustar 00carbokuo nonconf 000000 000000 /*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "../encoding.h"
#include "text.h"
#define INITIAL_DICTIONARY_SIZE 1024
#define ENTRY_BUFF_SIZE 128
#define ENTRY_WBUFF_SIZE (ENTRY_BUFF_SIZE / sizeof(size_t))
int qsort_entry_cmp(const void* a, const void* b) {
return ucs4cmp(((TextEntry*)a)->key, ((TextEntry*)b)->key);
}
int parse_entry(const char* buff, TextEntry* entry_i) {
size_t length;
const char* pbuff;
/* Parse the key */
for (pbuff = buff; *pbuff != '\t' && *pbuff != '\0'; ++pbuff) {}
if (*pbuff == '\0') {
return -1;
}
length = pbuff - buff;
ucs4_t* ucs4_buff;
ucs4_buff = utf8_to_ucs4(buff, length);
if (ucs4_buff == (ucs4_t*)-1) {
return -1;
}
entry_i->key = (ucs4_t*)malloc((length + 1) * sizeof(ucs4_t));
ucs4cpy(entry_i->key, ucs4_buff);
free(ucs4_buff);
/* Parse the values */
size_t value_i, value_count = INITIAL_DICTIONARY_SIZE;
entry_i->value = (ucs4_t**)malloc(value_count * sizeof(ucs4_t*));
for (value_i = 0; *pbuff != '\0' && *pbuff != '\n'; ++value_i) {
if (value_i >= value_count) {
value_count += value_count;
entry_i->value = (ucs4_t**)realloc(
entry_i->value,
value_count * sizeof(ucs4_t*)
);
}
for (buff = ++pbuff;
*pbuff != ' ' && *pbuff != '\0' && *pbuff != '\n' && *pbuff != '\r';
++pbuff) {}
length = pbuff - buff;
ucs4_buff = utf8_to_ucs4(buff, length);
if (ucs4_buff == (ucs4_t*)-1) {
/* On error, roll back the allocations made so far */
ssize_t i;
for (i = value_i - 1; i >= 0; --i) {
free(entry_i->value[i]);
}
free(entry_i->value);
free(entry_i->key);
return -1;
}
entry_i->value[value_i] = (ucs4_t*)malloc((length + 1) * sizeof(ucs4_t));
ucs4cpy(entry_i->value[value_i], ucs4_buff);
free(ucs4_buff);
}
/* Shrink to the values actually read, plus one slot for the NULL terminator */
entry_i->value = (ucs4_t**)realloc(
entry_i->value,
(value_i + 1) * sizeof(ucs4_t*)
);
entry_i->value[value_i] = NULL;
return 0;
}
Dict* dict_text_new(const char* filename) {
TextDict* text_dictionary;
text_dictionary = (TextDict*)malloc(sizeof(TextDict));
text_dictionary->entry_count = INITIAL_DICTIONARY_SIZE;
text_dictionary->max_length = 0;
text_dictionary->lexicon = (TextEntry*)malloc(
sizeof(TextEntry) * text_dictionary->entry_count);
text_dictionary->word_buff = NULL;
static char buff[ENTRY_BUFF_SIZE];
FILE* fp = fopen(filename, "r");
if (fp == NULL) {
/* No entries have been parsed yet, so none may be freed */
text_dictionary->entry_count = 0;
dict_text_delete((Dict*)text_dictionary);
return (Dict*)-1;
}
skip_utf8_bom(fp);
size_t i = 0;
while (fgets(buff, ENTRY_BUFF_SIZE, fp)) {
if (i >= text_dictionary->entry_count) {
text_dictionary->entry_count += text_dictionary->entry_count;
text_dictionary->lexicon = (TextEntry*)realloc(
text_dictionary->lexicon,
sizeof(TextEntry) * text_dictionary->entry_count
);
}
if (parse_entry(buff, text_dictionary->lexicon + i) == -1) {
text_dictionary->entry_count = i;
fclose(fp);
dict_text_delete((Dict*)text_dictionary);
return (Dict*)-1;
}
size_t length = ucs4len(text_dictionary->lexicon[i].key);
if (length > text_dictionary->max_length) {
text_dictionary->max_length = length;
}
i++;
}
fclose(fp);
text_dictionary->entry_count = i;
text_dictionary->lexicon = (TextEntry*)realloc(
text_dictionary->lexicon,
sizeof(TextEntry) * text_dictionary->entry_count
);
text_dictionary->word_buff = (ucs4_t*)
malloc(sizeof(ucs4_t) *
(text_dictionary->max_length + 1));
qsort(text_dictionary->lexicon,
text_dictionary->entry_count,
sizeof(text_dictionary->lexicon[0]),
qsort_entry_cmp
);
return (Dict*)text_dictionary;
}
void dict_text_delete(Dict* dict) {
TextDict* text_dictionary = (TextDict*)dict;
size_t i;
for (i = 0; i < text_dictionary->entry_count; ++i) {
free(text_dictionary->lexicon[i].key);
ucs4_t** j;
for (j = text_dictionary->lexicon[i].value; *j; ++j) {
free(*j);
}
free(text_dictionary->lexicon[i].value);
}
free(text_dictionary->lexicon);
free(text_dictionary->word_buff);
free(text_dictionary);
}
const ucs4_t* const* dict_text_match_longest(Dict* dict,
const ucs4_t* word,
size_t maxlen,
size_t* match_length) {
TextDict* text_dictionary = (TextDict*)dict;
if (text_dictionary->entry_count == 0) {
return NULL;
}
if (maxlen == 0) {
maxlen = ucs4len(word);
}
size_t len = text_dictionary->max_length;
if (maxlen < len) {
len = maxlen;
}
ucs4ncpy(text_dictionary->word_buff, word, len);
text_dictionary->word_buff[len] = L'\0';
TextEntry buff;
buff.key = text_dictionary->word_buff;
for (; len > 0; len--) {
text_dictionary->word_buff[len] = L'\0';
TextEntry* brs = (TextEntry*)bsearch(
&buff,
text_dictionary->lexicon,
text_dictionary->entry_count,
sizeof(text_dictionary->lexicon[0]),
qsort_entry_cmp
);
if (brs != NULL) {
if (match_length != NULL) {
*match_length = len;
}
return (const ucs4_t* const*)brs->value;
}
}
if (match_length != NULL) {
*match_length = 0;
}
return NULL;
}
size_t dict_text_get_all_match_lengths(Dict* dict,
const ucs4_t* word,
size_t* match_length) {
TextDict* text_dictionary = (TextDict*)dict;
size_t rscnt = 0;
if (text_dictionary->entry_count == 0) {
return rscnt;
}
size_t length = ucs4len(word);
size_t len = text_dictionary->max_length;
if (length < len) {
len = length;
}
ucs4ncpy(text_dictionary->word_buff, word, len);
text_dictionary->word_buff[len] = L'\0';
TextEntry buff;
buff.key = text_dictionary->word_buff;
for (; len > 0; len--) {
text_dictionary->word_buff[len] = L'\0';
TextEntry* brs = (TextEntry*)bsearch(
&buff,
text_dictionary->lexicon,
text_dictionary->entry_count,
sizeof(text_dictionary->lexicon[0]),
qsort_entry_cmp
);
if (brs != NULL) {
match_length[rscnt++] = len;
}
}
return rscnt;
}
size_t dict_text_get_lexicon(Dict* dict, TextEntry* lexicon) {
TextDict* text_dictionary = (TextDict*)dict;
size_t i;
for (i = 0; i < text_dictionary->entry_count; i++) {
lexicon[i].key = text_dictionary->lexicon[i].key;
lexicon[i].value = text_dictionary->lexicon[i].value;
}
return text_dictionary->entry_count;
}
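/*
 * Illustrative sketch, not part of OpenCC: dict_text_match_longest finds the
 * longest dictionary key that is a prefix of the input by copying the input
 * into a scratch buffer, truncating it one unit at a time, and running
 * bsearch over the sorted lexicon at each length. The standalone example
 * below shows the same idea on ASCII byte strings. Pair, pair_cmp,
 * match_longest and the 64-byte scratch buffer are assumptions made for the
 * example.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key/value pair; the lexicon must be sorted by key. */
typedef struct {
  const char* key;
  const char* value;
} Pair;

/* Comparison function shared by qsort and bsearch. */
static int pair_cmp(const void* a, const void* b) {
  return strcmp(((const Pair*)a)->key, ((const Pair*)b)->key);
}

/* Return the value of the longest key that is a prefix of word, or NULL.
 * *match_length receives the matched prefix length (0 if nothing matched).
 * Only the first 63 bytes of word are considered. */
static const char* match_longest(const Pair* lexicon, size_t count,
                                 const char* word, size_t* match_length) {
  char buff[64];
  Pair probe;
  size_t len = strlen(word);
  assert(match_length != NULL);
  if (len > sizeof(buff) - 1) {
    len = sizeof(buff) - 1;
  }
  memcpy(buff, word, len);
  probe.key = buff;
  probe.value = NULL;
  for (; len > 0; len--) {
    const Pair* hit;
    buff[len] = '\0'; /* try the prefix of length len */
    hit = (const Pair*)bsearch(&probe, lexicon, count, sizeof(Pair),
                               pair_cmp);
    if (hit != NULL) {
      *match_length = len;
      return hit->value;
    }
  }
  *match_length = 0;
  return NULL;
}
```

/*
 * Each lookup costs one binary search per candidate length, so the work is
 * bounded by the longest key rather than by the size of the lexicon.
 */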
opencc-0.4.3/src/converter.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "common.h"
#include "converter.h"
#include "dict_group.h"
#include "dict_chain.h"
#include "encoding.h"
#define DELIMITER ' '
#define SEGMENT_MAXIMUM_LENGTH 0
#define SEGMENT_SHORTEST_PATH 1
#define SEGMENT_METHOD SEGMENT_SHORTEST_PATH
#if SEGMENT_METHOD == SEGMENT_SHORTEST_PATH
# define OPENCC_SP_SEG_DEFAULT_BUFFER_SIZE 1024
typedef struct {
int initialized;
size_t buffer_size;
size_t* match_length;
size_t* min_len;
size_t* parent;
size_t* path;
} SpsegData;
#endif
static converter_error errnum = CONVERTER_ERROR_VOID;
#if SEGMENT_METHOD == SEGMENT_SHORTEST_PATH
static void sp_seg_buffer_free(SpsegData* ossb) {
free(ossb->match_length);
free(ossb->min_len);
free(ossb->parent);
free(ossb->path);
}
static void sp_seg_set_buffer_size(SpsegData* ossb, size_t buffer_size) {
if (ossb->initialized == 1) {
sp_seg_buffer_free(ossb);
}
ossb->buffer_size = buffer_size;
ossb->match_length = (size_t*)malloc((buffer_size + 1) * sizeof(size_t));
ossb->min_len = (size_t*)malloc(buffer_size * sizeof(size_t));
ossb->parent = (size_t*)malloc(buffer_size * sizeof(size_t));
ossb->path = (size_t*)malloc(buffer_size * sizeof(size_t));
ossb->initialized = 1;
}
static size_t sp_seg(Converter* converter,
ucs4_t** inbuf,
size_t* inbuf_left,
ucs4_t** outbuf,
size_t* outbuf_left,
size_t length) {
/* Shortest-path segmentation */
/* Fast path for a segment of length 1 */
if (length == 1) {
const ucs4_t* const* match_rs = dict_group_match_longest(
converter->current_dict_group,
*inbuf,
1,
NULL);
size_t match_len = 1;
if (converter->conversion_mode == OPENCC_CONVERSION_FAST) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
const ucs4_t* result = match_rs[0];
/* Remaining output buffer space is smaller than the segment */
if (ucs4len(result) > *outbuf_left) {
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
for (; *result; result++) {
**outbuf = *result;
(*outbuf)++, (*outbuf_left)--;
}
*inbuf += match_len;
*inbuf_left -= match_len;
}
} else if (converter->conversion_mode ==
OPENCC_CONVERSION_LIST_CANDIDATES) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
size_t i;
for (i = 0; match_rs[i] != NULL; i++) {
const ucs4_t* result = match_rs[i];
int show_delimiter = match_rs[i + 1] != NULL ? 1 : 0;
/* Remaining output buffer space is smaller than the segment */
if (ucs4len(result) + show_delimiter > *outbuf_left) {
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
for (; *result; result++) {
**outbuf = *result;
(*outbuf)++, (*outbuf_left)--;
}
if (show_delimiter) {
**outbuf = DELIMITER;
(*outbuf)++, (*outbuf_left)--;
}
}
*inbuf += match_len;
*inbuf_left -= match_len;
}
} else if (converter->conversion_mode == OPENCC_CONVERSION_SEGMENT_ONLY) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
/* Remaining output buffer space is smaller than the segment */
if (match_len + 1 > *outbuf_left) {
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
size_t i;
for (i = 0; i < match_len; i++) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
}
}
**outbuf = DELIMITER;
(*outbuf)++, (*outbuf_left)--;
} else {
debug_should_not_be_here();
}
/* The output buffer must have room for at least one character */
return match_len;
}
/* Size the working buffers */
SpsegData* ossb = converter->data;
size_t buffer_size_need = length + 1;
if ((ossb->initialized == 0) || (ossb->buffer_size < buffer_size_need)) {
sp_seg_set_buffer_size(ossb, buffer_size_need);
}
size_t i, j;
for (i = 0; i <= length; i++) {
ossb->min_len[i] = INFINITY_INT;
}
ossb->min_len[0] = ossb->parent[0] = 0;
for (i = 0; i < length; i++) {
/* Collect all match lengths starting at position i */
size_t match_count = dict_group_get_all_match_lengths(
converter->current_dict_group,
(*inbuf) + i,
ossb->match_length
);
if (ossb->match_length[0] != 1) {
ossb->match_length[match_count++] = 1;
}
/* Dynamic programming: find the shortest segmentation path */
for (j = 0; j < match_count; j++) {
size_t k = ossb->match_length[j];
ossb->match_length[j] = 0;
if ((k > 1) && (ossb->min_len[i] + 1 <= ossb->min_len[i + k])) {
ossb->min_len[i + k] = ossb->min_len[i] + 1;
ossb->parent[i + k] = i;
} else if ((k == 1) &&
(ossb->min_len[i] + 1 < ossb->min_len[i + k])) {
ossb->min_len[i + k] = ossb->min_len[i] + 1;
ossb->parent[i + k] = i;
}
}
}
/* Recover the shortest segmentation path by walking the parent links */
for (i = length, j = ossb->min_len[length]; i != 0; i = ossb->parent[i]) {
ossb->path[--j] = i;
}
size_t inbuf_left_start = *inbuf_left;
size_t begin, end;
/* Convert along the shortest segmentation path */
for (i = begin = 0; i < ossb->min_len[length]; i++) {
end = ossb->path[i];
size_t match_len;
const ucs4_t* const* match_rs = dict_group_match_longest(
converter->current_dict_group,
*inbuf,
end - begin,
&match_len
);
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
if (converter->conversion_mode == OPENCC_CONVERSION_FAST) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
const ucs4_t* result = match_rs[0];
/* Remaining output buffer space is smaller than the segment */
if (ucs4len(result) > *outbuf_left) {
if (inbuf_left_start - *inbuf_left > 0) {
break;
}
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
for (; *result; result++) {
**outbuf = *result;
(*outbuf)++, (*outbuf_left)--;
}
*inbuf += match_len;
*inbuf_left -= match_len;
}
} else if (converter->conversion_mode ==
OPENCC_CONVERSION_LIST_CANDIDATES) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
size_t i;
for (i = 0; match_rs[i] != NULL; i++) {
const ucs4_t* result = match_rs[i];
int show_delimiter = match_rs[i + 1] != NULL ? 1 : 0;
/* Remaining output buffer space is smaller than the segment */
if (ucs4len(result) + show_delimiter > *outbuf_left) {
if (inbuf_left_start - *inbuf_left > 0) {
break;
}
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
for (; *result; result++) {
**outbuf = *result;
(*outbuf)++, (*outbuf_left)--;
}
if (show_delimiter) {
**outbuf = DELIMITER;
(*outbuf)++, (*outbuf_left)--;
}
}
*inbuf += match_len;
*inbuf_left -= match_len;
}
} else if (converter->conversion_mode == OPENCC_CONVERSION_SEGMENT_ONLY) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
/* Remaining output buffer space is smaller than the segment */
if (match_len + 1 > *outbuf_left) {
if (inbuf_left_start - *inbuf_left > 0) {
break;
}
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
size_t i;
for (i = 0; i < match_len; i++) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
}
}
**outbuf = DELIMITER;
(*outbuf)++, (*outbuf_left)--;
} else {
debug_should_not_be_here();
}
}
begin = end;
}
return inbuf_left_start - *inbuf_left;
}
static size_t segment(Converter* converter,
ucs4_t** inbuf,
size_t* inbuf_left,
ucs4_t** outbuf,
size_t* outbuf_left) {
/* Split the input at ambiguity boundaries, then segment each span by shortest path */
size_t i, start, bound;
const ucs4_t* inbuf_start = *inbuf;
size_t inbuf_left_start = *inbuf_left;
size_t sp_seg_length;
bound = 0;
for (i = start = 0; inbuf_start[i] && *inbuf_left > 0 && *outbuf_left > 0;
i++) {
if ((i != 0) && (i == bound)) {
/* Run shortest-path segmentation on the ambiguous span */
sp_seg_length = sp_seg(converter,
inbuf,
inbuf_left,
outbuf,
outbuf_left,
bound - start);
if (sp_seg_length == (size_t)-1) {
return (size_t)-1;
}
if (sp_seg_length == 0) {
if (inbuf_left_start - *inbuf_left > 0) {
return inbuf_left_start - *inbuf_left;
}
/* Not enough space */
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
start = i;
}
size_t match_len;
dict_group_match_longest(
converter->current_dict_group,
inbuf_start + i,
0,
&match_len
);
if (match_len == 0) {
match_len = 1;
}
if (i + match_len > bound) {
bound = i + match_len;
}
}
if ((*inbuf_left > 0) && (*outbuf_left > 0)) {
sp_seg_length = sp_seg(converter,
inbuf,
inbuf_left,
outbuf,
outbuf_left,
bound - start);
if (sp_seg_length == (size_t)-1) {
return (size_t)-1;
}
if (sp_seg_length == 0) {
if (inbuf_left_start - *inbuf_left > 0) {
return inbuf_left_start - *inbuf_left;
}
/* Not enough space */
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
}
if (converter->conversion_mode == OPENCC_CONVERSION_SEGMENT_ONLY) {
(*outbuf)--;
(*outbuf_left)++;
}
return inbuf_left_start - *inbuf_left;
}
#endif /* if SEGMENT_METHOD == SEGMENT_SHORTEST_PATH */
#if SEGMENT_METHOD == SEGMENT_MAXIMUM_LENGTH
static size_t segment(Converter* converter,
ucs4_t** inbuf,
size_t* inbuf_left,
ucs4_t** outbuf,
size_t* outbuf_left) {
/* Forward maximum matching segmentation */
size_t inbuf_left_start = *inbuf_left;
for (; **inbuf && *inbuf_left > 0 && *outbuf_left > 0;) {
size_t match_len;
const ucs4_t* const* match_rs = dict_group_match_longest(
converter->current_dict_group,
*inbuf,
*inbuf_left,
&match_len
);
if (converter->conversion_mode == OPENCC_CONVERSION_FAST) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
const ucs4_t* result = match_rs[0];
/* Remaining output buffer space is smaller than the segment */
if (ucs4len(result) > *outbuf_left) {
if (inbuf_left_start - *inbuf_left > 0) {
break;
}
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
for (; *result; result++) {
**outbuf = *result;
(*outbuf)++, (*outbuf_left)--;
}
*inbuf += match_len;
*inbuf_left -= match_len;
}
} else if (converter->conversion_mode ==
OPENCC_CONVERSION_LIST_CANDIDATES) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
size_t i;
for (i = 0; match_rs[i] != NULL; i++) {
const ucs4_t* result = match_rs[i];
int show_delimiter = match_rs[i + 1] != NULL ? 1 : 0;
/* Remaining output buffer space is smaller than the segment */
if (ucs4len(result) + show_delimiter > *outbuf_left) {
if (inbuf_left_start - *inbuf_left > 0) {
break;
}
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
for (; *result; result++) {
**outbuf = *result;
(*outbuf)++, (*outbuf_left)--;
}
if (show_delimiter) {
**outbuf = DELIMITER;
(*outbuf)++, (*outbuf_left)--;
}
}
*inbuf += match_len;
*inbuf_left -= match_len;
}
} else if (converter->conversion_mode == OPENCC_CONVERSION_SEGMENT_ONLY) {
if (match_rs == NULL) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
} else {
/* Remaining output buffer space is smaller than the segment */
if (match_len + 1 > *outbuf_left) {
if (inbuf_left_start - *inbuf_left > 0) {
break;
}
errnum = CONVERTER_ERROR_OUTBUF;
return (size_t)-1;
}
size_t i;
for (i = 0; i < match_len; i++) {
**outbuf = **inbuf;
(*outbuf)++, (*outbuf_left)--;
(*inbuf)++, (*inbuf_left)--;
}
}
**outbuf = DELIMITER;
(*outbuf)++, (*outbuf_left)--;
} else {
debug_should_not_be_here();
}
}
if (converter->conversion_mode == OPENCC_CONVERSION_SEGMENT_ONLY) {
(*outbuf)--;
(*outbuf_left)++;
}
return inbuf_left_start - *inbuf_left;
}
#endif /* if SEGMENT_METHOD == SEGMENT_MAXIMUM_LENGTH */
size_t converter_convert(Converter* converter,
ucs4_t** inbuf,
size_t* inbuf_left,
ucs4_t** outbuf,
size_t* outbuf_left) {
if (converter->dict_chain == NULL) {
errnum = CONVERTER_ERROR_NODICT;
return (size_t)-1;
}
if (converter->dict_chain->count == 1) {
/* Only one dictionary in the chain: output directly */
return segment(converter,
inbuf,
inbuf_left,
outbuf,
outbuf_left);
}
// Run the dictionary conversion chain
size_t inbuf_size = *inbuf_left;
size_t outbuf_size = *outbuf_left;
size_t retval = (size_t)-1;
size_t cinbuf_left, coutbuf_left;
size_t coutbuf_delta = 0;
size_t i, cur;
ucs4_t* tmpbuf = (ucs4_t*)malloc(sizeof(ucs4_t) * outbuf_size);
ucs4_t* orig_outbuf = *outbuf;
ucs4_t* cinbuf, * coutbuf;
cinbuf_left = inbuf_size;
coutbuf_left = outbuf_size;
cinbuf = *inbuf;
coutbuf = tmpbuf;
for (i = cur = 0; i < converter->dict_chain->count; ++i, cur = 1 - cur) {
if (i > 0) {
cinbuf_left = coutbuf_delta;
coutbuf_left = outbuf_size;
if (cur == 1) {
cinbuf = tmpbuf;
coutbuf = orig_outbuf;
} else {
cinbuf = orig_outbuf;
coutbuf = tmpbuf;
}
}
converter->current_dict_group = dict_chain_get_group(
converter->dict_chain,
i);
size_t ret = segment(converter,
&cinbuf,
&cinbuf_left,
&coutbuf,
&coutbuf_left);
if (ret == (size_t)-1) {
free(tmpbuf);
return (size_t)-1;
}
coutbuf_delta = outbuf_size - coutbuf_left;
if (i == 0) {
retval = ret;
*inbuf = cinbuf;
*inbuf_left = cinbuf_left;
}
}
if (cur == 1) {
// The result is in the temporary buffer
memcpy(*outbuf, tmpbuf, coutbuf_delta * sizeof(ucs4_t));
}
*outbuf += coutbuf_delta;
*outbuf_left = coutbuf_left;
free(tmpbuf);
return retval;
}
void converter_assign_dictionary(Converter* converter, DictChain* dict_chain) {
converter->dict_chain = dict_chain;
if (converter->dict_chain->count > 0) {
converter->current_dict_group = dict_chain_get_group(
converter->dict_chain,
0);
}
}
Converter* converter_open(void) {
Converter* converter = (Converter*)malloc(sizeof(Converter));
converter->dict_chain = NULL;
converter->current_dict_group = NULL;
#if SEGMENT_METHOD == SEGMENT_SHORTEST_PATH
converter->data = (SpsegData*)malloc(sizeof(SpsegData));
SpsegData* spseg_buffer = converter->data;
spseg_buffer->initialized = 0;
spseg_buffer->match_length = NULL;
spseg_buffer->min_len = NULL;
spseg_buffer->parent = NULL;
spseg_buffer->path = NULL;
sp_seg_set_buffer_size(spseg_buffer, OPENCC_SP_SEG_DEFAULT_BUFFER_SIZE);
#endif /* if SEGMENT_METHOD == SEGMENT_SHORTEST_PATH */
return converter;
}
void converter_close(Converter* converter) {
#if SEGMENT_METHOD == SEGMENT_SHORTEST_PATH
sp_seg_buffer_free(converter->data);
free((SpsegData *)converter->data);
#endif /* if SEGMENT_METHOD == SEGMENT_SHORTEST_PATH */
free(converter);
}
void converter_set_conversion_mode(Converter* converter,
opencc_conversion_mode conversion_mode) {
converter->conversion_mode = conversion_mode;
}
converter_error converter_errno(void) {
return errnum;
}
void converter_perror(const char* spec) {
perr(spec);
perr("\n");
switch (errnum) {
case CONVERTER_ERROR_VOID:
break;
case CONVERTER_ERROR_NODICT:
perr(_("No dictionary loaded"));
break;
case CONVERTER_ERROR_OUTBUF:
perr(_("Output buffer not enough for one segment"));
break;
default:
perr(_("Unknown"));
}
}
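/*
 * Illustrative sketch, not part of OpenCC: sp_seg picks the segmentation
 * with the fewest segments by dynamic programming over positions, where
 * min_len[i] is the fewest segments covering the first i units and
 * parent[i] is where the last segment starts. The standalone example below
 * applies the same recurrence to ASCII text with a small word list. The
 * name shortest_segmentation, the MAX_TEXT limit and the strncmp-based
 * matching are assumptions made for the example; ties are resolved in
 * favour of multi-character words, as in the "<=" comparison above.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TEXT 64

/* Segment text into the fewest pieces, where a piece is either a single
 * character or one of words[]. Writes the end offset of each piece to path
 * (room for strlen(text) entries) and returns the number of pieces. */
static size_t shortest_segmentation(const char* text,
                                    const char* const* words,
                                    size_t word_count,
                                    size_t* path) {
  size_t min_len[MAX_TEXT + 1], parent[MAX_TEXT + 1];
  size_t length = strlen(text);
  size_t i, j, w;
  assert(length <= MAX_TEXT);
  for (i = 0; i <= length; i++) {
    min_len[i] = (size_t)-1; /* "infinity" */
  }
  min_len[0] = parent[0] = 0;
  for (i = 0; i < length; i++) {
    /* A single character is always a valid segment, so every position is
     * reachable and min_len[i] is finite here. */
    if (min_len[i] + 1 < min_len[i + 1]) {
      min_len[i + 1] = min_len[i] + 1;
      parent[i + 1] = i;
    }
    for (w = 0; w < word_count; w++) {
      size_t k = strlen(words[w]);
      if (k > 1 && i + k <= length &&
          strncmp(text + i, words[w], k) == 0 &&
          min_len[i] + 1 <= min_len[i + k]) {
        min_len[i + k] = min_len[i] + 1;
        parent[i + k] = i;
      }
    }
  }
  /* Walk the parent links backwards to recover the segment boundaries. */
  j = min_len[length];
  for (i = length; i != 0; i = parent[i]) {
    path[--j] = i;
  }
  return min_len[length];
}
```

/*
 * With the words "ab", "bcd" and "cd", the text "abcd" is split into two
 * segments ending at offsets 2 and 4, that is "ab" followed by "cd".
 */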
opencc-0.4.3/src/utils.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCC_UTILS_H_
#define __OPENCC_UTILS_H_
#include "common.h"
#define debug_should_not_be_here() \
do { \
fprintf(stderr, "Should not be here %s: %d\n", __FILE__, __LINE__); \
assert(0); \
} while (0)
void perr(const char* str);
int qsort_int_cmp(const void* a, const void* b);
char* mstrcpy(const char* str);
char* mstrncpy(const char* str, size_t n);
void skip_utf8_bom(FILE* fp);
const char* executable_path(void);
char* try_open_file(const char* path);
char* get_file_path(const char* filename);
int is_absolute_path(const char* path);
#endif /* __OPENCC_UTILS_H_ */
opencc-0.4.3/src/dict_chain.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __DICTIONARY_SET_H_
#define __DICTIONARY_SET_H_
#include "common.h"
DictChain* dict_chain_new(Config* config);
void dict_chain_delete(DictChain* dict_chain);
DictGroup* dict_chain_add_group(DictChain* dict_chain);
DictGroup* dict_chain_get_group(DictChain* dict_chain, size_t index);
#endif /* __DICTIONARY_SET_H_ */
opencc-0.4.3/src/dict.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "dict.h"
#include "dictionary/datrie.h"
#include "dictionary/text.h"
Dict* dict_new(const char* filename, opencc_dictionary_type type) {
Dict* dictionary = (Dict*)malloc(sizeof(Dict));
dictionary->type = type;
switch (type) {
case OPENCC_DICTIONARY_TYPE_TEXT:
dictionary->dict = dict_text_new(filename);
break;
case OPENCC_DICTIONARY_TYPE_DATRIE:
dictionary->dict = dict_datrie_new(filename);
break;
default:
free(dictionary);
dictionary = (Dict*)-1; /* TODO: unsupported dictionary format */
}
return dictionary;
}
void dict_delete(Dict* dict) {
switch (dict->type) {
case OPENCC_DICTIONARY_TYPE_TEXT:
dict_text_delete(dict->dict);
break;
case OPENCC_DICTIONARY_TYPE_DATRIE:
dict_datrie_delete(dict->dict);
break;
default:
debug_should_not_be_here();
}
free(dict);
}
const ucs4_t* const* dict_match_longest(Dict* dict,
const ucs4_t* word,
size_t maxlen,
size_t* match_length) {
switch (dict->type) {
case OPENCC_DICTIONARY_TYPE_TEXT:
return dict_text_match_longest(dict->dict,
word,
maxlen,
match_length);
break;
case OPENCC_DICTIONARY_TYPE_DATRIE:
return dict_datrie_match_longest(dict->dict,
word,
maxlen,
match_length);
break;
default:
debug_should_not_be_here();
}
return (const ucs4_t* const*)-1;
}
size_t dict_get_all_match_lengths(Dict* dict,
const ucs4_t* word,
size_t* match_length) {
switch (dict->type) {
case OPENCC_DICTIONARY_TYPE_TEXT:
return dict_text_get_all_match_lengths(dict->dict,
word,
match_length);
break;
case OPENCC_DICTIONARY_TYPE_DATRIE:
return dict_datrie_get_all_match_lengths(dict->dict,
word,
match_length);
break;
default:
debug_should_not_be_here();
}
return (size_t)-1;
}
opencc-0.4.3/src/common.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __COMMON_H_
#define __COMMON_H_
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "opencc_types.h"
#define INFINITY_INT ((~0U) >> 1)
#ifdef ENABLE_GETTEXT
# include <libintl.h>
# include <locale.h>
# define _(STRING) dgettext(PACKAGE_NAME, STRING)
#else // ENABLE_GETTEXT
# define _(STRING) STRING
#endif // ENABLE_GETTEXT
#ifndef PKGDATADIR
#define PKGDATADIR ""
#endif
struct SConfig;
struct SConverter;
struct SDict;
struct SDictGroup;
struct SDictChain;
struct SDictMeta;
typedef struct SConfig Config;
typedef struct SConverter Converter;
typedef struct SDict Dict;
typedef struct SDictGroup DictGroup;
typedef struct SDictChain DictChain;
typedef struct SDictMeta DictMeta;
struct SDict {
opencc_dictionary_type type;
Dict* dict;
};
#define DICTIONARY_MAX_COUNT 128
struct SDictGroup {
DictChain* dict_chain;
size_t count;
Dict* dicts[DICTIONARY_MAX_COUNT];
};
#define DICTIONARY_GROUP_MAX_COUNT 128
struct SDictChain {
Config* config;
size_t count;
DictGroup* groups[DICTIONARY_GROUP_MAX_COUNT];
};
struct SDictMeta {
opencc_dictionary_type dict_type;
char* file_name;
size_t index;
size_t stamp;
};
struct SConfig {
char* title;
char* description;
DictChain* dict_chain;
char* file_path;
DictMeta dicts[DICTIONARY_MAX_COUNT];
size_t dicts_count;
size_t stamp;
};
struct SConverter {
opencc_conversion_mode conversion_mode;
DictChain* dict_chain;
DictGroup* current_dict_group;
void* data;
};
#endif // __COMMON_H_
opencc-0.4.3/src/opencc_types.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCC_TYPES_H_
#define __OPENCC_TYPES_H_
#ifdef __cplusplus
extern "C" {
#endif // ifdef __cplusplus
#include <stddef.h>
#include <stdint.h>
typedef void* opencc_t;
typedef uint32_t ucs4_t;
enum _opencc_error {
OPENCC_ERROR_VOID,
OPENCC_ERROR_DICTLOAD,
OPENCC_ERROR_CONFIG,
OPENCC_ERROR_ENCODING,
OPENCC_ERROR_ENCODIND = OPENCC_ERROR_ENCODING,
OPENCC_ERROR_CONVERTER
};
typedef enum _opencc_error opencc_error;
enum _opencc_dictionary_type {
OPENCC_DICTIONARY_TYPE_TEXT,
OPENCC_DICTIONARY_TYPE_DATRIE
};
typedef enum _opencc_dictionary_type opencc_dictionary_type;
enum _opencc_conversion_mode {
OPENCC_CONVERSION_FAST = 0,
OPENCC_CONVERSION_SEGMENT_ONLY = 1,
OPENCC_CONVERSION_LIST_CANDIDATES = 2
};
typedef enum _opencc_conversion_mode opencc_conversion_mode;
#ifdef __cplusplus
}
#endif // ifdef __cplusplus
#endif /* __OPENCC_TYPES_H_ */
opencc-0.4.3/src/config_reader.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCC_CONFIG_H_
#define __OPENCC_CONFIG_H_
#include "common.h"
#include "dict_chain.h"
typedef enum {
CONFIG_ERROR_VOID,
CONFIG_ERROR_CANNOT_ACCESS_CONFIG_FILE,
CONFIG_ERROR_PARSE,
CONFIG_ERROR_NO_PROPERTY,
CONFIG_ERROR_INVALID_DICT_TYPE,
} config_error;
Config* config_open(const char* filename);
void config_close(Config* config);
DictChain* config_get_dict_chain(Config* config);
config_error config_errno(void);
void config_perror(const char* spec);
#endif /* __OPENCC_CONFIG_H_ */
opencc-0.4.3/src/dict_chain.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "dict_group.h"
#include "dict_chain.h"
DictChain* dict_chain_new(Config* config) {
DictChain* dict_chain = (DictChain*)malloc(sizeof(DictChain));
dict_chain->count = 0;
dict_chain->config = config;
return dict_chain;
}
void dict_chain_delete(DictChain* dict_chain) {
size_t i;
for (i = 0; i < dict_chain->count; i++) {
dict_group_delete(dict_chain->groups[i]);
}
free(dict_chain);
}
DictGroup* dict_chain_add_group(DictChain* dict_chain) {
if (dict_chain->count + 1 == DICTIONARY_GROUP_MAX_COUNT) {
return (DictGroup*)-1;
}
DictGroup* group = dict_group_new(dict_chain);
dict_chain->groups[dict_chain->count++] = group;
return group;
}
DictGroup* dict_chain_get_group(DictChain* dict_chain, size_t index) {
if (index >= dict_chain->count) {
return (DictGroup*)-1;
}
return dict_chain->groups[index];
}
opencc-0.4.3/src/wrapper/cplusplus/openccxx.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCCXX_H_
#define __OPENCCXX_H_
/**
* c++ wrapper for opencc
*/
#ifdef __cplusplus
extern "C" {
# include <opencc.h>
}
# include <cstdlib>
# include <string>
namespace opencc {
class opencc {
public:
opencc(const char* config_file = NULL)
: od((opencc_t)-1) {
open(config_file);
}
virtual ~opencc() {
if (od != (opencc_t)-1) {
opencc_close(od);
}
}
operator bool() const {
return od != (opencc_t)-1;
}
int open(const char* config_file) {
if (od != (opencc_t)-1) {
opencc_close(od);
}
od = opencc_open(config_file);
return (od == (opencc_t)-1) ? (-1) : (0);
}
int set_conversion_mode(opencc_conversion_mode conversion_mode) {
if (od == (opencc_t)-1) {
return -1;
}
opencc_set_conversion_mode(od, conversion_mode);
return 0;
}
long convert(const std::string& in, std::string& out, long length = -1) {
if (od == (opencc_t)-1) {
return -1;
}
if (length == -1) {
length = in.length();
}
char* outbuf = opencc_convert_utf8(od, in.c_str(), length);
if (outbuf == (char*)-1) {
return -1;
}
out = outbuf;
free(outbuf);
return length;
}
/**
* Warning:
* This method can be used only if wchar_t is encoded in UCS-4 on your
* platform.
*/
long convert(const std::wstring& in, std::wstring& out, long length = -1) {
if (od == (opencc_t)-1) {
return -1;
}
size_t inbuf_left = in.length();
if ((length >= 0) && (length < (long)inbuf_left)) {
inbuf_left = length;
}
const ucs4_t* inbuf = (const ucs4_t*)in.c_str();
long count = 0;
while (inbuf_left != 0) {
size_t retval;
size_t outbuf_left;
ucs4_t* outbuf;
/* occupy space */
outbuf_left = inbuf_left + 64;
out.resize(count + outbuf_left);
outbuf = (ucs4_t*)out.c_str() + count;
retval = opencc_convert(od, (ucs4_t**)&inbuf,
&inbuf_left, &outbuf, &outbuf_left);
if (retval == (size_t)-1) {
return -1;
}
count += retval;
}
/* set the zero termination and shrink the size */
out.resize(count + 1);
out[count] = L'\0';
return count;
}
opencc_error errno() const {
return opencc_errno();
}
void perror(const char* spec = "OpenCC") const {
opencc_perror(spec);
}
private:
opencc_t od;
};
}
#endif // ifdef __cplusplus
#endif /* __OPENCCXX_H_ */
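The wrapper above treats `(opencc_t)-1` as the "no instance" value and tests for it before every call. A small sketch of what that check amounts to, assuming a mainstream platform where converting -1 to a pointer yields the all-ones bit pattern (the C standard leaves this conversion implementation-defined); `SKETCH_INVALID_HANDLE` and `sketch_is_invalid_handle` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical spelling of the "-1 pointer" sentinel used by the API. */
#define SKETCH_INVALID_HANDLE ((void*)(intptr_t)-1)

/* Nonzero when p carries the all-ones sentinel; NULL is NOT the sentinel. */
int sketch_is_invalid_handle(const void* p) {
  return (uintptr_t)p == UINTPTR_MAX;
}
```

Seen as an unsigned integer the sentinel is `UINTPTR_MAX`, not -1, which matters for bindings that receive the handle as an unsigned pointer-sized value.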
opencc-0.4.3/src/wrapper/python/opencc.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from ctypes import cast, cdll, c_char_p, c_int, c_size_t, c_void_p
from ctypes.util import find_library
import sys
class ConvertError(Exception):
    pass

class DictType:
    TEXT, DATRIE = 0, 1

## @defgroup python_api Python API
# API in python language

## OpenCC Python language binding
# @ingroup python_api
class OpenCC:
    ## Constructor
    # @param self The object pointer.
    # @param config Filename of config.
    # @param verbose Specifies whether error information is printed.
    # @ingroup python_api
    def __init__(self, config=None, verbose=True):
        self.libopencc = cdll.LoadLibrary(find_library('opencc'))
        self.libopencc.opencc_open.restype = c_void_p
        self.libopencc.opencc_convert_utf8.argtypes = [c_void_p, c_char_p, c_size_t]
        # restype is c_void_p rather than c_char_p so that the '-1' pointer
        # returned when opencc_convert_utf8() fails can be checked: c_char_p
        # always tries to convert the returned (char *) to a Python string.
        self.libopencc.opencc_convert_utf8.restype = c_void_p
        self.libopencc.opencc_close.argtypes = [c_void_p]
        self.libopencc.opencc_perror.argtypes = [c_char_p]
        self.libopencc.opencc_dict_load.argtypes = [c_void_p, c_char_p, c_int]
        self.libc = cdll.LoadLibrary(find_library('c'))
        self.libc.free.argtypes = [c_void_p]
        self.config = config
        self.verbose = verbose
        self.od = None

    ## @deprecated
    def __enter__(self):
        if self.config is None:
            self.od = self.libopencc.opencc_open(0)
        else:
            self.od = self.libopencc.opencc_open(c_char_p(self.config))
        return self

    ## @deprecated
    def __exit__(self, type, value, traceback):
        self.libopencc.opencc_close(self.od)
        self.od = None

    def __perror(self, message):
        if self.verbose:
            self.libopencc.opencc_perror(message)

    ## Converts text.
    # @param self The object pointer.
    # @param text Input text.
    # @return Converted text.
    # @ingroup python_api
    def convert(self, text):
        retv_c = self.libopencc.opencc_convert_utf8(self.od, text, len(text))
        # With restype c_void_p the '-1' pointer comes back as an unsigned
        # integer, so compare against c_void_p(-1).value rather than -1.
        if retv_c == c_void_p(-1).value:
            self.__perror('OpenCC error:')
            raise ConvertError()
        retv_c = cast(retv_c, c_char_p)
        str_buffer = retv_c.value
        self.libc.free(retv_c)
        return str_buffer

    ## @deprecated
    def dict_load(self, filename, dicttype):
        retv = self.libopencc.opencc_dict_load(self.od, filename, dicttype)
        if retv == -1:
            self.__perror('OpenCC error:')
        return retv

if __name__ == "__main__":
    with sys.stdin as fp:
        text = fp.read()
    with OpenCC() as converter:
        for path in ['simp_to_trad_characters.ocd',
                     'simp_to_trad_phrases.ocd']:
            converter.dict_load(path, DictType.DATRIE)
        print converter.convert(text)
opencc-0.4.3/src/converter.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __CONVERTER_H_
#define __CONVERTER_H_
#include "common.h"
#include "dict_chain.h"
typedef enum {
CONVERTER_ERROR_VOID,
CONVERTER_ERROR_NODICT,
CONVERTER_ERROR_OUTBUF,
} converter_error;
void converter_assign_dictionary(Converter* converter, DictChain* dict_chain);
Converter* converter_open(void);
void converter_close(Converter* converter);
size_t converter_convert(Converter* converter,
ucs4_t** inbuf,
size_t* inbuf_left,
ucs4_t** outbuf,
size_t* outbuf_left);
void converter_set_conversion_mode(Converter* converter,
opencc_conversion_mode conversion_mode);
converter_error converter_errno(void);
void converter_perror(const char* spec);
#endif /* __CONVERTER_H_ */
opencc-0.4.3/src/encoding.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "encoding.h"
#include "opencc.h"
#define INITIAL_BUFF_SIZE 1024
#define GET_BIT(byte, pos) (((byte) >> (pos))& 1)
#define BITMASK(length) ((1 << (length)) - 1)
ucs4_t* utf8_to_ucs4(const char* utf8, size_t length) {
if (length == 0) {
length = (size_t)-1;
}
size_t i;
for (i = 0; i < length && utf8[i] != '\0'; i++) {}
length = i;
size_t freesize = INITIAL_BUFF_SIZE;
ucs4_t* ucs4 = (ucs4_t*)malloc(sizeof(ucs4_t) * freesize);
ucs4_t* pucs4 = ucs4;
for (i = 0; i < length; i++) {
ucs4_t byte[4] = { 0 };
if (GET_BIT(utf8[i], 7) == 0) {
/* U-00000000 - U-0000007F */
/* 0xxxxxxx */
byte[0] = utf8[i] & BITMASK(7);
} else if (GET_BIT(utf8[i], 5) == 0) {
/* U-00000080 - U-000007FF */
/* 110xxxxx 10xxxxxx */
if (i + 1 >= length) {
goto err;
}
byte[0] = (utf8[i + 1] & BITMASK(6)) +
((utf8[i] & BITMASK(2)) << 6);
byte[1] = (utf8[i] >> 2) & BITMASK(3);
i += 1;
} else if (GET_BIT(utf8[i], 4) == 0) {
/* U-00000800 - U-0000FFFF */
/* 1110xxxx 10xxxxxx 10xxxxxx */
if (i + 2 >= length) {
goto err;
}
byte[0] = (utf8[i + 2] & BITMASK(6)) +
((utf8[i + 1] & BITMASK(2)) << 6);
byte[1] = ((utf8[i + 1] >> 2) & BITMASK(4))
+ ((utf8[i] & BITMASK(4)) << 4);
i += 2;
} else if (GET_BIT(utf8[i], 3) == 0) {
/* U-00010000 - U-001FFFFF */
/* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
if (i + 3 >= length) {
goto err;
}
byte[0] = (utf8[i + 3] & BITMASK(6)) +
((utf8[i + 2] & BITMASK(2)) << 6);
byte[1] = ((utf8[i + 2] >> 2) & BITMASK(4)) +
((utf8[i + 1] & BITMASK(4)) << 4);
byte[2] = ((utf8[i + 1] >> 4) & BITMASK(2)) +
((utf8[i] & BITMASK(3)) << 2);
i += 3;
} else if (GET_BIT(utf8[i], 2) == 0) {
/* U-00200000 - U-03FFFFFF */
/* 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx */
if (i + 4 >= length) {
goto err;
}
byte[0] = (utf8[i + 4] & BITMASK(6)) +
((utf8[i + 3] & BITMASK(2)) << 6);
byte[1] = ((utf8[i + 3] >> 2) & BITMASK(4)) +
((utf8[i + 2] & BITMASK(4)) << 4);
byte[2] = ((utf8[i + 2] >> 4) & BITMASK(2)) +
((utf8[i + 1] & BITMASK(6)) << 2);
byte[3] = utf8[i] & BITMASK(2);
i += 4;
} else if (GET_BIT(utf8[i], 1) == 0) {
/* U-04000000 - U-7FFFFFFF */
/* 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx */
if (i + 5 >= length) {
goto err;
}
byte[0] = (utf8[i + 5] & BITMASK(6)) +
((utf8[i + 4] & BITMASK(2)) << 6);
byte[1] = ((utf8[i + 4] >> 2) & BITMASK(4)) +
((utf8[i + 3] & BITMASK(4)) << 4);
byte[2] = ((utf8[i + 3] >> 4) & BITMASK(2)) +
((utf8[i + 2] & BITMASK(6)) << 2);
byte[3] = (utf8[i + 1] & BITMASK(6)) +
((utf8[i] & BITMASK(1)) << 6);
i += 5;
} else {
goto err;
}
if (freesize == 0) {
freesize = pucs4 - ucs4;
ucs4 = (ucs4_t*)realloc(ucs4, sizeof(ucs4_t) * (freesize + freesize));
pucs4 = ucs4 + freesize;
}
*pucs4 = (byte[3] << 24) + (byte[2] << 16) + (byte[1] << 8) + byte[0];
pucs4++;
freesize--;
}
length = (pucs4 - ucs4 + 1);
ucs4 = (ucs4_t*)realloc(ucs4, sizeof(ucs4_t) * length);
ucs4[length - 1] = 0;
return ucs4;
err:
free(ucs4);
return (ucs4_t*)-1;
}
char* ucs4_to_utf8(const ucs4_t* ucs4, size_t length) {
if (length == 0) {
length = (size_t)-1;
}
size_t i;
for (i = 0; i < length && ucs4[i] != 0; i++) {}
length = i;
size_t freesize = INITIAL_BUFF_SIZE;
char* utf8 = (char*)malloc(sizeof(char) * freesize);
char* putf8 = utf8;
for (i = 0; i < length; i++) {
if ((ssize_t)freesize - 6 <= 0) {
freesize = putf8 - utf8;
utf8 = (char*)realloc(utf8, sizeof(char) * (freesize + freesize));
putf8 = utf8 + freesize;
}
ucs4_t c = ucs4[i];
ucs4_t byte[4] = {
(c >> 0) & BITMASK(8), (c >> 8) & BITMASK(8),
(c >> 16) & BITMASK(8), (c >> 24) & BITMASK(8)
};
size_t delta = 0;
if (c <= 0x7F) {
/* U-00000000 - U-0000007F */
/* 0xxxxxxx */
putf8[0] = byte[0] & BITMASK(7);
delta = 1;
} else if (c <= 0x7FF) {
/* U-00000080 - U-000007FF */
/* 110xxxxx 10xxxxxx */
putf8[1] = 0x80 + (byte[0] & BITMASK(6));
putf8[0] = 0xC0 + ((byte[0] >> 6) & BITMASK(2)) +
((byte[1] & BITMASK(3)) << 2);
delta = 2;
} else if (c <= 0xFFFF) {
/* U-00000800 - U-0000FFFF */
/* 1110xxxx 10xxxxxx 10xxxxxx */
putf8[2] = 0x80 + (byte[0] & BITMASK(6));
putf8[1] = 0x80 + ((byte[0] >> 6) & BITMASK(2)) +
((byte[1] & BITMASK(4)) << 2);
putf8[0] = 0xE0 + ((byte[1] >> 4) & BITMASK(4));
delta = 3;
} else if (c <= 0x1FFFFF) {
/* U-00010000 - U-001FFFFF */
/* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
putf8[3] = 0x80 + (byte[0] & BITMASK(6));
putf8[2] = 0x80 + ((byte[0] >> 6) & BITMASK(2)) +
((byte[1] & BITMASK(4)) << 2);
putf8[1] = 0x80 + ((byte[1] >> 4) & BITMASK(4)) +
((byte[2] & BITMASK(2)) << 4);
putf8[0] = 0xF0 + ((byte[2] >> 2) & BITMASK(3));
delta = 4;
} else if (c <= 0x3FFFFFF) {
/* U-00200000 - U-03FFFFFF */
/* 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx */
putf8[4] = 0x80 + (byte[0] & BITMASK(6));
putf8[3] = 0x80 + ((byte[0] >> 6) & BITMASK(2)) +
((byte[1] & BITMASK(4)) << 2);
putf8[2] = 0x80 + ((byte[1] >> 4) & BITMASK(4)) +
((byte[2] & BITMASK(2)) << 4);
putf8[1] = 0x80 + ((byte[2] >> 2) & BITMASK(6));
putf8[0] = 0xF8 + (byte[3] & BITMASK(2));
delta = 5;
} else if (c <= 0x7FFFFFFF) {
/* U-04000000 - U-7FFFFFFF */
/* 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx */
putf8[5] = 0x80 + (byte[0] & BITMASK(6));
putf8[4] = 0x80 + ((byte[0] >> 6) & BITMASK(2)) +
((byte[1] & BITMASK(4)) << 2);
putf8[3] = 0x80 + ((byte[1] >> 4) & BITMASK(4)) +
((byte[2] & BITMASK(2)) << 4);
putf8[2] = 0x80 + ((byte[2] >> 2) & BITMASK(6));
putf8[1] = 0x80 + (byte[3] & BITMASK(6));
putf8[0] = 0xFC + ((byte[3] >> 6) & BITMASK(1));
delta = 6;
} else {
free(utf8);
return (char*)-1;
}
putf8 += delta;
freesize -= delta;
}
length = (putf8 - utf8 + 1);
utf8 = (char*)realloc(utf8, sizeof(char) * length);
utf8[length - 1] = '\0';
return utf8;
}
size_t ucs4len(const ucs4_t* str) {
const register ucs4_t* pstr = str;
while (*pstr) {
++pstr;
}
return pstr - str;
}
int ucs4cmp(const ucs4_t* src, const ucs4_t* dst) {
register int ret = 0;
while (!(ret = *src - *dst) && *dst) {
++src, ++dst;
}
return ret;
}
void ucs4cpy(ucs4_t* dest, const ucs4_t* src) {
while (*src) {
*dest++ = *src++;
}
*dest = 0;
}
void ucs4ncpy(ucs4_t* dest, const ucs4_t* src, size_t len) {
while (*src && len-- > 0) {
*dest++ = *src++;
}
}
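The two converters above split each code point into bytes and reassemble it by hand. The same bit layout can be expressed with shifts on the whole code point. The following is a minimal sketch, not the library's implementation: it covers only the 1- to 4-byte forms (the source also handles the legacy 5- and 6-byte forms), it does not reject overlong encodings or surrogates, and the names `sketch_utf8_encode` and `sketch_utf8_decode` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes cp into out; returns bytes written, or 0 if cp > 0x1FFFFF. */
size_t sketch_utf8_encode(uint32_t cp, unsigned char out[4]) {
  if (cp <= 0x7F) {            /* 0xxxxxxx */
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp <= 0x7FF) {           /* 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp <= 0xFFFF) {          /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x1FFFFF) {        /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}

/* Decodes one sequence from s (len bytes available) into *cp.
 * Returns bytes consumed, or 0 on a truncated or malformed sequence. */
size_t sketch_utf8_decode(const unsigned char* s, size_t len, uint32_t* cp) {
  size_t need, i;
  uint32_t value;
  if (len == 0) {
    return 0;
  }
  if (s[0] < 0x80) {
    *cp = s[0];
    return 1;
  } else if ((s[0] & 0xE0) == 0xC0) {
    need = 2;
    value = s[0] & 0x1F;
  } else if ((s[0] & 0xF0) == 0xE0) {
    need = 3;
    value = s[0] & 0x0F;
  } else if ((s[0] & 0xF8) == 0xF0) {
    need = 4;
    value = s[0] & 0x07;
  } else {
    return 0;
  }
  if (len < need) {
    return 0;
  }
  for (i = 1; i < need; i++) {
    if ((s[i] & 0xC0) != 0x80) {   /* continuation bytes are 10xxxxxx */
      return 0;
    }
    value = (value << 6) | (s[i] & 0x3F);
  }
  *cp = value;
  return need;
}
```

For example, U+4E2D encodes to the three bytes E4 B8 AD, and decoding those bytes gives U+4E2D back.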
opencc-0.4.3/src/dict.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCC_DICTIONARY_ABSTRACT_H_
#define __OPENCC_DICTIONARY_ABSTRACT_H_
#include "common.h"
#include "utils.h"
Dict* dict_new(const char* filename, opencc_dictionary_type type);
void dict_delete(Dict* dict);
const ucs4_t* const* dict_match_longest(Dict* dict,
const ucs4_t* word,
size_t maxlen,
size_t* match_length);
size_t dict_get_all_match_lengths(Dict* dict,
const ucs4_t* word,
size_t* match_length);
#endif /* __OPENCC_DICTIONARY_ABSTRACT_H_ */
opencc-0.4.3/src/utils.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "utils.h"
#include <unistd.h>
#ifdef __APPLE__
#include "TargetConditionals.h"
#ifdef TARGET_OS_MAC
#include <mach-o/dyld.h>
#elif TARGET_OS_IPHONE
#elif TARGET_IPHONE_SIMULATOR
#else /* ifdef TARGET_OS_MAC */
#endif /* ifdef TARGET_OS_MAC */
#elif defined _WIN32 || defined _WIN64
#include "Windows.h"
#endif /* ifdef __APPLE__ */
#if defined _WIN32 || defined _WIN64
#define PATH_SEPARATOR '\\'
#else
#define PATH_SEPARATOR '/'
#endif
#define PATH_BUFFER_SIZE 4096
void perr(const char* str) {
fputs(str, stderr);
}
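int qsort_int_cmp(const void* a, const void* b) {
/* Compare without subtracting, which could overflow for distant values */
int x = *(const int*)a;
int y = *(const int*)b;
return (x > y) - (x < y);
}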
char* mstrcpy(const char* str) {
char* strbuf = (char*)malloc(sizeof(char) * (strlen(str) + 1));
strcpy(strbuf, str);
return strbuf;
}
char* mstrncpy(const char* str, size_t n) {
char* strbuf = (char*)malloc(sizeof(char) * (n + 1));
strncpy(strbuf, str, n);
strbuf[n] = '\0';
return strbuf;
}
void skip_utf8_bom(FILE* fp) {
int bom[3];
int n;
/* UTF-8 BOM is EF BB BF */
if (fp == NULL) {
return;
}
/* If we are not at beginning of file, return */
if (ftell(fp) != 0) {
return;
}
/* Try to read first 3 bytes */
for (n = 0; n <= 2 && (bom[n] = getc(fp)) != EOF; n++) {}
/* If we can only read <3 bytes, push them back */
/* Or if first 3 bytes is not BOM, push them back */
if ((n < 3) || (bom[0] != 0xEF) || (bom[1] != 0xBB) || (bom[2] != 0xBF)) {
for (n--; n >= 0; n--) {
ungetc(bom[n], fp);
}
}
/* Otherwise, BOM is already skipped */
}
const char* executable_path(void) {
static char path_buffer[PATH_BUFFER_SIZE];
static int calculated = 0;
if (!calculated) {
#ifdef __linux
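/* readlink() does not null-terminate, so reserve one byte for '\0' */
ssize_t res = readlink("/proc/self/exe", path_buffer, sizeof(path_buffer) - 1);
assert(res != -1);
path_buffer[res] = '\0';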
#elif __APPLE__
uint32_t size = sizeof(path_buffer);
int res = _NSGetExecutablePath(path_buffer, &size);
assert(res == 0);
#elif _WIN32 || _WIN64
// NOTE: for "C:\\opencc.exe" on Windows, the returned path "C:" is
// incorrect until a '/' is appended to it later in try_open_file()
DWORD res = GetModuleFileNameA(NULL, path_buffer, PATH_BUFFER_SIZE);
assert(res != 0);
#else
/* Other unsupported os */
assert(0);
#endif /* ifdef __linux */
char* last_sep = strrchr(path_buffer, PATH_SEPARATOR);
assert(last_sep != NULL);
*last_sep = '\0';
calculated = 1;
}
return path_buffer;
}
char* try_open_file(const char* path) {
/* Try to find file in current working directory */
FILE* fp = fopen(path, "r");
if (fp) {
fclose(fp);
return mstrcpy(path);
}
/* If path is absolute, return NULL */
if (is_absolute_path(path)) {
return NULL;
}
/* Try to find file in executable directory */
const char* exe_dir = executable_path();
char* filename =
(char*)malloc(sizeof(char) * (strlen(path) + strlen(exe_dir) + 2));
sprintf(filename, "%s/%s", exe_dir, path);
fp = fopen(filename, "r");
if (fp) {
fclose(fp);
return filename;
}
free(filename);
/* Try to use PKGDATADIR */
filename =
(char*)malloc(sizeof(char) * (strlen(path) + strlen(PKGDATADIR) + 2));
sprintf(filename, "%s/%s", PKGDATADIR, path);
fp = fopen(filename, "r");
if (fp) {
fclose(fp);
return filename;
}
free(filename);
return NULL;
}
char* get_file_path(const char* filename) {
const char* last_sep = strrchr(filename, '/');
if (last_sep == NULL) {
last_sep = filename;
}
char* path = mstrncpy(filename, last_sep - filename);
return path;
}
int is_absolute_path(const char* path) {
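/* POSIX-style absolute path */
if (path[0] == '/') {
return 1;
}
/* Windows drive letter such as "C:"; test path[0] first so that an empty
* string is never read past its terminator */
if ((path[0] != '\0') && (path[1] == ':')) {
return 1;
}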
return 0;
}
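`try_open_file` builds each candidate path by allocating `strlen(dir) + strlen(path) + 2` bytes and formatting `"%s/%s"` into them. A minimal sketch of that step as a standalone helper, assuming a hypothetical name `sketch_join_path` and using `snprintf` so the write is bounded by the computed size:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed "dir/name" string, or NULL if allocation fails.
 * The caller releases the result with free(). */
char* sketch_join_path(const char* dir, const char* name) {
  /* +2 covers the '/' separator and the terminating '\0' */
  size_t size = strlen(dir) + strlen(name) + 2;
  char* joined = (char*)malloc(size);
  if (joined == NULL) {
    return NULL;
  }
  snprintf(joined, size, "%s/%s", dir, name);
  return joined;
}
```

With an illustrative directory such as `/usr/share/opencc` and the file name `zhs2zht.ini`, the helper yields `/usr/share/opencc/zhs2zht.ini`; the directory here is only an example, not a path taken from the source.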
opencc-0.4.3/src/opencc.c
/**
* @file
* OpenCC API.
*
* @license
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "common.h"
#include "config_reader.h"
#include "converter.h"
#include "dict_group.h"
#include "dict_chain.h"
#include "encoding.h"
#include "opencc.h"
typedef struct {
DictChain* dict_chain;
Converter* converter;
} OpenccDesc;
static opencc_error errnum = OPENCC_ERROR_VOID;
static int lib_initialized = 0;
static void lib_initialize(void) {
#ifdef ENABLE_GETTEXT
bindtextdomain(PACKAGE_NAME, LOCALEDIR);
#endif /* ifdef ENABLE_GETTEXT */
lib_initialized = 1;
}
size_t opencc_convert(opencc_t t_opencc,
ucs4_t** inbuf,
size_t* inbuf_left,
ucs4_t** outbuf,
size_t* outbuf_left) {
if (!lib_initialized) {
lib_initialize();
}
OpenccDesc* opencc = (OpenccDesc*)t_opencc;
size_t retval = converter_convert(opencc->converter,
inbuf,
inbuf_left,
outbuf,
outbuf_left);
if (retval == (size_t)-1) {
errnum = OPENCC_ERROR_CONVERTER;
}
return retval;
}
char* opencc_convert_utf8(opencc_t t_opencc, const char* inbuf, size_t length) {
if (!lib_initialized) {
lib_initialize();
}
size_t actual_length = strlen(inbuf);
if ((length == (size_t)-1) || (length > actual_length)) {
length = actual_length;
}
ucs4_t* winbuf = utf8_to_ucs4(inbuf, length);
if (winbuf == (ucs4_t*)-1) {
/* Can not convert input UTF8 to UCS4 */
errnum = OPENCC_ERROR_ENCODING;
return (char*)-1;
}
/* Set up UTF8 buffer */
size_t outbuf_len = length;
size_t outsize = outbuf_len;
char* original_outbuf = (char*)malloc(sizeof(char) * (outbuf_len + 1));
char* outbuf = original_outbuf;
original_outbuf[0] = '\0';
/* Set conversion buffer */
size_t wbufsize = length + 64;
ucs4_t* woutbuf = (ucs4_t*)malloc(sizeof(ucs4_t) * (wbufsize + 1));
ucs4_t* pinbuf = winbuf;
ucs4_t* poutbuf = woutbuf;
size_t inbuf_left, outbuf_left;
inbuf_left = ucs4len(winbuf);
outbuf_left = wbufsize;
while (inbuf_left > 0) {
size_t retval = opencc_convert(t_opencc,
&pinbuf,
&inbuf_left,
&poutbuf,
&outbuf_left);
if (retval == (size_t)-1) {
/* Free the start of the allocation; outbuf may already point past it */
free(original_outbuf);
free(winbuf);
free(woutbuf);
return (char*)-1;
}
*poutbuf = L'\0';
char* ubuff = ucs4_to_utf8(woutbuf, (size_t)-1);
if (ubuff == (char*)-1) {
free(original_outbuf);
free(winbuf);
free(woutbuf);
errnum = OPENCC_ERROR_ENCODING;
return (char*)-1;
}
size_t ubuff_len = strlen(ubuff);
/* outsize is the free space left, excluding the terminating '\0' */
while (ubuff_len > outsize) {
size_t outbuf_offset = outbuf - original_outbuf;
outsize += outbuf_len;
outbuf_len += outbuf_len;
original_outbuf =
(char*)realloc(original_outbuf, sizeof(char) * (outbuf_len + 1));
outbuf = original_outbuf + outbuf_offset;
}
strncpy(outbuf, ubuff, ubuff_len);
free(ubuff);
outbuf += ubuff_len;
outsize -= ubuff_len;
*outbuf = '\0';
outbuf_left = wbufsize;
poutbuf = woutbuf;
}
free(winbuf);
free(woutbuf);
original_outbuf = (char*)realloc(original_outbuf,
sizeof(char) * (strlen(original_outbuf) + 1));
return original_outbuf;
}
void opencc_convert_utf8_free(char* buf) {
free(buf);
}
opencc_t opencc_open(const char* config_file) {
if (!lib_initialized) {
lib_initialize();
}
OpenccDesc* opencc;
opencc = (OpenccDesc*)malloc(sizeof(OpenccDesc));
opencc->dict_chain = NULL;
opencc->converter = converter_open();
converter_set_conversion_mode(opencc->converter, OPENCC_CONVERSION_FAST);
if (config_file == NULL) {
/* TODO load default */
assert(0);
} else {
/* Load config */
Config* config = config_open(config_file);
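if (config == (Config*)-1) {
errnum = OPENCC_ERROR_CONFIG;
/* Release the partially built instance before reporting failure */
converter_close(opencc->converter);
free(opencc);
return (opencc_t)-1;
}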
opencc->dict_chain = config_get_dict_chain(config);
converter_assign_dictionary(opencc->converter, opencc->dict_chain);
config_close(config);
}
return (opencc_t)opencc;
}
int opencc_close(opencc_t t_opencc) {
if (!lib_initialized) {
lib_initialize();
}
OpenccDesc* opencc = (OpenccDesc*)t_opencc;
converter_close(opencc->converter);
if (opencc->dict_chain != NULL) {
dict_chain_delete(opencc->dict_chain);
}
free(opencc);
return 0;
}
int opencc_dict_load(opencc_t t_opencc,
const char* dict_filename,
opencc_dictionary_type dict_type) {
if (!lib_initialized) {
lib_initialize();
}
OpenccDesc* opencc = (OpenccDesc*)t_opencc;
/* Named dict_group so the variable does not shadow the DictGroup type */
DictGroup* dict_group;
if (opencc->dict_chain == NULL) {
opencc->dict_chain = dict_chain_new(NULL);
dict_group = dict_chain_add_group(opencc->dict_chain);
} else {
dict_group = dict_chain_get_group(opencc->dict_chain, 0);
}
int retval = dict_group_load(dict_group, dict_filename, dict_type);
if (retval == -1) {
errnum = OPENCC_ERROR_DICTLOAD;
return -1;
}
converter_assign_dictionary(opencc->converter, opencc->dict_chain);
return retval;
}
void opencc_set_conversion_mode(opencc_t t_opencc,
opencc_conversion_mode conversion_mode) {
if (!lib_initialized) {
lib_initialize();
}
OpenccDesc* opencc = (OpenccDesc*)t_opencc;
converter_set_conversion_mode(opencc->converter, conversion_mode);
}
opencc_error opencc_errno(void) {
if (!lib_initialized) {
lib_initialize();
}
return errnum;
}
void opencc_perror(const char* spec) {
if (!lib_initialized) {
lib_initialize();
}
perr(spec);
perr("\n");
switch (errnum) {
case OPENCC_ERROR_VOID:
break;
case OPENCC_ERROR_DICTLOAD:
dictionary_perror(_("Dictionary loading error"));
break;
case OPENCC_ERROR_CONFIG:
config_perror(_("Configuration error"));
break;
case OPENCC_ERROR_CONVERTER:
converter_perror(_("Converter error"));
break;
case OPENCC_ERROR_ENCODING:
perr(_("Encoding error"));
break;
default:
perr(_("Unknown"));
}
perr("\n");
}
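`opencc_convert_utf8` accumulates converted chunks in a heap buffer that doubles when a chunk does not fit. The bookkeeping is easier to follow when length and capacity are kept together. The following is a sketch of that doubling strategy under assumed names (`SketchBuf`, `sketch_buf_init`, `sketch_buf_append`), not the library's code:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
  char* data;      /* NUL-terminated once initialised */
  size_t length;   /* bytes used, excluding the terminator */
  size_t capacity; /* bytes allocated, including the terminator */
} SketchBuf;

/* Allocates an empty buffer; returns 0 on success, -1 on failure. */
int sketch_buf_init(SketchBuf* buf, size_t capacity) {
  if (capacity == 0) {
    capacity = 1; /* always leave room for the terminator */
  }
  buf->data = (char*)malloc(capacity);
  if (buf->data == NULL) {
    return -1;
  }
  buf->data[0] = '\0';
  buf->length = 0;
  buf->capacity = capacity;
  return 0;
}

/* Appends text, doubling the capacity until it fits.
 * Returns 0 on success, -1 if realloc() fails (buffer left unchanged). */
int sketch_buf_append(SketchBuf* buf, const char* text) {
  size_t text_len = strlen(text);
  size_t needed = buf->length + text_len + 1;
  if (needed > buf->capacity) {
    size_t new_capacity = buf->capacity;
    char* grown;
    while (new_capacity < needed) {
      new_capacity *= 2;
    }
    grown = (char*)realloc(buf->data, new_capacity);
    if (grown == NULL) {
      return -1;
    }
    buf->data = grown;
    buf->capacity = new_capacity;
  }
  /* Copy the text together with its terminator */
  memcpy(buf->data + buf->length, text, text_len + 1);
  buf->length += text_len;
  return 0;
}
```

Tracking the used length explicitly removes the need for a separate "space left" counter that has to be adjusted after every copy.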
opencc-0.4.3/src/dict_group.h
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __DICTIONARY_GROUP_H_
#define __DICTIONARY_GROUP_H_
#include "common.h"
#include "dict.h"
typedef enum {
DICTIONARY_ERROR_VOID,
DICTIONARY_ERROR_NODICT,
DICTIONARY_ERROR_CANNOT_ACCESS_DICTFILE,
DICTIONARY_ERROR_INVALID_DICT,
DICTIONARY_ERROR_INVALID_INDEX,
} dictionary_error;
DictGroup* dict_group_new(DictChain* t_DictChain);
void dict_group_delete(DictGroup* dict_group);
int dict_group_load(DictGroup* dict_group,
const char* filename,
opencc_dictionary_type type);
const ucs4_t* const* dict_group_match_longest(
DictGroup* dict_group,
const ucs4_t* word,
size_t maxlen,
size_t* match_length);
size_t dict_group_get_all_match_lengths(DictGroup* dict_group,
const ucs4_t* word,
size_t* match_length);
Dict* dict_group_get_dict(DictGroup* dict_group, size_t index);
dictionary_error dictionary_errno(void);
void dictionary_perror(const char* spec);
#endif /* __DICTIONARY_GROUP_H_ */
opencc-0.4.3/src/opencc.h
/**
* @file
* OpenCC API.
*
* @license
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef __OPENCC_H_
#define __OPENCC_H_
/**
* @defgroup opencc_api OpenCC API
*
* API in C language
*/
#include "opencc_types.h"
#ifdef __cplusplus
extern "C" {
#endif
/**
* Filename of default Simplified to Traditional configuration.
*
* @ingroup opencc_api
*/
#define OPENCC_DEFAULT_CONFIG_SIMP_TO_TRAD "zhs2zht.ini"
/**
* Filename of default Traditional to Simplified configuration.
*
* @ingroup opencc_api
*/
#define OPENCC_DEFAULT_CONFIG_TRAD_TO_SIMP "zht2zhs.ini"
/**
* Makes an instance of opencc.
* Set config_file to NULL if you do not want to load any configuration file.
*
* @param config_file Location of configuration file.
* @return A description pointer of the newly allocated instance of
* opencc. On error the return value will be (opencc_t) -1.
* @ingroup opencc_api
*/
opencc_t opencc_open(const char* config_file);
/**
* Destroys an instance of opencc.
*
* @param od The description pointer.
* @return 0 on success or non-zero number on failure.
*/
int opencc_close(opencc_t od);
/**
* Converts a UCS-4 string from *inbuf to *outbuf.
* Remember to set **outbuf to L'\0' after the call if you want to use the
* output as a C-style string.
*
* @param od The opencc description pointer.
* @param inbuf The pointer to the UCS-4 string.
* @param inbufleft The maximum number of characters in *inbuf to be converted.
* @param outbuf The pointer to the output buffer.
* @param outbufleft The size of output buffer.
*
* @return The number of characters in the input buffer that have been
* converted.
* @ingroup opencc_api
*/
size_t opencc_convert(opencc_t od,
ucs4_t** inbuf,
size_t* inbufleft,
ucs4_t** outbuf,
size_t* outbufleft);
/**
* Converts UTF-8 string from inbuf.
* This function returns an allocated C-Style string via malloc(), which stores
* the converted string.
* You should call opencc_convert_utf8_free() to release allocated memory.
*
* @param od The opencc description pointer.
* @param inbuf The UTF-8 encoded string.
* @param length The maximum length of inbuf to convert. If length is set to -1,
* the whole c-style string in inbuf will be converted.
*
* @return The newly allocated UTF-8 string that stores text converted
* from inbuf.
* @ingroup opencc_api
*/
char* opencc_convert_utf8(opencc_t od, const char* inbuf, size_t length);
/**
* Releases the buffer allocated by opencc_convert_utf8.
*
* @param buf Pointer to the string buffer allocated by opencc_convert_utf8.
*
* @ingroup opencc_api
*/
void opencc_convert_utf8_free(char* buf);
/**
* Loads a dictionary into the default dictionary chain.
*
* @param od The opencc description pointer.
* @param dict_filename The name (or location) of the dictionary file.
* @param dict_type The type of the dictionary.
*
* @return 0 on success or non-zero number on failure.
*
* @ingroup opencc_api
* @deprecated Use of this function is not recommended; it will be removed.
*/
int opencc_dict_load(opencc_t od,
const char* dict_filename,
opencc_dictionary_type dict_type);
/**
* Changes the mode of conversion.
*
* @param od The opencc description pointer.
* @param conversion_mode Conversion mode. Options are
* - OPENCC_CONVERSION_FAST
* - OPENCC_CONVERSION_SEGMENT_ONLY
* - OPENCC_CONVERSION_LIST_CANDIDATES
* @ingroup opencc_api
*/
void opencc_set_conversion_mode(opencc_t od,
opencc_conversion_mode conversion_mode);
/**
* Returns an opencc_error that describes the last error.
*
* @return The error type.
*/
opencc_error opencc_errno(void);
/**
* Prints the error message to stderr.
*
* @param spec Prefix message.
* @ingroup opencc_api
*/
void opencc_perror(const char* spec);
#ifdef __cplusplus
}
#endif
#endif /* __OPENCC_H_ */
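`opencc_convert` takes iconv-style arguments: pointers to an input cursor and an output cursor, each paired with a count of remaining elements. The sketch below illustrates that calling convention with a toy function that copies code points unchanged. It is a stand-in only: `sketch_copy_convert` is a hypothetical name, and the assumption that the cursors advance and the counters shrink in this way is inferred from how the callers in this archive loop until `inbuf_left` reaches zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a converter: copies code points unchanged while both
 * buffers have room. Advances both cursors, decrements both counters and
 * returns the number of code points consumed from the input. */
size_t sketch_copy_convert(const uint32_t** inbuf, size_t* inbuf_left,
                           uint32_t** outbuf, size_t* outbuf_left) {
  size_t converted = 0;
  while (*inbuf_left > 0 && *outbuf_left > 0) {
    *(*outbuf)++ = *(*inbuf)++;
    (*inbuf_left)--;
    (*outbuf_left)--;
    converted++;
  }
  return converted;
}
```

A caller that wants a C-style string keeps one extra slot in the output array and writes the terminator through the advanced output cursor after the call, then resets the output cursor and counter before the next round if input remains.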
opencc-0.4.3/src/tools/opencc_dict.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "../dictionary/datrie.h"
#include "../dictionary/text.h"
#include "../dict_group.h"
#include "../encoding.h"
#include "../utils.h"
#include
#include
#ifndef VERSION
#define VERSION ""
#endif
#define DATRIE_SIZE 1000000
#define DATRIE_WORD_MAX_COUNT 500000
#define DATRIE_WORD_MAX_LENGTH 32
#define BUFFER_SIZE 1024
typedef struct {
uint32_t cursor;
ucs4_t* pointer;
} Value;
typedef struct {
ucs4_t* key;
Value* value;
size_t length;
size_t value_count;
} Entry;
Entry lexicon[DATRIE_WORD_MAX_COUNT];
uint32_t lexicon_count, words_set_count;
int words_set[DATRIE_WORD_MAX_COUNT];
ucs4_t words_set_char[DATRIE_WORD_MAX_COUNT];
DatrieItem dat[DATRIE_SIZE];
uint32_t lexicon_index_length, lexicon_cursor_end;
void match_word(const DatrieItem* dat,
const ucs4_t* word,
int* match_pos,
int* id,
int limit) {
int i, j, p;
for (i = 0, p = 0;
word[p] && (limit == 0 || p < limit) && dat[i].base != DATRIE_UNUSED;
p++) {
int k = encode_char(word[p]);
j = dat[i].base + k;
/* dat has DATRIE_SIZE entries, so index DATRIE_SIZE is already out of range */
if ((j < 0) || (j >= DATRIE_SIZE) || (dat[j].parent != i)) {
break;
}
i = j;
}
if (match_pos) {
*match_pos = p;
}
if (id) {
*id = i;
}
}
int unused(int i) {
if ((i >= 0) && (i < DATRIE_SIZE)) {
return dat[i].parent == DATRIE_UNUSED;
}
return 0;
}
/* Returns 1 if a is a prefix of b (or equal to b), 0 otherwise. */
int is_prefix(const ucs4_t* a, const ucs4_t* b) {
const ucs4_t* p = a, * q = b;
while (*p != 0) {
if (*q == 0) {
return 0;
}
if (*p != *q) {
return 0;
}
p++;
q++;
}
return 1;
}
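/* Returns the index of the first entry in the sorted lexicon whose key
 * starts with str, or -1 if there is none. */
int binary_search(const ucs4_t* str) {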
int a = 0, b = lexicon_count - 1, c;
while (a + 1 < b) {
c = (a + b) / 2;
if (ucs4cmp(str, lexicon[c].key) <= 0) {
b = c;
} else {
a = c + 1;
}
}
if (is_prefix(str,
lexicon[a].key) &&
((a == 0) || !is_prefix(str, lexicon[a - 1].key))) {
return a;
}
if (is_prefix(str, lexicon[b].key) && !is_prefix(str, lexicon[b - 1].key)) {
return b;
}
return -1;
}
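int wcmp(const void* a, const void* b) {
const ucs4_t x = *(const ucs4_t*)a, y = *(const ucs4_t*)b;
/* Three-way comparison: a qsort comparator must return 0 for equal keys. */
return (x > y) - (x < y);
}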
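/* Collects into words_set the lexicon entries that strictly extend the first
 * p characters of word, and into words_set_char the sorted, distinct
 * characters that follow that prefix. */
void get_words_with_prefix(ucs4_t* word, int p) {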
int i;
static ucs4_t buff[DATRIE_WORD_MAX_LENGTH];
static ucs4_t words_set_char_buff[DATRIE_WORD_MAX_COUNT];
for (i = 0; i < p; i++) {
buff[i] = word[i];
}
buff[p] = 0;
words_set_count = 0;
for (i = binary_search(buff);
(uint32_t)i < lexicon_count && is_prefix(buff, lexicon[i].key); i++) {
if (ucs4cmp(buff, lexicon[i].key) == 0) {
continue;
}
words_set_char_buff[words_set_count] = lexicon[i].key[p];
words_set[words_set_count++] = i;
}
words_set_char_buff[words_set_count] = 0;
qsort(words_set_char_buff, words_set_count, sizeof(words_set_char_buff[0]),
wcmp);
ucs4_t* wfp, * wp, last;
for (last = 0, wfp = words_set_char_buff, wp = words_set_char; *wfp; wfp++) {
if (*wfp != last) {
last = *wfp;
*wp = *wfp;
wp++;
}
}
*wp = 0;
}
int words_space_available(int delta) {
ucs4_t* wp;
for (wp = words_set_char; *wp; wp++) {
if (!unused(encode_char(*wp) + delta)) {
return 0;
}
}
return 1;
}
void insert_first_char(int id) {
Entry* word = lexicon + id;
int key = encode_char(word->key[0]);
dat[key].base = DATRIE_UNUSED;
dat[key].parent = 0;
if (word->length == 1) {
dat[key].word = (id);
}
}
void insert_words(int delta, int parent, size_t word_len) {
int i;
for (i = 0; (uint32_t)i < words_set_count; i++) {
int j = words_set[i];
int k = encode_char(lexicon[j].key[word_len]) + delta;
dat[k].parent = parent;
if (lexicon[j].length == word_len + 1) {
dat[k].word = (j);
}
}
}
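/* Inserts lexicon[id] into the double-array trie. Each pass finds the longest
 * prefix already present, then searches for a base offset (delta) at which
 * every next character sharing that prefix lands on an unused slot. */
void insert(int id) {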
static int space_min = 0;
Entry* word = &lexicon[id];
for (;;) {
int p, i;
match_word(dat, word->key, &p, &i, 0);
if ((size_t)p == word->length) {
return;
}
get_words_with_prefix(word->key, p);
int delta;
delta = space_min - words_set_char[0];
for (; delta < DATRIE_SIZE; delta++) {
if (words_space_available(delta)) {
break;
}
}
if (delta == DATRIE_SIZE) {
fprintf(stderr, "DATRIE_SIZE Not Enough!\n");
exit(1);
}
insert_words(delta, i, p);
dat[i].base = delta;
while (!unused(space_min)) {
space_min++;
}
}
}
void make(void) {
size_t i;
for (i = 1; i < DATRIE_SIZE; i++) {
dat[i].parent = dat[i].base = DATRIE_UNUSED;
dat[i].word = -1;
}
dat[0].parent = dat[0].base = 0;
for (i = 0; i < lexicon_count; i++) {
insert_first_char(i);
}
for (i = 0; i < lexicon_count; i++) {
insert(i);
}
}
int cmp(const void* a, const void* b) {
return ucs4cmp(((const TextEntry*)a)->key, ((const TextEntry*)b)->key);
}
void init(const char* filename) {
DictGroup* dict_group = dict_group_new(NULL);
if (dict_group_load(dict_group, filename,
OPENCC_DICTIONARY_TYPE_TEXT) == -1) {
dictionary_perror("Dictionary loading error");
fprintf(stderr, _("\n"));
exit(1);
}
Dict* dict_abs = dict_group_get_dict(dict_group, 0);
if (dict_abs == (Dict*)-1) {
dictionary_perror("Dictionary loading error");
fprintf(stderr, _("\n"));
exit(1);
}
static TextEntry tlexicon[DATRIE_WORD_MAX_COUNT];
/* TODO add datrie support */
Dict* dictionary = dict_abs->dict;
lexicon_count = dict_text_get_lexicon(dictionary, tlexicon);
qsort(tlexicon, lexicon_count, sizeof(tlexicon[0]), cmp);
size_t i;
size_t lexicon_cursor = 0;
for (i = 0; i < lexicon_count; i++) {
lexicon[i].key = tlexicon[i].key;
lexicon[i].length = ucs4len(lexicon[i].key);
size_t j;
for (j = 0; tlexicon[i].value[j] != NULL; j++) {}
lexicon[i].value_count = j;
lexicon_index_length += lexicon[i].value_count + 1;
lexicon[i].value = (Value*)malloc(lexicon[i].value_count * sizeof(Value));
for (j = 0; j < lexicon[i].value_count; j++) {
lexicon[i].value[j].cursor = lexicon_cursor;
lexicon[i].value[j].pointer = tlexicon[i].value[j];
lexicon_cursor += ucs4len(tlexicon[i].value[j]) + 1;
}
}
lexicon_cursor_end = lexicon_cursor;
}
void output(const char* file_name) {
FILE* fp = fopen(file_name, "wb");
if (!fp) {
fprintf(stderr, _("Can not write file: %s\n"), file_name);
exit(1);
}
uint32_t i, item_count;
for (i = DATRIE_SIZE - 1; i > 0; i--) {
if (dat[i].parent != DATRIE_UNUSED) {
break;
}
}
item_count = i + 1;
fwrite("OPENCCDATRIE", sizeof(char), strlen("OPENCCDATRIE"), fp);
/* Length of the lexicon table */
fwrite(&lexicon_cursor_end, sizeof(uint32_t), 1, fp);
for (i = 0; i < lexicon_count; i++) {
size_t j;
for (j = 0; j < lexicon[i].value_count; j++) {
fwrite(lexicon[i].value[j].pointer, sizeof(ucs4_t),
ucs4len(lexicon[i].value[j].pointer) + 1, fp);
}
}
/* Length of the lexicon index table */
fwrite(&lexicon_index_length, sizeof(uint32_t), 1, fp);
for (i = 0; i < lexicon_count; i++) {
size_t j;
for (j = 0; j < lexicon[i].value_count; j++) {
fwrite(&lexicon[i].value[j].cursor, sizeof(uint32_t), 1, fp);
}
uint32_t dem = (uint32_t)-1;
fwrite(&dem, sizeof(uint32_t), 1, fp); /* Delimiter */
}
fwrite(&lexicon_count, sizeof(uint32_t), 1, fp);
fwrite(&item_count, sizeof(uint32_t), 1, fp);
fwrite(dat, sizeof(dat[0]), item_count, fp);
fclose(fp);
}
#ifdef DEBUG_WRITE_TEXT
void write_text_file() {
FILE* fp;
int i;
fp = fopen("datrie.txt", "w");
fprintf(fp, "%d\n", lexicon_count);
for (i = 0; i < lexicon_count; i++) {
char* buff = ucs4_to_utf8(lexicon[i].value, (size_t)-1);
fprintf(fp, "%s\n", buff);
free(buff);
}
for (i = 0; i < DATRIE_SIZE; i++) {
if (dat[i].parent != DATRIE_UNUSED) {
fprintf(fp, "%d %d %d %d\n", i, dat[i].base, dat[i].parent, dat[i].word);
}
}
fclose(fp);
}
#endif /* ifdef DEBUG_WRITE_TEXT */
void show_version() {
printf(_("\nOpen Chinese Convert (OpenCC) Dictionary Tool\nVersion %s\n\n"),
VERSION);
}
void show_usage() {
show_version();
printf(_("Usage:\n"));
printf(_(" opencc_dict -i input_file -o output_file\n\n"));
printf(_(" -i input_file\n"));
printf(_(" Read data from input_file.\n"));
printf(_(" -o output_file\n"));
printf(_(" Write converted data to output_file.\n"));
printf(_("\n"));
printf(_("\n"));
}
int main(int argc, char** argv) {
static int oc;
static char input_file[BUFFER_SIZE], output_file[BUFFER_SIZE];
int input_file_specified = 0, output_file_specified = 0;
#ifdef ENABLE_GETTEXT
setlocale(LC_ALL, "");
bindtextdomain(PACKAGE_NAME, LOCALEDIR);
#endif /* ifdef ENABLE_GETTEXT */
while ((oc = getopt(argc, argv, "vh-:i:o:")) != -1) {
switch (oc) {
case 'v':
show_version();
return 0;
case 'h':
case '?':
show_usage();
return 0;
case '-':
if (strcmp(optarg, "version") == 0) {
show_version();
} else if (strcmp(optarg, "help") == 0) {
show_usage();
} else {
show_usage();
}
return 0;
case 'i':
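/* Bounded copy: optarg may be longer than the fixed-size buffer. */
strncpy(input_file, optarg, BUFFER_SIZE - 1);
input_file[BUFFER_SIZE - 1] = '\0';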
input_file_specified = 1;
break;
case 'o':
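/* Bounded copy: optarg may be longer than the fixed-size buffer. */
strncpy(output_file, optarg, BUFFER_SIZE - 1);
output_file[BUFFER_SIZE - 1] = '\0';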
output_file_specified = 1;
break;
}
}
if (!input_file_specified) {
fprintf(stderr, _("Please specify input file using -i.\n"));
show_usage();
return 1;
}
if (!output_file_specified) {
fprintf(stderr, _("Please specify output file using -o.\n"));
show_usage();
return 1;
}
init(input_file);
make();
output(output_file);
#ifdef DEBUG_WRITE_TEXT
write_text_file();
#endif /* ifdef DEBUG_WRITE_TEXT */
return 0;
}
opencc-0.4.3/src/tools/opencc.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "../opencc.h"
#include "../utils.h"
#include <getopt.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifndef VERSION
#define VERSION ""
#endif
#define BUFFER_SIZE 65536
void convert(const char* input_file,
const char* output_file,
const char* config_file) {
opencc_t od = opencc_open(config_file);
if (od == (opencc_t)-1) {
opencc_perror(_("OpenCC initialization error"));
exit(1);
}
FILE* fp = stdin;
FILE* fpo = stdout;
if (input_file) {
fp = fopen(input_file, "r");
if (!fp) {
fprintf(stderr, _("Can not read file: %s\n"), input_file);
exit(1);
}
skip_utf8_bom(fp);
}
if (output_file) {
fpo = fopen(output_file, "w");
if (!fpo) {
fprintf(stderr, _("Can not write file: %s\n"), output_file);
exit(1);
}
}
size_t size = BUFFER_SIZE;
char* buffer_in = NULL, * buffer_out = NULL;
buffer_in = (char*)malloc(size * sizeof(char));
char* lookahead = (char*)malloc(size * sizeof(char));
size_t lookahead_size = 0;
while (!feof(fp)) {
size_t read;
if (lookahead_size > 0) {
memcpy(buffer_in, lookahead, lookahead_size);
read =
fread(buffer_in + lookahead_size, 1, size - lookahead_size,
fp) + lookahead_size;
lookahead_size = 0;
} else {
read = fread(buffer_in, 1, size, fp);
}
// If we haven't finished reading after filling the entire buffer,
// then it could be that we broke within an UTF-8 character, in
// that case we must backtrack and find the boundary
if (read == size) {
// Find the boundary of last UTF-8 character
int i;
for (i = read - 1; i >= 0; i--) {
char c = buffer_in[i];
if (!(c & 0x80) || ((c & 0xC0) == 0xC0)) {
break;
}
}
assert(i >= 0);
memcpy(lookahead, buffer_in + i, read - i);
lookahead_size = read - i;
buffer_in[i] = '\0';
} else {
buffer_in[read] = '\0';
}
buffer_out = opencc_convert_utf8(od, buffer_in, (size_t)-1);
if (buffer_out != (char*)-1) {
fprintf(fpo, "%s", buffer_out);
opencc_convert_utf8_free(buffer_out);
} else {
opencc_perror(_("OpenCC error"));
break;
}
}
if (lookahead_size > 0) {
assert(lookahead_size < size);
lookahead[lookahead_size] = '\0';
buffer_out = opencc_convert_utf8(od, lookahead, (size_t)-1);
if (buffer_out != (char*)-1) {
fprintf(fpo, "%s", buffer_out);
opencc_convert_utf8_free(buffer_out);
} else {
opencc_perror(_("OpenCC error"));
}
}
opencc_close(od);
free(lookahead);
free(buffer_in);
fclose(fp);
fclose(fpo);
}
void show_version() {
printf(_("\n"));
printf(_("Open Chinese Convert (OpenCC) Command Line Tool\n"));
printf(_("Version %s\n"), VERSION);
printf(_("\n"));
printf(_("Author: %s\n"), "BYVoid ");
printf(_("Bug Report: %s\n"), "http://github.com/BYVoid/OpenCC/issues");
printf(_("\n"));
}
void show_usage() {
show_version();
printf(_("Usage:\n"));
printf(_(" opencc [Options]\n"));
printf(_("\n"));
printf(_("Options:\n"));
printf(_(" -i [file], --input=[file] Read original text from [file].\n"));
printf(_(" -o [file], --output=[file] Write converted text to [file].\n"));
printf(_(
" -c [file], --config=[file] Load configuration of conversion from [file].\n"));
printf(_(" -v, --version Print version and build information.\n"));
printf(_(" -h, --help Print this help.\n"));
printf(_("\n"));
printf(_(
"With no input file, reads standard input and writes converted stream to standard output.\n"));
printf(_(
"Default configuration(%s) will be loaded if not set.\n"),
OPENCC_DEFAULT_CONFIG_SIMP_TO_TRAD);
printf(_("\n"));
}
int main(int argc, char** argv) {
#ifdef ENABLE_GETTEXT
setlocale(LC_ALL, "");
bindtextdomain(PACKAGE_NAME, LOCALEDIR);
#endif /* ifdef ENABLE_GETTEXT */
static struct option longopts[] =
{
{ "version", no_argument, NULL, 'v' },
{ "help", no_argument, NULL, 'h' },
{ "input", required_argument, NULL, 'i' },
{ "output", required_argument, NULL, 'o' },
{ "config", required_argument, NULL, 'c' },
{ 0, 0, 0, 0 },
};
static int oc;
static char* input_file, * output_file, * config_file;
while ((oc = getopt_long(argc, argv, "vh?i:o:c:", longopts, NULL)) != -1) {
switch (oc) {
case 'v':
show_version();
return 0;
case 'h':
case '?':
show_usage();
return 0;
case 'i':
input_file = mstrcpy(optarg);
break;
case 'o':
output_file = mstrcpy(optarg);
break;
case 'c':
config_file = mstrcpy(optarg);
break;
}
}
if (config_file == NULL) {
config_file = mstrcpy(OPENCC_DEFAULT_CONFIG_SIMP_TO_TRAD);
}
convert(input_file, output_file, config_file);
free(input_file);
free(output_file);
free(config_file);
return 0;
}
opencc-0.4.3/src/tools/CMakeLists.txt
set(
LIBOPENCC_DICTIONARY_SOURCES
../dict.c
../dictionary/datrie.c
../dictionary/text.c
../dict.h
../dictionary/datrie.h
../dictionary/text.h
)
set(
OPENCC_DICT_SOURCES
${LIBOPENCC_DICTIONARY_SOURCES}
opencc_dict.c
../dict_group.c
../dict_group.h
../dict_chain.c
../dict_chain.h
../config_reader.c
../config_reader.h
../encoding.c
../encoding.h
../utils.c
../utils.h
)
add_executable(
opencc_dict
${OPENCC_DICT_SOURCES}
)
target_link_libraries(
opencc_dict
${LIBOPENCC_TARGET}
)
install(
TARGETS
opencc_dict
RUNTIME
DESTINATION
${DIR_BIN}
)
set(
OPENCC_SOURCES
opencc.c
../utils.c
../utils.h
)
add_executable(
opencc
${OPENCC_SOURCES}
)
add_dependencies(
opencc
ocds
)
target_link_libraries(
opencc
${LIBOPENCC_TARGET}
)
install(
TARGETS
opencc
RUNTIME
DESTINATION
${DIR_BIN}
)
opencc-0.4.3/src/config_reader.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "config_reader.h"
#include "dict_group.h"
#include "dict_chain.h"
#define LINE_BUFFER_SIZE 8192
#define CONFIG_DICT_TYPE_OCD "OCD"
#define CONFIG_DICT_TYPE_TEXT "TEXT"
static config_error errnum = CONFIG_ERROR_VOID;
static int qsort_dictionary_buffer_cmp(const void* a, const void* b) {
if (((DictMeta*)a)->index < ((DictMeta*)b)->index) {
return -1;
}
if (((DictMeta*)a)->index > ((DictMeta*)b)->index) {
return 1;
}
return ((DictMeta*)a)->stamp < ((DictMeta*)b)->stamp ? -1 : 1;
}
static int load_dictionary(Config* config) {
if (config->dicts_count == 0) {
return 0;
}
// Sort dictionaries
qsort(config->dicts,
config->dicts_count,
sizeof(config->dicts[0]),
qsort_dictionary_buffer_cmp);
DictGroup* group = dict_chain_add_group(config->dict_chain);
size_t last_index = 0;
size_t i;
for (i = 0; i < config->dicts_count; i++) {
if (config->dicts[i].index > last_index) {
last_index = config->dicts[i].index;
group = dict_chain_add_group(config->dict_chain);
}
dict_group_load(group,
config->dicts[i].file_name,
config->dicts[i].dict_type);
}
return 0;
}
static int parse_add_dict(Config* config, size_t index, const char* dictstr) {
const char* pstr = dictstr;
while (*pstr != '\0' && *pstr != ' ') {
pstr++;
}
opencc_dictionary_type dict_type;
if (strncmp(dictstr, CONFIG_DICT_TYPE_OCD,
sizeof(CONFIG_DICT_TYPE_OCD) - 1) == 0) {
dict_type = OPENCC_DICTIONARY_TYPE_DATRIE;
} else if (strncmp(dictstr, CONFIG_DICT_TYPE_TEXT,
sizeof(CONFIG_DICT_TYPE_TEXT) - 1) == 0) {
dict_type = OPENCC_DICTIONARY_TYPE_TEXT;
} else {
errnum = CONFIG_ERROR_INVALID_DICT_TYPE;
return -1;
}
while (*pstr != '\0' && (*pstr == ' ' || *pstr == '\t')) {
pstr++;
}
size_t i = config->dicts_count++;
config->dicts[i].dict_type = dict_type;
config->dicts[i].file_name = mstrcpy(pstr);
config->dicts[i].index = index;
config->dicts[i].stamp = config->stamp++;
return 0;
}
static int parse_property(Config* config, const char* key, const char* value) {
if (strncmp(key, "dict", 4) == 0) {
int index = 0;
sscanf(key + 4, "%d", &index);
return parse_add_dict(config, index, value);
} else if (strcmp(key, "title") == 0) {
free(config->title);
config->title = mstrcpy(value);
return 0;
} else if (strcmp(key, "description") == 0) {
free(config->description);
config->description = mstrcpy(value);
return 0;
}
errnum = CONFIG_ERROR_NO_PROPERTY;
return -1;
}
static int parse_line(const char* line, char** key, char** value) {
const char* line_begin = line;
while (*line != '\0' && (*line != ' ' && *line != '\t' && *line != '=')) {
line++;
}
size_t key_len = line - line_begin;
while (*line != '\0' && *line != '=') {
line++;
}
if (*line == '\0') {
return -1;
}
assert(*line == '=');
*key = mstrncpy(line_begin, key_len);
line++;
while (*line != '\0' && (*line == ' ' || *line == '\t')) {
line++;
}
if (*line == '\0') {
free(*key);
*key = NULL; /* The caller frees key as well; avoid a double free. */
return -1;
}
*value = mstrcpy(line);
return 0;
}
static char* parse_trim(char* str) {
for (; *str != '\0' && (*str == ' ' || *str == '\t'); str++) {}
register char* prs = str;
for (; *prs != '\0' && *prs != '\n' && *prs != '\r'; prs++) {}
for (prs--; prs > str && (*prs == ' ' || *prs == '\t'); prs--) {}
*(++prs) = '\0';
return str;
}
static int parse(Config* config, const char* filename) {
char* path = try_open_file(filename);
if (path == NULL) {
errnum = CONFIG_ERROR_CANNOT_ACCESS_CONFIG_FILE;
return -1;
}
config->file_path = get_file_path(path);
FILE* fp = fopen(path, "r");
assert(fp != NULL);
free(path);
skip_utf8_bom(fp);
static char buff[LINE_BUFFER_SIZE];
while (fgets(buff, LINE_BUFFER_SIZE, fp) != NULL) {
char* trimed_buff = parse_trim(buff);
if ((*trimed_buff == ';') || (*trimed_buff == '#') ||
(*trimed_buff == '\0')) {
/* Comment Line or empty line */
continue;
}
char* key = NULL, * value = NULL;
if (parse_line(trimed_buff, &key, &value) == -1) {
free(key);
free(value);
fclose(fp);
errnum = CONFIG_ERROR_PARSE;
return -1;
}
if (parse_property(config, key, value) == -1) {
free(key);
free(value);
fclose(fp);
return -1;
}
free(key);
free(value);
}
fclose(fp);
return 0;
}
DictChain* config_get_dict_chain(Config* config) {
if (config->dict_chain != NULL) {
dict_chain_delete(config->dict_chain);
}
config->dict_chain = dict_chain_new(config);
load_dictionary(config);
return config->dict_chain;
}
config_error config_errno(void) {
return errnum;
}
void config_perror(const char* spec) {
perr(spec);
perr("\n");
switch (errnum) {
case CONFIG_ERROR_VOID:
break;
case CONFIG_ERROR_CANNOT_ACCESS_CONFIG_FILE:
perror(_("Can not access configuration file"));
break;
case CONFIG_ERROR_PARSE:
perr(_("Configuration file parse error"));
break;
case CONFIG_ERROR_NO_PROPERTY:
perr(_("Invalid property"));
break;
case CONFIG_ERROR_INVALID_DICT_TYPE:
perr(_("Invalid dictionary type"));
break;
default:
perr(_("Unknown"));
}
}
Config* config_open(const char* filename) {
Config* config = (Config*)malloc(sizeof(Config));
config->title = NULL;
config->description = NULL;
config->dicts_count = 0;
config->stamp = 0;
config->dict_chain = NULL;
config->file_path = NULL;
if (parse(config, filename) == -1) {
config_close((Config*)config);
return (Config*)-1;
}
return (Config*)config;
}
void config_close(Config* config) {
size_t i;
for (i = 0; i < config->dicts_count; i++) {
free(config->dicts[i].file_name);
}
free(config->title);
free(config->description);
free(config->file_path);
free(config);
}
opencc-0.4.3/src/CMakeLists.txt
set(
LIBOPENCC_HEADERS
opencc.h
opencc_types.h
wrapper/cplusplus/openccxx.h
)
set(
LIBOPENCC_DICTIONARY_SOURCES
dict.c
dictionary/datrie.c
dictionary/text.c
dict.h
dictionary/datrie.h
dictionary/text.h
)
set(
LIBOPENCC_SOURCES
${LIBOPENCC_DICTIONARY_SOURCES}
config_reader.c
converter.c
dict_group.c
dict_chain.c
encoding.c
utils.c
opencc.c
config_reader.h
converter.h
dict_group.h
dict_chain.h
encoding.h
utils.h
)
set (LIBOPENCC_TARGET libopencc)
set (LIBOPENCC_STATIC_TARGET libopencc_static)
add_definitions(
-DPKGDATADIR="${DIR_SHARE_OPENCC}"
-DLOCALEDIR="${DIR_SHARE_LOCALE}"
-DVERSION="${OPENCC_VERSION}"
-DBYTEORDER=${BYTEORDER}
-DPACKAGE_NAME="${PACKAGE_NAME}"
-Wall
)
add_library(
${LIBOPENCC_TARGET}
SHARED
${LIBOPENCC_SOURCES}
)
add_library(
${LIBOPENCC_STATIC_TARGET}
STATIC
${LIBOPENCC_SOURCES}
)
set_target_properties(
${LIBOPENCC_TARGET}
${LIBOPENCC_STATIC_TARGET}
PROPERTIES
OUTPUT_NAME
opencc
VERSION
1.0.0
SOVERSION
1
)
if (ENABLE_GETTEXT)
add_definitions(
-DENABLE_GETTEXT
)
link_directories(
${GETTEXT_LIBRARIES}
)
include_directories(
${GETTEXT_INCLUDE_DIR}
)
endif (ENABLE_GETTEXT)
if (CMAKE_BUILD_TYPE MATCHES Debug)
add_definitions(
-O0
-g3
)
endif (CMAKE_BUILD_TYPE MATCHES Debug)
if (NOT WIN32)
install(
TARGETS
${LIBOPENCC_TARGET}
LIBRARY DESTINATION
${DIR_LIBRARY}
)
endif (NOT WIN32)
install(
TARGETS
${LIBOPENCC_STATIC_TARGET}
ARCHIVE DESTINATION
${DIR_LIBRARY_STATIC}
)
install(
FILES
${LIBOPENCC_HEADERS}
DESTINATION
${DIR_INCLUDE}/opencc
)
include(symbols.cmake)
add_subdirectory(tools)
opencc-0.4.3/src/dict_group.c
/*
* Open Chinese Convert
*
* Copyright 2010-2013 BYVoid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "config_reader.h"
#include "dict_group.h"
#include "dict_chain.h"
static dictionary_error errnum = DICTIONARY_ERROR_VOID;
DictGroup* dict_group_new(DictChain* dict_chain) {
DictGroup* dict_group =
(DictGroup*)malloc(sizeof(DictGroup));
dict_group->count = 0;
dict_group->dict_chain = dict_chain;
return dict_group;
}
void dict_group_delete(DictGroup* dict_group) {
size_t i;
for (i = 0; i < dict_group->count; i++) {
dict_delete(dict_group->dicts[i]);
}
free(dict_group);
}
static char* try_find_dictionary_with_config(
DictGroup* dict_group,
const char* filename) {
if (is_absolute_path(filename)) {
return NULL;
}
/* Get config path */
if (dict_group->dict_chain == NULL) {
return NULL;
}
Config* config = dict_group->dict_chain->config;
if (config == NULL) {
return NULL;
}
const char* config_path = config->file_path;
if (config_path == NULL) {
return NULL;
}
char* config_path_filename = (char*)malloc(strlen(config_path) + strlen(
filename) + 2);
sprintf(config_path_filename, "%s/%s", config_path, filename);
FILE* fp = fopen(config_path_filename, "r");
if (fp) {
fclose(fp);
return config_path_filename;
}
/* Not found next to the config file: release the candidate path. */
free(config_path_filename);
return NULL;
}
int dict_group_load(DictGroup* dict_group,
const char* filename,
opencc_dictionary_type type) {
Dict* dictionary;
char* path = try_open_file(filename);
if (path == NULL) {
path = try_find_dictionary_with_config(dict_group, filename);
if (path == NULL) {
errnum = DICTIONARY_ERROR_CANNOT_ACCESS_DICTFILE;
return -1;
}
}
dictionary = dict_new(path, type);
free(path);
if (dictionary == (Dict*)-1) {
errnum = DICTIONARY_ERROR_INVALID_DICT;
return -1;
}
dict_group->dicts[dict_group->count++] = dictionary;
return 0;
}
Dict* dict_group_get_dict(DictGroup* dict_group, size_t index) {
if (index >= dict_group->count) {
errnum = DICTIONARY_ERROR_INVALID_INDEX;
return (Dict*)-1;
}
return dict_group->dicts[index];
}
const ucs4_t* const* dict_group_match_longest(
DictGroup* dict_group,
const ucs4_t* word,
size_t maxlen,
size_t* match_length) {
if (dict_group->count == 0) {
errnum = DICTIONARY_ERROR_NODICT;
return (const ucs4_t* const*)-1;
}
const ucs4_t* const* retval = NULL;
size_t t_match_length, max_length = 0;
size_t i;
for (i = 0; i < dict_group->count; i++) {
/* Search each dictionary in turn and keep the longest match length */
const ucs4_t* const* t_retval = dict_match_longest(
dict_group->dicts[i],
word,
maxlen,
&t_match_length);
if (t_retval != NULL) {
if (t_match_length > max_length) {
max_length = t_match_length;
retval = t_retval;
}
}
}
if (match_length != NULL) {
*match_length = max_length;
}
return retval;
}
size_t dict_group_get_all_match_lengths(DictGroup* dict_group,
const ucs4_t* word,
size_t* match_length) {
if (dict_group->count == 0) {
errnum = DICTIONARY_ERROR_NODICT;
return (size_t)-1;
}
size_t rscnt = 0;
size_t i;
for (i = 0; i < dict_group->count; i++) {
size_t retval;
retval = dict_get_all_match_lengths(
dict_group->dicts[i],
word,
match_length + rscnt
);
rscnt += retval;
/* Remove duplicate lengths */
if ((i > 0) && (rscnt > 1)) {
qsort(match_length, rscnt, sizeof(match_length[0]), qsort_int_cmp);
size_t j, k;
for (j = 0, k = 1; k < rscnt; k++) {
if (match_length[k] != match_length[j]) {
match_length[++j] = match_length[k];
}
}
rscnt = j + 1;
}
}
return rscnt;
}
dictionary_error dictionary_errno(void) {
return errnum;
}
void dictionary_perror(const char* spec) {
perr(spec);
perr("\n");
switch (errnum) {
case DICTIONARY_ERROR_VOID:
break;
case DICTIONARY_ERROR_NODICT:
perr(_("No dictionary loaded"));
break;
case DICTIONARY_ERROR_CANNOT_ACCESS_DICTFILE:
perror(_("Can not open dictionary file"));
break;
case DICTIONARY_ERROR_INVALID_DICT:
perr(_("Invalid dictionary file"));
break;
case DICTIONARY_ERROR_INVALID_INDEX:
perr(_("Invalid dictionary index"));
break;
default:
perr(_("Unknown"));
}
}
opencc-0.4.3/binding.gyp
{
"includes": [
"gypi/global.gypi",
"gypi/configs.gypi",
"gypi/dicts.gypi",
],
"targets": [{
"target_name": "binding",
"sources": [
"node/binding.cc",
"src/config_reader.c",
"src/converter.c",
"src/dict_group.c",
"src/dict_chain.c",
"src/encoding.c",
"src/utils.c",
"src/opencc.c",
"src/dict.c",
"src/dictionary/datrie.c",
"src/dictionary/text.c"
],
"dependencies": [
"configs",
"dicts",
]
}]
}
opencc-0.4.3/opencc.pc.in
prefix=@DIR_PREFIX@
exec_prefix=${prefix}
libdir=@DIR_LIBRARY@
includedir=@DIR_INCLUDE@
Name: opencc
Description: Open Chinese Convert
Version: @OPENCC_VERSION@
Requires:
Libs: -L${libdir} -lopencc
Cflags: -I${includedir}/opencc
opencc-0.4.3/README.md
# Open Chinese Convert
## Introduction
Open Chinese Convert (OpenCC, 開放中文轉換) is an open-source project for conversion between Traditional Chinese and Simplified Chinese. It supports character-level conversion, phrase-level conversion, variant conversion, and regional idioms across Mainland China, Taiwan, and Hong Kong.
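### OpenCC Features
* Strictly distinguishes "one Simplified character to many Traditional characters" (一簡對多繁) from "one Simplified character to many variants" (一簡對多異).
* Fully compatible with variant characters, which can be substituted dynamically.
* One-to-many Simplified-to-Traditional entries are rigorously proofread, following the principle "keep distinct whatever can be distinguished" (能分則不合).
* Supports conversion of variant characters and regional idioms among Mainland China, Taiwan, and Hong Kong, such as 「裏」/「裡」 and 「鼠標」/「滑鼠」.
* Uses ambiguity segmentation plus a minimum-word segmentation algorithm to improve conversion quality as far as is technically possible.
* Dictionaries and the library are completely separate, so dictionaries can be freely modified, imported, and extended.
* Supports C, C++, Python, PHP, Java, Ruby, and Node.js.
* Compatible with Windows, Linux, and Mac.
* Already used for Traditional Chinese input mode in ibus-pinyin and fcitx.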
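
The separation between dictionaries and the library shows up in the configuration file: each `dictN` property names a dictionary type (`OCD` or `TEXT`) followed by a file name, as handled in `src/config_reader.c`. The following is a minimal sketch of parsing one such line, loosely modeled on that file. The `DictSpec` type, the `DictKind` enum, the `parse_dict_line` function, and the file names are hypothetical and are not part of OpenCC's API; the real reader also accepts `title` and `description` properties and comment lines, which this sketch ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for illustration only. */
typedef enum { DICT_KIND_OCD, DICT_KIND_TEXT } DictKind;

typedef struct {
  int index;      /* N in "dictN": dictionaries with the same N share a group */
  DictKind kind;  /* OCD (compiled datrie) or TEXT (plain text) */
  char file[256]; /* dictionary file name, taken from the rest of the line */
} DictSpec;

/* Parses a line of the form "dictN = TYPE file".
 * Returns 0 on success and -1 if the line is not a valid dict property. */
int parse_dict_line(const char* line, DictSpec* out) {
  char type[8];
  assert(line != NULL && out != NULL);
  /* Whitespace in the format matches any amount of blanks, including none. */
  if (sscanf(line, " dict%d = %7s %255[^\n]",
             &out->index, type, out->file) != 3) {
    return -1;
  }
  if (strcmp(type, "OCD") == 0) {
    out->kind = DICT_KIND_OCD;
  } else if (strcmp(type, "TEXT") == 0) {
    out->kind = DICT_KIND_TEXT;
  } else {
    return -1; /* unknown dictionary type */
  }
  return 0;
}
```

With this sketch, a line such as `dict0 = OCD phrases.ocd` would yield index 0 and the OCD kind, while a line with an unrecognized type is rejected.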
## Links
### Project home page
http://code.google.com/p/opencc/
### Introduction (detailed)
https://code.google.com/p/opencc/wiki/Introduction
### Development Documentation
http://byvoid.github.io/OpenCC/
### Source Code on Github
https://github.com/byvoid/opencc
### OpenCC Online (online conversion)
http://opencc.byvoid.com/
### Table of one-to-many Simplified-to-Traditional character meanings in modern Chinese (現代漢語常用簡繁一對多字義辨析表)
http://ytenx.org/byohlyuk/KienxPyan
### Projects using OpenCC
* [ibus-pinyin](http://code.google.com/p/ibus/)
* [fcitx](http://code.google.com/p/fcitx/)
* [rimeime](http://code.google.com/p/rimeime/)
* [libgooglepinyin](http://code.google.com/p/libgooglepinyin/)
* [ibus-libpinyin](https://github.com/libpinyin/ibus-libpinyin)
* [BYVBlog](https://github.com/byvoid/byvblog)
* [豆瓣同城微信](http://weixinqiao.com/douban-event/)
## Installation
### [Debian](http://packages.qa.debian.org/o/opencc.html)/[Ubuntu](https://launchpad.net/ubuntu/+source/opencc)
    apt-get install opencc
### [Fedora](https://admin.fedoraproject.org/pkgdb/acls/name/opencc)
    yum install opencc
### [Arch](https://www.archlinux.org/packages/community/x86_64/opencc/)
    pacman -S opencc
### [Mac](https://github.com/mxcl/homebrew/blob/master/Library/Formula/opencc.rb)
    brew install opencc
### [Node.js](https://npmjs.org/package/opencc)
    npm install opencc
## Usage
    $ opencc --help
    Open Chinese Convert (OpenCC) Command Line Tool
    Author: BYVoid
    Bug Report: http://github.com/BYVoid/OpenCC/issues
    Usage:
     opencc [Options]
    Options:
     -i [file], --input=[file]   Read original text from [file].
     -o [file], --output=[file]  Write converted text to [file].
     -c [file], --config=[file]  Load configuration of conversion from [file].
     -v, --version               Print version and build information.
     -h, --help                  Print this help.
    With no input file, reads standard input and writes converted stream to standard output.
    Default configuration(zhs2zht.ini) will be loaded if not set.
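
Because the command line tool converts its input as a stream of fixed-size chunks, a chunk boundary can fall in the middle of a multi-byte UTF-8 character. The sketch below illustrates the backtracking idea used for this in `src/tools/opencc.c`: scan backwards to the last byte that starts a character, and carry everything from there over to the next chunk. The function name `utf8_split_point` is hypothetical and does not exist in OpenCC. Like the tool, this sketch always holds back the final character, even when it happens to be complete.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index at which the last (possibly incomplete) UTF-8 character
 * in buf[0..len) starts. Bytes before the index are safe to convert now;
 * bytes from the index onward should be prepended to the next chunk.
 * Returns 0 when the buffer is empty or holds only continuation bytes. */
size_t utf8_split_point(const char* buf, size_t len) {
  assert(buf != NULL);
  size_t i = len;
  while (i > 0) {
    i--;
    unsigned char c = (unsigned char)buf[i];
    /* 10xxxxxx marks a continuation byte; any other byte starts a character
     * (either plain ASCII or the lead byte of a multi-byte sequence). */
    if ((c & 0xC0) != 0x80) {
      return i;
    }
  }
  return 0;
}
```

A caller would convert `buf[0..split)` immediately and copy `buf[split..len)` to the front of the next read buffer, so no character is ever handed to the converter in two pieces.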
## Build
### Build with CMake
Create a build directory and enter it:

    mkdir build
    cd build

Build sources:

    cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release -D ENABLE_GETTEXT:BOOL=ON ..
    make

On Windows, run these commands instead:

    cmake .. -G "MSYS Makefiles" -DCMAKE_INSTALL_PREFIX="" -DCMAKE_BUILD_TYPE=Release -DENABLE_GETTEXT:BOOL=OFF
    make

Install:

    sudo make install

### Build with gyp

    mkdir build
    gyp --depth . -D library=shared_library -f make --generator-output=build opencc.gyp
    make -C build

## Contributors
* [BYVoid](http://www.byvoid.com/)
* 佛振
* Peng Huang
* LI Daobing
opencc-0.4.3/po/zh_HK.po
# Chinese translations for opencc package.
# Copyright (C) 2010 BYVoid
# This file is distributed under the same license as the opencc package.
#
# BYVoid , 2010.
msgid ""
msgstr ""
"Project-Id-Version: opencc 0.1.2\n"
"Report-Msgid-Bugs-To: http://code.google.com/p/open-chinese-convert/issues/"
"entry\n"
"POT-Creation-Date: 2010-09-17 08:39+0800\n"
"PO-Revision-Date: 2010-09-17 08:48+0800\n"
"Last-Translator: \n"
"Language-Team: American English \n"
"Language: zh_HK\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Lokalize 1.0\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
#: src/config_reader.c:275
msgid "Can not access configuration file"
msgstr "無法訪問配置文件"
#: src/config_reader.c:278
msgid "Configuration file parse error"
msgstr "配置文件解析錯誤"
#: src/config_reader.c:281
msgid "Invalid property"
msgstr "無效屬性"
#: src/config_reader.c:284
msgid "Invalid dictionary type"
msgstr "無效的辭典類型"
#: src/config_reader.c:287 src/converter.c:747 src/dict_group.c:218
#: src/opencc.c:271
msgid "Unknown"
msgstr "未知"
#: src/converter.c:741 src/dict_group.c:206
msgid "No dictionary loaded"
msgstr "沒有辭典加載"
#: src/converter.c:744
msgid "Output buffer not enough for one segment"
msgstr "輸出緩衝區不足以存儲一個分詞"
#: src/dict_group.c:209
msgid "Can not open dictionary file"
msgstr "無法打開辭典"
#: src/dict_group.c:212
msgid "Invalid dictionary file"
msgstr "辭典格式無效"
#: src/dict_group.c:215
msgid "Invalid dictionary index"
msgstr "辭典索引無效"
#: src/opencc.c:262
msgid "Dictionary loading error"
msgstr "辭典讀取錯誤"
#: src/opencc.c:265
msgid "Configuration error"
msgstr "配置錯誤"
#: src/opencc.c:268
msgid "Converter error"
msgstr "轉換器錯誤"
#: src/tools/opencc.c:39
msgid "OpenCC initialization error"
msgstr "OpenCC初始化錯誤"
#: src/tools/opencc.c:51
#, c-format
msgid "Can not read file: %s\n"
msgstr "無法讀取文件:%s\n"
#: src/tools/opencc.c:61 src/tools/opencc_dict.c:296
#, c-format
msgid "Can not write file: %s\n"
msgstr "無法寫入文件: %s\n"
#: src/tools/opencc.c:72
msgid "OpenCC error"
msgstr "OpenCC 錯誤"
#: src/tools/opencc.c:86 src/tools/opencc.c:89 src/tools/opencc.c:92
#: src/tools/opencc.c:100 src/tools/opencc.c:107 src/tools/opencc.c:110
#: src/tools/opencc_dict.c:246 src/tools/opencc_dict.c:254
#: src/tools/opencc_dict.c:385 src/tools/opencc_dict.c:386
#, c-format
msgid "\n"
msgstr "\n"
#: src/tools/opencc.c:87
#, c-format
msgid "Open Chinese Convert (OpenCC) Command Line Tool\n"
msgstr "Open Chinese Convert (OpenCC) 命令行工具\n"
#: src/tools/opencc.c:88
#, c-format
msgid "Version %s\n"
msgstr "版本 %s\n"
#: src/tools/opencc.c:90
#, c-format
msgid "Author: %s\n"
msgstr "作者: %s\n"
#: src/tools/opencc.c:91
#, c-format
msgid "Bug Report: %s\n"
msgstr "Bug彙報: %s\n"
#: src/tools/opencc.c:98 src/tools/opencc_dict.c:379
#, c-format
msgid "Usage:\n"
msgstr "使用方法:\n"
#: src/tools/opencc.c:99
#, c-format
msgid " opencc [Options]\n"
msgstr " opencc [參數]\n"
#: src/tools/opencc.c:101
#, c-format
msgid "Options:\n"
msgstr "參數:\n"
#: src/tools/opencc.c:102
#, c-format
msgid " -i [file], --input=[file] Read original text from [file].\n"
msgstr " -i [file], --input=[file] 從 [file] 讀取原始文本。\n"
#: src/tools/opencc.c:103
#, c-format
msgid " -o [file], --output=[file] Write converted text to [file].\n"
msgstr " -o [file], --output=[file] 將轉換後的文本寫入 [file]。\n"
#: src/tools/opencc.c:104
#, c-format
msgid ""
" -c [file], --config=[file] Load configuration of conversion from [file].\n"
msgstr " -c [file], --config=[file] 從 [file] 中讀取配置。\n"
#: src/tools/opencc.c:105
#, c-format
msgid " -v, --version Print version and build information.\n"
msgstr " -v, --version 顯示版本和生成信息。\n"
#: src/tools/opencc.c:106
#, c-format
msgid " -h, --help Print this help.\n"
msgstr " -h, --help 顯示此幫助。\n"
#: src/tools/opencc.c:108
#, c-format
msgid ""
"With no input file, reads standard input and writes converted stream to "
"standard output.\n"
msgstr "如果沒有設置輸入文件,將會從標準輸入中讀取數據,並輸出到標準輸出。\n"
#: src/tools/opencc.c:109
#, c-format
msgid "Default configuration(%s) will be loaded if not set.\n"
msgstr "如果沒有設置config file,則會讀取默認配置文件(%s)。\n"
#: src/tools/opencc.c:144
#, c-format
msgid "Please use %s --help.\n"
msgstr "請使用%s --help以獲得幫助。\n"
#: src/tools/opencc_dict.c:373
#, c-format
msgid ""
"\n"
"Open Chinese Convert (OpenCC) Dictionary Tool\n"
"Version %s\n"
"\n"
msgstr ""
"\n"
"Open Chinese Convert (OpenCC) 辭典工具\n"
"版本 %s\n"
"\n"
#: src/tools/opencc_dict.c:380
#, c-format
msgid ""
" opencc_dict -i input_file -o output_file\n"
"\n"
msgstr ""
" opencc_dict -i input_file -o output_file\n"
"\n"
#: src/tools/opencc_dict.c:381
#, c-format
msgid " -i input_file\n"
msgstr " -i input_file\n"
#: src/tools/opencc_dict.c:382
#, c-format
msgid " Read data from input_file.\n"
msgstr " 從input_file讀取數據。\n"
#: src/tools/opencc_dict.c:383
#, c-format
msgid " -o output_file\n"
msgstr " -o output_file\n"
#: src/tools/opencc_dict.c:384
#, c-format
msgid " Write converted data to output_file.\n"
msgstr " 將生成的辭典寫入output_file。\n"
#: src/tools/opencc_dict.c:432
#, c-format
msgid "Please specify input file using -i.\n"
msgstr "請使用-i指定輸入文件。\n"
#: src/tools/opencc_dict.c:439
#, c-format
msgid "Please specify output file using -o.\n"
msgstr "請使用-o指定輸出文件。\n"
#~ msgid ""
#~ " opencc [-i input_file] [-o output_file] [-c config_file]\n"
#~ "\n"
#~ msgstr ""
#~ " opencc [-i input_file] [-o output_file] [-c config_file]\n"
#~ "\n"
#~ msgid " -c config_file\n"
#~ msgstr " -c config_file\n"
#~ msgid " Load dictionary configuration from config_file.\n"
#~ msgstr " 從config_file讀取配置。\n"
#~ msgid " Note:\n"
#~ msgstr " 註釋:\n"
#~ msgid ""
#~ " Text from standard input will be read if input_file is not set\n"
#~ " and will be written to standard output if output_file is not set.\n"
#~ msgstr ""
#~ " 如果沒有設置input_file,將會從標準輸入讀取文本。如果\n"
#~ " 沒有設置output_file,將會把轉換後文本寫入到標準輸出。\n"
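Entries flagged `#, c-format` are passed to printf-style functions, so a translation needs to carry the same conversions as its msgid. A minimal sketch of such a check in C follows. The names `count_conversions` and `same_conversion_count` are hypothetical helpers, not part of OpenCC or gettext, and the check only counts conversions rather than comparing their types (msgfmt's `--check-format` option does a fuller job).

```c
#include <stddef.h>

/* Count printf-style conversions in a format string.
 * "%%" is a literal percent sign and is not counted. */
size_t count_conversions(const char *format)
{
    size_t count = 0;
    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {
            p++;        /* skip the escaped percent sign */
        } else if (p[1] != '\0') {
            count++;    /* a conversion such as %s or %d starts here */
        }
    }
    return count;
}

/* Return 1 when msgid and msgstr contain the same number of conversions. */
int same_conversion_count(const char *msgid, const char *msgstr)
{
    return count_conversions(msgid) == count_conversions(msgstr);
}
```

A translation that drops or adds a `%s` would make `same_conversion_count` return 0, which is the kind of mismatch that can crash a printf call at run time.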
opencc-0.4.3/po/zh_TW.po
# Chinese translations for opencc package.
# Copyright (C) 2010 BYVoid
# This file is distributed under the same license as the opencc package.
#
# BYVoid , 2010.
msgid ""
msgstr ""
"Project-Id-Version: opencc 0.1.2\n"
"Report-Msgid-Bugs-To: http://code.google.com/p/open-chinese-convert/issues/"
"entry\n"
"POT-Creation-Date: 2010-09-17 08:39+0800\n"
"PO-Revision-Date: 2010-09-17 08:48+0800\n"
"Last-Translator: \n"
"Language-Team: American English \n"
"Language: zh_TW\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Lokalize 1.0\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
#: src/config_reader.c:275
msgid "Can not access configuration file"
msgstr "無法訪問配置文件"
#: src/config_reader.c:278
msgid "Configuration file parse error"
msgstr "配置文件解析錯誤"
#: src/config_reader.c:281
msgid "Invalid property"
msgstr "無效屬性"
#: src/config_reader.c:284
msgid "Invalid dictionary type"
msgstr "無效的辭典類型"
#: src/config_reader.c:287 src/converter.c:747 src/dict_group.c:218
#: src/opencc.c:271
msgid "Unknown"
msgstr "未知"
#: src/converter.c:741 src/dict_group.c:206
msgid "No dictionary loaded"
msgstr "沒有辭典加載"
#: src/converter.c:744
msgid "Output buffer not enough for one segment"
msgstr "輸出緩衝區不足以存儲一個分詞"
#: src/dict_group.c:209
msgid "Can not open dictionary file"
msgstr "無法打開辭典"
#: src/dict_group.c:212
msgid "Invalid dictionary file"
msgstr "辭典格式無效"
#: src/dict_group.c:215
msgid "Invalid dictionary index"
msgstr "辭典索引無效"
#: src/opencc.c:262
msgid "Dictionary loading error"
msgstr "辭典讀取錯誤"
#: src/opencc.c:265
msgid "Configuration error"
msgstr "配置錯誤"
#: src/opencc.c:268
msgid "Converter error"
msgstr "轉換器錯誤"
#: src/tools/opencc.c:39
msgid "OpenCC initialization error"
msgstr "OpenCC初始化錯誤"
#: src/tools/opencc.c:51
#, c-format
msgid "Can not read file: %s\n"
msgstr "無法讀取文件: %s\n"
#: src/tools/opencc.c:61 src/tools/opencc_dict.c:296
#, c-format
msgid "Can not write file: %s\n"
msgstr "無法寫入文件: %s\n"
#: src/tools/opencc.c:72
msgid "OpenCC error"
msgstr "OpenCC 錯誤"
#: src/tools/opencc.c:86 src/tools/opencc.c:89 src/tools/opencc.c:92
#: src/tools/opencc.c:100 src/tools/opencc.c:107 src/tools/opencc.c:110
#: src/tools/opencc_dict.c:246 src/tools/opencc_dict.c:254
#: src/tools/opencc_dict.c:385 src/tools/opencc_dict.c:386
#, c-format
msgid "\n"
msgstr "\n"
#: src/tools/opencc.c:87
#, c-format
msgid "Open Chinese Convert (OpenCC) Command Line Tool\n"
msgstr "Open Chinese Convert (OpenCC) 命令行工具\n"
#: src/tools/opencc.c:88
#, c-format
msgid "Version %s\n"
msgstr "版本 %s\n"
#: src/tools/opencc.c:90
#, c-format
msgid "Author: %s\n"
msgstr "作者: %s\n"
#: src/tools/opencc.c:91
#, c-format
msgid "Bug Report: %s\n"
msgstr "Bug彙報: %s\n"
#: src/tools/opencc.c:98 src/tools/opencc_dict.c:379
#, c-format
msgid "Usage:\n"
msgstr "使用方法:\n"
#: src/tools/opencc.c:99
#, c-format
msgid " opencc [Options]\n"
msgstr " opencc [參數]\n"
#: src/tools/opencc.c:101
#, c-format
msgid "Options:\n"
msgstr "參數:\n"
#: src/tools/opencc.c:102
#, c-format
msgid " -i [file], --input=[file] Read original text from [file].\n"
msgstr " -i [file], --input=[file] 從 [file] 讀取原始文本。\n"
#: src/tools/opencc.c:103
#, c-format
msgid " -o [file], --output=[file] Write converted text to [file].\n"
msgstr " -o [file], --output=[file] 將轉換後的文本寫入 [file]。\n"
#: src/tools/opencc.c:104
#, c-format
msgid ""
" -c [file], --config=[file] Load configuration of conversion from [file].\n"
msgstr " -c [file], --config=[file] 從 [file] 中讀取配置。\n"
#: src/tools/opencc.c:105
#, c-format
msgid " -v, --version Print version and build information.\n"
msgstr " -v, --version 顯示版本和生成信息。\n"
#: src/tools/opencc.c:106
#, c-format
msgid " -h, --help Print this help.\n"
msgstr " -h, --help 顯示此幫助。\n"
#: src/tools/opencc.c:108
#, c-format
msgid ""
"With no input file, reads standard input and writes converted stream to "
"standard output.\n"
msgstr "如果沒有設置輸入文件,將會從標準輸入中讀取數據,並輸出到標準輸出。\n"
#: src/tools/opencc.c:109
#, c-format
msgid "Default configuration(%s) will be loaded if not set.\n"
msgstr "如果沒有設置config file,則會讀取默認配置文件(%s)。\n"
#: src/tools/opencc.c:144
#, c-format
msgid "Please use %s --help.\n"
msgstr "請使用%s --help以獲得幫助。\n"
#: src/tools/opencc_dict.c:373
#, c-format
msgid ""
"\n"
"Open Chinese Convert (OpenCC) Dictionary Tool\n"
"Version %s\n"
"\n"
msgstr ""
"\n"
"Open Chinese Convert (OpenCC) 辭典工具\n"
"版本 %s\n"
"\n"
#: src/tools/opencc_dict.c:380
#, c-format
msgid ""
" opencc_dict -i input_file -o output_file\n"
"\n"
msgstr ""
" opencc_dict -i input_file -o output_file\n"
"\n"
#: src/tools/opencc_dict.c:381
#, c-format
msgid " -i input_file\n"
msgstr " -i input_file\n"
#: src/tools/opencc_dict.c:382
#, c-format
msgid " Read data from input_file.\n"
msgstr " 從input_file讀取數據。\n"
#: src/tools/opencc_dict.c:383
#, c-format
msgid " -o output_file\n"
msgstr " -o output_file\n"
#: src/tools/opencc_dict.c:384
#, c-format
msgid " Write converted data to output_file.\n"
msgstr " 將生成的辭典寫入output_file。\n"
#: src/tools/opencc_dict.c:432
#, c-format
msgid "Please specify input file using -i.\n"
msgstr "請使用-i指定輸入文件。\n"
#: src/tools/opencc_dict.c:439
#, c-format
msgid "Please specify output file using -o.\n"
msgstr "請使用-o指定輸出文件。\n"
#~ msgid ""
#~ " opencc [-i input_file] [-o output_file] [-c config_file]\n"
#~ "\n"
#~ msgstr ""
#~ " opencc [-i input_file] [-o output_file] [-c config_file]\n"
#~ "\n"
#~ msgid " -c config_file\n"
#~ msgstr " -c config_file\n"
#~ msgid " Load dictionary configuration from config_file.\n"
#~ msgstr " 從config_file讀取配置。\n"
#~ msgid " Note:\n"
#~ msgstr " 註釋:\n"
#~ msgid ""
#~ " Text from standard input will be read if input_file is not set\n"
#~ " and will be written to standard output if output_file is not set.\n"
#~ msgstr ""
#~ " 如果沒有設置input_file,將會從標準輸入讀取文本。如果\n"
#~ " 沒有設置output_file,將會把轉換後文本寫入到標準輸出。\n"
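The `Plural-Forms` header in these catalogs is a C expression that gettext evaluates to pick a plural form. As a sketch, the header value `nplurals=2; plural=(n != 1);` corresponds to the function below; `plural_index` is a hypothetical name used only for illustration. None of the entries in these catalogs define `msgid_plural`, so the header has no visible effect here. Chinese catalogs more commonly declare a single form (`nplurals=1; plural=0;`).

```c
/* Index of the plural form selected for a count n under
 * "nplurals=2; plural=(n != 1);":
 * form 0 for exactly one item, form 1 for every other count. */
unsigned long plural_index(unsigned long n)
{
    return (unsigned long)(n != 1);
}
```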
opencc-0.4.3/po/LINGUAS
zh_CN
zh_HK
zh_TW
opencc-0.4.3/po/update.sh
#!/bin/sh
xgettext \
--default-domain="opencc" \
--directory=".." \
--force-po \
--add-comments="TRANSLATORS:" \
--keyword=_ --keyword=N_ \
--files-from="POTFILES.in" \
--copyright-holder="BYVoid " \
--msgid-bugs-address="http://code.google.com/p/open-chinese-convert/issues/entry" \
--from-code=UTF-8 \
--sort-by-file \
--output=opencc.pot
# Use a lowercase loop variable so the LANG locale variable is not overwritten.
for lang in $(cat LINGUAS)
do
printf '%s' "$lang"
msgmerge \
--backup=none \
--update "$lang.po" \
opencc.pot
done
rm opencc.pot
opencc-0.4.3/po/POTFILES.in
src/config_reader.c
src/config_reader.h
src/converter.c
src/converter.h
src/dict_group.c
src/dict_group.h
src/dict_chain.c
src/dict_chain.h
src/encoding.c
src/encoding.h
src/opencc.c
src/opencc.h
src/opencc_types.h
src/utils.c
src/utils.h
src/wrapper/cplusplus/openccxx.h
src/dict.c
src/dict.h
src/dictionary/datrie.c
src/dictionary/datrie.h
src/dictionary/text.c
src/dictionary/text.h
src/tools/opencc.c
src/tools/opencc_dict.c
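The `--keyword=_ --keyword=N_` options in update.sh make xgettext extract strings wrapped in `_()` or `N_()` from the files listed above. The sketch below shows how such macros are conventionally defined in C. It is an assumption about typical gettext usage, not a copy of OpenCC's headers; the `ENABLE_GETTEXT` preprocessor symbol, the `messages` table and the `error_message` helper are all hypothetical.

```c
#include <stddef.h>

#ifdef ENABLE_GETTEXT
#include <libintl.h>
#define _(text) gettext(text)   /* translate at run time */
#else
#define _(text) (text)          /* gettext disabled: return the msgid unchanged */
#endif
#define N_(text) (text)         /* mark for extraction only, no translation */

/* Messages are marked with N_() because a static initializer
 * cannot call gettext(). */
static const char *const messages[] = {
    N_("Configuration error"),
    N_("Converter error"),
    N_("Unknown"),
};

/* Translate the message for a given code at the point of use. */
const char *error_message(size_t code)
{
    size_t count = sizeof(messages) / sizeof(messages[0]);
    if (code >= count)
        code = count - 1;       /* fall back to "Unknown" */
    return _(messages[code]);
}
```

Splitting marking (`N_`) from translating (`_`) lets a table of messages be built at compile time while the lookup still happens under the user's locale.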
opencc-0.4.3/po/zh_CN.po
# Chinese translations for opencc package.
# Copyright (C) 2010 BYVoid
# This file is distributed under the same license as the opencc package.
#
# BYVoid , 2010.
msgid ""
msgstr ""
"Project-Id-Version: opencc 0.1.2\n"
"Report-Msgid-Bugs-To: http://code.google.com/p/open-chinese-convert/issues/"
"entry\n"
"POT-Creation-Date: 2010-09-17 08:39+0800\n"
"PO-Revision-Date: 2010-09-17 08:48+0800\n"
"Last-Translator: \n"
"Language-Team: American English \n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Lokalize 1.0\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
#: src/config_reader.c:275
msgid "Can not access configuration file"
msgstr "无法访问配置文件"
#: src/config_reader.c:278
msgid "Configuration file parse error"
msgstr "配置文件解析错误"
#: src/config_reader.c:281
msgid "Invalid property"
msgstr "无效属性"
#: src/config_reader.c:284
msgid "Invalid dictionary type"
msgstr "无效的辞典类型"
#: src/config_reader.c:287 src/converter.c:747 src/dict_group.c:218
#: src/opencc.c:271
msgid "Unknown"
msgstr "未知"
#: src/converter.c:741 src/dict_group.c:206
msgid "No dictionary loaded"
msgstr "没有辞典加载"
#: src/converter.c:744
msgid "Output buffer not enough for one segment"
msgstr "输出缓冲区不足以存储一个分词"
#: src/dict_group.c:209
msgid "Can not open dictionary file"
msgstr "无法打开辞典"
#: src/dict_group.c:212
msgid "Invalid dictionary file"
msgstr "辞典格式无效"
#: src/dict_group.c:215
msgid "Invalid dictionary index"
msgstr "辞典索引无效"
#: src/opencc.c:262
msgid "Dictionary loading error"
msgstr "辞典读取错误"
#: src/opencc.c:265
msgid "Configuration error"
msgstr "配置错误"
#: src/opencc.c:268
msgid "Converter error"
msgstr "转换器错误"
#: src/tools/opencc.c:39
msgid "OpenCC initialization error"
msgstr "OpenCC初始化错误"
#: src/tools/opencc.c:51
#, c-format
msgid "Can not read file: %s\n"
msgstr "无法读取文件: %s\n"
#: src/tools/opencc.c:61 src/tools/opencc_dict.c:296
#, c-format
msgid "Can not write file: %s\n"
msgstr "无法写入文件: %s\n"
#: src/tools/opencc.c:72
msgid "OpenCC error"
msgstr "OpenCC 错误"
#: src/tools/opencc.c:86 src/tools/opencc.c:89 src/tools/opencc.c:92
#: src/tools/opencc.c:100 src/tools/opencc.c:107 src/tools/opencc.c:110
#: src/tools/opencc_dict.c:246 src/tools/opencc_dict.c:254
#: src/tools/opencc_dict.c:385 src/tools/opencc_dict.c:386
#, c-format
msgid "\n"
msgstr "\n"
#: src/tools/opencc.c:87
#, c-format
msgid "Open Chinese Convert (OpenCC) Command Line Tool\n"
msgstr "Open Chinese Convert (OpenCC) 命令行工具\n"
#: src/tools/opencc.c:88
#, c-format
msgid "Version %s\n"
msgstr "版本 %s\n"
#: src/tools/opencc.c:90
#, c-format
msgid "Author: %s\n"
msgstr "作者: %s\n"
#: src/tools/opencc.c:91
#, c-format
msgid "Bug Report: %s\n"
msgstr "Bug汇报: %s\n"
#: src/tools/opencc.c:98 src/tools/opencc_dict.c:379
#, c-format
msgid "Usage:\n"
msgstr "使用方法:\n"
#: src/tools/opencc.c:99
#, c-format
msgid " opencc [Options]\n"
msgstr " opencc [参数]\n"
#: src/tools/opencc.c:101
#, c-format
msgid "Options:\n"
msgstr "参数:\n"
#: src/tools/opencc.c:102
#, c-format
msgid " -i [file], --input=[file] Read original text from [file].\n"
msgstr " -i [file], --input=[file] 从 [file] 读取原始文本。\n"
#: src/tools/opencc.c:103
#, c-format
msgid " -o [file], --output=[file] Write converted text to [file].\n"
msgstr " -o [file], --output=[file] 将转换后的文本写入 [file]。\n"
#: src/tools/opencc.c:104
#, c-format
msgid ""
" -c [file], --config=[file] Load configuration of conversion from [file].\n"
msgstr " -c [file], --config=[file] 从 [file] 中读取配置。\n"
#: src/tools/opencc.c:105
#, c-format
msgid " -v, --version Print version and build information.\n"
msgstr " -v, --version 显示版本和生成信息。\n"
#: src/tools/opencc.c:106
#, c-format
msgid " -h, --help Print this help.\n"
msgstr " -h, --help 显示此帮助。\n"
#: src/tools/opencc.c:108
#, c-format
msgid ""
"With no input file, reads standard input and writes converted stream to "
"standard output.\n"
msgstr "如果没有设置输入文件,将会从标准输入中读取数据,并输出到标准输出。\n"
#: src/tools/opencc.c:109
#, c-format
msgid "Default configuration(%s) will be loaded if not set.\n"
msgstr "如果没有设置config file,则会读取默认配置文件(%s)。\n"
#: src/tools/opencc.c:144
#, c-format
msgid "Please use %s --help.\n"
msgstr "请使用%s --help以获得帮助。\n"
#: src/tools/opencc_dict.c:373
#, c-format
msgid ""
"\n"
"Open Chinese Convert (OpenCC) Dictionary Tool\n"
"Version %s\n"
"\n"
msgstr ""
"\n"
"Open Chinese Convert (OpenCC) 辞典工具\n"
"版本 %s\n"
"\n"
#: src/tools/opencc_dict.c:380
#, c-format
msgid ""
" opencc_dict -i input_file -o output_file\n"
"\n"
msgstr ""
" opencc_dict -i input_file -o output_file\n"
"\n"
#: src/tools/opencc_dict.c:381
#, c-format
msgid " -i input_file\n"
msgstr " -i input_file\n"
#: src/tools/opencc_dict.c:382
#, c-format
msgid " Read data from input_file.\n"
msgstr " 从input_file读取数据。\n"
#: src/tools/opencc_dict.c:383
#, c-format
msgid " -o output_file\n"
msgstr " -o output_file\n"
#: src/tools/opencc_dict.c:384
#, c-format
msgid " Write converted data to output_file.\n"
msgstr " 将生成的辞典写入output_file。\n"
#: src/tools/opencc_dict.c:432
#, c-format
msgid "Please specify input file using -i.\n"
msgstr "请使用-i指定输入文件。\n"
#: src/tools/opencc_dict.c:439
#, c-format
msgid "Please specify output file using -o.\n"
msgstr "请使用-o指定输出文件。\n"
#~ msgid ""
#~ " opencc [-i input_file] [-o output_file] [-c config_file]\n"
#~ "\n"
#~ msgstr ""
#~ " opencc [-i input_file] [-o output_file] [-c config_file]\n"
#~ "\n"
#~ msgid " -c config_file\n"
#~ msgstr " -c config_file\n"
#~ msgid " Load dictionary configuration from config_file.\n"
#~ msgstr " 从config_file读取配置。\n"
#~ msgid " Note:\n"
#~ msgstr " 注释:\n"
#~ msgid ""
#~ " Text from standard input will be read if input_file is not set\n"
#~ " and will be written to standard output if output_file is not set.\n"
#~ msgstr ""
#~ " 如果没有设置input_file,将会从标准输入读取文本。如果\n"
#~ " 没有设置output_file,将会把转换后文本写入到标准输出。\n"
opencc-0.4.3/po/CMakeLists.txt
file(STRINGS LINGUAS LANGUAGES)
separate_arguments(LANGUAGES)
set(DOMAIN ${PACKAGE_NAME})
foreach(LANG ${LANGUAGES})
add_custom_target(
${LANG}_mo
ALL
DEPENDS
${LANG}.mo
)
add_custom_command(
OUTPUT ${LANG}.mo
COMMAND ${GETTEXT_MSGFMT_EXECUTABLE}
${GETTEXT_MSGFMT_PARAMETER}
-o ${LANG}.mo ${CMAKE_SOURCE_DIR}/po/${LANG}.po
DEPENDS
${LANG}.po
COMMENT "mo-update [${LANG}]: Creating mo file."
)
install(
FILES
${CMAKE_BINARY_DIR}/po/${LANG}.mo
RENAME
${DOMAIN}.mo
DESTINATION
${DIR_SHARE_LOCALE}/${LANG}/LC_MESSAGES
)
endforeach(LANG ${LANGUAGES})
opencc-0.4.3/AUTHORS
Author:
BYVoid
Contributors:
Peng Huang
Kefu Chai
LI Daobing
Asias
Peng Wu
Xiaojun Ma
佛振
opencc-0.4.3/debug.sh
mkdir -p debug \
&& cd debug \
&& cmake \
-D ENABLE_GETTEXT:BOOL=OFF \
-D BUILD_DOCUMENTATION:BOOL=ON \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=`pwd`/root \
.. \
&& make \
&& make install \
&& make test
opencc-0.4.3/doc/opencc.doxy.in
# Doxyfile 1.8.3.1
# This file describes the settings to be used by the documentation system
# doxygen (www.doxygen.org) for a project.
#
# All text after a hash (#) is considered a comment and will be ignored.
# The format is:
# TAG = value [value, ...]
# For lists, items can also be appended using:
# TAG += value [value, ...]
# Values that contain spaces should be placed between quotes (" ").
#---------------------------------------------------------------------------
# Project related configuration options
#---------------------------------------------------------------------------
# This tag specifies the encoding used for all characters in the config file
# that follow. The default is UTF-8 which is also the encoding used for all
# text before the first occurrence of this tag. Doxygen uses libiconv (or the
# iconv built into libc) for the transcoding. See
# http://www.gnu.org/software/libiconv for the list of possible encodings.
DOXYFILE_ENCODING = UTF-8
# The PROJECT_NAME tag is a single word (or sequence of words) that should
# identify the project. Note that if you do not use Doxywizard you need
# to put quotes around the project name if it contains spaces.
PROJECT_NAME = "Open Chinese Convert"
# The PROJECT_NUMBER tag can be used to enter a project or revision number.
# This could be handy for archiving the generated documentation or
# if some version control system is used.
PROJECT_NUMBER = "@OPENCC_VERSION@"
# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give the viewer
# a quick idea about the purpose of the project. Keep the description short.
PROJECT_BRIEF = "A project for conversion between Traditional and Simplified Chinese"
# With the PROJECT_LOGO tag one can specify a logo or icon that is
# included in the documentation. The maximum height of the logo should not
# exceed 55 pixels and the maximum width should not exceed 200 pixels.
# Doxygen will copy the logo to the output directory.
PROJECT_LOGO =
# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
# base path where the generated documentation will be put.
# If a relative path is entered, it will be relative to the location
# where doxygen was started. If left blank the current directory will be used.
OUTPUT_DIRECTORY =
# If the CREATE_SUBDIRS tag is set to YES, then doxygen will create
# 4096 sub-directories (in 2 levels) under the output directory of each output
# format and will distribute the generated files over these directories.
# Enabling this option can be useful when feeding doxygen a huge amount of
# source files, where putting all generated files in the same directory would
# otherwise cause performance problems for the file system.
CREATE_SUBDIRS = NO
# The OUTPUT_LANGUAGE tag is used to specify the language in which all
# documentation generated by doxygen is written. Doxygen will use this
# information to generate all constant output in the proper language.
# The default language is English, other supported languages are:
# Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese-Traditional,
# Croatian, Czech, Danish, Dutch, Esperanto, Farsi, Finnish, French, German,
# Greek, Hungarian, Italian, Japanese, Japanese-en (Japanese with English
# messages), Korean, Korean-en, Lithuanian, Norwegian, Macedonian, Persian,
# Polish, Portuguese, Romanian, Russian, Serbian, Serbian-Cyrillic, Slovak,
# Slovene, Spanish, Swedish, Ukrainian, and Vietnamese.
OUTPUT_LANGUAGE = English
# If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will
# include brief member descriptions after the members that are listed in
# the file and class documentation (similar to JavaDoc).
# Set to NO to disable this.
BRIEF_MEMBER_DESC = YES
# If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend
# the brief description of a member or function before the detailed description.
# Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the
# brief descriptions will be completely suppressed.
REPEAT_BRIEF = YES
# This tag implements a quasi-intelligent brief description abbreviator
# that is used to form the text in various listings. Each string
# in this list, if found as the leading text of the brief description, will be
# stripped from the text and the result after processing the whole list, is
# used as the annotated text. Otherwise, the brief description is used as-is.
# If left blank, the following values are used ("$name" is automatically
# replaced with the name of the entity): "The $name class" "The $name widget"
# "The $name file" "is" "provides" "specifies" "contains"
# "represents" "a" "an" "the"
ABBREVIATE_BRIEF =
# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then
# Doxygen will generate a detailed section even if there is only a brief
# description.
ALWAYS_DETAILED_SEC = NO
# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all
# inherited members of a class in the documentation of that class as if those
# members were ordinary class members. Constructors, destructors and assignment
# operators of the base classes will not be shown.
INLINE_INHERITED_MEMB = NO
# If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full
# path before file names in the file list and in the header files. If set
# to NO the shortest path that makes the file name unique will be used.
FULL_PATH_NAMES = YES
# If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag
# can be used to strip a user-defined part of the path. Stripping is
# only done if one of the specified strings matches the left-hand part of
# the path. The tag can be used to show relative paths in the file list.
# If left blank the directory from which doxygen is run is used as the
# path to strip. Note that you specify absolute paths here, but also
# relative paths, which will be relative from the directory where doxygen is
# started.
STRIP_FROM_PATH =
# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of
# the path mentioned in the documentation of a class, which tells
# the reader which header file to include in order to use a class.
# If left blank only the name of the header file containing the class
# definition is used. Otherwise one should specify the include paths that
# are normally passed to the compiler using the -I flag.
STRIP_FROM_INC_PATH =
# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter
# (but less readable) file names. This can be useful if your file system
# doesn't support long names like on DOS, Mac, or CD-ROM.
SHORT_NAMES = NO
# If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen
# will interpret the first line (until the first dot) of a JavaDoc-style
# comment as the brief description. If set to NO, the JavaDoc
# comments will behave just like regular Qt-style comments
# (thus requiring an explicit @brief command for a brief description.)
JAVADOC_AUTOBRIEF = YES
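With `JAVADOC_AUTOBRIEF = YES`, the text up to the first dot of a JavaDoc-style comment becomes the brief description, so no explicit `@brief` is needed. A small illustration in C follows; `utf8_char_count` is a hypothetical function, not part of OpenCC.

```c
#include <stddef.h>

/**
 * Counts the characters in a UTF-8 string. Every byte that is not a
 * continuation byte (10xxxxxx) starts a new character, so the function
 * counts those bytes; it does not validate the encoding.
 *
 * @param text NUL-terminated UTF-8 string.
 * @return Number of characters in text.
 */
size_t utf8_char_count(const char *text)
{
    size_t count = 0;
    for (; *text != '\0'; text++) {
        if (((unsigned char)*text & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Here the brief description would be "Counts the characters in a UTF-8 string." and the remaining sentences would form the detailed description.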
# If the QT_AUTOBRIEF tag is set to YES then Doxygen will
# interpret the first line (until the first dot) of a Qt-style
# comment as the brief description. If set to NO, the comments
# will behave just like regular Qt-style comments (thus requiring
# an explicit \brief command for a brief description.)
QT_AUTOBRIEF = NO
# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen
# treat a multi-line C++ special comment block (i.e. a block of //! or ///
# comments) as a brief description. This used to be the default behaviour.
# The new default is to treat a multi-line C++ comment block as a detailed
# description. Set this tag to YES if you prefer the old behaviour instead.
MULTILINE_CPP_IS_BRIEF = NO
# If the INHERIT_DOCS tag is set to YES (the default) then an undocumented
# member inherits the documentation from any documented member that it
# re-implements.
INHERIT_DOCS = YES
# If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce
# a new page for each member. If set to NO, the documentation of a member will
# be part of the file/class/namespace that contains it.
SEPARATE_MEMBER_PAGES = NO
# The TAB_SIZE tag can be used to set the number of spaces in a tab.
# Doxygen uses this value to replace tabs by spaces in code fragments.
TAB_SIZE = 2
# This tag can be used to specify a number of aliases that act
# as commands in the documentation. An alias has the form "name=value".
# For example adding "sideeffect=\par Side Effects:\n" will allow you to
# put the command \sideeffect (or @sideeffect) in the documentation, which
# will result in a user-defined paragraph with heading "Side Effects:".
# You can put \n's in the value part of an alias to insert newlines.
ALIASES =
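`ALIASES` is left empty in this configuration. As a sketch of the example from the comment above, if the tag were set to `ALIASES = "sideeffect=\par Side Effects:\n"`, a C comment could use the new command as shown below. The names `call_counter`, `count_call` and `reset_counter` are hypothetical.

```c
/* Number of calls recorded since the last reset. */
static int call_counter = 0;

/** Records one call. */
void count_call(void)
{
    call_counter++;
}

/**
 * Resets the call counter and returns its previous value.
 *
 * @sideeffect Sets the file-level call_counter to zero.
 */
int reset_counter(void)
{
    int previous = call_counter;
    call_counter = 0;
    return previous;
}
```

With the alias defined, the `@sideeffect` line would be rendered as a paragraph headed "Side Effects:".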
# This tag can be used to specify a number of word-keyword mappings (TCL only).
# A mapping has the form "name=value". For example adding
# "class=itcl::class" will allow you to use the command class in the
# itcl::class meaning.
TCL_SUBST =
# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C
# sources only. Doxygen will then generate output that is more tailored for C.
# For instance, some of the names that are used will be different. The list
# of all members will be omitted, etc.
OPTIMIZE_OUTPUT_FOR_C = YES
# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java
# sources only. Doxygen will then generate output that is more tailored for
# Java. For instance, namespaces will be presented as packages, qualified
# scopes will look different, etc.
OPTIMIZE_OUTPUT_JAVA = NO
# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran
# sources only. Doxygen will then generate output that is more tailored for
# Fortran.
OPTIMIZE_FOR_FORTRAN = NO
# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL
# sources. Doxygen will then generate output that is tailored for
# VHDL.
OPTIMIZE_OUTPUT_VHDL = NO
# Doxygen selects the parser to use depending on the extension of the files it
# parses. With this tag you can assign which parser to use for a given
# extension. Doxygen has a built-in mapping, but you can override or extend it
# using this tag. The format is ext=language, where ext is a file extension,
# and language is one of the parsers supported by doxygen: IDL, Java,
# Javascript, CSharp, C, C++, D, PHP, Objective-C, Python, Fortran, VHDL.
# For instance, to make doxygen treat .inc files as Fortran files (default
# is PHP), and .f files as C (default is Fortran), use: inc=Fortran f=C. Note
# that for custom extensions you also need to set FILE_PATTERNS otherwise the
# files are not read by doxygen.
EXTENSION_MAPPING =
# If MARKDOWN_SUPPORT is enabled (the default) then doxygen pre-processes all
# comments according to the Markdown format, which allows for more readable
# documentation. See http://daringfireball.net/projects/markdown/ for details.
# The output of markdown processing is further processed by doxygen, so you
# can mix doxygen, HTML, and XML commands with Markdown formatting.
# Disable only in case of backward compatibility issues.
MARKDOWN_SUPPORT = YES
# When enabled doxygen tries to link words that correspond to documented classes,
# or namespaces to their corresponding documentation. Such a link can be
# prevented in individual cases by putting a % sign in front of the word or
# globally by setting AUTOLINK_SUPPORT to NO.
AUTOLINK_SUPPORT = YES
# If you use STL classes (i.e. std::string, std::vector, etc.) but do not want
# to include (a tag file for) the STL sources as input, then you should
# set this tag to YES in order to let doxygen match functions declarations and
# definitions whose arguments contain STL classes (e.g. func(std::string); vs.
# func(std::string) {}). This also makes the inheritance and collaboration
# diagrams that involve STL classes more complete and accurate.
BUILTIN_STL_SUPPORT = NO
# If you use Microsoft's C++/CLI language, you should set this option to YES to
# enable parsing support.
CPP_CLI_SUPPORT = NO
# Set the SIP_SUPPORT tag to YES if your project consists of sip sources only.
# Doxygen will parse them like normal C++ but will assume all classes use public
# instead of private inheritance when no explicit protection keyword is present.
SIP_SUPPORT = NO
# For Microsoft's IDL there are propget and propput attributes to indicate
# getter and setter methods for a property. Setting this option to YES (the
# default) will make doxygen replace the get and set methods by a property in
# the documentation. This will only work if the methods are indeed getting or
# setting a simple type. If this is not the case, or you want to show the
# methods anyway, you should set this option to NO.
IDL_PROPERTY_SUPPORT = YES
# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC
# tag is set to YES, then doxygen will reuse the documentation of the first
# member in the group (if any) for the other members of the group. By default
# all members of a group must be documented explicitly.
DISTRIBUTE_GROUP_DOC = NO
# Set the SUBGROUPING tag to YES (the default) to allow class member groups of
# the same type (for instance a group of public functions) to be put as a
# subgroup of that type (e.g. under the Public Functions section). Set it to
# NO to prevent subgrouping. Alternatively, this can be done per class using
# the \nosubgrouping command.
SUBGROUPING = YES
# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and
# unions are shown inside the group in which they are included (e.g. using
# @ingroup) instead of on a separate page (for HTML and Man pages) or
# section (for LaTeX and RTF).
INLINE_GROUPED_CLASSES = NO
# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and
# unions with only public data fields will be shown inline in the documentation
# of the scope in which they are defined (i.e. file, namespace, or group
# documentation), provided this scope is documented. If set to NO (the default),
# structs, classes, and unions are shown on a separate page (for HTML and Man
# pages) or section (for LaTeX and RTF).
INLINE_SIMPLE_STRUCTS = NO
# When TYPEDEF_HIDES_STRUCT is enabled, a typedef of a struct, union, or enum
# is documented as struct, union, or enum with the name of the typedef. So
# typedef struct TypeS {} TypeT, will appear in the documentation as a struct
# with name TypeT. When disabled the typedef will appear as a member of a file,
# namespace, or class. And the struct will be named TypeS. This can typically
# be useful for C code in case the coding convention dictates that all compound
# types are typedef'ed and only the typedef is referenced, never the tag name.
TYPEDEF_HIDES_STRUCT = NO
# The SYMBOL_CACHE_SIZE determines the size of the internal cache used to
# determine which symbols to keep in memory and which to flush to disk.
# When the cache is full, less often used symbols will be written to disk.
# For small to medium size projects (<1000 input files) the default value is
# probably good enough. For larger projects a too small cache size can cause
# doxygen to be busy swapping symbols to and from disk most of the time
# causing a significant performance penalty.
# If the system has enough physical memory increasing the cache will improve the
# performance by keeping more symbols in memory. Note that the value works on
# a logarithmic scale so increasing the size by one will roughly double the
# memory usage. The cache size is given by this formula:
# 2^(16+SYMBOL_CACHE_SIZE). The valid range is 0..9, the default is 0,
# corresponding to a cache size of 2^16 = 65536 symbols.
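# Worked example of the formula above (illustrative only; the value 2 is
# chosen arbitrarily): SYMBOL_CACHE_SIZE = 2 gives 2^(16+2) = 2^18 = 262144
# symbols, four times the default of 65536.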
SYMBOL_CACHE_SIZE = 0
# Similar to the SYMBOL_CACHE_SIZE the size of the symbol lookup cache can be
# set using LOOKUP_CACHE_SIZE. This cache is used to resolve symbols given
# their name and scope. Since this can be an expensive process and often the
# same symbol appears multiple times in the code, doxygen keeps a cache of
# pre-resolved symbols. If the cache is too small doxygen will become slower.
# If the cache is too large, memory is wasted. The cache size is given by this
# formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range is 0..9, the default is 0,
# corresponding to a cache size of 2^16 = 65536 symbols.
LOOKUP_CACHE_SIZE = 0
#---------------------------------------------------------------------------
# Build related configuration options
#---------------------------------------------------------------------------
# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in
# documentation are documented, even if no documentation was available.
# Private class members and static file members will be hidden unless
# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES.
EXTRACT_ALL = NO
# If the EXTRACT_PRIVATE tag is set to YES all private members of a class
# will be included in the documentation.
EXTRACT_PRIVATE = NO
# If the EXTRACT_PACKAGE tag is set to YES all members with package or internal
# scope will be included in the documentation.
EXTRACT_PACKAGE = NO
# If the EXTRACT_STATIC tag is set to YES all static members of a file
# will be included in the documentation.
EXTRACT_STATIC = NO
# If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs)
# defined locally in source files will be included in the documentation.
# If set to NO only classes defined in header files are included.
EXTRACT_LOCAL_CLASSES = YES
# This flag is only useful for Objective-C code. When set to YES local
# methods, which are defined in the implementation section but not in
# the interface are included in the documentation.
# If set to NO (the default) only methods in the interface are included.
EXTRACT_LOCAL_METHODS = NO
# If this flag is set to YES, the members of anonymous namespaces will be
# extracted and appear in the documentation as a namespace called
# 'anonymous_namespace{file}', where file will be replaced with the base
# name of the file that contains the anonymous namespace. By default
# anonymous namespaces are hidden.
EXTRACT_ANON_NSPACES = NO
# If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all
# undocumented members of documented classes, files or namespaces.
# If set to NO (the default) these members will be included in the
# various overviews, but no documentation section is generated.
# This option has no effect if EXTRACT_ALL is enabled.
HIDE_UNDOC_MEMBERS = NO
# If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all
# undocumented classes that are normally visible in the class hierarchy.
# If set to NO (the default) these classes will be included in the various
# overviews. This option has no effect if EXTRACT_ALL is enabled.
HIDE_UNDOC_CLASSES = NO
# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all
# friend (class|struct|union) declarations.
# If set to NO (the default) these declarations will be included in the
# documentation.
HIDE_FRIEND_COMPOUNDS = NO
# If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any
# documentation blocks found inside the body of a function.
# If set to NO (the default) these blocks will be appended to the
# function's detailed documentation block.
HIDE_IN_BODY_DOCS = NO
# The INTERNAL_DOCS tag determines if documentation
# that is typed after a \internal command is included. If the tag is set
# to NO (the default) then the documentation will be excluded.
# Set it to YES to include the internal documentation.
INTERNAL_DOCS = NO
# If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate
# file names in lower-case letters. If set to YES upper-case letters are also
# allowed. This is useful if you have classes or files whose names only differ
# in case and if your file system supports case sensitive file names. Windows
# and Mac users are advised to set this option to NO.
CASE_SENSE_NAMES = NO
# If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen
# will show members with their full class and namespace scopes in the
# documentation. If set to YES the scope will be hidden.
HIDE_SCOPE_NAMES = NO
# If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen
# will put a list of the files that are included by a file in the documentation
# of that file.
SHOW_INCLUDE_FILES = YES
# If the FORCE_LOCAL_INCLUDES tag is set to YES then Doxygen
# will list include files with double quotes in the documentation
# rather than with sharp brackets.
FORCE_LOCAL_INCLUDES = NO
# If the INLINE_INFO tag is set to YES (the default) then a tag [inline]
# is inserted in the documentation for inline members.
INLINE_INFO = YES
# If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen
# will sort the (detailed) documentation of file and class members
# alphabetically by member name. If set to NO the members will appear in
# declaration order.
SORT_MEMBER_DOCS = YES
# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the
# brief documentation of file, namespace and class members alphabetically
# by member name. If set to NO (the default) the members will appear in
# declaration order.
SORT_BRIEF_DOCS = NO
# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen
# will sort the (brief and detailed) documentation of class members so that
# constructors and destructors are listed first. If set to NO (the default)
# the constructors will appear in the respective orders defined by
# SORT_MEMBER_DOCS and SORT_BRIEF_DOCS.
# This tag will be ignored for brief docs if SORT_BRIEF_DOCS is set to NO
# and ignored for detailed docs if SORT_MEMBER_DOCS is set to NO.
SORT_MEMBERS_CTORS_1ST = NO
# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the
# hierarchy of group names into alphabetical order. If set to NO (the default)
# the group names will appear in their defined order.
SORT_GROUP_NAMES = NO
# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be
# sorted by fully-qualified names, including namespaces. If set to
# NO (the default), the class list will be sorted only by class name,
# not including the namespace part.
# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES.
# Note: This option applies only to the class list, not to the
# alphabetical list.
SORT_BY_SCOPE_NAME = NO
# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to
# do proper type resolution of all parameters of a function it will reject a
# match between the prototype and the implementation of a member function even
# if there is only one candidate or it is obvious which candidate to choose
# by doing a simple string match. By disabling STRICT_PROTO_MATCHING doxygen
# will still accept a match between prototype and implementation in such cases.
STRICT_PROTO_MATCHING = NO
# The GENERATE_TODOLIST tag can be used to enable (YES) or
# disable (NO) the todo list. This list is created by putting \todo
# commands in the documentation.
GENERATE_TODOLIST = YES
# The GENERATE_TESTLIST tag can be used to enable (YES) or
# disable (NO) the test list. This list is created by putting \test
# commands in the documentation.
GENERATE_TESTLIST = YES
# The GENERATE_BUGLIST tag can be used to enable (YES) or
# disable (NO) the bug list. This list is created by putting \bug
# commands in the documentation.
GENERATE_BUGLIST = YES
# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or
# disable (NO) the deprecated list. This list is created by putting
# \deprecated commands in the documentation.
GENERATE_DEPRECATEDLIST= YES
# The ENABLED_SECTIONS tag can be used to enable conditional
# documentation sections, marked by \if section-label ... \endif
# and \cond section-label ... \endcond blocks.
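# Illustrative sketch (the label INTERNAL_NOTES is hypothetical and not used
# by this project): a documentation block written as
#   \if INTERNAL_NOTES ... \endif
# would only be included in the output if this tag were set as
#   ENABLED_SECTIONS = INTERNAL_NOTES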
ENABLED_SECTIONS =
# The MAX_INITIALIZER_LINES tag determines the maximum number of lines
# the initial value of a variable or macro consists of for it to appear in
# the documentation. If the initializer consists of more lines than specified
# here it will be hidden. Use a value of 0 to hide initializers completely.
# The appearance of the initializer of individual variables and macros in the
# documentation can be controlled using \showinitializer or \hideinitializer
# command in the documentation regardless of this setting.
MAX_INITIALIZER_LINES = 30
# Set the SHOW_USED_FILES tag to NO to disable the list of files generated
# at the bottom of the documentation of classes and structs. If set to YES the
# list will mention the files that were used to generate the documentation.
SHOW_USED_FILES = YES
# Set the SHOW_FILES tag to NO to disable the generation of the Files page.
# This will remove the Files entry from the Quick Index and from the
# Folder Tree View (if specified). The default is YES.
SHOW_FILES = YES
# Set the SHOW_NAMESPACES tag to NO to disable the generation of the
# Namespaces page.
# This will remove the Namespaces entry from the Quick Index
# and from the Folder Tree View (if specified). The default is YES.
SHOW_NAMESPACES = YES
# The FILE_VERSION_FILTER tag can be used to specify a program or script that
# doxygen should invoke to get the current version for each file (typically from
# the version control system). Doxygen will invoke the program by executing (via
# popen()) the command <command> <input-file>, where <command> is the value of
# the FILE_VERSION_FILTER tag, and <input-file> is the name of an input file
# provided by doxygen. Whatever the program writes to standard output
# is used as the file version. See the manual for examples.
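# Illustrative sketch (the script path is hypothetical): with
#   FILE_VERSION_FILTER = "/path/to/file_version.sh"
# doxygen would run /path/to/file_version.sh <input-file> for each input file
# and use whatever the script prints as that file's version.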
FILE_VERSION_FILTER =
# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed
# by doxygen. The layout file controls the global structure of the generated
# output files in an output format independent way. To create the layout file
# that represents doxygen's defaults, run doxygen with the -l option.
# You can optionally specify a file name after the option, if omitted
# DoxygenLayout.xml will be used as the name of the layout file.
LAYOUT_FILE =
# The CITE_BIB_FILES tag can be used to specify one or more bib files
# containing the references data. This must be a list of .bib files. The
# .bib extension is automatically appended if omitted. Using this command
# requires the bibtex tool to be installed. See also
# http://en.wikipedia.org/wiki/BibTeX for more info. For LaTeX the style
# of the bibliography can be controlled using LATEX_BIB_STYLE. To use this
# feature you need bibtex and perl available in the search path. Do not use
# file names with spaces, bibtex cannot handle them.
CITE_BIB_FILES =
#---------------------------------------------------------------------------
# configuration options related to warning and progress messages
#---------------------------------------------------------------------------
# The QUIET tag can be used to turn on/off the messages that are generated
# by doxygen. Possible values are YES and NO. If left blank NO is used.
QUIET = NO
# The WARNINGS tag can be used to turn on/off the warning messages that are
# generated by doxygen. Possible values are YES and NO. If left blank
# NO is used.
WARNINGS = YES
# If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings
# for undocumented members. If EXTRACT_ALL is set to YES then this flag will
# automatically be disabled.
WARN_IF_UNDOCUMENTED = YES
# If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for
# potential errors in the documentation, such as not documenting some
# parameters in a documented function, or documenting parameters that
# don't exist or using markup commands wrongly.
WARN_IF_DOC_ERROR = YES
# The WARN_NO_PARAMDOC option can be enabled to get warnings for
# functions that are documented, but have no documentation for their parameters
# or return value. If set to NO (the default) doxygen will only warn about
# wrong or incomplete parameter documentation, but not about the absence of
# documentation.
WARN_NO_PARAMDOC = NO
# The WARN_FORMAT tag determines the format of the warning messages that
# doxygen can produce. The string should contain the $file, $line, and $text
# tags, which will be replaced by the file and line number from which the
# warning originated and the warning text. Optionally the format may contain
# $version, which will be replaced by the version of the file (if it could
# be obtained via FILE_VERSION_FILTER).
WARN_FORMAT = "$file:$line: $text"
# The WARN_LOGFILE tag can be used to specify a file to which warning
# and error messages should be written. If left blank the output is written
# to stderr.
WARN_LOGFILE =
#---------------------------------------------------------------------------
# configuration options related to the input files
#---------------------------------------------------------------------------
# The INPUT tag can be used to specify the files and/or directories that contain
# documented source files. You may enter file names like "myfile.cpp" or
# directories like "/usr/src/myproject". Separate the files or directories
# with spaces.
INPUT = @CMAKE_SOURCE_DIR@/src @CMAKE_SOURCE_DIR@/node @CMAKE_SOURCE_DIR@/data @CMAKE_SOURCE_DIR@/README.md
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is
# also the default input encoding. Doxygen uses libiconv (or the iconv built
# into libc) for the transcoding. See http://www.gnu.org/software/libiconv for
# the list of possible encodings.
INPUT_ENCODING = UTF-8
# If the value of the INPUT tag contains directories, you can use the
# FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp
# and *.h) to filter out the source-files in the directories. If left
# blank the following patterns are tested:
# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh
# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py
# *.f90 *.f *.for *.vhd *.vhdl
FILE_PATTERNS = *.c *.cc *.h *.py *.js
# The RECURSIVE tag can be used to specify whether or not subdirectories
# should be searched for input files as well. Possible values are YES and NO.
# If left blank NO is used.
RECURSIVE = YES
# The EXCLUDE tag can be used to specify files and/or directories that should be
# excluded from the INPUT source files. This way you can easily exclude a
# subdirectory from a directory tree whose root is specified with the INPUT tag.
# Note that relative paths are relative to the directory from which doxygen is
# run.
EXCLUDE =
# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or
# directories that are symbolic links (a Unix file system feature) are excluded
# from the input.
EXCLUDE_SYMLINKS = NO
# If the value of the INPUT tag contains directories, you can use the
# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude
# certain files from those directories. Note that the wildcards are matched
# against the file with absolute path, so to exclude all test directories
# for example use the pattern */test/*
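# Illustrative sketch (the directory names are examples, not this project's
# layout): to skip every test and build directory under the INPUT roots, one
# could write
#   EXCLUDE_PATTERNS = */test/* */build/*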
EXCLUDE_PATTERNS =
# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names
# (namespaces, classes, functions, etc.) that should be excluded from the
# output. The symbol name can be a fully qualified name, a word, or if the
# wildcard * is used, a substring. Examples: ANamespace, AClass,
# AClass::ANamespace, ANamespace::*Test
EXCLUDE_SYMBOLS =
# The EXAMPLE_PATH tag can be used to specify one or more files or
# directories that contain example code fragments that are included (see
# the \include command).
EXAMPLE_PATH = @CMAKE_SOURCE_DIR@
# If the value of the EXAMPLE_PATH tag contains directories, you can use the
# EXAMPLE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp
# and *.h) to filter out the source-files in the directories. If left
# blank all files are included.
EXAMPLE_PATTERNS =
# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be
# searched for input files to be used with the \include or \dontinclude
# commands irrespective of the value of the RECURSIVE tag.
# Possible values are YES and NO. If left blank NO is used.
EXAMPLE_RECURSIVE = NO
# The IMAGE_PATH tag can be used to specify one or more files or
# directories that contain images that are included in the documentation (see
# the \image command).
IMAGE_PATH =
# The INPUT_FILTER tag can be used to specify a program that doxygen should
# invoke to filter for each input file. Doxygen will invoke the filter program
# by executing (via popen()) the command <filter> <input-file>, where <filter>
# is the value of the INPUT_FILTER tag, and <input-file> is the name of an
# input file. Doxygen will then use the output that the filter program writes
# to standard output.
# If FILTER_PATTERNS is specified, this tag will be
# ignored.
INPUT_FILTER =
# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern
# basis.
# Doxygen will compare the file name with each pattern and apply the
# filter if there is a match.
# The filters are a list of the form:
# pattern=filter (like *.cpp=my_cpp_filter). See INPUT_FILTER for further
# info on how filters are used. If FILTER_PATTERNS is empty or if
# none of the patterns match the file name, INPUT_FILTER is applied.
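# Illustrative sketch (both filter programs are hypothetical): a setting like
#   FILTER_PATTERNS = *.py=/path/to/py_filter *.js=/path/to/js_filter
# would pass Python and JavaScript inputs through separate filters, while
# files matching neither pattern would fall back to INPUT_FILTER.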
FILTER_PATTERNS =
# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using
# INPUT_FILTER) will be used to filter the input files when producing source
# files to browse (i.e. when SOURCE_BROWSER is set to YES).
FILTER_SOURCE_FILES = NO
# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file
# pattern. A pattern will override the setting for FILTER_PATTERNS (if any)
# and it is also possible to disable source filtering for a specific pattern
# using *.ext= (so without naming a filter). This option only has effect when
# FILTER_SOURCE_FILES is enabled.
FILTER_SOURCE_PATTERNS =
# If the USE_MDFILE_AS_MAINPAGE tag refers to the name of a markdown file that
# is part of the input, its contents will be placed on the main page (index.html).
# This can be useful if you have a project on, for instance, GitHub and want to
# reuse the introduction page for the doxygen output as well.
USE_MDFILE_AS_MAINPAGE = README.md
#---------------------------------------------------------------------------
# configuration options related to source browsing
#---------------------------------------------------------------------------
# If the SOURCE_BROWSER tag is set to YES then a list of source files will
# be generated. Documented entities will be cross-referenced with these sources.
# Note: To get rid of all source code in the generated output, make sure also
# VERBATIM_HEADERS is set to NO.
SOURCE_BROWSER = YES
# Setting the INLINE_SOURCES tag to YES will include the body
# of functions and classes directly in the documentation.
INLINE_SOURCES = NO
# Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct
# doxygen to hide any special comment blocks from generated source code
# fragments. Normal C, C++ and Fortran comments will always remain visible.
STRIP_CODE_COMMENTS = YES
# If the REFERENCED_BY_RELATION tag is set to YES
# then for each documented function all documented
# functions referencing it will be listed.
REFERENCED_BY_RELATION = NO
# If the REFERENCES_RELATION tag is set to YES
# then for each documented function all documented entities
# called/used by that function will be listed.
REFERENCES_RELATION = NO
# If the REFERENCES_LINK_SOURCE tag is set to YES (the default)
# and SOURCE_BROWSER tag is set to YES, then the hyperlinks from
# functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will
# link to the source code.
# Otherwise they will link to the documentation.
REFERENCES_LINK_SOURCE = YES
# If the USE_HTAGS tag is set to YES then the references to source code
# will point to the HTML generated by the htags(1) tool instead of doxygen's
# built-in source browser. The htags tool is part of GNU's global source
# tagging system (see http://www.gnu.org/software/global/global.html). You
# will need version 4.8.6 or higher.
USE_HTAGS = NO
# If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen
# will generate a verbatim copy of the header file for each class for
# which an include is specified. Set to NO to disable this.
VERBATIM_HEADERS = YES
#---------------------------------------------------------------------------
# configuration options related to the alphabetical class index
#---------------------------------------------------------------------------
# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index
# of all compounds will be generated. Enable this if the project
# contains a lot of classes, structs, unions or interfaces.
ALPHABETICAL_INDEX = YES
# If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then
# the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns
# in which this list will be split (can be a number in the range [1..20])
COLS_IN_ALPHA_INDEX = 5
# In case all classes in a project start with a common prefix, all
# classes will be put under the same header in the alphabetical index.
# The IGNORE_PREFIX tag can be used to specify one or more prefixes that
# should be ignored while generating the index headers.
IGNORE_PREFIX =
#---------------------------------------------------------------------------
# configuration options related to the HTML output
#---------------------------------------------------------------------------
# If the GENERATE_HTML tag is set to YES (the default) Doxygen will
# generate HTML output.
GENERATE_HTML = YES
# The HTML_OUTPUT tag is used to specify where the HTML docs will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `html' will be used as the default path.
HTML_OUTPUT =
# The HTML_FILE_EXTENSION tag can be used to specify the file extension for
# each generated HTML page (for example: .htm,.php,.asp). If it is left blank
# doxygen will generate files with .html extension.
HTML_FILE_EXTENSION = .html
# The HTML_HEADER tag can be used to specify a personal HTML header for
# each generated HTML page. If it is left blank doxygen will generate a
# standard header. Note that when using a custom header you are responsible
# for the proper inclusion of any scripts and style sheets that doxygen
# needs, which is dependent on the configuration options used.
# It is advised to generate a default header using "doxygen -w html
# header.html footer.html stylesheet.css YourConfigFile" and then modify
# that header. Note that the header is subject to change so you typically
# have to redo this when upgrading to a newer version of doxygen or when
# changing the value of configuration settings such as GENERATE_TREEVIEW!
HTML_HEADER =
# The HTML_FOOTER tag can be used to specify a personal HTML footer for
# each generated HTML page. If it is left blank doxygen will generate a
# standard footer.
HTML_FOOTER =
# The HTML_STYLESHEET tag can be used to specify a user-defined cascading
# style sheet that is used by each HTML page. It can be used to
# fine-tune the look of the HTML output. If left blank doxygen will
# generate a default style sheet. Note that it is recommended to use
# HTML_EXTRA_STYLESHEET instead of this one, as it is more robust and this
# tag will in the future become obsolete.
HTML_STYLESHEET =
# The HTML_EXTRA_STYLESHEET tag can be used to specify an additional
# user-defined cascading style sheet that is included after the standard
# style sheets created by doxygen. Using this option one can overrule
# certain style aspects. This is preferred over using HTML_STYLESHEET
# since it does not replace the standard style sheet and is therefore more
# robust against future updates. Doxygen will copy the style sheet file to
# the output directory.
HTML_EXTRA_STYLESHEET =
# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or
# other source files which should be copied to the HTML output directory. Note
# that these files will be copied to the base HTML output directory. Use the
# $relpath$ marker in the HTML_HEADER and/or HTML_FOOTER files to load these
# files. In the HTML_STYLESHEET file, use the file name only. Also note that
# the files will be copied as-is; there are no commands or markers available.
HTML_EXTRA_FILES =
# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output.
# Doxygen will adjust the colors in the style sheet and background images
# according to this color. Hue is specified as an angle on a colorwheel,
# see http://en.wikipedia.org/wiki/Hue for more information.
# For instance the value 0 represents red, 60 is yellow, 120 is green,
# 180 is cyan, 240 is blue, 300 purple, and 360 is red again.
# The allowed range is 0 to 359.
HTML_COLORSTYLE_HUE = 220
# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of
# the colors in the HTML output. For a value of 0 the output will use
# grayscales only. A value of 255 will produce the most vivid colors.
HTML_COLORSTYLE_SAT = 100
# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to
# the luminance component of the colors in the HTML output. Values below
# 100 gradually make the output lighter, whereas values above 100 make
# the output darker. The value divided by 100 is the actual gamma applied,
# so 80 represents a gamma of 0.8, the value 220 represents a gamma of 2.2,
# and 100 does not change the gamma.
HTML_COLORSTYLE_GAMMA = 80
# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML
# page will contain the date and time when the page was generated. Setting
# this to NO can help when comparing the output of multiple runs.
HTML_TIMESTAMP = YES
# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML
# documentation will contain sections that can be hidden and shown after the
# page has loaded.
HTML_DYNAMIC_SECTIONS = NO
# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of
# entries shown in the various tree structured indices initially; the user
# can expand and collapse entries dynamically later on. Doxygen will expand
# the tree to such a level that at most the specified number of entries are
# visible (unless a fully collapsed tree already exceeds this amount).
# So setting the number of entries to 1 will produce a fully collapsed tree by
# default. 0 is a special value representing an infinite number of entries
# and will result in a fully expanded tree by default.
HTML_INDEX_NUM_ENTRIES = 100
# If the GENERATE_DOCSET tag is set to YES, additional index files
# will be generated that can be used as input for Apple's Xcode 3
# integrated development environment, introduced with OSX 10.5 (Leopard).
# To create a documentation set, doxygen will generate a Makefile in the
# HTML output directory. Running make will produce the docset in that
# directory and running "make install" will install the docset in
# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find
# it at startup.
# See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html
# for more information.
GENERATE_DOCSET = NO
# When GENERATE_DOCSET tag is set to YES, this tag determines the name of the
# feed. A documentation feed provides an umbrella under which multiple
# documentation sets from a single provider (such as a company or product suite)
# can be grouped.
DOCSET_FEEDNAME = "Doxygen generated docs"
# When GENERATE_DOCSET tag is set to YES, this tag specifies a string that
# should uniquely identify the documentation set bundle. This should be a
# reverse domain-name style string, e.g. com.mycompany.MyDocSet. Doxygen
# will append .docset to the name.
DOCSET_BUNDLE_ID = org.doxygen.Project
# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely
# identify the documentation publisher. This should be a reverse domain-name
# style string, e.g. com.mycompany.MyDocSet.documentation.
DOCSET_PUBLISHER_ID = org.doxygen.Publisher
# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher.
DOCSET_PUBLISHER_NAME = Publisher
# If the GENERATE_HTMLHELP tag is set to YES, additional index files
# will be generated that can be used as input for tools like the
# Microsoft HTML help workshop to generate a compiled HTML help file (.chm)
# of the generated HTML documentation.
GENERATE_HTMLHELP = NO
# If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can
# be used to specify the file name of the resulting .chm file. You
# can add a path in front of the file if the result should not be
# written to the html output directory.
CHM_FILE =
# If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can
# be used to specify the location (absolute path including file name) of
# the HTML help compiler (hhc.exe). If non-empty doxygen will try to run
# the HTML help compiler on the generated index.hhp.
HHC_LOCATION =
# If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag
# controls if a separate .chi index file is generated (YES) or that
# it should be included in the master .chm file (NO).
GENERATE_CHI = NO
# If the GENERATE_HTMLHELP tag is set to YES, the CHM_INDEX_ENCODING
# is used to encode HtmlHelp index (hhk), content (hhc) and project file
# content.
CHM_INDEX_ENCODING =
# If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag
# controls whether a binary table of contents is generated (YES) or a
# normal table of contents (NO) in the .chm file.
BINARY_TOC = NO
# The TOC_EXPAND flag can be set to YES to add extra items for group members
# to the contents of the HTML help documentation and to the tree view.
TOC_EXPAND = NO
# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and
# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated
# that can be used as input for Qt's qhelpgenerator to generate a
# Qt Compressed Help (.qch) of the generated HTML documentation.
GENERATE_QHP = NO
# If the QHG_LOCATION tag is specified, the QCH_FILE tag can
# be used to specify the file name of the resulting .qch file.
# The path specified is relative to the HTML output folder.
QCH_FILE =
# The QHP_NAMESPACE tag specifies the namespace to use when generating
# Qt Help Project output. For more information please see
# http://doc.trolltech.com/qthelpproject.html#namespace
QHP_NAMESPACE = org.doxygen.Project
# The QHP_VIRTUAL_FOLDER tag specifies the virtual folder to use when generating
# Qt Help Project output. For more information please see
# http://doc.trolltech.com/qthelpproject.html#virtual-folders
QHP_VIRTUAL_FOLDER = doc
# If QHP_CUST_FILTER_NAME is set, it specifies the name of a custom filter to
# add. For more information please see
# http://doc.trolltech.com/qthelpproject.html#custom-filters
QHP_CUST_FILTER_NAME =
# The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the
# custom filter to add. For more information please see
# Qt Help Project / Custom Filters:
# http://doc.trolltech.com/qthelpproject.html#custom-filters
QHP_CUST_FILTER_ATTRS =
# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this
# project's filter section matches. See Qt Help Project / Filter Attributes.
QHP_SECT_FILTER_ATTRS =
# If the GENERATE_QHP tag is set to YES, the QHG_LOCATION tag can
# be used to specify the location of Qt's qhelpgenerator.
# If non-empty doxygen will try to run qhelpgenerator on the generated
# .qhp file.
QHG_LOCATION =
# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files
# will be generated, which together with the HTML files, form an Eclipse help
# plugin. To install this plugin and make it available under the help contents
# menu in Eclipse, the contents of the directory containing the HTML and XML
# files needs to be copied into the plugins directory of eclipse. The name of
# the directory within the plugins directory should be the same as
# the ECLIPSE_DOC_ID value. After copying Eclipse needs to be restarted before
# the help appears.
GENERATE_ECLIPSEHELP = NO
# A unique identifier for the eclipse help plugin. When installing the plugin
# the directory name containing the HTML and XML files should also have
# this name.
ECLIPSE_DOC_ID = org.doxygen.Project
# The DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs)
# at top of each HTML page. The value NO (the default) enables the index and
# the value YES disables it. Since the tabs have the same information as the
# navigation tree you can set this option to YES if you already set
# GENERATE_TREEVIEW to YES.
DISABLE_INDEX = NO
# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index
# structure should be generated to display hierarchical information.
# If the tag value is set to YES, a side panel will be generated
# containing a tree-like index structure (just like the one that
# is generated for HTML Help). For this to work a browser that supports
# JavaScript, DHTML, CSS and frames is required (i.e. any modern browser).
# Windows users are probably better off using the HTML help feature.
# Since the tree basically has the same information as the tab index you
# could consider setting DISABLE_INDEX to YES when enabling this option.
GENERATE_TREEVIEW = NO
# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values
# (range [0,1..20]) that doxygen will group on one line in the generated HTML
# documentation. Note that a value of 0 will completely suppress the enum
# values from appearing in the overview section.
ENUM_VALUES_PER_LINE = 4
# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be
# used to set the initial width (in pixels) of the frame in which the tree
# is shown.
TREEVIEW_WIDTH = 250
# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open
# links to external symbols imported via tag files in a separate window.
EXT_LINKS_IN_WINDOW = NO
# Use this tag to change the font size of LaTeX formulas included
# as images in the HTML documentation. The default is 10. Note that
# when you change the font size after a successful doxygen run you need
# to manually remove any form_*.png images from the HTML output directory
# to force them to be regenerated.
FORMULA_FONTSIZE = 10
# Use the FORMULA_TRANSPARENT tag to determine whether or not the images
# generated for formulas are transparent PNGs. Transparent PNGs are
# not supported properly for IE 6.0, but are supported on all modern browsers.
# Note that when changing this option you need to delete any form_*.png files
# in the HTML output before the changes have effect.
FORMULA_TRANSPARENT = YES
# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax
# (see http://www.mathjax.org) which uses client side Javascript for the
# rendering instead of using prerendered bitmaps. Use this if you do not
# have LaTeX installed or if you want formulas to look prettier in the HTML
# output. When enabled you may also need to install MathJax separately and
# configure the path to it using the MATHJAX_RELPATH option.
USE_MATHJAX = NO
# When MathJax is enabled you can set the default output format to be used for
# the MathJax output. Supported types are HTML-CSS, NativeMML (i.e. MathML) and
# SVG. The default value is HTML-CSS, which is slower, but has the best
# compatibility.
MATHJAX_FORMAT = HTML-CSS
# When MathJax is enabled you need to specify the location relative to the
# HTML output directory using the MATHJAX_RELPATH option. The destination
# directory should contain the MathJax.js script. For instance, if the mathjax
# directory is located at the same level as the HTML output directory, then
# MATHJAX_RELPATH should be ../mathjax. The default value points to
# the MathJax Content Delivery Network so you can quickly see the result without
# installing MathJax. However, it is strongly recommended to install a local
# copy of MathJax from http://www.mathjax.org before deployment.
MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest
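# A minimal sketch of a local MathJax setup, assuming a copy of MathJax has
# been unpacked at the same level as the HTML output directory, as described
# above. The ../mathjax location is the example path from the comment, not a
# setting used by this project:
#   USE_MATHJAX     = YES
#   MATHJAX_RELPATH = ../mathjax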
# The MATHJAX_EXTENSIONS tag can be used to specify one or more MathJax extension
# names that should be enabled during MathJax rendering.
MATHJAX_EXTENSIONS =
# When the SEARCHENGINE tag is enabled doxygen will generate a search box
# for the HTML output. The underlying search engine uses javascript
# and DHTML and should work on any modern browser. Note that when using
# HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets
# (GENERATE_DOCSET) there is already a search function so this one should
# typically be disabled. For large projects the javascript based search engine
# can be slow; in that case enabling SERVER_BASED_SEARCH may provide a better
# solution.
SEARCHENGINE = YES
# When the SERVER_BASED_SEARCH tag is enabled the search engine will be
# implemented using a web server instead of a web client using Javascript.
# There are two flavours of web server based search depending on the
# EXTERNAL_SEARCH setting. When disabled, doxygen will generate a PHP script for
# searching and an index file used by the script. When EXTERNAL_SEARCH is
# enabled the indexing and searching needs to be provided by external tools.
# See the manual for details.
SERVER_BASED_SEARCH = NO
# When EXTERNAL_SEARCH is enabled doxygen will no longer generate the PHP
# script for searching. Instead the search results are written to an XML file
# which needs to be processed by an external indexer. Doxygen will invoke an
# external search engine pointed to by the SEARCHENGINE_URL option to obtain
# the search results. Doxygen ships with an example indexer (doxyindexer) and
# search engine (doxysearch.cgi) which are based on the open source search engine
# library Xapian. See the manual for configuration details.
EXTERNAL_SEARCH = NO
# The SEARCHENGINE_URL should point to a search engine hosted by a web server
# which will return the search results when EXTERNAL_SEARCH is enabled.
# Doxygen ships with an example search engine (doxysearch) which is based on
# the open source search engine library Xapian. See the manual for configuration
# details.
SEARCHENGINE_URL =
# When SERVER_BASED_SEARCH and EXTERNAL_SEARCH are both enabled the unindexed
# search data is written to a file for indexing by an external tool. With the
# SEARCHDATA_FILE tag the name of this file can be specified.
SEARCHDATA_FILE = searchdata.xml
# When SERVER_BASED_SEARCH and EXTERNAL_SEARCH are both enabled the
# EXTERNAL_SEARCH_ID tag can be used as an identifier for the project. This is
# useful in combination with EXTRA_SEARCH_MAPPINGS to search through multiple
# projects and redirect the results back to the right project.
EXTERNAL_SEARCH_ID =
# The EXTRA_SEARCH_MAPPINGS tag can be used to enable searching through doxygen
# projects other than the one defined by this configuration file, but that are
# all added to the same external search index. Each project needs to have a
# unique id set via EXTERNAL_SEARCH_ID. The search mapping then maps the id
# to a relative location where the documentation can be found.
# The format is: EXTRA_SEARCH_MAPPINGS = id1=loc1 id2=loc2 ...
EXTRA_SEARCH_MAPPINGS =
#---------------------------------------------------------------------------
# configuration options related to the LaTeX output
#---------------------------------------------------------------------------
# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will
# generate Latex output.
GENERATE_LATEX = NO
# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `latex' will be used as the default path.
LATEX_OUTPUT = latex
# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be
# invoked. If left blank `latex' will be used as the default command name.
# Note that when enabling USE_PDFLATEX this option is only used for
# generating bitmaps for formulas in the HTML output, but not in the
# Makefile that is written to the output directory.
LATEX_CMD_NAME = latex
# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to
# generate index for LaTeX. If left blank `makeindex' will be used as the
# default command name.
MAKEINDEX_CMD_NAME = makeindex
# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact
# LaTeX documents. This may be useful for small projects and may help to
# save some trees in general.
COMPACT_LATEX = NO
# The PAPER_TYPE tag can be used to set the paper type that is used
# by the printer. Possible values are: a4, letter, legal and
# executive. If left blank a4wide will be used.
PAPER_TYPE = a4
# The EXTRA_PACKAGES tag can be used to specify one or more names of LaTeX
# packages that should be included in the LaTeX output.
EXTRA_PACKAGES =
# The LATEX_HEADER tag can be used to specify a personal LaTeX header for
# the generated latex document. The header should contain everything until
# the first chapter. If it is left blank doxygen will generate a
# standard header. Notice: only use this tag if you know what you are doing!
LATEX_HEADER =
# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for
# the generated latex document. The footer should contain everything after
# the last chapter. If it is left blank doxygen will generate a
# standard footer. Notice: only use this tag if you know what you are doing!
LATEX_FOOTER =
# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated
# is prepared for conversion to pdf (using ps2pdf). The pdf file will
# contain links (just like the HTML output) instead of page references.
# This makes the output suitable for online browsing using a pdf viewer.
PDF_HYPERLINKS = YES
# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of
# plain latex in the generated Makefile. Set this option to YES to get a
# higher quality PDF documentation.
USE_PDFLATEX = YES
# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \batchmode
# command to the generated LaTeX files. This will instruct LaTeX to keep
# running if errors occur, instead of asking the user for help.
# This option is also used when generating formulas in HTML.
LATEX_BATCHMODE = NO
# If LATEX_HIDE_INDICES is set to YES then doxygen will not
# include the index chapters (such as File Index, Compound Index, etc.)
# in the output.
LATEX_HIDE_INDICES = NO
# If LATEX_SOURCE_CODE is set to YES then doxygen will include
# source code with syntax highlighting in the LaTeX output.
# Note that which sources are shown also depends on other settings
# such as SOURCE_BROWSER.
LATEX_SOURCE_CODE = NO
# The LATEX_BIB_STYLE tag can be used to specify the style to use for the
# bibliography, e.g. plainnat, or ieeetr. The default style is "plain". See
# http://en.wikipedia.org/wiki/BibTeX for more info.
LATEX_BIB_STYLE = plain
#---------------------------------------------------------------------------
# configuration options related to the RTF output
#---------------------------------------------------------------------------
# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output.
# The RTF output is optimized for Word 97 and may not look very pretty with
# other RTF readers or editors.
GENERATE_RTF = NO
# The RTF_OUTPUT tag is used to specify where the RTF docs will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `rtf' will be used as the default path.
RTF_OUTPUT = rtf
# If the COMPACT_RTF tag is set to YES Doxygen generates more compact
# RTF documents. This may be useful for small projects and may help to
# save some trees in general.
COMPACT_RTF = NO
# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated
# will contain hyperlink fields. The RTF file will
# contain links (just like the HTML output) instead of page references.
# This makes the output suitable for online browsing using WORD or other
# programs which support those fields.
# Note: wordpad (write) and others do not support links.
RTF_HYPERLINKS = NO
# Load style sheet definitions from file. Syntax is similar to doxygen's
# config file, i.e. a series of assignments. You only have to provide
# replacements, missing definitions are set to their default value.
RTF_STYLESHEET_FILE =
# Set optional variables used in the generation of an rtf document.
# Syntax is similar to doxygen's config file.
RTF_EXTENSIONS_FILE =
#---------------------------------------------------------------------------
# configuration options related to the man page output
#---------------------------------------------------------------------------
# If the GENERATE_MAN tag is set to YES (the default) Doxygen will
# generate man pages.
GENERATE_MAN = NO
# The MAN_OUTPUT tag is used to specify where the man pages will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `man' will be used as the default path.
MAN_OUTPUT = man
# The MAN_EXTENSION tag determines the extension that is added to
# the generated man pages (default is the subroutine's section .3)
MAN_EXTENSION = .3
# If the MAN_LINKS tag is set to YES and Doxygen generates man output,
# then it will generate one additional man file for each entity
# documented in the real man page(s). These additional files
# only source the real man page, but without them the man command
# would be unable to find the correct page. The default is NO.
MAN_LINKS = NO
#---------------------------------------------------------------------------
# configuration options related to the XML output
#---------------------------------------------------------------------------
# If the GENERATE_XML tag is set to YES Doxygen will
# generate an XML file that captures the structure of
# the code including all documentation.
GENERATE_XML = NO
# The XML_OUTPUT tag is used to specify where the XML pages will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `xml' will be used as the default path.
XML_OUTPUT = xml
# The XML_SCHEMA tag can be used to specify an XML schema,
# which can be used by a validating XML parser to check the
# syntax of the XML files.
XML_SCHEMA =
# The XML_DTD tag can be used to specify an XML DTD,
# which can be used by a validating XML parser to check the
# syntax of the XML files.
XML_DTD =
# If the XML_PROGRAMLISTING tag is set to YES Doxygen will
# dump the program listings (including syntax highlighting
# and cross-referencing information) to the XML output. Note that
# enabling this will significantly increase the size of the XML output.
XML_PROGRAMLISTING = YES
#---------------------------------------------------------------------------
# configuration options for the AutoGen Definitions output
#---------------------------------------------------------------------------
# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will
# generate an AutoGen Definitions (see autogen.sf.net) file
# that captures the structure of the code including all
# documentation. Note that this feature is still experimental
# and incomplete at the moment.
GENERATE_AUTOGEN_DEF = NO
#---------------------------------------------------------------------------
# configuration options related to the Perl module output
#---------------------------------------------------------------------------
# If the GENERATE_PERLMOD tag is set to YES Doxygen will
# generate a Perl module file that captures the structure of
# the code including all documentation. Note that this
# feature is still experimental and incomplete at the
# moment.
GENERATE_PERLMOD = NO
# If the PERLMOD_LATEX tag is set to YES Doxygen will generate
# the necessary Makefile rules, Perl scripts and LaTeX code to be able
# to generate PDF and DVI output from the Perl module output.
PERLMOD_LATEX = NO
# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be
# nicely formatted so it can be parsed by a human reader. This is useful
# if you want to understand what is going on. On the other hand, if this
# tag is set to NO the size of the Perl module output will be much smaller
# and Perl will parse it just the same.
PERLMOD_PRETTY = YES
# The names of the make variables in the generated doxyrules.make file
# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX.
# This is useful so different doxyrules.make files included by the same
# Makefile don't overwrite each other's variables.
PERLMOD_MAKEVAR_PREFIX =
#---------------------------------------------------------------------------
# Configuration options related to the preprocessor
#---------------------------------------------------------------------------
# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will
# evaluate all C-preprocessor directives found in the sources and include
# files.
ENABLE_PREPROCESSING = YES
# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro
# names in the source code. If set to NO (the default) only conditional
# compilation will be performed. Macro expansion can be done in a controlled
# way by setting EXPAND_ONLY_PREDEF to YES.
MACRO_EXPANSION = NO
# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES
# then the macro expansion is limited to the macros specified with the
# PREDEFINED and EXPAND_AS_DEFINED tags.
EXPAND_ONLY_PREDEF = NO
# If the SEARCH_INCLUDES tag is set to YES (the default) the include files
# pointed to by INCLUDE_PATH will be searched when a #include is found.
SEARCH_INCLUDES = YES
# The INCLUDE_PATH tag can be used to specify one or more directories that
# contain include files that are not input files but should be processed by
# the preprocessor.
INCLUDE_PATH =
# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard
# patterns (like *.h and *.hpp) to filter out the header-files in the
# directories. If left blank, the patterns specified with FILE_PATTERNS will
# be used.
INCLUDE_FILE_PATTERNS =
# The PREDEFINED tag can be used to specify one or more macro names that
# are defined before the preprocessor is started (similar to the -D option of
# gcc). The argument of the tag is a list of macros of the form: name
# or name=definition (no spaces). If the definition and the = are
# omitted =1 is assumed. To prevent a macro definition from being
# undefined via #undef or recursively expanded use the := operator
# instead of the = operator.
PREDEFINED =
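# Illustrative sketch only, showing the three forms described above. The macro
# names are hypothetical and are not defined by this project:
#   PREDEFINED = HAVE_FEATURE_X \
#                API_VERSION=2 \
#                FIXED_MACRO:=1
# Here HAVE_FEATURE_X is treated as HAVE_FEATURE_X=1, API_VERSION expands to 2,
# and FIXED_MACRO uses := so that a later #undef in the sources does not
# remove it.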
# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then
# this tag can be used to specify a list of macro names that should be expanded.
# The macro definition that is found in the sources will be used.
# Use the PREDEFINED tag if you want to use a different macro definition that
# overrules the definition found in the source code.
EXPAND_AS_DEFINED =
# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then
# doxygen's preprocessor will remove all references to function-like macros
# that are alone on a line, have an all uppercase name, and do not end with a
# semicolon, because these will confuse the parser if not removed.
SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration::additions related to external references
#---------------------------------------------------------------------------
# The TAGFILES option can be used to specify one or more tagfiles. For each
# tag file the location of the external documentation should be added. The
# format of a tag file without this location is as follows:
#
# TAGFILES = file1 file2 ...
# Adding location for the tag files is done as follows:
#
# TAGFILES = file1=loc1 "file2 = loc2" ...
# where "loc1" and "loc2" can be relative or absolute paths
# or URLs. Note that each tag file must have a unique name (where the name does
# NOT include the path). If a tag file is not located in the directory in which
# doxygen is run, you must also specify the path to the tagfile here.
TAGFILES =
# When a file name is specified after GENERATE_TAGFILE, doxygen will create
# a tag file that is based on the input files it reads.
GENERATE_TAGFILE =
# If the ALLEXTERNALS tag is set to YES all external classes will be listed
# in the class index. If set to NO only the inherited external classes
# will be listed.
ALLEXTERNALS = NO
# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed
# in the modules index. If set to NO, only the current project's groups will
# be listed.
EXTERNAL_GROUPS = YES
# The PERL_PATH should be the absolute path and name of the perl script
# interpreter (i.e. the result of `which perl').
PERL_PATH = /usr/bin/perl
#---------------------------------------------------------------------------
# Configuration options related to the dot tool
#---------------------------------------------------------------------------
# If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will
# generate an inheritance diagram (in HTML, RTF and LaTeX) for classes with base
# or super classes. Setting the tag to NO turns the diagrams off. Note that
# this option also works with HAVE_DOT disabled, but it is recommended to
# install and use dot, since it yields more powerful graphs.
CLASS_DIAGRAMS = YES
# You can define message sequence charts within doxygen comments using the \msc
# command. Doxygen will then run the mscgen tool (see
# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the
# documentation. The MSCGEN_PATH tag allows you to specify the directory where
# the mscgen tool resides. If left empty the tool is assumed to be found in the
# default search path.
MSCGEN_PATH =
# If set to YES, the inheritance and collaboration graphs will hide
# inheritance and usage relations if the target is undocumented
# or is not a class.
HIDE_UNDOC_RELATIONS = YES
# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is
# available from the path. This tool is part of Graphviz, a graph visualization
# toolkit from AT&T and Lucent Bell Labs. The other options in this section
# have no effect if this option is set to NO (the default)
HAVE_DOT = NO
# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is
# allowed to run in parallel. When set to 0 (the default) doxygen will
# base this on the number of processors available in the system. You can set it
# explicitly to a value larger than 0 to get control over the balance
# between CPU load and processing speed.
DOT_NUM_THREADS = 0
# By default doxygen will use the Helvetica font for all dot files that
# doxygen generates. When you want a differently looking font you can specify
# the font name using DOT_FONTNAME. You need to make sure dot is able to find
# the font, which can be done by putting it in a standard location or by setting
# the DOTFONTPATH environment variable or by setting DOT_FONTPATH to the
# directory containing the font.
DOT_FONTNAME = Helvetica
# The DOT_FONTSIZE tag can be used to set the size of the font of dot graphs.
# The default size is 10pt.
DOT_FONTSIZE = 10
# By default doxygen will tell dot to use the Helvetica font.
# If you specify a different font using DOT_FONTNAME you can use DOT_FONTPATH to
# set the path where dot can find it.
DOT_FONTPATH =
# If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen
# will generate a graph for each documented class showing the direct and
# indirect inheritance relations. Setting this tag to YES will force the
# CLASS_DIAGRAMS tag to NO.
CLASS_GRAPH = YES
# If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen
# will generate a graph for each documented class showing the direct and
# indirect implementation dependencies (inheritance, containment, and
# class references variables) of the class with other documented classes.
COLLABORATION_GRAPH = YES
# If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen
# will generate a graph for groups, showing the direct group dependencies.
GROUP_GRAPHS = YES
# If the UML_LOOK tag is set to YES doxygen will generate inheritance and
# collaboration diagrams in a style similar to the OMG's Unified Modeling
# Language.
UML_LOOK = NO
# If the UML_LOOK tag is enabled, the fields and methods are shown inside
# the class node. If there are many fields or methods and many nodes the
# graph may become too big to be useful. The UML_LIMIT_NUM_FIELDS
# threshold limits the number of items for each type to make the size more
# manageable. Set this to 0 for no limit. Note that the threshold may be
# exceeded by 50% before the limit is enforced.
UML_LIMIT_NUM_FIELDS = 10
# If set to YES, the inheritance and collaboration graphs will show the
# relations between templates and their instances.
TEMPLATE_RELATIONS = NO
# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT
# tags are set to YES then doxygen will generate a graph for each documented
# file showing the direct and indirect include dependencies of the file with
# other documented files.
INCLUDE_GRAPH = YES
# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and
# HAVE_DOT tags are set to YES then doxygen will generate a graph for each
# documented header file showing the documented files that directly or
# indirectly include this file.
INCLUDED_BY_GRAPH = YES
# If the CALL_GRAPH and HAVE_DOT options are set to YES then
# doxygen will generate a call dependency graph for every global function
# or class method. Note that enabling this option will significantly increase
# the time of a run. So in most cases it will be better to enable call graphs
# for selected functions only using the \callgraph command.
CALL_GRAPH = NO
# If the CALLER_GRAPH and HAVE_DOT tags are set to YES then
# doxygen will generate a caller dependency graph for every global function
# or class method. Note that enabling this option will significantly increase
# the time of a run. So in most cases it will be better to enable caller
# graphs for selected functions only using the \callergraph command.
CALLER_GRAPH = NO
# If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen
# will generate a graphical hierarchy of all classes instead of a textual one.
GRAPHICAL_HIERARCHY = YES
# If the DIRECTORY_GRAPH and HAVE_DOT tags are set to YES
# then doxygen will show the dependencies a directory has on other directories
# in a graphical way. The dependency relations are determined by the #include
# relations between the files in the directories.
DIRECTORY_GRAPH = YES
# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images
# generated by dot. Possible values are svg, png, jpg, or gif.
# If left blank png will be used. If you choose svg you need to set
# HTML_FILE_EXTENSION to xhtml in order to make the SVG files
# visible in IE 9+ (other browsers do not have this requirement).
DOT_IMAGE_FORMAT = png
# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to
# enable generation of interactive SVG images that allow zooming and panning.
# Note that this requires a modern browser other than Internet Explorer.
# Tested and working are Firefox, Chrome, Safari, and Opera. For IE 9+ you
# need to set HTML_FILE_EXTENSION to xhtml in order to make the SVG files
# visible. Older versions of IE do not have SVG support.
INTERACTIVE_SVG = NO
# The tag DOT_PATH can be used to specify the path where the dot tool can be
# found. If left blank, it is assumed the dot tool can be found in the path.
DOT_PATH =
# The DOTFILE_DIRS tag can be used to specify one or more directories that
# contain dot files that are included in the documentation (see the
# \dotfile command).
DOTFILE_DIRS =
# The MSCFILE_DIRS tag can be used to specify one or more directories that
# contain msc files that are included in the documentation (see the
# \mscfile command).
MSCFILE_DIRS =
# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of
# nodes that will be shown in the graph. If the number of nodes in a graph
# becomes larger than this value, doxygen will truncate the graph, which is
# visualized by representing a node as a red box. Note that if the
# number of direct children of the root node in a graph is already larger than
# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. Also note
# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH.
DOT_GRAPH_MAX_NODES = 50
# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the
# graphs generated by dot. A depth value of 3 means that only nodes reachable
# from the root by following a path via at most 3 edges will be shown. Nodes
# that lie further from the root node will be omitted. Note that setting this
# option to 1 or 2 may greatly reduce the computation time needed for large
# code bases. Also note that the size of a graph can be further restricted by
# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction.
MAX_DOT_GRAPH_DEPTH = 0
# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent
# background. This is disabled by default, because dot on Windows does not
# seem to support this out of the box. Warning: Depending on the platform used,
# enabling this option may lead to badly anti-aliased labels on the edges of
# a graph (i.e. they become hard to read).
DOT_TRANSPARENT = NO
# Set the DOT_MULTI_TARGETS tag to YES to allow dot to generate multiple output
# files in one run (i.e. multiple -o and -T options on the command line). This
# makes dot run faster, but since only newer versions of dot (>1.8.10)
# support this, this feature is disabled by default.
DOT_MULTI_TARGETS = NO
# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will
# generate a legend page explaining the meaning of the various boxes and
# arrows in the dot generated graphs.
GENERATE_LEGEND = YES
# If the DOT_CLEANUP tag is set to YES (the default) Doxygen will
# remove the intermediate dot files that are used to generate
# the various graphs.
DOT_CLEANUP = YES
opencc-0.4.3/doc/opencc.1 000640 567316 013202 00000001326 12145345503 016404 0 ustar 00carbokuo nonconf 000000 000000 .TH OPENCC "1" "June 2010" "opencc " "User Commands"
.SH NAME
opencc \- Simplified-Traditional Chinese conversion tool
.SH DESCRIPTION
Open Chinese Convert (OpenCC) Command Line Tool
.SS "Usage:"
.HP
opencc [\-i input_file] [\-o output_file] [\-c config_file]
.HP
\fB\-i\fR
Read original text from input_file.
.HP
\fB\-o\fR
Write converted text to output_file.
.HP
\fB\-c\fR
Load dictionary configuration from config_file.
.IP
Note:
.IP
Text is read from standard input if input_file is not set, and written to standard output if output_file is not set.
.IP
The default configuration (zhs2zht.ini) will be loaded if config_file is not set.
.PP
Open Chinese Convert (OpenCC) Command Line Tool
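.SH EXAMPLES
The following invocations are illustrative sketches based on the options described above; input.txt and output.txt are placeholder file names.
.PP
Convert a file using the default configuration:
.PP
.nf
opencc \-i input.txt \-o output.txt
.fi
.PP
Name the configuration file explicitly:
.PP
.nf
opencc \-i input.txt \-o output.txt \-c zhs2zht.ini
.fi
.PP
Use standard input and standard output by omitting \-i and \-o:
.PP
.nf
opencc < input.txt > output.txt
.fi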
.SH "SEE ALSO"
.BR iconv (1)
opencc-0.4.3/doc/CMakeLists.txt 000640 567316 013202 00000001565 12145345503 017620 0 ustar 00carbokuo nonconf 000000 000000
install(
FILES
opencc.1
opencc_dict.1
DESTINATION
${DIR_SHARE}/man/man1
)
if(BUILD_DOCUMENTATION)
find_package(Doxygen)
if (NOT DOXYGEN_FOUND)
message(
FATAL_ERROR
"Doxygen is needed to build the documentation. Please install it correctly"
)
endif()
configure_file(
opencc.doxy.in
opencc.doxy
@ONLY
IMMEDIATE
)
add_custom_target(
apidoc
ALL
COMMENT
"Building API Documentation"
COMMAND
doxygen ${PROJECT_BINARY_DIR}/doc/opencc.doxy
SOURCES
${PROJECT_BINARY_DIR}/doc/opencc.doxy
)
install(
DIRECTORY
${CMAKE_BINARY_DIR}/doc/html
DESTINATION
${DIR_SHARE_OPENCC}/doc
)
set_directory_properties(
PROPERTIES
ADDITIONAL_MAKE_CLEAN_FILES
"${CMAKE_BINARY_DIR}/doc/html"
)
endif()
opencc-0.4.3/doc/opencc_dict.1 000640 567316 013202 00000000656 12145345503 017414 0 ustar 00carbokuo nonconf 000000 000000
.TH OPENCC_DICT "1" "June 2010" "opencc_dict " "User Commands"
.SH NAME
opencc_dict \- Open Chinese Convert dictionary tool
.SH DESCRIPTION
Open Chinese Convert (OpenCC) Dictionary Tool
.SS "Usage:"
.HP
opencc_dict \fB\-i\fR input_file \fB\-o\fR output_file
.HP
\fB\-i\fR
Read data from input_file.
.HP
\fB\-o\fR
Write converted data to output_file.
.PP
Open Chinese Convert (OpenCC) Dictionary Tool
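.SH EXAMPLES
An illustrative invocation, assuming only the two options described above. Both file names are placeholders, and the expected format of the input file is not covered by this page.
.PP
.nf
opencc_dict \-i dictionary_source \-o dictionary_output
.fi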
.SH "SEE ALSO"
.BR opencc (1)
opencc-0.4.3/node/binding.cc 000640 567316 013202 00000012603 12145345503 017154 0 ustar 00carbokuo nonconf 000000 000000
#include
#include
#include
#include "../src/opencc.h"
using namespace v8;
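// Copies a V8 string into a newly allocated, NUL-terminated UTF-8 buffer.
// The caller owns the buffer and must release it with delete[].
char* ToUtf8String(const Local<String>& str) {
char* utf8 = new char[str->Utf8Length() + 1];
utf8[str->Utf8Length()] = '\0';
str->WriteUtf8(utf8);
return utf8;
}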
class OpenccBinding : public node::ObjectWrap {
struct ConvertRequest {
OpenccBinding* opencc_instance;
char* input;
char* output;
Persistent<Function> callback;
};
public:
explicit OpenccBinding(const char * config_file) {
handler_ = opencc_open(config_file);
}
virtual ~OpenccBinding() {
if (handler_ != (opencc_t) -1)
opencc_close(handler_);
}
operator bool() const {
return handler_ != (opencc_t) -1;
}
// Constructor binding: takes an optional configuration file name and falls
// back to the default Simplified-to-Traditional configuration.
static Handle<Value> New(const Arguments& args) {
HandleScope scope;
OpenccBinding* opencc_instance;
if (args.Length() >= 1 && args[0]->IsString()) {
char* config_file = ToUtf8String(args[0]->ToString());
opencc_instance = new OpenccBinding(config_file);
delete[] config_file;
} else {
const char* config_file = OPENCC_DEFAULT_CONFIG_SIMP_TO_TRAD;
opencc_instance = new OpenccBinding(config_file);
}
if (!*opencc_instance) {
// The instance has not been wrapped yet, so free it here to avoid a leak.
delete opencc_instance;
ThrowException(Exception::Error(
String::New("Can not create opencc instance")));
return scope.Close(Undefined());
}
opencc_instance->Wrap(args.This());
return args.This();
}
// Asynchronous conversion. Expects an input string and a callback function;
// the input is copied to a UTF-8 buffer and the work is queued on the libuv
// thread pool.
static Handle<Value> Convert(const Arguments& args) {
HandleScope scope;
if (args.Length() < 2 || !args[0]->IsString() || !args[1]->IsFunction()) {
ThrowException(Exception::TypeError(String::New("Wrong arguments")));
return scope.Close(Undefined());
}
ConvertRequest* conv_data = new ConvertRequest;
conv_data->opencc_instance = ObjectWrap::Unwrap<OpenccBinding>(args.This());
conv_data->input = ToUtf8String(args[0]->ToString());
conv_data->callback = Persistent<Function>::New(Local<Function>::Cast(args[1]));
uv_work_t* req = new uv_work_t;
req->data = conv_data;
uv_queue_work(uv_default_loop(), req, DoConnect, (uv_after_work_cb)AfterConvert);
return Undefined();
}
// Runs on a libuv worker thread, so it must not touch any V8 objects.
static void DoConnect(uv_work_t* req) {
ConvertRequest* conv_data = static_cast<ConvertRequest*>(req->data);
opencc_t opencc_handler = conv_data->opencc_instance->handler_;
conv_data->output = opencc_convert_utf8(opencc_handler, conv_data->input, (size_t) -1);
}
// Runs back on the main thread once the worker has finished: invokes the
// callback with (undefined, converted string) and releases the request.
static void AfterConvert(uv_work_t* req) {
HandleScope scope;
ConvertRequest* conv_data = static_cast<ConvertRequest*>(req->data);
Local<String> converted = String::New(conv_data->output);
const unsigned argc = 2;
Local<Value> argv[argc] = {
Local<Value>::New(Undefined()),
Local<Value>::New(converted)
};
conv_data->callback->Call(Context::GetCurrent()->Global(), argc, argv);
conv_data->callback.Dispose();
delete[] conv_data->input;
opencc_convert_utf8_free(conv_data->output);
delete conv_data;
delete req;
}
// Synchronous conversion: converts the input string on the calling thread
// and returns the result directly.
static Handle<Value> ConvertSync(const Arguments& args) {
HandleScope scope;
if (args.Length() < 1 || !args[0]->IsString()) {
ThrowException(Exception::TypeError(String::New("Wrong arguments")));
return scope.Close(Undefined());
}
OpenccBinding* opencc_instance = ObjectWrap::Unwrap<OpenccBinding>(args.This());
opencc_t opencc_handler = opencc_instance->handler_;
char* input = ToUtf8String(args[0]->ToString());
char* output = opencc_convert_utf8(opencc_handler, input, (size_t) -1);
Local<String> converted = String::New(output);
delete[] input;
opencc_convert_utf8_free(output);
return scope.Close(converted);
}
// Sets the conversion mode; the argument must be an integer from 0 to 2.
static Handle<Value> SetConversionMode(const Arguments& args) {
HandleScope scope;
if (args.Length() < 1 || !args[0]->IsInt32()) {
ThrowException(Exception::TypeError(String::New("Wrong arguments")));
return scope.Close(Undefined());
}
OpenccBinding* opencc_instance = ObjectWrap::Unwrap<OpenccBinding>(args.This());
opencc_t opencc_handler = opencc_instance->handler_;
int conversion_mode = args[0]->ToInt32()->Value();
if (conversion_mode < 0 || conversion_mode > 2) {
ThrowException(Exception::Error(
String::New("conversion_mode must be between 0 and 2")));
return scope.Close(Undefined());
}
opencc_set_conversion_mode(opencc_handler,
(opencc_conversion_mode) conversion_mode);
return scope.Close(Boolean::New(true));
}
static void init(Handle