To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs in pre-generated form.
If you need to tweak them or add to them, look at the tools/
subdirectory of the CJKCodecs distribution.



Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

The big5 codec maps the following characters as cp950 does, rather
than following Unicode.org's mapping, which maps them to U+FFFD.

BIG5      Unicode   Description

0xA15A    0x2574    SPACING UNDERSCORE
0xA1C3    0xFFE3    SPACING HEAVY OVERSCORE
0xA1C5    0x02CD    SPACING HEAVY UNDERSCORE
0xA1FE    0xFF0F    LT DIAG UP RIGHT TO LOW LEFT
0xA240    0xFF3C    LT DIAG UP LEFT TO LOW RIGHT
0xA2CC    0x5341    HANGZHOU NUMERAL TEN
0xA2CE    0x5345    HANGZHOU NUMERAL THIRTY

Because U+5341, U+5345, U+FF0F and U+FF3C are already mapped to
other Big5 codes, round-trip compatibility is not guaranteed for
them.
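The round-trip problem can be illustrated with a minimal Python model. It assumes, for illustration only, that the ordinary Big5 code for U+5341 is 0xA451 and that the encoder prefers that code over 0xA2CC; the table names are hypothetical and are not taken from mappings_tw.h.

```python
# Hypothetical excerpt of a Big5 decode table: two Big5 codes that both
# decode to U+5341. Only the 0xA2CC row comes from the table above; the
# 0xA451 row is an assumption made for this sketch.
BIG5_DECODE = {
    0xA2CC: 0x5341,  # HANGZHOU NUMERAL TEN, decoded as cp950 does
    0xA451: 0x5341,  # assumed: the regular Big5 code for the same character
}

# An encoder can map each Unicode code point back to only one Big5 code.
# This sketch assumes it keeps the regular code.
BIG5_ENCODE = {0x5341: 0xA451}


def roundtrips(big5_code: int) -> bool:
    """Return True if decoding and re-encoding reproduces big5_code."""
    return BIG5_ENCODE[BIG5_DECODE[big5_code]] == big5_code


for code in sorted(BIG5_DECODE):
    print(f"0x{code:04X} -> U+{BIG5_DECODE[code]:04X} round-trips: {roundtrips(code)}")
```

Under these assumptions 0xA2CC decodes fine but re-encodes as 0xA451, which is exactly the lost round trip the note describes.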


2) cp932 codec

To conform to Windows' actual mapping, the cp932 codec maps the
following code points in addition to the official cp932 mapping.

CP932     Unicode   Description

0x80      0x80      UNDEFINED
0xA0      0xF8F0    UNDEFINED
0xFD      0xF8F1    UNDEFINED
0xFE      0xF8F2    UNDEFINED
0xFF      0xF8F3    UNDEFINED
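The extra rows form a one-to-one table, so they can be inverted for encoding. The sketch below models only these five rows; `decode_extra` and `encode_extra` are hypothetical helper names, not part of the codec's API. Note that U+F8F0 to U+F8F3 fall in the Unicode Private Use Area.

```python
# The five extra single-byte rows from the table above (byte -> code point).
CP932_EXTRA_DECODE = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}
# Invert the table so the extra code points encode back to one byte.
CP932_EXTRA_ENCODE = {cp: b for b, cp in CP932_EXTRA_DECODE.items()}


def decode_extra(byte: int) -> str:
    """Decode one of the extra single bytes (hypothetical helper)."""
    return chr(CP932_EXTRA_DECODE[byte])


def encode_extra(ch: str) -> bytes:
    """Encode one of the extra code points back to its byte (hypothetical helper)."""
    return bytes([CP932_EXTRA_ENCODE[ord(ch)]])


for b in CP932_EXTRA_DECODE:
    ch = decode_extra(b)
    assert encode_extra(ch) == bytes([b])  # every extra row round-trips
    print(f"0x{b:02X} <-> U+{ord(ch):04X}")
```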


3) euc-jisx0213 codec

The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
U+FF3C instead of U+005C, which is what Unicode.org's mapping uses.
Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and 0x2140
is displayed as a full-width character, mapping it to U+FF3C makes
more sense.
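In EUC form a Plane 1 code is stored by setting the high bit of both JIS bytes, so 0x2140 appears on the wire as 0xA1 0xC0. The sketch below shows that byte layout and the U+FF3C special case; `jis_to_euc` and `decode_plane1_special` are hypothetical names, and the full mapping table is deliberately left out.

```python
def jis_to_euc(jis_code: int) -> bytes:
    """Store a two-byte Plane 1 code in EUC by setting bit 7 of each byte."""
    row, cell = jis_code >> 8, jis_code & 0xFF
    return bytes([row | 0x80, cell | 0x80])


def decode_plane1_special(jis_code: int) -> str:
    """Model only the special case from the note: 0x2140 -> U+FF3C."""
    if jis_code == 0x2140:
        return "\uff3c"  # FULLWIDTH REVERSE SOLIDUS rather than U+005C
    raise KeyError(hex(jis_code))  # full table lookup omitted in this sketch


print(jis_to_euc(0x2140).hex())                 # a1c0
print(hex(ord(decode_plane1_special(0x2140))))  # 0xff3c
```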

The euc-jisx0213 codec can also decode JIS X 0212 codes in the
0x8F-prefixed code set that carries JIS X 0213 Plane 2. Because JIS X
0212 and JIS X 0213 Plane 2 do not overlap, this does not break
conformance to either standard (JIS X 0213 Plane 2 was designed to be
used this way). When encoding, the codec tries to encode kanji
characters in this order:

JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
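The encoding order amounts to a first-match search over the three character sets. This is a minimal sketch with placeholder tables: only the Plane 1 entry for U+4E9C (JIS 0x3021) is a known value, the other entries are illustrative and should be checked against the generated mapping headers, and `encode_kanji` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# Placeholder tables: Unicode code point -> two-byte JIS code.
PLANE1: Dict[int, int] = {0x4E9C: 0x3021}     # known value
PLANE2: Dict[int, int] = {0x20089: 0x2121}    # illustrative entry
JISX0212: Dict[int, int] = {0x4E02: 0x3021}   # illustrative entry

# (EUC prefix, table) pairs in the search order given above. Plane 2 and
# JIS X 0212 share the 0x8F prefix because their codes do not overlap.
ENCODE_ORDER: List[Tuple[bytes, Dict[int, int]]] = [
    (b"", PLANE1),
    (b"\x8f", PLANE2),
    (b"\x8f", JISX0212),
]


def encode_kanji(cp: int) -> Optional[bytes]:
    """Return EUC bytes from the first table containing cp, else None."""
    for prefix, table in ENCODE_ORDER:
        jis = table.get(cp)
        if jis is not None:
            # EUC sets the high bit of both JIS bytes.
            return prefix + bytes([(jis >> 8) | 0x80, (jis & 0xFF) | 0x80])
    return None


print(encode_kanji(0x4E9C))  # b'\xb0\xa1'
print(encode_kanji(0x4E02))  # b'\x8f\xb0\xa1'
print(encode_kanji(0x3042))  # None: not in these toy tables
```

Because the list is scanned in order, a character present in Plane 1 is never emitted from the later sets.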


4) euc-jp codec

For compatibility, the euc-jp codec applies these extra mappings:
- U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP 0xA1C0 (both directions).
- U+00A5 YEN SIGN is mapped to EUC-JP 0x5c (encoding only).
- U+203E OVERLINE is mapped to EUC-JP 0x7e (encoding only).
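The difference between the two-way and one-way rules shows up when a character is encoded and then decoded again. This sketch models only the three rules above (the main JIS X 0208 table is omitted and ASCII passes through); `encode_char` and `decode_seq` are hypothetical names.

```python
# The three compatibility rules above (character -> EUC-JP bytes).
ENCODE_EXTRA = {
    "\uff3c": b"\xa1\xc0",  # FULLWIDTH REVERSE SOLIDUS (both directions)
    "\u00a5": b"\x5c",      # YEN SIGN (encoding only)
    "\u203e": b"\x7e",      # OVERLINE (encoding only)
}
# The decoder knows only the two-way rule; 0x5c and 0x7e stay ASCII.
DECODE_EXTRA = {b"\xa1\xc0": "\uff3c"}


def encode_char(ch: str) -> bytes:
    if ord(ch) < 0x80:
        return ch.encode("ascii")
    return ENCODE_EXTRA[ch]


def decode_seq(seq: bytes) -> str:
    if len(seq) == 1 and seq[0] < 0x80:
        return seq.decode("ascii")
    return DECODE_EXTRA[seq]


for ch in ("\uff3c", "\u00a5", "\u203e"):
    encoded = encode_char(ch)
    back = decode_seq(encoded)
    print(f"U+{ord(ch):04X} -> {encoded.hex()} -> U+{ord(back):04X}")
```

Only U+FF3C survives the trip; YEN SIGN comes back as U+005C and OVERLINE as U+007E.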


5) shift-jis codec

For compatibility, the shift-jis codec maps the 0x20-0x7e range
directly to U+0020-U+007E instead of using JIS X 0201. The differences are:
- U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
- U+007E TILDE is mapped to SHIFT-JIS 0x7e.
- U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 0x815f.
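The value 0x815f is what the usual JIS X 0208 to Shift_JIS arithmetic gives for JIS code 0x2140, the same code the euc-jp codec stores as 0xA1C0. A minimal sketch of that conversion follows; `jis_to_sjis` is a hypothetical name, not a function exported by the codec.

```python
def jis_to_sjis(jis_code: int) -> int:
    """Convert a two-byte JIS X 0208 code (0x2121-0x7E7E) to Shift_JIS."""
    j1, j2 = jis_code >> 8, jis_code & 0xFF
    # Two consecutive JIS rows share one Shift_JIS lead byte.
    s1 = (j1 - 0x21) // 2 + 0x81
    if s1 > 0x9F:
        s1 += 0x40  # skip 0xA0-0xDF, which holds single-byte katakana
    if j1 % 2 == 1:
        # Odd rows use trail bytes 0x40-0x9E, skipping 0x7F.
        s2 = j2 - 0x21 + 0x40
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        s2 = j2 - 0x21 + 0x9F
    return (s1 << 8) | s2


print(hex(jis_to_sjis(0x2140)))  # 0x815f
```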