== Notes on {kddi,docomo,softbank}-*.ucm mappings. kddi-jisx-208 is a variant of JIS X 208 used by KDDI, a Japanese cell phone carrier. kddi-shift_jis, docomo-shift_jis, and softbank-shift_jis are variants of Shift_JIS used by KDDI, DoCoMo and SoftBank. - kddi-jisx-208 contains Emoji (emoticon) code points in 0x75xx, 0x76xx, 0x77xx, 0x78xx, 0x79xx, 0x7Axx, 0x7Bxx, where xx means 21-7E. - kddi-shift_jis contains Emoji code points in 0xEBxx, 0xECxx, 0xEDxx, and 0xEExx, 0xF3xx, 0xF4xx, 0xF6xx, 0xF7xx, where xx means 40-7E, 80-FC. - docomo-shift_jis contains Emoji code points in 0xF8xx, and 0xF9xx, where xx means 40-7E, 80-FC. - softbank-shift_jis contains Emoji code points in 0xF7xx, 0xF9xx, and 0xFBxx, where xx means 40-7E, 80-FC. - softbank-jisx-208 contains Emoji code points in 0x75xx, 0x76xx, 0x77xx, 0x78xx, 0x79xx, 0x7Axx, 0x7Bxx, 0x7Dxx where xx means 21-7E. == How the -2012.ucm tables were modified in April 2013 The -2012 versions were created by http://code.google.com/p/emoji4unicode/source/browse/trunk/src/gen_conversion_files.py using each of the older 2012 versions as the base table files to avoid non-Emoji changes: # gen_google_ucm.sh icu_mappings=/google/src/cloud/mscherer/icubranch/google_vendor_src_branch/icu/source/data/mappings dest=/home/mscherer/www/no_crawl/emoji ./gen_conversion_files.py $icu_mappings/docomo-shift_jis-2012.ucm cp ../generated/docomo-shift_jis-2012.ucm $dest ./gen_conversion_files.py $icu_mappings/kddi-shift_jis-2012.ucm cp ../generated/kddi-shift_jis-2012.ucm $dest ./gen_conversion_files.py $icu_mappings/softbank-shift_jis-2012.ucm cp ../generated/softbank-shift_jis-2012.ucm $dest ./gen_conversion_files.py The only differences from 2012-sep are in mappings for symbols that have Unicode Variation Selector (VS) sequences. The older tables relied on a hack in the ICU conversion code that ignored the "use fallback" flag for fallbacks from sequences with VS. The new tables rely on a new feature in ICU4C 51: For the relevant symbols that have roundtrip mappings, - the mappings with Emoji Variation Selector use the |0 roundtrip precision - the other mappings (no VS & text VS) use the |4 "good one-way" precision See http://bugs.icu-project.org/trac/ticket/9602 == How the -2012.ucm tables were created in September 2012 The 2012 versions were created by http://code.google.com/p/emoji4unicode/source/browse/trunk/src/gen_conversion_files.py using each of the 2007 versions as the base table files to avoid non-Emoji changes: icu_mappings=~/p4/emoji/google_vendor_src_branch/icu/source/data/mappings dest=~/www/no_crawl/emoji ./gen_conversion_files.py $icu_mappings/docomo-shift_jis-2007.ucm cp ../generated/docomo-shift_jis-2012.ucm $dest ./gen_conversion_files.py $icu_mappings/kddi-shift_jis-2007.ucm cp ../generated/kddi-shift_jis-2012.ucm $dest ./gen_conversion_files.py $icu_mappings/softbank-shift_jis-2007.ucm cp ../generated/softbank-shift_jis-2012.ucm $dest ./gen_conversion_files.py The emoji4unicode code uses the mappings that were established during the Unicode Emoji standardization process. The new conversion tables round-trip carrier Emoji symbol codes to and from Unicode 6 standard code points and also include fallback mappings from the Google PUA code points to the carrier codes. The trailing "|0" etc. on the mapping table lines specify the mapping type: |0 round-trip Unicode <-> charset |1 fallback Unicode -> charset |3 "reverse fallback" Unicode <- charset For details about the .ucm file format see http://userguide.icu-project.org/conversion/data#TOC-.ucm-File-Format == How the -2007.ucm tables were created So far, we haven't obtained "official" conversion tables from the cell phone carriers. However, we empirically know their clients support VDCs in MS932, like U2460 (CIRCLED DIGIT ONE), etc. Hence we use MS932 as the base table for them. kddi-jisx-208-2007.ucm is based on jisx-208.ucm in this directory. The original table's mappings to codes 0x75xx to 0x7Bxx are excluded to avoid collisions with emoji. kddi-shift_jis-2007.ucm is based on windows-932-2000.ucm. The original table's mappings to codes 0xEBxx to 0xEExx, and 0xF0xx to 0xF90xx (EUDC block), are excluded to avoid collisions with emoji. docomo-shift_jis-2007.ucm is based on windows-932-2000.ucm. The original table's mappings to codes 0xF0xx to 0xF90xx (EUDC block) are excluded to avoid collisions with emoji. softbank-shift_jis-2007.ucm is based on windows-932-2000.ucm. The original table's mappings to codes 0xF0xx to 0xF90xx (EUDC block), and 0xFBxx, are excluded to avoid collisions with emoji. softbank-jisx-208-2007.ucm is based on jisx-208.ucm in this directory. The original table's mappings to codes 0x75xx to 0x7Bxx, and 0x7Dxx are excluded to avoid collisions with emoji. == Google Standard Emoji Unicode Mapping The Google standard emoji Unicode mapping can be found at: /home/build/google3/i18n/encodings/emoji/emoji_unicode_mapping.txt TODO(mscherer): Use <icu:base> to share most standard JIS mappings among *-shift_jis-2007.ucm files.