Name: icu
URL: http://site.icu-project.org/
Version: 4.6
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 4.6 for C/C++

1. It was obtained with the following:

    $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46

2. Platform header files for Linux, FreeBSD, OpenBSD, Android, Mac OS X, and QNX:

   - Apply platform.patch in the patches directory. It applies the upstream
     patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
     and changes source/common/unicode/ptypes.h to refer to plinux.h and
     pmac.h generated below.

   - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
     'runConfigureICU MacOSX' are run to generate
     source/common/unicode/platform.h.

   - On OpenBSD, source/common/unicode/platform.h is generated
     by the icu4c port in the ports directory, not by runConfigureICU.
     If the file has to be updated, run:
     cd /home/ports/textproc/icu4c && make configure

   - Rename the generated platform.h to 'plinux.h', 'pfreebsd.h',
     'popenbsd.h', or 'pmac.h', depending on the platform.

   - Apply patches/pmach.h.patch on Mac to pmac.h

   - On Android, the pandroid.h was generated by copying plinux.h to
     pandroid.h and applying the patches/pandroid.h.patch.

   - For QNX, the pqnx.h was generated by copying plinux.h to
     pqnx.h and applying the patches/platform.qnx.patch.

   - For NaCl (icu_nacl.gypi), the pnacl.h was generated by copying plinux.h to
     pnacl.h and applying the patches/pnacl.h.patch.

   - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h

3. The following directories were removed because they're not used by Chromium
   at the moment:
   as_is
   packaging
   source/extra
   source/sample
   source/layout
   source/layoutex
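
   The removal in step 3 can be scripted. The following is a minimal sketch,
   not the procedure that was actually used; the function name and its
   argument (the root of the exported tree, e.g. icu46) are illustrative.

```shell
# Sketch: remove the ICU subdirectories that Chromium does not use.
# remove_unused_dirs and its argument are hypothetical names.
remove_unused_dirs() {
    icu_root=$1
    # Refuse to run with an empty root so we never touch /as_is etc.
    [ -n "$icu_root" ] || return 1
    for dir in as_is packaging source/extra source/sample \
               source/layout source/layoutex; do
        # rm -rf succeeds even if the directory is already gone.
        rm -rf "$icu_root/$dir"
    done
}
```

   Example: remove_unused_dirs icu46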


4. The word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patch and cjdict.txt.

   - patches/segmentation.patch :
       Adds a dictionary (word-frequency)-based word breaking for CJK
       (Korean is supported in the code, but it does not do anything
        because we don't have a Korean word-list.)

   - source/data/brkitr/cjdict.txt :
       Chinese and Japanese word frequency list.
       See the file for license/copyright notice

   - source/data/brkitr/cc_edict.txt :
       The list of words derived from CC-Edict.

   - patches/brkitr.patch
     * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
                  handling of U+0022, and splitting of FQDN into labels at '.'.
                  For Hebrew, see http://unicode.org/cldr/track/ticket/3120
     * line.txt : Incorporated line_he and minor changes in CL, OP and ID
                  definitions.
                  For Hebrew, see http://unicode.org/cldr/track/ticket/4004
                  For others, see http://unicode.org/cldr/track/ticket/3974
                                  http://unicode.org/cldr/track/ticket/4200
                                  http://unicode.org/cldr/track/ticket/
     * brklocal.mk : build file changes to drop unnecessary brkitr rule
                     files (e.g. word_ja.txt, line_he.txt)

   - android/brkitr.patch (to be applied for Android build only) :
       Reverts some changes about Chinese/Japanese segmentation rules in
       patches/brkitr.patch to reduce binary size for Android.

   If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt
   to source/test/testdata/cjdict-truncated.txt to pass TestTrieWithValue test.
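
   The copy needed for the ICU tests can be sketched as below. The function
   name and its argument (the root of the icu tree) are assumptions made for
   illustration; only the two file paths come from the text above.

```shell
# Sketch: stage the dictionary under the name the ICU test expects.
# stage_cjdict_for_tests is a hypothetical helper, not part of ICU.
stage_cjdict_for_tests() {
    icu_root=$1
    cp "$icu_root/source/data/brkitr/cjdict.txt" \
       "$icu_root/source/test/testdata/cjdict-truncated.txt"
}
```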

5. Converter changes : converters.patch
  - Include what we really need. See source/data/mappings/ucmlocal.txt
  - Alias and mapping changes : source/data/mappings/convrtrs.txt
  - Changes several tables and add six new tables, three of which
    are 'fake' tables for ISO-2022-CN(-Ext).
  - ucnv2022.c is modified to use 3 'fake' tables added above for
    ISO-2022-CN(-Ext).

6. Locale changes
  - patches/locale1.patch :
      Filipino, Amharic, and Swahili locales
      exemplar character set changes for CJK + 9 Indian locales
      Minor fixes for Danish, , Turkish, and Korean.

  - patches/locale2.patch :
      The minimum locale data Chrome needs for 47 languages Chrome is
      not localized to. Each locale data file has ExemplarCharacters,
      LocaleScript, layout, and the name of the language for a locale
      in its native language.

  - patches/locale3.patch : Locale build configuration files. It
    adds reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
    source/data/{coll,curr,lang,locale,region,translit,zone,rbnf,sprep}.

  - In source/data/region, run the following command to get rid of numeric region
    display names we don't use (everything other than 419).
     $ sed -i  '/[0-35-9][0-9][0-9]{/ d' *.txt

  - android/patch_locale.sh (to be run for Android build only):
      Makes changes to source/data/{curr,region,lang} to exclude these data
      except the language and script names of zh_Hans and zh_Hant.
 
  - Add tg.txt to source/data/locale and source/data/lang to add the minimal
    locale data necessary for the spellchecker. In both directories, add
    tg.txt to reslocal.mk.
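
  To preview what the sed expression used in source/data/region deletes, the
  same expression can be run without -i so that the input file is left
  untouched. This is only a sketch; the function name is illustrative.

```shell
# Sketch: print a region data file without the numeric region display
# names matched by the expression from step 6. Writes to stdout only.
# filter_numeric_regions is a hypothetical helper name.
filter_numeric_regions() {
    # Deletes lines containing three digits followed by '{' when the
    # first digit is 0-3 or 5-9.
    sed '/[0-35-9][0-9][0-9]{/ d' "$1"
}
```

  Because the first digit class omits 4, the entry for 419 survives; any
  other code beginning with 4 would be kept as well.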

7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

  - patches/unihan.patch:
    Unihan collation tables are never used in Chrome/WebKit, but they take
    about 1MB in the uncompressed ICU data file in ICU 4.2.1.

8. Timezone data update
  - Grab the latest version of the following timezone data files and
    put them in source/data/misc.

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

   As of Mar 2014, the latest version is 2014a and the above files
   are available at
   http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2014a/44/
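
   The download locations of the four files can be derived from the base URL
   above. The sketch below only prints the URLs; the function and parameter
   names are illustrative, and the values 2014a and 44 are taken from the URL
   quoted above rather than from any rule about how future releases are laid
   out.

```shell
# Sketch: list the URLs of the four time zone data files.
# tzdata_urls, tz_version and subdir are hypothetical names.
tzdata_urls() {
    tz_version=$1   # e.g. 2014a
    subdir=$2       # e.g. 44, the last path component of the URL above
    base=http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew
    for f in metaZones.txt timezoneTypes.txt windowsZones.txt zoneinfo64.txt; do
        printf '%s/%s/%s/%s\n' "$base" "$tz_version" "$subdir" "$f"
    done
}
```

   Example: tzdata_urls 2014a 44 prints one URL per file; each can then be
   fetched into source/data/misc with the download tool of your choice.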

9. Transliterator customization

   - Add el_Upper.txt taken from ICU 52 to source/data/translit

   - Also add css3transform.txt to the same directory
   - Put the following line in trnslocal.mk

     TRANSLIT_SOURCE=css3transform.txt

10. Build-related changes

  - patches/wpo.patch
  - patches/vscomp.patch
    (see http://bugs.icu-project.org/trac/ticket/8355 and
         http://bugs.icu-project.org/trac/ticket/8356 )
  - patches/rtti.patch : Make RTTI work without exception handling on Windows
    (see http://bugs.icu-project.org/trac/ticket/8343)
  - patches/data.build.patch :
      To remove some data files we don't use and cut down the data size.
  - patches/data.build.win.patch :
      Windows-only data build patch. Add a new target DATALIB to makedata.mak
  - patches/clang.patch: To build with Clang.
    (see http://bugs.icu-project.org/trac/ticket/8954 Two other chunks in
    the patch have already been fixed in the ICU trunk.)
  - add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.

    Before building the data file on Linux, re-run 'runConfigureICU Linux'
    if it was previously run without data.build.patch in #10 above.

    Because we removed the layout and layoutex directories in step 3,
    'runConfigureICU Linux' will fail even with '--disable-layout'. A
    work-around is to have a copy of our icu tree in a separate build directory
    and add back the directories we removed in step 3 before
    running 'runConfigureICU'.

    'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
    to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
    in {BUILD_DIR_ROOT}/data.

    'make' will fail again when pkgdata looks for css3transform.res. Edit
    data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'
    (see http://bugs.icu-project.org/trac/ticket/10570 ) and run 'make' again.
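
    The two manual workarounds above can be sketched as shell helpers. This
    is a sketch under assumptions, not the exact commands that were run: the
    function names and their arguments are illustrative, and the second
    helper assumes the entry appears literally as css3transform.res in
    icudata.lst.

```shell
# Sketch: workaround for the first 'make' failure.
# copy_invuca, src_root and build_root are hypothetical names.
copy_invuca() {
    src_root=$1
    build_root=$2
    dest=$build_root/data/out/build/icudt46l/coll
    # mkdir -p is harmless if the first 'make' pass already created it.
    mkdir -p "$dest"
    cp "$src_root/source/data/in/coll/invuca.icu" "$dest/"
}

# Sketch: workaround for the second 'make' failure.
# Rewrites the list via a temporary file instead of relying on 'sed -i'.
replace_css3transform() {
    lst=$1
    sed 's/css3transform\.res/root.res/' "$lst" > "$lst.tmp" &&
        mv "$lst.tmp" "$lst"
}
```

    After each helper, re-run 'make' in {BUILD_DIR_ROOT}/data as described
    above.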


    - source/data/in/icudtl.dat : Built on Linux with all the patches
      above applied. icudt46l.dat is generated in
      {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
      version number (46) dropped.

    - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
      and header files moved (#11 below), generated by building icudt_build
      project of build/icudt_build.sln on Windows. icudt46.dll is
      generated in bin/{Release,Debug} and copied to windows/icudt.dll
      and checked in. Note that we drop the version number ('46') from the
      dll name to avoid having to update our build scripts/configuration
      files every time ICU is upgraded to a new version.

    - {mac,linux}/icudt46l_dat.S : Built on Linux with all the
      patches above (except android/brkitr.patch) applied and checked in.
      This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.

      mac/icudt46l_dat.S is identical to linux/icudt46l_dat.S except for the
      header portion, which is changed to read as follows
      (no leading whitespace) :

          .globl _icudt46_dat
          #ifdef U_HIDE_DATA_SYMBOL
                 .private_extern _icudt46_dat
          #endif
                 .data
                 .const
                 .align 4
          _icudt46_dat:


    - android/icudt46l_dat.S : Built on Linux with all the patches above and
      android/brkitr.patch applied and android/patch_locale.sh executed, and
      checked in.
    - android/icudtl.dat : Generated as icudt46l.dat in
      {BUILD_DIR_ROOT}/data/out/tmp along with icudt46l_dat.S and
      copied to the above location with '46' dropped in its name.
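
    The version-free names used for the checked-in files (icudtl.dat,
    icudt.dll) follow from removing the version number from the generated
    name. A minimal sketch of that renaming rule, with an illustrative
    function name:

```shell
# Sketch: derive the checked-in name from a generated data file name by
# removing the first occurrence of the ICU version number.
# drop_icu_version is a hypothetical helper; version must be digits only.
drop_icu_version() {
    name=$1
    version=$2
    printf '%s\n' "$name" | sed "s/$version//"
}
```

    Example: cp icudt46l.dat "$(drop_icu_version icudt46l.dat 46)"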


12. Apply the fixes found with static analysis tools such as PSV and Coverity

  - patches/static.analysis.patch
  - upstream trunk/4.8 does not have this code any more.

13. Fix for msvs2010 applied:
--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (working copy)
@@ -75,7 +75,7 @@
 * Visual Studios 9.0.
 * Cygwin with MSVC 9.0 also complains here about redefinition.
 */
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
 const int32_t StringPiece::npos;
 #endif

14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
  - Handle other chars besides the dot. This is required because decNumber's
    parser expects the dot as a decimal separator.
  - Locales that don't use dot were producing "NaN" values.

15. Fix a bug in the regex engine.
  - patches/regex.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed in the upstream)

16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
   - patches/search_collation.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8290

17. Fix a use of uninitialized memory bug in regular expression matching
   - patches/rematch.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8824

18. Make it compile with -Werror on gcc 4.6
   - patches/gcc46.patch (ToT upstream does not have this code any more).

19. Fix four out-of-bounds memory access errors in common/uloc.c
    and common/uresbund.c
   - patches/uloc.patch
   - upstream bug:
     1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
     2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
     3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
        http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
     4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)

20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
    - patches/ubrk.patch
    - upstream bug : http://bugs.icu-project.org/trac/ticket/9115

21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
    - patches/changeset_30255.patch
    - upstream change : http://bugs.icu-project.org/trac/changeset/30255

22. Fix time zone handling and compilation on iOS.
    - patches/ios_timezone.patch
    - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
                      http://bugs.icu-project.org/trac/ticket/8661

23. Fix a buffer overflow in utext
    - patches/utext.patch
    - upstream change : http://bugs.icu-project.org/trac/changeset/29356

24. Fix compilation errors on VS2012 and above.
    - patches/vs2012.patch

25. Fix a buffer overflow in UTF-16/32 detection.
    - patches/csetdet.patch
    - upstream bug: http://bugs.icu-project.org/trac/ticket/10318

26. Add BreakIterator::getRuleStatus
    - patches/breakiterator.patch
    - Copy and paste BreakIterator::getRuleStatus API from ICU 52

27. Change export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT.
    - patches/declspec.patch

28. Add support for QNX Neutrino.
    -  patches/platform.qnx.patch:
       See #2 about the platform header generation.
    -  patches/si_value.undef.patch:
       Work around an all-lowercase macro defined in <signal.h>.
       Upstream took a different approach:
       http://bugs.icu-project.org/trac/ticket/9935
    -  patches/xopen_source.patch:
       Set _XOPEN_SOURCE to 600 as in the upstream changeset:
       http://bugs.icu-project.org/trac/changeset/30418