Hyphen - hyphenation library to use converted TeX hyphenation patterns

(C) 1998 Raph Levien
(C) 2001 ALTLinux, Moscow
(C) 2006, 2007, 2008 László Németh

This was part of the libHnj library by Raph Levien.

Peter Novodvorsky from ALTLinux cut the hyphenation part from libHnj
to use it in OpenOffice.org.

Compound word and non-standard hyphenation support by László Németh.

License is the original LibHnj license:
LibHnj is dual licensed under LGPL and MPL (see also README.libhnj).

Because LGPL allows GPL relicensing, COPYING now contains an
LGPL/GPL/MPL tri-license for explicit Mozilla source compatibility.

The original Libhnj source with OOo's patches is managed by Rene Engelhard
and Chris Halls at Debian:

http://packages.debian.org/stable/libdevel/libhnj-dev
and http://packages.debian.org/unstable/source/libhnj

OTHER FILES

This distribution is also the source of the en_US hyphenation patterns
"hyph_en_US.dic". See README_hyph_en_US.txt.

Source files of hyph_en_US.dic in the distribution:

hyphen.tex (en_US hyphenation patterns from plain TeX)

  Source: http://tug.ctan.org/text-archive/macros/plain/base/hyphen.tex

tbhyphext.tex: hyphenation exception log from TugBoat archive

  Source of the hyphenation exception list:
  http://www.ctan.org/tex-archive/info/digests/tugboat/tb0hyf.tex

  Generated with the hyphenex script
  (http://www.ctan.org/tex-archive/info/digests/tugboat/hyphenex.sh)

  sh hyphenex.sh <tb0hyf.tex >tbhyphext.tex

INSTALLATION

./configure
make
make install

UNIT TESTS (WITH VALGRIND DEBUGGER)

make check
VALGRIND=memcheck make check

USAGE

./example hyph_en_US.dic mywords.txt

or (under Linux)

echo example | ./example hyph_en_US.dic /dev/stdin

NOTE: In the case of Unicode encoded input, convert your words
to lowercase before hyphenation (under UTF-8 console environment):

cat mywords.txt | awk '{print tolower($0)}' >mywordslow.txt

DEVELOPMENT

See README.hyphen for the hyphenation algorithm, README.nonstandard
and doc/tb87nemeth.pdf for non-standard hyphenation, README.compound for
compound word hyphenation, and tests/*.

Description of the dictionary format:

First line contains the character encoding (ISO8859-x, UTF-8).

Possible options in the following lines:

LEFTHYPHENMIN num          minimal hyphenation distance from the left word end
RIGHTHYPHENMIN num         minimal hyphenation distance from the right word end
COMPOUNDLEFTHYPHENMIN num  min. hyph. dist. from the left compound word boundary
COMPOUNDRIGHTHYPHENMIN num min. hyph. dist. from the right comp. word boundary

hyphenation patterns       see README.* files

NEXTWORD                   separate the two compound sets (see README.compound)

Default values:
Without explicit declarations, the hyphenmin fields of the dict struct are
zero, but in this case lefthyphenmin and righthyphenmin default to 2
during hyphenation (for backward compatibility).

Comments

Use a percent sign at the beginning of a line to add comments to your
hyphenation patterns (after the character encoding in the first line):

% comment

*****************************************************************************
* Warning! Correct working of Libhnj *needs* prepared hyphenation patterns. *

For example, generating hyph_en_US.dic from "hyphen.us" TeX patterns:

perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1

or with default LEFTHYPHENMIN and RIGHTHYPHENMIN values:

perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1 2 3
perl substrings.pl hyphen.gb hyph_en_GB.dic ISO8859-1 3 3
*****************************************************************************

OTHERS

Java hyphenation: Peter B. West (Folio project) implements a hyphenator with
non-standard hyphenation facilities based on extended Libhnj. The HyFo module
is released in binary form as jar files and in source form as zip files.
See http://sourceforge.net/project/showfiles.php?group_id=119136

László Németh
<nemeth (at) openoffice (dot) org>
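
APPENDIX: DICTIONARY FORMAT SKETCH

The dictionary format described under DEVELOPMENT can be illustrated with a
small Python sketch. It is not part of the Hyphen library and does not use
its C API. The names read_dictionary and effective_hyphenmin are
hypothetical, the sample patterns are made up for illustration, and applying
the default of 2 per field is an assumption based on the "Default values"
note above.

```python
# Option keywords listed in the dictionary format description.
OPTION_NAMES = (
    "LEFTHYPHENMIN",
    "RIGHTHYPHENMIN",
    "COMPOUNDLEFTHYPHENMIN",
    "COMPOUNDRIGHTHYPHENMIN",
)


def read_dictionary(lines):
    """Split dictionary lines into (encoding, options, pattern_sets).

    Hypothetical helper for illustration; not part of the Hyphen library.
    """
    lines = iter(lines)
    encoding = next(lines).strip()            # first line: character encoding
    options = dict.fromkeys(OPTION_NAMES, 0)  # undeclared options stay zero
    pattern_sets = [[]]                       # NEXTWORD starts a second set
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or comment
            continue
        fields = line.split()
        if fields[0] in OPTION_NAMES and len(fields) == 2:
            options[fields[0]] = int(fields[1])
        elif fields[0] == "NEXTWORD":
            pattern_sets.append([])
        else:
            pattern_sets[-1].append(line)
    return encoding, options, pattern_sets


def effective_hyphenmin(options):
    """Use 2 for zero-valued fields (assumed per-field backward-compat default)."""
    left = options["LEFTHYPHENMIN"] or 2
    right = options["RIGHTHYPHENMIN"] or 2
    return left, right


# Made-up sample dictionary text, for illustration only.
SAMPLE = """UTF-8
% a comment line
RIGHTHYPHENMIN 3
.ach4
NEXTWORD
1b
"""

encoding, options, pattern_sets = read_dictionary(SAMPLE.splitlines())
print(encoding, effective_hyphenmin(options), pattern_sets)
```

In this sample only RIGHTHYPHENMIN is declared, so LEFTHYPHENMIN stays zero
in the parsed options and the sketch substitutes 2 for it when computing the
effective values.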