The “m17n”¹ multilingualization library is a project developed by Japan's National Institute of Advanced Industrial Science and Technology (AIST). The library is written in C and offers several notable features, described below.
Multilingual text is represented by an “MText” object, which is essentially a string with attached attributes called text properties. It is designed to be used in place of ordinary C strings in code. An MText can carry zero or more text properties, each of which is a key-value pair. Individual characters can also have properties, likewise key-value pairs, and a data structure called a Chartable stores these per-character properties efficiently. Functions are provided for serializing MText data to XML and deserializing it back.
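To make the idea of a string with attached key-value properties concrete, the following is a minimal sketch of a toy model. It is not the m17n API: every name here (`toy_text`, `toy_prop`, `toy_put_prop`, `toy_get_prop`) is hypothetical, positions are byte offsets into an ASCII string, and properties are assumed to apply to a half-open range of the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 8  /* arbitrary limit for this sketch */

/* One key-value property attached to the range [from, to) of the text. */
typedef struct {
    size_t from, to;
    const char *key;
    const char *value;
} toy_prop;

/* Hypothetical stand-in for an MText: characters plus zero or more properties. */
typedef struct {
    const char *chars;
    size_t len;
    toy_prop props[MAX_PROPS];
    size_t nprops;
} toy_text;

/* Wrap a C string; the new text starts with no properties. */
static toy_text toy_text_new(const char *s) {
    toy_text t = {0};
    t.chars = s;
    t.len = strlen(s);
    return t;
}

/* Attach key=value to [from, to). Returns 0 on success,
   -1 if the range is empty or out of bounds, or the table is full. */
static int toy_put_prop(toy_text *t, size_t from, size_t to,
                        const char *key, const char *value) {
    if (from >= to || to > t->len || t->nprops == MAX_PROPS)
        return -1;
    t->props[t->nprops++] = (toy_prop){ from, to, key, value };
    return 0;
}

/* Return the most recently added value for key covering pos, or NULL. */
static const char *toy_get_prop(const toy_text *t, size_t pos, const char *key) {
    assert(t != NULL && key != NULL);
    for (size_t i = t->nprops; i-- > 0; ) {
        const toy_prop *p = &t->props[i];
        if (pos >= p->from && pos < p->to && strcmp(p->key, key) == 0)
            return p->value;
    }
    return NULL;
}
```

For example, after `toy_put_prop(&t, 0, 5, "lang", "en")` on the text "hello world", a lookup of "lang" at position 2 yields "en", while a lookup at position 7 yields NULL because no property covers that position.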
The character code space from 0 to 0x10FFFF represents Unicode, while the range beyond 0x10FFFF, up to 0x3FFFFF, may be used for processing scripts not yet included in Unicode. Naoto Takahashi, a senior research scientist at AIST, describes the library in detail in his talk at this conference, “A Library for Multilingual Text Processing” (Session B13).
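The two ranges of the code space described above can be sketched as a small classifier. The boundary values come from the text; the identifiers (`classify_code`, `extended_index`, and the constants) are illustrative assumptions, not names from the m17n library.

```c
#include <assert.h>
#include <stdint.h>

/* Boundaries as stated in the text; the macro names are hypothetical. */
#define UNICODE_MAX   0x10FFFF
#define CODESPACE_MAX 0x3FFFFF  /* largest code; fits in 22 bits */

typedef enum { CODE_UNICODE, CODE_EXTENDED, CODE_INVALID } code_class;

/* Classify a character code as Unicode, extended, or outside the code space. */
static code_class classify_code(int32_t c) {
    if (c < 0 || c > CODESPACE_MAX)
        return CODE_INVALID;
    return c <= UNICODE_MAX ? CODE_UNICODE : CODE_EXTENDED;
}

/* Zero-based index of an extended code within the non-Unicode range. */
static int32_t extended_index(int32_t c) {
    assert(classify_code(c) == CODE_EXTENDED);
    return c - (UNICODE_MAX + 1);
}
```

Under these boundaries, 0x10FFFF is the last Unicode code and 0x110000 is the first extended one, leaving 0x2F0000 codes beyond Unicode.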
1. The m17n Library, http://www.m17n.org.