TIP
Some interfaces currently accept char* arguments rather than an appropriate Unicode type, such as TText or one of its subclasses. These char* interfaces will eventually be removed. Do not rely on them; use TText objects instead.
The TUnicode class encapsulates a character along with its associated semantic information, and its member functions give you access to that information. Always use these functions to obtain the properties of a specific character.
NOTE
Refer to a specific Unicode value using its character name rather than using the code point. For example, refer to TGeneralPunctuation::kQuestionMark rather than the value U+003F.
Because of the large number of characters, the names are scoped into a set of classes based on script or function: TLatin, TGreek, TASCII, TDingbats, TGeneralPunctuation, TMathematicalOperators, and so on. These classes exist only to scope the enumerated names they contain; do not use them for any other purpose. For a complete list of the classes used to enumerate character names, see the online class and member function documentation or the header files listed in the table below.
Unicode character naming
The CommonPoint application system provides a name, through a set of enumerations, for every character in the Unicode set, with the exception of most of the Han ideographic characters.
The official name for the Han ideograph at a given code point U+XXXX is CJK UNIFIED IDEOGRAPH XXXX, so enumerated names provide no advantage. However, TUnicode does provide names for some particularly significant ideographs, such as digits and the 214 KangXi radicals.
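The following minimal sketch illustrates why enumerated names add nothing for most Han ideographs: the name can be computed directly from the code point. HanIdeographName and the UniChar typedef shown here are hypothetical and are not part of TUnicode. The sketch uses the name format quoted above; note that current versions of the Unicode Standard write this name with a hyphen before the hexadecimal digits.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumption for this sketch: UniChar is a 16-bit code unit.
typedef unsigned short UniChar;

// Hypothetical helper (not a TUnicode member): builds the algorithmic name
// of a Han ideograph from its code point, in the form quoted in the text.
std::string HanIdeographName(UniChar codePoint)
{
    char hex[5];  // four hexadecimal digits plus the terminating null
    std::snprintf(hex, sizeof hex, "%04X", static_cast<unsigned>(codePoint));
    return std::string("CJK UNIFIED IDEOGRAPH ") + hex;
}

// Usage: the name for U+4E00 follows mechanically from the code point.
void HanIdeographNameExample()
{
    assert(HanIdeographName(0x4E00) == "CJK UNIFIED IDEOGRAPH 4E00");
}
```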
| File name | Names included |
| --- | --- |
| UnicodeGeneral.h | Characters for the Roman script and general utility characters such as punctuation and control codes |
| UnicodeEastAsia.h | Characters for East Asian scripts such as Hangul and Kana |
| UnicodeEastEurope.h | Characters for Eastern European scripts such as Cyrillic |
| UnicodeMidEast.h | Characters for Middle Eastern scripts such as Arabic and Hebrew |
| UnicodeSouthAsia.h | Characters for South and Southeast Asian scripts such as Bengali and Thai |
| UnicodeSymbols.h | Symbol characters such as dingbats or mathematical operators |
| UnicodeCompatibility.h | Additional characters provided for compatibility with existing character sets, such as Roman numerals (the CommonPoint application system provides these codes for compatibility only; it is recommended that you do not use them) |
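The idea of scoping character names inside classes can be sketched as follows. This is an illustration only: TExamplePunctuation, TExampleGreek, their enumerators, and the UniChar typedef are hypothetical stand-ins, not the declarations found in the headers listed above. The numeric values are the standard Unicode code points for the named characters.

```cpp
#include <cassert>

// Assumption for this sketch: UniChar is a 16-bit code unit.
typedef unsigned short UniChar;

// Hypothetical name-scoping classes. They hold enumerated names and nothing
// else; the private, undefined constructor prevents any other use.
class TExamplePunctuation {
public:
    enum { kSpace = 0x0020, kQuestionMark = 0x003F };
private:
    TExamplePunctuation();
};

class TExampleGreek {
public:
    enum { kCapitalAlpha = 0x0391, kSmallAlpha = 0x03B1 };
private:
    TExampleGreek();
};

// A comparison against a scoped name documents itself; a bare 0x003F does not.
bool IsQuestionMark(UniChar ch)
{
    return ch == TExamplePunctuation::kQuestionMark;
}

// Usage: callers never need to write the code point.
void ScopedNameExample()
{
    assert(IsQuestionMark(TExamplePunctuation::kQuestionMark));
    assert(!IsQuestionMark(TExampleGreek::kSmallAlpha));
}
```

Scoping the names by script keeps each enumeration small and avoids name collisions between scripts that have similarly named characters.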
Querying Unicode character properties
TUnicode provides two static member functions, GetScript and GetType, that return an enumerated value describing a character's script or type. TUnicode also provides static member functions that check a UniChar for a particular property, for example, whether the character is an uppercase character, a digit, or one of the space characters. These functions let you check a character for a specific property without needing to know every character that has it. For example, you can test for a space character with the TUnicode::IsASpace function without needing to know the full set of Unicode characters used to represent a space.
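The benefit of a property predicate can be sketched as follows. ExampleIsASpace is a hypothetical stand-in written for this illustration; it is not the implementation of TUnicode::IsASpace, and the set of characters it accepts is deliberately partial. The UniChar typedef is also an assumption. The point is that the caller, CountSpaces, never mentions an individual space character.

```cpp
#include <cassert>

// Assumption for this sketch: UniChar is a 16-bit code unit.
typedef unsigned short UniChar;

// Hypothetical predicate with an illustrative, incomplete set of spaces.
bool ExampleIsASpace(UniChar ch)
{
    return ch == 0x0020                       // SPACE
        || ch == 0x00A0                       // NO-BREAK SPACE
        || (ch >= 0x2000 && ch <= 0x200A)     // EN QUAD through HAIR SPACE
        || ch == 0x3000;                      // IDEOGRAPHIC SPACE
}

// The caller relies on the predicate alone; if the set of space characters
// grows, only the predicate changes.
int CountSpaces(const UniChar* text, int length)
{
    int count = 0;
    for (int i = 0; i < length; ++i) {
        if (ExampleIsASpace(text[i])) {
            ++count;
        }
    }
    return count;
}

// Usage: three of these five characters are spaces of different kinds.
void CountSpacesExample()
{
    const UniChar text[] = { 0x0048, 0x0020, 0x3000, 0x0069, 0x00A0 };
    assert(CountSpaces(text, 5) == 3);
}
```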
The following figure shows some of the TUnicode static member functions. See TUnicode in the online class and member function documentation for a complete list of character property functions and descriptions.

Use the TCharacterPropertyIterator class to scan the set of Unicode characters for characters with a specific set of properties. For example, you might use this class to return a list of punctuation characters for a particular script.
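Scanning a range of characters for a property can be sketched as follows. TExamplePropertyIterator, ExampleIsDigit, CollectMatches, and the UniChar typedef are hypothetical names invented for this illustration; the sketch does not reproduce the interface of TCharacterPropertyIterator, and the digit predicate covers only the ASCII digits. It assumes that first is not greater than last.

```cpp
#include <cassert>
#include <vector>

// Assumption for this sketch: UniChar is a 16-bit code unit.
typedef unsigned short UniChar;

// Illustrative predicate: recognizes only the ASCII digits U+0030..U+0039.
bool ExampleIsDigit(UniChar ch)
{
    return ch >= 0x0030 && ch <= 0x0039;
}

// Hypothetical iterator: walks [first, last] and yields each character for
// which the predicate returns true.
class TExamplePropertyIterator {
public:
    typedef bool (*Predicate)(UniChar);

    TExamplePropertyIterator(UniChar first, UniChar last, Predicate matches)
        : fNext(first), fLast(last), fMatches(matches), fDone(false) {}

    // Stores the next matching character in result and returns true, or
    // returns false when the range is exhausted.
    bool Next(UniChar& result)
    {
        while (!fDone) {
            UniChar candidate = fNext;
            if (fNext == fLast) {
                fDone = true;   // stop before incrementing past the range end
            } else {
                ++fNext;
            }
            if (fMatches(candidate)) {
                result = candidate;
                return true;
            }
        }
        return false;
    }

private:
    UniChar   fNext;
    UniChar   fLast;
    Predicate fMatches;
    bool      fDone;
};

// Collects every character in [first, last] that satisfies the predicate.
std::vector<UniChar> CollectMatches(UniChar first, UniChar last,
                                    TExamplePropertyIterator::Predicate matches)
{
    std::vector<UniChar> found;
    TExamplePropertyIterator iterator(first, last, matches);
    UniChar ch;
    while (iterator.Next(ch)) {
        found.push_back(ch);
    }
    return found;
}
```

For example, CollectMatches(0x0000, 0x007F, ExampleIsDigit) returns the ten characters U+0030 through U+0039. Supplying a different predicate, such as one that tests for punctuation in a particular script, yields the corresponding list without changing the scanning code.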