TEXT, NATIVE LANGUAGE SUPPORT, AND TIME MEDIA - Language-sensitive text analysis

Language-sensitive text analysis

Use the text analysis classes to provide language-sensitive functionality for:

Collating
Pattern matching
Analyzing boundaries

Instances of the text ordering classes use tables of collation rules that define the ordering of characters for a particular language. These classes provide member functions for simple comparison operations on text strings. Use these classes to implement sorting algorithms.
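Here is a minimal Python sketch of how a table of collation rules can drive both comparison and sorting. It assumes a hypothetical table that gives each character three weights: base letter, accent, and case. Strings are compared on all base letters first, then on accents, then on case. `build_table`, `sort_key`, and `compare` are illustrative names, not the CommonPoint text ordering API, and the table covers only a few accented letters.

```python
# Hypothetical collation table: each character maps to a
# (primary, secondary, tertiary) weight triple.
#   primary   = base letter
#   secondary = accent
#   tertiary  = case
def build_table() -> dict[str, tuple[int, int, int]]:
    table = {}
    for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz"):
        table[ch] = (i + 1, 0, 0)
        table[ch.upper()] = (i + 1, 0, 1)
    # Accented letters share their base letter's primary weight and
    # differ only at the secondary (accent) level.
    for ch, base, accent in [("à", "a", 1), ("ç", "c", 1), ("é", "e", 1),
                             ("è", "e", 2), ("ê", "e", 3)]:
        primary = table[base][0]
        table[ch] = (primary, accent, 0)
        table[ch.upper()] = (primary, accent, 1)
    return table

TABLE = build_table()

def sort_key(s: str, table=TABLE):
    """Multi-level key: all primary weights, then all secondary, then all tertiary."""
    # Characters missing from the table sort after every letter, by code point.
    weights = [table.get(ch, (1000 + ord(ch), 0, 0)) for ch in s]
    return tuple(tuple(w[level] for w in weights) for level in range(3))

def compare(a: str, b: str, table=TABLE) -> int:
    """Three-way comparison: -1 if a sorts first, 1 if b sorts first, 0 if equal."""
    ka, kb = sort_key(a, table), sort_key(b, table)
    return (ka > kb) - (ka < kb)

words = ["résumé", "Resume", "rest", "resume"]
print(sorted(words))                # → ['Resume', 'rest', 'resume', 'résumé']
print(sorted(words, key=sort_key))  # → ['rest', 'resume', 'Resume', 'résumé']
```

The sort key is an ordinary tuple. That means the same table-driven comparison works both as a three-way compare function and as the key for a standard sort. Plain code-point order, by contrast, puts capitalized words first and accented words last.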

The text analysis classes also include a group of pattern iterator classes that support language-sensitive pattern matching. These pattern iterators use a particular instance of a text ordering class to provide language-sensitive searching capabilities for that language. Use these iterators to implement language-sensitive, case-insensitive searching.
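Language-sensitive, case-insensitive searching can be sketched by matching characters through an ordering object at primary strength, where only the base letter matters. In this minimal Python sketch, `PrimaryStrengthOrder` is a hypothetical stand-in for a text ordering object. It derives base letters from Unicode decomposition rather than from a collation table. `matches` is a hypothetical generator that plays the role of a pattern iterator. The sketch uses naive character-by-character matching and assumes precomposed text.

```python
import unicodedata
from typing import Iterator

class PrimaryStrengthOrder:
    """Hypothetical ordering object that compares characters at primary strength.

    Two characters are equal when they share a base letter, ignoring case
    and accents, so 'é', 'E', and 'e' all match each other.
    """

    def primary(self, ch: str) -> str:
        # Decompose (é -> e + combining acute), drop combining marks, fold case.
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        return base.casefold()

    def equal(self, a: str, b: str) -> bool:
        return self.primary(a) == self.primary(b)

def matches(text: str, pattern: str, order: PrimaryStrengthOrder) -> Iterator[int]:
    """Yield the start offset of every match of pattern in text, using order."""
    n, m = len(text), len(pattern)
    if m == 0:
        return  # an empty pattern yields no matches
    for start in range(n - m + 1):
        if all(order.equal(text[start + i], pattern[i]) for i in range(m)):
            yield start

order = PrimaryStrengthOrder()
print(list(matches("Le Café and the cafe CAFÉ", "cafe", order)))  # → [3, 16, 21]
```

Because the matcher asks the ordering object whether two characters are equal, a different ordering object (for example, one that keeps accents significant) changes the search behavior without changing the iterator.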

You can customize these collation sequences, or merge them to support multilingual text analysis. The CommonPoint application system currently provides language-sensitive ordering objects for English, French, and Japanese. The intent is to provide collation tables for other European languages, Japanese, and Arabic in a later release.
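One way to picture customizing or merging collation sequences is as layering tables, with later entries overriding earlier ones. The Python sketch below assumes that collation tables are plain dictionaries mapping characters to (primary, secondary, tertiary) weights. `base_table`, `N_TILDE_TAILORING`, and `merge_tables` are hypothetical names. The example customization changes ñ from "an accented n" into a separate letter that sorts between n and o.

```python
# Hypothetical collation tables: character -> (primary, secondary, tertiary).
# Primary weights are spaced by 10 so a customization can insert letters between them.
def base_table() -> dict[str, tuple[int, int, int]]:
    table = {}
    for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz"):
        table[ch] = (10 * (i + 1), 0, 0)
        table[ch.upper()] = (10 * (i + 1), 0, 1)
    # By default, ñ sorts as an accented n (same primary weight as n = 140).
    table["ñ"], table["Ñ"] = (140, 1, 0), (140, 1, 1)
    return table

# Customization: ñ becomes a separate letter between n (140) and o (150).
N_TILDE_TAILORING = {"ñ": (145, 0, 0), "Ñ": (145, 0, 1)}

def merge_tables(*tables):
    """Merge collation tables into a new table; later tables override earlier ones."""
    merged = {}
    for table in tables:
        merged.update(table)
    return merged

def sort_key(s, table):
    # Characters missing from the table sort after every letter, by code point.
    weights = [table.get(ch, (10_000 + ord(ch), 0, 0)) for ch in s]
    return tuple(tuple(w[level] for w in weights) for level in range(3))

words = ["oso", "nube", "ñu"]
base = base_table()
tailored = merge_tables(base, N_TILDE_TAILORING)
print(sorted(words, key=lambda w: sort_key(w, base)))      # → ['ñu', 'nube', 'oso']
print(sorted(words, key=lambda w: sort_key(w, tailored)))  # → ['nube', 'ñu', 'oso']
```

The same function could combine tables for different scripts, provided their weights were assigned so that they do not collide. Any character defined in both tables takes the weights from the later one.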

The text analysis classes also include a set of classes that perform language-sensitive analysis of character, word, and sentence boundaries, which makes language-sensitive text selection possible. These classes also determine where lines can properly be broken. The analysis is driven by a table of word-break rules for each language. The Text Editing framework currently uses these classes so that end users can select a word by double-clicking it with the mouse, and to break lines correctly in formatted text.
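The Python sketch below illustrates the two kinds of analysis described above. A table of rules decides where word boundaries fall. A double-click selects the span between the boundaries on either side of the click offset. A separate rule finds line-break opportunities for greedy line filling. The character classes, the `NO_BREAK` rule table, and the functions `word_boundaries`, `select_word`, and `break_lines` are all hypothetical. They are not the CommonPoint or Text Editing framework API. Real word-break tables are considerably richer; this one only distinguishes letters, digits, spaces, and everything else.

```python
from enum import Enum, auto

class CharClass(Enum):
    LETTER = auto()
    DIGIT = auto()
    SPACE = auto()
    OTHER = auto()

def classify(ch: str) -> CharClass:
    if ch.isalpha():
        return CharClass.LETTER
    if ch.isdigit():
        return CharClass.DIGIT
    if ch.isspace():
        return CharClass.SPACE
    return CharClass.OTHER

# Hypothetical word-break rule table: adjacent class pairs that are NOT separated
# by a boundary. Every other pair of adjacent characters gets a boundary.
NO_BREAK = {
    (CharClass.LETTER, CharClass.LETTER),
    (CharClass.LETTER, CharClass.DIGIT),
    (CharClass.DIGIT, CharClass.LETTER),
    (CharClass.DIGIT, CharClass.DIGIT),
    (CharClass.SPACE, CharClass.SPACE),
}

def word_boundaries(text: str) -> list[int]:
    """Offsets of all word boundaries, including the start and end of the text."""
    bounds = [0]
    for i in range(1, len(text)):
        if (classify(text[i - 1]), classify(text[i])) not in NO_BREAK:
            bounds.append(i)
    if text:
        bounds.append(len(text))
    return bounds

def select_word(text: str, offset: int) -> tuple[int, int]:
    """Return the (start, end) span that a double-click at offset would select."""
    bounds = word_boundaries(text)
    for start, end in zip(bounds, bounds[1:]):
        if start <= offset < end:
            return start, end
    return offset, offset  # offset is outside the text

def break_lines(text: str, width: int) -> list[str]:
    """Greedy line breaking; a line may end only after a run of spaces."""
    # Break opportunities: just before a non-space character that follows a space.
    breaks = [i for i in range(1, len(text))
              if text[i - 1].isspace() and not text[i].isspace()]
    breaks.append(len(text))
    lines, start, fits, i = [], 0, None, 0
    while i < len(breaks):
        b = breaks[i]
        # Trailing spaces may hang past the margin, so they are not measured.
        if len(text[start:b].rstrip()) <= width:
            fits = b  # the current line can extend at least to b
            i += 1
        elif fits is not None:
            lines.append(text[start:fits].rstrip())
            start, fits = fits, None  # re-test b against the new line start
        else:
            lines.append(text[start:b].rstrip())  # one word wider than the line
            start = b
            i += 1
    if start < len(text):
        lines.append(text[start:].rstrip())
    return lines

text = "Hello, world 42x!"
start, end = select_word(text, 9)
print(text[start:end])                                # → world
print(break_lines("The quick brown fox jumps", 10))  # → ['The quick', 'brown fox', 'jumps']
```

Word selection and line breaking use different rules here. A word boundary falls between a letter and a comma, but a line may only break after a space. This mirrors the distinction the text draws between boundary analysis and line-break analysis.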

See Chapter 8, "Text analysis," for more information on these classes.
