Text analysis

The native language support services provide classes for analyzing text data in a language-sensitive manner. These classes support:

Comparing and ordering text strings (described in "Comparing and collating text" on page 198)
Searching for a specific text pattern (described in "Pattern matching" on page 203)
Text selection at the character, word, and sentence level (described in "Language-sensitive text selection" on page 208)

Use language-sensitive text analysis when you want to compare text data according to the alphabetical ordering rules of a natural language rather than the numeric values of the character encoding. Language-sensitive processing is required for case-insensitive ordering and matching. For example, without language-sensitive processing, the uppercase letter Z in the Roman script would be ordered before the lowercase letter a, because uppercase letters have lower character codes than lowercase letters.
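The Z-before-a behavior can be illustrated with a short sketch. It does not use the Taligent classes, which this overview does not name; it assumes Java's java.text.Collator as a stand-in for a text-ordering object, and the class and method names (OrderingSketch, byCodeValue, byEnglishRules, equalIgnoringCase) are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Illustrative only: java.text.Collator stands in for a text-ordering object.
public class OrderingSketch {

    // Compares by raw character codes: 'Z' (90) comes before 'a' (97).
    static int byCodeValue(String left, String right) {
        return left.compareTo(right);
    }

    // Compares by English alphabetical rules: "a" comes before "Z".
    static int byEnglishRules(String left, String right) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        return collator.compare(left, right);
    }

    // Case-insensitive matching: SECONDARY strength ignores case differences
    // but still distinguishes base letters and accents.
    static boolean equalIgnoringCase(String left, String right) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.SECONDARY);
        return collator.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        System.out.println("code value:    Z before a? " + (byCodeValue("Z", "a") < 0));
        System.out.println("English rules: Z before a? " + (byEnglishRules("Z", "a") < 0));
        System.out.println("abc matches ABC? " + equalIgnoringCase("abc", "ABC"));
    }
}
```

The comparison strength is the design choice to note here: a weaker strength treats case variants as equal, which is what makes case-insensitive matching a language-sensitive operation rather than a byte-level one.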

Language-sensitive text comparisons are based on text-ordering objects that provide correct collation and selection for a particular natural language. End users specify a preferred text-ordering object in their locale. Text comparison functions use the preferred object to analyze text according to the rules of that language. With this mechanism, the functions you implement automatically follow the rules of whatever language the current end user has specified.
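The locale-driven mechanism described above might look like the following sketch. Again, this is not the Taligent API: it assumes Java's java.text.Collator, with a Locale argument playing the role of the end user's preferred text-ordering object. The names LocaleSortSketch and sortForLocale are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative only: the sorting function never hard-codes a language;
// it asks for the ordering object that belongs to the supplied locale.
public class LocaleSortSketch {

    // Returns a sorted copy of the words, ordered by the rules of userLocale.
    static List<String> sortForLocale(List<String> words, Locale userLocale) {
        Collator ordering = Collator.getInstance(userLocale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(ordering); // Collator is a Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Zebra", "apple", "Mango");
        // In a real application the locale would come from the user's settings,
        // for example Locale.getDefault().
        System.out.println(sortForLocale(words, Locale.ENGLISH));
    }
}
```

Because the caller passes in the locale, the same function serves English, French, or Japanese users without modification, provided an ordering for that language is available.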

The system currently provides ordering objects for English, French, and Japanese (Kana). Later releases will include ordering objects for other European languages and Arabic.

You can also use the text analysis classes for language-insensitive text comparison. See "Language-insensitive collation" on page 202 for more information.

Comparing and collating text
Pattern matching
Language-sensitive text selection
Creating your own collation objects
