TEXT, NATIVE LANGUAGE SUPPORT, AND TIME MEDIA

Transliterators

Transliterators are instances of a TTransliterator class that transform text input according to a specific algorithm or set of rules. Transliterators perform the dead-key functionality provided by keyboards on some systems. You can use transliterators for many features, including:

Accent composition--for example, combining the key sequence [a][¨] into the character ä
Changing the case of certain letters--for example, changing lowercase to uppercase or capitalizing the first letter of each word
Smart quotes--replacing straight quotes (") with left- and right-hand quotes (“ and ”)
Phonetic transcription between scripts--for example, converting from the Roman script to Greek or Cyrillic, allowing the end user to enter text in a different script than the keyboard script
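As an illustration of the smart-quotes feature, the following is a minimal sketch in standard C++ and does not use any CommonPoint class. The function name SmartenQuotes is hypothetical, and the rule it applies (a straight quote opens a quotation at the start of the text or after whitespace or an opening parenthesis, and closes one otherwise) is an assumed heuristic, not the behavior of the system's extended Roman transliterator.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a CommonPoint API.
// Replaces each straight double quote with a left-hand quote (U+201C) or a
// right-hand quote (U+201D), chosen from the character that precedes it.
std::u32string SmartenQuotes(const std::u32string& text) {
    std::u32string output;
    for (char32_t c : text) {
        if (c == U'"') {
            // Assumed heuristic: a quote opens at the start of the text or
            // after whitespace or an opening parenthesis.
            bool opening = output.empty() || output.back() == U' ' ||
                           output.back() == U'\t' || output.back() == U'\n' ||
                           output.back() == U'(';
            output.push_back(opening ? U'\u201C' : U'\u201D');
        } else {
            output.push_back(c);
        }
    }
    // Every input character produces exactly one output character.
    assert(output.size() == text.size());
    return output;
}
```

Passing the text say "hi" now to this function would yield the same text with the first straight quote replaced by a left-hand quote and the second by a right-hand quote.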

Transliterators are also used to create input methods for ideographic languages. See "Input Method framework" on page 186 for a description of input methods.

Transliterators are typically selected for inclusion in a typing configuration by end users. They can be chained together to process typing input in a specific way. The Typing framework will eventually provide an interface allowing end users to directly manipulate transliterators. You can also use transliterators programmatically for tasks such as capitalizing title text.
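To make the title-capitalization task concrete, here is a minimal sketch in standard C++. It is a stand-in for what a capitalization transliterator might do, not the CommonPoint implementation: the function name CapitalizeWords is hypothetical, and the sketch assumes ASCII text in which any non-letter character ends a word.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not a CommonPoint API.
// Capitalizes the first letter of each word in ASCII text. Any non-letter
// character (space, hyphen, digit, punctuation) is treated as a word break.
std::string CapitalizeWords(const std::string& text) {
    std::string output;
    bool atWordStart = true;
    for (char c : text) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (std::isalpha(uc)) {
            // Only the first letter of a word is changed.
            output.push_back(atWordStart
                                 ? static_cast<char>(std::toupper(uc))
                                 : c);
            atWordStart = false;
        } else {
            output.push_back(c);
            atWordStart = true;
        }
    }
    // The transformation never adds or removes characters.
    assert(output.size() == text.size());
    return output;
}
```

Because every non-letter is treated as a word break, a hyphenated word such as red-light has both of its parts capitalized.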

Transliterator classes

The following classes implement transliterators:

TTransliterator is the abstract base class providing the protocol for transliterators. Transliterators have localizable names and are supported by the locale mechanism.

THexTransliterator is a concrete transliterator class that provides transliteration from hexadecimal numbers between one and four digits into their Unicode representations. Hexadecimal numbers are read using the syntax %xxxx, for example, %12f3. You can use this class directly; it was not designed to be subclassed.
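The %xxxx syntax can be illustrated with a short sketch in standard C++. This is not the THexTransliterator interface: the names HexDigitValue and HexTransliterate are hypothetical, and the sketch assumes that digits are consumed greedily (up to four) and that a % with no hexadecimal digit after it is passed through unchanged. The actual class may resolve these cases differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the value of a hexadecimal digit, or -1 if
// the character is not a hexadecimal digit.
int HexDigitValue(char32_t c) {
    if (c >= U'0' && c <= U'9') return static_cast<int>(c - U'0');
    if (c >= U'a' && c <= U'f') return static_cast<int>(c - U'a') + 10;
    if (c >= U'A' && c <= U'F') return static_cast<int>(c - U'A') + 10;
    return -1;
}

// Hypothetical helper, not the THexTransliterator interface.
// Replaces "%" followed by one to four hexadecimal digits with the single
// code point those digits name. Assumption: digits are matched greedily.
std::u32string HexTransliterate(const std::u32string& input) {
    std::u32string output;
    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] == U'%') {
            char32_t value = 0;
            std::size_t digits = 0;
            while (digits < 4 && i + 1 + digits < input.size()) {
                int d = HexDigitValue(input[i + 1 + digits]);
                if (d < 0) break;
                value = value * 16 + static_cast<char32_t>(d);
                ++digits;
            }
            assert(digits <= 4);
            if (digits > 0) {
                output.push_back(value);
                i += 1 + digits;  // skip the '%' and the digits consumed
                continue;
            }
        }
        // Ordinary character, or a '%' with no digits after it.
        output.push_back(input[i]);
        ++i;
    }
    return output;
}
```

With this sketch, the input %12f3 becomes the single code point U+12F3. Note that greedy matching means a hexadecimal letter directly after the intended digits is consumed as part of the number.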

TRuleBasedTransliterator is a concrete class that encapsulates a set of context-sensitive rules for transliterating text, along with a parallel set of rules that perform the reverse transliteration. These rules can be localized, allowing the same transliterator to be used in different locales.

TTransliterateRule encapsulates a single transliteration rule, containing the input text, result text, and, optionally, preceding and succeeding context information. Use TTransliterateRulesIterator to iterate through the rules contained by a particular transliterator instance.

TTransliteratorHandle provides a lightweight mechanism for referencing a particular instance of a transliterator class. Because a transliterator encapsulates a large amount of data, use this class to access a transliterator unless you need to specifically access individual rules. You can create a handle referencing one of the existing transliterators using the transliterator's name. For example:

TTransliteratorHandle aHandle(TTransliteratorHandle::kJapaneseTransliterator);

Available transliterators

The system currently provides the following rule-based transliterators:

An extended Roman transliterator (the default) providing accent composition, smart quotes, and smart spacing (eliminates double spaces)
A capitalization transliterator providing capitalization for the Roman script
Phonetic transliterators from the Roman script to Japanese (both Hiragana and Katakana)
A symbol transliterator, allowing you to enter the name of a Unicode symbol and have it transliterated to the actual character. For example, infinity is transliterated to ∞.

NOTE

Some of the files the CommonPoint application system provides that contain transliteration tables were created using the Macintosh character set; they may not display correctly on other platforms. These transliteration tables still work correctly, but you may not be able to view them. This will be fixed in a later release.

How a transliterator works

A transliterator takes a text instance and processes it according to that transliterator's algorithm or set of rules. This text instance typically represents typing input and is passed to the transliterator by the typing configuration. You can also pass a text instance to a transliterator programmatically. The transliterator either returns the transliterated text in a separate text instance or directly modifies the input text, as appropriate.

When the typing configuration contains several transliterators chained together, the modified text produced by one transliterator is the input text to the next transliterator in the chain.
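Chaining can be pictured as function composition. The sketch below is in standard C++ and models each transliterator as a function from text to text; the type alias Transliterator and the functions RunChain, CollapseDoubleSpaces, and UppercaseAscii are all hypothetical names chosen for illustration, not part of the Typing framework.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model: a transliterator is any function from text to text.
using Transliterator = std::function<std::string(const std::string&)>;

// Runs the chain in order; the output of each step is the input of the next.
std::string RunChain(const std::vector<Transliterator>& chain,
                     const std::string& input) {
    std::string text = input;
    for (const Transliterator& step : chain) {
        assert(step);  // every entry in the chain must be callable
        text = step(text);
    }
    return text;
}

// Illustrative step: drops a space that directly follows another space.
std::string CollapseDoubleSpaces(const std::string& text) {
    std::string output;
    for (char c : text) {
        if (c == ' ' && !output.empty() && output.back() == ' ') continue;
        output.push_back(c);
    }
    return output;
}

// Illustrative step: converts ASCII letters to uppercase.
std::string UppercaseAscii(const std::string& text) {
    std::string output;
    for (char c : text) {
        output.push_back(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    return output;
}
```

A chain built from CollapseDoubleSpaces followed by UppercaseAscii first removes the extra space and then uppercases the result, so the second step never sees the original double space.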

Transliteration rules

A rule-based transliterator uses a table of rules to transliterate text. Each table contains two parallel sets of rules. One set defines the rules for transliterating input text, while the second set defines rules for reversing the transliteration, returning the transliterated text to its original state. This second set of rules is optional.

Transliteration rules are context-sensitive. Each rule defines:

Input text
Result text
Preceding context (optional)
Succeeding context (optional)

Each rule field can contain up to 256 characters. This table lists examples of the fields for some simple transliteration rules:

Rule                              Input   Result   Preceding context   Succeeding context
Change red to green               red     green    NIL                 NIL
Change red light to green light   red     green    NIL                 light
Change red light to red signal    light   signal   red                 NIL
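The rules in the table above can be modeled with a small sketch in standard C++. This is not TTransliterateRule or TRuleBasedTransliterator: the struct Rule and the functions MatchesAt and ApplyRules are hypothetical. The sketch assumes that an empty string stands for NIL, that a context must sit immediately next to the input (so the separating space is written into the context, as in " light"), that contexts are checked against the original text, and that the first matching rule wins.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one transliteration rule.
// An empty context string plays the role of NIL.
struct Rule {
    std::string input;
    std::string result;
    std::string preceding;   // must appear immediately before the input
    std::string succeeding;  // must appear immediately after the input
};

// True if the rule's input occurs at pos with both contexts satisfied.
bool MatchesAt(const std::string& text, std::size_t pos, const Rule& rule) {
    assert(pos <= text.size());
    if (rule.input.empty()) return false;
    if (text.compare(pos, rule.input.size(), rule.input) != 0) return false;
    if (pos < rule.preceding.size()) return false;
    if (text.compare(pos - rule.preceding.size(), rule.preceding.size(),
                     rule.preceding) != 0) {
        return false;
    }
    std::size_t after = pos + rule.input.size();
    return text.compare(after, rule.succeeding.size(), rule.succeeding) == 0;
}

// Scans left to right; at each position the first matching rule is applied.
// Only the input is replaced; the context text is left untouched.
std::string ApplyRules(const std::vector<Rule>& rules,
                       const std::string& text) {
    std::string output;
    std::size_t pos = 0;
    while (pos < text.size()) {
        bool replaced = false;
        for (const Rule& rule : rules) {
            if (MatchesAt(text, pos, rule)) {
                output += rule.result;
                pos += rule.input.size();
                replaced = true;
                break;
            }
        }
        if (!replaced) output += text[pos++];
    }
    return output;
}
```

With the second rule from the table expressed as {"red", "green", "", " light"}, the text red light, red car becomes green light, red car: the second red is not followed by light, so the rule does not apply to it.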

You can also specify range variables for transliteration rules. Range variables allow you to provide a limited amount of wildcard matching for rule input and context fields. The variable is a single character, which cannot itself appear in any rule. It can be set to any range or set of characters (for example, AEIOU or A-Z). You can then use these variables within rules. Whenever a character in the variable range or set appears in that field, the rule is applied.

You can also create inverse variables. An inverse variable matches whenever a character outside its range or set appears in that field. For example, you could create a variable whose value is all Roman letters (the ranges a-z and A-Z), and you could use the same value for an inverse variable to denote any nonletter character.
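Range variables and their inverses can be sketched in standard C++ as follows. None of these names come from CommonPoint: RangeVariable, VariableTable, ExpandRange, and MatchesPattern are hypothetical, and the choice of $ and # as variable characters in the usage note is arbitrary. The sketch assumes that a pattern character found in the variable table is a variable and that every other pattern character is a literal.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical model of a range variable.
struct RangeVariable {
    std::string members;  // explicit set of characters, for example "AEIOU"
    bool inverse;         // true: matches any character NOT in members
};

// Maps the single character that names a variable to its definition.
using VariableTable = std::map<char, RangeVariable>;

// Expands a range such as a-z into an explicit member string.
std::string ExpandRange(char first, char last) {
    int lo = static_cast<unsigned char>(first);
    int hi = static_cast<unsigned char>(last);
    assert(lo <= hi);
    std::string members;
    for (int c = lo; c <= hi; ++c) members.push_back(static_cast<char>(c));
    return members;
}

// True if the pattern matches the text starting at pos. A pattern character
// that names a variable matches by set membership; any other pattern
// character must match literally.
bool MatchesPattern(const std::string& text, std::size_t pos,
                    const std::string& pattern,
                    const VariableTable& variables) {
    if (pos > text.size() || pattern.size() > text.size() - pos) return false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char actual = text[pos + i];
        VariableTable::const_iterator it = variables.find(pattern[i]);
        if (it == variables.end()) {
            if (actual != pattern[i]) return false;  // literal mismatch
        } else {
            bool inSet =
                it->second.members.find(actual) != std::string::npos;
            // A normal variable needs membership; an inverse one forbids it.
            if (inSet == it->second.inverse) return false;
        }
    }
    return true;
}
```

If $ is defined as all Roman letters and # as the inverse of the same set, the pattern c$t matches cat but not c4t, while c#t matches c4t but not cat.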

Iterating through transliterator rules

If you need access to the rules contained by a particular rule-based transliterator, use TTransliterateRulesIterator. To do this:

Instantiate the transliterator.
Call TRuleBasedTransliterator::CreateIterator to create the iterator.
Call the iterator's First function to initialize the iteration.
Iterate through each rule by calling the Next function.

For example:

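  // Instantiate the rule-based transliterator by name.
  TRuleBasedTransliterator aTranslit(TTransliteratorHandle::kJapaneseTransliterator);
  // Create an iterator over the transliterator's rules.
  TTransliterateRulesIterator* iterator = aTranslit.CreateIterator();
  
  // Initialize the iteration and fetch the first rule.
  TTransliterateRule translitRule = iterator->First();
  // Process rule and continue iteration by calling iterator->Next().
  delete iterator;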

TTransliterateRulesIterator does not allow backward iteration through the rules table.

