The text analysis classes include a set of objects that provide language-sensitive boundary detection for the following text elements:
- Character clusters--for example, the letter â should typically be treated as a single character even though it is stored as two characters (the base letter plus the diacritic)
- Words
- Sentences
The TTextChunkIterator class provides the basic protocol for iterating over these elements and for iterating through word-wrap points. If you need to use a text chunk iterator directly, call the appropriate TTextChunkIterator function to create the correct iterator--for example, CreatePreferredWordSelectionIterator.
Text chunk iterators maintain a pointer to the target text object. If the text is modified, resynchronize the iterator using the TTextChunkIterator::Set member function. If the text object is deleted, you must also delete the text chunk iterator.
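The resynchronization rule can be illustrated with a minimal, self-contained C++ sketch. ChunkCursor is a hypothetical stand-in, not the Taligent class: its Set, InSync, Next, and Offset members are assumptions made for illustration and do not reflect the actual TTextChunkIterator signatures. The sketch shows only why cached iterator state goes stale after an edit and how rebinding repairs it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in (not the Taligent API): a cursor that keeps a
// non-owning pointer to its target text plus cached state about that text.
class ChunkCursor {
public:
    explicit ChunkCursor(const std::u32string* text) { Set(text); }

    // Rebinds the cursor to the text and refreshes all cached state.
    // Call this after every modification of the target text.
    void Set(const std::u32string* text) {
        fText = text;
        fLength = text->size();
        fOffset = 0;
    }

    // Cheap staleness check: detects only edits that change the length.
    bool InSync() const { return fLength == fText->size(); }

    // Moves to the end of the next space-terminated chunk.
    // Returns false once the end of the cached range is reached.
    // Must not be called on a stale cursor.
    bool Next() {
        if (fOffset >= fLength) return false;
        while (fOffset < fLength && (*fText)[fOffset] != U' ') ++fOffset;
        if (fOffset < fLength) ++fOffset;  // include the trailing space
        return true;
    }

    std::size_t Offset() const { return fOffset; }

private:
    const std::u32string* fText = nullptr;  // not owned by the cursor
    std::size_t fLength = 0;                // cached at Set() time
    std::size_t fOffset = 0;
};
```

Because the cursor does not own the text, it must not outlive it. This mirrors the requirement to delete the iterator when the text object is deleted.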
TTextChunkIterator is the abstract base class for all text chunk iterators in the system. The following types of iterators are provided:
- Character cluster iterators for iterating through characters or logically grouped character sequences. This iterator enables correct backspacing through a sequence of characters.
- Word selection iterators for iterating through words. This iterator enables selection of a word, typically with a double mouse-click.
- Sentence selection iterators for iterating through sentences. This iterator enables selection of a sentence, typically with a triple mouse-click.
- Word-wrap iterators for iterating through points within words at which the word can be broken for line-wrapping purposes.
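To make the idea of a character cluster concrete, here is a minimal C++ sketch of cluster boundaries and cluster-aware backspacing. It does not use the Taligent classes. The helpers IsCombiningMark, ClusterStarts, and BackspaceCluster are hypothetical names, and the sketch assumes that only code points in the Unicode Combining Diacritical Marks block (U+0300 through U+036F) attach to a preceding base letter. Real cluster rules are language-sensitive and cover many more cases. It also assumes that one backspace removes a whole cluster.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for code points in the Combining Diacritical
// Marks block, which attach to the preceding base character.
bool IsCombiningMark(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Returns the start offset of every character cluster in `text`.
std::vector<std::size_t> ClusterStarts(const std::u32string& text) {
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // A cluster starts at offset 0 or at any non-combining code point.
        if (i == 0 || !IsCombiningMark(text[i])) {
            starts.push_back(i);
        }
    }
    return starts;
}

// Deletes the whole cluster that ends at `caret` and returns the new caret
// offset, so one backspace removes a base letter together with its marks.
std::size_t BackspaceCluster(std::u32string& text, std::size_t caret) {
    if (caret == 0 || caret > text.size()) {
        return caret;  // nothing to delete, or caret out of range
    }
    std::size_t start = caret - 1;
    // Walk back over combining marks until the base character is reached.
    while (start > 0 && IsCombiningMark(text[start])) {
        --start;
    }
    text.erase(start, caret - start);
    return start;
}
```

For the three-code-point text "a", U+0302 (combining circumflex), "b", ClusterStarts reports clusters at offsets 0 and 2, and a backspace with the caret at offset 2 removes both the letter and its diacritic.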
TTextChunkIteratorReference provides a reference to a text chunk iterator. TTextChunkIterator provides member functions you can use to create a reference for each type of iterator. Use a reference whenever you stream a text chunk iterator to another address space. When you stream the reference back, call TTextChunkIteratorReference::IsValid to check its status. If the reference is no longer valid, create a new iterator for that type of text element.
NOTE
The text chunk iterators assume that any invisible character--that is, a character for which TUnicode::IsInvisible returns True--is the final character in a word, with the exception of TGeneralPunctuation::kNoBreakSpace and TGeneralPunctuation::kZeroWidthNoBreakSpace. The iterators also assume that each of the TGeneralPunctuation characters kLineSeparator and kParagraphSeparator, as well as the tab and page separator characters, is the final character of the word that contains it.
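The word-final rule in this note can be sketched in self-contained C++ as follows. This is an illustration under stated assumptions, not the Taligent implementation. IsInvisible here is a tiny hypothetical stand-in for TUnicode::IsInvisible, whose real character set is not listed in this section. The code point values are assumptions as well: U+00A0 and U+FEFF stand for the two no-break spaces, and the form feed (U+000C) stands for the page separator.

```cpp
#include <string>
#include <vector>

// Code point values taken from Unicode; their mapping to the Taligent
// constants named in the note is an assumption.
constexpr char32_t kTab = 0x0009;
constexpr char32_t kFormFeed = 0x000C;  // assumed "page separator"
constexpr char32_t kSpace = 0x0020;
constexpr char32_t kNoBreakSpace = 0x00A0;
constexpr char32_t kLineSeparator = 0x2028;
constexpr char32_t kParagraphSeparator = 0x2029;
constexpr char32_t kZeroWidthNoBreakSpace = 0xFEFF;

// Hypothetical stand-in for TUnicode::IsInvisible; the real set is larger.
bool IsInvisible(char32_t c) {
    return c == kSpace || c == kNoBreakSpace || c == kZeroWidthNoBreakSpace;
}

// True if `c` is treated as the final character of a word.
bool EndsWord(char32_t c) {
    // The two no-break spaces are the stated exceptions.
    if (c == kNoBreakSpace || c == kZeroWidthNoBreakSpace) return false;
    if (IsInvisible(c)) return true;
    return c == kLineSeparator || c == kParagraphSeparator ||
           c == kTab || c == kFormFeed;
}

// Splits `text` into words, keeping each word-final character with its word.
std::vector<std::u32string> SplitWords(const std::u32string& text) {
    std::vector<std::u32string> words;
    std::u32string current;
    for (char32_t c : text) {
        current.push_back(c);
        if (EndsWord(c)) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);  // trailing word
    return words;
}
```

With these assumptions, a no-break space between two letter sequences keeps them in one word, while an ordinary space or a tab closes the word it follows.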
Copyright © 1995 Taligent, Inc. All rights reserved.