Handling exception characters

When transcoding text between Unicode and another character encoding, you often encounter exception characters. An exception character is a character that has no single-character equivalent in the target encoding, or that does not exist in the target encoding at all.
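
As a rough illustration of this definition, the following sketch classifies Unicode code points against 7-bit ASCII as the target encoding. It does not use the Taligent classes: IsExceptionForASCII and CountExceptions are hypothetical helpers, and treating every code point above 0x7F as an exception character is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the TTranscoder interface.
// Assumption: the target encoding is 7-bit ASCII, so any code point
// above 0x7F has no single-character equivalent in the target.
bool IsExceptionForASCII(char32_t codePoint) {
    assert(codePoint <= 0x10FFFF);  // must lie inside the Unicode range
    return codePoint > 0x7F;
}

// Counts the exception characters in a string of Unicode code points.
std::size_t CountExceptions(const std::u32string& text) {
    std::size_t count = 0;
    for (char32_t c : text) {
        if (IsExceptionForASCII(c)) {
            ++count;
        }
    }
    return count;
}
```

Under these assumptions, a string such as "café" contains one exception character (U+00E9), while plain "cafe" contains none.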

When you call a transcoder's conversion functions, you can therefore request a transcoding scope (a TTranscoder::ETranscodingScope value) to indicate how the transcoder should handle exception characters. The scopes referred to in this section are kNoRoundTrip, kPartialRoundTrip, and kFullRoundTrip.


Transcoding data with the kNoRoundTrip option is the simplest mechanism because characters map one-to-one and you can easily calculate storage needs. If you transcode text with either the kPartialRoundTrip or kFullRoundTrip option, you must consider how to allocate storage: exception characters might expand into two characters, so a buffer sized from the length of the input text might not be adequate.
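
To see why the input length alone can underestimate the output size, consider this toy sketch. It is independent of the Taligent classes: ToyExpand is a hypothetical mapping that copies ASCII characters unchanged and expands every other code point into a two-character placeholder, standing in for the two-character expansion described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy mapping, not the TTranscoder algorithm.
// Assumption: ASCII characters map one-to-one; every other code point
// expands into the two-character placeholder "??".
std::string ToyExpand(const std::u32string& text) {
    std::string out;
    for (char32_t c : text) {
        if (c <= 0x7F) {
            out.push_back(static_cast<char>(c));
        } else {
            out += "??";
        }
    }
    // In this model the output is never shorter than the input.
    assert(out.size() >= text.size());
    return out;
}
```

In this model a four-character input that contains one exception character produces five output characters, so a buffer of four would be one short.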

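You can calculate storage needs by calling GetMaximumBytesPerCharacter, which lets you size a buffer from the maximum number of bytes a single character can occupy, or by calling GetBufferSize, which returns the buffer size needed for a given range of text. Both approaches are shown in the examples that follow.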

TIP For conversion with the kFullRoundTrip option, you might also want to use the TTranscoder::CanConvert member function to query whether all the characters can be converted and to flag any nonconvertible characters.

Using GetMaximumBytesPerCharacter

This example shows how to use the GetMaximumBytesPerCharacter member function to create an adequate storage buffer, and then transcode the text in unicodeText to ASCII character data.

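      // Declare the transcoder without parentheses; "transcoder()" would
      // declare a function rather than construct an object.
      TASCIITranscoder transcoder;
      
      // Worst-case number of output bytes for a single character.
      TTextCount maxBytesPerChar;
      maxBytesPerChar = transcoder.GetMaximumBytesPerCharacter(
          TTranscoder::kPartialRoundTrip);
      
      // Size the buffer for the worst case over the whole text.
      TTextCount bufferLength = maxBytesPerChar * unicodeText.GetLength();
      unsigned char* returnChars = new unsigned char[bufferLength];
      
      TTextRange unicodeRange(0,unicodeText.GetLength());
      TTextCount numCharsConverted = transcoder.ExtractFromText(
          unicodeText, unicodeRange, returnChars, bufferLength,
          TTranscoder::kPartialRoundTrip);
      
      // The caller owns returnChars and must delete [] it when done.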

Using GetBufferSize

This example shows how to use the GetBufferSize member function to create the required storage buffer, and then to transcode the text in unicodeText into ASCII character data.

      TASCIITranscoder transcoder;
      TTextRange unicodeRange(0,unicodeText.GetLength());
      
      TTextCount bufferLength = transcoder.GetBufferSize(unicodeText,
          unicodeRange, TTranscoder::kPartialRoundTrip);
      unsigned char* returnChars = new unsigned char[bufferLength];
      
      TTextCount numCharsConverted = transcoder.ExtractFromText(
          unicodeText, unicodeRange, returnChars, bufferLength,
          TTranscoder::kPartialRoundTrip);
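
The two sizing approaches trade allocation size against an extra pass over the text. The sketch below models that trade-off with hypothetical toy functions rather than the Taligent classes. It assumes that an ASCII character needs one output byte, that any other code point needs two, and that GetBufferSize computes the size needed for the specific text range (the source says only that it yields "the required storage buffer").

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy model, not the TTranscoder implementation.
// Assumption: an ASCII character needs 1 output byte and any other
// code point needs 2, which is the worst case in this model.
const std::size_t kToyMaxBytesPerChar = 2;

std::size_t ToyBytesFor(char32_t c) {
    return (c <= 0x7F) ? 1 : kToyMaxBytesPerChar;
}

// Worst-case sizing: one multiplication, no pass over the text.
std::size_t ToyWorstCaseSize(const std::u32string& text) {
    return kToyMaxBytesPerChar * text.size();
}

// Exact sizing: one pass over the text, no over-allocation.
std::size_t ToyExactSize(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t c : text) {
        total += ToyBytesFor(c);
    }
    assert(total <= ToyWorstCaseSize(text));  // exact never exceeds worst case
    return total;
}
```

For mostly ASCII text the worst-case figure can be close to double the exact one, which may matter for large buffers; for short strings the simpler multiplication is often good enough.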

Using CanConvert

This example shows how to use the CanConvert member function to query whether the entire string of ASCII characters in the char* myString can be transcoded to Unicode.

      TASCIITranscoder transcoder;
      unsigned long stringLength = strlen(myString);
      
      TTextCount countCanConvert = transcoder.CanConvert(myString,
          stringLength, TTranscoder::kFullRoundTrip);
      
      if (countCanConvert < stringLength) {
          // flag nonconvertible characters, print messages, and so on
      }
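
One way to act on a short count is to treat it as the position of the first problem character. The sketch below illustrates that idea with a hypothetical stand-in, ToyCountConvertible. It assumes the count covers only the leading run of convertible characters, which the real TTranscoder::CanConvert may or may not guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a CanConvert-style query; the real
// TTranscoder::CanConvert may count characters differently.
// Assumption: returns the number of leading bytes that are 7-bit ASCII,
// so the result is also the index of the first offending byte.
std::size_t ToyCountConvertible(const char* s, std::size_t length) {
    assert(s != nullptr);
    std::size_t i = 0;
    while (i < length && static_cast<unsigned char>(s[i]) <= 0x7F) {
        ++i;
    }
    return i;
}
```

When the result is smaller than the input length, it can be used directly as the offset to report in an error message.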

Copyright © 1995 Taligent, Inc. All rights reserved.
