Class: TUnicode

Declaration: Unicode.h

Taxonomy Categories:

Member Functions:


Interface Category:

API.

Inherits From:

None.

Inherited By:

None.

Purpose:

Provides access to CommonPoint's implementation of the Unicode character set. Unicode is a new, 16-bit character set standard that includes the characters and symbols used in writing all the major languages of the world; future versions of Unicode will support even more languages. Details on Unicode and how it should be implemented can be found in the published Unicode Standard.

The TUnicode class provides enumerations of important constants associated with Unicode, and functions that report the semantic properties of Unicode characters: whether a character is uppercase, lowercase, a combining character, an East Asian ideograph, and so on. Many of the member functions are analogs of traditional C-language functions such as isalpha.

Unicode defines a User Zone of 6400 characters, which can hold any characters that are needed but not otherwise defined by Unicode. CommonPoint provides a default implementation of the User Zone.

Unicode defines names for its characters, but there are so many named values that a single enumerated list of them is not feasible, even if the names of the East Asian ideographs are left out. Instead, the names are spread among a group of Unicode naming classes organized by writing system or script (such as Latin, Greek, Cyrillic, and Hebrew) or by function (such as punctuation and mathematical operators).
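In Unicode 1.1 the User Zone corresponds to the private-use range U+E000 through U+F8FF, which holds exactly 6400 code points. The sketch below illustrates that range with a hypothetical helper, IsInUserZone, which is not a TUnicode member. It also assumes that UniChar is a 16-bit unsigned type, which fits Unicode's 16-bit design but is an assumption about CommonPoint's own typedef.

```cpp
#include <cstdint>

// Assumption: CommonPoint's UniChar is a 16-bit unsigned code unit.
using UniChar = std::uint16_t;

// Hypothetical constants and helper (not TUnicode members): the bounds of
// the Unicode 1.1 private-use range that CommonPoint calls the User Zone.
constexpr UniChar kUserZoneFirst = 0xE000;
constexpr UniChar kUserZoneLast  = 0xF8FF;

constexpr bool IsInUserZone(UniChar uc) {
    return uc >= kUserZoneFirst && uc <= kUserZoneLast;
}

// The range holds exactly the 6400 characters described above.
static_assert(kUserZoneLast - kUserZoneFirst + 1 == 6400, "User Zone size");
```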

Instantiation:

Allocate on the heap or the stack.

Deriving Classes:

Do not derive this class.

Concurrency:

Multithread safe.

Resource Use:

No special requirements.

Member Function: TUnicode::TUnicode

  1. TUnicode ()
  2. TUnicode (const TUnicode &)

Interface Category:

API.

Purpose:

  1. Default constructor.
  2. Copy constructor.

Calling Context:

  1. Called by the stream-in operators. Also called by anyone who needs information about CommonPoint's default implementation of Unicode.
  2. Called to copy an object.

Parameters:

Return Value:

None.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::~TUnicode

virtual ~TUnicode ()

Interface Category:

API.

Purpose:

Destructor.

Calling Context:

Called to destroy an object.

Parameters:

Return Value:

None.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::operator=

TUnicode & operator =(const TUnicode &)

Interface Category:

API.

Purpose:

Assignment operator.

Calling Context:

Called when an object is assigned to another compatible object.

Parameters:

Return Value:

A non-const reference to the left-hand side object.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsLineSeparator

static bool IsLineSeparator (UniChar uc)

Interface Category:

API.

Purpose:

Returns true for Unicode characters which can be used to separate lines.

Calling Context:

Used by anyone wishing to determine if a particular Unicode character can be used to separate lines.

Parameters:

Return Value:

Returns true if the parameter can be used to separate lines.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsParagraphSeparator

static bool IsParagraphSeparator (UniChar uc)

Interface Category:

API.

Purpose:

Returns true for Unicode characters which can be used to separate paragraphs.

Calling Context:

Used by anyone wishing to determine if a particular Unicode character can be used to separate paragraphs.

Parameters:

Return Value:

Returns true if the parameter can be used to separate paragraphs.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsLineOrParagraphSeparator

static bool IsLineOrParagraphSeparator (UniChar uc)

Interface Category:

API.

Purpose:

Returns true for Unicode characters which can be used to separate lines or paragraphs.

Calling Context:

Used by anyone wishing to determine if a particular Unicode character can be used to separate lines or paragraphs.

Parameters:

Return Value:

Returns true if the parameter can be used to separate lines or paragraphs.
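The entries above suggest, without stating it outright, that IsLineOrParagraphSeparator is the union of IsLineSeparator and IsParagraphSeparator. The sketch below shows that relationship using hypothetical stand-ins that recognize only the two separators Unicode defines explicitly, U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR. CommonPoint's tables may classify further characters (such as CR or LF) as separators.

```cpp
#include <cstdint>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical stand-ins for TUnicode::IsLineSeparator and
// TUnicode::IsParagraphSeparator, limited to the two dedicated separators.
bool IsLineSeparator(UniChar uc)      { return uc == 0x2028; }  // LINE SEPARATOR
bool IsParagraphSeparator(UniChar uc) { return uc == 0x2029; }  // PARAGRAPH SEPARATOR

// Assumed relationship: the combined test is the union of the two predicates.
bool IsLineOrParagraphSeparator(UniChar uc) {
    return IsLineSeparator(uc) || IsParagraphSeparator(uc);
}
```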

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsASpace

static bool IsASpace (UniChar uc)

Interface Category:

API.

Purpose:

Determines if a particular Unicode character is a space. Spaces within Unicode include the regular space, a non-breaking space, a zero-width space and spaces of varying widths.

Calling Context:

Used by anyone wishing to determine if a Unicode character is a space, such as someone using spaces to separate input into tokens.
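The tokenizing use mentioned above might look like the following sketch. IsASpace here is a hypothetical stand-in covering the kinds of spaces the Purpose section lists (SPACE, NO-BREAK SPACE, the fixed-width spaces U+2000 through U+200A, ZERO WIDTH SPACE, and IDEOGRAPHIC SPACE). UniString and Tokenize are illustrative names, with std::vector<UniChar> standing in for TText.

```cpp
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;             // assumption: 16-bit code unit
using UniString = std::vector<UniChar>;    // hypothetical stand-in for TText

// Hypothetical stand-in for TUnicode::IsASpace.
bool IsASpace(UniChar uc) {
    return uc == 0x0020 || uc == 0x00A0 ||
           (uc >= 0x2000 && uc <= 0x200B) || uc == 0x3000;
}

// Hypothetical helper: split text into tokens separated by runs of spaces.
std::vector<UniString> Tokenize(const UniString& text) {
    std::vector<UniString> tokens;
    UniString current;
    for (UniChar uc : text) {
        if (IsASpace(uc)) {
            if (!current.empty()) {
                tokens.push_back(current);  // a space ends the current token
                current.clear();
            }
        } else {
            current.push_back(uc);
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```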

Parameters:

Return Value:

Returns true if the parameter is a Unicode space character.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsInvisible

static bool IsInvisible (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character has no visual representation. This is true for space characters, for example.

Calling Context:

Used by anyone, such as Line Layout, who wishes to know if a particular Unicode character can be displayed.

Parameters:

Return Value:

Returns true if the parameter cannot be displayed.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsUpper

static bool IsUpper (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is considered uppercase.

Calling Context:

Called by anyone who wishes to determine the case of a Unicode character, such as for collation or word-breaking.

Parameters:

Return Value:

Returns true if the parameter is an uppercase character.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsLower

static bool IsLower (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is considered lowercase.

Calling Context:

Called by anyone who wishes to determine the case of a Unicode character, such as for collation or word-breaking.

Parameters:

Return Value:

Returns true if the parameter is a lowercase character.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsUncased

static bool IsUncased (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is considered an uncased letter. Many alphabets, such as Hebrew, Arabic, and the various Indic alphabets, do not distinguish between upper- and lower-case letters, and so their letters are uncased.

Calling Context:

Called by anyone who wishes to determine the case of a Unicode character, such as for collation or word-breaking.
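A client that needs to tell apart uppercase, lowercase, and uncased letters could combine IsUpper, IsLower, and IsUncased into a single classification. The sketch below shows the idea with a hypothetical LetterCase enumeration and ClassifyCase function. For illustration it recognizes only basic Latin letters and the Hebrew letters U+05D0 through U+05EA; the real TUnicode predicates consult CommonPoint's property tables.

```cpp
#include <cstdint>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical classification, not part of TUnicode.
enum class LetterCase { kUpper, kLower, kUncased, kNotALetter };

LetterCase ClassifyCase(UniChar uc) {
    if (uc >= 'A' && uc <= 'Z') return LetterCase::kUpper;           // cf. IsUpper
    if (uc >= 'a' && uc <= 'z') return LetterCase::kLower;           // cf. IsLower
    if (uc >= 0x05D0 && uc <= 0x05EA) return LetterCase::kUncased;   // cf. IsUncased
    return LetterCase::kNotALetter;
}
```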

Parameters:

Return Value:

Returns true if the parameter is an uncased character.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsBaseForm

static bool IsBaseForm (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character can be accented or take a diacritic mark.

Calling Context:

Used by anyone, such as Line Layout, who needs to know whether a given character can take an accent, for example when positioning accents. Clients developing collation algorithms might also need this information to determine the equivalence of two character sequences.

Parameters:

Return Value:

Returns true if the parameter can be accented or take a diacritic mark.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsDiacritic

static bool IsDiacritic (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is an accent or diacritic mark.

Calling Context:

Used by anyone, such as Line Layout, who needs to know if a given character is an accent to be positioned over or grouped with a base form earlier in the text stream. Clients developing collation algorithms might also need this information to determine the equivalence of two character sequences.
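The grouping described above, where diacritics attach to a base form earlier in the text stream, can be sketched as follows. IsDiacritic here is a hypothetical stand-in that recognizes only the Combining Diacritical Marks block (U+0300 through U+036F), and Clusters is an illustrative helper. A diacritic at the very start of the text, with no preceding base, is placed in a cluster of its own.

```cpp
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical stand-in for TUnicode::IsDiacritic.
bool IsDiacritic(UniChar uc) { return uc >= 0x0300 && uc <= 0x036F; }

// Hypothetical helper: split text into clusters, each consisting of a
// base character followed by any diacritics that come after it.
std::vector<std::vector<UniChar>> Clusters(const std::vector<UniChar>& text) {
    std::vector<std::vector<UniChar>> clusters;
    for (UniChar uc : text) {
        if (IsDiacritic(uc) && !clusters.empty()) {
            clusters.back().push_back(uc);   // attach to the preceding base
        } else {
            clusters.push_back({uc});        // start a new cluster
        }
    }
    return clusters;
}
```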

Parameters:

Return Value:

Returns true if the parameter is an accent or diacritic.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsAlpha

static bool IsAlpha (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character belongs to an alphabet.

Calling Context:

Used by anyone who wants to scan text for alphabetic characters.

Parameters:

Return Value:

Returns true if the parameter is a letter of an alphabet.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsAlphaNumeric

static bool IsAlphaNumeric (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character belongs to an alphabet or is a digit.

Calling Context:

Used by anyone who wants to scan text for alphanumeric characters, such as a compiler separating text into identifiers.
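The compiler use mentioned above could be sketched as an identifier scanner. IsAlpha and IsAlphaNumeric below are hypothetical ASCII-only stand-ins, whereas the real TUnicode functions accept letters and digits from every script. ScanIdentifier is an illustrative helper, and treating the underscore as an identifier character is a programming-language convention assumed here, not a TUnicode property.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical ASCII-only stand-ins for TUnicode::IsAlpha and IsAlphaNumeric.
bool IsAlpha(UniChar uc) {
    return (uc >= 'A' && uc <= 'Z') || (uc >= 'a' && uc <= 'z');
}
bool IsAlphaNumeric(UniChar uc) { return IsAlpha(uc) || (uc >= '0' && uc <= '9'); }

// Return the length of the identifier starting at text[start]: a letter or
// underscore followed by letters, digits, or underscores. Returns 0 when no
// identifier starts there.
std::size_t ScanIdentifier(const std::vector<UniChar>& text, std::size_t start) {
    if (start >= text.size() || !(IsAlpha(text[start]) || text[start] == '_'))
        return 0;
    std::size_t end = start + 1;
    while (end < text.size() && (IsAlphaNumeric(text[end]) || text[end] == '_'))
        ++end;
    return end - start;
}
```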

Parameters:

Return Value:

Returns true if the parameter belongs to an alphabet or is a digit.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsGraphic

static bool IsGraphic (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character has a visual representation. This is the opposite of IsInvisible.

Calling Context:

Used by anyone, such as Line Layout, who wishes to know if a particular Unicode character can be displayed.

Parameters:

Return Value:

Returns true if the parameter can be displayed.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsDigit

static bool IsDigit (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is a decimal digit. This same functionality is provided within the TUnicodeDecimalNumerals class.

Calling Context:

Called by anyone who wishes to determine if a given Unicode character is a decimal digit.

Parameters:

Return Value:

Returns true if the parameter is a decimal digit.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

This function is provided mostly for the sake of compatibility with standard ANSI C functions. Clients should usually use the number formatting and numerals classes rather than this function.

Member Function: TUnicode::IsXDigit

static bool IsXDigit (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is a standard hexadecimal digit, that is, '0' through '9', and 'A' through 'F' or 'a' through 'f'.

Calling Context:

Called by anyone who wishes to determine if a given Unicode character is a hexadecimal digit.

Parameters:

Return Value:

Returns true if the parameter is a hexadecimal digit.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

This function is provided mostly for the sake of compatibility with standard ANSI C functions. Clients should usually use the number formatting and numerals classes rather than this function. This function is equivalent to IsHexDigit.
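As an illustration of IsXDigit (and its equivalent, IsHexDigit), the sketch below parses the hexadecimal part of a code point written as, for example, "2028" in "U+2028". IsHexDigit is a stand-in built from the character set stated above; HexValue and ParseHex are hypothetical helpers, and overflow on very long inputs is not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Stand-in for TUnicode::IsHexDigit: '0'-'9', 'A'-'F', 'a'-'f'.
bool IsHexDigit(UniChar uc) {
    return (uc >= '0' && uc <= '9') || (uc >= 'A' && uc <= 'F') ||
           (uc >= 'a' && uc <= 'f');
}

// Hypothetical helper: numeric value of a hex digit, or -1 if uc is not one.
int HexValue(UniChar uc) {
    if (uc >= '0' && uc <= '9') return uc - '0';
    if (uc >= 'A' && uc <= 'F') return uc - 'A' + 10;
    if (uc >= 'a' && uc <= 'f') return uc - 'a' + 10;
    return -1;
}

// Hypothetical helper: read leading hex digits into value, stopping at the
// first non-hex character. Returns false if no digits were read.
bool ParseHex(const std::vector<UniChar>& text, unsigned long& value) {
    value = 0;
    std::size_t i = 0;
    while (i < text.size() && IsHexDigit(text[i])) {
        value = value * 16 + HexValue(text[i]);
        ++i;
    }
    return i > 0;
}
```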

Member Function: TUnicode::IsHexDigit

static bool IsHexDigit (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is a standard hexadecimal digit, that is, '0' through '9', and 'A' through 'F' or 'a' through 'f'.

Calling Context:

Called by anyone who wishes to determine if a given Unicode character is a hexadecimal digit.

Parameters:

Return Value:

Returns true if the parameter is a hexadecimal digit.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

This function is provided mostly for the sake of compatibility with standard ANSI C functions. Clients should usually use the number formatting and numerals classes rather than this function. This function is equivalent to IsXDigit.

Member Function: TUnicode::IsASCII

static bool IsASCII (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is found in the ASCII character set. This is true for Unicode values U+0000 through U+007F.

Calling Context:

Used by anyone who needs to distinguish ASCII from non-ASCII characters within Unicode.

Parameters:

Return Value:

Returns true if the parameter is found in the ASCII character set.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

This function is provided mostly for compatibility with standard ANSI C libraries. Its use should be avoided.

Member Function: TUnicode::IsControl

static bool IsControl (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is considered a control character, such as those used to control certain terminal devices.

Calling Context:

Used by anyone who needs to distinguish control codes from non-control codes.

Parameters:

Return Value:

Returns true if the parameter is a control code.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

This function is the logical inverse of IsPrint.

Member Function: TUnicode::IsPrint

static bool IsPrint (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is not considered a control character.

Calling Context:

Used by anyone who needs to distinguish control codes from non-control codes.

Parameters:

Return Value:

Returns true if the parameter is not a control code.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

This function is the logical inverse of IsControl.
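The relationships stated for IsASCII, IsControl, and IsPrint can be written down directly. In the sketch below, IsASCII follows the documented range U+0000 through U+007F. The control set used by the IsControl stand-in (the C0 controls, DELETE, and the C1 controls U+0080 through U+009F) is an assumption about CommonPoint's tables. IsPrint is defined as the documented logical inverse.

```cpp
#include <cstdint>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// ASCII is exactly U+0000..U+007F, as documented for TUnicode::IsASCII.
bool IsASCII(UniChar uc) { return uc <= 0x007F; }

// Stand-in for TUnicode::IsControl; the exact set is an assumption.
bool IsControl(UniChar uc) {
    return uc <= 0x001F || (uc >= 0x007F && uc <= 0x009F);
}

// IsPrint is documented as the logical inverse of IsControl.
bool IsPrint(UniChar uc) { return !IsControl(uc); }
```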

Member Function: TUnicode::IsPunctuation

static bool IsPunctuation (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character is used as punctuation, such as periods, commas, and so on.

Calling Context:

Used by anyone who needs to distinguish general Unicode punctuation characters from Unicode non-punctuation, such as a text parser.

Parameters:

Return Value:

Returns true if the parameter is punctuation.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsOpenPunctuation

static bool IsOpenPunctuation (UniChar uc)

Interface Category:

API.

Purpose:

Returns true for the opening element of certain punctuation pairs, such as the left parenthesis or bracket.

Calling Context:

Used by anyone who needs to distinguish Unicode opening punctuation characters from other Unicode characters, such as a text parser.

Parameters:

Return Value:

Returns true if the parameter is the opening element of a punctuation pair.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsClosePunctuation

static bool IsClosePunctuation (UniChar uc)

Interface Category:

API.

Purpose:

Returns true for the closing element of certain punctuation pairs, such as the right parenthesis or bracket.

Calling Context:

Used by anyone who needs to distinguish Unicode closing punctuation characters from other Unicode characters, such as a text parser.

Parameters:

Return Value:

Returns true if the parameter is the closing element of a punctuation pair.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsSymbol

static bool IsSymbol (UniChar uc)

Interface Category:

API.

Purpose:

Returns true for Unicode characters which are considered symbols and not used in writing human languages, such as mathematical operators and dingbats.

Calling Context:

Used by anyone who needs to distinguish Unicode symbol characters from other Unicode characters, such as a text parser.

Parameters:

Return Value:

Returns true if the parameter is a Unicode symbol character.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsInSet

static bool IsInSet (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a given Unicode code point is defined in the current version of Unicode.

Calling Context:

Used by anyone who needs to scan a text stream for invalid Unicode values.

Parameters:

Return Value:

Returns true if the parameter is the code point of a currently-defined Unicode character.
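Scanning a text stream for invalid values, as described above, reduces to finding the first value for which IsInSet returns false. The IsInSet stand-in below is deliberately incomplete: it rejects only U+FFFE and U+FFFF, which Unicode guarantees are not characters, whereas the real function also rejects every code point left unassigned in its version of Unicode. FindInvalid is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Incomplete, hypothetical stand-in for TUnicode::IsInSet.
bool IsInSet(UniChar uc) { return uc != 0xFFFE && uc != 0xFFFF; }

// Hypothetical helper: index of the first invalid value, or text.size()
// if every value is a defined character.
std::size_t FindInvalid(const std::vector<UniChar>& text) {
    for (std::size_t i = 0; i < text.size(); ++i)
        if (!IsInSet(text[i])) return i;
    return text.size();
}
```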

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::MatchPunctuation

static UniChar MatchPunctuation (UniChar searchChar)

Interface Category:

API.

Purpose:

Certain punctuation characters such as parentheses come in pairs; this function returns the mate to a character from one of these pairs.

Calling Context:

Called by anyone who needs to determine the mate for a Unicode character in a punctuation pair. A program editor could use it to balance parentheses, for example.
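The parenthesis-balancing use mentioned above might combine IsOpenPunctuation, IsClosePunctuation, and MatchPunctuation as in the following sketch. The three functions are hypothetical stand-ins limited to the ASCII pairs ( ), [ ], and { }, while the real tables cover many more pairs. IsBalanced is an illustrative helper that keeps a stack of the closing characters still expected.

```cpp
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical stand-ins for the TUnicode punctuation-pair functions.
bool IsOpenPunctuation(UniChar uc)  { return uc == '(' || uc == '[' || uc == '{'; }
bool IsClosePunctuation(UniChar uc) { return uc == ')' || uc == ']' || uc == '}'; }

// Returns the mate of a paired character, or the character itself.
UniChar MatchPunctuation(UniChar uc) {
    switch (uc) {
        case '(': return ')';
        case ')': return '(';
        case '[': return ']';
        case ']': return '[';
        case '{': return '}';
        case '}': return '{';
        default:  return uc;
    }
}

// Hypothetical helper: true if every opening character is closed by its
// mate, in properly nested order.
bool IsBalanced(const std::vector<UniChar>& text) {
    std::vector<UniChar> expected;  // stack of closing characters still owed
    for (UniChar uc : text) {
        if (IsOpenPunctuation(uc)) {
            expected.push_back(MatchPunctuation(uc));
        } else if (IsClosePunctuation(uc)) {
            if (expected.empty() || expected.back() != uc) return false;
            expected.pop_back();
        }
    }
    return expected.empty();
}
```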

Parameters:

Return Value:

The mate for the Unicode character given as a parameter. The parameter itself is returned if it has no mate.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::DigitValue

static int DigitValue (UniChar uc)

Interface Category:

API.

Purpose:

Returns the decimal value of a Unicode digit character.

Calling Context:

Called to determine the decimal value of a Unicode digit. This same functionality is supplied by the class TUnicodeDecimalNumerals.

Parameters:

Return Value:

Returns the decimal value of its parameter. If the parameter is not a decimal digit, returns -1.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

Does not throw an exception if given a parameter that is not a decimal digit; it returns -1 instead. Use IsDigit to confirm that a character is a digit before calling this function, or check for a return value of -1. This function is provided mostly for the sake of compatibility with standard ANSI C functions. Clients should usually use the number formatting and numerals classes rather than this function.
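Because DigitValue returns -1 for non-digits, a caller can use that return value in place of a separate IsDigit call. The sketch below shows this with a hypothetical DigitValue stand-in that recognizes two sets of decimal digits, ASCII U+0030 through U+0039 and Arabic-Indic U+0660 through U+0669. ParseDecimal is an illustrative helper, not a substitute for the number formatting classes.

```cpp
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical stand-in for TUnicode::DigitValue; returns -1 for non-digits,
// like the documented function.
int DigitValue(UniChar uc) {
    if (uc >= 0x0030 && uc <= 0x0039) return uc - 0x0030;  // ASCII digits
    if (uc >= 0x0660 && uc <= 0x0669) return uc - 0x0660;  // Arabic-Indic digits
    return -1;
}

// Hypothetical helper: accumulate a non-negative decimal value, stopping at
// the first character for which DigitValue returns -1.
long ParseDecimal(const std::vector<UniChar>& text) {
    long value = 0;
    for (UniChar uc : text) {
        int d = DigitValue(uc);
        if (d < 0) break;
        value = value * 10 + d;
    }
    return value;
}
```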

Member Function: TUnicode::CurrentVersion

static double CurrentVersion ()

Interface Category:

API.

Purpose:

Returns the version of Unicode implemented by the class as a floating-point number. This value is currently 1.1.

Calling Context:

Used to determine the version of Unicode implemented by this class.

Parameters:

Return Value:

The version of Unicode implemented by the class as a floating-point number. This value is currently 1.1.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::StandardNameToConstant

static void StandardNameToConstant (const TText & standardName, TText & constantName)

Interface Category:

API.

Purpose:

Converts a formal character name as defined by the Unicode standard into a version that can be used as an identifier in most programming languages. For example, the formal name FULL STOP is converted to FullStop.
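The conversion described above can be sketched as follows, with std::string standing in for TText and the output-parameter style of the documented signature kept. The word-splitting rule is an assumption: spaces and hyphens are treated as word boundaries and dropped, and each word keeps only its first letter in uppercase. CommonPoint's actual rules may differ.

```cpp
#include <cctype>
#include <string>

// Sketch of TUnicode::StandardNameToConstant using std::string in place of
// TText. Assumption: spaces and hyphens separate words.
void StandardNameToConstant(const std::string& standardName, std::string& constantName) {
    constantName.clear();
    bool startOfWord = true;
    for (char c : standardName) {
        if (c == ' ' || c == '-') {
            startOfWord = true;  // the separator itself is dropped
            continue;
        }
        unsigned char uc = static_cast<unsigned char>(c);
        constantName += startOfWord ? static_cast<char>(std::toupper(uc))
                                    : static_cast<char>(std::tolower(uc));
        startOfWord = false;
    }
}
```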

Calling Context:

Called by anyone who needs to convert character names from the formal versions defined in the standard to a form more appropriate for use as an identifier in a programming language.

Parameters:

Return Value:

None.

Exceptions:

Throws TUnicodeException::fgCannotAllocateMemory if memory for the buffers used in the conversion process cannot be allocated.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::ConstantNameToStandard

static void ConstantNameToStandard (const TText & constantName, TText & standardName)

Interface Category:

API.

Purpose:

Converts a character name appropriate for use as an identifier in most programming languages into a version conformant with the formal naming rules for Unicode characters. For example, the identifier FullStop is converted to FULL STOP.
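A sketch of the reverse conversion, again with std::string standing in for TText, is shown below. It assumes that a new word begins at each uppercase letter after the first and that words are joined with single spaces. Under that assumption, formal names that contain hyphens, such as HYPHEN-MINUS, cannot be recovered exactly.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch of TUnicode::ConstantNameToStandard using std::string in place of
// TText. Assumption: each uppercase letter after the first starts a new word.
void ConstantNameToStandard(const std::string& constantName, std::string& standardName) {
    standardName.clear();
    for (std::size_t i = 0; i < constantName.size(); ++i) {
        unsigned char uc = static_cast<unsigned char>(constantName[i]);
        if (i > 0 && std::isupper(uc)) standardName += ' ';  // word boundary
        standardName += static_cast<char>(std::toupper(uc));
    }
}
```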

Calling Context:

Called by anyone who needs to convert character names from a form used as an identifier in a programming language back to the formal versions defined in the standard.

Parameters:

Return Value:

None.

Exceptions:

Throws TUnicodeException::fgCannotAllocateMemory if memory for the buffers used in the conversion process cannot be allocated.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::IsTrailingInvisible

static bool IsTrailingInvisible (UniChar uc)

Interface Category:

API.

Purpose:

Returns true if a Unicode character has no visual representation and breaks words. This is true for many space characters, for example, but false for non-breaking spaces.
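One place this distinction matters is measuring a line without its trailing word-breaking spaces, while keeping a trailing non-breaking space. In the sketch below, IsTrailingInvisible is a hypothetical stand-in that treats SPACE (U+0020) and IDEOGRAPHIC SPACE (U+3000) as trailing invisibles and NO-BREAK SPACE (U+00A0) as not. The exact set is an assumption. VisibleLength is an illustrative helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical stand-in for TUnicode::IsTrailingInvisible.
bool IsTrailingInvisible(UniChar uc) {
    return uc == 0x0020 || uc == 0x3000;  // U+00A0 deliberately excluded
}

// Hypothetical helper: number of characters that remain on a line after
// trailing invisible, word-breaking characters are dropped.
std::size_t VisibleLength(const std::vector<UniChar>& line) {
    std::size_t end = line.size();
    while (end > 0 && IsTrailingInvisible(line[end - 1])) --end;
    return end;
}
```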

Calling Context:

Used by anyone, such as Line Layout, who wishes to know whether a particular Unicode character has no visual representation and breaks words.

Parameters:

Return Value:

Returns true if the parameter cannot be displayed and breaks words.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::GetUserType

static ECharacterProperty GetUserType (UniChar uc)

Interface Category:

API.

Purpose:

Returns basic information on the properties of a Unicode character in the User Zone.

Calling Context:

Used by GetType to determine the character properties for characters in the User Zone.

Parameters:

Return Value:

The character properties for the parameter.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::GetType

static ECharacterProperty GetType (UniChar uc)

Interface Category:

API.

Purpose:

Returns basic information on the properties of a Unicode character.

Calling Context:

Generally, clients will use the various IsXXXXX functions defined in this class, rather than using GetType themselves. Most of the IsXXXXX functions call GetType. GetType can, however, be called by anyone who wishes to test or check the properties of a Unicode character, especially if the IsXXXXX functions do not meet their needs. For example, GetType is called by the class TCharacterPropertyIterator.
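The relationship described above, with GetType delegating User Zone characters to GetUserType and most IsXXXXX functions reducing to a GetType comparison, can be sketched as follows. ECharacterProperty is the documented return type, but the enumerators used here are invented for illustration, and the classification covers only ASCII letters and digits.

```cpp
#include <cstdint>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical enumerators; the real ECharacterProperty values differ.
enum ECharacterProperty { kUppercaseLetter, kLowercaseLetter, kDecimalDigit,
                          kPrivateUse, kOther };

// Hypothetical User Zone lookup; a real implementation would consult the
// table for whatever characters have been placed in the User Zone.
ECharacterProperty GetUserType(UniChar) { return kPrivateUse; }

// GetType hands User Zone code points (U+E000..U+F8FF) to GetUserType.
ECharacterProperty GetType(UniChar uc) {
    if (uc >= 0xE000 && uc <= 0xF8FF) return GetUserType(uc);
    if (uc >= 'A' && uc <= 'Z') return kUppercaseLetter;
    if (uc >= 'a' && uc <= 'z') return kLowercaseLetter;
    if (uc >= '0' && uc <= '9') return kDecimalDigit;
    return kOther;
}

// Predicates in the style of the IsXXXXX functions, built on GetType.
bool IsUpper(UniChar uc) { return GetType(uc) == kUppercaseLetter; }
bool IsDigit(UniChar uc) { return GetType(uc) == kDecimalDigit; }
```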

Parameters:

Return Value:

The character properties for the parameter.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.

Member Function: TUnicode::GetScript

static EUnicodeScript GetScript (UniChar uc)

Interface Category:

API.

Purpose:

Returns the script or writing system with which a Unicode character is associated.

Calling Context:

Called to obtain the script or writing system with which a Unicode character is associated.
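A common use of script information is splitting text into runs that share a script, for example to choose a font for each run. In the sketch below, the EUnicodeScript enumerators are invented for illustration, and the GetScript stand-in recognizes only ASCII letters, the Greek block (U+0370 through U+03FF), and the Hebrew block (U+0590 through U+05FF). CountScriptRuns is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UniChar = std::uint16_t;  // assumption: 16-bit code unit

// Hypothetical enumerators; the real EUnicodeScript values differ.
enum EUnicodeScript { kLatinScript, kGreekScript, kHebrewScript, kOtherScript };

// Hypothetical stand-in for TUnicode::GetScript.
EUnicodeScript GetScript(UniChar uc) {
    if ((uc >= 'A' && uc <= 'Z') || (uc >= 'a' && uc <= 'z')) return kLatinScript;
    if (uc >= 0x0370 && uc <= 0x03FF) return kGreekScript;
    if (uc >= 0x0590 && uc <= 0x05FF) return kHebrewScript;
    return kOtherScript;
}

// Hypothetical helper: count maximal runs of characters sharing a script.
std::size_t CountScriptRuns(const std::vector<UniChar>& text) {
    std::size_t runs = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (i == 0 || GetScript(text[i]) != GetScript(text[i - 1])) ++runs;
    return runs;
}
```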

Parameters:

Return Value:

The Unicode script with which the character is associated.

Exceptions:

Throws no exceptions, passes all exceptions through.

Concurrency:

Multithread safe.

Other Considerations:

None.
Copyright©1995 Taligent,Inc. All rights reserved.