TChar Class Reference

#include <e32cmn.h>

class TChar

Nested Classes and Structures

TChar::TCharInfo

Public Member Enumerations
enum	anonymous { EFoldCase, EFoldAccents, EFoldDigits, EFoldSpaces, ..., EFoldAll }
enum	TBdCategory { ELeftToRight, ELeftToRightEmbedding, ELeftToRightOverride, ERightToLeft, ..., EOtherNeutral }
enum	TCategory { EAlphaGroup, ELetterOtherGroup, ELetterModifierGroup, EMarkGroup, ..., ECnCategory }
enum	TCjkWidth { ENeutralWidth, EHalfWidth, EFullWidth, ENarrow, EWide }
enum	TEncoding { EUnicode, EShiftJIS }

Public Member Functions
	TChar()
	TChar(TUint)
IMPORT_C TBool	Compose(TUint &, const TDesC16 &)
IMPORT_C TBool	Decompose(TPtrC16 &)
TBool	Eos()
void	Fold()
void	Fold(TInt)
IMPORT_C TBdCategory	GetBdCategory()
IMPORT_C TCategory	GetCategory()
IMPORT_C TCjkWidth	GetCjkWidth()
IMPORT_C TInt	GetCombiningClass()
IMPORT_C void	GetInfo(TCharInfo &)
IMPORT_C TUint	GetLowerCase()
IMPORT_C TInt	GetNumericValue()
IMPORT_C TUint	GetTitleCase()
IMPORT_C TUint	GetUpperCase()
IMPORT_C TBool	IsAlpha()
IMPORT_C TBool	IsAlphaDigit()
IMPORT_C TBool	IsAssigned()
IMPORT_C TBool	IsControl()
IMPORT_C TBool	IsDigit()
IMPORT_C TBool	IsGraph()
IMPORT_C TBool	IsHexDigit()
IMPORT_C TBool	IsLower()
IMPORT_C TBool	IsMirrored()
IMPORT_C TBool	IsPrint()
IMPORT_C TBool	IsPunctuation()
IMPORT_C TBool	IsSpace()
IMPORT_C TBool	IsTitle()
IMPORT_C TBool	IsUpper()
void	LowerCase()
void	TitleCase()
void	UpperCase()
	operator TUint()
TChar	operator+(TUint)
TChar &	operator+=(TUint)
TChar	operator-(TUint)
TChar &	operator-=(TUint)

Protected Member Functions
void	SetChar(TUint)

Detailed Description

Holds a character value and provides a number of utility functions to manipulate it and test its properties.

For example, there are functions to convert the character to uppercase and test whether or not it is a control character.

The character value is stored as a 32-bit unsigned integer. The shorthand "TChar value" is used to describe the character value wrapped by a TChar object.

TChar can be used to represent Unicode values outside plane 0 (that is, the extended Unicode range from 0x10000 to 0xFFFFF). This distinguishes it from TText, which can hold only 16-bit Unicode character values.

Member Enumeration Documentation

Enum anonymous

Flags defining operations to be performed using TChar::Fold().

The flag values are passed to the Fold() function.

Enumerator	Value	Description
EFoldCase	1	Convert characters to their lowercase form, if any.
EFoldAccents	2	Strip accents.
EFoldDigits	4	Convert digits representing values 0..9 to characters '0'..'9'.
EFoldSpaces	8	Convert all spaces (ordinary, fixed-width, ideographic, etc.) to ' '.
EFoldKana	16	Convert hiragana to katakana.
EFoldWidth	32	Fold fullwidth and halfwidth variants to their standard forms.
EFoldStandard	EFoldCase \| EFoldAccents \| EFoldDigits \| EFoldSpaces	Perform standard folding operations, i.e. those done by Fold() with no argument.
EFoldAll	-1	Perform all possible folding operations
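
The flag values are bit masks, so they can be OR-ed together to select a subset of folding operations. The following standalone sketch copies the values from the table into a hypothetical FoldFlag enum. It does not include e32cmn.h, and the helper HasFlag is an assumption made for illustration, not part of the Symbian API.

```cpp
#include <cassert>

// Values copied from the table above; the enum name FoldFlag is hypothetical.
enum FoldFlag : int {
    EFoldCase     = 1,
    EFoldAccents  = 2,
    EFoldDigits   = 4,
    EFoldSpaces   = 8,
    EFoldKana     = 16,
    EFoldWidth    = 32,
    EFoldStandard = EFoldCase | EFoldAccents | EFoldDigits | EFoldSpaces,
    EFoldAll      = -1
};

// Hypothetical helper: true if every bit of aFlag is set in aMask.
constexpr bool HasFlag(int aMask, int aFlag) {
    return (aMask & aFlag) == aFlag;
}

// Usage: build a mask for case- and space-insensitive matching and inspect it.
inline void FoldFlagsExample() {
    const int mask = EFoldCase | EFoldSpaces;
    assert(HasFlag(mask, EFoldCase));
    assert(!HasFlag(mask, EFoldAccents));
    assert(EFoldStandard == 15);              // OR of the four lowest bits
    assert(HasFlag(EFoldAll, EFoldWidth));    // -1 has every bit set
    assert(!HasFlag(EFoldStandard, EFoldKana));
}
```

With the real class, a mask built this way would be the aFlags argument of Fold(TInt).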

Enum TBdCategory

The bi-directional Unicode character category.

For more information on the bi-directional algorithm, see Unicode Technical Report No. 9 available at: http://www.unicode.org/unicode/reports/tr9.

Enumerator	Value	Description
ELeftToRight		Left to right.
ELeftToRightEmbedding		Left to right embedding.
ELeftToRightOverride		Left-to-Right Override.
ERightToLeft		Right to left.
ERightToLeftArabic		Right to left Arabic.
ERightToLeftEmbedding		Right to left embedding.
ERightToLeftOverride		Right-to-Left Override.
EPopDirectionalFormat		Pop Directional Format.
EEuropeanNumber		European number.
EEuropeanNumberSeparator		European number separator.
EEuropeanNumberTerminator		European number terminator.
EArabicNumber		Arabic number.
ECommonNumberSeparator		Common number separator.
ENonSpacingMark		Non Spacing Mark.
EBoundaryNeutral		Boundary Neutral.
EParagraphSeparator		Paragraph Separator.
ESegmentSeparator		Segment separator.
EWhitespace		Whitespace.
EOtherNeutral		Other neutrals; all other characters: punctuation, symbols.

Enum TCategory

General Unicode character category.

The high nibble encodes the major category (Mark, Number, etc.) and the low nibble encodes the subdivisions of that category.

The category codes can be used in three ways:

(i) as unique constants: there is one for each Unicode category, with a name of the form E<XX>Category, where <XX> is the category name given by the Unicode database (e.g., the constant ELuCategory is used for uppercase letters, category Lu);

(ii) as numbers in certain ranges: letter categories are all <= EMaxLetterCategory;

(iii) as codes in which the upper nibble gives the category group (e.g., punctuation categories all yield TRUE for the test (category & 0xF0) == EPunctuationGroup).

Enumerator	Value	Description
EAlphaGroup	0x00	Alphabetic letters. Includes ELuCategory, ELlCategory and ELtCategory.
ELetterOtherGroup	0x10	Other letters. Includes ELoCategory.
ELetterModifierGroup	0x20	Letter modifiers. Includes ELmCategory.
EMarkGroup	0x30	Marks group. Includes EMnCategory, EMcCategory and EMeCategory.
ENumberGroup	0x40	Numbers group. Includes ENdCategory, ENlCategory and ENoCategory.
EPunctuationGroup	0x50	Punctuation group. Includes EPcCategory, EPdCategory, EPsCategory, EPeCategory, EPiCategory, EPfCategory and EPoCategory.
ESymbolGroup	0x60	Symbols group. Includes ESmCategory, EScCategory, ESkCategory and ESoCategory.
ESeparatorGroup	0x70	Separators group. Includes EZsCategory, EZlCategory and EZpCategory.
EControlGroup	0x80	Control, format, private use, unassigned. Includes ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.
EMaxAssignedGroup	0xE0	The highest possible groups category.
EUnassignedGroup	0xF0	Unassigned to any other group.
ELuCategory	EAlphaGroup \| 0	Letter, Uppercase.
ELlCategory	EAlphaGroup \| 1	Letter, Lowercase.
ELtCategory	EAlphaGroup \| 2	Letter, Titlecase.
ELoCategory	ELetterOtherGroup \| 0	Letter, Other.
EMaxLetterCategory	ELetterOtherGroup \| 0x0F	The highest possible (non-modifier) letter category.
ELmCategory	ELetterModifierGroup \| 0	Letter, Modifier.
EMaxLetterOrLetterModifierCategory	ELetterModifierGroup \| 0x0F	The highest possible letter category.
EMnCategory	EMarkGroup \| 0	Mark, Non-Spacing.
EMcCategory	EMarkGroup \| 1	Mark, Spacing Combining.
EMeCategory	EMarkGroup \| 2	Mark, Enclosing.
ENdCategory	ENumberGroup \| 0	Number, Decimal Digit.
ENlCategory	ENumberGroup \| 1	Number, Letter.
ENoCategory	ENumberGroup \| 2	Number, Other.
EPcCategory	EPunctuationGroup \| 0	Punctuation, Connector.
EPdCategory	EPunctuationGroup \| 1	Punctuation, Dash.
EPsCategory	EPunctuationGroup \| 2	Punctuation, Open.
EPeCategory	EPunctuationGroup \| 3	Punctuation, Close.
EPiCategory	EPunctuationGroup \| 4	Punctuation, Initial Quote.
EPfCategory	EPunctuationGroup \| 5	Punctuation, Final Quote.
EPoCategory	EPunctuationGroup \| 6	Punctuation, Other.
ESmCategory	ESymbolGroup \| 0	Symbol, Math.
EScCategory	ESymbolGroup \| 1	Symbol, Currency.
ESkCategory	ESymbolGroup \| 2	Symbol, Modifier.
ESoCategory	ESymbolGroup \| 3	Symbol, Other.
EMaxGraphicCategory	ESymbolGroup \| 0x0F	The highest possible graphic character category.
EZsCategory	ESeparatorGroup \| 0	Separator, Space.
EMaxPrintableCategory	EZsCategory	The highest possible printable character category.
EZlCategory	ESeparatorGroup \| 1	Separator, Line.
EZpCategory	ESeparatorGroup \| 2	Separator, Paragraph.
ECcCategory	EControlGroup \| 0	Other, Control.
ECfCategory	EControlGroup \| 1	Other, Format.
EMaxAssignedCategory	EMaxAssignedGroup \| 0x0F	The highest possible category for assigned 16-bit characters; does not include surrogates, which are interpreted as pairs and have no meaning on their own.
ECsCategory	EUnassignedGroup \| 0	Other, Surrogate.
ECoCategory	EUnassignedGroup \| 1	Other, Private Use.
ECnCategory	EUnassignedGroup \| 2	Other, Not Assigned.
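
The three uses of the category codes listed above can be illustrated with a standalone sketch. It copies a handful of values from the table into a hypothetical Category enum; real code would use TChar::TCategory from e32cmn.h. The helpers GroupOf and IsLetterCategory are assumptions made for illustration only.

```cpp
#include <cassert>

// A subset of the TCategory values, copied from the table above.
enum Category : unsigned {
    EAlphaGroup        = 0x00,
    ELetterOtherGroup  = 0x10,
    EPunctuationGroup  = 0x50,
    ESymbolGroup       = 0x60,
    ELuCategory        = EAlphaGroup | 0,
    ELlCategory        = EAlphaGroup | 1,
    ELoCategory        = ELetterOtherGroup | 0,
    EMaxLetterCategory = ELetterOtherGroup | 0x0F,
    EPdCategory        = EPunctuationGroup | 1,
    EPoCategory        = EPunctuationGroup | 6,
    ESmCategory        = ESymbolGroup | 0
};

// Hypothetical helper: the upper nibble identifies the category group.
constexpr unsigned GroupOf(unsigned aCategory) {
    return aCategory & 0xF0u;
}

// Hypothetical helper for use (ii): letter categories are all <= EMaxLetterCategory.
constexpr bool IsLetterCategory(unsigned aCategory) {
    return aCategory <= EMaxLetterCategory;
}

inline void CategoryExample() {
    assert(ELlCategory == 0x01);                        // (i) unique constant
    assert(IsLetterCategory(ELoCategory));              // (ii) range test
    assert(!IsLetterCategory(EPdCategory));
    assert(GroupOf(EPoCategory) == EPunctuationGroup);  // (iii) group test
    assert(GroupOf(ESmCategory) != EPunctuationGroup);
}
```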

Enum TCjkWidth

Notional character width as known to East Asian (Chinese, Japanese, Korean (CJK)) coding systems.

Enumerator	Value	Description
ENeutralWidth		Includes 'ambiguous width' defined in Unicode Technical Report 11: East Asian Width.
EHalfWidth		Character which occupies a single cell.
EFullWidth		Character which occupies 2 cells.
ENarrow		Characters that are always narrow and have explicit full-width counterparts. All of ASCII is an example of East Asian Narrow characters.
EWide		Characters that are always wide. This category includes characters that have explicit half-width counterparts.

Enum TEncoding

Deprecated

Encoding systems used by the translation functions.

Enumerator	Value	Description
EUnicode		The Unicode encoding.
EShiftJIS		The shift-JIS encoding (used in Japan).

Constructor & Destructor Documentation

TChar ( )

TChar	(	)	[inline]

Default constructor.

Constructs this character object with an undefined value.

TChar ( TUint )

TChar	(	TUint	aChar	)	[inline]

Constructs this character object and initialises it with the specified value.

Parameter	Description
aChar	The initialisation value.

Member Function Documentation

Compose ( TUint &, const TDesC16 & )

IMPORT_C TBool	Compose	(	TUint &	aResult,
			const TDesC16 &	aSource
		)	[static]

Composes a string of Unicode characters to produce a single character result.

For example, 0061 ('a') and 030A (combining ring above) compose to give 00E5 ('a' with ring above).

A canonical decomposition is a relationship between a string of characters - usually a base character and one or more diacritics - and a composed character. The Unicode standard requires that compliant software treats composed characters identically with their canonical decompositions. The mappings used by these functions are fixed and cannot be overridden for particular locales.

Parameter	Description
aResult	If successful, the composed character value. If unsuccessful, this value contains 0xFFFF.
aSource	String of source Unicode characters.

Returns: True, if the compose operation is successful in combining the entire sequence of characters in the descriptor into a single compound character; false, otherwise.
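
The success and failure behaviour described above can be modelled with a toy stand-in. The function ComposeSketch below is hypothetical and is not the Symbian implementation: it knows only the single pair from the example above, and uses std::u16string in place of a descriptor. It is intended purely to show the documented contract, namely that the whole sequence must compose and that the result is 0xFFFF on failure.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy stand-in for the documented contract of TChar::Compose().
// It recognises one canonical pair only: 0061 + 030A -> 00E5.
inline bool ComposeSketch(std::uint32_t& aResult, const std::u16string& aSource) {
    if (aSource.size() == 2 && aSource[0] == 0x0061 && aSource[1] == 0x030A) {
        aResult = 0x00E5;   // 'a' with ring above
        return true;
    }
    aResult = 0xFFFF;       // documented failure value
    return false;
}

inline void ComposeExample() {
    std::uint32_t result = 0;
    const std::u16string ring = {char16_t(0x0061), char16_t(0x030A)};
    assert(ComposeSketch(result, ring) && result == 0x00E5);

    // The entire sequence must combine into one character, so an extra
    // trailing character makes the call fail.
    const std::u16string extra = {char16_t(0x0061), char16_t(0x030A), char16_t(0x0062)};
    assert(!ComposeSketch(result, extra) && result == 0xFFFF);
}
```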

Decompose ( TPtrC16 & )

IMPORT_C TBool	Decompose	(	TPtrC16 &	aResult	)	const

Maps this character to its maximal canonical decomposition.

For example, 01E1 ('a' with dot above and macron) decomposes into 0061 ('a') 0307 (dot) and 0304 (macron).

Note that this function is used during collation, as performed by the Mem::CompareC() function, to convert the compared strings to their maximal canonical decompositions.

Parameter	Description
aResult	If successful, the descriptor represents the canonical decomposition of this character. If unsuccessful, the descriptor is empty.

Returns: True if decomposition is successful; false, otherwise.

Eos ( )

TBool	Eos	(	)	const [inline]

Tests whether the character is the C/C++ end-of-string character - 0.

Returns: True, if the character is 0; false, otherwise.

Fold ( )

void	Fold	(	)	[inline]

Converts the character to a form which can be used in tolerant comparisons without control over the operations performed.

Tolerant comparisons are those which ignore character differences like case and accents.

This function can be used when searching for a string in a text file or a file in a directory. Folding performs the following conversions: converts to lowercase, strips accents, converts all digits representing the values 0..9 to the ordinary digit characters '0'..'9', converts all spaces (standard, non-break, fixed-width, ideographic, etc.) to the ordinary space character (0x0020), converts Japanese characters in the hiragana syllabary to katakana, and converts East Asian halfwidth and fullwidth variants to their ordinary forms. You can choose to perform any subset of these operations by using the other function overload.

Fold ( TInt )

void	Fold	(	TInt	aFlags	)	[inline]

Converts the character to a form which can be used in tolerant comparisons allowing selection of the specific fold operations to be performed.

Parameter	Description
aFlags	Flags which define the operations to be performed. The values are defined in the enum beginning with EFoldCase.

GetBdCategory ( )

IMPORT_C TBdCategory	GetBdCategory	(	)	const

Gets the bi-directional category of a character.

For more information on the bi-directional algorithm, see Unicode Technical Report No. 9 available at: http://www.unicode.org/unicode/reports/tr9/.

Returns: The character's bi-directional category.

GetCategory ( )

IMPORT_C TCategory	GetCategory	(	)	const

Gets this character's Unicode category.

Returns: This character's Unicode category.

GetCjkWidth ( )

IMPORT_C TCjkWidth	GetCjkWidth	(	)	const

Gets the Chinese, Japanese, Korean (CJK) notional width.

Some display systems used in East Asia display characters on a grid of fixed-width character cells, like the standard MS-DOS display mode.

Some characters, e.g. the Japanese katakana syllabary, take up a single character cell and some characters, e.g., kanji, Chinese characters used in Japanese, take up two. These are called half-width and full-width characters. This property is fixed and cannot be overridden for particular locales.

For more information on returned widths, see Unicode Technical Report 11 on East Asian Width available at: http://www.unicode.org/unicode/reports/tr11/

Returns: The notional width of an East Asian character.

GetCombiningClass ( )

IMPORT_C TInt	GetCombiningClass	(	)	const

Gets this character's combining class.

Note that diacritics and other combining characters have non-zero combining classes.

Returns: The combining class.

GetInfo ( TCharInfo & )

IMPORT_C void	GetInfo	(	TCharInfo &	aInfo	)	const

Gets this character's standard category information.

This includes everything except its CJK width and decomposition, if any.

Parameter	Description
aInfo	On return, contains the character's standard category information.

GetLowerCase ( )

IMPORT_C TUint	GetLowerCase	(	)	const

Gets the character value after conversion to lowercase or the character's own value, if no lowercase form exists.

The character object itself is not changed.

Returns: The character value after conversion to lowercase.

GetNumericValue ( )

IMPORT_C TInt	GetNumericValue	(	)	const

Gets the integer numeric value of this character.

Numeric values need not be in the range 0..9; the Unicode character set includes various other numeric characters such as the Roman and Tamil numerals for 500, 1000, etc.

Returns: The numeric value, or -1 if the character has no integer numeric value, or -2 if the character has a fractional numeric value.
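
Because -1 and -2 are sentinels rather than numeric values, callers need to test for them before using the result. A minimal sketch of that check follows. The constant names and the helper DescribeNumericValue are hypothetical, and the int argument stands in for the TInt returned by GetNumericValue().

```cpp
#include <cassert>
#include <string>

// Sentinel values as documented above; the constant names are hypothetical.
constexpr int kNoIntegerValue  = -1;
constexpr int kFractionalValue = -2;

// Hypothetical helper that classifies a value returned by GetNumericValue().
inline std::string DescribeNumericValue(int aValue) {
    if (aValue == kNoIntegerValue) {
        return "no integer numeric value";
    }
    if (aValue == kFractionalValue) {
        return "fractional numeric value";
    }
    return "integer value " + std::to_string(aValue);
}

inline void NumericValueExample() {
    assert(DescribeNumericValue(-1) == "no integer numeric value");
    assert(DescribeNumericValue(-2) == "fractional numeric value");
    // Values are not limited to 0..9, e.g. a numeral character for 1000.
    assert(DescribeNumericValue(1000) == "integer value 1000");
}
```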

GetTitleCase ( )

IMPORT_C TUint	GetTitleCase	(	)	const

Gets the character value after conversion to titlecase or the character's own value, if no titlecase form exists.

The titlecase form of a character is identical to its uppercase form unless a specific titlecase form exists.

Returns: The character value after conversion to titlecase form.

GetUpperCase ( )

IMPORT_C TUint	GetUpperCase	(	)	const

Gets the character value after conversion to uppercase or the character's own value, if no uppercase form exists.

The character object itself is not changed.

Returns: The character value after conversion to uppercase.

IsAlpha ( )

IMPORT_C TBool	IsAlpha	(	)	const

Tests whether the character is alphabetic.

For Unicode, the function returns TRUE for all letters, including those from syllabaries and ideographic scripts. The function returns FALSE for letter-like characters that are in fact diacritics. Specifically, the function returns TRUE for categories: ELuCategory, ELtCategory, ELlCategory, and ELoCategory; it returns FALSE for all other categories including ELmCategory.

Returns: True, if the character is alphabetic; false, otherwise.

IsAlphaDigit ( )

IMPORT_C TBool	IsAlphaDigit	(	)	const

Tests whether the character is alphabetic or a decimal digit.

It is identical to (IsAlpha()||IsDigit()).

Returns: True, if the character is alphabetic or a decimal digit; false, otherwise.

IsAssigned ( )

IMPORT_C TBool	IsAssigned	(	)	const

Tests whether this character has an assigned meaning in the Unicode encoding.

All characters outside the range 0x0000 - 0xFFFF are unassigned and there are also many unassigned characters within the Unicode range.

Locales can change the assigned/unassigned status of characters. This means that the precise behaviour of this function is locale-dependent.

Returns: True, if this character has an assigned meaning; false, otherwise.

IsControl ( )

IMPORT_C TBool	IsControl	(	)	const

Tests whether the character is a control character.

For Unicode, the function returns TRUE for all characters in the categories: ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.

See also: TChar::TCategory

Returns: True, if the character is a control character; false, otherwise.

IsDigit ( )

IMPORT_C TBool	IsDigit	(	)	const

Tests whether the character is a standard decimal digit.

For Unicode, this function returns TRUE only for the digits '0'...'9' (U+0030...U+0039), not for other digits in scripts like Arabic, Tamil, etc.

Returns: True, if the character is a standard decimal digit; false, otherwise.

IsGraph ( )

IMPORT_C TBool	IsGraph	(	)	const

Tests whether the character is a graphic character.

For Unicode, graphic characters include printable characters but not the space character. Specifically, graphic characters are any character except those in categories: EZsCategory, EZlCategory, EZpCategory, ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.

Note that for ISO Latin-1, all alphanumeric and punctuation characters are graphic.

See also: TChar::TCategory

Returns: True, if the character is a graphic character; false, otherwise.

IsHexDigit ( )

IMPORT_C TBool	IsHexDigit	(	)	const

Tests whether the character is a hexadecimal digit (0-9, a-f, A-F).

Returns: True, if the character is a hexadecimal digit; false, otherwise.
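
The ranges accepted by IsDigit() and IsHexDigit() can be written out as plain range checks. The sketch below is a standalone illustration, not the Symbian implementation. The function names are hypothetical, and it assumes that only the ASCII forms listed in the descriptions count as digits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the documented IsDigit() rule: U+0030..U+0039 only.
constexpr bool IsDigitSketch(std::uint32_t aChar) {
    return aChar >= 0x0030 && aChar <= 0x0039;
}

// Hypothetical stand-in for the documented IsHexDigit() rule: 0-9, a-f, A-F.
constexpr bool IsHexDigitSketch(std::uint32_t aChar) {
    return IsDigitSketch(aChar) ||
           (aChar >= 0x0041 && aChar <= 0x0046) ||   // 'A'..'F'
           (aChar >= 0x0061 && aChar <= 0x0066);     // 'a'..'f'
}

inline void DigitExample() {
    assert(IsDigitSketch(0x0037));       // '7'
    assert(!IsDigitSketch(0x0663));      // ARABIC-INDIC DIGIT THREE is not a standard digit
    assert(IsHexDigitSketch(0x0066));    // 'f'
    assert(!IsHexDigitSketch(0x0067));   // 'g'
}
```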

IsLower ( )

IMPORT_C TBool	IsLower	(	)	const

Tests whether the character is lowercase.

Returns: True, if the character is lowercase; false, otherwise.

IsMirrored ( )

IMPORT_C TBool	IsMirrored	(	)	const

Tests whether this character has the mirrored property.

Mirrored characters, like ( ) [ ] < >, change direction according to the directionality of the surrounding characters. For example, an opening parenthesis 'faces right' in Hebrew or Arabic, and to say that 2 < 3 you would have to say that 3 > 2, where the '>' is, in this example, a less-than sign to be read right-to-left.

Returns: True, if this character has the mirrored property; false, otherwise.

IsPrint ( )

IMPORT_C TBool	IsPrint	(	)	const

Tests whether the character is a printable character.

For Unicode, printable characters are any character except those in categories: ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.

Note that for ISO Latin-1, all alphanumeric and punctuation characters, plus space, are printable.

See also: TChar::TCategory

Returns: True, if the character is printable; false, otherwise.
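
According to the category lists given for IsGraph() and IsPrint(), the two tests differ only in how they treat the separator categories. The standalone sketch below expresses both lists as predicates over category codes copied from the TCategory table. The enum and function names are hypothetical, and the code models the documented lists rather than the actual implementation.

```cpp
#include <cassert>

// Category codes copied from the TCategory table; standalone illustration only.
enum Category : unsigned {
    ELuCategory = 0x00, ENdCategory = 0x40, EPoCategory = 0x56,
    EZsCategory = 0x70, EZlCategory = 0x71, EZpCategory = 0x72,
    ECcCategory = 0x80, ECfCategory = 0x81,
    ECsCategory = 0xF0, ECoCategory = 0xF1, ECnCategory = 0xF2
};

// Follows the exclusion list documented for IsPrint().
constexpr bool IsPrintCategory(Category aCategory) {
    return aCategory != ECcCategory && aCategory != ECfCategory &&
           aCategory != ECsCategory && aCategory != ECoCategory &&
           aCategory != ECnCategory;
}

// IsGraph() additionally excludes the three separator categories.
constexpr bool IsGraphCategory(Category aCategory) {
    return IsPrintCategory(aCategory) && aCategory != EZsCategory &&
           aCategory != EZlCategory && aCategory != EZpCategory;
}

inline void PrintGraphExample() {
    assert(IsPrintCategory(ELuCategory) && IsGraphCategory(ELuCategory));
    assert(IsPrintCategory(EZsCategory) && !IsGraphCategory(EZsCategory));  // space
    assert(!IsPrintCategory(ECcCategory) && !IsGraphCategory(ECcCategory)); // control
}
```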

IsPunctuation ( )

IMPORT_C TBool	IsPunctuation	(	)	const

Tests whether the character is a punctuation character.

For Unicode, punctuation characters are any character in the categories: EPcCategory, EPdCategory, EPsCategory, EPeCategory, EPiCategory, EPfCategory, EPoCategory.

See also: TChar::TCategory

Returns: True, if the character is punctuation; false, otherwise.

IsSpace ( )

IMPORT_C TBool	IsSpace	(	)	const

Tests whether the character is a white space character.

White space includes spaces, tabs and separators.

For Unicode, the function returns TRUE for all characters in the categories: EZsCategory, EZlCategory and EZpCategory, and also for the characters 0x0009 (horizontal tab), 0x000A (linefeed), 0x000B (vertical tab), 0x000C (form feed), and 0x000D (carriage return).

See also: TChar::TCategory

Returns: True, if the character is white space; false, otherwise.
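
The rule above combines a category test with a short list of individual code points. A minimal sketch of that combination follows. The function IsSpaceSketch is hypothetical and takes the category as a separate argument, since this standalone code has no Unicode database to look it up in.

```cpp
#include <cassert>
#include <cstdint>

// Category codes copied from the TCategory table; standalone illustration only.
enum Category : unsigned {
    ELuCategory = 0x00,
    EZsCategory = 0x70, EZlCategory = 0x71, EZpCategory = 0x72,
    ECcCategory = 0x80
};

// Hypothetical stand-in for the documented IsSpace() rule.
inline bool IsSpaceSketch(std::uint32_t aCodePoint, unsigned aCategory) {
    const bool isSeparator = aCategory == EZsCategory ||
                             aCategory == EZlCategory ||
                             aCategory == EZpCategory;
    // 0x0009..0x000D: tab, linefeed, vertical tab, form feed, carriage return.
    const bool isLayoutControl = aCodePoint >= 0x0009 && aCodePoint <= 0x000D;
    return isSeparator || isLayoutControl;
}

inline void SpaceExample() {
    assert(IsSpaceSketch(0x0020, EZsCategory));    // ordinary space
    assert(IsSpaceSketch(0x0009, ECcCategory));    // tab is a control character, but still white space
    assert(!IsSpaceSketch(0x001F, ECcCategory));   // other control characters are not
    assert(!IsSpaceSketch(0x0041, ELuCategory));   // 'A'
}
```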

IsTitle ( )

IMPORT_C TBool	IsTitle	(	)	const

Tests whether this character is in titlecase.

Returns: True, if this character is in titlecase; false, otherwise.

IsUpper ( )

IMPORT_C TBool	IsUpper	(	)	const

Tests whether the character is uppercase.

Returns: True, if the character is uppercase; false, otherwise.

LowerCase ( )

void	LowerCase	(	)	[inline]

Converts the character to its lowercase form.

Characters lacking a lowercase form are unchanged.

SetChar ( TUint )

void	SetChar	(	TUint	aChar	)	[protected, inline]

TitleCase ( )

void	TitleCase	(	)	[inline]

Converts the character to its titlecase form.

The titlecase form of a character is identical to its uppercase form unless a specific titlecase form exists. Characters lacking a titlecase form are unchanged.

UpperCase ( )

void	UpperCase	(	)	[inline]

Converts the character to its uppercase form.

Characters lacking an uppercase form are unchanged.

operator TUint ( )

operator TUint	(	)	const [inline]

Gets the value of the character as an unsigned integer.

The operator casts a TChar to a TUint, returning the TUint value wrapped by this character object.

operator+ ( TUint )

TChar	operator+	(	TUint	aChar	)	[inline]

Gets the result of adding an unsigned integer value to this character object.

This character object is not changed.

Parameter	Description
aChar	The value to be added.

Returns: A character object whose value is the result of the addition operation.

operator+= ( TUint )

TChar &	operator+=	(	TUint	aChar	)	[inline]

Adds an unsigned integer value to this character object.

This character object is changed by the operation.

Parameter	Description
aChar	The value to be added.

Returns: A reference to this character object.

operator- ( TUint )

TChar	operator-	(	TUint	aChar	)	[inline]

Gets the result of subtracting an unsigned integer value from this character object.

This character object is not changed.

Parameter	Description
aChar	The value to be subtracted.

Returns: A character object whose value is the result of the subtraction operation.

operator-= ( TUint )

TChar &	operator-=	(	TUint	aChar	)	[inline]

Subtracts an unsigned integer value from this character object.

This character object is changed by the operation.

Parameter	Description
aChar	The value to be subtracted.

Returns: A reference to this character object.
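
The difference between the value-returning operators (operator+ and operator-) and the modifying operators (operator+= and operator-=) can be seen in a minimal stand-in class. CharSketch below is hypothetical: it mirrors only the documented operator shapes, uses a named Value() accessor in place of operator TUint, and is not the declaration found in e32cmn.h.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in with the same arithmetic operator shapes as documented above.
class CharSketch {
public:
    explicit CharSketch(std::uint32_t aChar) : iChar(aChar) {}

    std::uint32_t Value() const { return iChar; }

    // Return a new object; this object is not changed.
    CharSketch operator+(std::uint32_t aChar) const { return CharSketch(iChar + aChar); }
    CharSketch operator-(std::uint32_t aChar) const { return CharSketch(iChar - aChar); }

    // Modify this object and return a reference to it.
    CharSketch& operator+=(std::uint32_t aChar) { iChar += aChar; return *this; }
    CharSketch& operator-=(std::uint32_t aChar) { iChar -= aChar; return *this; }

private:
    std::uint32_t iChar;
};

inline void OperatorExample() {
    CharSketch a(0x41);                 // 'A'
    CharSketch b = a + 1;               // new object; a is unchanged
    assert(a.Value() == 0x41 && b.Value() == 0x42);

    a += 2;                             // modifies a in place
    assert(a.Value() == 0x43);

    a -= 2;
    assert((a - 1).Value() == 0x40 && a.Value() == 0x41);
}
```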