XML Framework Overview

The XML Framework is a collection of components that provides event-based XML parsing and a content-processing architecture.

Purpose

The XML Framework provides configurable features for parsing XML and WBXML (WAP Binary XML) through a single interface, with options for validating a document against a specification and auto-correcting spelling errors in the validated text. It is based on the SAX (Simple API for XML) specification.

Required background

You must have a basic understanding of XML before using the XML Framework.

Key concepts

The following are the key concepts of the XML Framework:

Attribute

A name-value pair separated by an equals sign, for example author="Jane Austen".

Attribute type

One of the data types defined for attributes, for instance CDATA.

Client

An application which uses the XML framework for parsing or generating a document.

Document Type Definition (DTD)

A document which defines a particular use of XML entities (the names, attributes and values permitted).

Extension

WBXML extends XML syntax with extension tokens, which different applications use in different ways. For example, an extension token can refer to a string table that is created specifically for each message and transmitted at the start of that message.

Parser

An interface to the XML framework that allows a client to access the parser plug-ins, each of which is specific to a mark-up language. Examples are the XML Expat Parser and the WBXML Parser.

String dictionary collection

A class that holds a collection of string dictionaries.

String dictionary plug-ins

The XML Framework allows strings for a particular DTD, XML namespace or WBXML code page to be stored in an ECOM plug-in that the parser and the client can access as required. These plug-ins are referred to as string dictionary plug-ins.

String pool

A string pool is a mechanism that stores strings in a way that allows them to be compared quickly.
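The idea can be sketched in a few lines of Python. This is an illustration only, not the Symbian string pool API: the `StringPool` class and its method names are hypothetical. Each distinct string is stored once and the client holds a small integer handle, so comparing two pooled strings reduces to comparing two integers.

```python
class StringPool:
    """Hypothetical string pool: stores each distinct string once."""

    def __init__(self):
        self._index = {}    # string -> handle
        self._strings = []  # handle -> string

    def open(self, text):
        """Return the handle for text, adding it to the pool if needed."""
        handle = self._index.get(text)
        if handle is None:
            handle = len(self._strings)
            self._index[text] = handle
            self._strings.append(text)
        return handle

    def lookup(self, handle):
        """Return the string stored under handle."""
        return self._strings[handle]


pool = StringPool()
a = pool.open("author")
b = pool.open("title")
c = pool.open("author")  # same text as a, so the same handle is returned
```

Here `a == c` is an integer comparison, regardless of how long the underlying string is.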

String table

A table of frequently encountered strings used to encode and decode a WBXML document. The body of the document references the strings by index, which compresses the data.
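The following Python sketch shows the principle of referencing repeated strings through a table. It is a deliberate simplification and does not reproduce the WBXML byte format: the function names and the `("ref", index)` / `("lit", text)` representation are assumptions made for illustration.

```python
def build_string_table(tokens, min_count=2):
    """Collect strings that occur at least min_count times, in first-seen order."""
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    table = []
    for tok in tokens:
        if counts[tok] >= min_count and tok not in table:
            table.append(tok)
    return table


def encode(tokens, table):
    """Replace table strings with ('ref', index); keep the rest as ('lit', text)."""
    index = {s: i for i, s in enumerate(table)}
    return [("ref", index[t]) if t in index else ("lit", t) for t in tokens]


def decode(body, table):
    """Invert encode() using the same table."""
    return [table[value] if kind == "ref" else value for kind, value in body]


tokens = ["book", "title", "Emma", "book", "title", "Persuasion"]
table = build_string_table(tokens)
body = encode(tokens, table)
```

Because the table travels with the document, the receiver can rebuild the original strings with `decode(body, table)`.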

Uniform Resource Identifier (URI)

The address associated with a namespace prefix, for instance http://www.w3.org/XML/1998/namespace.

WBXML

WAP Binary XML (WBXML) is a binary representation of XML. It was developed by the Open Mobile Alliance as a standard to allow XML documents to be transmitted in a compact manner over mobile networks and was proposed as an addition to the World Wide Web Consortium's Wireless Application Protocol family of standards.

Architecture

The following diagram illustrates the XML framework, consisting of a client and a parser:

Figure 1. Block diagram of XML framework

The XML framework consists of classes which model the main constituents of the architecture - the framework as a whole, the parser plug-ins and extensions to XML, the content processor chain and the content handler mechanism.

The XML and WBXML parsers convert the contents of a document to UTF-8 format. This is to ensure that extended characters are not lost from the document by the String Pool. Expat is the engine behind the XML parser plug-in.

The XML Framework allows strings to be stored for a particular DTD, XML namespace or WBXML code page in an ECOM plug-in that can be accessed when required by the parser and the client. These plug-ins are referred to as string dictionary plug-ins, and they are managed through a string dictionary collection object. See String Dictionary.

Libxml2 provides XML processing, parsing and validation APIs. See libxml2.

Plug-in 1 and Plug-in 2 are examples of optional processors, which may be chained to the parser output to allow further processing of the data before the client receives it. Examples of such plug-ins are a DTD validator and a document auto-corrector. The chain is not limited to two plug-ins.

Parser framework

The XML framework contains the parser framework, which is represented by the CParser class. A client with an XML document to be parsed creates a CParser object and calls its parse functions. CParser obtains the data about the plug-ins and the document to be parsed from the CMatchData and RDocumentParameters classes respectively.

The parser framework conforms to the event-based SAX specification. It outputs an event when it starts or finishes reading one of the following:

  • a document

  • a start tag

  • an end tag

  • a prefix mapping

  • a processing instruction

  • character data

  • ignorable white space

For more information on XML-related concepts, refer to W3C or similar sources.
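The event model listed above can be observed with any SAX-style parser. The sketch below uses Python's standard `xml.sax` module rather than the Symbian classes, so the handler method names are those of Python's `ContentHandler`, not of this framework. The `EventLogger` class and the sample document are hypothetical. Ignorable white space is not shown.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class EventLogger(ContentHandler):
    """Records a short description of every SAX event it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startPrefixMapping(self, prefix, uri):
        self.events.append(f"startPrefixMapping {prefix}={uri}")

    def endPrefixMapping(self, prefix):
        self.events.append(f"endPrefixMapping {prefix}")

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair
        self.events.append(f"startElement {name[1]}")

    def endElementNS(self, name, qname):
        self.events.append(f"endElement {name[1]}")

    def processingInstruction(self, target, data):
        self.events.append(f"processingInstruction {target}")

    def characters(self, content):
        self.events.append(f"characters {content!r}")


DOC = (b'<?xml version="1.0"?><?demo hint?>'
       b'<b:book xmlns:b="urn:example:books">Emma</b:book>')

logger = EventLogger()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # enables prefix-mapping events
parser.setContentHandler(logger)
parser.parse(io.BytesIO(DOC))
```

After parsing, `logger.events` holds the events in the order the parser reported them, starting with the start of the document and ending with its end.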

Parser plug-ins

CParser is the interface to the XML framework that allows the client to access the parser plug-ins, each of which is specific to a mark-up language (for example XML or WBXML). Each parser plug-in implements the MParser interface. Through the TParserInitParams class, it is associated with a character set converter (to convert other formats to Unicode), a string dictionary and an element stack.

The Symbian platform framework is delivered with three parser plug-ins, two for XML and one for WBXML.

  • The first XML parser consists of the CXmlParser class, which is wrapped around the CExpat class, an implementation of the stream-based Expat parser.

  • The second XML parser consists of the CXMLEngineSAXPlugin class, which encapsulates the SAX parser of the libxml2 component. It is not available if the Symbian platform build excludes this component.

  • The WBXML parser is implemented as the CWmxmlParser class.
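The relationship between a single front-end interface and mark-up-specific plug-ins can be sketched as a small registry. This is a conceptual Python illustration only: `ParserFrontEnd`, `XmlPlugin`, `WbxmlPlugin` and the `"xml"` / `"wbxml"` keys are hypothetical names and do not correspond to the Symbian classes or to the way ECOM resolves plug-ins.

```python
class XmlPlugin:
    """Hypothetical plug-in for textual XML."""

    def parse(self, data):
        return "xml plug-in parsed %d bytes" % len(data)


class WbxmlPlugin:
    """Hypothetical plug-in for WBXML."""

    def parse(self, data):
        return "wbxml plug-in parsed %d bytes" % len(data)


class ParserFrontEnd:
    """Hypothetical single interface that hides the individual plug-ins."""

    def __init__(self):
        self._factories = {}  # mark-up name -> plug-in factory

    def register(self, markup, factory):
        self._factories[markup] = factory

    def parse(self, markup, data):
        try:
            plugin = self._factories[markup]()
        except KeyError:
            raise LookupError("no parser plug-in for " + markup) from None
        return plugin.parse(data)


front_end = ParserFrontEnd()
front_end.register("xml", XmlPlugin)
front_end.register("wbxml", WbxmlPlugin)
result = front_end.parse("wbxml", b"\x03\x01\x6a\x00")
```

The client only ever talks to the front end, so a further mark-up language can be supported by registering another plug-in without changing client code.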

Extensions to XML

The XML framework provides extensions to XML. At present, WBXML is implemented. WBXML requires the use of string dictionaries and extension tokens to store the element strings specific to WBXML; these are represented by the RStringDictionaryCollection, MWbxmlExtensionHandler and TExtensionTokens classes.

Content processors

Content processors are plug-ins which perform further operations on the output of a parser plug-in. They implement the MContentProcessor interface and, through the TContentProcessorInitParams class, are associated with a string dictionary and an element stack. They are organised into chains by the MContentSource class, which directs the output of each plug-in to the next plug-in in the chain.
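A comparable chaining pattern exists in Python's standard library as `xml.sax.saxutils.XMLFilterBase`, which is used below to sketch how each stage passes its output to the next. The two processors (`LowercaseNames`, `DropElement`), the `Collector` client and the sample document are hypothetical and are not part of the XML Framework.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class LowercaseNames(XMLFilterBase):
    """Processor 1: converts element names to lower case."""

    def startElement(self, name, attrs):
        super().startElement(name.lower(), attrs)

    def endElement(self, name):
        super().endElement(name.lower())


class DropElement(XMLFilterBase):
    """Processor 2: suppresses one element and everything inside it."""

    def __init__(self, parent, unwanted):
        super().__init__(parent)
        self._unwanted = unwanted
        self._depth = 0  # greater than 0 while inside the unwanted element

    def startElement(self, name, attrs):
        if self._depth or name == self._unwanted:
            self._depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


class Collector(ContentHandler):
    """Client at the end of the chain."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.text = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def characters(self, content):
        self.text.append(content)


DOC = b"<Book><Title>Emma</Title><Draft>ignore me</Draft></Book>"

client = Collector()
# parser -> LowercaseNames -> DropElement -> client
chain = DropElement(LowercaseNames(xml.sax.make_parser()), "draft")
chain.setContentHandler(client)
chain.parse(io.BytesIO(DOC))
```

The client sees only the processed events: the element names arrive in lower case and the `Draft` element never reaches it. More stages can be added by wrapping the chain in further filters.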

Content handlers

A client application that is designed to react to the events output by the XML framework must implement the MContentHandler interface. The functions to be implemented correspond to the SAX events listed in the Parser framework section.
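As a rough analogy, a client written against Python's `xml.sax` implements `ContentHandler` and overrides only the callbacks it cares about. The sketch below is not the MContentHandler interface; the `AuthorCollector` class and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class AuthorCollector(ContentHandler):
    """Hypothetical client: records the author attribute of every <book>."""

    def __init__(self):
        super().__init__()
        self.authors = []

    def startElement(self, name, attrs):
        # Called by the parser for each start tag
        if name == "book" and "author" in attrs:
            self.authors.append(attrs["author"])


DOC = (b'<library><book author="Jane Austen"/>'
       b'<book author="Mary Shelley"/><book/></library>')

client = AuthorCollector()
xml.sax.parseString(DOC, client)
```

The client never pulls data from the parser. It is called back as each construct is read, which is what makes the model suitable for streaming input.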

APIs

The XML Framework exports the following APIs:

API Description

CExpat

Encapsulates the Expat XML parser.

CXmlParser

Implementation of the stream-based Expat parser.

CXMLEngineSAXPlugin

Encapsulates the SAX parser of the libxml2 component.

CParser

Represents the entire parser framework.

CMatchData

Holds the data about the plug-ins.

CWmxmlParser

WBXML parser implementation.

RDocumentParameters

Holds the data about the document to be parsed.

RElementStack

Data structure used to store XML elements and check the tag ordering.

RStringDictionaryCollection

Holds a collection of dictionaries requested by the user.
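To illustrate what checking the tag ordering with an element stack means, as described for RElementStack above, here is a minimal Python sketch. The `ElementStack` class and `well_nested` function are hypothetical and do not mirror the RElementStack API.

```python
class ElementStack:
    """Hypothetical element stack that checks tags are properly nested."""

    def __init__(self):
        self._open = []  # names of elements that are currently open

    def push(self, name):
        self._open.append(name)

    def pop(self, name):
        """Close name; raise ValueError if it is not the innermost open element."""
        if not self._open or self._open[-1] != name:
            expected = self._open[-1] if self._open else None
            raise ValueError(f"unexpected </{name}>, expected </{expected}>")
        self._open.pop()

    def is_empty(self):
        return not self._open


def well_nested(tags):
    """Check a sequence of ('start' or 'end', name) pairs for correct nesting."""
    stack = ElementStack()
    try:
        for kind, name in tags:
            if kind == "start":
                stack.push(name)
            else:
                stack.pop(name)
    except ValueError:
        return False
    return stack.is_empty()
```

For example, `well_nested([("start", "a"), ("start", "b"), ("end", "a")])` is false because `b` is still open when `a` is closed.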

Typical uses

The following tasks can be performed using the XML Framework:

  • Parsing an XML document.

  • Choosing a parser plug-in.

  • Using a content processor.

  • Writing a parser plug-in.

  • Customising a parser plug-in.

  • Creating a resource file for a parser plug-in.
