Writing a Parser Plug-in

This section describes how to write a parser plug-in.

Introduction

The Symbian XML framework supplies an XML parser plug-in which is based on Expat. The framework provides plug-ins with standard features. However, you can write a custom plug-in to meet a specific requirement, such as parsing only part of a document or releasing a specific resource.

The Symbian platform XML framework defines certain standard features which a parser may have. When designing a parser, decide which of these features it will provide. The following is a list of the standard features:

  • Report unrecognised tags, namespaces, namespace prefixes and namespace mappings.

  • Convert elements and attributes to lower case, so that parsing is case-insensitive, as in an HTML parser.

  • Describe the data in a specified encoding: the default is UTF-8.

  • Accept XML 1.0 and XML 1.1. The default is to accept XML 1.0 only.
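One way a plug-in might keep track of which of these features are switched on is a bit mask. The following is a minimal sketch in standard C++ rather than Symbian C++. The names `ParserFeature`, `FeatureSet` and `kSupportedFeatures`, the flag values and the return codes are assumptions made for illustration; they are not the framework's own constants.

```cpp
#include <cstdint>

// Hypothetical feature identifiers: names and values are illustrative only.
enum ParserFeature : std::uint32_t {
    kReportUnrecognisedTags = 1u << 0,
    kReportNamespaces       = 1u << 1,
    kConvertTagsToLowerCase = 1u << 2,
    kAcceptXml11            = 1u << 3
};

// The features this imaginary parser is able to provide.
const std::uint32_t kSupportedFeatures =
    kReportUnrecognisedTags | kReportNamespaces |
    kConvertTagsToLowerCase | kAcceptXml11;

class FeatureSet {
public:
    // Returns 0 on success, -1 if the parser does not support the feature.
    int Enable(std::uint32_t feature) {
        if ((feature & ~kSupportedFeatures) != 0) return -1;
        iEnabled |= feature;
        return 0;
    }

    // Returns 0 on success, -1 if the parser does not support the feature.
    int Disable(std::uint32_t feature) {
        if ((feature & ~kSupportedFeatures) != 0) return -1;
        iEnabled &= ~feature;
        return 0;
    }

    // True only if every bit of the requested feature is currently set.
    bool IsEnabled(std::uint32_t feature) const {
        return feature != 0 && (iEnabled & feature) == feature;
    }

private:
    // Nothing is enabled at first, so XML 1.1 is rejected by default.
    std::uint32_t iEnabled = 0;
};
```

Starting with an empty mask mirrors the default described above, where only XML 1.0 is accepted until the XML 1.1 feature is explicitly enabled.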

A user-defined parser plug-in must implement the MParser interface, which has six pure virtual functions. Three of them concern the parser features listed above. Two others perform the parsing; they implement the parse functions of the CParser class discussed in Choosing a Parser Plug-in. The remaining function releases resources. The following is the list of APIs of MParser:

API Description

EnableFeature()

Enables the feature.

DisableFeature()

Disables the feature.

IsFeatureEnabled()

Checks if the feature is enabled.

ParseChunkL()

Parses part of a document. Implements CParser::ParseL().

ParseLastChunkL()

Parses the last part of a document. Implements CParser::ParseEndL().

Release()

Must be called to release resources when the framework has finished using the parser implementation.
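The split between ParseChunkL() and ParseLastChunkL() means that a parser has to cope with markup that is cut off at a chunk boundary. The following is a minimal sketch of that idea in standard C++, not a Symbian implementation. The class `ChunkedTagScanner` and its methods are hypothetical, it only collects tag names, and it uses a C++ exception where Symbian code would leave.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical scanner that collects the text between '<' and '>'.
class ChunkedTagScanner {
public:
    // Analogue of ParseChunkL(): consume part of a document.
    void ParseChunk(const std::string& chunk) {
        iPending += chunk;
        Drain();
    }

    // Analogue of ParseLastChunkL(): consume the final part and verify
    // that the document did not end in the middle of a tag.
    void ParseLastChunk(const std::string& finalChunk) {
        ParseChunk(finalChunk);
        if (iPending.find('<') != std::string::npos) {
            throw std::runtime_error("document ended inside a tag");
        }
        iPending.clear();
    }

    const std::vector<std::string>& Tags() const { return iTags; }

private:
    // Emit every complete tag and keep an incomplete one for the next chunk.
    void Drain() {
        for (;;) {
            const std::string::size_type open = iPending.find('<');
            if (open == std::string::npos) {
                iPending.clear();            // only character data, not tracked here
                return;
            }
            const std::string::size_type close = iPending.find('>', open);
            if (close == std::string::npos) {
                iPending.erase(0, open);     // keep the unfinished tag
                return;
            }
            iTags.push_back(iPending.substr(open + 1, close - open - 1));
            iPending.erase(0, close + 1);
        }
    }

    std::string iPending;              // bytes carried over between chunks
    std::vector<std::string> iTags;    // tags seen so far, in document order
};
```

Feeding the scanner `"<a><b"`, then `"c>text</b"`, then `"c></a>"` as the last chunk yields the tags `a`, `bc`, `/bc` and `/a`, because the unfinished `<b` and `</b` are held back until the following chunk completes them.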

Some documents contain markup from more than one XML application, which means that the parser may encounter tags and attributes which look the same but belong to different namespaces. This is why the MParser interface provides a feature for the reporting of namespaces. XML associates tags and attributes with namespaces by adding a prefix to them, and the prefixes are mapped to the URI where the namespace is defined. The class RTagInfo is provided to hold this information. It is initialised with three strings representing the URI, prefix and local name, and this information can be retrieved by Uri(), Prefix() and LocalName() respectively. If the application has to parse documents which combine multiple namespaces, then the implementation of MParser must hold each parsed tag and its attributes in an RTagInfo object. The content handler will then have sufficient information to react differently to tags in different namespaces.
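The following sketch illustrates why the URI, prefix and local name are kept together. It is written in standard C++ and `TagInfo` is a hypothetical stand-in for RTagInfo that uses `std::string` in place of the framework's string types. The function `DescribeTitle` is likewise an assumption, showing how a content handler might treat a `title` element differently depending on its namespace.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for RTagInfo: three strings and three accessors.
class TagInfo {
public:
    TagInfo(std::string uri, std::string prefix, std::string localName)
        : iUri(std::move(uri)),
          iPrefix(std::move(prefix)),
          iLocalName(std::move(localName)) {}

    const std::string& Uri() const { return iUri; }
    const std::string& Prefix() const { return iPrefix; }
    const std::string& LocalName() const { return iLocalName; }

private:
    std::string iUri;
    std::string iPrefix;
    std::string iLocalName;
};

// Branch on the namespace URI, not the prefix: a document is free to
// choose any prefix, but the URI identifies the namespace.
std::string DescribeTitle(const TagInfo& tag) {
    if (tag.LocalName() != "title") return "not a title";
    if (tag.Uri() == "http://www.w3.org/1999/xhtml") return "XHTML title";
    if (tag.Uri() == "http://www.w3.org/2000/svg") return "SVG title";
    return "title in an unknown namespace";
}
```

Two tags with the same local name and different URIs are told apart here even if a document maps both namespaces to similar-looking prefixes.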

Some XML applications, notably WBXML, extend XML syntax by adding extension tokens to the markup language. The WBXML specification defines nine global extension tokens but does not assign semantics to them. The meaning of an extension token is specific to the document in which it is used, but extension tokens are typically used for compression, to identify data which must be compressed in a specific way. For instance, extension tokens are sometimes used to identify data as being variables rather than constants, or as having a particular data type. To handle extension tokens, a parser plug-in must implement the method WbxmlExtensionHandler::OnExtensionL() with three parameters: aData, aToken and aErrorCode. The first parameter holds the actual data, the second specifies the global extension token and the third is the error code.
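As an illustration, the nine global extension tokens fall into three groups of three in the WBXML specification: EXT_I_0 to EXT_I_2 (0x40 to 0x42, followed by an inline string), EXT_T_0 to EXT_T_2 (0x80 to 0x82, followed by an integer) and EXT_0 to EXT_2 (0xC0 to 0xC2, a single byte with no payload). The sketch below, in standard C++, classifies a token and passes it to a handler whose method mirrors the three parameters described above. `ExtensionKind`, `ClassifyExtensionToken` and `VariableCollector` are hypothetical names, and treating inline-string extensions as variable names is an application-specific assumption, not something the specification mandates.

```cpp
#include <string>
#include <vector>

enum class ExtensionKind { InlineString, Integer, SingleByte, NotExtension };

// Map a WBXML global token value to the kind of payload that follows it.
ExtensionKind ClassifyExtensionToken(int token) {
    switch (token) {
        case 0x40: case 0x41: case 0x42:   // EXT_I_0 .. EXT_I_2
            return ExtensionKind::InlineString;
        case 0x80: case 0x81: case 0x82:   // EXT_T_0 .. EXT_T_2
            return ExtensionKind::Integer;
        case 0xC0: case 0xC1: case 0xC2:   // EXT_0 .. EXT_2
            return ExtensionKind::SingleByte;
        default:
            return ExtensionKind::NotExtension;
    }
}

// Hypothetical handler: records inline-string extensions as variable names.
class VariableCollector {
public:
    void OnExtension(const std::string& data, int token, int errorCode) {
        if (errorCode != 0) {              // the parser reported a problem
            ++iErrors;
            return;
        }
        if (ClassifyExtensionToken(token) == ExtensionKind::InlineString) {
            iVariables.push_back(data);
        }
    }

    const std::vector<std::string>& Variables() const { return iVariables; }
    int ErrorCount() const { return iErrors; }

private:
    std::vector<std::string> iVariables;
    int iErrors = 0;
};
```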

Procedure

To write a parser plug-in, follow the steps given below:

  1. Encapsulate the data structures in TParserInitParams.

    The main data structures required are contained in the TParserInitParams class, which is typically passed as a parameter to the factory function of an MParser implementation and from there to its constructor. TParserInitParams holds pointers to objects of the following classes:

    Class Description

    CCharSetConverter

    Used to convert text to and from Unicode.

    MContentHandler

    Interface to the application which is written to handle the output of the parser. It is discussed in Using Symbian XML Framework.

    RStringDictionaryCollection

    A collection of string dictionaries discussed in Customising a Parser. A string dictionary is an implementation of the MStringDictionary interface which is used to tokenise XML input into tagged elements in accordance with the DTD associated with the document to be parsed.

    RElementStack

    An array structure used to stack elements in the order in which the parser encounters them.

    // Factory function: the initialisation parameters arrive as a TAny*.
    MParser* CMyParser::NewL( TAny* aInitParams )
        {
        CMyParser* self = new( ELeave ) CMyParser( reinterpret_cast<TParserInitParams*>( aInitParams ) );
        return static_cast<MParser*>( self );
        }

    // Constructor: stores the pointers supplied in TParserInitParams.
    CMyParser::CMyParser( TParserInitParams* aInitParams )
    :   iContentHandler( reinterpret_cast<MContentHandler*>( aInitParams->iContentHandler ) ),
        iStringDictionaryCollection( reinterpret_cast<RStringDictionaryCollection*>( aInitParams->iStringDictionaryCollection ) ),
        iCharSetConverter( reinterpret_cast<CCharSetConverter*>( aInitParams->iCharSetConverter ) ),
        iElementStack( reinterpret_cast<RElementStack*>( aInitParams->iElementStack ) )
        {
        }

  2. Select XML parser features.

  3. Implement CMyParser derived from MParser.

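    class CMyParser : public MParser
        {
        public:
            /** Factory function; aInitParams points to a TParserInitParams. */
            static MParser* NewL( TAny* aInitParams );

            /** Enable a feature. */
            TInt EnableFeature( TInt aParserFeature )
                {
                // your code here to enable the specified feature
                return KErrNotSupported; // placeholder until features are implemented
                }
            /** Disable a feature. */
            TInt DisableFeature( TInt aParserFeature )
                {
                // your code here to disable the specified feature
                return KErrNotSupported; // placeholder until features are implemented
                }
            /** See if a feature is enabled. */
            TBool IsFeatureEnabled( TInt aParserFeature ) const
                {
                // your code here to check if the specified feature is enabled
                return EFalse; // placeholder: no feature is enabled
                }
            /** Parses a descriptor that contains part of a document. */
            void ParseChunkL( const TDesC8& aChunk )
                {
                // your code here
                }
            /** Parses a descriptor that contains the last part of a document. */
            void ParseLastChunkL( const TDesC8& aFinalChunk )
                {
                // your code here
                }
            /** Interfaces don't have a destructor, so we have an explicit method instead. */
            void Release()
                {
                // your code here, for example: delete this;
                }

        private:
            CMyParser( TParserInitParams* aInitParams );
            virtual ~CMyParser();

        private:
            // Not owned: supplied through TParserInitParams.
            MContentHandler* iContentHandler;
            RStringDictionaryCollection* iStringDictionaryCollection;
            CCharSetConverter* iCharSetConverter;
            RElementStack* iElementStack;
        };
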
  4. Release resources using Release().

    When a parse fails, the parser object must be destroyed. This means that the implementation of the MParser and MContentHandler methods must contain calls to User::LeaveIfError() with an error code as parameter. Specific error codes are supplied for various cases as discussed in the Error Codes section of this document.
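    The relationship between a failed parse and Release() can be sketched as follows. This is standard C++ rather than Symbian C++: an exception stands in for a leave, and `ParseLeave`, `LeaveIfError`, `SketchParser`, `ParseAndRelease` and the error value `kErrEmptyDocument` are hypothetical names introduced for illustration. Like User::LeaveIfError(), the helper only raises the error for negative codes.

```cpp
#include <stdexcept>
#include <string>

const int kErrEmptyDocument = -1;   // hypothetical error code

// Exception used here in place of a Symbian leave.
struct ParseLeave : std::runtime_error {
    explicit ParseLeave(int aCode)
        : std::runtime_error("parse failed"), code(aCode) {}
    int code;
};

// Raise the error only when the code is negative.
void LeaveIfError(int code) {
    if (code < 0) throw ParseLeave(code);
}

class SketchParser {
public:
    // liveCount lets the caller observe that the object is destroyed.
    static SketchParser* New(int* liveCount) {
        return new SketchParser(liveCount);
    }

    void ParseLastChunk(const std::string& chunk) {
        LeaveIfError(chunk.empty() ? kErrEmptyDocument : 0);
    }

    // Explicit release method in place of a public destructor.
    void Release() { delete this; }

private:
    explicit SketchParser(int* liveCount) : iLive(liveCount) { ++*iLive; }
    ~SketchParser() { --*iLive; }
    int* iLive;
};

// Parse a document and release the parser whether or not parsing failed.
int ParseAndRelease(const std::string& doc, int& live) {
    SketchParser* parser = SketchParser::New(&live);
    int result = 0;
    try {
        parser->ParseLastChunk(doc);
    } catch (const ParseLeave& e) {
        result = e.code;            // the failure is reported, not swallowed
    }
    parser->Release();              // the parser is destroyed in both cases
    return result;
}
```

    In this sketch the caller sees the error code of a failed parse, and the live-object count returns to zero either way.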