The Symbian XML framework supplies an XML parser plugin based on Expat. However, developers who make heavy use of XML may want to create a new parser plugin, customised to provide the robustness and features their specific purposes require.
            You create a new parser plugin by implementing the
            MParser interface. A full discussion of how to write a
            parser is beyond the scope of this document. The main data structures you
            need are contained in the class TParserInitParams, which
            is typically passed as a parameter to the constructor of an
            MParser implementation.
            TParserInitParams has members of the following classes:
CCharSetConverter: used to convert text to and from Unicode.
MContentHandler: the interface which your application implements to handle the output of the parser, discussed in Using Symbian XML Framework.
RStringDictionaryCollection: a collection of string dictionaries, discussed in Customising a Parser. A string dictionary is an implementation of the MStringDictionary interface: it is used to tokenise XML input into tagged elements in accordance with the DTD associated with the document to be parsed.
RElementStack: an array structure used to stack elements in the order in which the parser encounters them.
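The element stack is what lets a parser check that each end tag closes the innermost open element. The following standard C++ sketch models that role. ElementStack and its methods are hypothetical stand-ins, not the RElementStack API, and std::string replaces Symbian descriptors so that the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for RElementStack: records open elements in the
// order the parser meets them, so end tags can be checked against start tags.
class ElementStack {
public:
    // Called when the parser reads a start tag such as <card>.
    void Push(const std::string& aName) { iElements.push_back(aName); }

    // Called on an end tag such as </card>. Returns false (a well-formedness
    // error) if nothing is open or the name does not match the innermost
    // open element; the stack is left unchanged in that case.
    bool Pop(const std::string& aName) {
        if (iElements.empty() || iElements.back() != aName) {
            return false;
        }
        iElements.pop_back();
        return true;
    }

    // Nesting depth: the number of elements still open.
    std::size_t Depth() const { return iElements.size(); }

private:
    std::vector<std::string> iElements;
};
```

A document is complete only when every Pop() has succeeded and Depth() is back to zero at the end of parsing.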
The Symbian OS XML framework defines certain standard features which a parser may have, and in designing your parser you should consider which features it will provide. The features concern the formatting of the input and output and the information which the parser reports to the calling program in addition to the tags and parsed text. The following is a list of features.
The parser reports unrecognised tags.
The parser reports an error when it encounters unrecognised tags.
The parser reports the namespace.
The parser reports the namespace prefix.
The parser reports the namespace mappings.
The parser converts elements and attributes to lower case: that is, it is case-insensitive like an HTML parser.
The parser reports the data in a specified encoding: the default is UTF-8.
The parser allows external entities to appear as attribute values: the default is to raise an error when this happens.
The parser accepts XML 1.0 and XML 1.1. The default is to accept XML 1.0 only.
The parser sends all the content for an element in a single chunk: selection of this feature affects the implementation of the parsing methods.
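Because each feature is either on or off, a parser can record its feature settings as a set of bit flags, one per feature in the list above. The standard C++ sketch below shows how a parser might honour the lower-case feature when reporting tag names. The flag names and values are hypothetical placeholders, not the real TParserFeature constants.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical flags, one bit per feature. The real values are defined by
// the framework's TParserFeature enumeration; these are placeholders.
enum FeatureFlag : unsigned {
    kReportUnrecognisedTags  = 1u << 0,
    kErrorOnUnrecognisedTags = 1u << 1,
    kReportNamespaces        = 1u << 2,
    kReportNamespacePrefixes = 1u << 3,
    kConvertTagsToLowerCase  = 1u << 4,
};

// Returns the tag name the parser would pass to the content handler,
// applying the case-insensitive (HTML-style) feature only when it is set.
std::string ReportedTagName(const std::string& aRawName, unsigned aEnabledFeatures) {
    if ((aEnabledFeatures & kConvertTagsToLowerCase) == 0) {
        return aRawName;  // feature off: report the name exactly as written
    }
    std::string lowered = aRawName;
    for (char& c : lowered) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return lowered;
}
```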
            The following list contains the six methods of the
            MParser interface which a parser plugin must implement.
            Three of them concern the parser features listed above. Two other methods
            perform the parsing: their purpose is to implement the parse functions of the
            CParser class discussed in
            Using Symbian XML Framework.
            The method Release() is provided because interfaces do not
            have destructors.
                  EnableFeature(): Enables one of the parser features. The input parameter is a flag defined in the TParserFeature enumeration.
                  DisableFeature(): Disables one of the parser features. The input parameter is a flag defined in the TParserFeature enumeration.
                  IsFeatureEnabled(): Checks whether one of the parser features is enabled. The input parameter is a flag defined in the TParserFeature enumeration.
                  ParseChunkL(): Parses part of a document. Implements CParser::ParseL().
                  ParseLastChunkL(): Parses the last part of a document; may be called with empty input. Implements CParser::ParseEndL().
                  Release(): Must be called to release resources when the framework has finished using the parser implementation.
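The split between ParseChunkL() and ParseLastChunkL() exists because a document can arrive in pieces, and a tag may be cut at a chunk boundary. The following simplified standard C++ model illustrates this. ChunkedTagScanner and its methods are hypothetical; it only extracts tag bodies (ignoring text, comments and attributes), and it throws an exception where a Symbian parser would leave.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model: keeps unfinished input between chunks and reports
// each tag only once its closing '>' has arrived.
class ChunkedTagScanner {
public:
    // Analogue of ParseChunkL(): consume one chunk, report complete tags.
    void ParseChunk(const std::string& aChunk) {
        iPending += aChunk;
        std::string::size_type start;
        while ((start = iPending.find('<')) != std::string::npos) {
            std::string::size_type end = iPending.find('>', start);
            if (end == std::string::npos) {
                break;  // tag not finished yet: wait for the next chunk
            }
            iTags.push_back(iPending.substr(start + 1, end - start - 1));
            iPending.erase(0, end + 1);  // drop the tag and any text before it
        }
    }

    // Analogue of ParseLastChunkL(): may be called with empty input. Markup
    // still pending at this point means the document was truncated.
    void ParseLastChunk(const std::string& aFinalChunk) {
        ParseChunk(aFinalChunk);
        if (iPending.find('<') != std::string::npos) {
            throw std::runtime_error("document ended inside a tag");
        }
        iPending.clear();
    }

    const std::vector<std::string>& Tags() const { return iTags; }

private:
    std::string iPending;            // input not yet fully consumed
    std::vector<std::string> iTags;  // tag bodies reported so far, e.g. "card"
};
```

Only the final call can decide that an unfinished tag is an error, which is why the last chunk needs its own method.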
                  
               
            Some documents contain markup from more than one XML application, which
            means that the parser may encounter tags and attributes which look the same but
            belong to different namespaces. This is why the MParser
            interface provides for the reporting of namespaces. XML associates tags and
            attributes with namespaces by adding a prefix to them, and each prefix is
            mapped to the URI which identifies the namespace. The class
            RTagInfo is provided to hold this information. It is
            initialised with three strings representing the URI, prefix and local name, and
            provides three member functions to retrieve them:
            Uri(), Prefix() and
            LocalName(). If you want your application to parse
            documents which combine multiple namespaces, your implementation of
            MParser should hold each parsed tag and attribute in an
            RTagInfo object. The content handler will then have
            sufficient information to react differently to tags in different namespaces.
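The lookup which fills in the three parts of an RTagInfo can be sketched as follows. This standard C++ version is an illustration only: TagInfo and ResolveTag are hypothetical names, and a real parser must also track the scope of each xmlns declaration as elements open and close.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical analogue of RTagInfo: the URI, prefix and local name of a tag.
struct TagInfo {
    std::string uri;
    std::string prefix;
    std::string localName;
};

// Splits a qualified name such as "svg:circle" and looks the prefix up in
// the prefix-to-URI mappings declared so far (xmlns:prefix="uri"). An empty
// prefix selects the default namespace, stored under the key "".
// Returns false if the prefix has not been declared.
bool ResolveTag(const std::string& aQName,
                const std::map<std::string, std::string>& aMappings,
                TagInfo& aInfo) {
    const std::string::size_type colon = aQName.find(':');
    const bool hasPrefix = colon != std::string::npos;
    const std::string prefix = hasPrefix ? aQName.substr(0, colon) : std::string();
    const std::string local = hasPrefix ? aQName.substr(colon + 1) : aQName;
    const auto it = aMappings.find(prefix);
    if (it == aMappings.end()) {
        return false;
    }
    aInfo = TagInfo{it->second, prefix, local};
    return true;
}
```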
            
         
            Some XML formats, notably WBXML, extend XML syntax by adding
            extension tokens to the markup language. The
            WBXML specification defines
            nine global extension tokens but does not assign semantics to them. The meaning
            of extension tokens is specific to the document in which they are used (users
            are free to give them any significance they like), but they are typically used
            in combination with compression to identify data which needs to be
            compressed in a specific way. For instance, extension tokens are sometimes used
            to identify data as being variables rather than constants, or as having a particular
            data type. To handle extension tokens, a parser plugin must implement the
            method WbxmlExtensionHandler::OnExtensionL(), which takes three
            parameters: aData, aToken and
            aErrorCode. The first parameter holds the actual data, the
            second specifies the global extension token and the third is the error code.
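For reference, the nine global extension tokens have fixed byte values in the WBXML specification: EXT_I_0 to EXT_I_2 (0x40 to 0x42, followed by an inline string), EXT_T_0 to EXT_T_2 (0x80 to 0x82, followed by a multi-byte integer) and EXT_0 to EXT_2 (0xC0 to 0xC2, with no data). The sketch below classifies a token byte and forwards it to a handler with the same three parameters as OnExtensionL(). ExtensionHandler, ExtensionIndex and DispatchExtension are hypothetical names, and decoding the data which follows the token is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns 0..8 for the nine WBXML global extension tokens, -1 otherwise.
int ExtensionIndex(std::uint8_t aToken) {
    switch (aToken) {
        case 0x40: case 0x41: case 0x42: return aToken - 0x40;        // EXT_I_n
        case 0x80: case 0x81: case 0x82: return 3 + (aToken - 0x80);  // EXT_T_n
        case 0xC0: case 0xC1: case 0xC2: return 6 + (aToken - 0xC0);  // EXT_n
        default: return -1;
    }
}

// Hypothetical handler mirroring the parameters of OnExtensionL():
// the extension's data, the global extension token, and an error code.
struct ExtensionHandler {
    virtual ~ExtensionHandler() = default;
    virtual void OnExtension(const std::string& aData, int aToken, int aErrorCode) = 0;
};

// Hypothetical dispatch step: forwards an extension token and its already
// decoded data to the handler. Returns false for any other token.
bool DispatchExtension(std::uint8_t aToken, const std::string& aData,
                       ExtensionHandler& aHandler) {
    if (ExtensionIndex(aToken) < 0) {
        return false;
    }
    aHandler.OnExtension(aData, aToken, 0 /* no error */);
    return true;
}
```

What the handler does with the data is up to the application, since the specification leaves the meaning of the tokens open.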
            
         
            When parsing fails, the parser object must be destroyed. This means that
            your implementations of the MParser and
            MContentHandler methods must leave on failure, typically by calling
            User::LeaveIfError() with an error code as parameter.
            Specific error codes are supplied for various cases: they are discussed in the
            Error Codes section of this
            document.
            
         
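// Skeleton of a parser plugin. MParser and the TParserFeature flags are
// declared by the XML framework's parser plugin interface.
class CMyParser : public CBase, public MParser
    {
public:
    /** Enable a feature. Returns KErrNone on success, or an error code
        if the parser cannot provide the feature. */
    virtual TInt EnableFeature(TInt aParserFeature)
        {
        // your code here: check that aParserFeature is one you support
        iFeatures |= aParserFeature;
        return KErrNone;
        }

    /** Disable a feature. */
    virtual TInt DisableFeature(TInt aParserFeature)
        {
        // your code here: check that aParserFeature may be switched off
        iFeatures &= ~aParserFeature;
        return KErrNone;
        }

    /** See if a feature is enabled. */
    virtual TBool IsFeatureEnabled(TInt aParserFeature) const
        {
        return (iFeatures & aParserFeature) ? ETrue : EFalse;
        }

    /** Parses a descriptor that contains part of a document. */
    virtual void ParseChunkL(const TDesC8& aChunk)
        {
        // your code here: parse aChunk, report events to the content
        // handler, and leave with an XML error code on failure
        }

    /** Parses a descriptor that contains the last part of a document.
        aFinalChunk may be empty. */
    virtual void ParseLastChunkL(const TDesC8& aFinalChunk)
        {
        // your code here: parse aFinalChunk, then check the document is complete
        }

    /** Interfaces don't have a destructor, so we have an explicit method instead. */
    virtual void Release()
        {
        // Release all resources; for a heap-allocated plugin object this
        // typically means deleting the object itself.
        delete this;
        }

private:
    TInt iFeatures; // bitmask of enabled feature flags; CBase zero-fills it
    };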
You sometimes want to use one of the parser plugins supplied with the XML framework but need to modify it to suit the structure of a particular document: in particular it is common to modify the WBXML parser. This section explains how to customise the WBXML parser.
Suppose you have to parse a WBXML document whose DTD has not previously been implemented for the Symbian OS XML framework. This means that you need to add new string tables representing the DTD.
Parsers use string dictionaries to convert a file of text strings into a string pool of RString objects: these are a Symbian OS C++ construct designed to perform comparison and manipulation of strings very rapidly. String pools are discussed in the Symbian OS Guide. The principle behind them is to construct a table of frequently occurring strings, to calculate integer constants representing the offset of each string from the beginning of the table, and to process the integers instead of the strings. A tool exists to perform these calculations and create the C++ code: all you have to do is create the input to the tool in the form of a string table.
A string table is a text file having the extension .st. It contains the names of C++ enumeration constants paired with the strings they refer to. Each pair occupies a line of text and its two elements are separated by white space, as in this example:
stringtable Wml1_1CodePage00TagTable
EA              a
EAnchor         anchor
EAccess         access
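To make the format concrete, the following standard C++ sketch reads a string table like the one above and shows how each constant corresponds to an integer index. It is not the Symbian conversion tool and generates no code. StringTable, ReadStringTable and IndexOf are hypothetical, and the sketch assumes that neither constants nor strings contain white space.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one .st file: the table name from the
// "stringtable" header line and the (constant, string) pairs in file order.
struct StringTable {
    std::string name;
    std::vector<std::pair<std::string, std::string>> entries;
};

// Reads the format described above, one white-space-separated pair per line.
StringTable ReadStringTable(const std::string& aText) {
    StringTable table;
    std::istringstream lines(aText);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string first;
        std::string second;
        if (!(fields >> first)) {
            continue;  // blank line
        }
        fields >> second;  // stays empty when the line holds only a constant
        if (first == "stringtable") {
            table.name = second;
        } else {
            table.entries.emplace_back(first, second);
        }
    }
    return table;
}

// The integer a parser would work with for a constant: its position in the
// table. Returns -1 if the constant is not present.
int IndexOf(const StringTable& aTable, const std::string& aConstant) {
    for (std::size_t i = 0; i < aTable.entries.size(); ++i) {
        if (aTable.entries[i].first == aConstant) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```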
The simplest use for a string table arises when you use WBXML as a method of compressing generic XML. In such a case you simply create a single .st file containing all the frequent strings which you expect the parser to encounter. In our example scenario the task is slightly more complex because you are parsing a specific XML application which conforms to a DTD. A DTD specifies elements, and perhaps also attributes and attribute values, and these must be held in three separate .st files. You create the files as described above; it is the file containing attribute values which requires care. The left-hand column of an attribute value string table must be exactly the same as the left-hand column of the corresponding attribute string table: that is, it must list the same constant names in the same order. The right-hand column of an attribute value table contains the values defined for the attributes. However, some attributes may have no value defined: in this case the attribute value table contains a line consisting only of the constant name, followed not by white space but by the end of the line. The following two examples show a fragment of an attribute string table and the corresponding attribute value string table.
EAcceptcharset                  accept-charset
EAlign1                         align
EAlign2                         align
EAlign3                         align

EAcceptcharset
EAlign1
EAlign2                         bottom
EAlign3                         top
In this example, the attribute 'accept-charset' has no value defined for it, so the constant name 'EAcceptcharset' is paired with nothing in the attribute value table. The attribute 'align' may take no value or the values 'bottom' and 'top': therefore the first table pairs it with three different constant names, and the second table pairs those constant names with nothing, with 'bottom' and with 'top' respectively.
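Because the attribute value table must mirror the attribute table line for line, a small check at build time can catch mistakes. This is a hypothetical standard C++ sketch (LeftColumn and TablesMatch are not part of the framework) which compares only the constant names, so a value-less line such as "EAcceptcharset" still counts.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the left-hand column (first white-space-separated field) of each
// non-blank line, in file order.
std::vector<std::string> LeftColumn(const std::string& aTable) {
    std::vector<std::string> names;
    std::istringstream lines(aTable);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string name;
        if (fields >> name) {
            names.push_back(name);
        }
    }
    return names;
}

// True if both tables list the same constants in the same order.
bool TablesMatch(const std::string& aAttributes, const std::string& aValues) {
    return LeftColumn(aAttributes) == LeftColumn(aValues);
}
```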
The data structure used to define a DTD is called a code page: a set of string tables as described above is an implementation of a code page. When the string tables are converted into C++ the data in them is held in a structure called a string dictionary. Since the same XML application may have more than one DTD, there may be more than one code page and the associated string dictionaries are held in a structure called a string dictionary collection, with functionality to switch between one code page and another.
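In the WBXML encoding itself, a change of code page is signalled in the token stream by the global SWITCH_PAGE token (0x00) followed by the new page number, and a tag token uses its two high bits as "has attributes" (0x80) and "has content" (0x40) flags. The sketch below models a dictionary collection which switches pages on that token. DictionaryCollection is hypothetical, and the input is assumed to contain only tag tokens and SWITCH_PAGE; a real WBXML body also contains END, string and other global tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a string dictionary collection: one tag dictionary
// per code page, and a current page which changes on SWITCH_PAGE.
class DictionaryCollection {
public:
    using Dictionary = std::map<std::uint8_t, std::string>;  // tag token -> name

    void AddPage(std::uint8_t aPage, Dictionary aDictionary) {
        iPages[aPage] = std::move(aDictionary);
    }

    // Decodes a sequence made only of tag tokens and SWITCH_PAGE tokens.
    // The attribute and content flags are masked off, leaving the low six
    // bits which identify the tag within the current page.
    std::vector<std::string> DecodeTags(const std::vector<std::uint8_t>& aTokens) {
        std::vector<std::string> names;
        for (std::size_t i = 0; i < aTokens.size(); ++i) {
            if (aTokens[i] == 0x00) {  // SWITCH_PAGE: next byte is the page
                if (i + 1 >= aTokens.size()) {
                    throw std::runtime_error("SWITCH_PAGE without a page number");
                }
                iCurrentPage = aTokens[++i];
                continue;
            }
            // std::map::at throws std::out_of_range for an unknown page or tag.
            const Dictionary& page = iPages.at(iCurrentPage);
            names.push_back(page.at(static_cast<std::uint8_t>(aTokens[i] & 0x3F)));
        }
        return names;
    }

private:
    std::map<std::uint8_t, Dictionary> iPages;
    std::uint8_t iCurrentPage = 0;  // WBXML decoding starts on code page 0
};
```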
            You convert the string tables to C++ by invoking the conversion tool from
            the build files when you compile your parser. The conversion tool can be found
            in ...\epoc32\tools\. The Symbian OS Guide explains how to customise the .mmp and bld.inf files for your
            project to call the tool at build time.
            
         
            The XML framework is designed to manage numerous parser implementations
            and has functionality to choose the implementation most suited to the current
            document. The criteria used to make the selection are held in the
            Xml::CMatchData class. When this information does not
            narrow the choice down to exactly one parser, the framework by default chooses
            a Symbian-supplied parser if one is present; otherwise it chooses the one
            with the lowest UID. When you have created a parser implementation, you also
            create a resource file which supplies this information. The
            implementation_uid field should contain the UID of the plugin, the default_data field
            should contain the document types (MIME types) it can parse, and the opaque_data field should
            specify the supplier (Symbian or other). The following is a specimen resource
            file.
            
         
RESOURCE REGISTRY_INFO validatorInfo
    {
    dll_uid = 0x10273863;
    interfaces = 
        {
        INTERFACE_INFO
            {
            interface_uid = 0x101FAA0B;
            implementations = 
                {
                IMPLEMENTATION_INFO
                    {
                    implementation_uid = 0x10273864;
                    version_no = 2;
                    display_name = "Example parser";
                    default_data = "text/xml||text/wbxml";
                    opaque_data = "LicenseeX";
                    }
                };
            }
        };
    }
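The default selection rule described before the resource file can be expressed as a short function. This standard C++ sketch models the rule; it is not the framework's code. Candidate and SelectParser are hypothetical, each candidate stands for a plugin whose default_data matched the request, and the tie-break between two Symbian-supplied plugins (lowest UID) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of one matching plugin's registration data.
struct Candidate {
    std::uint32_t implementationUid;
    bool isSymbian;  // derived from the plugin's opaque_data
};

// Returns the chosen implementation UID, or 0 if nothing matched.
// Prefers a Symbian-supplied plugin; otherwise takes the lowest UID.
std::uint32_t SelectParser(const std::vector<Candidate>& aMatches) {
    const Candidate* best = nullptr;
    for (const Candidate& c : aMatches) {
        const bool better =
            best == nullptr ||
            (c.isSymbian && !best->isSymbian) ||
            (c.isSymbian == best->isSymbian &&
             c.implementationUid < best->implementationUid);
        if (better) {
            best = &c;
        }
    }
    return best ? best->implementationUid : 0;
}
```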
Error codes are supplied in the header file xmlframeworkerrors.h. They refer to six areas of functionality and their names are self-explanatory. When a parser fails, it typically leaves with the appropriate error code as a parameter, for example by calling User::Leave(). A plugin may not require some of the error codes, depending on its functionality: for instance, if it does not use string dictionaries, it does not need the associated error codes either.
            Plugin selection errors are returned by the framework when ECom fails to
            supply a plugin. KErrXmlGeneratorPluginNotFound is
            defined even though the current framework does not include an XML generator.
            KErrXmlPluginNotFound is returned when a call to construct
            a content processor fails.
Charset converter errors are returned by CCharSetConverter. A character set may be either not supported at all or not available: not available means that there is no functionality to convert to and from that character set.
                  
            
String dictionary errors are returned by the automatically generated string dictionary code.
General errors refer to an entire document rather than local parse failures.
                  
            
            There is only one error code associated with the parser selection
            functionality. KErrXmlMoreThanOneParserMatched is only an
            error if the flag KXmlLeaveOnManyFlag is set. 
            
         
                  
            
            The constants KErrXmlFirst and
            KErrXmlLast are not error codes but the bounds of the XML
            error message space: they allow you to specify that you only want to handle XML
            errors. 
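A handler that should react only to XML framework errors can test an error code against the two bounds. In this sketch the bound values are placeholders invented for the example (a real program would use KErrXmlFirst and KErrXmlLast from xmlframeworkerrors.h), and treating both bounds as inclusive is also an assumption.

```cpp
#include <algorithm>
#include <cassert>

// Placeholder bounds, NOT the real values from xmlframeworkerrors.h.
constexpr int kErrXmlFirstPlaceholder = -9000;
constexpr int kErrXmlLastPlaceholder = -9099;

// True if aError lies in the XML error space. Symbian error codes are
// negative, so the check does not assume which bound is numerically larger.
bool IsXmlError(int aError) {
    const int low = std::min(kErrXmlFirstPlaceholder, kErrXmlLastPlaceholder);
    const int high = std::max(kErrXmlFirstPlaceholder, kErrXmlLastPlaceholder);
    return aError >= low && aError <= high;
}
```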
            
         
                  