Handling unknown elements?

Thu Apr 9 00:44:30 BST 1998

One dilemma I have been trying to figure out with XML is the problem of
handling unknown element types and what to do with their children.

For simple tree-based data modeling this is pretty simple: if you come
across an unknown element that the application does not understand, you
just ignore it and all of its children.
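
A minimal sketch of that "ignore it and all of its children" rule, using
Python's standard xml.etree.ElementTree. The function name prune_unknown
and the order/item/note vocabulary are made up for illustration; they
are not part of any framework mentioned in this post.

```python
import xml.etree.ElementTree as ET

def prune_unknown(element, known):
    """Drop every child whose tag is not in `known`, subtree and all."""
    for child in list(element):          # copy: we mutate while iterating
        if child.tag in known:
            prune_unknown(child, known)  # known child: check its children too
        else:
            element.remove(child)        # unknown child: discard whole subtree
    return element

doc = ET.fromstring("<order><item>pen</item><note><b>rush</b></note></order>")
prune_unknown(doc, known={"order", "item"})
print(ET.tostring(doc, encoding="unicode"))
# <order><item>pen</item></order>
```

For record-like data this loses nothing the application could have used,
which is why the rule is unproblematic there.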

However, what if, as in the case of HTML, an application has mixed
content where it understands the <B> tag for boldface text but does not
understand the <I> tag for italicized text? The actual character data
may be a child of the unknown element in this case.

In case anyone would like to know, I have designed an XML application
framework that for now works fine for tree-based data modeling, where
you simply have elements as nodes and the leaf nodes have the actual
character content stored in them. It really will have problems with
documents that have all sorts of elements (and their properties) applied
to the character content.

The only alternative for documents is to use something like a DOM tree
or else an event-based parser.  The framework I have designed is pretty
much what you could call object-based, in the sense that when the parser
encounters a start tag or empty-element tag it retrieves the tag's name
and asks the current parent element for an element to handle that tag's
attributes and content.
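
The dispatch described above might look roughly like the sketch below,
built on Python's xml.sax. Every class and method name here
(ElementHandler, child_for, Dispatcher, parse_objects) is hypothetical;
the sketch only assumes that each element object knows which child tags
it can supply a handler for.

```python
import xml.sax

class ElementHandler:
    """Hypothetical per-element object; subclasses list their child tags."""
    children = {}                        # tag name -> ElementHandler subclass

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.text = []                   # character data directly inside
        self.kids = []                   # handlers created for known children

    def child_for(self, name, attrs):
        """Asked by the parser: a handler for this child tag, or None."""
        cls = self.children.get(name)
        return cls(name, attrs) if cls else None

class Item(ElementHandler):
    pass

class Order(ElementHandler):
    children = {"item": Item}

class Root(ElementHandler):
    children = {"order": Order}

class Dispatcher(xml.sax.ContentHandler):
    def __init__(self, root):
        super().__init__()
        self.stack = [root]
        self.skip_depth = 0              # > 0 while inside an unknown subtree

    def startElement(self, name, attrs):
        if self.skip_depth:
            self.skip_depth += 1
            return
        handler = self.stack[-1].child_for(name, attrs)
        if handler is None:              # parent has no handler: skip subtree
            self.skip_depth = 1
            return
        self.stack[-1].kids.append(handler)
        self.stack.append(handler)

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.stack.pop()

    def characters(self, content):
        if not self.skip_depth:
            self.stack[-1].text.append(content)

def parse_objects(xml_text):
    root = Root("#document", {})
    xml.sax.parseString(xml_text.encode("utf-8"), Dispatcher(root))
    return root
```

In this sketch an unknown tag is dropped together with everything inside
it, which is exactly the behaviour that breaks down for mixed content.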

Does anyone have any ideas for a solution that could be both
object-based and document-based?

I have thought of maybe having an opaque "UNKNOWN" element handler
object that would forward all requests for finding child elements to
its parent element, but the problem with that is how do you know, and
tell the application, whether a particular unknown tag should be treated
as an object-based tag, whose children should all be ignored, or whether
you should simply join all of its children (symbolically) to the
"UNKNOWN" tag's parent tag.

I know this might seem a little convoluted, but here is what I am trying
to say in XML:

<B>
    <I>
        Foo
    </I>
    <I>
        Bar
    </I>
</B>

Using the opaque "UNKNOWN" element it would look like this in tree form
if the <I> tag were unknown:

                   <B>
           |                 |
       <UNKNOWN>         <UNKNOWN>
           |                 |
         "Foo"             "Bar"

Symbolically this could be represented as simply:

                   <B>
           |                 |
         "Foo"             "Bar"

Which in document format would evaluate to:

                   <B>
                    |
                "FooBar"

However, if I were to do all of this in object format, any unknown child
elements of <B> (which in this case happens to be the <I> element) would
be skipped, as would all of the other sub-elements contained in <I>,
regardless of their type.
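
By contrast, on an already-built tree the symbolic join from the
diagrams above is a small rewrite. The sketch below uses
xml.etree.ElementTree as a stand-in for a DOM; splice_unknown and
content_items are illustrative names, and the whitespace in the example
has been removed so that the result matches the "FooBar" tree.

```python
import xml.etree.ElementTree as ET

def content_items(element, known):
    """Yield the element's content as strings and known child elements,
    with each unknown child replaced by its own content."""
    if element.text:
        yield element.text
    for child in element:
        if child.tag in known:
            yield splice_unknown(child, known)
        else:
            yield from content_items(child, known)   # look through the tag
        if child.tail:
            yield child.tail

def splice_unknown(element, known):
    """Rewrite `element` in place so unknown descendants become transparent."""
    items = list(content_items(element, known))      # collect before mutating
    for child in list(element):
        element.remove(child)
    element.text = None
    last = None                                      # last element re-attached
    for item in items:
        if isinstance(item, str):
            if last is None:
                element.text = (element.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            item.tail = None
            element.append(item)
            last = item
    return element

b = ET.fromstring("<B><I>Foo</I><I>Bar</I></B>")
splice_unknown(b, known={"B"})
print(ET.tostring(b, encoding="unicode"))
# <B>FooBar</B>
```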

The only solution I can possibly think of to this dilemma is to have
each element object carry a boolean flag that tells the XML application
framework (which happens to be a parser now but could easily be built on
top of SAX in half an hour) whether to ignore unknown child elements or
to join the children of unknown child elements as children of the
element itself.
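
A rough sketch of how such a flag could drive an event-based parser,
again on Python's xml.sax. The names Node, FlagHandler, join_unknown and
parse_with_flags are assumptions made for this example, as is the idea
of passing the known tags in as a dict that maps each tag to its flag.

```python
import xml.sax

class Node:
    """Hypothetical element object carrying the proposed boolean flag."""
    def __init__(self, name, join_unknown):
        self.name = name
        self.join_unknown = join_unknown  # True: adopt content of unknown tags
        self.content = []                 # text runs and known child Nodes

    def text(self):
        return "".join(c if isinstance(c, str) else c.text()
                       for c in self.content)

class FlagHandler(xml.sax.ContentHandler):
    def __init__(self, join_flags):
        super().__init__()
        self.join_flags = join_flags      # known tag -> join_unknown flag
        self.root = None
        self.open = []                    # Node, or None for a joined unknown
        self.skip_depth = 0               # > 0 inside an ignored subtree

    @property
    def current(self):
        """Innermost open known element, looking through joined unknowns."""
        return next((n for n in reversed(self.open) if n is not None), None)

    def startElement(self, name, attrs):
        parent = self.current
        if self.skip_depth:
            self.skip_depth += 1
        elif name in self.join_flags:
            node = Node(name, self.join_flags[name])
            if parent is None:
                self.root = node
            else:
                parent.content.append(node)
            self.open.append(node)
        elif parent is not None and parent.join_unknown:
            self.open.append(None)        # transparent: children go to parent
        else:
            self.skip_depth = 1           # ignore this tag and its subtree

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.open.pop()

    def characters(self, content):
        if not self.skip_depth and self.current is not None:
            self.current.content.append(content)

def parse_with_flags(xml_text, join_flags):
    handler = FlagHandler(join_flags)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root

doc = "<B><I>Foo</I><I>Bar</I></B>"
print(parse_with_flags(doc, {"B": True}).text())   # FooBar
print(parse_with_flags(doc, {"B": False}).text())  # empty: <I> was skipped
```

With the flag set, a document-style element such as <B> keeps the text
inside tags it does not recognise; with it cleared, the same handler
behaves like the plain object-based case.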

Anyone here got any better ideas on this?

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/