Why must an XML document contain an element?

Tue May 6 21:23:06 BST 1997

Hi Eric,
	I see Tim has answered some of your queries.  I'll take another
(implied one).
In message <libSDtMail.9705061419.19450.ebaatz at barbaresco> Eric Baatz - Sun Microsystems Labs BOS writes:
> My application accepts plain text.  If its client wants it to do

I think you are assuming that you know what software will be
processing your document at the other end.  So far that isn't defined in
XML, though it may be later.  At present all that the client knows is:
	- the document is XML
	- *possibly* what the DOCTYPE is
	- *possibly* what stylesheets are associated with the document.

At present there is no mechanism in XML for saying 'please process this
document with FOOBAZ software'.  That's more like a plugin requirement.
The most that XML can say is:
	- please apply this stylesheet to the document.  And the stylesheet
		can have sophisticated algorithmic behaviour through DSSSL
	- OR please apply this behaviour to the document (or some part of the
		document).  At present the syntax isn't defined.  My current
		approach in JUMBO is to apply a separate Java class per 
		element.  Other people may have different strategies.
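
A minimal sketch of the one-class-per-element idea follows.  It is not
JUMBO's actual code: the names ElementHandler and ElementDispatch are
hypothetical, and it assumes a present-day JDK DOM parser rather than
anything JUMBO ships.  Each element name is mapped to its own handler, and
a fallback is applied to elements with no registered handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical interface: one implementation per element type.
interface ElementHandler {
    String handle(Element element);
}

class ElementDispatch {
    private final Map<String, ElementHandler> handlers = new HashMap<>();
    // Assumption: unknown elements get a generic fallback behaviour.
    private final ElementHandler fallback = e -> "default:" + e.getTagName();

    void register(String tagName, ElementHandler handler) {
        handlers.put(tagName, handler);
    }

    // Walk the tree in document order, applying each element's handler.
    List<String> process(Element root) {
        List<String> results = new ArrayList<>();
        visit(root, results);
        return results;
    }

    private void visit(Element element, List<String> results) {
        ElementHandler h = handlers.getOrDefault(element.getTagName(), fallback);
        results.add(h.handle(element));
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                visit((Element) n, results);
            }
        }
    }

    // Parse a string with the JDK's built-in DOM parser and return the root.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        ElementDispatch dispatch = new ElementDispatch();
        dispatch.register("FOO", e -> "foo-handler");
        System.out.println(dispatch.process(parse("<FOO><BAR/></FOO>")));
    }
}
```

Keeping the lookup in a table means a new element type needs only a new
handler class and one register() call; the tree walk itself stays the same.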

Let's assume your document is
<FOO> This is the first line

and there was a newline
</FOO>
If you sent your document to JUMBO, it would capture the text, including
spaces and newlines, and store it as a PCDATA element.  If you wanted to
output it, JUMBO would output it as you sent it.  If you wanted to display
it, it would look exactly the same.  If, however, you used <HTML> instead,
JUMBO would try (rather crudely) to format it as HTML.  The original
newlines would disappear, and new line breaks would be inserted in the
display where the text hit the right edge.  At present JUMBO is not
sophisticated enough to manage the DEFAULT|PRESERVE attribute; by default
it's DEFAULT, which is the application's default w/s processing mode
(which happens to be PRESERVE!!).
Remember also that a 'plain text' document has a lot of implied structure
which the application cannot be expected to pick up without careful
conventions.
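
The two treatments of the <FOO> example can be sketched as below.  This
is an illustration under stated assumptions, not JUMBO's behaviour: the
class WhitespaceDemo and its methods are hypothetical, the parser is the
JDK's DOM parser, and "collapsed" is only a crude stand-in for HTML-style
reflowing (it folds every run of whitespace into a single space).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class WhitespaceDemo {
    // Preserve mode: return the character data exactly as the parser
    // delivers it, spaces and newlines included.
    static String rawText(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // HTML-like mode (assumption): newlines in the source are not
    // significant, so every run of whitespace becomes one space and a
    // display layer would re-wrap the result at the window edge.
    static String collapsed(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String xml = "<FOO> This is the first line\n\nand there was a newline\n</FOO>";
        String raw = rawText(xml);
        System.out.println("[" + raw + "]");
        System.out.println("[" + collapsed(raw) + "]");
    }
}
```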
> a better job, it can markup the text using an XML syntax.
> 
> So, the client could want to send the application something like:
> 
>   This is plain text.
>   
> However, if the application is expecting XML markup, then it would

I am not quite sure whether I understand your use of client and application.
My model is:
	WWW --->doc---> parser --> application
If you use a WWW browser (?client) to interface to the WWW, then you
might have:
	WWW--->doc---> browser -->parser --> application
Some people would call the whole of the client-side stuff a client, whereas
others might use the term just for the browser.  I think this is an
important point and have urged the XML community to try to identify these
components precisely.  For my own part, I separate parser and application
in the architecture, and I find this a useful model.

	What does your application get from the browser/parser?  We're still
trying to work that out.  NXP gives me an ESIS stream [Norbert, I need a 
handle to extract the DOCTYPE, since that's not in ESIS].  Lark gives me a 
root element of a tree, which I can navigate myself.  Some people want to
pass groves to the application, but I'm not sure of the status of those
developments.
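
The difference between the first two styles can be sketched as follows.
Neither method is the NXP or Lark API; the class ParserInterfaces is
hypothetical, and the JDK's SAX and DOM parsers stand in for an event
stream and a tree respectively.  The "(" and ")" prefixes are loosely
modelled on ESIS-style output lines and are an assumption of this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParserInterfaces {
    // Style 1: the parser pushes a flat stream of events at the
    // application, which must build any structure it needs itself.
    static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                out.add("(" + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                out.add(")" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return out;
    }

    // Style 2: the parser hands back the root of a finished tree, and
    // the application navigates it at its own pace.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<FOO><BAR/></FOO>";
        System.out.println(events(xml));
        System.out.println(root(xml).getFirstChild().getNodeName());
    }
}
```

Either way the application sits behind the parser, which is the
separation described in the model above.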

	P.

> be nice if everything a client sent was an XML document.  So, for
> the sake of clarity and consistency, I can force the client to send:

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)