PCDATA

Peter Murray-Rust Peter at ursus.demon.co.uk
Wed May 7 08:22:02 BST 1997


In message <3.0.32.19970506181242.009f2d80 at pop.intergate.bc.ca> Tim Bray writes:
> At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
> >How many PCDATA elements would be expected in the file?
> <?XML VERSION="1.0"?>
> <!DOCTYPE CML>
> <CML>
> <XVAR>
> This is a variable
> </XVAR>
> </CML>
> 
> Let's flatten that.  Clearly there can't be any PCDATA before <CML>, so:
> 
> <CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>
>      11      2222222222222222222222       33
> 
> Three pieces of PCDATA.  Uh, I'll check Lark now... if it says anything
> else, that's a bug. -T

No bug.  And Michael SMcQ gave the same answer.  I am not sure what NXP
gives at the moment, I'll have to check.  So *I*, and most of the people
who will be using CML, have a potentially serious problem and I don't know what 
to do.
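
For anyone who wants to reproduce the count, here is a minimal sketch.  It
assumes Python's standard xml.etree.ElementTree as a stand-in parser (it is
not Lark or NXP, and it does not validate), and pcdata_pieces is a
hypothetical helper name of my own.  It walks the flattened document in
order and collects each run of character data:

```python
import xml.etree.ElementTree as ET

def pcdata_pieces(elem):
    """Return the character-data runs under elem, in document order."""
    pieces = []
    if elem.text:                      # data before the first child
        pieces.append(elem.text)
    for child in elem:
        pieces.extend(pcdata_pieces(child))
        if child.tail:                 # data following the child's end tag
            pieces.append(child.tail)
    return pieces

flat = "<CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>"
pieces = pcdata_pieces(ET.fromstring(flat))
print(len(pieces))   # 3
print(pieces)        # ['\n', '\nThis is a variable\n', '\n']
```

The first and third pieces are the bare newlines inside CML; only the
second belongs to XVAR.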

Ancillary Question:
If this had been run through a validating parser and the DTD had contained
<!ELEMENT CML (FOO|XVAR)*>
I assume the above document would be invalid?  (#PCDATA does not occur in the
CML content model).

But am I not right in thinking that in SGML the 'additional' newlines
are discarded?  If I run this document through sgmls with the above
DTD, doesn't it validate?  (I'm doing this from memory, so please be
gentle).  And doesn't it at the same time throw away the 'spurious'
#PCDATA elements?

Problem 1.
For a DTD which makes a restricted use of PCDATA, most documents are going to
have lines hundreds or thousands of characters long.  The lines above
would have to be:
<CML><XVAR>This is a variable</XVAR></CML>
and this could easily - in some of my applications - be very much longer.
This makes such documents tricky to edit by hand and could cause problems
with some text processing software.
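
To make the difference between the two forms concrete, a small sketch,
again assuming Python's xml.etree.ElementTree as a non-validating stand-in
for whatever parser is really in use.  Element.itertext() yields the
character data in document order:

```python
import xml.etree.ElementTree as ET

indented = ET.fromstring("<CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>")
compact = ET.fromstring("<CML><XVAR>This is a variable</XVAR></CML>")

indented_pieces = list(indented.itertext())
compact_pieces = list(compact.itertext())

print(indented_pieces)  # ['\n', '\nThis is a variable\n', '\n']
print(compact_pieces)   # ['This is a variable']
```

So the one-line form delivers a single piece of character data, while the
indented form delivers three, and even the XVAR content itself differs by
its leading and trailing newline.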

Problem 2.
It is going to be almost impossible to educate an HTML2XML community that the
two documents above are different.  I have only just realised this problem
today, although I seem to remember in earlier versions of the spec the
behaviour was different?  So I now haven't the slightest idea what I should
be doing - and I thought this was all solved...

Problem 3.
This seems to imply that a WF document *produces different output* if it is 
validated against a DTD.  I accept this is true for SGML, but is it also
true for XML?  If so, I think we shall have an awful problem educating
people.


You will appreciate that I may have clung onto ideas which were parts of 
earlier versions of the draft.  I'd be very grateful for an 'extremely
simple' explanation of what happens with various input of the type above.
If it's what I think, then at the very least I think that the current draft
needs to address this more directly.  Personally I would like some sort
of XML-based switch that allowed a simple behaviour and allowed newlines
for formatting.
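
As a rough illustration of the kind of simple behaviour I mean, here is a
sketch of what an application could do for itself.  Everything in it is an
assumption on my part: it uses Python's xml.etree.ElementTree, and both the
ELEMENT_ONLY set and the drop_formatting_whitespace helper are hypothetical.
The set is filled in by hand, which is exactly the information a generic
program would somehow need to obtain from the DTD:

```python
import xml.etree.ElementTree as ET

# Hypothetical: names of elements whose content model has no #PCDATA,
# e.g. <!ELEMENT CML (FOO|XVAR)*>.  Hand-maintained here, not read from a DTD.
ELEMENT_ONLY = {"CML"}

def drop_formatting_whitespace(elem):
    """Remove whitespace-only character data directly inside ELEMENT_ONLY elements."""
    if elem.tag in ELEMENT_ONLY:
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        for child in elem:
            # A child's tail is character data belonging to the parent.
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        drop_formatting_whitespace(child)
    return elem

indented = ET.fromstring("<CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>")
drop_formatting_whitespace(indented)
remaining = list(indented.itertext())
print(remaining)  # ['\nThis is a variable\n']
```

The newlines used to lay out the children of CML are gone, but the newlines
inside XVAR are untouched, since XVAR is allowed to hold character data and
the program cannot tell whether they are significant.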

The spec says that DEFAULT means that the *application's* default white-space
processing modes (why plural?) are acceptable.  Is the application a DTD or
a program?  If the latter, then we are potentially going to have serious
problems.  If the former, then I don't see how the information is conveyed
from the DTD to the program, if the program is generic (like JUMBO).

P.  (somewhat confused :-).


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)