Specification Questions

Neil Bradley neil at bradley.co.uk
Sat Aug 2 10:38:48 BST 1997

Thanks for the feedback; it was very helpful. However, I STILL do not
understand the need for the brackets in the latter half of Mixed:

> > The second line of the rule for [50]Mixed is:
> > 
> >    |  '(' S? %( '#PCDATA' ) S? ')'
> > 
> > I cannot understand the purpose of the inner brackets in this part
> > of the rule.
> I believe it is to allow parameter entity replacement at that spot:
> <!ENTITY % foobar (#PCDATA)>
> <!ELEMENT FOO (%foobar;)>

I understand the explanation, but the first half of the same rule is
as follows:

  '('  S?  %( %'#PCDATA' ( ..........

If   %'#PCDATA'  can appear here, why can't the second part of the
rule be similarly formulated:

  |   '('  S?  % '#PCDATA'   S?  ')'

Am I wrong in thinking this would allow a content of " ( %xyz; ) "?

> > There is also little written about interpretation of line-ending
> > codes. Although the standard states that white space and
> > line-ending codes are ignored in element content, nothing is said
> > regarding the age old problem of line-ending codes in mixed
> > content. 
> The spec makes no special provision for whitespace at the beginning
> and end of elements. I believe that this is intended to be one of
> its simplifications over "regular" SGML. This seeming
> incompatibility is mitigated by an SGML TC which will allow XML
> to remain compatible with (post-TC) SGML.
>  Paul Prescod

Is it then up to the application to decide what to do with any leading
line-ending code in these positions?

I am pleased to be rid of the 'record' concept (using RS and RE)
defined for SGML, particularly as I have tended to use Mac and UNIX
systems, which each use a single character to end a line (albeit
different ones!). However, I still think there is too little
information on the effect of line-ending codes in mixed content.
Obviously the safe thing to do is to make the content of every element
with a mixed content model fit on a single line, as in:

<p>This is a <b>long</b> paragraph.........................</p>

But with large text blocks, created using text editors, people will
continue to use line-ending codes to keep the text readable on-screen.
Normally, a line break between words would be interpreted as a space
when the block is paginated:

<p>This is a <b>long</b> paragraph that is broken over two
lines, with an implied space between 'two' and 'lines'.</p>

Yet what happens when a comment or processing instruction
appears on its own line?

<p>This is a long paragraph that is broken over two
<!-- comment -->
lines, with an implied space between 'two' and 'lines'.</p>

Is this interpreted as "two <!-- comment --> lines...", which reduces
to "two   lines"?
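
To make the question concrete, here is a minimal sketch of what one
present-day parser hands to the application for the example above. It
assumes Python's xml.dom.minidom, which follows XML 1.0 as eventually
published and not the 1997 draft under discussion, so it illustrates
the problem and does not answer what the draft intends. The helper
names child_pieces and collapse are hypothetical, and the collapsing
policy is an application-level choice of mine, not a rule taken from
the spec.

```python
from xml.dom import minidom

# The paragraph from the example above, with the comment on its own line.
SAMPLE = (
    "<p>This is a long paragraph that is broken over two\n"
    "<!-- comment -->\n"
    "lines, with an implied space between 'two' and 'lines'.</p>"
)


def child_pieces(xml_text):
    """Return (kind, data) for each text or comment child of the root."""
    root = minidom.parseString(xml_text).documentElement
    pieces = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            pieces.append(("text", node.data))
        elif node.nodeType == node.COMMENT_NODE:
            pieces.append(("comment", node.data))
    return pieces


def collapse(pieces):
    """Assumed application policy: drop comments, then treat every run
    of white space (including line ends) as a single space."""
    text = "".join(data for kind, data in pieces if kind == "text")
    return " ".join(text.split())


if __name__ == "__main__":
    pieces = child_pieces(SAMPLE)
    for kind, data in pieces:
        print(kind, repr(data))
    print(collapse(pieces))
```

With this parser the line end before the comment and the line end
after it both reach the application as character data, in separate
text nodes on either side of the comment. Whether they become one
space, two spaces, or nothing is left to a policy such as collapse,
which is exactly the decision I would like the spec to address.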


Neil Bradley - Author of The Concise SGML Companion.
neil at bradley.co.uk

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)
