scripts and PIs

Simon St.Laurent simonstl at
Fri Oct 15 16:22:12 BST 1999

Wow.  That got a much better response than when I first asked if CDATA
sections weren't pretty clunky.  I'm going to try to respond to several
messages here and see if I can find a coherent thread.

Marcus Carr wrote:
>I prefer PIs. I don't like the idea that the current
>mechanism to insert script also carries an implied structural
>significance - that might sometimes be true, but I'd prefer to
>be able to make that decision. Also, I think that script is very 
>much a "processing instruction" type of thing. 

This is pretty much where I was coming from.  The last time I brought up
this issue, it quickly moved from 'this is clunky' to 'why would you want
to mix scripts and element content'.  I think pulling scripts out of the
element structure, by putting them in PIs, has a nice philosophical bent
that soothes the controversy over mixing scripts and content.

Marcus Carr:
>Would we then require/desire more control over PIs by allowing them 
>to support attributes though? If you had to wrap the PI in an element 
>just so you had somewhere to store other information it would seem not 
>to buy much, but it would be interesting if the PI was able to handle 
>them as well. Even without attributes I prefer PIs - they seem more 
>natural than an element and a CDATA section.

The attribute question is especially important if you want to be able to
import external script files like <SCRIPT SRC="myscript.js" />.  This one
I'll have to ponder a bit, but I think 'naturalness' is definitely tilting
me toward PIs.
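There is at least one precedent for Marcus's attribute question: the xml-stylesheet PI carries href and type as "pseudo-attributes" inside its data. Below is a minimal Python sketch of the same idea applied to scripts. The `script` PI target, the `type` pseudo-attribute and the `pseudo_attributes` helper are all hypothetical (only the SRC idea comes from the message). The parser is deliberately loose and is not any standard API.

```python
import re
from xml.dom.minidom import parseString

# name="value" or name='value' pairs, in the style of <?xml-stylesheet?>.
PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def pseudo_attributes(data):
    """Turn a PI's data string into a dict of pseudo-attributes.

    Deliberately loose: no entity expansion, and stray text is ignored
    rather than reported as an error.
    """
    attrs = {}
    for m in PSEUDO_ATTR.finditer(data):
        # Exactly one of the two quoted-value groups matched.
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs[m.group(1)] = value
    return attrs


# Hypothetical 'script' PI standing in for <SCRIPT SRC="myscript.js" />.
doc = parseString('<page><?script src="myscript.js" type="text/javascript"?></page>')
pi = doc.documentElement.firstChild
print(pi.target, pseudo_attributes(pi.data))
```

The sketch also exposes a catch. A PI's data is just a string, so bare inline code such as `x = "hi"` would match the same pattern. A real convention would have to keep the two cases apart, for example by using a separate target name for external references. That choice is an assumption here, not something the thread settled.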

Eve Maler wrote:
>By the way, the main reason why we changed the closing delimiter of SGML 
>PIs from > to ?> in creating XML was to make PIs safer for containing 
>scripting and code.  It's pretty common to have the string ">" appear in 
>scripting, but (we hoped) far less common to have "?>".  Besides, it made 
>it more symmetrical...

That seems like an argument for taking this approach.  If the closing
delimiter were still '>', I don't think this idea would even have had a
chance to germinate.  And symmetry is nice as well.
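You can check Eve's point about the ?> delimiter with any conforming parser. The Python sketch below uses the standard-library minidom. The `script` target and the `pi_data` helper are illustrative names only. It shows three things:

- '>', '<' and '&' pass through a PI unescaped.
- The same text as element content is rejected unless it is escaped or wrapped in a CDATA section.
- A literal '?>' inside the code still ends the PI early.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def pi_data(xml_text):
    """Return (target, data) for the first PI inside the root element."""
    root = parseString(xml_text).documentElement
    for node in root.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            return node.target, node.data
    return None


# '>', '<' and '&' are all legal inside a PI; only '?>' closes it.
print(pi_data('<page><?script if (a > b && c < d) { go(); }?></page>'))

# The same code as element content is not well-formed without escaping or CDATA.
try:
    parseString('<page><script>if (a > b && c < d) { go(); }</script></page>')
except ExpatError as err:
    print("not well-formed:", err)

# A literal '?>' in the code still ends the PI early; the rest becomes text.
print(pi_data('<page><?script s = "?>"; ?></page>'))
```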

Rick Jelliffe then asked some hard questions:
>I think there is a programming factor too.  Ask yourself, 
>"Is the PI a child of the parent element or is it an effect 
>at that point in the document?" If it is not an effect for 
>that point in the document but has been bundled into a section 
>perhaps with other similar markup (e.g. into <meta> in HTML)
>then I think it should be an element.

This is important.  HTML's way of handling SCRIPT elements has always
been sort of interesting, and I think it is on the verge of a significant
change/clean-up.  SCRIPT elements really came in two varieties, depending
on the type of content they held.  If the SCRIPT element held
subroutines/methods/whatever - organized blocks of code - then it was
processed and usable by any scripts that came after the SCRIPT element.  If
the SCRIPT element held 'bare' code, it would get processed at that point
in document processing.

Over time, I think more and more developers have built code libraries that
appear at the start of a document (or are referenced) and then rely on the
event handling mechanisms available in HTML, like onclick.  The W3C seems
to be moving these even further out of HTML, and making events and scripts
assignable in CSS. (See - just the 'Additions to
CSS' section.  HTC looks _hideous_.)

So basically there are multiple scopes that are possible - document-wide
and point-specific.  I don't think it's reasonable to expect processors to
handle two different mechanisms, an element for document-wide scripts and
a PI for point-specific ones, so I'd argue for a PI-based approach, except in cases like
XHTML where we're stuck with a legacy of <SCRIPT>.  (I'm hoping that a
PI-based approach would allow developers to use scripts in any XML dialect,
not just HTML.)
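Here is one way to read "one mechanism, two scopes" in code. A SAX handler receives PIs in document order, at the point where each occurs. It can therefore tell a PI in the prolog from one inside element content. A prolog PI has no enclosing element, so it is plausibly document-wide; a PI inside content is point-specific. This is a sketch under stated assumptions: the `script` target, the `ScriptPIHandler` class, and the rule that prolog means document-wide are all hypothetical, not part of any specification.

```python
import xml.sax


class ScriptPIHandler(xml.sax.ContentHandler):
    """Collect script PIs, noting where in the document each one occurs."""

    def __init__(self, target="script"):
        super().__init__()
        self.target = target
        self.path = []     # names of the currently open elements
        self.scripts = []  # (scope, code) pairs in document order

    def startElement(self, name, attrs):
        self.path.append(name)

    def endElement(self, name):
        self.path.pop()

    def processingInstruction(self, target, data):
        if target != self.target:
            return
        if self.path:
            scope = "/" + "/".join(self.path)  # inside content: point-specific
        else:
            scope = "document"                 # prolog or epilog: document-wide
        self.scripts.append((scope, data))


doc = b"""<?script var total = 0;?>
<report>
  <section><para>Total:</para><?script writeTotal();?></section>
</report>"""

handler = ScriptPIHandler()
xml.sax.parseString(doc, handler)
for scope, code in handler.scripts:
    print(scope, code)
```

A processor built along these lines could run the document-wide scripts first and the others as it reaches their positions. How the code is actually executed is left open here.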

Rick Jelliffe:
>                   Element   PI     Attribute
>-----------------  --------  ------ ---------
>Push/pull          Push      Push   Pull
>effect/structure   structure effect structure

A great table - we need more things like this.  (Damn, I just finished and
submitted a chapter on choosing between elements and attributes. Oh well.
I hope someone else picks up on this.)

I think the structure/effect dichotomy is useful, and I think for scripting
you really have to call it an effect.  It may be an effect at a given
location in the structure, but I think it's definitely _still_ an effect.

>More rules could be added to provide more guidance: in particular,
>whether the script was applicable to all uses of the document.
>But there is no reason why even all these rules will give a clear
>answer in all cases: in that case, house style will probably apply
>and W3C house-style is clearly to favour elements and to
>favour processing by element type (or HTML class) rather than
>supporting point-based markup.

I think the 'no clear answer' argument is probably the real answer here -
all cases are blurry.  I don't expect the W3C to change XHTML to support
PIs for scripts, however sensible I think that approach might be.  I don't
think that this approach is incompatible with the event-based approach they
seem to be taking with regard to scripting in general, however, so I'd
like to pursue this approach for other possibilities in XML markup.

>So, in the case of SCRIPT in HTML, I think it should be an element
>not a PI.  There is no special processing that a PI invokes at the
>point of its declaration in an HTML document.

In fact, the only mentions of PIs in the XHTML PR suggest that they will be
rendered by many user agents and conflate their existence with the XML
declaration. I agree with your conclusion, but mostly on the grounds that
XHTML and HTML don't really understand what PIs are.

Tom Passin writes:
>Generally, I agree that a SCRIPT could (should) be an element.  But some
>script languages or even statements require specific formatting.  For
>example, a Javascript single-line comment must not wrap; some Python code
>depends on specific indenting; etc.  I can imagine that a processor, finding
>a PI at that point, would preserve the special formatting where otherwise it
>would not.  Of course, the processor could simply know to preserve
>formatting when it hits a <SCRIPT> element.

Both PIs and elements should be reported with formatting intact, apart from
line-break normalization, which XML 1.0 applies to the whole document (PI
content included), not just to element content.  I suspect it's still more
likely that PI content would reach the script processor otherwise untouched,
however.
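Tom's formatting concern can be tested directly. In this sketch, Python's minidom (which sits on expat) parses the same indentation-sensitive code twice: once in a PI and once in a CDATA section, both with CRLF line ends. The `py` target is a made-up name. Both copies come through with indentation intact and CRLF normalized to LF. XML 1.0 requires that line-end normalization for the whole entity, not just for element content.

```python
from xml.dom.minidom import parseString

# Indentation-sensitive code with CRLF line ends, carried once in a
# PI (hypothetical target 'py') and once in a CDATA section.
source = (
    "<doc>"
    "<?py\r\nfor i in range(2):\r\n    print(i)\r\n?>"
    "<script><![CDATA[for i in range(2):\r\n    print(i)\r\n]]></script>"
    "</doc>"
)

root = parseString(source).documentElement
pi, script = root.childNodes
pi_code = pi.data
# The CDATA content may arrive as one or more text/CDATA nodes; join them.
element_code = "".join(node.data for node in script.childNodes)

print(repr(pi_code))
print(repr(element_code))
```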

>If the processor is XSLT, SCRIPT elements with "<" and "&" characters can be
>output properly for HTML (according to the more recent drafts of XSLT), so
>here we don't need PIs or CDATA either.

Moving beyond HTML to XHTML and into other dialects of XML will require PIs
or CDATA, as the rest of XML isn't nearly as forgiving.

When I have some time (?!), I hope to write this up as a more formal
proposal, but thanks to everyone for the ideas and excellent discussion.

Simon St.Laurent
XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Sharing Bandwidth / Cookies

xml-dev: A list for W3C XML Developers.