PI target names

Tue May 11 20:38:09 BST 1999

From: Don Park <donpark at quake.net>

>> One can wonder why PIs were specified in the first place, if W3C
>> doesn't want anyone to use them.
>
>What I would like to know is WHY W3C does not want to encourage folks
>to use PI.  Perhaps I'll agree with them once I hear all the arguments,
>perhaps not.  I would like to know what the arguments are.

We should be careful not to create a myth that everyone involved in W3C
is anti-PI.  Some people perhaps feel that XML is a technology preview
for prototyping ideas for HTML; that is a credible view to have (though
I don't hold it personally).  For them PIs represent a dead end, because
they think it is too late to retrofit them into HTML.

* To me PIs represent, above all, a proposal of humility by SGML/XML's
designers: to admit that even within the most carefully constructed
schema, there may in practise be kinds of tags required that the schema
designer has not anticipated or expected. These tags may represent kinds
of structures which simply do not fit into the element structure. (And
the people who detect these different structures may not have the
authority or capability to alter the schema.)

* Furthermore, there may be conflicting ideas about the interesting
points within a document. Such different ideas should be allowed, but
only by using the "low-hanging fruit" of point-based tags, not a full
concurrent or asynchronous (wrong word) element tree.

 For example, in what schema language can you say "a document entity
must start with this encoding header"?  Or "before the top element you
must associate a stylesheet"?  If entities were forced to start with
elements and always to contain at least one element, then we could do
away with these kinds of PIs: we could use attributes on elements.
But there is always the problem that most DTDs (and some of the schemas;
EDD is an exception) allow many possible root element types, and it is
not possible to define attribute requirements based on tree locations
(actually, SGML's attribute LINKing allowed some kinds of variant
attribute requirements based on tree position). This creates a kind of
aberrant category of attributes which belong to tree locations rather
than element types.
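
To make the "before the top element" case concrete, here is a minimal
sketch in Python using the standard xml.dom.minidom module. The sample
document, the file name style.xsl and the helper name prolog_pis are
illustrative assumptions only. It lists the PIs that sit in the prolog,
a position that no element attribute can occupy.

```python
from xml.dom import minidom

# Hypothetical sample document: an encoding header, then a stylesheet PI,
# both placed before the root element.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<report><title>Example</title></report>"""


def prolog_pis(xml_bytes):
    """Return (target, data) for each PI that precedes the root element."""
    doc = minidom.parseString(xml_bytes)
    found = []
    for node in doc.childNodes:
        if node is doc.documentElement:
            break  # everything after this point belongs to the element tree
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            found.append((node.target, node.data))
    return found


pis = prolog_pis(DOC)
for target, data in pis:
    print(target, "->", data)
```

The XML declaration itself is not reported as a PI by the parser, so
only the stylesheet association shows up in the list.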

* PIs also represent a method of extensibility in which the PI tags do
not alter validation against the DTD.  It would be nice to have a schema
declaration language which allowed kinds of validity (or at least some
kind of notation well-formedness) of PIs.  But we should not think that
extensibility was entirely missing from SGML: the trouble with PIs as
traditionally practised in SGML was that there was no "target"
convention enforced or defined, so the extensibility never was able to
get organized.
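
As a rough illustration of what a target convention buys, the sketch
below (Python, standard xml.dom.minidom) routes each PI to a handler
chosen by its target and silently skips targets it does not know. The
targets "spell", "typeset" and "other-app" and the handler table are
invented for the example; they are not part of any standard.

```python
from xml.dom import minidom

# Hypothetical document: three PIs with three different targets.
DOC = (b'<doc><?spell lang="en-AU"?><p>Some <?typeset keep-together?>'
       b'text.</p><?other-app ignore me?></doc>')


def walk_pis(node):
    """Yield every processing instruction below node, in document order."""
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            yield child
        else:
            yield from walk_pis(child)


def dispatch(xml_bytes, handlers):
    """Call the handler registered for each PI target; skip unknown ones."""
    doc = minidom.parseString(xml_bytes)
    results = []
    for pi in walk_pis(doc):
        handler = handlers.get(pi.target)
        if handler is not None:
            results.append(handler(pi.data))
    return results


# Assumed handlers: each application only registers the targets it owns.
handlers = {
    "spell": lambda data: ("spell", data),
    "typeset": lambda data: ("typeset", data),
}
results = dispatch(DOC, handlers)
print(results)
```

Because an application only looks at its own targets, other people's PIs
pass through untouched, and none of them changes what the DTD accepts.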

* In the absence of a standard way of pointing to individual character
positions (numerical character indexing) there is no standard way to
have out-of-line markup inside entities which does not disrupt element
structures. PIs provide a way of tagging positions, both to accomplish
inline parallel structures, if needed, or as targets for out-of-line
markup. Unnormalized Unicode has a big ambiguity problem which makes,
for many written scripts, numerical character indexing unreliable or
problematic: so it may be useful to have the back-up of being able to
index to particular PIs within an element instead.
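
A small sketch of PIs as position markers, in Python with the standard
xml.dom.minidom module. The target name "mark" and the id=... data
convention are assumptions made up for the example. It finds each
marker inside an element and reports the character offset at which it
sits, without the marker disturbing the element structure.

```python
from xml.dom import minidom

# Hypothetical fragment: two marker PIs, one of them inside a child element.
DOC = b"<p>The quick <?mark id=a?>brown <em>fox<?mark id=b?> jumps</em></p>"


def marker_offsets(element, target="mark"):
    """Map each marker PI's data to the count of characters preceding it."""
    offsets = {}
    pos = 0

    def visit(node):
        nonlocal pos
        for child in node.childNodes:
            if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
                pos += len(child.data)  # counts code points as parsed
            elif child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
                if child.target == target:
                    offsets[child.data] = pos
            else:
                visit(child)

    visit(element)
    return offsets


doc = minidom.parseString(DOC)
offsets = marker_offsets(doc.documentElement)
print(offsets)
```

The numeric offsets here depend on how the text happens to be encoded
and normalized, which is exactly the unreliability mentioned above; the
marker itself can still be found by its target and data however the
characters are counted.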

* The classic use of PIs is a tag to hang publication-dependencies on.
(SGML also had another kind of attribute, the PI attribute, which
allowed you to hang PIs off elements too. I don't know whether this can
be simulated by ENTITY attributes, where the entity contains a marked-up
PI, but I doubt it.)  So, for example, you might decide that all
pagebreaks and newlines should be signified by PIs. This simplifies
content models no end (I can show examples, but they are for Chinese
documents). If HTML had PIs, these could have been used to hide scripts
(instead of <!-- which is just plain wrong) and for Server-Side
Includes; not having this form of markup meant that comments had to be
abused.
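
The pagebreak case can be sketched as follows, in Python with the
standard xml.dom.minidom module. The target name "pagebreak" and the
element names chapter and para are invented for illustration. The
breaks fall in the middle of paragraphs, yet the paragraphs stay whole
in the element tree, and a formatter that cares about pages can still
recover them.

```python
from xml.dom import minidom

# Hypothetical chapter: page breaks occur inside and across paragraphs.
DOC = (b"<chapter><para>First page text.<?pagebreak?> Second page begins"
       b"</para><para> and continues.<?pagebreak?>Third.</para></chapter>")


def pages(xml_bytes, target="pagebreak"):
    """Split the document's text into pages at each pagebreak PI."""
    doc = minidom.parseString(xml_bytes)
    result = [""]

    def visit(node):
        for child in node.childNodes:
            if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
                result[-1] += child.data
            elif child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
                if child.target == target:
                    result.append("")  # start a new page
            else:
                visit(child)

    visit(doc)
    return result


page_list = pages(DOC)
paragraph_count = len(
    minidom.parseString(DOC).getElementsByTagName("para"))
print(page_list, paragraph_count)
```

The content model only ever has to describe chapter and para; no page
element has to be allowed wherever a break might land.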

* The other big justification for PIs is an analytical one.  Of course
all the structures in XML can be reduced to LISP S-expressions, or RDF
graphs and other intricate webs of arcs.  But then the pieces need to be
reassembled for the sake of comprehensibility and usability: characters
are kept as strings not individual numbers each in their own tag, for
example; some arcs are labelled to make a tree-structure (the other arcs
are made links or attributes). So PIs represent part of a theory of
document structure or construction that says that element structure !=
entity structure != notation structure != processing instruction
structures (perhaps "!=" is too strong: "need not equal" is better; and
of course, XML simplifies this in the interests of parseability).  Note
that this is not a theory about the data itself; it is about
documents/serializations.

Rick Jelliffe

P.S. It would be nice if the Schema group made declarations to allow
numeric character references inside PIs and comments.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1