ATTN: Please comment on XHTML (before it's too late)

David Megginson david at megginson.com
Mon Aug 30 02:16:28 BST 1999


Paul Prescod writes:

 > Tim Bray wrote:
 > > 
 > > HTML4 has 3 DTDs.  That certainly doesn't mean it's 3 languages.  
 > 
 > What definition of language are you using? 

Once people said that a language is just a dialect with an army and a
navy, but in the 19th and early 20th centuries the United States went
to all the trouble of building their own army and navy and still
didn't end up with their own language.

What that really means is that people tend to use the language name to
make the most important, not the least important, distinctions.  I call
what people in much of Canada, the U.S., England, Scotland, Wales,
Ireland, Australia, New Zealand, and parts of India and South Africa
speak as their mother tongue "English" despite the enormous
differences among them (and among different regions and among
different socio-economic groups within each region).

Most of the time, that's the most important distinction to me: I want
to know if someone is speaking (or writing) English or German, not
Central-Canadian-young-university-educated-suburban-English or
Eastern-US-rural-working-class-Tidewater-English or whatever.  

Still, the differences are significant: there are differences in
vocabulary (CA "chocolate bar" = US "candy bar", UK "boot" = US
"trunk", etc.), differences in grammar (morphological differences like
"y'all" vs "youse" and many syntactic differences), and differences in
pronunciation.  If I need to draw attention to these differences, I do
so with a secondary qualifier ("Scottish English", etc.), but most of
the time I don't need to make that distinction so I don't bother with
the qualifier.

This all matters because naming HTML elements works the same way: the
name should identify the most obvious information, not the least
obvious.

I agree with Paul that we need a method for discovering the version of
HTML names (or any names) being used, but I have not heard any good
argument about why we cannot just use an html:version attribute --
that way, what Eliot calls "cheap processors" will still work
more-or-less OK, while fancy processors can check the version and do
the right thing.  The alternative -- inventing a whole machinery of
Namespace URI equivalence mappings -- seems a little heavy-handed for
such a simple problem.
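A minimal sketch of the html:version idea, using Python's standard xml.etree.ElementTree module. Everything specific here is an assumption for illustration. The namespace URI is the one XHTML 1.0 eventually settled on. The attribute name, its values ("strict", "transitional", "frameset", echoing HTML 4.0's three DTDs), and the helpers cheap_processor and fancy_processor are all hypothetical. The cheap processor ignores the attribute and keys only on element names. The fancy one reads the version and branches on it.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI as finally published for XHTML 1.0.
XHTML_NS = "http://www.w3.org/1999/xhtml"
# HYPOTHETICAL: the proposed html:version attribute. Because it is
# prefixed, ElementTree stores it under the Clark-notation key {ns}version.
VERSION_ATTR = "{%s}version" % XHTML_NS
# HYPOTHETICAL version values, one per HTML 4.0 DTD.
KNOWN_VERSIONS = ("strict", "transitional", "frameset")

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:html="http://www.w3.org/1999/xhtml"
      html:version="strict">
  <head><title>Example</title></head>
  <body><p>Hello</p><p>World</p></body>
</html>"""


def cheap_processor(root):
    """Ignore any version and just collect paragraph text by element name."""
    return [p.text for p in root.iter("{%s}p" % XHTML_NS)]


def fancy_processor(root):
    """Check the (hypothetical) html:version attribute before processing."""
    version = root.get(VERSION_ATTR)
    if version is None:
        return "no version declared; using default rules"
    if version not in KNOWN_VERSIONS:
        return "unknown version %r; falling back to default rules" % version
    return "using %s rules" % version


root = ET.fromstring(DOC)
print(cheap_processor(root))   # ['Hello', 'World']
print(fancy_processor(root))   # using strict rules
```

Both processors see the same element names, so the cheap one keeps working whether or not the attribute is present. Only the fancy one has to care about versions, and no namespace-equivalence machinery is involved.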

 > Until recently our only name for languages *was* the DTD's public or
 > system identifier. Therefore it has been routine for similar languages
 > to have differing names:
 > 
 > PUBLIC "-//TEI//DTD TEI Lite 1.5 //EN" "pubtext/teilite.dtd"
 > PUBLIC "-//TEI//DTD TEI Lite 1.6//EN" "pubtext/teilite.dtd"
 > 
 > PUBLIC "-//W3C//DTD HTML 4.0//EN" strict.dtd
 > PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" pubtext\html\html4.dtd
 > PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN" frameset.dtd
 > 
 > PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" 
 > PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML" 

Wrong -- those are the names of the schemas, not the names of the
languages.  TEI and HTML have long had human-readable names that most
people use most of the time: Namespaces is just an attempt to help
machines to disambiguate them.
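To make the schema/language distinction concrete, here is a second hedged Python sketch, again using the standard xml.etree.ElementTree module. The two documents and the language_of helper are made up for illustration. The documents carry different DOCTYPE public identifiers (different schema names) but the same namespace URI, the single one XHTML 1.0 ultimately adopted for all three of its DTDs. A processor that identifies the language by namespace therefore sees one language.

```python
import xml.etree.ElementTree as ET

# Same namespace, different DOCTYPE public identifiers (schema names).
# ElementTree does not fetch the external DTDs named here.
STRICT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body><p>A</p></body></html>"""

TRANSITIONAL = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body><center>B</center></body></html>"""


def language_of(root):
    """Return the namespace URI of a root element, or None if it has none.

    ElementTree reports namespaced tags in Clark notation, '{uri}local'.
    """
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


for doc in (STRICT, TRANSITIONAL):
    print(language_of(ET.fromstring(doc)))
# Both lines print http://www.w3.org/1999/xhtml
```

If each DTD instead got its own namespace URI, language_of would report three languages where most people see one.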


All the best,


David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)