Transition to XML (was Re: Opportunities for XML-DEV)

Simon St.Laurent simonstl at simonstl.com
Fri Sep 18 19:43:17 BST 1998


At 10:55 AM 9/18/98 -0700, Lisa Rein wrote:
>It took me WEEKS just to clean up the (pretty horrid) HTML on my little
>site -- but now it is at least HTML 4.0 valid.  I feel this is an
>important first step: get everyone to at least well-formed
>HTML, and THEN start pushing for the XML transition.

Fortunately, I'd picked up the gospel of well-formedness from my early days
with dynamic HTML.  It's a lot easier to script elements when their
boundaries are well-defined.  Most of my pages are hand-coded, so they're
fairly clean.  The ones from MS Word - well, we'll see.
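
A minimal sketch of what a well-formedness check looks like in practice,
assuming Python and its bundled expat parser.  The function name is
hypothetical, chosen for illustration only.  It reports the first place
where element boundaries stop matching up, which is the same thing a
script walking the elements would trip over.

```python
import xml.parsers.expat


def well_formedness_error(markup):
    """Return None if markup is well-formed XML, else a short description
    of the first error the parser hits."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(markup, True)
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return "line %d, column %d: %s" % (err.lineno, err.offset, reason)
    return None


if __name__ == "__main__":
    # Properly nested, every element closed: no error expected.
    print(well_formedness_error("<p>Hello <b>world</b></p>"))
    # <b> is never closed, so the parser should report a problem.
    print(well_formedness_error("<p>Hello <b>world</p>"))
```

Typical hand-coded HTML habits such as a bare <br> or an unclosed <p>
fail this check, which is why pages need a cleanup pass before an XML
parser will accept them.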

I'm actually not bothering with being valid to the HTML 4 DTD, since it's
an SGML DTD.  I'll carve out my own little XML subset for now (building on
John Cowan's IBTWSH), and watch for the W3C's HTML modules to appear.

>The XML transition itself will be more easily accomplished from a web of
>well-formed HTML anyway.  In fact, if I have my druthers, much of it will
>be automatable.   

This is definitely true.  The more we can report on how easy and
automatable this process is, the more likely it is that others will join
the fun.

>Many sites that I have been trying to access with XML-enabled apps are
>dead-ends due to non-well-formed pages.  In order to accomplish creating
>an XML directory, for example, I would like to be able to index at the
>very least the websites of the members of this list -- but when I tried
>many were inaccessible ;-(
>
>So then I am forced to either "scrape" the data off of your sites, and
>regenerate versions of it that can be indexed -- which is another pain
>(shame on you all ;-) -- or keep looking for other well-formed sites --
>a frustrating search which usually just leads me back to my own site.
>
>Simon's other point (about the need to start migrating ourselves) hits a
>sore spot with me because I've been getting called on this lately --
>It's pretty embarrassing when someone calls you on not practicing what
>you preach -- especially with what some still view (unfortunately) as
>a "religion" of XML.

This is definitely the case.  (I got called on using MS Word's HTML output
for XSchema over the summer, which got me thinking.)  XML needs momentum,
and its stronger supporters are a reasonable place to start.  I don't think
we need or want XML police (although that's what validating parsers are,
more or less), but the more valid and well-formed material out there, the
better.  I can't wait to move beyond the HTML vocabulary, but it's a start.

Making these sites indexable would be another gigantic boost to the cause
of XML, worthy in its own right.
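
As a rough sketch of why well-formedness matters to an indexer, assuming
Python's standard ElementTree parser and an HTML-style vocabulary with
<title> and <h1> elements: the function name and the shape of the result
are hypothetical, not anything proposed in this thread.  A page that
parses yields an index entry; a page that does not parse yields nothing,
which is the dead end described above.

```python
import xml.etree.ElementTree as ET


def index_entry(markup):
    """Return a small index record for a well-formed page, or None when
    the page cannot be parsed as XML."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Not well-formed: an XML tool has nothing to work with here.
        return None
    title = root.findtext(".//title")
    # Gather the text of each <h1>, including text inside child elements.
    headings = ["".join(h.itertext()).strip() for h in root.iter("h1")]
    return {"title": title, "headings": headings}


if __name__ == "__main__":
    good = ("<html><head><title>Demo</title></head>"
            "<body><h1>Intro <em>text</em></h1><p>Hi</p></body></html>")
    bad = "<html><body><p>unclosed</body></html>"
    print(index_entry(good))
    print(index_entry(bad))
```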

>Most HTML programmers can understand the sacredness of well-formed docs
>(although they might not get the value of syntax preservation in
>general).  I'm not sure why the well-formedness thing gets across, but
>it does.  We can use this to everyone's advantage.

I'm finding it gets through more and more as developers hit walls.  Dynamic
HTML was a key area for this; CSS is starting to get through (largely
thanks to positioning, in my experience); and corporate intranet developers
frustrated by the weakness of the client end of their applications need to
find a better way.  Validation is harder to explain, but I think it may just
take time.

Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer
Cookies / Sharing Bandwidth (November)
Building XML Applications (December)
http://www.simonstl.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



