"SMDL"--work in progress

Adam M. Donahue adam at cyber-guru.com
Mon Jun 8 01:41:47 BST 1998


Hi all,

This is my first posting to xml-dev, as I'm still getting comfortable 
with the XML specification.  However, as an exercise I have begun 
putting together a first XML DTD (hopefully to be posted soon), which 
I'm tentatively calling the "Site Map Definition Language" or SMDL.  
(The name will most likely change.)

There are currently hundreds of thousands (if not millions) of grouped 
collections of documents on the WWW which we often refer to 
individually as "sites."  We can use XML to define the various types 
of information out there in a uniform language.  We can further 
classify these different types of documents into groups, the most 
common of which right now is the idea of the Web site.  Right now, 
the structure of any given site is usually laid out in a tree of 
documents.  However, we have not yet seen a uniform way for 
expressing--in a single document--the contents of this tree.  This 
has resulted in non-standard "site maps," which vary greatly 
between different locations on the Web.  Compounding this is the 
fact that these maps, though often friendly to the user surfing the 
web, are largely unreadable by machines (which is, of course, a general 
problem with HTML).  That is, automated web robots dispatched by 
the major search engines have no easy way of gaining quick 
access to the layout of an individual site.  These engines must then 
resort to recursively searching sites for linked documents.  This 
poses a problem for the web content provider, who may have 
robots accessing and cataloging pages not meant to be cataloged;  
it also poses a problem for the web robots themselves, which have 
the time-consuming and bandwidth-hogging task of requesting 
many pages in the hope of keeping a database up-to-date.

SMDL is a work in progress to solve the above problems.  With it, I 
hope to define a uniform language which web content providers can 
use to offer both user agents and robots access to the full structure 
of a site--including information about the tree-like layout of a site;  
the frequency of updates of a particular resource;  whether content 
is dynamic or not (now, with HTML files, for instance, the web robot 
cannot necessarily know if a page is server-parsed);  and other 
information.  Obviously there are a lot of possibilities.  This is why 
I'm coming to the group a bit early to get feedback.  What would 
you include in such a language?  Also, is this a worthwhile 
endeavor?
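
To make the idea a little more concrete, here is a rough sketch of how 
a robot might read such a map in a single request instead of crawling 
recursively.  It is only an illustration: the DTD has not been posted 
yet, so every element and attribute name below (sitemap, resource, 
href, updated, dynamic) is a placeholder invented for this example, 
as are the function names and the example.com URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMDL instance. All element and attribute names are
# assumptions made for illustration; they are not from the real DTD.
SAMPLE = """\
<sitemap base="http://www.example.com/">
  <resource href="index.html" updated="weekly" dynamic="no">
    <resource href="news/index.html" updated="daily" dynamic="yes"/>
    <resource href="about.html" updated="yearly" dynamic="no">
      <resource href="about/staff.html" updated="monthly" dynamic="no"/>
    </resource>
  </resource>
</sitemap>
"""


def walk(element, depth=0):
    """Yield (depth, href, updated, is_dynamic) for each resource, parents first."""
    for child in element.findall("resource"):
        yield (depth, child.get("href"), child.get("updated"),
               child.get("dynamic") == "yes")
        # Nested <resource> elements model the tree-like layout of the site.
        yield from walk(child, depth + 1)


def read_site_map(text):
    """Parse a site map document and return its resources as a flat list."""
    return list(walk(ET.fromstring(text)))


if __name__ == "__main__":
    for depth, href, updated, dynamic in read_site_map(SAMPLE):
        kind = "dynamic" if dynamic else "static"
        print("  " * depth + f"{href} ({updated}, {kind})")
```

Nesting is one possible way to express the tree; a flat list of 
resources with parent references would serve equally well, and which 
is preferable is exactly the kind of feedback being asked for here.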

My proposal now is very small, and no doubt missing some 
important elements.  (Again, I will post very soon;  I want to make 
sure I've eliminated obvious DTD errors.)  It's mainly an exercise as 
an early XML application.  So don't be afraid to say it's unnecessary 
(I doubt you would be anyhow), or anything else.  I appreciate any 
feedback at all.

Thanks in advance.  I look forward to further participation in this 
group, especially with the somewhat exciting XSchema 
specification.

Adam





mailto:adam at cyber-guru.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



