XML Search Engine Holy War - Attributes vs. Elements

Sat Oct 16 17:40:44 BST 1999

DuCharme, Robert wrote:
> 
> >1. Ignore Attributes altogether and index Elements and Character Data
> >only.
> 
> >The feeling is that the use of attributes should be restricted (by
> >authors) and used to allow other scripts/applications to either include
> >or preclude the element and resultant children nodes from some sort of
> >processing, displaying or further manipulation.
> 
> This shouldn't even be considered. 

Yes. A schema designer is free to document such a semantic, but that
is application-level design.  "All politics is local."  A search engine
built on that premise is restricting its application space.

> Attributes are used for far more than
> what the above paragraph describes. Typical uses include many classic search
> criteria such as meta-information about authorship, revision stages, and
> revision dates. 

Include security markings in that too.  While it might not be smart
to use them in a web application, security markings in attributes
have been used in SGML DTDs.  Redacting...

> The sole purpose of ID type attributes is to uniquely
> identify elements, and unique identifiers ought to be pretty handy when
> searching for information. A system that can quickly locate elements with a
> particular value in an IDREF type attribute would be very useful in link
> maintenance and implementation.

And, as in X3D (where ID/IDREFS vs. NMTOKENS is being discussed), for
DEF/USE relationships.  There are also examples of putting what others
might consider "content" into attributes to preserve a symmetry with
nodes and fields.  Some use, and will use, XML just as a binding to an
abstract description (e.g., X3D).  There is no simple case or practice
that enables an engine to ignore attribute values and types unless one
is blinding the engine by design.
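
To make that concrete, here is a minimal sketch in Python, assuming the
standard library's xml.etree.ElementTree.  The sample document (an
X3D-like fragment with DEF/USE plus made-up author and revised
attributes) and the build_index helper are illustrative assumptions, not
taken from any real engine.  It only shows what an index loses when
attributes are skipped.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: X3D-style DEF/USE names plus metadata attributes.
SAMPLE = """<doc author="len" revised="1999-10-16">
  <Transform DEF="Wheel" translation="0 1 0">round part</Transform>
  <Transform USE="Wheel"/>
</doc>"""

def build_index(xml_text, include_attributes=True):
    """Map each lowercased token to the set of (GI, source) pairs holding it.

    source is "#text" for leading character data of an element, or
    "@name" for an attribute value.  Tail text is not indexed here.
    """
    index = defaultdict(set)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        for token in (elem.text or "").split():
            index[token.lower()].add((elem.tag, "#text"))
        if include_attributes:
            for name, value in elem.attrib.items():
                for token in value.split():
                    index[token.lower()].add((elem.tag, "@" + name))
    return index

full = build_index(SAMPLE)
blind = build_index(SAMPLE, include_attributes=False)

print(sorted(full["wheel"]))              # both the DEF and the USE site
print(sorted(blind.get("wheel", set())))  # nothing: the engine is blinded
```

With attributes indexed, a search for "wheel" finds both ends of the
DEF/USE relationship; without them, the same document yields no hit.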

> but a nice
> thing about implementing storage of attributes is that they map more easily
> to relational databases where ID and IDREF attributes can be easily indexed
> for searching.

Yes.  There are scripts and samples that do that now.  What I see in
practice is that export and import systems start out using only
elements, mapping field names to GIs; then, after some experimentation,
they begin to rely more on GIs and attributes.  Whether to apply
ID/IDREF depends on how you use the names.  It is good for primary
key/foreign key relationships if strict relational rules are followed,
but when packing/serializing, the rules aren't necessarily strict, so
NMTOKENs may be preferred.  The question is whether to require a
validity pass from the XML processor, since ID uniqueness and IDREF
resolution are checked only by a validating one.
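
A rough sketch of the primary key/foreign key mapping, assuming Python
with the standard sqlite3 and xml.etree.ElementTree modules.
ElementTree does not validate and does not expose DTD attribute types,
so this sketch simply assumes that an attribute named "id" is
ID-typed and one named "ref" is IDREF-typed.  Those names, the table
layout, and the sample data are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample: "id" plays the ID role, "ref" the IDREF role.
SAMPLE = """<parts>
  <part id="p1" name="wheel"/>
  <part id="p2" name="axle"/>
  <uses ref="p1"/>
  <uses ref="p9"/>
</parts>"""

def load(xml_text):
    """Load id/ref attributes into two in-memory tables."""
    db = sqlite3.connect(":memory:")
    # PRIMARY KEY enforces the uniqueness that an ID type promises.
    db.execute("CREATE TABLE element (id TEXT PRIMARY KEY, gi TEXT NOT NULL)")
    db.execute("CREATE TABLE reference (gi TEXT NOT NULL, target TEXT NOT NULL)")
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if "id" in elem.attrib:
            db.execute("INSERT INTO element VALUES (?, ?)",
                       (elem.get("id"), elem.tag))
        if "ref" in elem.attrib:
            db.execute("INSERT INTO reference VALUES (?, ?)",
                       (elem.tag, elem.get("ref")))
    return db

def dangling_references(db):
    """Return ref values that match no id, i.e. broken IDREFs."""
    rows = db.execute(
        "SELECT r.target FROM reference r "
        "LEFT JOIN element e ON e.id = r.target "
        "WHERE e.id IS NULL ORDER BY r.target")
    return [row[0] for row in rows]

db = load(SAMPLE)
print(dangling_references(db))  # ['p9']
```

Here the database does the checking a validating parser would otherwise
do: a duplicate id raises sqlite3.IntegrityError on insert, and the join
reports references with no target.  If the names are treated as plain
NMTOKENs, neither check is required and the data loads as-is.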

len

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)