Indexing of XML documents

Peter Murray-Rust Peter at ursus.demon.co.uk
Fri Mar 14 23:47:13 GMT 1997


I hope I can express this problem clearly - I'm sure that you are
familiar with it.

When we need to resolve a TEI pointer like (id a23) we may have to scan
the whole document.  In general we will want to cache (index) the IDs so
that a later search does not require a rescan.  One obvious place to build
this index is when the document is first read in (admittedly, a scan of
the whole document may never be needed at all).
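
To make the caching idea concrete, here is a minimal sketch in Python.
It assumes that the ID attribute is literally named "id" (without a DTD
there is no way to know which attributes are declared as type ID), and
build_id_index is a made-up name, not part of any parser API.

```python
import xml.etree.ElementTree as ET


def build_id_index(root, id_attr="id"):
    """Walk the tree once and map each ID value to the first element carrying it.

    Assumption: the ID attribute is called `id_attr`; a real parser would
    take this from the DTD's attribute declarations instead.
    """
    index = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None and value not in index:
            index[value] = elem
    return index


doc = ET.fromstring(
    '<text><div id="a22"><p id="a23">hello</p></div><p>no id</p></text>'
)
id_index = build_id_index(doc)

# Resolving the pointer is now a dictionary lookup rather than a rescan.
target = id_index.get("a23")
```

After the single pass, every later (id ...) lookup costs one hash probe.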

When validating a document, the IDs, GIs and ATTNAMEs all have to be scanned,
since they occur in validity constraints (VCs).  Presumably, as a by-product
of validation, we can at least expect a hashtable of IDs (and possibly GIs).

The question is, should we do both of these by default (or even others
that I haven't thought of)?  Or should we do none and leave it to the app?
Or should the parser have a switch?
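
One way the "switch" option might look, purely as a sketch: the class
name IndexedDocument and the index_ids flag are hypothetical, and the
ID attribute is again assumed to be named "id".  With the switch off,
the index is built lazily on the first lookup, so a document that is
never searched is never scanned.

```python
import xml.etree.ElementTree as ET


class IndexedDocument:
    """Hypothetical wrapper: parse now, index IDs eagerly or on first lookup."""

    def __init__(self, source, index_ids=False, id_attr="id"):
        self.root = ET.fromstring(source)
        self.id_attr = id_attr      # assumed name of the ID attribute
        self.scans = 0              # number of full-tree scans performed
        self._ids = None            # built at most once
        if index_ids:               # the "switch": pay for the scan up front
            self._build_index()

    def _build_index(self):
        self.scans += 1
        self._ids = {}
        for elem in self.root.iter():
            value = elem.get(self.id_attr)
            if value is not None:
                self._ids.setdefault(value, elem)  # first occurrence wins

    def find_id(self, value):
        if self._ids is None:       # lazy path: first lookup triggers the scan
            self._build_index()
        return self._ids.get(value)


xml_text = '<text><p id="a22">one</p><p id="a23">two</p></text>'
lazy = IndexedDocument(xml_text)                    # no scan yet
eager = IndexedDocument(xml_text, index_ids=True)   # scanned while loading
```

Either way the document is scanned at most once; the flag only decides
whether that cost is paid at load time or at the first search.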

	P.

[BTW a WF document can have multiple identical IDs, OK?  Presumably the
behaviour of an app that has to reference them is 'undefined'?]
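
As an illustration of the duplicate-ID case: a non-validating parser
(here Python's xml.etree.ElementTree) accepts such a document without
complaint, so an app that cares has to detect the clash itself.  The
sketch below assumes the ID attribute is named "id"; collect_ids is a
made-up helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_ids(root, id_attr="id"):
    """Map each ID value to every element that carries it, in document order."""
    found = defaultdict(list)
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            found[value].append(elem)
    return found


# Well-formed, but "a23" is used twice.
doc = ET.fromstring(
    '<text><p id="a23">first</p><p id="a23">second</p><p id="a24">third</p></text>'
)
ids = collect_ids(doc)

# An app can then choose its own policy: first wins, last wins, or error.
duplicates = {value: elems for value, elems in ids.items() if len(elems) > 1}
```

Keeping a list per ID rather than a single element leaves that policy
decision to the app instead of burying it in the index.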

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



