XML tools and big documents

Michael Kay M.H.Kay at eng.icl.co.uk
Fri Sep 4 12:56:05 BST 1998


>To this end, I have been (in such spare time as i have)
>tinkering about with Mr. Clark's XP API (com.jclark.xml.tok,
>mostly) to write an application that will allow me to attach
>the logical element structure to offsets in the storage
>entity, so that I can consider the logical structure's
>relationship to points in the text without reparsing the
>document

I think we're all looking for a solution to the same
problem: a document of more than 1Mb is too big to parse
every time we want to look at it, but storing the
fine-grained DOM representation has the opposite problem: it
takes too much space, and it takes too long to reassemble a
reasonable unit such as a page. Indexing the original serial
XML (say at "chapter" level) is one solution; it's
essentially equivalent to my approach, which has been to
split the original XML (say at "chapter" level) and store
the "chapters" as separate linked XML documents.

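To make the indexing alternative concrete, here is a minimal
sketch in Python rather than with the XP API. It records the
byte offsets of each "chapter" element in the serial XML, so
that a chapter can later be sliced out and parsed on its own.
The function name index_elements and the tag name "chapter"
are assumptions for illustration, and the sketch assumes a
UTF-8 (or other single-byte-markup) document held in memory.

```python
import xml.parsers.expat

def index_elements(data: bytes, tag: str = "chapter"):
    """Return (start, end) byte offsets of each outermost `tag` element."""
    parser = xml.parsers.expat.ParserCreate()
    spans = []
    depth = 0   # nesting depth of `tag`, so nested ones are not indexed twice
    start = 0   # byte offset of the '<' that opens the current outermost `tag`

    def on_start(name, attrs):
        nonlocal depth, start
        if name == tag:
            if depth == 0:
                start = parser.CurrentByteIndex
            depth += 1

    def on_end(name):
        nonlocal depth
        if name == tag:
            depth -= 1
            if depth == 0:
                # CurrentByteIndex points at the '<' of the end tag, so scan
                # forward to its closing '>'. Assumption: the element is not
                # an empty-element tag with '>' inside an attribute value.
                end = data.index(b">", parser.CurrentByteIndex) + 1
                spans.append((start, end))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(data, True)
    return spans

doc = b"<book><chapter><p>One</p></chapter><chapter><p>Two</p></chapter></book>"
spans = index_elements(doc)
s, e = spans[1]
second = doc[s:e]   # bytes of the second chapter, ready to parse by itself
```

The index is built with one full parse; after that, only the
slice for the wanted chapter needs to be parsed.
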
What I mean by "chapter" is a chunk of typically 1-10Kb, or
alternatively, a chunk of text such that the user doesn't
mind pressing "Next" when he's got to the end of it.
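
The splitting approach might look roughly like the sketch
below. It is only an illustration under stated assumptions:
the function name split_chapters, the file names
chapterN.xml, and the prev/next attributes used for linking
are all hypothetical, and it assumes the "chapters" are
direct children of the root element.

```python
import xml.etree.ElementTree as ET

def split_chapters(source: str, tag: str = "chapter"):
    """Split `source` into one small XML document per `tag` child of the root."""
    root = ET.fromstring(source)
    chapters = root.findall(tag)
    docs = {}
    for i, chapter in enumerate(chapters):
        # Hypothetical linking convention: attributes naming the neighbours.
        if i > 0:
            chapter.set("prev", f"chapter{i - 1}.xml")
        if i < len(chapters) - 1:
            chapter.set("next", f"chapter{i + 1}.xml")
        chapter.tail = None  # drop text that followed the chapter in the parent
        docs[f"chapter{i}.xml"] = ET.tostring(chapter, encoding="unicode")
    return docs

docs = split_chapters(
    "<book><chapter><p>One</p></chapter><chapter><p>Two</p></chapter></book>"
)
first = ET.fromstring(docs["chapter0.xml"])
```

The big document is parsed once at split time; afterwards
"Next" only has to load the one small document that the
current chapter's next attribute names.
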

Mike Kay

