Data warehousing and XML

Graydon Hoare gray at interlog.com
Tue Dec 2 15:27:11 GMT 1997


On Tue, 2 Dec 1997, Paul Prescod wrote:

> neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.

Er, you're missing something. The whole point of data mining is admitting
that every schema you will ever establish is flawed in some way, no
matter what you do. There would be no need for such tools if we could
simply see the future, and know in advance that it's terribly important to
maintain a count of how many sticks of gum get shipped to Guam on Tuesdays
in December.

This is precisely why text retrieval is so hard -- the "schema" that all
documents are written in is a written human language, and nobody knows how
to machine-process that. You can chunk it up all you like into logical
blocks, but you're always going to be missing certain substantive
information relating to the text. In fact, if you want to get really
finicky about it, plain vanilla transcribed text loses useful information
conveyed in spoken language, and itself takes an expert "document engineer"
to produce (compare a literate adult's writing to that of a child).

-graydon <graydon at pobox.com>
 
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)