Data warehousing and XML

Paul Prescod papresco at technologist.com
Tue Dec 2 16:13:05 GMT 1997


Graydon Hoare wrote:
> 
> er, you're missing something. The whole point of data mining is admitting
> that all the schemas you will ever establish are in some way flawed, no
> matter what you do. There would be no need for such tools if we were
> simply able to see the future, and know that it's terribly important to
> maintain a count of how many sticks of gum get shipped to guam on tuesdays
> in december.

Right, but Len's question was about having a "schema of *some kind*". The
closer your schema is to explicitly recognizing the information you want
to discover, the easier it is to discover the information. If you have
no schema then you are Very Far Away from that goal.
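
To make that concrete, here is a minimal sketch in Python. It is only an
illustration: the record layout, field names, function names, and sample
data are all hypothetical and do not come from anyone's actual system.
It assumes shipments are stored with explicit item, destination, date,
and quantity fields, and contrasts that with keyword matching over
free-text notes.

```python
from datetime import date

# Hypothetical shipment records that follow an explicit schema.
SHIPMENTS = [
    {"item": "gum", "destination": "Guam", "shipped": date(1997, 12, 2), "quantity": 500},
    {"item": "gum", "destination": "Guam", "shipped": date(1997, 12, 3), "quantity": 200},
    {"item": "gum", "destination": "Fiji", "shipped": date(1997, 12, 9), "quantity": 300},
    {"item": "mints", "destination": "Guam", "shipped": date(1997, 12, 9), "quantity": 50},
    {"item": "gum", "destination": "Guam", "shipped": date(1997, 12, 9), "quantity": 120},
]

# Hypothetical free-text notes with no schema at all.
NOTES = [
    "Sent 500 sticks of gum to Guam last Tuesday.",
    "Guam office asked whether the gum order from Fiji arrived.",
    "Mints went out with the usual courier.",
]


def gum_to_guam_on_december_tuesdays(records):
    """Sum gum quantities shipped to Guam on Tuesdays in December."""
    return sum(
        r["quantity"]
        for r in records
        if r["item"] == "gum"
        and r["destination"] == "Guam"
        and r["shipped"].month == 12
        and r["shipped"].weekday() == 1  # Monday is 0, so Tuesday is 1
    )


def naive_keyword_matches(notes):
    """Return notes mentioning both words; says nothing about who shipped what."""
    return [n for n in notes if "gum" in n.lower() and "guam" in n.lower()]


if __name__ == "__main__":
    print(gum_to_guam_on_december_tuesdays(SHIPMENTS))
    print(naive_keyword_matches(NOTES))
```

With the schema, the question is a one-line filter. Without it, the
keyword search also returns the note about the Fiji order, and neither
the quantity nor the date can be recovered without further processing.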
 
> This is precisely why text retrieval is so hard -- the "schema" that all
> documents are written in is a human written language, and nobody knows how
> to machine-process that. You can chunk it up all you like into logical
> blocks, but you're always going to be missing certain substantive
> information relating to the text. 

Certainly, but those who actually do this processing still chunk it up
into logical blocks according to some schema, because that
is the way to get closest to achieving the goal. So in answer to Len's
question I still say that having a schema is better than not having one,
despite the fact that having the schema does not "solve" the problem. It
gets you closer to solving the problem.

 Paul Prescod
