Data warehousing and XML

len bullard cbullard at
Wed Dec 3 00:29:09 GMT 1997

Paul Prescod wrote:

> I don't doubt that there are some people in the world who want to "mine"
> documents, but I think that they are in the minority, and will be for a
> long time. But more important, it makes little sense to me to "mine" XML
> data. Even if you wanted to mine your structured document data it will
> almost always make sense to load that into the mining tool's internal
> data structures.

Umm.. that actually was one of the often requested capabilities 
when I was still working on SGML systems.  The problem was 
precisely that a great deal of the *interesting* information 
was not in relational databases.  Comparative policy analysis, 
for example.

> Once again, XML is great as the transfer format, but when you get down
> to doing your queries, your data mining software should not be parsing
> the XML syntax.

Ok.  Hmm?  Well, what were the various proposals over the 
years for SGML querying systems for?

> > However, let me ask a technical
> > question that you can probably answer with a deeper
> > technical perspective than mine.  How well can one query
> > data (or convert it for that matter) for which one
> > has no rigorous schema (of some kind)?
>
> In some cases you can do sophisticated queries on data without a schema,
> but you would have to jump through AI hoops. It's not a job I would
> apply for, but neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.

That is what I thought to be the case.  I remember when we 
were doing the GE CASS system we bounced around the idea 
of using DTDs as sort of a reversed query, that is, it 
gave us a way to figure out what kinds of queries should 
be interesting.  We never pursued the idea because the 
SGML systems of that time were fairly primitive.
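
To make the "reversed query" idea a little more concrete, here is a 
rough sketch of what it might look like: read the element declarations 
out of a DTD and list the parent/child paths they allow, on the theory 
that each path is a candidate query.  Everything here is an assumption 
for illustration: the toy DTD, the function names, and the path 
notation are all made up, the regular expressions only cope with simple 
element declarations, and none of it is what we had on CASS.

```python
import re

# Hypothetical toy DTD, invented for illustration only.
DTD = """
<!ELEMENT policy (title, section+)>
<!ELEMENT section (title, para*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""


def content_models(dtd_text):
    """Map each declared element to the child element names it may contain."""
    models = {}
    for name, model in re.findall(r"<!ELEMENT\s+([\w.-]+)\s+([^>]+)>", dtd_text):
        children = []
        for token in re.findall(r"#?[\w.-]+", model):
            # Skip keywords such as #PCDATA, EMPTY and ANY; keep element names once.
            if token.startswith("#") or token in ("EMPTY", "ANY"):
                continue
            if token not in children:
                children.append(token)
        models[name] = children
    return models


def candidate_paths(models, root, max_depth=4):
    """List the element paths the DTD allows below root, as candidate queries."""
    paths = []

    def walk(name, trail):
        trail = trail + [name]
        paths.append("/" + "/".join(trail))
        if len(trail) >= max_depth:
            return
        for child in models.get(name, []):
            if child not in trail:  # do not loop on recursive content models
                walk(child, trail)

    walk(root, [])
    return paths


for path in candidate_paths(content_models(DTD), "policy"):
    print(path)
```

For the toy DTD this lists /policy, /policy/title, /policy/section, 
/policy/section/title and /policy/section/para.  Without the DTD you 
would have to scan the documents themselves to discover that those 
paths exist at all.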


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
