Data warehousing and XML
len bullard
cbullard at hiwaay.net
Wed Dec 3 00:29:09 GMT 1997
Paul Prescod wrote:
> I don't doubt that there are some people in the world who want to "mine"
> documents, but I think that they are in the minority, and will be for a
> long time. But more important, it makes little sense to me to "mine" XML
> data. Even if you wanted to mine your structured document data it will
> almost always make sense to load that into the mining tool's internal
> data structures.
Umm.. that actually was one of the often requested capabilities
when I was still working on SGML systems. The problem was
precisely that a great deal of the *interesting* information
was not in relational databases. Comparative policy analysis,
for example.
> Once again, XML is great as the transfer format, but when you get down
> to doing your queries, your data mining software should not be parsing
> the XML syntax.
Ok. Hmm? Well, what were the various proposals over the
years for SGML querying systems for?
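
To make the two positions concrete, here is a minimal sketch of the "parse once, then query the in-memory structure" approach Paul describes. It assumes Python's standard xml.etree.ElementTree module; the sample document and its element names (policies, policy, topic, rule) are invented for illustration and come from no real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are assumptions.
SAMPLE = """<policies>
  <policy agency="A"><topic>procurement</topic><rule>Bids must be sealed.</rule></policy>
  <policy agency="B"><topic>procurement</topic><rule>Bids may be open.</rule></policy>
  <policy agency="B"><topic>travel</topic><rule>Economy class only.</rule></policy>
</policies>"""


def rules_by_topic(xml_text, topic):
    """Return (agency, rule) pairs for every policy on the given topic."""
    # The XML syntax is parsed exactly once, into an in-memory tree.
    root = ET.fromstring(xml_text)
    # From here on the query runs against the tree, not against markup.
    return [
        (policy.get("agency"), policy.findtext("rule"))
        for policy in root.findall("policy")
        if policy.findtext("topic") == topic
    ]


if __name__ == "__main__":
    for agency, rule in rules_by_topic(SAMPLE, "procurement"):
        print(agency, rule)
```

The query still depends on the document structure, which is the point of the question above: the markup is gone after parsing, but the structure it encoded is what makes the comparison possible.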
> > However, let me ask a technical
> > question that you can probably answer with a deeper
> > technical perspective than mine. How well can one query
> > data (or convert it for that matter) for which one
> > has no rigorous schema (of some kind)?
>
> In some cases you can do sophisticated queries on data without a schema,
> but you would have to jump through AI hoops. It's not a job I would
> apply for, but neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.
That is what I thought to be the case. I remember when we
were doing the GE CASS system we bounced around the idea
of using DTDs as sort of a reversed query, that is, it
gave us a way to figure out what kinds of queries should
be interesting. We never pursued the idea because the
SGML systems of that time were fairly primitive.
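
The "DTD as a reversed query" idea can be sketched roughly as follows: read the element declarations, then enumerate the parent/child paths they permit as candidate queries. This is only an illustration under stated assumptions, not what the GE CASS work did. The DTD fragment is hypothetical, the regular expression handles only flat content models (no nested groups, no ANY or EMPTY), and the slash-separated path notation is just a convenience for this sketch.

```python
import re

# Hypothetical DTD fragment; the element names are assumptions.
DTD = """
<!ELEMENT manual (chapter+)>
<!ELEMENT chapter (title, task*)>
<!ELEMENT task (title, step+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT step (#PCDATA)>
"""

# Matches declarations with a single, un-nested parenthesised content model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+\(([^)]*)\)[*+?]?\s*>")


def content_model(dtd_text):
    """Map each declared element name to the child element names it allows."""
    model = {}
    for name, body in ELEMENT_DECL.findall(dtd_text):
        # Pull out the names, dropping occurrence markers and #PCDATA.
        model[name] = [c for c in re.findall(r"[\w.-]+", body) if c != "PCDATA"]
    return model


def candidate_paths(model, root, max_depth=4):
    """List every path the DTD permits from root, down to max_depth levels."""
    paths = []

    def walk(name, prefix, depth):
        path = prefix + "/" + name
        paths.append(path)
        # The depth limit also keeps recursive content models from looping.
        if depth < max_depth:
            for child in model.get(name, []):
                walk(child, path, depth + 1)

    walk(root, "", 1)
    return paths


if __name__ == "__main__":
    for path in candidate_paths(content_model(DTD), "manual"):
        print(path)
```

Each path is a place where a query could plausibly be aimed (every task title, every step within a task), which is the sense in which the schema suggests the interesting queries before any data has been examined.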
len