(IDEA) XML Persistent Datastore
Kevin A. Burton
burton at relativity.yi.org
Fri Oct 15 04:27:50 BST 1999
I had some time to draw up my thoughts for an XML persistent datastore.
Some of this appeared in an earlier document, which I have included at
the bottom.
Requirements:
- Persistence independent:
    - supports any database vendor (Oracle, Microsoft, Sybase, Hypersonic
      SQL, Ozone, etc.)
    - done by container-managed persistence
    - supports bean-managed persistence
        - This is done by abstracting the persistence mechanism into
          another set of beans.
        - I realized this was a requirement when working on massive
          databases and when doing unusual schema manipulation (like
          multi-site replication).
    - it should be possible (and this I am fuzzy on) to map every DOM
      object to a persistent object through EJB. Instead of using a DOM
      parser from OpenXML or XML4J, you would use the native one provided.
- DOM queries based on XPath and XQL
    - can be used with massively scalable databases, or when you don't
      want to transfer a massive database. Currently DMOZ
      (http://www.dmoz.org) has 300 MB of RDF; an XPath query could let
      us transfer only 10 MB of it.
- Security API:
    - public documents can't just be exposed to the whole world. I
      propose an Access Control List (ACL) based security mechanism with
      inheritance (and the ability to override it).
    - security meta-information (user:mode) should be transferred as
      some sort of XML document over a secure channel.
    - This would let a developer run an XPath query and, if the query
      violated security, get no data back. For example,
      /government/executive-branch/president/credit-information would
      return nothing, thereby revealing nothing about whether the data
      even exists.
- Data View
    - Conventional SQL supports the concept of views. Views are defined
      as SQL queries on the original data.
    - XML views should be done with XPath/XQL.
- Schema constraints through XML Schema
- Performance search/index engine
    - Current HTTP search engines (Infoseek, Excite) treat HTML documents
      as one big CDATA section with all tagging removed. This needs to be
      fixed here. Since all data is basically XML, it should be possible
      to import structured documents and run global (and fast) queries on
      them. This is possible because of XML Namespaces: every DTD must be
      inserted here with a unique XML namespace.
    - Since the persistence mechanism is abstracted here, the native
      database index can be used, relieving the developer of the burden
      of coding one.
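To make the persistence-independence requirement concrete, here is a minimal sketch in plain Java rather than EJB, assuming the "another set of beans" abstraction can be reduced to a single storage interface keyed by node path. `NodeStore`, `InMemoryNodeStore` and `XmlDatastore` are hypothetical names, and the in-memory map only stands in for a vendor-specific implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage abstraction: the datastore only talks to this
// interface, so the database behind it can be swapped without code changes.
interface NodeStore {
    void put(String path, String xml);

    Optional<String> get(String path);
}

// Stand-in for a vendor-specific implementation; a real one would talk
// to Oracle, Hypersonic SQL, Ozone, etc. instead of using a HashMap.
class InMemoryNodeStore implements NodeStore {
    private final Map<String, String> nodes = new HashMap<>();

    @Override
    public void put(String path, String xml) {
        nodes.put(path, xml);
    }

    @Override
    public Optional<String> get(String path) {
        return Optional.ofNullable(nodes.get(path));
    }
}

// Application-facing class: depends on NodeStore only, never on a vendor.
class XmlDatastore {
    private final NodeStore store;

    XmlDatastore(NodeStore store) {
        this.store = store;
    }

    void save(String path, String xml) {
        store.put(path, xml);
    }

    // Returns an empty string when nothing is stored at the path.
    String load(String path) {
        return store.get(path).orElse("");
    }
}
```

Under this assumption, supporting another database means writing one more `NodeStore` implementation; `XmlDatastore` itself does not change.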
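For the XPath query requirement, the point is that the expression is evaluated where the data lives and only the matching nodes travel back. The sketch below uses the standard `javax.xml.xpath` API against an in-memory string; `SubsetQuery` is a hypothetical name, and the document shape in the example is a made-up simplification, not the real DMOZ RDF format.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SubsetQuery {
    // Evaluates an XPath expression and returns the text of each matching
    // node, so only the selected fragment has to leave the server.
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expression, e);
        }
    }
}
```

For example, running `/directory/topic[starts-with(@id, 'Top/Science')]/title` over a document with an Arts topic and a Science topic returns only `[Science]`.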
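The ACL-with-inheritance idea from the Security API bullet can be sketched as follows, assuming simple slash-separated paths rather than arbitrary XPath, and read permission only. `PathAcl` is a hypothetical class: the entry on the nearest ancestor wins, so a more specific entry overrides what it would otherwise inherit, and a denied read looks exactly like a missing node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical ACL keyed by path. Entries are inherited by descendants;
// an entry on a deeper path overrides the inherited one.
class PathAcl {
    private final Map<String, Set<String>> readers = new HashMap<>();

    void grant(String path, Set<String> users) {
        readers.put(path, users);
    }

    // Walks from the path up to "/" and uses the first (nearest) entry found.
    boolean canRead(String user, String path) {
        String current = path;
        while (true) {
            Set<String> entry = readers.get(current);
            if (entry != null) {
                return entry.contains(user);
            }
            if (current.equals("/")) {
                return false; // no entry anywhere: deny by default
            }
            int slash = current.lastIndexOf('/');
            current = slash <= 0 ? "/" : current.substring(0, slash);
        }
    }

    // Denied and nonexistent data both come back empty, so the caller
    // cannot tell whether the node exists.
    Optional<String> read(String user, String path, Map<String, String> data) {
        if (!canRead(user, path)) {
            return Optional.empty();
        }
        return Optional.ofNullable(data.get(path));
    }
}
```

With a grant on `/government` for two users and an overriding grant on `.../credit-information` for only one of them, the other user gets the same empty result for the credit information as for a path that does not exist at all.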
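An XML view in the sense of the Data View bullet could be little more than a stored, compiled XPath expression that is re-evaluated against the current document, much as a SQL view is re-run against its base tables. This sketch uses `javax.xml.xpath` only (XQL is not covered), and `XmlView` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical XML view: it stores only the query, never a copy of the data.
class XmlView {
    private final XPathExpression query;

    XmlView(String xpath) {
        try {
            this.query = XPathFactory.newInstance().newXPath().compile(xpath);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Re-runs the stored query against whatever the base document holds now.
    List<String> rows(Document base) {
        try {
            NodeList nodes = (NodeList) query.evaluate(base, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper that parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Because nothing is cached, a view such as `/staff/employee[@dept='eng']/name` automatically reflects employees added to the base document later.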
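For schema constraints, one option is to validate every document against an XML Schema before it is stored. This sketch uses the standard `javax.xml.validation` API with the W3C XML Schema language as it was later finalized (it was still a working draft in 1999); `SchemaGate` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical gate that rejects documents violating the schema.
class SchemaGate {
    private final Schema schema;

    SchemaGate(String xsd) {
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("invalid schema", e);
        }
    }

    // With no ErrorHandler set, the validator throws on the first error.
    boolean accepts(String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a schema declaring `price` as `xs:decimal`, `<price>9.99</price>` is accepted while `<price>cheap</price>` and undeclared elements are rejected.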
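The namespace-qualified index from the search/index bullet might look like the sketch below: each leaf element is indexed under its namespace URI, local name and text, so the same word in two vocabularies never collides. `StructuredIndex` and the `urn:example:` namespaces are hypothetical, and a real store would push this into the native database index rather than a `HashMap`.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical structured index: keys combine namespace URI, local name and
// normalized text, so identical words in different vocabularies stay apart.
class StructuredIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    private static String key(String namespace, String localName, String text) {
        return "{" + namespace + "}" + localName + "=" + text.trim().toLowerCase();
    }

    void add(String docId, String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI/getLocalName
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
        walk(doc, docId);
    }

    // Recursively indexes every leaf element (one with no child elements).
    private void walk(Node node, String docId) {
        if (node.getNodeType() == Node.ELEMENT_NODE && !hasElementChild(node)) {
            String text = node.getTextContent().trim();
            if (!text.isEmpty()) {
                postings.computeIfAbsent(key(node.getNamespaceURI(), node.getLocalName(), text),
                        k -> new TreeSet<>()).add(docId);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, docId);
        }
    }

    private static boolean hasElementChild(Node node) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    Set<String> find(String namespace, String localName, String text) {
        return postings.getOrDefault(key(namespace, localName, text), Collections.emptySet());
    }
}
```

A lookup for `country` = `China` in the geography namespace then finds only the atlas document, even if a menu document uses the same word in a `dish` element of another namespace.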
Just some ideas I have had... I am curious to see what everyone here
thinks.
---------------------------
(overview)
When the Internet started to experience massive growth, one of the ways
devised to conquer its massive amount of information was to set up a
"search engine" that would spider the Internet or an intranet and
calculate word counts from the HTML presentation layer. Later,
databases were brought into web applications, adding a gateway to
structured data.
Later the W3C, in its infinite wisdom, published the XML specification
to help split the difference: a standard way to give documents a
structure that can easily be extended.
(idea)
Currently Java lacks an index server: one that parses a URL or
filesystem, generates a META-index of the content (approximately 40% of
the original size), and allows users to find a document within a
filesystem or website based on a query.
Such an index lacks any structure: a user searching for China might get
either the country or a type of dishes.
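The ambiguity is easy to reproduce with a flat keyword index of the kind described above. This hypothetical `KeywordIndex` strips all tags with a crude regular expression (not a real HTML parser) and maps each lower-cased word to the documents containing it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical flat "META-index": all tagging is discarded and each
// lower-cased word points at the documents that contain it.
class KeywordIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String docId, String markup) {
        // Crude tag stripping for illustration; not a real HTML parser.
        String text = markup.replaceAll("<[^>]*>", " ");
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    Set<String> find(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }
}
```

Indexing `<p>Capital of <b>China</b></p>` as `atlas.html` and `<li>Fine China plates</li>` as `menu.html` makes `find("China")` return both documents, because the markup that might have told them apart was thrown away.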
What I think is needed is a blend of the two. There is a *lot* of
existing HTML-based data (and there will be for a while) that needs to
be indexed within a 100% Java environment. Adding XML support to a Java
index server would allow the user to enumerate a list of known DTDs, run
a query on <country>China</country> (or XQL/XOQL/DOM), and obtain the
document (possibly as HTML after an XSL reformat).
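The last step, returning the matched document as HTML after an XSL reformat, can be done with the standard `javax.xml.transform` (TrAX) API. This is a minimal sketch: the stylesheet and the `HtmlRenderer` name are illustrative assumptions, not part of any existing index server.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlRenderer {
    // Illustrative XSLT 1.0 stylesheet: wraps the first <country> value in HTML.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\"/>"
            + "<xsl:template match=\"/\">"
            + "<p>Country: <b><xsl:value-of select=\"//country\"/></b></p>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML query result and returns the HTML text.
    static String render(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Rendering `<result><country>China</country></result>` produces an HTML paragraph containing `<b>China</b>`.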
(goals)
I would just like to get feedback. There are some smart people here. ;)
Kevin
--
Kevin A Burton
http://relativity.yi.org
Mobile: 408-910-6145
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)