(IDEA) XML Persistent Datastore

Kevin A. Burton burton at relativity.yi.org
Fri Oct 15 04:27:50 BST 1999


I had some time to draw up my thoughts for an XML persistent datastore.
The idea first came up in an earlier document, which I have included at
the bottom.

Requirements:

        - Persistence independent:
                - supports any database vendor (Oracle, Microsoft, Sybase,
                Hypersonic SQL, Ozone, etc).
                - done by container managed persistence
                - supports bean managed persistence.
                        - This is done by abstracting the persistence
                        mechanism into another set of beans
                        - I realized this was a requirement when working
                        on massive databases and when doing strange
                        schema manipulation (like multi-site replication)
                - it should be possible (and this I am fuzzy on) to map
                every DOM object to a persistent object through EJB.
                Instead of using a DOM parser from OpenXML or XML4J, you
                use the native one provided.
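
As a rough illustration of abstracting the persistence mechanism, here is
a minimal sketch in Python (chosen only for brevity; it does not model EJB
container or bean managed persistence).  The names XmlStore, MemoryStore,
SqliteStore and round_trip are hypothetical.  The point is only that the
caller talks to one interface while the backend can be swapped.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class XmlStore(ABC):
    """Hypothetical persistence interface; callers never see the backend."""

    @abstractmethod
    def put(self, doc_id: str, xml_text: str) -> None: ...

    @abstractmethod
    def get(self, doc_id: str) -> Optional[str]: ...


class MemoryStore(XmlStore):
    """Backend that keeps documents in a dictionary."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def put(self, doc_id: str, xml_text: str) -> None:
        self._docs[doc_id] = xml_text

    def get(self, doc_id: str) -> Optional[str]:
        return self._docs.get(doc_id)


class SqliteStore(XmlStore):
    """Backend that keeps documents in a relational table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, doc_id: str, xml_text: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)", (doc_id, xml_text)
        )
        self._db.commit()

    def get(self, doc_id: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT body FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        return row[0] if row else None


def round_trip(store: XmlStore) -> Optional[str]:
    """Application code: identical no matter which backend is plugged in."""
    store.put("greeting", "<msg>hello</msg>")
    return store.get("greeting")
```

Calling round_trip(MemoryStore()) and round_trip(SqliteStore()) runs the
same application code against two different backends.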

        - DOM queries based on XPath and XQL

                - can be used with massively scalable databases or when
                you don't want to transfer a massive database.  Currently
                DMOZ (http://www.dmoz.org) has 300M of RDF.  An XPath
                query could allow us to transfer only 10M of it.
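
To make the "transfer only what the query selects" idea concrete, here is
a small sketch using the limited XPath subset in Python's standard
xml.etree.ElementTree module.  The CATALOG document, its element names and
the select_fragments helper are made up for illustration and are not real
DMOZ data.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a large directory document.
CATALOG = """
<directory>
  <topic name="Arts">
    <link>http://example.org/a1</link>
    <link>http://example.org/a2</link>
  </topic>
  <topic name="Science">
    <link>http://example.org/s1</link>
  </topic>
</directory>
"""


def select_fragments(xml_text, path):
    """Return only the serialized subtrees matched by the path expression."""
    root = ET.fromstring(xml_text)
    return [
        ET.tostring(node, encoding="unicode").strip()
        for node in root.findall(path)
    ]


# Only the matching subtree would need to cross the wire.
fragments = select_fragments(CATALOG, "./topic[@name='Science']")
```

In a real datastore the query would be evaluated on the server side, so
that only the selected fragments are serialized and sent.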

        - Security API:
                - public documents can't just be exposed to the whole
                world.  I propose an Access Control List based security
                mechanism with inheritance (with the ability to override)
                - security meta information (user:mode) should be
                transferred through some sort of XML document over a
                secure channel.
                - This would allow the developer to query an XPath and,
                if the query violated security, no data would be returned.
                E.g. /government/executive-branch/president/credit-information
                would not return anything, thereby providing no
                information as to the potential existence of the data.
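
One way to read the inheritance-with-override rule is "the nearest
ancestor that has an entry wins".  The sketch below assumes that
interpretation; the DATA and ACL tables, the "*" wildcard user, the
"auditor" user and the default-deny rule are all hypothetical choices, not
part of the proposal.  A denied query and a query for a missing path both
return an empty list, so the caller cannot tell them apart.

```python
from typing import Dict, List

# Hypothetical stored data: path -> values found at that path.
DATA: Dict[str, List[str]] = {
    "/government/executive-branch/president/name": ["(public value)"],
    "/government/executive-branch/president/credit-information": ["(private value)"],
}

# Hypothetical ACL: path -> {user: may_read}.  An entry applies to the
# whole subtree below it unless a deeper entry overrides it.
ACL: Dict[str, Dict[str, bool]] = {
    "/government": {"*": True},
    "/government/executive-branch/president/credit-information": {
        "*": False,
        "auditor": True,
    },
}


def may_read(user: str, path: str) -> bool:
    """Walk from the most specific path up to the root; first match wins."""
    parts = path.strip("/").split("/")
    for depth in range(len(parts), 0, -1):
        entry = ACL.get("/" + "/".join(parts[:depth]), {})
        if user in entry:
            return entry[user]
        if "*" in entry:
            return entry["*"]
    return False  # assumption: deny when no ACL entry applies


def query(user: str, path: str) -> List[str]:
    """Return the data, or an empty list if denied or absent."""
    if not may_read(user, path):
        return []
    return list(DATA.get(path, []))
```

With these tables, a "guest" user can read the president's name but gets
an empty result for credit-information, exactly as for a path that does
not exist.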
                
        - Data View
                - Conventional SQL supports the concept of views.  Views
                are defined as SQL queries on the original data
                - XML Views should be done with XPath/XQL
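
By analogy with SQL views, an XML view can be thought of as a stored path
expression that is re-evaluated against the current document each time it
is read.  A minimal sketch, assuming that reading; the XmlView class and
the staff document are hypothetical, and ElementTree supports only a
subset of XPath.

```python
import xml.etree.ElementTree as ET


class XmlView:
    """A stored path expression, evaluated lazily against live data."""

    def __init__(self, root, path):
        self._root = root
        self._path = path

    def nodes(self):
        # Re-run the query on every read, like a non-materialized SQL view.
        return self._root.findall(self._path)


root = ET.fromstring(
    "<staff>"
    "<person dept='eng'><name>Ada</name></person>"
    "<person dept='ops'><name>Bo</name></person>"
    "</staff>"
)
eng_names = XmlView(root, "./person[@dept='eng']/name")

before = [n.text for n in eng_names.nodes()]

# Change the underlying data; the view picks it up on the next read.
new_person = ET.SubElement(root, "person", dept="eng")
ET.SubElement(new_person, "name").text = "Cy"

after = [n.text for n in eng_names.nodes()]
```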

        - Schema constraints through XML schema

        - Performance search/index engine
                - currently modern HTTP search engines (Infoseek, Excite)
                treat HTML documents as one big CDATA section with all
                tagging removed.  This needs to be fixed here.  Since all
                data is basically XML it should be possible to import
                structured documents and run global (and fast) queries on
                them.  This is possible because of XML namespaces.  Every
                DTD must be inserted here with a unique XML namespace.

                - since the persistence mechanism is abstracted here, the
                native database index can be used, relieving the
                developer of the burden of coding one.
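
The role namespaces play in a global index can be sketched as an inverted
index keyed by the qualified element name plus its text.  This is only an
illustration: the urn:example:* namespaces, the sample documents and the
build_index helper are invented, and a real engine would delegate to the
database's native index as described above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Map (qualified element name, lowercased text) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            text = (elem.text or "").strip()
            if text:
                # ElementTree spells namespaced tags as "{namespace}local".
                index[(elem.tag, text.lower())].add(doc_id)
    return index


# Two hypothetical vocabularies that both contain the word "Mercury".
DOCS = {
    "planets": '<a:planet xmlns:a="urn:example:astronomy">Mercury</a:planet>',
    "elements": '<c:element xmlns:c="urn:example:chemistry">Mercury</c:element>',
}

INDEX = build_index(DOCS)
planet_hits = INDEX[("{urn:example:astronomy}planet", "mercury")]
element_hits = INDEX[("{urn:example:chemistry}element", "mercury")]
```

Because the namespace is part of the key, the same word in two different
vocabularies lands in two different index entries.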

                
Just some ideas I have had... I am curious to see what everyone here
thinks....
        
---------------------------

(overview)

When the Internet started to experience massive growth, one of the ways
devised to conquer its sheer volume of information was to set up a
"search engine" which would spider the Internet or an intranet and
calculate word counts from the HTML presentation layer.  Later,
databases started being connected to web applications, which added a
gateway to the structured data.

Later the W3C in its infinite wisdom published the XML specification to
help split the difference and give documents a standard structure that
can easily be extended.

(idea)

Currently Java lacks an index server: one that parses a URL or
filesystem, generates a META-Index (approx 40% of the original size) of
the content, and allows users to find a document within a filesystem or
website based on a query.

Such an index lacks any structure: a user searching on something like
China might get either the country or the type of dishes.

What I think is needed is a blend of the two.  There is a *lot* of
existing data that is HTML based (and will be for a while) that needs to
be indexed within a 100% Java environment.  Adding XML support to a Java
Index Server would allow the user to enumerate a list of known DTDs and
run a query on <country>China</country> (or XQL/XOQL/DOM) and obtain the
doc (possibly HTML after XSL re-format).
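
The difference between the two kinds of search can be sketched as
follows.  The two sample documents, their element names, and the
flat_search and element_search helpers are hypothetical; the flat search
mimics an indexer that throws the tags away, while the element search
corresponds to a query on <country>China</country>.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: one about the country, one about tableware.
DOCS = {
    "atlas.xml": "<entry><country>China</country><capital>Beijing</capital></entry>",
    "catalog.xml": "<entry><product>China</product><kind>dinner plates</kind></entry>",
}


def flat_search(word):
    """Word match on the text with all tagging removed."""
    hits = []
    for name, xml_text in DOCS.items():
        words = " ".join(ET.fromstring(xml_text).itertext()).lower().split()
        if word.lower() in words:
            hits.append(name)
    return hits


def element_search(tag, value):
    """Match only documents where <tag> contains exactly the given value."""
    hits = []
    for name, xml_text in DOCS.items():
        root = ET.fromstring(xml_text)
        if any((elem.text or "").strip() == value for elem in root.iter(tag)):
            hits.append(name)
    return hits


flat_hits = flat_search("China")
country_hits = element_search("country", "China")
```

The flat search returns both documents, while the element search returns
only the one in which China is marked up as a country.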

(goals)

I would just like to get feedback.  There are some smart people here. ;)

Kevin



--

Kevin A Burton
http://relativity.yi.org
Mobile:  408-910-6145
